Commit 3f22037

IgorTavcar and claude committed
Apply 4 upstream PRs: Discussion karpathy#43 config, structural triage, HF proxy, agent confinement
PR karpathy#244: Discussion karpathy#43 best hyperparameters (baseline/train.py)

Applies the community's validated best config from Discussion karpathy#43 (val_bpb 0.997→0.977 on H100).

Code changes: parameterized init scale, x0 init, RoPE base, short window divider, weight decay for embeddings.

Transferred hyperparams: EMBEDDING_LR 0.6→0.9, UNEMBEDDING_LR 0.004→0.005, WARMDOWN_RATIO 0.5→0.75, FINAL_LR_FRAC 0.0→0.05, INIT_SCALE=0.68, X0_INIT=0.05, momentum warmup 300→200 steps, weight decay for lm_head/embeddings/value_embeddings.

Kept Jetson-specific DEPTH=6, BATCH_SIZE.

PR karpathy#204: Early structural triage at 60s (baseline/train.py)

Computes effective rank (spectral entropy of weight SVDs) at init and at 60s. Kills experiments where rank collapses below 50% of initial. Reports eff_rank_init/final/rank_retention in final summary. ~50ms one-shot cost. Set TRIAGE_TIME=0 to disable.

PR karpathy#272: Respect HF_ENDPOINT env var (all 7 prepare.py files)

Reads HF_ENDPOINT env var with https://huggingface.co as fallback. Allows users behind proxies to download data without rate limiting:

HF_ENDPOINT=http://hf-mirror.com uv run prepare.py

PR karpathy#154: Confine agent to project directory (.claude/hooks/cage.sh)

PreToolUse hook that blocks file access outside the project directory and prevents cd/pushd/popd. Registered in .claude/settings.json for Bash, Read, Write, Edit, Glob, and Grep tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
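The structural triage described for PR karpathy#204 can be illustrated with a small sketch. The train.py diff is not shown on this page, so the formula here is an assumption: effective rank is taken as the exponential of the Shannon entropy of the normalized singular values, which is one common reading of "spectral entropy of weight SVDs". The function names, the `min_retention` parameter, and the sample spectra are hypothetical; the singular values are assumed to come from an SVD computed elsewhere.

```python
import math
from typing import Sequence


def effective_rank(singular_values: Sequence[float]) -> float:
    """Effective rank as exp(Shannon entropy) of the normalized singular values.

    ASSUMPTION: the commit does not show its formula; this entropy-based
    definition is one common interpretation, not necessarily the one used.
    """
    positive = [s for s in singular_values if s > 0.0]
    total = sum(positive)
    if total == 0.0:
        return 0.0
    entropy = 0.0
    for s in positive:
        p = s / total
        entropy -= p * math.log(p)
    return math.exp(entropy)


def should_kill(rank_init: float, rank_now: float, min_retention: float = 0.5) -> bool:
    """True when the rank has fallen below min_retention of its initial value."""
    if rank_init <= 0.0:
        return False
    return (rank_now / rank_init) < min_retention


# Hypothetical spectra: flat at init, dominated by one direction at the check.
spectrum_init = [1.0, 1.0, 1.0, 1.0]
spectrum_60s = [10.0, 0.01, 0.01, 0.01]

rank_init = effective_rank(spectrum_init)
rank_60s = effective_rank(spectrum_60s)
print(
    f"eff_rank_init={rank_init:.3f} eff_rank_final={rank_60s:.3f} "
    f"rank_retention={rank_60s / rank_init:.3f} "
    f"kill={should_kill(rank_init, rank_60s)}"
)
```

A flat spectrum gives an effective rank equal to the number of singular values, while a spectrum dominated by one value gives a rank close to 1, so the retention ratio falls well under the 50% threshold named in the commit message.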
1 parent 1a0a505 commit 3f22037

11 files changed

Lines changed: 225 additions & 75 deletions


.claude/hooks/cage.sh

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+#!/bin/bash
+# PreToolUse hook: confine the agent to the project directory.
+# - Blocks directory changes (cd, pushd, popd, chdir)
+# - Blocks file reads/writes outside project dir
+# - Blocks searches outside project dir
+
+INPUT=$(cat)
+TOOL_NAME=$(echo "$INPUT" | jq -r '.tool_name // empty')
+PROJECT_DIR="$CLAUDE_PROJECT_DIR"
+
+deny() {
+    jq -n --arg reason "$1" '{
+        hookSpecificOutput: {
+            hookEventName: "PreToolUse",
+            permissionDecision: "deny",
+            permissionDecisionReason: $reason
+        }
+    }'
+    exit 0
+}
+
+# Resolve a path without requiring it to exist, normalizing .. and symlinks.
+# Tries GNU realpath -m (Linux), then Python 3 (macOS/Linux), then raw path.
+resolve_path() {
+    local path="$1"
+    realpath -m "$path" 2>/dev/null && return
+    python3 -c "import os, sys; print(os.path.normpath(os.path.abspath(sys.argv[1])))" "$path" 2>/dev/null && return
+    echo "$path"
+}
+
+# Check if a path is within the project directory.
+check_path() {
+    local path="$1"
+    # Empty/null path means the tool defaults to cwd, which is fine
+    [ -z "$path" ] && return 0
+
+    local resolved
+    resolved=$(resolve_path "$path")
+
+    case "$resolved" in
+        "$PROJECT_DIR/.claude"|"$PROJECT_DIR/.claude"/*) return 1 ;;
+        "$PROJECT_DIR"|"$PROJECT_DIR"/*) return 0 ;;
+        *) return 1 ;;
+    esac
+}
+
+case "$TOOL_NAME" in
+    Bash)
+        COMMAND=$(echo "$INPUT" | jq -r '.tool_input.command // empty')
+        [ -z "$COMMAND" ] && exit 0
+
+        # Block directory changes
+        if echo "$COMMAND" | grep -qE '(^|[;&|`(]|&&|\|\||\$\()\s*(cd|pushd|popd|chdir)(\s|$|;|&|\||\))'; then
+            deny "Changing the working directory is not allowed."
+        fi
+        ;;
+
+    Read|Write|Edit)
+        FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')
+        if ! check_path "$FILE_PATH"; then
+            deny "Access denied: $FILE_PATH is outside the project directory ($PROJECT_DIR)."
+        fi
+        ;;
+
+    Glob|Grep)
+        SEARCH_PATH=$(echo "$INPUT" | jq -r '.tool_input.path // empty')
+        if ! check_path "$SEARCH_PATH"; then
+            deny "Access denied: $SEARCH_PATH is outside the project directory ($PROJECT_DIR)."
+        fi
+        ;;
+esac
+
+exit 0
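To make the two checks in cage.sh easier to reason about, here is a rough Python rendering of them. It is a sketch, not part of the commit: the names `changes_directory` and `is_allowed_path` are hypothetical, and unlike the shell script it also resolves the project directory itself, so the comparison still holds when that directory is reached through a symlink.

```python
import os
import re
import tempfile

# Same pattern as the grep -E call in cage.sh. MULTILINE mirrors grep's
# per-line matching of ^ and $.
_CD_PATTERN = re.compile(
    r"(^|[;&|`(]|&&|\|\||\$\()\s*(cd|pushd|popd|chdir)(\s|$|;|&|\||\))",
    re.MULTILINE,
)


def changes_directory(command: str) -> bool:
    """True when the command text appears to invoke cd, pushd, popd or chdir."""
    return _CD_PATTERN.search(command) is not None


def is_allowed_path(path: str, project_dir: str) -> bool:
    """Mirror check_path: allow paths inside the project, except under .claude."""
    if not path:
        # An empty path means the tool defaults to the working directory.
        return True
    # realpath normalizes ".." and symlinks and does not need the path to exist.
    resolved = os.path.realpath(path)
    # ASSUMPTION: resolving the project dir too is an addition of this sketch.
    project = os.path.realpath(project_dir)
    claude_dir = os.path.join(project, ".claude")
    if resolved == claude_dir or resolved.startswith(claude_dir + os.sep):
        return False
    return resolved == project or resolved.startswith(project + os.sep)


# Small demonstration against a throwaway project directory.
with tempfile.TemporaryDirectory() as project:
    for candidate in ("train.py", ".claude/settings.json", "../outside.txt"):
        full = os.path.join(project, candidate)
        print(candidate, is_allowed_path(full, project))

for cmd in ("cd /tmp", "ls && cd ..", "echo abcd"):
    print(repr(cmd), changes_directory(cmd))
```

The directory-change check is purely textual, so it can flag a quoted string such as `echo "a; cd b"`. In the script shown, the Bash branch applies only this check and does not inspect paths that appear inside a command.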

.claude/settings.json

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+{
+  "hooks": {
+    "PreToolUse": [
+      {
+        "matcher": "Bash|Read|Write|Edit|Glob|Grep",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/cage.sh"
+          }
+        ]
+      }
+    ]
+  }
+}
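The matcher routes six tools to one script, and cage.sh reads a different input field for each of them. The sketch below models that exchange in Python under stated assumptions: the field names (`tool_name`, `tool_input.command`, `tool_input.file_path`, `tool_input.path`) and the deny structure are taken from cage.sh, while the sample payload and the helper names `extract_target` and `deny_response` are hypothetical. A real hook payload may carry more fields than shown here.

```python
import json
from typing import Optional

# Input field inspected for each tool, following the case branches in cage.sh.
_TARGET_FIELD = {
    "Bash": "command",
    "Read": "file_path",
    "Write": "file_path",
    "Edit": "file_path",
    "Glob": "path",
    "Grep": "path",
}


def extract_target(payload: dict) -> Optional[str]:
    """Return the command or path the hook would inspect, or None if absent."""
    field = _TARGET_FIELD.get(payload.get("tool_name", ""))
    if field is None:
        return None
    value = payload.get("tool_input", {}).get(field)
    # Like jq's `// empty`, treat null and missing values as "nothing to check".
    return value or None


def deny_response(reason: str) -> str:
    """Build the JSON document that the deny() shell function prints."""
    return json.dumps(
        {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "deny",
                "permissionDecisionReason": reason,
            }
        }
    )


# Hypothetical payload, reduced to the fields that cage.sh reads.
sample = json.loads('{"tool_name": "Read", "tool_input": {"file_path": "/etc/passwd"}}')
target = extract_target(sample)
print(deny_response(f"Access denied: {target} is outside the project directory."))
```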

HOW_IT_WORKS.md

Lines changed: 54 additions & 54 deletions
@@ -28,24 +28,24 @@ An AI agent **edits a training script, runs it for 5 minutes, checks if the mode
 │ │
 │ Reads program.md for instructions, then loops: │
 │ │
-│ ┌─────────────┐ ┌─────────────┐ ┌────────────────┐ │
-│ │ Think of an │───▶│ Edit │───▶│ Run train.py │ │
-│ │ experiment │ │ train.py │ │ (5 min) │ │
-│ └─────────────┘ └─────────────┘ └───────┬────────┘
+│ ┌─────────────┐ ┌─────────────┐ ┌────────────────┐
+│ │ Think of an │───▶│ Edit │───▶│ Run train.py │
+│ │ experiment │ │ train.py │ │ (5 min) │
+│ └─────────────┘ └─────────────┘ └───────┬───────
 │ │ │
 │ ┌───────────────────────┘ │
 │ ▼ │
-│ ┌───────────────┐
-│ │ Got better? │
-│ └───┬───────┬───┘
-│ yes │ │ no
-│ ▼ ▼
+│ ┌───────────────┐ │
+│ │ Got better? │ │
+│ └───┬───────┬───┘ │
+│ yes │ │ no │
+│ ▼ ▼ │
 │ ┌────────┐ ┌─────────┐ │
 │ │ KEEP │ │ DISCARD │ │
 │ │ commit │ │ revert │ │
 │ └────┬───┘ └────┬────┘ │
-│ │ │
-│ ▼ ▼
+│ │ │ │
+│ ▼ ▼ │
 │ ┌────────────────────┐ │
 │ │ Log to results.tsv │──▶ loop back │
 │ └────────────────────┘ │
@@ -71,10 +71,10 @@ An AI agent **edits a training script, runs it for 5 minutes, checks if the mode
 ### Phase 1: Setup (one-time, by you)

 ```
-┌──────────┐ ┌────────────────────────────────┐
-prepare.py│─────▶│ ~/.cache/autoresearch/ │
-└──────────┘ │ │
-│ data/ │
+┌──────────┐ ┌────────────────────────────────────
+│prepare.py│─────▶│ ~/.cache/autoresearch/
+└──────────┘ │
+│ data/
 │ shard_00000.parquet │
 │ shard_00001.parquet │
 │ ... (10 training shards) │
@@ -123,7 +123,7 @@ The agent then:
 │ │ peak_vram_mb: 44100 ← memory used │ │
 │ │ │ │
 │ │ f. Compare to previous best: │ │
-│ │ 0.9821 < 0.9979 → BETTER! Keep the commit. │ │
+│ │ 0.9821 < 0.9979 → BETTER! Keep the commit. │ │
 │ │ │ │
 │ │ g. Append to results.tsv │ │
 │ │ │ │
@@ -161,7 +161,7 @@ When you come back:
 │ ✗ Edit prepare.py, program.md, or any other file │
 │ ✗ Add new dependencies │
 │ ✗ Change the tokenizer or data pipeline │
-│ ✗ Exceed available GPU memory
+│ ✗ Exceed available GPU memory │
 │ ✗ Stop (the agent runs until you interrupt it) │
 └──────────────────────────────────────────────────────────────┘
 ```
@@ -176,31 +176,31 @@ A small GPT-style transformer, trained from scratch on text data:
 Input tokens (sequence of 2048)


-┌──────────────┐
-│ Token │ Converts token IDs → vectors
-│ Embedding │
-└──────┬───────┘
+┌────────────────
+│ Token │ Converts token IDs → vectors
+│ Embedding
+└──────┬─────────


-┌──────────────┐
-│ Transformer │ ×8-12 layers, each containing:
-│ Block │
-│ ┌──────────┐ │ • RMS Normalization
-│ │Attention │ • Multi-head self-attention (with RoPE)
-│ │(sliding │ • Sliding window: short/long pattern (SSSL)
-│ │ window) │ • Flash Attention 3 kernel
-│ └──────────┘ │
-│ ┌──────────┐ │ • RMS Normalization
-│ │ MLP │ │ • Linear → ReLU² → Linear
-│ │(feedfwd) │ │
-│ └──────────┘ │
-│ + residual │ • Skip connections with learnable scaling
-└──────┬───────┘
+┌────────────────
+│ Transformer │ ×8-12 layers, each containing:
+│ Block
+│ ┌──────────┐ │ • RMS Normalization
+│ │Attention │ • Multi-head self-attention (with RoPE)
+│ │(sliding │ • Sliding window: short/long pattern (SSSL)
+│ │ window) │ • Flash Attention 3 kernel
+│ └──────────┘
+│ ┌──────────┐ │ • RMS Normalization
+│ │ MLP │ │ • Linear → ReLU² → Linear
+│ │(feedfwd) │
+│ └──────────┘
+│ + residual │ • Skip connections with learnable scaling
+└──────┬─────────


 ┌──────────────┐
-LM Head │ Vectors → vocabulary probabilities
-(unembedding)│
+│ LM Head │ Vectors → vocabulary probabilities
+│ (unembedding)│
 └──────┬───────┘


@@ -293,37 +293,37 @@ Failed experiments are reverted with `git reset` — they leave no trace in git,
293293

294294
```
295295
┌────────────┐ ┌─────────────────────────────────────┐
296-
│ HuggingFace│────────▶│ ~/.cache/autoresearch/
297-
│ (remote) │ data │ ├── data/*.parquet
298-
└────────────┘ │ └── tokenizer/
299-
│ ├── tokenizer.pkl
300-
│ ├── token_bytes.pt
301-
prepare.py ──────│ └── metadata.json
296+
│ HuggingFace│────────▶│ ~/.cache/autoresearch/ │
297+
│ (remote) │ data │ ├── data/*.parquet │
298+
└────────────┘ │ └── tokenizer/ │
299+
│ ├── tokenizer.pkl │
300+
│ ├── token_bytes.pt │
301+
prepare.py ──────│ └── metadata.json │
302302
(runs once) └──────────────┬──────────────────────┘
303303
304304
│ loaded at runtime
305305
306306
┌────────────┐ edits ┌─────────────────────┐ outputs
307-
│ AI Agent │─────────▶│ train.py │──────────────┐
308-
│ (Claude) │ │ (model + training) │ │
309-
│ │◀─────────│ │ │
307+
│ AI Agent │─────────▶│ train.py │──────────────┐
308+
│ (Claude) │ │ (model + training) │ │
309+
│ │◀─────────│ │ │
310310
│ │ reads └─────────────────────┘ │
311-
│ │ output
312-
│ │
311+
│ │ output │
312+
│ │ ▼
313313
│ │ appends ┌─────────────────────┐ val_bpb: 0.982
314-
│ │──────────▶│ results.tsv │ peak_vram_mb: 44100
314+
│ │──────────▶│ results.tsv │ peak_vram_mb: 44100
315315
│ │ └─────────────────────┘
316316
│ │
317-
│ │ commits ┌─────────────────────┐
318-
│ │──────────▶│ git branch │
317+
│ │ commits ┌───────────────────────
318+
│ │──────────▶│ git branch
319319
│ │ /resets │ autoresearch/<tag> │
320-
└────────────┘ └─────────────────────┘
320+
└────────────┘ └───────────────────────
321321
322322
│ reviewed by human
323323
324324
┌─────────────────────┐
325-
│ analysis.ipynb
326-
│ (plots & insights)
325+
│ analysis.ipynb │
326+
│ (plots & insights) │
327327
└─────────────────────┘
328328
```
329329

baseline/prepare.py

Lines changed: 2 additions & 1 deletion
@@ -38,7 +38,8 @@
 CACHE_DIR = os.path.join(os.path.expanduser("~"), ".cache", "autoresearch")
 DATA_DIR = os.path.join(CACHE_DIR, "data")
 TOKENIZER_DIR = os.path.join(CACHE_DIR, "tokenizer")
-BASE_URL = "https://huggingface.co/datasets/karpathy/climbmix-400b-shuffle/resolve/main"
+_HF_ENDPOINT = os.environ.get("HF_ENDPOINT", "https://huggingface.co")
+BASE_URL = f"{_HF_ENDPOINT}/datasets/karpathy/climbmix-400b-shuffle/resolve/main"
 MAX_SHARD = 6542 # the last datashard is shard_06542.parquet
 VAL_SHARD = MAX_SHARD # pinned validation shard (shard_06542)
 VAL_FILENAME = f"shard_{VAL_SHARD:05d}.parquet"
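As a quick illustration of how the endpoint override feeds into shard URLs, the following sketch wraps the same lookup in a function so it can be exercised with different environments. The helper names `build_base_url` and `shard_url` are hypothetical. Two behaviors are assumptions of this sketch rather than of the diff: an empty `HF_ENDPOINT` falls back to the default, and a trailing slash on the endpoint is stripped.

```python
import os
from typing import Mapping, Optional

DEFAULT_ENDPOINT = "https://huggingface.co"
DATASET_PATH = "datasets/karpathy/climbmix-400b-shuffle/resolve/main"


def build_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the dataset base URL, honoring HF_ENDPOINT when it is set."""
    env = os.environ if env is None else env
    # ASSUMPTION: treat an empty value like an unset one. The diff's
    # os.environ.get(..., default) would keep an empty string instead.
    endpoint = env.get("HF_ENDPOINT") or DEFAULT_ENDPOINT
    # ASSUMPTION: strip a trailing slash so the joined URL has no "//".
    return f"{endpoint.rstrip('/')}/{DATASET_PATH}"


def shard_url(index: int, env: Optional[Mapping[str, str]] = None) -> str:
    """URL of one shard, using the shard_NNNNN.parquet naming from prepare.py."""
    return f"{build_base_url(env)}/shard_{index:05d}.parquet"


print(shard_url(0, {}))
print(shard_url(6542, {"HF_ENDPOINT": "http://hf-mirror.com/"}))
```

Passing the environment as a mapping keeps the function testable without mutating `os.environ`; calling it with no argument reads the real environment, as the module-level constant in the diff does.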
