PR karpathy#244 — Discussion karpathy#43 best hyperparameters (baseline/train.py)
Applies the community's validated best config from Discussion karpathy#43
(val_bpb 0.997→0.977 on H100). Code changes: parameterized init scale,
x0 init, RoPE base, short window divider, weight decay for embeddings.
Transferred hyperparams: EMBEDDING_LR 0.6→0.9, UNEMBEDDING_LR 0.004→0.005,
WARMDOWN_RATIO 0.5→0.75, FINAL_LR_FRAC 0.0→0.05, INIT_SCALE=0.68,
X0_INIT=0.05, momentum warmup 300→200 steps, weight decay for lm_head/
embeddings/value_embeddings. Kept Jetson-specific DEPTH=6, BATCH_SIZE.
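A minimal Python sketch of how the schedule values above could interact. It assumes a constant LR followed by a linear warmdown to FINAL_LR_FRAC, and a linear Muon momentum ramp from 0.85 to 0.95. Those shapes and endpoints are assumptions, not read from train.py, and lr_multiplier/muon_momentum are hypothetical names.

```python
# Hypothetical sketch: constant names mirror the commit message, but the
# schedule shape and the 0.85 -> 0.95 momentum endpoints are assumptions.
WARMDOWN_RATIO = 0.75        # last 75% of the budget is warmdown (was 0.5)
FINAL_LR_FRAC = 0.05         # LR ends at 5% of peak instead of 0.0
MOMENTUM_WARMUP_STEPS = 200  # was 300


def lr_multiplier(progress: float) -> float:
    """Multiplier on each group's base LR; progress is the elapsed fraction of the budget in [0, 1]."""
    warmdown_start = 1.0 - WARMDOWN_RATIO
    if progress < warmdown_start:
        return 1.0
    # 1.0 at the start of warmdown, 0.0 at the end of training
    remaining = max(0.0, (1.0 - progress) / WARMDOWN_RATIO)
    # Linear interpolation between peak (1.0) and FINAL_LR_FRAC
    return remaining + (1.0 - remaining) * FINAL_LR_FRAC


def muon_momentum(step: int, start: float = 0.85, end: float = 0.95) -> float:
    """Linearly ramp momentum from start to end over MOMENTUM_WARMUP_STEPS, then hold."""
    frac = min(step / MOMENTUM_WARMUP_STEPS, 1.0)
    return (1.0 - frac) * start + frac * end
```

Under these assumptions the LR holds at peak for the first 25% of the budget, then decays linearly to 5% of peak rather than to zero.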
PR karpathy#204 — Early structural triage at 60s (baseline/train.py)
Computes the effective rank of the weight matrices (from the spectral
entropy of their singular values) at init and again at 60s, and kills
the experiment if rank falls below 50% of its initial value. Reports
eff_rank_init/final/rank_retention in the final summary. The check is
one-shot and costs ~50ms. Set TRIAGE_TIME=0 to disable.
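A rough sketch of the triage math in plain Python. It uses one common definition of effective rank, exp of the Shannon entropy of the L1-normalized singular values; in a PyTorch script the spectra could come from torch.linalg.svdvals on each weight matrix. Averaging over matrices and the helper names (effective_rank, triage, triage_due) are hypothetical, not taken from the PR.

```python
import math
from typing import Sequence


def effective_rank(singular_values: Sequence[float]) -> float:
    """exp(Shannon entropy) of the L1-normalized singular-value spectrum."""
    total = sum(singular_values)
    if total <= 0.0:
        return 0.0
    entropy = 0.0
    for s in singular_values:
        p = s / total
        if p > 0.0:  # 0 * log(0) is taken as 0
            entropy -= p * math.log(p)
    return math.exp(entropy)


def mean_effective_rank(spectra: Sequence[Sequence[float]]) -> float:
    """Average effective rank over weight-matrix spectra (assumed aggregation)."""
    return sum(effective_rank(s) for s in spectra) / len(spectra)


def triage(init_spectra, current_spectra, min_retention: float = 0.5):
    """Return (keep_running, eff_rank_init, eff_rank_final, rank_retention)."""
    eff_rank_init = mean_effective_rank(init_spectra)
    eff_rank_final = mean_effective_rank(current_spectra)
    rank_retention = eff_rank_final / eff_rank_init if eff_rank_init > 0 else 0.0
    return rank_retention >= min_retention, eff_rank_init, eff_rank_final, rank_retention


def triage_due(elapsed_s: float, triage_time_s: float, already_done: bool) -> bool:
    """TRIAGE_TIME=0 disables the check; otherwise run it once after triage_time_s."""
    return triage_time_s > 0 and not already_done and elapsed_s >= triage_time_s
```

A flat spectrum of n equal singular values gives effective rank n, and a spectrum with a single nonzero value gives 1, so retention directly measures how much of the initial spread survived.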
PR karpathy#272 — Respect HF_ENDPOINT env var (all 7 prepare.py files)
Reads the HF_ENDPOINT env var, falling back to https://huggingface.co.
This lets users behind proxies download data through a mirror without
hitting rate limits, e.g.:
HF_ENDPOINT=http://hf-mirror.com uv run prepare.py
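A minimal sketch of the endpoint lookup, assuming an empty HF_ENDPOINT should count as unset and that a trailing slash should be stripped. get_hf_endpoint is a hypothetical helper name, not necessarily what the prepare.py files use.

```python
import os
from typing import Mapping, Optional

DEFAULT_HF_ENDPOINT = "https://huggingface.co"


def get_hf_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hub base URL: HF_ENDPOINT if set and non-empty, else the public Hub."""
    if env is None:
        env = os.environ
    endpoint = env.get("HF_ENDPOINT") or DEFAULT_HF_ENDPOINT
    # Assumption: strip a trailing slash so f"{endpoint}/..." never yields "//".
    return endpoint.rstrip("/")
```

Passing the environment as an argument keeps the lookup testable without mutating os.environ.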
PR karpathy#154 — Confine agent to project directory (.claude/hooks/cage.sh)
PreToolUse hook that blocks file access outside the project directory
and prevents cd/pushd/popd. Registered in .claude/settings.json for
Bash, Read, Write, Edit, Glob, and Grep tools.
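For illustration, the containment logic re-sketched in Python rather than shell (cage.sh itself is a shell script). It assumes the Claude Code hook protocol as we understand it: the hook reads a JSON event with tool_name and tool_input on stdin, and exiting with code 2 blocks the call and feeds stderr back to the agent. The tool_input keys (file_path, path, command), the CLAUDE_PROJECT_DIR fallback, and the cd/pushd/popd regex are assumptions. For Bash this sketch only checks directory changes; cage.sh may also inspect paths inside commands.

```python
import json
import os
import re
import sys
from typing import Mapping, Optional

# Assumed tool_input keys carrying a filesystem path (Read/Write/Edit use
# file_path; Glob/Grep take an optional path).
PATH_KEYS = ("file_path", "path")
# cd/pushd/popd at the start of a line or after ; & | or (
DIR_CHANGE = re.compile(r"(?:^|[;&|(])\s*(?:cd|pushd|popd)\b", re.MULTILINE)


def is_inside(project_dir: str, candidate: str) -> bool:
    """True if candidate (absolute, or relative to project_dir) resolves inside project_dir."""
    root = os.path.realpath(project_dir)
    target = os.path.realpath(os.path.join(root, candidate))
    return os.path.commonpath([root, target]) == root


def check_tool_call(project_dir: str, tool_name: str, tool_input: Mapping) -> Optional[str]:
    """Return None to allow the call, or a human-readable reason to block it."""
    if tool_name == "Bash":
        if DIR_CHANGE.search(tool_input.get("command", "")):
            return "cd/pushd/popd is not allowed; stay in the project directory"
        return None
    for key in PATH_KEYS:
        value = tool_input.get(key)
        if value and not is_inside(project_dir, value):
            return f"{key}={value!r} is outside {project_dir}"
    return None


def main() -> None:
    """Hook entry point: read the event from stdin; exit 2 (block) with the reason on stderr."""
    event = json.load(sys.stdin)
    project_dir = os.environ.get("CLAUDE_PROJECT_DIR", os.getcwd())
    reason = check_tool_call(project_dir, event.get("tool_name", ""), event.get("tool_input", {}))
    if reason is not None:
        print(reason, file=sys.stderr)
        sys.exit(2)
```

Resolving both paths with os.path.realpath before comparing means ../ segments and symlinks that point outside the project are caught, not just absolute paths.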
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>