fix(train): guard against zero grad_accum_steps #264

Open
jluk wants to merge 1 commit into karpathy:master from jluk:fix/guard-grad-accum-steps

Conversation


@jluk jluk commented Mar 14, 2026

If TOTAL_BATCH_SIZE is smaller than a single forward pass (DEVICE_BATCH_SIZE * MAX_SEQ_LEN), grad_accum_steps silently becomes 0, causing the training loop to skip all gradient accumulation.

Add an assertion to fail fast with a clear message instead.
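A minimal sketch of the guard described above. The concrete values and the exact assertion message are illustrative, not the repository's code:

```python
# Hypothetical sketch of the fix; names mirror the PR description.
TOTAL_BATCH_SIZE = 524288   # total tokens per optimizer step (example value)
DEVICE_BATCH_SIZE = 32      # sequences per forward pass (example value)
MAX_SEQ_LEN = 1024          # tokens per sequence (example value)

tokens_per_fwd = DEVICE_BATCH_SIZE * MAX_SEQ_LEN
grad_accum_steps = TOTAL_BATCH_SIZE // tokens_per_fwd

# Integer division silently yields 0 when TOTAL_BATCH_SIZE < tokens_per_fwd,
# turning the accumulation loop into range(0); fail fast instead.
assert grad_accum_steps > 0, (
    f"TOTAL_BATCH_SIZE ({TOTAL_BATCH_SIZE}) must be >= "
    f"DEVICE_BATCH_SIZE * MAX_SEQ_LEN ({tokens_per_fwd})"
)
```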


Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
IgorTavcar added a commit to IgorTavcar/autoresearch that referenced this pull request Mar 14, 2026
…rity hardening

Cherry-picked improvements from open PRs on karpathy/autoresearch:

PR karpathy#265 — Save checkpoint before eval to survive crashes
  Applied to baseline/train.py (torch.save) and train_mlx.py (mx.save_safetensors).
  If eval OOMs or crashes, training work is preserved.
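  The ordering change can be sketched as follows. To stay self-contained this uses plain JSON rather than torch.save or mx.save_safetensors, and save_checkpoint / maybe_eval are hypothetical names:

```python
import json
import os
import tempfile

def save_checkpoint(state, path):
    """Write the checkpoint atomically so a crash mid-write can't corrupt it."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic swap into place

def maybe_eval(step, state, evaluate, path="checkpoint.json"):
    # Checkpoint *before* eval: if evaluation OOMs or crashes,
    # the completed training steps are already on disk.
    save_checkpoint(state, path)
    return evaluate(state)
```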

PR karpathy#264 — Guard against zero grad_accum_steps
  Added assertion after computing grad_accum_steps. Catches silent
  misconfiguration when TOTAL_BATCH_SIZE < DEVICE_BATCH_SIZE * MAX_SEQ_LEN.

PR karpathy#188 — Add helpful error messages to bare asserts
  Window pattern and batch size divisibility asserts now explain what went
  wrong instead of a bare AssertionError.
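  The pattern can be sketched as follows; check_config and the exact message wording are illustrative, not the repository's code:

```python
def check_config(total_batch_size, device_batch_size, max_seq_len, window):
    # Each assert names the offending values, so a failure explains
    # itself instead of raising a bare AssertionError.
    tokens_per_fwd = device_batch_size * max_seq_len
    assert total_batch_size % tokens_per_fwd == 0, (
        f"TOTAL_BATCH_SIZE ({total_batch_size}) must be divisible by "
        f"DEVICE_BATCH_SIZE * MAX_SEQ_LEN ({tokens_per_fwd})"
    )
    assert max_seq_len % window == 0, (
        f"MAX_SEQ_LEN ({max_seq_len}) must be a multiple of "
        f"the window size ({window})"
    )
```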

PR karpathy#185 — Print startup_seconds in final summary
  Useful diagnostic for measuring compilation/init overhead across backends.

PR karpathy#138 — Make DEVICE_BATCH_SIZE configurable via env var
  Avoids source code edits when switching between Apple Silicon tiers:
  DEVICE_BATCH_SIZE=16 uv run baseline/train.py
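  A one-line sketch of the env-var override, assuming a source-code default of 32 (the actual default may differ):

```python
import os

# Read DEVICE_BATCH_SIZE from the environment, falling back to the
# in-source default, so switching hardware tiers needs no code edit.
DEVICE_BATCH_SIZE = int(os.environ.get("DEVICE_BATCH_SIZE", "32"))
```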

PR karpathy#216 — SHA-256 verification for cached data shards
  Each downloaded shard gets a .sha256 sidecar file. On reuse, integrity
  is verified and corrupted shards are re-downloaded. Uses os.replace()
  for atomic writes instead of os.rename().
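  A self-contained sketch of the sidecar scheme; function names are hypothetical:

```python
import hashlib
import os

def sha256_of(path, chunk=1 << 20):
    """Hash a file incrementally so large shards don't load into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def write_sidecar(shard_path):
    """Record the shard's digest in a .sha256 sidecar file."""
    tmp = shard_path + ".sha256.tmp"
    with open(tmp, "w") as f:
        f.write(sha256_of(shard_path))
    # os.replace overwrites atomically; os.rename fails on Windows
    # if the destination already exists.
    os.replace(tmp, shard_path + ".sha256")

def shard_is_valid(shard_path):
    """True only if the cached shard matches its sidecar digest."""
    sidecar = shard_path + ".sha256"
    if not (os.path.exists(shard_path) and os.path.exists(sidecar)):
        return False
    with open(sidecar) as f:
        expected = f.read().strip()
    return sha256_of(shard_path) == expected
```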

PR karpathy#237 — Harden tokenizer deserialization (pickle → JSON+base64)
  Replaces unsafe pickle.load with JSON serialization using base64-encoded
  mergeable_ranks. Legacy tokenizer.pkl is detected and rejected with a
  clear migration message. Eliminates arbitrary code execution risk from
  cache poisoning attacks on the tokenizer file.
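  The pickle-to-JSON migration can be sketched as follows. save_ranks / load_ranks are hypothetical names, assuming mergeable_ranks maps bytes to integer ranks (as in tiktoken-style tokenizers):

```python
import base64
import json

def save_ranks(mergeable_ranks, path):
    """Serialize bytes->int ranks as JSON with base64-encoded keys."""
    # bytes keys can't appear in JSON directly, so base64-encode them.
    enc = {base64.b64encode(k).decode("ascii"): v
           for k, v in mergeable_ranks.items()}
    with open(path, "w") as f:
        json.dump(enc, f)

def load_ranks(path):
    """Parse the JSON back into bytes->int ranks; no code execution risk,
    unlike pickle.load on an attacker-controlled cache file."""
    with open(path) as f:
        enc = json.load(f)
    return {base64.b64decode(k): v for k, v in enc.items()}
```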

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>