Train: add gradient clipping before optimizer step #287
Open
renee-jia wants to merge 1 commit into karpathy:master
Conversation
relu² activations can produce gradient spikes that silently degrade model weights. The existing fast-fail (loss > 100) only catches damage after it has already happened. Clipping the gradients prevents a single spike from wasting a 5-minute experiment run.

Adds a GRAD_CLIP_NORM hyperparameter (default 1.0; set 0.0 to disable).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
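For context, a minimal sketch of where the clip sits, assuming a standard PyTorch training loop; the toy model, optimizer, and loop below are illustrative, not the actual baseline/train.py diff:

```python
import torch

GRAD_CLIP_NORM = 1.0  # proposed hyperparameter; 0.0 disables clipping

# Toy stand-ins for the real model and optimizer.
model = torch.nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(10):
    x, y = torch.randn(8, 16), torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    if GRAD_CLIP_NORM > 0.0:
        # Rescale all gradients in place so their global L2 norm is at
        # most GRAD_CLIP_NORM -- between backward() and step(), so a
        # spike never reaches the weights.
        torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP_NORM)
    optimizer.step()
```

Because `clip_grad_norm_` rescales the whole gradient vector rather than clamping elements, the update direction is preserved and only its magnitude is bounded.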
IgorTavcar added a commit to IgorTavcar/autoresearch that referenced this pull request on Mar 17, 2026:
…iler, UCB1 search

PR karpathy#287 — Gradient clipping before optimizer step (baseline/train.py)
Adds a GRAD_CLIP_NORM hyperparameter (default 1.0, set 0.0 to disable). relu² activations can produce gradient spikes that silently degrade weights. The existing loss > 100 fast-fail only catches damage after it has already happened. Clipping prevents wasted experiment runs.

PR karpathy#279 — --profile flag for LLM-readable CUDA kernel summary (baseline/train.py)
Adds an argparse --profile flag that runs torch.profiler over a few warmup steps, prints a Markdown table of the top CUDA kernels by self-time, then exits. Lets the agent identify hardware bottlenecks (attention vs MLP vs elementwise) without needing trace visualization tools. Usage: uv run baseline/train.py --profile

Issue karpathy#284 — DUSE alt program (baseline/program-alt.md)
Alternative program.md integrating Dimensional UCB1 Search + Experiment Memory from issue karpathy#284. Adds: a 7-dimension map, experiments.json structured memory, a UCB1 dimension selector (exploration vs exploitation), a 90-second early-abort gate, and a rescue pool for recombining discarded sub-mechanisms. Pure prompt change; no code modifications required.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
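A rough sketch of what the described --profile mode could look like, assuming torch.profiler and a CUDA device. This uses the profiler's built-in text table rather than the Markdown formatting the commit mentions, and a toy workload stands in for the real training step:

```python
import argparse
import torch
from torch.profiler import profile, ProfilerActivity

parser = argparse.ArgumentParser()
parser.add_argument("--profile", action="store_true",
                    help="print top CUDA kernels by self-time, then exit")
args = parser.parse_args()

if args.profile:
    # Toy workload in place of the real model's training step.
    model = torch.nn.Linear(1024, 1024).cuda()
    x = torch.randn(64, 1024, device="cuda")
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        for _ in range(5):  # a few warmup steps
            model(x).sum().backward()
    # Aggregate per-kernel stats and print the heaviest CUDA kernels.
    print(prof.key_averages().table(sort_by="self_cuda_time_total", row_limit=10))
    raise SystemExit(0)
```

And a self-contained illustration of a UCB1 selector over experiment dimensions; the dimension names and the (pulls, total_reward) bookkeeping are hypothetical, not taken from program-alt.md:

```python
import math

def ucb1_pick(stats, c=math.sqrt(2)):
    """stats: dict mapping dimension -> (pulls, total_reward)."""
    total = sum(pulls for pulls, _ in stats.values())
    # Explore: try every dimension once before exploiting any of them.
    for dim, (pulls, _) in stats.items():
        if pulls == 0:
            return dim
    # Exploit: pick the best mean reward plus an exploration bonus that
    # shrinks as a dimension accumulates pulls.
    return max(stats, key=lambda d: stats[d][1] / stats[d][0]
               + c * math.sqrt(math.log(total) / stats[d][0]))

stats = {"optimizer": (3, 1.2), "architecture": (1, 0.9), "data": (0, 0.0)}
print(ucb1_pick(stats))  # -> "data" (unexplored dimensions go first)
```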