Train: add gradient clipping before optimizer step #287

Open
renee-jia wants to merge 1 commit into karpathy:master from renee-jia:feat/grad-clip

@renee-jia

relu² activations can produce gradient spikes that silently degrade model weights. The existing fast-fail (loss > 100) only catches damage after it has already happened. Clipping gradients prevents wasting time on long experiment runs.

Adds GRAD_CLIP_NORM hyperparameter (default 1.0, set 0.0 to disable).
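
For readers unfamiliar with the technique, here is a framework-free sketch of global-norm clipping, with the "0.0 disables" convention described above. The function name `clip_grad_norm` and the list-of-lists gradient representation are illustrative assumptions, not the code in this PR. In PyTorch the usual counterpart is `torch.nn.utils.clip_grad_norm_`, called between `loss.backward()` and `optimizer.step()`.

```python
import math

GRAD_CLIP_NORM = 1.0  # default from this PR; 0.0 disables clipping


def clip_grad_norm(grads, max_norm):
    """Scale gradients in place so their global L2 norm is at most max_norm.

    `grads` is a list of lists of floats, one inner list per parameter
    (an assumed stand-in for real tensors). Returns the pre-clip norm.
    """
    total_norm = math.sqrt(sum(g * g for param in grads for g in param))
    # max_norm == 0.0 means "clipping disabled": leave gradients untouched.
    if max_norm > 0.0 and total_norm > max_norm:
        scale = max_norm / total_norm
        for param in grads:
            for i in range(len(param)):
                param[i] *= scale
    return total_norm


# A spiky gradient whose global norm is sqrt(3^2 + 4^2 + 12^2) = 13.
grads = [[3.0, 4.0], [12.0]]
norm_before = clip_grad_norm(grads, GRAD_CLIP_NORM)
norm_after = math.sqrt(sum(g * g for param in grads for g in param))
print(norm_before, norm_after)
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its magnitude is capped, which is why a single spike cannot push the weights far in one step.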

relu² activations can produce gradient spikes that silently degrade
model weights. The existing fast-fail (loss > 100) only catches damage
after it has already happened. Clipping gradients prevents wasted
5-minute experiment runs.

Adds GRAD_CLIP_NORM hyperparameter (default 1.0, set 0.0 to disable).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
IgorTavcar added a commit to IgorTavcar/autoresearch that referenced this pull request Mar 17, 2026
…iler, UCB1 search

PR karpathy#287 — Gradient clipping before optimizer step (baseline/train.py)
  Adds GRAD_CLIP_NORM hyperparameter (default 1.0, set 0.0 to disable).
  relu² activations can produce gradient spikes that silently degrade
  weights. The existing loss > 100 fast-fail only catches damage after
  it has already happened. Clipping prevents wasted experiment runs.

PR karpathy#279 — --profile flag for LLM-readable CUDA kernel summary (baseline/train.py)
  Adds argparse --profile flag that runs torch.profiler over a few warmup
  steps and prints a Markdown table of top CUDA kernels by self-time,
  then exits. Lets the agent identify hardware bottlenecks (attention vs
  MLP vs elementwise) without needing trace visualization tools.
  Usage: uv run baseline/train.py --profile
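
A rough sketch of the flag and the reporting step only, assuming the profiler output has already been reduced to `(kernel_name, self_time_us)` pairs. The helper names `build_parser` and `kernel_table`, the column layout, and the kernel names and timings are all hypothetical placeholders, not measurements and not the code from PR karpathy#279; the `torch.profiler` collection step is omitted.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # store_true: the flag takes no value; presence alone enables profiling.
    parser.add_argument("--profile", action="store_true",
                        help="print a kernel summary and exit")
    return parser


def kernel_table(rows, top_k=3):
    """Format (kernel_name, self_time_us) pairs as a Markdown table, slowest first."""
    total = sum(t for _, t in rows) or 1  # avoid dividing by zero on empty input
    lines = ["| kernel | self time (us) | share |", "|---|---:|---:|"]
    for name, t in sorted(rows, key=lambda r: r[1], reverse=True)[:top_k]:
        lines.append(f"| {name} | {t} | {100 * t / total:.1f}% |")
    return "\n".join(lines)


# Placeholder rows for illustration only; real values would come from the profiler.
rows = [("elementwise_add", 100), ("attention_fwd", 500), ("mlp_matmul", 400)]
args = build_parser().parse_args(["--profile"])
if args.profile:
    table = kernel_table(rows)
    print(table)
```

A plain Markdown table keeps the summary small enough to paste into an agent's context, which matches the stated goal of avoiding trace visualization tools.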

Issue karpathy#284 — DUSE alt program (baseline/program-alt.md)
  Alternative program.md integrating Dimensional UCB1 Search + Experiment
  Memory from issue karpathy#284. Adds: 7-dimension map, experiments.json structured
  memory, UCB1 dimension selector (exploration vs exploitation), 90-second
  early abort gate, rescue pool for recombining discarded sub-mechanisms.
  Pure prompt change, no code modifications required.
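
For context, the UCB1 dimension selector mentioned above can be sketched as follows. This is the textbook UCB1 rule under stated assumptions, not the logic from issue karpathy#284: the function name `ucb1_select`, the `(num_trials, total_reward)` bookkeeping, the dimension names, and the reward values are all hypothetical.

```python
import math


def ucb1_select(stats, c=math.sqrt(2)):
    """Pick the next dimension to explore with the UCB1 rule.

    `stats` maps a dimension name to (num_trials, total_reward).
    Untried dimensions are chosen first; otherwise the score is
    mean_reward + c * sqrt(ln(total_trials) / num_trials).
    """
    for name, (n, _) in stats.items():
        if n == 0:
            return name
    total = sum(n for n, _ in stats.values())

    def score(item):
        _, (n, reward) = item
        # Exploitation term plus an exploration bonus that shrinks with n.
        return reward / n + c * math.sqrt(math.log(total) / n)

    return max(stats.items(), key=score)[0]


# Hypothetical dimension names and rewards, for illustration only.
stats = {
    "optimizer": (10, 6.0),    # high mean reward, well explored
    "architecture": (2, 1.0),  # slightly lower mean, barely explored
    "data": (0, 0.0),          # never tried
}
first_pick = ucb1_select(stats)   # untried dimensions go first
stats["data"] = (3, 0.3)
second_pick = ucb1_select(stats)  # exploration bonus now decides
print(first_pick, second_pick)
```

The exploration bonus is what lets a rarely tried dimension outrank one with a higher average reward, which is the exploration vs exploitation trade-off the program describes.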

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>