
write results.json for structured agent consumption, harden crash diagnostics#331

Open
shichangs wants to merge 1 commit into karpathy:master from shichangs:fix/structured-results-output

Conversation

@shichangs

Summary

Fixes #64

  • train.py: writes results.json after evaluation with the same metrics already printed to stdout. Gives agents a structured, parseable results channel instead of relying on grepping free-form stdout from run.log.
  • program.md: instructs the agent to read results.json first (falling back to grep). Crash diagnostics now use a filtered grep -iE "error|exception|traceback" instead of a raw tail -n 50, reducing the surface for indirect prompt injection via training output. (Note: -E is required for the | alternation; with plain grep -i the pattern would be matched literally.)
  • .gitignore: adds results.json and run.log as runtime artifacts.
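The .gitignore addition described above amounts to a two-line fragment (comment wording is illustrative):

```
# runtime artifacts written by train.py
results.json
run.log
```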

Existing stdout output is unchanged. No new dependencies (json is stdlib).
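A minimal sketch of the write path, assuming the keys shown in the verification snippet below; the variable names and values here are placeholders, not the exact identifiers in train.py:

```python
import json

# Placeholder values; in train.py these come from the training/eval loop.
val_bpb = 0.9979          # final validation bits per byte
num_steps = 953           # optimizer steps completed
training_seconds = 300.1  # wall-clock training time

# Written only on the success path, after evaluation completes, so a
# crashed run leaves no results.json behind.
results = {
    "val_bpb": val_bpb,
    "num_steps": num_steps,
    "training_seconds": training_seconds,
}
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)
```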

Similar to #79 but rebased on current main with no merge conflicts.

Test plan

  • Run uv run train.py and verify results.json is written with correct metrics
  • Verify stdout output is unchanged
  • Verify results.json values match stdout summary
  • Simulate a crash (e.g. syntax error in train.py) and confirm results.json is NOT written

🤖 Generated with Claude Code

write results.json for structured agent consumption, harden crash diagnostics

train.py now writes a results.json file after evaluation with the same
metrics already printed to stdout. This gives agents a structured, parseable
results channel instead of relying on grepping free-form stdout.

program.md updated to read results.json first (fallback to grep), and to
use filtered grep for crash diagnostics instead of raw tail, reducing the
surface for indirect prompt injection via training output.

Fixes karpathy#64

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@shichangs (Author)

cc @johnwaldo (issue author) @mvanhorn (PR #79 author) — would appreciate your review. This takes a similar approach to #79 but rebased cleanly on current main.

Offline verification

The results.json output can be validated without a GPU:

import json

# Simulate what train.py writes
results = {
    "val_bpb": 0.9979,
    "training_seconds": 300.1,
    "total_seconds": 325.9,
    "peak_vram_mb": 45060.2,
    "mfu_percent": 39.8,
    "total_tokens_M": 499.6,
    "num_steps": 953,
    "num_params_M": 50.3,
    "depth": 8,
}
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)

# Verify roundtrip
with open("results.json") as f:
    loaded = json.load(f)
assert loaded == results
assert isinstance(loaded["val_bpb"], float)
assert isinstance(loaded["num_steps"], int)
print("✓ results.json roundtrip OK")

The key behavioral changes:

  1. Success path: the agent reads results.json (structured, numeric JSON) instead of grepping stdout → no free-text injection surface
  2. Crash path: the agent uses grep -iE "error|exception|traceback" to filter relevant lines before falling back to tail → reduced injection surface
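That consumption order can be sketched as follows. read_results is an illustrative helper, not code from the PR; the file paths and the filter pattern mirror the description above:

```python
import json
import os
import re

# Case-insensitive equivalent of grep -iE "error|exception|traceback".
CRASH_PATTERN = re.compile(r"error|exception|traceback", re.IGNORECASE)

def read_results(results_path="results.json", log_path="run.log"):
    # Success path: train.py wrote structured JSON after evaluation.
    if os.path.exists(results_path):
        with open(results_path) as f:
            return {"status": "ok", "metrics": json.load(f)}
    # Crash path: results.json is absent, so surface only log lines that
    # match the crash pattern (capped like tail -n 50) rather than feeding
    # raw trailing output back to the agent.
    lines = []
    if os.path.exists(log_path):
        with open(log_path) as f:
            lines = [line.rstrip("\n") for line in f if CRASH_PATTERN.search(line)]
    return {"status": "crashed", "diagnostics": lines[-50:]}
```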


Linked issue: Indirect prompt injection via training output fed back to agent (#64)

1 participant