
write results.json for structured agent consumption, harden crash diagnostics#331

Open
shichangs wants to merge 1 commit into karpathy:master from shichangs:fix/structured-results-output

Conversation

@shichangs

Summary

Fixes #64

  • train.py: writes results.json after evaluation with the same metrics already printed to stdout. Gives agents a structured, parseable results channel instead of relying on grepping free-form stdout from run.log.
  • program.md: instructs the agent to read results.json first (falling back to grep). Crash diagnostics now use a filtered grep -iE "error|exception|traceback" instead of a raw tail -n 50, reducing the surface for indirect prompt injection via training output. (Note: -E is required for the | alternation; with plain grep -i the pattern would be matched literally.)
  • .gitignore: adds results.json and run.log as runtime artifacts.
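The .gitignore addition described above amounts to a two-line fragment (comment wording is illustrative):

```
# runtime artifacts written by train.py
results.json
run.log
```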

Existing stdout output is unchanged. No new dependencies (json is stdlib).
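A minimal sketch of the write path, assuming the keys shown in the verification snippet below; the variable names and values here are placeholders, not the exact identifiers in train.py:

```python
import json

# Placeholder values; in train.py these come from the training/eval loop.
val_bpb = 0.9979          # final validation bits per byte
num_steps = 953           # optimizer steps completed
training_seconds = 300.1  # wall-clock training time

# Written only on the success path, after evaluation completes, so a
# crashed run leaves no results.json behind.
results = {
    "val_bpb": val_bpb,
    "num_steps": num_steps,
    "training_seconds": training_seconds,
}
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)
```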

Similar to #79 but rebased on current main with no merge conflicts.

Test plan

  • Run uv run train.py and verify results.json is written with correct metrics
  • Verify stdout output is unchanged
  • Verify results.json values match stdout summary
  • Simulate a crash (e.g. syntax error in train.py) and confirm results.json is NOT written

🤖 Generated with Claude Code

write results.json for structured agent consumption, harden crash diagnostics

train.py now writes a results.json file after evaluation with the same
metrics already printed to stdout. This gives agents a structured, parseable
results channel instead of relying on grepping free-form stdout.

program.md updated to read results.json first (fallback to grep), and to
use filtered grep for crash diagnostics instead of raw tail, reducing the
surface for indirect prompt injection via training output.

Fixes karpathy#64

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@shichangs (Author)

cc @johnwaldo (issue author) @mvanhorn (PR #79 author) — would appreciate your review. This takes a similar approach to #79 but rebased cleanly on current main.

Offline verification

The results.json output can be validated without a GPU:

import json

# Simulate what train.py writes
results = {
    "val_bpb": 0.9979,
    "training_seconds": 300.1,
    "total_seconds": 325.9,
    "peak_vram_mb": 45060.2,
    "mfu_percent": 39.8,
    "total_tokens_M": 499.6,
    "num_steps": 953,
    "num_params_M": 50.3,
    "depth": 8,
}
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)

# Verify roundtrip
with open("results.json") as f:
    loaded = json.load(f)
assert loaded == results
assert isinstance(loaded["val_bpb"], float)
assert isinstance(loaded["num_steps"], int)
print("✓ results.json roundtrip OK")

The key behavioral changes:

  1. Success path: the agent reads results.json (structured, numeric JSON) instead of grepping stdout → no free-text injection surface
  2. Crash path: the agent uses grep -iE "error|exception|traceback" to filter relevant lines before falling back to tail → reduced injection surface
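That consumption order can be sketched as follows. read_results is an illustrative helper, not code from the PR; the file paths and the filter pattern mirror the description above:

```python
import json
import os
import re

# Case-insensitive equivalent of grep -iE "error|exception|traceback".
CRASH_PATTERN = re.compile(r"error|exception|traceback", re.IGNORECASE)

def read_results(results_path="results.json", log_path="run.log"):
    # Success path: train.py wrote structured JSON after evaluation.
    if os.path.exists(results_path):
        with open(results_path) as f:
            return {"status": "ok", "metrics": json.load(f)}
    # Crash path: results.json is absent, so surface only log lines that
    # match the crash pattern (capped like tail -n 50) rather than feeding
    # raw trailing output back to the agent.
    lines = []
    if os.path.exists(log_path):
        with open(log_path) as f:
            lines = [line.rstrip("\n") for line in f if CRASH_PATTERN.search(line)]
    return {"status": "crashed", "diagnostics": lines[-50:]}
```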


Linked issue: Indirect prompt injection via training output fed back to agent (#64)

1 participant