write results.json for structured agent consumption, harden crash diagnostics #331
Open
shichangs wants to merge 1 commit into karpathy:master from
…gnostics

train.py now writes a results.json file after evaluation with the same metrics already printed to stdout. This gives agents a structured, parseable results channel instead of relying on grepping free-form stdout.

program.md updated to read results.json first (fallback to grep), and to use filtered grep for crash diagnostics instead of raw tail, reducing the surface for indirect prompt injection via training output.

Fixes karpathy#64

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
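The "read results.json first, fall back to grep" behaviour described above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the helper name `load_results` is hypothetical, and the fallback assumes the stdout summary in run.log uses one `key: value` pair per line, which the PR text does not confirm.

```python
import json
import re
from pathlib import Path

# ASSUMPTION: the stdout summary prints one "key: value" pair per line.
_METRIC_LINE = re.compile(r"^(\w+):\s+(-?\d+(?:\.\d+)?)\s*$")


def load_results(results_path="results.json", log_path="run.log"):
    """Return a metrics dict, preferring results.json over parsing run.log."""
    results_file = Path(results_path)
    if results_file.exists():
        try:
            with results_file.open() as f:
                return json.load(f)
        except json.JSONDecodeError:
            pass  # unreadable JSON: fall through to the log fallback

    # Fallback: keep only lines that look like numeric metrics.
    metrics = {}
    log_file = Path(log_path)
    if log_file.exists():
        for line in log_file.read_text(errors="replace").splitlines():
            match = _METRIC_LINE.match(line)
            if match:
                key, text = match.group(1), match.group(2)
                metrics[key] = float(text) if "." in text else int(text)
    return metrics
```

An empty dict from this helper means neither channel produced metrics, which a caller could treat as a crashed run.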
Author
cc @johnwaldo (issue author) @mvanhorn (PR #79 author): would appreciate your review. This takes a similar approach to #79 but rebased cleanly on current main.

Offline verification

import json
# Simulate what train.py writes
results = {
    "val_bpb": 0.9979,
    "training_seconds": 300.1,
    "total_seconds": 325.9,
    "peak_vram_mb": 45060.2,
    "mfu_percent": 39.8,
    "total_tokens_M": 499.6,
    "num_steps": 953,
    "num_params_M": 50.3,
    "depth": 8,
}
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)
# Verify roundtrip
with open("results.json") as f:
    loaded = json.load(f)
assert loaded == results
assert isinstance(loaded["val_bpb"], float)
assert isinstance(loaded["num_steps"], int)
print("✓ results.json roundtrip OK")

The key behavioral changes:
Summary
Fixes #64
- train.py: writes results.json after evaluation with the same metrics already printed to stdout. Gives agents a structured, parseable results channel instead of relying on grepping free-form stdout from run.log.
- program.md: instructs agent to read results.json first (fallback to grep). Crash diagnostics now use filtered grep -i "error|exception|traceback" instead of raw tail -n 50, reducing the surface for indirect prompt injection via training output.
- .gitignore: adds results.json and run.log as runtime artifacts.

Existing stdout output is unchanged. No new dependencies (json is stdlib).

Similar to #79 but rebased on current main with no merge conflicts.
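To illustrate the filtered crash diagnostics described in the summary, here is a rough Python equivalent of keeping only lines that mention error, exception, or traceback, rather than passing the raw log tail to the agent. The function name `crash_lines` and the `max_lines` and `max_chars` caps are assumptions added for this sketch and are not part of the PR.

```python
import re

# Same three keywords as the pattern named in the summary, case-insensitive.
_CRASH_PATTERN = re.compile(r"error|exception|traceback", re.IGNORECASE)


def crash_lines(log_text, max_lines=20, max_chars=200):
    """Return the last max_lines matching lines, each cut to max_chars.

    max_lines and max_chars are hypothetical limits, not taken from the PR.
    """
    matches = [
        line[:max_chars]
        for line in log_text.splitlines()
        if _CRASH_PATTERN.search(line)
    ]
    return matches[-max_lines:]
```

A filter like this narrows what the agent reads but does not remove the risk entirely: any log line that happens to contain one of the keywords still passes through.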
Test plan
- uv run train.py and verify results.json is written with correct metrics
- results.json values match stdout summary
- results.json is NOT written

🤖 Generated with Claude Code
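The first and last items of the test plan could be partly automated with a small check like the one below. It is a sketch under assumptions: the expected keys and types are copied from the simulated results dict in the offline verification comment, not read from train.py, and `validate_results` is a hypothetical helper name.

```python
import json
from pathlib import Path

# ASSUMPTION: keys and types mirror the simulated dict in the PR comment.
EXPECTED_TYPES = {
    "val_bpb": float,
    "training_seconds": float,
    "total_seconds": float,
    "peak_vram_mb": float,
    "mfu_percent": float,
    "total_tokens_M": float,
    "num_steps": int,
    "num_params_M": float,
    "depth": int,
}


def validate_results(path="results.json"):
    """Return a list of problems; an empty list means the file looks valid."""
    results_file = Path(path)
    if not results_file.exists():
        return [f"{path}: file not found"]
    data = json.loads(results_file.read_text())
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            problems.append(f"{key}: expected {expected.__name__}")
    return problems
```

After a successful run the list should be empty. When results.json was not written, the single "file not found" entry is the expected outcome.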