fix(analysis): handle crash-first baselines and empty keep sets by afurm · Pull Request #469 · karpathy/autoresearch

afurm · 2026-04-02T15:03:52Z

Summary

- make the analysis notebook use the first non-crash run as the baseline consistently
- raise a clear error when results.tsv has no non-crash runs to analyze
- handle cases with no KEEP rows yet so the notebook does not fail on summary/plot generation

Why

The notebook previously used two different baseline definitions:

- plot cell: first non-crash run
- summary cell: first row in results.tsv

That means if the first logged experiment was a crash, the plot and summary could disagree about the baseline. The summary cell could also fail outright when there were no kept runs yet because it assumed idxmin() was always valid.
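To make the mismatch concrete, here is a minimal sketch of the two baseline definitions applied to the same data. The column names (`val_bpb`, `status`), the status values, and the `0.0` metric logged for a crash are assumptions for illustration; they are not confirmed by this PR description.

```python
import pandas as pd

# Hypothetical results.tsv contents; column names and values are assumptions.
df = pd.DataFrame(
    {
        "val_bpb": [0.0, 1.00, 0.98, 1.02],
        "status": ["crash", "keep", "keep", "discard"],
    }
)

# Old summary-cell definition: whatever row was logged first.
baseline_first_row = df.iloc[0]["val_bpb"]

# Plot-cell definition: the first run that did not crash.
valid = df[df["status"].str.lower() != "crash"]
baseline_first_valid = valid.iloc[0]["val_bpb"]

print(baseline_first_row, baseline_first_valid)
```

When the first row is a crash, the two values differ, so any improvement figure computed against them differs too.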

What changed

- plot cell now guards valid.empty before reading the baseline
- plot y-axis bounds now handle the no-kept-runs case safely
- summary cell now derives the baseline from the first non-crash run
- summary cell now prints n/a values instead of failing when there are no kept runs yet
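The summary-side guards described above could look roughly like the sketch below. It is not the notebook's actual code: the `summarize` helper, the `val_bpb` and `status` column names, and the assumption that a lower metric is better are all hypothetical.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Baseline and best kept run, tolerating crash-first logs and no keeps."""
    status = df["status"].str.lower()

    # Guard 1: without a non-crash run there is no baseline at all.
    valid = df[status != "crash"]
    if valid.empty:
        raise ValueError("results.tsv has no non-crash runs to analyze")

    # Baseline is the first non-crash run, matching the plot cell.
    baseline = float(valid.iloc[0]["val_bpb"])

    # Guard 2: idxmin() raises on an empty selection, so check before calling it.
    kept = valid[valid["status"].str.lower() == "keep"]
    if kept.empty:
        return {"baseline": baseline, "best": "n/a", "improvement": "n/a"}

    best = float(kept.loc[kept["val_bpb"].idxmin(), "val_bpb"])
    # Assumes lower val_bpb is better.
    return {"baseline": baseline, "best": best, "improvement": baseline - best}
```

Raising for the all-crash case and returning `n/a` for the no-keep case keeps the two situations distinct: the first means there is nothing to analyze, the second only means nothing has been kept yet.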

Scope

This is analysis-only. No training code or experiment behavior changed.

Validation

- verified analysis.ipynb is still valid JSON
- did not run the notebook end-to-end in this environment
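A JSON validity check of the kind mentioned above can be sketched as follows. The helper name is hypothetical, and the extra check for a top-level `cells` list is an added assumption based on the nbformat 4 layout, not something the PR states it did.

```python
import json


def is_valid_notebook_json(text: str) -> bool:
    """Return True if text parses as JSON and has a top-level 'cells' list."""
    try:
        nb = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(nb, dict) and isinstance(nb.get("cells"), list)
```

For a file on disk, this could be called as `is_valid_notebook_json(open("analysis.ipynb", encoding="utf-8").read())`. It confirms only that the file parses, not that the cells execute.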

fix(analysis): handle crash-first baseline cases

c0aff82

This comment was marked as low quality.

barsharajyadav-boop approved these changes Apr 2, 2026

afurm wants to merge 1 commit into karpathy:master from afurm:af/fix-analysis-baseline

2 participants
