feat: CLI analysis tool for experiment results by MohammadWasi · Pull Request #495 · karpathy/autoresearch

MohammadWasi · 2026-04-07T02:21:27Z

Problem

During long autonomous research sessions (50-100+ experiments), the AI agent has no programmatic way to analyze experiment results from results.tsv. This creates a significant bottleneck where agents:

Must manually parse raw tab-separated data
Cannot track progress trends effectively
Waste experiments retrying approaches in local minima
Lack structured feedback for decision-making

Solution

Implemented a comprehensive CLI analysis tool (analysis.py) that provides structured feedback for both humans and autonomous agents:

Features

Multiple output formats: Text for humans, JSON for agents
Progress trajectory analysis: Detects if experiments are improving/plateauing/stuck
Comprehensive statistics: Keep rates, improvements, top hits
Visualization: Progress plots with matplotlib
Flexible input: Custom TSV file paths
Agent-ready: JSON output with trajectory insights
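
For illustration, the trajectory detection listed above could be sketched as follows. This is not the code from analysis.py: the window size, the minimum-gain threshold, the JSON key names, and the assumption that a lower metric is better are all hypothetical choices made for the example.

```python
import json
from itertools import accumulate


def classify_trajectory(values, window=5, min_gain=1e-3):
    """Label a run as improving / plateauing / stuck.

    values: per-experiment metric, in run order (assumed: lower is better).
    window and min_gain are illustrative thresholds, not the PR's values.
    """
    if not values:
        raise ValueError("need at least one experiment")

    # Running best after each experiment.
    best = list(accumulate(values, min))

    def gain_over(n):
        # Improvement of the running best over the last n experiments.
        start = best[-n - 1] if len(best) > n else best[0]
        return start - best[-1]

    if gain_over(window) >= min_gain:
        return "improving"
    if gain_over(2 * window) >= min_gain:
        return "plateauing"
    return "stuck"


def to_json(values):
    """Machine-readable summary; the key names are hypothetical."""
    return json.dumps({
        "num_experiments": len(values),
        "best": min(values),
        "trajectory": classify_trajectory(values),
    })
```

An agent could read the `trajectory` field and, for example, switch to a different line of experiments once it reports `stuck`.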

Usage

uv run analysis.py                    # Human-readable report
uv run analysis.py --json            # Machine-readable for agents  
uv run analysis.py --plot progress.png # Save visualization
uv run analysis.py --tsv custom.tsv  # Custom results file
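
A minimal sketch of how these flags might be declared with argparse. The flag names come from the usage above and the results.tsv default from the problem statement; the help strings and the `build_parser` helper are assumptions for illustration, not the PR's actual code.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in the usage (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="analysis.py",
        description="Summarize experiment results from a TSV file.",
    )
    parser.add_argument("--json", action="store_true",
                        help="emit machine-readable JSON instead of text")
    parser.add_argument("--plot", metavar="PATH", default=None,
                        help="save a progress plot to PATH")
    parser.add_argument("--tsv", metavar="PATH", default="results.tsv",
                        help="results file to read")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```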

Replace tiktoken decode/encode approach with direct mergeable_ranks lookup to avoid UTF-8 replacement character inflation in evaluation metrics. The old method could inflate BPB scores when BPE tokens contained invalid UTF-8 sequences, as tiktoken.decode() replaces them with U+FFFD (3 bytes) instead of the actual raw byte length (often 1 byte). Fixes karpathy#384
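
The inflation described in this commit message can be reproduced without tiktoken. The sketch below uses a toy `bytes -> rank` table standing in for an encoding's mergeable ranks; the table contents and both helper names are made up for the example, and special tokens are not covered.

```python
def bytes_via_decode(token_bytes: bytes) -> int:
    """Old approach: decode with replacement, re-encode, count bytes."""
    text = token_bytes.decode("utf-8", errors="replace")
    return len(text.encode("utf-8"))


def build_length_table(mergeable_ranks):
    """New approach: map each token id to the raw length of its byte string."""
    return {rank: len(tok) for tok, rank in mergeable_ranks.items()}


# Toy table (hypothetical): token 1 is a lone UTF-8 lead byte, which is
# not valid UTF-8 on its own; token 2 is the complete euro sign.
toy_ranks = {b"a": 0, b"\xe2": 1, b"\xe2\x82\xac": 2}
lengths = build_length_table(toy_ranks)

print(bytes_via_decode(b"\xe2"), lengths[1])          # 3 vs 1
print(bytes_via_decode(b"\xe2\x82\xac"), lengths[2])  # 3 vs 3
```

Only tokens that are not valid UTF-8 by themselves are miscounted by the decode/encode route, so how much the byte total shifts depends on how often such tokens occur in the evaluated text.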

Explicitly define py-modules in pyproject.toml to resolve setuptools discovery issue that prevents editable installs. This fixes the 'Multiple top-level modules' error when running 'pip install -e .' or 'uv pip install -e .' Fixes karpathy#387

Add analysis.py CLI tool that provides structured feedback for autonomous agents during long experiment runs.

Features include:
- Text and JSON output formats for human and agent consumption
- Progress trajectory analysis (improving/plateauing/stuck)
- Experiment statistics and improvement tracking
- Progress plot generation with matplotlib
- Comprehensive test suite with 20 test cases

Usage:
uv run analysis.py                     # text report
uv run analysis.py --json              # JSON for agents
uv run analysis.py --plot progress.png # visualization
uv run analysis.py --tsv custom.tsv    # custom results file

Fixes karpathy#476

ravyg · 2026-04-08T18:26:12Z

Hi @MohammadWasi — heads-up that this PR implements the same feature as #475, which I opened on Apr 3 (4 days before this one) and which closes the same issue (#476) that I authored.

The two PRs are functionally equivalent:

Same analysis.py filename
Same CLI flags: --json, --plot, --tsv
Same trajectory states: improving / plateauing / stuck
Same JSON output shape and same text report structure
Same set of computed stats: experiment counts, baseline vs best, top hits, keep rate
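
As a rough illustration of the stats both PRs compute, here is a minimal sketch. The column names (`val_bpb`, `status`, `description`), the `keep` status value, and the assumption that lower is better are guesses about the results.tsv layout, and the function bodies are not taken from either PR.

```python
import csv
import io


def load_results(tsv_text):
    """Parse tab-separated results into a list of dicts, one per experiment."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def compute_stats(rows, metric="val_bpb", top_n=3):
    """Experiment counts, keep rate, baseline vs best, and top hits."""
    total = len(rows)
    kept = [r for r in rows if r["status"] == "keep"]
    ranked = sorted(kept, key=lambda r: float(r[metric]))  # lower is better
    return {
        "total": total,
        "kept": len(kept),
        "keep_rate": len(kept) / total if total else 0.0,
        # Assumes the first row is the baseline run.
        "baseline": float(rows[0][metric]) if rows else None,
        "best": float(ranked[0][metric]) if ranked else None,
        "top_hits": [(r["description"], float(r[metric])) for r in ranked[:top_n]],
    }


# Made-up sample data, for illustration only.
SAMPLE = (
    "commit\tval_bpb\tstatus\tdescription\n"
    "a1\t1.000\tkeep\tbaseline\n"
    "b2\t1.020\tdiscard\twider mlp\n"
    "c3\t0.980\tkeep\tlr warmup\n"
    "d4\t0.990\tdiscard\tbigger batch\n"
)
stats = compute_stats(load_results(SAMPLE))
```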

The only meaningful difference I can spot is that you added a test_analysis.py file. Other deltas are stylistic (slightly different function decomposition, different magic numbers in the trajectory thresholds).

If you'd like to collaborate, please drop a comment on #475. I'm happy to fold any improvements from your version (e.g. the unittest suite, if @karpathy prefers that style) into my PR and to give credit for any significant changes. That way the maintainer reviews one PR instead of two parallel implementations of the same feature.

svlandeg

Agreed, thanks @ravyg. Closing as duplicate.

ravyg · 2026-04-08T18:33:22Z

Thanks @svlandeg and @MohammadWasi, appreciate it 🙏

@MohammadWasi

Covers load_results, compute_stats, trajectory states, edge cases, text report, and save_plot. Uses stdlib unittest only — no new deps. Credit to @MohammadWasi (karpathy#495) for suggesting test coverage.

MohammadWasi added 3 commits April 7, 2026 07:32

MohammadWasi changed the title ~~Feat/cli analysis tool~~ feat: CLI analysis tool for experiment results Apr 7, 2026

MohammadWasi marked this pull request as ready for review April 7, 2026 02:31

svlandeg reviewed Apr 8, 2026

svlandeg closed this Apr 8, 2026

ravyg mentioned this pull request Apr 14, 2026

feat: add CLI analysis tool for experiment results #475

Open

5 tasks

MohammadWasi wants to merge 3 commits into karpathy:master from MohammadWasi:feat/cli-analysis-tool

3 participants
