
feat: CLI analysis tool for experiment results — structured feedback for the autonomous agent #476

@ravyg

Description


Problem

I've been running autoresearch autonomously on my local setup for extended sessions (50-100+ experiments overnight). The agent logs everything to results.tsv as instructed by program.md, but has no programmatic way to analyze those results.

Currently the agent has to:

  1. Manually cat results.tsv and reason about raw tab-separated data
  2. Try to mentally track which experiments improved things and by how much
  3. Guess whether it's plateauing or still making progress

This becomes a real bottleneck during long autonomous runs. The agent wastes experiments retrying approaches that are clearly stuck in a local minimum, because it can't easily see the big picture of what has been tried and what worked.

Proposed Solution

A lightweight analysis.py CLI script that mirrors the existing analysis.ipynb but is callable by the agent (or by the human checking in on a run):

uv run analysis.py                          # text report to stdout
uv run analysis.py --json                   # machine-readable JSON for the agent
uv run analysis.py --plot progress.png      # save progress chart
uv run analysis.py --tsv path/to/results.tsv  # custom TSV path
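The flag surface above could be wired up with stdlib argparse; this is a sketch of the parser only, with defaults chosen for illustration:

```python
import argparse


def build_parser():
    # CLI surface mirroring the invocations above; --tsv defaults to the
    # file program.md tells the agent to write.
    p = argparse.ArgumentParser(description="Analyze autoresearch results.tsv")
    p.add_argument("--json", action="store_true",
                   help="emit machine-readable JSON instead of a text report")
    p.add_argument("--plot", metavar="PATH",
                   help="save a progress chart to PATH")
    p.add_argument("--tsv", default="results.tsv",
                   help="path to the results TSV (default: results.tsv)")
    return p
```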

The --json mode is the key addition — the agent can call it and get structured data:

{
  "total_experiments": 52,
  "kept": 11,
  "discarded": 38,
  "crashed": 3,
  "keep_rate": 0.2245,
  "baseline_bpb": 0.9979,
  "best_bpb": 0.9612,
  "improvement": 0.0367,
  "improvement_pct": 3.68,
  "best_experiment": "increase batch size to 2**20",
  "top_hits": [...],
  "trajectory": "plateauing"
}
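A few of those fields could be computed with pandas along these lines. The column names (status, bpb) and status values here are assumptions for illustration; the real script would have to match the results.tsv schema that program.md specifies:

```python
import io

import pandas as pd


def summarize(tsv_text):
    """Compute summary fields like the JSON above from raw TSV text.

    Assumes columns named 'status' and 'bpb' — placeholders that must be
    replaced with the actual schema from program.md.
    """
    df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
    kept = int((df["status"] == "kept").sum())
    total = len(df)
    return {
        "total_experiments": total,
        "kept": kept,
        "keep_rate": round(kept / total, 4) if total else 0.0,
        "best_bpb": float(df["bpb"].min()),
    }
```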

The trajectory field (improving / plateauing / stuck) is especially useful — program.md could instruct the agent to check this periodically and switch strategies when progress stalls.
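One possible heuristic for deriving trajectory from the running-best bpb after each experiment — the window size, threshold, and the "assume improving on little data" fallback are all illustrative choices, not part of the proposal:

```python
def classify_trajectory(best_bpb_history, window=10, min_delta=1e-3):
    """Classify progress from a per-experiment running-best bpb series.

    Returns "improving", "plateauing", or "stuck". Window and threshold
    are illustrative defaults.
    """
    if len(best_bpb_history) < window + 1:
        # Too little data to judge a trend; assume progress (a choice,
        # not something the proposal specifies).
        return "improving"
    # Gain over the last `window` experiments (bpb decreases as it improves).
    recent_gain = best_bpb_history[-window - 1] - best_bpb_history[-1]
    if recent_gain > min_delta:
        return "improving"
    # No recent gain: distinguish a plateau after earlier wins from
    # a run that never improved at all.
    total_gain = best_bpb_history[0] - best_bpb_history[-1]
    return "plateauing" if total_gain > min_delta else "stuck"
```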

Why this matters for autonomous research

The whole point of autoresearch is that the agent runs independently for hours. Right now it's flying blind between experiments — it can see the last result but not the trend. This is like a researcher who records lab notes but never reads them back. Giving the agent a structured summary between experiments directly improves experiment selection quality.

Design constraints

  • Uses only existing dependencies (pandas, numpy, matplotlib — already in pyproject.toml)
  • Single file, no changes to prepare.py or train.py
  • Reads results.tsv in the exact format program.md specifies
  • Keeps the repo minimal — just one new file
