feat: real-time web dashboard for experiment monitoring #334

Open
Death-Incarnate wants to merge 1 commit into karpathy:master from Death-Incarnate:feature/dashboard

@Death-Incarnate

What this adds

A single-file FastAPI dashboard (dashboard.py) that gives autoresearch a proper UI for monitoring and controlling experiments in real time.

Start it

cd autoresearch
uv run dashboard.py
# open http://localhost:7788

Set the REPO_DIR environment variable to point at any autoresearch clone. Set PORT to serve on a port other than the default 7788.
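
For readers who want to see how this kind of configuration is usually resolved, here is a minimal sketch. It is not taken from dashboard.py: the function name `load_settings` and the fallback of REPO_DIR to the current directory are assumptions made for illustration. Only the variable names and the 7788 default come from this description.

```python
import os
from pathlib import Path


def load_settings(env=None):
    """Resolve the repo directory and port from environment variables."""
    env = os.environ if env is None else env
    # Assumption: with REPO_DIR unset, fall back to the current directory,
    # which matches launching the dashboard from inside the clone.
    repo_dir = Path(env.get("REPO_DIR", ".")).expanduser().resolve()
    # 7788 is the default port named in the PR description.
    port = int(env.get("PORT", "7788"))
    return repo_dir, port


if __name__ == "__main__":
    repo_dir, port = load_settings()
    print(f"serving {repo_dir} on http://localhost:{port}")
```

Resolving the path once at startup keeps every later file read anchored to the same absolute directory, regardless of where the process was launched.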


Tabs

Overview

  • KPI cards: best val_bpb, baseline, improvement %, total/kept/discarded run counts
  • val_bpb scatter chart colored by status (keep=green, discard=red, crash=gray)
  • Status pie chart
  • Live GPU stats: utilization %, VRAM used, temperature (polls nvidia-smi)
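
The GPU card above polls nvidia-smi. One plausible way to do that is sketched below, assuming the CSV query interface of nvidia-smi (`--query-gpu` with `--format=csv,noheader,nounits`). The function names and the dictionary keys are hypothetical and may not match what dashboard.py actually does.

```python
import subprocess

# Fields requested from nvidia-smi, in the order they are parsed below.
QUERY = "utilization.gpu,memory.used,temperature.gpu"


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    stats = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue
        try:
            util, mem, temp = (float(p) for p in parts)
        except ValueError:
            continue  # skip rows with non-numeric fields such as "[N/A]"
        stats.append({"util_pct": util, "vram_used_mib": mem, "temp_c": temp})
    return stats


def read_gpu_stats():
    """Return per-GPU stats, or an empty list if nvidia-smi cannot be run."""
    cmd = ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"]
    try:
        out = subprocess.run(
            cmd, capture_output=True, text=True, timeout=5, check=True
        )
    except (OSError, subprocess.SubprocessError):
        return []
    return parse_gpu_csv(out.stdout)
```

Returning an empty list on failure lets the page render on machines without an NVIDIA driver instead of raising on every poll.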

Live Training

  • Step progress bar with % complete
  • Loss + mfu% dual-axis chart, updates every 2 seconds via polling
  • SSE log stream from run.log — color-coded (amber for step lines, green for summary blocks, red for errors)
  • Autoscroll toggle
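
The color coding of the log stream can be sketched as a small line classifier. The regular expressions below are illustrative guesses at what step, summary, and error lines in run.log look like; the actual log format and the rules used in dashboard.py are not shown in this PR description.

```python
import re

# Assumption: all three patterns are hypothetical stand-ins for the real
# run.log line shapes.
STEP_RE = re.compile(r"^\s*step\s+\d+", re.IGNORECASE)
ERROR_RE = re.compile(r"Traceback|Error|Exception")
SUMMARY_RE = re.compile(r"^\s*[a-z_]+:\s+\S+")


def classify_line(line):
    """Map a log line to the color class described in the tab list."""
    if ERROR_RE.search(line):
        return "red"    # errors take priority over everything else
    if STEP_RE.match(line):
        return "amber"  # per-step training output
    if SUMMARY_RE.match(line):
        return "green"  # "key: value" lines from a summary block
    return "plain"
```

Checking the error pattern first means a step line that also reports an exception is still shown in red.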

Experiments

  • Full results table sorted newest-first, best run highlighted in gold
  • Click any commit hash → inline diff of that experiment's train.py changes

Git History

  • Last 40 commits with click-to-diff
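
Click-to-diff in both the Experiments and Git History tabs amounts to running git against the repo. The sketch below shows one way to build those commands safely. The helper names are hypothetical, and the exact git invocations used by dashboard.py are an assumption. Validating the hash before passing it to git is a design choice of this sketch, since the value arrives from the browser.

```python
import re
import subprocess

# Accept only abbreviated or full hex commit hashes.
HASH_RE = re.compile(r"^[0-9a-fA-F]{4,40}$")


def diff_command(commit, path="train.py"):
    """Build a `git show` command limited to one file of one commit."""
    if not HASH_RE.match(commit):
        raise ValueError(f"not a commit hash: {commit!r}")
    # --format= suppresses the commit header so only the diff is printed.
    return ["git", "show", "--format=", commit, "--", path]


def log_command(limit=40):
    """Build a `git log` command: short hash, a tab, then the subject."""
    return ["git", "log", f"--max-count={int(limit)}", "--pretty=format:%h%x09%s"]


def run_git(args, repo_dir):
    """Run a prepared git command inside repo_dir and return its stdout."""
    result = subprocess.run(
        args, cwd=repo_dir, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Passing arguments as a list, without a shell, together with the hash check keeps a crafted "commit" value from being interpreted as a git option.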

Controls

  • Launch form: run tag, model picker, max experiments
  • Stop button for the running experiment
  • Live hyperparameter table parsed directly from train.py
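
Parsing hyperparameters out of train.py without importing it can be done with the standard `ast` module. The following is a sketch under one loud assumption: that hyperparameters are module-level assignments of literals to UPPER_CASE names. The PR description does not say how dashboard.py identifies them, and the sample names in the usage example are invented.

```python
import ast


def parse_hyperparams(source):
    """Collect module-level UPPER_CASE = <literal> assignments from source."""
    params = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue
        target = node.targets[0]
        # Assumption: only upper-case names count as hyperparameters.
        if not isinstance(target, ast.Name) or not target.id.isupper():
            continue
        try:
            params[target.id] = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            continue  # computed values such as DEPTH * 2 are skipped
    return params


if __name__ == "__main__":
    sample = "DEPTH = 8\nLR = 3e-4\nTOTAL = DEPTH * 2\n"
    print(parse_hyperparams(sample))
```

Because the file is parsed rather than executed, the table can refresh while training is running without importing torch or touching the GPU.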

Settings

  • Edit program.md in-browser with save
  • Read-only train.py viewer

Implementation notes

  • Zero frontend dependencies — all HTML/CSS/JS served inline from the single Python file
  • FastAPI + uvicorn only (added to pyproject.toml)
  • SSE endpoint for streaming log tail to the browser
  • All file reads are relative to REPO_DIR so it works with any clone location
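
To make the SSE log tail concrete, here is a standard-library sketch of its two building blocks: encoding a Server-Sent Events frame and reading only the lines appended to a file since the last poll. Both function names are hypothetical, and the real endpoint in dashboard.py may be structured differently.

```python
def sse_event(data, event=None):
    """Encode one Server-Sent Events frame as text."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data:" field.
    for part in str(data).split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


def read_new_lines(path, offset):
    """Return (complete lines appended since offset, new offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only up to the last newline so a half-written line is
    # picked up whole on the next poll.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

In a FastAPI app, a generator that loops over `read_new_lines` and yields `sse_event(line)` could be returned through `StreamingResponse` with `media_type="text/event-stream"`, and the browser would consume it with `EventSource`.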

Single-file FastAPI dashboard (dashboard.py) with:
- Overview: KPI cards (best val_bpb, baseline, improvement %, run counts),
  val_bpb scatter chart colored by keep/discard/crash, status pie chart,
  live GPU stats (util %, VRAM, temperature)
- Live Training: step progress bar, loss + mfu% dual-axis chart (2s poll),
  SSE log stream with color-coded output, autoscroll toggle
- Experiments: sortable table, best run highlighted in gold, click commit
  hash to see inline train.py diff
- Git History: last 40 commits with click-to-diff
- Controls: launch form (run tag, model picker), stop button, live
  hyperparams table parsed from train.py
- Settings: edit program.md in-browser, read-only train.py viewer

Zero frontend deps — pure HTML/CSS/JS served inline from the single Python
file. Configure repo path via REPO_DIR env var, port via PORT env var
(default 7788).

Start with:
    cd autoresearch && uv run dashboard.py

Adds fastapi>=0.135.1 and uvicorn>=0.42.0 to pyproject.toml.
@Death-Incarnate
Author

As context for why this matters right now: Alexey Grigorev just wrote a breakdown of autoresearch that's getting traction — https://alexeyondata.substack.com/p/karpathys-autoresearch-went-viral

One of the main things people hit when they clone this repo and start running experiments is zero visibility into what's happening. The training loop is a black box unless you're tailing logs manually.

The dashboard fills that gap — live loss curve, GPU stats, log stream, experiment history — all without adding any dependencies to the core research loop. It's opt-in and read-only relative to the experiment state.
