Autoresearch documentation follows the Diataxis framework — four quadrants organized by user need.
```
                         LEARNING                     WORKING
            ┌────────────────────────┐   ┌──────────────────────┐
            │                        │   │                      │
  PRACTICAL │       Tutorials        │   │    How-To Guides     │
            │   learning-oriented    │   │    task-oriented     │
            │                        │   │                      │
            └────────────────────────┘   └──────────────────────┘
            ┌────────────────────────┐   ┌──────────────────────┐
            │                        │   │                      │
THEORETICAL │      Explanation       │   │      Reference       │
            │ understanding-oriented │   │ information-oriented │
            │                        │   │                      │
            └────────────────────────┘   └──────────────────────┘
```
## Tutorials

Step-by-step lessons that take you through a complete experience.
| Document | Description |
|---|---|
| Getting Started | Your first autoresearch loop — install, run, review, approve |
| Creating Evals from Scratch | Build evals for a skill that has none using --eval-doctor |
| Improving an Existing Skill | Take a working skill from 65% to 90%+ |
## How-To Guides

Practical steps for accomplishing a particular goal.
| Document | Description |
|---|---|
| Run the Improvement Loop | Execute the core loop with all available options |
| Manage Evals | Create, fix, and update evaluation cases |
| Interpret Results | Read results.tsv, convergence reports, and diffs |
| Customize Iterations | Change max iterations and understand abort thresholds |
| Apply Changes | Review and apply the best version to your original skill |
| Recover from Failure | Resume after interruption, inspect snapshots, manually revert |
| Integrate with Skill Creator | Post-loop description optimization with skill-creator |
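As a taste of what "Interpret Results" covers, here is a minimal sketch of reading a results.tsv-style log and picking the best-scoring iteration. The column names used here (`iteration`, `score`) are assumptions for illustration only; the authoritative schema lives in the File Formats reference.

```python
import csv
import io

# Hypothetical results.tsv fragment -- column names are assumptions,
# not the documented schema (see the File Formats reference).
SAMPLE = "iteration\tscore\n1\t0.65\n2\t0.72\n3\t0.90\n"

def best_iteration(tsv_text: str) -> dict:
    """Return the highest-scoring row from a tab-separated results log."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    return max(rows, key=lambda row: float(row["score"]))

best = best_iteration(SAMPLE)
print(best["iteration"], best["score"])  # prints: 3 0.90
```

The same pattern applies however the real columns are named: parse the TSV, coerce the score column to a number, and sort or reduce from there.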
## Reference

Precise, complete descriptions of the machinery.
| Document | Description |
|---|---|
| CLI Reference | Complete /autoresearch command reference with all flags and modes |
| Algorithm | Formal specification of the improvement loop |
| File Formats | results.tsv schema, workspace layout, snapshot format |
| Eval Schema | evals.json and trigger-eval.json schemas |
| Agents | Agent specs: improver, eval-doctor, convergence-reporter, grader |
| Scripts | Script API: snapshot.py, score.py, results_log.py, diff_report.py |
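To illustrate the kind of structure the Eval Schema reference documents, here is a sketch that loads and lightly validates an evals.json-like file. The field names (`cases`, `input`, `expected`) are illustrative assumptions, not the real schema; consult the Eval Schema reference for the actual field definitions.

```python
import json

# Hypothetical evals.json fragment -- field names are assumptions for
# illustration; the real schema is in the Eval Schema reference.
EVALS_JSON = """
{
  "cases": [
    {"input": "summarize the README", "expected": "a short summary"},
    {"input": "list the scripts", "expected": "snapshot.py, score.py"}
  ]
}
"""

def load_cases(text: str) -> list:
    """Parse an evals-style JSON document and check each case's shape."""
    data = json.loads(text)
    cases = data.get("cases", [])
    for case in cases:
        if "input" not in case or "expected" not in case:
            raise ValueError(f"malformed eval case: {case}")
    return cases

print(len(load_cases(EVALS_JSON)))  # prints: 2
```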
## Explanation

Discussion and context that illuminate concepts.
| Document | Description |
|---|---|
| The Autoresearch Pattern | Karpathy's pattern, its philosophy, and how it maps to skills |
| Eval-Skill Separation | Why evals and skills are improved separately |
| Convergence and Scoring | How scoring works, what convergence means, non-determinism |
| Lifecycle | Full lifecycle from eval readiness through the meta-loop |
| Component Architecture | How orchestrator, agents, and scripts interact |
| Expected Results | Typical score trajectories, "good enough", common failures |
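To make "what convergence means" concrete, here is one plausible stopping rule: stop when the best score has not improved for a fixed number of iterations. This is an assumption for illustration only; the loop's actual rule is formally specified in the Algorithm reference.

```python
# Illustrative sketch of a patience-based convergence check. This is an
# assumed rule, NOT the documented algorithm (see the Algorithm reference).
def converged(scores, patience=3):
    """True if the last `patience` scores never beat the earlier best."""
    if len(scores) <= patience:
        return False  # not enough history to judge
    best_so_far = max(scores[:-patience])
    return max(scores[-patience:]) <= best_so_far

print(converged([0.5, 0.7, 0.72, 0.72, 0.72, 0.72]))  # prints: True
print(converged([0.5, 0.6, 0.9]))                     # prints: False
```

Because scoring is non-deterministic, real convergence checks typically also tolerate small score fluctuations rather than requiring exact stagnation.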