Template for Workflow 1.5 (
/experiment-bridge). Fill in, save asrefine-logs/EXPERIMENT_PLAN.md, then run/experiment-bridge.
Problem: [What problem does your method solve?] Method Thesis: [One-sentence description of your approach]
| Claim | Why It Matters | Minimum Convincing Evidence | Linked Blocks |
|---|---|---|---|
| C1: [Main claim] | [Why] | [Evidence needed] | B1, B2 |
| C2: [Supporting claim] | [Why] | [Evidence needed] | B3 |
- Claim tested: C1
- Dataset / split / task: [e.g., ImageNet val]
- Compared systems: [Your method vs. Baseline A vs. Baseline B]
- Metrics: [Primary: accuracy/PPL. Secondary: throughput]
- Setup details: [Backbone, optimizer, lr, epochs, seeds]
- Success criterion: [e.g., "> 2% accuracy over baseline"]
- Failure interpretation: [If negative, what does it mean?]
- Priority: MUST-RUN
- Claim tested: C1 (novelty isolation)
- Compared systems: [Full method, -component A, -component B]
- Success criterion: [Each component contributes > 0.5%]
- Priority: MUST-RUN
- Priority: NICE-TO-HAVE
| Milestone | Goal | Runs | Decision Gate | Cost |
|---|---|---|---|---|
| M0: Sanity | Pipeline works | 1 quick run | Loss decreases? | ~0.5h |
| M1: Baselines | Reproduce baselines | Block 3 | Numbers match? | ~4h |
| M2: Main | Full method | Block 1 | Meets criterion? | ~8h |
| M3: Ablation | Components | Block 2 | Each matters? | ~6h |
- Total estimated GPU-hours: ~18h
- Hardware: [e.g., 4x RTX 3090]
- Biggest bottleneck: [e.g., baseline reproduction]
- Risk: [What could go wrong] → Mitigation: [How to handle it]