
Add UCB1 Dimension-Aware Search + Experiment Memory to program.md — no code changes required #284

@Insider77Circle


Problem: The Current Loop is a Memoryless Random Walk

The autoresearch agent runs a sequential keep/discard loop, but it has no memory of what it has tried and no principled strategy for what to try next. After 50 overnight experiments, the agent has the same strategic information it started with: none.

Concretely, three structural gaps:

  1. No experiment memory. The agent's only record of history is the current state of train.py — i.e., only what was kept. Zero record of what failed, what was close, or what dimension it already exhausted.

  2. No guided exploration. No mechanism to balance trying a new class of change (exploration) vs. doubling down on a dimension that already showed improvement (exploitation). It is a pure random walk across code-editing decisions.

  3. No early abort. Every run burns the full 5 minutes regardless of whether the first 90 seconds of loss curves indicate a regression. At 12 experiments/hour, roughly half of overnight runs are discards that could have been caught at the 90-second mark.


Current Loop vs. Proposed Loop

```mermaid
flowchart LR
    subgraph NOW ["❌ Current Loop"]
        direction TB
        A1([Read train.py]) --> B1([Guess a change])
        B1 --> C1([Run 5 min])
        C1 --> D1{val_bpb\nimproved?}
        D1 -- yes --> E1([Keep])
        D1 -- no --> F1([Discard])
        E1 --> A1
        F1 --> A1
        style NOW fill:#2d0000,stroke:#ff4444,color:#fff
        style A1 fill:#3d0000,stroke:#ff6666,color:#fff
        style B1 fill:#3d0000,stroke:#ff6666,color:#fff
        style C1 fill:#3d0000,stroke:#ff6666,color:#fff
        style D1 fill:#4d0000,stroke:#ff6666,color:#fff
        style E1 fill:#3d0000,stroke:#ff6666,color:#fff
        style F1 fill:#3d0000,stroke:#ff6666,color:#fff
    end

    subgraph NEW ["✅ DUSE Loop"]
        direction TB
        A2([Read experiments.json]) --> B2([Compute UCB1\nacross 7 dims])
        B2 --> C2([Select highest\nUCB1 dimension])
        C2 --> D2([Propose targeted\nchange])
        D2 --> E2([90s gate:\nloss regressing?])
        E2 -- abort --> A2
        E2 -- continue --> F2([Run full 5 min])
        F2 --> G2{val_bpb\nimproved?}
        G2 -- yes --> H2([Keep + log])
        G2 -- no --> I2([Discard + rescue\npool check])
        H2 --> A2
        I2 --> A2
        style NEW fill:#001a00,stroke:#44ff44,color:#fff
        style A2 fill:#002200,stroke:#66ff66,color:#fff
        style B2 fill:#002200,stroke:#66ff66,color:#fff
        style C2 fill:#002200,stroke:#66ff66,color:#fff
        style D2 fill:#002200,stroke:#66ff66,color:#fff
        style E2 fill:#003300,stroke:#66ff66,color:#fff
        style F2 fill:#002200,stroke:#66ff66,color:#fff
        style G2 fill:#003300,stroke:#66ff66,color:#fff
        style H2 fill:#002200,stroke:#66ff66,color:#fff
        style I2 fill:#002200,stroke:#66ff66,color:#fff
    end
```

How UCB1 Dimension Selection Works

```mermaid
%%{init: {'theme': 'dark', 'themeVariables': {'fontSize': '14px'}}}%%
xychart-beta
    title "UCB1 Scores After 20 Experiments — Agent Selects 'attention' (never tried)"
    x-axis ["optimizer", "architecture", "attention", "normalization", "schedule", "regularization", "batching"]
    y-axis "UCB1 Score" 0 --> 3.0
    bar [0.71, 0.87, 2.45, 0.77, 1.00, 1.73, 1.73]
```

Before every experiment, the agent computes `UCB1(dim) = mean_improvement(dim) + 1.0 × sqrt(ln(N) / n(dim))` for each dimension. High scores emerge either from strong past returns (exploitation) or from a low trial count (exploration). Untried dimensions always receive a large exploration bonus, so no dimension is ever permanently abandoned.
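The proposal has the agent compute these scores in its own reasoning step, but the arithmetic is easy to pin down in code. A minimal sketch, assuming experiment records carry the `dimension` and `improvement` fields of the `experiments.json` schema proposed below (`ucb1_scores` is an illustrative name, not part of the spec):

```python
import math

DIMENSIONS = ["optimizer", "architecture", "attention", "normalization",
              "schedule", "regularization", "batching"]

def ucb1_scores(experiments, c=1.0):
    """Score every dimension from a list of logged experiment records."""
    n_total = max(len(experiments), 1)  # guard ln(0) before the first run
    scores = {}
    for dim in DIMENSIONS:
        gains = [e["improvement"] for e in experiments if e["dimension"] == dim]
        mean = sum(gains) / len(gains) if gains else 0.0
        # An untried dimension counts as 0.5 trials, so it receives the
        # largest exploration bonus and is never permanently abandoned.
        n_dim = len(gains) if gains else 0.5
        scores[dim] = mean + c * math.sqrt(math.log(n_total) / n_dim)
    return scores
```

With a short log, the untried dimensions dominate, which is exactly the behavior the chart above illustrates.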


Proposed Modification: Dimensional UCB1 Search + Experiment Memory (DUSE)

Pure program.md addition. Zero changes to train.py, prepare.py, or any code. No new dependencies.

Section 1 — Dimension Map

Add to program.md:

## Dimension Map

Every experiment belongs to exactly one of these seven architectural dimensions.
Assign a label before proposing any change.

| Dimension       | Covers |
|-----------------|--------|
| `optimizer`     | optimizer algorithm, LR, weight decay, gradient clipping, momentum |
| `architecture`  | n_layer, n_embd, n_head, feedforward ratio, parameter count |
| `attention`     | attention pattern, relative position, sparse or windowed attention |
| `normalization` | RMSNorm vs LayerNorm, pre/post norm, placement |
| `schedule`      | LR warmup, decay shape (cosine/linear), cycle length |
| `regularization`| dropout, weight decay schedule, stochastic depth |
| `batching`      | batch size, gradient accumulation, sequence packing |

One change = one dimension. If a change spans two dimensions, split into two experiments.

Section 2 — Experiment Log

## Experiment Log

After every run, append one record to `experiments.json`:

```json
{
  "id": 1,
  "dimension": "optimizer",
  "delta": "switched AdamW weight_decay 0.1 → 0.01",
  "val_bpb": 0.991,
  "baseline_bpb": 0.998,
  "improvement": 0.007,
  "status": "keep"
}
```

If `experiments.json` does not exist, create it as an empty array (`[]`) before run 1.

Section 3 — UCB1 Dimension Selector

## Choosing What to Experiment On Next

Read experiments.json. Compute UCB1 for each of the seven dimensions:

```
UCB1(dim) = mean_improvement(dim) + 1.0 * sqrt( ln(N) / n(dim) )

where:
  mean_improvement(dim) = average improvement across all experiments in this dim (0.0 if none)
  N                     = total experiments logged
  n(dim)                = experiments in this dim (use 0.5 if none, to avoid divide-by-zero)
```

Select the dimension with the highest score.
Print all seven scores before proposing your change so the reasoning is auditable.

Example (with ln(12) ≈ 2.485):

```
UCB1 scores (N=12):
  optimizer:      (n=3) 0.003 + 0.910 = 0.913
  architecture:   (n=2) 0.001 + 1.115 = 1.116
  attention:      (n=0) 0.000 + 2.229 = 2.229  ← SELECTED (never tried)
  normalization:  (n=2) 0.002 + 1.115 = 1.117
  schedule:       (n=3) 0.005 + 0.910 = 0.915
  regularization: (n=1) 0.000 + 1.576 = 1.576
  batching:       (n=1) 0.001 + 1.576 = 1.577
```

Section 4 — Early Abort Gate

## Early Abort Gate

At the 90-second training checkpoint, compare current val loss to the baseline
val loss at the same step from the last kept experiment.

If current val loss > baseline * 1.05, abort.
Log the run as discarded with "early_abort": true.
Immediately start the next UCB1 selection cycle.

This recovers 3-4 minutes per bad experiment and lifts throughput from ~12 to ~16-18 experiments/hour.
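The gate reduces to a single comparison plus the record to log. A sketch, with `check_abort_gate` and the field names chosen for illustration (only `"early_abort": true` in the log is part of the proposal):

```python
def check_abort_gate(current_val_loss, baseline_val_loss, tolerance=1.05):
    """90-second checkpoint: compare against the last kept experiment's
    val loss at the same step. Returns an abort record to log, or None
    to continue into the full 5-minute run."""
    if current_val_loss > baseline_val_loss * tolerance:
        return {"status": "discard", "early_abort": True,
                "val_loss_90s": current_val_loss,
                "baseline_90s": baseline_val_loss}
    return None
```

Note the strict inequality: a run sitting exactly at 5% worse is allowed to finish, so only clear regressions are cut.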

Section 5 — Crossover Rescue Pool (Optional)

## Rescue Pool

When discarding, note whether any sub-mechanism showed local promise before the regression
(e.g., faster early convergence even though final val_bpb regressed).

Append to rescue_pool.json:

```json
{
  "from_experiment": 7,
  "dimension": "schedule",
  "mechanism": "linear warmup over 200 steps showed faster initial convergence",
  "reuse_signal": "recombine with a lower peak LR"
}
```

Before any new experiment, scan rescue_pool.json for recombination candidates in the same dimension.

Why This Is Novel

The bandit and NAS literature (Hyperband, BOHB, SMAC, PBT) applies UCB1 / Thompson sampling to hyperparameter values within a predefined search space. None apply it to the meta-question of which code-region category an autonomous code-editing agent should touch next. That is the specific gap this fills.

Cross-domain research backing each mechanism:

| Mechanism | Source | Signal |
|-----------|--------|--------|
| UCB1 arm selection | Exploration vs. Exploitation: Comparative Analysis and Practical Implications; In-depth Exploration and Implementation of Multi-Armed Bandit | UCB1 is robust to non-stationary arm rewards, which holds here: improving one dimension shifts the marginal returns on others |
| Dimension mutation strategy | Improving Evolutionary Neural Architecture Search: Flexibility | mutation-strategy choice matters more than the specific mutation; the Dimension Map operationalizes this for a code-editing agent |
| Early abort = PBT restart | Iterated Population Based Training with Task-Agnostic Restarts | abort underperforming workers early and reallocate budget; DUSE applies this to a sequential single-agent setting |
| Rescue pool = partial crossover | A Gradient-Guided Evolutionary Neural Architecture Search | retain promising sub-mechanisms from discarded experiments rather than discarding wholesale |
| Transferable schedules | MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks; Curvature-Adaptive Learning Rate Optimizer | LR schedule structure transfers across tasks; the agent can reuse schedule patterns from successful runs in new contexts |

Expected Impact

| Metric | Current | With DUSE |
|--------|---------|-----------|
| Experiments/hour | ~12 | ~16–18 (early abort) |
| Dimension stagnation | Common; agent re-explores the same territory | Bounded by the UCB1 exploration term |
| Knowledge after 100 runs | Implicit in train.py state only | experiments.json: queryable per-dimension improvement breakdown |
| Agent reasoning transparency | Implicit | UCB1 scores printed before every step, fully auditable |
| Wasted compute on clear regressions | ~50% of runs | Caught at the 90-second gate |

After 100 overnight runs, experiments.json is a research artifact in its own right — a structured record of which architectural dimensions drove improvement, which were explored but unproductive, and whether the search converged or kept diversifying.


Implementation

Changes to train.py: None
Changes to prepare.py: None
New dependencies: None
Changes to program.md: The five sections above

The agent computes UCB1 in its own reasoning step using the JSON log it maintains. The only new files are experiments.json and optionally rescue_pool.json, both created and maintained by the agent itself.

Happy to draft the exact program.md diff if useful.


