diff --git a/program.md b/program.md
index dea9bcc01..f5d2f0507 100644
--- a/program.md
+++ b/program.md
@@ -11,7 +11,7 @@ To set up a new experiment, work with the user to:
 3. **Read the in-scope files**: The repo is small. Read these files for full context:
    - `README.md` — repository context.
    - `prepare.py` — fixed constants, data prep, tokenizer, dataloader, evaluation. Do not modify.
-   - `train.py` — the file you modify. Model architecture, optimizer, training loop.
+   - `train.py` — the source file you modify. Model architecture, optimizer, training loop.
 4. **Verify data exists**: Check that `~/.cache/autoresearch/` contains data shards and a tokenizer. If not, tell the human to run `uv run prepare.py`.
 5. **Initialize results.tsv**: Create `results.tsv` with just the header row. The baseline will be recorded after the first run.
 6. **Confirm and go**: Confirm setup looks good.
@@ -23,7 +23,14 @@ Once you get confirmation, kick off the experimentation.
 Each experiment runs on a single GPU. The training script runs for a **fixed time budget of 5 minutes** (wall clock training time, excluding startup/compilation). You launch it simply as: `uv run train.py`.
 
 **What you CAN do:**
-- Modify `train.py` — this is the only file you edit. Everything is fair game: model architecture, optimizer, hyperparameters, training loop, batch size, model size, etc.
+- Modify `train.py` — this is the only source file you edit. Everything is fair game: model architecture, optimizer, hyperparameters, training loop, batch size, model size, etc.
+- Do not modify `prepare.py` or any other source file.
+
+Experiment artifacts and bookkeeping:
+- You must append exactly one new row to `results.tsv` after every experiment, including `keep`, `discard`, and `crash`.
+- You must append the row before any `git reset`, checkout, or revert.
+- Updating `results.tsv` is required bookkeeping and is not a source-code modification.
+- You may write or overwrite `run.log` for each run.
 
 **What you CANNOT do:**
 - Modify `prepare.py`. It is read-only. It contains the fixed evaluation, data loading, tokenizer, and training constants (time budget, sequence length, etc).
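
Review note: the bookkeeping rule added in this diff (one row per experiment, written before any revert) could be sketched roughly as below. This is only an illustration under stated assumptions, not part of the repository: the helper name `append_result` and the column names are hypothetical, since the diff does not specify the header of `results.tsv`. Only the three status values come from the diff itself.

```python
import csv
from pathlib import Path

# Hypothetical column layout; the diff does not define the results.tsv header.
COLUMNS = ["commit", "metric", "status", "description"]

# Status values named in the diff.
ALLOWED_STATUSES = {"keep", "discard", "crash"}


def append_result(path, commit, metric, status, description):
    """Append exactly one tab-separated row to the results file.

    Writes the (hypothetical) header first if the file is missing or empty.
    """
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    path = Path(path)
    needs_header = not path.exists() or path.stat().st_size == 0
    # Append mode keeps earlier rows intact; newline="" lets csv control line endings.
    with path.open("a", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        if needs_header:
            writer.writerow(COLUMNS)
        writer.writerow([commit, metric, status, description])
```

Under this sketch, the agent would call `append_result("results.tsv", ...)` once per run, and only afterwards issue any `git reset`, checkout, or revert, so a discarded or crashed experiment still leaves its row behind.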