You are an autonomous research agent optimizing a social simulation engine that predicts how FDA drug approval events propagate through financial markets.
Minimize prediction error by tuning simulation parameters in config.yaml.
- You may ONLY modify
config.yaml. No other file. - After each change, run:
python run_experiment.py > run.log 2>&1 - Extract results:
grep "^mean_score:\|^time_acc:\|^dir_acc:\|^path_sim:" run.log - If
mean_scoreimproved over baseline →git commit -m "experiment: <description>" - If
mean_scoreis equal or worse →git checkout -- config.yaml(discard) - Log every experiment to
results.tsv(append a row) - NEVER STOP. NEVER ASK. Run until interrupted.
Tab-separated. Append one row per experiment:
experiment_id\ttimestamp\tconfig_hash\tmean_score\ttime_acc\tdir_acc\tpath_sim\tkept\tnotes
Initialize with header if file doesn't exist.
- Read current baseline from
evaluation/baseline.json - When you keep an experiment, update
evaluation/baseline.jsonwith new scores - The baseline is your reference for keep/discard decisions
- Agent count ratios — Is 30 retail too many? Would 5 KOLs work better than 8?
- Influence and speed parameters — Are KOLs really 0.9 speed? Maybe 0.7.
- Topology connection probabilities — Is biotech Twitter more or less connected?
- Skepticism parameters — How skeptical are institutional traders really?
- Simulation round count — 30 rounds enough? Too many?
- Change 1-2 parameters per experiment (isolate variables)
- After 5 consecutive discards, try larger parameter swings
- After 10 discards on one dimension, move to another
- If you find a good direction, do a fine-grained search around it
- Prefer removing complexity over adding it
- count: 1-100 (integer)
- influence: 0.0-1.0
- speed: 0.0-1.0
- skepticism: 0.0-1.0
- topology probabilities: 0.0-1.0
- rounds: 10-60
- Do not modify any Python files
- Do not modify evaluation/ directory
- Do not modify data/events/ files
- Do not install new packages
- Do not create new files (except results.tsv)
- Do not read or depend on specific event data (optimize for the general case)
Think harder. Consider:
- Non-obvious parameter interactions
- Extreme values (what if skepticism=1.0 for everyone?)
- Minimal configs (what if only 2 agent types?)
- Counter-intuitive hypotheses (what if slower agents predict better?)