Karpathy's AutoResearch with Memory-in-the-Loop States by The Adimension #302
habanwer wants to merge 11 commits into karpathy:master from
Conversation
…earch prepare.py Extract hardcoded constants (data paths, tokenizer settings, time budgets, processor overrides) from karpathy/autoresearch prepare.py into a user-owned, read-only JSON config. Enables transparent platform configuration without modifying source code. Fields: mode (test/train), data (HuggingFace cache/URL/shards), tokenizer (vocab_size=8192, BPE split pattern, special tokens), training (max_seq_len=2048, time budgets: test=60s/train=300s), processor (dtype/compile/flash_attention/peak_flops — all 'auto' by default). Upstream ref: karpathy/autoresearch master @ c2450ad Blob SHA: 823225c
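Based only on the fields named in the commit message above, the user-owned config could look roughly like this (values and exact key names are illustrative assumptions, not the actual file contents):

```json
{
  "mode": "test",
  "data": { "hf_cache": "auto", "url": "auto", "shards": "auto" },
  "tokenizer": { "vocab_size": 8192, "split_pattern": "auto", "special_tokens": ["<|bos|>"] },
  "training": { "max_seq_len": 2048, "time_budget_seconds": { "test": 60, "train": 300 } },
  "processor": { "dtype": "auto", "compile": "auto", "flash_attention": "auto", "peak_flops": "auto" }
}
```

Keeping every processor field at "auto" defers to platform detection; a non-"auto" value acts as a user override.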
…arch train.py Extract architecture and optimization constants from karpathy/autoresearch train.py into an agent-owned JSON config. The agent modifies this file during experiment iterations; the human reviews via version control. Fields: architecture (depth=8, aspect_ratio=128, head_dim=64, window_pattern=SL), optimization (total_batch_size_power=17, device_batch_size=16, LRs, betas, warmup/warmdown ratios), evaluation (batch_size=16, tokens=3145728). Upstream ref: karpathy/autoresearch master @ c2450ad Blob SHA: b0227af
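A sketch of the agent-owned config, using only the values stated above (the LRs, betas, and warmup/warmdown ratios are named in the commit but their values are not given, so they are omitted here):

```json
{
  "architecture": { "depth": 8, "aspect_ratio": 128, "head_dim": 64, "window_pattern": "SL" },
  "optimization": { "total_batch_size_power": 17, "device_batch_size": 16 },
  "evaluation": { "batch_size": 16, "tokens": 3145728 }
}
```

Note that total_batch_size_power=17 implies a total batch of 2^17 = 131072 tokens, which the agent can tune independently of the per-device batch size.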
…fig from ground.json Replace hardcoded constants with ground.json reads at import time. Add _GPU_OPS_PER_CYCLE_PER_SM lookup table for compute capabilities: Volta (7.0), Turing (7.5), Ampere (8.0/8.6/8.7), Ada (8.9), Hopper (9.0), Blackwell (10.0). New functions: - _estimate_peak_flops(): compute peak FP16/BF16 tensor TFLOPS from SM count, clock rate, and ops-per-cycle lookup. - _detect_platform(): auto-select dtype, attention backend (flash/sdpa), torch.compile, GradScaler, and embedding_dtype per GPU generation. Hopper+: bf16/flash/compile. Ampere/Ada: bf16/flash/compile. Turing/older: fp16/sdpa/no-compile/GradScaler. Windows compile guards (sys.platform != 'win32') for triton. Exports: MAX_SEQ_LEN, TIME_BUDGET, PLATFORM dict. ground.json processor overrides applied for non-'auto' values. Platform detection ref: jsegov/autoresearch-win-rtx (Windows RTX adaptation) Upstream ref: karpathy/autoresearch master @ c2450ad Master blob: 06bea91 Modified blob: ed13834
… GPU-safe numerics Replace hardcoded hyperparameters with model.json reads at startup. Import PLATFORM dict from prepare.py for dtype, attention, compile, GradScaler configuration. Key changes: - fp32 optimizer moments for fp16 parameters (Turing numerical stability) - Gradient upcast to fp32 in AdamW update step - _MUON_ORTHO_DTYPE: float32 for Turing (CC<8), bfloat16 for Ampere+ - Sliding-window attention mask caching (avoid recomputation per step) - torch.amp.GradScaler(enabled=PLATFORM['use_grad_scaler']) - autocast dtype from PLATFORM['dtype'] - update_research_memory(): append experiment outcome to sessions/memory.md (agent-owned, never writes to program.md) - _crash_handler: sys.excepthook that calls update_research_memory on crash - Parseable '---'-delimited key=value summary block at end of training Upstream ref: karpathy/autoresearch master @ c2450ad Master blob: 2e74397 Modified blob: bba5418
…free-form narrative Rewrite agent instructions as a structured protocol with numbered sections: 1. Orientation — mandatory file reads (ground.json, model.json, prepare.py, train.py) 2. Decision metrics — table: val_bpb, peak_vram_mb, mfu_percent, training_seconds, total_tokens_M, num_params_M 3. File ownership — governance table: user-owned read-only (ground.json, prepare.py, program.md) vs agent-owned editable (model.json, train.py, results.tsv) 4. Execution sequence — first run (setup + baseline) and subsequent runs (hypothesis-driven experiment loop with keep/discard/crash status) 5. Logging rules — per-run log files in sessions/, append-only results.tsv 6. Constraints — time budget enforcement, no new packages, edit restrictions 7. Autonomy — continue iterating until manually stopped Upstream ref: karpathy/autoresearch master @ c2450ad Master blob: dea9bcc Modified blob: 46ca3df
…DEITY Principles mapping Add new section between karpathy's introduction and 'How it works': - Title: Karpathy's AutoResearch with Memory-in-the-Loop States - Byline: Shehab Anwer, MD (habanwer, The Adimension) - DEITY Principles mapping: Data (JSON configs), Ethics (file ownership governance), Informatics (structured protocol), Technology (GPU platform detection Volta-Blackwell), You (human-machine loop via update_research_memory) - Blockquote linking to published framework paper: doi.org/10.1093/ehjimp/qyaf038 (Eur Heart J Imaging Methods Pract, 2025) - Upstream and related fork attribution Master blob: 2bc3051 Modified blob: 44296ae
Pull request overview
This PR restructures the autoresearch workflow around explicit configuration files (ground.json for user-owned runtime constraints and model.json for agent-owned hyperparameters), adds GPU/platform auto-detection, and introduces “memory-in-the-loop” experiment summarization to support repeatable autonomous runs across diverse CUDA generations (and Windows constraints).
Changes:
- Extracts hardcoded constants into `ground.json` (runtime/platform) and `model.json` (model/optimizer/eval).
- Updates training to consume `PLATFORM` settings, add GradScaler support and fp32 optimizer moments, and write experiment memory to `sessions/memory.md`.
- Rewrites `program.md` into a structured protocol with file ownership governance and logging rules; updates the README to describe the fork.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| train.py | Loads model.json, consumes PLATFORM, adds crash/memory logging, changes the attention implementation, adds fp32 moments + GradScaler, and replaces final eval behavior |
| prepare.py | Loads ground.json, exports PLATFORM via GPU detection, and makes CLI defaults configurable |
| program.md | Replaces the baseline agent loop with a structured, governed protocol and new logging conventions |
| model.json | New agent-owned hyperparameter source of truth |
| ground.json | New user-owned runtime/platform/data configuration |
| .gitignore | Adjusts ignored artifacts (adds *.pkl, run.log; stops ignoring results.tsv) |
| README.md | Adds fork overview and DEITY mapping |
| analysis.ipynb | Metadata-only kernel/version update |
@copilot open a new pull request to apply changes based on the comments in this thread
Pull request overview
This PR refactors the autoresearch workflow to separate user-owned “ground” configuration from agent-owned model hyperparameters, adds GPU/platform auto-detection, and introduces persistent experiment memory/logging conventions intended to support cross-GPU and cross-OS experimentation.
Changes:
- Introduces `ground.json` (platform/data/time-budget config) and `model.json` (architecture/optimization/eval hyperparameters), and wires them into `prepare.py`/`train.py`.
- Adds platform detection in `prepare.py` (dtype/compile/attention/peak FLOPS) and updates the training loop to use `GradScaler` and fp32 optimizer moments for fp16 params.
- Updates agent protocol/docs (`program.md`, `README.md`) and adjusts ignored artifacts (`.gitignore`).
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| train.py | Loads hyperparams from model.json, adds memory/crash handler, swaps attention to SDPA + mask cache, adds GradScaler + fp32 moments, and implements a "fast eval" path |
| prepare.py | Loads constants from ground.json and exports PLATFORM auto-detection (dtype/attention/compile/peak FLOPS + overrides) |
| program.md | Rewrites agent run protocol, config ownership, and results/logging guidance |
| ground.json | Adds user-owned config for cache/data/tokenizer/training/processor overrides |
| model.json | Adds agent-owned architecture/optimization/evaluation settings |
| .gitignore | Adds ignores for *.pkl and run.log, removes ignoring of results.tsv |
| analysis.ipynb | Updates notebook kernel metadata (Python version) |
| README.md | Adds fork introduction and DEITY/ownership mapping |
Hi @habanwer! My two cents: it's much nicer if you iterate on the PR locally first, including asking an LLM to review and making the corresponding edits, all BEFORE you open the PR to merge to upstream. If you ask Copilot to review the PR online, open-source maintainers will expect you to at least address all those review comments (comment on them, apply them, resolve them). Opening multiple PRs for the same issue/feature adds a lot of noise/notifications for maintainers.
Dear @svlandeg Thanks for your response, I am terribly sorry, I clicked it by mistake and then it ran - just noticed that! I will try to revert that or resubmit the pull request - thanks again!
…, PLATFORM[compile] honoring, rotary dtype cast, unused imports, TSV header format, README grammar, notebook version alignment
- train.py: remove unused 'import re'; add cos/sin dtype cast in apply_rotary_emb
to avoid fp32 upcast under autocast; clarify SDPA boolean mask convention
(True = allowed, verified against PyTorch 2.6); guard eval_steps with max(1,...)
to prevent div-by-zero; honor PLATFORM['compile'] instead of unconditional disable
- prepare.py: replace bare open() with try/except for FileNotFoundError and
JSONDecodeError, validate required top-level keys, update comment to reflect
ground.json is required (not optional)
- program.md: show results.tsv header with explicit \t separators to match
csv.DictReader(delimiter='\t') usage in train.py
- README.md: fix subject-verb agreement ('are' not 'is'), possessive 'branch's ID',
capitalize 'Timestamped'
- analysis.ipynb: align language_info.version with .python-version (3.10.0)
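The SDPA boolean mask convention clarified above (True = attention allowed), combined with the sliding-window mask caching from the earlier commit, can be sketched in pure Python; in the real train.py this would be a cached torch tensor, so lists here are purely illustrative:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # caching avoids rebuilding the same mask every step
def sliding_window_mask(seq_len: int, window: int):
    """Boolean causal sliding-window mask, SDPA convention: True = allowed.

    Query position i may attend to key positions j with
    max(0, i - window + 1) <= j <= i.
    """
    return tuple(
        tuple(max(0, i - window + 1) <= j <= i for j in range(seq_len))
        for i in range(seq_len)
    )
```

Note that PyTorch's scaled_dot_product_attention treats a boolean mask as "True = keep", which is the opposite of some hand-rolled masking schemes, hence the clarification in the commit.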
Dear @svlandeg Thank you for the guidance — I've taken it to heart. The fixes cover train.py, prepare.py, program.md, README.md, and analysis.ipynb. Two items were intentionally left as-is. Looking forward to your insightful comments and review - thanks again for your time! Shehab @habanwer
Agent Instructions - Experiment AutoResearch with Memory in the Loop
Summary
This fork applies the DEITY Principles Framework (Data, Ethics, Informatics, Technology, You) to restructure autoresearch for transparent human-machine collaboration across GPU platforms — from Volta (SM 7.0) through Blackwell (SM 10.0).
Changes (8 per-file commits)
- ground.json
- model.json
- prepare.py: PLATFORM dict export
- train.py: update_research_memory()
- program.md
- .gitignore
- analysis.ipynb
- README.md

Key design decisions
- ground.json (user-owned) + model.json (agent-owned)
- update_research_memory() persists outcomes to sessions/memory.md, so the agent's next hypothesis is informed by all prior runs, in addition to the logs and the standard tab-separated results structure.

Attribution
Author: Shehab Anwer, MD — habanwer · The Adimension