
Karpathy's AutoResearch with Memory-in-the-Loop States by The Adimension#302

Open
habanwer wants to merge 11 commits into karpathy:master from habanwer:autoresearch/memory-in-the-loop

Conversation

@habanwer

Summary

This fork applies the DEITY Principles Framework (Data, Ethics, Informatics, Technology, You) to restructure autoresearch for transparent human-machine collaboration across GPU platforms — from Volta (SM 7.0) through Blackwell (SM 10.0).

Changes (8 per-file commits)

| File | Type | Description |
| --- | --- | --- |
| ground.json | NEW | User-owned read-only config: data paths, tokenizer, time budgets, processor overrides |
| model.json | NEW | Agent-owned hyperparameters: architecture, optimization, evaluation |
| prepare.py | MODIFIED (+125/-17) | GPU platform detection, ground.json reads, PLATFORM dict export |
| train.py | MODIFIED (+194/-55) | model.json reads, PLATFORM import, fp32 moments, crash handler, update_research_memory() |
| program.md | MODIFIED (+96/-114) | Structured agent protocol with file ownership governance |
| .gitignore | MODIFIED (+5/-2) | Add *.pkl, run.log; remove results.tsv from ignore |
| analysis.ipynb | MODIFIED (metadata) | Kernel updated to Python 3.12.10 |
| README.md | MODIFIED | Fork introduction with DEITY Principles mapping |

Key design decisions

  • Config extraction: Hardcoded constants → ground.json (user-owned) + model.json (agent-owned)
  • File ownership governance: Clear boundaries for human/agent through program.md
  • GPU auto-detection: dtype, attention backend, torch.compile, GradScaler per GPU generation
  • Memory-in-the-loop: update_research_memory() persists outcomes to sessions/memory.md so the agent's next hypothesis is informed by all prior runs, complementing the per-run logs and the tab-separated results.tsv.
  • Windows support: compile guards for sys.platform, extending the approach of jsegov/autoresearch-win-rtx
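As a rough sketch of the memory-in-the-loop hook (update_research_memory and sessions/memory.md are from this PR; the signature, field names, and formatting here are assumptions):

```python
from datetime import datetime, timezone
from pathlib import Path

MEMORY_PATH = Path("sessions") / "memory.md"

def update_research_memory(run_id, status, metrics):
    """Append one run summary to sessions/memory.md (append-only, agent-owned)."""
    MEMORY_PATH.parent.mkdir(parents=True, exist_ok=True)
    lines = [
        f"## Run {run_id} ({status})",
        f"timestamp: {datetime.now(timezone.utc).isoformat()}",
    ]
    lines += [f"{k}: {v}" for k, v in metrics.items()]
    with MEMORY_PATH.open("a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n\n")
```

Because the file is append-only markdown, the agent can re-read its full experiment history at the start of each session without any extra tooling.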

Attribution

Author: Shehab Anwer, MD — habanwer · The Adimension

…earch prepare.py

Extract hardcoded constants (data paths, tokenizer settings, time budgets,
processor overrides) from karpathy/autoresearch prepare.py into a user-owned,
read-only JSON config. Enables transparent platform configuration without
modifying source code.

Fields: mode (test/train), data (HuggingFace cache/URL/shards),
tokenizer (vocab_size=8192, BPE split pattern, special tokens),
training (max_seq_len=2048, time budgets: test=60s/train=300s),
processor (dtype/compile/flash_attention/peak_flops — all 'auto' by default).

Upstream ref: karpathy/autoresearch master @ c2450ad
Blob SHA: 823225c
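Based on the fields listed above, ground.json plausibly takes a shape like this (illustrative values only; exact key names are assumptions):

```json
{
  "mode": "test",
  "data": {
    "hf_cache": "~/.cache/huggingface",
    "url": "auto",
    "shards": "auto"
  },
  "tokenizer": {
    "vocab_size": 8192,
    "split_pattern": "auto",
    "special_tokens": ["<|bos|>"]
  },
  "training": {
    "max_seq_len": 2048,
    "time_budget_seconds": {"test": 60, "train": 300}
  },
  "processor": {
    "dtype": "auto",
    "compile": "auto",
    "flash_attention": "auto",
    "peak_flops": "auto"
  }
}
```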
…arch train.py

Extract architecture and optimization constants from karpathy/autoresearch
train.py into an agent-owned JSON config. The agent modifies this file
during experiment iterations; the human reviews via version control.

Fields: architecture (depth=8, aspect_ratio=128, head_dim=64, window_pattern=SL),
optimization (total_batch_size_power=17, device_batch_size=16, LRs, betas,
warmup/warmdown ratios), evaluation (batch_size=16, tokens=3145728).

Upstream ref: karpathy/autoresearch master @ c2450ad
Blob SHA: b0227af
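For concreteness, model.json might look like the following (shape inferred from the fields above; exact key names are assumptions):

```json
{
  "architecture": {
    "depth": 8,
    "aspect_ratio": 128,
    "head_dim": 64,
    "window_pattern": "SL"
  },
  "optimization": {
    "total_batch_size_power": 17,
    "device_batch_size": 16
  },
  "evaluation": {
    "batch_size": 16,
    "tokens": 3145728
  }
}
```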
…fig from ground.json

Replace hardcoded constants with ground.json reads at import time.
Add _GPU_OPS_PER_CYCLE_PER_SM lookup table for compute capabilities:
Volta (7.0), Turing (7.5), Ampere (8.0/8.6/8.7), Ada (8.9),
Hopper (9.0), Blackwell (10.0).

New functions:
- _estimate_peak_flops(): compute peak FP16/BF16 tensor TFLOPS from
  SM count, clock rate, and ops-per-cycle lookup.
- _detect_platform(): auto-select dtype, attention backend (flash/sdpa),
  torch.compile, GradScaler, and embedding_dtype per GPU generation.
  Hopper+: bf16/flash/compile. Ampere/Ada: bf16/flash/compile.
  Turing/older: fp16/sdpa/no-compile/GradScaler.
  Windows compile guards (sys.platform != 'win32') for triton.

Exports: MAX_SEQ_LEN, TIME_BUDGET, PLATFORM dict.
ground.json processor overrides applied for non-'auto' values.

Platform detection ref: jsegov/autoresearch-win-rtx (Windows RTX adaptation)
Upstream ref: karpathy/autoresearch master @ c2450ad
Master blob: 06bea91
Modified blob: ed13834
… GPU-safe numerics

Replace hardcoded hyperparameters with model.json reads at startup.
Import PLATFORM dict from prepare.py for dtype, attention, compile,
GradScaler configuration.

Key changes:
- fp32 optimizer moments for fp16 parameters (Turing numerical stability)
- Gradient upcast to fp32 in AdamW update step
- _MUON_ORTHO_DTYPE: float32 for Turing (CC<8), bfloat16 for Ampere+
- Sliding-window attention mask caching (avoid recomputation per step)
- torch.amp.GradScaler(enabled=PLATFORM['use_grad_scaler'])
- autocast dtype from PLATFORM['dtype']
- update_research_memory(): append experiment outcome to sessions/memory.md
  (agent-owned, never writes to program.md)
- _crash_handler: sys.excepthook that calls update_research_memory on crash
- Parseable '---'-delimited key=value summary block at end of training

Upstream ref: karpathy/autoresearch master @ c2450ad
Master blob: 2e74397
Modified blob: bba5418
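The fp32-moments change above addresses the limited dynamic range of fp16 second moments. A minimal sketch using NumPy in place of torch tensors (the function name is hypothetical; only the dtype handling mirrors the commit):

```python
import numpy as np

def adamw_step_fp16_safe(param_fp16, grad_fp16, state, lr=1e-3,
                         betas=(0.9, 0.95), eps=1e-8, weight_decay=0.0):
    """One AdamW update keeping moments in fp32 for fp16 params (stability sketch)."""
    grad = grad_fp16.astype(np.float32)            # upcast gradient to fp32
    if "m" not in state:                           # fp32 moment buffers
        state["m"] = np.zeros(param_fp16.shape, dtype=np.float32)
        state["v"] = np.zeros(param_fp16.shape, dtype=np.float32)
        state["t"] = 0
    state["t"] += 1
    b1, b2 = betas
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad
    m_hat = state["m"] / (1 - b1 ** state["t"])    # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    p32 = param_fp16.astype(np.float32)
    p32 -= lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * p32)
    return p32.astype(np.float16)                  # params stay fp16 in memory
```

Keeping `m` and `v` in fp32 costs extra optimizer memory but avoids the underflow that makes fp16 moments unreliable on Turing, where bf16 is unavailable.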
…free-form narrative

Rewrite agent instructions as a structured protocol with numbered sections:
1. Orientation — mandatory file reads (ground.json, model.json, prepare.py, train.py)
2. Decision metrics — table: val_bpb, peak_vram_mb, mfu_percent, training_seconds, total_tokens_M, num_params_M
3. File ownership — governance table: user-owned read-only (ground.json, prepare.py, program.md) vs agent-owned editable (model.json, train.py, results.tsv)
4. Execution sequence — first run (setup + baseline) and subsequent runs (hypothesis-driven experiment loop with keep/discard/crash status)
5. Logging rules — per-run log files in sessions/, append-only results.tsv
6. Constraints — time budget enforcement, no new packages, edit restrictions
7. Autonomy — continue iterating until manually stopped

Upstream ref: karpathy/autoresearch master @ c2450ad
Master blob: dea9bcc
Modified blob: 46ca3df
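Logging rule 5 (append-only results.tsv) might be implemented like this; the column names come from the decision-metrics table above, while the helper name is hypothetical:

```python
import csv
from pathlib import Path

FIELDS = ["val_bpb", "peak_vram_mb", "mfu_percent",
          "training_seconds", "total_tokens_M", "num_params_M"]

def append_result(path, row):
    """Append one experiment row to a tab-separated, append-only log."""
    p = Path(path)
    first_write = not p.exists()
    with p.open("a", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        if first_write:
            w.writeheader()  # header only once; later runs just append rows
        w.writerow(row)
```

Using `csv.DictWriter(delimiter="\t")` keeps the file readable by the `csv.DictReader(delimiter='\t')` call referenced later in this thread.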
Remove results.tsv from ignore list (now tracked as append-only experiment log).
Add *.pkl (serialized model checkpoints) and run.log (runtime log).

Master blob: 99c30f5
Modified blob: 986b512
Kernel display name changed from '.venv' to 'Python 3'.
Python version updated from 3.10.12 to 3.12.10.
No code cell changes; 11 cells (none executed).

Master blob: 8455ea4
Modified blob: af82856
…DEITY Principles mapping

Add new section between karpathy's introduction and 'How it works':
- Title: Karpathy's AutoResearch with Memory-in-the-Loop States
- Byline: Shehab Anwer, MD (habanwer, The Adimension)
- DEITY Principles mapping: Data (JSON configs), Ethics (file ownership
  governance), Informatics (structured protocol), Technology (GPU platform
  detection Volta-Blackwell), You (human-machine loop via update_research_memory)
- Blockquote linking to published framework paper:
  doi.org/10.1093/ehjimp/qyaf038 (Eur Heart J Imaging Methods Pract, 2025)
- Upstream and related fork attribution

Master blob: 2bc3051
Modified blob: 44296ae
Copilot AI review requested due to automatic review settings March 16, 2026 18:34

Copilot AI left a comment


Pull request overview

This PR restructures the autoresearch workflow around explicit configuration files (ground.json for user-owned runtime constraints and model.json for agent-owned hyperparameters), adds GPU/platform auto-detection, and introduces “memory-in-the-loop” experiment summarization to support repeatable autonomous runs across diverse CUDA generations (and Windows constraints).

Changes:

  • Extracts hardcoded constants into ground.json (runtime/platform) and model.json (model/optimizer/eval).
  • Updates training to consume PLATFORM settings, add GradScaler support, fp32 optimizer moments, and write experiment memory to sessions/memory.md.
  • Rewrites program.md into a structured protocol with file ownership governance and logging rules; updates README to describe the fork.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| train.py | Loads model.json, consumes PLATFORM, adds crash/memory logging, changes attention implementation, adds fp32 moments + GradScaler, and replaces final eval behavior |
| prepare.py | Loads ground.json, exports PLATFORM via GPU detection, and makes CLI defaults configurable |
| program.md | Replaces baseline agent loop with a structured, governed protocol and new logging conventions |
| model.json | New agent-owned hyperparameter source of truth |
| ground.json | New user-owned runtime/platform/data configuration |
| .gitignore | Adjusts ignored artifacts (adds *.pkl, run.log; stops ignoring results.tsv) |
| README.md | Adds fork overview and DEITY mapping |
| analysis.ipynb | Metadata-only kernel/version update |


@habanwer
Author

@copilot open a new pull request to apply changes based on the comments in this thread


Copilot AI left a comment


Pull request overview

This PR refactors the autoresearch workflow to separate user-owned “ground” configuration from agent-owned model hyperparameters, adds GPU/platform auto-detection, and introduces persistent experiment memory/logging conventions intended to support cross-GPU and cross-OS experimentation.

Changes:

  • Introduces ground.json (platform/data/time-budget config) and model.json (architecture/optimization/eval hyperparameters), and wires them into prepare.py/train.py.
  • Adds platform detection in prepare.py (dtype/compile/attention/peak FLOPS) and updates training loop to use GradScaler and fp32 optimizer moments for fp16 params.
  • Updates agent protocol/docs (program.md, README.md) and adjusts ignored artifacts (.gitignore).

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| train.py | Loads hyperparams from model.json, adds memory/crash handler, swaps attention to SDPA + mask cache, adds GradScaler + fp32 moments, and implements a "fast eval" path |
| prepare.py | Loads constants from ground.json and exports PLATFORM auto-detection (dtype/attention/compile/peak FLOPS + overrides) |
| program.md | Rewrites agent run protocol, config ownership, and results/logging guidance |
| ground.json | Adds user-owned config for cache/data/tokenizer/training/processor overrides |
| model.json | Adds agent-owned architecture/optimization/evaluation settings |
| .gitignore | Adds ignores for *.pkl and run.log, removes ignoring results.tsv |
| analysis.ipynb | Updates notebook kernel metadata (Python version) |
| README.md | Adds fork introduction and DEITY/ownership mapping |


@svlandeg
Collaborator

> @copilot open a new pull request to apply changes based on the comments in this thread

Hi @habanwer!

My two cents: it's much nicer if you would iterate on the PR first locally, including asking an LLM to review and making corresponding edits etc, all BEFORE you open the PR to merge to upstream.

If you're asking copilot to review on the PR online, open-source maintainers will expect you to at least address all those review comments (comment on them, apply them, resolve them). Opening multiple PRs for the same issue/feature adds a lot of noise/notifications for maintainers.

@habanwer
Author

Dear @svlandeg Thanks for your response, I am terribly sorry I clicked it by mistake, and then it ran - just noticed that! I will try to revert that or resubmit the pull request - thanks again!

…, PLATFORM[compile] honoring, rotary dtype cast, unused imports, TSV header format, README grammar, notebook version alignment

- train.py: remove unused 'import re'; add cos/sin dtype cast in apply_rotary_emb
  to avoid fp32 upcast under autocast; clarify SDPA boolean mask convention
  (True = allowed, verified against PyTorch 2.6); guard eval_steps with max(1,...)
  to prevent div-by-zero; honor PLATFORM['compile'] instead of unconditional disable
- prepare.py: replace bare open() with try/except for FileNotFoundError and
  JSONDecodeError, validate required top-level keys, update comment to reflect
  ground.json is required (not optional)
- program.md: show results.tsv header with explicit \t separators to match
  csv.DictReader(delimiter='\t') usage in train.py
- README.md: fix subject-verb agreement ('are' not 'is'), possessive 'branch's ID',
  capitalize 'Timestamped'
- analysis.ipynb: align language_info.version with .python-version (3.10.0)
@habanwer
Author

Dear @svlandeg

Thank you for the guidance — I've taken it to heart. The @copilot trigger was genuinely accidental, and I apologize for the noise. I managed it locally and then force-pushed a single commit (6181369) addressing the substantive findings from the automated review:

train.py:

  • Removed unused import re
  • Added dtype cast in apply_rotary_emb so rotary embeddings match the model's autocast dtype instead of forcing fp32 upcast
  • Guarded eval_steps with max(1, ...) to prevent division-by-zero crash specifically when token budget is small
  • Replaced unconditional torch.compile disable with PLATFORM["compile"] so Linux/Triton platforms benefit from compilation
  • Added clarifying comment on SDPA boolean mask convention (verified True = allowed against PyTorch 2.6; the bot's claim of inverted semantics was inaccurate, flagged here for your review)
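The `max(1, ...)` guard mentioned above is small but worth spelling out; a sketch with assumed variable names:

```python
def compute_eval_steps(eval_tokens, tokens_per_step):
    """Number of eval iterations. Without max(1, ...), a token budget smaller
    than one step yields 0 steps, and dividing by eval_steps when averaging
    losses would then crash with a division by zero."""
    return max(1, eval_tokens // tokens_per_step)
```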

prepare.py:

  • Replaced bare open() with explicit error handling (FileNotFoundError, JSONDecodeError) and top-level key validation — comment updated to reflect ground.json is required, not optional.

program.md:

  • Fixed results.tsv header to show explicit \t separators matching csv.DictReader(delimiter='\t') in train.py

README.md:

  • Fixed grammar: subject–verb agreement, possessive form, and capitalisation.

analysis.ipynb:

  • Aligned language_info.version with .python-version (3.10)

Two items intentionally left as-is:
1. Fast eval path: an intentional design choice to evaluate on ~3M tokens instead of 20M, staying within resource limits and the time budget; the value is user-editable.
2. PLATFORM["attention"] branching for FlashAttention integration: out of scope for this PR, which was developed on a Turing-architecture GPU (RTX 5000).

Looking forward to your insightful comments and review - Thanks again for your time! Shehab @habanwer

Agent Instructions - Experiment AutoResearch with Memory in the Loop
@habanwer habanwer closed this Mar 18, 2026
@habanwer habanwer reopened this Mar 18, 2026