
Karpathy's AutoResearch with Memory-in-the-Loop States by The Adimension#302

Open
habanwer wants to merge 11 commits into karpathy:master from habanwer:autoresearch/memory-in-the-loop

Conversation

@habanwer

Summary

This fork applies the DEITY Principles Framework (Data, Ethics, Informatics, Technology, You) to restructure autoresearch for transparent human-machine collaboration across GPU platforms — from Volta (SM 7.0) through Blackwell (SM 10.0).

Changes (8 per-file commits)

| File | Type | Description |
| --- | --- | --- |
| ground.json | NEW | User-owned read-only config: data paths, tokenizer, time budgets, processor overrides |
| model.json | NEW | Agent-owned hyperparameters: architecture, optimization, evaluation |
| prepare.py | MODIFIED (+125/-17) | GPU platform detection, ground.json reads, PLATFORM dict export |
| train.py | MODIFIED (+194/-55) | model.json reads, PLATFORM import, fp32 moments, crash handler, update_research_memory() |
| program.md | MODIFIED (+96/-114) | Structured agent protocol with file ownership governance |
| .gitignore | MODIFIED (+5/-2) | Add *.pkl, run.log; remove results.tsv from ignore |
| analysis.ipynb | MODIFIED (metadata) | Kernel updated to Python 3.12.10 |
| README.md | MODIFIED | Fork introduction with DEITY Principles mapping |

Key design decisions

  • Config extraction: Hardcoded constants → ground.json (user-owned) + model.json (agent-owned)
  • File ownership governance: Clear boundaries for human/agent through program.md
  • GPU auto-detection: dtype, attention backend, torch.compile, GradScaler per GPU generation
  • Memory-in-the-loop: update_research_memory() persists outcomes to sessions/memory.md so the agent's next hypothesis is informed by all prior runs, complementing the per-run logs and the tab-separated results.tsv.
  • Windows support: compile guards for sys.platform, extending the approach of jsegov/autoresearch-win-rtx
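As a rough sketch of the memory-in-the-loop hook (update_research_memory and sessions/memory.md are from this PR; the signature, field names, and formatting here are assumptions):

```python
from datetime import datetime, timezone
from pathlib import Path

MEMORY_PATH = Path("sessions") / "memory.md"

def update_research_memory(run_id, status, metrics):
    """Append one run summary to sessions/memory.md (append-only, agent-owned)."""
    MEMORY_PATH.parent.mkdir(parents=True, exist_ok=True)
    lines = [
        f"## Run {run_id} ({status})",
        f"timestamp: {datetime.now(timezone.utc).isoformat()}",
    ]
    lines += [f"{k}: {v}" for k, v in metrics.items()]
    with MEMORY_PATH.open("a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n\n")
```

Because the file is append-only markdown, the agent can re-read its full experiment history at the start of each session without any extra tooling.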

Attribution

Author: Shehab Anwer, MD — habanwer · The Adimension

…earch prepare.py

Extract hardcoded constants (data paths, tokenizer settings, time budgets,
processor overrides) from karpathy/autoresearch prepare.py into a user-owned,
read-only JSON config. Enables transparent platform configuration without
modifying source code.

Fields: mode (test/train), data (HuggingFace cache/URL/shards),
tokenizer (vocab_size=8192, BPE split pattern, special tokens),
training (max_seq_len=2048, time budgets: test=60s/train=300s),
processor (dtype/compile/flash_attention/peak_flops — all 'auto' by default).

Upstream ref: karpathy/autoresearch master @ c2450ad
Blob SHA: 823225c
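Based on the fields listed above, ground.json plausibly takes a shape like this (illustrative values only; exact key names are assumptions):

```json
{
  "mode": "test",
  "data": {
    "hf_cache": "~/.cache/huggingface",
    "url": "auto",
    "shards": "auto"
  },
  "tokenizer": {
    "vocab_size": 8192,
    "split_pattern": "auto",
    "special_tokens": ["<|bos|>"]
  },
  "training": {
    "max_seq_len": 2048,
    "time_budget_seconds": {"test": 60, "train": 300}
  },
  "processor": {
    "dtype": "auto",
    "compile": "auto",
    "flash_attention": "auto",
    "peak_flops": "auto"
  }
}
```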
…arch train.py

Extract architecture and optimization constants from karpathy/autoresearch
train.py into an agent-owned JSON config. The agent modifies this file
during experiment iterations; the human reviews via version control.

Fields: architecture (depth=8, aspect_ratio=128, head_dim=64, window_pattern=SL),
optimization (total_batch_size_power=17, device_batch_size=16, LRs, betas,
warmup/warmdown ratios), evaluation (batch_size=16, tokens=3145728).

Upstream ref: karpathy/autoresearch master @ c2450ad
Blob SHA: b0227af
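For concreteness, model.json might look like the following (shape inferred from the fields above; exact key names are assumptions):

```json
{
  "architecture": {
    "depth": 8,
    "aspect_ratio": 128,
    "head_dim": 64,
    "window_pattern": "SL"
  },
  "optimization": {
    "total_batch_size_power": 17,
    "device_batch_size": 16
  },
  "evaluation": {
    "batch_size": 16,
    "tokens": 3145728
  }
}
```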
…fig from ground.json

Replace hardcoded constants with ground.json reads at import time.
Add _GPU_OPS_PER_CYCLE_PER_SM lookup table for compute capabilities:
Volta (7.0), Turing (7.5), Ampere (8.0/8.6/8.7), Ada (8.9),
Hopper (9.0), Blackwell (10.0).

New functions:
- _estimate_peak_flops(): compute peak FP16/BF16 tensor TFLOPS from
  SM count, clock rate, and ops-per-cycle lookup.
- _detect_platform(): auto-select dtype, attention backend (flash/sdpa),
  torch.compile, GradScaler, and embedding_dtype per GPU generation.
  Hopper+: bf16/flash/compile. Ampere/Ada: bf16/flash/compile.
  Turing/older: fp16/sdpa/no-compile/GradScaler.
  Windows compile guards (sys.platform != 'win32') for triton.

Exports: MAX_SEQ_LEN, TIME_BUDGET, PLATFORM dict.
ground.json processor overrides applied for non-'auto' values.

Platform detection ref: jsegov/autoresearch-win-rtx (Windows RTX adaptation)
Upstream ref: karpathy/autoresearch master @ c2450ad
Master blob: 06bea91
Modified blob: ed13834
… GPU-safe numerics

Replace hardcoded hyperparameters with model.json reads at startup.
Import PLATFORM dict from prepare.py for dtype, attention, compile,
GradScaler configuration.

Key changes:
- fp32 optimizer moments for fp16 parameters (Turing numerical stability)
- Gradient upcast to fp32 in AdamW update step
- _MUON_ORTHO_DTYPE: float32 for Turing (CC<8), bfloat16 for Ampere+
- Sliding-window attention mask caching (avoid recomputation per step)
- torch.amp.GradScaler(enabled=PLATFORM['use_grad_scaler'])
- autocast dtype from PLATFORM['dtype']
- update_research_memory(): append experiment outcome to sessions/memory.md
  (agent-owned, never writes to program.md)
- _crash_handler: sys.excepthook that calls update_research_memory on crash
- Parseable '---'-delimited key=value summary block at end of training

Upstream ref: karpathy/autoresearch master @ c2450ad
Master blob: 2e74397
Modified blob: bba5418
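The fp32-moments change above addresses the limited dynamic range of fp16 second moments. A minimal sketch using NumPy in place of torch tensors (the function name is hypothetical; only the dtype handling mirrors the commit):

```python
import numpy as np

def adamw_step_fp16_safe(param_fp16, grad_fp16, state, lr=1e-3,
                         betas=(0.9, 0.95), eps=1e-8, weight_decay=0.0):
    """One AdamW update keeping moments in fp32 for fp16 params (stability sketch)."""
    grad = grad_fp16.astype(np.float32)            # upcast gradient to fp32
    if "m" not in state:                           # fp32 moment buffers
        state["m"] = np.zeros(param_fp16.shape, dtype=np.float32)
        state["v"] = np.zeros(param_fp16.shape, dtype=np.float32)
        state["t"] = 0
    state["t"] += 1
    b1, b2 = betas
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad
    m_hat = state["m"] / (1 - b1 ** state["t"])    # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    p32 = param_fp16.astype(np.float32)
    p32 -= lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * p32)
    return p32.astype(np.float16)                  # params stay fp16 in memory
```

Keeping `m` and `v` in fp32 costs extra optimizer memory but avoids the underflow that makes fp16 moments unreliable on Turing, where bf16 is unavailable.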
…free-form narrative

Rewrite agent instructions as a structured protocol with numbered sections:
1. Orientation — mandatory file reads (ground.json, model.json, prepare.py, train.py)
2. Decision metrics — table: val_bpb, peak_vram_mb, mfu_percent, training_seconds, total_tokens_M, num_params_M
3. File ownership — governance table: user-owned read-only (ground.json, prepare.py, program.md) vs agent-owned editable (model.json, train.py, results.tsv)
4. Execution sequence — first run (setup + baseline) and subsequent runs (hypothesis-driven experiment loop with keep/discard/crash status)
5. Logging rules — per-run log files in sessions/, append-only results.tsv
6. Constraints — time budget enforcement, no new packages, edit restrictions
7. Autonomy — continue iterating until manually stopped

Upstream ref: karpathy/autoresearch master @ c2450ad
Master blob: dea9bcc
Modified blob: 46ca3df
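Logging rule 5 (append-only results.tsv) might be implemented like this; the column names come from the decision-metrics table above, while the helper name is hypothetical:

```python
import csv
from pathlib import Path

FIELDS = ["val_bpb", "peak_vram_mb", "mfu_percent",
          "training_seconds", "total_tokens_M", "num_params_M"]

def append_result(path, row):
    """Append one experiment row to a tab-separated, append-only log."""
    p = Path(path)
    first_write = not p.exists()
    with p.open("a", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        if first_write:
            w.writeheader()  # header only once; later runs just append rows
        w.writerow(row)
```

Using `csv.DictWriter(delimiter="\t")` keeps the file readable by the `csv.DictReader(delimiter='\t')` call referenced later in this thread.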
Remove results.tsv from ignore list (now tracked as append-only experiment log).
Add *.pkl (serialized model checkpoints) and run.log (runtime log).

Master blob: 99c30f5
Modified blob: 986b512
Kernel display name changed from '.venv' to 'Python 3'.
Python version updated from 3.10.12 to 3.12.10.
No code cell changes; 11 cells (none executed).

Master blob: 8455ea4
Modified blob: af82856
…DEITY Principles mapping

Add new section between karpathy's introduction and 'How it works':
- Title: Karpathy's AutoResearch with Memory-in-the-Loop States
- Byline: Shehab Anwer, MD (habanwer, The Adimension)
- DEITY Principles mapping: Data (JSON configs), Ethics (file ownership
  governance), Informatics (structured protocol), Technology (GPU platform
  detection Volta-Blackwell), You (human-machine loop via update_research_memory)
- Blockquote linking to published framework paper:
  doi.org/10.1093/ehjimp/qyaf038 (Eur Heart J Imaging Methods Pract, 2025)
- Upstream and related fork attribution

Master blob: 2bc3051
Modified blob: 44296ae
Copilot AI review requested due to automatic review settings March 16, 2026 18:34

Copilot AI left a comment


Pull request overview

This PR restructures the autoresearch workflow around explicit configuration files (ground.json for user-owned runtime constraints and model.json for agent-owned hyperparameters), adds GPU/platform auto-detection, and introduces “memory-in-the-loop” experiment summarization to support repeatable autonomous runs across diverse CUDA generations (and Windows constraints).

Changes:

  • Extracts hardcoded constants into ground.json (runtime/platform) and model.json (model/optimizer/eval).
  • Updates training to consume PLATFORM settings, add GradScaler support, fp32 optimizer moments, and write experiment memory to sessions/memory.md.
  • Rewrites program.md into a structured protocol with file ownership governance and logging rules; updates README to describe the fork.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| train.py | Loads model.json, consumes PLATFORM, adds crash/memory logging, changes attention implementation, adds fp32 moments + GradScaler, and replaces final eval behavior |
| prepare.py | Loads ground.json, exports PLATFORM via GPU detection, and makes CLI defaults configurable |
| program.md | Replaces baseline agent loop with a structured, governed protocol and new logging conventions |
| model.json | New agent-owned hyperparameter source of truth |
| ground.json | New user-owned runtime/platform/data configuration |
| .gitignore | Adjusts ignored artifacts (adds *.pkl, run.log; stops ignoring results.tsv) |
| README.md | Adds fork overview and DEITY mapping |
| analysis.ipynb | Metadata-only kernel/version update |


@habanwer
Author

@copilot open a new pull request to apply changes based on the comments in this thread


Copilot AI left a comment


Pull request overview

This PR refactors the autoresearch workflow to separate user-owned “ground” configuration from agent-owned model hyperparameters, adds GPU/platform auto-detection, and introduces persistent experiment memory/logging conventions intended to support cross-GPU and cross-OS experimentation.

Changes:

  • Introduces ground.json (platform/data/time-budget config) and model.json (architecture/optimization/eval hyperparameters), and wires them into prepare.py/train.py.
  • Adds platform detection in prepare.py (dtype/compile/attention/peak FLOPS) and updates training loop to use GradScaler and fp32 optimizer moments for fp16 params.
  • Updates agent protocol/docs (program.md, README.md) and adjusts ignored artifacts (.gitignore).

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| train.py | Loads hyperparams from model.json, adds memory/crash handler, swaps attention to SDPA + mask cache, adds GradScaler + fp32 moments, and implements a "fast eval" path |
| prepare.py | Loads constants from ground.json and exports PLATFORM auto-detection (dtype/attention/compile/peak FLOPS + overrides) |
| program.md | Rewrites agent run protocol, config ownership, and results/logging guidance |
| ground.json | Adds user-owned config for cache/data/tokenizer/training/processor overrides |
| model.json | Adds agent-owned architecture/optimization/evaluation settings |
| .gitignore | Adds ignores for *.pkl and run.log, removes ignoring results.tsv |
| analysis.ipynb | Updates notebook kernel metadata (Python version) |
| README.md | Adds fork introduction and DEITY/ownership mapping |


@svlandeg
Collaborator

> @copilot open a new pull request to apply changes based on the comments in this thread

Hi @habanwer!

My two cents: it's much nicer if you would iterate on the PR first locally, including asking an LLM to review and making corresponding edits etc, all BEFORE you open the PR to merge to upstream.

If you're asking copilot to review on the PR online, open-source maintainers will expect you to at least address all those review comments (comment on them, apply them, resolve them). Opening multiple PRs for the same issue/feature adds a lot of noise/notifications for maintainers.

@habanwer
Author

Dear @svlandeg Thanks for your response, I am terribly sorry I clicked it by mistake, and then it ran - just noticed that! I will try to revert that or resubmit the pull request - thanks again!

…, PLATFORM[compile] honoring, rotary dtype cast, unused imports, TSV header format, README grammar, notebook version alignment

- train.py: remove unused 'import re'; add cos/sin dtype cast in apply_rotary_emb
  to avoid fp32 upcast under autocast; clarify SDPA boolean mask convention
  (True = allowed, verified against PyTorch 2.6); guard eval_steps with max(1,...)
  to prevent div-by-zero; honor PLATFORM['compile'] instead of unconditional disable
- prepare.py: replace bare open() with try/except for FileNotFoundError and
  JSONDecodeError, validate required top-level keys, update comment to reflect
  ground.json is required (not optional)
- program.md: show results.tsv header with explicit \t separators to match
  csv.DictReader(delimiter='\t') usage in train.py
- README.md: fix subject-verb agreement ('are' not 'is'), possessive 'branch's ID',
  capitalize 'Timestamped'
- analysis.ipynb: align language_info.version with .python-version (3.10.0)
@habanwer
Author

Dear @svlandeg

Thank you for the guidance — I've taken it to heart. The @copilot trigger was genuinely accidental, and I apologize for the noise. I managed it locally and then force-pushed a single commit (6181369) addressing the substantive findings from the automated review:

train.py:

  • Removed unused import re
  • Added dtype cast in apply_rotary_emb so rotary embeddings match the model's autocast dtype instead of forcing fp32 upcast
  • Guarded eval_steps with max(1, ...) to prevent division-by-zero crash specifically when token budget is small
  • Replaced unconditional torch.compile disable with PLATFORM["compile"] so Linux/Triton platforms benefit from compilation
  • Added clarifying comment on SDPA boolean mask convention (verified True = allowed against PyTorch 2.6; the bot's claim of inverted semantics was inaccurate, flagged here for your review)
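The `max(1, ...)` guard mentioned above is small but worth spelling out; a sketch with assumed variable names:

```python
def compute_eval_steps(eval_tokens, tokens_per_step):
    """Number of eval iterations. Without max(1, ...), a token budget smaller
    than one step yields 0 steps, and dividing by eval_steps when averaging
    losses would then crash with a division by zero."""
    return max(1, eval_tokens // tokens_per_step)
```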

prepare.py:

  • Replaced bare open() with explicit error handling (FileNotFoundError, JSONDecodeError) and top-level key validation — comment updated to reflect ground.json is required, not optional.

program.md:

  • Fixed results.tsv header to show explicit \t separators matching csv.DictReader(delimiter='\t') in train.py

README.md:

  • Fixed grammar: subject–verb agreement, possessive form, and capitalisation.

analysis.ipynb:

  • Aligned language_info.version with .python-version (3.10)

Two items intentionally left as-is:
1. Fast eval path: an intentional design choice to evaluate on ~3M tokens instead of 20M, staying within resource limits and the time budget; the value is user-editable.
2. PLATFORM["attention"] branching for FlashAttention integration: out of scope for this PR, which was developed on a Turing-architecture GPU (RTX 5000).

Looking forward to your insightful comments and review - Thanks again for your time! Shehab @habanwer

Agent Instructions - Experiment AutoResearch with Memory in the Loop
@habanwer habanwer closed this Mar 18, 2026
@habanwer habanwer reopened this Mar 18, 2026