A research scaffold for building an interpretable, drive‑based agent that develops without external reward. SOMA proceeds in small, auditable milestones (M0→M11) with structured logs, a replay tool, and a lightweight evaluation harness.
Status: Core build complete through M11 (Evaluation Harness v1). M12 (richer sandbox/causal puzzles) is available as an optional extension. See Definition of Done below.
## Setup

```powershell
# from repo root
python -m venv .venv
. .\.venv\Scripts\Activate.ps1
python -m pip install -U pip
pip install -r requirements.txt
# optional for editable installs
# pip install -e .
```

## Run

```powershell
# Grid v0 (baseline)
python -m scripts.run --env grid-v0 --ticks 60 --seed 123 --size 9 --n-objects 18 --view-radius 1

# Grid v1 (richer affordances / state toggles)
python -m scripts.run --env grid-v1 --ticks 60 --seed 123 --size 9 --n-objects 16 --view-radius 1
```

Each run writes a timestamped folder under `runs/` with:

- `meta.json` — run metadata
- `events.jsonl` — append‑only event stream
- `events.sqlite` — structured store (`events` table)
- `report.md` — autogenerated short run report (M11)
- `caregiver_*.jsonl/json` — query/answer/tag files (M10)
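For a quick sanity check, the event stream can be skimmed directly. A minimal sketch, assuming only that `events.jsonl` holds one JSON object per line; the `tick` and `kind` keys printed here are illustrative, not a documented schema:

```python
# Minimal sketch: skim the newest run's append-only event stream.
# Only the one-JSON-object-per-line layout is given above; "tick"/"kind" are assumed keys.
import json
from pathlib import Path

latest = max(Path("runs").iterdir(), key=lambda p: p.stat().st_mtime)  # newest run folder
with open(latest / "events.jsonl", encoding="utf-8") as f:
    for line in f:
        event = json.loads(line)
        print(event.get("tick"), event.get("kind"))
```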
## Replay

```powershell
# Most recent run
python -m scripts.replay --kind note

# Or specify a folder
python -m scripts.replay --run runs\m10care_YYYYMMDDTHHMMSSZ --kind symbol
```

## Caregiver loop

List pending queries and answer with token→gloss tags.
```powershell
# list queries
python -m scripts.caregiver ls runs\m10care_YYYYMMDDTHHMMSSZ

# answer a query (repeat --tag for multiple)
python -m scripts.caregiver answer runs\m10care_YYYYMMDDTHHMMSSZ --qid m10care_...:41 `
  --tag N!=sudden-color-change --note "looked totally new"
```

SOMA ingests answers during the next ticks; merged tags persist in `caregiver_tags.json` and appear in future symbol notes and reports.
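To see what SOMA has merged so far, the tag file can be read back directly. A minimal sketch, assuming `caregiver_tags.json` maps each token to a list of glosses; the real layout may differ:

```python
# Minimal sketch: list merged caregiver tags for a run.
# Assumes a token -> list-of-glosses mapping; the actual layout is not documented here.
import json
from pathlib import Path

run = Path("runs") / "m10care_YYYYMMDDTHHMMSSZ"  # substitute a real run folder
tags = json.loads((run / "caregiver_tags.json").read_text(encoding="utf-8"))
for token, glosses in tags.items():
    print(f"{token}: {', '.join(glosses)}")  # e.g. N!: sudden-color-change
```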
## Evaluation

```powershell
# evaluate the most recent run
python -m scripts.eval last

# or a specific folder
python -m scripts.eval runs\m10care_YYYYMMDDTHHMMSSZ
```

Outputs `report.md` with novelty stats, memory‑reuse ratios, symbol diversity, and caregiver‑gloss usage.
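For ad-hoc metrics beyond the report, the SQLite store can be queried directly. A minimal sketch of a symbol-diversity count; the `events` table is named in the artifact list above, but the `kind` and `payload` columns are schema assumptions:

```python
# Minimal sketch: count distinct symbol tokens straight from the SQLite store.
# The "events" table is real; the "kind"/"payload" columns are assumptions about its schema.
import json
import sqlite3
from pathlib import Path

run = Path("runs") / "m10care_YYYYMMDDTHHMMSSZ"  # substitute a real run folder
con = sqlite3.connect(run / "events.sqlite")
rows = con.execute("SELECT payload FROM events WHERE kind = 'symbol'").fetchall()
tokens = {json.loads(payload).get("token") for (payload,) in rows}
print(len(tokens), "distinct symbol tokens")
```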
## Environments

- grid‑v0 — static colored shapes; no rewards; small viewport.
- grid‑v1 — adds affordances: toggleable objects (e.g., buttons/gates), simple multi‑step interactions, and persistent state.

Both expose the same observation schema to SOMA. Actions: `up`, `down`, `left`, `right`, `noop`, `ping`.
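To make the shared contract concrete, here is a stand-in sketch. `GridStub` and the observation keys (`view`, `position`, `tick`) are illustrative assumptions; only the action names, flag semantics, and the no-reward design come from this README:

```python
# Stand-in environment illustrating the shared observation contract (not SOMA's real API).
ACTIONS = ["up", "down", "left", "right", "noop", "ping"]  # action set from this README

class GridStub:
    """No rewards by design; step() returns a small local view patch."""
    def __init__(self, size: int = 9, view_radius: int = 1):
        self.size, self.r = size, view_radius
        self.pos = (size // 2, size // 2)  # movement logic omitted in this sketch
        self.tick = 0

    def step(self, action: str) -> dict:
        assert action in ACTIONS
        self.tick += 1
        side = 2 * self.r + 1  # viewport width from --view-radius
        view = [["." for _ in range(side)] for _ in range(side)]  # empty local patch
        return {"view": view, "position": self.pos, "tick": self.tick}

obs = GridStub().step("ping")
print(obs["tick"], obs["position"])  # 1 (4, 4)
```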
## Milestones

- M0 — deterministic tick loop; JSONL events
- M1 — SQLite event store; SelfNotes; replay CLI
- M2 — Sandbox v0 (grid‑v0)
- M3 — Reflex engine (overload/loop safe‑guards)
- M4 — Memory v1 (episodic + vector store), similarity recall notes
- M5 — Curiosity v1 (novelty + attention)
- M6 — Motivation v1 (drive pressures)
- M7 — Behavior planner v1 (drive→policy)
- M8 — State tracker v1 (self‑model for interpretability)
- M9 — Symbolic channel v0 (compact utterances: N!, Stab↓, ?, …); a selection sketch follows this list
- M10 — Caregiver interface v0 (query/answer/tag loop)
- M11 — Evaluation harness v1 (markdown report)
- M12 (optional) — Sandbox v1 (richer affordances) with baseline planner support
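For a feel of the M9 channel, a minimal sketch of how a compact utterance might be selected. The thresholds and function shape are invented for illustration; only the tokens come from the list above:

```python
# Minimal sketch of compact-utterance selection (M9); thresholds are assumptions.
def utter(novelty: float, stability_delta: float) -> str | None:
    if novelty > 0.8:
        return "N!"      # strong novelty spike
    if stability_delta < -0.3:
        return "Stab↓"   # stability dropping
    if novelty > 0.5:
        return "?"       # uncertain; worth a caregiver query
    return None          # nothing worth saying this tick

print(utter(0.9, 0.0))   # N!
```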
## Repo layout

```
soma/
  configs/      # (reserved)
  runs/         # per-run artifacts
  scripts/      # CLI entrypoints: run, replay, caregiver, eval
  soma/
    core/       # loop, state, events, store
    cogs/       # reflex, memory, curiosity, motivation, planner, …
    sandbox/    # envs: grid-v0, grid-v1
    ...
  tests/        # (minimal stubs; expand as needed)
```
## Definition of Done

- One CLI: `python -m scripts.run` with flags for env/ticks/seed ✅
- Structured logs: JSONL + SQLite; per‑run `report.md` ✅
- Evaluation: `scripts.eval` computes core developmental metrics ✅
- Replay: `scripts.replay` surfaces self‑notes & symbols ✅
- Caregiver loop: `scripts.caregiver` for queries/answers/tags ✅
- Docs: this README ✅
- Tests: minimal smoke tests in `tests/`; add more unit + scenario tests as follow‑ups
## Status & next steps

With M0–M11 shipped and the v1 sandbox available, SOMA v0 is functional for experimentation and evaluation. Recommended follow‑ups: strengthen the unit tests, add a couple of scenario tests (overload, contradiction), and write 1–2 Architecture Decision Records.
## Troubleshooting

- Module not found (`soma`): run commands from the repo root, e.g. `python -m scripts.run`.
- Typer/Click errors: ensure `typer>=0.12` is installed; prefer `python -m`‑style invocations.
- Stale imports / odd behavior: clear `__pycache__` and re‑run.
- "noop lock" late in a run: parameters are tuned in staleness & the planner; see `soma/cogs/working_memory/staleness.py` and the sketch below.
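A minimal sketch of the kind of staleness pressure that can stall a planner into repeated noops when badly tuned. The function name, decay rule, and half‑life are assumptions for illustration, not SOMA's actual code; the real parameters live in `soma/cogs/working_memory/staleness.py`:

```python
# Minimal sketch: a staleness pressure that grows as novelty dries up.
# Names and the half-life decay rule are assumptions, not SOMA's implementation.
def staleness(ticks_since_novelty: int, half_life: float = 20.0) -> float:
    """Approaches 1.0 without fresh novelty; a planner gating on this can stall."""
    return 1.0 - 0.5 ** (ticks_since_novelty / half_life)

# If exploration is suppressed past a threshold, long novelty-free runs
# degenerate into a "noop lock":
for t in (0, 20, 60, 120):
    print(t, round(staleness(t), 2))  # 0.0, 0.5, 0.88, 0.98
```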
## License

MIT (or your preferred license).