What variant of Codex are you using?
CLI
What feature would you like to see?
Summary
Codex can already read procedural knowledge (skills) and recall context (memories), but it has no first-class way for the agent to turn one of its own successful, repeatable runs into a reusable, executable, verifiable procedure — and to repair that procedure cheaply when the environment drifts, instead of re-deriving the whole thing from scratch.
I'd like to propose an agent-recordable procedural skill capability (call it "executable memory"): a small, surface-consistent extension that sits between today's static skills and background-extracted memories.
Additional information
The gap (grounded in the current tree)
Three adjacent primitives already exist, but none closes the loop:
- Skills are static and human-authored. SKILL.md is parsed at load time (codex-rs/core-skills/src/loader.rs), and SkillMetadata is a read-only snapshot (codex-rs/core-skills/src/model.rs). There is no agent-facing path to write or update a skill from a successful run.
- Memories are unstructured context, not executable procedure. The agent can append a free-form markdown note via AddAdHocNote (codex-rs/ext/memories/src/tools/ad_hoc_note.rs), and memories are auto-injected into context by a ContextContributor. But a note is prose, not a replayable step sequence with success criteria.
- Traces are diagnostic, not re-executable. rollout-trace records everything and exposes replay_bundle (codex-rs/rollout-trace/src/lib.rs), but explicitly for offline reconstruction and debugging; there is no production pathway to replay a recorded sequence against new inputs.
So the agent re-discovers the same multi-step procedure (a chain of shell/tool calls) on every repeat of a recurring task, paying full exploration cost in tokens and latency each time.
Proposed capability (minimal slice)
Reuse what already exists; add only the missing loop. Concretely:
- Record: an agent-callable tool (e.g. skill.record) that, after a successful repeatable task, persists the procedure as a SKILL.md under ./.codex/skills/<name> (existing format). The body captures the steps (the tool-call sequence) and the postconditions (how to verify success). This is the agent-write counterpart to today's human-authored skills, and a structured counterpart to AddAdHocNote.
- Recall + replay: surface matching recorded skills the way memories already auto-surface (ContextContributor), and let the agent replay the stored steps instead of re-exploring.
- Verify: on replay, check the recorded postconditions rather than assuming success.
- Minimal repair: when a postcondition fails, patch the failing step (and persist the patch) instead of discarding and re-deriving the whole procedure. Keeping runtime patches separate from the trusted baseline keeps a recorded skill auditable.
- Safety gate: route any step the agent flags as high-risk through the existing approval flow before replay, so a recorded procedure can't silently execute a destructive step.
Capture could optionally be assisted by existing PostToolUse / Stop hook events (codex-rs/hooks/src/lib.rs) and the rollout-trace reducer as the source of the step sequence — no new recording infrastructure required.
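To make the loop concrete, here is a rough sketch of replay, verify, minimal repair, and the safety gate. It is written in Python purely for brevity and is not a proposal for the actual codex-rs implementation. Every name in it (`Step`, `RecordedSkill`, `replay`, and the `run` / `approve` / `repair` callbacks) is hypothetical, and `approve` only stands in for the existing approval flow.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    command: str             # recorded tool call, kept as plain text here
    high_risk: bool = False  # flagged by the agent at record time


@dataclass
class RecordedSkill:
    name: str
    steps: list[Step]
    postconditions: list[str]  # checks that must pass after replay
    # Runtime patches (step index -> replacement command), stored apart
    # from the baseline steps so the original recording stays auditable.
    patches: dict[int, str] = field(default_factory=dict)


def run_steps(skill: RecordedSkill, run: Callable[[str], bool],
              approve: Callable[[Step], bool], start: int = 0) -> bool:
    """Run steps from `start`, gating high-risk ones behind `approve`."""
    for index in range(start, len(skill.steps)):
        step = skill.steps[index]
        if step.high_risk and not approve(step):
            return False
        command = skill.patches.get(index, step.command)
        if not run(command):
            return False
    return True


def failed_postconditions(skill: RecordedSkill,
                          run: Callable[[str], bool]) -> list[str]:
    """Return the postconditions that do not currently hold."""
    return [check for check in skill.postconditions if not run(check)]


def replay(skill: RecordedSkill, run: Callable[[str], bool],
           approve: Callable[[Step], bool],
           repair: Optional[Callable[[RecordedSkill, str],
                                     Optional[tuple[int, str]]]] = None) -> str:
    """Replay, verify, and attempt at most one single-step repair."""
    if not run_steps(skill, run, approve):
        return "aborted"
    failed = failed_postconditions(skill, run)
    if not failed:
        return "ok"
    # Ask for a patch to one step instead of re-deriving the procedure.
    patch = repair(skill, failed[0]) if repair else None
    if patch is None:
        return "failed"
    index, new_command = patch
    previous = dict(skill.patches)
    skill.patches[index] = new_command
    # Re-run from the patched step onward, then verify again.
    if (run_steps(skill, run, approve, start=index)
            and not failed_postconditions(skill, run)):
        return "repaired"
    skill.patches = previous  # roll back a patch that did not help
    return "failed"
```

In this sketch the baseline `steps` list is never mutated; a successful repair only adds an entry to `patches`, which is what would be persisted and reviewed. Running postconditions through the same `run` callback as the steps is a simplification, and whether a patch is persisted automatically or proposed for confirmation is left open here.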
Why this fits Codex (and not a separate product)
- Surface-consistent. It's pure text/tool-call procedure persisted as files. It works headless and applies equally to CLI, IDE, and cloud surfaces — no new platform-specific dependencies, no display requirement.
- Directly on the existing roadmap surface. It's an incremental extension of core-skills + memories, not a new subsystem. The novel pieces are narrow: an agent write path for skills, postcondition verification, and minimal repair.
- Concrete payoff. Recurring tasks (repeated build/test/debug loops, multi-step setup/migration chores, fixed tool pipelines) stop paying full re-exploration cost on every run — fewer tokens, lower latency, more determinism.
Open design questions (happy to discuss before any code)
- Should a recorded skill be a SKILL.md superset (add an optional executable/verify block) or a sibling artifact type, to avoid confusing human-authored vs. agent-recorded skills?
- What's the right trigger for record: an explicit tool the model calls, a user command, or a Stop-hook prompt at end of a successful task?
- How should postconditions be expressed so they're cheap to evaluate and safe (no side effects)?
- What's the minimal-repair contract — patch-and-persist vs. propose-patch-for-confirmation?
- Where does this sit relative to the memories Phase-1 background pipeline — complementary, or should recording feed that pipeline?
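As one possible starting point for the postcondition question, a recorded skill could restrict postconditions to a small declarative vocabulary evaluated by read-only code, rather than arbitrary shell commands. The sketch below is only an illustration in Python; the `check` function and the `file_exists` / `file_contains` kinds are hypothetical, not an existing Codex format.

```python
from pathlib import Path


def check(postcondition: dict, root: Path) -> bool:
    """Evaluate one declarative postcondition without side effects."""
    base = root.resolve()
    target = (base / postcondition["path"]).resolve()
    # Refuse paths that escape the workspace root (e.g. "../secret").
    if target != base and base not in target.parents:
        raise ValueError("postcondition path escapes the workspace")

    kind = postcondition["kind"]
    if kind == "file_exists":
        return target.is_file()
    if kind == "file_contains":
        return target.is_file() and postcondition["text"] in target.read_text()
    raise ValueError(f"unknown postcondition kind: {kind}")
```

A closed vocabulary like this is cheap to evaluate and cannot modify the workspace, at the cost of expressiveness. Checks such as "the test suite passes" would still need a command-based postcondition, which is where the approval question comes back in.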
I'm glad to iterate on the design in this thread before anything is implemented; identifying the right shape here seems like the hard part.