AGENTS.md

Purpose

This file defines how coding agents must operate in this repository to build the Agentic Video RAG system safely, consistently, and with high signal.

Scope and Priority

This file applies to the entire repository.
If a deeper AGENTS.md is added in a subdirectory, it overrides this file for that subtree.
Product truth source: design/spec_groundtruth.md.
Execution tracking source: design/execution_plan.md (live milestone and status file).
If code and spec disagree, update code to match the spec or propose a spec change explicitly.

Codex Instruction Discovery

Expect Codex to discover instructions from multiple layers:

~/.codex/AGENTS.md (user/global defaults).
Repository root AGENTS.md (this file).
Subdirectory AGENTS.md files (more specific scope).

Priority rule: the most specific in-scope file wins if there is a conflict.

Local Overrides (Do Not Commit)

For personal/local machine customization, use AGENTS.override.md.

AGENTS.override.md may override this file for local workflows.
AGENTS.override.md must stay uncommitted (add to .gitignore if needed).
Project-wide policy changes must go into AGENTS.md, not override files.

Project North Star

Ship an evidence-grounded 7-stage Video RAG pipeline that:

Produces camera/time-linked claims.
Preserves entity consistency across cameras.
Surfaces uncertainty instead of hallucinating certainty.
Is reproducible via strict config and versioned artifacts.

Non-Negotiable Engineering Rules

Config format must be YAML.
Config composition/merging must use OmegaConf.
All merged config must be validated with strict Pydantic (extra="forbid").
Follow DRY strictly: define model/store IDs and thresholds once; reference everywhere else.
Never hardcode thresholds/model IDs in runtime logic when they belong in config.
Every claim-producing path must preserve evidence references.
Any fallback path must emit explicit uncertainty/failure flags.

Architecture Guardrails

Preserve the 7-stage contract from design/spec_groundtruth.md.
Keep stage boundaries explicit: ingestion, retrieval, grounding, ReID, temporal localization, graph memory, synthesis.
Use unresolved states for ambiguous identity links; do not force merges.
Retrieval confidence is not proof; temporal grounding + evidence linking are required before synthesis.
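One way to keep ambiguous identity links unresolved is to make "unresolved" an explicit state rather than the absence of a decision. The sketch below is hypothetical: `LinkState`, `IdentityLink`, `link_identities`, and the two-threshold scheme are illustrative assumptions, and the thresholds are passed in because they belong in config.

```python
from dataclasses import dataclass
from enum import Enum


class LinkState(str, Enum):
    MERGED = "merged"
    DISTINCT = "distinct"
    UNRESOLVED = "unresolved"


@dataclass(frozen=True)
class IdentityLink:
    cluster_a: str
    cluster_b: str
    state: LinkState
    similarity: float


def link_identities(
    cluster_a: str,
    cluster_b: str,
    similarity: float,
    merge_threshold: float,
    reject_threshold: float,
) -> IdentityLink:
    """Classify a cross-camera link; the band between thresholds stays unresolved."""
    if not 0.0 <= reject_threshold <= merge_threshold <= 1.0:
        raise ValueError("expected 0 <= reject_threshold <= merge_threshold <= 1")
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be in [0, 1]")
    if similarity >= merge_threshold:
        state = LinkState.MERGED
    elif similarity <= reject_threshold:
        state = LinkState.DISTINCT
    else:
        # Ambiguous evidence: record the uncertainty instead of forcing a merge.
        state = LinkState.UNRESOLVED
    return IdentityLink(cluster_a, cluster_b, state, similarity)


if __name__ == "__main__":
    print(link_identities("cam1/obj3", "cam2/obj7", 0.6, 0.8, 0.3).state.value)
```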

Required Deliverables Per Change

For any non-trivial change, include:

Code changes.
Config/schema updates (if behavior changes).
Tests for the changed behavior.
Execution progress update in design/execution_plan.md when milestone status changes.
Short note in the Learning Log (see "Continuous Improvement Rules").

Instruction Quality Rules (Best Practices)

When editing this file, keep instructions high signal:

Be explicit, concrete, and testable.
Prefer repo-specific rules over generic advice.
State priorities and exceptions clearly.
Avoid contradictory requirements across sections.
Keep this file concise; move long procedures to dedicated docs and link them.

Repository Conventions

Use src/ for implementation code.
Use tests/ for tests.
Use config/ for YAML configs and override profiles.
Use scripts/ for runnable helpers.
Keep experimental notebooks and one-off scripts out of core runtime paths.

Coding Standards

Python 3.11+ only.
Type hints required on public functions.
Prefer small, pure functions for stage logic.
Avoid hidden global state.
Use clear names matching spec terms (stage_1, activity_ingestion, ObjectClusterID, etc.).
Raise explicit errors for broken contracts; fail early.

Config and Schema Standards

Centralize constants (thresholds, top-k, retry limits).
Add cross-reference validators (stage IDs, model IDs, datastore/resource IDs).
Maintain a single stage_catalog map from stage_id to crisp stage_name.
Validate stage completeness (Stage 1..7 exactly once in stage specs).
Validate each stage's stage_name against the canonical stage_catalog mapping.
Constrain confidence-like parameters to [0, 1].
Add migration notes when renaming config keys.
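The stage-catalog and range rules above could be enforced roughly as follows, assuming Pydantic v2. The model names, the `min_confidence` field, and the stage names in the usage example are placeholders; the canonical mapping lives in design/spec_groundtruth.md.

```python
from pydantic import BaseModel, ConfigDict, Field, model_validator

# Stage IDs follow the stage_1 .. stage_7 naming used in this file.
EXPECTED_STAGE_IDS = [f"stage_{i}" for i in range(1, 8)]


class StageSpec(BaseModel):
    model_config = ConfigDict(extra="forbid")
    stage_id: str
    stage_name: str
    min_confidence: float = Field(ge=0.0, le=1.0)  # hypothetical parameter


class PipelineConfig(BaseModel):
    model_config = ConfigDict(extra="forbid")
    stage_catalog: dict[str, str]
    stages: list[StageSpec]

    @model_validator(mode="after")
    def check_stage_contract(self) -> "PipelineConfig":
        # Completeness: each of the seven stage IDs appears exactly once.
        if sorted(spec.stage_id for spec in self.stages) != EXPECTED_STAGE_IDS:
            raise ValueError("stage specs must cover stage_1..stage_7 exactly once")
        # Cross-reference: every stage_name must match the canonical catalog.
        for spec in self.stages:
            if self.stage_catalog.get(spec.stage_id) != spec.stage_name:
                raise ValueError(f"{spec.stage_id}: stage_name not in stage_catalog")
        return self


if __name__ == "__main__":
    # Placeholder names loosely based on the stage list in Architecture Guardrails.
    names = ["ingestion", "retrieval", "grounding", "reid",
             "temporal_localization", "graph_memory", "synthesis"]
    catalog = dict(zip(EXPECTED_STAGE_IDS, names))
    stages = [{"stage_id": sid, "stage_name": name, "min_confidence": 0.5}
              for sid, name in catalog.items()]
    config = PipelineConfig(stage_catalog=catalog, stages=stages)
    print(len(config.stages))
```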

Testing Standards

Minimum required test coverage for each new feature:

Happy-path unit test.
At least one failure/edge-case test.
Config validation test for relevant schema changes.
Regression test when fixing a bug.

Pipeline-specific checks should include:

Stage I/O contract validation.
Evidence linkage completeness for synthesizable claims.
Ambiguity handling (unresolved identities remain unresolved).
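As an illustration of the evidence-linkage check, the sketch below pairs a small pure function with a happy-path test and two failure-case tests. `EvidenceRef`, `Claim`, and `claims_missing_evidence` are hypothetical names, not the project's actual types; the tests use plain asserts, so pytest would collect them by name if the file lived under tests/.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRef:
    camera_id: str
    start_s: float
    end_s: float


@dataclass(frozen=True)
class Claim:
    claim_id: str
    text: str
    evidence: tuple[EvidenceRef, ...] = ()


def claims_missing_evidence(claims: list[Claim]) -> list[str]:
    """Return IDs of claims with no evidence or with an inverted time window."""
    return [
        claim.claim_id
        for claim in claims
        if not claim.evidence
        or any(ref.end_s < ref.start_s for ref in claim.evidence)
    ]


def test_grounded_claim_passes() -> None:
    claim = Claim("c1", "Person enters lobby", (EvidenceRef("cam_1", 12.0, 15.5),))
    assert claims_missing_evidence([claim]) == []


def test_ungrounded_claim_is_reported() -> None:
    assert claims_missing_evidence([Claim("c2", "Person leaves")]) == ["c2"]


def test_inverted_time_window_is_reported() -> None:
    claim = Claim("c3", "Bag is dropped", (EvidenceRef("cam_2", 30.0, 10.0),))
    assert claims_missing_evidence([claim]) == ["c3"]
```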

Agent Execution Workflow

Read relevant spec section(s) first.
Read and align with current milestone state in design/execution_plan.md.
State assumptions briefly before major edits.
Implement the smallest coherent change that passes tests.
Run tests/lint for touched components.
Update docs/config/tests together.
Update design/execution_plan.md when status, risk, or dates change.
Add a Learning Log entry for substantial changes.

Execution Plan Operating Rules

Treat design/execution_plan.md as a live control file.

Keep milestone IDs stable; do not rename without an explicit migration note.
Allowed statuses: not_started, in_progress, blocked, done.
On status transition to in_progress, set Start Date if empty.
On status transition to done, set Completed Date.
If blocked, include a clear unblock condition in Notes / Risks.
Do not mark a milestone done unless its acceptance gate is satisfied.
If a change affects scope/timeline, update Target Date and note the reason.

Standard Commands (Update As Repo Evolves)

Agents should prefer these canonical commands when available:

Setup: uv sync (or project-approved equivalent).
Tests: uv run pytest.
Lint: uv run ruff check . (lints the current directory)
Format: uv run ruff format . (formats the current directory)

If these commands change, update this section in the same PR.

What Agents Must Avoid

Bypassing schema validation for speed.
Introducing duplicate config values across files.
Coupling stage internals tightly across boundaries.
Silent fallback behavior without logging/flags.
Large refactors without incremental verification.

Pull Request / Change Checklist

Before finalizing, agents must verify:

Behavior matches design/spec_groundtruth.md.
design/execution_plan.md status/notes are updated for impacted milestone(s).
YAML + OmegaConf + Pydantic flow is preserved.
DRY constraints are upheld (no duplicated constants/IDs).
Tests cover new behavior and pass.
Learning Log updated when applicable.

Continuous Improvement Rules (Living AGENTS.md)

This file must evolve with real project outcomes.

When to Update

Update this file when any of the following occurs:

A repeated failure pattern is discovered.
A new practice significantly improves delivery speed or quality.
A rule here is found ambiguous, outdated, or counterproductive.
A new subsystem or stage-level constraint is introduced.

How to Update

Keep updates small and specific.
Prefer adding concrete rules over broad statements.
Record the reason in the Learning Log table.
If a rule changes behavior, include the effective date.
Remove or rewrite stale rules that no longer reflect real workflows.
If this file grows too long, split operational detail into linked docs while keeping this file as the concise control plane.

Learning Log (What Worked / What Didn't)

Add entries in reverse chronological order.

Date	Area	What Worked	What Didn't	Action / Rule Update
2026-02-16	Onboarding clarity	A single root README with commands, stage map, and extension workflow reduces startup friction across new chats/agents.	Spreading onboarding details only across spec and plan docs slows ramp-up.	Keep `README.md` as the practical entrypoint and update it when run/test commands or module anchors change.
2026-02-16	Full-pipeline delivery	Implementing all stage contracts with deterministic adapters enabled complete P2-P8 validation in one passable runtime.	Waiting for real model integrations before contract-level tests would have blocked milestone progress.	Keep a deterministic reference path that must pass before model-backed integrations.
2026-02-16	Retrieval robustness	Combining full-query scoring with decomposed-query recall and clip diversity improved downstream grounding/graph quality.	Single-query ranking could miss critical windows, causing empty evidence graphs.	Enforce decomposed-query recall path and clip-diversity selection in Stage 2.
2026-02-16	P1 foundation	Building strict schema + merge loader first made later stage code safer and easier to test.	Starting orchestration runtime wiring before contracts increases rework risk.	Keep milestone order: config/schema (`M1.1`/`M1.2`) before full orchestration runtime wiring (`M1.3`).
2026-02-16	Tracking hygiene	Separating stable spec and live execution tracker reduces spec churn and review noise.	Keeping milestones in the SSOT spec mixed stable contracts with rapidly changing status data.	Set `design/execution_plan.md` as live tracking source and made updates mandatory in workflow/checklist.
2026-02-16	Stage naming	Explicit `stage_id -> stage_name` mapping improves readability and validation.	ID-only stage references are harder to scan and easier to misuse.	Added mandatory `stage_catalog` and stage-name validation rule.
2026-02-16	Spec governance	Ground-truth spec in Markdown with explicit config governance section reduced ambiguity.	Treating YAML as the spec artifact caused expectation mismatch.	Clarified: Markdown is SSOT; YAML is runtime config only.

Decision Log Template (Use in PRs and major commits)

Use this lightweight template in PR description or commit message for meaningful decisions:

Context
Decision
Alternatives considered
Trade-offs
Follow-up actions

Escalation Guidelines

Agents should pause and ask for direction when:

Spec and user instruction conflict materially.
A change requires destructive operations.
Required dependencies/tools cannot run in the environment.
A stage contract must be broken to proceed.

Done Criteria

A task is done only when:

Implementation is complete.
Relevant tests pass.
Documentation is updated.
Impacted milestone status in design/execution_plan.md is updated.
Config/schema remains strict and DRY.
Evidence/uncertainty behavior is preserved for claim-producing paths.
