This file defines how coding agents must operate in this repository to build the Agentic Video RAG system safely, consistently, and with high signal.
- This file applies to the entire repository.
- If a deeper `AGENTS.md` is added in a subdirectory, it overrides this file for that subtree.
- Product truth source: `design/spec_groundtruth.md`.
- Execution tracking source: `design/execution_plan.md` (live milestone and status file).
- If code and spec disagree, update code to match the spec or propose a spec change explicitly.
Expect Codex to discover instructions from multiple layers:
- `~/.codex/AGENTS.md` (user/global defaults).
- Repository root `AGENTS.md` (this file).
- Subdirectory `AGENTS.md` files (more specific scope).
Priority rule: the most specific in-scope file wins if there is a conflict.
For personal/local machine customization, use `AGENTS.override.md`.
- `AGENTS.override.md` may override this file for local workflows.
- `AGENTS.override.md` must stay uncommitted (add to `.gitignore` if needed).
- Project-wide policy changes must go into `AGENTS.md`, not override files.
Ship an evidence-grounded 7-stage Video RAG pipeline that:
- Produces camera/time-linked claims.
- Preserves entity consistency across cameras.
- Surfaces uncertainty instead of hallucinating certainty.
- Is reproducible via strict config and versioned artifacts.
- Config format must be YAML.
- Config composition/merging must use OmegaConf.
- All merged config must be validated with strict Pydantic (`extra="forbid"`).
- Follow DRY strictly: define model/store IDs and thresholds once; reference everywhere else.
- Never hardcode thresholds/model IDs in runtime logic when they belong in config.
- Every claim-producing path must preserve evidence references.
- Any fallback path must emit explicit uncertainty/failure flags.
- Preserve the 7-stage contract from `design/spec_groundtruth.md`.
- Keep stage boundaries explicit: ingestion, retrieval, grounding, ReID, temporal localization, graph memory, synthesis.
- Use unresolved states for ambiguous identity links; do not force merges.
- Retrieval confidence is not proof; temporal grounding + evidence linking are required before synthesis.
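
As an illustration of "use unresolved states; do not force merges", the sketch below classifies a cross-camera identity link with two thresholds. The two-threshold scheme, the class names, and the example values are assumptions made for this sketch, not taken from the spec; in the repository the thresholds would come from validated config rather than call-site literals.

```python
from dataclasses import dataclass
from enum import Enum


class LinkState(str, Enum):
    MERGED = "merged"
    DISTINCT = "distinct"
    UNRESOLVED = "unresolved"


@dataclass(frozen=True)
class IdentityLink:
    cluster_a: str
    cluster_b: str
    similarity: float
    state: LinkState


def link_identities(
    cluster_a: str,
    cluster_b: str,
    similarity: float,
    *,
    merge_threshold: float,
    reject_threshold: float,
) -> IdentityLink:
    """Classify a candidate link; ambiguous scores stay UNRESOLVED."""
    # Fail early on a broken contract instead of silently clamping values.
    if not 0.0 <= reject_threshold <= merge_threshold <= 1.0:
        raise ValueError("expected 0 <= reject_threshold <= merge_threshold <= 1")
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be within [0, 1]")

    if similarity >= merge_threshold:
        state = LinkState.MERGED
    elif similarity <= reject_threshold:
        state = LinkState.DISTINCT
    else:
        # The middle band is surfaced as uncertainty, never forced either way.
        state = LinkState.UNRESOLVED
    return IdentityLink(cluster_a, cluster_b, similarity, state)
```

Downstream stages can then treat `LinkState.UNRESOLVED` as an explicit uncertainty flag instead of guessing.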
For any non-trivial change, include:
- Code changes.
- Config/schema updates (if behavior changes).
- Tests for the changed behavior.
- Execution progress update in `design/execution_plan.md` when milestone status changes.
- Short note in the Learning Log (see "Continuous Improvement Rules").
When editing this file, keep instructions high signal:
- Be explicit, concrete, and testable.
- Prefer repo-specific rules over generic advice.
- State priorities and exceptions clearly.
- Avoid contradictory requirements across sections.
- Keep this file concise; move long procedures to dedicated docs and link them.
- Use `src/` for implementation code.
- Use `tests/` for tests.
- Use `config/` for YAML configs and override profiles.
- Use `scripts/` for runnable helpers.
- Keep experimental notebooks and one-off scripts out of core runtime paths.
- Python 3.11+ only.
- Type hints required on public functions.
- Prefer small, pure functions for stage logic.
- Avoid hidden global state.
- Use clear names matching spec terms (`stage_1`, `activity_ingestion`, `ObjectClusterID`, etc.).
- Raise explicit errors for broken contracts; fail early.
- Centralize constants (thresholds, top-k, retry limits).
- Add cross-reference validators (stage IDs, model IDs, datastore/resource IDs).
- Maintain a single `stage_catalog` map from `stage_id` to crisp `stage_name`.
- Validate stage completeness (Stage 1..7 exactly once in stage specs).
- Validate each stage's `stage_name` against the canonical `stage_catalog` mapping.
- Constrain confidence-like parameters to `[0, 1]`.
- Add migration notes when renaming config keys.
Minimum required test coverage for each new feature:
- Happy-path unit test.
- At least one failure/edge-case test.
- Config validation test for relevant schema changes.
- Regression test when fixing a bug.
Pipeline-specific checks should include:
- Stage I/O contract validation.
- Evidence linkage completeness for synthesizable claims.
- Ambiguity handling (unresolved identities remain unresolved).
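
One way an evidence-linkage completeness check and its tests could look. The `Claim` and `EvidenceRef` shapes and the `unlinked_claims` helper are hypothetical stand-ins for the real stage I/O types; the test functions follow pytest naming but use only plain `assert`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRef:
    camera_id: str
    start_s: float
    end_s: float

    def is_valid(self) -> bool:
        # A usable reference names a camera and a non-negative, ordered window.
        return bool(self.camera_id) and 0.0 <= self.start_s <= self.end_s


@dataclass(frozen=True)
class Claim:
    claim_id: str
    text: str
    evidence: tuple[EvidenceRef, ...] = ()


def unlinked_claims(claims: list[Claim]) -> list[str]:
    """Return IDs of claims that must not reach synthesis as-is."""
    return [
        claim.claim_id
        for claim in claims
        if not claim.evidence or not all(ref.is_valid() for ref in claim.evidence)
    ]


def test_complete_linkage_passes() -> None:
    claim = Claim("c1", "Person enters lobby", (EvidenceRef("cam_1", 12.0, 15.5),))
    assert unlinked_claims([claim]) == []


def test_missing_or_malformed_evidence_is_reported() -> None:
    no_evidence = Claim("c2", "Person leaves lobby")
    bad_window = Claim("c3", "Bag is dropped", (EvidenceRef("cam_2", 9.0, 4.0),))
    assert unlinked_claims([no_evidence, bad_window]) == ["c2", "c3"]
```

The first test is the happy path and the second covers the failure cases, matching the minimum coverage listed above.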
- Read relevant spec section(s) first.
- Read and align with current milestone state in `design/execution_plan.md`.
- State assumptions briefly before major edits.
- Implement the smallest coherent change that passes tests.
- Run tests/lint for touched components.
- Update docs/config/tests together.
- Update `design/execution_plan.md` when status, risk, or dates changed.
- Add a Learning Log entry for substantial changes.
Treat design/execution_plan.md as a live control file.
- Keep milestone IDs stable; do not rename without explicit migration note.
- Allowed statuses: `not_started`, `in_progress`, `blocked`, `done`.
- On status transition to `in_progress`, set `Start Date` if empty.
- On status transition to `done`, set `Completed Date`.
- If `blocked`, include clear unblock condition in `Notes / Risks`.
- Do not mark a milestone `done` unless its acceptance gate is satisfied.
- If a change affects scope/timeline, update `Target Date` and note reason.
Agents should prefer these canonical commands when available:
- Setup: `uv sync` (or project-approved equivalent).
- Tests: `uv run pytest`.
- Lint: `uv run ruff check .`.
- Format: `uv run ruff format .`.
If these commands change, update this section in the same PR.
- Bypassing schema validation for speed.
- Introducing duplicate config values across files.
- Coupling stage internals tightly across boundaries.
- Silent fallback behavior without logging/flags.
- Large refactors without incremental verification.
Before finalizing, agents must verify:
- Behavior matches `design/spec_groundtruth.md`.
- `design/execution_plan.md` status/notes are updated for impacted milestone(s).
- YAML + OmegaConf + Pydantic flow is preserved.
- DRY constraints are upheld (no duplicated constants/IDs).
- Tests cover new behavior and pass.
- Learning Log updated when applicable.
This file must evolve with real project outcomes.
Update this file when any of the following occurs:
- A repeated failure pattern is discovered.
- A new practice significantly improves delivery speed or quality.
- A rule here is found ambiguous, outdated, or counterproductive.
- A new subsystem or stage-level constraint is introduced.
- Keep updates small and specific.
- Prefer adding concrete rules over broad statements.
- Record the reason in the Learning Log table.
- If a rule changes behavior, include effective date.
- Remove or rewrite stale rules that no longer reflect real workflows.
- If this file grows too long, split operational detail into linked docs while keeping this file as the concise control plane.
Add entries in reverse chronological order.
| Date | Area | What Worked | What Didn't | Action / Rule Update |
|---|---|---|---|---|
| 2026-02-16 | Onboarding clarity | A single root README with commands, stage map, and extension workflow reduces startup friction across new chats/agents. | Spreading onboarding details only across spec and plan docs slows ramp-up. | Keep README.md as the practical entrypoint and update it when run/test commands or module anchors change. |
| 2026-02-16 | Full-pipeline delivery | Implementing all stage contracts with deterministic adapters enabled complete P2-P8 validation in one passable runtime. | Waiting for real model integrations before contract-level tests would have blocked milestone progress. | Keep a deterministic reference path that must pass before model-backed integrations. |
| 2026-02-16 | Retrieval robustness | Combining full-query scoring with decomposed-query recall and clip diversity improved downstream grounding/graph quality. | Single-query ranking could miss critical windows, causing empty evidence graphs. | Enforce decomposed-query recall path and clip-diversity selection in Stage 2. |
| 2026-02-16 | P1 foundation | Building strict schema + merge loader first made later stage code safer and easier to test. | Starting orchestration runtime wiring before contracts increases rework risk. | Keep milestone order: config/schema (M1.1/M1.2) before full orchestration runtime wiring (M1.3). |
| 2026-02-16 | Tracking hygiene | Separating stable spec and live execution tracker reduces spec churn and review noise. | Keeping milestones in the SSOT spec mixed stable contracts with rapidly changing status data. | Set design/execution_plan.md as live tracking source and made updates mandatory in workflow/checklist. |
| 2026-02-16 | Stage naming | Explicit stage_id -> stage_name mapping improves readability and validation. | ID-only stage references are harder to scan and easier to misuse. | Added mandatory stage_catalog and stage-name validation rule. |
| 2026-02-16 | Spec governance | Ground-truth spec in Markdown with explicit config governance section reduced ambiguity. | Treating YAML as the spec artifact caused expectation mismatch. | Clarified: Markdown is SSOT; YAML is runtime config only. |
Use this lightweight template in PR description or commit message for meaningful decisions:
- Context
- Decision
- Alternatives considered
- Trade-offs
- Follow-up actions
Agents should pause and ask for direction when:
- Spec and user instruction conflict materially.
- A change requires destructive operations.
- Required dependencies/tools cannot run in the environment.
- A stage contract must be broken to proceed.
A task is done only when:
- Implementation is complete.
- Relevant tests pass.
- Documentation is updated.
- Impacted milestone status in `design/execution_plan.md` is updated.
- Config/schema remains strict and DRY.
- Evidence/uncertainty behavior is preserved for claim-producing paths.