marcusquinn · johnwaldo · Mar 30, 2026 · Mar 30, 2026
diff --git a/.agents/aidevops/graduated-learnings.md b/.agents/aidevops/graduated-learnings.md
@@ -13,126 +13,49 @@ tools:
 
 # Graduated Learnings
 
-Validated learnings promoted from local memory databases into shared documentation.
-These patterns have been confirmed through repeated use across sessions.
+Validated patterns promoted from local memory. Qualify at high confidence or 3+ accesses.
+Managed by `memory-graduate-helper.sh graduate` / `memory-helper.sh graduate [candidates|graduate|status]`.
 
-**How memories graduate**: Memories qualify when they reach high confidence or are
-accessed frequently (3+ times). The `memory-graduate-helper.sh` script identifies
-candidates and appends them here. Each graduation batch is timestamped.
-
-**Categories**:
-
-- **Solutions & Fixes**: Working solutions to real problems
-- **Anti-Patterns**: Approaches that failed (avoid repeating)
-- **Patterns & Best Practices**: Proven approaches
-- **Architecture Decisions**: Key design choices and rationale
-- **Configuration & Preferences**: Tool and workflow settings
-- **Context & Background**: Important background information
-
-**Usage**: `memory-helper.sh graduate [candidates|graduate|status]`
-
-## Graduated: 2026-02-08
-
-### Anti-Patterns (What NOT to Do)
-
-- **[FAILED_APPROACH]** Tried using PostgreSQL for memory but it adds deployment complexity - SQLite FTS5 is simpler
-  *(confidence: high, validated: 9x)*
-
-- **[FAILURE_PATTERN]** [task:refactor] Haiku missed edge cases when refactoring complex shell scripts with many conditionals [model:haiku]
-  *(confidence: high, validated: 3x)*
-
-### Architecture Decisions
-
-- **[ARCHITECTURAL_DECISION]** YAML handoffs are more token-efficient than markdown (~400 vs ~2000 tokens)
-  *(confidence: high, validated: 0x)*
-
-- **[DECISION]** Mailbox uses SQLite (`mailbox.db`) not TOON files. Prune shows storage report by default, `--force` to delete. Migration from TOON runs automatically on `aidevops update` via `setup.sh`.
-  *(confidence: medium, validated: 8x)*
-
-- **[DECISION]** Agent lifecycle uses three tiers: `draft/` (R&D, orchestration-created), `custom/` (private, permanent), shared (`.agents/` via PR). Both `draft/` and `custom/` survive `setup.sh` deployments. Orchestration agents (Build+, Ralph loop, runners) know they can create drafts for reusable parallel processing context and propose them for inclusion in aidevops.
-  *(confidence: medium, validated: 3x)*
-
-### Configuration & Preferences
-
-- **[USER_PREFERENCE]** Prefer conventional commits with scope: feat(memory): description
-  *(confidence: medium, validated: 4x)*
-
-### Patterns & Best Practices
-
-- **[SUCCESS_PATTERN]** [task:feature] Breaking task into 4 phases with separate commits worked well for Claude-Flow feature adoption [model:sonnet]
-  *(confidence: high, validated: 3x)*
-
-- **[SUCCESS_PATTERN]** [task:bugfix] Opus identified root cause of race condition by reasoning through concurrent execution paths [model:opus]
-  *(confidence: high, validated: 2x)*
-
-- **[CODEBASE_PATTERN]** Memory daemon should auto-extract learnings from thinking blocks when sessions end
-  *(confidence: medium, validated: 5x)*
-
-## Graduated: 2026-02-11
-
-### Anti-Patterns (What NOT to Do)
-
-- **[FAILURE_PATTERN]** Session anti-pattern: mentioning issues in summary text without logging them as TODOs or fixing them. This creates an illusion of thoroughness while actually losing the improvements. The fix is mechanical: every time you type a sentence describing a bug or limitation, STOP and either (1) fix it now, or (2) add a TODO entry. Then continue writing the summary. Do not batch issue logging to the end.
-  *(confidence: high, validated: 1x)*
-
-### Architecture Decisions
-
-- **[DECISION]** When discovering bugs or issues during a task, log them as TODOs IMMEDIATELY — do not defer until the end of the session or until the user asks. This session discovered 8 issues but only logged 2 until prompted. The development lifecycle rule is clear: issues discovered during work must be fixed on the fly or logged as TODOs. Deferring loses context and risks forgetting entirely.
-  *(confidence: high, validated: 1x)*
-
-- **[DECISION]** For content generation tasks (images, video, UGC, ads), ALWAYS read domain subagents BEFORE generating. `content/production-image.md` has Nanobanana Pro JSON prompt templates that produce dramatically better results than freehand prompts. `tools/video/video-prompt-design.md` has the 7-component video prompt format. `content/story.md` has hook frameworks. Using structured templates from subagents vs freehand: the difference was visible in output quality during the Trinity Windows UGC test.
-  *(confidence: high, validated: 1x)*
-
-- **[DECISION]** UGC content generation needs a complete assembled sequence, not just individual assets. When storyboarding multi-shot content: (1) generate video for ALL shots not just the hero, (2) assemble into a single sequence with transitions using `ffmpeg`, (3) output the final assembled video as the primary deliverable. Individual clips are intermediates, not the final product.
-  *(confidence: high, validated: 1x)*
-
-- **[DECISION]** CRITICAL self-improvement: The supervisor needs a post-evaluation orphaned PR scanner. Pattern observed across 47+ tasks: workers create PRs but the supervisor records `task_only` or `no_pr` because (1) worker emits `TASK_COMPLETE` instead of `FULL_LOOP_COMPLETE`, (2) PR creation happens after the signal, or (3) `evaluate_worker` fails to parse the PR URL from logs. Fix: add a Phase 3c to the pulse cycle that runs `gh pr list --state open --head feature/tXXX` for all tasks in complete/deployed/failed states with `task_only`/`no_pr`/NULL `pr_url`, and links any found PRs. This would have caught t199.2 (PR #849), t199.3 (PR #846), t199.5 (PR #872) automatically instead of requiring manual intervention.
-  *(confidence: high, validated: 0x)*
-
-### Configuration & Preferences
-
-- **[USER_PREFERENCE]** User-facing generated assets (images, videos, documents) should be output to `~/Downloads/` so the user can immediately review them in Finder. Do NOT bury outputs in `~/.aidevops/.agent-workspace/` for interactive sessions — that path is invisible to the user. Reserve `.agent-workspace` for headless/pipeline runs only.
-  *(confidence: high, validated: 0x)*
-
-- **[USER_PREFERENCE]** Runtime identity hazard: misidentifying as Claude Code when running in OpenCode wastes cycles investigating wrong config paths, wrong CLI commands, wrong prompt loading. The AGENTS.md rule says `use the app name from the version check output — do not guess`. Enforce this strictly — wrong identity leads to wrong assumptions about how `build.txt` loads, where configs live, and which CLI to use for dispatch.
-  *(confidence: high, validated: 0x)*
-
-### Patterns & Best Practices
-
-- **[CODEBASE_PATTERN]** OpenCode system prompt override: the agent `prompt` field in `opencode.json` replaces `anthropic_default` (not appends). Code path: `input.agent.prompt ? [input.agent.prompt] : SystemPrompt.provider(input.model)`. The `{file:path}` syntax is resolved by template matching. All active agents must have `build.txt` set or they fall back to upstream `anthropic.txt`, losing all aidevops overrides. Verified: all 12 active agents have it; 4 disabled agents (build, plan, Plan+, AI-DevOps) don't need it.
-  *(confidence: high, validated: 1x)*
-
-- **[CODEBASE_PATTERN]** Task ID collision: t264 was assigned by two sessions simultaneously. Another session used t264 for memory monitoring (PR #1040), while this session used t264 for version-manager unbound variable fix. The pre-dispatch check caught it (`t264 is marked [x] in TODO.md`). Prevention: always `git pull` and re-read TODO.md before assigning IDs. The collision prevention rule in AGENTS.md exists but needs enforcement during monitoring sessions that create tasks.
-  *(confidence: high, validated: 1x)*
-
-- **[CODEBASE_PATTERN]** Stale TODO.md pattern: tasks completed in previous sessions (t231 via PR #955, t247 via subtask PRs, t259 via PR #1020) remain open in TODO.md because the supervisor's `update_todo_on_complete()` only runs during the post-PR lifecycle. When a monitoring session manually merges PRs or when tasks are completed across session boundaries, TODO.md falls out of sync. Fix: run `supervisor-helper.sh reconcile-todo` periodically, and before dispatching tasks, check if the work is already done (workers now do this and report `task_obsolete`).
-  *(confidence: high, validated: 0x)*
+---
 
-- **[SUCCESS_PATTERN]** [task:feature] Supervisor task t136.5 completed successfully | PR: https://github.com/marcusquinn/aidevops/pull/792 | Task: Scaffold aidevops-pro and aidevops-anon repos - create initial plugin repos with proper structure [model:opus] [duration:1206s]
-  *(confidence: medium, validated: 51x)*
+## Anti-Patterns
 
-### Solutions & Fixes
+- **PostgreSQL for memory** adds deployment complexity — SQLite FTS5 is simpler. *(9x)*
+- **Haiku on complex shell refactors** misses edge cases with many conditionals. Use sonnet+. *(3x)*
+- **Mentioning issues in summary text** without logging them as TODOs loses improvements. Fix: every sentence describing a bug → either fix it now or add a TODO entry. Do not batch to end of session. *(1x)*
 
-- **[ERROR_FIX]** Deploying auto-recovery infinite loop: when a task is stuck in `deploying` and its PR is already merged, `cmd_pr_lifecycle` Step 4b runs `cleanup_after_merge` (worktree already gone) and `update_todo_on_complete` (already marked [x]). The `retry_count` variable was LOCAL and reset every pulse cycle, allowing infinite recovery attempts across pulses. Additionally, if `cmd_transition` to `deployed` fails AND the fallback `cmd_transition` to `failed` also fails, the task stays in `deploying` forever. Fixed by t263 (PR #1036): persistent `deploying_recovery_attempts` DB column, max 10 attempts across all pulses, fallback direct SQL UPDATE.
-  *(confidence: high, validated: 0x)*
+## Architecture Decisions
 
-- **[ERROR_FIX]** Pulse silent failure pattern: with `set -euo pipefail`, Phase 3 (`process_post_pr_lifecycle`) can fail silently because it's called with `2>/dev/null || true`. If the function crashes internally (e.g., infinite loop in deploying auto-recovery), the pulse exits with code 1 but produces no output after the header line. The `|| true` prevents the error from propagating, but the exit code still leaks through. Symptom: pulse prints `=== Supervisor Pulse <timestamp> ===` and nothing else. Diagnosis: check `post-pr.log` for repeated entries, check exit code of manual pulse run.
-  *(confidence: high, validated: 0x)*
+- **YAML handoffs** are more token-efficient than markdown (~400 vs ~2000 tokens).
+- **Mailbox** uses SQLite (`mailbox.db`) not TOON files. Prune shows storage report by default; `--force` to delete. Migration from TOON runs automatically on `aidevops update`. *(8x)*
+- **Agent lifecycle tiers**: `draft/` (R&D, orchestration-created), `custom/` (private, permanent), shared (`.agents/` via PR). Both `draft/` and `custom/` survive `setup.sh`. *(3x)*
+- **Bugs found during a task** must be logged as TODOs IMMEDIATELY — not deferred to end of session. Deferring loses context. *(1x)*
+- **Content generation tasks** (images, video, UGC, ads): ALWAYS read domain subagents BEFORE generating. `content/production-image.md` has JSON prompt templates; `tools/video/video-prompt-design.md` has the 7-component format; `content/story.md` has hook frameworks. *(1x)*
+- **UGC multi-shot content**: generate ALL shots (not just hero), assemble with `ffmpeg`, output assembled video as primary deliverable. Individual clips are intermediates. *(1x)*
+- **Orphaned PR scanner needed**: workers create PRs but supervisor records `task_only`/`no_pr` when (1) worker emits `TASK_COMPLETE` instead of `FULL_LOOP_COMPLETE`, (2) PR created after signal, or (3) `evaluate_worker` fails to parse PR URL. Fix: Phase 3c in pulse — `gh pr list --state open --head feature/tXXX` for tasks in complete/deployed/failed with no `pr_url`. Caught t199.2 (PR #849), t199.3 (PR #846), t199.5 (PR #872) manually.
+- **OpenCode system prompt override**: `prompt` field in `opencode.json` replaces `anthropic_default` (not appends). All active agents must have `build.txt` set or fall back to upstream `anthropic.txt`. *(1x)*
 
-- **[WORKING_SOLUTION]** Bash associative arrays (`declare -A`) + `set -u` = unbound variable on empty arrays and subscript access. Use newline-delimited string + grep instead for portable `set -u`-safe lookups. Fixed in `issue-sync-helper.sh` PR #1086.
-  *(confidence: high, validated: 0x)*
+## Configuration & Preferences
 
-- **[WORKING_SOLUTION]** Worker PRs dispatched in parallel for tasks with dependency chains (blocked-by) will create merge conflicts. t008.1-4 and t012.3-5 all conflicted because workers ran simultaneously on overlapping files. Solution: dispatch sequentially respecting blocked-by dependencies, or use a single worker for the entire plan.
-  *(confidence: high, validated: 0x)*
+- **Conventional commits with scope**: `feat(memory): description` *(4x)*
+- **User-facing generated assets** → `~/Downloads/` for interactive sessions. Do NOT use `~/.aidevops/.agent-workspace/` — invisible to user in Finder. Reserve `.agent-workspace` for headless/pipeline runs.
+- **Runtime identity**: always use the app name from version-check output. Misidentifying (e.g., Claude Code vs OpenCode) leads to wrong config paths, CLI commands, and prompt loading assumptions.
 
-- **[WORKING_SOLUTION]** Decomposition workers marking parent #plan tasks [x] is a known bug (t278). Parents t008 and t012 were falsely completed while subtasks were still [ ]. Always verify subtask completion before marking parent done.
-  *(confidence: high, validated: 0x)*
+## Patterns & Best Practices
 
-- **[WORKING_SOLUTION]** issue-sync `find_closing_pr()` bug pattern: when TODO.md uses a different format (`pr:#NNN`) than what the code searches for (`PR #NNN`), close comments silently omit the PR reference. Always check that regex patterns match the actual data format in TODO.md. Fixed in t291/PR#1129.
-  *(confidence: high, validated: 0x)*
+- **Phase-based task breakdown** (4 phases, separate commits) worked well for complex feature adoption. *(3x)*
+- **Opus for race condition root cause**: reasoning through concurrent execution paths identified the issue. *(2x)*
+- **Memory daemon** should auto-extract learnings from thinking blocks when sessions end. *(5x)*
+- **Task ID collision** (t264 assigned twice): always `git pull` and re-read TODO.md before assigning IDs. The pre-dispatch check catches it. *(1x)*
+- **Stale TODO.md**: `update_todo_on_complete()` only runs during post-PR lifecycle. Cross-session merges leave TODO.md out of sync. Fix: `supervisor-helper.sh reconcile-todo` periodically; workers check `task_obsolete` before starting.
 
-- **[WORKING_SOLUTION]** CRITICAL FIX: Cron supervisor pulse requires three things to work on macOS: (1) `/usr/sbin` in PATH for `sysctl`, (2) `GH_TOKEN` cached to file since macOS keyring is inaccessible from cron - `supervisor-helper.sh` now auto-caches token from interactive sessions to `~/.aidevops/.agent-workspace/supervisor/.gh-token-cache`, (3) `get_aidevops_identity` must validate `gh api` output is not JSON error. Fixed in PR #780.
-  *(confidence: medium, validated: 52x)*
+## Solutions & Fixes
 
-- **[WORKING_SOLUTION]** SYSTEMIC: After merging PRs that modify `supervisor-helper.sh` or other scripts in `.agents/scripts/`, the deployed copy at `~/.aidevops/agents/scripts/` is NOT automatically updated. The cron pulse runs the deployed copy. Must run `rsync -a --exclude=loop-state/ --exclude=custom/ --exclude=draft/ ~/Git/aidevops/.agents/ ~/.aidevops/agents/` or `aidevops update` after merging script changes. `setup.sh` `deploy_aidevops_agents()` handles this but may not run to completion in all modes. Consider adding auto-deploy to the supervisor's post-merge hook.
-  *(confidence: medium, validated: 37x)*
+- **Deploying auto-recovery infinite loop** (t263/PR #1036): `retry_count` was LOCAL and reset every pulse cycle. Fixed with persistent `deploying_recovery_attempts` DB column, max 10 attempts, fallback direct SQL UPDATE.
+- **Pulse silent failure** with `set -euo pipefail`: Phase 3 called with `2>/dev/null || true` — crashes silently, pulse exits after printing only the header. Diagnosis: check `post-pr.log` for repeated entries.
+- **Bash `declare -A` + `set -u`** = unbound variable on empty arrays. Use newline-delimited string + grep for portable `set -u`-safe lookups. Fixed in `issue-sync-helper.sh` PR #1086.
+- **Parallel worker PRs on dependency chains** create merge conflicts. Dispatch sequentially respecting `blocked-by`, or use a single worker for the plan. (t008.1-4, t012.3-5)
+- **Decomposition workers marking parent `#plan` tasks `[x]`** is a known bug (t278). Always verify subtask completion before marking parent done.
+- **`issue-sync find_closing_pr()` format mismatch**: `pr:#NNN` in TODO.md vs `PR #NNN` in regex → close comments silently omit PR reference. Fixed in t291/PR#1129.
+- **Cron supervisor pulse on macOS** (PR #780): requires (1) `/usr/sbin` in PATH for `sysctl`, (2) `GH_TOKEN` cached to file (`~/.aidevops/.agent-workspace/supervisor/.gh-token-cache`) since macOS keyring is inaccessible from cron, (3) `get_aidevops_identity` must validate `gh api` output is not a JSON error. *(52x)*
+- **Script deploy lag** (37x): after merging PRs that modify `.agents/scripts/`, the deployed copy at `~/.aidevops/agents/scripts/` is NOT auto-updated. Run `aidevops update` or `rsync -a --exclude=loop-state/ --exclude=custom/ --exclude=draft/ ~/Git/aidevops/.agents/ ~/.aidevops/agents/` after merging script changes.