maxritter · Apr 29, 2026 · coderabbitai
diff --git a/.devcontainer/Dockerfile b/.devcontainer/Dockerfile
@@ -22,6 +22,9 @@ RUN rm -rf /var/lib/apt/lists/* && \
         vim \
         netcat-openbsd \
         socat \
+        bubblewrap \
+        iptables \
+        ipset \
         chromium && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/*
diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json
@@ -8,6 +8,9 @@
     "${localWorkspaceFolderBasename}"
   ],
   "workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}",
+  "mounts": [
+    "source=claude-code-config-${devcontainerId},target=/root/.claude,type=volume"
+  ],
   "customizations": {
     "vscode": {
       "extensions": [

diff --git a/README.md b/README.md
@@ -44,6 +44,7 @@ curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.
 **Pilot Shell is different.** Every component solves a real problem:
 
 - **`/spec`** — plans, implements, and verifies features end-to-end with TDD
+- **`/fix`** — bugfix workflow with RED-before-GREEN discipline; bails out when complexity exceeds the standard fix lane
 - **`/prd`** — brainstorm ideas into clear requirements through with optional deep research
 - **Quality hooks** — enforce linting, formatting, type checking, and tests as quality gates
 - **Context engineering** — preserves decisions and knowledge across sessions
@@ -107,6 +108,8 @@ curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/uninstal
 
 Pilot Shell works inside Dev Containers. Copy the [`.devcontainer`](https://github.com/maxritter/pilot-shell/tree/main/.devcontainer) folder from this repository into your project, adapt it to your needs (base image, extensions, dependencies), and run the installer inside the container. The installer auto-detects the container environment and skips system-level dependencies like Homebrew.
 
+For tighter isolation when working with untrusted code, combine the dev container with Claude Code's [`/sandbox`](https://code.claude.com/docs/en/sandboxing) — `bubblewrap`, `socat`, `iptables`, and `ipset` are pre-installed in the Dockerfile so it works out of the box on Linux. See Anthropic's [development containers](https://code.claude.com/docs/en/devcontainer) and [sandboxing](https://code.claude.com/docs/en/sandboxing) docs for hardening patterns (egress allowlist, managed settings, persistent volumes).
+
 </details>
 
 <details>
@@ -130,16 +133,16 @@ Pilot Shell works inside Dev Containers. Copy the [`.devcontainer`](https://gith
 
 Just chat — no plan, no approval gate. [Quick mode](https://pilot-shell.com/docs/workflows/quick-mode) is the default: quality hooks and TDD enforcement still apply, best for small tasks and exploration. For anything that needs a plan, use `/spec` — not Claude Code's built-in plan mode.
 
-### /spec — Spec-Driven Development
+### /spec — Spec-Driven Development (features)
 
-**[`/spec`](https://pilot-shell.com/docs/workflows/spec) replaces Claude Code's built-in plan mode** (Shift+Tab). It provides a complete planning workflow with TDD, verification, and code review — use `/spec` instead of plan mode for all planned work.
+**[`/spec`](https://pilot-shell.com/docs/workflows/spec) replaces Claude Code's built-in plan mode** (Shift+Tab) for new features, refactoring, and architectural work. It provides a complete planning workflow with TDD, verification, and code review.
 
-Features, bug fixes, refactoring — describe it and `/spec` handles the rest. Auto-detects whether it's a feature or a bugfix and adapts the workflow. Specs are saved to `docs/plans/` and visible in the Console's **Specification** tab.
+For bugs, use [`/fix`](https://pilot-shell.com/docs/workflows/fix) (see below). Specs are saved to `docs/plans/` and visible in the Console's **Specification** tab.
 
 ```bash
 pilot
-> /spec "Add user authentication with OAuth and JWT tokens"   # → feature mode
-> /spec "Fix the crash when deleting nodes with two children"  # → bugfix mode (auto-detected)
+> /spec "Add user authentication with OAuth and JWT tokens"
+> /spec "Migrate the REST API to GraphQL"
 ```
 
 ```
@@ -163,18 +166,46 @@ Full exploration workflow for new functionality, refactoring, or architectural c
 
 </details>
 
+### /fix — Bugfix Workflow
+
+**[`/fix`](https://pilot-shell.com/docs/workflows/fix) is the bugfix command.** Investigate the bug, write the failing test, fix at the root cause, single-pass audit, done. No plan file, no approval mid-flow, no separate verify phase.
+
+```bash
+pilot
+> /fix "annotation persistence drops fields between save and reload"
+> /fix "off-by-one in pagination at boundary"
+> /fix "wrong default for max_retries"
+```
+
+```
+Investigate  →  RED  →  Fix  →  Audit  →  Quality Gate  →  Done
+```
+
+If investigation reveals the bug is multi-component or architectural, `/fix` stops cleanly and tells you to re-invoke with `/spec`. `/fix` is always quick; `/spec` is the full workflow.
+
 <details>
-<summary><b>Bugfix Mode</b></summary>
+<summary><b>How <code>/fix</code> works</b></summary>
 
-Investigation-first workflow for targeted fixes. Finds the root cause before touching any code.
+For local bugs. Single file, obvious-once-traced root cause. No plan file, no approval mid-flow, no separate verify phase. RED-before-GREEN discipline still enforced — bugfixes without a failing test don't ship.
 
-**Investigate:** Reproduces the bug → traces backward through the call chain to find the **root cause** at a specific `file:line` → compares against working code patterns → states the fix with confidence level. If 3+ hypotheses fail, escalates as an architectural problem.
+- **Investigate:** Reproduce the bug → trace to root cause at `file:line` with `codegraph_context` + targeted reads → state confidence (High/Medium required to proceed). For UI / async / race bugs that don't surface from a static read, add temporary `SPEC-DEBUG:`-marked logs at component boundaries before tracing.
+- **RED:** Write the failing test via an existing public entry point → run, must fail with the documented symptom
+- **Fix:** Minimal change at the root cause. Symptom patches (`try/except` hiding the bug, swallowed returns) are forbidden. Targeted test module re-runs between fix iterations — full suite runs once at the Quality Gate, not per-fix-task.
+- **Audit:** Single-pass scope sanity + symptom-patching grep + **mandatory end-to-end verification** — re-runs the user's actual repro against the running program (Claude Code Chrome → Chrome DevTools MCP → playwright-cli → agent-browser for UI; CLI/API/REPL for non-UI). A passing unit test alone is never accepted as proof; concrete evidence (command + observation) is required in the completion report.
+- **Quality Gate:** Lint + types + build + full anti-regression suite, once
+- **Bail-out:** If investigation reveals the bug is multi-component, architectural, needs defense-in-depth at multiple layers, or two fix attempts have failed, `/fix` stops cleanly and tells you to re-invoke with `/spec`. It does not silently switch lanes.
 
-**Test-Before-Fix:** Writes a regression test that FAILS on current code → implements the minimal fix at the root cause → verifies all tests pass. Defense-in-depth validation at multiple layers when the bug involves data flowing through shared code paths.
+</details>
 
-**Verify:** Lightweight verification — regression test confirmation → full test suite → lint + type check → quality checks. No review sub-agents — the regression test proves the fix works, the full suite proves nothing else broke.
+<details>
+<summary><b>How <code>/spec</code> handles bugs</b></summary>
+
+When you type `/spec "<bug description>"`, the full bugfix workflow runs — for bugs that warrant a written plan, approval, code review, and the full verify ceremony.
 
-**Why this matters:** Root cause investigation prevents "fix one thing, break another." The regression test locks in the fix. No formal notation overhead — just trace, test, fix, verify.
+- **Behavior Contract:** every plan pins down `Given / When / Currently / Expected / Anti-regression` — the invariant the fix must produce and the behavior it must not break
+- **Three uniform tasks** (always, regardless of bug size): Write Reproducing Test (RED) → Implement Fix at Root Cause → Quality Gate
+- **Verify audit:** always-on `cp`+`trap` revert-test (proves the reproducing test would genuinely fail without the fix — rules out retroactive rubber-stamp tests) + root-cause-at-source audit (flags symptom patches and caller-side workarounds) + original-symptom re-check — no sub-agents, tests carry the proof
+- **Iteration cap at 3:** after three failed verify cycles, the workflow stops and asks if the bug is architectural rather than letting you loop forever
 
 </details>
 
@@ -719,6 +750,8 @@ On **Team**, every developer runs `pilot customize install <source>` once and st
 
 Yes. Copy the `.devcontainer` folder from this repository into your project, adapt it to your needs (base image, extensions, dependencies), and install Pilot Shell inside the container. Everything works the same — hooks, rules, MCP servers, persistent memory, and the Console dashboard all run inside the container. This is a great option for teams that want a consistent, reproducible development environment.
 
+For tighter isolation when working with untrusted code, layer Claude Code's [`/sandbox`](https://code.claude.com/docs/en/sandboxing) on top — the Dockerfile pre-installs `bubblewrap`, `socat`, `iptables`, and `ipset` so it works out of the box. See Anthropic's [development containers](https://code.claude.com/docs/en/devcontainer) and [sandboxing](https://code.claude.com/docs/en/sandboxing) docs for the hardening patterns.
+
 </details>
 
 <details>
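
Review note (illustrative, not part of the patch): the RED-before-GREEN rule the README describes for `/fix` could look roughly like the sketch below, using the "off-by-one in pagination at boundary" example. Every name here (`page_count_buggy`, `page_count`, `check_boundary`) is hypothetical and the buggy implementation is an assumption made up for the illustration; none of it comes from the Pilot Shell repository.

```python
# Hypothetical example: none of these names come from the Pilot Shell repository.

def page_count_buggy(total_items: int, page_size: int) -> int:
    """Assumed buggy implementation: reports an extra page at exact multiples."""
    return total_items // page_size + 1


def page_count(total_items: int, page_size: int) -> int:
    """Root-cause fix: ceiling division, with no caller-side special casing."""
    return -(-total_items // page_size)


def check_boundary(fn) -> None:
    """Reproducing test, written against the public entry point."""
    assert fn(20, 10) == 2, f"expected 2 pages for 20 items, got {fn(20, 10)}"
    assert fn(21, 10) == 3  # anti-regression: non-boundary input is unchanged


# RED: the reproducing test must fail on the buggy code with the reported symptom.
try:
    check_boundary(page_count_buggy)
    red_failed = False
except AssertionError as exc:
    red_failed = True
    print(f"RED: {exc}")

# GREEN: the same test passes once the root cause is fixed.
check_boundary(page_count)
print(f"RED failed as required: {red_failed}")
```

A symptom patch in this scenario would be a caller that subtracts one page when `total_items % page_size == 0`; the sketch instead changes the function that computes the wrong value, which is what "fix at the root cause" appears to mean in the README.
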

diff --git a/console/package.json b/console/package.json
@@ -35,8 +35,12 @@
   "devDependencies": {
     "@git-diff-view/file": "^0.1.1",
     "@git-diff-view/react": "^0.1.1",
+    "@happy-dom/global-registrator": "^20.9.0",
     "@iconify/react": "^6.0.2",
     "@tailwindcss/vite": "^4.1.18",
+    "@testing-library/dom": "^10.4.1",
+    "@testing-library/react": "^16.3.2",
+    "@testing-library/user-event": "^14.6.1",
     "@types/bun": "^1.3.8",
     "@types/cookie-parser": "^1.4.10",
     "@types/cors": "^2.8.19",

diff --git a/console/src/ui/viewer/hooks/useSettings.ts b/console/src/ui/viewer/hooks/useSettings.ts
diff --git a/console/src/ui/viewer/views/Settings/index.tsx b/console/src/ui/viewer/views/Settings/index.tsx
diff --git a/console/src/ui/viewer/views/Spec/annotation/PlanAnnotator.tsx b/console/src/ui/viewer/views/Spec/annotation/PlanAnnotator.tsx
diff --git a/console/src/ui/viewer/views/Spec/annotation/useAnnotation.ts b/console/src/ui/viewer/views/Spec/annotation/useAnnotation.ts
diff --git a/console/src/ui/viewer/views/Spec/index.tsx b/console/src/ui/viewer/views/Spec/index.tsx
diff --git a/console/src/ui/viewer/views/Spec/parsePlanContent.ts b/console/src/ui/viewer/views/Spec/parsePlanContent.ts
diff --git a/console/tests/annotation/plan-annotator-persistence.test.tsx b/console/tests/annotation/plan-annotator-persistence.test.tsx
diff --git a/console/tests/ui/spec-section-rendering.test.ts b/console/tests/ui/spec-section-rendering.test.ts
diff --git a/console/tsconfig.json b/console/tsconfig.json
@@ -25,7 +25,8 @@
   "include": [
     "src/**/*.ts",
     "src/**/*.tsx",
-    "tests/**/*.ts"
+    "tests/**/*.ts",
+    "tests/**/*.tsx"
   ],
   "exclude": [
     "node_modules",

diff --git a/docs/docusaurus/docs/getting-started/installation.md b/docs/docusaurus/docs/getting-started/installation.md
@@ -49,6 +49,13 @@ When enabled, Codex provides an independent adversarial review during `/spec` pl
 
 Pilot Shell works inside Dev Containers. Copy the `.devcontainer` folder from the [Pilot Shell repository](https://github.com/maxritter/pilot-shell/tree/main/.devcontainer) into your project, adapt it to your needs (base image, extensions, dependencies), and run the installer inside the container. The installer auto-detects the container environment and skips system-level dependencies like Homebrew.
 
+For tighter isolation when working with untrusted code, layer Claude Code's [`/sandbox`](https://code.claude.com/docs/en/sandboxing) on top — `bubblewrap`, `socat`, `iptables`, and `ipset` are pre-installed in the Dockerfile so it works out of the box on Linux.
+
+### Further reading
+
+- [Claude Code · Development containers](https://code.claude.com/docs/en/devcontainer) — Anthropic's reference container, persistent volumes, organization policy, network egress, the `--dangerously-skip-permissions` flag.
+- [Claude Code · Sandboxing](https://code.claude.com/docs/en/sandboxing) — Seatbelt (macOS) and bubblewrap (Linux/WSL2), `/sandbox` modes, `allowedDomains`, filesystem allow/deny rules, security limitations.
+
 ## Install Specific Version
 
 ```bash

diff --git a/docs/docusaurus/docs/workflows/fix.md b/docs/docusaurus/docs/workflows/fix.md
@@ -0,0 +1,119 @@
+---
+sidebar_position: 5
+title: /fix
+description: Bugfix workflow — investigate, RED test, fix, audit, done.
+---
+
+# /fix
+
+Bugfix workflow with RED-before-GREEN discipline. Investigates the bug, writes a failing test, fixes at the root cause, audits, finishes. No plan file, no approval mid-flow, no separate verify phase.
+
+Use `/fix` for bugs. Use [`/spec`](/docs/workflows/spec) for features and architectural changes — including bugfixes that warrant a full plan with approval and code review.
+
+```bash
+$ pilot
+> /fix "annotation persistence drops fields between save and reload"
+> /fix "off-by-one in pagination at boundary"
+> /fix "wrong default for max_retries"
+```
+
+`/fix` is **always quick**. If investigation reveals the bug is multi-component, architectural, or otherwise larger than a quick fix, `/fix` stops cleanly and tells you to re-invoke with `/spec`. It does not silently switch lanes — `/fix` means quick, `/spec` means the full workflow.
+
+## Workflow
+
+```
+Investigate  →  RED  →  Fix  →  Audit  →  Quality Gate  →  Done
+```
+
+### Investigate
+
+Trace the bug to `file:lineN — function() does X but should do Y` with **High** or **Medium** confidence.
+
+- Reproduce the bug. Restate symptom, trigger, expected behaviour.
+- Skim recent changes (`git log --oneline -10`).
+- Start with `codegraph_context(task=…)` for orientation. For local bugs, one or two targeted reads is enough — no full call-graph traversal.
+- For UI / async / race / timing bugs that don't surface from a static read: add temporary `SPEC-DEBUG:`-marked logs at component boundaries, trigger the bug, read the output, then proceed. The Audit step greps for the marker, so leftover diagnostics fail the audit.
+- State the root cause out loud before writing any test. If confidence stays Low: bail out.
+
+### RED — Write the Reproducing Test
+
+Encode `Currently → Expected` via an existing public entry point. Run it; it must **fail** with an error matching the symptom.
+
+A test that passes against buggy code doesn't encode the bug — re-investigate. A test that errors for unrelated reasons (import error, missing fixture) is not a valid signal.
+
+### Fix at the Root Cause
+
+Make the **minimal** change at the root cause. One change, one variable, one logical fix. No "while I'm here" cleanups. No bundled refactoring.
+
+Forbidden: broad new `try/except`, `if value is None: return default` at the caller when the bug is upstream, swallowed exceptions, silently normalised bad inputs.
+
+Re-run the reproducing test → must **pass**. Then run the test module(s) covering the fix file (fast, scoped). The full anti-regression suite runs once at the Quality Gate, not after every fix iteration.
+
+### Audit
+
+Single pass — replaces the eight-substep audit of the full lane:
+
+- **Scope sanity** — root-cause file IS in the diff, no unplanned files appear, diff is small.
+- **Symptom-patching grep** — `git diff | grep` for new `try/except`, swallowed returns, leftover `print`/`console.log` and `SPEC-DEBUG:` markers. Justify each match or revert.
+- **End-to-end verification — MANDATORY** — re-run the user's actual repro and capture concrete evidence. **A passing unit test does NOT prove the bug is fixed.** Skip is not an option, no exceptions.
+  - **UI bugs:** browser automation against the running app. 4-tier resolution: **Claude Code Chrome** → **Chrome DevTools MCP** → **playwright-cli** → **agent-browser**. Walk the user's repro steps, read the page, confirm correct behaviour.
+  - **CLI:** run the exact command the user ran, capture output + exit code.
+  - **API:** `curl` / HTTP client, capture status + the field that proves the fix.
+  - **Library / SDK:** `python -c '…'`, `node -e '…'`, or scratch script with the user's args, capture the returned value.
+  - **Background job:** trigger manually, read logs.
+
+Bare assertions ("looks fixed", "behaves correctly") are insufficient — the finalise step requires evidence in the report. If the symptom persists, the unit test is at the wrong layer: move the assertion up to the user's actual entry point and re-run RED → fix → audit.
+
+### Quality Gate
+
+Lint + types + build (when applicable), then the full test suite. If a far-from-the-fix test breaks, the bug has unintended cross-coupling — bail out.
+
+### Finalise
+
+- Worktree mode: bundle test + fix into one commit (`fix: <one-line>`).
+- Approval gate fires only if **Plan Approval** is enabled in Console Settings.
+- The completion report includes a mandatory **E2E** line documenting what was actually run and observed — not just "tests pass". Without it, the workflow is incomplete.
+- Console notification + report.
+
+## When to bail out — use `/spec` instead
+
+`/fix` stops and tells you to re-invoke with `/spec` when:
+
+- Bug spans 3+ files or 2+ components.
+- Root cause is architectural, not a single line.
+- Fix needs defense-in-depth at multiple layers.
+- Confidence stays Low — root cause can't be pinned to file:line.
+- Two quick-lane fix attempts have already failed.
+- Fix has non-trivial UI implications that warrant a recorded Verification Scenario.
+
+The full lane (`/spec`) adds: Behavior Contract, three-task structure, plan file with approval gate, Console annotation cycle, `cp`+`trap` revert-test proof in verify, iteration cap at 3.
+
+## Common issues
+
+| Symptom | What it means | What to do |
+| ------- | ------------- | ---------- |
+| Can't reproduce | Description is too vague or environment-dependent | Ask for exact steps, env, stack trace. Do not write a speculative fix. |
+| Test passes without the fix | Test doesn't encode the bug | Tighten the assertion or pick a more specific input. |
+| Fix breaks far-away tests | Cross-coupling beyond the quick lane | Bail out. Re-invoke with `/spec`. |
+| Reproducing test green but user still hits the bug | Test sits below the user's layer | Move the assertion to the user's actual entry point (API, browser, CLI). |
+| Two failed fix attempts | Architectural problem, not a fix problem | Bail out. The pattern needs reconsidering, not another patch. |
+
+## Configurable Toggles
+
+`/fix` honours the same Console Settings as `/spec`:
+
+| Toggle | Default | Effect when disabled |
+| ------ | ------- | -------------------- |
+| **Ask Questions** | On | Investigation skips clarifying questions and uses defaults. |
+| **Plan Approval** | On | The end-of-flow approval gate is skipped — fix is finalised immediately. |
+
+When both are off, `/fix` runs end-to-end with no user interaction. Worktree isolation is not honoured — use `/spec` if you want a worktree.
+
+## When to use `/spec` vs `/fix`
+
+| Use `/fix` | Use `/spec` |
+| ---------- | ----------- |
+| Something is broken | Building new functionality |
+| Bug fits in 1–2 files | Architecture decisions matter |
+| Root cause is locatable to a line/function | Multiple sub-systems involved |
+| Fix is small and contained | Work warrants a written plan + approval |
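
Review note (illustrative, not part of the patch): the "symptom-patching grep" in the Audit section of `fix.md` could be approximated by scanning only the added lines of a unified diff. This is a minimal sketch under stated assumptions: the pattern set is inferred from the Audit bullet list, and `audit_added_lines`, `SUSPECT_PATTERNS`, and the sample diff are hypothetical names and data, not the workflow's actual implementation.

```python
import re

# Patterns are assumptions drawn from the Audit bullet list, not the tool's real rules.
SUSPECT_PATTERNS = {
    "new try/except": re.compile(r"^\s*(try:|except\b)"),
    "debug print": re.compile(r"\b(print|console\.log)\("),
    "debug marker": re.compile(r"SPEC-DEBUG:"),
}


def audit_added_lines(diff_text: str) -> list[tuple[int, str, str]]:
    """Return (diff line number, label, text) for each suspicious added line."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines; skip the '+++ b/file' header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        for label, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(added):
                findings.append((number, label, added.strip()))
    return findings


# Hypothetical diff that wraps a call in a try/except and leaves a debug log behind.
sample_diff = "\n".join([
    "--- a/retry.py",
    "+++ b/retry.py",
    "@@ -1,3 +1,6 @@",
    " def fetch(url):",
    "+    try:",
    '+        print("SPEC-DEBUG: fetching", url)',
    "     return get(url)",
    "+    except Exception:",
    "+        return None",
])

for finding in audit_added_lines(sample_diff):
    print(finding)
```

Each match is a prompt to "justify or revert", as the Audit section puts it, not an automatic failure. A plain `git diff | grep` gives the same signal; the function form only makes the added-lines filter explicit.
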