Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .devcontainer/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,9 @@ RUN rm -rf /var/lib/apt/lists/* && \
vim \
netcat-openbsd \
socat \
bubblewrap \
iptables \
ipset \
chromium && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
3 changes: 3 additions & 0 deletions .devcontainer/devcontainer.json
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@
"${localWorkspaceFolderBasename}"
],
"workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}",
"mounts": [
"source=claude-code-config-${devcontainerId},target=/root/.claude,type=volume"
],
"customizations": {
"vscode": {
"extensions": [
Expand Down
55 changes: 44 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.
**Pilot Shell is different.** Every component solves a real problem:

- **`/spec`** — plans, implements, and verifies features end-to-end with TDD
- **`/fix`** — bugfix workflow with RED-before-GREEN discipline; bails out when complexity exceeds the standard fix lane
- **`/prd`** — brainstorm ideas into clear requirements through with optional deep research
- **Quality hooks** — enforce linting, formatting, type checking, and tests as quality gates
- **Context engineering** — preserves decisions and knowledge across sessions
Expand Down Expand Up @@ -107,6 +108,8 @@ curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/uninstal

Pilot Shell works inside Dev Containers. Copy the [`.devcontainer`](https://github.com/maxritter/pilot-shell/tree/main/.devcontainer) folder from this repository into your project, adapt it to your needs (base image, extensions, dependencies), and run the installer inside the container. The installer auto-detects the container environment and skips system-level dependencies like Homebrew.

For tighter isolation when working with untrusted code, combine the dev container with Claude Code's [`/sandbox`](https://code.claude.com/docs/en/sandboxing) — `bubblewrap`, `socat`, `iptables`, and `ipset` are pre-installed in the Dockerfile so it works out of the box on Linux. See Anthropic's [development containers](https://code.claude.com/docs/en/devcontainer) and [sandboxing](https://code.claude.com/docs/en/sandboxing) docs for hardening patterns (egress allowlist, managed settings, persistent volumes).

</details>

<details>
Expand All @@ -130,16 +133,16 @@ Pilot Shell works inside Dev Containers. Copy the [`.devcontainer`](https://gith

Just chat — no plan, no approval gate. [Quick mode](https://pilot-shell.com/docs/workflows/quick-mode) is the default: quality hooks and TDD enforcement still apply, best for small tasks and exploration. For anything that needs a plan, use `/spec` — not Claude Code's built-in plan mode.

### /spec — Spec-Driven Development
### /spec — Spec-Driven Development (features)

**[`/spec`](https://pilot-shell.com/docs/workflows/spec) replaces Claude Code's built-in plan mode** (Shift+Tab). It provides a complete planning workflow with TDD, verification, and code review — use `/spec` instead of plan mode for all planned work.
**[`/spec`](https://pilot-shell.com/docs/workflows/spec) replaces Claude Code's built-in plan mode** (Shift+Tab) for new features, refactoring, and architectural work. It provides a complete planning workflow with TDD, verification, and code review.

Features, bug fixes, refactoring — describe it and `/spec` handles the rest. Auto-detects whether it's a feature or a bugfix and adapts the workflow. Specs are saved to `docs/plans/` and visible in the Console's **Specification** tab.
For bugs, use [`/fix`](https://pilot-shell.com/docs/workflows/fix) (see below). Specs are saved to `docs/plans/` and visible in the Console's **Specification** tab.

```bash
pilot
> /spec "Add user authentication with OAuth and JWT tokens" # → feature mode
> /spec "Fix the crash when deleting nodes with two children" # → bugfix mode (auto-detected)
> /spec "Add user authentication with OAuth and JWT tokens"
> /spec "Migrate the REST API to GraphQL"
```

```
Expand All @@ -163,18 +166,46 @@ Full exploration workflow for new functionality, refactoring, or architectural c

</details>

### /fix — Bugfix Workflow

**[`/fix`](https://pilot-shell.com/docs/workflows/fix) is the bugfix command.** Investigate the bug, write the failing test, fix at the root cause, single-pass audit, done. No plan file, no approval mid-flow, no separate verify phase.

```bash
pilot
> /fix "annotation persistence drops fields between save and reload"
> /fix "off-by-one in pagination at boundary"
> /fix "wrong default for max_retries"
```

```
Investigate → RED → Fix → Audit → Quality Gate → Done
```
Comment thread
coderabbitai[bot] marked this conversation as resolved.
Outdated

If investigation reveals the bug is multi-component or architectural, `/fix` stops cleanly and tells you to re-invoke with `/spec`. `/fix` is always quick; `/spec` is the full workflow.

<details>
<summary><b>Bugfix Mode</b></summary>
<summary><b>How <code>/fix</code> works</b></summary>

Investigation-first workflow for targeted fixes. Finds the root cause before touching any code.
For local bugs. Single file, obvious-once-traced root cause. No plan file, no approval mid-flow, no separate verify phase. RED-before-GREEN discipline still enforced — bugfixes without a failing test don't ship.

**Investigate:** Reproduces the bug → traces backward through the call chain to find the **root cause** at a specific `file:line` → compares against working code patterns → states the fix with confidence level. If 3+ hypotheses fail, escalates as an architectural problem.
- **Investigate:** Reproduce the bug → trace to root cause at `file:line` with `codegraph_context` + targeted reads → state confidence (High/Medium required to proceed). For UI / async / race bugs that don't surface from a static read, add temporary `SPEC-DEBUG:`-marked logs at component boundaries before tracing.
- **RED:** Write the failing test via an existing public entry point → run, must fail with the documented symptom
- **Fix:** Minimal change at the root cause. Symptom patches (`try/except` hiding the bug, swallowed returns) are forbidden. Targeted test module re-runs between fix iterations — full suite runs once at the Quality Gate, not per-fix-task.
- **Audit:** Single-pass scope sanity + symptom-patching grep + **mandatory end-to-end verification** — re-runs the user's actual repro against the running program (Claude Code Chrome → Chrome DevTools MCP → playwright-cli → agent-browser for UI; CLI/API/REPL for non-UI). A passing unit test alone is never accepted as proof; concrete evidence (command + observation) is required in the completion report.
- **Quality Gate:** Lint + types + build + full anti-regression suite, once
- **Bail-out:** If investigation reveals the bug is multi-component, architectural, needs defense-in-depth at multiple layers, or two fix attempts have failed, `/fix` stops cleanly and tells you to re-invoke with `/spec`. It does not silently switch lanes.

**Test-Before-Fix:** Writes a regression test that FAILS on current code → implements the minimal fix at the root cause → verifies all tests pass. Defense-in-depth validation at multiple layers when the bug involves data flowing through shared code paths.
</details>

**Verify:** Lightweight verification — regression test confirmation → full test suite → lint + type check → quality checks. No review sub-agents — the regression test proves the fix works, the full suite proves nothing else broke.
<details>
<summary><b>How <code>/spec</code> handles bugs</b></summary>

When you type `/spec "<bug description>"`, the full bugfix workflow runs — for bugs that warrant a written plan, approval, code review, and the full verify ceremony.

**Why this matters:** Root cause investigation prevents "fix one thing, break another." The regression test locks in the fix. No formal notation overhead — just trace, test, fix, verify.
- **Behavior Contract:** every plan pins down `Given / When / Currently / Expected / Anti-regression` — the invariant the fix must produce and the behavior it must not break
- **Three uniform tasks** (always, regardless of bug size): Write Reproducing Test (RED) → Implement Fix at Root Cause → Quality Gate
- **Verify audit:** always-on `cp`+`trap` revert-test (proves the reproducing test would genuinely fail without the fix — rules out retroactive rubber-stamp tests) + root-cause-at-source audit (flags symptom patches and caller-side workarounds) + original-symptom re-check — no sub-agents, tests carry the proof
- **Iteration cap at 3:** after three failed verify cycles, the workflow stops and asks if the bug is architectural rather than letting you loop forever

</details>

Expand Down Expand Up @@ -719,6 +750,8 @@ On **Team**, every developer runs `pilot customize install <source>` once and st

Yes. Copy the `.devcontainer` folder from this repository into your project, adapt it to your needs (base image, extensions, dependencies), and install Pilot Shell inside the container. Everything works the same — hooks, rules, MCP servers, persistent memory, and the Console dashboard all run inside the container. This is a great option for teams that want a consistent, reproducible development environment.

For tighter isolation when working with untrusted code, layer Claude Code's [`/sandbox`](https://code.claude.com/docs/en/sandboxing) on top — the Dockerfile pre-installs `bubblewrap`, `socat`, `iptables`, and `ipset` so it works out of the box. See Anthropic's [development containers](https://code.claude.com/docs/en/devcontainer) and [sandboxing](https://code.claude.com/docs/en/sandboxing) docs for the hardening patterns.

</details>

<details>
Expand Down
4 changes: 4 additions & 0 deletions console/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -35,8 +35,12 @@
"devDependencies": {
"@git-diff-view/file": "^0.1.1",
"@git-diff-view/react": "^0.1.1",
"@happy-dom/global-registrator": "^20.9.0",
"@iconify/react": "^6.0.2",
"@tailwindcss/vite": "^4.1.18",
"@testing-library/dom": "^10.4.1",
"@testing-library/react": "^16.3.2",
"@testing-library/user-event": "^14.6.1",
"@types/bun": "^1.3.8",
"@types/cookie-parser": "^1.4.10",
"@types/cors": "^2.8.19",
Expand Down
Binary file modified console/src/ui/viewer/hooks/useSettings.ts
Binary file not shown.
Binary file modified console/src/ui/viewer/views/Settings/index.tsx
Binary file not shown.
Binary file modified console/src/ui/viewer/views/Spec/annotation/PlanAnnotator.tsx
Binary file not shown.
Binary file modified console/src/ui/viewer/views/Spec/annotation/useAnnotation.ts
Binary file not shown.
Binary file modified console/src/ui/viewer/views/Spec/index.tsx
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added console/tests/ui/spec-section-rendering.test.ts
Binary file not shown.
3 changes: 2 additions & 1 deletion console/tsconfig.json
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,8 @@
"include": [
"src/**/*.ts",
"src/**/*.tsx",
"tests/**/*.ts"
"tests/**/*.ts",
"tests/**/*.tsx"
],
"exclude": [
"node_modules",
Expand Down
7 changes: 7 additions & 0 deletions docs/docusaurus/docs/getting-started/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,13 @@ When enabled, Codex provides an independent adversarial review during `/spec` pl

Pilot Shell works inside Dev Containers. Copy the `.devcontainer` folder from the [Pilot Shell repository](https://github.com/maxritter/pilot-shell/tree/main/.devcontainer) into your project, adapt it to your needs (base image, extensions, dependencies), and run the installer inside the container. The installer auto-detects the container environment and skips system-level dependencies like Homebrew.

For tighter isolation when working with untrusted code, layer Claude Code's [`/sandbox`](https://code.claude.com/docs/en/sandboxing) on top — `bubblewrap`, `socat`, `iptables`, and `ipset` are pre-installed in the Dockerfile so it works out of the box on Linux.

### Further reading

- [Claude Code · Development containers](https://code.claude.com/docs/en/devcontainer) — Anthropic's reference container, persistent volumes, organization policy, network egress, the `--dangerously-skip-permissions` flag.
- [Claude Code · Sandboxing](https://code.claude.com/docs/en/sandboxing) — Seatbelt (macOS) and bubblewrap (Linux/WSL2), `/sandbox` modes, `allowedDomains`, filesystem allow/deny rules, security limitations.

## Install Specific Version

```bash
Expand Down
119 changes: 119 additions & 0 deletions docs/docusaurus/docs/workflows/fix.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
---
sidebar_position: 5
title: /fix
description: Bugfix workflow — investigate, RED test, fix, audit, done.
---

# /fix

Bugfix workflow with RED-before-GREEN discipline. Investigates the bug, writes a failing test, fixes at the root cause, audits, finishes. No plan file, no approval mid-flow, no separate verify phase.

Use `/fix` for bugs. Use [`/spec`](/docs/workflows/spec) for features and architectural changes — including bugfixes that warrant a full plan with approval and code review.

```bash
$ pilot
> /fix "annotation persistence drops fields between save and reload"
> /fix "off-by-one in pagination at boundary"
> /fix "wrong default for max_retries"
```

`/fix` is **always quick**. If investigation reveals the bug is multi-component, architectural, or otherwise larger than a quick fix, `/fix` stops cleanly and tells you to re-invoke with `/spec`. It does not silently switch lanes — `/fix` means quick, `/spec` means the full workflow.

## Workflow

```
Investigate → RED → Fix → Audit → Quality Gate → Done
```
Comment thread
coderabbitai[bot] marked this conversation as resolved.
Outdated

### Investigate

Trace the bug to `file:lineN — function() does X but should do Y` with **High** or **Medium** confidence.

- Reproduce the bug. Restate symptom, trigger, expected behaviour.
- Skim recent changes (`git log --oneline -10`).
- Start with `codegraph_context(task=…)` for orientation. For local bugs, one or two targeted reads is enough — no full call-graph traversal.
- For UI / async / race / timing bugs that don't surface from a static read: add temporary `SPEC-DEBUG:`-marked logs at component boundaries, trigger the bug, read the output, then proceed. Step 4 audit greps the marker — leftover diagnostics fail the audit.
- State the root cause out loud before writing any test. If confidence stays Low: bail out.

### RED — Write the Reproducing Test

Encode `Currently → Expected` via an existing public entry point. Run it; it must **fail** with an error matching the symptom.

A test that passes against buggy code doesn't encode the bug — re-investigate. A test that errors for unrelated reasons (import error, missing fixture) is not a valid signal.

### Fix at the Root Cause

Make the **minimal** change at the root cause. One change, one variable, one logical fix. No "while I'm here" cleanups. No bundled refactoring.

Forbidden: broad new `try/except`, `if value is None: return default` at the caller when the bug is upstream, swallowed exceptions, silently normalised bad inputs.

Re-run the reproducing test → must **pass**. Then run the test module(s) covering the fix file (fast, scoped). The full anti-regression suite runs once at the Quality Gate, not after every fix iteration.

### Audit

Single pass — replaces the eight-substep audit of the full lane:

- **Scope sanity** — root-cause file IS in the diff, no unplanned files appear, diff is small.
- **Symptom-patching grep** — `git diff | grep` for new `try/except`, swallowed returns, leftover `print`/`console.log` and `SPEC-DEBUG:` markers. Justify each match or revert.
- **End-to-end verification — MANDATORY** — re-run the user's actual repro and capture concrete evidence. **A passing unit test does NOT prove the bug is fixed.** Skip is not an option, no exceptions.
- **UI bugs:** browser automation against the running app. 4-tier resolution: **Claude Code Chrome** → **Chrome DevTools MCP** → **playwright-cli** → **agent-browser**. Walk the user's repro steps, read the page, confirm correct behaviour.
- **CLI:** run the exact command the user ran, capture output + exit code.
- **API:** `curl` / HTTP client, capture status + the field that proves the fix.
- **Library / SDK:** `python -c '…'`, `node -e '…'`, or scratch script with the user's args, capture the returned value.
- **Background job:** trigger manually, read logs.

Bare assertions ("looks fixed", "behaves correctly") are insufficient — the finalise step requires evidence in the report. If the symptom persists, the unit test is at the wrong layer: move the assertion up to the user's actual entry point and re-run RED → fix → audit.

### Quality Gate

Lint + types + build (when applicable), then the full test suite. If a far-from-the-fix test breaks, the bug has unintended cross-coupling — bail out.

### Finalise

- Worktree mode: bundle test + fix into one commit (`fix: <one-line>`).
- Approval gate fires only if **Plan Approval** is enabled in Console Settings.
- The completion report includes a mandatory **E2E** line documenting what was actually run and observed — not just "tests pass". Without it, the workflow is incomplete.
- Console notification + report.

## When to bail out — use `/spec` instead

`/fix` stops and tells you to re-invoke with `/spec` when:

- Bug spans 3+ files or 2+ components.
- Root cause is architectural, not a single line.
- Fix needs defense-in-depth at multiple layers.
- Confidence stays Low — root cause can't be pinned to file:line.
- Two quick-lane fix attempts have already failed.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Unify the failed-attempt threshold (currently contradictory).

The page states both two and three failed attempts as the bail-out threshold. Please pick one value and use it consistently.

Also applies to: 99-99

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/docusaurus/docs/workflows/fix.md` at line 86, The document currently
contradicts itself by mentioning both "Two quick-lane fix attempts have already
failed." and a separate reference to "three" as the bail-out threshold; pick one
threshold (e.g., two or three) and update every occurrence to that single value
— specifically replace the sentence fragment "Two quick-lane fix attempts have
already failed." and any other mentions of "three" or the bail-out threshold
(including the referenced "99-99" occurrence) so the text consistently uses the
chosen numeric threshold throughout.

- Fix has non-trivial UI implications that warrant a recorded Verification Scenario.

The full lane (`/spec`) adds: Behavior Contract, three-task structure, plan file with approval gate, Console annotation cycle, `cp`+`trap` revert-test proof in verify, iteration cap at 3.

## Common issues

| Symptom | What it means | What to do |
| ------- | ------------- | ---------- |
| Can't reproduce | Description is too vague or environment-dependent | Ask for exact steps, env, stack trace. Do not write a speculative fix. |
| Test passes without the fix | Test doesn't encode the bug | Tighten the assertion or pick a more specific input. |
| Fix breaks far-away tests | Cross-coupling beyond the quick lane | Bail out. Re-invoke with `/spec`. |
| Reproducing test green but user still hits the bug | Test sits below the user's layer | Move the assertion to the user's actual entry point (API, browser, CLI). |
| Three failed fix attempts | Architectural problem, not a fix problem | Bail out. The pattern needs reconsidering, not another patch. |

## Configurable Toggles

`/fix` honours the same Console Settings as `/spec`:

| Toggle | Default | Effect when disabled |
| ------ | ------- | -------------------- |
| **Ask Questions** | On | Investigation skips clarifying questions and uses defaults. |
| **Plan Approval** | On | The end-of-flow approval gate is skipped — fix is finalised immediately. |

When both are off, `/fix` runs end-to-end with no user interaction. Worktree isolation is not honoured — use `/spec` if you want a worktree.

## When to use `/spec` vs `/fix`

| Use `/fix` | Use `/spec` |
| ---------- | ----------- |
| Something is broken | Building new functionality |
| Bug fits in 1–2 files | Architecture decisions matter |
| Root cause is locatable to a line/function | Multiple sub-systems involved |
| Fix is small and contained | Work warrants a written plan + approval |
Loading
Loading