Skip to content

Commit 03f945a

Browse files
committed
fix: tighten /fix and refine spec/prd/create-skill/setup-rules skills
- /fix: rationalizations table, bisect/perf/stack-trace investigation hooks, fast-signal lock-in, single-site defense-in-depth, pre-report checklist, optional architectural post-mortem flag. - /spec-plan: code-first rule before user questions, conditional File Structure section for 4+ task plans, no-placeholders self-check before reviewer launch. - /spec-implement: anti-pattern callout against horizontal slicing. - /prd: HARD-GATE against pre-approval implementation, code-first rule. - /create-skill: behavioural-vs-cosmetic edit taxonomy, rationalization-resistant design pattern (scoped to discipline skills). - /setup-rules: explainer-first prompt guideline. - approval flows: collapse two No options into one No, I have feedback (annotations auto-save, no ready handshake).
1 parent b3fd8ef commit 03f945a

15 files changed

Lines changed: 159 additions & 13 deletions

File tree

pilot/skills/create-skill/orchestrator.md

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,3 +8,14 @@ model: opus
88
# /create-skill — Skill Creator
99

1010
**Create a reusable skill.** Provide a topic or workflow description, and this command explores the codebase, gathers relevant patterns, and builds a well-structured skill interactively with you. If no topic is given, it evaluates the current session for extractable knowledge.
11+
12+
## Editing existing skills
13+
14+
For NEW skills, Step 6 runs the with-skill vs baseline subagent comparison — no skill ships without it.
15+
16+
For EDITS, classify the change first:
17+
18+
- **Behavioural** — adds/removes a step, changes a rule, reorders critical sections, edits the description, changes Iron Laws / red flags / rationalization tables, modifies trigger keywords. **Re-run Step 6 (or write 2–3 prompts if none exist).** These changes shift trigger accuracy and step compliance.
19+
- **Cosmetic** — typo, prose polish, link fix, formatting, example clarification with no semantic shift. Skip the test re-run.
20+
21+
When uncertain, treat as behavioural. Skill changes that go unverified are how skill quality drifts.

pilot/skills/create-skill/steps/07-anti-patterns.md

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,3 +20,13 @@
2020
| **Overtriggering** | Skill loads for irrelevant queries, users disabling it | Add negative triggers ("Do NOT use for..."), be more specific |
2121
| **Instructions not followed** | Claude loads skill but ignores steps | Instructions too verbose (condense), buried (move critical to top), or ambiguous (use exact commands) |
2222
| **Large context issues** | Skill seems slow or responses degraded | Move detailed docs to `references/`, keep SKILL.md under 5,000 words |
23+
24+
### Rationalization-Resistant Design (for discipline skills only)
25+
26+
For skills that enforce discipline (TDD, verification, root-cause-before-fix), use baseline testing (Step 6.2) to capture verbatim agent rationalizations and counter them with three patterns:
27+
28+
- **Excuse → Reality table** — each rationalization the agent produced in baseline, paired with the technical counter.
29+
- **Iron Laws + Red Flags list** — short numbered laws and a symptom-checklist that points back to "STOP, return to step N".
30+
- **Closed loopholes** — don't just state the rule, forbid specific workarounds (e.g. "Don't keep code as 'reference'. Delete means delete.").
31+
32+
Skip these for technique/reference skills — they don't have rationalization pressure to defend against.

pilot/skills/fix/orchestrator.md

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,17 @@ Bugfix workflow with TDD. Investigate the bug, write the failing test, fix at th
3030
5. STOP WHEN OVER YOUR HEAD — multi-component / architectural bugs need /spec.
3131
```
3232

33+
### Common rationalizations
34+
35+
| Excuse | Reality |
36+
|--------|---------|
37+
| "Emergency, no time for process" | Systematic is faster than guess-and-check thrashing. |
38+
| "Issue is simple, don't need to trace" | Simple bugs have root causes too — tracing one file is fast. |
39+
| "I'll write the test after confirming the fix" | Tests written after pass immediately and prove nothing. RED first or it isn't TDD. |
40+
| "One more fix attempt" (after 2 failures) | Two failed quick-lane attempts means the lane is wrong. Bail to /spec. |
41+
| "Looks fixed, tests pass" | E2E evidence required. Unit-pass is not user-pass. |
42+
| "Quick patch now, investigate later" | This IS the quick lane. If you can't trace it now, bail to /spec — don't patch blind. |
43+
3344
---
3445

3546
## Critical Constraints

pilot/skills/fix/steps/01-investigate.md

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,8 @@ git log --oneline -10 -- <suspected_file_or_dir> 2>/dev/null
1616

1717
Look for the commit that introduced the bug. If recent, read that diff. If nothing obvious, skip.
1818

19+
**Bisect when `git log` doesn't reveal it.** If the bug appeared between two known-good and known-bad states and the suspect commit isn't obvious, run `git bisect start <bad> <good>` then `git bisect run <test-cmd>` against the reproducing test you'll write in Step 2 — this pinpoints the introducing commit automatically. Skip when the surface area is small enough that a single read finds it.
20+
1921
### 1.3 Trace to root cause
2022

2123
**Start with `codegraph_context(task="<bug description>")`** — single call, returns entry points and related symbols. Read it carefully.
@@ -37,6 +39,17 @@ For bugs that don't surface clearly through stack traces or static reading — U
3739

3840
**Mark every temporary log with `SPEC-DEBUG:`** (e.g. `console.log("SPEC-DEBUG: filters=", filters)`). Step 3.5 greps for this marker — any unremoved match fails the diff sanity check.
3941

42+
**Unknown caller?** If the bug is in shared code reached from many sites and you don't know which caller triggers it, capture the call chain inline:
43+
44+
```js
45+
// SPEC-DEBUG: who is calling this with bad input?
46+
console.error("SPEC-DEBUG:", { args, stack: new Error().stack });
47+
```
48+
49+
Run, read the stack, identify the offending caller, then trace upward to the original trigger.
50+
51+
**Performance regressions are different.** Value-logging is the wrong tool — replace it with baseline measurement: timing harness, `performance.now()`, profiler, or query plan / `EXPLAIN`. Establish current vs expected timing first, then bisect against that signal (Step 1.2). Measure first, fix second.
52+
4053
Skip 1.4 when the stack trace already names the failing line, or when a static read of the file is enough to see the bug. Skip is the default.
4154

4255
### 1.5 State the root cause
@@ -47,6 +60,18 @@ Out loud, in one sentence to the user, before writing any test:
4760
4861
If confidence is Low: bail out. Don't guess in the quick lane.
4962

63+
### 1.6 Lock in a fast signal before Step 2
64+
65+
Your reproducing signal — what runs in <30s and definitively shows fail/pass — must be **fast and deterministic** before you write the fix. For most bugs that's the unit test you're about to write in Step 2. For UI / integration / async bugs the unit-test seam may be wrong, and your real signal is a `curl`, CLI invocation, or headless-browser command (the same one you'll run in Step 4 E2E).
66+
67+
Whichever it is, sharpen it now:
68+
69+
- **Slow loop (>30s)?** Narrow the test scope, skip unrelated setup, cache fixtures. A flaky 30s loop is the slowest path to a fix.
70+
- **Flaky?** Pin time, seed RNG, isolate filesystem, freeze network. For non-deterministic bugs, raise the reproduction rate (loop the trigger 50–100×, parallelise) until it's debuggable — a 1% flake is not.
71+
- **Wrong symptom?** The signal must fail with the **user's** reported symptom, not a different failure that happens to be nearby. Wrong bug = wrong fix.
72+
73+
If you cannot get a fast deterministic signal after one pass of sharpening, that's a bail-out trigger — tell the user to use `/spec`. Don't hypothesise into a flaky loop.
74+
5075
### Red flags — STOP and re-investigate
5176

5277
- "Quick fix for now, investigate later" → STOP, this IS the quick lane.

pilot/skills/fix/steps/03-fix.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ If the buggy data flows from upstream, fix upstream. Red flags in the diff:
1414
- Early return that hides wrong state from the caller.
1515
- Renamed/suppressed log lines that previously surfaced the bug.
1616

17-
A defensive layer is occasionally legitimate (defense-in-depth). When it is, document it explicitly in the Step 6 summary. Otherwise revert it.
17+
**Quick-lane fixes are single-site.** If you find yourself adding validation at 2+ layers (entry + business + storage, etc.) to make the bug "structurally impossible," you're in `/spec` territory — bail out. Defense-in-depth is a feature of `/spec`, not `/fix`. The one exception: a single boundary check that prevents the same root cause from re-entering through one other obvious path — document it in the Step 6 summary.
1818

1919
### 3.3 Run the reproducing test — it MUST pass
2020

pilot/skills/fix/steps/06-finalise.md

Lines changed: 24 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -44,7 +44,20 @@ Handle:
4444

4545
Best-effort — don't block on failure.
4646

47-
### 6.4 Report
47+
### 6.4 Pre-report verification checklist
48+
49+
⛔ Walk every box before writing the report. Missing any one = not done — return to the relevant step.
50+
51+
- [ ] Reproducing test passes (Step 3.3 fresh run, this message).
52+
- [ ] Full anti-regression suite green (Step 5.2 fresh run).
53+
- [ ] E2E executed against the actual program with concrete evidence captured (Step 4).
54+
- [ ] `git diff | grep -E "SPEC-DEBUG|^\\+.*\\b(console\\.log|print\\()"` returns nothing (no leftover instrumentation).
55+
- [ ] Diff is small and every changed line traces to the bug (lineage rule).
56+
- [ ] Worktree mode: single bundled `fix:` commit. Non-worktree: changes ready, no commit yet.
57+
58+
If any box is unchecked, do not write the report and do not ask for approval — fix the gap first.
59+
60+
### 6.5 Report
4861

4962
```
5063
Bugfix complete — <bug>.
@@ -57,4 +70,14 @@ Run /clear before starting new work — this resets context while keeping projec
5770

5871
The `E2E:` line is **mandatory** — it documents that the actual program was exercised, not just the unit tests.
5972

73+
### 6.6 Post-mortem flag (optional, one line)
74+
75+
Ask once, now that you have more information than when you started: **what would have prevented this bug?** If the answer is architectural — no clean test seam, hidden coupling between modules, validation absent at the boundary the bad data crossed, repeated near-miss in the same area — name it as a `/spec` follow-up candidate in one line:
76+
77+
```
78+
Follow-up (architectural): <one-line description> — candidate for /spec.
79+
```
80+
81+
Skip when the answer is "nothing structural, it was a one-line typo / off-by-one / wrong default." Don't manufacture follow-ups.
82+
6083
ARGUMENTS: $ARGUMENTS

pilot/skills/prd/orchestrator.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,12 @@ model: opus
88

99
# /prd - Generate Product Requirements Documents
1010

11+
<HARD-GATE>
12+
Do NOT invoke `/spec`, `/spec-plan`, `/spec-implement`, write any code, scaffold any project, or take any implementation action until you have written a PRD and the user has approved it. This applies to EVERY idea regardless of perceived simplicity.
13+
14+
`/prd`'s output is a written PRD at the path determined in Step 6 (write-prd). The terminal state is offering hand-off to `/spec` and waiting for the user. The skill does not invoke implementation skills directly.
15+
</HARD-GATE>
16+
1117
**Strategic thought partner and brainstorming surface** — turns vague ideas into concrete Product Requirements Documents (PRDs) through one-on-one conversation, with optional research. Produces a PRD that can be handed off directly to `/spec` for implementation.
1218

1319
**Use `/prd` when:**

pilot/skills/prd/steps/03-clarify.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,8 @@
88

99
**⛔ Skip obvious questions.** Do not ask anything already answered by the one-line idea, the codebase exploration in Step 1, or earlier answers in this conversation. The goal is to surface what the user hasn't thought about yet, not to collect a standard intake form.
1010

11+
**⛔ Code-first rule.** Before each question, ask "can the codebase answer this?" If yes — read the code. Use `codegraph_context`, `codegraph_search`, `codegraph_explore`, or Probe to resolve "how does X currently work / where does Y live / what's the existing pattern for Z". Only ask the user about things code can't tell you: purpose, priorities, audience, constraints, scope boundaries, behavioural expectations not yet encoded. Asking the user about facts the code already encodes wastes their time and signals you didn't explore.
12+
1113
Coverage areas (ask only where genuinely unclear):
1214
- **Purpose** — what's the core outcome the user wants?
1315
- **Users** — who benefits from this? What's their workflow today?

pilot/skills/setup-rules/steps/01-reference.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
### Guidelines
66

77
- **Always use AskUserQuestion** when asking the user anything
8+
- **Explainer-first prompts** — when prompting the user to choose between options, lead with one short paragraph that names what the term means, why these skills need it, and what changes if they pick differently. Then show the choices and the recommended default. Assume the user does not know the term — never present `paths` frontmatter, MCP scoping, or rule-vs-skill distinctions as a multiple-choice without context. Walk them through one section at a time, not all sections at once.
89
- **Read before writing** — check existing rules before creating
910
- **Write concise rules** — every word costs tokens in context
1011
- **Idempotent** — running multiple times produces consistent results

pilot/skills/spec-bugfix-plan/steps/07-approval.md

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,11 @@
1111
~/.pilot/bin/pilot notify plan_approval "Bugfix Plan Ready" "<plan-slug> — annotate in Console or approve here" --plan-path "<plan_path>" 2>/dev/null || true
1212
```
1313
1. Summarize: symptom → root cause → fix approach → task structure
14-
2. AskUserQuestion: "Yes, proceed" | "No, let me edit" | "No, I'll annotate in the Console"
14+
2. AskUserQuestion:
15+
- "Yes, proceed" — Approve as-is and start spec-implement
16+
- "No, I have feedback" — I've annotated in the Console or edited the plan file; process my feedback
17+
18+
The user can pause at this prompt, annotate in the Console's Specifications tab (auto-saves), or edit the plan file directly, then pick option 2. No "ready" handshake required.
1519
3. **Yes:** Set `Approved: Yes`, invoke `Skill(skill='spec-implement', args='<plan-path>')`
16-
**No (edit or annotate):** Tell user to edit the plan or annotate in the Console Specifications tab — annotations auto-save. Say "ready" when done. Re-run Step 6 (check for annotation feedback), re-read plan, ask again. **Other:** Incorporate, re-ask.
20+
**No, I have feedback:** Re-run Step 6 (process Console annotations), re-read the plan file (in case the user edited it), then return to Step 7 and ask again.
21+
**Other free-text feedback:** Incorporate the changes into the plan, then re-ask with a fresh AskUserQuestion.

0 commit comments

Comments
 (0)