QA Runbook

Audience: dora developers (human or agent) running QA locally.

Purpose: tell you which command to run, how to read the output, and what to do when something fails.

Deep dives: see plan-agentic-qa-strategy.md for the strategy and rationale; this document is the operational reference.

Coverage by capability: see testing-capabilities.md (#1633) when you need to know which tests cover a feature before touching it.

Validation bar by change class: see agentic-qa-policy.md (#1634) to know how much validation a change needs.

1. TL;DR

# Before every commit (~15 seconds)
make qa-fast

# Before every push (~5-10 minutes)
make qa-full

# Target Tier 1 gate -- stronger than today's CI (~15 minutes)
make qa-deep

# Overnight run on a beefy machine -- Tier 2 equivalent (~3-4 hours; see Section 2)
make qa-nightly

# Before tagging a release
make qa-release-gate

# Deliberate test-quality audit (NOT every nightly; takes 10-18 hrs)
make qa-mutation-audit

# Run all smoke-eligible example dataflows end-to-end (~15-20 min, orthogonal to ladder)
# Skips CUDA/ROS2/webcam/C++/interactive examples -- run `scripts/smoke-all.sh -h` for the SKIP list
make qa-examples
# or scope it: make qa-examples ARGS="--rust-only"

If you only remember one command: make qa-fast.

First-time setup on a fresh clone:

make qa-install     # installs cargo-audit, cargo-deny, cargo-llvm-cov, cargo-mutants, cargo-semver-checks
pip install diff-cover   # optional; enables diff coverage gate
rustup component add miri --toolchain nightly   # optional; for unsafe-code analysis

2. What each target runs

Target	Runs	Budget	When to use
`make qa-fast`	fmt + clippy + audit + unwrap-budget + typos	~15 s	Pre-commit
`make qa-full`	`qa-fast` + full test suite + coverage	~5-10 min	Pre-push
`make qa-deep`	`qa-full` + mutation testing on diff + semver	~15 min	Target Tier 1 local gate (stronger than today's CI: adds coverage, adversarial, mutants, semver)
`make qa-tier1`	alias for `qa-deep`	—	Back-compat; prefer `qa-deep`
`make qa-nightly`	`qa-deep` + proptest@1000 + miri (if installed) + example-smoke (in scratch venv with `-e apis/python/node`) + hub-smoke + ci-nightly-jobs	~3-4 hours	Full parity with `.github/workflows/nightly.yml` after the #1716 rebalance: 19 test jobs total. example-smoke covers the 4 example-backed GHA jobs (smoke-suite, log-sinks, service-action, streaming); hub-smoke covers the Hub e2e job (tests/hub-smoke.rs); `scripts/qa/ci-nightly-jobs.sh` drives the 14 remaining with platform-aware dispatch (record-replay, cluster-smoke, topic-and-top, cpu-affinity [Linux], redb-backend, daemon-reconnect [Linux], state-reconstruction, test-cross-platform [macOS+Windows], examples, cli-tests, bench-example, cross-check, ros2-bridge [Linux+ROS2], msrv). Requires both `uv` and Python 3.12 — both preflighted; fails fast with a specific install hint for whichever is missing (`curl -LsSf https://astral.sh/uv/install.sh \| sh` for uv, `uv python install 3.12` for the interpreter). example-smoke installs workspace Python bindings into the scratch venv to match the GHA Python setup (avoids PyPI drift, #1710). Green local run on platform X predicts a green CI nightly for platform X's jobs; jobs that can't run on the dev's OS SKIP cleanly. Does NOT include full-repo mutation testing (see `qa-mutation-audit`).
`make qa-release-gate`	`qa-deep` + semver	~15 min	The automatable subset of Tier 3. Non-automatable: security audit + dogfood + migration validation (see strategy doc §7)
`make qa-mutation-audit`	`cargo-mutants --full` on 6 critical crates	~10-18 hrs	Deliberate test-quality audit, not every nightly
`make qa-examples`	`scripts/smoke-all.sh` -- all smoke-eligible example dataflows end-to-end (skips CUDA/ROS2/webcam/C++/interactive)	~15-20 min	When you want actual dataflows exercised. Orthogonal to ladder -- qa-fast/full/deep all `--exclude dora-examples`. Pass `ARGS="--rust-only"` etc.
`make qa-fmt`	`cargo fmt --all -- --check`	~2 s	Spot-check
`make qa-clippy`	`cargo clippy --all -- -D warnings` (excluding Python)	~1 min	After mechanical edits
`make qa-audit`	`cargo audit` + `cargo deny check`	~10 s	After bumping deps
`make qa-unwrap`	count `.unwrap()` / `.expect(` in production code	~2 s	After adding unwraps
`make qa-test`	`cargo test --all` (excluding Python)	~3-5 min	After code changes
`make qa-coverage`	`cargo llvm-cov` (writes `lcov.info`)	~5 min	To see coverage locally
`make qa-mutants`	`cargo mutants --in-diff origin/main` on critical crates	~5-30 min	To verify tests actually detect bugs
`make qa-semver`	`cargo semver-checks` vs last tag	~1-2 min	Before bumping published crate versions

All targets call scripts under scripts/qa/. The scripts are the source of truth — if something looks wrong, read the script.

3. Reading failures and fixing them

3.1 `fmt` failed

Cause: your edits don't match rustfmt. Fix: cargo fmt --all. Re-run make qa-fast.

3.2 `clippy` failed

Cause: a clippy lint fires on your code. Common ones in dora:

collapsible_if — nested if lets can use && chain syntax (Rust 2024).
let_and_return
Needless .clone()

Fix:

Read the error — clippy points at file:line.
Try cargo clippy --fix --allow-dirty --allow-staged for mechanical fixes.
Re-run. If it still fails, address manually.
Do not add #[allow(...)] without a comment explaining why.
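As a rough illustration of the first two lints, the sketch below shows code in the shape clippy accepts, with the rejected shape kept in comments. The function names are hypothetical and are not taken from dora; the `collapsible_if` case uses plain booleans rather than `if let` chains so that it compiles on any edition.

```rust
// `clippy::let_and_return` fires when a binding is returned on the next line:
//
//     let total = a + b;
//     total
//
// Returning the expression directly satisfies the lint.
pub fn sum(a: u32, b: u32) -> u32 {
    a + b
}

// `clippy::collapsible_if` fires on directly nested conditions:
//
//     if enabled {
//         if pending > 0 {
//             flushed = pending;
//         }
//     }
//
// Joining the conditions with `&&` is the mechanical fix.
pub fn flush_count(enabled: bool, pending: usize) -> usize {
    let mut flushed = 0;
    if enabled && pending > 0 {
        flushed = pending;
    }
    flushed
}
```

Both rewrites are behavior-preserving, which is why `cargo clippy --fix` can usually apply them without review.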

3.3 `audit` failed (cargo-audit)

Cause: one of our transitive dependencies has a new RustSec advisory.

Step 1 — classify the advisory:

Real vulnerability (Severity: Medium/High/Critical): must fix before merging. Follow step 2.
Unmaintained crate warning (most common): can be waived if fixing is expensive. Follow step 3.

Step 2 — fix a real vulnerability:

# Often cargo update alone resolves it:
cargo update -p <crate> --precise <fixed-version>

# If the vulnerable crate is a transitive dep of something else that pins it,
# you may need to upgrade the parent crate. Read the "Dependency tree" in
# the cargo-audit output to find the path.

Verify: make qa-audit returns exit 0.

Step 3 — waive an unmaintained warning:

Open deny.toml, add the advisory ID to advisories.ignore with a comment and review date:

[advisories]
ignore = [
    "RUSTSEC-2025-XXXX",  # foo-crate unmaintained; transitive via bar; review 2026-10
]

Run make qa-audit again — should pass.

3.4 `unwrap-budget` failed

Cause: your changes added a .unwrap() or .expect( in non-test code.

Step 1 — identify the new unwraps:

rg --type rust '\.unwrap\(\)|\.expect\(' libraries/ binaries/ apis/ \
  -g '!**/tests/**' -g '!**/benches/**' -g '!**/examples/**'

Compare against git diff to see which ones are yours.

Step 2 — decide:

Replace with proper error handling: the preferred fix. Use ? on Result, or ok_or() / ok_or_else() on Option.
Replace with .expect("reason") if genuinely infallible: still counts toward the budget but at least documents the invariant.
Bump the budget: acceptable only when (a) the unwrap is genuinely infallible by construction (e.g., writing to a String) AND (b) you justify it in the commit message. Update .unwrap-budget in the same commit.

Note: the budget ratchet is intentionally asymmetric. You can reduce the number freely; any increase needs justification.
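The three options in Step 2 might look like the following sketch. The helper names (`lookup`, `port_from`, `label`) and the `PORT` key are made up for illustration and do not come from the dora codebase.

```rust
use std::collections::HashMap;
use std::fmt::Write as _;

// Before: `env.get(key).unwrap()` panics when the key is missing.
// After: `ok_or_else` turns the `Option` into a `Result` with context.
pub fn lookup<'a>(env: &'a HashMap<String, String>, key: &str) -> Result<&'a str, String> {
    env.get(key)
        .map(String::as_str)
        .ok_or_else(|| format!("missing key `{key}`"))
}

// Before: `lookup(..).unwrap().parse().unwrap()` panics on either failure.
// After: `?` propagates the lookup error and `map_err` annotates the parse error.
pub fn port_from(env: &HashMap<String, String>) -> Result<u16, String> {
    let raw = lookup(env, "PORT")?;
    raw.trim()
        .parse::<u16>()
        .map_err(|e| format!("invalid PORT `{raw}`: {e}"))
}

// Infallible by construction: formatting into a `String` does not fail,
// so `.expect` documents the invariant. It still counts toward the budget.
pub fn label(id: u32) -> String {
    let mut out = String::new();
    write!(out, "node-{id}").expect("writing to a String cannot fail");
    out
}
```

Only `label` would need a budget bump with a justification in the commit message; the other two functions remove unwraps outright.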

3.5 `test` failed

Cause: you broke a test.

Fix: run the failing test in isolation for clearer output:

cargo test -p <crate> <test_name> -- --nocapture

If the test was wrong and the code is right, fix the test. If the code was wrong, fix the code. Don't fix the test to match broken code.

3.6 `coverage` (soft) flagged

Cause: the diff coverage gate (if running on a PR) found that fewer than 70% of your new or changed lines are covered by tests.

Fix: add tests that exercise the new code paths. The gate is soft on main — it only fails on PRs. Locally you can see the uncovered lines via:

make qa-coverage
open target/llvm-cov/html/index.html   # macOS (use xdg-open on Linux); needs a prior `cargo llvm-cov --html`
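For intuition, the 70% threshold can be modeled as a ratio over changed lines. This is a simplified sketch, not how diff-cover is implemented: the function names are hypothetical, and the model assumes every changed line is executable, whereas a real tool only considers lines that appear in the coverage report.

```rust
use std::collections::BTreeSet;

// Percentage of changed lines that also appear in the covered set.
// An empty diff is treated as fully covered.
pub fn diff_coverage(changed: &BTreeSet<u32>, covered: &BTreeSet<u32>) -> f64 {
    if changed.is_empty() {
        return 100.0;
    }
    let hit = changed.intersection(covered).count();
    hit as f64 * 100.0 / changed.len() as f64
}

// The gate flags a diff whose coverage is below 70%.
pub fn passes_gate(percent: f64) -> bool {
    percent >= 70.0
}
```

With four changed lines of which three are covered, the result is 75% and the gate passes; six covered out of ten gives 60% and is flagged.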

3.7 `mutation` escaped

Cause: cargo-mutants found a mutation that no test detected — meaning your tests are incomplete for the mutated code path.

Step 1 — understand what the mutation is:

The output shows lines like:

libraries/core/src/types.rs:168:18: replace == with != in types_match

Read: "cargo-mutants changed == to != on line 168, re-ran the tests, and they all still passed."

Step 2 — write a test that catches it:

Construct an input where the mutated version produces a different output from the original. For the example above: a test that asserts types_match("a", "b") == false would catch the ==→!= mutation (unmutated returns false, mutated returns true).
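A minimal sketch of such a test follows. The `types_match` body here is a hypothetical stand-in that only compares two strings; the real function in libraries/core may take different arguments and do more.

```rust
// Hypothetical stand-in for the function named in the mutants output.
pub fn types_match(expected: &str, actual: &str) -> bool {
    expected == actual
}

#[cfg(test)]
mod tests {
    use super::types_match;

    #[test]
    fn mismatched_types_do_not_match() {
        // Original returns false; the `==` -> `!=` mutant returns true.
        assert!(!types_match("a", "b"));
    }

    #[test]
    fn identical_types_match() {
        // Original returns true; the mutant returns false.
        assert!(types_match("a", "a"));
    }
}
```

Either test alone catches this mutant. Keeping both covers the true and false branches, which tends to catch neighboring mutations, such as replacing the whole body with a constant.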

Step 3 — if it's a genuine equivalent mutant: document it in .cargo/mutants.toml with a detailed comment explaining why the mutation produces semantically identical behavior. See the types_match ||→&& example already in the file.

Do not waive mutations just to make the gate pass. The point of the gate is to surface weak tests.

3.8 `semver` (soft) flagged

Cause: cargo-semver-checks found a breaking change in a publishable crate's public API since the last tag.

Fix:

If the change is intentional: make sure the crate's version gets the bump SemVer requires before release, which is a new major version, or a new minor version while the crate is still 0.x.
If unintentional: revert the breaking change.

The gate is soft during 0.x development — it only warns. It becomes a hard gate after the 1.0 release per plan-dora-1.0-consolidation.md.
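The bump a breaking change needs follows Cargo's compatibility convention, where the left-most non-zero component decides compatibility. The sketch below encodes that rule; the `Version` struct and `next_breaking` function are illustrative names, not part of cargo-semver-checks.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

// Smallest version that is allowed to carry a breaking API change:
// bump the left-most non-zero component and reset the ones after it.
pub fn next_breaking(v: Version) -> Version {
    if v.major > 0 {
        Version { major: v.major + 1, minor: 0, patch: 0 }
    } else if v.minor > 0 {
        Version { major: 0, minor: v.minor + 1, patch: 0 }
    } else {
        Version { major: 0, minor: 0, patch: v.patch + 1 }
    }
}
```

Under this rule, a crate at 0.4.2 with a breaking change goes to 0.5.0, while a crate at 1.2.3 goes to 2.0.0.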

4. Running the adversarial LLM review (local only today)

Prerequisites: either codex (OpenAI Codex CLI) or claude (Claude Code CLI) installed and logged in.

# Auto: detects backend, diffs vs origin/main
./scripts/qa/adversarial.sh

# Explicit backend
./scripts/qa/adversarial.sh --backend claude
./scripts/qa/adversarial.sh --backend codex

# Review a specific diff file
./scripts/qa/adversarial.sh --diff my-patch.diff

# Review against a different base
./scripts/qa/adversarial.sh --base HEAD~5

Output goes to stdout and /tmp/adversarial-review-<short-sha>.md. Read the review and respond to each flagged issue.

The prompt template is at scripts/qa/adversarial-prompt.md. Tune it if the review is too noisy or too lax.

CI integration is pending — requires ANTHROPIC_API_KEY or equivalent set as a repo secret.

5. Running miri on the unsafe hotspots

Prerequisites:

rustup component add miri --toolchain nightly

Run:

# metadata.rs — known-good target with focused unit tests
cargo +nightly miri test -p dora-core metadata::tests

# Add more targets as they gain focused unit tests:
# cargo +nightly miri test -p dora-coordinator-store
# cargo +nightly miri test -p dora-arrow-convert

Do NOT run miri on shared-memory-server — its tests call libc's shm_open which miri does not support. Every test aborts with "unsupported operation". See plan-agentic-qa-strategy.md Section T2.3 for the explanation and the long-term fix (Zenoh SHM migration).

6. Running property tests

Proptest strategies live inside #[cfg(test)] mod tests in the target source files. They run automatically under cargo test.

cargo test -p dora-message ws_protocol::tests::prop_

To increase the case count for a focused hunt:

PROPTEST_CASES=20000 cargo test -p dora-message ws_protocol::tests::prop_

Failed proptest cases are saved in libraries/message/proptest-regressions/ and should be committed to source control — they re-run as fast regression checks on every test run.
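To show what a `prop_` test typically asserts, here is a std-only stand-in for a round-trip property. It is not proptest: it uses a small hand-written xorshift generator, has no shrinking, and does not write regression files. The codec (`encode`, `decode`) is hypothetical and unrelated to the real ws_protocol types.

```rust
// Hypothetical length-prefixed codec, used only for illustration.
pub fn encode(payload: &[u8]) -> Vec<u8> {
    let mut out = (payload.len() as u32).to_be_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}

pub fn decode(frame: &[u8]) -> Option<Vec<u8>> {
    if frame.len() < 4 {
        return None;
    }
    let (header, body) = frame.split_at(4);
    let len = u32::from_be_bytes([header[0], header[1], header[2], header[3]]) as usize;
    // Reject frames whose body length disagrees with the header.
    (body.len() == len).then(|| body.to_vec())
}

// Tiny deterministic generator standing in for proptest's input strategies.
pub struct XorShift(u64);

impl XorShift {
    pub fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

// Property: decode(encode(p)) == p for every generated payload.
// Returns the first failing payload, if any.
pub fn roundtrip_holds(cases: u32, seed: u64) -> Result<(), Vec<u8>> {
    let mut rng = XorShift(seed.max(1)); // xorshift state must be non-zero
    for _ in 0..cases {
        let len = (rng.next_u64() % 64) as usize;
        let payload: Vec<u8> = (0..len).map(|_| rng.next_u64() as u8).collect();
        if decode(&encode(&payload)).as_deref() != Some(payload.as_slice()) {
            return Err(payload);
        }
    }
    Ok(())
}
```

The `cases` argument plays the role that `PROPTEST_CASES` plays for the real tests: more cases widen the search at the cost of run time.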

7. Running focused mutation tests (when investigating a file)

# Mutation test a single file
# (Timeout 45s mirrors scripts/qa/mutants.sh default. Was 120s until the
# qa-nightly split; real passing tests fit comfortably under 45s and the
# shorter cap prevents broken-channel mutations from burning 2m each.)
cargo mutants --package <pkg> --file <path/to/file.rs> --jobs 4 --timeout 45

# List mutations without running (fast)
cargo mutants --package <pkg> --file <path/to/file.rs> --list

# Only mutations in the current diff (fast, for PR checks)
cargo mutants --in-diff origin/main --package <pkg>

Tip: mutation runs are expensive. When investigating a specific bug hypothesis, scope to a single file.

Tip: the full critical-crate chain takes about 17 hours (dora-core, dora-message, dora-coordinator-store, dora-coordinator, dora-daemon) with workspace-scoped tests. Budget accordingly.

8. The unwrap budget ratchet explained

.unwrap-budget contains a single integer: the maximum allowed count of .unwrap() and .expect( in production code.

The number can only go down over time.
Your PR's unwrap count must be ≤ the stored budget.
If your PR reduces the count, commit the smaller number in the same PR.
If your PR increases the count, either (a) fix the new unwraps or (b) bump the number with justification in the commit message.

The script counts everything under libraries/, binaries/, apis/ except:

Files inside tests/, benches/, examples/ directories
Files named tests.rs (submodule test files)
Lines after the first #[cfg(test)] in any source file

See the KNOWN LIMITATION notes in scripts/qa/unwrap-budget.sh for edge cases.
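The counting rule and the ratchet can be approximated in a few lines. This is a sketch under stated assumptions, not a port of scripts/qa/unwrap-budget.sh: it counts occurrences (the script may count matching lines instead), handles a single file's text, and ignores the directory and tests.rs exclusions. The function names are hypothetical.

```rust
// Count `.unwrap()` and `.expect(` in one file's source, ignoring
// every line from the first `#[cfg(test)]` onward.
pub fn count_unwraps(source: &str) -> usize {
    source
        .lines()
        .take_while(|line| !line.contains("#[cfg(test)]"))
        .map(|line| line.matches(".unwrap()").count() + line.matches(".expect(").count())
        .sum()
}

// Ratchet: fail when the count exceeds the stored budget; otherwise
// return the number that may be committed as the new, smaller budget.
pub fn check_budget(count: usize, budget: usize) -> Result<usize, String> {
    if count > budget {
        Err(format!("unwrap count {count} exceeds budget {budget}"))
    } else {
        Ok(count)
    }
}
```

Cutting off at the first `#[cfg(test)]` is a line-based heuristic, so production code placed after a test module in the same file would be missed by a check like this one.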

9. Where to look when something is weird

Symptom	Look here
Local run differs from CI	`.github/workflows/ci.yml` vs `scripts/qa/*.sh` — they should call the same commands
Tool not installed	`make qa-install`
CI cache stale	Clear cache in GitHub Actions UI, or bump `Swatinem/rust-cache` key
Mutation score looks wrong	Check `.cargo/mutants.toml` `test_workspace = true` is set
`coverage.sh` hangs	Usually one test is stuck — run with `cargo test` directly to find it
Unwrap budget counts test code	The script should exclude test code; if it doesn't, report a bug in the script

10. Adding a new gate

If you want to add a new QA check:

Write the check as a shell script under scripts/qa/<name>.sh.
Make it executable, runnable standalone, and fail-fast.
Add a target to Makefile (qa-<name>).
Add it to scripts/qa/all.sh in the appropriate tier (fast / full / tier1).
Add a CI job to .github/workflows/ci.yml that calls make qa-<name>.
Document it in this runbook (Section 3).

11. Related documents

qa-followups.md — open items tracker: everything the POC deferred, organized by effort and trigger. Check here for "what's left to do"
plan-agentic-qa-strategy.md — the full strategy, including the three-tier model, case studies, and meta-findings
qa-baseline-2026-04-07.md — metrics snapshot as of the POC session
plan-dogfood-campaign.md — pre-release dogfood campaign spec
plan-fault-injection.md — chaos scenarios queued for implementation
qa-poc-report-2026-04-09.md — comprehensive POC report for outside readers
