Assurance Contract

ARIS audits emit machine-readable verdicts. The assurance axis decides whether those verdicts are advisory (draft mode) or load-bearing gates (submission mode). This contract is referenced by paper-writing, paper-claim-audit, citation-audit, proof-checker, and the external verifier (canonical name verify_paper_audits.sh; callers resolve the actual path via integration-contract.md §2).

Why a separate axis from `effort`

Historically effort (lite/balanced/max/beast) was conflated with audit strictness. The result: effort: beast did not guarantee mandatory audits ran — phases were gated by content detectors (e.g. if \begin{theorem} exists) and could silently skip. A user reported effort: beast produced a "draft-quality" paper with all three submission-gate audits skipped.

The fix is to split the concerns:

Axis	Controls	Default
`effort`	depth/cost (papers, rounds, ideation)	`balanced`
`assurance`	audit strictness — silent-skip-allowed vs verdict-required	derived from `effort` (see mapping)

Override either independently: — effort: balanced, assurance: submission is legal and means "normal depth, but every audit must emit a verdict before finalization."

Assurance Levels

`draft` — current behavior, no breakage

Audits run only if their content detector matches.
Silent skip allowed.
paper-writing Phase 6 produces a final report regardless.
For: rapid iteration, exploratory drafts, early-stage research.

`submission` — load-bearing audits

All mandatory audits must emit a verdict (one of the 6 below).
Silent skip is forbidden.
paper-writing Phase 6 invokes verify_paper_audits.sh (resolved per integration-contract.md §2); non-zero exit blocks Final Report.
The Final Report tags itself submission-ready: yes/no based on verifier output.
For: conference / journal submission, anything you'd put your name on.

Default Mapping (derived if `assurance` not given)

`effort`	implied `assurance`
`lite`	`draft`
`balanced`	`draft`
`max`	`submission`
`beast`	`submission`

This means a user passing only — effort: beast automatically gets full audit enforcement — matching their intent ("turn everything up"). Users wanting strict audits at lower depth pass — assurance: submission explicitly.

Verdict State Machine

Every mandatory audit must emit exactly one of these — never silent skip:

Verdict	Meaning	Audit ran?	Submission-blocking?
`PASS`	All checks passed	Yes	No
`WARN`	Issues found, none disqualifying	Yes	No
`FAIL`	Disqualifying issues found	Yes	Yes
`NOT_APPLICABLE`	Detector negative; nothing to audit (e.g., no theorems in paper, no `\cite`s, no numeric claims)	Audit phase ran, child audit invocation may have been skipped	No
`BLOCKED`	Audit should apply but prerequisites are missing or unsupported (e.g., paper has numeric claims but no `results/` directory; paper cites references but `.bib` missing)	Could not complete	Yes
`ERROR`	Audit invocation failed (network, timeout, malformed reviewer output)	Attempted but errored	Yes at submission

Why `NOT_APPLICABLE` is not the same as `SKIP`

NOT_APPLICABLE means the audit phase ran, the detector returned negative, and a verdict artifact was written documenting "we checked, there's nothing to verify." This is verifiable from outside the LLM — the artifact file exists.

A silent skip leaves no record. There's no way to distinguish "we checked and there was nothing" from "we forgot." This contract makes that distinction mandatory.

Why `BLOCKED` is more dangerous than `NOT_APPLICABLE`

BLOCKED means the audit should have run but cannot. Example: a paper claims accuracy = 89.2% but has no results/ directory to verify against. That's not "nothing to audit" — that's "we cannot verify a load-bearing claim." Treating this as SKIP masks the danger; BLOCKED surfaces it and blocks submission.

Required Audit Artifact Schema

Every mandatory audit must write a JSON artifact (and may also write a human-readable Markdown sibling). The JSON must contain at minimum:

{
  "audit_skill": "paper-claim-audit",       // citation-audit, proof-checker, etc.
  "verdict": "PASS",                         // one of the 6 above
  "reason_code": "all_numbers_match",        // skill-specific short string
  "summary": "Verified 23 numeric claims against 4 result files; no mismatches.",
  "audited_input_hashes": {
    "main.tex":                          "sha256:a3f8...",
    "sections/5.evidence.tex":           "sha256:b2d1...",
    "/Users/me/project/results/run_2026_04_19.json": "sha256:c9e4..."
  },
  "trace_path": ".aris/traces/paper-claim-audit/2026-04-21_run01/",
  "thread_id":  "019dae73-fc12-4ab8-...",
  "reviewer_model": "gpt-5.5",
  "reviewer_reasoning": "xhigh",
  "generated_at": "2026-04-21T14:23:01Z",
  "details": {
    // skill-specific structured data
  }
}

Field semantics:

audited_input_hashes — SHA256 of every file the audit consumed.
- Keys are paths relative to the paper directory (the argument passed to verify_paper_audits.sh) for files inside it, or absolute paths for files outside it (e.g. ../results/run.json is legal but /Users/me/project/results/run.json is more portable). Do NOT prefix in-paper files with paper/ — the verifier already resolves relative to the paper dir and paper/paper/main.tex will false-fail. The verifier rehashes the current files and flags STALE if any hash changed since the audit ran. (User edited main.tex after running paper-claim-audit? The next verifier run will catch it.)
trace_path — directory containing the full reviewer prompt + response pair, per review-tracing.md. Required for mandatory audits — not optional.
thread_id — Codex MCP thread ID, for forensic traceability.
reviewer_model + reviewer_reasoning — proves cross-family review invariant was honored.
generated_at — UTC ISO-8601 timestamp.

Verifier Contract

verify_paper_audits.sh <paper-dir> (canonical name; resolved per integration-contract.md §2) is the single source of truth for "are mandatory audits complete and current?" It must:

Locate the paper-writing manifest (which mandatory audits applied this run).
For each, check artifact JSON exists at expected path.
Validate artifact JSON against required-fields schema (above).
Verify verdict is one of the 6 allowed values.
Recompute SHA256 of every file in audited_input_hashes; flag STALE if any mismatches.
Verify trace_path exists and is non-empty.
Output a structured JSON report and exit 0 (all green) or 1 (any FAIL / BLOCKED / ERROR / STALE / missing artifact).

Phase 6 of paper-writing invokes the verifier; at assurance: submission, non-zero exit blocks Final Report generation.

Subskill Contract: "Always Emit, Never Block"

Child audit skills (paper-claim-audit, citation-audit, proof-checker) follow this contract:

Always emit a verdict artifact, even on detector-negative or error paths.
Never block the parent's flow themselves — they only emit verdicts.
The parent skill (paper-writing Phase 6 + verifier) decides whether a given verdict blocks finalization. This decision lives in one place (assurance axis + verifier), not duplicated across child skills.

Earlier wording in paper-claim-audit and citation-audit (e.g., "audit is advisory, never blocking") referred to this division of labor — but conflicted with paper-writing's declaration that they were "mandatory submission gates." This contract resolves the conflict: child = always emit; parent = decides blocking based on assurance level.

Examples

Theory paper, beast effort

— effort: beast    (implies assurance: submission)

proof-checker runs, audits theorems → PASS or WARN or FAIL
paper-claim-audit runs, finds numbers → PASS
citation-audit runs, audits refs → PASS
Verifier: all green
Final Report: submission-ready: yes

Position paper (no theorems, no numbers, no experiments), beast effort

— effort: beast    (implies assurance: submission)

proof-checker invoked → no theorems found → emits NOT_APPLICABLE
paper-claim-audit invoked → no numeric claims → emits NOT_APPLICABLE
citation-audit invoked → audits refs → PASS
Verifier: all green (NOT_APPLICABLE is not blocking)
Final Report: submission-ready: yes with note "no theorems / no numeric claims to audit"

Empirical paper missing raw results, beast effort

— effort: beast

proof-checker → NOT_APPLICABLE
paper-claim-audit invoked → finds claims like accuracy = 89.2% but results/ is empty → emits BLOCKED with reason_code no_raw_evidence
citation-audit → PASS
Verifier: exit 1 (BLOCKED is submission-blocking)
Final Report: refuses to finalize; surfaces "Mandatory audit BLOCKED: paper-claim-audit cannot verify numeric claims — no raw result files found. Add results/ or downgrade to — assurance: draft."

Stale audit (user edited paper after running audits)

User runs /paper-writing at beast → all audits PASS, files written
User edits sec/5.evidence.tex to change a number
User reruns the verifier (or re-finalizes)
Verifier rehashes → audited_input_hashes mismatch → STALE flag → exit 1
Final Report: refuses; instructs user to rerun paper-claim-audit and citation-audit before re-finalizing.

Backward Compatibility

Users on effort: balanced (default) get assurance: draft — identical current behavior, no breakage.
Users explicitly using effort: max or effort: beast automatically get assurance: submission — matching their intent.
Users wanting the old "beast = depth only, no audit enforcement" can pass — effort: beast, assurance: draft (explicit override). This combination is legal but discouraged for actual submissions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assurance Contract

Why a separate axis from `effort`

Assurance Levels

`draft` — current behavior, no breakage

`submission` — load-bearing audits

Default Mapping (derived if `assurance` not given)

Verdict State Machine

Why `NOT_APPLICABLE` is not the same as `SKIP`

Why `BLOCKED` is more dangerous than `NOT_APPLICABLE`

Required Audit Artifact Schema

Verifier Contract

Subskill Contract: "Always Emit, Never Block"

Examples

Theory paper, beast effort

Position paper (no theorems, no numbers, no experiments), beast effort

Empirical paper missing raw results, beast effort

Stale audit (user edited paper after running audits)

Backward Compatibility

See Also

FilesExpand file tree

assurance-contract.md

Latest commit

History

assurance-contract.md

File metadata and controls

Assurance Contract

Why a separate axis from effort

Assurance Levels

draft — current behavior, no breakage

submission — load-bearing audits

Default Mapping (derived if assurance not given)

Verdict State Machine

Why NOT_APPLICABLE is not the same as SKIP

Why BLOCKED is more dangerous than NOT_APPLICABLE

Required Audit Artifact Schema

Verifier Contract

Subskill Contract: "Always Emit, Never Block"

Examples

Theory paper, beast effort

Position paper (no theorems, no numbers, no experiments), beast effort

Empirical paper missing raw results, beast effort

Stale audit (user edited paper after running audits)

Backward Compatibility

See Also

Why a separate axis from `effort`

`draft` — current behavior, no breakage

`submission` — load-bearing audits

Default Mapping (derived if `assurance` not given)

Why `NOT_APPLICABLE` is not the same as `SKIP`

Why `BLOCKED` is more dangerous than `NOT_APPLICABLE`