[Feature]: Re-activate HMAC tool receipts — wiring stripped before #5168 merged, docs already describe the activated shape #6182

@singlerider

Summary

Re-activate HMAC tool receipts (#5168) — the cryptographic core landed but the runtime wiring was stripped before merge, so today every caller passes None and no receipt is ever generated. Docs at docs/book/src/security/tool-receipts.md describe the feature as shipped (and detailed: three config knobs, sample debug output, response-block format) but none of it actually fires.

Problem statement

The merged form of #5168 (squash commit 7f999d8) shipped:

  • ✅ crates/zeroclaw-runtime/src/agent/tool_receipts.rs: 254 lines of HMAC-SHA256 generation + verification, ephemeral-key design, constant-time Mac::verify_slice per the post-review fix
  • ✅ Function signatures threading Option<&ReceiptGenerator> through agent::loop_::run_tool_call_loop and tool_execution
  • ✅ Two production call sites in crates/zeroclaw-channels/src/orchestrator/mod.rs:3169 and crates/zeroclaw-runtime/src/tools/delegate.rs:1184, both passing None, // receipt_generator
  • ✅ Leak-detector regression test (leak_detector.rs:496, tool_receipts_not_redacted_as_high_entropy) confirming the zc-receipt- prefix passes through scrubbing unmodified
  • ✅ Docs page docs/book/src/security/tool-receipts.md (139 lines, includes a "Current state" table)

The branch commit ba16cacbf had real wiring on top of that core, but it was removed before the squash-merge — likely as a scope trim during PR review. The a3bd53d follow-up commit ("docs(security): add activation-pending caveats") confirmed the deactivation was deliberate at merge time. The activation pass was never filed or done. git grep tool_receipts crates/zeroclaw-config/ returns zero results — even the config struct never landed.

Net: docs claim a feature that the runtime does not implement. A user reading docs/book/src/security/tool-receipts.md would reasonably assume the model "cannot claim to have run a tool it didn't run" (literal docs language). It can; nothing is checking.

This is not a #5559 (workspace split) regression — git log -G "ctx\.receipt_generator" shows the active wiring only ever existed in branch commits, never in merged master.

Proposed solution

Re-land the wiring that #5168's PR review removed. Five concrete pieces, each independently reviewable:

1. Config struct: crates/zeroclaw-config/src/schema.rs

[agent.tool_receipts]
enabled = false                # off by default; opt-in
show_in_response = false       # append trailing "Tool receipts:" block to user replies
inject_system_prompt = true    # add the LLM-side instruction to echo receipts verbatim

Three booleans, gated under [agent]. Validates clean on Config::default(). Documented shape already matches docs/book/src/security/tool-receipts.md.
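As a rough sketch of what that struct could look like, here is a std-only version with the three documented knobs and their documented defaults. The struct name ToolReceiptsConfig comes from the Architecture impact table; the two `effective_*` helpers, and the idea that the secondary knobs are ignored while `enabled` is false, are assumptions made for illustration. The real struct would presumably also carry the serde derives used elsewhere in schema.rs, which are omitted here to keep the example dependency-free.

```rust
/// Hypothetical shape of the `[agent.tool_receipts]` section.
/// Field names follow the documented config; everything else is illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToolReceiptsConfig {
    /// Master switch; off by default so receipts are opt-in.
    pub enabled: bool,
    /// Append a trailing "Tool receipts:" block to user replies.
    pub show_in_response: bool,
    /// Add the receipt-echo instruction to the system prompt.
    pub inject_system_prompt: bool,
}

impl Default for ToolReceiptsConfig {
    fn default() -> Self {
        Self {
            enabled: false,
            show_in_response: false,
            inject_system_prompt: true,
        }
    }
}

impl ToolReceiptsConfig {
    /// Assumption: the response block is only rendered when receipts are enabled.
    pub fn effective_show_in_response(&self) -> bool {
        self.enabled && self.show_in_response
    }

    /// Assumption: the prompt addendum is only injected when receipts are enabled.
    pub fn effective_inject_system_prompt(&self) -> bool {
        self.enabled && self.inject_system_prompt
    }
}
```

With these defaults, a config that never mentions the section behaves exactly as today: `inject_system_prompt` defaults to true but has no effect until `enabled` is flipped.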

2. Generator instantiation: crates/zeroclaw-runtime/src/runtime.rs (or wherever the agent loop's runtime context is built)

let receipt_generator = if config.agent.tool_receipts.enabled {
    Some(zeroclaw_runtime::agent::tool_receipts::ReceiptGenerator::new())
} else {
    None
};

The generator is then threaded into the runtime context (in the original branch, ChannelRuntimeContext held it as an Option<ReceiptGenerator> field).

3. Caller wiring — flip the two None, // receipt_generator sites:

  • crates/zeroclaw-channels/src/orchestrator/mod.rs:3169 → ctx.receipt_generator.as_ref()
  • crates/zeroclaw-runtime/src/tools/delegate.rs:1184 → forward from parent context (TODO already in the comment)

Plus the per-message receipt collector (Mutex<Vec<String>>) the original branch had, passed as collected_receipts: Some(&collector).
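To make the data flow concrete, the following is a minimal sketch of the caller wiring under stated assumptions. The `ReceiptGenerator` here is a stand-in that is NOT cryptographic (the real one in tool_receipts.rs computes an HMAC-SHA256), and `generate`, `run_tool_call`, and `handle_message` are hypothetical names that do not reflect the actual signatures of run_tool_call_loop. Only the `Option<ReceiptGenerator>` field, the `.as_ref()` call, and the `Mutex<Vec<String>>` collector are taken from the issue text.

```rust
use std::sync::Mutex;

/// Stand-in for the real `ReceiptGenerator`. NOT cryptographic.
pub struct ReceiptGenerator;

impl ReceiptGenerator {
    pub fn new() -> Self {
        ReceiptGenerator
    }

    /// Placeholder only: the real module derives the receipt from an HMAC.
    pub fn generate(&self, tool: &str) -> String {
        format!("zc-receipt-placeholder-{}", tool)
    }
}

/// Hypothetical slice of the runtime context holding the optional generator.
pub struct ChannelRuntimeContext {
    pub receipt_generator: Option<ReceiptGenerator>,
}

/// Hypothetical stand-in for one tool execution inside the tool-call loop.
/// A receipt is recorded only when both a generator and a collector are passed.
pub fn run_tool_call(
    tool: &str,
    receipt_generator: Option<&ReceiptGenerator>,
    collected_receipts: Option<&Mutex<Vec<String>>>,
) -> String {
    let output = format!("{} ok", tool);
    if let (Some(generator), Some(collector)) = (receipt_generator, collected_receipts) {
        let receipt = generator.generate(tool);
        collector
            .lock()
            .unwrap()
            .push(format!("{}: {}", tool, receipt));
    }
    output
}

/// Handle one inbound message: create a fresh collector, run the tools,
/// and return whatever receipts were gathered for this message.
pub fn handle_message(ctx: &ChannelRuntimeContext, tools: &[&str]) -> Vec<String> {
    let collector = Mutex::new(Vec::new());
    for tool in tools {
        // `as_ref()` turns Option<ReceiptGenerator> into Option<&ReceiptGenerator>,
        // which is the shape the loop signature already accepts.
        run_tool_call(tool, ctx.receipt_generator.as_ref(), Some(&collector));
    }
    collector.into_inner().unwrap()
}
```

Because the collector is created per message, receipts from one turn cannot leak into the next reply, and a context built with `receipt_generator: None` yields an empty list without any extra branching at the call site.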

4. Response-block render — when show_in_response = true, after finalize_draft, render the collector's contents as a ---\nTool receipts:\n <name>: <receipt> block and send as a follow-up message on the same channel + thread.
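A possible rendering helper for that block is sketched below. The function name `render_receipt_block` is hypothetical, the entries are assumed to already be `<name>: <receipt>` strings as collected per message, and returning `None` for an empty collector (so no follow-up message is sent) is an assumption rather than documented behavior.

```rust
/// Render the trailing "Tool receipts:" block, or `None` when nothing
/// should be sent. Each entry is expected to be "<name>: <receipt>".
pub fn render_receipt_block(show_in_response: bool, receipts: &[String]) -> Option<String> {
    // Assumption: an empty collector produces no follow-up message at all.
    if !show_in_response || receipts.is_empty() {
        return None;
    }
    let mut block = String::from("---\nTool receipts:");
    for entry in receipts {
        block.push_str("\n ");
        block.push_str(entry);
    }
    Some(block)
}
```

Returning an `Option` keeps the decision to send a follow-up message in one place: the caller only has to forward `Some(block)` to the same channel and thread.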

5. System-prompt addendum — when inject_system_prompt = true, start_channels appends to the system prompt:

Tool Execution Receipts

Every tool result includes a [receipt: ...] field. This is a cryptographic signature proving the tool actually executed. You must include the receipt verbatim when referencing tool results. Do not modify, omit, or fabricate receipts. A missing or invalid receipt indicates a fabricated tool call.

(Exact wording from the original branch commit; safe to copy as the LLM-side contract.)
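The append step itself is small. A sketch, assuming a hypothetical `build_system_prompt` helper and a blank line as the separator between the base prompt and the addendum (the actual start_channels code path may assemble the prompt differently):

```rust
/// Receipt-echo instruction, copied from the wording quoted in this issue.
const RECEIPT_INSTRUCTION: &str = "Tool Execution Receipts\n\n\
Every tool result includes a [receipt: ...] field. This is a cryptographic \
signature proving the tool actually executed. You must include the receipt \
verbatim when referencing tool results. Do not modify, omit, or fabricate \
receipts. A missing or invalid receipt indicates a fabricated tool call.";

/// Hypothetical helper: append the instruction only when the knob is set.
pub fn build_system_prompt(base: &str, inject_system_prompt: bool) -> String {
    if !inject_system_prompt {
        return base.to_string();
    }
    // Assumption: one blank line separates the base prompt from the addendum.
    format!("{}\n\n{}", base.trim_end(), RECEIPT_INSTRUCTION)
}
```

Keeping the instruction in a single constant means the prompt text and any test that asserts on it cannot drift apart.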

Non-goals / out of scope

  • Persistent audit database of receipts — listed as "Planned" in the docs current-state table; separate Issue.
  • Cross-session receipt verification — explicitly "Not planned" in the docs (ephemeral-key design choice).
  • Receipt verification at the model boundary (rejecting model output that contains an invalid receipt) — separate concern from generation; deserves its own Issue once generation is back.
  • A new [agent.tool_receipts] section that diverges from what the docs already describe — keep the documented shape.

Alternatives considered

  • Strip the docs page and the dead module instead — preserves consistency at the cost of a security feature we explicitly designed and tested. Rejected because the threat (LLM fabricating tool calls) is real and this is the cheapest defense; the activation work is hours, not weeks.
  • Land verification + generation in one Issue — wider scope, harder review. The verification side (rejecting model output that fabricates receipts) needs its own design discussion (do we hard-fail the turn, surface a warning, log-only, etc.); generation is a clean activation.
  • Make the config field on by default — opt-in is safer for the activation Issue; a follow-up can flip the default once the response-block UX is validated.

Acceptance criteria

  • [agent.tool_receipts] config struct lands in crates/zeroclaw-config/src/schema.rs with enabled, show_in_response, inject_system_prompt (matching docs/book/src/security/tool-receipts.md)
  • start_channels instantiates ReceiptGenerator::new() when config.agent.tool_receipts.enabled and threads it into the runtime context
  • Both production call sites (orchestrator/mod.rs, tools/delegate.rs) pass the real generator instead of None when enabled
  • When show_in_response = true, user-visible replies on every channel include the trailing Tool receipts: block
  • When inject_system_prompt = true, the system prompt carries the receipt-echo instruction
  • RUST_LOG=zeroclaw::agent=debug produces the Tool receipt generated tool=... receipt=zc-receipt-... debug line documented in the existing tool-receipts docs page
  • Existing leak-detector test (tool_receipts_not_redacted_as_high_entropy) still passes — receipts surface verbatim through scrub_credentials
  • Integration test: enabled-true, run a tool, assert the response on a stub channel contains the receipt; enabled-false, assert no receipt anywhere
  • Docs docs/book/src/security/tool-receipts.md "Current state" table updated: show_in_response stays "Shipped" (and is now actually true), inject_system_prompt flips from "In flight" to "Shipped"
  • CHANGELOG-next entry under Security or Agent & Runtime

Architecture impact

| Surface | Files | Nature |
| --- | --- | --- |
| Config | crates/zeroclaw-config/src/schema.rs | New ToolReceiptsConfig struct under AgentConfig; default enabled=false |
| Runtime context | crates/zeroclaw-runtime/src/runtime.rs (or wherever the orchestrator's context is built) | Add receipt_generator: Option<ReceiptGenerator> field; instantiate when enabled |
| Orchestrator | crates/zeroclaw-channels/src/orchestrator/mod.rs | Replace None with ctx.receipt_generator.as_ref(); add per-message Mutex<Vec<String>> collector; render response block when show_in_response |
| Delegate | crates/zeroclaw-runtime/src/tools/delegate.rs | Forward receipt_generator from parent context (resolves the existing TODO "thread from parent in future" comment) |
| System prompt | start_channels setup path | Append receipt-echo instruction when inject_system_prompt |
| Docs | docs/book/src/security/tool-receipts.md | Flip Current-state rows; everything else stays since the docs already describe the activated shape |
| Tests | crates/zeroclaw-channels/ integration tests | New on/off coverage for the response-block + receipt-presence assertions |

No new external dependencies. No schema migrations (default false means existing configs deserialize unchanged).

Risk and rollback

Low to medium. The crypto core has been in-tree and tested for ~9 months; the wiring is the simple piece. Main risks:

  • LLM doesn't echo receipts — instruction in system prompt may not be reliably followed across providers. Mitigation: show_in_response block lets the user see receipts in the visible reply regardless of model behavior.
  • Channel-formatting interaction — appending a separate message after finalize_draft interacts with debouncing / threading on each channel. Mitigation: integration test per channel; the original branch handled this for at least Telegram and Slack.
  • Default-off means rollback is enabled = false in config — no code revert needed for users hitting issues.

Code rollback: git revert <merge-sha> is clean since the activation lands in one PR.

Breaking change?

No. Default enabled = false. Existing configs deserialize unchanged. Existing user-visible behavior unchanged unless the user opts in.

Metadata

Assignees

No one assigned

    Labels

    • agent (Auto scope: src/agent/** changed.)
    • enhancement (New feature or request)
    • runtime (Auto scope: src/runtime/** changed.)
    • security (Auto scope: src/security/** changed.)

    Type

    No type

    Projects

    Status

    In Progress

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests