Proposer QBFT boundary-layer instrumentation #1067

@jnhsigmap

Description

Implement the instrumentation taxonomy from #1066 concretely in the qbft_instance() loop, emitting Prometheus metrics and tracing events (exported over OTLP, currently a no-op without a collector) at QBFT lifecycle boundaries.

The goal is to satisfy all acceptance criteria of #921 without altering consensus semantics.

Sub-issue of #921. Depends on #1066.

Present Behaviour

The proposer QBFT instance loop (qbft_manager/src/instance.rs) runs without emitting any metrics or structured tracing about round advances, handoff budget, outcome, or decided round.

Expected Behaviour

This issue owns the implementation of:

  • anchor_proposer_qbft_decided_round histogram records the round on which each proposer instance decided or timed out.
  • anchor_proposer_round_advance_total counter (labelled by reason: timeout, f_plus_1_rc, rc_quorum, future_proposal) tracks round advances.
  • anchor_proposer_qbft_handoff_budget_seconds histogram records the slot budget remaining when QBFT starts, making it possible to distinguish a late start from round churn.
  • anchor_proposer_qbft_outcome_total counter (labelled by outcome: decided, max_round_timeout, channel_closed) tracks instance outcomes.
  • anchor_proposer_qbft_duration_seconds histogram records total instance duration.
  • A proposer_qbft_instance tracing span with structured checkpoint events covers the full lifecycle.
  • All observation is external (boundary-layer) with no changes to common/qbft consensus logic.
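Both labelled counters use small, closed label sets, which keeps metric cardinality bounded. A minimal std-only sketch of how those label sets might be modelled as Rust enums; the names `RoundAdvanceReason`, `Outcome` and `LabelledCounter` are hypothetical, and `LabelledCounter` is only an in-memory stand-in for a Prometheus counter vector, not the project's metrics API.

```rust
use std::collections::HashMap;

/// Hypothetical reason label for anchor_proposer_round_advance_total.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RoundAdvanceReason {
    Timeout,
    FPlus1Rc,
    RcQuorum,
    FutureProposal,
}

impl RoundAdvanceReason {
    /// Label value emitted as reason="...".
    pub fn as_label(self) -> &'static str {
        match self {
            RoundAdvanceReason::Timeout => "timeout",
            RoundAdvanceReason::FPlus1Rc => "f_plus_1_rc",
            RoundAdvanceReason::RcQuorum => "rc_quorum",
            RoundAdvanceReason::FutureProposal => "future_proposal",
        }
    }
}

/// Hypothetical outcome label for anchor_proposer_qbft_outcome_total.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Outcome {
    Decided,
    MaxRoundTimeout,
    ChannelClosed,
}

impl Outcome {
    pub fn as_label(self) -> &'static str {
        match self {
            Outcome::Decided => "decided",
            Outcome::MaxRoundTimeout => "max_round_timeout",
            Outcome::ChannelClosed => "channel_closed",
        }
    }
}

/// In-memory stand-in for a labelled counter (not a real Prometheus type).
#[derive(Default, Debug)]
pub struct LabelledCounter {
    counts: HashMap<&'static str, u64>,
}

impl LabelledCounter {
    /// Increments the series for `label`, creating it at zero if absent.
    pub fn inc(&mut self, label: &'static str) {
        *self.counts.entry(label).or_insert(0) += 1;
    }

    /// Current value for `label`; unseen labels read as 0.
    pub fn get(&self, label: &str) -> u64 {
        self.counts.get(label).copied().unwrap_or(0)
    }
}
```

Because every label comes from an exhaustive `match`, adding a new reason or outcome forces the label mapping to be updated at compile time.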

Implementation Detail

  1. Add handoff_budget_ms: Option<u64> field to QbftInitialization.
  2. Modify qbft_instance() to snapshot (state, round) before and after each recv arm and, for proposer instances, emit metrics and tracing events.
  3. Add a handoff_budget_ms parameter to the decide_instance signature and the ConsensusDecider trait.
  4. Compute handoff_budget_ms in the validator_store proposer path from slot_clock.slot_duration() and determine_slot_elapsed_ms() (the budget remaining in the slot); pass None for non-proposer callers.
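Steps 2 and 4 reduce to two small pure computations: the remaining slot budget, and the difference between the (state, round) snapshots around a recv arm. A sketch assuming the budget is slot duration minus elapsed time, saturating at zero; `compute_handoff_budget_ms`, `budget_seconds`, `Snapshot`, `InstanceState`, `rounds_advanced` and `just_decided` are hypothetical names, and `InstanceState` is a placeholder for the real QBFT state type. The snapshot diff only detects that a round advance happened; the reason label would still have to come from whichever arm fired.

```rust
use std::time::Duration;

/// Remaining slot budget in ms. Assumes budget = slot duration - elapsed;
/// saturates at 0 so a start after the slot end records 0 instead of wrapping.
pub fn compute_handoff_budget_ms(slot_duration: Duration, slot_elapsed_ms: u64) -> u64 {
    let slot_ms = slot_duration.as_millis() as u64;
    slot_ms.saturating_sub(slot_elapsed_ms)
}

/// Converts the budget to seconds for the *_seconds histogram.
pub fn budget_seconds(budget_ms: u64) -> f64 {
    budget_ms as f64 / 1000.0
}

/// Placeholder for the real QBFT instance state (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum InstanceState {
    AwaitingProposal,
    Prepare,
    Commit,
    Decided,
}

/// (state, round) snapshot taken before and after a recv arm.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Snapshot {
    pub state: InstanceState,
    pub round: u64,
}

/// Number of rounds advanced during one arm (0 if the round did not change).
pub fn rounds_advanced(before: Snapshot, after: Snapshot) -> u64 {
    after.round.saturating_sub(before.round)
}

/// True only on the arm that moved the instance into the decided state.
pub fn just_decided(before: Snapshot, after: Snapshot) -> bool {
    before.state != InstanceState::Decided && after.state == InstanceState::Decided
}
```

For example, with a 12 s slot and 4.5 s already elapsed, `compute_handoff_budget_ms(Duration::from_secs(12), 4_500)` returns 7500, which is recorded as 7.5 in the seconds histogram. Keeping these functions pure leaves the consensus loop itself untouched; the loop only takes snapshots and hands them to the observer.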

Acceptance criteria mapping

| #921 criterion | How this change satisfies it |
|---|---|
| Record decided/timed-out round | PROPOSER_QBFT_DECIDED_ROUND histogram |
| Round timeout and round advance visible | PROPOSER_ROUND_ADVANCE_TOTAL counter + checkpoint tracing events |
| Distinguish late start from round churn | PROPOSER_QBFT_HANDOFF_BUDGET_SECONDS histogram (low budget = late start) vs. PROPOSER_ROUND_ADVANCE_TOTAL (high count = churn) |
| No consensus semantics altered | All logic is observation-only; common/qbft gains only a getter |
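In practice the two signals would be compared in aggregate on a dashboard, but the reasoning in the late-start vs. round-churn row can be illustrated per instance. A sketch only: `SlowInstanceCause`, `classify_instance`, and both threshold parameters are hypothetical and do not come from #921 or #1066.

```rust
/// Hypothetical per-instance reading of the two signals.
#[derive(Debug, PartialEq, Eq)]
pub enum SlowInstanceCause {
    LateStart,
    RoundChurn,
    Both,
    Neither,
}

/// `low_budget_ms` and `high_advances` are operator-chosen thresholds
/// (hypothetical); a low handoff budget suggests a late start, while many
/// round advances suggest churn inside QBFT itself.
pub fn classify_instance(
    handoff_budget_ms: u64,
    round_advances: u64,
    low_budget_ms: u64,
    high_advances: u64,
) -> SlowInstanceCause {
    let late_start = handoff_budget_ms < low_budget_ms;
    let churn = round_advances >= high_advances;
    match (late_start, churn) {
        (true, true) => SlowInstanceCause::Both,
        (true, false) => SlowInstanceCause::LateStart,
        (false, true) => SlowInstanceCause::RoundChurn,
        (false, false) => SlowInstanceCause::Neither,
    }
}
```

The two inputs are independent, which is why the budget histogram and the round-advance counter together can separate the causes where either metric alone cannot.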

Scope

In scope: wiring metrics into the qbft_instance() loop, threading the new parameter through cross-crate signatures, and the tracing span with checkpoint events.

Out of scope: changing timeout formulas, leader selection, or round-change behavior; dashboard and alerting configuration; and the OTLP collector.
