Description
Implement the instrumentation taxonomy from #1066 in the qbft_instance() loop, emitting Prometheus metrics and tracing events (OTLP, currently a no-op without a collector) at QBFT lifecycle boundaries.
The aim is to satisfy all acceptance criteria of #921 without altering consensus semantics.
Sub-issue of #921. Depends on #1066.
Present Behaviour
The proposer QBFT instance loop (qbft_manager/src/instance.rs) runs without emitting any metrics or structured tracing about round advances, handoff budget, outcome, or decided round.
Expected Behaviour
This issue owns the implementation of:
- anchor_proposer_qbft_decided_round histogram: records the round in which each proposer instance decided or timed out.
- anchor_proposer_round_advance_total counter (labelled by reason: timeout, f_plus_1_rc, rc_quorum, future_proposal): tracks round advances.
- anchor_proposer_qbft_handoff_budget_seconds histogram: records the slot budget remaining at QBFT start, which makes it possible to distinguish late-start from round-churn.
- anchor_proposer_qbft_outcome_total counter (labelled: decided, max_round_timeout, channel_closed): tracks instance outcomes.
- anchor_proposer_qbft_duration_seconds histogram: records total instance duration.
- A proposer_qbft_instance tracing span with structured checkpoint events covers the full lifecycle.
- All observation is external (boundary-layer), with no changes to common/qbft consensus logic.
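The label values listed above could be modelled as small enums so that each call site emits a fixed, typo-free label string. This is only a sketch under assumptions: the type names RoundAdvanceReason and InstanceOutcome and the as_label() method are hypothetical, and the wiring into the actual Prometheus counters is omitted.

```rust
/// Hypothetical label type for the `reason` label of
/// `anchor_proposer_round_advance_total`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoundAdvanceReason {
    Timeout,
    FPlus1Rc,
    RcQuorum,
    FutureProposal,
}

impl RoundAdvanceReason {
    /// Returns the label value exactly as listed in this issue.
    pub const fn as_label(self) -> &'static str {
        match self {
            RoundAdvanceReason::Timeout => "timeout",
            RoundAdvanceReason::FPlus1Rc => "f_plus_1_rc",
            RoundAdvanceReason::RcQuorum => "rc_quorum",
            RoundAdvanceReason::FutureProposal => "future_proposal",
        }
    }
}

/// Hypothetical label type for `anchor_proposer_qbft_outcome_total`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstanceOutcome {
    Decided,
    MaxRoundTimeout,
    ChannelClosed,
}

impl InstanceOutcome {
    /// Returns the label value exactly as listed in this issue.
    pub const fn as_label(self) -> &'static str {
        match self {
            InstanceOutcome::Decided => "decided",
            InstanceOutcome::MaxRoundTimeout => "max_round_timeout",
            InstanceOutcome::ChannelClosed => "channel_closed",
        }
    }
}
```

Keeping the strings in one place bounds label cardinality to the values named in this issue.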
Implementation Detail
- Add handoff_budget_ms: Option<u64> field to QbftInitialization.
- Modify qbft_instance() to snapshot (state, round) before/after each recv arm and emit metrics + tracing events for proposer instances.
- Add handoff_budget_ms parameter to decide_instance signature and ConsensusDecider trait.
- Compute handoff_budget_ms in validator_store proposer path using slot_clock.slot_duration() and determine_slot_elapsed_ms(); pass None for non-proposer callers.
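The budget computation and the before/after snapshot comparison described above might look roughly like the following. This is a sketch under assumptions, not the actual implementation: the helper names, the Snapshot struct, and its fields are hypothetical, and the real slot_clock and determine_slot_elapsed_ms() are replaced by plain arguments so the example stands alone.

```rust
use std::time::Duration;

/// Hypothetical helper: slot budget (in ms) left when QBFT starts.
/// `slot_duration` stands in for `slot_clock.slot_duration()` and
/// `slot_elapsed_ms` for the result of `determine_slot_elapsed_ms()`.
pub fn handoff_budget_ms(slot_duration: Duration, slot_elapsed_ms: u64) -> u64 {
    let slot_ms = u64::try_from(slot_duration.as_millis()).unwrap_or(u64::MAX);
    // Saturate at zero so a start after the slot ended never underflows.
    slot_ms.saturating_sub(slot_elapsed_ms)
}

/// Converts the millisecond budget to seconds for a `*_seconds` histogram.
pub fn budget_seconds(budget_ms: u64) -> f64 {
    budget_ms as f64 / 1000.0
}

/// Hypothetical boundary-layer snapshot taken before and after a recv arm.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Snapshot {
    pub round: u64,
    pub decided: bool,
}

/// Number of rounds the instance advanced while handling one event.
pub fn rounds_advanced(before: Snapshot, after: Snapshot) -> u64 {
    after.round.saturating_sub(before.round)
}

/// True only on the transition from undecided to decided.
pub fn newly_decided(before: Snapshot, after: Snapshot) -> bool {
    !before.decided && after.decided
}
```

Comparing snapshots outside the consensus code keeps the observation at the boundary layer: the instance is only read, never modified, by the instrumentation.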
Acceptance criteria mapping
| #921 criterion | How this PR satisfies it |
| --- | --- |
| Record decided/timed-out round | PROPOSER_QBFT_DECIDED_ROUND histogram |
| Round timeout and round-advance visible | PROPOSER_ROUND_ADVANCE_TOTAL counter + checkpoint tracing events |
| Distinguish late-start from round-churn | PROPOSER_QBFT_HANDOFF_BUDGET_SECONDS histogram (low budget = late start) vs PROPOSER_ROUND_ADVANCE_TOTAL (high count = churn) |
| No consensus semantics altered | All logic is observation-only; common/qbft gains only a getter |
Scope
In scope: Wiring metrics into qbft_instance loop, cross-crate signature threading, tracing span with checkpoint events.
Out of scope: Changing timeout formulas, leader selection, or round-change behavior; dashboard/alerting configuration; the OTLP collector.