Compact-filter committed_height freezes one block below tip after client restart, leaving SyncProgress::is_synced() permanently false #824

Description

@lklimek

Crate: dash-spv
Rev observed: 981e97f1015960ae5d277afdabcba1cbbc0b3a63
Network: Testnet (core chain ~1,502,6xx)
Severity: High — a single mistimed block permanently wedges sync with no client-side recovery.

Summary

After a DashSpvClient restart in which compact-filter headers are already persisted up to the chain tip, a core block that arrives during the client's reinit window falls into a one-block gap between the from-genesis filter-body rescan and the live tip tracker. That block's compact-filter body is never requested or committed, so the Filters phase committed_height freezes one block below the chain tip — permanently. Because SyncProgress::is_synced() requires the Filters phase to be Synced, it stays false forever, and every downstream consumer reads the client as "syncing" indefinitely even though the chain, filter headers, masternodes, and chainlocks are all fully synced.

This was observed in a host application (Dash Evo Tool) where a Disconnect→Reconnect recreates the DashSpvClient; the reinit window during that recreation is when the offending block arrived. The host UI then displayed "Syncing…" forever with no recovery path.

Symptom

After the restart, every sync phase reaches Synced except Filters, which is wedged:

Headers:        Synced 1502629/1502629
Filter Headers: Synced 1502629/1502629
Filters:        Syncing 1502615/1502629 (100.0%) stored:1502615, downloaded: 0, processed: 0
Blocks:         Synced ...
Masternodes:    Synced 1502629/1502629
ChainLocks:     Synced ; InstantSend: Synced ; Mempool: Synced

FiltersProgress::current_height() returns committed_height (1502615), which never reaches 1502616. The phase state therefore stays Syncing (committed_height < target_height), and SyncProgress::is_synced(), which requires every phase to be Synced, returns false for the life of the client.
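The relationship between one stalled phase and the overall sync flag can be sketched as follows. This is a simplified model, not the dash-spv code: `PhaseState`, `PhaseProgress`, and the free function `is_synced` are hypothetical stand-ins that only mirror the two rules described above (a phase is Syncing while its current height is below its target, and the client is synced only when every phase is Synced).

```rust
/// Hypothetical stand-in for a per-phase state; not the dash-spv type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhaseState {
    Syncing,
    Synced,
}

/// Hypothetical stand-in for a phase's progress counters.
#[derive(Debug, Clone, Copy)]
pub struct PhaseProgress {
    /// For the Filters phase this would be committed_height.
    pub current_height: u32,
    pub target_height: u32,
}

impl PhaseProgress {
    /// A phase is Syncing for as long as it sits below its target.
    pub fn state(&self) -> PhaseState {
        if self.current_height < self.target_height {
            PhaseState::Syncing
        } else {
            PhaseState::Synced
        }
    }
}

/// The client counts as synced only if every phase is Synced.
pub fn is_synced(phases: &[PhaseProgress]) -> bool {
    phases.iter().all(|p| p.state() == PhaseState::Synced)
}
```

With Filters at 1502615/1502629 and every other phase at 1502629/1502629, `is_synced` in this model returns false no matter how many other phases are complete, which matches the observed behaviour.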

Root cause (analysis)

The offending block (1502616) arrived inside the reinit window — its header was stored ~2s after the fresh client initialized filter headers to the prior tip (1502615), straddling the boundary between:

  1. the from-genesis historical filter-body rescan (whose tail batch nominally covered up to 1502616 but never committed it — batch-complete download ≠ commit), and
  2. the live tip tracker, which began requesting filter bodies at 1502617.

The contiguous commit pointer (committed_height) therefore cannot cross the 1502616 hole. All later tip filters (1502617…1502632) download and "complete" fine but are wasted — the pointer is stuck behind the gap.
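To illustrate why later filters cannot help, here is a minimal sketch of a contiguous commit pointer. `CommitTracker` and its methods are hypothetical names for illustration; the actual accounting in pipeline.rs / manager.rs may be structured differently. The only property modelled is the one described above: the pointer advances strictly one height at a time, so a single missing height blocks everything behind it.

```rust
use std::collections::BTreeSet;

/// Hypothetical model of a contiguous commit pointer.
pub struct CommitTracker {
    committed_height: u32,
    /// Heights that were downloaded but cannot be committed yet.
    pending: BTreeSet<u32>,
}

impl CommitTracker {
    pub fn new(committed_height: u32) -> Self {
        Self { committed_height, pending: BTreeSet::new() }
    }

    pub fn committed_height(&self) -> u32 {
        self.committed_height
    }

    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }

    /// Record a downloaded filter body, then advance the pointer across
    /// every height that is now contiguous with it.
    pub fn on_downloaded(&mut self, height: u32) {
        if height > self.committed_height {
            self.pending.insert(height);
        }
        // `remove` returns true only if the next height was pending.
        while self.pending.remove(&(self.committed_height + 1)) {
            self.committed_height += 1;
        }
    }
}
```

In this model, feeding heights 1502617 through 1502632 into a tracker that starts at 1502615 leaves the pointer at 1502615 with 16 pending entries. Delivering 1502616 afterwards moves the pointer straight to 1502632, which is why re-requesting the single gap block would be enough to unwedge the phase.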

Relevant code paths (rev 981e97f):

  • dash-spv/src/sync/filters/sync_manager.rs ~L89 — pipeline.init(stored_filters_tip + 1, stored_filters_tip) on restart; the boundary block that lands between init and the first live FilterHeadersStored is the one that slips through.
  • dash-spv/src/sync/filters/pipeline.rs / manager.rs — batch download vs. commit/contiguity accounting; a downloaded-but-not-committed batch does not advance committed_height.
  • dash-spv/src/sync/filters/progress.rs: current_height() returns committed_height.
  • dash-spv/src/sync/progress.rs: SyncProgress::is_synced() requires all phases (incl. Filters) Synced.

Evidence (falsifiable, from a ~1.9M-line client log)

  1. The body for the gap block was never requested. GetCFilters: 1502616 to 1502616 has 0 occurrences in the entire post-restart log. Tip body requests are contiguous except 1502616: … 1502614, 1502615, [1502616 ABSENT], 1502617, 1502618 ….
  2. committed_height is frozen at 1502615 across the entire post-restart log. Every Filters progress line reads 1502615/…; not one line has a first value ≥ 1502616.
  3. Download ≠ commit. Immediately after the rescan's final batch logged Filter batch 1502001-1502616 complete, the very next progress still read Filters: Syncing 1502615/1502617 stored:1502615.
  4. Not environmental / not a peer-capability issue. Peers served ~1.5M cfilter responses with zero unknown block hash / validation rejections post-restart. The wedge is purely client-side commit accounting. This distinguishes it from #815 ("dash-spv: compact-filter sync stalls silently when the only connected peer lacks NODE_COMPACT_FILTERS"), which produces the same symptom (Filters phase never reaching Synced) through a different cause: a peer lacking NODE_COMPACT_FILTERS.

Timeline (UTC)

| Time | Event |
|---|---|
| 08:25:41 | Old client shutdown complete; storage persisted (Disconnect) |
| 08:25:44 | Fresh client starts (SpvRuntime::run() recreates DashSpvClient) |
| 08:25:46 | BlockHeadersManager initialized at height 1502615; sync-manager tasks spawned; best_height=1502615 |
| 08:25:54 | New header arrives: Segment 0: 1 headers ready to store from height 1502616 |
| 08:25:56 | FilterHeadersStored(1502616-1502616, tip=1502616); from-genesis filter-body rescan begins |
| 08:26–08:28 | ~1.5M cfilter messages downloaded; rescan tail Filter batch 1502001-1502616 complete, yet committed_height stays 1502615 |
| 08:28→tail | Tip requests 1502615 → [1502616 skipped] → 1502617, 1502618, …; every later filter "completes" but cannot advance the frozen pointer |
| later | Every phase Synced except Filters: Syncing 1502615/1502629 |

Reproduction

  1. Sync a DashSpvClient to tip (filter headers + bodies committed to height H).
  2. Restart the client (recreate DashSpvClient) while a new block H+1 lands during the reinit window — i.e. after filter headers init to H but before the live tip tracker starts requesting bodies.
  3. Observe: committed_height stays at H; the Filters phase reports Syncing H/(H+k) indefinitely; SyncProgress::is_synced() never returns true, despite all other phases Synced and peers healthy.

Suggested fix

On restart, guarantee no gap at the rescan/tip boundary: before (or as part of) starting live tip tracking, re-request and commit every filter body in (committed_height, filter_header_tip] that is not yet committed. In practice this means two changes: resume tip tracking from committed_height + 1 rather than filter_header_tip + 1, and ensure that the historical-rescan tail batch actually advances committed_height to its upper bound on completion. The second point needs investigation: batch 1502001-1502616 completed its download without committing 1502616, which points at the commit/contiguity logic in filters/pipeline.rs and manager.rs.
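A minimal sketch of the boundary computation, assuming both heights are available at restart. `uncommitted_range` is a hypothetical helper, not an existing dash-spv function; it only expresses the half-open interval (committed_height, filter_header_tip] as an inclusive range of heights to request.

```rust
use std::ops::RangeInclusive;

/// Hypothetical helper: the filter bodies that still have to be requested
/// and committed before live tip tracking can safely take over.
/// Returns None when the commit pointer has already reached the tip.
pub fn uncommitted_range(
    committed_height: u32,
    filter_header_tip: u32,
) -> Option<RangeInclusive<u32>> {
    if committed_height < filter_header_tip {
        // (committed_height, filter_header_tip] as an inclusive range.
        Some((committed_height + 1)..=filter_header_tip)
    } else {
        None
    }
}
```

For the incident above, `uncommitted_range(1502615, 1502616)` yields exactly the one block that was skipped. Deriving the start from committed_height rather than from the filter-header tip means a block that lands during reinit is covered regardless of when its header is stored.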

A committed_height-stall guard (Filters committed_height not advancing while filter_header_tip/Headers advance for N seconds → re-drive the gap) would also make this self-healing rather than terminal.
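Such a guard could look roughly like the sketch below. `StallGuard`, its fields, and the `observe` method are hypothetical, and the stall condition is simplified to "committed_height is behind filter_header_tip and has not advanced for the timeout" rather than tracking whether the tip itself is advancing. Time is passed in explicitly so the logic can be exercised without sleeping.

```rust
use std::ops::RangeInclusive;
use std::time::{Duration, Instant};

/// Hypothetical watchdog for a frozen commit pointer.
pub struct StallGuard {
    timeout: Duration,
    last_committed: u32,
    /// Last time the pointer advanced, was caught up, or a re-drive fired.
    last_advance: Instant,
}

impl StallGuard {
    pub fn new(committed_height: u32, timeout: Duration, now: Instant) -> Self {
        Self { timeout, last_committed: committed_height, last_advance: now }
    }

    /// Call periodically with the current heights. Returns the range of
    /// filter bodies to re-request when the pointer has been stuck behind
    /// the tip for at least `timeout`.
    pub fn observe(
        &mut self,
        committed_height: u32,
        filter_header_tip: u32,
        now: Instant,
    ) -> Option<RangeInclusive<u32>> {
        // Progress, or nothing left to commit: restart the timer.
        if committed_height > self.last_committed || committed_height >= filter_header_tip {
            self.last_committed = committed_height;
            self.last_advance = now;
            return None;
        }
        if now.duration_since(self.last_advance) >= self.timeout {
            // Restart the timer so re-drives are rate-limited.
            self.last_advance = now;
            return Some((committed_height + 1)..=filter_header_tip);
        }
        None
    }
}
```

The returned range deliberately starts at committed_height + 1, so a re-drive always covers the lowest missing height first. The timeout value (the N seconds mentioned above) is left to the caller.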

Related

🤖 Co-authored by Claudius the Magnificent AI Agent
