Bound retry loops and detect reorgs to prevent sync stalls by pragmaxim · Pull Request #1491 · trezor/blockbook

pragmaxim · 2026-05-02T05:38:29Z

Refactored the sync coordinator and workers to handle persistent backend failures and chain reorgs more robustly. Previously, the sync process could get stuck in infinite retry loops if the chain tip moved or a block became temporarily unavailable.

Abstract sync logic into helpers (getBlockHashForSync, waitForBlockWorkers) to reduce duplication between parallel and bulk sync modes.
Implement deadlock prevention by using non-blocking selects on channel sends to workers.
Add reorg detection that triggers a sync restart (errResync) when the requested height is found to be past the current chain tip.
Enhance signal handling to ensure clean exits during worker synchronization.

This solution won over #1490

Map Avalanche "cannot query unfinalized data" RPC errors to ErrBlockNotFound instead of swallowing them. This prevents failed trace calls from being treated as successful empty results, which caused trace length mismatches during sync.

Refactor the sync coordinator and workers to handle persistent backend failures and chain reorgs more robustly. Previously, the sync process could get stuck in infinite retry loops if the chain tip moved or a block became temporarily unavailable. - Abstract sync logic into helpers (getBlockHashForSync, waitForBlockWorkers) to reduce duplication between parallel and bulk sync modes. - Implement deadlock prevention by using non-blocking selects on channel sends to workers. - Add reorg detection that triggers a sync restart (errResync) when the requested height is found to be past the current chain tip. - Enhance signal handling to ensure clean exits during worker synchronization.

Copilot

Pull request overview

This PR hardens the DB sync coordinator/worker logic to avoid sync stalls caused by unbounded retries when the chain tip moves (reorg/rollback) or the backend temporarily can’t serve a block/hash.

Changes:

Added bounded retry + “tip fell below requested height” detection in GetBlockHash coordination to trigger errResync instead of retrying indefinitely.
Refactored worker-coordination error handling via recordBlockWorkerAbort and waitForBlockWorkers to prevent coordinator deadlocks and improve shutdown behavior.
Added/expanded regression tests for reorg detection, worker abort propagation, and EthereumRPC.GetBlockHash error mapping to bchain.ErrBlockNotFound.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
`db/sync.go`	Adds helper functions for bounded hash retries, reorg detection, and abort-aware worker waiting; updates parallel/bulk connect loops to use them.
`db/sync_test.go`	Adds unit regression tests covering resync signaling, abort handling, and deadlock prevention paths.
`bchain/coins/eth/ethrpc_blockhash_test.go`	Verifies `EthereumRPC.GetBlockHash` maps “not found” errors to `bchain.ErrBlockNotFound` for sync retry logic.

Comments suppressed due to low confidence (1)

db/sync.go:695

getBlockWorker unconditionally increments w.metrics.IndexResyncErrors on retryable GetBlock errors. Since SyncWorker.metrics is a pointer and constructors don’t enforce it being non-nil, this can panic in callers/tests that create a worker without metrics when a retryable error occurs. Either guard this increment with if w.metrics != nil (similar to getBlockHashForSync) or ensure metrics is always initialized before starting workers.

				w.metrics.IndexResyncErrors.With(common.Labels{"error": "failure"}).Inc()
				select {
				case <-terminating:
					return
				case <-w.chanOsSignal:
					return
				case <-time.After(cfg.RetryDelay):

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…nstable

fix(avalanche): retry unfinalized RPC responses

d24ac0a

Map Avalanche "cannot query unfinalized data" RPC errors to ErrBlockNotFound instead of swallowing them. This prevents failed trace calls from being treated as successful empty results, which caused trace length mismatches during sync.

pragmaxim force-pushed the fix/stuck-syncing-2 branch from 90b63c8 to 6020aa5 Compare May 2, 2026 05:51

pragmaxim changed the title ~~fix(stuck-syncing): bound retry loops that could stall chain-tip resync~~ fix(stuck-syncing) V2: bound retry loops that could stall chain-tip resync May 2, 2026

pragmaxim force-pushed the fix/stuck-syncing-2 branch from 6020aa5 to 800626e Compare May 2, 2026 07:03

pragmaxim mentioned this pull request May 2, 2026

fix(stuck-syncing): bound retry loops that could stall chain-tip resync #1490

Closed

pragmaxim requested review from Copilot and cranycrane May 2, 2026 07:09

Copilot started reviewing on behalf of pragmaxim May 2, 2026 07:10 View session

Copilot AI reviewed May 2, 2026

View reviewed changes

fix(sync): metrics could be nil since many tests would have to mock them

de57725

pragmaxim changed the title ~~fix(stuck-syncing) V2: bound retry loops that could stall chain-tip resync~~ Bound retry loops and detect reorgs to prevent sync stalls May 2, 2026

fix(sync): do not log expected events as errors, consider quicknode u…

775bd9f

…nstable

Base automatically changed from fix/avax-retry-unfinalized-responses to master May 4, 2026 04:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Bound retry loops and detect reorgs to prevent sync stalls#1491

Bound retry loops and detect reorgs to prevent sync stalls#1491
pragmaxim wants to merge 4 commits intomasterfrom
fix/stuck-syncing-2

pragmaxim commented May 2, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

pragmaxim commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pragmaxim commented May 2, 2026 •

edited

Loading