feat: adaptive subtreeData skip during catchup when txs exist locally#539
feat: adaptive subtreeData skip during catchup when txs exist locally#539freemans13 wants to merge 3 commits intobsv-blockchain:mainfrom
Conversation
During catchup, subtree validation downloads massive subtreeData files from peers. When both nodes have received the same transactions (e.g. dev-scale with tx-blaster), the catching-up node already has all txs in its UTXO store. This optimization checks UTXO existence per-subtree before downloading subtreeData, skipping the download entirely when all txs exist locally. Uses a shared atomic flag so that once any subtree has missing txs, all remaining subtrees skip the probe and go straight to full subtreeData fetch, minimizing wasted bandwidth. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
🤖 Claude Code Review Status: Complete ✅ No new issues found The implementation is well-designed with proper error handling and fallback mechanisms. The atomic flag pattern correctly closes the race window with the re-check at line 449 in get_blocks.go. History:
|
|
|
||
| g.Go(func() error { | ||
| // If flag is already set, go straight to full fetch | ||
| if needsSubtreeData.Load() { |
There was a problem hiding this comment.
The atomic flag pattern here has a race condition. When multiple goroutines check needsSubtreeData.Load() at line 436 before any have called .Store(true), they all proceed to the UTXO check. If one goroutine finds a missing tx and sets the flag, other goroutines that already passed the check will still download subtreeData even though the flag is now set.
Impact: The optimization won't be as effective as intended - some goroutines will unnecessarily download subtreeData even after the flag is set.
Solution: This is an inherent limitation of the optimization pattern and may be acceptable since it still provides bandwidth savings. However, if you want stricter behavior, consider checking the flag again before calling fetchAndStoreSubtreeData at lines 458 and 464.
Update: ✅ Fixed - The code now includes a re-check of the flag at line 449 after fetching the subtree, which closes the race window. This ensures goroutines that are in-flight will check the flag again before proceeding to download subtreeData.
Add a second flag check after the subtree-hash fetch completes but before starting the expensive UTXO BatchDecorate call. This closes the window where a goroutine passes the initial check, another goroutine sets the flag while the first is fetching hashes, and the first goroutine then does an unnecessary UTXO existence check. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The UTXO check in check_block_subtrees.go and streaming_processor.go caused validation to be skipped entirely when all txs existed in the UTXO store. This broke tests using the SQL backend (which doesn't store the Creating field) and would also skip block-level validation in production. The optimization is kept only in fetchBlockSubtreesAdaptive (pre-fetch phase), which correctly skips subtreeData downloads while letting the validation pipeline run fully. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|


Problem
During catchup, subtree validation downloads subtreeData files (full serialized transactions, hundreds of MB each) from peers for every subtree in every block. In environments like dev-scale where a tx-blaster sends each transaction to both nodes, the catching-up node already has all these transactions locally in its UTXO store. Downloading subtreeData in this scenario is pure waste — bandwidth, time, and memory spent re-fetching data we already have.
Solution
For each subtree, stream the tx hashes and batch-check UTXO existence. If all txs exist locally, skip the subtreeData download entirely. If any are missing, abort the check immediately and switch to subtreeData for that subtree and all remaining ones in the block. Subtrees already validated via subtree-only are not re-downloaded.
A shared atomic flag across concurrent goroutines ensures that once a miss is found, every subsequent subtree goes straight to full fetch with zero probing overhead. This is applied across all three catchup code paths: block validation pre-fetch,
CheckBlockSubtrees, and the streaming processor pipeline.Changes
pkg/utxocheck/existence.goCheckAllTxsExistInUTXO()— batch-checks tx hashes viaBatchDecorate, skips coinbase, short-circuits on first misspkg/utxocheck/existence_test.goservices/blockvalidation/get_blocks.gofetchBlockSubtreesAdaptive()with per-subtree UTXO probe; wired intoblockWorker()services/blockvalidation/get_blocks_test.goservices/subtreevalidation/check_block_subtrees.goneedsSubtreeDataatomic flag + UTXO check before subtreeData downloadservices/subtreevalidation/streaming_processor.goloadSubtreeOnly()helper + UTXO check instreamAndFilterSubtrees()Error handling
ValidateSubtreeInternalhas its own fallback viaprocessMissingTransactions()if txs are somehow missing at validation time despite passing the existence checkTest plan
go buildpasses for all modified packagesgo vetcleanmake lint— 0 issuespkg/utxocheck— 9/9 unit tests passservices/subtreevalidation— all tests passservices/blockvalidation— all catchup tests pass (TestBlockWorker,TestFetchBlocksConcurrently,TestFetchSubtreeDataForBlock,TestCatchup_*)🤖 Generated with Claude Code