perf(subtreeprocessor): optimize reorg with parallel bulk operations#526
freemans13 wants to merge 3 commits into bsv-blockchain:main
Conversation
…InpointsMap with parallel bulk operations

- Deleted hashSlicePool to streamline memory management.
- Introduced parallel bulk operations in SplitTxInpointsMap for improved performance during transaction handling.
- Added comprehensive tests for SplitTxInpointsMap functionality, ensuring correctness under concurrent access.
- Removed outdated benchmark tests and added new benchmarks for reorg operations.
🤖 Claude Code Review status: Complete. Found 1 issue.

Overall assessment: This PR delivers significant reorg performance improvements through well-designed parallel bulk operations. The one issue found is a context propagation problem (see the review comment below).
…ocessing

- Updated the calculation of numWorkers in processRemainderTxHashes to cap the maximum number of workers at 16, improving resource management during parallel processing.
…and improve error handling

- Updated the moveBackBlock method to return fewer values, improving clarity and reducing complexity.
- Adjusted related benchmark and test cases to reflect the new method signature.
- Improved error handling for cases where a block is not provided, ensuring consistent error messaging.
```go
	stp.chainedSubtrees = append(stp.chainedSubtrees, fullSubtrees...)
}
```
The `errgroup.WithContext(context.Background())` here ignores the parent context. This means cancellation won't propagate, tracing spans are lost, and timeouts are ignored. Pass the context through: `func bulkBuildSubtrees(ctx context.Context, ...)` and use `errgroup.WithContext(ctx)`.
Summary
Profiling reorg operations at 100K transaction scale revealed three main bottlenecks: per-node lock acquisition in `SplitTxInpointsMap` (O(N) sequential lock/unlock cycles); sequential UTXO marking, where `mark(true)` and `mark(false)` operate on disjoint hash sets but ran one after the other; and sequential subtree construction through `addNode()` calls. This PR addresses each bottleneck:
- Replace `txmap.SyncedMap` with `swiss.Map` + `sync.Mutex` per bucket in `SplitTxInpointsMap`, matching the proven pattern already used by `SplitSwissMap`, `SplitSyncedParentMap`, and the block persister UTXO maps. A contiguous `[]txInpointsBucket` slice replaces `map[uint16]*` for cache-friendly access.
- Add `ParallelBulkSetIfNotExists`, which groups entries by bucket, then processes all non-empty buckets in parallel with a single lock acquisition per bucket. Reduces lock overhead from O(N) sequential to O(N/buckets) parallel. Wired into `moveBackBlockBulkBuild` and `processRemainderTxHashes`.
- Introduce `bulkBuildSubtrees` for parallel subtree construction, replacing per-node `addNode()` calls (which each acquire a mutex and check `IsComplete`). Full subtrees are built concurrently via `errgroup`.
- Run `MarkTransactionsOnLongestChain(true)` and `MarkTransactionsOnLongestChain(false)` concurrently via `errgroup`; they operate on disjoint hash sets.
- Parallel subtree announcements: batch send to `newSubtreeChan`, then batch wait for responses, overlapping send and receive.
- Remove `hash_slice_pool.go`; the `sync.Pool` overhead exceeded the allocation savings at reorg scale.

Benchmark Results
Reorg performance (mock UTXO store, no race detector):
Test plan

- Tests run with `-race`
- `SplitTxInpointsMap` unit tests (9 tests covering CRUD, concurrency, bucket distribution, bulk operations)

🤖 Generated with Claude Code