
perf(subtreeprocessor): optimize reorg with parallel bulk operations#526

Open
freemans13 wants to merge 3 commits into bsv-blockchain:main from freemans13:stu/blockassembly-reorg-performance-optimisation

Conversation


@freemans13 freemans13 commented Feb 24, 2026

Summary

Profiling reorg operations at 100K transaction scale revealed three main bottlenecks: per-node lock acquisition in SplitTxInpointsMap (O(N) sequential lock/unlock cycles), sequential UTXO marking where mark(true) and mark(false) operate on disjoint hash sets but ran one after the other, and sequential subtree construction through addNode() calls.

This PR addresses each bottleneck:

  • Replace txmap.SyncedMap with swiss.Map + sync.Mutex per bucket in SplitTxInpointsMap, matching the proven pattern already used by SplitSwissMap, SplitSyncedParentMap, and the block persister UTXO maps. Contiguous []txInpointsBucket slice instead of map[uint16]* for cache-friendly access.
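A minimal sketch of that layout, with a plain Go map standing in for `swiss.Map` to stay dependency-free (type, field, and method names here are illustrative, not the PR's actual API):

```go
package main

import (
	"fmt"
	"sync"
)

// txInpointsBucket pairs a mutex with its own map so buckets can be
// locked independently of one another.
type txInpointsBucket struct {
	mu sync.Mutex
	m  map[[32]byte][]uint32 // tx hash -> inpoint data (illustrative value type)
}

// splitTxInpointsMap holds a contiguous slice of buckets rather than a
// map[uint16]*bucket, so bucket lookup is a cheap, cache-friendly index.
type splitTxInpointsMap struct {
	buckets []txInpointsBucket
}

func newSplitTxInpointsMap(n int) *splitTxInpointsMap {
	s := &splitTxInpointsMap{buckets: make([]txInpointsBucket, n)}
	for i := range s.buckets {
		s.buckets[i].m = make(map[[32]byte][]uint32)
	}
	return s
}

// bucketFor derives a bucket index from the leading bytes of the hash.
func (s *splitTxInpointsMap) bucketFor(h [32]byte) *txInpointsBucket {
	idx := (uint16(h[0])<<8 | uint16(h[1])) % uint16(len(s.buckets))
	return &s.buckets[idx]
}

// SetIfNotExists inserts under the bucket's lock and reports whether the
// hash was new.
func (s *splitTxInpointsMap) SetIfNotExists(h [32]byte, v []uint32) bool {
	b := s.bucketFor(h)
	b.mu.Lock()
	defer b.mu.Unlock()
	if _, ok := b.m[h]; ok {
		return false
	}
	b.m[h] = v
	return true
}

func main() {
	s := newSplitTxInpointsMap(16)
	h := [32]byte{1, 2}
	fmt.Println(s.SetIfNotExists(h, []uint32{0})) // true: first insert
	fmt.Println(s.SetIfNotExists(h, []uint32{1})) // false: already present
}
```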

  • Add ParallelBulkSetIfNotExists that groups entries by bucket, then processes all non-empty buckets in parallel with a single lock acquisition per bucket. Reduces lock overhead from O(N) sequential to O(N/buckets) parallel. Wired into moveBackBlockBulkBuild and processRemainderTxHashes.
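The group-then-parallel shape can be sketched as follows (simplified bucket type and hash-to-bucket function; the real implementation's types and worker strategy may differ):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type bucket struct {
	mu sync.Mutex
	m  map[[32]byte]struct{}
}

// parallelBulkSetIfNotExists inserts a batch of hashes with one lock
// acquisition per non-empty bucket instead of one per hash.
func parallelBulkSetIfNotExists(buckets []bucket, hashes [][32]byte) int64 {
	// Phase 1: group incoming hashes by destination bucket (lock-free).
	groups := make([][][32]byte, len(buckets))
	for _, h := range hashes {
		i := int(h[0]) % len(buckets)
		groups[i] = append(groups[i], h)
	}

	// Phase 2: one goroutine per non-empty bucket, one Lock/Unlock each.
	var wg sync.WaitGroup
	var added int64
	for i := range buckets {
		if len(groups[i]) == 0 {
			continue
		}
		wg.Add(1)
		go func(b *bucket, hs [][32]byte) {
			defer wg.Done()
			b.mu.Lock()
			defer b.mu.Unlock()
			for _, h := range hs {
				if _, ok := b.m[h]; !ok {
					b.m[h] = struct{}{}
					atomic.AddInt64(&added, 1)
				}
			}
		}(&buckets[i], groups[i])
	}
	wg.Wait()
	return added
}

func main() {
	buckets := make([]bucket, 4)
	for i := range buckets {
		buckets[i].m = make(map[[32]byte]struct{})
	}
	hashes := [][32]byte{{0}, {1}, {2}, {1}} // one duplicate
	fmt.Println(parallelBulkSetIfNotExists(buckets, hashes)) // 3 unique inserts
}
```

Because entries are pre-grouped, goroutines never contend for the same bucket, so each lock is taken exactly once per batch.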

  • Introduce bulkBuildSubtrees for parallel subtree construction, replacing per-node addNode() calls (which each acquire a mutex and check IsComplete). Full subtrees are built concurrently via errgroup.
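Roughly, the idea is to slice the node list into full subtrees up front and build each slice concurrently. The sketch below uses a plain `sync.WaitGroup` instead of errgroup to stay stdlib-only, and a stand-in `subtree` type; names are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// subtree is a stand-in for the real subtree type.
type subtree struct {
	nodes [][32]byte
}

// bulkBuildSubtrees builds each full subtree in its own goroutine,
// instead of feeding nodes one at a time through a mutex-guarded
// addNode() that also re-checks completeness on every call.
func bulkBuildSubtrees(nodes [][32]byte, subtreeSize int) []*subtree {
	n := len(nodes) / subtreeSize // number of full subtrees
	out := make([]*subtree, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine owns out[i] exclusively, so no locking is needed.
			st := &subtree{nodes: make([][32]byte, 0, subtreeSize)}
			st.nodes = append(st.nodes, nodes[i*subtreeSize:(i+1)*subtreeSize]...)
			out[i] = st
		}(i)
	}
	wg.Wait()
	return out
}

func main() {
	nodes := make([][32]byte, 10)
	trees := bulkBuildSubtrees(nodes, 4)
	fmt.Println(len(trees)) // 2 full subtrees; the remainder is handled separately
}
```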

  • Run MarkTransactionsOnLongestChain(true) and MarkTransactionsOnLongestChain(false) concurrently via errgroup — they operate on disjoint hash sets.

  • Parallel subtree announcements: batch send to newSubtreeChan, then batch wait for responses, overlapping send/receive.
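A toy version of the batch-send/batch-wait pattern (the request struct and channel shape are assumptions, not the PR's actual types):

```go
package main

import "fmt"

type subtreeRequest struct {
	id    int
	errCh chan error
}

// announceSubtrees sends every request before waiting on any response,
// so the receiver can process request i while request i+1 is still being
// sent; a per-subtree send-then-wait loop would serialize the two phases.
func announceSubtrees(newSubtreeChan chan<- subtreeRequest, n int) error {
	reqs := make([]subtreeRequest, n)
	// Phase 1: batch send.
	for i := range reqs {
		reqs[i] = subtreeRequest{id: i, errCh: make(chan error, 1)}
		newSubtreeChan <- reqs[i]
	}
	// Phase 2: batch wait.
	for i := range reqs {
		if err := <-reqs[i].errCh; err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ch := make(chan subtreeRequest, 8) // buffered so the batch send never blocks
	// Toy consumer: acknowledge each request.
	go func() {
		for req := range ch {
			req.errCh <- nil
		}
	}()
	fmt.Println(announceSubtrees(ch, 3)) // <nil>
}
```

Note the buffered response channels: the consumer can reply without blocking even before the sender reaches the wait phase.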

  • Remove hash_slice_pool.go: the sync.Pool overhead exceeded the allocation savings at reorg scale.

Benchmark Results

Reorg performance (mock UTXO store, no race detector):

| Scale | MoveBack | MoveForward | Total | Alloc (MB) | Throughput  |
|------:|---------:|------------:|------:|-----------:|------------:|
| 1K    | 2ms      | 7ms         | 8ms   | 146        | 239K tx/sec |
| 10K   | 7ms      | 28ms        | 35ms  | 294        | 565K tx/sec |
| 50K   | 10ms     | 34ms        | 44ms  | 389        | 2.3M tx/sec |
| 100K  | 13ms     | 55ms        | 68ms  | 443        | 2.9M tx/sec |

Test plan

  • All existing subtreeprocessor tests pass with -race
  • New SplitTxInpointsMap unit tests (9 tests covering CRUD, concurrency, bucket distribution, bulk operations)
  • New reorg benchmarks at 1K–100K scale

🤖 Generated with Claude Code

…InpointsMap with parallel bulk operations

- Deleted hashSlicePool to streamline memory management.
- Introduced parallel bulk operations in SplitTxInpointsMap for improved performance during transaction handling.
- Added comprehensive tests for SplitTxInpointsMap functionalities, ensuring correctness under concurrent access.
- Removed outdated benchmark tests and added new benchmarks for reorg operations.
@freemans13 freemans13 self-assigned this Feb 24, 2026

github-actions bot commented Feb 24, 2026

🤖 Claude Code Review

Status: Complete

Found 1 issue:

  • Context not propagated in `bulkBuildSubtrees`: using `context.Background()` instead of the passed context means cancellation, tracing, and timeouts will not work correctly during reorg operations. See inline comment.

Overall assessment:

This PR delivers significant reorg performance improvements through well-designed parallel bulk operations:

Strong points:

  • Excellent use of swiss.Map with per-bucket locking pattern (matches existing patterns in codebase)
  • ParallelBulkSetIfNotExists correctly groups by bucket then processes in parallel - good lock granularity
  • Comprehensive test coverage including concurrency safety and duplicate handling
  • Parallel mark(true) and mark(false) operations properly verified to be disjoint
  • Benchmark results show ~40x improvement at 100K tx scale

Context propagation issue:
The `bulkBuildSubtrees` function uses `context.Background()`, which breaks cancellation/tracing chains. This should be addressed before merge.

Minor observations:

  • The removal of hash_slice_pool.go is well-justified by profiling data
  • Error handling patterns are consistent with project conventions
  • Map type changes from map[hash]struct{} to map[hash]bool simplify filtering logic

@freemans13 freemans13 changed the title refactor(subtreeprocessor): remove hashSlicePool and enhance SplitTx… perf(subtreeprocessor): optimize reorg with parallel bulk operations Feb 24, 2026
…ocessing

- Updated the calculation of numWorkers in processRemainderTxHashes to cap the maximum number of workers at 16, enhancing resource management during parallel processing.
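The cap amounts to something like the following (the clamping formula is illustrative; the commit only states that the maximum is 16):

```go
package main

import (
	"fmt"
	"runtime"
)

// workerCount limits parallelism in processRemainderTxHashes so a
// many-core host does not spawn more goroutines than the bucketed work
// can usefully absorb.
func workerCount() int {
	n := runtime.NumCPU()
	if n > 16 {
		n = 16
	}
	return n
}

func main() {
	fmt.Println(workerCount() <= 16) // true on any host
}
```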
…and improve error handling

- Updated the moveBackBlock method to return fewer values, enhancing clarity and reducing complexity.
- Adjusted related benchmark and test cases to reflect the new method signature.
- Improved error handling for cases where a block is not provided, ensuring consistent error messaging.
```go
}

	stp.chainedSubtrees = append(stp.chainedSubtrees, fullSubtrees...)
}
```

The `errgroup.WithContext(context.Background())` here ignores the parent context. This means cancellation won't propagate, tracing spans are lost, and timeouts are ignored. Pass the context through: `func bulkBuildSubtrees(ctx context.Context, ...)` and use `errgroup.WithContext(ctx)`.
