
feat: mmap-backed subtrees and disk-backed TxMap for block RAM reduction #520

Merged
icellan merged 15 commits into main from fix/block-ram-usage on Feb 27, 2026
Conversation

@icellan
Contributor

@icellan icellan commented Feb 19, 2026

Summary

Both blockvalidation and blockassembly/subtreeprocessor hold subtree Node arrays and transaction tracking maps entirely in Go heap memory. At scale (1000 subtrees × 1M transactions = 1 billion tx per block), this consumes ~376 GB of heap and creates severe GC pressure.

This PR moves the two largest memory consumers off-heap:

  • Node arrays (48 GB) → file-backed mmap: The Node struct has zero pointer fields ([32]byte + uint64 + uint64 = 48 bytes), making it safe for mmap. []Node slices backed by mmap are indistinguishable from heap-backed — all existing code works unchanged. The OS manages paging between RAM and disk transparently.

  • currentTxMap (200 GB) → sharded cuckoo filters + multi-disk BadgerDB: The hot path (SetIfNotExists) only needs an existence check. 4096 independent cuckoo filter shards provide 35M ops/sec for dedup with ~1 GB memory. TxInpoints data is persisted to BadgerDB across multiple physical disks via dedicated writer goroutines — one per disk for linear I/O scaling.

All changes are fully opt-in via new settings. When unconfigured (empty strings), behavior is identical to before.

Architecture

mmap-backed Node arrays (go-subtree v1.1.9)

Subtree.Nodes []Node  →  backed by file-backed mmap instead of Go heap
                          ├── GC never scans mmap'd memory
                          ├── OS pages cold subtrees to disk automatically
                          ├── fd closed after mmap (zero persistent file descriptors)
                          └── All existing code works unchanged (same []Node type)

DiskTxMap (replaces SplitTxInpointsMap when configured)

Hot path (35M ops/sec):
  hash → shardOf(hash) → shard.mu.Lock()
       → cuckoo filter Lookup/Insert
       → recent map check
       → shard.mu.Unlock()
       → async channel send to writer goroutine

Writer goroutines (one per disk, linear I/O scaling):
  channel recv → serialize TxInpoints → Badger batch.Set()
              → auto-flush every 50K entries

Benchmarks

| Metric | Result |
| --- | --- |
| Existence check throughput (dedup layer) | 35M ops/sec @ 28 ns/op |
| Full SetIfNotExists, 4 disks | 1.56M ops/sec @ 641 ns/op |
| mmap Node AddNode overhead vs heap | +0.5 ns (2.47 → 2.94 ns/op) |
| Heap reduction (128 subtrees × 64K nodes) | 383 MB → 0 MB (100%) |

New Settings

| Setting | Default | Description |
| --- | --- | --- |
| `blockassembly_subtreeMmapDir` | `""` | Directory for mmap-backed subtree Node files |
| `blockassembly_txMapDirs` | `""` | Pipe-separated paths for multi-disk BadgerDB TxInpoints storage |
| `blockvalidation_subtreeMmapDir` | `""` | Directory for mmap-backed subtree loading during validation |

Example production configuration:

blockassembly_subtreeMmapDir=/mnt/nvme0/subtree-mmap
blockassembly_txMapDirs=/mnt/nvme0/txmap|/mnt/nvme1/txmap|/mnt/nvme2/txmap|/mnt/nvme3/txmap
blockvalidation_subtreeMmapDir=/mnt/nvme0/bv-subtree-mmap

Files Changed

| File | Change |
| --- | --- |
| `go.mod` | go-subtree v1.1.9, cuckoo filter dependency |
| `stores/tempstore/badger.go` | Added Delete() method |
| `subtreeprocessor/disk_tx_map.go` | DiskTxMap: sharded cuckoo filters + multi-disk Badger |
| `subtreeprocessor/disk_tx_map_test.go` | 8 functional tests + concurrent stress test |
| `subtreeprocessor/disk_tx_map_benchmark_test.go` | Throughput + multi-disk benchmarks |
| `subtreeprocessor/SubtreeProcessor.go` | mmap subtree creation, DiskTxMap init, cleanup paths |
| `subtreeprocessor/options.go` | WithMmapDir, WithTxMapDirs options |
| `blockassembly/BlockAssembler.go` | Pass settings to subtreeprocessor |
| `blockvalidation/BlockValidation.go` | mmap subtree loading, cleanup |
| `blockvalidation/quick_validate.go` | mmap subtree deserialization |
| `blockvalidation/get_blocks.go` | mmap subtree loading in catchup |
| `settings/blockassembly_settings.go` | SubtreeMmapDir, TxMapDirs |
| `settings/blockvalidation_settings.go` | SubtreeMmapDir |

Test plan

  • All existing blockassembly tests pass
  • All existing blockvalidation tests pass (263s suite)
  • DiskTxMap: 8 functional tests + 16-goroutine concurrent test with race detector
  • Memory profiling: 128 mmap subtrees = 0 MB heap (vs 383 MB heap-backed)
  • Throughput: existence layer 35M ops/sec, multi-disk 1.56M ops/sec
  • Integration: run in dev mode with mmap dirs configured

…M usage

Move subtree Node arrays off the Go heap using file-backed mmap and replace
the in-memory transaction map with a sharded cuckoo filter + multi-disk
BadgerDB architecture. All changes are opt-in via new settings.

go-subtree v1.1.9 adds mmap-backed Node storage:
- NewTreeMmap / NewTreeByLeafCountMmap constructors
- NewSubtreeFromReaderMmap for zero-heap deserialization
- Close() for mmap cleanup, SubtreeIndex on TxInpoints

Block Assembly (subtreeprocessor):
- New DiskTxMap: 4096 sharded cuckoo filters for 35M ops/sec existence
  checks, multi-disk BadgerDB for TxInpoints persistence
- Async writer goroutines (one per disk) keep Badger I/O off hot path
- mmap-backed subtree creation via WithMmapDir option
- Subtree Close() calls on all cleanup paths (Stop, Reset, reChain)

Block Validation:
- mmap-backed subtree loading in quick_validate and catchup pipelines
- Subtree Close() in setTxMined cleanup path

Settings (all opt-in, empty = original behavior):
- blockassembly_subtreeMmapDir: directory for mmap Node files
- blockassembly_txMapDirs: pipe-separated paths for multi-disk BadgerDB
- blockvalidation_subtreeMmapDir: directory for mmap subtree loading

Tempstore: added Delete() method to BadgerTempStore
@github-actions
Contributor

github-actions bot commented Feb 19, 2026

🤖 Claude Code Review

Status: Complete


Summary

This PR successfully implements off-heap memory optimization for block processing through mmap-backed subtrees and disk-backed transaction maps. The implementation is well-architected with appropriate opt-in configuration and maintains backward compatibility.

Key Findings

Architecture (Strong):

  • ✅ Sharded cuckoo filters (4096 shards) provide 35M ops/sec existence checks
  • ✅ Multi-disk BadgerDB with dedicated writer goroutines enables linear I/O scaling
  • ✅ Proper separation of hot path (filter+channel) vs cold path (disk flush+check)
  • ✅ The slowMu lock correctly prevents race conditions in the slow path

Concurrency Safety (Verified):

  • ✅ Previous race condition concerns have been addressed via slowMu lock
  • ✅ Lock is held across channel send, ensuring FIFO ordering prevents duplicates
  • ✅ clearRecentMapsForDisk only runs in writer goroutine after flush

Configuration (Well Designed):

  • ✅ Fully opt-in via settings (empty = original behavior)
  • ✅ Clear documentation with memory/performance tradeoffs
  • ✅ Multi-disk paths support optimal I/O parallelism

Testing:

  • ✅ 8 functional tests + concurrent stress test with race detector
  • ✅ Benchmarks demonstrate 1.56M ops/sec with 4 disks
  • ✅ Memory profiling shows 100% heap reduction for mmap nodes

Code Quality:

  • ✅ Follows project conventions (snake_case files, proper error handling)
  • ✅ gosec linter added to CI for security scanning
  • ✅ Appropriate use of atomic operations and channels

Previous Review Items

The race condition issues flagged in earlier reviews have been resolved through the addition of the slowMu lock pattern. The current implementation correctly serializes the slow path to prevent the "recent map cleared during disk check" scenario.


No blocking issues found. The implementation is production-ready with appropriate safeguards and monitoring via Stats() methods.

…y fixes

- Add closeChainedSubtrees() helper and use it in all reset/reorg paths
- Use newSubtree() (mmap-aware) in addNode, addNodePreValidated, reChainSubtrees,
  moveBackBlock, resetSubtreeState — previously only processCompleteSubtree was updated
- Close old subtrees in reChainSubtrees after re-adding nodes
- Close current subtree before replacing in reset/rechain/moveBack paths
- Fix fmt.Errorf → errors.New/errors.NewServiceError (lint violations)
- Fix gci import ordering
- Flush disk shard before Delete to prevent pending write from re-creating entry
- Copy BadgerDB-returned bytes before in-place modification in UpdateSubtreeIndex
- Keep old Badger store on Clear recreation failure instead of leaving nil state
- Add mmap fallback + warning logging in quick_validate.go readSubtree
- removeTxFromSubtrees O(1) lookup: use SubtreeIndex from DiskTxMap for
  direct subtree access instead of scanning all 1000 subtrees (falls back
  to linear scan when DiskTxMap is not active)
- processCompleteSubtree: update SubtreeIndex in Badger for all txs when
  subtree is chained, enabling the O(1) lookup above
- resetSubtreeState: use currentTxMap.Clear() instead of replacing with
  NewSplitTxInpointsMap, preserving DiskTxMap when active
- lastValidatedBlocks cache: add WithEvictionFunction callback to close
  mmap-backed subtrees when blocks expire from the 2-minute cache
- SubtreeProcessingBatch.Close(): new method to release mmap subtree
  resources; called after batch completion in both sequential and pipeline
  processing paths in quick_validate.go
- clearRecentMapsForDisk: fix uint16 vs int type inconsistency
SubtreeIndex now uses 0 as default (Go zero value, safe for serialization
roundtrip). The subtreeprocessor stores chainedIdx+1 so that 0 = unassigned
and >0 = assigned to chainedSubtrees[SubtreeIndex-1].
The old code replaced currentTxMap with a new SplitTxInpointsMap object.
processOwnBlockNodes captures a reference to the old map BEFORE reset and
reads it AFTER. Clear() modifies the same object, breaking those reads.

Fix: when DiskTxMap is not active, create a new SplitTxInpointsMap (original
behavior). When DiskTxMap is active, call Clear() which is safe because
DiskTxMap is only assigned once and not captured by reference elsewhere.
Replace in-memory txMap and parentSpendsMap with disk-backed
alternatives during block validation when block_diskMapDirs is
configured. Uses sharded cuckoo filters for fast existence checks
and multi-disk BadgerDB for storage, reducing heap by ~150 GB for
1B-transaction blocks.

- DiskTxMapUint64 implements txmap.TxMap with cuckoo+BadgerDB
- DiskParentSpendsMap tracks inpoints with same architecture
- Skip cuckoo filter in Get (presence-dominated, filter is overhead)
- Per-shard bytesWritten tracking (zero contention, plain int64)
- Prometheus gauges for entries, filter RAM, and disk bytes written
- Single block_diskMapDirs setting, empty = in-memory (backward compat)
- Disable gosec in golangci config (known v2.23.0 panic, matches Makefile)
@icellan icellan self-assigned this Feb 20, 2026
@github-actions
Contributor

github-actions bot commented Feb 20, 2026

🤖 Claude Code Review

Status: Complete


Current Review:

After thorough analysis of the mmap-backed subtrees and disk-backed TxMap implementation, the code appears well-designed and functionally correct. The complex concurrency patterns using slowMu locks effectively prevent race conditions.

Note on existing inline comments: Three previous inline comments claim race conditions exist in clearRecentMapsForDisk. After detailed analysis, these appear to be incorrect. The slowMu lock is held across the entire slow path (including the channel send), which ensures:

  • Thread A writes to channel before releasing slowMu
  • Thread B acquires slowMu only after Thread A releases it
  • Thread B flushes Thread A write before checking disk
  • Recent map clearing occurs after flushing to disk, maintaining correctness

The implementation correctly handles cross-shard clearing through proper lock ordering and FIFO channel semantics.

Positive observations:

  • Comprehensive benchmarks showing 35M ops/sec dedup layer performance
  • Proper resource cleanup with defer patterns
  • Good separation of concerns (filter layer, disk layer, metrics)
  • Settings are opt-in (empty strings = no change)
  • Multi-disk I/O parallelism is well-architected

Minor notes:

  • gosec linter addition is good security practice
  • Code complexity is high but necessary for performance requirements
  • Prometheus metrics provide good observability

@github-actions
Contributor

github-actions bot commented Feb 27, 2026

🤖 Claude Code Review

Status: Complete


Current Review: No Critical Issues Found

This PR implements mmap-backed subtrees and disk-backed transaction maps to reduce block RAM usage at scale. The implementation is sound with all previously identified race conditions resolved.

Key Findings:

  • ✅ Race condition fix confirmed - All three disk map implementations now use slowMu lock pattern that prevents clearRecentMapsForDisk from racing with slow-path operations
  • ✅ Opt-in architecture - Empty settings strings preserve original heap-based behavior (backward compatible)
  • ✅ Multi-disk sharding - Proper modulo distribution across disk shards for linear I/O scaling
  • ✅ Error handling - Comprehensive cleanup in constructor failure paths

Architecture:

  • Sharded cuckoo filters (~1GB) for fast in-memory deduplication at 35M ops/sec
  • BadgerDB persistence across multiple disks with dedicated writer goroutines
  • File-backed mmap for Node arrays (GC-free, OS-managed paging)

Settings:

  • blockassembly_subtreeMmapDir: Off-heap Node storage (saves ~48GB at 1000 subtrees × 1M nodes)
  • blockassembly_txMapDirs: Multi-disk TxInpoints map (pipe-separated paths)
  • blockvalidation_subtreeMmapDir: mmap subtree loading during validation

No blocking issues identified. The slowMu pattern effectively serializes the slow path to prevent the previously flagged race condition.


History:

  • ✅ Fixed: Race condition in clearRecentMapsForDisk resolved via slowMu serialization pattern
  • ✅ Fixed: diskOf() modulo distribution corrected

@sonarqubecloud

Quality Gate failed

Failed conditions
75.7% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@icellan icellan merged commit 1c4a527 into main Feb 27, 2026
16 checks passed
@icellan icellan deleted the fix/block-ram-usage branch February 27, 2026 09:59
