fix: pruner OOM — skip input fetching, fix slice race, auto-configure GOMEMLIMIT#538
freemans13 wants to merge 1 commit into bsv-blockchain:main
…ce data race

When skipParentUpdates is enabled, the pruner was still fetching the inputs bin from Aerospike, parsing every input into bt.Input objects, and accumulating parent updates, only to discard them in flushCleanupBatches. This wastes network I/O and causes billions of allocations during billion-record prune cycles, contributing to OOM kills at the 2Gi container limit. Now the existing skipParentUpdates setting also skips the upstream fetch/parse/accumulate work.

Additionally fixes a slice reuse data race where chunk[:0] resets the length but shares the backing array with in-flight goroutines, and adds automatic GOMEMLIMIT configuration from cgroup limits (90%) so Go's GC can self-regulate within container memory budgets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
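For reviewers who want to see the slice aliasing in isolation, here is a minimal, single-goroutine sketch. It is not the pruner code: `resetInPlace`, `resetFresh`, and `dispatchedAfterRefill` are hypothetical names, and `[]int` stands in for the pruner's result slice. It only shows that a slice header handed off before `chunk[:0]` still points at the array the next `append` writes into.

```go
package main

import "fmt"

// resetInPlace mimics the old pattern: the length drops to zero but the
// backing array is kept, so later appends overwrite memory that an earlier
// hand-off still references.
func resetInPlace(chunk []int) []int {
	return chunk[:0]
}

// resetFresh mimics the fix: the next chunk gets its own backing array.
func resetFresh(chunk []int) []int {
	return make([]int, 0, cap(chunk))
}

// dispatchedAfterRefill fills a chunk, keeps a copy of the slice header (as a
// goroutine receiving the chunk would), resets the chunk with the given
// strategy, refills it, and returns what the earlier hand-off now sees.
func dispatchedAfterRefill(reset func([]int) []int) []int {
	chunk := make([]int, 0, 3)
	chunk = append(chunk, 1, 2, 3)

	dispatched := chunk // header copy only; the backing array is shared

	chunk = reset(chunk)
	chunk = append(chunk, 7, 8, 9)
	_ = chunk

	return dispatched
}

func main() {
	fmt.Println(dispatchedAfterRefill(resetInPlace)) // [7 8 9]: hand-off was overwritten
	fmt.Println(dispatchedAfterRefill(resetFresh))   // [1 2 3]: hand-off is intact
}
```

With real goroutines the overwrite happens concurrently with the reads, which is what the race detector reports. The cost of the fix is one extra allocation per chunk.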
🤖 Claude Code Review
Status: Complete
Findings: This PR introduces three important improvements: skipping unnecessary input fetching when […] However, there is a logic bug introduced by the […] In […] Suggested fix: Move […] The memory limit and race fix changes are excellent and address real production issues.


Summary
- Skip inputs bin + parsing when `skipParentUpdates=true`: The existing `skipParentUpdates` setting already skipped the final flush of parent updates, but the pruner was still fetching the `inputs` bin from Aerospike, parsing every input into `bt.Input` objects, and accumulating them into `allParentUpdates`, only to discard everything. For billion-record prune cycles at 1.3M records/sec, this means billions of wasted allocations and significant network I/O for the largest per-record bin. Now the setting also skips the upstream fetch/parse/accumulate, eliminating the allocation pressure at its source.
- Fix slice reuse data race in `partitionWorker`: `chunk = chunk[:0]` resets the slice length but keeps the same backing array, which the main loop then overwrites via `append` while goroutines are still reading from it. Replaced with `chunk = make([]*aerospike.Result, 0, s.chunkSize)` to allocate a fresh backing array per chunk.
- Auto-configure `GOMEMLIMIT` from cgroup limits: Reads cgroup v2/v1 memory limits at daemon startup and sets `debug.SetMemoryLimit` to 90% of the container limit. Combined with the existing `GOGC=200`, this lets Go allocate fast but tighten GC as it approaches the soft limit, preventing OOM kills while maintaining throughput. No-op on local dev (no cgroup files). Benefits all services, not just the pruner.

Test plan
- `make build-teranode` compiles cleanly
- `make lint` passes with 0 issues
- `go test -v -race ./services/pruner/...` passes
- `go test -v -race ./daemon/...` passes
- `-race` flag in CI catches no data races in pruner path

🤖 Generated with Claude Code
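As a rough illustration of the cgroup-based memory limit described in the summary, the sketch below reads a container limit and applies 90% of it through `debug.SetMemoryLimit`. It is not the code in this PR: the function names, the two file paths (the conventional cgroup v2 and v1 mount points), and the threshold used to detect an "unlimited" cgroup v1 value are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// unlimitedThreshold is an assumed cut-off: cgroup v1 reports a huge number
// when no limit is set, so anything this large is treated as "no limit".
const unlimitedThreshold = int64(1) << 62

// parseCgroupLimit parses the contents of a cgroup memory limit file.
// cgroup v2 writes the literal "max" when the cgroup is unlimited.
func parseCgroupLimit(raw string) (int64, bool) {
	s := strings.TrimSpace(raw)
	if s == "" || s == "max" {
		return 0, false
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n <= 0 || n >= unlimitedThreshold {
		return 0, false
	}
	return n, true
}

// softLimit returns roughly 90% of limit. Dividing first avoids overflow and
// rounds down by a few bytes at most.
func softLimit(limit int64) int64 {
	return limit / 10 * 9
}

// configureMemoryLimit tries the assumed cgroup v2 path, then the v1 path,
// and sets the Go soft memory limit if a container limit is found.
// It reports false (and changes nothing) when no limit file is readable.
func configureMemoryLimit() (int64, bool) {
	paths := []string{
		"/sys/fs/cgroup/memory.max",                   // cgroup v2 (assumed mount point)
		"/sys/fs/cgroup/memory/memory.limit_in_bytes", // cgroup v1 (assumed mount point)
	}
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue
		}
		if limit, ok := parseCgroupLimit(string(data)); ok {
			soft := softLimit(limit)
			debug.SetMemoryLimit(soft)
			return soft, true
		}
	}
	return 0, false
}

func main() {
	if soft, ok := configureMemoryLimit(); ok {
		fmt.Printf("soft memory limit set to %d bytes\n", soft)
	} else {
		fmt.Println("no cgroup memory limit found; leaving Go defaults")
	}
}
```

The sketch does not check whether `GOMEMLIMIT` is already set in the environment; a real implementation would probably want an explicit operator setting to take precedence.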