Pruning: add batched commits with metrics instrumentation #753
base: master
Conversation
        mem::take(&mut self.batch)
    }
}
I do support giving off these statistics, but these classes should be in a different file
Agree. These helper structs do not belong in this file. I will move CommitStats, PruningPhaseMetrics, PruneBatch, and the consts into a dedicated module to keep processor.rs focused.
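For illustration, a minimal sketch of what such a module could look like, using the struct names mentioned above; the file name, fields, and constant values are assumptions, not the actual implementation:

```rust
// consensus/src/pipeline/pruning_processor/metrics.rs -- hypothetical new module

use std::time::Duration;

/// Flush thresholds for batched pruning commits (placeholder values).
pub const PRUNE_BATCH_MAX_OPS: usize = 50_000;
pub const PRUNE_BATCH_MAX_BYTES: usize = 4 * 1024 * 1024;

/// Aggregated statistics for batch writes performed during a prune pass.
#[derive(Default, Debug)]
pub struct CommitStats {
    pub count: u64,
    pub total_ops: u64,
    pub total_bytes: u64,
    pub max_latency: Duration,
}

/// Metrics collected over a whole pruning phase and reported in the
/// [PRUNING METRICS] log summary.
#[derive(Default, Debug)]
pub struct PruningPhaseMetrics {
    pub batch_writes: CommitStats,
}

/// Accumulator for deletions that are committed together once a threshold is hit.
#[derive(Default)]
pub struct PruneBatch {
    pub ops: usize,
    pub bytes: usize,
}
```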
            .expect("reachability guard should be available")
            .is_dag_ancestor_of_result(new_pruning_point, h)
            .unwrap()
    })
Could you explain what the point of this change is?
The change is fallout of making reachability_read movable into staging. I will refactor to avoid the Option indirection and keep direct reads here. Update: refactored so the reachability guard stays direct; we now take a fresh upgradable read per batch iteration and drop the read-only guard before the staging commit so it is never held across the commit. Behavior is unchanged, and this fixes a deadlock we saw in the pruning tests.
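Roughly, the guard lifecycle now follows the sketch below (parking_lot types as used in the codebase; the store type and loop body are stand-ins, not the real code):

```rust
use parking_lot::RwLock;
use std::sync::Arc;

struct ReachabilityData; // stand-in for the real reachability store

fn prune_batches(reachability_store: Arc<RwLock<ReachabilityData>>, batches: usize) {
    for _ in 0..batches {
        // Take a fresh upgradable read at the start of each batch iteration.
        let reachability_read = reachability_store.upgradable_read();

        // ... read-only ancestry checks for this batch would run here ...

        // Drop the read guard before the staging commit so the commit can take
        // the write lock without deadlocking against this guard.
        drop(reachability_read);

        // ... staging commit / batch flush happens here, then the next
        // iteration re-acquires a fresh guard ...
    }
}
```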
    let mut queue = VecDeque::<Hash>::from_iter(reachability_read.get_children(ORIGIN).unwrap().iter().copied());
    let mut queue = VecDeque::<Hash>::from_iter(
        reachability_read.as_ref().expect("reachability guard should be available").get_children(ORIGIN).unwrap().iter().copied(),
    );
Same comment as above on why this is an Option.
Same as above; I will refactor so reachability_read stays non-optional. Update: done as described in the first thread (fresh upgradable read per batch iteration, guard dropped before the staging commit).
    use std::{str::FromStr, sync::Arc};

    macro_rules! from {
        // Response capture
I'm not sure how I feel about all these comment changes here, below, and above; they require a big context switch from me to review, and I don't understand how you got to them anyway.
To what extent are you sure about their veracity?
Anyway, I think you can submit them in a separate cosmetic PR.
Agreed; the cosmetic comment changes should be separate. I will revert this comment change in this PR.
    let mut staging_relations = StagingRelationsStore::new(&mut relations_write);
    let mut staging_reachability_relations = StagingRelationsStore::new(&mut reachability_relations_write);
    let mut staging_reachability =
        StagingReachabilityStore::new(reachability_read.take().expect("reachability guard should be available"));
same comment
Same as above; I will remove the Option/take pattern and pass a scoped guard. Update: done as described in the first thread.
        drop(pruning_point_write);
    }

    self.flush_prune_batch(&mut prune_batch, &mut metrics);
Here you finally flush the prune batch after the loop, but as mentioned, you are not protected by the prune_guard.
Agree; I will ensure the flush happens while prune_guard is held (along with the retention checkpoint).
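A self-contained sketch of the intended ordering, using toy types; the real prune guard and flush signature differ, so treat this only as an illustration of the scoping:

```rust
use parking_lot::Mutex;

struct PruneBatch {
    pending_ops: usize,
}

impl PruneBatch {
    fn flush(&mut self) {
        // In the real code this writes the accumulated RocksDB batch.
        self.pending_ops = 0;
    }
}

fn finish_prune_pass(prune_lock: &Mutex<()>, prune_batch: &mut PruneBatch) {
    let prune_guard = prune_lock.lock();

    // ... traversal loop filling `prune_batch` runs here ...

    // The final flush (and the retention checkpoint update) stay inside the guard's scope.
    prune_batch.flush();

    drop(prune_guard); // only now is the prune lock released
}
```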
    fn record_lock_reacquire(&mut self) {
        self.lock_reacquire_count += 1;
    }
I think the lock data is TMI though.
Agree; lock stats are too noisy. I will drop lock metrics and keep only batch-write stats.
    let bytes = batch.size_in_bytes();
    let commit_start = Instant::now();
    self.db.write(batch).unwrap();
    metrics.record_commit("ghostdag_adjust", ops, bytes, commit_start.elapsed());
I think the only thing we should collect statistics on is the batch writes.
Agree; I will remove per-commit metrics and keep only batch-write stats.
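Concretely, the call site could drop the per-commit label and feed a single batch-write statistic; a sketch under assumed names (only the ops/bytes/elapsed measurements come from the diff above):

```rust
use std::time::{Duration, Instant};

#[derive(Default, Debug)]
struct BatchWriteStats {
    count: u64,
    total_ops: u64,
    total_bytes: u64,
    max_latency: Duration,
}

impl BatchWriteStats {
    /// Single entry point: every flushed batch is recorded the same way,
    /// with no per-commit-type label such as "ghostdag_adjust".
    fn record(&mut self, ops: u64, bytes: u64, latency: Duration) {
        self.count += 1;
        self.total_ops += ops;
        self.total_bytes += bytes;
        self.max_latency = self.max_latency.max(latency);
    }
}

fn flush_and_record(stats: &mut BatchWriteStats, ops: u64, bytes: u64, write_batch: impl FnOnce()) {
    let start = Instant::now();
    write_batch(); // stand-in for self.db.write(batch).unwrap()
    stats.record(ops, bytes, start.elapsed());
}
```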
    let mut reachability_read = self.reachability_store.upgradable_read();
    lock_acquire_time = Instant::now();
    metrics.record_lock_reacquire();
    let mut reachability_read = Some(self.reachability_store.upgradable_read());
Why did we make it an Option?
Agree; I will refactor to avoid the Option for the reachability read. Update: done as described in the first thread.
    let bytes = batch.size_in_bytes();
    let commit_start = Instant::now();
    self.db.write(batch).unwrap();
    metrics.record_commit("tips_and_selected_chain", ops, bytes, commit_start.elapsed());
Same comment; we don't actually care about this statistic.
Agree; I will drop this per-commit metric and keep batch-write stats only.
    prune_guard.blocking_yield();
    lock_acquire_time = Instant::now();
    queue.push_front(current);
    continue 'prune_batch;
This structure of re-pushing to the queue and continuing another iteration is not unheard of, but it is a bit harder to follow. Come to think of it, I'm not sure we really need an outer loop at all; we can reacquire the write locks here.
    }
    let is_block_in_retention_root_future = {
        let reachability_read_only = self.reachability_store.read();
        reachability_read_only.is_dag_ancestor_of_result(retention_period_root, current).unwrap()
remove only
@freshair18 addressed both items. The traversal loop now avoids the requeue/outer loop: we peek …
    drop(reachability_relations_write);
    drop(relations_write);

    prune_batch.flush(&self.db, &mut metrics);
I noticed you measure how long a batch write executes; do you have some info on this? Unsure if I am missing something, but since you flush after measuring the elapsed time, and initiate a new timer afterwards, after the blocking yield, I believe the time it takes to flush is added to the prune lock time? That might be negligible, but it should maybe be considered, as this adds to the expected PRUNE_LOCK_MAX_DURATION_MS.
Good catch, I'd overlooked that. Flush latency is currently inside the lock window, so the limit is best-effort. We could either document it, or subtract a rolling average commit time from the check to keep holds closer to the bound (at the cost of potentially smaller batches and more churn). What do you prefer?
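For reference, the "subtract a rolling average commit time" option could be as small as the following hypothetical helper; this is not what ended up in the PR:

```rust
use std::time::{Duration, Instant};

/// Hypothetical: yield early enough that an average-sized flush still fits
/// inside the lock budget, instead of checking the raw elapsed time alone.
fn should_yield(lock_acquire_time: Instant, budget: Duration, avg_commit: Duration) -> bool {
    lock_acquire_time.elapsed() + avg_commit >= budget
}
```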
    drop(relations_write);

    reachability_read = self.reachability_store.upgradable_read();
    prune_batch.flush(&self.db, &mut metrics);
The same exceeding of PRUNE_LOCK_MAX_DURATION_MS as in the first closure applies here too, btw, though maybe in a more obscured worst-case scenario, since there are other triggers besides PRUNE_LOCK_MAX_DURATION_MS that can cause us to enter this closure. But the fact remains that exceeding PRUNE_LOCK_MAX_DURATION_MS is possible here.
Yep, same issue as your first comment. Other triggers can push us over; the budget is best-effort. We can handle it the same way: document it, or bias the check by subtracting the average commit time. What's your preference?
I think for now I would just add a comment over this line documenting this potential behavior. I would prefer if other reviewers could also assess the relevance of this behavior.
Taking just the recorded average times of a few milliseconds, or a fraction thereof on newer hardware, it seems like it shouldn't influence lock time much. But then again the worst case goes to a few hundred milliseconds, which seems relevant. I am assuming that removing block bodies from the batch might add some consistency to these times.
It might also be worth renaming PRUNE_LOCK_MAX_DURATION_MS to PRUNE_LOCK_TARGET_MAX_DURATION_MS; the current name suggests a hard upper bound, which is no longer a given.
Added a comment noting the lock budget is best‑effort and renamed to PRUNE_LOCK_TARGET_MAX_DURATION_MS.
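The renamed constant and its caveat could read roughly as follows; the 25 ms value is taken from the yield interval quoted in the summary below and may not match the final code:

```rust
/// Target upper bound on how long the prune loop holds its locks before yielding.
/// Best-effort rather than a hard cap: the batch flush itself runs under the prune
/// lock, so a slow commit can push an individual hold past this value.
const PRUNE_LOCK_TARGET_MAX_DURATION_MS: u64 = 25; // value assumed from the summary's yield interval
```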
    let mut staging_reachability = StagingReachabilityStore::new(self.reachability_store.upgradable_read());
    let mut statuses_write = self.statuses_store.write();

    loop {
I think this can be rewritten more succinctly as: while !queue.is_empty()
Np, happy to switch to the simpler loop.
    // If we have the lock for more than a few milliseconds, release and recapture to allow consensus progress during pruning
    if lock_acquire_time.elapsed() > Duration::from_millis(5) {
        drop(reachability_read);
    if lock_acquire_time.elapsed() > Duration::from_millis(PRUNE_LOCK_MAX_DURATION_MS) {
I actually don't really see the point of this closure. At the end of the loop you call the exact same things as in this closure, only with the should_flush condition; should_flush also implicitly does this same check, so I think you can just use should_flush right here and remove the closure at the end. So either (a) should_flush triggers at the end, you reset the timer, and you will not enter this closure, or (b) if should_flush didn't trigger, we don't expect this to trigger either (unless in some extreme edge case where the condition is met in the few nanoseconds between loop iterations).
Good point. I was aiming for a pre-work budget check so we can yield even when the batch is empty (e.g. the keep-only traversal). should_flush returns false on an empty batch, so moving everything there would drop that yield. If you want a single path, I can tweak should_flush or add a should_yield helper to cover both while keeping behaviour identical.
I think you should just go for an or operator, something like:

    if prune_batch.should_flush() || lock_acquire_time.elapsed() > Duration::from_millis(PRUNE_LOCK_MAX_DURATION_MS) {
        // Do work
    }

I think then you can also avoid sending lock_acquire_time into the should_flush condition, and we keep the lock_acquire condition separate and explicit.
Done. Lock budget check is now explicit (should_flush() + lock_elapsed check), and should_flush no longer takes lock_elapsed.
cli/src/modules/rpc.rs
Outdated
    // }
    RpcApiOps::GetMempoolEntries => {
        // TODO
        // TODO: expose filter flags in CLI args instead of hard-coded defaults.
Not sure how this pertains to the current PR; maybe remove it, or PR it separately?
Agreed, this is off-scope. I'll drop it.
cli/src/modules/rpc.rs
Outdated
    async fn display_help(self: Arc<Self>, ctx: Arc<KaspaCli>, _argv: Vec<String>) -> Result<()> {
        // RpcApiOps that do not contain docs are not displayed
        // Hide ops without docs so help output does not include empty entries.
Not sure how this pertains to the current PR, or why it was changed; maybe remove it?
Agreed, this is off-scope. I'll drop it.
    while !queue.is_empty() {
        // Lock budget is best-effort because batch flush happens under the prune lock.
        if lock_acquire_time.elapsed() > Duration::from_millis(PRUNE_LOCK_TARGET_MAX_DURATION_MS) {
So what I meant is: you can now move the closure at line 574, together with its condition, up here, and remove the whole closure at line 574.
Sorry, I misunderstood earlier; thanks for clarifying. I've now moved the flush/yield block to the top of the loop and removed the bottom one, so there is a single flush path with the same condition.
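As a self-contained illustration of the final loop shape (toy types; the guard handling, yield, and threshold values are simplified away and should not be read as the merged code):

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

const PRUNE_LOCK_TARGET_MAX_DURATION_MS: u64 = 25; // assumed value, as above

struct ToyBatch {
    ops: usize,
}

impl ToyBatch {
    fn should_flush(&self) -> bool {
        self.ops >= 1_000 // placeholder threshold
    }
    fn flush(&mut self) {
        self.ops = 0; // stand-in for the real DB write plus metrics recording
    }
}

fn traverse(mut queue: VecDeque<u64>) {
    let mut batch = ToyBatch { ops: 0 };
    let mut lock_acquire_time = Instant::now();

    while !queue.is_empty() {
        // Single flush/yield path, checked once at the top of the loop:
        // flush when the batch is full OR when the lock budget is exceeded.
        if batch.should_flush()
            || lock_acquire_time.elapsed() > Duration::from_millis(PRUNE_LOCK_TARGET_MAX_DURATION_MS)
        {
            batch.flush(); // real code also drops guards, yields the prune lock, and re-acquires
            lock_acquire_time = Instant::now(); // restart the lock budget timer
        }

        let current = queue.pop_front().unwrap();
        batch.ops += 1; // stand-in for staging deletions for `current`
        let _ = current;
    }
}
```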
Summary
Inspired by Michael Sutton’s pruning discussion during the June-2025 dev workshop (https://www.youtube.com/watch?v=cMFeijKSv1g), this implements the pruning-batching strategy in consensus/src/pipeline/pruning_processor/processor.rs. Batches flush when they hit 256 blocks, 50 000 ops, 4 MB, or 50 ms, and the lock yields every 25 ms so other consensus work continues smoothly.

Every prune pass now emits a [PRUNING METRICS] summary that includes the active config plus per-commit-type stats (count, avg/max ops, bytes, latency). Operators can confirm exactly how a run behaved just by inspecting the logs.

Benchmarking on my 2024 NVMe-backed M3 Air shows the expected behavior:
Hypothesis: slow SSD/HDD nodes should see dramatically larger gains because per-commit latency dominates there. I’m reaching out to community members with slower disks to capture additional runs before we merge, and I’ll update the PR description/thread with their numbers as they come in.
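A compact sketch of the flush thresholds quoted in this summary; the numbers come from the text above, while the struct and method names are illustrative rather than the actual implementation:

```rust
use std::time::{Duration, Instant};

const MAX_BATCH_BLOCKS: usize = 256;
const MAX_BATCH_OPS: usize = 50_000;
const MAX_BATCH_BYTES: usize = 4 * 1024 * 1024; // 4 MB
const MAX_BATCH_AGE: Duration = Duration::from_millis(50);

struct PruneBatch {
    blocks: usize,
    ops: usize,
    bytes: usize,
    started: Instant,
}

impl PruneBatch {
    /// Flush once any of the four thresholds from the summary is reached;
    /// the 25 ms lock yield is handled separately by the caller.
    fn should_flush(&self) -> bool {
        self.blocks >= MAX_BATCH_BLOCKS
            || self.ops >= MAX_BATCH_OPS
            || self.bytes >= MAX_BATCH_BYTES
            || self.started.elapsed() >= MAX_BATCH_AGE
    }
}
```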
Context & Motivation
Field Data
Run folders:
Remaining Follow-ups
Testing
cargo fmt --all -- --check
cargo test -p kaspa-consensus --lib