
Conversation


@randy-cro randy-cro commented Oct 22, 2025

Description

This PR optimises the staking endblocker with a non-breaking change.
It was discovered that on archival nodes running RocksDB/VersionDB, the staking endblocker could take up to 1100 ms per block, causing those nodes to consistently lag behind. This happened even when the iterators picked up no entries, so there was an urgent need to improve block sync performance.


Root Cause

  • Fetching the following iterators was slow:
    • ValidatorQueueIterator
    • UBDQueueIterator
    • RedelegationQueueIterator
  • Each iterator took ~300ms to return, even when there were no entries, due to excessively large scan ranges (see the sketch below).
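
For intuition, a queue iterator of this kind scans a time-keyed range from the start of the queue prefix up to the current block time, so its cost is dominated by the span of keys the storage backend has to cover rather than by the number of matured entries it returns. Below is a minimal sketch of such a scan, assuming a simplified key layout; the real key layouts differ per queue, and the exact store packages depend on the SDK version.

```go
package example

import (
	"time"

	storetypes "cosmossdk.io/store/types"
	sdk "github.com/cosmos/cosmos-sdk/types"
)

// matureQueueEntries sketches the kind of time-keyed queue scan the staking
// queue iterators perform (simplified; the real key layout differs). Entries
// under queuePrefix are ordered by completion time, and the iterator covers
// the whole range from the prefix start up to blockTime, so the backend may
// have to open many files even when nothing has matured yet.
func matureQueueEntries(store storetypes.KVStore, queuePrefix []byte, blockTime time.Time) [][]byte {
	end := append(append([]byte{}, queuePrefix...), sdk.FormatTimeBytes(blockTime)...)
	it := store.Iterator(queuePrefix, storetypes.InclusiveEndBytes(end))
	defer it.Close()

	var matured [][]byte
	for ; it.Valid(); it.Next() {
		matured = append(matured, it.Value())
	}
	return matured
}
```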

Changes Made

  • Cache unbonding validators, unbonding delegations, and redelegations:
    Instead of scanning the database from the beginning of time up to the latest block height or timestamp on every block, an in-memory cache now stores these entries, significantly reducing I/O. The iterator is invoked only once, during cache initialisation when the node starts. A simplified sketch of the idea follows.
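
As a rough illustration of the caching idea (not the PR's actual implementation; queueCache, Insert, and PopMatured are hypothetical names), the cache can hold entries ordered by completion time, be updated whenever a new queue entry is written to state, and let the endblocker pop matured entries from memory instead of opening a store iterator:

```go
package example

import (
	"sort"
	"time"
)

// queueEntry is a simplified stand-in for an unbonding/redelegation/validator
// queue item keyed by its completion time.
type queueEntry struct {
	CompletionTime time.Time
	Key            []byte
}

// queueCache keeps queue entries in memory, sorted by completion time, so the
// endblocker can find matured entries without touching the store.
type queueCache struct {
	entries []queueEntry // kept sorted by CompletionTime
}

// Insert is called whenever a new queue entry is written to state, keeping
// the cache consistent with the store.
func (c *queueCache) Insert(e queueEntry) {
	i := sort.Search(len(c.entries), func(i int) bool {
		return c.entries[i].CompletionTime.After(e.CompletionTime)
	})
	c.entries = append(c.entries, queueEntry{})
	copy(c.entries[i+1:], c.entries[i:])
	c.entries[i] = e
}

// PopMatured returns and removes all entries whose completion time has passed;
// when nothing has matured this is just a binary search over an in-memory
// slice rather than a disk scan.
func (c *queueCache) PopMatured(blockTime time.Time) []queueEntry {
	n := sort.Search(len(c.entries), func(i int) bool {
		return c.entries[i].CompletionTime.After(blockTime)
	})
	matured := c.entries[:n]
	c.entries = c.entries[n:]
	return matured
}
```

On startup the cache would be populated by running each queue iterator once; every subsequent EndBlock then consults the cache, and entries leave it as they mature, which also bounds its memory footprint.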

Results

With telemetry metrics enabled, we observed a significant performance improvement after these optimisations were applied: the staking endblocker time dropped from roughly 1109 ms to about 1.43 ms.

Before: (telemetry screenshot)

After: (telemetry screenshot)


zsystm commented Oct 22, 2025

@randy-cro

Just out of curiosity — do we know roughly how many entries were being iterated in those queues (ValidatorQueueIterator, UBDQueueIterator, RedelegationQueueIterator)?

Also, were the before/after measurements (1109 → 1.43) taken on the same state size?
And just to confirm, the unit here is milliseconds (ms), right?


randy-cro commented Oct 22, 2025

@randy-cro

Just out of curiosity — do we know roughly how many entries were being iterated in those queues (ValidatorQueueIterator, UBDQueueIterator, RedelegationQueueIterator)?

Also, were the before/after measurements (1109 → 1.43) taken on the same state size? And just to confirm, the unit here is milliseconds (ms), right?

For the particular chain I tested on, there were zero entries, but the sheer number of files the store has to go through when creating the iterator (I tested on an archive testnet node) resulted in the huge time taken. And yes, the benchmarks are in milliseconds.


zsystm commented Oct 22, 2025

For the particular chain I tested on, there were zero entries, but the sheer number of files the store has to go through when creating the iterator (I tested on an archive testnet node) resulted in the huge time taken. And yes, the benchmarks are in milliseconds.

Interesting — I didn’t know the staking endblocker could take that much time on an archive node.
My testbed was based on a full node, so I hadn’t noticed that difference. This is a great catch.

I haven’t fully reviewed the code yet, but the direction looks great to me.
(FYI: I’m not a reviewer or maintainer.)

One potential downside I can think of is if the cached data becomes dirty — the performance gain is clear when the cached state remains consistent.

randy-cro commented Oct 22, 2025

One potential downside I can think of is if the cached data becomes dirty — the performance gain is clear when the cached state remains consistent.

Yeap, I haven't fully tested with a dirty cache, but I believe it's safe to assume the performance gain would still be significant: the cache is in memory, and it is eventually cleared anyway, because these redelegations/undelegations/unbondings eventually reach maturity.
