
Conversation


@randy-cro randy-cro commented Oct 22, 2025

Description

This PR optimises the staking endblocker with a non-breaking change.
It was discovered that on archival nodes running RocksDB/VersionDB, the staking endblocker could take up to 1100 ms per block, causing those nodes to consistently lag behind. This happened even when the iterators picked up no entries, so there was an urgent need to improve block sync performance.


Root Cause

  • Fetching the following iterators was slow:
    • ValidatorQueueIterator
    • UBDQueueIterator
    • RedelegationQueueIterator
  • Each iterator took ~300ms to return, even when there were no entries, due to excessively large scan ranges (see the sketch below).
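
For intuition, a queue iterator of this kind scans a time-keyed range from the start of the queue prefix up to the current block time, so its cost is dominated by the span of keys the storage backend has to cover rather than by the number of matured entries it returns. Below is a minimal sketch of such a scan, assuming a simplified key layout; the real key layouts differ per queue, and the exact store packages depend on the SDK version.

```go
package example

import (
	"time"

	storetypes "cosmossdk.io/store/types"
	sdk "github.com/cosmos/cosmos-sdk/types"
)

// matureQueueEntries sketches the kind of time-keyed queue scan the staking
// queue iterators perform (simplified; the real key layout differs). Entries
// under queuePrefix are ordered by completion time, and the iterator covers
// the whole range from the prefix start up to blockTime, so the backend may
// have to open many files even when nothing has matured yet.
func matureQueueEntries(store storetypes.KVStore, queuePrefix []byte, blockTime time.Time) [][]byte {
	end := append(append([]byte{}, queuePrefix...), sdk.FormatTimeBytes(blockTime)...)
	it := store.Iterator(queuePrefix, storetypes.InclusiveEndBytes(end))
	defer it.Close()

	var matured [][]byte
	for ; it.Valid(); it.Next() {
		matured = append(matured, it.Value())
	}
	return matured
}
```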

Changes Made

  • Cache unbonding validators, unbonding delegations, and redelegations:
    Instead of scanning the database from the beginning of time up to the latest block height or timestamp on every block, an in-memory cache now stores these entries, significantly reducing I/O. The iterator is invoked only once, during cache initialisation when the node starts. A simplified sketch of the idea follows.
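
As a rough illustration of the caching idea (not the PR's actual implementation; queueCache, Insert, and PopMatured are hypothetical names), the cache can hold entries ordered by completion time, be updated whenever a new queue entry is written to state, and let the endblocker pop matured entries from memory instead of opening a store iterator:

```go
package example

import (
	"sort"
	"time"
)

// queueEntry is a simplified stand-in for an unbonding/redelegation/validator
// queue item keyed by its completion time.
type queueEntry struct {
	CompletionTime time.Time
	Key            []byte
}

// queueCache keeps queue entries in memory, sorted by completion time, so the
// endblocker can find matured entries without touching the store.
type queueCache struct {
	entries []queueEntry // kept sorted by CompletionTime
}

// Insert is called whenever a new queue entry is written to state, keeping
// the cache consistent with the store.
func (c *queueCache) Insert(e queueEntry) {
	i := sort.Search(len(c.entries), func(i int) bool {
		return c.entries[i].CompletionTime.After(e.CompletionTime)
	})
	c.entries = append(c.entries, queueEntry{})
	copy(c.entries[i+1:], c.entries[i:])
	c.entries[i] = e
}

// PopMatured returns and removes all entries whose completion time has passed;
// when nothing has matured this is just a binary search over an in-memory
// slice rather than a disk scan.
func (c *queueCache) PopMatured(blockTime time.Time) []queueEntry {
	n := sort.Search(len(c.entries), func(i int) bool {
		return c.entries[i].CompletionTime.After(blockTime)
	})
	matured := c.entries[:n]
	c.entries = c.entries[n:]
	return matured
}
```

On startup the cache would be populated by running each queue iterator once; every subsequent EndBlock then consults the cache, and entries leave it as they mature, which also bounds its memory footprint.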

Results

With telemetry metrics enabled, we observed a significant performance improvement after these optimisations were applied: the staking endblocker time dropped from roughly 1109 ms to about 1.43 ms.

Before: (telemetry screenshot)

After: (telemetry screenshot)


zsystm commented Oct 22, 2025

@randy-cro

Just out of curiosity — do we know roughly how many entries were being iterated in those queues (ValidatorQueueIterator, UBDQueueIterator, RedelegationQueueIterator)?

Also, were the before/after measurements (1109 → 1.43) taken on the same state size?
And just to confirm, the unit here is milliseconds (ms), right?


randy-cro commented Oct 22, 2025

@randy-cro

Just out of curiosity — do we know roughly how many entries were being iterated in those queues (ValidatorQueueIterator, UBDQueueIterator, RedelegationQueueIterator)?

Also, were the before/after measurements (1109 → 1.43) taken on the same state size? And just to confirm, the unit here is milliseconds (ms), right?

For the particular chain I tested on, there were zero entries, but the sheer number of files the store has to go through when creating the iterator (I tested on an archive testnet node) resulted in the huge time taken. And yes, the benchmarks are in milliseconds.


zsystm commented Oct 22, 2025

For the particular chain I tested on, there were zero entries, but the sheer number of files the store has to go through when creating the iterator (I tested on an archive testnet node) resulted in the huge time taken. And yes, the benchmarks are in milliseconds.

Interesting — I didn’t know the staking endblocker could take that much time on an archive node.
My testbed was based on a full node, so I hadn’t noticed that difference. This is a great catch.

I haven’t fully reviewed the code yet, but the direction looks great to me.
(FYI: I’m not a reviewer or maintainer.)

One potential downside I can think of is if the cached data becomes dirty — the performance gain is clear when the cached state remains consistent.

randy-cro commented Oct 22, 2025

One potential downside I can think of is if the cached data becomes dirty — the performance gain is clear when the cached state remains consistent.

Yeap, I haven't fully tested with a dirty cache, but I believe it's safe to assume the performance gain would still be significant: the cache is in memory, and it is eventually cleared anyway, because these redelegations/undelegations/unbondings eventually reach maturity.
