
perf(storage): add shared RocksDB block cache and row cache #6195

Draft

azteca1998 wants to merge 1 commit into main from add-rocksdb-block-cache

Conversation

@azteca1998 Contributor

Add a shared 512 MB LRU block cache across all column families and a 128 MB row cache on the DB options.

Previously each CF used RocksDB's default 8 MB per-CF cache, meaning the system relied almost entirely on the OS page cache for read performance. A single shared block cache allows hot data from any CF to benefit from the full cache budget, which is the single most impactful RocksDB tuning knob for read-heavy workloads.

The row cache adds an additional layer that caches entire key-value pairs, avoiding block decompression overhead for frequently accessed entries.

No resync required — this is a runtime configuration change.
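
Since the diff itself isn't rendered on this page, here is a minimal sketch of the described configuration, assuming the rust-rocksdb crate (names are illustrative, not the PR's actual code):

use rocksdb::{BlockBasedOptions, Cache, Options};

// Sketch: one LRU block cache shared by every column family,
// plus a DB-level row cache (sizes from the description above).
let block_cache = Cache::new_lru_cache(512 * 1024 * 1024); // 512 MB
let row_cache = Cache::new_lru_cache(128 * 1024 * 1024); // 128 MB

let mut db_opts = Options::default();
db_opts.set_row_cache(&row_cache);

// Each CF's table options point at the same shared block cache,
// replacing RocksDB's default 8 MB per-CF cache.
let mut table_opts = BlockBasedOptions::default();
table_opts.set_block_cache(&block_cache);
db_opts.set_block_based_table_factory(&table_opts);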

Checklist

  • Updated STORE_SCHEMA_VERSION (crates/storage/lib.rs) if the PR includes breaking changes to the Store requiring a re-sync.

@github-actions github-actions bot added the performance label (Block execution throughput and performance in general) on Feb 12, 2026
@github-actions

🤖 Kimi Code Review

Review Summary

This PR adds RocksDB caching optimizations to improve performance. While the changes are generally good, there are a few issues to address:

Critical Issues

  1. Memory leak in cache creation (lines 69-73):

    let block_cache = Cache::new_lru_cache(512 * 1024 * 1024);
    let row_cache = Cache::new_lru_cache(128 * 1024 * 1024);
    opts.set_row_cache(&row_cache);

    The Cache objects are created as locals and dropped at the end of the function; unless the bindings retain the cache internally, RocksDB could be left holding dangling references. Storing them as fields in RocksDBBackend makes the lifetime explicit and guarantees they live as long as the database.

  2. Possible missing error handling (lines 69-73):
    Depending on the rust-rocksdb version, Cache::new_lru_cache may be fallible (e.g. for an oversized cache). If it can fail, the failure should be handled gracefully rather than assumed away.

Performance Concerns

  1. Hard-coded cache sizes (lines 69-73):
    The 512 MB block cache and 128 MB row cache are hard-coded. These should be configurable via RocksDBBackend::open parameters or environment variables to allow tuning for different deployment scenarios.

Code Quality Issues

  1. Code duplication (lines 116-187):
    The pattern of creating BlockBasedOptions and setting the block cache is repeated 8 times. Consider extracting this into a helper function.
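
A minimal sketch of such a helper (hypothetical name; the PR's actual code is not shown here):

fn table_options_with_cache(cache: &Cache) -> BlockBasedOptions {
    // Build per-CF table options that all point at the shared block cache.
    let mut opts = BlockBasedOptions::default();
    opts.set_block_cache(cache);
    opts
}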

Suggested Fixes

// In RocksDBBackend struct
pub struct RocksDBBackend {
    db: DBWithThreadMode<MultiThreaded>,
    block_cache: Cache,
    row_cache: Cache,
}

// In open method
let block_cache = Cache::new_lru_cache(512 * 1024 * 1024)
    .ok_or(StoreError::Custom("Failed to create block cache".to_string()))?;
let row_cache = Cache::new_lru_cache(128 * 1024 * 1024)
    .ok_or(StoreError::Custom("Failed to create row cache".to_string()))?;
opts.set_row_cache(&row_cache);

// Store caches in the struct
Ok(Self {
    db,
    block_cache,
    row_cache,
})

The caching strategy itself looks appropriate for an Ethereum client, with different configurations for different column families based on their access patterns.


Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions

🤖 Codex Code Review

Findings (ordered by severity)

  1. Potential cache lifetime bug / UAF risk
    block_cache and row_cache are created as locals and only referenced by &Cache in options. If the rocksdb crate does not internally clone/retain the cache handle, these will be dropped at the end of open, and the DB may hold dangling pointers. This is subtle and can surface as crashes or data corruption under load.
    Refs: crates/storage/backend/rocksdb.rs:67-73, crates/storage/backend/rocksdb.rs:115-190
    Recommendation: Store the caches on RocksDBBackend (or another owner that lives as long as db) to make the lifetime explicit, or confirm via rocksdb crate docs/source that set_row_cache/set_block_cache clone and hold the cache.

  2. Memory footprint / configurability
    Adding a shared 512 MB block cache plus a 128 MB row cache increases baseline memory usage by ~640 MB, on top of already large write buffers (1 GB DB write buffer and per-CF buffers). This may cause OOM or pressure on smaller nodes and containerized deployments.
    Refs: crates/storage/backend/rocksdb.rs:49-52, crates/storage/backend/rocksdb.rs:67-73
    Recommendation: Make cache sizes configurable (env/config) or scale by available memory. At minimum, document the new memory budget.
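
One possible shape for that recommendation, as a sketch (ETHREX_BLOCK_CACHE_MB is a hypothetical variable name):

// Fall back to the current 512 MB default when the variable is unset or invalid.
let cache_mb: usize = std::env::var("ETHREX_BLOCK_CACHE_MB")
    .ok()
    .and_then(|v| v.parse().ok())
    .unwrap_or(512);
let block_cache = Cache::new_lru_cache(cache_mb * 1024 * 1024);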

Rust / correctness / Ethereum-specific
No direct EVM, consensus, RLP, or validation logic changes in this diff. No security vulnerabilities or gas/accounting concerns introduced by these changes.

If you can confirm the cache lifetime semantics in rocksdb, I’m happy to re-review with that context.


Automated review by OpenAI Codex · custom prompt

@azteca1998 azteca1998 linked an issue Feb 12, 2026 that may be closed by this pull request
@github-actions

Lines of code report

Total lines added: 10
Total lines removed: 0
Total lines changed: 10

Detailed view
+------------------------------------------+-------+------+
| File                                     | Lines | Diff |
+------------------------------------------+-------+------+
| ethrex/crates/storage/backend/rocksdb.rs | 325   | +10  |
+------------------------------------------+-------+------+

@github-actions

🤖 Claude Code Review


PR Review: Add Shared RocksDB Block Cache and Row Cache

Overview

This PR adds a shared 512 MB LRU block cache and 128 MB row cache to improve read performance for the RocksDB storage backend. The changes are runtime configuration only and don't require a resync.


Positive Findings

  1. Well-motivated change: Moving from per-CF default caches (8 MB each) to a shared 512 MB cache is a significant improvement for read-heavy workloads
  2. Clean implementation: The cache objects are created once and shared across all column families
  3. Good cache sizing: 512 MB block cache + 128 MB row cache is reasonable for blockchain workloads
  4. No breaking changes: Correctly marked as not requiring STORE_SCHEMA_VERSION update

⚠️ Issues & Concerns

1. Cache Lifetime and Memory Management (crates/storage/backend/rocksdb.rs:69-73)

Severity: Medium

The block_cache and row_cache objects are created as local variables but then referenced by multiple BlockBasedOptions and the main Options object. While RocksDB's Rust bindings use Arc internally for cache objects, this should be verified.

Resolution: Checking the rocksdb crate, Cache::new_lru_cache returns a Cache type that is internally reference-counted (Arc), so the current implementation is correct and no explicit Arc wrapping is needed. However, the code would benefit from a comment explaining the cache lifetime semantics.

2. Row Cache Considerations (crates/storage/backend/rocksdb.rs:71-73)

Severity: Low

Row cache is set on the DB-level options, which means it applies to ALL column families. However, row cache has some important characteristics:

  • It bypasses block cache and caches entire rows (key-value pairs)
  • It's only beneficial for point lookups, not range scans
  • It can actually hurt performance for workloads with large values or heavy range scans
  • It uses uncompressed data, which can be memory-inefficient

Consideration for Ethereum context:

  • Trie node lookups (ACCOUNT_TRIE_NODES, STORAGE_TRIE_NODES) are primarily point lookups → row cache beneficial
  • Block bodies and receipts may be large and less frequently re-read → row cache may waste memory
  • Transaction lookups are point lookups → row cache beneficial

Recommendation:
Consider benchmarking with and without row cache. If kept, add a comment explaining the rationale for the 128 MB size choice.

3. Missing Memory Budget Documentation (crates/storage/backend/rocksdb.rs:30-73)

Severity: Low

The total memory budget is now:

  • Block cache: 512 MB
  • Row cache: 128 MB
  • DB write buffer: 1 GB (line 49)
  • Per-CF write buffers: ranging from 64 MB to 512 MB
  • Total: ~3-4 GB minimum

This should be documented for operators to understand the memory requirements.

Recommendation:
Add a comment at the top of the function documenting the total memory budget.
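
For example, a comment along these lines would make the budget explicit (a sketch; figures taken from the list above):

// Approximate steady-state memory budget for this backend:
//   block cache (shared across all CFs): 512 MB
//   row cache:                           128 MB
//   DB write buffer:                       1 GB
//   per-CF write buffers:            64-512 MB each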

4. Potential Cache Contention (crates/storage/backend/rocksdb.rs:69)

Severity: Very Low

A single shared cache across all CFs means hot data from one CF can evict data from another. For a blockchain node:

  • During sync: BODIES, RECEIPTS, and TRIE_NODES are hot
  • During RPC serving: Different access patterns

This is generally fine and is the recommended RocksDB pattern, but worth noting.


🔍 Code Quality

  1. Consistent pattern: All BlockBasedOptions now consistently set the shared block cache ✅
  2. No unsafe code introduced: All changes use safe RocksDB API ✅
  3. No new dependencies: Only uses existing imports ✅
  4. Idiomatic Rust: Follows existing code patterns ✅

🧪 Testing Recommendations

Since this changes performance characteristics significantly:

  1. Benchmark read latency before/after on a synced node
  2. Monitor memory usage to ensure the 640 MB cache budget is respected (see the property check sketched after this list)
  3. Check sync performance to ensure the cache helps (or doesn't hurt) initial sync
  4. Test under memory pressure to verify graceful behavior
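
For item 2, RocksDB exposes cache usage as a DB property; a sketch of reading it via rust-rocksdb, where db is the opened handle:

// "rocksdb.block-cache-usage" is a standard RocksDB property name.
if let Ok(Some(bytes)) = db.property_int_value("rocksdb.block-cache-usage") {
    println!("block cache usage: {bytes} bytes");
}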

📝 Minor Issues

  1. Typo in struct comment (crates/storage/backend/rocksdb.rs:25)
    /// Optimistric transaction database
    
    Should be "Optimistic" (though this is pre-existing, not from this PR)

Security & Correctness

  • No security vulnerabilities introduced ✅
  • No changes to data encoding/decoding ✅
  • No changes to consensus logic ✅
  • Memory safety preserved ✅

🎯 Final Verdict

APPROVE with minor suggestions

The changes are sound and should provide meaningful performance improvements. The primary concern is ensuring the row cache is actually beneficial for this workload; this should be validated through benchmarking.

Suggested follow-ups:

  1. Add comments explaining cache lifetime and memory budget
  2. Benchmark row cache effectiveness (could be done post-merge)
  3. Consider making cache sizes configurable via CLI flags in a future PR

Automated review by Claude (Anthropic) · custom prompt


Labels

performance (Block execution throughput and performance in general)

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

Configure block cache (RocksDB)

1 participant