
perf(l1): optimize storage layer for block execution #6207

Open
ilitteri wants to merge 4 commits into main from perf/storage-optimizations

Conversation

@ilitteri
Collaborator

@ilitteri ilitteri commented Feb 13, 2026

Motivation

Profiling block execution on mainnet revealed that lock contention, per-lookup allocations, and O(log n) data structure access in the storage layer accounted for a significant portion of execution time. The storage hot path (get_storage_at_root → trie lookups) was hitting multiple bottlenecks on every trie node access.

Description

Four incremental optimizations to the storage layer, each targeting a specific bottleneck identified via perf profiling on a 32-core AMD server:

1. BTreeMap → FxHashMap + Mutex → RwLock

  • Replace BTreeMap<Vec<u8>, Vec<u8>> with FxHashMap for in-memory storage tables — O(1) lookups instead of O(log n)
  • Replace Mutex with RwLock for trie_cache and last_computed_flatkeyvalue — allows concurrent readers (only the background worker writes, 2-3 times per block)
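As a rough illustration of this first optimization, here is a minimal sketch of the Mutex → RwLock swap for a read-mostly cache. Std `HashMap` stands in for `FxHashMap` so the snippet builds without external crates, and the names are illustrative, not the real ethrex fields:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Read-mostly cache guarded by RwLock: many readers proceed in parallel,
// while the (rare) writer takes exclusive access.
type TrieCache = Arc<RwLock<HashMap<Vec<u8>, Vec<u8>>>>;

fn read_node(cache: &TrieCache, key: &[u8]) -> Option<Vec<u8>> {
    // Shared lock: does not block other readers.
    cache.read().ok()?.get(key).cloned()
}

fn write_node(cache: &TrieCache, key: Vec<u8>, value: Vec<u8>) {
    // Exclusive lock: in the PR, only the background worker takes this
    // path, 2-3 times per block.
    cache.write().unwrap().insert(key, value);
}

fn main() {
    let cache: TrieCache = Arc::new(RwLock::new(HashMap::new()));
    write_node(&cache, b"node".to_vec(), b"rlp".to_vec());
    assert_eq!(read_node(&cache, b"node"), Some(b"rlp".to_vec()));
}
```

With a `Mutex`, every `read_node` call would serialize against all other readers; with `RwLock`, readers only contend with the infrequent writes.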

2. RCU pattern + held read view

  • RCU (Read-Copy-Update) for InMemoryBackend: wrap the database as Arc<RwLock<Arc<Database>>> so readers clone the inner Arc (O(1)) and read entirely lock-free
  • Held read view in BackendTrieDB: acquire the read view once at construction and reuse it for all trie node lookups, eliminating ~8000+ Box allocations per block
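The RCU shape described above can be sketched as follows. This is a simplified stand-in for the real `InMemoryBackend` (std `HashMap` in place of `FxHashMap`, hypothetical method names), showing why reads are O(1) and why a held snapshot stays consistent across later writes:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type Database = HashMap<&'static str, HashMap<Vec<u8>, Vec<u8>>>;

// RCU-style snapshot reads: readers briefly take the outer lock only to
// clone the inner Arc (an atomic refcount bump), then read lock-free.
struct InMemoryBackendSketch {
    inner: Arc<RwLock<Arc<Database>>>,
}

impl InMemoryBackendSketch {
    fn begin_read(&self) -> Arc<Database> {
        // O(1): clone the Arc, release the lock immediately.
        self.inner.read().unwrap().clone()
    }

    fn commit(&self, table: &'static str, key: Vec<u8>, value: Vec<u8>) {
        let mut guard = self.inner.write().unwrap();
        // Copy-on-write: clones the Database only if a snapshot is still alive.
        let db = Arc::make_mut(&mut *guard);
        db.entry(table).or_default().insert(key, value);
    }
}

fn main() {
    let backend = InMemoryBackendSketch {
        inner: Arc::new(RwLock::new(Arc::new(Database::new()))),
    };
    backend.commit("trie", b"k".to_vec(), b"v".to_vec());
    let snap = backend.begin_read();
    // A later write does not disturb the held snapshot.
    backend.commit("trie", b"k".to_vec(), b"v2".to_vec());
    assert_eq!(snap["trie"][&b"k".to_vec()], b"v".to_vec());
}
```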

3. Shared read view across trie opens

  • Change begin_read() to return Arc<dyn StorageReadView> instead of Box<dyn StorageReadView>, enabling cheap Arc::clone() sharing
  • In get_storage_at_root, pre-acquire the read view, trie cache, and last_written value once, then share across both the state trie and storage trie opens — eliminates duplicate RwLock acquisitions
  • Add StorageReadView: Send + Sync bound (required for Arc sharing)
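A minimal sketch of why the `Box` → `Arc` change enables sharing, and why the `Send + Sync` bound is needed. `ReadView` mirrors the role of `StorageReadView`; `Snapshot` and `begin_read` are hypothetical stand-ins, not the real API:

```rust
use std::sync::Arc;

// Send + Sync bounds are required so Arc<dyn ReadView> can be shared
// across threads (a Box<dyn ...> can only be moved, not cheaply shared).
trait ReadView: Send + Sync {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
}

struct Snapshot(Vec<(Vec<u8>, Vec<u8>)>);

impl ReadView for Snapshot {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.0
            .iter()
            .find(|(k, _)| k.as_slice() == key)
            .map(|(_, v)| v.clone())
    }
}

fn begin_read() -> Arc<dyn ReadView> {
    Arc::new(Snapshot(vec![(b"k".to_vec(), b"v".to_vec())]))
}

fn main() {
    // One snapshot, shared by the state trie and the storage trie:
    let view = begin_read();
    let state_trie_view = Arc::clone(&view); // cheap: refcount bump
    let storage_trie_view = Arc::clone(&view); // no new allocation, no new lock
    assert_eq!(state_trie_view.get(b"k"), storage_trie_view.get(b"k"));
}
```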

4. Pre-compute prefix nibbles in TrieWrapper

  • apply_prefix was called on every trie node lookup, creating 3 Vec allocations per call (from_bytes + append_new + concat)
  • Pre-compute the prefix nibbles once in the TrieWrapper constructor, reducing to 1 allocation (just concat) per lookup
  • Add TrieWrapper::new() constructor to encapsulate prefix pre-computation
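The prefix pre-computation idea can be sketched with plain byte vectors standing in for ethrex's `Nibbles` type. The `from_bytes`/`append_new`/`concat` steps are mimicked here, not the real trie API:

```rust
// Computed once in the constructor instead of on every lookup.
struct PrefixedLookup {
    prefix_nibbles: Option<Vec<u8>>,
}

// Split each byte into two 4-bit nibbles, mimicking Nibbles::from_bytes.
fn to_nibbles(bytes: &[u8]) -> Vec<u8> {
    bytes.iter().flat_map(|b| [b >> 4, b & 0x0f]).collect()
}

impl PrefixedLookup {
    fn new(prefix: Option<&[u8]>) -> Self {
        // Old hot path: from_bytes + append_new + concat = 3 allocations per get().
        // New: the first two run once here; each lookup pays only the final concat.
        let prefix_nibbles = prefix.map(|p| {
            let mut n = to_nibbles(p);
            n.push(17); // terminator marker, mirroring append_new(17)
            n
        });
        Self { prefix_nibbles }
    }

    fn make_key(&self, path: &[u8]) -> Vec<u8> {
        match &self.prefix_nibbles {
            Some(prefix) => [prefix.as_slice(), path].concat(), // 1 allocation
            None => path.to_vec(),
        }
    }
}

fn main() {
    let lookup = PrefixedLookup::new(Some(&[0xab]));
    assert_eq!(lookup.make_key(&[1, 2]), vec![0x0a, 0x0b, 17, 1, 2]);
}
```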

Benchmark Results

30 runs each, back-to-back on the same server (32-core AMD, ethrex-office-4):

| Block    | Txs | Gas   | Baseline | Optimized | Delta  |
|----------|-----|-------|----------|-----------|--------|
| 24443168 | 442 | 37.5M | 57.19ms  | 48.32ms   | -15.5% |
| 24197547 | 556 | 53.0M | 66.31ms  | 56.75ms   | -14.4% |
| 24199455 | 651 | 37.9M | 45.26ms  | 38.70ms   | -14.5% |
| 24198607 | 116 | 12.0M | 22.62ms  | 20.88ms   | -7.7%  |

A 14-15% improvement on heavy blocks and roughly 8% on light blocks. Lock contention dropped from 8.1% to 1.4% of block executor time.

How to Test

  • cargo test -p ethrex-storage
  • cargo test -p ethrex-blockchain
  • cargo clippy -p ethrex-storage -p ethrex-blockchain -- -D warnings
  • Run a full sync or replay a mainnet block to verify identical state roots

RwLockReadGuard across read transactions instead of re-acquiring per get().
Also replace Mutex with RwLock for trie_cache and last_computed_flatkeyvalue
to allow concurrent readers.

Combined, these changes reduce block execution time by ~9.3% (57ms → 52ms)
on block 24443168 (442 txs, 37.5M gas), benchmarked on a 32-core AMD server.

Pre-acquired read view in BackendTrieDB for the entire trie traversal.

InMemoryBackend now wraps its Database in Arc, so begin_read() clones
the Arc (O(1)) and releases the lock immediately. InMemoryReadTx and
InMemoryLocked hold an owned Arc<Database> snapshot — all subsequent
gets are lock-free HashMap lookups with no RwLock contention.

BackendTrieDB now acquires a single read view in its constructor and
reuses it for all get() calls during the trie traversal. This eliminates
the per-node-lookup Box allocation and lock acquisition that previously
happened ~8000+ times per block.

StorageReadView gains Send + Sync bounds and begin_read() returns a
'static view, enabling BackendTrieDB to own the read view.
Change begin_read() to return Arc<dyn StorageReadView> instead of Box,
allowing the read view to be shared across multiple BackendTrieDB instances.

In get_storage_at_root (the SLOAD hot path), pre-acquire the read view,
trie cache, and last_written once and share them between the state trie
and storage trie opens. This eliminates per-query duplicate RwLock
acquisitions and Arc allocations.
apply_prefix was called on every trie node lookup, creating 3 Vec allocations
each time (from_bytes + append_new + concat). Pre-computing the prefix nibbles
once in the TrieWrapper constructor reduces this to 1 allocation (just concat)
per lookup. Also adds TrieWrapper::new() constructor to encapsulate this.
@ilitteri ilitteri requested a review from a team as a code owner February 13, 2026 22:06
Copilot AI review requested due to automatic review settings February 13, 2026 22:06
@github-actions github-actions bot added L1 Ethereum client performance Block execution throughput and performance in general labels Feb 13, 2026
@github-actions

🤖 Kimi Code Review

Review Summary

This PR introduces significant performance optimizations to the storage layer, primarily through:

  1. Shared read views to reduce lock contention
  2. Pre-computed prefix nibbles for storage tries
  3. Arc-based snapshots for in-memory backend
  4. RwLock instead of Mutex for better concurrency

Issues Found

1. Critical: Atomicity Violation in InMemoryBackend (in_memory.rs:33-34, 153-154, 168-169)

The use of Arc::make_mut creates a new database snapshot on every write, breaking atomicity guarantees. Multiple concurrent writes will create divergent snapshots instead of properly synchronizing:

// Problem: Each write creates a new Arc, breaking atomicity
let db_mut = Arc::make_mut(&mut *db);

Fix: Use RwLock<Database> directly instead of RwLock<Arc<Database>>, or implement proper transaction isolation.

2. Performance: Unnecessary Arc Cloning (in_memory.rs:44-48)

The in-memory backend clones the entire database on every read transaction, which is O(n) complexity. This defeats the purpose of the optimization:

let snapshot = self.inner.read()?.clone(); // Clones entire database

Fix: Return a reference-based read view that holds the RwLock guard, or use a proper MVCC implementation.

3. Memory Safety: Lifetime Mismatch (api/mod.rs:37)

The trait signature change from Box<dyn StorageReadView + '_> to Arc<dyn StorageReadView> removes the lifetime parameter, but the actual implementation may still have lifetime constraints.

4. Code Duplication (store.rs:2422-2599)

Multiple nearly identical open_*_trie functions could be consolidated using generics or a builder pattern.

5. Error Handling Inconsistency (layering.rs:195-210)

The TrieWrapper::new function computes prefix_nibbles but the old apply_prefix function is still used in some places, creating inconsistency.

Recommendations

  1. Revert the Arc-based snapshot approach in InMemoryBackend - it's fundamentally flawed for atomicity
  2. Use parking_lot::RwLock instead of std::sync::RwLock for better performance
  3. Consider using a BTreeMap snapshot approach that shares unchanged nodes rather than full cloning
  4. Add comprehensive tests for concurrent operations to verify atomicity guarantees
  5. Document the thread-safety guarantees clearly for each backend implementation

The RocksDB backend changes look correct as RocksDB handles its own MVCC.


Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions

🤖 Codex Code Review

Findings

  1. Potential nondeterminism in prefix iteration due to HashMap/FxHashMap
    In crates/storage/backend/in_memory.rs around the type Table = FxHashMap... and InMemoryReadTx::prefix_iter, the data structure changed from BTreeMap to hash maps. prefix_iter builds a Vec by iterating the whole table and filtering by prefix, which now yields nondeterministic order. If any caller relies on deterministic ordering for iteration (e.g., tests, debug tooling, or FKV generation comparisons), this can cause flakiness or different outputs.
    Recommendation: either keep BTreeMap or explicitly sort results (by key) before returning.
    Line refs: crates/storage/backend/in_memory.rs near type Table = FxHashMap... and impl StorageReadView for InMemoryReadTx::prefix_iter.
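The sort-before-returning recommendation could look like the following sketch (plain `HashMap` standing in for `FxHashMap`; `prefix_iter` here is a simplified stand-in for the real method, not its actual signature):

```rust
use std::collections::HashMap;

// FxHashMap iteration order is arbitrary, so sort prefix-scan results by key
// before returning to restore the deterministic order BTreeMap gave for free.
fn prefix_iter(
    table: &HashMap<Vec<u8>, Vec<u8>>,
    prefix: &[u8],
) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut out: Vec<(Vec<u8>, Vec<u8>)> = table
        .iter()
        .filter(|(k, _)| k.starts_with(prefix))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    out.sort_by(|a, b| a.0.cmp(&b.0)); // deterministic regardless of hasher
    out
}

fn main() {
    let mut t = HashMap::new();
    t.insert(vec![1, 2], vec![20]);
    t.insert(vec![1, 1], vec![10]);
    t.insert(vec![2, 0], vec![99]);
    let keys: Vec<_> = prefix_iter(&t, &[1]).into_iter().map(|(k, _)| k).collect();
    assert_eq!(keys, vec![vec![1, 1], vec![1, 2]]);
}
```

The sort is O(m log m) in the number of matches, which is usually far smaller than the table, so the O(1) point lookups are preserved.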

  2. Potential large-copy amplification in in-memory DB on writes
    In crates/storage/backend/in_memory.rs around Arc<RwLock<Arc<Database>>> and Arc::make_mut, any active read snapshot causes Arc::make_mut to clone the entire Database on write. This is an intentional RCU pattern but can be unexpectedly expensive for large in-memory tables (copies all tables, not just one).
    Recommendation: consider documenting this tradeoff or scoping snapshots (per-table arc) if this is used outside tests.
    Line refs: crates/storage/backend/in_memory.rs near struct InMemoryBackend { inner: Arc<RwLock<Arc<Database>>> } and Arc::make_mut in clear_table, commit, and delete.
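The copy-on-write cost flagged here can be observed directly with std's `Arc::make_mut`; this standalone demo (not ethrex code) shows that a live snapshot forces a full clone on the next write, while writes with no outstanding snapshots mutate in place:

```rust
use std::sync::Arc;

fn main() {
    let mut db = Arc::new(vec![0u8; 4]);

    // No snapshots alive: make_mut mutates in place, no clone.
    let before = Arc::as_ptr(&db);
    Arc::make_mut(&mut db)[0] = 1;
    assert_eq!(before, Arc::as_ptr(&db));

    // A live snapshot forces a clone of the whole value on the next write.
    let snapshot = Arc::clone(&db);
    Arc::make_mut(&mut db)[0] = 2;
    assert_ne!(Arc::as_ptr(&snapshot), Arc::as_ptr(&db));
    assert_eq!(snapshot[0], 1); // snapshot keeps the pre-write state
    assert_eq!(db[0], 2);
}
```

In the backend, the cloned value is the entire `Database`, which is why long-held snapshots make writes expensive.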

Notes / Questions

  • StorageReadView is now Send + Sync and begin_read returns Arc<dyn StorageReadView>. This implies read views may be shared across threads. I assume RocksDBReadTx (not shown here) is thread-safe; if it wraps a RocksDB Snapshot or similar, please confirm it is Send + Sync and that concurrent reads on the same view are safe.
    Line refs: crates/storage/api/mod.rs near pub trait StorageReadView: Send + Sync and begin_read -> Arc<dyn StorageReadView>.

Positive

  • The shared read view in BackendTrieDB and the pre-acquired cache/read view in get_storage_at_root reduce lock churn and repeated transaction creation; the intent is clear and likely a net performance win.
    Line refs: crates/storage/trie.rs in BackendTrieDB and crates/storage/store.rs around get_storage_at_root and open_*_trie_shared.

If you want, I can scan for any callers that depend on deterministic prefix iteration order to confirm whether Point 1 is a real behavioral change.


Automated review by OpenAI Codex · custom prompt

@github-actions

🤖 Claude Code Review



PR Review: Performance Optimization of Storage Layer (-15.5%)

Summary

This PR implements four incremental optimizations to the storage hot path, targeting lock contention and allocation overhead identified through profiling. The changes are well-motivated and demonstrate clear performance wins (15.5% speedup, 18.2% throughput increase).

Critical Issues

1. Type inconsistency in InMemoryBackend (crates/storage/backend/in_memory.rs:11)

The outer Database container type was changed from BTreeMap to HashMap, but this introduces a subtle inconsistency:

type Table = FxHashMap<Vec<u8>, Vec<u8>>;
type Database = HashMap<&'static str, Table>;  // ← Standard HashMap, not FxHashMap

Issue: The outer HashMap uses the default SipHash hasher while tables use FxHash. For consistency and performance with small string keys (table names), this should also use FxHashMap:

type Database = FxHashMap<&'static str, Table>;

Impact: Minor performance regression for table name lookups, though table count is small so impact is minimal in practice.


2. Potential Arc clone overhead in get_storage_at_root (crates/storage/store.rs:2104-2106, 2118-2120)

The code clones Arc three times per call:

read_view.clone(),
cache.clone(),
last_written.clone(),  // ← This is Vec<u8>, not Arc!

Issue: last_written is Vec<u8>, not Arc<Vec<u8>>. This clones the entire vector twice (once for state trie, once for storage trie) when it should be wrapped in Arc for sharing.

Recommendation: Change last_written() to return Arc<Vec<u8>> and store last_computed_flatkeyvalue as Arc<RwLock<Arc<Vec<u8>>>>, similar to the trie_cache pattern. This would eliminate the vector clones.

Impact: Moderate - each last_written vec is typically 66-131 bytes, cloned twice per storage lookup.
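The suggested change could be sketched as follows; field and method names are illustrative of the recommendation, not copied from the ethrex code:

```rust
use std::sync::{Arc, RwLock};

// Store the last-written key as Arc<RwLock<Arc<Vec<u8>>>> so readers share
// the bytes via a refcount bump instead of cloning the whole vector.
struct StoreSketch {
    last_computed_flatkeyvalue: Arc<RwLock<Arc<Vec<u8>>>>,
}

impl StoreSketch {
    fn last_written(&self) -> Arc<Vec<u8>> {
        // O(1) clone of the Arc; the 66-131 byte buffer is never copied.
        self.last_computed_flatkeyvalue.read().unwrap().clone()
    }
}

fn main() {
    let store = StoreSketch {
        last_computed_flatkeyvalue: Arc::new(RwLock::new(Arc::new(vec![7u8; 66]))),
    };
    let for_state_trie = store.last_written();
    let for_storage_trie = store.last_written();
    // Both handles point at the same allocation.
    assert!(Arc::ptr_eq(&for_state_trie, &for_storage_trie));
}
```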


3. Missing Send + Sync bounds verification (crates/storage/api/mod.rs:58)

The trait now requires Send + Sync:

pub trait StorageReadView: Send + Sync {

Verification needed: Ensure all concrete implementations (InMemoryReadTx, RocksDBReadTx) have fields that are Send + Sync. Looking at the code:

  • InMemoryReadTx::snapshot is Arc<Database> ✓ (Arc is Send+Sync if Database is)
  • RocksDBReadTx::db is Arc<DB> ✓ (DB from RocksDB is Send+Sync)

Status: Appears correct, but worth confirming with a compile-time assertion or test if not already verified.
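Such a compile-time assertion is cheap to add. A minimal sketch, with a hypothetical `InMemoryReadTxSketch` standing in for the real read-transaction type:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Zero-cost, compile-time check that a type is Send + Sync.
fn assert_send_sync<T: Send + Sync>() {}

type Database = HashMap<&'static str, HashMap<Vec<u8>, Vec<u8>>>;

#[allow(dead_code)]
struct InMemoryReadTxSketch {
    snapshot: Arc<Database>,
}

fn main() {
    // Fails to compile (not at runtime) if the bound is ever lost,
    // e.g. by adding a Cell or Rc field to the read transaction.
    assert_send_sync::<InMemoryReadTxSketch>();
}
```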


Performance Concerns

4. Arc::make_mut potential COW overhead (crates/storage/backend/in_memory.rs:33, 153, 169)

The RCU pattern uses Arc::make_mut for writes:

let db_mut = Arc::make_mut(&mut *db);

Concern: If any reader still holds a reference to the inner Arc<Database>, this will trigger a full database clone (copy-on-write). With FxHashMap's fast hashing, this could be expensive for large databases.

Analysis: The design assumes writes are infrequent (background worker only, 2-3 times per block). Readers clone the inner Arc and release the RwLock immediately, so the window for contention is small. However, if a reader holds the snapshot for a long time while a write occurs, the entire DB gets cloned.

Recommendation: Add a comment documenting this behavior and the assumption that reads are short-lived. Consider profiling to verify COW isn't triggered frequently in practice.


5. Prefix nibble computation still allocates (crates/storage/layering.rs:207-208, 233-236, 241-244)

The optimization pre-computes prefix nibbles, but the application still allocates:

let key = match &self.prefix_nibbles {
    Some(prefix) => prefix.concat(&key),  // ← Still allocates a new Nibbles
    None => key,
};

Issue: concat creates a new Nibbles on every lookup (reduced from 3 allocations to 1, but not eliminated).

Potential optimization: If Nibbles used a small-vec optimization or stack buffer for common sizes (65-131 nibbles), this allocation could be eliminated entirely. This would require changes to the ethrex_trie crate.

Impact: Lower priority - already reduced by 67%, further optimization requires trie library changes.


Code Quality Issues

6. Redundant field in BackendTrieDB (crates/storage/trie.rs:14-15)

The db field is documented as "used only for writes" but the struct is used for both reads and writes:

/// Reference to the storage backend (used only for writes)
db: Arc<dyn StorageBackend>,
/// Pre-acquired read view held for the lifetime of this struct.
read_view: Arc<dyn StorageReadView>,

Question: Is db only used in put_batch? If so, consider making BackendTrieDB read-only and creating a separate write struct. If reads also need db, update the comment.

Looking at usage: db is only used in put_batch (line 140). Consider creating BackendTrieDBWrite that wraps this for writes, or clarify that this struct supports both operations but prefers the read_view for reads.


7. Removed apply_prefix usage not fully deleted (crates/storage/trie.rs:6, 111)

The import and function are still present but largely unused:

use crate::layering::apply_prefix;  // Line 6
// ...
fn make_key(&self, path: Nibbles) -> Vec<u8> {
    apply_prefix(self.address_prefix, path).into_vec()  // Line 111
}

While BackendTrieDB still uses it in make_key, TrieWrapper no longer uses it in the hot path. The function itself is still useful but consider:

  1. Documenting that it's kept for make_key and backwards compatibility
  2. Or inlining it into the few places it's still used

Correctness

8. RwLock poisoning behavior (crates/storage/store.rs:2093-2094)

Lock errors map to StoreError::LockError:

.read()
.map_err(|_| StoreError::LockError)?

Question: What happens if a thread panics while holding a write lock? The RwLock will be poisoned, and all subsequent operations will fail. Is this the desired behavior, or should poisoned locks be recovered?

Analysis: For blockchain storage, fail-fast on poisoned locks is likely correct - if a storage write panicked, the database is potentially in an inconsistent state. The current behavior seems appropriate.


9. Trie cache RCU consistency (crates/storage/store.rs:2759-2760)

During trie updates, the cache is updated with RCU:

let trie = Arc::new(trie_mut);
*trie_cache.write().map_err(|_| StoreError::LockError)? = trie.clone();

Observation: After the write lock is released, readers will see the new cache state. There's a small window where readers might miss cache entries if they grabbed the old cache before the update but perform lookups after the database is updated.

Analysis: This is inherent to the RCU pattern and acceptable - readers will simply hit the database instead of the cache, which is correct (slower but not incorrect). The tradeoff is worth it for lock-free reads.


Minor Issues

10. Missing type documentation (crates/storage/backend/in_memory.rs:10-11)

The type aliases should document why FxHashMap is chosen:

// Using FxHashMap for O(1) lookups vs BTreeMap's O(log n).
// FxHash is faster than SipHash for small keys (byte vectors, table names).
type Table = FxHashMap<Vec<u8>, Vec<u8>>;
type Database = FxHashMap<&'static str, Table>;  // Fixed: use FxHashMap here too

11. TrieWrapper::new could take ownership (crates/storage/layering.rs:201-215)

The constructor currently takes owned Box<dyn TrieDB> but borrows to compute prefix:

pub fn new(
    state_root: H256,
    inner: Arc<TrieLayerCache>,
    db: Box<dyn TrieDB>,
    prefix: Option<H256>,
) -> Self {
    let prefix_nibbles = prefix.map(|p| Nibbles::from_bytes(p.as_bytes()).append_new(17));
    // ...
}

This is fine, but the prefix calculation could be made more explicit that it's pre-computation. Consider renaming the field in the struct to precomputed_prefix for clarity.


Positive Observations

  1. Well-structured incremental optimization - Each optimization is independent and can be reasoned about separately
  2. Strong performance gains - 15.5% reduction with clear profiling data to back it up
  3. Type-safe RCU implementation - Using Arc<RwLock<Arc<T>>> properly implements the RCU pattern
  4. Proper trait bounds - Adding Send + Sync to StorageReadView is necessary and correct for Arc sharing
  5. Backward compatible - All existing tests should pass without modification
  6. Good documentation - The PR description clearly explains each optimization's rationale

Recommendations

Must fix:

  1. Change Database type to use FxHashMap consistently (Item 1)

Should fix:
2. Wrap last_written in Arc to avoid vector clones (Item 2)
3. Add documentation for COW behavior in InMemoryBackend (Item 4)

Nice to have:
4. Add type aliases documentation (Item 10)
5. Clarify BackendTrieDB::db field usage (Item 6)

Future optimization:
6. Consider small-vec optimization for Nibbles in the trie library (Item 5)


Security Assessment

No security vulnerabilities identified. The changes maintain the same correctness guarantees as before:

  • No data races (RwLock properly protects shared state)
  • No use-after-free (Arc ensures proper lifetimes)
  • No undefined behavior from unsafe code (no unsafe used in this PR)
  • Blockchain correctness preserved (state root computation unchanged)

Verdict

APPROVE with minor fixes requested (Item 1 and optionally Item 2). The performance gains are significant and well-justified. The implementation is sound with only minor consistency issues to address.


Automated review by Claude (Anthropic) · custom prompt

@github-actions

Lines of code report

Total lines added: 107
Total lines removed: 0
Total lines changed: 107

Detailed view
+--------------------------------------------+-------+------+
| File                                       | Lines | Diff |
+--------------------------------------------+-------+------+
| ethrex/crates/storage/backend/in_memory.rs | 150   | +2   |
+--------------------------------------------+-------+------+
| ethrex/crates/storage/layering.rs          | 195   | +23  |
+--------------------------------------------+-------+------+
| ethrex/crates/storage/store.rs             | 2483  | +55  |
+--------------------------------------------+-------+------+
| ethrex/crates/storage/trie.rs              | 175   | +27  |
+--------------------------------------------+-------+------+

@greptile-apps

greptile-apps bot commented Feb 13, 2026

Greptile Overview

Greptile Summary

This PR implements four incremental storage layer optimizations that collectively achieve a 15.5% reduction in block execution time through targeted elimination of lock contention, allocation overhead, and algorithmic complexity.

Key Changes:

  • Replaced Mutex with RwLock for trie_cache and last_computed_flatkeyvalue to enable concurrent readers (only background worker writes)
  • Implemented RCU (Read-Copy-Update) pattern for InMemoryBackend with Arc<RwLock<Arc<Database>>> for lock-free reads after initial snapshot acquisition
  • Changed BTreeMap to FxHashMap in in-memory storage tables for O(1) instead of O(log n) lookups
  • Modified begin_read() to return Arc<dyn StorageReadView> instead of Box<dyn StorageReadView + '_>, enabling cheap cloning and sharing of read views across multiple trie opens
  • Added BackendTrieDB field to hold pre-acquired read view for entire trie traversal, eliminating ~8000+ Box allocations per block
  • Pre-computed prefix nibbles in TrieWrapper constructor, reducing from 3 allocations per trie node lookup to 1

Performance Impact:
Block execution improved from 57.19ms to 48.32ms median (-15.5%), lock contention dropped from 8.1% to 1.4%. The changes are well-architected with proper separation of concerns between the optimized hot path (get_storage_at_root with shared resources) and existing code paths.

Confidence Score: 4/5

  • This PR is safe to merge with minor considerations around the RCU pattern implementation
  • Score reflects well-tested optimizations with clear performance gains (15.5% improvement, backed by benchmarks), sound architectural design, and proper thread safety. The RCU pattern and RwLock migration are correctly implemented. One point deducted because the Arc::make_mut pattern in InMemoryBackend will clone the entire database on writes if any read snapshots still exist, which could cause unexpected memory pressure in high-concurrency scenarios. The changes maintain backward compatibility and existing tests should catch regressions.
  • Pay close attention to crates/storage/backend/in_memory.rs - verify RCU pattern behavior under concurrent read/write load in production

Important Files Changed

| Filename | Overview |
|----------|----------|
| crates/storage/backend/in_memory.rs | Implemented RCU pattern with Arc<RwLock<Arc<Database>>>, replaced BTreeMap with FxHashMap for O(1) lookups, snapshots now clone inner Arc for lock-free reads |
| crates/storage/layering.rs | Pre-computes prefix nibbles in TrieWrapper::new() constructor to avoid repeated allocations on every trie node lookup, reducing allocations from 3 to 1 per lookup |
| crates/storage/store.rs | Changed trie_cache and last_computed_flatkeyvalue from Mutex to RwLock, added shared read view pattern in get_storage_at_root, new *_with_view methods enable resource sharing across multiple trie opens |
| crates/storage/trie.rs | Added read_view field to BackendTrieDB to hold pre-acquired read view, eliminating per-lookup allocations, new *_with_view constructors support sharing a single read view across multiple trie instances |

Sequence Diagram

```mermaid
sequenceDiagram
    participant App as Block Executor
    participant Store as Store
    participant Backend as StorageBackend
    participant TrieDB as BackendTrieDB
    participant Cache as TrieLayerCache

    Note over App,Cache: Optimized get_storage_at_root flow

    App->>Store: get_storage_at_root(state_root, address, storage_key)

    Note over Store: Pre-acquire shared resources (optimization #2 & #3)
    Store->>Backend: begin_read()
    Backend-->>Store: Arc<StorageReadView> (cloneable snapshot)
    Store->>Cache: trie_cache.read() (RwLock instead of Mutex)
    Cache-->>Store: Arc<TrieLayerCache> clone
    Store->>Store: last_written() (RwLock.read())

    Note over Store: Open state trie with shared resources
    Store->>TrieDB: new_for_accounts_with_view(backend, read_view, last_written)
    Note over TrieDB: Holds read_view for entire traversal (no per-lookup allocation)

    Store->>TrieDB: state_trie.get(account_hash)
    Note over TrieDB: Pre-computed prefix nibbles (optimization #4)
    TrieDB->>Cache: inner.get(state_root, prefixed_key)
    Cache-->>TrieDB: cache hit/miss
    alt cache miss
        TrieDB->>Backend: read_view.get() (lock-free with RCU)
        Backend-->>TrieDB: trie node data
    end
    TrieDB-->>Store: encoded_account

    Note over Store: Open storage trie reusing same read_view
    Store->>TrieDB: new_for_storages_with_view(backend, read_view.clone(), last_written)

    Store->>TrieDB: storage_trie.get(hashed_key)
    Note over TrieDB: Reuses held read_view (no new snapshot)
    TrieDB->>Cache: inner.get(state_root, prefixed_key)
    alt cache miss
        TrieDB->>Backend: read_view.get() (same snapshot)
        Backend-->>TrieDB: storage value
    end
    TrieDB-->>Store: storage_value
    Store-->>App: U256 result

    Note over App,Cache: Key optimizations:<br/>1. BTreeMap→FxHashMap (O(log n)→O(1))<br/>2. RCU pattern (lock-free reads)<br/>3. Shared read_view (no duplicate snapshots)<br/>4. Pre-computed prefix (1 alloc vs 3)
```

Last reviewed commit: d4555a6


@greptile-apps greptile-apps bot left a comment


6 files reviewed, 3 comments


.map_err(|_| StoreError::Custom("Failed to acquire write lock".to_string()))?;

let table_ref = db.entry(table).or_insert_with(Table::new);
let db_mut = Arc::make_mut(&mut *db);


Arc::make_mut will clone the entire database if any read snapshots still hold references to the inner Arc<Database>. Under high concurrency (many concurrent get_storage_at_root calls holding read views), this could cause unexpected memory spikes and copy overhead on writes. Consider documenting this trade-off or monitoring memory usage in production.


Comment on lines +2104 to +2106
read_view.clone(),
cache.clone(),
last_written.clone(),


Good optimization - cloning Arc is cheap (atomic increment) while avoiding duplicate RwLock acquisitions and snapshot creation.


Comment on lines +207 to +208
let prefix_nibbles =
prefix.map(|p| Nibbles::from_bytes(p.as_bytes()).append_new(17));


Excellent optimization - pre-computing the prefix once eliminates 3 allocations (from_bytes + append_new + concat) on every get() call, keeping only the final concat.


Contributor

Copilot AI left a comment


Pull request overview

This PR optimizes the storage layer for block execution through four targeted performance improvements, achieving a 15.5% reduction in execution time. The optimizations target bottlenecks identified via profiling: data structure access patterns (BTreeMap→FxHashMap), lock contention (Mutex→RwLock), per-lookup allocations (RCU pattern + held read views), and repeated prefix computations.

Changes:

  • Replaced Mutex with RwLock for read-heavy trie_cache and last_computed_flatkeyvalue to enable concurrent readers
  • Implemented RCU pattern in InMemoryBackend with Arc<RwLock<Arc>> for lock-free reads via snapshots
  • Modified storage API to return Arc<dyn StorageReadView> instead of Box, enabling cheap sharing of read views across multiple trie operations
  • Added constructors for BackendTrieDB that accept pre-acquired shared read views to eliminate per-trie-open allocations
  • Pre-computed prefix nibbles in TrieWrapper constructor to reduce allocations from 3 to 1 per trie node lookup

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
|------|-------------|
| crates/storage/api/mod.rs | Changed StorageBackend::begin_read() to return Arc instead of Box, added Send + Sync bounds to StorageReadView trait |
| crates/storage/backend/rocksdb.rs | Updated begin_read() to return Arc |
| crates/storage/backend/in_memory.rs | Implemented RCU pattern with Arc<RwLock<Arc>>, switched Table from BTreeMap to FxHashMap, updated begin_read() for snapshot-based lock-free reads |
| crates/storage/trie.rs | Added BackendTrieDB constructors with shared read view support (*_with_view variants), changed read view field to Arc for sharing |
| crates/storage/layering.rs | Added TrieWrapper::new() constructor with pre-computed prefix nibbles, replaced apply_prefix calls with direct concat operations |
| crates/storage/store.rs | Converted trie_cache and last_computed_flatkeyvalue from Mutex to RwLock, added *_shared trie opening methods, updated get_storage_at_root to pre-acquire and share resources, updated all TrieWrapper instantiations to use new constructor |


@ilitteri ilitteri changed the title from "perf(l1): optimize storage layer for block execution (-15.5%)" to "perf(l1): optimize storage layer for block execution" on Feb 13, 2026
