perf(l1): optimize storage layer for block execution #6207
Conversation
Hold the `RwLockReadGuard` across read transactions instead of re-acquiring it per `get()`. Also replace `Mutex` with `RwLock` for `trie_cache` and `last_computed_flatkeyvalue` to allow concurrent readers. Combined, these changes reduce block execution time by ~9.3% (57ms → 52ms) on block 24443168 (442 txs, 37.5M gas), benchmarked on a 32-core AMD server.
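For context, a minimal standalone sketch (not code from this PR) of why the `Mutex` → `RwLock` switch helps a read-heavy path: many readers can hold the lock at once, while the rare writer still gets exclusive access:

```rust
use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    // Stand-in for a read-heavy cache like trie_cache.
    let cache = Arc::new(RwLock::new(vec![0u64; 1024]));

    // Several readers proceed concurrently under read() guards.
    let readers: Vec<_> = (0..8)
        .map(|_| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || cache.read().unwrap().iter().sum::<u64>())
        })
        .collect();
    for r in readers {
        r.join().unwrap();
    }

    // The background worker takes the write lock only a few times per block.
    cache.write().unwrap()[0] = 42;
}
```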
Use a pre-acquired read view in `BackendTrieDB` for the entire trie traversal. `InMemoryBackend` now wraps its `Database` in `Arc`, so `begin_read()` clones the `Arc` (O(1)) and releases the lock immediately. `InMemoryReadTx` and `InMemoryLocked` hold an owned `Arc<Database>` snapshot — all subsequent gets are lock-free `HashMap` lookups with no `RwLock` contention. `BackendTrieDB` now acquires a single read view in its constructor and reuses it for all `get()` calls during the trie traversal. This eliminates the per-node-lookup `Box` allocation and lock acquisition that previously happened ~8000+ times per block. `StorageReadView` gains `Send + Sync` bounds and `begin_read()` returns a `'static` view, enabling `BackendTrieDB` to own the read view.
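A minimal sketch of the snapshot-based read path described above, using plain `std` types (`HashMap` in place of `FxHashMap`) and simplified struct definitions; the real implementation goes through the `StorageReadView` trait:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type Table = HashMap<Vec<u8>, Vec<u8>>;
type Database = HashMap<&'static str, Table>;

struct InMemoryBackend {
    // Outer RwLock guards the pointer swap; the inner Arc is an immutable snapshot.
    inner: Arc<RwLock<Arc<Database>>>,
}

// Owned snapshot: all reads are plain HashMap lookups, no lock held.
struct InMemoryReadTx {
    snapshot: Arc<Database>,
}

impl InMemoryBackend {
    fn begin_read(&self) -> InMemoryReadTx {
        // Clone the inner Arc (an atomic refcount bump) and drop the guard immediately.
        let snapshot = self.inner.read().unwrap().clone();
        InMemoryReadTx { snapshot }
    }
}

impl InMemoryReadTx {
    fn get(&self, table: &'static str, key: &[u8]) -> Option<Vec<u8>> {
        // Lock-free lookup against the owned snapshot.
        self.snapshot.get(table)?.get(key).cloned()
    }
}
```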
Change begin_read() to return Arc<dyn StorageReadView> instead of Box, allowing the read view to be shared across multiple BackendTrieDB instances. In get_storage_at_root (the SLOAD hot path), pre-acquire the read view, trie cache, and last_written once and share them between the state trie and storage trie opens. This eliminates per-query duplicate RwLock acquisitions and Arc allocations.
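Roughly, the sharing pattern looks like the following sketch (the trait and function bodies here are simplified assumptions, not the actual API):

```rust
use std::sync::Arc;

// Hypothetical minimal trait standing in for StorageReadView (the real trait has more methods).
trait StorageReadView: Send + Sync {
    fn get(&self, table: &'static str, key: &[u8]) -> Option<Vec<u8>>;
}

struct BackendTrieDB {
    // One pre-acquired view, reused for every node lookup during the traversal.
    read_view: Arc<dyn StorageReadView>,
}

fn get_storage_at_root(read_view: Arc<dyn StorageReadView>) {
    // Acquire once, then share the same snapshot between the state trie and the storage trie.
    let state_trie_db = BackendTrieDB { read_view: read_view.clone() };
    let storage_trie_db = BackendTrieDB { read_view };
    // ... open the two tries with state_trie_db / storage_trie_db ...
    let _ = (state_trie_db, storage_trie_db);
}
```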
apply_prefix was called on every trie node lookup, creating 3 Vec allocations each time (from_bytes + append_new + concat). Pre-computing the prefix nibbles once in the TrieWrapper constructor reduces this to 1 allocation (just concat) per lookup. Also adds TrieWrapper::new() constructor to encapsulate this.
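A rough illustration of the change, with a stand-in `Nibbles` type since the real one lives in the trie crate (the method names mirror those mentioned above, but the bodies and the `[u8; 32]` prefix type are assumptions):

```rust
/// Stand-in for the trie crate's Nibbles type, just enough to show the allocation pattern.
struct Nibbles(Vec<u8>);

impl Nibbles {
    fn from_bytes(b: &[u8]) -> Self {
        Nibbles(b.iter().flat_map(|&x| [x >> 4, x & 0x0f]).collect())
    }
    fn append_new(mut self, n: u8) -> Self {
        self.0.push(n);
        self
    }
    fn concat(&self, other: &Nibbles) -> Nibbles {
        let mut v = self.0.clone();
        v.extend_from_slice(&other.0);
        Nibbles(v)
    }
}

struct TrieWrapper {
    // Computed once in the constructor instead of on every lookup.
    prefix_nibbles: Option<Nibbles>,
}

impl TrieWrapper {
    fn new(prefix: Option<[u8; 32]>) -> Self {
        let prefix_nibbles = prefix.map(|p| Nibbles::from_bytes(&p).append_new(17));
        TrieWrapper { prefix_nibbles }
    }

    fn make_key(&self, key: Nibbles) -> Nibbles {
        match &self.prefix_nibbles {
            Some(prefix) => prefix.concat(&key), // single allocation per lookup
            None => key,
        }
    }
}
```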
🤖 Kimi Code Review

Review Summary

This PR introduces significant performance optimizations to the storage layer, primarily through:
Issues Found

1. Critical: Atomicity Violation in InMemoryBackend (in_memory.rs:33-34, 153-154, 168-169)

The use of `Arc::make_mut` in the write path:

```rust
// Problem: Each write creates a new Arc, breaking atomicity
let db_mut = Arc::make_mut(&mut *db);
```

Fix: Use …

2. Performance: Unnecessary Arc Cloning (in_memory.rs:44-48)

The in-memory backend clones the entire database on every read transaction, which is O(n) complexity. This defeats the purpose of the optimization:

```rust
let snapshot = self.inner.read()?.clone(); // Clones entire database
```

Fix: Return a reference-based read view that holds the RwLock guard, or use a proper MVCC implementation.

3. Memory Safety: Lifetime Mismatch (api/mod.rs:37)

The trait signature change from `Box<dyn StorageReadView>` to `Arc<dyn StorageReadView>` …

4. Code Duplication (store.rs:2422-2599)

Multiple nearly identical methods …

5. Error Handling Inconsistency (layering.rs:195-210)

The …

Recommendations
The RocksDB backend changes look correct as RocksDB handles its own MVCC.

Automated review by Kimi (Moonshot AI) · custom prompt
🤖 Codex Code Review

Findings
Notes / Questions
Positive
If you want, I can scan for any callers that depend on deterministic prefix iteration order to confirm whether Point 1 is a real behavioral change.

Automated review by OpenAI Codex · custom prompt
🤖 Claude Code Review

Now I have enough context to provide a comprehensive code review. Let me compile my findings:

PR Review: Performance Optimization of Storage Layer (-15.5%)

Summary

This PR implements four incremental optimizations to the storage hot path, targeting lock contention and allocation overhead identified through profiling. The changes are well-motivated and demonstrate clear performance wins (15.5% speedup, 18.2% throughput increase).

Critical Issues

1. Type inconsistency in InMemoryBackend (crates/storage/backend/in_memory.rs:11)

The outer map type still uses the standard library `HashMap`:

```rust
type Table = FxHashMap<Vec<u8>, Vec<u8>>;
type Database = HashMap<&'static str, Table>; // ← Standard HashMap, not FxHashMap
```

Issue: The outer `Database` alias is inconsistent with the `FxHashMap` switch. Suggested fix:

```rust
type Database = FxHashMap<&'static str, Table>;
```

Impact: Minor performance regression for table name lookups, though table count is small so impact is minimal in practice.

2. Potential Arc clone overhead in get_storage_at_root (crates/storage/store.rs:2104-2106, 2118-2120)

The code clones:

```rust
read_view.clone(),
cache.clone(),
last_written.clone(), // ← This is Vec<u8>, not Arc!
```

Issue: `last_written` is a `Vec<u8>`, so this clone copies the bytes rather than bumping a reference count.
Recommendation: Change `last_written` to be shared behind an `Arc` (or passed by reference) so the clone stays cheap.
Impact: Moderate - each `get_storage_at_root` call performs an extra byte-vector copy.

3. Missing Send + Sync bounds verification (crates/storage/api/mod.rs:58)

The trait now requires:

```rust
pub trait StorageReadView: Send + Sync {
```

Verification needed: Ensure all concrete implementations (e.g. `InMemoryReadTx`, `InMemoryLocked`, the RocksDB read view) actually satisfy `Send + Sync`.
Status: Appears correct, but worth confirming with a compile-time assertion or test if not already verified.

Performance Concerns

4. Arc::make_mut potential COW overhead (crates/storage/backend/in_memory.rs:33, 153, 169)

The RCU pattern uses:

```rust
let db_mut = Arc::make_mut(&mut *db);
```

Concern: If any reader still holds a reference to the inner `Arc<Database>`, `Arc::make_mut` will deep-clone the entire database before the write can proceed.
Analysis: The design assumes writes are infrequent (background worker only, 2-3 times per block). Readers clone the inner Arc and release the RwLock immediately, so the window for contention is small. However, if a reader holds the snapshot for a long time while a write occurs, the entire DB gets cloned.
Recommendation: Add a comment documenting this behavior and the assumption that reads are short-lived. Consider profiling to verify COW isn't triggered frequently in practice.

5. Prefix nibble computation still allocates (crates/storage/layering.rs:207-208, 233-236, 241-244)

The optimization pre-computes prefix nibbles, but the application still allocates:

```rust
let key = match &self.prefix_nibbles {
    Some(prefix) => prefix.concat(&key), // ← Still allocates a new Nibbles
    None => key,
};
```

Issue: Each prefixed lookup still performs one allocation for the concatenated key.
Potential optimization: If the trie lookup API could take the prefix and key separately (or iterate over nibbles without materializing them), the remaining `concat` allocation could be avoided.
Impact: Lower priority - already reduced by 67%, further optimization requires trie library changes.

Code Quality Issues

6. Redundant field in BackendTrieDB (crates/storage/trie.rs:14-15)

The struct now holds both the backend and the read view:

```rust
/// Reference to the storage backend (used only for writes)
db: Arc<dyn StorageBackend>,
/// Pre-acquired read view held for the lifetime of this struct.
read_view: Arc<dyn StorageReadView>,
```

Question: Is the `db` field still needed now that reads go through `read_view`?
Looking at usage: the doc comment indicates it is retained for the write path only.

7. Removed apply_prefix usage not fully deleted (crates/storage/trie.rs:6, 111)

The import and function are still present but largely unused:

```rust
use crate::layering::apply_prefix; // Line 6
// ...
fn make_key(&self, path: Nibbles) -> Vec<u8> {
    apply_prefix(self.address_prefix, path).into_vec() // Line 111
}
```
Correctness

8. RwLock poisoning behavior (crates/storage/store.rs:2093-2094)

Lock errors map to `StoreError::LockError`:

```rust
.read()
    .map_err(|_| StoreError::LockError)?
```

Question: What happens if a thread panics while holding a write lock? The RwLock will be poisoned, and all subsequent operations will fail. Is this the desired behavior, or should poisoned locks be recovered?
Analysis: For blockchain storage, fail-fast on poisoned locks is likely correct - if a storage write panicked, the database is potentially in an inconsistent state. The current behavior seems appropriate.

9. Trie cache RCU consistency (crates/storage/store.rs:2759-2760)

During trie updates, the cache is updated with RCU:

```rust
let trie = Arc::new(trie_mut);
*trie_cache.write().map_err(|_| StoreError::LockError)? = trie.clone();
```

Observation: After the write lock is released, readers will see the new cache state. There's a small window where readers might miss cache entries if they grabbed the old cache before the update but perform lookups after the database is updated.
Analysis: This is inherent to the RCU pattern and acceptable - readers will simply hit the database instead of the cache, which is correct (slower but not incorrect). The tradeoff is worth it for lock-free reads.

Minor Issues

10. Missing type documentation (crates/storage/backend/in_memory.rs:10-11)

The type aliases should document why FxHashMap is chosen:

```rust
// Using FxHashMap for O(1) lookups vs BTreeMap's O(log n).
// FxHash is faster than SipHash for small keys (byte vectors, table names).
type Table = FxHashMap<Vec<u8>, Vec<u8>>;
type Database = FxHashMap<&'static str, Table>; // Fixed: use FxHashMap here too
```

11. TrieWrapper::new could take ownership (crates/storage/layering.rs:201-215)

The constructor currently takes owned parameters:

```rust
pub fn new(
    state_root: H256,
    inner: Arc<TrieLayerCache>,
    db: Box<dyn TrieDB>,
    prefix: Option<H256>,
) -> Self {
    let prefix_nibbles = prefix.map(|p| Nibbles::from_bytes(p.as_bytes()).append_new(17));
    // ...
}
```
This is fine.

Positive Observations

Recommendations

Must fix:
Should fix:
Nice to have:
Future optimization:

Security Assessment

No security vulnerabilities identified. The changes maintain the same correctness guarantees as before:

Verdict

APPROVE with minor fixes requested (Item 1 and optionally Item 2). The performance gains are significant and well-justified. The implementation is sound with only minor consistency issues to address.

Automated review by Claude (Anthropic) · custom prompt
Greptile Overview

Greptile Summary

This PR implements four incremental storage layer optimizations that collectively achieve a 15.5% reduction in block execution time through targeted elimination of lock contention, allocation overhead, and algorithmic complexity.

Key Changes:
Performance Impact:
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| crates/storage/backend/in_memory.rs | Implemented RCU pattern with Arc<RwLock<Arc<Database>>>, replaced BTreeMap with FxHashMap for O(1) lookups, snapshots now clone inner Arc for lock-free reads |
| crates/storage/layering.rs | Pre-computes prefix nibbles in TrieWrapper::new() constructor to avoid repeated allocations on every trie node lookup, reducing allocations from 3 to 1 per lookup |
| crates/storage/store.rs | Changed trie_cache and last_computed_flatkeyvalue from Mutex to RwLock, added shared read view pattern in get_storage_at_root, new *_with_view methods enable resource sharing across multiple trie opens |
| crates/storage/trie.rs | Added read_view field to BackendTrieDB to hold pre-acquired read view, eliminating per-lookup allocations, new *_with_view constructors support sharing a single read view across multiple trie instances |
Sequence Diagram
```mermaid
sequenceDiagram
participant App as Block Executor
participant Store as Store
participant Backend as StorageBackend
participant TrieDB as BackendTrieDB
participant Cache as TrieLayerCache
Note over App,Cache: Optimized get_storage_at_root flow
App->>Store: get_storage_at_root(state_root, address, storage_key)
Note over Store: Pre-acquire shared resources (optimization #2 & #3)
Store->>Backend: begin_read()
Backend-->>Store: Arc<StorageReadView> (cloneable snapshot)
Store->>Cache: trie_cache.read() (RwLock instead of Mutex)
Cache-->>Store: Arc<TrieLayerCache> clone
Store->>Store: last_written() (RwLock.read())
Note over Store: Open state trie with shared resources
Store->>TrieDB: new_for_accounts_with_view(backend, read_view, last_written)
Note over TrieDB: Holds read_view for entire traversal (no per-lookup allocation)
Store->>TrieDB: state_trie.get(account_hash)
Note over TrieDB: Pre-computed prefix nibbles (optimization #4)
TrieDB->>Cache: inner.get(state_root, prefixed_key)
Cache-->>TrieDB: cache hit/miss
alt cache miss
TrieDB->>Backend: read_view.get() (lock-free with RCU)
Backend-->>TrieDB: trie node data
end
TrieDB-->>Store: encoded_account
Note over Store: Open storage trie reusing same read_view
Store->>TrieDB: new_for_storages_with_view(backend, read_view.clone(), last_written)
Store->>TrieDB: storage_trie.get(hashed_key)
Note over TrieDB: Reuses held read_view (no new snapshot)
TrieDB->>Cache: inner.get(state_root, prefixed_key)
alt cache miss
TrieDB->>Backend: read_view.get() (same snapshot)
Backend-->>TrieDB: storage value
end
TrieDB-->>Store: storage_value
Store-->>App: U256 result
Note over App,Cache: Key optimizations:<br/>1. BTreeMap→FxHashMap (O(log n)→O(1))<br/>2. RCU pattern (lock-free reads)<br/>3. Shared read_view (no duplicate snapshots)<br/>4. Pre-computed prefix (1 alloc vs 3)
```
Last reviewed commit: d4555a6
crates/storage/backend/in_memory.rs:153

```rust
.map_err(|_| StoreError::Custom("Failed to acquire write lock".to_string()))?;
let table_ref = db.entry(table).or_insert_with(Table::new);
let db_mut = Arc::make_mut(&mut *db);
```
Arc::make_mut will clone the entire database if any read snapshots still hold references to the inner Arc<Database>. Under high concurrency (many concurrent get_storage_at_root calls holding read views), this could cause unexpected memory spikes and copy overhead on writes. Consider documenting this trade-off or monitoring memory usage in production.
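To make the copy-on-write concern concrete, here is a tiny standalone demonstration (not code from this PR) of `Arc::make_mut` cloning the underlying data when another reference is still alive:

```rust
use std::sync::Arc;

fn main() {
    let mut db = Arc::new(vec![1u8, 2, 3]);

    // A read snapshot holding a second strong reference.
    let snapshot = Arc::clone(&db);

    // Because snapshot is alive, make_mut deep-clones the Vec before mutating.
    Arc::make_mut(&mut db).push(4);

    assert_eq!(*snapshot, vec![1, 2, 3]); // the snapshot still sees the old data
    assert_eq!(*db, vec![1, 2, 3, 4]);    // the writer sees the new data
}
```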
crates/storage/store.rs:2104-2106

```rust
read_view.clone(),
cache.clone(),
last_written.clone(),
```
Good optimization - cloning Arc is cheap (atomic increment) while avoiding duplicate RwLock acquisitions and snapshot creation.
crates/storage/layering.rs:207-208

```rust
let prefix_nibbles =
    prefix.map(|p| Nibbles::from_bytes(p.as_bytes()).append_new(17));
```
Excellent optimization - pre-computing the prefix once eliminates 3 allocations (from_bytes + append_new + concat) on every get() call, keeping only the final concat.
Pull request overview
This PR optimizes the storage layer for block execution through four targeted performance improvements, achieving a 15.5% reduction in execution time. The optimizations target bottlenecks identified via profiling: data structure access patterns (BTreeMap→FxHashMap), lock contention (Mutex→RwLock), per-lookup allocations (RCU pattern + held read views), and repeated prefix computations.
Changes:
- Replaced `Mutex` with `RwLock` for read-heavy `trie_cache` and `last_computed_flatkeyvalue` to enable concurrent readers
- Implemented RCU pattern in `InMemoryBackend` with `Arc<RwLock<Arc<Database>>>` for lock-free reads via snapshots
- Modified storage API to return `Arc<dyn StorageReadView>` instead of `Box`, enabling cheap sharing of read views across multiple trie operations
- Added constructors for `BackendTrieDB` that accept pre-acquired shared read views to eliminate per-trie-open allocations
- Pre-computed prefix nibbles in `TrieWrapper` constructor to reduce allocations from 3 to 1 per trie node lookup
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| crates/storage/api/mod.rs | Changed StorageBackend::begin_read() to return Arc instead of Box, added Send + Sync bounds to StorageReadView trait |
| crates/storage/backend/rocksdb.rs | Updated begin_read() to return Arc<dyn StorageReadView> |
| crates/storage/backend/in_memory.rs | Implemented RCU pattern with Arc<RwLock<Arc<Database>>>, switched Table from BTreeMap to FxHashMap, updated begin_read() for snapshot-based lock-free reads |
| crates/storage/trie.rs | Added BackendTrieDB constructors with shared read view support (*_with_view variants), changed read view field to Arc for sharing |
| crates/storage/layering.rs | Added TrieWrapper::new() constructor with pre-computed prefix nibbles, replaced apply_prefix calls with direct concat operations |
| crates/storage/store.rs | Converted trie_cache and last_computed_flatkeyvalue from Mutex to RwLock, added *_shared trie opening methods, updated get_storage_at_root to pre-acquire and share resources, updated all TrieWrapper instantiations to use new constructor |
Motivation
Profiling block execution on mainnet revealed that lock contention, per-lookup allocations, and O(log n) data structure access in the storage layer accounted for a significant portion of execution time. The storage hot path (`get_storage_at_root` → trie lookups) was hitting multiple bottlenecks on every trie node access.

Description
Four incremental optimizations to the storage layer, each targeting a specific bottleneck identified via `perf` profiling on a 32-core AMD server:

1. BTreeMap → FxHashMap + Mutex → RwLock
- Replace `BTreeMap<Vec<u8>, Vec<u8>>` with `FxHashMap` for in-memory storage tables — O(1) lookups instead of O(log n)
- Replace `Mutex` with `RwLock` for `trie_cache` and `last_computed_flatkeyvalue` — allows concurrent readers (only the background worker writes, 2-3 times per block)

2. RCU pattern + held read view
- `InMemoryBackend`: wrap the database as `Arc<RwLock<Arc<Database>>>` so readers clone the inner `Arc` (O(1)) and read entirely lock-free
- `BackendTrieDB`: acquire the read view once at construction and reuse it for all trie node lookups, eliminating ~8000+ `Box` allocations per block

3. Shared read view across trie opens
- Change `begin_read()` to return `Arc<dyn StorageReadView>` instead of `Box<dyn StorageReadView>`, enabling cheap `Arc::clone()` sharing
- In `get_storage_at_root`, pre-acquire the read view, trie cache, and last_written value once, then share them across both the state trie and storage trie opens — eliminates duplicate `RwLock` acquisitions
- Add the `StorageReadView: Send + Sync` bound (required for `Arc` sharing)

4. Pre-compute prefix nibbles in TrieWrapper
- `apply_prefix` was called on every trie node lookup, creating 3 `Vec` allocations per call (`from_bytes` + `append_new` + `concat`)
- Pre-compute the prefix nibbles once in the `TrieWrapper` constructor, reducing to 1 allocation (just `concat`) per lookup
- Add a `TrieWrapper::new()` constructor to encapsulate prefix pre-computation

Benchmark Results
30 runs each, back-to-back on same server (32-core AMD, ethrex-office-4):
14-15% improvement on heavy blocks, 8% on light blocks. Lock contention dropped from 8.1% to 1.4% of block executor time.
How to Test
- `cargo test -p ethrex-storage`
- `cargo test -p ethrex-blockchain`
- `cargo clippy -p ethrex-storage -p ethrex-blockchain -- -D warnings`