
perf(l1): enable per-level LZ4/Zstd compression on RocksDB trie and FKV column families #6176

Open
ilitteri wants to merge 3 commits into main from perf/rocksdb-compression

Conversation


@ilitteri ilitteri commented Feb 11, 2026

Motivation

CPU profiling on EXPB benchmarks (gigablocks and slow scenarios) shows RocksDB compaction consuming 36-60% of cumulative CPU time, with PosixRandomAccessFile::Read at 9-12% flat — indicating heavy disk I/O from uncompressed SST files in the four most write-heavy column families (trie nodes and flat key-values).

Description

Add per-level compression to ACCOUNT_TRIE_NODES, STORAGE_TRIE_NODES, ACCOUNT_FLATKEYVALUE, and STORAGE_FLATKEYVALUE column families:

  • L0-L1: None (recently flushed, short-lived data — compression overhead not worth it)
  • L2-L4: LZ4 (fast compression for mid-tier levels)
  • L5-L6: Zstd (best compression ratio for cold data)

RLP-encoded trie node and flat key-value data compresses 40-70%. This reduces SST file sizes, compaction I/O, and foreground read bandwidth at the cost of some CPU for compression/decompression.

Also adds the zstd feature to the workspace rocksdb dependency (it previously only had lz4).
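
For concreteness, the change amounts to one `set_compression_per_level` call per column family. A minimal sketch of the shape of the change (the helper name here is illustrative; the actual wiring lives in `crates/storage/backend/rocksdb.rs`):

```rust
use rocksdb::{DBCompressionType, Options};

// Illustrative helper: apply the per-level layout described above to a
// column family's Options before its descriptor is built.
fn set_trie_fkv_compression(cf_opts: &mut Options) {
    cf_opts.set_compression_per_level(&[
        DBCompressionType::None, // L0: freshly flushed, short-lived
        DBCompressionType::None, // L1
        DBCompressionType::Lz4,  // L2: cheap, fast compression
        DBCompressionType::Lz4,  // L3
        DBCompressionType::Lz4,  // L4
        DBCompressionType::Zstd, // L5: best ratio for cold data
        DBCompressionType::Zstd, // L6
    ]);
}
```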

No resync required

This is not a breaking change. RocksDB stores the compression type in each SST file's metadata, so it can read both compressed and uncompressed files transparently. Existing uncompressed SST files remain readable as-is — new compressed files are only written as compaction runs. The database gradually transitions to compressed format over time with no manual intervention.
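
Operators who want the smaller on-disk footprint immediately, rather than waiting for natural compaction, could force a one-off full-range compaction per affected column family. A hedged sketch using rust-rocksdb's `compact_range_cf` (the CF name is a placeholder, not ethrex's actual constant):

```rust
use rocksdb::DB;

// Optional one-off migration: compacting the full key range rewrites
// existing SST files under the new per-level compression settings.
fn recompress_cf(db: &DB, cf_name: &str) {
    if let Some(cf) = db.cf_handle(cf_name) {
        // None..None = compact the entire key range of this CF.
        db.compact_range_cf(cf, None::<&[u8]>, None::<&[u8]>);
    }
}
```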

Benchmark Results (EXPB on ethrex-office-3)

All runs had CPU profiling enabled. Fresh baselines (#63-65) and compression runs (#61-62, #66) were run back-to-back on the same machine.

Throughput

| Scenario | Main (Mgas/s) | Compression (Mgas/s) | Change |
|---|---|---|---|
| fast (200 mainnet blocks) | 420.48 | 544.21 | +29.4% |
| gigablocks (100 synthetic max-gas blocks) | 909.63 | 1232.8 | +35.5% |
| slow (5000 mainnet blocks) | 811.04 | 877.06 | +8.1% |

Latency — Fast (200 mainnet blocks)

| Metric | Main | Compression | Change |
|---|---|---|---|
| avg | 67.81 ms | 54.86 ms | -19.1% |
| median | 52.98 ms | 36.75 ms | -30.6% |
| p95 | 148.88 ms | 146.71 ms | -1.5% |
| p99 | 295.55 ms | 208.66 ms | -29.4% |

Latency — Gigablocks (100 synthetic max-gas blocks)

| Metric | Main | Compression | Change |
|---|---|---|---|
| avg | 1410 ms | 1020 ms | -27.7% |
| median | 1200 ms | 854 ms | -28.8% |
| p95 | 2760 ms | 1720 ms | -37.7% |
| p99 | 4340 ms | 2220 ms | -48.8% |

Latency — Slow (5000 mainnet blocks)

| Metric | Main | Compression | Change |
|---|---|---|---|
| avg | 29.48 ms | 26.40 ms | -10.4% |
| median | 21.79 ms | 20.22 ms | -7.2% |
| p95 | 76.03 ms | 68.29 ms | -10.2% |
| p99 | 133.54 ms | 119.07 ms | -10.8% |

CPU Profile Comparison — Gigablocks

| Function | Main (#64) | Compression (#61) | Change |
|---|---|---|---|
| Wall time | 230.28s | 183.38s | -20.4% |
| Total CPU samples | 49833ms (21.64%) | 48792ms (26.61%) | -2% abs, +23% util |
| CompactionJob::Run | 18174ms (36.47%) | 22475ms (46.06%) | +24% abs (includes compression) |
| CompressBlock (NEW) | n/a | 6566ms (13.46%) | Compression overhead |
| VM::execute | 12370ms (24.82%) | 9708ms (19.90%) | -21.5% abs |
| rayon::join (merkle) | 10573ms (21.22%) | 8936ms (18.31%) | -15.5% abs |
| PosixFile::Read (disk I/O) | 5854ms (11.75%) | 5069ms (10.39%) | -13.4% abs |

CPU Profile Comparison — Slow

| Function | Main (#65) | Compression (#62) | Change |
|---|---|---|---|
| Wall time | 1826.40s | 1818.78s | -0.4% |
| Total CPU samples | 127137ms (6.96%) | 158414ms (8.71%) | +24.6% abs, +25% util |
| CompactionJob::Run | 76185ms (59.92%) | 109357ms (69.03%) | +43.5% abs (includes compression) |
| CompressBlock (NEW) | n/a | 30059ms (18.97%) | Compression overhead |
| VM::execute | 13244ms (10.42%) | 12224ms (7.72%) | -7.7% abs |
| PosixFile::Read (disk I/O) | 11178ms (8.79%) | 11602ms (7.32%) | +3.8% abs |
| apply_trie_updates | 10368ms (8.15%) | 9875ms (6.23%) | -4.8% abs |
| WriteBatch::Iterate | 7893ms (6.21%) | 7414ms (4.68%) | -6.1% abs |

Key takeaway: Compression trades ~13-19% of CPU samples for reduced I/O. The gigablocks scenario benefits most — wall time dropped 20% because compaction processes smaller (compressed) SST files, reducing write stall pressure during heavy block execution. The slow scenario shows moderate improvement with block latency dropping 10%.

How to Test

  1. Run EXPB benchmarks: fast, gigablocks, and slow scenarios with CPU profiling enabled
  2. Verify Hoodi snap sync completes without errors (tested on ethrex-mainnet-test-2 — synced to chain tip in ~20 min)
  3. Verify node follows head normally after sync
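
To confirm compression is actually taking effect during these steps, one option is to dump RocksDB's built-in per-level statistics for an affected column family and watch data accumulate compressed at L2 and below. A hedged sketch (the `rocksdb.levelstats` property is standard RocksDB; the CF name is a placeholder, not ethrex's actual constant):

```rust
use rocksdb::DB;

// Spot-check: print per-level file counts/sizes for a CF. Shrinking lower
// levels over time indicate compaction is rewriting data compressed.
fn print_level_stats(db: &DB, cf_name: &str) {
    if let Some(cf) = db.cf_handle(cf_name) {
        if let Ok(Some(stats)) = db.property_value_cf(cf, "rocksdb.levelstats") {
            println!("{cf_name}:\n{stats}");
        }
    }
}
```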

perf(l1): enable per-level LZ4/Zstd compression on RocksDB trie and FKV column families

Add per-level compression to ACCOUNT_TRIE_NODES, STORAGE_TRIE_NODES,
ACCOUNT_FLATKEYVALUE, and STORAGE_FLATKEYVALUE column families:
- L0-L1: None (recently flushed, short-lived data)
- L2-L4: LZ4 (fast compression for mid-tier levels)
- L5-L6: Zstd (best compression ratio for cold data)

These four CFs are the most write-heavy and were previously uncompressed.
RLP-encoded trie node and flat key-value data is highly compressible
(40-70% size reduction), so enabling compression reduces SST file sizes,
compaction I/O, and foreground read bandwidth. Expected 5-15% total CPU
reduction across workloads.

The per-level compression config uses Zstd for L5-L6, but the rocksdb
crate was only built with LZ4 support. Add the "zstd" feature to the
workspace dependency so Zstd compression is linked into the binary.
Copilot AI review requested due to automatic review settings February 11, 2026 14:52
@ilitteri ilitteri requested a review from a team as a code owner February 11, 2026 14:52
@ilitteri ilitteri added performance Block execution throughput and performance in general L1 Ethereum client labels Feb 11, 2026
@ethrex-project-sync ethrex-project-sync bot moved this to In Review in ethrex_l1 Feb 11, 2026
@github-actions

🤖 Kimi Code Review

Review Summary

This PR adds zstd compression support to RocksDB and configures per-level compression strategies for different column families. The changes are well-intentioned but have some issues that need addressing.

Issues Found

1. Missing Error Handling (Critical)

File: crates/storage/backend/rocksdb.rs
Lines: 131-144 and 156-169

The set_compression_per_level method can fail if the compression type is not supported. This should be handled:

```rust
// Current code (lines 131-144):
cf_opts.set_compression_per_level(&[
    rocksdb::DBCompressionType::None,
    // ... other levels
]);

// Should check if zstd is actually available (illustrative pseudocode —
// verify that the rust-rocksdb crate exposes such a check):
if !rocksdb::DBCompressionType::Zstd.is_supported() {
    warn!("Zstd compression not supported, falling back to LZ4");
    // Use alternative compression strategy
}
```

2. Configuration Duplication (Maintainability)

File: crates/storage/backend/rocksdb.rs
Lines: 125-169

The compression configuration is duplicated for both column families. Extract this into a helper function:

```rust
fn configure_compression_options(opts: &mut Options) {
    let compression_levels = if rocksdb::DBCompressionType::Zstd.is_supported() {
        vec![
            rocksdb::DBCompressionType::None,
            rocksdb::DBCompressionType::None,
            rocksdb::DBCompressionType::Lz4,
            rocksdb::DBCompressionType::Lz4,
            rocksdb::DBCompressionType::Lz4,
            rocksdb::DBCompressionType::Zstd,
            rocksdb::DBCompressionType::Zstd,
        ]
    } else {
        // Fallback configuration
        vec![/* ... */]
    };

    opts.set_compression_per_level(&compression_levels);
}
```

3. Missing Feature Gate (Compatibility)

File: Cargo.toml

The zstd feature should be conditional to avoid breaking builds on systems without zstd:

```toml
[dependencies]
rocksdb = { version = "0.24.0", default-features = false, features = [
  "bindgen-runtime",
  "lz4",
  "zstd",  # Consider making this optional
] }
```

4. Performance Considerations (Moderate)

The compression strategy assumes 7 levels (L0-L6), which matches RocksDB's default number of levels. Still, the assumption should be explicitly verified:

```rust
// Add verification
assert_eq!(opts.get_num_levels(), 7, "Unexpected number of levels");
```

5. Documentation (Minor)

File: crates/storage/backend/rocksdb.rs
Lines: 127-130, 152-155

The comments about compression ratios (40-70%) should cite sources or be marked as estimates:

```rust
// TODO: Verify compression ratios through benchmarking
// Preliminary tests suggest RLP-encoded trie nodes compress 40-70%
```

Security Considerations

  • No immediate security vulnerabilities found
  • Compression can potentially enable compression oracle attacks, but this is mitigated by the fact that blockchain data is public anyway

Recommendations

  1. Add runtime checks for compression support
  2. Extract duplicate configuration into helper functions
  3. Add proper error handling for compression configuration
  4. Consider making zstd an optional feature
  5. Add metrics to monitor compression effectiveness

The PR is generally well-structured but needs these improvements for production readiness.


Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions

🤖 Codex Code Review

Adds per-level compression and the zstd dependency only. Mostly fine; a few things to consider:

  • Potential runtime crash if RocksDB is built without zstd support: the zstd feature is enabled, but if the dynamic library is missing in some environments, opening the DB may fail. Consider documenting the new dependency or adding a startup check/log. crates/storage/backend/rocksdb.rs (around lines 131–147, 144–159).
  • Assumed level count hard-coded to 7 entries. If column family is configured with a different num_levels, RocksDB will still accept, but extra levels won’t use intended compression. Consider setting cf_opts.set_num_levels(7) to match the array or using DB::set_options/config to ensure consistency. crates/storage/backend/rocksdb.rs (around lines 131–147, 144–159).
  • Performance tradeoff: Zstd on L5–L6 increases CPU on compaction/read; ok for cold data but can impact sync/initial import. Consider making this configurable or gated by a “compression level” config. Same locations as above.
  • No tests or metrics changes: consider adding a benchmark or noting expected compression ratios in docs instead of code comments. (Comment claims “40–70%” without source.) crates/storage/backend/rocksdb.rs (around lines 128–139, 141–152).

If you want, I can check for config hooks or write a small runtime guard to log missing zstd support.


Automated review by OpenAI Codex · custom prompt

@github-actions

🤖 Claude Code Review


Code Review - PR #6176: Per-Level RocksDB Compression

Summary

This PR adds per-level compression configuration to RocksDB for trie and flat key-value column families, targeting a significant performance improvement with impressive benchmark results (+35.5% throughput on gigablocks, -48.8% p99 latency).


✅ Strengths

  1. Well-motivated change: CPU profiling data clearly identifies the bottleneck (36-60% time in compaction, 9-12% in disk I/O)
  2. Excellent benchmark results: Significant performance improvements across all metrics
  3. Good compression strategy: The tiered approach (None → LZ4 → Zstd) is sensible for LSM tree levels
  4. Code clarity: Comments explain the rationale inline at crates/storage/backend/rocksdb.rs:128-130 and crates/storage/backend/rocksdb.rs:153-155

🔍 Findings

1. Potential Mismatch with RocksDB Level Count (Medium Priority)

Location: crates/storage/backend/rocksdb.rs:131-139 and crates/storage/backend/rocksdb.rs:156-164

The compression configuration specifies 7 levels (L0-L6), but RocksDB's actual level count depends on runtime configuration. With level_compaction_dynamic_level_bytes(true) (line 47), RocksDB dynamically adjusts which levels are active.

Concern: If RocksDB uses more than 7 levels in production (possible with large datasets), levels beyond L6 will have undefined compression behavior. The rocksdb crate documentation indicates that set_compression_per_level will apply the last specified compression type to all higher levels if they exist.

Recommendation:

  • Consider extending the array to cover more levels (e.g., L7-L9) with Zstd
  • Or explicitly document the assumption that 7 levels is sufficient for expected dataset sizes
  • Add a test or assertion to verify level count assumptions
```rust
// Suggested improvement:
cf_opts.set_compression_per_level(&[
    rocksdb::DBCompressionType::None, // L0
    rocksdb::DBCompressionType::None, // L1
    rocksdb::DBCompressionType::Lz4,  // L2
    rocksdb::DBCompressionType::Lz4,  // L3
    rocksdb::DBCompressionType::Lz4,  // L4
    rocksdb::DBCompressionType::Zstd, // L5
    rocksdb::DBCompressionType::Zstd, // L6
    rocksdb::DBCompressionType::Zstd, // L7+ (if they exist)
]);
```

2. Code Duplication (Low Priority)

Location: crates/storage/backend/rocksdb.rs:131-139 and crates/storage/backend/rocksdb.rs:156-164

The exact same compression configuration is duplicated for trie and FKV column families. This creates a maintenance burden if the compression strategy needs to be adjusted.

Recommendation: Extract to a constant or helper:

```rust
const TRIE_AND_FKV_COMPRESSION_LEVELS: [rocksdb::DBCompressionType; 7] = [
    rocksdb::DBCompressionType::None, // L0
    rocksdb::DBCompressionType::None, // L1
    rocksdb::DBCompressionType::Lz4,  // L2
    rocksdb::DBCompressionType::Lz4,  // L3
    rocksdb::DBCompressionType::Lz4,  // L4
    rocksdb::DBCompressionType::Zstd, // L5
    rocksdb::DBCompressionType::Zstd, // L6
];

// Then use:
cf_opts.set_compression_per_level(&TRIE_AND_FKV_COMPRESSION_LEVELS);
```

3. Migration Consideration (Informational)

Impact: Existing databases

When users upgrade, existing SST files won't be automatically recompressed. The new compression only applies to:

  • Newly flushed memtables
  • Files created during future compactions

Question: Should the PR description mention this migration behavior? Users with large existing databases might not see immediate benefits until natural compaction occurs, or they might want to trigger manual compaction.

4. Dependency Version Verification (Low Priority)

Location: Cargo.toml:130-134

The PR adds the zstd feature to rocksdb 0.24.0.

Verification needed: Confirm that rocksdb 0.24.0's Zstd implementation is production-ready and doesn't have known issues. (This is likely fine, but worth a quick check of the rocksdb-rs changelog).

5. Alignment with Global Compression Settings (Minor)

Location: crates/storage/backend/rocksdb.rs:65

The global default is DBCompressionType::None (line 65), but this is overridden for compressible tables and now for trie/FKV tables with per-level compression. This is correct, but the interaction between global and per-CF settings could be documented.


🔒 Security Considerations

No security issues identified. Compression algorithms (LZ4, Zstd) are well-vetted and don't introduce security vulnerabilities in this context.


⚡ Performance Considerations

  1. CPU vs I/O tradeoff: The PR explicitly trades CPU for I/O reduction, which is appropriate given the profiling data showing I/O as the bottleneck.

  2. Decompression overhead on reads: Compressed data must be decompressed on read. The benchmark results show this is net positive, but consider:

    • Read-heavy workloads might have different characteristics
    • Cold storage reads (L5-L6 with Zstd) will have higher decompression cost
    • The benchmark results already validate this tradeoff is beneficial
  3. Compression ratio assumption: The 40-70% compression ratio mentioned in comments (line 130) should ideally be verified periodically as data patterns evolve.


🦀 Rust Best Practices

All good. The code follows idiomatic Rust patterns:

  • Proper use of RocksDB API
  • Clear ownership semantics
  • No unsafe code introduced

📝 Additional Suggestions

  1. Consider logging compression stats: RocksDB can expose compression statistics. Consider enabling periodic logging to verify actual compression ratios in production:

```rust
// Around line 76-77, when statistics are enabled:
opts.enable_statistics();
opts.set_stats_dump_period_sec(600);
```

  2. Documentation: Consider adding a comment about the CPU overhead tradeoff at the function level (line 30) for future maintainers.

✅ Conclusion

This is a high-quality PR with excellent motivation, clear implementation, and strong benchmark validation. The changes are well-targeted and appropriate for the identified bottleneck.

Recommendation: Approve with minor suggestions

The main suggestion is to extend the compression level array to cover more than 7 levels as a safety measure for large production datasets. The code duplication is a minor maintenance issue but not blocking. Everything else is solid.

The 35% throughput improvement and 48% p99 latency reduction speak for themselves: this is a significant performance win for Ethereum state storage operations.


Automated review by Claude (Anthropic) · custom prompt

@github-actions

Lines of code report

Total lines added: 18
Total lines removed: 0
Total lines changed: 18

Detailed view
| File | Lines | Diff |
|---|---|---|
| ethrex/crates/storage/backend/rocksdb.rs | 333 | +18 |

@greptile-apps

greptile-apps bot commented Feb 11, 2026

Greptile Overview

Greptile Summary

This PR enables the zstd feature on the workspace rocksdb dependency and configures per-level compression for the most write-heavy trie and flat key-value column families in RocksDBBackend::open.

Specifically, it keeps L0–L1 uncompressed to avoid CPU overhead on short-lived data, uses LZ4 for mid-tier levels (L2–L4), and Zstd for colder levels (L5–L6). This integrates cleanly with the existing per-column-family options initialization in crates/storage/backend/rocksdb.rs, without changing any read/write logic or table schemas.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk.
  • Changes are limited to enabling the RocksDB zstd feature and setting per-level compression on specific column families; no functional storage semantics or schemas were altered. The lockfile updates are consistent with the feature addition.
  • No files require special attention

Important Files Changed

| Filename | Overview |
|---|---|
| Cargo.toml | Enables the RocksDB crate's zstd feature alongside existing lz4 support. |
| Cargo.lock | Adds zstd-sys to the lockfile as a transitive dependency from enabling RocksDB zstd support. |
| crates/storage/backend/rocksdb.rs | Configures per-level compression (None/LZ4/Zstd) for trie and flat key-value column families via set_compression_per_level. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Node as Ethrex Node
    participant Backend as RocksDBBackend::open
    participant Rocks as RocksDB

    Node->>Backend: open(path)
    Backend->>Rocks: list_cf(opts, path)
    Rocks-->>Backend: existing CF names
    Backend->>Backend: build CF descriptors
    Note over Backend: For ACCOUNT_TRIE_NODES/STORAGE_TRIE_NODES
    Backend->>Backend: set_compression_per_level([None,None,Lz4,Lz4,Lz4,Zstd,Zstd])
    Note over Backend: For ACCOUNT_FLATKEYVALUE/STORAGE_FLATKEYVALUE
    Backend->>Backend: set_compression_per_level([None,None,Lz4,Lz4,Lz4,Zstd,Zstd])
    Backend->>Rocks: open_cf_descriptors(opts, path, descriptors)
    Rocks-->>Backend: DB handle
    Backend-->>Node: RocksDBBackend
    Note over Rocks: New compactions write SSTs compressed per level
```

Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@github-actions

github-actions bot commented Feb 11, 2026

Benchmark Block Execution Results Comparison Against Main

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base | 66.320 ± 0.525 | 65.655 | 67.055 | 1.01 ± 0.01 |
| head | 65.713 ± 0.562 | 64.944 | 66.471 | 1.00 |

```rust
cf_opts.set_target_file_size_base(256 * 1024 * 1024); // 256MB
cf_opts.set_memtable_prefix_bloom_ratio(0.2); // Bloom filter

// Per-level compression: same strategy as trie CFs.
```

nit: Since this is identical to the trie CFs block above, you could extract a shared constant to avoid the duplication:

```rust
const COMPRESSION_PER_LEVEL: [DBCompressionType; 7] = [
    DBCompressionType::None, DBCompressionType::None,
    DBCompressionType::Lz4, DBCompressionType::Lz4, DBCompressionType::Lz4,
    DBCompressionType::Zstd, DBCompressionType::Zstd,
];
```

Not a big deal with only 2 copies though — fine either way.

