perf(l1): enable per-level LZ4/Zstd compression on RocksDB trie and FKV column families#6176
Conversation
perf(l1): enable per-level LZ4/Zstd compression on RocksDB trie and FKV column families

Add per-level compression to ACCOUNT_TRIE_NODES, STORAGE_TRIE_NODES, ACCOUNT_FLATKEYVALUE, and STORAGE_FLATKEYVALUE column families:

- L0-L1: None (recently flushed, short-lived data)
- L2-L4: LZ4 (fast compression for mid-tier levels)
- L5-L6: Zstd (best compression ratio for cold data)

These four CFs are the most write-heavy and were previously uncompressed. RLP-encoded trie node and flat key-value data is highly compressible (40-70% size reduction), so enabling compression reduces SST file sizes, compaction I/O, and foreground read bandwidth. Expected 5-15% total CPU reduction across workloads.
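The level-to-codec mapping described above can be sketched as a small pure function. This is an illustrative stand-in, not the PR's code: `Compression` here plays the role of `rocksdb::DBCompressionType`, and in the actual backend the resulting plan is handed to `Options::set_compression_per_level`.

```rust
// Stand-in for rocksdb::DBCompressionType; the real backend passes the
// equivalent slice to Options::set_compression_per_level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    None, // L0-L1: recently flushed, short-lived data
    Lz4,  // L2-L4: fast compression for mid-tier levels
    Zstd, // L5-L6: best compression ratio for cold data
}

/// Build the per-level compression plan for `num_levels` levels.
/// Levels deeper than L4 all fall through to Zstd, so a database configured
/// with more than 7 levels never ends up with an undefined codec.
fn compression_plan(num_levels: usize) -> Vec<Compression> {
    (0..num_levels)
        .map(|level| match level {
            0 | 1 => Compression::None,
            2..=4 => Compression::Lz4,
            _ => Compression::Zstd,
        })
        .collect()
}
```

Expressing the plan as a function of the level count (rather than a fixed 7-entry array) is one way to sidestep the "what happens on L7+" question raised in the reviews below.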
The per-level compression config uses Zstd for L5-L6, but the rocksdb crate was only built with LZ4 support. Add the "zstd" feature to the workspace dependency so Zstd compression is linked into the binary.
🤖 Kimi Code Review

Review Summary

This PR adds zstd compression support to RocksDB and configures per-level compression strategies for different column families. The changes are well-intentioned but have some issues that need addressing.

Issues Found

1. Missing Error Handling (Critical)

File: …

```rust
// Current code (lines 131-144):
cf_opts.set_compression_per_level(&[
    rocksdb::DBCompressionType::None,
    // ... other levels
]);

// Should check if zstd is actually available:
if !rocksdb::DBCompressionType::Zstd.is_supported() {
    warn!("Zstd compression not supported, falling back to LZ4");
    // Use alternative compression strategy
}
```

2. Configuration Duplication (Maintainability)

File: …

The compression configuration is duplicated for both column families. Extract this into a helper function:

```rust
fn configure_compression_options(opts: &mut Options) {
    let compression_levels = if rocksdb::DBCompressionType::Zstd.is_supported() {
        vec![
            rocksdb::DBCompressionType::None,
            rocksdb::DBCompressionType::None,
            rocksdb::DBCompressionType::Lz4,
            rocksdb::DBCompressionType::Lz4,
            rocksdb::DBCompressionType::Lz4,
            rocksdb::DBCompressionType::Zstd,
            rocksdb::DBCompressionType::Zstd,
        ]
    } else {
        // Fallback configuration
        vec![/* ... */]
    };
    opts.set_compression_per_level(&compression_levels);
}
```

3. Missing Feature Gate (Compatibility)

File: …

The zstd feature should be conditional to avoid breaking builds on systems without zstd:

```toml
[dependencies]
rocksdb = { version = "0.24.0", default-features = false, features = [
    "bindgen-runtime",
    "lz4",
    "zstd", # Consider making this optional
] }
```

4. Performance Considerations (Moderate)

The compression strategy assumes 7 levels (L0-L6), which matches RocksDB's default level count, but this should be verified explicitly:

```rust
// Add verification
assert_eq!(opts.get_num_levels(), 7, "Unexpected number of levels");
```

5. Documentation (Minor)

File: …

The comments about compression ratios (40-70%) should cite sources or be marked as estimates:

```rust
// TODO: Verify compression ratios through benchmarking
// Preliminary tests suggest RLP-encoded trie nodes compress 40-70%
```

Security Considerations

Recommendations

The PR is generally well-structured but needs these improvements for production readiness.

Automated review by Kimi (Moonshot AI) · custom prompt
🤖 Codex Code Review

Adds per-level compression and zstd dependency only. Mostly fine; a few things to consider:

If you want, I can check for config hooks or write a small runtime guard to log missing zstd support.

Automated review by OpenAI Codex · custom prompt
🤖 Claude Code Review

Perfect! Now I have all the context I need. Let me provide a comprehensive code review.

Code Review - PR #6176: Per-Level RocksDB Compression

Summary

This PR adds per-level compression configuration to RocksDB for trie and flat key-value column families, targeting a significant performance improvement with impressive benchmark results (+35.5% throughput on gigablocks, -48.8% p99 latency).

✅ Strengths

🔍 Findings

1. Potential Mismatch with RocksDB Level Count (Medium Priority)

Location: crates/storage/backend/rocksdb.rs:131-139 and crates/storage/backend/rocksdb.rs:156-164

The compression configuration specifies 7 levels (L0-L6), but RocksDB's actual level count depends on runtime configuration.

Concern: If RocksDB uses more than 7 levels in production (possible with large datasets), levels beyond L6 will have undefined compression behavior. The rocksdb crate documentation indicates that ….

Recommendation:

```rust
// Suggested improvement:
cf_opts.set_compression_per_level(&[
    rocksdb::DBCompressionType::None, // L0
    rocksdb::DBCompressionType::None, // L1
    rocksdb::DBCompressionType::Lz4,  // L2
    rocksdb::DBCompressionType::Lz4,  // L3
    rocksdb::DBCompressionType::Lz4,  // L4
    rocksdb::DBCompressionType::Zstd, // L5
    rocksdb::DBCompressionType::Zstd, // L6
    rocksdb::DBCompressionType::Zstd, // L7+ (if they exist)
]);
```

2. Code Duplication (Low Priority)

Location: crates/storage/backend/rocksdb.rs:131-139 and crates/storage/backend/rocksdb.rs:156-164

The exact same compression configuration is duplicated for trie and FKV column families. This creates a maintenance burden if the compression strategy needs to be adjusted.

Recommendation: Extract to a constant or helper:

```rust
const TRIE_AND_FKV_COMPRESSION_LEVELS: [rocksdb::DBCompressionType; 7] = [
    rocksdb::DBCompressionType::None, // L0
    rocksdb::DBCompressionType::None, // L1
    rocksdb::DBCompressionType::Lz4,  // L2
    rocksdb::DBCompressionType::Lz4,  // L3
    rocksdb::DBCompressionType::Lz4,  // L4
    rocksdb::DBCompressionType::Zstd, // L5
    rocksdb::DBCompressionType::Zstd, // L6
];

// Then use:
cf_opts.set_compression_per_level(&TRIE_AND_FKV_COMPRESSION_LEVELS);
```

3. Migration Consideration (Informational)

Impact: Existing databases

When users upgrade, existing SST files won't be automatically recompressed. The new compression only applies to:

Question: Should the PR description mention this migration behavior? Users with large existing databases might not see immediate benefits until natural compaction occurs, or they might want to trigger manual compaction.

4. Dependency Version Verification (Low Priority)

Location: Cargo.toml:130-134

The PR adds the ….

Verification needed: Confirm that rocksdb 0.24.0's Zstd implementation is production-ready and doesn't have known issues. (This is likely fine, but worth a quick check of the rocksdb-rs changelog.)

5. Alignment with Global Compression Settings (Minor)

Location: crates/storage/backend/rocksdb.rs:65

The global default is ….

🔒 Security Considerations

No security issues identified. Compression algorithms (LZ4, Zstd) are well-vetted and don't introduce security vulnerabilities in this context.

⚡ Performance Considerations
🦀 Rust Best Practices

All good. The code follows idiomatic Rust patterns:

📝 Additional Suggestions

```rust
// Around line 76-77, when statistics are enabled:
opts.enable_statistics();
opts.set_stats_dump_period_sec(600);
```

✅ Conclusion

This is a high-quality PR with excellent motivation, clear implementation, and strong benchmark validation. The changes are well-targeted and appropriate for the identified bottleneck.

Recommendation: Approve with minor suggestions

The main suggestion is to extend the compression level array to cover more than 7 levels as a safety measure for large production datasets. The code duplication is a minor maintenance issue but not blocking. Everything else is solid. The 35% throughput improvement and 48% p99 latency reduction speak for themselves - this is a significant performance win for Ethereum state storage operations.

Automated review by Claude (Anthropic) · custom prompt
Greptile Overview

Greptile Summary

This PR enables the …. Specifically, it keeps L0–L1 uncompressed to avoid CPU overhead on short-lived data, uses LZ4 for mid-tier levels (L2–L4), and Zstd for colder levels (L5–L6). This integrates cleanly with the existing per-column-family options initialization in ….

Confidence Score: 5/5
| Filename | Overview |
|---|---|
| Cargo.toml | Enables the RocksDB crate's zstd feature alongside existing lz4 support. |
| Cargo.lock | Adds zstd-sys to the lockfile as a transitive dependency from enabling RocksDB zstd support. |
| crates/storage/backend/rocksdb.rs | Configures per-level compression (None/LZ4/Zstd) for trie and flat key-value column families via set_compression_per_level. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Node as Ethrex Node
    participant Backend as RocksDBBackend::open
    participant Rocks as RocksDB
    Node->>Backend: open(path)
    Backend->>Rocks: list_cf(opts, path)
    Rocks-->>Backend: existing CF names
    Backend->>Backend: build CF descriptors
    Note over Backend: For ACCOUNT_TRIE_NODES/STORAGE_TRIE_NODES
    Backend->>Backend: set_compression_per_level([None,None,Lz4,Lz4,Lz4,Zstd,Zstd])
    Note over Backend: For ACCOUNT_FLATKEYVALUE/STORAGE_FLATKEYVALUE
    Backend->>Backend: set_compression_per_level([None,None,Lz4,Lz4,Lz4,Zstd,Zstd])
    Backend->>Rocks: open_cf_descriptors(opts, path, descriptors)
    Rocks-->>Backend: DB handle
    Backend-->>Node: RocksDBBackend
    Note over Rocks: New compactions write SSTs compressed per level
```
Benchmark Block Execution Results Comparison Against Main
Diff context:

```rust
cf_opts.set_target_file_size_base(256 * 1024 * 1024); // 256MB
cf_opts.set_memtable_prefix_bloom_ratio(0.2); // Bloom filter

// Per-level compression: same strategy as trie CFs.
```

nit: Since this is identical to the trie CFs block above, you could extract a shared constant to avoid the duplication:

```rust
const COMPRESSION_PER_LEVEL: [DBCompressionType; 7] = [
    DBCompressionType::None, DBCompressionType::None,
    DBCompressionType::Lz4, DBCompressionType::Lz4, DBCompressionType::Lz4,
    DBCompressionType::Zstd, DBCompressionType::Zstd,
];
```

Not a big deal with only 2 copies though — fine either way.
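The shared-constant suggestion can be sketched with a stand-in enum; `Codec` here plays the role of `rocksdb::DBCompressionType`, and `plan_for` is a hypothetical per-CF hook, not code from this PR:

```rust
// Stand-in for rocksdb::DBCompressionType.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Codec {
    None,
    Lz4,
    Zstd,
}

// One shared table for every trie/FKV column family, so a strategy change
// is made in exactly one place instead of once per CF.
const COMPRESSION_PER_LEVEL: [Codec; 7] = [
    Codec::None, Codec::None,           // L0-L1
    Codec::Lz4, Codec::Lz4, Codec::Lz4, // L2-L4
    Codec::Zstd, Codec::Zstd,           // L5-L6
];

/// Hypothetical per-CF hook: every trie/FKV CF gets the same plan; the
/// name parameter only mirrors how CF options are built one CF at a time.
fn plan_for(_cf_name: &str) -> &'static [Codec] {
    &COMPRESSION_PER_LEVEL
}
```

With this shape, the real code would map `Codec` to the crate's enum once and call `set_compression_per_level` for each of the four column families from the same table.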
Motivation
CPU profiling on EXPB benchmarks (gigablocks and slow scenarios) shows RocksDB compaction consuming 36-60% of cumulative CPU time, with PosixRandomAccessFile::Read at 9-12% flat — indicating heavy disk I/O from uncompressed SST files in the four most write-heavy column families (trie nodes and flat key-values).

Description
Add per-level compression to ACCOUNT_TRIE_NODES, STORAGE_TRIE_NODES, ACCOUNT_FLATKEYVALUE, and STORAGE_FLATKEYVALUE column families:

- L0-L1: None (recently flushed, short-lived data)
- L2-L4: LZ4 (fast compression for mid-tier levels)
- L5-L6: Zstd (best compression ratio for cold data)

RLP-encoded trie node and flat key-value data compresses 40-70%. This reduces SST file sizes, compaction I/O, and foreground read bandwidth at the cost of some CPU for compression/decompression.
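As a quick sanity check on what a 40-70% reduction means for the 256 MB target SST size configured for these CFs (the arithmetic below is illustrative; the ratios themselves are the PR's own estimates):

```rust
/// Compressed size after a percentage reduction, e.g. reduction_pct = 40
/// means the output is 40% smaller than the input (integer arithmetic,
/// rounds the saved bytes down).
fn compressed_size(raw_bytes: u64, reduction_pct: u64) -> u64 {
    assert!(reduction_pct <= 100);
    raw_bytes - raw_bytes * reduction_pct / 100
}
```

A 256 MB SST lands around 154 MB at 40% reduction and 77 MB at 70%, which is where the reduced compaction I/O and read bandwidth come from.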
Also adds the zstd feature to the workspace rocksdb dependency (it previously only had lz4).

No resync required
This is not a breaking change. RocksDB stores the compression type in each SST file's metadata, so it can read both compressed and uncompressed files transparently. Existing uncompressed SST files remain readable as-is — new compressed files are only written as compaction runs. The database gradually transitions to compressed format over time with no manual intervention.
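Because old SSTs are only rewritten as compaction reaches them, the on-disk footprint shrinks gradually rather than all at once. A toy model of that transition (pure arithmetic, not a rocksdb API; the 50% figure in the note below is a hypothetical midpoint of the quoted 40-70% range):

```rust
/// Estimated on-disk bytes when `recompacted` of `total_raw` logical bytes
/// have been rewritten with compression (`reduction_pct` percent smaller)
/// while the remainder is still stored uncompressed.
fn estimated_on_disk(total_raw: u64, recompacted: u64, reduction_pct: u64) -> u64 {
    assert!(recompacted <= total_raw && reduction_pct <= 100);
    let compressed = recompacted - recompacted * reduction_pct / 100;
    (total_raw - recompacted) + compressed
}
```

With 1 TB of raw data at a 50% reduction, the database sits at 1 TB right after upgrade, ~750 GB once half the data has been recompacted, and ~500 GB at steady state — consistent with the "gradual transition" described above.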
Benchmark Results (EXPB on ethrex-office-3)
All runs use CPU profiling enabled. Fresh baselines (#63-65) and compression runs (#61-62, #66) were run back-to-back on the same machine.
Throughput
Latency — Fast (200 mainnet blocks)
Latency — Gigablocks (100 synthetic max-gas blocks)
Latency — Slow (5000 mainnet blocks)
CPU Profile Comparison — Gigablocks
CPU Profile Comparison — Slow
Key takeaway: Compression trades ~13-19% of CPU samples for reduced I/O. The gigablocks scenario benefits most — wall time dropped 20% because compaction processes smaller (compressed) SST files, reducing write stall pressure during heavy block execution. The slow scenario shows moderate improvement with block latency dropping 10%.
How to Test
- fast, gigablocks, and slow scenarios with CPU profiling enabled
- ethrex-mainnet-test-2 — synced to chain tip in ~20 min