Compaction Trivial Move
The trivial move feature can help reduce write amplification. A trivial move is a type of compaction that moves an SST file directly to the next level without rewriting it. It happens when the file's key range does not overlap any file in the next level. For example, in the example graph, SST file 2 on Level 1 does not overlap any file on Level 2, so when file 2 is compacted to Level 2 there is no need to rewrite it. SST file 1 on Level 1 cannot be trivially moved: it overlaps files 3 and 4 and has to be compacted with them.
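The overlap test described above can be sketched in a few lines of C++. This is an illustration only, not RocksDB's implementation: `FileRange`, `Overlaps`, and `CanTrivialMove` are hypothetical names, keys are compared as plain `std::string` values (RocksDB uses a configurable comparator), and the additional skip conditions listed later on this page are ignored.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an SST file's metadata: only its key range.
struct FileRange {
  std::string smallest;  // smallest user key stored in the file
  std::string largest;   // largest user key stored in the file
};

// Two closed key ranges overlap unless one ends before the other begins.
bool Overlaps(const FileRange& a, const FileRange& b) {
  assert(a.smallest <= a.largest && b.smallest <= b.largest);
  return !(a.largest < b.smallest || b.largest < a.smallest);
}

// A file is a trivial-move candidate when no file in the next level
// overlaps its key range (assumption: no other skip condition applies).
bool CanTrivialMove(const FileRange& input,
                    const std::vector<FileRange>& next_level) {
  for (const FileRange& f : next_level) {
    if (Overlaps(input, f)) return false;
  }
  return true;
}
```

With Level 2 holding two files covering `[a, c]` and `[d, f]`, a Level 1 file covering `[b, e]` overlaps both and must be compacted with them, while a Level 1 file covering `[m, p]` overlaps nothing and could simply be moved.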
This feature is available in both Leveled compaction (always enabled) and Universal compaction (disabled by default; it can be enabled through options.compaction_options_universal.allow_trivial_move).
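As a configuration sketch, enabling the option for Universal compaction might look like the following. The helper name `MakeUniversalOptionsWithTrivialMove` is hypothetical, and the snippet assumes the RocksDB C++ headers are on the include path; it needs the RocksDB library to build.

```cpp
#include <rocksdb/options.h>

// Hypothetical helper: returns Options that select Universal compaction
// and opt in to trivial move, which is off by default for this style.
rocksdb::Options MakeUniversalOptionsWithTrivialMove() {
  rocksdb::Options options;
  options.compaction_style = rocksdb::kCompactionStyleUniversal;
  options.compaction_options_universal.allow_trivial_move = true;
  return options;
}
```

No equivalent switch is needed for Leveled compaction, where trivial move is always enabled.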
Trivial move is skipped in the following scenarios:
- When compression settings are different between the input and the output level.
  - This can happen when Hybrid compression is used (options.bottommost_compression) or per-level compression is configured (options.compression_per_level).
- When Manual compaction is used and a compaction filter is configured.
- (For Leveled compaction only) When the SST moved to level N+1 would overlap too many SSTs in level N+2 (the overlap would exceed max_compaction_bytes). The default setting for max_compaction_bytes is 25 * target_file_size_base (so, any overlap spanning about 25 SSTs in level N+2).
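The arithmetic behind the last condition above can be made concrete with a small sketch. The function names are hypothetical and the check is a simplification (it only sums the sizes of the overlapping level N+2 files and compares the total against the limit); it is not RocksDB's actual code path.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Default limit as described above: 25 * target_file_size_base.
std::uint64_t DefaultMaxCompactionBytes(std::uint64_t target_file_size_base) {
  assert(target_file_size_base <= UINT64_MAX / 25);  // guard against overflow
  return 25 * target_file_size_base;
}

// Hypothetical check: after moving a file to level N+1, would it overlap
// more than max_compaction_bytes worth of files in level N+2?
bool ExceedsGrandparentOverlap(
    const std::vector<std::uint64_t>& overlapping_file_sizes,
    std::uint64_t max_compaction_bytes) {
  std::uint64_t total = 0;
  for (std::uint64_t size : overlapping_file_sizes) total += size;
  return total > max_compaction_bytes;
}
```

For example, with a target_file_size_base of 64 MiB the default limit works out to 1600 MiB, so an overlap with 25 full-size files in level N+2 stays within the limit while a 26th file pushes it over.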
Two potential side effects of Trivial move are:
- Many small/uncompacted SST files
- Since these files are not compacted, compaction filters are not run on them.
Possible workarounds are:
- Use Manual Compaction to force the compaction by:
  - setting BottommostLevelCompaction to kForce or kForceOptimized, or
  - having a compaction filter configured.
- Set a different bottommost compression; this will also force the compaction on the bottommost level.
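The manual compaction workaround might be written as follows. This is a sketch under stated assumptions: `ForceFullCompaction` is a hypothetical helper, `db` is an already opened database, the whole key range of the default column family is compacted, and the RocksDB library is needed to build it.

```cpp
#include <rocksdb/db.h>
#include <rocksdb/options.h>

// Hypothetical helper: run a manual compaction over the entire key range
// and force the bottommost level to be rewritten instead of left in place.
rocksdb::Status ForceFullCompaction(rocksdb::DB* db) {
  rocksdb::CompactRangeOptions cro;
  cro.bottommost_level_compaction = rocksdb::BottommostLevelCompaction::kForce;
  // nullptr for both begin and end selects the whole key range.
  return db->CompactRange(cro, nullptr, nullptr);
}
```

Substituting `rocksdb::BottommostLevelCompaction::kForceOptimized` selects the other setting named in the list above.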