Delete A Range Of Keys
Note: the techniques described on this page are obsolete. For users of RocksDB 5.18+, the native DeleteRange function is a better alternative for all known use cases.
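For orientation, here is a minimal sketch of that modern alternative, assuming `db` is an open `rocksdb::DB*`; the key bounds are hypothetical:

```cpp
#include "rocksdb/db.h"

// Delete every key in ["user123|", "user124|") in a single atomic write.
// Bounds are illustrative; pick them to cover your prefix.
rocksdb::Status s = db->DeleteRange(rocksdb::WriteOptions(),
                                    db->DefaultColumnFamily(),
                                    "user123|", "user124|");
```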
In many cases, people want to drop a range of keys. For example, in MyRocks, rows from one table are encoded by prefixing the key with the table ID, so when we need to drop a table, all keys with that prefix need to be dropped. For another example, if we store different attributes of a user with keys in the format [user_id][attribute_id], then when a user deletes the account, we need to delete all keys prefixed with [user_id].
The standard way of deleting those keys is to iterate through all of them and issue Delete() for each one (a sketch of this approach follows the list below). This works when the number of keys to delete is not large. However, there are two potential drawbacks to this solution:
- the space occupied by the data will not be reclaimed immediately. We have to wait for compaction to clean up the data. This is usually a concern when the range to delete takes up a significant portion of the database's space.
- the chunk of tombstones may slow down iterators.
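For reference, a minimal sketch of the scan-and-Delete() approach, assuming `db` is an open `rocksdb::DB*`; the "user123|" prefix is hypothetical:

```cpp
#include <memory>

#include "rocksdb/db.h"

// Scan the prefix and delete keys one by one. Each Delete() writes a
// tombstone; the space is only reclaimed later by compaction.
std::unique_ptr<rocksdb::Iterator> it(
    db->NewIterator(rocksdb::ReadOptions()));
for (it->Seek("user123|");
     it->Valid() && it->key().starts_with("user123|");
     it->Next()) {
  rocksdb::Status s = db->Delete(rocksdb::WriteOptions(), it->key());
  // Check s here in real code.
}
```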
There are two other ways to delete the keys in a range:
The first way is to issue a DeleteFilesInRange() for the range. This call removes all SST files that contain only keys within the range to delete. For a large range, it immediately reclaims most of the space, so it is a good solution to problem 1. One thing to be aware of is that after the operation, some keys in the range may still exist in the database. If you want to remove all of them, you should follow up with other operations, which can be done at a slower pace. Also, be aware that data deleted by DeleteFilesInRange() is removed regardless of existing snapshots, so you should not expect to be able to read data in the range through existing snapshots any more.
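A minimal sketch of this call, assuming `db` is an open `rocksdb::DB*`; the boundary keys are hypothetical:

```cpp
#include "rocksdb/convenience.h"
#include "rocksdb/db.h"

// Remove whole SST files that lie entirely inside [begin, end].
// Keys in files that only partially overlap the range survive and
// must be cleaned up separately (e.g. with a compaction filter).
rocksdb::Slice begin("user123|");
rocksdb::Slice end("user123|\xff");
rocksdb::Status s =
    rocksdb::DeleteFilesInRange(db, db->DefaultColumnFamily(), &begin, &end);
```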
Another way is to use a compaction filter together with CompactRange(). You can write a compaction filter that filters out keys in the deleted range; when you want to delete keys in a range, call CompactRange() for that range, and by the time the compaction finishes the keys will have been dropped. We recommend making CompactionFilter::IgnoreSnapshots() return true so that keys are dropped even if you have outstanding snapshots; otherwise, you may not be able to fully remove all the keys in the range from the system. This approach also reclaims the space, but it introduces more I/O than the DeleteFilesInRange() approach. On the other hand, DeleteFilesInRange() cannot remove all the data in the range, so a better way is to first apply DeleteFilesInRange(), and then issue CompactRange() with the compaction filter.
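Below is a hedged sketch of such a compaction filter and the follow-up CompactRange() call. The class name, range bounds, and wiring are illustrative assumptions, not a helper shipped with RocksDB:

```cpp
#include <string>

#include "rocksdb/compaction_filter.h"
#include "rocksdb/db.h"

// Hypothetical filter that drops every key in [begin_, end_).
class RangeDeleteFilter : public rocksdb::CompactionFilter {
 public:
  RangeDeleteFilter(std::string begin, std::string end)
      : begin_(std::move(begin)), end_(std::move(end)) {}

  bool Filter(int /*level*/, const rocksdb::Slice& key,
              const rocksdb::Slice& /*existing_value*/,
              std::string* /*new_value*/,
              bool* /*value_changed*/) const override {
    // Returning true drops the key during compaction.
    return key.compare(begin_) >= 0 && key.compare(end_) < 0;
  }

  // Drop keys even when snapshots exist, as recommended above.
  bool IgnoreSnapshots() const override { return true; }

  const char* Name() const override { return "RangeDeleteFilter"; }

 private:
  std::string begin_, end_;
};

// Wiring sketch: the filter must outlive the DB and be installed via
// options.compaction_filter before Open(); then compact the same range.
//   RangeDeleteFilter filter("user123|", "user124|");
//   options.compaction_filter = &filter;
//   ... open db ...
rocksdb::Slice begin("user123|"), end("user124|");
rocksdb::Status s =
    db->CompactRange(rocksdb::CompactRangeOptions(), &begin, &end);
```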
Problem 2 is harder to solve. One way is to apply DeleteFilesInRange() + CompactRange() so that all keys and tombstones in the range are dropped. This works for large ranges, but it is too expensive if we frequently drop small ranges: DeleteFilesInRange() is unlikely to delete any file, and CompactRange() deletes far more data than needed because it has to compact every SST file containing any key in the deletion range. For the use cases where CompactRange() is too expensive, there are still two ways to reduce the harm, sketched after the list below:
- if you never overwrite existing keys, you can use DB::SingleDelete() instead of Delete() to remove tombstones sooner. A SingleDelete tombstone is dropped as soon as it meets the original key during compaction, rather than being carried all the way to the last level.
- use NewCompactOnDeletionCollectorFactory() to speed up compaction of files that contain chunks of tombstones.
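A sketch of both mitigations, assuming `db` is an open `rocksdb::DB*`; the window size, deletion trigger, and key are illustrative values only:

```cpp
#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/utilities/table_properties_collectors.h"

rocksdb::Options options;
// Mark an SST file for compaction when any 128K-entry sliding window
// inside it contains at least 50K deletion entries (illustrative numbers).
options.table_properties_collector_factories.emplace_back(
    rocksdb::NewCompactOnDeletionCollectorFactory(
        /*sliding_window_size=*/128 * 1024,
        /*deletion_trigger=*/50 * 1024));
// ... open db with these options ...

// If the key was written at most once, SingleDelete() lets its tombstone
// be dropped as soon as it meets the original value during compaction.
rocksdb::Status s =
    db->SingleDelete(rocksdb::WriteOptions(), "user123|attr7");
```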