Online Verification
RocksDB supports some online verifications while the DB is running, to catch potential software bugs and mitigate impacts of flawed hardware. Here are some of them.
This option performs some basic consistency checks on the LSM-tree. It checks that files in non-zero levels do not overlap, that level 0 files follow the expected ordering rule, and that a file is not compacted twice, along with some BlobDB-related conditions. If a violation is detected, the DB stops accepting new writes. This feature is on by default.
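The non-overlap rule for non-zero levels can be sketched as follows. This is a minimal illustration, not RocksDB's implementation: `FileRange` and `LevelIsConsistent` are hypothetical names, keys are compared bytewise, and each file is reduced to its smallest and largest key.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified file metadata: the smallest and largest key
// stored in one SST file. RocksDB's real metadata carries much more.
struct FileRange {
  std::string smallest;
  std::string largest;
};

// Returns true if the files of one non-zero level are sorted by key and no
// two neighbouring files overlap. Bytewise key comparison is assumed.
bool LevelIsConsistent(const std::vector<FileRange>& files) {
  for (std::size_t i = 0; i < files.size(); ++i) {
    if (files[i].smallest > files[i].largest) {
      return false;  // the file's own key range is inverted
    }
    if (i > 0 && files[i - 1].largest >= files[i].smallest) {
      return false;  // neighbours overlap or are out of order
    }
  }
  return true;
}
```

A caller that finds an inconsistent level would refuse to install the new LSM-tree state, which corresponds to the DB stopping new writes as described above.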
This option performs more expensive extra checks when a new SST file is generated. After each SST file is generated by a flush or compaction, the file is re-opened and its content is checked: (1) the keys are in comparator order (this check is also available, and enabled by default, during file write via ColumnFamilyOptions::check_flush_compaction_key_order); (2) the hash of all the KVs matches the hash calculated when the KVs were added to the file. These checks detect certain corruptions, so corrupt files can be prevented from being applied to the DB. We suggest that users turn it on at least in shadow environments, and consider running it in production too if the overhead is affordable.
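The two checks described above (key order, and a hash over all KVs compared with the hash computed at write time) can be sketched like this. It is only an illustration under stated assumptions: the "file" is an in-memory vector, FNV-1a stands in for whatever hash RocksDB uses internally, keys are compared bytewise, and `HashEntries` and `VerifyReadBack` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, std::string>;

// FNV-1a step over one string, preceded by its length so that key/value
// boundaries matter. A stand-in hash chosen for this sketch only.
std::uint64_t MixString(std::uint64_t h, const std::string& s) {
  std::uint64_t len = s.size();
  for (int i = 0; i < 8; ++i) {
    h = (h ^ static_cast<std::uint8_t>(len >> (8 * i))) * 1099511628211ULL;
  }
  for (unsigned char c : s) {
    h = (h ^ c) * 1099511628211ULL;
  }
  return h;
}

// Order-sensitive hash over all entries, as accumulated while writing.
std::uint64_t HashEntries(const std::vector<KeyValue>& entries) {
  std::uint64_t h = 14695981039346656037ULL;
  for (const auto& kv : entries) {
    h = MixString(h, kv.first);
    h = MixString(h, kv.second);
  }
  return h;
}

// Re-reads the written entries and verifies (1) strictly increasing key
// order and (2) that the hash matches the one computed at write time.
bool VerifyReadBack(const std::vector<KeyValue>& read_back,
                    std::uint64_t hash_at_write) {
  for (std::size_t i = 1; i < read_back.size(); ++i) {
    if (!(read_back[i - 1].first < read_back[i].first)) {
      return false;  // keys out of order
    }
  }
  return HashEntries(read_back) == hash_at_write;
}
```

If `VerifyReadBack` fails, the new file would be rejected instead of being added to the DB.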
With this option on, RocksDB checks the count of entries added to a memtable when flushing it into an SST file. The feature provides some online coverage for memtable corruption, which can be caused by RocksDB bugs or hardware issues. Checking the total count is a first step toward verifying memtables. This feature will be released in 6.21 and is on by default. In the future, we will check more memtable counters, e.g. the number of puts or the number of deletes.
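The idea of the count check can be sketched as follows: keep a counter on every insert, count again while iterating during flush, and fail the flush on a mismatch. `MemTableSketch` and `FlushWithCountCheck` are hypothetical names for this illustration and do not mirror RocksDB's classes.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, std::string>;

// Hypothetical memtable stand-in: an append-only list of entries plus a
// counter that is bumped on every Add.
struct MemTableSketch {
  std::vector<Entry> entries;
  std::uint64_t num_entries = 0;

  void Add(std::string key, std::string value) {
    entries.emplace_back(std::move(key), std::move(value));
    ++num_entries;
  }
};

// Copies every entry to `out` (standing in for the SST writer) and returns
// false if the number of entries seen differs from the recorded counter.
bool FlushWithCountCheck(const MemTableSketch& mem, std::vector<Entry>* out) {
  std::uint64_t flushed = 0;
  for (const auto& entry : mem.entries) {
    out->push_back(entry);
    ++flushed;
  }
  return flushed == mem.num_entries;
}
```

A mismatch means entries were lost or duplicated between insertion and flush, so the flush result should not be trusted.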
RocksDB has extensive support for per key-value checksums to detect in-memory data corruption. There are different options for different types of in-memory data. A checksum is computed for each internal key (including user key, op type and sequence number) and value, and is verified when necessary. For implementation details, refer to the blog post.
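A per key-value checksum of this kind can be sketched as below: the protection value covers the user key, op type, sequence number and value, and is recomputed whenever the entry is verified. This is a simplified illustration, not RocksDB's actual scheme: the FNV-1a style mixing, the fixed 8-byte protection value, and the names `ProtectedEntry`, `MakeEntry` and `Verify` are all assumptions made for the example.

```cpp
#include <cstdint>
#include <string>
#include <utility>

enum class OpType : std::uint8_t { kDelete = 0, kPut = 1 };

// Hypothetical in-memory entry carrying its own 8-byte protection value.
struct ProtectedEntry {
  std::string user_key;
  std::uint64_t sequence;
  OpType type;
  std::string value;
  std::uint64_t protection;
};

std::uint64_t MixByte(std::uint64_t h, std::uint8_t b) {
  return (h ^ b) * 1099511628211ULL;  // FNV-1a step, stand-in only
}

std::uint64_t MixU64(std::uint64_t h, std::uint64_t v) {
  for (int i = 0; i < 8; ++i) {
    h = MixByte(h, static_cast<std::uint8_t>(v >> (8 * i)));
  }
  return h;
}

std::uint64_t MixString(std::uint64_t h, const std::string& s) {
  h = MixU64(h, s.size());  // include the length so field boundaries matter
  for (unsigned char c : s) {
    h = MixByte(h, c);
  }
  return h;
}

// Checksum over user key, op type, sequence number and value.
std::uint64_t ComputeProtection(const ProtectedEntry& e) {
  std::uint64_t h = 14695981039346656037ULL;
  h = MixString(h, e.user_key);
  h = MixByte(h, static_cast<std::uint8_t>(e.type));
  h = MixU64(h, e.sequence);
  h = MixString(h, e.value);
  return h;
}

ProtectedEntry MakeEntry(std::string key, std::uint64_t seq, OpType type,
                         std::string value) {
  ProtectedEntry e{std::move(key), seq, type, std::move(value), 0};
  e.protection = ComputeProtection(e);
  return e;
}

// Recomputes the checksum; a mismatch signals in-memory corruption.
bool Verify(const ProtectedEntry& e) {
  return ComputeProtection(e) == e.protection;
}
```

Flipping a single bit in the key, value or sequence number after `MakeEntry` makes `Verify` return false, which is the kind of in-memory corruption these options are meant to catch.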
This protects data in the write batch. When replaying from the WAL during DB open, this protection is always enabled (set to 8), so that WAL replay is always protected.
This protects key-values that are in memtables.
This protects key-values in blocks that are in block cache, which is typically RocksDB's largest use of memory.
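As a configuration sketch, the verifications on this page might be enabled together as shown below. The option names are not spelled out in the text above; this assumes they correspond to the fields declared in a recent RocksDB release's `rocksdb/options.h` and `rocksdb/advanced_options.h` (older releases may lack some of them), and the helper function names are hypothetical.

```cpp
#include <rocksdb/options.h>

// Hypothetical helper: returns Options with the online verifications on.
rocksdb::Options MakeVerifiedOptions() {
  rocksdb::Options options;
  // Basic LSM-tree consistency checks (assumed to be the first option above).
  options.force_consistency_checks = true;
  // Re-open and verify each newly generated SST file.
  options.paranoid_file_checks = true;
  // Key-order check during file write, named in the text above.
  options.check_flush_compaction_key_order = true;
  // Verify the memtable entry count at flush time.
  options.flush_verify_memtable_count = true;
  // Per key-value checksums for memtable entries and block cache blocks.
  options.memtable_protection_bytes_per_key = 8;
  options.block_protection_bytes_per_key = 8;
  return options;
}

// Hypothetical helper: per key-value checksum for write batches.
rocksdb::WriteOptions MakeProtectedWriteOptions() {
  rocksdb::WriteOptions write_options;
  write_options.protection_bytes_per_key = 8;
  return write_options;
}
```

Each of these settings adds CPU or memory overhead, so it is worth measuring the cost in a shadow environment before enabling all of them in production.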