Skip to content

Releases: facebook/rocksdb

RocksDB 7.1.2

20 Apr 01:48
Compare
Choose a tag to compare

7.1.2 (2022-04-19)

Bug Fixes

  • Fixed bug which caused rocksdb failure in the situation when rocksdb was accessible using UNC path
  • Fixed a race condition when 2PC is disabled and WAL tracking in the MANIFEST is enabled. The race condition is between two background flush threads trying to install flush results, causing a WAL deletion not tracked in the MANIFEST. A future DB open may fail.
  • Fixed a heap use-after-free race with DropColumnFamily.
  • Fixed a bug that rocksdb.read.block.compaction.micros cannot track compaction stats (#9722).
  • Fixed file_type, relative_filename and directory fields returned by GetLiveFilesMetaData(), which were added in inheriting from FileStorageInfo.
  • Fixed a bug affecting track_and_verify_wals_in_manifest. Without the fix, application may see "open error: Corruption: Missing WAL with log number" while trying to open the db. The corruption is a false alarm but prevents DB open (#9766).

RocksDB 7.1.1

13 Apr 21:20
Compare
Choose a tag to compare

7.1.1 (2022-04-07)

Bug Fixes

  • Fix segfault in FilePrefetchBuffer with async_io as it doesn't wait for pending jobs to complete on destruction.

7.1.0 (2022-03-23)

New Features

  • Allow WriteBatchWithIndex to index a WriteBatch that includes keys with user-defined timestamps. The index itself does not have timestamp.
  • Add support for user-defined timestamps to write-committed transaction without API change. The TransactionDB layer APIs do not allow timestamps because we require that all user-defined-timestamps-aware operations go through the Transaction APIs.
  • Added BlobDB options to ldb
  • BlockBasedTableOptions::detect_filter_construct_corruption can now be dynamically configured using DB::SetOptions.
  • Automatically recover from retryable read IO errors during backgorund flush/compaction.
  • Experimental support for preserving file Temperatures through backup and restore, and for updating DB metadata for outside changes to file Temperature (UpdateManifestForFilesState or ldb update_manifest --update_temperatures).
  • Experimental support for async_io in ReadOptions which is used by FilePrefetchBuffer to prefetch some of the data asynchronously, if reads are sequential and auto readahead is enabled by rocksdb internally.

Bug Fixes

  • Fixed a major performance bug in which Bloom filters generated by pre-7.0 releases are not read by early 7.0.x releases (and vice-versa) due to changes to FilterPolicy::Name() in #9590. This can severely impact read performance and read I/O on upgrade or downgrade with existing DB, but not data correctness.
  • Fixed a data race on versions_ between DBImpl::ResumeImpl() and threads waiting for recovery to complete (#9496)
  • Fixed a bug caused by race among flush, incoming writes and taking snapshots. Queries to snapshots created with these race condition can return incorrect result, e.g. resurfacing deleted data.
  • Fixed a bug that DB flush uses options.compression even options.compression_per_level is set.
  • Fixed a bug that DisableManualCompaction may assert when disable an unscheduled manual compaction.
  • Fix a race condition when cancel manual compaction with DisableManualCompaction. Also DB close can cancel the manual compaction thread.
  • Fixed a potential timer crash when open close DB concurrently.
  • Fixed a race condition for alive_log_files_ in non-two-write-queues mode. The race is between the write_thread_ in WriteToWAL() and another thread executing FindObsoleteFiles(). The race condition will be caught if __glibcxx_requires_nonempty is enabled.
  • Fixed a bug that Iterator::Refresh() reads stale keys after DeleteRange() performed.
  • Fixed a race condition when disable and re-enable manual compaction.
  • Fixed automatic error recovery failure in atomic flush.
  • Fixed a race condition when mmaping a WritableFile on POSIX.

Public API changes

  • Added pure virtual FilterPolicy::CompatibilityName(), which is needed for fixing major performance bug involving FilterPolicy naming in SST metadata without affecting Customizable aspect of FilterPolicy. This change only affects those with their own custom or wrapper FilterPolicy classes.
  • options.compression_per_level is dynamically changeable with SetOptions().
  • Added WriteOptions::rate_limiter_priority. When set to something other than Env::IO_TOTAL, the internal rate limiter (DBOptions::rate_limiter) will be charged at the specified priority for writes associated with the API to which the WriteOptions was provided. Currently the support covers automatic WAL flushes, which happen during live updates (Put(), Write(), Delete(), etc.) when WriteOptions::disableWAL == false and DBOptions::manual_wal_flush == false.
  • Add DB::OpenAndTrimHistory API. This API will open DB and trim data to the timestamp specified by trim_ts (The data with timestamp larger than specified trim bound will be removed). This API should only be used at a timestamp-enabled column families recovery. If the column family doesn't have timestamp enabled, this API won't trim any data on that column family. This API is not compatible with avoid_flush_during_recovery option.
  • Remove BlockBasedTableOptions.hash_index_allow_collision which already takes no effect.

RocksDB 7.0.4

29 Mar 20:31
Compare
Choose a tag to compare

7.0.4 (2022-03-29)

Bug Fixes

  • Fixed a race condition when disable and re-enable manual compaction.
  • Fixed a race condition for alive_log_files_ in non-two-write-queues mode. The race is between the write_thread_ in WriteToWAL() and another thread executing FindObsoleteFiles(). The race condition will be caught if __glibcxx_requires_nonempty is enabled.
  • Fixed a race condition when mmaping a WritableFile on POSIX.
  • Fixed a race condition when 2PC is disabled and WAL tracking in the MANIFEST is enabled. The race condition is between two background flush threads trying to install flush results, causing a WAL deletion not tracked in the MANIFEST. A future DB open may fail.
  • Fixed a heap use-after-free race with DropColumnFamily.
  • Fixed a bug that rocksdb.read.block.compaction.micros cannot track compaction stats (#9722).

RocksDB 6.29.5

29 Mar 20:30
Compare
Choose a tag to compare

6.29.5 (2022-03-29)

Bug Fixes

  • Fixed a race condition for alive_log_files_ in non-two-write-queues mode. The race is between the write_thread_ in WriteToWAL() and another thread executing FindObsoleteFiles(). The race condition will be caught if __glibcxx_requires_nonempty is enabled.
  • Fixed a race condition when mmaping a WritableFile on POSIX.
  • Fixed a race condition when 2PC is disabled and WAL tracking in the MANIFEST is enabled. The race condition is between two background flush threads trying to install flush results, causing a WAL deletion not tracked in the MANIFEST. A future DB open may fail.
  • Fixed a heap use-after-free race with DropColumnFamily.
  • Fixed a bug that rocksdb.read.block.compaction.micros cannot track compaction stats (#9722).

RocksDB 7.0.3

25 Mar 17:00
Compare
Choose a tag to compare

7.0.3 (2022-03-25)

Bug Fixes

  • Fixed a major performance bug in which Bloom filters generated by pre-7.0 releases are not read by early 7.0.x releases (and vice-versa) due to changes to FilterPolicy::Name() in #9590. This can severely impact read performance and read I/O on upgrade or downgrade with existing DB, but not data correctness.
  • Fixed a bug that Iterator::Refresh() reads stale keys after DeleteRange() performed.

Public API changes

  • Added pure virtual FilterPolicy::CompatibilityName(), which is needed for fixing major performance bug involving FilterPolicy naming in SST metadata without affecting Customizable aspect of FilterPolicy. For source code, this change only affects those with their own custom or wrapper FilterPolicy classes, but does break compiled library binary compatibility in a patch release.
  • Since RocksDB 7, RocksJava now requires Java 8 (previously Java 7).

RocksDB 6.29.4

23 Mar 01:24
Compare
Choose a tag to compare

6.29.4 (2022-03-22)

Bug Fixes

  • Fixed a bug caused by race among flush, incoming writes and taking snapshots. Queries to snapshots created with these race condition can return incorrect result, e.g. resurfacing deleted data.
  • Fixed a bug that DisableManualCompaction may assert when disable an unscheduled manual compaction.
  • Fixed a bug that Iterator::Refresh() reads stale keys after DeleteRange() performed.
  • Fixed a race condition when disable and re-enable manual compaction.
  • Fix a race condition when cancel manual compaction with DisableManualCompaction. Also DB close can cancel the manual compaction thread.
  • Fixed a data race on versions_ between DBImpl::ResumeImpl() and threads waiting for recovery to complete (#9496)
  • Fixed a read-after-free bug in DB::GetMergeOperands().
  • Fixed NUM_INDEX_AND_FILTER_BLOCKS_READ_PER_LEVEL, NUM_DATA_BLOCKS_READ_PER_LEVEL, and NUM_SST_READ_PER_LEVEL stats to be reported once per MultiGet batch per level.

RocksDB 7.0.2

14 Mar 16:45
Compare
Choose a tag to compare

Rocksdb Change Log

7.0.2 (2022-03-12)

  • Fixed a bug that DisableManualCompaction may assert when disable an unscheduled manual compaction.

RocksDB 7.0.1

12 Mar 00:02
Compare
Choose a tag to compare

Rocksdb Change Log

7.0.1 (2022-03-02)

Bug Fixes

  • Fix a race condition when cancel manual compaction with DisableManualCompaction. Also DB close can cancel the manual compaction thread.
  • Fixed a data race on versions_ between DBImpl::ResumeImpl() and threads waiting for recovery to complete (#9496)
  • Fixed a bug caused by race among flush, incoming writes and taking snapshots. Queries to snapshots created with these race condition can return incorrect result, e.g. resurfacing deleted data.

7.0.0 (2022-02-20)

Bug Fixes

  • Fixed a major bug in which batched MultiGet could return old values for keys deleted by DeleteRange when memtable Bloom filter is enabled (memtable_prefix_bloom_size_ratio > 0). (The fix includes a substantial MultiGet performance improvement in the unusual case of both memtable_whole_key_filtering and prefix_extractor.)
  • Fixed more cases of EventListener::OnTableFileCreated called with OK status, file_size==0, and no SST file kept. Now the status is Aborted.
  • Fixed a read-after-free bug in DB::GetMergeOperands().
  • Fix a data loss bug for 2PC write-committed transaction caused by concurrent transaction commit and memtable switch (#9571).
  • Fixed NUM_INDEX_AND_FILTER_BLOCKS_READ_PER_LEVEL, NUM_DATA_BLOCKS_READ_PER_LEVEL, and NUM_SST_READ_PER_LEVEL stats to be reported once per MultiGet batch per level.

Performance Improvements

  • Mitigated the overhead of building the file location hash table used by the online LSM tree consistency checks, which can improve performance for certain workloads (see #9351).
  • Switched to using a sorted std::vector instead of std::map for storing the metadata objects for blob files, which can improve performance for certain workloads, especially when the number of blob files is high.
  • DisableManualCompaction() doesn't have to wait scheduled manual compaction to be executed in thread-pool to cancel the job.

Public API changes

  • Require C++17 compatible compiler (GCC >= 7, Clang >= 5, Visual Studio >= 2017) for compiling RocksDB and any code using RocksDB headers (previously required C++11). See #9388.
  • Require Java 8 for compiling RocksJava (previously Java 7). See #9541
  • Removed deprecated automatic finalization of RocksJava RocksObjects, the user must explicitly call close() on their RocksJava objects. See #9523.
  • Added ReadOptions::rate_limiter_priority. When set to something other than Env::IO_TOTAL, the internal rate limiter (DBOptions::rate_limiter) will be charged at the specified priority for file reads associated with the API to which the ReadOptions was provided.
  • Remove HDFS support from main repo.
  • Remove librados support from main repo.
  • Remove obsolete backupable_db.h and type alias BackupableDBOptions. Use backup_engine.h and BackupEngineOptions. Similar renamings are in the C and Java APIs.
  • Removed obsolete utility_db.h and UtilityDB::OpenTtlDB. Use db_ttl.h and DBWithTTL::Open.
  • Remove deprecated API DB::AddFile from main repo.
  • Remove deprecated API ObjectLibrary::Register() and the (now obsolete) Regex public API. Use ObjectLibrary::AddFactory() with PatternEntry instead.
  • Remove deprecated option DBOption::table_cache_remove_scan_count_limit.
  • Remove deprecated API AdvancedColumnFamilyOptions::soft_rate_limit.
  • Remove deprecated API AdvancedColumnFamilyOptions::hard_rate_limit.
  • Remove deprecated API DBOption::base_background_compactions.
  • Remove deprecated API DBOptions::purge_redundant_kvs_while_flush.
  • Remove deprecated overloads of API DB::CompactRange.
  • Remove deprecated option DBOptions::skip_log_error_on_recovery.
  • Remove ReadOptions::iter_start_seqnum which has been deprecated.
  • Remove DBOptions::preserved_deletes and DB::SetPreserveDeletesSequenceNumber().
  • Remove deprecated API AdvancedColumnFamilyOptions::rate_limit_delay_max_milliseconds.
  • Removed timestamp from WriteOptions. Accordingly, added to DB APIs Put, Delete, SingleDelete, etc. accepting an additional argument 'timestamp'. Added Put, Delete, SingleDelete, etc to WriteBatch accepting an additional argument 'timestamp'. Removed WriteBatch::AssignTimestamps(vector) API. Renamed WriteBatch::AssignTimestamp() to WriteBatch::UpdateTimestamps() with clarified comments.
  • Changed type of cache buffer passed to Cache::CreateCallback from void* to const void*.
  • Significant updates to FilterPolicy-related APIs and configuration:
    • Remove public API support for deprecated, inefficient block-based filter (use_block_based_builder=true).
      • Old code and configuration strings that would enable it now quietly enable full filters instead, though any built-in FilterPolicy can still read block-based filters. This includes changing the longstanding default behavior of the Java API.
      • Remove deprecated FilterPolicy::CreateFilter() and FilterPolicy::KeyMayMatch()
      • Remove rocksdb_filterpolicy_create() from C API, as the only C API support for custom filter policies is now obsolete.
      • If temporary memory usage in full filter creation is a problem, consider using partitioned filters, smaller SST files, or setting reserve_table_builder_memory=true.
    • Remove support for "filter_policy=experimental_ribbon" configuration
      string. Use something like "filter_policy=ribbonfilter:10" instead.
    • Allow configuration string like "filter_policy=bloomfilter:10" without
      bool, to minimize acknowledgement of obsolete block-based filter.
    • Made FilterPolicy Customizable. Configuration of filter_policy is now accurately saved in OPTIONS file and can be loaded with LoadOptionsFromFile. (Loading an OPTIONS file generated by a previous version only enables reading and using existing filters, not generating new filters. Previously, no filter_policy would be configured from a saved OPTIONS file.)
    • Change meaning of nullptr return from GetBuilderWithContext() from "use
      block-based filter" to "generate no filter in this case."
      • Also, when user specifies bits_per_key < 0.5, we now round this down
        to "no filter" because we expect a filter with >= 80% FP rate is
        unlikely to be worth the CPU cost of accessing it (esp with
        cache_index_and_filter_blocks=1 or partition_filters=1).
      • bits_per_key >= 0.5 and < 1.0 is still rounded up to 1.0 (for 62% FP
        rate)
    • Remove class definitions for FilterBitsBuilder and FilterBitsReader from
      public API, so these can evolve more easily as implementation details.
      Custom FilterPolicy can still decide what kind of built-in filter to use
      under what conditions.
    • Also removed deprecated functions
      • FilterPolicy::GetFilterBitsBuilder()
      • NewExperimentalRibbonFilterPolicy()
    • Remove default implementations of
      • FilterPolicy::GetBuilderWithContext()
  • Remove default implementation of Name() from FileSystemWrapper.
  • Rename SizeApproximationOptions.include_memtabtles to SizeApproximationOptions.include_memtables.
  • Remove deprecated option DBOptions::max_mem_compaction_level.
  • Return Status::InvalidArgument from ObjectRegistry::NewObject if a factory exists but the object ould not be created (returns NotFound if the factory is missing).
  • Remove deprecated overloads of API DB::GetApproximateSizes.
  • Remove deprecated option DBOptions::new_table_reader_for_compaction_inputs.
  • Add Transaction::SetReadTimestampForValidation() and Transaction::SetCommitTimestamp(). Default impl returns NotSupported().
  • Add support for decimal patterns to ObjectLibrary::PatternEntry
  • Remove deprecated remote compaction APIs CompactionService::Start() and CompactionService::WaitForComplete(). Please use CompactionService::StartV2(), CompactionService::WaitForCompleteV2() instead, which provides the same information plus extra data like priority, db_id, etc.
  • ColumnFamilyOptions::OldDefaults and DBOptions::OldDefaults are marked deprecated, as they are no longer maintained.
  • Add subcompaction callback APIs: OnSubcompactionBegin() and OnSubcompactionCompleted().
  • Add file Temperature information to FileOperationInfo in event listener API.
  • Change the type of SizeApproximationFlags from enum to enum class. Also update the signature of DB::GetApproximateSizes API from uint8_t to SizeApproximationFlags.
  • Add Temperature hints information from RocksDB in API NewSequentialFile(). backup and checkpoint operations need to open the source files with NewSequentialFile(), which will have the temperature hints. Other operations are not covered.

Behavior Changes

  • Disallow the combination of DBOptions.use_direct_io_for_flush_and_compaction == true and DBOptions.writable_file_max_buffer_size == 0. This combination can cause WritableFileWriter::Append() to loop forever, and it does not make much sense in direct IO.
  • ReadOptions::total_order_seek no longer affects DB::Get(). The original motivation for this interaction has been obsolete since RocksDB has been able to detect whether the current prefix extractor is compatible with that used to generate table files, probably RocksDB 5.14.0.

New Features

  • Introduced an option BlockBasedTableOptions::detect_filter_construct_corruption for detecting corruption during Bloom Filter (format_version >= 5) and Ribbon Filter construction.
  • Improved the SstDumpTool to read the comparator from table properties and use it to read the SST File.
  • Extended the column family statistics in the info log so the total amount of garbage in the blob files and the blob file space amplification factor are also logged. Also exposed the blob file space amp via the rocksdb.blob-stats DB property.
  • Introduced the API rocksdb_create_dir_if_missing in c.h that calls underlying file system's CreateDirIfMissing API to create the directory.
  • Added last level and non-last level read statistics: LAST_LEVEL_READ_*, NON_LAST_LEVEL_READ_*.
  • Experimental: Add support for new APIs ReadAsync in ...
Read more

RocksDB 6.29.3

18 Feb 21:04
Compare
Choose a tag to compare

Rocksdb Change Log

6.29.3 (2022-02-17)

Bug Fixes

  • Fix a data loss bug for 2PC write-committed transaction caused by concurrent transaction commit and memtable switch (#9571).

6.29.2 (2022-02-15)

Performance Improvements

  • DisableManualCompaction() doesn't have to wait scheduled manual compaction to be executed in thread-pool to cancel the job.

6.29.1 (2022-01-31)

Bug Fixes

  • Fixed a major bug in which batched MultiGet could return old values for keys deleted by DeleteRange when memtable Bloom filter is enabled (memtable_prefix_bloom_size_ratio > 0). (The fix includes a substantial MultiGet performance improvement in the unusual case of both memtable_whole_key_filtering and prefix_extractor.)

6.29.0 (2022-01-21)

Note: The next release will be major release 7.0. See #9390 for more info.

Public API change

  • Added values to TraceFilterType: kTraceFilterIteratorSeek, kTraceFilterIteratorSeekForPrev, and kTraceFilterMultiGet. They can be set in TraceOptions to filter out the operation types after which they are named.
  • Added TraceOptions::preserve_write_order. When enabled it guarantees write records are traced in the same order they are logged to WAL and applied to the DB. By default it is disabled (false) to match the legacy behavior and prevent regression.
  • Made the Env class extend the Customizable class. Implementations need to be registered with the ObjectRegistry and to implement a Name() method in order to be created via this method.
  • Options::OldDefaults is marked deprecated, as it is no longer maintained.
  • Add ObjectLibrary::AddFactory and ObjectLibrary::PatternEntry classes. This method and associated class are the preferred mechanism for registering factories with the ObjectLibrary going forward. The ObjectLibrary::Register method, which uses regular expressions and may be problematic, is deprecated and will be in a future release.
  • Changed BlockBasedTableOptions::block_size from size_t to uint64_t.
  • Added API warning against using Iterator::Refresh() together with DB::DeleteRange(), which are incompatible and have always risked causing the refreshed iterator to return incorrect results.

Behavior Changes

  • DB::DestroyColumnFamilyHandle() will return Status::InvalidArgument() if called with DB::DefaultColumnFamily().
  • On 32-bit platforms, mmap reads are no longer quietly disabled, just discouraged.

New Features

  • Added Options::DisableExtraChecks() that can be used to improve peak write performance by disabling checks that should not be necessary in the absence of software logic errors or CPU+memory hardware errors. (Default options are slowly moving toward some performance overheads for extra correctness checking.)

Performance Improvements

  • Improved read performance when a prefix extractor is used (Seek, Get, MultiGet), even compared to version 6.25 baseline (see bug fix below), by optimizing the common case of prefix extractor compatible with table file and unchanging.

Bug Fixes

  • Fix a bug that FlushMemTable may return ok even flush not succeed.
  • Fixed a bug of Sync() and Fsync() not using fcntl(F_FULLFSYNC) on OS X and iOS.
  • Fixed a significant performance regression in version 6.26 when a prefix extractor is used on the read path (Seek, Get, MultiGet). (Excessive time was spent in SliceTransform::AsString().)

New Features

  • Added RocksJava support for MacOS universal binary (ARM+x86)

RocksDB 6.28.2

02 Feb 17:56
Compare
Choose a tag to compare

6.28.2 (2022-01-31)

Bug Fixes

  • Fixed a major bug in which batched MultiGet could return old values for keys deleted by DeleteRange when memtable Bloom filter is enabled (memtable_prefix_bloom_size_ratio > 0). (The fix includes a substantial MultiGet performance improvement in the unusual case of both memtable_whole_key_filtering and prefix_extractor.)

6.28.1 (2022-01-10)

Bug Fixes

  • Fixed compilation errors on newer compiler, e.g. clang-12

6.28.0 (2021-12-17)

New Features

  • Introduced 'CommitWithTimestamp' as a new tag. Currently, there is no API for user to trigger a write with this tag to the WAL. This is part of the efforts to support write-commited transactions with user-defined timestamps.

Bug Fixes

  • Fixed a bug in rocksdb automatic implicit prefetching which got broken because of new feature adaptive_readahead and internal prefetching got disabled when iterator moves from one file to next.
  • Fixed a bug in TableOptions.prepopulate_block_cache which causes segmentation fault when used with TableOptions.partition_filters = true and TableOptions.cache_index_and_filter_blocks = true.
  • Fixed a bug affecting custom memtable factories which are not registered with the ObjectRegistry. The bug could result in failure to save the OPTIONS file.
  • Fixed a bug causing two duplicate entries to be appended to a file opened in non-direct mode and tracked by FaultInjectionTestFS.
  • Fixed a bug in TableOptions.prepopulate_block_cache to support block-based filters also.
  • Block cache keys no longer use FSRandomAccessFile::GetUniqueId() (previously used when available), so a filesystem recycling unique ids can no longer lead to incorrect result or crash (#7405). For files generated by RocksDB >= 6.24, the cache keys are stable across DB::Open and DB directory move / copy / import / export / migration, etc. Although collisions are still theoretically possible, they are (a) impossible in many common cases, (b) not dependent on environmental factors, and (c) much less likely than a CPU miscalculation while executing RocksDB.

Behavior Changes

  • MemTableList::TrimHistory now use allocated bytes when max_write_buffer_size_to_maintain > 0(default in TrasactionDB, introduced in PR#5022) Fix #8371.

Public API change

  • Extend WriteBatch::AssignTimestamp and AssignTimestamps API so that both functions can accept an optional checker argument that performs additional checking on timestamp sizes.
  • Introduce a new EventListener callback that will be called upon the end of automatic error recovery.

Performance Improvements

  • Replaced map property TableProperties::properties_offsets with uint64_t property external_sst_file_global_seqno_offset to save table properties's memory.
  • Block cache accesses are faster by RocksDB using cache keys of fixed size (16 bytes).

Java API Changes

  • Removed Java API TableProperties.getPropertiesOffsets() as it exposed internal details to external users.