L0 and L1 segments are written with a very low FP rate (more bits per key) for their bloom filters. This is fine because L0 segments tend not to live long. However, with a monotonic key series, those segments are never rewritten, so once there are hundreds or thousands of disjoint segments, the cost really builds up, resulting in unexpectedly high memory usage. Of course, in those scenarios one could disable bloom filters, but the behavior is still surprising.
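A rough back-of-the-envelope calculation shows how the cost builds up. Using the standard bloom filter sizing formula (bits per key = -ln(p) / (ln 2)²), a low FP rate meant for short-lived L0 segments becomes expensive once thousands of segments stick around. The segment counts and key counts below are illustrative assumptions, not values from the lsm-tree crate:

```rust
/// Standard bloom filter sizing: bits per key needed for a target
/// false positive rate p is -ln(p) / (ln 2)^2.
fn bloom_bits_per_key(fp_rate: f64) -> f64 {
    -fp_rate.ln() / (2f64.ln().powi(2))
}

fn main() {
    // Hypothetical workload: 1,000 disjoint segments that are never
    // rewritten, 1M keys each (monotonic series, so no overwrites).
    let keys_per_segment = 1_000_000u64;
    let segments = 1_000u64;

    for fp in [0.00001, 0.01] {
        let bits = bloom_bits_per_key(fp);
        let total_mib =
            bits * (keys_per_segment * segments) as f64 / 8.0 / 1024.0 / 1024.0;
        println!("fp={fp}: {bits:.1} bits/key, ~{total_mib:.0} MiB of filters");
    }
}
```

With these assumptions, an L0-style FP rate of 0.00001 needs roughly 24 bits per key (about 2.8 GiB of filters across the 1,000 segments), while an FP rate of 0.01 needs under 10 bits per key, which is the kind of gap that makes the accumulated L0/L1 filters stand out.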
For very large datasets, Ribbon filters may be desirable; they trade higher CPU cost for lower memory usage. #64
Also, compression would not be applied when compression is skipped on L0: #37
And with #51, L0/L1 segments will have a high memory impact because of the full block index, so the higher memory usage should be even more noticeable.
Possible solution, in Leveled compaction strategy:
When moving from L1 to L2, if the "bloom" feature OR any compression feature (lz4, miniz) is enabled, rewrite segments instead of trivially moving them.
This increases write amplification by 1, but should be worth it. To do so, Leveled compaction needs to be improved so it can trivially move into L1 without having to wait for an ongoing compaction (if the key ranges don't overlap, as they don't in a disjoint workload).
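The proposed rule can be sketched as a small decision function. All names here (`Features`, `promote_l1_to_l2`, the `Action` enum) are hypothetical and do not reflect the actual lsm-tree API; the sketch only captures the condition described above:

```rust
/// Hypothetical feature flags; "compression" stands for lz4 or miniz.
#[derive(Clone, Copy)]
struct Features {
    bloom: bool,
    compression: bool,
}

enum Action {
    /// Move the segment file into the next level without touching its data.
    TrivialMove,
    /// Re-encode the segment so it gets the target level's bloom filter
    /// sizing and compression, at the cost of +1 write amplification.
    Rewrite,
}

fn promote_l1_to_l2(features: Features) -> Action {
    // Rewriting is only worthwhile if it actually changes the segment:
    // either the bloom filter gets resized, or compression gets applied.
    if features.bloom || features.compression {
        Action::Rewrite
    } else {
        Action::TrivialMove
    }
}

fn main() {
    let action = promote_l1_to_l2(Features { bloom: true, compression: false });
    assert!(matches!(action, Action::Rewrite));
}
```

With neither feature enabled, the trivial move stays as-is, so the extra write amplification is only paid when it buys smaller filters or compressed blocks.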
marvin-j97 changed the title from "[Speculation] High memory usage when using bloom filters & monotonic keys" to "[Speculation] Unexpectedly high memory usage when using bloom filters & monotonic keys" on Oct 1, 2024.