L0 and L1 segments are written with a very low FP rate (more bits per key) for their bloom filters. This is fine because L0 segments tend not to live long. However, with a monotonic key series, those segments are never rewritten, so once there are hundreds or thousands of disjoint segments, the cost really builds up, resulting in unexpectedly high memory usage. Of course, in those scenarios one could disable bloom filters, but the behavior is still surprising.
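A rough back-of-the-envelope calculation shows how the cost builds up. Using the standard bloom filter sizing formula (bits per key = -ln(p) / (ln 2)²), a low FP rate meant for short-lived L0 segments becomes expensive once thousands of segments stick around. The segment counts and key counts below are illustrative assumptions, not values from the lsm-tree crate:

```rust
/// Standard bloom filter sizing: bits per key needed for a target
/// false positive rate p is -ln(p) / (ln 2)^2.
fn bloom_bits_per_key(fp_rate: f64) -> f64 {
    -fp_rate.ln() / (2f64.ln().powi(2))
}

fn main() {
    // Hypothetical workload: 1,000 disjoint segments that are never
    // rewritten, 1M keys each (monotonic series, so no overwrites).
    let keys_per_segment = 1_000_000u64;
    let segments = 1_000u64;

    for fp in [0.00001, 0.01] {
        let bits = bloom_bits_per_key(fp);
        let total_mib =
            bits * (keys_per_segment * segments) as f64 / 8.0 / 1024.0 / 1024.0;
        println!("fp={fp}: {bits:.1} bits/key, ~{total_mib:.0} MiB of filters");
    }
}
```

With these assumptions, an L0-style FP rate of 0.00001 needs roughly 24 bits per key (about 2.8 GiB of filters across the 1,000 segments), while an FP rate of 0.01 needs under 10 bits per key, which is the kind of gap that makes the accumulated L0/L1 filters stand out.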
For very large datasets, Ribbon filters may be desirable; they trade higher CPU cost for lower memory usage. #64
Also, compression would not be applied when compression is skipped on L0: #37
And with #51, L0/L1 segments will have a high memory impact because of the full block index, so the higher memory usage should be even more noticeable.
Possible solution, in Leveled compaction strategy:
When moving from L1 to L2, if the "bloom" feature OR any compression feature (lz4, miniz) is enabled, rewrite segments instead of trivially moving them.
This increases write amplification by 1, but should be worth it. To do so, Leveled compaction needs to be improved so it can trivially move into L1 without having to wait for an ongoing compaction (if the key ranges don't overlap, as they don't in a disjoint workload).
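The proposed rule can be sketched as a small decision function. All names here (`Features`, `promote_l1_to_l2`, the `Action` enum) are hypothetical and do not reflect the actual lsm-tree API; the sketch only captures the condition described above:

```rust
/// Hypothetical feature flags; "compression" stands for lz4 or miniz.
#[derive(Clone, Copy)]
struct Features {
    bloom: bool,
    compression: bool,
}

enum Action {
    /// Move the segment file into the next level without touching its data.
    TrivialMove,
    /// Re-encode the segment so it gets the target level's bloom filter
    /// sizing and compression, at the cost of +1 write amplification.
    Rewrite,
}

fn promote_l1_to_l2(features: Features) -> Action {
    // Rewriting is only worthwhile if it actually changes the segment:
    // either the bloom filter gets resized, or compression gets applied.
    if features.bloom || features.compression {
        Action::Rewrite
    } else {
        Action::TrivialMove
    }
}

fn main() {
    let action = promote_l1_to_l2(Features { bloom: true, compression: false });
    assert!(matches!(action, Action::Rewrite));
}
```

With neither feature enabled, the trivial move stays as-is, so the extra write amplification is only paid when it buys smaller filters or compressed blocks.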
marvin-j97 changed the title from "[Speculation] High memory usage when using bloom filters & monotonic keys" to "[Speculation] Unexpectedly high memory usage when using bloom filters & monotonic keys" on Oct 1, 2024.