
Optimize large IN filters on integer data types #23456

Merged
3 commits merged into trinodb:master from drf-bitset on Sep 25, 2024

Conversation

raunaqmorarka
Member

Description

Use a bitset-based filter instead of a hash set when the range of
values is narrow enough. The bitset is used only when it would occupy
less space than the equivalent open hash set.
This makes evaluation of dynamic filters more efficient, since we
often collect large integer sets in dynamic filters.

    BenchmarkDynamicPageFilter.filterPages
    (filterSize)   (inputDataSet)  (nonNullsSelectivity)   Mode  Cnt  Before score       After score
             100  INT64_FIXED_32K                    0.2  thrpt   30  446.174 ± 10.598   449.113 ±  5.323 ops/s
            1000  INT64_FIXED_32K                    0.2  thrpt   30  407.625 ±  3.139  1379.767 ± 19.318 ops/s
            5000  INT64_FIXED_32K                    0.2  thrpt   30  426.413 ±  6.485  1254.731 ± 11.685 ops/s
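To show the idea behind the change, here is a minimal, hypothetical sketch of a bitset-backed IN filter over a narrow range of longs. This is not Trino's actual LongBitSetFilter; the class name and structure are assumptions for illustration only.

```java
// Hypothetical sketch: membership test over [min, max] using one bit per
// possible value, so lookup is two array/bit operations instead of hashing.
final class BitSetLongFilter
{
    private final long[] bits;
    private final long min;
    private final long max;

    BitSetLongFilter(long[] values, long min, long max)
    {
        this.min = min;
        this.max = max;
        // (max - min) / 64 + 1 longs cover every offset in [0, max - min]
        int words = (int) (((max - min) / 64) + 1);
        this.bits = new long[words];
        for (long value : values) {
            long offset = value - min;
            bits[(int) (offset >>> 6)] |= 1L << (offset & 63);
        }
    }

    boolean contains(long value)
    {
        if (value < min || value > max) {
            return false;
        }
        long offset = value - min;
        return (bits[(int) (offset >>> 6)] & (1L << (offset & 63))) != 0;
    }
}
```

Unlike an open hash set, the lookup cost here is constant with no probing, which is consistent with the large speedups at filter sizes 1000 and 5000 above.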

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)


@lukasz-stec left a comment


lgtm % comments

.add(BigInteger.valueOf(1));
// A Set based on a bitmap uses (max - min) / 64 longs
// Create a bitset only if it uses fewer entries than the equivalent hash set
if (range.compareTo(BigInteger.valueOf(Integer.MAX_VALUE)) > 0
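The quoted check above guards against ranges wider than Integer.MAX_VALUE. A fuller sketch of the selection heuristic might look like the following; the method name and the hash-set sizing model (power-of-two capacity at roughly 50% load) are assumptions, not Trino's exact code.

```java
import java.math.BigInteger;

// Hypothetical illustration: prefer the bitset only when its backing long[]
// would use no more entries than the equivalent open hash set's long[].
final class FilterChoice
{
    // Assumes valueCount >= 1
    static boolean useBitSet(long min, long max, int valueCount)
    {
        // max - min + 1 computed in BigInteger to avoid long overflow
        BigInteger range = BigInteger.valueOf(max)
                .subtract(BigInteger.valueOf(min))
                .add(BigInteger.ONE);
        if (range.compareTo(BigInteger.valueOf(Integer.MAX_VALUE)) > 0) {
            return false; // too wide to index with an int offset
        }
        // A bitmap over [min, max] needs ceil(range / 64) longs
        long bitSetWords = (range.longValueExact() + 63) / 64;
        // An open-addressing hash set needs roughly the next power of two
        // at or above 2 * valueCount long entries (assumed ~0.5 load factor)
        long hashSetEntries = 1L << (64 - Long.numberOfLeadingZeros(2L * valueCount - 1));
        return bitSetWords <= hashSetEntries;
    }
}
```

For example, 1000 values spread over a 32K range need only 512 bitset words versus roughly 2048 hash-set entries, so the bitset wins; 100 values over a 10M range do not.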
Member

For a bitset smaller than the L1 cache size, would it make sense to use bitset even if it uses a little bit more memory?

Member Author

Yes, the bitset is indeed faster than the open hash set at all sizes that fit in the L1 cache. However, the gap tends to be much smaller for small sets.
I'm also hesitant to increase memory usage, because this is unaccounted memory usage. So I'll keep the existing logic for now.

{
INT32_RANDOM(INTEGER, (block, r) -> INTEGER.writeLong(block, r.nextInt())),
INT64_RANDOM(BIGINT, (block, r) -> BIGINT.writeLong(block, r.nextLong())),
INT64_FIXED_32K(BIGINT, (block, r) -> BIGINT.writeLong(block, r.nextLong() % 32768)), // LongBitSetFilter
Member

Did you test smaller ranges than 32K? What ranges do we expect in practice?

Member Author

The 32K here is actually just the range of the input values. Narrowing the input range ensures that the bitset will be chosen instead of the hash set.
The size of the set is determined by filterSize, for which the benchmark parameters are 100, 1000, and 5000.

Member

I was basically asking whether we expect value ranges (max - min) smaller than 32K in practice.

Member Author

From what I've seen in benchmarks and other anecdotal cases, the joined columns tend to be dates/times or some kind of unique ID or primary key, which are usually not in a super-wide range.

@lukasz-stec
Member

Nice improvement!

return new LongBitSetFilter(values, min, max);
}

private static boolean isDirectLongComparisonValidType(Type type)
Member

We have this also in SimplifyContinuousInValues. I would move it to io.trino.type.TypeUtils

Member Author

That method is slightly different: there we are additionally looking for types where the next consecutive value can be obtained by just incrementing the underlying long by 1.
I've added a code comment and renamed the method to be clearer.
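To illustrate the distinction being drawn here, a toy sketch follows. The TypeKind enum, method names, and the grouping of concrete types are all hypothetical; they stand in for Trino's real Type hierarchy only to show that one predicate is strictly narrower than the other.

```java
// Purely illustrative, not Trino code. Two related but distinct predicates:
// direct long comparability vs. having consecutive long-encoded values.
enum TypeKind
{
    TINYINT, SMALLINT, INTEGER, BIGINT, DATE, // integer-like: raw long + 1 is the next value
    SHORT_TIMESTAMP,                          // long-backed and order-preserving (assumed)
    REAL;                                     // long-backed float bits: raw long order is wrong

    // Values fit in a single long whose natural ordering matches the type's ordering
    boolean isDirectLongComparable()
    {
        return this != REAL;
    }

    // Stricter: incrementing the underlying long yields the next consecutive value,
    // which is what SimplifyContinuousInValues-style rewrites need
    boolean hasConsecutiveLongValues()
    {
        switch (this) {
            case TINYINT:
            case SMALLINT:
            case INTEGER:
            case BIGINT:
            case DATE:
                return true;
            default:
                return false;
        }
    }
}
```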

public void setup(long inputRows)
{
// Pollute the JVM profile
for (DataSet dataSet : ImmutableList.of(DataSet.INT32_RANDOM, DataSet.INT64_FIXED_32K, DataSet.INT64_RANDOM)) {
Member

This is not guaranteed to pollute the profile. It depends on various factors, such as whether it runs just on the interpreter, C1, C2, etc. A better way to do that is to run with JMH's bulk warmup mode.

Member Author

I've made that change, but in practice I'm not finding JMH's bulk warmup mode to be the better choice.
Using it causes a multi-fold increase in the runtime of the benchmark, because it warms up every test permutation for N iterations before each benchmark, whereas what we want is to warm up each test permutation once and then run N warmup iterations for only the benchmark being measured.
Manually polluting the profile may not be guaranteed to work, but in practice I have never seen it fail.
JMH's bulk warmup mode, on the other hand, is so time-consuming that I always find myself editing the code to remove it and pollute the profile manually in order to get a JMH result in a reasonable amount of time.
Running JMH remotely is not a solution to this problem either; it still takes too much time.
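For reference, the bulk warmup mode discussed above is enabled through JMH's runner options. This is a generic harness-configuration sketch, not code from this PR; the benchmark include pattern is taken from the benchmark named in the description.

```java
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.WarmupMode;

public class RunBenchmark
{
    public static void main(String[] args) throws Exception
    {
        Options options = new OptionsBuilder()
                .include("BenchmarkDynamicPageFilter")
                // BULK: run the warmup iterations of every matched benchmark
                // (and parameter permutation) before measuring any of them,
                // so each measured run sees a profile polluted by the others
                .warmupMode(WarmupMode.BULK)
                .build();
        new Runner(options).run();
    }
}
```

This is what makes the mode thorough but slow: warmup cost scales with the full cross-product of benchmarks and parameters, which is the runtime blow-up described above.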

@raunaqmorarka raunaqmorarka merged commit 3ac3530 into trinodb:master Sep 25, 2024
95 checks passed
@raunaqmorarka raunaqmorarka deleted the drf-bitset branch September 25, 2024 03:29
@github-actions github-actions bot added this to the 459 milestone Sep 25, 2024