Filter out grouping sets based on grouping() constraints before execution #23389

Open
thermo911 opened this issue Sep 12, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@thermo911

Problem

Trino performs an aggregation for every grouping set, even when some grouping sets could be eliminated up front based on predicates over grouping(...).

Example

The query

SELECT a, b, sum(c) FROM t GROUP BY CUBE (a, b) HAVING grouping(a, b) != 3

is effectively equivalent to

SELECT a, b, sum(c) FROM t GROUP BY GROUPING SETS ((a), (b), (a, b))

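(the global aggregation, i.e. the empty grouping set, is removed, because it is the only grouping set for which grouping(a, b) evaluates to 3).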
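To make the equivalence concrete, here is a minimal Python sketch that evaluates grouping(a, b) for each grouping set of CUBE (a, b). It assumes the documented bitmask semantics of grouping(): one bit per argument, rightmost argument as the least significant bit, and a bit set to 1 when the column is not part of the grouping set. The helper names grouping_value and cube are hypothetical and exist only for this illustration.

```python
from itertools import combinations


def grouping_value(args, grouping_set):
    """Bitmask for grouping(*args): a bit is 1 when the column is NOT in the set."""
    value = 0
    for column in args:
        value = (value << 1) | (0 if column in grouping_set else 1)
    return value


def cube(columns):
    """All grouping sets produced by CUBE(columns), smallest first."""
    return [
        subset
        for size in range(len(columns) + 1)
        for subset in combinations(columns, size)
    ]


grouping_sets = cube(("a", "b"))
values = {s: grouping_value(("a", "b"), s) for s in grouping_sets}

# Apply the HAVING predicate grouping(a, b) != 3 to each grouping set.
surviving = [s for s in grouping_sets if values[s] != 3]

for s in grouping_sets:
    print(s, values[s], "kept" if s in surviving else "dropped")
```

Only the empty grouping set maps to 3, so the predicate leaves exactly (a), (b) and (a, b).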

Currently, Trino produces the following plan for this query. The aggregation is performed for every grouping set emitted by GroupIdNode, including the empty set [] whose rows the HAVING filter discards afterwards.

Output[columnNames = [a, b, _col2]]
│   Layout: [a$gid:bigint, b$gid:bigint, sum:bigint]
│   a := a$gid
│   b := b$gid
│   _col2 := sum
└─ FilterProject[filterPredicate = ("$literal$"(from_base64('CQAAAElOVF9BUlJBWQQAAAAAAwAAAAEAAAACAAAAAAAAAA=='))[(groupid + bigint '1')] <> 3)]
   │   Layout: [a$gid:bigint, b$gid:bigint, sum:bigint]
   └─ Aggregate[type = FINAL, keys = [a$gid, b$gid, groupid]]
      │   Layout: [a$gid:bigint, b$gid:bigint, groupid:bigint, sum:bigint]
      │   sum := sum(sum_0)
      └─ LocalExchange[partitioning = HASH, arguments = [a$gid, b$gid, groupid]]
         │   Layout: [a$gid:bigint, b$gid:bigint, groupid:bigint, sum_0:row(bigint, bigint)]
         └─ RemoteExchange[type = REPARTITION]
            │   Layout: [a$gid:bigint, b$gid:bigint, groupid:bigint, sum_0:row(bigint, bigint)]
            └─ Aggregate[type = PARTIAL, keys = [a$gid, b$gid, groupid]]
               │   Layout: [a$gid:bigint, b$gid:bigint, groupid:bigint, sum_0:row(bigint, bigint)]
               │   sum_0 := sum(c)
               └─ GroupId[symbols = [[], [a], [b], [a, b]]]
                  │   Layout: [a$gid:bigint, b$gid:bigint, c:bigint, groupid:bigint]
                  │   b$gid := b
                  │   a$gid := a
                  └─ TableScan[table = iceberg:default.test$data@2456677682933822434]
                         Layout: [a:bigint, b:bigint, c:bigint]
                         a := 1:a:bigint
                         b := 2:b:bigint
                         c := 3:c:bigint
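The base64 literal in the FilterProject predicate is hard to read, so the following Python sketch decodes it. The byte layout used here (little-endian length-prefixed encoding name, position count, one null-flag byte, then 32-bit values) is an assumption inferred from the bytes themselves, not a reference to Trino's serialization API, and the function name decode_int_array is hypothetical.

```python
import base64
import struct

# Literal copied verbatim from the FilterProject predicate in the plan.
LITERAL = "CQAAAElOVF9BUlJBWQQAAAAAAwAAAAEAAAACAAAAAAAAAA=="


def decode_int_array(literal):
    """Decode the literal, assuming a block without null positions."""
    raw = base64.b64decode(literal)
    (name_len,) = struct.unpack_from("<i", raw, 0)
    name = raw[4:4 + name_len].decode("ascii")
    offset = 4 + name_len
    (count,) = struct.unpack_from("<i", raw, offset)
    offset += 4
    may_have_nulls = raw[offset]  # assumed to be a single flag byte
    offset += 1
    if may_have_nulls:
        raise ValueError("this sketch does not handle null positions")
    values = list(struct.unpack_from(f"<{count}i", raw, offset))
    return name, values


encoding_name, grouping_values = decode_int_array(LITERAL)
print(encoding_name, grouping_values)
```

Under that reading, the literal is a lookup array with one grouping(a, b) value per grouping set, indexed by groupid + 1 in the predicate: 3 for [], 1 for [a], 2 for [b] and 0 for [a, b]. The filter therefore keeps every groupid except the one for the empty grouping set.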

Expected behavior

In the example above, Trino should determine which grouping sets are actually needed and remove the others from the query plan before execution.
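A rough sketch of the requested pruning, in Python rather than in Trino's optimizer code: given the grouping sets of a GroupId node, the grouping() value associated with each set, and the predicate from the filter, keep only the sets that can satisfy the predicate. The function prune_grouping_sets and its parameters are hypothetical and do not correspond to an existing Trino rule; the sketch also assumes the predicate depends only on the grouping() value.

```python
def prune_grouping_sets(grouping_sets, grouping_values, predicate):
    """Return the grouping sets whose grouping() value passes the predicate."""
    if len(grouping_sets) != len(grouping_values):
        raise ValueError("expected one grouping() value per grouping set")
    return [
        grouping_set
        for grouping_set, value in zip(grouping_sets, grouping_values)
        if predicate(value)
    ]


# Grouping sets as listed in GroupId[symbols = [[], [a], [b], [a, b]]],
# with the grouping(a, b) value of each set under the bitmask definition.
grouping_sets = [[], ["a"], ["b"], ["a", "b"]]
grouping_values = [3, 1, 2, 0]

pruned = prune_grouping_sets(grouping_sets, grouping_values, lambda g: g != 3)
print(pruned)
```

With the HAVING predicate from the example, the result is [[a], [b], [a, b]], which matches the GROUPING SETS rewrite shown earlier in the issue.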

@hashhar hashhar added the enhancement New feature or request label Sep 13, 2024
@wendigo
Contributor

wendigo commented Sep 17, 2024

@raunaqmorarka can you take a look? :)
