WIP Add partitioning push down #23432

dain · 2024-09-16T04:02:12Z

Description

Add partitioning push down to table scans, which a connector can use to activate optional partitioning or to choose between multiple partitioning strategies. This replaces the existing Metadata makeCompatiblePartitioning and getCommonPartitioningHandle methods, which were used exclusively by Hive, with a more generic applyPartitioning method.

Hive has been updated to the new system, and now only applies bucketed execution when it is actually used in the coordinator. This can improve performance when parallelism is limited by the bucketing and the bucketing isn't necessary for the query.

Iceberg has also been updated to support bucketed execution. This applies the same optimizations available to Hive, which allow the engine to eliminate unnecessary redistribution of tables. Additionally, since Iceberg supports multiple independent partitioning functions, a table can effectively have multiple distributions, which makes the optimization even more effective.
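
To illustrate why multiple distributions help, here is a minimal sketch of how a planner could pick one of several bucketings offered by a table. It is a simplified model, not the SPI in this PR: the class PartitioningChoiceSketch, the TablePartitioning type, and the choose method are all hypothetical names, and preferring the highest bucket count is an assumption made for the example. The only rule it relies on is that a table bucketed on a set of columns is already grouped for any requirement whose columns include all of the bucketing columns.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class PartitioningChoiceSketch {
    // Hypothetical stand-in for one bucketing a table offers, for example bucket(customer_id, 16).
    public static final class TablePartitioning {
        public final Set<String> columns;
        public final int bucketCount;

        public TablePartitioning(Set<String> columns, int bucketCount) {
            this.columns = columns;
            this.bucketCount = bucketCount;
        }
    }

    // Returns a bucketing whose columns are all contained in the columns the engine
    // wants to partition on, so no redistribution is needed. Among usable candidates,
    // the one with the most buckets wins (an assumption of this sketch).
    public static Optional<TablePartitioning> choose(
            List<TablePartitioning> available,
            Set<String> requiredColumns) {
        TablePartitioning best = null;
        for (TablePartitioning candidate : available) {
            boolean usable = !candidate.columns.isEmpty()
                    && requiredColumns.containsAll(candidate.columns);
            if (usable && (best == null || candidate.bucketCount > best.bucketCount)) {
                best = candidate;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<TablePartitioning> available = List.of(
                new TablePartitioning(Set.of("customer_id"), 16),
                new TablePartitioning(Set.of("region"), 4));

        // A requirement on (customer_id, order_id) can reuse the customer_id bucketing.
        System.out.println(choose(available, Set.of("customer_id", "order_id")).isPresent());
        // A requirement on order_id alone matches nothing, so the table stays unpartitioned.
        System.out.println(choose(available, Set.of("order_id")).isPresent());
    }
}
```

With a single fixed bucketing, only queries that group or join on that one column set benefit. With several independent bucketings, more queries find a usable candidate.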

Iceberg bucket execution can be controlled with the iceberg.bucket-execution-mode configuration property and the bucket_execution session property. The mode can be set to NEVER, AUTO, or ALWAYS. AUTO is the default and enables bucketed execution when the bucket count is equal to or greater than the current node count.
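
The mode rule described above can be sketched as follows. This is an illustration of the stated behavior only: the class BucketExecutionSketch, the Mode enum, and the useBucketedExecution method are hypothetical names and do not reflect the actual classes in this PR.

```java
public class BucketExecutionSketch {
    // Stand-in for the three modes named in the description.
    public enum Mode { NEVER, AUTO, ALWAYS }

    // Decides whether bucketed execution applies, following the description:
    // NEVER disables it, ALWAYS enables it, and AUTO enables it only when
    // the bucket count is at least the current node count.
    public static boolean useBucketedExecution(Mode mode, int bucketCount, int nodeCount) {
        switch (mode) {
            case NEVER:
                return false;
            case ALWAYS:
                return true;
            case AUTO:
                return bucketCount >= nodeCount;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(useBucketedExecution(Mode.AUTO, 16, 8));   // true: 16 buckets cover 8 nodes
        System.out.println(useBucketedExecution(Mode.AUTO, 4, 8));    // false: bucketing would limit parallelism
        System.out.println(useBucketedExecution(Mode.ALWAYS, 4, 8));  // true: forced on
    }
}
```

The AUTO threshold reflects the concern raised for Hive: when there are fewer buckets than nodes, bucketed execution caps parallelism, so it is only worth enabling automatically when every node can receive at least one bucket.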

TODO

Iceberg tests similar to the Hive tests
Change Hive bucket execution configuration and session properties to match new Iceberg properties

Follow-up Work

AddExchanges does not propagate preferred partitioning through joins, which reduces the effectiveness of compatible partitioning used in Hive and Iceberg
Iceberg support for mismatched buckets
Add stable node-bucket assignments in system assigned bucketing to improve file system caching

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X) Release notes are required, with the following suggested text:

# SPI
* Add partitioning push down. ({issue}`issuenumber`)

# Iceberg
* Add bucketed execution which can improve performance when running a join or aggregation on a bucketed table.

Add support for pushing plan partitioning into Iceberg when Iceberg tables use hash bucketed partitioning. This enables co-located joins, which can be significantly more efficient. Additionally, since Iceberg supports multiple independent partitioning functions, a table can effectively have multiple distributions, which makes the optimization more effective. This feature can be controlled with the iceberg.bucket-execution-mode configuration property and the bucket_execution session property. The mode can be set to NEVER, AUTO, or ALWAYS. AUTO is the default and enables bucketed execution when the bucket count is equal to or greater than the current node count.

dain added 5 commits September 15, 2024 20:38

Fix plan rendering in Hive test failure messages

23cf63b

Rename HiveBucketHandle to HiveTablePartitioning

a6958ef

Optimize ActualProperties compatible checks

ba97284

Check boolean nullsAndAnyReplicated field before more complex fields

Improve connector partitioning JavaDocs

b9b5805

Add bucketCount to getSplitBucketFunction

0766ed6

cla-bot bot added the cla-signed label Sep 16, 2024

github-actions bot added the iceberg (Iceberg connector) and hive (Hive connector) labels Sep 16, 2024

dain force-pushed the apply-partitioning branch from 8a6f521 to 3e3dcb9 on September 16, 2024 06:46

dain added 5 commits September 21, 2024 09:36

Support partition functions with no bucket count preference

ae974d9

Use automatic system bucket assignment in Hive node partitioning

2321964

WIP disable semi-join partitioning pushdown

1ebd691

dain force-pushed the apply-partitioning branch from 2177896 to 1ebd691 on September 21, 2024 16:37
