
[KYUUBI #6560] Support removing user-specified repartition before writing when using zorder #6561

Open: wants to merge 2 commits into master

huangxiaopingRD (Contributor) commented Jul 24, 2024

🔍 Description

Issue References 🔗

This pull request fixes #6560

Describe Your Solution 🔧

  • Add RemoveRepartitionBeforeInsertInto, which removes a user-specified repartition before writing when zorder is used
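To illustrate the idea, here is a minimal Scala sketch on a toy plan model. It is not Kyuubi's implementation: the Plan, Scan, Repartition, and InsertInto types, the zorderEnabled field, and the enabled flag are hypothetical stand-ins for Spark's logical plan nodes, a table's zorder setting, and a user-controlled switch. It only shows the shape of the rewrite: when the user has opted in and the target table uses zorder, repartition nodes directly above the written query are dropped, so the Rebalance applied before writing decides the parallelism.

```scala
// Hypothetical toy plan model; these are NOT Spark's LogicalPlan classes.
sealed trait Plan
final case class Scan(table: String) extends Plan
final case class Repartition(numPartitions: Int, child: Plan) extends Plan
final case class InsertInto(table: String, zorderEnabled: Boolean, child: Plan) extends Plan

// Peel off any Repartition nodes stacked directly on top of the query being written.
def stripTopRepartitions(plan: Plan): Plan = plan match {
  case Repartition(_, child) => stripTopRepartitions(child)
  case other                 => other
}

// Sketch of a RemoveRepartitionBeforeInsertInto-style rewrite.
// `enabled` stands in for a user-controlled, off-by-default switch (hypothetical).
def removeRepartitionBeforeInsertInto(plan: Plan, enabled: Boolean): Plan = plan match {
  case InsertInto(table, true, child) if enabled =>
    InsertInto(table, zorderEnabled = true, stripTopRepartitions(child))
  case other => other // not opted in, or no zorder: keep the user's distribution
}

val write = InsertInto("target", zorderEnabled = true, Repartition(1, Scan("source")))
println(removeRepartitionBeforeInsertInto(write, enabled = true))
// prints InsertInto(target,true,Scan(source))
```

Stripping only the top of the plan is a deliberate simplification here; repartitions deeper in a query may serve joins or aggregations and are left alone in this sketch.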

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

When the user adds an unsuitable repartition, enabling zorder does not achieve a good compression ratio, and the overall execution time increases significantly.

Behavior With This Pull Request 🎉

With the user-specified repartition removed, the Rebalance runs with more appropriate parallelism, resulting in a higher compression ratio and shorter execution time.

Related Unit Tests

ZorderSuiteBase / test("Check remove user specify repartition as expected")


Checklist 📝

Be nice. Be informative.

codecov-commenter commented Jul 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (d3e1768) to head (3bff311).

@@          Coverage Diff           @@
##           master   #6561   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         677     677           
  Lines       41907   41907           
  Branches     5721    5721           
======================================
  Misses      41907   41907           


pan3793 (Member) commented Jul 24, 2024

Generally, we should always respect the data distribution a user explicitly requests, to avoid surprising users. My view may be too pessimistic, but breaking the user-requested data distribution may cause correctness issues in corner cases, so I'm -0 on introducing this feature. I'm OK with it if other committers accept it, as long as we disable it by default and explain the dangers in the docs.

huangxiaopingRD (Contributor, Author)


Your concern is valid. Whether this feature is enabled is decided by the user, not by the platform. It is similar to Spark's RemoveAllHints.

Successfully merging this pull request may close these issues.

[Improvement] Support removing user-specified repartition before writing when using zorder
3 participants