fix: Optimize not to call getNullCount as much as possible #820

Draft
wants to merge 9 commits into
base: main


kazuyukitanimura
Contributor

Which issue does this PR close?

Rationale for this change

This PR reduces the number of calls to getNullCount as much as possible.

What changes are included in this PR?

Calling getNullCount is expensive, so this PR reuses the previously computed null count rather than recalculating it.
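As a rough illustration of the idea (not the code in this PR), the sketch below memoizes a null count so the expensive computation runs at most once. The class `CachedNullCount` and its methods are hypothetical names introduced for this example, and the `IntSupplier` merely stands in for whatever call actually computes the count.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper: computes a null count once and reuses it afterwards. */
final class CachedNullCount {
  // Stand-in for an expensive call such as a vector's getNullCount().
  private final IntSupplier expensiveNullCount;
  // -1 means "not computed yet"; a real null count is never negative.
  private int cached = -1;

  CachedNullCount(IntSupplier expensiveNullCount) {
    this.expensiveNullCount = expensiveNullCount;
  }

  /** Returns the null count, invoking the expensive supplier only on first use. */
  int numNulls() {
    if (cached < 0) {
      cached = expensiveNullCount.getAsInt();
    }
    return cached;
  }

  /** Answers "are there any nulls?" from the cached count. */
  boolean hasNull() {
    return numNulls() > 0;
  }

  /** Must be called whenever the underlying data changes, e.g. when a batch is reloaded. */
  void invalidate() {
    cached = -1;
  }
}
```

A cache like this is only correct while the underlying data is unchanged, so the sketch assumes the owner calls `invalidate()` whenever the buffer is refilled or reused.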

How are these changes tested?

Existing tests

@kazuyukitanimura kazuyukitanimura marked this pull request as ready for review August 13, 2024 04:16
@kazuyukitanimura
Contributor Author

Before

Screenshot 2024-08-13 at 3 22 53 PM

After

Screenshot 2024-08-13 at 3 22 38 PM

@codecov-commenter

Codecov Report

Attention: Patch coverage is 5.22388% with 127 lines in your changes missing coverage. Please review.

Project coverage is 54.02%. Comparing base (4fe43ad) to head (a773832).
Report is 9 commits behind head on main.

Files                                                      Patch %   Lines
...in/java/org/apache/arrow/c/CometArrayExporter.java      0.00%     54 Missing ⚠️
...n/java/org/apache/comet/vector/CometMapVector.java      0.00%     16 Missing ⚠️
...main/java/org/apache/comet/vector/CometVector.java      0.00%     13 Missing ⚠️
...ava/org/apache/comet/vector/CometStructVector.java      0.00%     9 Missing ⚠️
...ain/scala/org/apache/comet/vector/NativeUtil.scala      0.00%     8 Missing ⚠️
.../java/org/apache/comet/vector/CometLazyVector.java      0.00%     6 Missing ⚠️
...in/java/org/apache/comet/parquet/ColumnReader.java      0.00%     4 Missing ⚠️
.../src/main/java/org/apache/comet/parquet/Utils.java      20.00%    4 Missing ⚠️
...org/apache/comet/vector/CometDictionaryVector.java      0.00%     4 Missing ⚠️
.../java/org/apache/comet/vector/CometListVector.java      0.00%     3 Missing ⚠️
... and 4 more
Additional details and impacted files
@@              Coverage Diff              @@
##               main     #820       +/-   ##
=============================================
+ Coverage     33.94%   54.02%   +20.07%     
+ Complexity      874      854       -20     
=============================================
  Files           112      110        -2     
  Lines         42916    10559    -32357     
  Branches       9464     2000     -7464     
=============================================
- Hits          14567     5704     -8863     
+ Misses        25379     3857    -21522     
+ Partials       2970      998     -1972     


@kazuyukitanimura
Contributor Author

For some reason the Run codecov/codecov-action@v3 step keeps failing. It does not seem related to this change, but I am not sure why it fails.

@andygrove
Member

I am now running benchmarks with this PR

@andygrove
Member

I ran my local TPC-DS benchmark and it doesn't show any improvement for that benchmark.

tpcds_allqueries (4)

@kazuyukitanimura Do you see an improvement with any of the microbenchmark queries?

@kazuyukitanimura
Contributor Author

Thanks @andygrove
Hmm, how many iterations did you use for your local benchmark?

I used q27.
I will try to run some microbenchmarks to showcase...

@andygrove
Member

> Thanks @andygrove hmmm how many iteration was used for your local benchmark?
>
> I used q27. I will try to run some microbenchmarks to showcase...

This was the average of 3 runs of all 99 TPC-DS queries at the 100 GB scale factor.

@kazuyukitanimura
Contributor Author

@andygrove I ran q27 30 times. The improvement is not as large as I had hoped, but the numbers do improve.

Before

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.6.1
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
q27: Comet (Scan, Exec)                            4001           4183          98         72.5          13.8       1.0X

After

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.6.1
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
q27: Comet (Scan, Exec)                            3847           4104         122         75.4          13.3       1.0X
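
To put the two tables side by side, here is a small sketch of the arithmetic. The helper `BenchmarkDelta.percentFaster` is a hypothetical name introduced only for this example; the inputs are the best and average times reported above.

```java
/** Hypothetical helper for comparing two benchmark timings. */
final class BenchmarkDelta {
  /** Percentage reduction in elapsed time when going from beforeMs to afterMs. */
  static double percentFaster(double beforeMs, double afterMs) {
    return (beforeMs - afterMs) / beforeMs * 100.0;
  }
}
```

With the figures above, `percentFaster(4183, 4104)` comes to roughly 1.9% on average time and `percentFaster(4001, 3847)` to roughly 3.8% on best time. The 79 ms difference in average time is smaller than the reported standard deviations (98 ms and 122 ms), so the gain should be read with that noise in mind.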

Member

@andygrove andygrove left a comment


@kazuyukitanimura
Contributor Author

Thanks, I realized there might be a better way to do this. Please hold off on merging.

@kazuyukitanimura kazuyukitanimura marked this pull request as draft August 20, 2024 00:24
3 participants