IMPALA-9509: Add built-in function array_contains #8
Open
zhangyifan27 wants to merge 5 commits into master
and SYNC_HMS_EVENTS_STRICT_MODE

The commit documents query options SYNC_HMS_EVENTS_WAIT_TIME_S and SYNC_HMS_EVENTS_STRICT_MODE.

Url: https://impala.apache.org/docs/build/html/topics/impala_set.html

Change-Id: Ia11663c5e84794d4bca658124cde59bf97aa7158
Reviewed-on: http://gerrit.cloudera.org:8080/23592
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
37ac26b to 220ff77
This patch adds benchmarks for the Byte Stream Split encoding. It
compares different ways to use the decoder.
I added benchmarks for the following comparisons:
* Compile-time vs. runtime-initialized decoder
* Float vs. Int vs. Double vs. Long vs. 6- and 11-byte types
* Repeating vs. sequential vs. randomly ordered data
* Decoding one by one vs. in batch vs. with a stride (!= byte_size)
* Small vs. medium (10x small) vs. large (100x small) stride
Conclusions:
* Passing the byte size as a template parameter is almost 5 times
as fast as passing it in the constructor.
* The size of the type heavily influences the speed.
* The data ordering doesn't influence the speed at all.
* Reading values in batch is much faster than one by one.
* The stride size has a small influence on the speed.
For more details and graphs, see:
https://docs.google.com/spreadsheets/d/129LwvR6gpZInlRhlVWktn6Haugwo_fnloAAYfI0Qp2s
Change-Id: I708af625348b0643aa3f37525b8a6e74f0c47057
Reviewed-on: http://gerrit.cloudera.org:8080/23401
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
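For readers unfamiliar with the encoding being benchmarked: Byte Stream Split (a Parquet encoding) stores byte k of every fixed-width value contiguously in stream k. The following is a minimal Python sketch of that layout and of the one-by-one vs. batch decoding patterns compared above. It is not Impala's C++ decoder; the function names are hypothetical, and stride handling and the non-standard 6- and 11-byte widths are omitted because `struct` only covers standard widths.

```python
import struct


def bss_encode(values, fmt="<f"):
    """Hypothetical encoder: stream k holds byte k of every value, in order."""
    width = struct.calcsize(fmt)
    raw = b"".join(struct.pack(fmt, v) for v in values)
    n = len(values)
    # Outer loop over byte position k, inner loop over value index j.
    return bytes(raw[j * width + k] for k in range(width) for j in range(n))


def bss_decode_one(data, index, n, fmt="<f"):
    """Decode a single value by gathering one byte from each of the streams."""
    width = struct.calcsize(fmt)
    value_bytes = bytes(data[k * n + index] for k in range(width))
    return struct.unpack(fmt, value_bytes)[0]


def bss_decode_batch(data, n, fmt="<f"):
    """Decode all n values by scattering each stream back into place."""
    width = struct.calcsize(fmt)
    raw = bytearray(n * width)
    for k in range(width):
        # Stream k fills byte position k of every value in one slice copy.
        raw[k::width] = data[k * n:(k + 1) * n]
    return [v[0] for v in struct.iter_unpack(fmt, raw)]
```

Decoding a single value touches `width` bytes that lie `n` positions apart, while the batch version copies each stream with one slice assignment. That difference may help explain why batch reads came out much faster, although this sketch does not reproduce the benchmark.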
On Python 3, when Impyla receives a result containing a string that is not valid UTF-8, it returns that value as bytes. TPC-DS Q30 at scale 20 has a result that contains invalid UTF-8, so bin/run-workload.py can fail while trying to dump it to JSON.

This modifies CustomJSONEncoder to serialize bytes by converting them to a string, with invalid UTF-8 sequences escaped using backslashes.

Testing:
- Ran bin/run-workload.py against TPC-DS scale 20

Change-Id: Ibe31c656de4fc65f8580c7b3b49bf655b8a5ecea
Reviewed-on: http://gerrit.cloudera.org:8080/23602
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
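The bytes handling described above can be sketched with the standard `json.JSONEncoder.default` hook and the `backslashreplace` decode error handler. The class name comes from the commit message. The body is an assumption, because the real encoder in bin/run-workload.py may also handle other types, so only a bytes branch is shown here.

```python
import json


class CustomJSONEncoder(json.JSONEncoder):
    """Sketch: a JSON encoder that also accepts bytes values."""

    def default(self, obj):
        if isinstance(obj, bytes):
            # Decode as UTF-8. Each invalid byte becomes a backslash escape
            # such as \xff instead of raising UnicodeDecodeError.
            return obj.decode("utf-8", errors="backslashreplace")
        # Defer to the base class, which raises TypeError for unknown types.
        return super(CustomJSONEncoder, self).default(obj)
```

For example, `json.dumps({"value": b"caf\xc3\xa9 \xff"}, cls=CustomJSONEncoder)` keeps the valid UTF-8 sequence as `é` and renders the stray `0xff` byte as the literal text `\xff`, so the dump no longer fails.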
This modifies bin/single_node_perf_run.py to stop using the sh Python package, replacing sh with calls to subprocess. It also stops installing sh in both the Python 2 and Python 3 virtualenvs.

Testing:
- Ran the perf-AB-test job with it and examined the logs

Change-Id: Ic5f9316a5d83c5c0dc37d4a94c55b6a655765fe3
Reviewed-on: http://gerrit.cloudera.org:8080/23600
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
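The commit does not show the replacement code, so the following is only a sketch of the general pattern for replacing sh with the standard library. The helpers `run_cmd` and `run_cmd_streaming` are hypothetical. Annotations are avoided and `universal_newlines=True` is used (rather than the Python 3.7+ `text=True`) so that the same code runs under both the Python 2 and Python 3 virtualenvs mentioned above.

```python
import subprocess
import sys


def run_cmd(args, cwd=None):
    """Hypothetical helper: run a command and return its stripped stdout.

    Raises subprocess.CalledProcessError on a non-zero exit status.
    universal_newlines=True returns text on both Python 2.7 and Python 3.
    """
    output = subprocess.check_output(args, cwd=cwd, universal_newlines=True)
    return output.strip()


def run_cmd_streaming(args, cwd=None):
    """Hypothetical helper: run a command whose output goes straight to our stdout/stderr."""
    subprocess.check_call(args, cwd=cwd)


if __name__ == "__main__":
    # Run the current interpreter as a portable example command.
    print(run_cmd([sys.executable, "-c", "print('ok')"]))
```

Passing the command as a list of arguments, rather than a shell string, avoids shell quoting issues, which is broadly how sh also passes arguments.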
0a87b76 to d89bc3f
This patch adds support for the built-in function array_contains(). The function supports the primitive types BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE and STRING. NULL values are handled as follows:
- Returns NULL if the array is NULL
- Returns NULL if the search element is NULL
- Skips NULL elements within the array during comparison

Limitations:
- TIMESTAMP, DECIMAL and DATE types are not supported.
- Complex types (ARRAY<ARRAY<T>>, ARRAY<STRUCT<...>>) are not supported.

Testing:
- EE tests are added in test_array_contains.py
- Note: BE unit tests in expr-test.cc are not added because IMPALA-9559 (Implement constructors for complex types) and IMPALA-11893 (Allow cast from NULL to complex types) are not yet implemented. Without them, ARRAY literals cannot be constructed in SQL expressions for testing.

Change-Id: I751cc9f6c7f785f5269c7203fd753ec9aa2a9e78
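To make the NULL rules above concrete, here is a small Python reference model in which `None` stands in for SQL NULL. It only restates the semantics listed in the commit message and is not Impala's backend implementation. Element comparison uses Python `==`, which is an assumption. An empty or all-NULL array is modeled as returning false, which follows from skipping NULL elements but is also an assumption.

```python
def array_contains(arr, value):
    """Reference model of array_contains() NULL handling (None = SQL NULL)."""
    if arr is None or value is None:
        # A NULL array or a NULL search element yields NULL.
        return None
    for element in arr:
        if element is None:
            # NULL elements are skipped and never match.
            continue
        if element == value:
            return True
    # No non-NULL element matched (includes empty and all-NULL arrays).
    return False
```

Under this model, `array_contains([1, None, 3], 2)` returns `False` rather than NULL, because the NULL element is skipped instead of making the comparison unknown.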
d89bc3f to 497bfef