IMPALA-9509: Add built-in function array_contains#8

Open
zhangyifan27 wants to merge 5 commits into master from IMPALA-9509

@zhangyifan27
Owner

No description provided.

and SYNC_HMS_EVENTS_STRICT_MODE

This commit documents the query options SYNC_HMS_EVENTS_WAIT_TIME_S
and SYNC_HMS_EVENTS_STRICT_MODE.

Url: https://impala.apache.org/docs/build/html/topics/impala_set.html

Change-Id: Ia11663c5e84794d4bca658124cde59bf97aa7158
Reviewed-on: http://gerrit.cloudera.org:8080/23592
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
ThePomfrit and others added 3 commits November 19, 2025 17:49
This patch adds benchmarks for the Byte Stream Split encoding. They
compare different ways of using the decoder.

I added benchmarks for the following comparisons:
  * Compile-time VS runtime initialized decoder
  * Float VS Int VS Double VS Long VS 6- and 11-byte types
  * Repeating VS Sequential VS Random ordered data
  * Decoding one by one VS in batch VS with stride (!= byte_size)
  * Small VS Medium (10x small) VS Large (100x small) stride

Conclusions:
  * Passing the byte size as a template parameter is almost 5 times
    as fast as passing it in the constructor.
  * The size of the type heavily influences the speed.
  * The data variation doesn't influence the speed at all.
  * Reading values in batch is much faster than one-by-one.
  * The stride sizes have a small influence on the speed.

For more details and graphs, go to
https://docs.google.com/spreadsheets/d/129LwvR6gpZInlRhlVWktn6Haugwo_fnloAAYfI0Qp2s
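
To make the one-by-one and batch access patterns concrete, here is a minimal Python sketch of the Byte Stream Split layout: for values that are `byte_size` bytes wide, the encoded buffer holds `byte_size` streams, and stream k holds byte k of every value. The function names are hypothetical and this is not Impala's C++ decoder or its benchmark code; it only models the layout the benchmarks exercise.

```python
import struct


def bss_encode(values, fmt="<f"):
    """Encode values so that stream k holds byte k of every value."""
    byte_size = struct.calcsize(fmt)
    raw = [struct.pack(fmt, v) for v in values]
    return b"".join(bytes(r[k] for r in raw) for k in range(byte_size))


def bss_decode_one(data, num_values, byte_size, index):
    """Gather the bytes of a single value: one read from each stream."""
    return bytes(data[k * num_values + index] for k in range(byte_size))


def bss_decode_batch(data, num_values, byte_size):
    """Decode all values at once by walking the streams in parallel."""
    streams = [data[k * num_values:(k + 1) * num_values]
               for k in range(byte_size)]
    return [bytes(column) for column in zip(*streams)]


if __name__ == "__main__":
    values = [1.0, 2.5, -3.0]
    encoded = bss_encode(values)
    decoded = [struct.unpack("<f", b)[0]
               for b in bss_decode_batch(encoded, len(values), 4)]
    print(decoded)
```

In this model the one-by-one path recomputes `byte_size` scattered offsets for every value, while the batch path slices each stream once, which is one plausible reading of why the batch benchmarks come out faster.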

Change-Id: I708af625348b0643aa3f37525b8a6e74f0c47057
Reviewed-on: http://gerrit.cloudera.org:8080/23401
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
On Python 3, when Impyla receives a result with a string that is
not valid UTF-8, it returns that string as bytes. TPC-DS Q30 at
scale 20 has a result that contains invalid UTF-8, so
bin/run-workload.py can fail while trying to dump it to JSON.

This modifies CustomJSONEncoder to serialize bytes by converting
them to a string, with invalid unicode handled with backslashes.
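
The approach described above can be sketched as follows. This is a minimal illustration, not the actual CustomJSONEncoder from the patch: the class name `BytesSafeJSONEncoder` and the helper `dump_row` are hypothetical, and the use of the `backslashreplace` error handler is an assumption based on the phrase "handled with backslashes".

```python
import json


class BytesSafeJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder that serializes bytes instead of raising TypeError."""

    def default(self, obj):
        if isinstance(obj, bytes):
            # Assumption: invalid UTF-8 bytes are escaped, e.g. b"\xff" -> "\\xff".
            return obj.decode("utf-8", errors="backslashreplace")
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


def dump_row(row):
    """Serialize a result row that may contain bytes values."""
    return json.dumps(row, cls=BytesSafeJSONEncoder)
```

With this sketch, `dump_row({"v": b"ab\xff"})` succeeds where plain `json.dumps` would raise `TypeError`, and valid UTF-8 bytes decode to their ordinary text.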

Testing:
 - Ran bin/run-workload.py against TPC-DS scale 20

Change-Id: Ibe31c656de4fc65f8580c7b3b49bf655b8a5ecea
Reviewed-on: http://gerrit.cloudera.org:8080/23602
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This modifies bin/single_node_perf_run.py to stop using the sh
python package. It replaces sh with calls to subprocess. It
stops installing sh for both the Python 2 and 3 virtualenvs.
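
As a rough sketch of what replacing an sh call with subprocess can look like, the helper below runs a command and returns its output as text. The name `run_cmd` is hypothetical and is not taken from bin/single_node_perf_run.py; only standard-library calls are used.

```python
import subprocess
import sys


def run_cmd(args):
    """Hypothetical helper: run a command and return its stdout as text.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    # universal_newlines=True makes check_output return text rather than
    # bytes, and is accepted by both Python 2 and Python 3.
    return subprocess.check_output(args, universal_newlines=True)


def python_version_string():
    """Example use: ask the current interpreter for its version."""
    return run_cmd([sys.executable, "-c",
                    "import sys; print(sys.version_info[0])"]).strip()
```

Passing the command as a list avoids shell quoting issues, and the raised exception takes the place of checking an exit code by hand.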

Testing:
 - Ran perf-AB-test job with it and examined the logs

Change-Id: Ic5f9316a5d83c5c0dc37d4a94c55b6a655765fe3
Reviewed-on: http://gerrit.cloudera.org:8080/23600
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
zhangyifan27 force-pushed the IMPALA-9509 branch 3 times, most recently from 0a87b76 to d89bc3f, November 20, 2025 09:16
This patch adds support for the built-in function array_contains().

The function supports the primitive types BOOLEAN, TINYINT, SMALLINT,
INT, BIGINT, FLOAT, DOUBLE and STRING. NULL values are handled as
follows:
- Returns NULL if the array is NULL
- Returns NULL if the search element is NULL
- Skips NULL elements within the array during comparison
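
The NULL rules above can be modeled in a few lines of Python, with `None` standing in for SQL NULL. This is only a sketch of the described semantics, not the Impala implementation; in particular, returning False when no non-NULL element matches is an assumption drawn from the "skips NULL elements" rule.

```python
def array_contains(arr, elem):
    """Model of the described semantics; None stands in for SQL NULL."""
    # NULL array or NULL search element: the result is NULL.
    if arr is None or elem is None:
        return None
    # NULL elements are skipped; only non-NULL elements are compared.
    return any(x == elem for x in arr if x is not None)


if __name__ == "__main__":
    print(array_contains([1, None, 3], 3))
    print(array_contains(None, 3))
```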

Limitations:
 - TIMESTAMP, DECIMAL and DATE types are not supported.
 - Complex types (ARRAY<ARRAY<T>>, ARRAY<STRUCT<...>>) are not
   supported.

Testing:
 - EE tests are added in test_array_contains.py
 - Note: BE unit tests in expr-test.cc are not added because
   IMPALA-9559 (Implement constructors for complex types) and
   IMPALA-11893 (Allow cast from NULL to complex types) are not
   yet implemented, which prevents constructing ARRAY literals
   in SQL expressions for testing purposes.

Change-Id: I751cc9f6c7f785f5269c7203fd753ec9aa2a9e78
4 participants