Unconditionally pushdown varchar predicate to Clickhouse #23516


ssheikin
Contributor

@ssheikin ssheikin commented Sep 20, 2024

ClickHouse collation is case-sensitive.
ClickHouse has the same sort ordering as Trino.

Per https://clickhouse.com/docs/en/sql-reference/statements/show#show_columns
ClickHouse has no per-column collations.
ClickHouse strings are UTF-8 encoded and compared byte by byte.
https://clickhouse.com/docs/en/sql-reference/statements/select/order-by#collation-support
This is exactly how Trino compares them.
https://github.com/airlift/slice/blob/2.2/src/main/java/io/airlift/slice/Slice.java#L1205
That's why all operations on varchars may be pushed down.
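
To illustrate the argument above, here is a minimal Java sketch of byte-by-byte comparison over UTF-8 encoded strings. It is not the Trino or ClickHouse implementation: the class `ByteOrderSketch` and the helper `compareUtf8` are hypothetical names, and the sketch assumes that both engines treat each byte as unsigned when comparing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteOrderSketch {
    // Hypothetical helper: orders two strings by their UTF-8 bytes,
    // treating each byte as an unsigned value (assumption, see lead-in).
    public static int compareUtf8(String left, String right) {
        byte[] l = left.getBytes(StandardCharsets.UTF_8);
        byte[] r = right.getBytes(StandardCharsets.UTF_8);
        return Arrays.compareUnsigned(l, r);
    }

    public static void main(String[] args) {
        // Case-sensitive: 'a' (0x61) and 'A' (0x41) are different bytes.
        System.out.println(compareUtf8("apple", "Apple") != 0);
        // Uppercase ASCII sorts before lowercase: 'Z' (0x5A) < 'a' (0x61).
        System.out.println(compareUtf8("Zebra", "apple") < 0);
        // Multi-byte characters sort after ASCII: 0x7A < 0xC3 (first byte of U+00E9).
        System.out.println(compareUtf8("z", "\u00e9") < 0);
    }
}
```

If the remote database and the engine agree on this ordering, then equality, range, and ordering predicates on varchar values select the same rows on either side, which is the condition the pushdown relies on.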

Description

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Section
* Push down varchar predicates to ClickHouse unconditionally. ({issue}`23516`)

@ebyhr
Member

ebyhr commented Sep 20, 2024

ClickHouse supports collation at an index level. https://clickhouse.com/docs/en/sql-reference/statements/show#show-index
What happens if the pushed-down query uses the index?

@ssheikin
Contributor Author

@ebyhr IIUC a database does not use an index if the query condition does not match the index parameters.
In the case of ClickHouse, index collation is just the ordering of the values within the index.

> collation - The sorting of the column in the index: A if ascending, D if descending, NULL if unsorted. (Nullable(String))

> if the pushed-down query uses the index

it means that the index ordering matches the order requested by the query, and ClickHouse executes the query faster.

@raunaqmorarka raunaqmorarka merged commit f3751cf into trinodb:master Sep 21, 2024
16 checks passed
@github-actions github-actions bot added this to the 459 milestone Sep 21, 2024
@ssheikin ssheikin deleted the ssheikin/52/trino/clickhouse-PredicatePushdownController branch September 21, 2024 11:22
@ebyhr
Member

ebyhr commented Sep 21, 2024

> Per https://clickhouse.com/docs/en/sql-reference/statements/show#show_columns
> ClickHouse has no per-column collations

Checking the SHOW COLUMNS docs alone is insufficient. We should also check whether the database supports collation when creating tables, starting the instance, and so on.

Actually, ClickHouse supports specifying column collation for new tables:

CREATE TABLE test (x varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL) ENGINE = Memory;

However, it is only accepted at the syntax level; as far as I tested locally, it doesn't affect results.
