Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize column listing to not require fetching table properties #23429

Merged
merged 8 commits into from
Sep 27, 2024

Conversation

hashhar
Copy link
Member

@hashhar hashhar commented Sep 15, 2024

Description

Before this change table metadata was fetched when listing columns which
ended up fetching table properties for JDBC connectors which incurs I/O
in connectors which support them.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# ClickHouse, SQL Server
* Improve performance of listing columns. ({issue}`23429`)

@hashhar
Copy link
Member Author

hashhar commented Sep 15, 2024

while working on this I noticed that we do confusing things in some places.

e.g. ConnectorMetadata#getTableMetadata#getColumns excludes columns like $path in Hive connector but ConnectorMetadata#getColumnHandles includes them. I guess this is to prevent "materializing" those hidden columns in other SQL operations but this makes for confusion. And this is also what prevents applying this optimisation to non-JDBC connectors (or even JDBC connectors too in future if they start returning different columns from getTableMetadata#getColumns VS getColumnHandles.

@hashhar
Copy link
Member Author

hashhar commented Sep 15, 2024

For Ignite and Phoenix specifically I'll send a follow-up PR to "unhide" the autogenerated columns we add in case the user doesn't specify primary key (which is required for those systems).

@hashhar hashhar force-pushed the hashhar/table-property-optimisation branch from 4f0a72a to b1d47f7 Compare September 15, 2024 15:07
@hashhar hashhar force-pushed the hashhar/table-property-optimisation branch from b1d47f7 to 6461364 Compare September 16, 2024 14:47
@hashhar
Copy link
Member Author

hashhar commented Sep 16, 2024

@kokosing PTAL at the fixups + Add tests to count I/O operations in JDBC connectors (which is the only new commit, it replaces Disable connection reuse in connection counting tests).

@hashhar hashhar force-pushed the hashhar/table-property-optimisation branch from 6461364 to d0fe3ad Compare September 16, 2024 15:01
Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done!

This makes it possible to inject a different `ConnectionFactory` in SQL
Server connector.
This test is only an approximation since we assume that for each query a
new connection is being opened today. In future we can implement a more
thorough test by wrapping Connection, Statement and PreparedStatement
and counting invocations of various methods.
Before this change table metadata was fetched when listing columns which
ended up fetching table properties for JDBC connectors which incurs I/O
in connectors which support them.
@hashhar hashhar force-pushed the hashhar/table-property-optimisation branch from d0fe3ad to b82fbdf Compare September 26, 2024 14:17
@hashhar
Copy link
Member Author

hashhar commented Sep 26, 2024

Squashed the fixups, will merge after CI.

@hashhar hashhar merged commit 3332095 into trinodb:master Sep 27, 2024
66 of 67 checks passed
@github-actions github-actions bot added this to the 460 milestone Sep 27, 2024
@hashhar hashhar deleted the hashhar/table-property-optimisation branch September 27, 2024 17:39
@mosabua mosabua mentioned this pull request Sep 27, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants