feat(go/adbc/driver/snowflake): improve GetObjects performance and semantics #2254
Conversation
CC @davidhcoe Can you take a look at this and confirm it fixes your issues, with the exception of the All and Columns depths?
```go
if before[len(before)-1] != '\\' {
	b.WriteByte('\\')
}
```
I suppose this is to handle pre-escaped characters? But what if the escape is itself escaped? (Or is that not allowed?)
I guess not from our own spec 😅, we specify escapes aren't supported at all
> I guess not from our own spec 😅, we specify escapes aren't supported at all
Yeah, I pointed this out in #1508
Maybe it's time I open a branch for 1.2.0...
> Maybe it's time I open a branch for 1.2.0...
I've started some work on that in conjunction with multiple results sets. https://github.com/CurtHagenlocher/arrow-adbc/tree/MoreResults
Yeah, this is intended to handle pre-escaped characters. The logic is taken from Snowflake's JDBC driver; I figured handling one level of escaping was sufficient given our current spec. 😄
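The one-level escaping being discussed can be sketched as a small helper. `escapeForLike` below is a hypothetical stand-in for the driver's actual escaping routine, shown only to illustrate the `before[len(before)-1] != '\\'` idea: wildcards get a backslash unless one is already there, so pre-escaped input is escaped exactly once.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeForLike is an illustrative sketch (not the driver's code): single
// quotes are doubled for the SQL string literal, and a backslash is inserted
// before the LIKE wildcards _ and % only when the character is not already
// preceded by a backslash, so pre-escaped input passes through unchanged.
func escapeForLike(s string) string {
	var b strings.Builder
	var prev rune
	for _, c := range s {
		switch c {
		case '\'':
			b.WriteString("''") // double single quotes inside the literal
		case '_', '%':
			if prev != '\\' {
				b.WriteByte('\\') // escape the wildcard unless already escaped
			}
			b.WriteRune(c)
		default:
			b.WriteRune(c)
		}
		prev = c
	}
	return b.String()
}

func main() {
	fmt.Println(escapeForLike("MY_TABLE"))  // MY\_TABLE
	fmt.Println(escapeForLike(`MY\_TABLE`)) // MY\_TABLE (unchanged)
	fmt.Println(escapeForLike("it's"))      // it''s
}
```

As the comments above note, a doubly-escaped input (`\\_`) is not handled here either, matching the one-level assumption.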
> Maybe it's time I open a branch for 1.2.0...
>
> I've started some work on that in conjunction with multiple results sets. https://github.com/CurtHagenlocher/arrow-adbc/tree/MoreResults
Oh this is great!
```go
gQueryIDs.Go(func() error {
	return conn.Raw(func(driverConn any) (err error) {
		query := "SHOW TERSE /* ADBC:getObjectsDBSchemas */ DATABASES"
		if catalog != nil && len(*catalog) > 0 && *catalog != "%" && *catalog != ".*" {
			query += " LIKE '" + escapeSingleQuoteForLike(*catalog) + "'"
		}
		query += " IN ACCOUNT"

		terseDbQueryID, err = getQueryID(gQueryIDsCtx, query, driverConn)
		return
	})
```
Some of these are repeated across cases. Could we extract them out of the switch-case to avoid the duplication?
Thanks @zeroshade! Any rough performance numbers?
@joellubi Most of the performance actually came from the improved handling of the channels rather than the query switch itself. The way the channels were being handled caused bottlenecks: we weren't using buffered channels, and the record reader was being passed through a channel instead of just being used directly. Switching up the management of the channels led to about a 25% improvement in performance by removing the blocking. My tests showed a drop from ~5s to ~3.5s for a large GetObjects scenario. About 2/3 of the time is the raw Snowflake execution, which for the ADBC account takes a total of around 2-3 seconds depending on the query.
Ah cool, the record reader handling is much cleaner now. Not sure why I did it that way originally. Good catch on increasing the buffer size for the channel. I did think that could be a bottleneck, which is why I didn't make it unbuffered, but I didn't think it would be so significant. I also couldn't think of a value to use that didn't feel somewhat arbitrary. Maybe it's worth making it configurable?
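The buffering effect under discussion can be shown with a minimal producer/consumer sketch (the function name and sizes are made up for illustration; the driver's actual code differs). With a buffer of zero every send blocks until the consumer is ready, forcing the producer and consumer into lock-step; a buffered channel lets the producer run ahead, which is where the removed blocking came from.

```go
package main

import "fmt"

// sumViaChannel illustrates the buffering trade-off: buffer == 0 makes every
// send block until a receive is ready, while buffer > 0 lets the producer
// run ahead of the consumer. The result is identical either way; only the
// amount of blocking differs.
func sumViaChannel(buffer, n int) int {
	ch := make(chan int, buffer)
	go func() {
		for i := 0; i < n; i++ {
			ch <- i // blocks only when the buffer is full
		}
		close(ch)
	}()
	total := 0
	for v := range ch {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sumViaChannel(0, 10)) // unbuffered: 45
	fmt.Println(sumViaChannel(8, 10)) // buffered: 45, with far less lock-step blocking
}
```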
@joellubi I'll switch it over.
```go
return conn.Raw(func(driverConn any) (err error) {
	query := "SHOW TERSE /* ADBC:getObjectsCatalogs */ DATABASES"
	if catalog != nil && len(*catalog) > 0 && *catalog != "%" && *catalog != ".*" {
		query += " LIKE '" + escapeSingleQuoteForLike(*catalog) + "'"
```
I believe this will be a case-sensitive search (LIKE), and will it also treat names with underscores as wildcards?
The `LIKE` keyword in the `SHOW` commands is actually case-insensitive according to the docs (https://docs.snowflake.com/en/sql-reference/sql/show-tables). But it does treat underscores like a `LIKE` comparison, though we do say in the docs that the arguments for "catalog" and such are treated as patterns if they include wildcards like `_` and `%`.
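Those pattern semantics can be sketched by translating a LIKE pattern into a regular expression (purely illustrative; the driver does no such translation): `_` matches exactly one character, `%` matches any run, and the `SHOW ... LIKE` comparison is case-insensitive.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// likeToRegexp is an illustrative sketch of SHOW ... LIKE semantics:
// `_` matches exactly one character, `%` matches any run of characters,
// and matching is case-insensitive. Everything else is matched literally.
func likeToRegexp(pattern string) *regexp.Regexp {
	var b strings.Builder
	b.WriteString("(?i)^")
	for _, c := range pattern {
		switch c {
		case '_':
			b.WriteByte('.') // exactly one character
		case '%':
			b.WriteString(".*") // any run of characters
		default:
			b.WriteString(regexp.QuoteMeta(string(c)))
		}
	}
	b.WriteString("$")
	return regexp.MustCompile(b.String())
}

func main() {
	fmt.Println(likeToRegexp("MY_DB").MatchString("my0db"))      // true: _ is a wildcard
	fmt.Println(likeToRegexp("MY%").MatchString("MY_OTHER_DB"))  // true
	fmt.Println(likeToRegexp("MY_DB").MatchString("MYDB"))       // false: _ must consume one char
}
```

This is why a catalog name like `MY_DB`, passed through unescaped, can match `MYXDB` as well, the behavior the review comment above is pointing at.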
Finally got the unit tests and validation tests passing for this. Can I get one more review pass please?
```diff
@@ -2180,15 +2180,15 @@ void StatementTest::TestSqlBind() {
   ASSERT_THAT(
       AdbcStatementSetSqlQuery(
-          &statement, "SELECT * FROM bindtest ORDER BY \"col1\" ASC NULLS FIRST", &error),
+          &statement, "SELECT * FROM bindtest ORDER BY col1 ASC NULLS FIRST", &error),
```
Do we perhaps need a quirk for escaping column names?
(I also wouldn't be opposed to trying to make these tests more data-driven...I should go find time to sketch it out)
It's more about consistency. Our `CREATE TABLE` query earlier in this function doesn't quote the column names, so our SELECT statement needs to also not quote the names. Almost everywhere else we quote the columns; we just need to be consistent.

That said, I agree it would be awesome for these tests to be more data-driven.
Fixes #2171
Improves the channel handling and query building for metadata conversion to Arrow for better performance.
For all cases except when retrieving Column metadata, we'll now utilize `SHOW` queries and build the patterns into those queries. This allows those `GetObjects` calls with appropriate depths to be called without having to specify a current database or schema.
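The depth behavior the description claims can be sketched as a simple dispatch (a hypothetical illustration; the depth names mirror ADBC's object depths, but this function is not part of the driver): every depth short of Columns is served by `SHOW` commands alone, so no current database/schema is needed on the session.

```go
package main

import "fmt"

// usesShowQueries is an illustrative sketch of the dispatch described above:
// the Catalogs, DBSchemas, and Tables depths can be answered entirely with
// SHOW commands, while Columns (and therefore All) still needs per-column
// metadata that SHOW alone doesn't provide.
func usesShowQueries(depth string) bool {
	switch depth {
	case "catalogs", "dbSchemas", "tables":
		return true
	default: // "columns" and "all"
		return false
	}
}

func main() {
	for _, d := range []string{"catalogs", "dbSchemas", "tables", "columns"} {
		fmt.Printf("%s: SHOW-only=%v\n", d, usesShowQueries(d))
	}
}
```

This also matches the review thread at the top: the remaining open question was the All and Columns depths, which are exactly the cases not covered by the `SHOW`-only path.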