Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

snowflake: cannot call GetObjects with null catalog #2171

Closed
davidhcoe opened this issue Sep 19, 2024 · 18 comments · Fixed by #2254
Closed

snowflake: cannot call GetObjects with null catalog #2171

davidhcoe opened this issue Sep 19, 2024 · 18 comments · Fixed by #2254
Labels
Type: bug Something isn't working

Comments

@davidhcoe
Copy link
Contributor

What happened?

The recent updates to the Snowflake driver using SQL-based queries do not allow for querying across catalogs and tables. For example, per

/// \param[in] catalog Only show tables in the given catalog. If NULL,
, I should be able to pass a NULL value for the catalog and ignore filtering by catalog to locate all tables , regardless of catalog, and if the value is an empty string, be able to find the tables that don't belong to a catalog.

The current SQL requires that I already have a database context, and the information_schema is queried that is tied to that catalog that is passed in.

Stack Trace

No response

How can we reproduce the bug?

No response

Environment/Setup

No response

@davidhcoe davidhcoe added the Type: bug Something isn't working label Sep 19, 2024
@davidhcoe
Copy link
Contributor Author

fyi @joellubi, @zeroshade

@joellubi
Copy link
Member

Hi @davidhcoe, can you share the error you're getting?

We explicitly test this case (note that the catalogFilter field is missing which means we're passing a nil pointer, and we expect to find all resources in this case).

I'm wondering if this could be related to snowflake insisting that a default database is set, and perhaps the session is missing one.

@davidhcoe
Copy link
Contributor Author

yes, the specific error is:

[Snowflake] 090105 (22000): Cannot perform SELECT. This session does not have a current database. Call 'USE DATABASE', or use a qualified name.

But looking at the SQL query it doesn't appear to allow for the empty strings or null entries, but I will setup and run the test you pointed to and try it that way as well.

@davidhcoe
Copy link
Contributor Author

davidhcoe commented Sep 20, 2024

The test works, but I think only because the URI already has the database in it which sets the context.

@joellubi
Copy link
Member

The test works, but I think only because the URI already has the database in it which sets the context.

This is true. I believe this is a Snowflake quirk where a default database must be set in order to use unqualified schema names (see USE DATABASE). This makes sense in most cases, but doesn't seem like it should be necessary for special schemas like information_schema.

The only ways I have been able to get this to work have been to use a default database for the session, or to qualify the schema in the query (e.g. database_name.information_schema). Snowflake requires the database name in both cases.

Was this working before? Prior to this change we used SHOW TERSE DATABASES to get the catalogs, which may not have required a database to be set for the session.

I think this can present issues for users aiming to use GetObjects to discover the databases available before connecting to one. Would it make sense to allow users with no database currently set for their session to issue GetObjects(depth=Catalog) and get the result of SHOW TERSE DATABASES to help bridge this gap before they can add a database to the session?

@lidavidm
Copy link
Member

I think that's reasonable. Is it also possible to detect this case and issue an error if the user tries to query the catalog (beyond the databases) without a session set?

Either way it would be good to make sure this is also explained in the documentation

@joellubi
Copy link
Member

Yes I think we can document the behavior and add some informative errors to guide users toward the correct usage.

@davidhcoe
Copy link
Contributor Author

It did work previously. In our scenario, the user can browse their databases and schemas before connecting to a table.

But shouldn’t you be able to call GetObjects with null values even at the column depth and have it bring in all columns across all tables (even if it’s not performant)?

@joellubi
Copy link
Member

You're right, it appears that information_schema.databases must be qualified with a valid database name but shows all databases regardless while information_schema.schemata only enumerates objects within the current database.

I was hoping we could make this a thinner wrapper around existing snowflake functionality, but it doesn't appear that the existing APIs have quite the right semantics. I can work on a change for the next release that allows the search to fan out to all databases matching the search.

@davidhcoe
Copy link
Contributor Author

Thanks. The metadata calls have been a consistent performance bottleneck, particularly as we look to switch from ODBC to ADBC. In ODBC, these calls are consistently less than 1 second but the ADBC calls are usually 2.5+ seconds.

@joellubi
Copy link
Member

Are you able to tell, either from the snowflake query log or from the driver source code if you have access, what APIs the ODBC driver is currently using to achieve this performance?

@davidhcoe
Copy link
Contributor Author

I’ll try to pull together the specifics on Monday.

joellubi pushed a commit that referenced this issue Oct 2, 2024
…atalog depth (#2194)

This PR partially addresses
#2171 by calling SHOW TERSE
DATABASES if GetObjects is called with the catalog depth and a null
catalog is passed.

It does not address other parts of
#2171 where a catalog can be
null at other depths.

---------

Co-authored-by: David Coe <[email protected]>
@zeroshade
Copy link
Member

@davidhcoe was the merged issue #2194 sufficient to fix this problem and we can close this issue?

@davidhcoe
Copy link
Contributor Author

No. #2194 only addressed being able to query catalogs with a null catalog value. However, the spec allows the ability to pass a null catalog and search all schemas and tables, but that still can't be done because it requires a database context with the current code.

@zeroshade
Copy link
Member

Okay, and were you able to figure out what query or APIs the ODBC driver was using per @joellubi's comment here?

@davidhcoe
Copy link
Contributor Author

Yes. I had an email thread with Joel on the topic, but I'll add here that the ODBC stack is making the following calls:

Call 1: 
show WAREHOUSES

Call 2:
use "PERF_DB"

Call 3:
show schemas in database "PERF_DB"

Call 4:
show /* ODBC:ColumnMetadataSource */ columns in table "PERF_DB"."PERF_SCHEMA"."TZLOOKUP"

Call 5:
show WAREHOUSES

Call 6:
select "LOCATIONID",
    "BOROUGH",
    "ZONE",
    "SERVICE_ZONE"
from "PERF_DB"."PERF_SCHEMA"."TZLOOKUP"
LIMIT 1000 OFFSET 0

@zeroshade
Copy link
Member

@davidhcoe that example has use "PERF_DB" so it is using a database context to perform the subsequent queries. Do you have an example with the ODBC driver where you are trying to get this metadata information without a database context like you're attempting to do with ADBC? Or does the ODBC driver also require having a database context?

You mentioned that the ODBC version is consistently <1 second while the ADBC version is 2+ seconds, but the ODBC calls in the example aren't performing the equivalent task to what we're talking about? That ODBC call stack is only fetching the list of schemas in a single database, and then the metadata columns in a single table. That's not the equivalent to null catalog example with ADBC, and the ODBC version is using a database context.

Can we get the ODBC calls for the equivalent task to the full search without a database context so we can see what queries the driver is making?

@CurtHagenlocher
Copy link
Contributor

My very very vague recollection about the ODBC driver -- and this comes from troubleshooting the end result rather than seeing what kind of queries it emits -- is that it will actually do the "use" statement to switch catalogs. This means that if you use SQLTables to list tables across catalogs, it ends up running a separate backend command for each catalog.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants