
Snowflake Polaris Iceberg: NoSuchTableException: Table does not exist at location #504

Open
ambaricloud opened this issue Aug 3, 2024 · 7 comments
Labels: bug Something isn't working

@ambaricloud
Search before asking

  • I searched the issues and found no similar issues.

Please describe the bug 🐞

I created an Iceberg table in the Snowflake Polaris internal catalog via Spark and can perform all Iceberg table operations. The job fails when I try to convert the table to Delta via XTable.

cat polaris_ice_to_delta_orders_config.yaml

sourceFormat: ICEBERG
targetFormats:
  - DELTA
datasets:
  - tableBasePath: s3://ambaricloudsatya/prod/orders/
    tableName: orders
java -cp "utilities-0.1.0-beta1-bundled.jar:iceberg-aws-1.3.1.jar:bundle-2.23.9.jar" io.onetable.utilities.RunSync --datasetConfig polaris_ice_to_delta_orders_config.yaml

SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at [jar:file:/Users/satyak/iceberg/demo/xtable/utilities-0.1.0-beta1-bundled.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2024-08-03 14:00:59 INFO io.onetable.utilities.RunSync:141 - Running sync for basePath s3://ambaricloudsatya/prod/orders/ for following table formats [DELTA]
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/satyak/iceberg/demo/xtable/utilities-0.1.0-beta1-bundled.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2024-08-03 14:01:03 INFO io.onetable.client.OneTableClient:264 - No previous OneTable sync for target. Falling back to snapshot sync.
2024-08-03 14:01:04 ERROR io.onetable.utilities.RunSync:164 - Error running sync for s3://ambaricloudsatya/prod/orders/
org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at location: s3://ambaricloudsatya/prod/orders
at org.apache.iceberg.hadoop.HadoopTables.load(HadoopTables.java:97) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.iceberg.IcebergTableManager.lambda$getTable$1(IcebergTableManager.java:58) ~[utilities-0.1.0-beta1-bundled.jar:?]
at java.util.Optional.orElseGet(Optional.java:369) ~[?:?]
at io.onetable.iceberg.IcebergTableManager.getTable(IcebergTableManager.java:58) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.iceberg.IcebergSourceClient.initSourceTable(IcebergSourceClient.java:81) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.iceberg.IcebergSourceClient.getSourceTable(IcebergSourceClient.java:59) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.iceberg.IcebergSourceClient.getCurrentSnapshot(IcebergSourceClient.java:129) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:36) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.client.OneTableClient.syncSnapshot(OneTableClient.java:164) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.client.OneTableClient.sync(OneTableClient.java:122) ~[utilities-0.1.0-beta1-bundled.jar:?]
at io.onetable.utilities.RunSync.main(RunSync.java:162) ~[utilities-0.1.0-beta1-bundled.jar:?]

Are you willing to submit a PR?

  • I am willing to submit a PR!
  • I am willing to submit a PR but need help getting started!

Code of Conduct

@ambaricloud ambaricloud added the bug Something isn't working label Aug 3, 2024
@the-other-tim-brown (Contributor)

@ambaricloud you'll need to specify --icebergCatalogConfig in your sync command so that the table can be read from the catalog. Check out item 4 in the README for running the bundled jar.

@ambaricloud (Author)

@the-other-tim-brown Thank you. For Glue, I used the catalog config below. I need to check the same for Polaris.

catalogImpl: org.apache.iceberg.aws.glue.GlueCatalog
catalogName: onetable
catalogOptions:
  io-impl: org.apache.iceberg.aws.s3.S3FileIO
  warehouse: s3://ambaricloudsatya/prod

@the-other-tim-brown (Contributor)

> For glue, I used the below catalog config. I need to check the same for Polaris.
You will need to use Polaris since you created the Iceberg table in Polaris. The catalog config should match the Iceberg catalog that owns the table.
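A hedged sketch of what a Polaris-flavored catalog config could look like (the key names follow the Iceberg REST catalog options; the file name, URI, credential, and scope values are placeholders you'd need to verify against your own Polaris account):

```yaml
# Hypothetical catalog.yaml for --icebergCatalogConfig; values are assumptions
catalogImpl: org.apache.iceberg.rest.RESTCatalog
catalogName: iceberg_catalog
catalogOptions:
  uri: https://<polaris-id>.snowflakecomputing.com/polaris/api/catalog
  credential: <client-id>:<client-secret>
  scope: PRINCIPAL_ROLE:ALL
  io-impl: org.apache.iceberg.aws.s3.S3FileIO
```

Without a catalog config, the utility falls back to loading the table from the base path via HadoopTables.load, which is what produces the NoSuchTableException in the stack trace above.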

@jeremyakers
Hi @the-other-tim-brown - Quick question on this comment:

> You will need to use Polaris
Is Polaris (or generic REST catalogs) supported by XTable? I'm trying to find this out and I didn't see anything in the docs about Polaris or generic REST catalogs in general.

@vinishjail97 (Contributor)

@jeremyakers Yes, it does. You need to follow the instructions for Polaris to register an Iceberg table present in storage:
https://polaris.io/#section/Quick-Start/Defining-a-Catalog

Similar instructions for Unity, Glue, etc. can be found here. If you are able to get it working for Polaris, do you mind sharing the commands? We can add docs similar to the Glue and Unity catalog ones:

https://xtable.apache.org/docs/unity-catalog#register-the-target-table-in-databricks-unity-catalog
https://xtable.apache.org/docs/glue-catalog#register-the-target-table-in-glue-data-catalog

@sagarlakshmipathy (Contributor)

I ran into an issue while using Snowflake's Polaris catalog. Documenting it here.

java -cp /Users/sagarl/Downloads/iceberg-spark-runtime-3.4_2.12-1.4.1.jar:/Users/sagarl/latest/incubator-xtable/xtable-utilities/target/xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:/Users/sagarl/Downloads/bundle-2.20.160.jar:/Users/sagarl/Downloads/url-connection-client-2.20.160.jar org.apache.xtable.utilities.RunSync --datasetConfig config.yaml --icebergCatalogConfig catalog.yaml

Error

2024-09-20 22:55:30 INFO  org.apache.iceberg.RemoveSnapshots:328 - Cleaning up expired files (local, incremental)
2024-09-20 22:55:31 ERROR org.apache.xtable.spi.sync.TableFormatSync:78 - Failed to sync snapshot
org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Delegate access to table with user-specified write location is temporarily not supported.
	at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:157) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.ErrorHandlers$CommitErrorHandler.accept(ErrorHandlers.java:88) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.ErrorHandlers$CommitErrorHandler.accept(ErrorHandlers.java:71) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.HTTPClient.throwFailure(HTTPClient.java:183) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:292) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:226) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.HTTPClient.post(HTTPClient.java:337) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.RESTClient.post(RESTClient.java:112) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.RESTTableOperations.commit(RESTTableOperations.java:152) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.BaseTransaction.lambda$commitSimpleTransaction$3(BaseTransaction.java:416) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.BaseTransaction.commitSimpleTransaction(BaseTransaction.java:412) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:307) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.xtable.iceberg.IcebergConversionTarget.completeSync(IcebergConversionTarget.java:221) ~[xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
	at org.apache.xtable.spi.sync.TableFormatSync.getSyncResult(TableFormatSync.java:165) ~[xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
	at org.apache.xtable.spi.sync.TableFormatSync.syncSnapshot(TableFormatSync.java:70) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
	at org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:182) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
	at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:118) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
	at org.apache.xtable.utilities.RunSync.main(RunSync.java:191) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]

The sync did not fully complete at this point: the table gets created in the target format in the catalog, but it doesn't have any data in it.

config.yaml

sourceFormat: HUDI
targetFormats:
  - ICEBERG
datasets:
  -
    tableBasePath: s3://xtable-demo-bucket/spark_demo/people
    tableName: people
    partitionSpec: city:VALUE
    namespace: spark_demo

catalog.yaml

catalogImpl: org.apache.iceberg.rest.RESTCatalog
catalogName: iceberg_catalog
catalogOptions:
  io-impl: org.apache.iceberg.aws.s3.S3FileIO
  warehouse: iceberg_catalog
  uri: https://<polaris-id>.snowflakecomputing.com/polaris/api/catalog
  credential: <client-id>:<client-secret>
  header.X-Iceberg-Access-Delegation: vended-credentials
  scope: PRINCIPAL_ROLE:ALL
  client.region: us-west-2

I could access the table from pyspark using the following command:

pyspark --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160,software.amazon.awssdk:url-connection-client:2.20.160 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.defaultCatalog=polaris \
--conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.polaris.type=rest \
--conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.polaris.uri=https://<polaris-id>.snowflakecomputing.com/polaris/api/catalog \
--conf spark.sql.catalog.polaris.credential=<client-id>:<client-secret> \
--conf spark.sql.catalog.polaris.warehouse=iceberg_catalog \
--conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:my_spark_admin_role \
--conf spark.sql.catalog.polaris.client.region=us-west-2
>>> spark.sql("USE spark_demo")
DataFrame[]
>>> spark.sql("SHOW TABLES").show()
+----------+----------+-----------+                                             
| namespace| tableName|isTemporary|
+----------+----------+-----------+
|spark_demo|    people|      false|
|spark_demo|test_table|      false|
+----------+----------+-----------+

>>> spark.sql("SELECT * FROM people").show()
+-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+----+---------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name| id|name|age|city|create_ts|
+-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+----+---------+
+-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+----+---------+

>>> 

@sagarlakshmipathy (Contributor)

I believe this is a separate issue, so I'm tracking it here: #545

5 participants