Incorrect schema used when using time-travel #11162

Open
1 of 3 tasks
ghost opened this issue Sep 18, 2024 · 2 comments
Labels
bug Something isn't working

Comments

ghost commented Sep 18, 2024

Apache Iceberg version

1.5.0

Query engine

Spark

Please describe the bug 🐞

When time-traveling to a previous version of a table by snapshot ID, the table's current schema is used instead of the snapshot's schema, contrary to the documentation.
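To make the expected behavior concrete, here is a minimal pure-Python sketch of how a reader could pick the schema for a time-travel query. It assumes simplified table metadata modeled on the Iceberg table spec, where each snapshot records the `schema-id` that was current when it was committed. The helper names (`schema_by_id`, `schema_for_snapshot`, `field_names`), the fallback to the current schema, and the sample snapshot IDs are hypothetical illustrations, not Iceberg's Java API.

```python
from typing import Any, Dict, List

# Hypothetical, simplified table metadata modeled on the Iceberg table spec:
# each schema has a "schema-id", and each snapshot records the "schema-id"
# that was current when it was committed.
Metadata = Dict[str, Any]


def schema_by_id(metadata: Metadata, schema_id: int) -> Dict[str, Any]:
    """Return the schema whose "schema-id" matches schema_id."""
    for schema in metadata["schemas"]:
        if schema["schema-id"] == schema_id:
            return schema
    raise KeyError(f"no schema with id {schema_id}")


def schema_for_snapshot(metadata: Metadata, snapshot_id: int) -> Dict[str, Any]:
    """Pick the schema a time-travel read of snapshot_id should use.

    Uses the snapshot's own schema-id; falling back to the current schema
    when a snapshot has no schema-id is an assumption of this sketch.
    """
    for snapshot in metadata["snapshots"]:
        if snapshot["snapshot-id"] == snapshot_id:
            schema_id = snapshot.get("schema-id", metadata["current-schema-id"])
            return schema_by_id(metadata, schema_id)
    raise KeyError(f"no snapshot with id {snapshot_id}")


def field_names(schema: Dict[str, Any]) -> List[str]:
    """List column names in schema order."""
    return [field["name"] for field in schema["fields"]]


# Sample metadata mirroring the reproduction below: field 3 is "col" in
# schema 0 and "value" in schema 1 after the rename. Snapshot IDs are made up.
METADATA: Metadata = {
    "current-schema-id": 1,
    "schemas": [
        {"schema-id": 0, "fields": [
            {"id": 1, "name": "id", "type": "long"},
            {"id": 2, "name": "data", "type": "string"},
            {"id": 3, "name": "col", "type": "float"},
        ]},
        {"schema-id": 1, "fields": [
            {"id": 1, "name": "id", "type": "long"},
            {"id": 2, "name": "data", "type": "string"},
            {"id": 3, "name": "value", "type": "float"},
        ]},
    ],
    "snapshots": [
        {"snapshot-id": 100, "schema-id": 0},
        {"snapshot-id": 200, "schema-id": 1},
    ],
}

if __name__ == "__main__":
    print(field_names(schema_for_snapshot(METADATA, 100)))  # ['id', 'data', 'col']
```

Under this model, reading snapshot 100 should expose `col`, while only the latest snapshot exposes `value`.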

Reproduction code:

# Assumes spark_session is a SparkSession already configured with an Iceberg catalog

# Create the table
spark_session.sql("CREATE TABLE iceberg_test (id bigint, data string, col float) USING iceberg")

# Populate the table
spark_session.sql("INSERT INTO iceberg_test VALUES (1, 'a', 1.0), (2, 'b', 2.0), (3, 'c', 3.0)")

# Rename 'col' to 'value'
spark_session.sql("ALTER TABLE iceberg_test RENAME COLUMN col TO value")

# Insert a new row
spark_session.sql("INSERT INTO iceberg_test VALUES (4, 'd', 4.0)")

# Time-travel to the first snapshot_id listed in iceberg_test.snapshots
first_snapshot_id = spark_session.sql(
    "SELECT snapshot_id FROM iceberg_test.snapshots ORDER BY committed_at LIMIT 1"
).first()["snapshot_id"]
snapshot_1 = spark_session.sql(f"SELECT * FROM iceberg_test VERSION AS OF {first_snapshot_id}")

# Filter on the column's original name 'col', which is correct for this snapshot
snapshot_1.filter("col == 2.0").show()

We end up with the following error:

Py4JJavaError: An error occurred while calling o111.showString.
: org.apache.iceberg.exceptions.ValidationException: Cannot find field 'col' in struct: struct<1: id: optional long, 2: data: optional string, 3: value: optional float>
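The struct in the error shows field 3 under its current name, `value`, which suggests the filter is bound against the table's current schema rather than the first snapshot's. Because Iceberg identifies columns by field ID, the rename changes field 3's name but not its ID. A minimal sketch of name-based binding, assuming schemas simplified to field ID -> name maps, shows why `col` resolves against the snapshot schema but not the current one. `bind`, `translate`, and `BindError` are hypothetical stand-ins, not Iceberg classes.

```python
from typing import Dict

# Simplified schemas from the reproduction, as field ID -> column name.
# Field 3 keeps its ID across the rename; only its name changes.
SNAPSHOT_1_SCHEMA: Dict[int, str] = {1: "id", 2: "data", 3: "col"}
CURRENT_SCHEMA: Dict[int, str] = {1: "id", 2: "data", 3: "value"}


class BindError(Exception):
    """Hypothetical stand-in for the ValidationException in the report."""


def bind(name: str, schema: Dict[int, str]) -> int:
    """Resolve a column name to its field ID in the given schema."""
    for field_id, field_name in schema.items():
        if field_name == name:
            return field_id
    raise BindError(f"Cannot find field '{name}'")


def translate(name: str, from_schema: Dict[int, str], to_schema: Dict[int, str]) -> str:
    """Return the name to_schema uses for the field called `name` in from_schema."""
    return to_schema[bind(name, from_schema)]


if __name__ == "__main__":
    print(bind("col", SNAPSHOT_1_SCHEMA))  # 3
    try:
        bind("col", CURRENT_SCHEMA)  # mirrors the reported failure
    except BindError as err:
        print(err)  # Cannot find field 'col'
    print(translate("col", SNAPSHOT_1_SCHEMA, CURRENT_SCHEMA))  # value
```

Binding by field ID rather than by name is what lets both names refer to the same column; the sketch only models that idea and does not reflect how Iceberg's Spark integration is structured internally.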

NOTES:

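  • snapshot_1.printSchema() confirms that the field is named col, not value (its name after the rename, as in the latest snapshot), so the DataFrame's schema matches the first snapshot; the filter, however, is validated against the current struct shown in the error.
  • The error also occurs when time-traveling through the Spark DataFrame API instead of SQL.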

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
jishangarg commented

Hi @fides-bot, could you tell me which version of Spark you are using?

ghost commented Sep 19, 2024

Hi @jishangarg, we're using Spark 3.5.1
