When using time travel to read a previous version of a table by snapshot ID, the table's current schema is used instead of the snapshot's schema, contrary to the documentation.
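For context, Iceberg table metadata keeps every schema the table has had, and each snapshot can record the `schema-id` that was current when it was committed. A rename like the one in the reproduction only adds a new schema; it does not create a snapshot. The toy model below is a minimal sketch under those assumptions. The class names, the `schema_for_snapshot` helper and the snapshot IDs `101`/`102` are hypothetical and are not Iceberg's actual API. It shows the lookup that time travel is expected to perform: resolve the snapshot's own schema rather than the current one.

```python
from __future__ import annotations

from dataclasses import dataclass


# Toy model of Iceberg table metadata (hypothetical names, not Iceberg's API).
@dataclass(frozen=True)
class Schema:
    schema_id: int
    columns: dict[int, str]  # field ID -> column name


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    schema_id: int  # schema that was current when the snapshot was committed


@dataclass
class TableMetadata:
    schemas: dict[int, Schema]
    current_schema_id: int
    snapshots: dict[int, Snapshot]

    def current_schema(self) -> Schema:
        return self.schemas[self.current_schema_id]

    def schema_for_snapshot(self, snapshot_id: int) -> Schema:
        # Expected time-travel behaviour: use the snapshot's own schema.
        return self.schemas[self.snapshots[snapshot_id].schema_id]


# Metadata mirroring the reproduction steps; snapshot IDs are made up.
metadata = TableMetadata(
    schemas={
        0: Schema(0, {1: "id", 2: "data", 3: "col"}),    # after CREATE TABLE
        1: Schema(1, {1: "id", 2: "data", 3: "value"}),  # after RENAME COLUMN
    },
    current_schema_id=1,
    snapshots={
        101: Snapshot(101, schema_id=0),  # first INSERT
        102: Snapshot(102, schema_id=1),  # second INSERT
    },
)

print(metadata.schema_for_snapshot(101).columns[3])  # col
print(metadata.current_schema().columns[3])          # value
```

Field ID 3 survives the rename, so the data files written under either name can still be read. Only the name used to refer to the column depends on which schema is chosen.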
Reproduction code:
```python
# Create the table
spark_session.sql("CREATE TABLE iceberg_test (id bigint, data string, col float)")
# Populate the table
spark_session.sql("INSERT INTO iceberg_test values (1, 'a', 1.0), (2, 'b', 2.0), (3, 'c', 3.0)")
# Rename 'col' to 'value'
spark_session.sql("ALTER TABLE iceberg_test RENAME COLUMN col TO value")
# Insert a new row
spark_session.sql("INSERT INTO iceberg_test values (4, 'd', 4.0)")
# Time-travel to the first snapshot_id provided by iceberg_test.snapshots
snapshot_1 = spark_session.sql("SELECT * FROM iceberg_test VERSION AS OF <INSERT SNAPSHOT ID>")
# Operation on the renamed field
snapshot_1.filter("col == 2.0").show()
```
We end up with the following error:
```
Py4JJavaError: An error occurred while calling o111.showString.
: org.apache.iceberg.exceptions.ValidationException: Cannot find field 'col' in struct: struct<1: id: optional long, 2: data: optional string, 3: value: optional float>
```
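The struct in the error message lists the renamed `value` column, which suggests that the filter `col == 2.0` is being bound against the current table schema rather than against the snapshot's schema. The sketch below is a simplified model of name-to-field-ID binding that illustrates this reading of the error. The `bind_column` helper and `ValidationError` class are hypothetical and are not Iceberg's actual binder.

```python
class ValidationError(Exception):
    """Stand-in for Iceberg's ValidationException (hypothetical)."""


def bind_column(struct: dict, name: str) -> int:
    """Resolve a column name to its field ID in a schema given as {field ID: name}."""
    for field_id, column_name in struct.items():
        if column_name == name:
            return field_id
    fields = ", ".join(f"{fid}: {col}" for fid, col in struct.items())
    raise ValidationError(f"Cannot find field '{name}' in struct: struct<{fields}>")


# Schemas from the reproduction, keyed by field ID (types omitted for brevity).
snapshot_schema = {1: "id", 2: "data", 3: "col"}   # schema of the first snapshot
current_schema = {1: "id", 2: "data", 3: "value"}  # table schema after the rename

# Binding against the snapshot's schema succeeds: 'col' is field ID 3.
print(bind_column(snapshot_schema, "col"))  # 3

# Binding against the current table schema fails in the same way as the report.
try:
    bind_column(current_schema, "col")
except ValidationError as err:
    print(err)  # Cannot find field 'col' in struct: struct<1: id, 2: data, 3: value>
```

Under this model, binding `col` against the snapshot's schema would resolve to field ID 3, which is the same column the current schema calls `value`.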
NOTES:
- `snapshot_1.printSchema()` confirms that the field is named `col`, not `value` as in the latest snapshot.
- The error also occurs when using the Spark DataFrame API.
Apache Iceberg version: 1.5.0
Query engine: Spark