Error creating table from pyarrow schema with pa.uuid() #1986
Comments
We already have the UUIDType type; I think we need to handle this case at iceberg-python/pyiceberg/io/pyarrow.py, line 1203 in 260ef54.
iceberg-python/pyiceberg/io/pyarrow.py, lines 587 to 592 in 8bfb16c
UUIDType is converted to pyarrow as fixed_size_binary[16]. You can use pa.binary(16) and store the bytes of each UUID in your pyarrow table.
import uuid
import pyarrow as pa
uuids = pa.array([uuid.uuid4().bytes for _ in range(100)], pa.binary(16))
I think I've also encountered this problem while trying to write tests for https://github.com/duckdb/duckdb-iceberg. This:
col_uuid = pa.array([UUID('020d4fc7-acd6-45ac-b216-7873f4038e1f').bytes], pa.uuid())
results in:
What does work is this:
col_uuid = pa.array([UUID('020d4fc7-acd6-45ac-b216-7873f4038e1f').bytes], pa.binary(16))
But the problem then is that the parquet file created by pyiceberg does not have the [...]. I think it's fine to accept pa.binary(16), but then pyiceberg should add the [...]
update.add_column("col_uuid", UUIDType(), default_value="f79c3e09-677c-4bbd-a479-3f349cb785e7", required=False)
Apache Iceberg version
0.9.0 (latest release)
Please describe the bug 🐞
Preamble: using a local sqlite db:
A pyiceberg UUID column works fine:
But a pyarrow UUID column gives an error:
The exception is:
Related to simw/pydantic-to-pyarrow#27
Willingness to contribute