Error creating table from pyarrow schema with pa.uuid() #1986

simw · 2025-05-10T23:20:00Z

Apache Iceberg version

0.9.0 (latest release)

Please describe the bug 🐞

Preamble: using a local sqlite db:

from pyiceberg.catalog import load_catalog

warehouse_path = "data/warehouse"
catalog = load_catalog(
    "default",
    **{
        'type': 'sql',
        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
        "warehouse": f"file://{warehouse_path}",
    },
)

A pyiceberg UUID column works fine:

from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, UUIDType

schema = Schema(
    NestedField(field_id=1, name="uuid", field_type=UUIDType(), required=False),
)

catalog.create_table("default.test2", schema=schema)

But a pyarrow UUID column gives an error:

import pyarrow as pa

schema = pa.schema([pa.field("foo", pa.uuid(), nullable=True)])

catalog.create_table("default.test4", schema=schema)

The exception is:

File ~/Code/Projects/others/icebergs/.venv/lib/python3.13/site-packages/pyiceberg/io/pyarrow.py:1032, in _(obj, visitor)
   1030     result = visit_pyarrow(field_type, visitor)
   1031 except TypeError as e:
-> 1032     raise UnsupportedPyArrowTypeException(obj, f"Column '{obj.name}' has an unsupported type: {field_type}") from e
   1033 visitor.after_field(obj)
   1035 return visitor.field(obj, result)

UnsupportedPyArrowTypeException: Column 'foo' has an unsupported type: extension<arrow.uuid>

Related to simw/pydantic-to-pyarrow#27

Willingness to contribute

I can contribute a fix for this bug independently
I would be willing to contribute a fix for this bug with guidance from the Iceberg community
I cannot contribute a fix for this bug at this time


jim-ngoo · 2025-05-12T02:28:24Z

we have the UUIDType type already, ~~I think what we missed is the visit_pyarrow decorator~~

I think we need to handle the case pa.uuid() here:

iceberg-python/pyiceberg/io/pyarrow.py

Line 1203 in 260ef54

def primitive(self, primitive: pa.DataType) -> PrimitiveType:

DinGo4DEV · 2025-05-14T07:26:39Z

iceberg-python/pyiceberg/io/pyarrow.py

Lines 587 to 592 in 8bfb16c

    
def schema_to_pyarrow(
    schema: Union[Schema, IcebergType],
    metadata: Dict[bytes, bytes] = EMPTY_DICT,
    include_field_ids: bool = True,
) -> pa.schema:
    return visit(schema, _ConvertToArrowSchema(metadata, include_field_ids))

UUIDType is converted to fixed_size_binary[16] on the pyarrow side. As a workaround, you might use pa.binary(16) and store the bytes of each uuid in your pyarrow table:

import uuid
import pyarrow as pa
uuids = pa.array([uuid.uuid4().bytes for _ in range(100)], pa.binary(16))
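
For anyone using this workaround, here is a minimal stdlib-only sketch of the round trip between a uuid.UUID and the 16 raw bytes that end up in a binary(16) column. The helper names uuid_to_bytes and bytes_to_uuid are hypothetical and not part of pyiceberg or pyarrow; they only illustrate what gets stored and how to get the UUID back when reading.

```python
import uuid


def uuid_to_bytes(value: uuid.UUID) -> bytes:
    """Return the 16-byte big-endian form of a UUID (hypothetical helper)."""
    return value.bytes


def bytes_to_uuid(raw: bytes) -> uuid.UUID:
    """Rebuild a UUID from 16 raw bytes (hypothetical helper)."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return uuid.UUID(bytes=raw)


original = uuid.UUID("020d4fc7-acd6-45ac-b216-7873f4038e1f")
raw = uuid_to_bytes(original)   # this is what would go into the binary(16) column
restored = bytes_to_uuid(raw)   # this is what a reader would reconstruct
```

The length check is there because a binary(16) column carries no UUID semantics of its own, so nothing else stops a value of the wrong shape from being decoded.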

Tishj · 2025-05-14T10:59:32Z

I think I've also encountered this problem while trying to write tests for https://github.com/duckdb/duckdb-iceberg.
I am using a build from main (0.10 dev).

The table.append(...) method won't accept this array:

from uuid import UUID

col_uuid = pa.array([UUID('020d4fc7-acd6-45ac-b216-7873f4038e1f').bytes], pa.uuid())

Results in:

  File "/iceberg-python/pyiceberg/io/pyarrow.py", line 1061, in _
    raise UnsupportedPyArrowTypeException(obj, f"Column '{obj.name}' has an unsupported type: {field_type}") from e
pyiceberg.io.pyarrow.UnsupportedPyArrowTypeException: Column 'col_uuid' has an unsupported type: extension<arrow.uuid>

What does work is this:

col_uuid = pa.array([UUID('020d4fc7-acd6-45ac-b216-7873f4038e1f').bytes], pa.binary(16))

But the problem then is that the parquet file created by pyiceberg does not have the UUIDType() logical type for the field, which trips up our (duckdb's) parquet reader.

I think it's fine to accept pa.binary(16), but then pyiceberg should add the UUIDType() logical type to the parquet file's field if the destination column is a UUIDType, such as one added like this:

update.add_column("col_uuid", UUIDType(), default_value="f79c3e09-677c-4bbd-a479-3f349cb785e7", required=False)
