bug: to_pyarrow() with NULL typed values has an unfriendly error #10153

Closed
1 task done
NickCrews opened this issue Sep 18, 2024 · 2 comments
Labels
bug Incorrect behavior inside of ibis

Comments

@NickCrews
Contributor

NickCrews commented Sep 18, 2024

What happened?

import ibis

t = ibis.memtable({"a": [1, 2, 3]})
t = t.mutate(b=ibis.literal(None))
print(t.schema())
# ibis.Schema {
#   a  int64
#   b  null
# }
print(ibis.to_sql(t))
# SELECT
#   "t0"."a",
#   NULL AS "b"
# FROM "ibis_pandas_memtable_mdzwbpd3yraixd6urow744s6m4" AS "t0"
t.to_pyarrow()
Traceback
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
Cell In[36], line 7
      5 print(t.schema())
      6 print(ibis.to_sql(t))
----> 7 t.to_pyarrow()

File ~/code/ibis/ibis/expr/types/core.py:489, in Expr.to_pyarrow(self, params, limit, **kwargs)
    461 @experimental
    462 def to_pyarrow(
    463     self,
   (...)
    467     **kwargs: Any,
    468 ) -> pa.Table:
    469     """Execute expression and return results in as a pyarrow table.
    470 
    471     This method is eager and will execute the associated expression
   (...)
    487         A pyarrow table holding the results of the executed expression.
    488     """
--> 489     return self._find_backend(use_default=True).to_pyarrow(
    490         self, params=params, limit=limit, **kwargs
    491     )

File ~/code/ibis/ibis/backends/duckdb/__init__.py:1431, in Backend.to_pyarrow(self, expr, params, limit, **_)
   1422 def to_pyarrow(
   1423     self,
   1424     expr: ir.Expr,
   (...)
   1428     **_: Any,
   1429 ) -> pa.Table:
   1430     table = self._to_duckdb_relation(expr, params=params, limit=limit).arrow()
-> 1431     return expr.__pyarrow_result__(table)

File ~/code/ibis/ibis/expr/types/relations.py:212, in Table.__pyarrow_result__(self, table, schema, data_mapper)
    209 if data_mapper is None:
    210     from ibis.formats.pyarrow import PyArrowData as data_mapper
--> 212 return data_mapper.convert_table(
    213     table, self.schema() if schema is None else schema
    214 )

File ~/code/ibis/ibis/formats/pyarrow.py:327, in PyArrowData.convert_table(cls, table, schema)
    324 pa_schema = table.schema
    326 if pa_schema != desired_schema:
--> 327     return table.cast(desired_schema, safe=False)
    328 else:
    329     return table

File ~/code/scg/atlas/.venv/lib/python3.11/site-packages/pyarrow/table.pxi:4555, in pyarrow.lib.Table.cast()

File ~/code/scg/atlas/.venv/lib/python3.11/site-packages/pyarrow/table.pxi:574, in pyarrow.lib.ChunkedArray.cast()

File ~/code/scg/atlas/.venv/lib/python3.11/site-packages/pyarrow/compute.py:405, in cast(arr, target_type, safe, options, memory_pool)
    403     else:
    404         options = CastOptions.safe(target_type)
--> 405 return call_function("cast", [arr], options, memory_pool)

File ~/code/scg/atlas/.venv/lib/python3.11/site-packages/pyarrow/_compute.pyx:590, in pyarrow._compute.call_function()

File ~/code/scg/atlas/.venv/lib/python3.11/site-packages/pyarrow/_compute.pyx:385, in pyarrow._compute.Function.call()

File ~/code/scg/atlas/.venv/lib/python3.11/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()

File ~/code/scg/atlas/.venv/lib/python3.11/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()

ArrowNotImplementedError: Unsupported cast from int32 to null using function cast_null

I'm not exactly sure what we want to happen here, but I think it could be friendlier, or fail earlier. Possible options:

  1. Fail when ibis.literal(None) is called, requiring the caller to pass a specific type. I don't think this is a good idea, though, since an untyped NULL can sometimes be type-inferred from context: for example, if you conn.insert() it into an existing table, the backend can cast the NULL to whatever type the column needs.
  2. In to_pyarrow(), do a full traversal of all the types (including struct fields and array element types), and if any of them are dt.Null, raise a more helpful error.
  3. Actually return a pyarrow table with a null-typed column, since Arrow supports a "NULL" datatype.

What version of ibis are you using?

main

What backend(s) are you using, if any?

duckdb

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@NickCrews NickCrews added the bug Incorrect behavior inside of ibis label Sep 18, 2024
@NickCrews
Contributor Author

Ah, I think I am running into duckdb/duckdb#7149

@cpcloud
Member

cpcloud commented Sep 18, 2024

Closing in favor of #9669.

@cpcloud cpcloud closed this as not planned Sep 18, 2024
@github-project-automation github-project-automation bot moved this from backlog to done in Ibis planning and roadmap Sep 18, 2024

2 participants