Store polars Series instead of pyarrow Array in cudf_polars LiteralColumn expr #18564

mroeschke · 2025-04-24T00:37:33Z

Description

Contributes to #18534

Depends on:

Add support for large list host Arrow data conversion #18562
pylibcudf.Column accepting objects with __arrow_c_stream__
Store Python scalars instead of PyArrow Scalars in cudf_polars Literal expr #18563
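
For context on the second dependency: a consumer can accept any object that exposes the Arrow PyCapsule stream method `__arrow_c_stream__` rather than requiring a pyarrow object. The following is a minimal pure-Python sketch of that duck-typing check only. `FakeSeries` and `accepts_arrow_stream` are hypothetical names for illustration and are not taken from pylibcudf.

```python
from typing import Any


class FakeSeries:
    """Hypothetical stand-in for an object implementing the Arrow C stream protocol."""

    def __arrow_c_stream__(self, requested_schema: Any = None) -> Any:
        # A real producer returns a PyCapsule wrapping an Arrow array stream;
        # this toy class only demonstrates that the method is present.
        raise NotImplementedError("illustration only")


def accepts_arrow_stream(obj: Any) -> bool:
    """Return True if obj advertises the Arrow C stream protocol."""
    return callable(getattr(obj, "__arrow_c_stream__", None))


print(accepts_arrow_stream(FakeSeries()))  # True
print(accepts_arrow_stream([1, 2, 3]))     # False
```

A plain Python list fails the check, which is why separate support for Python iterables is tracked in other PRs referenced in this thread.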

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2025-04-24T00:37:36Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

…m__` (#18712)

closes #18573
unblocks #18564

Authors:
- Matthew Roeschke (https://github.com/mroeschke)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #18712

Closes #18745 and Closes #18819 and contributes to #18534 #15132.

I replaced as many pyarrow calls as I could with features supported in this PR. The only notable case I skipped is conversions to pylibcudf Columns from pyarrow string arrays. To support creating pylibcudf columns from string lists, I think we'd need #17192. Nevertheless that case will be covered by #18564 using `pl.Series` instead of `pa.array`. In the future, when we support #17192, we can use `plc.Column.from_list([...])` instead of `plc.Column(pl.Series(...))`.

Authors:
- Matthew Murray (https://github.com/Matt711)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)

URL: #18768

Follow up to #18768. Addresses this comment #18768 (comment). Also contributes to #18534 #15132.

This PR adds support for creating a pylibcudf Column from Python iterables containing strings and nested lists of strings. Should also unblock #18564.

Authors:
- Matthew Murray (https://github.com/Matt711)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)

URL: #18916

Matt711 · 2025-06-24T15:22:01Z

python/cudf_polars/cudf_polars/testing/plugin.py

@@ -158,6 +158,10 @@ def pytest_configure(config: pytest.Config) -> None:
    "tests/unit/io/test_scan.py::test_async_read_21945[scan_type2]": "Debug output on stderr doesn't match",
    "tests/unit/io/test_scan.py::test_async_read_21945[scan_type3]": "Debug output on stderr doesn't match",
    "tests/unit/io/test_multiscan.py::test_multiscan_row_index[scan_csv-write_csv-csv]": "Debug output on stderr doesn't match",
+    "tests/unit/functions/range/test_linear_space.py::test_linear_space_date": "Needs https://github.com/pola-rs/polars/issues/23020",
+    "tests/unit/sql/test_temporal.py::test_implicit_temporal_strings[dt IN ('1960-01-07','2077-01-01','2222-02-22')-expected15]": "Needs https://github.com/pola-rs/polars/issues/23020",


I think this is the same issue as https://github.com/pola-rs/polars/issues/21660.

Yup, that was linked in the issue I created here.

Matt711 · 2025-06-24T15:23:37Z

python/cudf_polars/cudf_polars/dsl/expressions/boolean.py

+            if haystack.obj.type().id() == plc.TypeId.LIST:
+                # Unwrap values from the list column
+                haystack = Column(haystack.obj.children()[1])
+                # TODO: Remove check once Column's require dtype


Just noting #19193

vyasr · 2025-06-24T20:47:16Z

python/cudf_polars/cudf_polars/dsl/expressions/boolean.py

+            if haystack.obj.type().id() == plc.TypeId.LIST:
+                # Unwrap values from the list column
+                haystack = Column(haystack.obj.children()[1])
+                # TODO: Remove check once Column's require dtype


Hopefully this PR goes in first and then we can simply remove this check in #19193.
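
The `children()[1]` access in the hunk above relies on a list column storing its offsets as the first child and its flat values as the second. The sketch below models that layout in plain Python to show what the unwrapping yields. `ToyColumn`, `make_list_column`, and `unwrap_haystack` are hypothetical names for illustration, not cudf_polars or pylibcudf APIs, and the layout is an assumption inferred from the index used in the diff.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColumn:
    """Hypothetical, simplified column: a type tag, flat data, and child columns."""

    type_id: str
    data: list = field(default_factory=list)
    children: list = field(default_factory=list)


def make_list_column(rows: list[list[int]]) -> ToyColumn:
    """Build a list column whose children are [offsets, values]."""
    offsets, values = [0], []
    for row in rows:
        values.extend(row)
        offsets.append(len(values))
    return ToyColumn(
        "LIST",
        children=[ToyColumn("INT32", offsets), ToyColumn("INT64", values)],
    )


def unwrap_haystack(col: ToyColumn) -> ToyColumn:
    """Return the flat values child for a list column, else the column itself."""
    if col.type_id == "LIST":
        return col.children[1]  # index 0 holds offsets, index 1 holds values
    return col


haystack = make_list_column([[1, 2, 3]])
print(unwrap_haystack(haystack).data)  # [1, 2, 3]
```

A non-list column passes through unchanged, so the check is only needed while the haystack may arrive wrapped in a list.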

vyasr · 2025-06-24T20:49:48Z

python/cudf_polars/cudf_polars/dsl/to_ast.py

+                # to a expr.LiteralColumn, so the actual type is in the inner type
+                plc_dtype = DataType(haystack.dtype.polars.inner).plc
+            else:
+                plc_dtype = haystack.dtype.plc  # pragma: no cover


Should we add a test for this case? Is it difficult to test for some reason?

The condition where we hit this is covered by the Polars unit tests, but I can try to replicate it in our own unit tests in #19193.
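
The branch under discussion picks the element type of the haystack when the haystack is list-typed, and falls back to the haystack's own type otherwise. The sketch below illustrates that selection in plain Python. `ToyDType` and `membership_dtype` are hypothetical names, and the exact condition guarding the real branch is not shown in the hunk, so the `inner is not None` test here is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyDType:
    """Hypothetical dtype: a name plus an optional inner (element) dtype for lists."""

    name: str
    inner: Optional["ToyDType"] = None


def membership_dtype(haystack_dtype: ToyDType) -> ToyDType:
    """Choose the dtype to compare against for an is-in style membership test."""
    if haystack_dtype.inner is not None:
        # List-typed haystack: the values being searched have the inner type.
        return haystack_dtype.inner
    # Flat haystack: use its own type (the fallback branch in the diff).
    return haystack_dtype


int64 = ToyDType("Int64")
print(membership_dtype(ToyDType("List", inner=int64)).name)  # Int64
print(membership_dtype(int64).name)                          # Int64
```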

vyasr · 2025-06-24T20:52:30Z

python/cudf_polars/cudf_polars/utils/dtypes.py

-)
-
-
-def downcast_arrow_lists(typ: pa.DataType) -> pa.DataType:


Love to see this going away.

mroeschke · 2025-06-25T17:30:35Z

/merge

…olars (#19198)

This PR and #18564 are the last PRs removing `pylibcudf.interop.to_arrow` in `cudf_polars`.

Towards #18534

Authors:
- Matthew Roeschke (https://github.com/mroeschke)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #19198

Store polars Series instead of pyarrow Array in cudf_polars LiterColu…

3dc3dc4

…mn expr

mroeschke added the improvement, non-breaking, and cudf-polars labels Apr 24, 2025

github-project-automation bot added this to cuDF Python Apr 24, 2025

github-actions bot assigned mroeschke Apr 24, 2025

github-actions bot added the Python label Apr 24, 2025

vyasr mentioned this pull request Apr 25, 2025

Add support for large list host Arrow data conversion #18562

Merged

3 tasks

This was referenced May 5, 2025

Bump polars version to <1.29 #18581

Merged

Allow pylibcudf.Column to consume objects exposing __arrow_c_stream__ #18712

Merged

mroeschke added 2 commits May 8, 2025 16:44

Merge remote-tracking branch 'upstream/branch-25.06' into plc/ref/lit…

e816d23

…eralcolumn

Start converting pa.array to pl.Series calls

7f73061

Matt711 mentioned this pull request May 14, 2025

Create a pylibcudf Column from a python iterable #18768

Merged

3 tasks

vyasr mentioned this pull request May 16, 2025

Store Python scalars instead of PyArrow Scalars in cudf_polars Literal expr #18563

Merged

5 tasks

Merge remote-tracking branch 'upstream/branch-25.06' into plc/ref/lit…

1d996fc

…eralcolumn

mroeschke changed the title ~~Store polars Series instead of pyarrow Array in cudf_polars LiterColumn expr~~ Store polars Series instead of pyarrow Array in cudf_polars LiteralColumn expr May 20, 2025

GPUtester moved this to In Progress in cuDF Python May 20, 2025

vyasr changed the base branch from branch-25.06 to branch-25.08 May 20, 2025 23:32

Merge remote-tracking branch 'upstream/branch-25.06' into plc/ref/lit…

33e84ba

…eralcolumn

github-actions bot added the pylibcudf label May 21, 2025

This was referenced May 21, 2025

Create a pylibcudf Column from a iterable of python strings #18916

Merged

Bump polars version to <1.31 #18920

Merged

[Story] Add standard data ingestion pipelines to pylibcudf #15132

Closed

mroeschke added 3 commits May 29, 2025 11:10

Merge remote-tracking branch 'upstream/branch-25.08' into plc/ref/lit…

9ab0e49

…eralcolumn

Fix annotation, use more property defined methods

ef95d2c

Fix BooleanFunction when haystack is LiteralColumn

46f7c59

mroeschke and others added 3 commits June 10, 2025 18:52

Merge remote-tracking branch 'upstream/branch-25.08' into plc/ref/lit…

9e64974

…eralcolumn

Use Matt M's git patch

6fe7d65

Merge branch 'branch-25.08' into plc/ref/literalcolumn

a927c53

github-actions bot assigned Matt711 Jun 11, 2025

mroeschke added 13 commits June 11, 2025 11:04

Merge remote-tracking branch 'upstream/branch-25.08' into plc/ref/lit…

4667b75

…eralcolumn

Try second patch

4005bd6

Merge branch 'plc/ref/literalcolumn' of https://github.com/mroeschke/…

aa3b512

…cudf into plc/ref/literalcolumn

Merge remote-tracking branch 'upstream/branch-25.08' into plc/ref/lit…

e0985fc

…eralcolumn

Revert "Try second patch"

12db1e8

This reverts commit 4005bd6.

Revert "Use Matt M's git patch"

8e6d28c

This reverts commit 6fe7d65.

Merge remote-tracking branch 'upstream/branch-25.08' into plc/ref/lit…

85e0e70

…eralcolumn

Merge remote-tracking branch 'upstream/branch-25.08' into plc/ref/lit…

d5030a4

…eralcolumn

Use needles.dtype now

67ec21a

Merge branch 'branch-25.08' into plc/ref/literalcolumn

d112806

Check if needs has a dtype assigned

ecc80da

Merge remote-tracking branch 'upstream/branch-25.08' into plc/ref/lit…

a0858ed

…eralcolumn

Merge branch 'plc/ref/literalcolumn' of https://github.com/mroeschke/…

031d6ce

…cudf into plc/ref/literalcolumn

mroeschke requested review from Matt711 and vyasr June 13, 2025 22:14

mroeschke added 2 commits June 17, 2025 15:34

Merge remote-tracking branch 'upstream/branch-25.08' into plc/ref/lit…

3742e98

…eralcolumn

Merge remote-tracking branch 'upstream/branch-25.08' into plc/ref/lit…

10e0d38

…eralcolumn

mroeschke mentioned this pull request Jun 18, 2025

Avoid pylibcudf.interop.to_arrow in DataFrame.to_polars in cudf_polars #19198

Merged

3 tasks

Merge remote-tracking branch 'upstream/branch-25.08' into plc/ref/lit…

02b95b4

…eralcolumn

Matt711 approved these changes Jun 24, 2025

vyasr approved these changes Jun 24, 2025

rapids-bot bot merged commit fb1628a into rapidsai:branch-25.08 Jun 25, 2025
91 checks passed

github-project-automation bot moved this from In Progress to Done in cuDF Python Jun 25, 2025

mroeschke deleted the plc/ref/literalcolumn branch June 25, 2025 17:30
