Store polars Series instead of pyarrow Array in cudf_polars LiteralColumn expr #18564

Merged
merged 61 commits into from
Jun 25, 2025

@mroeschke mroeschke commented Apr 24, 2025

Description

Contributes to #18534

Depends on:

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change cudf-polars Issues specific to cudf-polars labels Apr 24, 2025

copy-pr-bot bot commented Apr 24, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rapids-bot bot pushed a commit that referenced this pull request May 13, 2025
…m__` (#18712)

closes #18573

unblocks #18564

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #18712
@mroeschke mroeschke changed the title from "Store polars Series instead of pyarrow Array in cudf_polars LiterColumn expr" to "Store polars Series instead of pyarrow Array in cudf_polars LiteralColumn expr" May 20, 2025
@GPUtester GPUtester moved this to In Progress in cuDF Python May 20, 2025
@vyasr vyasr changed the base branch from branch-25.06 to branch-25.08 May 20, 2025 23:32
rapids-bot bot pushed a commit that referenced this pull request May 21, 2025
Closes #18745 and Closes #18819 and contributes to #18534 #15132. I replaced as many pyarrow calls as I could with features supported in this PR. The only notable case I skipped is converting pyarrow string arrays to pylibcudf Columns. To support creating pylibcudf columns from string lists, I think we'd need #17192. Nevertheless, that case will be covered by #18564 using `pl.Series` instead of `pa.array`.

In the future, when we support #17192, we can use `plc.Column.from_list([...])` instead of `plc.Column(pl.Series(...))`.

Authors:
  - Matthew Murray (https://github.com/Matt711)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #18768
@github-actions github-actions bot added the pylibcudf Issues specific to the pylibcudf package label May 21, 2025
rapids-bot bot pushed a commit that referenced this pull request May 29, 2025
Follow up to #18768. Addresses this comment #18768 (comment). Also contributes to #18534 #15132

This PR adds support for creating a pylibcudf Column from Python iterables containing strings and nested lists of strings.

Should also unblock #18564

Authors:
  - Matthew Murray (https://github.com/Matt711)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #18916
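
For readers unfamiliar with how nested lists map onto a columnar layout, here is a minimal pure-Python sketch. It assumes the Arrow-style list layout (one offsets buffer plus one flat child of values). The helper `flatten_string_lists` is hypothetical and is not the pylibcudf implementation added in #18916.

```python
from __future__ import annotations

from collections.abc import Iterable


def flatten_string_lists(rows: Iterable[list[str]]) -> tuple[list[int], list[str]]:
    """Split a list of string lists into Arrow-style offsets and flat values.

    Hypothetical helper for illustration only; not part of pylibcudf.
    """
    offsets = [0]
    values: list[str] = []
    for row in rows:
        values.extend(row)
        # Each offset marks where the next row starts in the flat values.
        offsets.append(len(values))
    return offsets, values


offsets, values = flatten_string_lists([["a", "b"], [], ["c"]])
print(offsets)  # [0, 2, 2, 3]
print(values)   # ['a', 'b', 'c']
```

Row `i` is recovered as `values[offsets[i]:offsets[i + 1]]`, so an empty row simply repeats the previous offset.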
@mroeschke mroeschke requested review from Matt711 and vyasr June 13, 2025 22:14
@@ -158,6 +158,10 @@ def pytest_configure(config: pytest.Config) -> None:
"tests/unit/io/test_scan.py::test_async_read_21945[scan_type2]": "Debug output on stderr doesn't match",
"tests/unit/io/test_scan.py::test_async_read_21945[scan_type3]": "Debug output on stderr doesn't match",
"tests/unit/io/test_multiscan.py::test_multiscan_row_index[scan_csv-write_csv-csv]": "Debug output on stderr doesn't match",
"tests/unit/functions/range/test_linear_space.py::test_linear_space_date": "Needs https://github.com/pola-rs/polars/issues/23020",
"tests/unit/sql/test_temporal.py::test_implicit_temporal_strings[dt IN ('1960-01-07','2077-01-01','2222-02-22')-expected15]": "Needs https://github.com/pola-rs/polars/issues/23020",
Contributor

I think this is the same issue as https://github.com/pola-rs/polars/issues/21660.

Contributor Author

Yup, that was linked in the issue I created here.

if haystack.obj.type().id() == plc.TypeId.LIST:
# Unwrap values from the list column
haystack = Column(haystack.obj.children()[1])
# TODO: Remove check once Column's require dtype
Contributor

Just noting #19193

Contributor

Hopefully this PR goes in first and then we can simply remove this check in #19193.
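
As context for the snippet under review: in libcudf a list column keeps its offsets as child 0 and its flattened values as child 1, which is why the code takes `children()[1]`. The sketch below models that unwrap in plain Python. `ToyColumn`, `unwrap_haystack`, and `is_in` are hypothetical stand-ins for illustration, not the cudf_polars or pylibcudf classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyColumn:
    """Toy stand-in for a column; NOT the pylibcudf Column class."""

    kind: str
    data: list = field(default_factory=list)
    children: list[ToyColumn] = field(default_factory=list)


def unwrap_haystack(col: ToyColumn) -> ToyColumn:
    # A list column stores [offsets, values]; membership only needs the values.
    if col.kind == "list":
        return col.children[1]
    return col


def is_in(needles: list, haystack: ToyColumn) -> list[bool]:
    members = set(unwrap_haystack(haystack).data)
    return [needle in members for needle in needles]


haystack = ToyColumn(
    "list",
    children=[ToyColumn("int32", [0, 3]), ToyColumn("int64", [1, 2, 3])],
)
print(is_in([2, 5], haystack))  # [True, False]
```

A haystack that is not list-typed passes through `unwrap_haystack` unchanged, mirroring the fallback branch in the reviewed code.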

# to a expr.LiteralColumn, so the actual type is in the inner type
plc_dtype = DataType(haystack.dtype.polars.inner).plc
else:
plc_dtype = haystack.dtype.plc # pragma: no cover
Contributor

Should we add a test for this case? Is it difficult to test for some reason?

Contributor Author

The conditions where we hit this are covered by the Polars unit tests, but I can try to replicate this in our own unit tests in #19193.

)


def downcast_arrow_lists(typ: pa.DataType) -> pa.DataType:
Contributor

Love to see this going away.

@mroeschke
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit fb1628a into rapidsai:branch-25.08 Jun 25, 2025
91 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF Python Jun 25, 2025
@mroeschke mroeschke deleted the plc/ref/literalcolumn branch June 25, 2025 17:30
rapids-bot bot pushed a commit that referenced this pull request Jun 25, 2025
…olars (#19198)

This PR and #18564 are the last PRs removing `pylibcudf.interop.to_arrow` in `cudf_polars`.

Towards #18534

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #19198
Labels
cudf-polars (Issues specific to cudf-polars), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), Python (Affects Python cuDF API.)
Projects
Status: Done