Implement __dlpack__ dunder for pylibcudf columns #18566

Draft: wants to merge 1 commit into branch-25.06

Conversation

seberg (Contributor) commented Apr 24, 2025

This implements the `__dlpack__` dunder (and `__dlpack_device__`), which could then also be forwarded to libcudf columns.

There is a bit of a clash with the old dlpack implementation. It is similar, but also different: the old one is table-centric and always copies, while this one is column-centric and copies only if requested.
Thus, I kept it as a detail API.
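For context, a minimal sketch of the protocol contract this targets (the keyword-only signature follows the Python array API standard; `ColumnLike` and everything inside it are hypothetical stand-ins, not the actual pylibcudf implementation):

```python
from enum import IntEnum


class DLDeviceType(IntEnum):
    # Small subset of dlpack's DLDeviceType enum, for illustration.
    kDLCPU = 1
    kDLCUDA = 2


class ColumnLike:
    """Hypothetical stand-in for a pylibcudf Column."""

    def __dlpack_device__(self):
        # Report where the data lives as (device_type, device_id).
        return (DLDeviceType.kDLCUDA, 0)

    def __dlpack__(self, *, stream=None, max_version=None,
                   dl_device=None, copy=None):
        # Must return a PyCapsule wrapping a DLManagedTensor (or its
        # versioned variant).  `stream` is the consumer's stream so the
        # producer can order its work; `copy=True` requests a copy,
        # `copy=False` forbids one, and `copy=None` lets the producer
        # decide (the "copies only if requested" behavior above).
        raise NotImplementedError("illustration only")
```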

The `from_dlpack()` can/should be extended to support at least 1-D objects that implement `__dlpack__`.
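A rough sketch of what that extension could look like (`_from_dlpack_capsule` is a hypothetical helper standing in for the existing capsule-based path):

```python
def from_dlpack(obj):
    # Hypothetical dispatch: accept any object implementing __dlpack__,
    # not just a raw DLManagedTensor capsule.
    if hasattr(obj, "__dlpack__"):
        capsule = obj.__dlpack__()
    else:
        capsule = obj
    return _from_dlpack_capsule(capsule)  # hypothetical capsule-based path
```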

One of the more complex things here is the stream synchronization; unfortunately, it seems very hard to test reliably in practice. (My attempts didn't produce a failure when they should have.)


Marking as draft, since some discussion/thought is likely needed. For one, while the C++ code exists here, I am not sure I believe in exposing this in C++ (at this time). And it might also work to create a helper/intermediate object rather than doing it all here?

(Also wondering if it wouldn't be easier to just vendor the dlpack header?)

CC @vyasr, since I think you were interested in this.

copy-pr-bot bot commented Apr 24, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. CMake CMake build issue pylibcudf Issues specific to the pylibcudf package labels Apr 24, 2025
@seberg seberg added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Apr 24, 2025
Matt711 (Contributor) commented Apr 24, 2025

Just noting a relevant issue I've worked on recently.

vyasr (Contributor) commented Apr 25, 2025

Thanks Sebastian! Yes, this is definitely of interest. Probably the most relevant similar things that have happened recently are #18402 and #15370, the corresponding Arrow protocols.

> One of the more complex things here is the stream synchronization; unfortunately, it seems very hard to test reliably in practice. (My attempts didn't produce a failure when they should have.)

Possibly because everything in cudf executes on the default stream anyway. You would have to explicitly request a different (non-blocking) stream to test this, I think. Not sure if you tried that.
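Something along these lines, perhaps (a sketch only, using CuPy; whether it reliably exposes a missing synchronization is exactly the open question above):

```python
import cupy as cp

# Enqueue work on a non-blocking, non-default stream so the producer
# cannot rely on default-stream ordering.
stream = cp.cuda.Stream(non_blocking=True)
with stream:
    data = cp.arange(10_000_000, dtype=cp.float64)
    data = cp.sqrt(data)  # keep the stream busy

# A consumer would then pass its own stream to __dlpack__ so the
# producer can synchronize, e.g. (with `col` a hypothetical column):
# capsule = col.__dlpack__(stream=stream.ptr)
```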

> Marking as draft, since some discussion/thought is likely needed. For one, while the C++ code exists here, I am not sure I believe in exposing this in C++ (at this time). And it might also work to create a helper/intermediate object rather than doing it all here?

I'm not sure that I understand. The dlpack spec is intended to support C-level interfacing as well as Python, so are you just asking whether libcudf wants to do that?

> (Also wondering if it wouldn't be easier to just vendor the dlpack header?)

As in vendoring vs cloning as part of the build? We don't vendor too much in RAPIDS, but the dlpack header is small enough that it could probably be justified.

seberg (Contributor, Author) commented Apr 25, 2025

> The dlpack spec is intended to support C-level interfacing as well as Python

Yeah, unfortunately we don't have as clearly defined an interface for exchange in C/C++. So I think it makes sense to make it public (with a slightly different signature).
I.e. the C++ function here asks the caller to do the ownership tracking, and that may always be the case (you could add an overload taking a `shared_ptr<column>` as input).

But yeah, we could just expose it outside `detail`; I might make two modifications:

  1. Skip the `to_host` and `copy` arguments.
  2. Add a `DLVersion max_version` struct to be passed in.

Maybe it is actually safer to make it a `to_dlpack_v1` function in C++.

> You would have to explicitly request a different (non-blocking) stream

Yeah, I was using a second non-blocking cupy stream. But I should try mixing host/device copies and kernel launches, rather than two kernel launches, to improve the chance of seeing something, maybe.
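For illustration, the kind of mix meant here (a sketch assuming the kernel runs on a non-blocking stream while the host copy is issued on the default stream):

```python
import cupy as cp

stream = cp.cuda.Stream(non_blocking=True)
with stream:
    a = cp.zeros(50_000_000)
    a += 1.0  # kernel launch on the non-blocking stream

# Device-to-host copy enqueued on the (default) null stream: without
# correct cross-stream ordering, this could still observe the zeros.
host = a.get(stream=cp.cuda.Stream.null)
assert (host == 1.0).all()  # races are flaky; this may spuriously pass
```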

> As in vendoring vs cloning as part of the build?

Yeah, most projects do, and I think it should be a build-time and not a runtime dependency in conda. But I can also just move it there. DLPack should take care not to break ABI in unexpected ways. A possible ABI change may also make the `to_dlpack_v1` naming clearer.

vyasr (Contributor) commented Apr 28, 2025

I would be fine with vendoring the header. dlpack is simpler than arrow, and we are also not using a large helper library like nanoarrow, which is a moving target. Vendoring a single header for this would probably be simpler and also address issues like #12175.

Do we need to support multiple versions? Can we just go straight to the newer versioned dlpack structs?

seberg (Contributor, Author) commented Apr 29, 2025

> Do we need to support multiple versions? Can we just go straight to the newer versioned dlpack structs?

Maybe we should, just to keep things simple and since nobody has complained much about the lack yet.
Roll-out of versioned support is still in progress (e.g. torch doesn't have it yet, while cupy/numpy have supported it for long enough now).
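For reference, the negotiation the versioned protocol adds is small; a sketch of the producer side, with hypothetical helper names:

```python
def __dlpack__(self, *, stream=None, max_version=None, copy=None):
    # Per the DLPack spec, consumers pass max_version=(major, minor).
    # Producers return a "dltensor_versioned" capsule when the consumer
    # understands DLPack >= 1.0, and fall back to the legacy "dltensor"
    # capsule otherwise.
    if max_version is not None and max_version[0] >= 1:
        return self._make_versioned_capsule(stream)  # hypothetical helper
    return self._make_legacy_capsule(stream)         # hypothetical helper
```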

> I would be fine with vendoring the header.

👍

vyasr (Contributor) commented May 7, 2025

Practically speaking, I think Arrow data interchange is more valuable for cudf than dlpack, especially on the export side. Since nested types involve multiple buffers, using dlpack for that data requires looping over each buffer, which is really only supportable from Python at the pylibcudf level, since we don't expose those buffers in the public pandas- or polars-like APIs. We're still building out the pylibcudf API for real public consumption, so if we can get our dlpack ducks in a row in time for that, I think that would be sufficient.

@vyasr vyasr moved this to In Progress in cuDF Python May 21, 2025
Labels: CMake, improvement, libcudf, non-breaking, pylibcudf, Python
Projects: cuDF Python (Status: In Progress)