Commit 27abc1d
Handle dealloc in stream-ordered cudf-polars ops (#20467)
This updates cudf-polars' usage of CUDA streams to safely handle deallocation.
Consider the following sequence of operations:
1. Read some data on stream A
2. Read some data on stream B
3. Concat data from A and B on new stream C
cudf-polars currently ensures that C is downstream of A and B before doing the concat. However, execution will then typically drop all references to the input data (on streams A and B), at which point Python's reference counting triggers a `cudaFreeAsync` on streams A and B to free the memory used by the data from steps 1 and 2.
We need to ensure that this stream-ordered free happens after the result from step 3 (on stream C) is ready, so we now also join stream C back into each of A and B in every place where we previously only joined the upstream streams into the downstream one.
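The ordering argument can be checked with a small pure-Python toy model. This is only a sketch of the happens-before reasoning, not the cudf-polars implementation: the `Timeline` class, its `enqueue`, `join`, and `happens_before` methods, and the op names are all hypothetical and exist only for this illustration. It assumes each op waits for the previous op on its own stream, and that a join makes later work on the downstream stream wait for everything enqueued so far on the upstream stream.

```python
class Timeline:
    """Toy model of stream ordering (hypothetical, for illustration only)."""

    def __init__(self):
        self.deps = {}   # op name -> set of ops that must complete first
        self.tail = {}   # stream name -> last op enqueued on that stream
        self._waits = 0  # counter used to name synthetic wait ops

    def enqueue(self, stream, name):
        # An op is ordered after the previous op on the same stream.
        self.deps[name] = {self.tail[stream]} if stream in self.tail else set()
        self.tail[stream] = name

    def join(self, downstream, upstream):
        # Model the join as a wait op on `downstream` that depends on the
        # current tails of both streams.
        self._waits += 1
        name = f"wait#{self._waits}"
        self.deps[name] = {
            self.tail[s] for s in (downstream, upstream) if s in self.tail
        }
        self.tail[downstream] = name

    def happens_before(self, first, second):
        # True if `first` is guaranteed to complete before `second` starts.
        stack, seen = list(self.deps[second]), set()
        while stack:
            op = stack.pop()
            if op == first:
                return True
            if op not in seen:
                seen.add(op)
                stack.extend(self.deps[op])
        return False


def run(join_back):
    t = Timeline()
    t.enqueue("A", "read_a")    # 1. read on stream A
    t.enqueue("B", "read_b")    # 2. read on stream B
    t.join("C", "A")            # make C downstream of A and B
    t.join("C", "B")
    t.enqueue("C", "concat")    # 3. concat on stream C
    if join_back:
        t.join("A", "C")        # the extra joins described above
        t.join("B", "C")
    t.enqueue("A", "free_a")    # stream-ordered frees of the inputs
    t.enqueue("B", "free_b")
    return t


print(run(join_back=False).happens_before("concat", "free_a"))  # False
print(run(join_back=True).happens_before("concat", "free_a"))   # True
```

Without the join back, nothing orders `free_a` after `concat`, so in this model the free on stream A could run while the concat on stream C is still reading the input. With the join back, both frees are ordered after the concat.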
Authors:
- Tom Augspurger (https://github.com/TomAugspurger)
Approvers:
- Richard (Rick) Zamora (https://github.com/rjzamora)
- Lawrence Mitchell (https://github.com/wence-)
URL: #20467
1 parent 1226588 commit 27abc1d
1 file changed, +186 -137 lines changed