-
Notifications
You must be signed in to change notification settings - Fork 952
Fix slicing after Join
and GroupBy
in streaming cudf-polars
#19187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
edit: probably not an issue. See #19187 (comment) @rjzamora there's one similar issue on import polars as pl
from cudf_polars.testing.asserts import assert_gpu_result_equal
left = pl.LazyFrame(
{
"a": [1, 2, 3, 1, None],
"b": [1, 2, 3, 4, 5],
"c": [2, 3, 4, 5, 6],
}
)
right = pl.LazyFrame(
{
"a": [1, 4, 3, 7, None, None, 1],
"c": [2, 3, 4, 5, 6, 7, 8],
"d": [6, None, 7, 8, -1, 2, 4],
}
)
q = left.join(right, on="a", how="left", nulls_equal=False, coalesce=True).slice(1, 5)
assert_gpu_result_equal(q, engine=pl.GPUEngine(executor="streaming", executor_options={"max_rows_per_partition": 4})) which fails with
I'm starting to investigate that now. Should we handle that in a separate PR? |
Maybe this is just a join ordering issue. I don't think that
import polars as pl
from cudf_polars.testing.asserts import assert_gpu_result_equal
left = pl.LazyFrame(
{
"a": [1, 2, 3, 1, None],
"b": [1, 2, 3, 4, 5],
"c": [2, 3, 4, 5, 6],
}
)
right = pl.LazyFrame(
{
"a": [1, 4, 3, 7, None, None, 1],
"c": [2, 3, 4, 5, 6, 7, 8],
"d": [6, None, 7, 8, -1, 2, 4],
}
)
engine = pl.GPUEngine(
executor="streaming", executor_options={"max_rows_per_partition": 4}
)
q = left.join(right, on="a", how="left", maintain_order="left").slice(1, 5)
assert_gpu_result_equal(q, engine=engine) so I'll just plan to update #19146 to adjust these tests somehow, probably by specifying the engine explicitly... |
Yes, this is exactly how I understand it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, #19186 caused some failures here. I think just removing the config_options
should suffice.
/merge |
Description
The streaming executor does not properly handle slicing after
Join
orGroupBy
. Rather than slicing the "reduced" join/gropuby result, each partition is sliced individually. This PR includes a general fix, by pulling the slice operation out ofJoin
andGroupBy
nodes.Checklist