[BUG]: Incorrect result in non-coalesce join with experimental streaming executor and multiple partitions #19153

Closed
@TomAugspurger

Describe the bug

The test python/cudf_polars/tests/test_join.py::test_non_coalesce_join[left-nulls_not_equal-join_expr0]
fails when run with a small blocksize (that is, with multiple partitions).

Steps/Code to reproduce bug

Here's a simplified example:

import polars as pl

from cudf_polars.testing.asserts import assert_gpu_result_equal

left = pl.LazyFrame(
    {
        "a": [1, 2, 3, 1, None],
        "b": [1, 2, 3, 4, 5],
        "c": [2, 3, 4, 5, 6],
    }
)

right = pl.LazyFrame(
    {
        "a": [1, 4, 3, 7, None, None, 1],
        "c": [2, 3, 4, 5, 6, 7, 8],
        "d": [6, None, 7, 8, -1, 2, 4],
    }
)

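# Inner join on "a" with nulls treated as unequal, keeping both key columns.
q = left.join(right, on=pl.col("a"), how="inner", nulls_equal=False, coalesce=False)

# Run on the streaming executor with at most 3 rows per partition and compare with polars.
engine = pl.GPUEngine(
    executor="streaming",
    executor_options={"max_rows_per_partition": 3},
)
assert_gpu_result_equal(q, engine=engine)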

which fails with:

AssertionError: DataFrames are different (value mismatch for column 'a')
[left]:  [1, 1, 3, 1, 1]
[right]: [1, 3, 1, 1, 1]
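
To help read the mismatch, here is a minimal pure-Python sketch that recomputes the joined key column `a` from the inputs above and compares it with the two lists in the error. It does not use polars or cudf-polars. The helper `inner_join_keys` is hypothetical, and emitting matches in left-row order is an assumption of the sketch, not a documented guarantee of either engine.

```python
from collections import Counter

# Key columns copied from the reproducer above.
left_a = [1, 2, 3, 1, None]
right_a = [1, 4, 3, 7, None, None, 1]


def inner_join_keys(left_keys, right_keys):
    """Nested-loop inner join on a single key column (hypothetical helper).

    None never matches anything, mirroring nulls_equal=False.
    Matches are emitted in left-row order (an assumption of this sketch).
    """
    out = []
    for lk in left_keys:
        for rk in right_keys:
            if lk is not None and lk == rk:
                out.append(lk)
    return out


reference = inner_join_keys(left_a, right_a)

# The two columns quoted in the AssertionError.
reported_left = [1, 1, 3, 1, 1]
reported_right = [1, 3, 1, 1, 1]

# Same values regardless of order?
same_values_any_order = Counter(reported_left) == Counter(reported_right)
# Does the left-ordered reference agree with the [left] column?
reference_matches_left = reference == reported_left

print(reference, same_values_any_order, reference_matches_left)
```

Under these assumptions the two reported columns hold the same values in a different order, which suggests (but, since only column `a` is checked here, does not prove) that the failure is about row order and not about missing or extra rows.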

Expected behavior

The GPU result should match polars, and the assertion should pass without error.
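
As an illustration of why matching polars can be sensitive to partitioning, the following pure-Python sketch joins the key column `a` once as a whole and once chunk by chunk. It is only a toy model: splitting both inputs into contiguous chunks of 3 rows and concatenating the per-chunk joins is an assumption made here, not a description of how the streaming executor actually runs the join. The helpers `chunk` and `inner_join_keys` are hypothetical.

```python
def chunk(rows, size):
    """Split a list into contiguous chunks of at most `size` rows (hypothetical helper)."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def inner_join_keys(left_keys, right_keys):
    """Inner join on one key column; None never matches (nulls_equal=False)."""
    return [
        lk
        for lk in left_keys
        for rk in right_keys
        if lk is not None and lk == rk
    ]


# Key columns copied from the reproducer above.
left_a = [1, 2, 3, 1, None]
right_a = [1, 4, 3, 7, None, None, 1]

# Join over the whole inputs, in left-row order.
unpartitioned = inner_join_keys(left_a, right_a)

# Toy partitioned join: join every pair of chunks and concatenate the pieces.
partitioned = []
for left_part in chunk(left_a, 3):
    for right_part in chunk(right_a, 3):
        partitioned.extend(inner_join_keys(left_part, right_part))

print(unpartitioned)
print(partitioned)
```

In this toy model the chunked join returns the same rows as the whole-input join but in a different order, so a fix would presumably need either to restore the original row order or to compare results without regard to order where that is acceptable.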

Metadata

Labels: bug (Something isn't working), cudf-polars (Issues specific to cudf-polars)
Status: Done