Configurable blocksize mode for streaming executor in unit tests #19146

Draft
TomAugspurger wants to merge 7 commits into base: branch-25.08
Conversation

TomAugspurger
Contributor

@TomAugspurger TomAugspurger commented Jun 12, 2025

This adds a new test option --blocksize-mode to the test runner, which lets us easily exercise the multi-partition code paths of the streaming executor.

When --blocksize-mode=small and --executor="streaming" the default behavior will be to create a GPUEngine with max_rows_per_partition=5 (which controls the partition size from in-memory dataframes) and target_partition_size=10 (which controls the partition size from parquet sources).

Towards #18928
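As a sketch of how those defaults could be wired together, the mapping from the test options to engine options might look like the following. The helper function and its plumbing are illustrative, not the actual diff; only the option names (`max_rows_per_partition`, `target_partition_size`) and values come from the description above.

```python
def engine_options(blocksize_mode: str, executor: str) -> dict:
    """Map the test configuration to GPUEngine executor options (sketch).

    Small partitions force the streaming executor onto its
    multi-partition code paths.
    """
    if executor == "streaming" and blocksize_mode == "small":
        return {
            # Controls the partition size for in-memory dataframes.
            "max_rows_per_partition": 5,
            # Controls the partition size for parquet sources.
            "target_partition_size": 10,
        }
    # Any other combination keeps the engine defaults.
    return {}
```

With this shape, the test runner only has to pass the resulting dict through to the engine it constructs.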


copy-pr-bot bot commented Jun 12, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@github-actions github-actions bot added Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels Jun 12, 2025
@GPUtester GPUtester moved this to In Progress in cuDF Python Jun 12, 2025
@TomAugspurger
Contributor Author

TomAugspurger commented Jun 12, 2025

The bad news: 196 tests fail with this. I'm starting to work through those in the issues linked from #18928. Fixes for those will probably go in separate PRs.

But I wanted to put this up a little early to discuss the general approach. For now, I've added a separate run of the full cudf_polars/tests suite with --blocksize-mode="small". This is probably sufficient to get the coverage we want, but is a bit of a blunt tool.

In particular, I don't like the interaction between that second test run and the tests that want specific control over the GPU engine / configuration. Right now those tests will be run twice with the exact same configuration (--blocksize-mode, like --scheduler, only has an effect when the engine passed to assert_gpu_result_equal is None).

As an alternative, we could make engine a required argument of assert_gpu_result_equal and force each test to think a bit about what it wants. The majority of our tests don't care, so they could just use an engine fixture, which we can parametrize over the blocksize modes to get good coverage. Adding an engine fixture to each test that doesn't already have one is a bunch of busywork and will result in a bigger diff, but it's a bit cleaner. Anyone have thoughts on whether that's worth doing?
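The fixture-based alternative described above might look roughly like this. The fixture name and the dict of options are hypothetical, not part of this PR; the parameter values follow the defaults given in the description.

```python
import pytest

# Hypothetical mapping from blocksize mode to engine options; the "small"
# values match the defaults described in the PR.
BLOCKSIZE_OPTIONS = {
    "default": {},
    "small": {"max_rows_per_partition": 5, "target_partition_size": 10},
}


@pytest.fixture(params=sorted(BLOCKSIZE_OPTIONS))
def engine_config(request):
    """Parametrized fixture: each test using it runs once per mode."""
    return BLOCKSIZE_OPTIONS[request.param]
```

A test would then take `engine_config` as an argument, build its engine from it, and pass that engine explicitly to assert_gpu_result_equal, so the double-run ambiguity goes away.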

@TomAugspurger TomAugspurger added tests Unit testing for project non-breaking Non-breaking change labels Jun 13, 2025
@TomAugspurger TomAugspurger changed the title Configurable blocksize mode for streaming executor Configurable blocksize mode for streaming executor in unit tests Jun 13, 2025
@TomAugspurger
Contributor Author

Quick status update here: We have two more PRs in the works that fix the last two correctness issues:

And then I'll push an update here that slightly modifies some of the tests (e.g. changing some assert parameters like check_order=False, skipping some tests) where the chunked streaming executor is expected to behave differently.
