Configurable blocksize mode for streaming executor in unit tests #19146
base: branch-25.08
This adds a new test option ``--blocksize-mode`` to the test runner, which lets us easily exercise the multi-partition code paths of the streaming executor. When ``--blocksize-mode=small`` and ``--executor="streaming"`` are both set, the default behavior will be to create a GPUEngine with ``max_rows_per_partition=5`` and ``target_partition_size=10``.
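As a rough illustration of the option handling described above, here is a minimal standalone sketch. It uses `argparse` rather than pytest's option hooks so that it runs on its own; the helper names (`build_parser`, `engine_kwargs`), the `"default"` and `"in-memory"` choices, and the `executor_options` key are assumptions made for illustration and are not taken from this PR's diff. Only the two partition values come from the description.

```python
import argparse

# Values quoted in the PR description for --blocksize-mode=small.
SMALL_BLOCKSIZE_OPTIONS = {
    "max_rows_per_partition": 5,  # partition size for in-memory dataframes
    "target_partition_size": 10,  # partition size for parquet sources
}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the test runner's command-line options."""
    parser = argparse.ArgumentParser()
    # The choice lists below are assumptions for this sketch.
    parser.add_argument(
        "--executor", choices=["in-memory", "streaming"], default="in-memory"
    )
    parser.add_argument(
        "--blocksize-mode", choices=["default", "small"], default="default"
    )
    return parser


def engine_kwargs(executor: str, blocksize_mode: str) -> dict:
    """Build keyword arguments that a test fixture might pass to a GPU engine.

    The small partition sizes are applied only when the streaming executor
    and the small blocksize mode are requested together.
    """
    kwargs = {"executor": executor}
    if executor == "streaming" and blocksize_mode == "small":
        # Copy so callers cannot mutate the module-level defaults.
        kwargs["executor_options"] = dict(SMALL_BLOCKSIZE_OPTIONS)
    return kwargs


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--executor", "streaming", "--blocksize-mode", "small"]
    )
    print(engine_kwargs(args.executor, args.blocksize_mode))
```

In the real test suite the resulting options would presumably be forwarded to the GPUEngine constructor from a shared fixture; the sketch stops at building the dictionary so that it does not require a GPU.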
The bad news: 196 tests fail with this. I'm starting to work through those in the issues linked from #18928. Fixes for those will probably go in separate PRs, but I wanted to put this up a little early to discuss the general approach.

For now, I've added a separate run of the full […]

In particular, I don't like the interaction between that second test run and the tests that want specific control over the GPU engine / configuration. Right now those tests will be run twice with the exact same configuration ([…]

As an alternative, we could make […]
Quick status update here: We have two more PRs in the works that fix the last two correctness issues:
And then I'll push an update here that slightly modifies some of the tests (e.g. changing some […]
This adds a new test option ``--blocksize-mode`` to the test runner, which lets us easily exercise the multi-partition code paths of the streaming executor.

When ``--blocksize-mode=small`` and ``--executor="streaming"``, the default behavior will be to create a GPUEngine with ``max_rows_per_partition=5`` (which controls the partition size from in-memory dataframes) and ``target_partition_size=10`` (which controls the partition size from parquet sources).

Towards #18928