Skip to content

Leverage new pylibcudf grouped_range_rolling_window for cuDF classic rolling(window: int) #19162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

mroeschke
Copy link
Contributor

Description

Toward #18709

Does not cover windows that are BaseIndexer subclasses or timedeltas

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke self-assigned this Jun 13, 2025
@mroeschke mroeschke requested a review from a team as a code owner June 13, 2025 19:55
@mroeschke mroeschke added the Python Affects Python cuDF API. label Jun 13, 2025
@mroeschke mroeschke requested review from bdice and rjzamora June 13, 2025 19:55
@mroeschke mroeschke added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jun 13, 2025
@GPUtester GPUtester moved this to In Progress in cuDF Python Jun 13, 2025
@mroeschke mroeschke requested a review from a team as a code owner June 13, 2025 21:21
@github-actions github-actions bot added the cudf.pandas Issues specific to cudf.pandas label Jun 16, 2025
def _window_to_window_sizes(self, window):
if is_integer(window):
return cudautils.grouped_window_sizes_from_offset(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we delete this function from cudautils now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still used when doing a groupby.rolling with a frequency-like window. I think grouped_range_rolling_window should be able to handle this case if formulated right so I can look into it in a follow up

)
if not is_integer(window):
gb_size = groupby.size().sort_index()
self._group_starts = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it a bit hard to follow why _group_starts is getting set this way, and that led me down a bit of a rabbit hole. Do you think it would be cleaner to move input validation from __init__ into separate helper functions so that RollingGroupby and Rolling could have completely separate __init__ functions that call helpers, and then inline the code for converting windows to sizes? The current way that _normalize and _window_to_window_sizes is set up seems to improve code reuse but at the significant expense of readability.

Out of scope for this PR, but a suggestion for future improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah agreed there's an opportunity here to make things clearer. Additionally, we may not even need this if we can convert this to purely use pylibcudf as described in #19162 (comment)

@mroeschke
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 61f2dad into rapidsai:branch-25.08 Jun 25, 2025
94 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF Python Jun 25, 2025
@mroeschke mroeschke deleted the ref/cudf/window/grouped_range_window branch June 25, 2025 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cudf.pandas Issues specific to cudf.pandas improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants