[FLINK-37383][flink-examples]Correct throttling logic on ThrottledIterator #26203

Open
wants to merge 1 commit into base: master

@rafaelzimmermann rafaelzimmermann commented Feb 24, 2025

What is the purpose of the change

The throttle function was updating its last batch check time before the sleep operation, so the next elapsed-time calculation counted the previous sleep and overestimated the time spent on the batch, allowing approximately double the intended throughput rate.

Brief change log

  • Moved the timestamp update to after the sleep to ensure the elapsed time calculation properly accounts for the full duration between batches, maintaining the configured rate limit.
  • Added an injectable time supplier and sleep function to make the class easier to test and maintain
  • Added tests to make sure the changes work as intended
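
The effect of the timestamp placement described above can be reproduced with a virtual clock. What follows is a minimal sketch under stated assumptions, not the code from this PR: the class `ThrottleSketch`, its constructor parameters, the `stampAfterSleep` switch, and the `simulate` helper are all hypothetical names introduced for illustration. The clock and the sleep function are injected as `LongSupplier` and `LongConsumer`, which is one plausible way to realize the "injectable time supplier and sleep function" mentioned in the change log.

```java
import java.util.Iterator;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;
import java.util.stream.IntStream;

/** Hypothetical sketch for illustration; not the ThrottledIterator from this PR. */
final class ThrottleSketch<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final long batchSize;          // elements allowed per window
    private final long batchMillis;        // target window length in ms
    private final LongSupplier clock;      // injectable "current time in ms"
    private final LongConsumer sleeper;    // injectable "sleep for n ms"
    private final boolean stampAfterSleep; // true = behavior described as the fix
    private long lastBatchCheckTime = -1;  // -1 means "no element emitted yet"
    private long emittedInBatch = 0;

    ThrottleSketch(Iterator<T> source, long batchSize, long batchMillis,
                   LongSupplier clock, LongConsumer sleeper, boolean stampAfterSleep) {
        if (batchSize < 1 || batchMillis < 1) {
            throw new IllegalArgumentException("batchSize and batchMillis must be positive");
        }
        this.source = source;
        this.batchSize = batchSize;
        this.batchMillis = batchMillis;
        this.clock = clock;
        this.sleeper = sleeper;
        this.stampAfterSleep = stampAfterSleep;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        if (lastBatchCheckTime < 0) {
            // First element: start the first window, emit without waiting.
            lastBatchCheckTime = clock.getAsLong();
        } else if (++emittedInBatch >= batchSize) {
            emittedInBatch = 0;
            long now = clock.getAsLong();
            long elapsed = now - lastBatchCheckTime;
            if (elapsed < batchMillis) {
                sleeper.accept(batchMillis - elapsed);
            }
            // Stamping with the pre-sleep "now" lets the sleep just taken count
            // toward the next window, so the next batch skips its sleep.
            lastBatchCheckTime = stampAfterSleep ? clock.getAsLong() : now;
        }
        return source.next();
    }

    /** Drains `elements` items at 1 element per 100 ms and returns the virtual ms elapsed. */
    static long simulate(boolean stampAfterSleep, int elements) {
        long[] virtualNow = {0L}; // fake clock; only the fake sleep advances it
        Iterator<Integer> source = IntStream.range(0, elements).boxed().iterator();
        ThrottleSketch<Integer> it = new ThrottleSketch<>(
                source, 1, 100, () -> virtualNow[0], ms -> virtualNow[0] += ms, stampAfterSleep);
        while (it.hasNext()) {
            it.next();
        }
        return virtualNow[0];
    }
}
```

In this sketch, draining 21 elements at a configured rate of one element per 100 ms takes 2000 virtual milliseconds when the timestamp is taken after the sleep, and 1000 virtual milliseconds when it is taken before, which matches the roughly doubled throughput described above. Because the clock and sleep are injected, the check runs instantly and deterministically, without calling `Thread.sleep`.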

Verifying this change

This change added tests and can be verified as follows:

  • Added test coverage to ThrottledIterator to verify:
    • invalid elements per second
    • consistent window size
    • non-serializable source scenarios

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@flinkbot
Collaborator

flinkbot commented Feb 24, 2025

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@rafaelzimmermann rafaelzimmermann changed the title from "Correct throttling logic on ThrottledIterator" to "[FLINK-37383]Correct throttling logic on ThrottledIterator" Feb 25, 2025
@rafaelzimmermann rafaelzimmermann changed the title from "[FLINK-37383]Correct throttling logic on ThrottledIterator" to "[FLINK-37383][flink-examples]Correct throttling logic on ThrottledIterator" Feb 25, 2025
Contributor

@davidradl davidradl left a comment

Please add unit tests

@rafaelzimmermann rafaelzimmermann force-pushed the fix-ThrottledIterator-last-batch-check-time-update branch from 8f82f6e to 600c9c2 Compare February 27, 2025 06:40
@rafaelzimmermann
Author

> Please add unit tests

Adding tests would require refactoring the time-dependent code, which might increase complexity for what's meant to be just an example class. Would you prefer to keep the implementation simple, or should I proceed with the refactoring?

@rafaelzimmermann rafaelzimmermann force-pushed the fix-ThrottledIterator-last-batch-check-time-update branch 3 times, most recently from 892cd0a to 3bf85ee Compare February 28, 2025 10:04
@rafaelzimmermann
Author

> Please add unit tests

Tests added

@rafaelzimmermann rafaelzimmermann force-pushed the fix-ThrottledIterator-last-batch-check-time-update branch 4 times, most recently from c1cfd9d to f9edfd3 Compare March 5, 2025 11:07
@rafaelzimmermann
Author

@flinkbot run azure

@rafaelzimmermann rafaelzimmermann force-pushed the fix-ThrottledIterator-last-batch-check-time-update branch 2 times, most recently from a924f44 to d772d70 Compare March 5, 2025 18:18
The throttle function was updating its last batch check time before
the sleep operation, so the next elapsed-time calculation counted the
previous sleep and overestimated the time spent on the batch, allowing
approximately double the intended throughput rate.

Moving the timestamp update to after the sleep ensures the elapsed
time calculation properly accounts for the full duration between
batches, maintaining the configured rate limit.

The commit refactors ThrottledIterator by:

  • Adding injectable time supplier and sleep function for better testing
  • Improving code maintainability with functional interfaces

This change makes the code more testable and reliable while maintaining
existing functionality.

Add test coverage for ThrottledIterator edge cases

Adds test coverage for invalid elements per second, consistent window
size, and non-serializable source scenarios in ThrottledIterator tests.
@rafaelzimmermann rafaelzimmermann force-pushed the fix-ThrottledIterator-last-batch-check-time-update branch from d772d70 to 422a59d Compare March 6, 2025 05:16
@rafaelzimmermann
Author

@flinkbot run azure

@rafaelzimmermann
Author

rafaelzimmermann commented Mar 6, 2025

@davidradl The PR is ready for another review round.
