
[WIP] KAFKA-19588: Reduce number of events generated in AsyncKafkaConsumer.poll() #20363


Open: wants to merge 8 commits into trunk


@kirktrue (Contributor) commented Aug 17, 2025

We create—and wait on—PollEvent in Consumer.poll() to ensure we wait for
reconciliation and/or auto-commit. However, reconciliation is relatively
rare, and auto-commit only happens every N seconds, so the remainder of
the time, we should try to avoid sending poll events.
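The idea in the description can be pictured as a guard around the poll event. What follows is a minimal, single-threaded sketch. It assumes a fixed auto-commit interval in milliseconds. ConditionalPollSketch, shouldAutoCommit(long) and the string "events" are hypothetical stand-ins, not the classes used in AsyncKafkaConsumer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, single-threaded model of the PR's idea; not Kafka's actual classes.
final class ConditionalPollSketch {

    // Stand-in for the queue that feeds the background thread.
    private final List<String> sentEvents = new ArrayList<>();
    private final long autoCommitIntervalMs;
    private long lastAutoCommitMs;
    private boolean reconciliationInProgress;

    ConditionalPollSketch(long autoCommitIntervalMs, long nowMs) {
        this.autoCommitIntervalMs = autoCommitIntervalMs;
        this.lastAutoCommitMs = nowMs;
    }

    void setReconciliationInProgress(boolean inProgress) {
        this.reconciliationInProgress = inProgress;
    }

    boolean shouldAutoCommit(long nowMs) {
        return nowMs - lastAutoCommitMs >= autoCommitIntervalMs;
    }

    // Returns true if a poll event was sent. In the real consumer the
    // application thread would then wait for it to be processed.
    boolean poll(long nowMs) {
        boolean autoCommitDue = shouldAutoCommit(nowMs);
        if (!reconciliationInProgress && !autoCommitDue) {
            return false; // common case: nothing to wait for, so no event
        }
        sentEvents.add("PollEvent@" + nowMs);
        if (autoCommitDue) {
            lastAutoCommitMs = nowMs; // pretend the commit happened now
        }
        return true;
    }

    int sentEventCount() {
        return sentEvents.size();
    }
}
```

With a 5000 ms interval, polls at 1000 ms and 6000 ms send nothing, while a poll at 5000 ms (commit due) or one during reconciliation does. Here the check and the commit run on the same thread, so nothing can race. The concern raised in review appears once the check runs on the application thread and the commit decision runs on the background thread.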

kirktrue and others added 4 commits August 10, 2025 16:13
…poll()

github-actions bot added the triage, consumer, and clients labels Aug 17, 2025
@kirktrue marked this pull request as ready for review August 19, 2025 03:02
@kirktrue (Contributor, Author)

@lianetm Could you add the ci-approved label so that I can see how this runs with GitHub Actions? Thanks!

@lianetm (Member) left a comment


Thanks @kirktrue, I took a first look; one high-level concern for now.

// the interval time or reconciling new assignments
applicationEventHandler.add(event);

if (reconciliationInProgress.get() || autoCommitState.shouldAutoCommit()) {

Couldn't we end up with a race condition here if the app thread sees autoCommitState.shouldAutoCommit() as false at this point (because the interval hasn't expired just yet), but by the time the background thread checks the same condition while processing the poll event, the interval has expired?

In that case, I expect the background thread would trigger the auto-commit while the app thread moves on to updating positions for fetching (and that leads to a whole new set of race conditions that we already dealt with before). Basically, whatever change we introduce here to avoid waiting on Poll needs to ensure that we retrieve the positions to commit before moving on to update fetch positions; I expect that's the main challenge with this change. I'm still thinking, but not sure yet how to address that if we don't wait on Poll. Thoughts?
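The timing window described above can be reproduced deterministically with a manual clock. This is a hypothetical sketch. ManualClock, AutoCommitTimerSketch and RaceSketch are illustrative names. The two shouldAutoCommit() calls stand in for the application thread's check and the background thread's later check while the event sits in the queue.

```java
// Hypothetical manual clock so the interleaving is deterministic.
final class ManualClock {
    private long nowMs;

    long now() { return nowMs; }

    void advance(long ms) { nowMs += ms; }
}

// Simplified auto-commit timer: due once the interval has elapsed.
final class AutoCommitTimerSketch {
    private final ManualClock clock;
    private final long deadlineMs;

    AutoCommitTimerSketch(ManualClock clock, long intervalMs) {
        this.clock = clock;
        this.deadlineMs = clock.now() + intervalMs;
    }

    boolean shouldAutoCommit() {
        return clock.now() >= deadlineMs;
    }
}

final class RaceSketch {
    // Returns true when the background thread would auto-commit even though
    // the application thread decided not to wait on the poll event.
    static boolean commitMissedByAppThread(long intervalMs, long elapsedBeforeCheckMs, long queueDelayMs) {
        ManualClock clock = new ManualClock();
        AutoCommitTimerSketch timer = new AutoCommitTimerSketch(clock, intervalMs);

        clock.advance(elapsedBeforeCheckMs);
        boolean appThreadWaits = timer.shouldAutoCommit();    // application thread's check

        clock.advance(queueDelayMs);                           // event waits in the queue
        boolean backgroundCommits = timer.shouldAutoCommit(); // background thread's check

        // By now the application thread has moved on to updating fetch positions.
        return backgroundCommits && !appThreadWaits;
    }
}
```

For example, commitMissedByAppThread(5000, 4999, 2) returns true: the application thread checks at 4999 ms and sees nothing due, but the background thread checks at 5001 ms and commits. With 1000 ms elapsed before the check, it returns false.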


I agree with @lianetm that this opens up the risk of race conditions. However, I think the principle here is a good one. The risk is tied to the auto-commit timer. If auto-commit is not enabled, we know for certain that we are not racing with the auto-commit timer; if it is enabled, we are potentially in a race. So a slight twist on this approach could safely optimise the case where auto-commit is not enabled.
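The narrower rule can be sketched as follows. This version assumes the consumer knows at construction time whether auto-commit is enabled (the enable.auto.commit consumer setting), and it leaves the PR's reconciliation check as it is. PollWaitPolicy and mustWaitOnPollEvent are hypothetical names.

```java
// Hypothetical policy: skip waiting on the poll event only when auto-commit
// is disabled, so there is no auto-commit timer the background thread could
// evaluate differently from the application thread.
final class PollWaitPolicy {
    private final boolean autoCommitEnabled;

    PollWaitPolicy(boolean autoCommitEnabled) {
        this.autoCommitEnabled = autoCommitEnabled;
    }

    boolean mustWaitOnPollEvent(boolean reconciliationInProgress) {
        if (autoCommitEnabled) {
            // The timer may expire between this check and the background
            // thread's check, so always wait.
            return true;
        }
        // Auto-commit disabled: only reconciliation needs the wait.
        return reconciliationInProgress;
    }
}
```

This gives up the optimisation entirely whenever auto-commit is enabled, which is the default. In exchange, the race described in the previous comment cannot occur through the timer.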

@AndrewJSchofield (Member) left a comment


Just a couple of trivial initial comments. I've skimmed over the PR and understand the overall flow now. I'll do a more in-depth review shortly.

/**
* Reset the auto-commit timer to the provided time (backoff), so that the next auto-commit is
* sent out then. If auto-commit is disabled this will perform no action.
*/
void resetTimer(long retryBackoffMs);

nit: Why not final long retryBackoffMs also?
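For context on the nit: in Java, final on a formal parameter is not part of the method signature. An interface declaration and its implementations may therefore differ on it, and the modifier only prevents reassigning the parameter inside that one method body. A minimal illustration with hypothetical names:

```java
// Hypothetical interface declaring the parameter as final.
interface AutoCommitTimerApi {
    void resetTimer(final long retryBackoffMs);
}

final class SimpleAutoCommitTimer implements AutoCommitTimerApi {
    private long nextCommitDelayMs;

    @Override
    public void resetTimer(long retryBackoffMs) { // no final here; still a valid override
        this.nextCommitDelayMs = retryBackoffMs;
    }

    long nextCommitDelayMs() {
        return nextCommitDelayMs;
    }
}
```

Either way compiles, so adding final here is a consistency choice rather than a behavioural one.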

this.log = logContext.logger(AutoCommitState.class);
this.timer = time.timer(autoCommitInterval);
this.autoCommitInterval = autoCommitInterval;
this.hasInflightCommit = new AtomicBoolean();

Doesn't the presence of synchronized on all of these methods make the use of AtomicBoolean redundant?
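The question rests on a standard Java memory-model guarantee. If every read and write of a field happens while holding the same monitor, synchronized already provides both mutual exclusion and visibility, so a plain boolean suffices. Here is a sketch under that assumption. SynchronizedCommitState and its methods are hypothetical simplifications, not the PR's AutoCommitState.

```java
// Hypothetical state holder: all access to the flag goes through
// synchronized methods on the same instance, so no AtomicBoolean is needed.
final class SynchronizedCommitState {
    private boolean hasInflightCommit; // guarded by "this"

    // Atomically checks and sets the flag; returns false if a commit is already in flight.
    synchronized boolean tryStartCommit() {
        if (hasInflightCommit) {
            return false;
        }
        hasInflightCommit = true;
        return true;
    }

    synchronized void completeCommit() {
        hasInflightCommit = false;
    }

    synchronized boolean hasInflightCommit() {
        return hasInflightCommit;
    }
}
```

If some code path read or wrote the flag without holding the monitor (for example a lock-free fast path), AtomicBoolean or a volatile field would still be needed for that access.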

github-actions bot removed the triage label Aug 21, 2025

3 participants