
@valeriy42
Contributor

Backports the following commits to 9.1:

Fixes an issue where calendar events failed to update some jobs when a calendar was associated with a large number of jobs (more than 1000), due to queue capacity limits and sequential processing.

Problem: UpdateJobProcessNotifier has a 1000-item queue and processes updates sequentially. It uses offer() on the queue, which silently drops updates when the queue is full.
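
Here is a minimal Java sketch of what offer() does on a full bounded queue. It is not code from this PR. It assumes the worst case, where nothing drains the queue while updates are being offered. ArrayBlockingQueue stands in for the notifier's 1000-item queue, and BoundedQueueDropDemo and countAccepted are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration, not the PR's code: a bounded queue with no consumer,
// standing in for the 1000-item UpdateJobProcessNotifier queue.
final class BoundedQueueDropDemo {

    // Offers `updates` items to a queue of `capacity` and returns how many were accepted.
    static int countAccepted(int capacity, int updates) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < updates; i++) {
            // offer() neither blocks nor throws on a full queue; it just returns false.
            if (queue.offer("update-job-" + i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        int accepted = countAccepted(1000, 1500);
        System.out.println("accepted=" + accepted + ", dropped=" + (1500 - accepted));
    }
}
```

Running main prints accepted=1000, dropped=500. Because offer() signals a full queue only through its boolean return value, the extra updates disappear without any exception unless the caller checks that value.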

Calendar and filter updates, however, don't need ordering guarantees, so JobManager.submitJobEventUpdate() can bypass the queue entirely and is no longer bottlenecked by its size.
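
The bypass can be sketched as follows, assuming each job update can be dispatched as an independent asynchronous task. DirectDispatchDemo, updateAll and the job-N ids are hypothetical, and the "update" only records the job id. The JDK fixed thread pool still buffers tasks in its own unbounded work queue, so no update is rejected for capacity reasons. Updates may complete in any order, which is acceptable because no ordering is needed.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: one async task per job, with no bounded intermediate queue.
// Completion order is not guaranteed, which is fine for calendar/filter updates.
final class DirectDispatchDemo {

    static Set<String> updateAll(int jobCount, ExecutorService pool) {
        Set<String> updated = ConcurrentHashMap.newKeySet();
        CompletableFuture<?>[] futures = new CompletableFuture<?>[jobCount];
        for (int i = 0; i < jobCount; i++) {
            String jobId = "job-" + i;
            // Stand-in for sending the calendar/filter update to one job's process.
            futures[i] = CompletableFuture.runAsync(() -> updated.add(jobId), pool);
        }
        // Waiting here only makes the demo checkable; a request path need not block.
        CompletableFuture.allOf(futures).join();
        return updated;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            System.out.println("updated=" + updateAll(1500, pool).size());
        } finally {
            pool.shutdown();
        }
    }
}
```

Running main prints updated=1500: every job receives its update, unlike the bounded-queue case.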

Another problem is the "fire-and-forget" pattern: submitJobEventUpdate() returned immediately without waiting for the update to complete. I use RefCountingListener to track the calendar updates. A background thread updates the jobs and tracks which ones succeeded, failed, or were skipped, while the request itself returns immediately to prevent a timeout.
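
The tracking idea can be sketched with plain JDK types. This is a hypothetical analogue, not Elasticsearch's RefCountingListener API. RefCountingTracker holds one reference for the caller plus one per job, and it reports the succeeded/failed/skipped counts only when the last reference is released. updateJob and its skip/failure rules are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical JDK-only analogue of reference counting; NOT Elasticsearch's RefCountingListener API.
final class RefCountingTracker implements AutoCloseable {
    enum Outcome { SUCCEEDED, FAILED, SKIPPED }

    record Summary(int succeeded, int failed, int skipped) {}

    private final AtomicInteger refs = new AtomicInteger(1); // the caller's own reference
    private final AtomicInteger succeeded = new AtomicInteger();
    private final AtomicInteger failed = new AtomicInteger();
    private final AtomicInteger skipped = new AtomicInteger();
    private final Consumer<Summary> onAllDone;

    RefCountingTracker(Consumer<Summary> onAllDone) {
        this.onAllDone = onAllDone;
    }

    // Takes one reference per job; the returned callback must be invoked exactly once.
    Consumer<Outcome> acquire() {
        refs.incrementAndGet();
        return outcome -> {
            switch (outcome) {
                case SUCCEEDED -> succeeded.incrementAndGet();
                case FAILED -> failed.incrementAndGet();
                case SKIPPED -> skipped.incrementAndGet();
            }
            release();
        };
    }

    // Drops the caller's reference; onAllDone fires after every acquired callback has also run.
    @Override
    public void close() {
        release();
    }

    private void release() {
        if (refs.decrementAndGet() == 0) {
            onAllDone.accept(new Summary(succeeded.get(), failed.get(), skipped.get()));
        }
    }
}

final class CalendarUpdateDemo {
    // Hypothetical per-job update with made-up skip/failure rules.
    static RefCountingTracker.Outcome updateJob(int jobId) {
        if (jobId % 100 == 0) return RefCountingTracker.Outcome.SKIPPED;
        if (jobId % 250 == 1) throw new IllegalStateException("job-" + jobId + " update failed");
        return RefCountingTracker.Outcome.SUCCEEDED;
    }

    // Returns immediately; the updates run in the background and the future completes later.
    static CompletableFuture<RefCountingTracker.Summary> submitCalendarUpdate(int jobCount, ExecutorService pool) {
        CompletableFuture<RefCountingTracker.Summary> result = new CompletableFuture<>();
        pool.execute(() -> {
            try (RefCountingTracker tracker = new RefCountingTracker(result::complete)) {
                for (int i = 0; i < jobCount; i++) {
                    int jobId = i;
                    Consumer<RefCountingTracker.Outcome> callback = tracker.acquire();
                    pool.execute(() -> {
                        RefCountingTracker.Outcome outcome;
                        try {
                            outcome = updateJob(jobId);
                        } catch (RuntimeException e) {
                            outcome = RefCountingTracker.Outcome.FAILED; // failures still release their reference
                        }
                        callback.accept(outcome);
                    });
                }
            }
        });
        return result;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        CompletableFuture<RefCountingTracker.Summary> pending = submitCalendarUpdate(1500, pool);
        System.out.println("request returned; summary: " + pending.join()); // join only for the demo
        pool.shutdown();
    }
}
```

submitCalendarUpdate hands back its future before any job has been updated, which mirrors the immediate response. Mapping an exception to FAILED matters here: a job that threw without releasing its reference would prevent the summary from ever being reported.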

Finally, in case failed job updates still occur, I enhanced the logging throughout the system so that future failures leave a trace for diagnostics.

- Refactor JobManager.submitJobEventUpdate() to bypass the UpdateJobProcessNotifier queue
- Use RefCountingListener for parallel calendar/filter updates
- Add comprehensive logging throughout the system
- Create CalendarScalabilityIT integration tests
- Add helper methods to the base test class
valeriy42 added the :ml, >bug, auto-merge-without-approval, backport, cloud-deploy, and Team:ML labels on Nov 13, 2025
elasticsearchmachine merged commit c569568 into elastic:9.1 on Nov 13, 2025
35 of 36 checks passed
valeriy42 deleted the backport/9.1/pr-136886 branch on November 13, 2025 at 11:28

Labels

auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!), backport, >bug, cloud-deploy (Publish cloud docker image for Cloud-First-Testing), :ml (Machine learning), Team:ML (Meta label for the ML team), v9.1.8
