fix: single statement lock-and-fetch unresolved task from first batch…#811

Open
strelchm wants to merge 1 commit into kagkarlsson:master from strelchm:single-stat-lock-and-fetch-unresolved-task-dead-state-fixed

@strelchm
Contributor

Added unpicking of unresolved tasks in single‑statement LOCK_AND_FETCH mode

When a task type is unresolved (e.g., not registered in the current scheduler instance because of a rolling update where new versions don't have the task yet), the compensation batch query unpicks those tasks so that other scheduler instances that do have the task type can process them. The FETCH and generic LOCK_AND_FETCH modes already behave this way out of the box.

Implementation Details

Compensation query in single statement LOCK_AND_FETCH mode:

  • Added a transaction. The first query is the existing SELECT FOR UPDATE SKIP LOCKED statement that picks a task. The second query unpicks any unresolved tasks. If there are no unresolved executions, the second query is skipped and the transaction is simply committed.
  • This allows other scheduler instances to pick up the unresolved tasks.
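
As a rough illustration of the flow above, here is a minimal in-memory sketch, not the code from this PR. The `Picked` record, the `knownTaskNames` set (standing in for the task resolver) and the `unpickBatch` callback (standing in for the second query) are all hypothetical names chosen for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;
import java.util.stream.Collectors;

final class LockAndFetchSketch {

    // Hypothetical stand-in for a picked execution row.
    record Picked(String taskName, String instanceId, long version) {}

    /**
     * Splits the executions picked by the first query into resolved and
     * unresolved ones. Unresolved ones are handed to unpickBatch (the
     * "second query"); the callback is skipped when there is nothing to unpick.
     */
    static List<Picked> returnResolvedAndUnpickRest(
            List<Picked> picked,
            Set<String> knownTaskNames,
            Consumer<List<Picked>> unpickBatch) {

        // partitioningBy always yields a map with both a TRUE and a FALSE key.
        Map<Boolean, List<Picked>> byUnresolved =
                picked.stream()
                        .collect(Collectors.partitioningBy(
                                p -> !knownTaskNames.contains(p.taskName())));

        List<Picked> unresolved = byUnresolved.get(Boolean.TRUE);
        if (!unresolved.isEmpty()) {
            unpickBatch.accept(unresolved);
        }
        return byUnresolved.get(Boolean.FALSE);
    }
}
```

In the real repository both steps would run inside one database transaction, so the pick and the unpick commit together or not at all.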

Unpick operation (unpickPickedBatch repository method):

  • Uses a batch update to mark tasks as unpicked (picked = false, with a version increment).
    If any of the tasks to unpick is not found or has a mismatched version, the method throws an exception.
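
A hedged sketch of what such a version-checked batch unpick could look like with plain JDBC follows. It is not the PR's `unpickPickedBatch` implementation: the SQL text, the table and column names, and the `Picked` record are assumptions made for the example, and the sketch treats any update count other than 1 as a failure.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class UnpickBatchSketch {

    // Hypothetical stand-in for a picked execution row.
    record Picked(String taskName, String instanceId, long version) {}

    // Assumed schema: the table and column names here are illustrative.
    static final String UNPICK_SQL =
            "update scheduled_tasks "
                    + "set picked = false, picked_by = null, version = version + 1 "
                    + "where task_name = ? and task_instance = ? and version = ?";

    /** Unpicks every execution in the batch, failing if any row was not updated. */
    static void unpickPickedBatch(Connection connection, List<Picked> batch)
            throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(UNPICK_SQL)) {
            for (Picked p : batch) {
                ps.setString(1, p.taskName());
                ps.setString(2, p.instanceId());
                ps.setLong(3, p.version()); // optimistic-lock check
                ps.addBatch();
            }
            verifyAllUpdated(ps.executeBatch(), batch.size());
        }
    }

    /**
     * Each statement must report exactly one updated row. A count of 0 means
     * the row is missing or its version changed; driver codes such as
     * Statement.SUCCESS_NO_INFO are also rejected because they cannot be verified.
     */
    static void verifyAllUpdated(int[] updateCounts, int expected) {
        if (updateCounts.length != expected) {
            throw new IllegalStateException(
                    "Expected " + expected + " results, got " + updateCounts.length);
        }
        for (int i = 0; i < updateCounts.length; i++) {
            if (updateCounts[i] != 1) {
                throw new IllegalStateException(
                        "Unpick failed for batch entry " + i
                                + " (update count " + updateCounts[i] + ")");
            }
        }
    }
}
```

Throwing inside the surrounding transaction would let the caller roll back, so a partially applied unpick is never committed.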

Tests Added

Several additional tests are added to TaskResolverTest for better coverage.

Fixes

#804

Reminders

  • Added/ran automated tests
  • Ran mvn spotless:apply

cc @kagkarlsson

}

private List<Execution> lockAndFetchSingleStatement(Instant now, int limit) {
return jdbcRunner.inTransaction(
Contributor Author

@strelchm strelchm Apr 18, 2026

Using a transaction instead of a single query is a trade-off. I think atomicity is the better choice here: the second query runs only in the rare case where unresolved tasks were picked, so the added cost should be limited to that second (batch) round trip.


@OptimumCode OptimumCode left a comment

@strelchm Thanks for making the PR! We are currently experiencing a problem with false-positive dead executions because the lock-and-fetch strategy does not unpick unresolved tasks (when they were grabbed by an old worker that does not yet have the task definition). And this PR is just what we need!

partitioningBy(
execution -> taskResolver.isUnresolved(execution.getTaskName())));

List<Execution> unresolvedExecutions = allCandidatesByIsUnresolved.get(true);

nitpick: We could use Boolean.TRUE to avoid unnecessary boxing here (and Boolean.FALSE below)

Contributor Author

@strelchm strelchm Apr 25, 2026

Good point, thanks! Fixed.

@strelchm strelchm force-pushed the single-stat-lock-and-fetch-unresolved-task-dead-state-fixed branch from d721c99 to df851ee Compare April 25, 2026 21:27
@strelchm strelchm requested a review from OptimumCode April 25, 2026 21:30

2 participants