Description
Bug report
Expected behavior and actual behavior
If a task is submitted to a Google Batch Spot instance and the job is preempted, Batch should retry it up to the value of google.batch.maxSpotAttempts. If a retried job completes, Nextflow should set the task status from the value recorded in .exitcode. Only if the number of retries is exceeded should Nextflow mark the task as failed with exit status 50001.
In practice, tasks are being marked as failed with exit status 50001 if the Batch job is preempted, no matter the outcome of the retry.
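To make the expected behavior concrete, here is a minimal Python sketch of the decision described above. It is not Nextflow's actual implementation: the function name `expected_exit_status` and the `retries_exhausted` flag are hypothetical, and the sketch assumes `.exitcode` holds a single integer written by the attempt that completed.

```python
from pathlib import Path
from typing import Optional

# Exit status cited in this report for tasks lost to spot preemption.
SPOT_PREEMPTION_EXIT = 50001


def expected_exit_status(work_dir: str, retries_exhausted: bool) -> Optional[int]:
    """Hypothetical helper: the exit status this report expects for a task.

    retries_exhausted is an assumed flag meaning Batch gave up after
    exceeding google.batch.maxSpotAttempts.
    """
    if retries_exhausted:
        # No attempt completed, so the preemption status applies.
        return SPOT_PREEMPTION_EXIT

    exit_file = Path(work_dir) / ".exitcode"
    if not exit_file.is_file():
        # Nothing recorded yet; the outcome is unknown.
        return None

    text = exit_file.read_text().strip()
    try:
        # A retry completed: its recorded exit code decides the outcome.
        return int(text)
    except ValueError:
        # Unreadable or empty file; treat the outcome as unknown.
        return None
```

Under this sketch, a task that was preempted once and then succeeded on retry (`.exitcode` contains 0, retries not exhausted) resolves to 0, which is the case this report finds being marked as 50001 instead.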
I believe this bug may have been introduced by #6498.
Steps to reproduce the problem
Submit a long-running task with the google-batch executor using spot instances and maxSpotAttempts > 0. If the job is preempted, the task fails with exit status 50001 regardless of the outcome of the retry.
Program output
Nextflow output (relevant line only):
Process `<omitted> (1a86b74b-cb31-4346-a030-fd30f11613a7_4)` terminated with an error exit status (50001)
Note that the value of .exitcode in the corresponding working directory is 0. Batch also reports that the job was preempted, retried, and succeeded. The job stdout also confirms that the job reached completion.
Environment
- Nextflow version: nextflow version 25.10.3.10983
- Java version: openjdk 17.0.15-internal 2025-04-15
- Operating system: Linux
- Bash version: GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)
Additional context
I've also commented on what may be a related issue: #6690