Proposal
I've asked in the forum and searched for a way to achieve this, but couldn't find one: it would be useful to be able to suspend allocation rescheduling for a job (and then be able to mark it for rescheduling again later).
Use-cases
My main use case is this: suppose you run a 3 (or more) node service for high availability (common examples would be a PostgreSQL replica, an Elasticsearch cluster, a MongoDB replica set, etc.), each allocation with its own CSI volume (per_alloc=true). Now I need to do some maintenance on the storage (for example, migrate the data to a new volume). I would like to do it one allocation at a time, so the service stays up the whole time. For the last allocation it's easy: I can scale the group down, do my maintenance, then scale it back up. But for the first two allocations there's no way: if I stop alloc 0, it is immediately rescheduled, so I have to take the service down.
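For reference, the setup described above looks roughly like this (a minimal sketch; the job, group, and volume names are hypothetical):

```hcl
job "db" {
  group "db" {
    count = 3

    # With per_alloc = true, each allocation claims its own volume:
    # alloc 0 claims "db-data[0]", alloc 1 claims "db-data[1]", etc.
    volume "data" {
      type            = "csi"
      source          = "db-data"
      per_alloc       = true
      attachment_mode = "file-system"
      access_mode     = "single-node-writer"
    }

    task "db" {
      driver = "docker"

      volume_mount {
        volume      = "data"
        destination = "/data"
      }

      config {
        image = "postgres:15"
      }
    }
  }
}
```

Scaling the group down (e.g. `nomad job scale db db 2`) only ever removes the highest-indexed allocation, which is why the scale-down workaround works for the last allocation but not for alloc 0 or 1.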
Attempted Solutions
I tried running a maintenance job with access to the volume of alloc-0 of such a job: as expected, the maintenance job waits for a claim on the volume. Then I stop alloc-0, hoping my maintenance job will acquire the claim and block the real alloc-0 from starting again. But it doesn't work: the "real" alloc-0 always wins the race and gets the claim on the volume.
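The attempted maintenance job was along these lines (a sketch, under the assumption that a per-alloc volume can be claimed by its indexed ID, e.g. `db-data[0]`; all names are hypothetical):

```hcl
job "maintenance" {
  type = "batch"

  group "migrate" {
    # Claim the specific volume normally held by alloc 0 of the "db" job.
    volume "data" {
      type            = "csi"
      source          = "db-data[0]"
      attachment_mode = "file-system"
      access_mode     = "single-node-writer"
    }

    task "migrate" {
      driver = "docker"

      volume_mount {
        volume      = "data"
        destination = "/data"
      }

      config {
        image = "alpine:3"
        args  = ["sh", "-c", "echo 'do the data migration here'"]
      }
    }
  }
}
```

Even with this job pending on the claim, stopping alloc-0 triggers an immediate reschedule, and the rescheduled alloc-0 reacquires the volume before the maintenance job can.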