Proposal
I've asked in the forum and searched for a way to achieve this, but couldn't find one: it would be useful to be able to suspend allocation rescheduling for a job (and then be able to mark it for rescheduling again later).
Use-cases
My main use case is this: suppose you run a 3 (or more) node service for high availability (common examples would be a PostgreSQL replica, an Elasticsearch cluster, a MongoDB replica set, etc.), each allocation with its own CSI volume (per_alloc=true). Now I need to do some maintenance on the storage (for example, migrate the data to a new volume). I would like to do it one allocation at a time, so the service stays up the whole time. For the last allocation it's easy: I can scale the group down, do my maintenance, then scale it back up. But for the first two allocations there's no way: if I stop alloc 0, it is immediately rescheduled, so I have to take the service down.
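For reference, the setup described above looks roughly like this (a minimal sketch; the job, group, and volume names are hypothetical):

```hcl
job "db" {
  group "db" {
    count = 3

    # With per_alloc = true, each allocation claims its own volume:
    # alloc 0 claims "db-data[0]", alloc 1 claims "db-data[1]", etc.
    volume "data" {
      type            = "csi"
      source          = "db-data"
      per_alloc       = true
      attachment_mode = "file-system"
      access_mode     = "single-node-writer"
    }

    task "db" {
      driver = "docker"

      volume_mount {
        volume      = "data"
        destination = "/data"
      }

      config {
        image = "postgres:15"
      }
    }
  }
}
```

Scaling the group down (e.g. `nomad job scale db db 2`) only ever removes the highest-indexed allocation, which is why the scale-down workaround works for the last allocation but not for alloc 0 or 1.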
Attempted Solutions
I tried running a maintenance job with access to the volume of alloc-0 of such a job: as expected, the maintenance job waits for a claim on the volume. Then I stop alloc-0, hoping my maintenance job will acquire the claim and block the real alloc-0 from starting again. But it doesn't work: the "real" alloc-0 always wins the race and gets the claim on the volume.
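The attempted maintenance job was along these lines (a sketch, under the assumption that a per-alloc volume can be claimed by its indexed ID, e.g. `db-data[0]`; all names are hypothetical):

```hcl
job "maintenance" {
  type = "batch"

  group "migrate" {
    # Claim the specific volume normally held by alloc 0 of the "db" job.
    volume "data" {
      type            = "csi"
      source          = "db-data[0]"
      attachment_mode = "file-system"
      access_mode     = "single-node-writer"
    }

    task "migrate" {
      driver = "docker"

      volume_mount {
        volume      = "data"
        destination = "/data"
      }

      config {
        image = "alpine:3"
        args  = ["sh", "-c", "echo 'do the data migration here'"]
      }
    }
  }
}
```

Even with this job pending on the claim, stopping alloc-0 triggers an immediate reschedule, and the rescheduled alloc-0 reacquires the volume before the maintenance job can.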