Ability to suspend allocation reschedule #25339

Open
dani opened this issue Mar 11, 2025 · 0 comments

Comments


dani commented Mar 11, 2025

Proposal

I've asked in the forum and searched for a way to achieve this, but couldn't find one: it would be useful to be able to suspend allocation rescheduling for a job (and later mark it as eligible for rescheduling again).

Use-cases

My main use case is this: suppose you have a service with 3 (or more) nodes for high availability (common examples would be a Postgres replica, an Elasticsearch cluster, a MongoDB replica set, etc.), with each allocation using its own CSI volume (per_alloc=true). Now I need to perform a maintenance task on the storage (for example, migrating the data to a new volume). I would like to do it one allocation at a time, so the service stays up the whole time. For the last allocation it's easy: I can scale the group down, do my maintenance, then scale it back up. But for the first two allocations there's no way to do this: if I stop alloc 0, it is immediately rescheduled. So you have to take the whole service down.
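The asymmetry between the last allocation and the others can be illustrated with a toy Python model (not Nomad code). It relies on per_alloc appending the allocation index in brackets to the volume source (`pgdata` becomes `pgdata[0]`, `pgdata[1]`, ...) and, as an explicit assumption, on a scale-down stopping the highest indexes first. The volume name `pgdata` and all helper functions are hypothetical.

```python
# Toy model (not Nomad code) of per_alloc volumes and scale-downs.


def per_alloc_volume(source: str, alloc_index: int) -> str:
    """Volume ID claimed by the allocation with this index when per_alloc = true."""
    return f"{source}[{alloc_index}]"


def volumes_for_group(source: str, count: int) -> list[str]:
    """Volume IDs claimed by a group with `count` allocations (indexes 0..count-1)."""
    return [per_alloc_volume(source, i) for i in range(count)]


def stopped_to_free(index: int, count: int) -> list[int]:
    """Allocation indexes stopped when scaling down just far enough to free `index`.

    ASSUMPTION: a scale-down stops the highest indexes first, so the group must
    be scaled to `index`, which stops every allocation from `index` upwards.
    """
    if not 0 <= index < count:
        raise ValueError("index must be in range(count)")
    return list(range(index, count))


if __name__ == "__main__":
    print(volumes_for_group("pgdata", 3))  # ['pgdata[0]', 'pgdata[1]', 'pgdata[2]']
    print(stopped_to_free(2, 3))           # [2]: only the last alloc goes down
    print(stopped_to_free(0, 3))           # [0, 1, 2]: the whole service goes down
```

Under these assumptions, scaling is only a usable maintenance tool for the highest index; freeing `pgdata[0]` means stopping every allocation.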

Attempted Solutions

I tried running a maintenance job with access to the volume of alloc-0 of such a job: as expected, the maintenance job waits for a claim on the volume. I then stopped alloc-0 of the main job, hoping my maintenance job would acquire the claim and prevent the real alloc-0 from starting again. But it doesn't work: the "real" alloc-0 always wins the race and gets the claim on the volume.
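The race, and what the proposed suspension would change, can be sketched with a hypothetical toy model. None of this is Nomad's API: `Group`, `suspended`, `reconcile` and `run_maintenance` are illustrative names, and the model assumes the scheduler places the replacement (and claims the freed volume) before any waiting job, as observed above. Suspension is modelled per allocation index; suspending a whole job, as the proposal phrases it, would amount to suspending every index.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """Toy task group whose allocations each own a per_alloc volume."""
    source: str                                      # volume source, e.g. "pgdata"
    count: int                                       # desired allocation count
    running: set[int] = field(default_factory=set)   # indexes with a live alloc
    # HYPOTHETICAL: indexes whose rescheduling is suspended (the proposed feature).
    suspended: set[int] = field(default_factory=set)

    def volume(self, index: int) -> str:
        return f"{self.source}[{index}]"

    def reconcile(self, claims: dict[str, str]) -> None:
        """Place a replacement for every missing, non-suspended index.

        The replacement claims its free volume immediately, modelling the
        scheduler winning the race against a waiting maintenance job.
        """
        for i in range(self.count):
            if i in self.running or i in self.suspended:
                continue
            vol = self.volume(i)
            if vol not in claims:
                claims[vol] = f"alloc-{i}"
                self.running.add(i)


def run_maintenance(group: Group, claims: dict[str, str], index: int) -> str:
    """Stop one alloc, let the scheduler react, then try to claim its volume.

    Returns the holder of the volume claim afterwards.
    """
    vol = group.volume(index)
    group.running.discard(index)            # stop the allocation...
    claims.pop(vol, None)                   # ...which releases its volume claim
    group.reconcile(claims)                 # the scheduler reacts first
    claims.setdefault(vol, "maintenance")   # maintenance job gets it only if still free
    return claims[vol]


if __name__ == "__main__":
    def fresh(suspended: set[int]) -> tuple[Group, dict[str, str]]:
        g = Group("pgdata", 3, running={0, 1, 2}, suspended=suspended)
        return g, {g.volume(i): f"alloc-{i}" for i in range(3)}

    g, claims = fresh(set())
    print(run_maintenance(g, claims, 0))    # alloc-0: the race is lost

    g, claims = fresh({0})
    print(run_maintenance(g, claims, 0))    # maintenance: index 0 is not rescheduled

    # Maintenance done: release the claim and resume rescheduling for index 0.
    del claims[g.volume(0)]
    g.suspended.discard(0)
    g.reconcile(claims)
    print(claims[g.volume(0)])              # alloc-0
```

The last three steps mirror the second half of the request: once maintenance is finished, marking the allocation as eligible for rescheduling again lets the scheduler bring it back on its (migrated) volume.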
