emadolsky

Problems such as an exceeded resource quota can leave Status.Replicas lower than Spec.Replicas because some pods cannot be created. The rollout then continues with other StatefulSets anyway, which is unsafe and results in unavailability.

Collaborator

@pracucci pracucci left a comment

I don't think this change is correct. The code you're touching is used to check whether there are any not-ready pods. We should read the number of pods from the actual state (status), not the desired one (spec).

@pracucci
Collaborator

I suggest you open an issue describing your specific problem, so we can better understand how to address it.

@emadolsky
Author

The problem with the code is that when status.Replicas is less than spec.Replicas, there is no logic to prevent the rollout from continuing with other StatefulSets. Do you suggest adding another function that checks whether the desired number of pods is already in place for a given StatefulSet? If so, I totally agree that it is a better approach. If not, let me know if you need more details on the issue.
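To make the gap concrete, here is a minimal sketch of how a status-only check can miss pods that were never created. It is not the operator's actual code: `replicaCounts`, `hasNotReadyPods`, and `hasMissingPods` are hypothetical names, and the struct is a simplified stand-in for the StatefulSet's spec.replicas, status.replicas, and status.readyReplicas fields.

```go
package main

import "fmt"

// replicaCounts is a hypothetical, simplified stand-in for the replica
// counters of a StatefulSet.
type replicaCounts struct {
	Spec   int32 // desired pods (spec.replicas)
	Status int32 // pods actually created (status.replicas)
	Ready  int32 // created pods that are Ready (status.readyReplicas)
}

// hasNotReadyPods only looks at the actual state: it reports true when
// some created pod is not Ready. Assumed shape, not the real implementation.
func hasNotReadyPods(c replicaCounts) bool {
	return c.Ready < c.Status
}

// hasMissingPods is the hypothetical additional check discussed above:
// it reports true when fewer pods exist than the spec asks for.
func hasMissingPods(c replicaCounts) bool {
	return c.Status < c.Spec
}

func main() {
	// Quota exceeded: 3 pods desired, only 2 could be created, both Ready.
	quotaExceeded := replicaCounts{Spec: 3, Status: 2, Ready: 2}

	fmt.Println(hasNotReadyPods(quotaExceeded)) // false: the status-only check sees nothing wrong
	fmt.Println(hasMissingPods(quotaExceeded))  // true: one desired pod was never created
}
```

Under these assumptions, a rollout that gates only on the first check would move on to the next StatefulSet even though one pod is missing.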

@emadolsky
Author

I am not getting any responses here or on the issue.
@charleskorn Could you maybe take a look?

@tcp13equals2
Contributor

Hey @emadolsky ... I have been working on a custom pod eviction handler which required me to also look for unready pods.

After discussions with @charleskorn, we went with max(Status.Replicas, Spec.Replicas) as the expected pod count, and compare it to the number of observed pods and their status.

The reason we went with using the max was to err on the side of caution in determining if there were any pods not yet reporting under the StatefulSet. The max approach also deals with both an upscale and downscale scenario.

But this is for the pod eviction logic where we are looking for any sort of disruption.

I wonder if something similar to this could be used to address this issue?
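The max-based comparison described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the eviction handler's code: the `pod` type, the function names, and the `ingester-N` pod names are all hypothetical.

```go
package main

import "fmt"

// pod is a hypothetical, simplified view of an observed pod.
type pod struct {
	Name  string
	Ready bool
}

// expectedPods returns max(spec.replicas, status.replicas), so the check
// errs on the side of caution during both upscales and downscales.
func expectedPods(specReplicas, statusReplicas int32) int {
	if specReplicas > statusReplicas {
		return int(specReplicas)
	}
	return int(statusReplicas)
}

// hasMissingOrNotReadyPods reports true when fewer Ready pods are observed
// than the expected count.
func hasMissingOrNotReadyPods(specReplicas, statusReplicas int32, observed []pod) bool {
	ready := 0
	for _, p := range observed {
		if p.Ready {
			ready++
		}
	}
	return ready < expectedPods(specReplicas, statusReplicas)
}

func main() {
	// Upscale blocked by quota: 3 desired, 2 created, both Ready.
	blocked := []pod{{"ingester-0", true}, {"ingester-1", true}}
	fmt.Println(hasMissingOrNotReadyPods(3, 2, blocked)) // true

	// Downscale requested but not yet applied: spec 2, status 3, all 3 Ready.
	scalingDown := []pod{{"ingester-0", true}, {"ingester-1", true}, {"ingester-2", true}}
	fmt.Println(hasMissingOrNotReadyPods(2, 3, scalingDown)) // false
}
```

Taking the larger of the two counts means a pod that has not been created yet (upscale) and a pod that is already terminating (downscale) both count as a disruption.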

@emadolsky
Author

Hey @tcp13equals2! The approach you mentioned would also fix the issue here. But I'm not sure it works well when reducing the replicas of a StatefulSet: when spec.Replicas is reduced, status.Replicas stays the same, so the function here returns false. Given the function's name (hasStatefulSetNotReadyPods), that is correct, but we want a StatefulSet to reach a consistent final state before moving to the next one. So it makes sense to wait until there are exactly spec.Replicas ready replicas before moving on.
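One way to express that "consistent final state" condition is sketched below. The `replicaCounts` type and `isSettled` function are hypothetical names used only for illustration, and the exact condition is an assumption about what the check should require.

```go
package main

import "fmt"

// replicaCounts is a hypothetical stand-in for spec.replicas,
// status.replicas, and status.readyReplicas of a StatefulSet.
type replicaCounts struct {
	Spec   int32
	Status int32
	Ready  int32
}

// isSettled reports whether the StatefulSet has exactly the desired number
// of pods and all of them are Ready. Assumed condition, for illustration.
func isSettled(c replicaCounts) bool {
	return c.Status == c.Spec && c.Ready == c.Spec
}

func main() {
	// Downscale requested: spec already 2, status still reports 3 Ready pods.
	fmt.Println(isSettled(replicaCounts{Spec: 2, Status: 3, Ready: 3})) // false

	// Downscale finished: exactly 2 pods, both Ready.
	fmt.Println(isSettled(replicaCounts{Spec: 2, Status: 2, Ready: 2})) // true
}
```

Unlike a not-ready check, this condition also holds the rollout back while a downscale is still in progress.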

@emadolsky emadolsky requested a review from pracucci August 19, 2025 12:29