Exceeded quota causes the rollout to continue despite an incomplete rollout of one StatefulSet #246

@emadolsky

The issue is a logical error in how the rollout-operator decides that a StatefulSet is fully ready during an update.

In the code, hasStatefulSetNotReadyPods only checks whether all the pods that currently exist for a StatefulSet are ready (first by checking the counts in Status, then by listing the pods and checking each pod's status). This works when every pod of the StatefulSet can be created, but it breaks when some cannot (e.g. when there isn't enough quota for them): even if some replicas of the StatefulSet were never created, it considers the StatefulSet successfully rolled out and proceeds to the next one.
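A minimal sketch of why a count comparison based on Status misses uncreated replicas. It is written in Go, the language the rollout-operator is written in, but the type stsCounts and the function notReadyByStatus are hypothetical simplifications, not the operator's actual code. The pod-listing step is omitted because it also only sees pods that exist. The only real-API fact relied on is that Status.Replicas counts pods created by the StatefulSet controller, not desired replicas.

```go
package main

import "fmt"

// stsCounts is a hypothetical, simplified stand-in for the replica counts of an
// appsv1.StatefulSet; it is not the real Kubernetes type.
type stsCounts struct {
	SpecReplicas        int32 // desired replicas (Spec.Replicas)
	StatusReplicas      int32 // pods created by the StatefulSet controller (Status.Replicas)
	StatusReadyReplicas int32 // created pods that are ready (Status.ReadyReplicas)
}

// notReadyByStatus is a hypothetical sketch of the Status-based comparison:
// it compares created pods with ready pods, so a replica that was never
// created is not counted as "not ready".
func notReadyByStatus(s stsCounts) bool {
	return s.StatusReplicas != s.StatusReadyReplicas
}

func main() {
	// sts1 in the example: 3 replicas desired, only 2 pods created
	// because of the quota, and both created pods are ready.
	sts1 := stsCounts{SpecReplicas: 3, StatusReplicas: 2, StatusReadyReplicas: 2}
	fmt.Println(notReadyByStatus(sts1)) // false, so the rollout moves on
}
```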

I'll demonstrate this with an example that can be used to reproduce the issue:

sts1, sts2 and sts3 are three StatefulSets in one rollout group, each with 3 replicas. They are already up to date and all their pods are ready.
Each replica has a cpu.limit of 6 cores, and the namespace has a hard quota of cpu.limit=56 (at this point cpu.limit used = 9 × 6 = 54).

Now the StatefulSets are updated to raise each replica's cpu.limit to 7 cores.

The rollout-operator starts by updating sts1 and deletes the pods of sts1, which frees 3 × 6 = 18 cores (cpu.limit used drops to 36). sts1-0 and sts1-1 are recreated, but sts1-2 cannot be: after the first two, cpu.limit used is 36 + 7 + 7 = 50, and creating sts1-2 needs another 7, giving 57 > 56.
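The quota arithmetic above can be checked with a small simulation. This sketch assumes, as in the example, that all three sts1 pods are deleted before any are recreated and that only existing pods count against the quota; fitsQuota and recreateSts1 are hypothetical helpers, not Kubernetes or rollout-operator APIs.

```go
package main

import "fmt"

const (
	quotaHard    = 56 // namespace hard limit for cpu.limit, in cores
	oldLimit     = 6  // cpu.limit per pod before the update
	newLimit     = 7  // cpu.limit per pod after the update
	replicas     = 3  // replicas per StatefulSet
	statefulSets = 3  // sts1, sts2, sts3
)

// fitsQuota reports whether a pod needing `request` cores can be admitted
// when `used` cores already count against a hard limit of `hard`.
func fitsQuota(used, request, hard int) bool {
	return used+request <= hard
}

// recreateSts1 simulates recreating sts1's pods in order at newLimit cores.
// It returns how many pods were admitted and the final cpu.limit usage.
func recreateSts1(used int) (created, finalUsed int) {
	for i := 0; i < replicas; i++ {
		if !fitsQuota(used, newLimit, quotaHard) {
			fmt.Printf("sts1-%d rejected: %d + %d = %d > %d\n", i, used, newLimit, used+newLimit, quotaHard)
			continue
		}
		used += newLimit
		created++
		fmt.Printf("sts1-%d created, used = %d\n", i, used)
	}
	return created, used
}

func main() {
	used := statefulSets * replicas * oldLimit // 54 before the update
	fmt.Println("used before update:", used)
	used -= replicas * oldLimit // sts1's three pods deleted: 36
	created, final := recreateSts1(used)
	fmt.Printf("%d of %d sts1 pods created, used = %d\n", created, replicas, final)
}
```

Running it prints that sts1-0 and sts1-1 are created (used reaches 43, then 50) and that sts1-2 is rejected because 50 + 7 = 57 > 56.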

Once sts1-0 and sts1-1 are ready, the rollout-operator considers the rollout of sts1 finished and moves on to the next StatefulSet, which is clearly wrong since sts1-2 was never created.

My proposed fix is to use Spec.Replicas instead of Status.Replicas in hasStatefulSetNotReadyPods, so that it returns true when some pods have not been created: a pod that was never created is still a pod that is not ready. The related PR is #245.
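A minimal sketch of the proposed comparison, under the same simplifying assumptions as before: stsReplicaCounts, desiredReplicas and notReadyBySpec are hypothetical names, not the code from #245. The only real-API details relied on are that Spec.Replicas is a *int32 in appsv1.StatefulSet and that Kubernetes treats an unset value as 1.

```go
package main

import "fmt"

// stsReplicaCounts is a hypothetical, simplified stand-in for an
// appsv1.StatefulSet; Spec.Replicas is a *int32 in the real API.
type stsReplicaCounts struct {
	SpecReplicas        *int32 // desired replicas; nil means the Kubernetes default of 1
	StatusReplicas      int32  // pods created by the StatefulSet controller
	StatusReadyReplicas int32  // created pods that are ready
}

// desiredReplicas applies the Kubernetes default of 1 when Spec.Replicas is unset.
func desiredReplicas(s stsReplicaCounts) int32 {
	if s.SpecReplicas == nil {
		return 1
	}
	return *s.SpecReplicas
}

// notReadyBySpec sketches the proposed check: compare ready pods with the
// desired replica count, so replicas that were never created count as not ready.
func notReadyBySpec(s stsReplicaCounts) bool {
	return desiredReplicas(s) != s.StatusReadyReplicas
}

func main() {
	three := int32(3)
	sts1 := stsReplicaCounts{SpecReplicas: &three, StatusReplicas: 2, StatusReadyReplicas: 2}
	fmt.Println(notReadyBySpec(sts1)) // true: the rollout waits on sts1
}
```

Comparing with != rather than < also makes the check report "not ready" while a scale-down still has surplus ready pods; whether that is desirable is a choice for the actual fix.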
