Description
There is a logical error in determining the full readiness of a StatefulSet during updates.
In the code, hasStatefulSetNotReadyPods only checks whether all the pods that were created for a StatefulSet are ready (by checking its Status, and then listing the pods and checking their status). This logic works fine when all the pods of a StatefulSet can be created successfully, but when they cannot (e.g. when there is not enough quota for the pods), it misbehaves: even though some replicas of a StatefulSet were never created, that StatefulSet is considered successfully rolled out and the operator proceeds with the next StatefulSet.
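A minimal sketch of why a Status-based readiness check passes even when replicas were never created. The struct here is a simplified stand-in for the Kubernetes appsv1.StatefulSetStatus type, and statusBasedCheck is an illustrative reduction of the flawed logic, not the operator's actual code:

```go
package main

import "fmt"

// Simplified stand-in for appsv1.StatefulSetStatus. Field names follow the
// Kubernetes API, but this is a sketch, not the operator's real types.
type StatefulSetStatus struct {
	Replicas      int32 // pods actually created by the controller
	ReadyReplicas int32 // created pods that are Ready
}

// statusBasedCheck mirrors the flawed logic: it compares ReadyReplicas only
// against the pods that were created (Status.Replicas), so replicas that were
// never created are invisible to it.
func statusBasedCheck(status StatefulSetStatus) bool {
	// true means "some created pod is not ready yet"
	return status.ReadyReplicas < status.Replicas
}

func main() {
	// sts1 wants 3 replicas, but only 2 could be created due to quota;
	// both created pods are Ready, so the check reports nothing wrong.
	status := StatefulSetStatus{Replicas: 2, ReadyReplicas: 2}
	fmt.Println(statusBasedCheck(status)) // false: rollout wrongly considered done
}
```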
The following example can be used to reproduce the issue:
sts1, sts2, and sts3 are three StatefulSets in a rollout group, each with 3 replicas. They are already up to date and all ready.
Each replica has a cpu.limit of 6 cores, and the namespace has a hard quota of cpu.limit=56 (at this point, cpu.limit used is 54).
Now, there is an update to the StatefulSets that increases their cpu.limit to 7.
rollout-operator starts with sts1 and deletes its pods. sts1-0 and sts1-1 are created, but sts1-2 cannot be (cpu.limit used is now 50, and creating sts1-2 needs 7 more cores: 50 + 7 = 57 > 56).
Now, once sts1-0 and sts1-1 are ready, rollout-operator considers the rollout of sts1 finished and moves on to the next StatefulSet, which is incorrect behaviour.
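The quota arithmetic in the example above can be checked step by step (a sketch; the numbers are taken directly from the scenario described):

```go
package main

import "fmt"

func main() {
	const quota = 56 // namespace hard limit on cpu.limit

	// Before the update: 3 StatefulSets x 3 replicas x 6 cores each.
	used := 9 * 6
	fmt.Println(used, used <= quota) // 54, within quota

	// rollout-operator deletes sts1's 3 pods (6 cores each).
	used -= 3 * 6 // 54 - 18 = 36

	// sts1-0 and sts1-1 are recreated at the new 7-core limit.
	used += 2 * 7 // 36 + 14 = 50
	fmt.Println(used)

	// sts1-2 would need 7 more cores, which exceeds the quota.
	fmt.Println(used+7, used+7 <= quota) // 57 > 56, so creation is rejected
}
```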
My proposal for the fix is to use Spec.Replicas instead of Status.Replicas in hasStatefulSetNotReadyPods, so that it returns true when some pods have not been created: a pod that was never created is still a not-ready pod. The related PR is #245.