Karpenter does not respect matchLabelKeys in topologySpreadConstraints #1569
Labels
kind/bug
needs-triage
Description
Observed Behavior:
Performing a rolling update on a Deployment with `topologySpreadConstraints` configured. Karpenter has calculated that a new pod should be scheduled to a particular node, but that node violates the spread constraints.
The pod was only successfully scheduled once I manually freed space on a node in the correct zone.
I had a quick look over the topology code and it doesn't look like any of the tests take into account the `matchLabelKeys` field. Looking at the state of my cluster, it does appear that Karpenter was trying to schedule into the zone that would reduce skew based on all pods across the 2 ReplicaSets, rather than only the pods matching the `pod-template-hash` label.
Here's what my spread constraints look like:
Pod Events
Karpenter logs (they're in JSON and logged to Loki, I can pull the raw logs if you need them though):
The relevant node zones:
Pod distribution for only the new ReplicaSet. As you can see, eu-west-1b is where the pending pod needs to be scheduled:
Pod distribution for the deployment as a whole, where it looks like 1c is the correct zone
This is a little annoying to reproduce because (afaik) the ReplicaSet controller picks a random pod to terminate from the scaling-down RS, so you do not always get into this scenario where the skew across all pods is different from the skew on the scaling-up RS.
Expected Behavior:
Correctly determine which node(s) a pod can be scheduled to, taking into account the `matchLabelKeys` in `topologySpreadConstraints`.
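To make the difference concrete, here is a rough sketch of the skew check with and without `matchLabelKeys`. This is not Karpenter's code, just a small Python model of the documented kube-scheduler semantics: the incoming pod's values for the listed keys are ANDed into the label selector before pods are counted per zone. The pod counts, the `app: demo` label, the hash values and all helper names below are made up for illustration and are not the actual distribution from my cluster.

```python
# Toy model of a zonal topology spread check (whenUnsatisfiable: DoNotSchedule).
# All names and numbers here are hypothetical, chosen only to mirror the report.

def effective_selector(selector, pod_labels, match_label_keys):
    """AND the incoming pod's values for match_label_keys into the selector."""
    merged = dict(selector)
    for key in match_label_keys:
        if key in pod_labels:  # keys missing on the incoming pod are ignored
            merged[key] = pod_labels[key]
    return merged

def zone_counts(pods, selector, zones):
    """Count pods per zone whose labels satisfy every selector entry."""
    counts = {zone: 0 for zone in zones}
    for p in pods:
        if all(p["labels"].get(k) == v for k, v in selector.items()):
            counts[p["zone"]] += 1
    return counts

def allowed_zones(counts, max_skew):
    """Zones where adding one matching pod keeps the skew within max_skew."""
    global_min = min(counts.values())
    return sorted(z for z, c in counts.items() if c + 1 - global_min <= max_skew)

def pod(zone, template_hash):
    return {"zone": zone, "labels": {"app": "demo", "pod-template-hash": template_hash}}

OLD, NEW = "old-hash", "new-hash"
zones = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
pods = [
    pod("eu-west-1a", OLD), pod("eu-west-1b", OLD), pod("eu-west-1b", OLD),
    pod("eu-west-1a", NEW), pod("eu-west-1c", NEW),
]
incoming_labels = {"app": "demo", "pod-template-hash": NEW}
selector = {"app": "demo"}

# Ignoring matchLabelKeys: every pod of the Deployment is counted.
ignoring = allowed_zones(
    zone_counts(pods, effective_selector(selector, incoming_labels, []), zones), 1)

# Honouring matchLabelKeys: only pods from the new ReplicaSet are counted.
honouring = allowed_zones(
    zone_counts(pods, effective_selector(selector, incoming_labels, ["pod-template-hash"]), zones), 1)

print(ignoring)   # ['eu-west-1c']
print(honouring)  # ['eu-west-1b']
```

With these made-up numbers the two calculations disagree in the same way as in my cluster: counting all pods points at 1c, while counting only the new ReplicaSet's pods leaves 1b as the only zone that satisfies `maxSkew: 1`.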
Reproduction Steps (Please include YAML):
Create a Deployment with `topologySpreadConstraints` using `matchLabelKeys`.
Perform a rolling restart.
Hope the ReplicaSet controller randomly kills pods from the old RS such that the overall skew indicates scheduling to the wrong zone compared to the skew on the new RS...
Versions:
Kubernetes Version (`kubectl version`): 1.30