Karpenter does not respect matchLabelKeys in topologySpreadConstraints #1569

Closed
hamishforbes opened this issue Aug 12, 2024 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@hamishforbes

Description

Observed Behavior:

Performing a rolling update on a Deployment with topologySpreadConstraints configured.
Karpenter calculated that a new pod should be scheduled to a particular node, but that node violates the spread constraints.

The pod was only successfully scheduled once I manually freed space on a node in the correct zone.

I had a quick look over the topology code, and it doesn't look like any of the tests take the matchLabelKeys field into account.

Looking at the state of my cluster, it appears that Karpenter was trying to schedule into the zone that would reduce skew across all pods from both ReplicaSets, rather than only the pods matching the pod-template-hash label (see the note after the YAML below for what the effective selector should be).

Here's what my spread constraints look like:

spec:
  replicas: 6
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        name: scala-api-web
    matchLabelKeys:
    - pod-template-hash
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
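
For reference, when matchLabelKeys is honored, the scheduler looks up the listed keys on the incoming pod and ANDs those key/value pairs into the constraint's labelSelector, so skew is only calculated over pods from the same ReplicaSet. For the pending pod above (hash 59ccdf6896, taken from the pod name), the effective selector should be roughly:

labelSelector:
  matchLabels:
    name: scala-api-web
    pod-template-hash: "59ccdf6896"

i.e. only pods from the new ReplicaSet should count towards the skew that Karpenter needs to respect.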

Pod Events

> k describe pod scala-api-web-59ccdf6896-m8xck
  Normal   Nominated         115s (x2 over 4m5s)    karpenter          Pod should schedule on: nodeclaim/default-2hdfm, node/ip-172-30-137-27.eu-west-1.compute.internal
  Warning  FailedScheduling  2s (x8 over 12m)       default-scheduler  0/29 nodes are available: 1 node(s) didn't match pod topology spread constraints, 27 Insufficient memory, 28 Insufficient cpu. preemption: 0/29 nodes are available: 1 node(s) didn't satisfy existing pods anti-affinity rules, 14 Insufficient cpu, 23 Insufficient memory, 3 node(s) didn't match pod topology spread constraints.
  Normal   Scheduled         6s                     default-scheduler  Successfully assigned alpha/scala-api-web-59ccdf6896-m8xck to ip-172-30-119-255.eu-west-1.compute.internal

Karpenter logs (they're in JSON and logged to Loki; I can pull the raw logs if you need them):
[screenshot: Karpenter logs, 2024-08-13 10:47:43]

The relevant node zones:

> k get node -o=custom-columns="Name:.metadata.name,Zone:.metadata.labels.topology\.kubernetes\.io/zone"
Name                                          Zone
ip-172-30-137-27.eu-west-1.compute.internal   eu-west-1c
ip-172-30-119-255.eu-west-1.compute.internal  eu-west-1b

Pod distribution for only the new ReplicaSet; as you can see, eu-west-1b is where the pending pod needs to be scheduled:
[screenshot: pod distribution for the new ReplicaSet, 2024-08-13 10:50:11]

Pod distribution for the Deployment as a whole, where it looks like eu-west-1c is the correct zone:
[screenshot: pod distribution for the whole Deployment, 2024-08-13 10:50:07]

This is a little annoying to reproduce because (as far as I know) the ReplicaSet controller picks a random pod to terminate from the scaling-down ReplicaSet, so you do not always end up in this scenario where the skew across all pods differs from the skew on the scaling-up ReplicaSet.
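
To make that concrete, here is a hypothetical mid-rollout distribution (made-up numbers, not from this cluster) for a 6-replica Deployment:

zone         new RS pods   old RS pods   all pods
eu-west-1a   2             0             2
eu-west-1b   1             2             3
eu-west-1c   2             0             2

Spreading over all pods suggests eu-west-1a or eu-west-1c as the least-loaded zones, but spreading over only the new ReplicaSet (which is what matchLabelKeys asks for) requires the next pod to land in eu-west-1b to keep maxSkew at 1.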

Expected Behavior:

Correctly determine which node(s) a pod can be scheduled to, taking into account matchLabelKeys in topologySpreadConstraints.

Reproduction Steps (Please include YAML):
Create a Deployment with topologySpreadConstraints using matchLabelKeys (a sketch manifest is included below).
Perform a rolling restart.
Hope the ReplicaSet controller randomly kills pods from the old ReplicaSet such that the overall skew points to a different zone than the skew on the new ReplicaSet...
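
A sketch manifest for the first step (names, image, and resource requests are placeholders, not from the original report; the requests just need to be large enough that nodes fill up and the zone choice actually matters):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: spread-repro
spec:
  replicas: 6
  selector:
    matchLabels:
      name: spread-repro
  template:
    metadata:
      labels:
        name: spread-repro
    spec:
      containers:
      - name: app
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: "1"
            memory: 1Gi
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            name: spread-repro
        matchLabelKeys:
        - pod-template-hash
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule

Then trigger the rollout with `kubectl rollout restart deployment/spread-repro` and watch whether Karpenter nominates nodes in the zone implied by the overall skew or by the new ReplicaSet's skew.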

Versions:

  • Chart Version: 0.37.0
  • Kubernetes Version (kubectl version): 1.30
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@hamishforbes hamishforbes added the kind/bug Categorizes issue or PR as related to a bug. label Aug 12, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 12, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Karpenter contributors determine that this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jigisha620
Contributor

Hi @hamishforbes,
We currently don't support matchLabelKeys. This has been called out in our documentation - https://karpenter.sh/v0.37/upgrading/compatibility/
We also have an open issue to track this - #430
Do you mind closing the issue that you created in favour of #430? Feel free to add your use case to the open issue so that we can prioritize this.

@hamishforbes
Author

🤦‍♂️ Not sure how I missed that, maybe I was searching in the aws/karpenter-provider-aws repo.
Yes that issue covers it, will close.
