Karpenter does not respect matchLabelKeys in topologySpreadConstraints #1569

Closed
hamishforbes opened this issue Aug 12, 2024 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@hamishforbes

Description

Observed Behavior:

Performing a rolling update on a Deployment with topologySpreadConstraints configured.
Karpenter calculated that a new pod should be scheduled to a particular node, but that node violates the spread constraints.

The pod was only successfully scheduled once I manually freed space on a node in the correct zone.

I had a quick look over the topology code, and it doesn't look like any of the tests take the matchLabelKeys field into account.

Looking at the state of my cluster, it appears that Karpenter was trying to schedule into the zone that would reduce skew across all pods from both ReplicaSets, rather than only the pods matching the pod-template-hash label (see the note after the YAML below for what the effective selector should be).

Here's what my spread constraints look like:

spec:
  replicas: 6
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        name: scala-api-web
    matchLabelKeys:
    - pod-template-hash
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
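
For reference, when matchLabelKeys is honored, the scheduler looks up the listed keys on the incoming pod and ANDs those key/value pairs into the constraint's labelSelector, so skew is only calculated over pods from the same ReplicaSet. For the pending pod above (hash 59ccdf6896, taken from the pod name), the effective selector should be roughly:

labelSelector:
  matchLabels:
    name: scala-api-web
    pod-template-hash: "59ccdf6896"

i.e. only pods from the new ReplicaSet should count towards the skew that Karpenter needs to respect.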

Pod Events

> k describe pod scala-api-web-59ccdf6896-m8xck
  Normal   Nominated         115s (x2 over 4m5s)    karpenter          Pod should schedule on: nodeclaim/default-2hdfm, node/ip-172-30-137-27.eu-west-1.compute.internal
  Warning  FailedScheduling  2s (x8 over 12m)       default-scheduler  0/29 nodes are available: 1 node(s) didn't match pod topology spread constraints, 27 Insufficient memory, 28 Insufficient cpu. preemption: 0/29 nodes are available: 1 node(s) didn't satisfy existing pods anti-affinity rules, 14 Insufficient cpu, 23 Insufficient memory, 3 node(s) didn't match pod topology spread constraints.
  Normal   Scheduled         6s                     default-scheduler  Successfully assigned alpha/scala-api-web-59ccdf6896-m8xck to ip-172-30-119-255.eu-west-1.compute.internal

Karpenter logs (they're in JSON and logged to Loki; I can pull the raw logs if you need them):
[screenshot: Karpenter logs, 2024-08-13 10:47:43]

The relevant node zones:

> k get node -o=custom-columns="Name:.metadata.name,Zone:.metadata.labels.topology\.kubernetes\.io/zone"
Name                                          Zone
ip-172-30-137-27.eu-west-1.compute.internal   eu-west-1c
ip-172-30-119-255.eu-west-1.compute.internal  eu-west-1b

Pod distribution for only the new ReplicaSet; as you can see, eu-west-1b is where the pending pod needs to be scheduled:
[screenshot: pod distribution for the new ReplicaSet, 2024-08-13 10:50:11]

Pod distribution for the Deployment as a whole, where it looks like eu-west-1c is the correct zone:
[screenshot: pod distribution for the whole Deployment, 2024-08-13 10:50:07]

This is a little annoying to reproduce because (as far as I know) the ReplicaSet controller picks a random pod to terminate from the scaling-down ReplicaSet, so you do not always end up in this scenario where the skew across all pods differs from the skew on the scaling-up ReplicaSet.
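
To make that concrete, here is a hypothetical mid-rollout distribution (made-up numbers, not from this cluster) for a 6-replica Deployment:

zone         new RS pods   old RS pods   all pods
eu-west-1a   2             0             2
eu-west-1b   1             2             3
eu-west-1c   2             0             2

Spreading over all pods suggests eu-west-1a or eu-west-1c as the least-loaded zones, but spreading over only the new ReplicaSet (which is what matchLabelKeys asks for) requires the next pod to land in eu-west-1b to keep maxSkew at 1.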

Expected Behavior:

Correctly determine which node(s) a pod can be scheduled to, taking into account matchLabelKeys in topologySpreadConstraints.

Reproduction Steps (Please include YAML):
Create a Deployment with topologySpreadConstraints using matchLabelKeys (a sketch manifest is included below).
Perform a rolling restart.
Hope the ReplicaSet controller randomly kills pods from the old ReplicaSet such that the overall skew points to a different zone than the skew on the new ReplicaSet...
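
A sketch manifest for the first step (names, image, and resource requests are placeholders, not from the original report; the requests just need to be large enough that nodes fill up and the zone choice actually matters):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: spread-repro
spec:
  replicas: 6
  selector:
    matchLabels:
      name: spread-repro
  template:
    metadata:
      labels:
        name: spread-repro
    spec:
      containers:
      - name: app
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: "1"
            memory: 1Gi
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            name: spread-repro
        matchLabelKeys:
        - pod-template-hash
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule

Then trigger the rollout with `kubectl rollout restart deployment/spread-repro` and watch whether Karpenter nominates nodes in the zone implied by the overall skew or by the new ReplicaSet's skew.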

Versions:

  • Chart Version: 0.37.0
  • Kubernetes Version (kubectl version): 1.30
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@hamishforbes hamishforbes added the kind/bug Categorizes issue or PR as related to a bug. label Aug 12, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 12, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Karpenter contributors determine that this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jigisha620
Contributor

Hi @hamishforbes,
We currently don't support matchLabelKeys. This has been called out in our documentation - https://karpenter.sh/v0.37/upgrading/compatibility/
We also have an open issue to track this - #430
Do you mind closing the issue that you created in favour of #430? Feel free to add your use case to the open issue so that we can prioritize this.

@hamishforbes
Author

🤦‍♂️ Not sure how I missed that, maybe I was searching in the aws/karpenter-provider-aws repo.
Yes that issue covers it, will close.
