
Support new topologySpread scheduling constraints #430

Open
tzneal opened this issue Aug 1, 2023 · 15 comments · May be fixed by #852
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. v1 Issues requiring resolution by the v1 milestone

Comments

@tzneal
Contributor

tzneal commented Aug 1, 2023

Description

Observed Behavior:

Karpenter doesn't support:

  • matchLabelKeys
  • nodeAffinityPolicy
  • nodeTaintsPolicy

These fields were introduced into beta in 1.27.
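
For reference, here is a minimal sketch of a topologySpreadConstraints entry that uses all three fields as defined by the Kubernetes API (names and values are illustrative, not taken from this issue):

```yaml
# Illustrative pod spec fragment; label names and values are hypothetical.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: example-app
    # Additional pod label keys whose values must also match when counting skew
    matchLabelKeys:
      - pod-template-hash
    # Whether nodes that fail the pod's nodeAffinity/nodeSelector are counted
    nodeAffinityPolicy: Honor
    # Whether nodes whose taints the pod does not tolerate are counted
    nodeTaintsPolicy: Honor
```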

Expected Behavior:

Reproduction Steps (Please include YAML):

Versions:

  • Chart Version:
  • Kubernetes Version (kubectl version):

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@tzneal tzneal added the kind/bug Categorizes issue or PR as related to a bug. label Aug 1, 2023
@ellistarn ellistarn added kind/feature Categorizes issue or PR as related to a new feature. v1 Issues requiring resolution by the v1 milestone and removed kind/bug Categorizes issue or PR as related to a bug. labels Aug 7, 2023
@jonathan-innis jonathan-innis changed the title Support new scheduling constraints Support new topologySpread scheduling constraints Aug 10, 2023
@sadath-12
Contributor

I'll work on this issue

@k8s-ci-robot k8s-ci-robot added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed help-wanted labels Nov 22, 2023
@jmdeal
Member

jmdeal commented Dec 6, 2023

/assign

@jmdeal
Member

jmdeal commented Dec 6, 2023

/remove-help

@k8s-ci-robot k8s-ci-robot removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Dec 6, 2023
@jmdeal jmdeal linked a pull request Dec 8, 2023 that will close this issue
@ilkinmammadzada

ilkinmammadzada commented Feb 6, 2024

What is the current behaviour of Karpenter for nodeAffinityPolicy?

@jmdeal
Member

jmdeal commented Feb 6, 2024

Currently Karpenter doesn't recognize nodeAffinityPolicy either, so it will always be treated as Honor. Originally the linked PR was blocked on some 1.29 features, and I haven't had the time to do the performance testing since we got unblocked on 1.29 testing. Those features have since been scoped out, but the linked PR will add support for nodeAffinityPolicy, nodeTaintsPolicy, and matchLabelKeys.
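
To illustrate the gap with a hypothetical snippet (not taken from the linked PR): with nodeAffinityPolicy: Ignore, kube-scheduler also counts nodes that do not match the pod's nodeAffinity/nodeSelector when computing skew, whereas Karpenter's scheduling simulation would still behave as if the field were Honor.

```yaml
# Hypothetical constraint; with Ignore, skew is computed over all nodes carrying
# the topology key, not just those matching the pod's nodeAffinity/nodeSelector.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    nodeAffinityPolicy: Ignore   # currently not recognized by Karpenter
    labelSelector:
      matchLabels:
        app: example-app
```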

@dominik-dezordo-vc

Hey everyone! Thanks for putting in the effort to support these new constraints! We just hit this exact issue in our clusters, where we are using matchLabelKeys together with topologySpreadConstraints to spread our pods across AZs.

I was wondering if you currently have a timeline for getting this PR merged? It seems the code has been ready for some time and is just waiting for a rebase/performance test?

@jmdeal
Member

jmdeal commented May 3, 2024

This has been backlogged for me for a little while, but I should have some bandwidth to get this wrapped up within the next week. Like you said, at this point it really should only be a rebase and performance testing that's left.

@dominik-dezordo-vc

Amazing, thanks for the fast response. Really appreciate the work you are doing with karpenter!

@pznamensky

We are also eagerly awaiting matchLabelKeys: right now topologySpread is applied across all revisions of a Deployment, which sometimes does not work as optimally as we would like.
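
For context, on the kube-scheduler side this is usually addressed by adding pod-template-hash to matchLabelKeys, so that skew is computed per ReplicaSet revision rather than across every revision of the Deployment; a sketch with an illustrative app label (this is exactly the field Karpenter does not yet take into account):

```yaml
# Sketch only: pod-template-hash differs between Deployment revisions, so listing
# it in matchLabelKeys restricts the skew calculation to pods of the current rollout.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: example-app
    matchLabelKeys:
      - pod-template-hash
```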

@abrazouski

abrazouski commented Jun 13, 2024

Hello everyone
In my company, we rely heavily on the matchLabelKeys setting, and we've also encountered similar issues where new pods of a rollout were stuck in Pending because Karpenter didn't respect matchLabelKeys and made its calculations taking pods of the previous revision into account. This caused an incident during the night, preventing us from applying hotfixes to production workloads.

Could you please share the ETA of this feature?

@hamishforbes

Hi, the lack of matchLabelKeys support, in particular, is also causing problems for us.

Is there anything we can do to help, test a custom build of Karpenter that supports this feature or something?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 12, 2024
@JeromeAtZoox

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 28, 2024
@hrishi-hugo

Hi, any update on when this feature may be available?

@JacobHenner

"Currently Karpenter doesn't recognize nodeAffinityPolicy either, so it will always be treated as Honor."

I've observed different behavior, similar to what was described in this closed issue.

In my case, I have a zonal topologySpreadConstraint. There are nodes for all three zones (A, B, C) in the cluster, but only nodes in zones A and B (not C) carry a label that is also set as a nodeSelector on a particular Deployment. This label is applied by Karpenter because it is included in the NodePools, and those NodePools are associated exclusively with an EC2NodeClass that can only provision nodes in zones A and B.

The zonal topologySpreadConstraint has maxSkew=1 and whenUnsatisfiable=DoNotSchedule. Suppose I scale up this Deployment from 2 to 3, and assume replica 1 is in zone A and replica 2 is in zone B: the new replica should qualify for either zone A or zone B, and zone C should be disregarded because no existing or provisionable instances in zone C will match the required nodeSelector label. It appears as if kube-scheduler understands this, but Karpenter does not. If there is available capacity in A or B, kube-scheduler will properly schedule the pod. If there is no available capacity in A or B, Karpenter should provision a new instance, but instead it emits an error event, "no instance type satisfied resources ... and requirements ... topology.kubernetes.io/zone In [C] (no instance type met the scheduling requirements or had a required offering)", delaying the scheduling of the pod. This should not happen, since nothing indicated that C was a valid zone: neither live instances matching the required nodeSelector (a form of affinity), nor any NodePool/EC2NodeClass combination.
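
A hedged sketch of the pod spec shape described above (the label key, its value, and the app label are hypothetical placeholders for the Karpenter-managed label):

```yaml
# Hypothetical reproduction: the NodePools that apply example.com/team=my-team can
# only provision capacity in zones A and B, yet Karpenter's error references zone C.
spec:
  nodeSelector:
    example.com/team: my-team   # label only present on NodePools limited to zones A and B
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: example-app
```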

What's most confusing here is the difference between how kube-scheduler and Karpenter apply the topologySpreadConstraint. aws/karpenter-provider-aws#3397 proposed adding an explicit zonal affinity rule as a workaround, but this should not be required, since kube-scheduler does not require it. Many users will not realize that the additional affinity rule is required (only for Karpenter), leading to surprising operational issues.
