
Support new topologySpread scheduling constraints #430

Open
tzneal opened this issue Aug 1, 2023 · 15 comments · May be fixed by #852
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. v1 Issues requiring resolution by the v1 milestone

Comments

@tzneal
Contributor

tzneal commented Aug 1, 2023

Description

Observed Behavior:

Karpenter doesn't support:

  • matchLabelKeys
  • nodeAffinityPolicy
  • nodeTaintsPolicy

These fields were introduced into beta in 1.27.
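
For reference, here is a minimal sketch of a topologySpreadConstraints entry that uses all three fields as defined by the Kubernetes API (names and values are illustrative, not taken from this issue):

```yaml
# Illustrative pod spec fragment; label names and values are hypothetical.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: example-app
    # Additional pod label keys whose values must also match when counting skew
    matchLabelKeys:
      - pod-template-hash
    # Whether nodes that fail the pod's nodeAffinity/nodeSelector are counted
    nodeAffinityPolicy: Honor
    # Whether nodes whose taints the pod does not tolerate are counted
    nodeTaintsPolicy: Honor
```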

Expected Behavior:

Reproduction Steps (Please include YAML):

Versions:

  • Chart Version:
  • Kubernetes Version (kubectl version):

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@tzneal tzneal added the kind/bug Categorizes issue or PR as related to a bug. label Aug 1, 2023
@ellistarn ellistarn added kind/feature Categorizes issue or PR as related to a new feature. v1 Issues requiring resolution by the v1 milestone and removed kind/bug Categorizes issue or PR as related to a bug. labels Aug 7, 2023
@jonathan-innis jonathan-innis changed the title Support new scheduling constraints Support new topologySpread scheduling constraints Aug 10, 2023
@sadath-12
Contributor

I'll work on this issue

@k8s-ci-robot k8s-ci-robot added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed help-wanted labels Nov 22, 2023
@jmdeal
Member

jmdeal commented Dec 6, 2023

/assign

@jmdeal
Member

jmdeal commented Dec 6, 2023

/remove-help

@k8s-ci-robot k8s-ci-robot removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Dec 6, 2023
@jmdeal jmdeal linked a pull request Dec 8, 2023 that will close this issue
@ilkinmammadzada

ilkinmammadzada commented Feb 6, 2024

What is the current behaviour of Karpenter for nodeAffinityPolicy?

@jmdeal
Member

jmdeal commented Feb 6, 2024

Currently Karpenter doesn't recognize nodeAffinityPolicy either, so it will always be treated as Honor. Originally the linked PR was blocked on some 1.29 features, and I haven't had the time to do the performance testing since we got unblocked on 1.29 testing. Those features have since been scoped out, but the linked PR will add support for nodeAffinityPolicy, nodeTaintsPolicy, and matchLabelKeys.
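
To illustrate the gap with a hypothetical snippet (not taken from the linked PR): with nodeAffinityPolicy: Ignore, kube-scheduler also counts nodes that do not match the pod's nodeAffinity/nodeSelector when computing skew, whereas Karpenter's scheduling simulation would still behave as if the field were Honor.

```yaml
# Hypothetical constraint; with Ignore, skew is computed over all nodes carrying
# the topology key, not just those matching the pod's nodeAffinity/nodeSelector.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    nodeAffinityPolicy: Ignore   # currently not recognized by Karpenter
    labelSelector:
      matchLabels:
        app: example-app
```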

@dominik-dezordo-vc

Hey everyone! Thanks for putting in the effort to support these new constraints! We just hit this exact issue in our clusters, where we are using matchLabelKeys together with topologySpreadConstraints to spread our pods across AZs.

I was wondering if you currently have a timeline for getting this PR merged? It seems the code has been ready for some time and is just waiting for a rebase/performance test?

@jmdeal
Member

jmdeal commented May 3, 2024

This has been backlogged for me for a little while, but I should have some bandwidth to get this wrapped up within the next week. Like you said, at this point it really should only be a rebase and performance testing that's left.

@dominik-dezordo-vc

Amazing, thanks for the fast response. Really appreciate the work you are doing with karpenter!

@pznamensky

We are also eagerly awaiting matchLabelKeys: right now topologySpread is applied across all revisions of a Deployment, which sometimes does not work as optimally as we would like.
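
For context, on the kube-scheduler side this is usually addressed by adding pod-template-hash to matchLabelKeys, so that skew is computed per ReplicaSet revision rather than across every revision of the Deployment; a sketch with an illustrative app label (this is exactly the field Karpenter does not yet take into account):

```yaml
# Sketch only: pod-template-hash differs between Deployment revisions, so listing
# it in matchLabelKeys restricts the skew calculation to pods of the current rollout.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: example-app
    matchLabelKeys:
      - pod-template-hash
```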

@abrazouski

abrazouski commented Jun 13, 2024

Hello everyone
In my company, we rely heavily on the matchLabelKeys setting, and we've also encountered similar issues where new pods of a rollout were stuck in Pending because Karpenter didn't respect matchLabelKeys and made its calculations taking pods of the previous revision into account. This caused an incident during the night, preventing us from applying hotfixes to production workloads.

Could you please share the ETA of this feature?

@hamishforbes

Hi, the lack of matchLabelKeys support, in particular, is also causing problems for us.

Is there anything we can do to help, test a custom build of Karpenter that supports this feature or something?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 12, 2024
@JeromeAtZoox

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 28, 2024
@hrishi-hugo

Hi, any update on when this feature may be available?

@JacobHenner

"Currently Karpenter doesn't recognize nodeAffinityPolicy either, so it will always be treated as Honor."

I've observed different behavior, similar to what was described in this closed issue.

In my case, I have a zonal topologySpreadConstraint. There are nodes for all three zones (A, B, C) in the cluster, but only nodes in zones A and B (not C) carry a label that is also set as a nodeSelector on a particular Deployment. This label is applied by Karpenter because it is included in the NodePools, and those NodePools are associated exclusively with an EC2NodeClass that can only provision nodes in zones A and B.

The zonal topologySpreadConstraint has maxSkew=1 and whenUnsatisfiable=DoNotSchedule. Suppose I scale up this Deployment from 2 to 3, and assume replica 1 is in zone A and replica 2 is in zone B: the new replica should qualify for either zone A or zone B, and zone C should be disregarded because no existing or provisionable instances in zone C will match the required nodeSelector label. It appears as if kube-scheduler understands this, but Karpenter does not. If there is available capacity in A or B, kube-scheduler will properly schedule the pod. If there is no available capacity in A or B, Karpenter should provision a new instance, but instead it emits an error event, "no instance type satisfied resources ... and requirements ... topology.kubernetes.io/zone In [C] (no instance type met the scheduling requirements or had a required offering)", delaying the scheduling of the pod. This should not happen, since nothing indicated that C was a valid zone: neither live instances matching the required nodeSelector (a form of affinity), nor any NodePool/EC2NodeClass combination.
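
A hedged sketch of the pod spec shape described above (the label key, its value, and the app label are hypothetical placeholders for the Karpenter-managed label):

```yaml
# Hypothetical reproduction: the NodePools that apply example.com/team=my-team can
# only provision capacity in zones A and B, yet Karpenter's error references zone C.
spec:
  nodeSelector:
    example.com/team: my-team   # label only present on NodePools limited to zones A and B
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: example-app
```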

What's most confusing here is the difference between how kube-scheduler and Karpenter apply the topologySpreadConstraint. aws/karpenter-provider-aws#3397 proposed adding an explicit zonal affinity rule as a workaround, but this should not be required, since kube-scheduler does not require it. Many users will not realize that the additional affinity rule is required (only for Karpenter), leading to surprising operational issues.
