Conversation

cyclinder (Contributor)

  • One-line PR description: topologyManager policy: max-allowable-numa-nodes to GA (1.33)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 12, 2025
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cyclinder
Once this PR has been reviewed and has the lgtm label, please assign johnbelamaric for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 12, 2025
@ffromani (Contributor) left a comment:

Initial review. We're very close to the deadline, but let's try to get this in.


[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
Contributor:
why was this removed?

@cyclinder (Contributor Author), Feb 12, 2025:

vscode removed it 😅, reverting it now

Comment on lines -237 to -240
For GA:

- degrading the node and checking the node is reported as degraded

Contributor:

why was this removed?

Contributor Author:

I don't quite remember the context here; do we need an e2e test for this?

@ffromani (Contributor), Jun 12, 2025:

That was about making the state of the feature observable to others. We already have issues reported in this area: kubernetes/kubernetes#131738

With the new sig-arch requirements effective in 1.34, if we do this for GA we need to have another beta.

[EDIT] The point is: exactly as this issue demonstrates, we have this performance problem already. So the slowdown is not caused by the maximum allowed value, but rather by how the topology manager currently handles machines with a high NUMA zone count. In hindsight, I don't think degrading the node adds value, because it is not related to this setting. The degradation, or the signal in general, seems better represented by the existing metric about admission time. The only improvement I can imagine as a possible follow-up is extending that metric.

#### GA

- Add a metrics: `kubelet_topology_manager_admission_time`.
- Add a metric: `kubelet_topology_manager_admission_time`.
Contributor:

please add a note documenting that we can use an existing metric

Contributor Author:

updated.

Contributor Author:

We already have this metric in the kubelet code.

- Feature gate name:
- `TopologyManagerPolicyBetaOptions`
- `TopologyManagerPolicyOptions`
- `TopologyManagerPolicyOptions` - going to be locked to true once GA
Contributor:

no need to mention the locking, it's the standard process for each feature

Contributor Author:

removed.

# List the feature gate name and the components for which it must be enabled
feature-gates:
- name: "TopologyManagerPolicyBetaOptions"
components:
Contributor:

why was this removed?

Contributor Author:

I'm curious too; maybe vscode's markdown linter removed it. Reverting it now.

@mrunalp (Contributor) commented Feb 13, 2025:

I am supportive of graduating features to GA but not sure if we will get PRR in time as this came in late.

@ffromani (Contributor):

I am supportive of graduating features to GA but not sure if we will get PRR in time as this came in late.

Same. I'll still be helping here with the reviews and cleaning up the feature, but if we miss the deadline (as unfortunately seems likely) it's not too bad: we can prepare and be ready for an early 1.34 merge.

@k8s-triage-robot:

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 14, 2025
@ffromani (Contributor):

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 14, 2025
@ffromani (Contributor):

/retitle kep-4622: promote topologyManager policy: max-allowable-numa-nodes to GA

@k8s-ci-robot k8s-ci-robot changed the title kep-4622: promote topologyMnagaer policy: max-allowable-numa-nodes to GA kep-4622: promote topologyManager policy: max-allowable-numa-nodes to GA May 14, 2025
- Feature gate name:
- [X] Feature gate (also fill in values in `kep.yaml`)
- Feature gate name:
- `TopologyManagerPolicyBetaOptions`
Contributor:

@ffromani what should one do here for GA policies with TopologyManager?

I take it that this is no longer toggleable via beta options?

Contributor Author:

I think we should remove `TopologyManagerPolicyBetaOptions`, leaving only `TopologyManagerPolicyOptions`.
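
For reference, a minimal KubeletConfiguration sketch of how the option is wired up while still in beta (values illustrative; after GA, only `TopologyManagerPolicyOptions` would remain, per the comment above):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Both gates are needed while the option is beta.
  TopologyManagerPolicyOptions: true
  TopologyManagerPolicyBetaOptions: true
topologyManagerPolicy: single-numa-node
topologyManagerPolicyOptions:
  # Illustrative value for a machine with more than 8 NUMA nodes.
  max-allowable-numa-nodes: "16"
```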

@ffromani (Contributor) left a comment:

From the sig-node perspective the feature itself is done. I will defer observability concerns to PRR review.


#### GA

- Add a metrics: `kubelet_topology_manager_admission_time`.
- An existing metric: `kubelet_topology_manager_admission_time` can be used.
Contributor:

please check `pkg/kubelet/metrics/metrics.go`. I think it is `topology_manager_admission_duration_ms`
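
For context, a paraphrased sketch of how such a histogram is registered via `k8s.io/component-base/metrics` (the authoritative definition lives in pkg/kubelet/metrics/metrics.go; the help text and buckets here are illustrative):

```go
package metrics

import "k8s.io/component-base/metrics"

// With the "kubelet" subsystem, the name exposed to scrapers becomes
// kubelet_topology_manager_admission_duration_ms.
var TopologyManagerAdmissionDuration = metrics.NewHistogram(
	&metrics.HistogramOpts{
		Subsystem:      "kubelet",
		Name:           "topology_manager_admission_duration_ms",
		Help:           "Duration in milliseconds of the topology manager admission of a pod.",
		Buckets:        metrics.ExponentialBuckets(0.05, 2, 15), // illustrative buckets
		StabilityLevel: metrics.ALPHA,
	},
)
```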

Contributor Author:

-->

We have a metric which records the topology manager admission time: `kubelet_topology_manager_admission_time`.
We have an existing metric which records the topology manager admission time: `kubelet_topology_manager_admission_time`.
Contributor:

ditto about the metric name

Contributor Author:

Updated.

previous answers based on experience in the field.
-->
We add a metric: `kubelet_topology_manager_admission_time` for kubelet, which can be used to check if the setting is causing unacceptable performance drops.
An existing metric: `kubelet_topology_manager_admission_time` for kubelet can be used to check if the setting is causing unacceptable performance drops.
Contributor:

ditto

Contributor Author:

done.

@ffromani (Contributor) left a comment:

@cyclinder could you kindly check you are using the latest KEP template? https://github.com/kubernetes/enhancements/tree/master/keps/NNNN-kep-template

@swatisehgal (Contributor) left a comment:

Overall looks good. Just curious about the current state of e2e test for this feature.

Do we already have tests in the codebase to ensure that this policy option works as expected and is compatible with other Topology Manager policy options, or is that planned as part of GA graduation?

@cyclinder (Contributor Author):

The only improvement I can imagine as a possible follow-up is extending that metric.

Agreed. Do you mean that we need another metric for this option? We already have an existing metric for this: `topology_manager_admission_duration_ms`.

Do we already have tests in the codebase to ensure that this policy option works as expected and is compatible with other Topology Manager policy options, or is that planned as part of GA graduation?

We already have an e2e test for this in kubernetes/kubernetes#124148; see https://github.com/kubernetes/kubernetes/blob/0154f8a222c38d5808f1aecf218379cedf15e8a7/test/e2e_node/topology_manager_test.go#L257
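
As a usage sketch for the metric mentioned above, an operator could watch tail admission latency with a PromQL query like the following (assuming a standard Prometheus scrape of the kubelet; the `instance` label comes from the scrape, and the `_bucket` suffix assumes the metric is a histogram):

```promql
# p99 topology manager admission latency per kubelet, in milliseconds
histogram_quantile(
  0.99,
  sum by (instance, le) (
    rate(kubelet_topology_manager_admission_duration_ms_bucket[5m])
  )
)
```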

- "@klueska"
- "@ffromani"
approvers:
- "@sig-node-tech-leads"
Member:

should be @klueska to match the OWNERS file in this directory

Contributor Author:

Thank you for the review. Updated.

@ffromani (Contributor) left a comment:

I think we can have some e2e tests which happen to set this value and then run a subset of the existing topology manager tests. This should be easy, cheap enough and will give us enough coverage. I can't say if this warrants a beta2 or can be done in the context of the GA graduation.

1.34:

- promote to GA
- cannot be disabled
Contributor:

This probably means `LockToDefault: true`. OK for me, and remove in 1.35. The feature gate enables a flag that is disabled by default, i.e. opt-in.
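
To make the `LockToDefault` point concrete, here is an illustrative sketch (not the literal kube_features.go entry) using the `k8s.io/component-base/featuregate` types:

```go
package features

import "k8s.io/component-base/featuregate"

const TopologyManagerPolicyOptions featuregate.Feature = "TopologyManagerPolicyOptions"

// At GA the gate defaults to true and is locked, so it can no longer be
// disabled; the gate itself would then be removed a release later (1.35 here).
var kubeletFeatureGates = map[featuregate.Feature]featuregate.FeatureSpec{
	TopologyManagerPolicyOptions: {
		Default:       true,
		LockToDefault: true,
		PreRelease:    featuregate.GA,
	},
}
```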

Contributor Author:

thanks.

will rollout across nodes.
-->

A rollout or rollout failure does not impact already running workloads, only new workloads.
Contributor:

This is a general additional comment. This is subtler. Rolling back this option would mean removing the flag from the kubelet. Depending on the hardware, this will prevent the kubelet from starting. I don't think rollout concerns really apply to this specific feature.
If you have a machine with 9+ NUMA nodes AND you want to use a topology manager policy which is not `none`, THEN you need this option.

Contributor Author:

Thanks for the review. I will update the KEP to address your comments.

- Testing: Are there any tests for failure mode? If not, describe why.
-->

Setting a value lower than 8 causes the kubelet to crash.
Contributor:

maybe improve it like:

Keeping the default value will cause the kubelet to fail to start on machines with 9 or more NUMA cells if any but the `none` topology manager policy is also configured.

Contributor Author:

Updated.
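
For illustration, the failure mode discussed above boils down to a validation check along these lines (a minimal self-contained sketch with hypothetical names; the real check lives in the topology manager policy options code):

```go
package main

import (
	"fmt"
	"log"
)

// minAllowableNUMANodes mirrors the historical hard limit of 8 NUMA nodes.
const minAllowableNUMANodes = 8

// validateMaxAllowableNUMANodes is a hypothetical stand-in for the kubelet's
// option validation: values below 8 are rejected, which surfaces as the
// kubelet failing to start.
func validateMaxAllowableNUMANodes(n int) error {
	if n < minAllowableNUMANodes {
		return fmt.Errorf("max-allowable-numa-nodes must be at least %d, got %d",
			minAllowableNUMANodes, n)
	}
	return nil
}

func main() {
	if err := validateMaxAllowableNUMANodes(4); err != nil {
		log.Fatalf("kubelet would refuse to start: %v", err)
	}
}
```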

@cyclinder (Contributor Author):

I think we can have some e2e tests which happen to set this value and then run a subset of the existing topology manager tests. This should be easy, cheap enough and will give us enough coverage. I can't say if this warrants a beta2 or can be done in the context of the GA graduation.

I am open to adding some other e2e tests. Do we need to do this in 1.34 or 1.35?

@ffromani (Contributor):

I think we can have some e2e tests which happen to set this value and then run a subset of the existing topology manager tests. This should be easy, cheap enough and will give us enough coverage. I can't say if this warrants a beta2 or can be done in the context of the GA graduation.

I am open to adding some other e2e tests. Do we need to do this in 1.34 or 1.35?

From what I gathered, it's fair and accepted to add e2e tests in the same cycle in which we graduate to beta. But everything should be in place before we move to GA.

@ffromani (Contributor):

Looks like the lead-opt-in label was missing, so this PR didn't get PRR review?
Let's identify all the gaps from the sig-node perspective, work on them, and aim for an early, trivial GA move next cycle.

@ffromani (Contributor) left a comment:

According to #5242, my understanding is that we need another beta to complete the missing e2e tests. Other than that, LGTM. Not adding the label because the PR wants to move to GA, and it turns out we need a beta2 to address the gaps.

Comment on lines +103 to +111

<!--
This section is for explicitly listing the motivation, goals, and non-goals of
this KEP. Describe why the change is important and the benefits to users. The
motivation section can optionally provide links to [experience reports] to
demonstrate the interest in a KEP within the wider Kubernetes community.
[experience reports]: https://github.com/golang/go/wiki/ExperienceReports
-->
Contributor:

if you resubmit, please remove comments like this from the completed sections.

Contributor Author:

Thanks for the reminder.

@cyclinder (Contributor Author):

According to #5242, my understanding is that we need another beta to complete the missing e2e tests. Other than that, LGTM. Not adding the label because the PR wants to move to GA, and it turns out we need a beta2 to address the gaps.

According to my understanding, I need to submit another PR, similar to #5242, and also add an e2e test to Kubernetes. Only after these are completed can this PR receive the required label and be merged, right?

@cyclinder (Contributor Author):

I think we can have some e2e tests which happen to set this value and then run a subset of the existing topology manager tests

We already have an e2e test for this in kubernetes/kubernetes#124148; see https://github.com/kubernetes/kubernetes/blob/0154f8a222c38d5808f1aecf218379cedf15e8a7/test/e2e_node/topology_manager_test.go#L1449. This is the most basic verification that the changed setting is not breaking things for maxNUMANodes(16). Do we still need an additional e2e test?
