Commit 97081ca

WIP: KEP-127: Update PRR for beta
Signed-off-by: Rodrigo Campos <[email protected]>

1 file changed: +133 −4

keps/sig-node/127-user-namespaces/README.md

@@ -764,6 +764,15 @@ This section must be completed when targeting beta to a release.

###### How can a rollout or rollback fail? Can it impact already running workloads?

The rollout just enables a feature flag on the kubelet and the kube-apiserver.

If one API server is upgraded while others aren't, that API server might accept pods that set the
pod.spec.HostUsers field while the others reject them.

On a rollback, pods created while the feature was active (created with user namespaces) will have to
be restarted to be re-created without user namespaces. Simply re-creating the pod will do the
trick.

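For reference, a minimal sketch of a pod that opts out of the host user namespace; the
pod.spec.HostUsers field is serialized as `hostUsers`, and the pod name here is hypothetical:

```sh
# Create a pod that runs inside a new user namespace.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo      # hypothetical name, for illustration only
spec:
  hostUsers: false       # false = give this pod its own user namespace
  containers:
  - name: shell
    image: busybox
    command: ["sleep", "infinity"]
EOF
```
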
<!--
Try to be as paranoid as possible - e.g., what if some components will restart
mid-rollout?
@@ -776,21 +785,34 @@ will rollout across nodes.

###### What specific metrics should inform a rollback?

On the Kubernetes side, the kubelet should start correctly.

On the node runtime side, a pod created with pod.spec.HostUsers=false should be running fine if all
node requirements are met.

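One way to check the latter from the API server, assuming `jq` is available (a sketch, not an
official metric):

```sh
# List pods that opted into user namespaces, together with their phase;
# pods stuck in a non-Running phase here are a rollback signal.
kubectl get pods -A -o json \
  | jq -r '.items[] | select(.spec.hostUsers == false)
           | "\(.metadata.namespace)/\(.metadata.name)\t\(.status.phase)"'
```
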
<!--
What signals should users be paying attention to when the feature is young
that might indicate a serious problem?
-->


###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

Yes.

We tested enabling the feature flag, creating a deployment with pod.spec.HostUsers=false, and then
disabling the feature flag and restarting the kubelet and kube-apiserver.

After that, we deleted the deployment's pods; the pods were re-created without user namespaces just
fine, without any modification needed on the deployment yaml.

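A rough outline of that manual test, assuming a deployment manifest whose pod template sets
`hostUsers: false`; the manifest name and label selector are hypothetical:

```sh
# 1. With the feature flag enabled on the kubelet and kube-apiserver:
kubectl apply -f userns-deploy.yaml        # hypothetical manifest name

# 2. Disable the feature flag, restart the kubelet and kube-apiserver,
#    then delete the deployment's pods:
kubectl delete pods -l app=userns-demo     # hypothetical label

# 3. The deployment controller re-creates the pods without user namespaces;
#    no change to the deployment yaml is needed:
kubectl get pods -l app=userns-demo
```
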
<!--
TODO: rata. Test this!
Describe manual testing that was done and the outcomes.
Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No.

<!--
Even if applying deprecation policies, they may still surprise some users.
-->
@@ -806,6 +828,7 @@ previous answers based on experience in the field.

###### How can an operator determine if the feature is in use by workloads?

Check if any pod has the pod.spec.HostUsers field set to false.

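For example, a quick count over the API (a sketch assuming `jq` is installed):

```sh
# Number of pods cluster-wide that set pod.spec.HostUsers (hostUsers) to false.
kubectl get pods -A -o json \
  | jq '[.items[] | select(.spec.hostUsers == false)] | length'
```
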
<!--
Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
checking if there are objects with field X set) may be a last resort. Avoid
@@ -814,6 +837,11 @@ logs or events for this purpose.

###### How can someone using this feature know that it is working for their instance?

Check if any pod has the pod.spec.HostUsers field set to false and is running correctly on a node
that meets all the requirements.

There are step-by-step examples in the Kubernetes documentation too.

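A direct spot check is to look at the UID map inside the pod; a sketch with a hypothetical pod
name:

```sh
# Inside a user namespace the map points at an unprivileged host ID range;
# on the host (and in hostUsers: true pods) it is the identity map
# "0 0 4294967295".
kubectl exec userns-demo -- cat /proc/self/uid_map
# Example output (the host range varies):
#          0     165536      65536
```
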
<!--
For instance, if this is a pod-related feature, it should be possible to determine if the feature is functioning properly
for each individual pod.
@@ -828,11 +856,18 @@ Recall that end users cannot usually observe component logs or access metrics.

- [ ] API .status
  - Condition name:
  - Other field:
- [x] Other (treat as last resort)
  - Details: check pods with pod.spec.HostUsers field set to false, and see if they are running
    fine.


###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

If a node meets all the requirements, there should be no change to existing SLOs/SLIs.

If a container runtime wants to support old kernels, it can have a performance impact, though. For
more details, see the question:
"Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?"

<!--
This is your opportunity to define what "normal" quality of service looks like
for a feature.
@@ -850,6 +885,7 @@ question.

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

No new SLIs are needed for this feature.

<!--
Pick one more of these and delete the rest.
-->
@@ -863,6 +899,17 @@ Pick one more of these and delete the rest.

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

No.

This feature just uses one more namespace when creating a pod. If the pod creation fails (due to
an error in the kubelet or one returned by the container runtime), a clear error is returned to the
user.

A metric like "errors returned in pods with user namespaces enabled" can be very noisy, as the error
can be completely unrelated (image pull secret errors, a configmap referenced but not defined, any
other container runtime error, etc.). We can't see any metric that would be helpful, as the user
already gets very direct feedback.

<!--
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
implementation difficulties, etc.).
@@ -876,6 +923,19 @@ This section must be completed when targeting beta to a release.

###### Does this feature depend on any specific services running in the cluster?

Yes:

- [CRI version]
  - Usage description: the CRI changes done in k8s 1.27 are needed.
  - Impact of its outage on the feature: minimal; the feature will be ignored by runtimes using an
    older version.
  - Impact of its degraded performance or high-error rates on the feature: N/A.

- [Linux kernel]
  - Usage description: Linux 6.3 or higher.
  - Impact of its outage on the feature: pod creation will return an error.
  - Impact of its degraded performance or high-error rates on the feature: N/A.

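A rough node-side sanity check of both dependencies (a sketch, not exhaustive validation):

```sh
# Kernel: idmapped-mount support for the filesystems pods use needs 6.3+.
uname -r

# CRI: the runtime must implement the CRI changes shipped in Kubernetes 1.27;
# crictl prints both the runtime version and the CRI API version it speaks.
crictl version
```
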
<!--
Think about both cluster-level services (e.g. metrics-server) as well
as node-level agents (e.g. specific version of CRI). Focus on external or
@@ -1028,8 +1088,8 @@ and validate the declared limits?

The kubelet is splitting the host UID/GID space for different pods, to use for
their user namespace mapping. The design allows for 65k pods per node, and the
resource is limited to maxPods per node (currently maxPods defaults to 110, it
is unlikely we will reach 65k).

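For context on where the 65k figure comes from, assuming the 64Ki-IDs-per-pod mapping size this
KEP uses:

```
2^32 host IDs / 2^16 IDs per pod = 2^16 = 65536 pods per node ("65k")
```
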

Container runtimes might use more disk space or inodes to chown the
rootfs. This is if they choose to support this feature without relying on new
@@ -1056,8 +1116,77 @@ details). For now, we leave it here.

###### How does this feature react if the API server and/or etcd is unavailable?

No changes to current kubelet behaviors. The feature only uses kubelet-local information.

###### What are other known failure modes?

- Some filesystem used by the pod doesn't support idmap mounts on the kernel used.
  - Detection: How can it be detected via metrics? Stated another way:
    how can an operator troubleshoot without logging into a master or worker node?

    Just look at the pod events; it fails with:

        Warning  Failed  2s (x2 over 4s)  kubelet, 127.0.0.1  Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: failed to fulfil mount request: failed to set MOUNT_ATTR_IDMAP on /var/lib/kubelet/pods/f037a704-742c-40fe-8dbf-17ed9225c4df/volumes/kubernetes.io~empty-dir/hugepage: invalid argument (maybe the source filesystem doesn't support idmap mounts on this kernel?): unknown

    Note the "maybe the source filesystem doesn't support idmap mounts on this kernel?" part.
    A kubectl sketch for pulling such events without node access appears after this list.

  - Mitigations: What can be done to stop the bleeding, especially for already
    running user workloads?

    Remove the pod.spec.HostUsers field or disable the feature gate.

  - Diagnostics: What are the useful log messages and their required logging
    levels that could help debug the issue?

    Not required until feature graduated to beta.

  - Testing: Are there any tests for failure mode? If not, describe why.

    TODO: rata.

- Error getting the userns IDs range configuration
  - Detection: How can it be detected via metrics? Stated another way:
    how can an operator troubleshoot without logging into a master or worker node?

    Pod errors.

  - Mitigations: What can be done to stop the bleeding, especially for already
    running user workloads?

    Disable the feature flag.

  - Diagnostics: What are the useful log messages and their required logging
    levels that could help debug the issue?

    Not required until feature graduated to beta.

    TODO

  - Testing: Are there any tests for failure mode? If not, describe why.

    TODO

- Error saving/reading pod mappings
  - Detection: How can it be detected via metrics? Stated another way:
    how can an operator troubleshoot without logging into a master or worker node?
  - Mitigations: What can be done to stop the bleeding, especially for already
    running user workloads?
  - Diagnostics: What are the useful log messages and their required logging
    levels that could help debug the issue?

    Not required until feature graduated to beta.

  - Testing: Are there any tests for failure mode? If not, describe why.

- Other errors
  - Detection: How can it be detected via metrics? Stated another way:
    how can an operator troubleshoot without logging into a master or worker node?
  - Mitigations: What can be done to stop the bleeding, especially for already
    running user workloads?
  - Diagnostics: What are the useful log messages and their required logging
    levels that could help debug the issue?

    Not required until feature graduated to beta.

  - Testing: Are there any tests for failure mode? If not, describe why.

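As referenced in the first detection answer above, pod events can be pulled from the API server
without node access; a sketch with a hypothetical pod name:

```sh
# Show warning events for a pod, most recent last.
kubectl get events --field-selector involvedObject.name=userns-demo,type=Warning \
  --sort-by=.lastTimestamp
```
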
<!--
For each of them, fill in the following information by copying the below template:
- [Failure mode brief description]
