Commit 3766329

rata and giuseppe committed
KEP-127: Address more review comments
Co-authored-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
1 parent fc5ea6e commit 3766329

1 file changed

+65
-17
lines changed

keps/sig-node/127-user-namespaces/README.md

+65-17
@@ -39,6 +39,8 @@
3939
- [GA](#ga)
4040
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
4141
- [Version Skew Strategy](#version-skew-strategy)
42+
- [Kubelet and Kube-apiserver skew](#kubelet-and-kube-apiserver-skew)
43+
- [Kubelet and container runtime skews](#kubelet-and-container-runtime-skews)
4244
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
4345
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
4446
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
@@ -612,7 +614,7 @@ use container runtime versions that have the needed changes.
612614

613615
##### critests
614616

615-
- For Alpha, the feature is tested for containerd and CRI-O in cri-tools repo using critest to
617+
- For Beta, the feature is tested for containerd and CRI-O in cri-tools repo using critest to
616618
make sure the specified user namespace configuration is honored.
617619

618620
- <test>: <link to test coverage>
@@ -630,6 +632,9 @@ use container runtime versions that have the needed changes.
630632

631633
- Gather and address feedback from the community
632634
- Be able to configure UID/GID ranges to use for pods
635+
- Add unit tests that exercise the feature gate switch (see section "Are there
636+
any tests for feature enablement/disablement?")
637+
- Add cri-tools test
633638
- This feature is not supported on Windows.
634639
- Get review from VM container runtimes maintainers (not blocker, as VM runtimes should just ignore
635640
the field, but nice to have)
@@ -670,6 +675,26 @@ enhancement:
670675
CRI or CNI may require updating that component before the kubelet.
671676
-->
672677

678+
#### Kubelet and Kube-apiserver skew
679+
680+
The apiserver and kubelet feature gate enablement work fine in any combination:
681+
682+
1. If the apiserver has the feature gate enabled and the kubelet doesn't, then the pod will show
683+
that field and the kubelet will ignore it. Then, the pod is created without user namespaces.
684+
2. If the apiserver has the feature gate disabled and the kubelet enabled, the pod won't show this
685+
field and therefore the kubelet won't act on a field that isn't shown. The pod is created without
686+
user namespaces.
687+
688+
The kubelet can still create pods with user namespaces if static-pods are configured with
689+
pod.spec.hostUsers and the kubelet has the feature gate enabled.
690+
691+
If the kube-apiserver doesn't support the feature at all (< 1.25), a pod with userns will be
692+
rejected.
693+
694+
If the kubelet doesn't support the feature (< 1.25), it will ignore the pod.spec.hostUsers field.
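
The combinations described in this subsection can be summarized as a small decision function. This is only an illustrative sketch, not kubelet or apiserver code: the names `Outcome` and `pod_outcome` and all parameters are hypothetical, and the "both enabled" case assumes the node and container runtime meet the requirements for user namespaces.

```python
from enum import Enum


class Outcome(Enum):
    REJECTED = "rejected by the kube-apiserver"
    NO_USERNS = "created without a user namespace"
    USERNS = "created with a user namespace"


def pod_outcome(apiserver_supports, apiserver_gate, kubelet_supports,
                kubelet_gate, static_pod=False):
    """Outcome for a pod that sets pod.spec.hostUsers (hypothetical helper).

    "supports" means the component is >= 1.25; "gate" means the feature
    gate is enabled on that component.
    """
    # The kubelet only acts on the field if it knows it and the gate is on.
    kubelet_honors_field = kubelet_supports and kubelet_gate

    # Static pods are read by the kubelet directly, so the apiserver
    # checks below do not apply to them.
    if not static_pod:
        if not apiserver_supports:
            # apiserver < 1.25: a pod with userns is rejected.
            return Outcome.REJECTED
        if not apiserver_gate:
            # The pod does not show the field, so the kubelet never sees it.
            return Outcome.NO_USERNS

    return Outcome.USERNS if kubelet_honors_field else Outcome.NO_USERNS
```

For example, `pod_outcome(True, False, True, True)` corresponds to case 2 above (apiserver gate disabled, kubelet gate enabled), and passing `static_pod=True` with the same arguments corresponds to the static pod exception.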
695+
696+
#### Kubelet and container runtime skews
697+
673698
Some definitions first:
674699
- New kubelet: kubelet with CRI proto files that includes the changes proposed in
675700
this KEP.
@@ -794,6 +819,9 @@ We will also unit test that, if pods were created with the new field
794819
pod.spec.hostUsers, then if the featuregate is disabled all works as expected (no
795820
user namespace is used).
796821

822+
We will add tests exercising the `switch` of feature gate itself (what happens
823+
if I disable a feature gate after having objects written with the new field)
824+
797825
<!--
798826
The e2e framework does not currently support enabling or disabling feature
799827
gates. However, unit tests in each component dealing with managing data, created
@@ -815,16 +843,18 @@ This section must be completed when targeting beta to a release.
815843

816844
###### How can a rollout or rollback fail? Can it impact already running workloads?
817845

818-
The rollout is just a feature flag on the kubelet and the kube-apiserver.
846+
If one APIserver is upgraded while others aren't and you are talking to a not
847+
upgraded one, the pod will be accepted (if the apiserver is >= 1.25, rejected if
848+
< 1.25).
819849

820-
If one APIserver is upgraded while other's aren't and you are talking to a not upgraded the pod
821-
will be accepted (if the apiserver is >= 1.25). If it is scheduled to a node that the kubelet has
822-
the feature flag activated and the node meets the requirements to use user namespaces, then the
823-
pod will be created with the namespace. If it is scheduled to a node that has the feature disabled,
824-
it will be created without the user namespace.
850+
If it is scheduled to a node where the kubelet has the feature flag activated
851+
and the node meets the requirements to use user namespaces, then the pod will be
852+
created with the namespace. If it is scheduled to a node that has the feature
853+
disabled, it will be created without the user namespace.
825854

826-
On a rollback, pods created while the feature was active (created with user namespaces) will have to
827-
be re-created without user namespaces.
855+
On a rollback, pods created while the feature was active (created with user
856+
namespaces) will have to be re-created to run without user namespaces. If those
857+
weren't recreated, they will continue to run in a user namespace.
828858

829859
<!--
830860
Try to be as paranoid as possible - e.g., what if some components will restart
@@ -841,8 +871,24 @@ will rollout across nodes.
841871
On Kubernetes side, the kubelet should start correctly.
842872

843873
On the node runtime side, a pod created with pod.spec.hostUsers=false should be on RUNNING state if
844-
all node requirements are met. If the CRI runtime or the handler do not support the feature, the kubelet
845-
returns an error.
874+
all node requirements are met. If the CRI runtime or the handler do not support the feature, the
875+
kubelet returns an error.
876+
877+
When a pod hits this error returned by the kubelet, the status in `kubectl` is shown as
878+
`ContainerCreating` and the pod events show:
879+
880+
```
881+
Warning FailedCreatePodSandBox 12s (x23 over 5m6s) kubelet Failed to create pod sandbox: user namespaces is not supported by the runtime
882+
```
883+
884+
The following kubelet metrics are useful to check:
885+
- `kubelet_running_pods`: Shows the actual number of pods running
886+
- `kubelet_desired_pods`: The number of pods the kubelet is _trying_ to run
887+
888+
If these metrics are very different, it means there are desired pods that can't be set to running.
889+
If that is the case, checking the pod events to see if they are failing for user namespaces reasons
890+
(like the errors shown in this KEP) is advised, in which case it is recommended to rollback or
891+
disable the feature gate.
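
As a rough illustration of the comparison described above, the sketch below reads a kubelet metrics dump in the Prometheus text format and computes the gap between the two metrics. The helper names `sum_metric` and `pods_gap` are hypothetical, the sample values are made up for the example, and the text does not define how large a gap counts as "very different", so no threshold is assumed here.

```python
def sum_metric(metrics_text, name):
    """Sum every sample of `name`, across all label sets (hypothetical helper)."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        head, _, rest = line.partition("{")
        if rest:
            # Sample with labels: name{...} value
            metric = head
            value_part = rest.rpartition("}")[2]
        else:
            # Sample without labels: name value
            metric, _, value_part = line.partition(" ")
        if metric.strip() == name:
            total += float(value_part.split()[0])
    return total


def pods_gap(metrics_text):
    """Desired pods minus running pods; a large gap deserves a look at pod events."""
    desired = sum_metric(metrics_text, "kubelet_desired_pods")
    running = sum_metric(metrics_text, "kubelet_running_pods")
    return desired - running


# Made-up sample input, only to show the expected shape of the data.
SAMPLE = """\
# TYPE kubelet_desired_pods gauge
kubelet_desired_pods{static=""} 9
kubelet_desired_pods{static="true"} 1
# TYPE kubelet_running_pods gauge
kubelet_running_pods 7
"""

gap = pods_gap(SAMPLE)
```

A persistent positive gap is only a hint. The pod events still need to be checked to confirm that the failures are related to user namespaces before rolling back or disabling the feature gate.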
846892

847893
<!--
848894
What signals should users be paying attention to when the feature is young
@@ -899,7 +945,7 @@ logs or events for this purpose.
899945

900946
###### How can someone using this feature know that it is working for their instance?
901947

902-
If the runtime doesn't support user namespaces an error is returned by the kubelet.
948+
If the runtime doesn't support user namespaces an error is returned by the kubelet and the pod cannot be created.
903949

904950
There are step-by-step examples in the Kubernetes documentation too: https://kubernetes.io/docs/tasks/configure-pod-container/user-namespaces/
905951

@@ -1201,7 +1247,8 @@ No changes to current kubelet behaviors. The feature only uses kubelet-local inf
12011247
levels that could help debug the issue?
12021248
Not required until feature graduated to beta.
12031249

1204-
The error is returned on pod-creation, no need to search for logs.
1250+
The Kubelet will get an error from the runtime and will propagate it to the pod (visible on
1251+
the pod events).
12051252

12061253
The idmap mount is created by the OCI runtime, not at the kubelet layer. At the kubelet layer, this
12071254
is just another OCI runtime error.
@@ -1231,7 +1278,8 @@ No changes to current kubelet behaviors. The feature only uses kubelet-local inf
12311278
- Detection: How can it be detected via metrics? Stated another way:
12321279
how can an operator troubleshoot without logging into a master or worker node?
12331280

1234-
Errors are returned on pod creation, directly to the user. No need to use metrics.
1281+
Errors are returned on pod creation, directly to the user (visible on the pod events). No
1282+
need to use metrics.
12351283

12361284
See the pod events, it should contain something like:
12371285

@@ -1249,7 +1297,7 @@ No changes to current kubelet behaviors. The feature only uses kubelet-local inf
12491297
levels that could help debug the issue?
12501298
Not required until feature graduated to beta.
12511299

1252-
No extra logs, the error is returned to the user.
1300+
No extra logs, the error is returned to the user (visible in the pod events).
12531301

12541302
- Testing: Are there any tests for failure mode? If not, describe why.
12551303

@@ -1262,8 +1310,8 @@ writing to this file.
12621310
- Detection: How can it be detected via metrics? Stated another way:
12631311
how can an operator troubleshoot without logging into a master or worker node?
12641312

1265-
Errors are returned to the operation failed (like pod creation), no need to see metrics nor
1266-
logs.
1313+
Errors are returned by the operation that failed (like pod creation, visible on the pod events),
1314+
no need to see metrics nor logs.
12671315

12681316
Errors are returned either on:
12691317
* Kubelet initialization: the initialization fails if the feature gate is active and there is a
