  - [Alternatives](#alternatives)
  - [Don't use idmap mounts and rely chown all the files correctly](#dont-use-idmap-mounts-and-rely-chown-all-the-files-correctly)
  - [64k mappings?](#64k-mappings)
- - [Allow runtimes to pick the mapping?](#allow-runtimes-to-pick-the-mapping)
+ - [Allow runtimes to pick the mapping](#allow-runtimes-to-pick-the-mapping)
  - [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
  <!-- /toc -->

@@ -126,8 +126,8 @@ Here we use UIDs, but the same applies for GIDs.
  inside the container to different IDs in the host. In particular, mapping root
  inside the container to unprivileged user and group IDs in the node.
  - Increase pod to pod isolation by allowing to use non-overlapping mappings
- (UIDs/GIDs) whenever possible. IOW, if two containers runs as user X, they run
- as different UIDs in the node and therefore are more isolated than today.
+ (UIDs/GIDs) whenever possible. In other words: if two containers runs as user
+ X, they run as different UIDs in the node and therefore are more isolated than today.
  - Allow pods to have capabilities (e.g. `CAP_SYS_ADMIN`) that are only valid in
  the pod (not valid in the host).
  - Benefit from the security hardening that user namespaces provide against some
@@ -291,7 +291,7 @@ message Mount {
  ### Support for pods

  Make pods work with user namespaces. This is activated via the
- bool `pod.spec.HostUsers`.
+ bool `pod.spec.hostUsers`.

  The mapping length will be 65536, mapping the range 0-65535 to the pod. This wide
  range makes sure most workloads will work fine. Additionally, we don't need to
@@ -403,7 +403,7 @@ If the pod wants to read who is the owner of file `/vol/configmap/foo`, now it
  will see the owner is root inside the container. This is due to the IDs
  transformations that the idmap mount does for us.

- In other words, we can make sure the pod can read files instead of chowning them
+ In other words: we can make sure the pod can read files instead of chowning them
  all using the host IDs the pod is mapped to, by just using an idmap mount that
  has the same mapping that we use for the pod user namespace.

@@ -469,7 +469,7 @@ something else to this list:
  - What about windows or VM container runtimes, that don't use linux namespaces?
  We need a review from windows maintainers once we have a more clear proposal.
  We can then adjust the needed details, we don't expect the changes (if any) to be big.
- IOW, in my head this looks like this: we merge this KEP in provisional state if
+ In my head this looks like this: we merge this KEP in provisional state if
  we agree on the high level idea, with @giuseppe we do a PoC so we can fill-in
  more details to the KEP (like CRI changes, changes to container runtimes, how to
  configure kubelet ranges, etc.), and then the Windows folks can review and we
@@ -686,7 +686,7 @@ well as the [existing list] of feature gates.
  -->

  - [x] Feature gate (also fill in values in `kep.yaml`)
- - Feature gate name: UserNamespacesPodsSupport
+ - Feature gate name: UserNamespacesSupport
  - Components depending on the feature gate: kubelet, kube-apiserver

  ###### Does enabling the feature change any default behavior?
@@ -733,7 +733,7 @@ Pods will have to be re-created to use the feature.

  We will add.

- We will test for when the field pod.spec.HostUsers is set to true, false
+ We will test for when the field pod.spec.hostUsers is set to true, false
  and not set. All of this with and without the feature gate enabled.

  We will also unit test that, if pods were created with the new field
@@ -766,7 +766,7 @@ The rollout is just a feature flag on the kubelet and the kube-apiserver.
  If one API server is upgraded while others aren't, the pod will be accepted (if the apiserver is >=
  1.25). If it is scheduled to a node that the kubelet has the feature flag activated and the node
  meets the requirements to use user namespaces, then the pod will be created with the namespace. If
- it is scheduled to a node that has the feature disabled, it will be scheduled without the user
+ it is scheduled to a node that has the feature disabled, it will be created without the user
  namespace.

  On a rollback, pods created while the feature was active (created with user namespaces) will have to
@@ -787,7 +787,7 @@ will rollout across nodes.

  On Kubernetes side, the kubelet should start correctly.

- On the node runtime side, a pod created with pod.spec.HostUsers=false should be on RUNNING state if
+ On the node runtime side, a pod created with pod.spec.hostUsers=false should be on RUNNING state if
  all node requirements are met.
  <!--
  What signals should users be paying attention to when the feature is young
@@ -798,7 +798,7 @@ that might indicate a serious problem?

  Yes.

- We tested to enable the feature flag, create a deployment with pod.spec.HostUsers=false, and then disable
+ We tested to enable the feature flag, create a deployment with pod.spec.hostUsers=false, and then disable
  the feature flag and restart the kubelet and kube-apiserver.

  After that, we deleted the deployment pods (not the deployment object), the pods were re-created
@@ -830,7 +830,7 @@ previous answers based on experience in the field.

  ###### How can an operator determine if the feature is in use by workloads?

- Check if any pod has the pod.spec.HostUsers field set to false.
+ Check if any pod has the pod.spec.hostUsers field set to false.
  <!--
  Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
  checking if there are objects with field X set) may be a last resort. Avoid
@@ -839,7 +839,7 @@ logs or events for this purpose.

  ###### How can someone using this feature know that it is working for their instance?

- Check if any pod has the pod.spec.HostUsers field set to false and is on RUNNING state on a node
+ Check if any pod has the pod.spec.hostUsers field set to false and is on RUNNING state on a node
  that meets all the requirements.

  There are step-by-step examples in the Kubernetes documentation too.
@@ -859,7 +859,7 @@ Recall that end users cannot usually observe component logs or access metrics.
  - Condition name:
  - Other field:
  - [x] Other (treat as last resort)
- - Details: check pods with pod.spec.HostUsers field set to false, and see if they are in RUNNING
+ - Details: check pods with pod.spec.hostUsers field set to false, and see if they are in RUNNING
  state.

  ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
@@ -1135,7 +1135,7 @@ No changes to current kubelet behaviors. The feature only uses kubelet-local inf
  - Mitigations: What can be done to stop the bleeding, especially for already
  running user workloads?

- Remove the pod.spec.HostUsers field or disable the feature gate.
+ Remove the pod.spec.hostUsers field or disable the feature gate.

  - Diagnostics: What are the useful log messages and their required logging
  levels that could help debug the issue?
@@ -1183,7 +1183,7 @@ No changes to current kubelet behaviors. The feature only uses kubelet-local inf
  - Mitigations: What can be done to stop the bleeding, especially for already
  running user workloads?

- Remove the pod.spec.HostUsers field or disable the feature gate.
+ Remove the pod.spec.hostUsers field or disable the feature gate.

  - Diagnostics: What are the useful log messages and their required logging
  levels that could help debug the issue?
@@ -1217,7 +1217,7 @@ writing to this file.
  - Mitigations: What can be done to stop the bleeding, especially for already
  running user workloads?

- Remove the pod.spec.HostUsers field or disable the feature gate.
+ Remove the pod.spec.hostUsers field or disable the feature gate.

  - Diagnostics: What are the useful log messages and their required logging
  levels that could help debug the issue?
@@ -1233,12 +1233,11 @@ writing to this file.
  There are no tests for failures to read or write the file, the code-paths just return the errors
  in those cases.

-
  - Error getting the kubelet IDs range configuration
  - Detection: How can it be detected via metrics? Stated another way:
  how can an operator troubleshoot without logging into a master or worker node?

- In this case the Kubelet will fail to start with a clear error message.
+ In this case the kubelet will fail to start with a clear error message.

  - Mitigations: What can be done to stop the bleeding, especially for already
  running user workloads?
@@ -1369,21 +1368,23 @@ The issues without idmap mounts in previous iterations of this KEP, is that the
  pod had to be unique for every pod in the cluster, easily reaching a limit when the cluster is "big
  enough" and the UID space runs out. However, with idmap mounts the IDs assigned to a pod just needs
  to be unique within the node (and with 64k ranges we have 64k pods possible in the node, so not
- really an issue). IOW, by using idmap mounts, we changed the IDs limit to be node-scoped instead of
- cluster-wide/cluster-scoped.
+ really an issue). In other words: by using idmap mounts, we changed the IDs limit to be node-scoped
+ instead of cluster-wide/cluster-scoped.
+
+ Some use cases for longer mappings include:

- There are no known use cases for longer mappings that we know of. The 16bit range (0-65535) is what
- is assumed by all POSIX tools that we are aware of. If the need arises, longer mapping can be
- considered in a future KEP.
+ - running a container tool inside a Pod, where that container tool wants to use a UID range.
+ - running an application inside a Pod where the application uses UIDs
+ above 65535 by default.

- ### Allow runtimes to pick the mapping?
+ ### Allow runtimes to pick the mapping

  Tim suggested that we might want to allow the container runtimes to choose the
  mapping and have different runtimes pick different mappings. While KEP authors
  disagree on this, we still need to discuss it and settle on something. This was
  [raised here](https://github.com/kubernetes/enhancements/pull/3065#discussion_r798760382)

- Furthermore, the reasons mentioned by Tim (some nodes having CRIO, some others having containerd,
+ Furthermore, the reasons mentioned by Tim Hockin (some nodes having CRIO, some others having containerd,
  etc.) are handled correctly now. Different nodes can use different container runtimes, if a custom
  range needs to be used by the kubelet, that can be configured per-node.
