@@ -288,6 +288,43 @@ message Mount {
}
```

+ The CRI runtime reports which runtime handlers support user
+ namespaces through the `StatusResponse` message, which gains a new
+ field `runtime_handlers`:
+
+ ```
+ message StatusResponse {
+     // Status of the Runtime.
+     RuntimeStatus status = 1;
+     // Info is extra information of the Runtime. The key could be arbitrary string, and
+     // value should be in json format. The information could include anything useful for
+     // debug, e.g. plugins used by the container runtime.
+     // It should only be returned non-empty when Verbose is true.
+     map<string, string> info = 2;
+
+     // Runtime handlers.
+     repeated RuntimeHandler runtime_handlers = 3;
+ }
+ ```
+
+ Where `RuntimeHandler` is defined as follows:
+
+ ```
+ message RuntimeHandlerFeatures {
+     // supports_user_namespaces is set to true if the runtime handler supports
+     // user namespaces.
+     bool supports_user_namespaces = 1;
+ }
+
+ message RuntimeHandler {
+     // Name must be unique in StatusResponse.
+     // An empty string denotes the default handler.
+     string name = 1;
+     // Supported features.
+     RuntimeHandlerFeatures features = 2;
+ }
+ ```
+
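+ As an illustration only (not part of the proposed API), a kubelet-side
+ check against these messages could look like the following Go sketch.
+ The struct definitions mirror the proto above; the function name
+ `supportsUserNamespaces` is hypothetical:
+
+ ```
+ package main
+
+ import "fmt"
+
+ // Minimal Go mirrors of the CRI messages defined above.
+ type RuntimeHandlerFeatures struct {
+     SupportsUserNamespaces bool
+ }
+
+ type RuntimeHandler struct {
+     Name     string
+     Features *RuntimeHandlerFeatures
+ }
+
+ type StatusResponse struct {
+     RuntimeHandlers []*RuntimeHandler
+ }
+
+ // supportsUserNamespaces reports whether the given runtime handler
+ // (the empty string denotes the default handler) declared support.
+ func supportsUserNamespaces(resp *StatusResponse, handler string) bool {
+     for _, h := range resp.RuntimeHandlers {
+         if h.Name == handler && h.Features != nil {
+             return h.Features.SupportsUserNamespaces
+         }
+     }
+     // Old runtimes never populate runtime_handlers, so the kubelet
+     // assumes no support and rejects pods that request user namespaces.
+     return false
+ }
+
+ func main() {
+     resp := &StatusResponse{RuntimeHandlers: []*RuntimeHandler{
+         {Name: "", Features: &RuntimeHandlerFeatures{SupportsUserNamespaces: true}},
+     }}
+     fmt.Println(supportsUserNamespaces(resp, "")) // true
+ }
+ ```
+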

### Support for pods

Make pods work with user namespaces. This is activated via the
@@ -593,6 +630,7 @@ use container runtime versions that have the needed changes.

- Gather and address feedback from the community
- Be able to configure UID/GID ranges to use for pods
+ - This feature is not supported on Windows.
- Get review from VM container runtimes maintainers (not blocker, as VM runtimes should just ignore
  the field, but nice to have)

@@ -603,6 +641,20 @@ use container runtime versions that have the needed changes.

### Upgrade / Downgrade Strategy

+ Existing pods will keep working as intended, as the new field is not set
+ in them.
+
+ An upgrade will not change any current behavior.
+
+ If the new functionality wasn't used yet, a downgrade is not affected.
+
+ On downgrade, if the functionality was used, running pods that were
+ created with user namespaces will continue to run with user
+ namespaces. Those pods need to be re-created to stop using the user
+ namespace.
+
+ Versions of Kubernetes that don't implement this feature will
+ ignore the new field `pod.spec.hostUsers`.
+
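+ For illustration, this is roughly what such a pod object looks like when
+ built with the Go API types. This is a sketch, assuming the `HostUsers`
+ field of `PodSpec` in `k8s.io/api/core/v1` and a hypothetical pod name:
+
+ ```
+ package main
+
+ import (
+     "fmt"
+
+     corev1 "k8s.io/api/core/v1"
+     metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+ )
+
+ func main() {
+     hostUsers := false // request a user namespace for the pod
+     pod := &corev1.Pod{
+         ObjectMeta: metav1.ObjectMeta{Name: "userns-demo"},
+         Spec: corev1.PodSpec{
+             // Versions without this feature drop this unknown field.
+             HostUsers: &hostUsers,
+             Containers: []corev1.Container{{
+                 Name:    "sleep",
+                 Image:   "busybox",
+                 Command: []string{"sleep", "infinity"},
+             }},
+         },
+     }
+     fmt.Printf("pod %s, hostUsers=%v\n", pod.Name, *pod.Spec.HostUsers)
+ }
+ ```
+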
### Version Skew Strategy

<!--
@@ -635,11 +687,13 @@ doesn't create them. The runtime can detect this situation as the `user` field
in the `NamespaceOption` will be seen as nil, [thanks to
protobuf][proto3-defaults]. We already tested this with real code.

- Old runtime and new kubelet: containers are created without userns. As the
- `user` field of the `NamespaceOption` message is not part of the runtime
- protofiles, that part is ignored by the runtime and pods are created using the
- host userns.
+ Old runtime and new kubelet: the runtime won't report that it supports
+ user namespaces through the `StatusResponse` message, so the kubelet
+ will detect this and return an error if a pod with user namespaces is
+ created.

+ We added unit tests for the feature gate disabled, and integration
+ tests for the feature gate enabled and disabled.

[proto3-defaults]: https://developers.google.com/protocol-buffers/docs/proto3#default
@@ -763,15 +817,14 @@ This section must be completed when targeting beta to a release.

The rollout is just a feature flag on the kubelet and the kube-apiserver.

- If one API server is upgraded while others aren't, the pod will be accepted (if the apiserver is >=
- 1.25). If it is scheduled to a node that the kubelet has the feature flag activated and the node
- meets the requirements to use user namespaces, then the pod will be created with the namespace. If
- it is scheduled to a node that has the feature disabled, it will be created without the user
- namespace.
+ If one API server is upgraded while others aren't, and you are talking to a not-yet-upgraded API
+ server, the pod will still be accepted (if that apiserver is >= 1.25). If it is scheduled to a node
+ where the kubelet has the feature flag activated and the node meets the requirements to use user
+ namespaces, then the pod will be created with the namespace. If it is scheduled to a node that has
+ the feature disabled, it will be created without the user namespace.

On a rollback, pods created while the feature was active (created with user namespaces) will have to
- be restarted to be re-created without user namespaces. Just a re-creation of the pod will do the
- trick.
+ be re-created without user namespaces.

<!--
Try to be as paranoid as possible - e.g., what if some components will restart
@@ -788,24 +841,31 @@ will rollout across nodes.
On Kubernetes side, the kubelet should start correctly.

On the node runtime side, a pod created with pod.spec.hostUsers=false should be in RUNNING state if
- all node requirements are met.
+ all node requirements are met. If the CRI runtime or the handler does not support the feature, the
+ kubelet returns an error.
+

<!--
What signals should users be paying attention to when the feature is young
that might indicate a serious problem?
-->

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

- Yes.
+ Yes, we tested it locally using `./hack/local-up-cluster.sh`.

- We tested to enable the feature flag, create a deployment with pod.spec.hostUsers=false, and then disable
- the feature flag and restart the kubelet and kube-apiserver.
+ We tested enabling the feature flag, created a deployment with pod.spec.hostUsers=false, and then disabled
+ the feature flag and restarted the kubelet and kube-apiserver.

After that, we deleted the deployment pods (not the deployment object), and the pods were re-created
without user namespaces just fine, without any modification needed to the deployment yaml.

We then enabled the feature flag on the kubelet and kube-apiserver, and deleted the deployment pod.
This caused the pod to be re-created, this time with user namespaces enabled again.
+
+ To validate it, it is necessary to exec into a container in the pod and run the command `cat /proc/self/uid_map`.
+ When running in a user namespace the output is different from `0 0 4294967295`, which is what the
+ command shows when running without a user namespace.
+
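+ For illustration, that verification can also be done programmatically.
+ The following is a minimal, self-contained Go sketch (not part of the
+ KEP) that reads `/proc/self/uid_map` and reports the result:
+
+ ```
+ package main
+
+ import (
+     "fmt"
+     "os"
+     "strings"
+ )
+
+ // The identity mapping /proc/self/uid_map shows outside of a user
+ // namespace (fields: ID-inside-ns, ID-outside-ns, length).
+ const hostMapping = "0 0 4294967295"
+
+ func main() {
+     data, err := os.ReadFile("/proc/self/uid_map")
+     if err != nil {
+         fmt.Fprintln(os.Stderr, "cannot read uid_map:", err)
+         os.Exit(1)
+     }
+     // Collapse the tab/space padding the kernel uses in this file.
+     mapping := strings.Join(strings.Fields(string(data)), " ")
+     if mapping == hostMapping {
+         fmt.Println("host user namespace:", mapping)
+     } else {
+         fmt.Println("inside a user namespace:", mapping)
+     }
+ }
+ ```
+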

<!--
Describe manual testing that was done and the outcomes.
Longer term, we may want to require automated upgrade/rollback tests, but we
@@ -839,10 +899,9 @@ logs or events for this purpose.

###### How can someone using this feature know that it is working for their instance?

- Check if any pod has the pod.spec.hostUsers field set to false and is on RUNNING state on a node
- that meets all the requirements.
+ If the runtime doesn't support user namespaces, an error is returned by the kubelet.

- There are step-by-step examples in the Kubernetes documentation too.
+ There are step-by-step examples in the Kubernetes documentation too: https://kubernetes.io/docs/tasks/configure-pod-container/user-namespaces/

<!--
For instance, if this is a pod-related feature, it should be possible to determine if the feature is functioning properly
@@ -860,7 +919,8 @@ Recall that end users cannot usually observe component logs or access metrics.
- Other field:
- [x] Other (treat as last resort)
  - Details: check pods with pod.spec.hostUsers field set to false, and see if they are in RUNNING
- state.
+ state. Exec into a container and run `cat /proc/self/uid_map` to verify that the mappings are
+ different from the mappings on the host (see the sketch below).
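+ As a sketch of how an operator could automate that check, the following
+ Go program uses client-go to list running pods that set `hostUsers:
+ false`; the API calls are real, but the program itself is only
+ illustrative:
+
+ ```
+ package main
+
+ import (
+     "context"
+     "fmt"
+
+     corev1 "k8s.io/api/core/v1"
+     metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+     "k8s.io/client-go/kubernetes"
+     "k8s.io/client-go/tools/clientcmd"
+ )
+
+ func main() {
+     // Load ~/.kube/config; inside a cluster, rest.InClusterConfig()
+     // would be used instead.
+     config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
+     if err != nil {
+         panic(err)
+     }
+     client, err := kubernetes.NewForConfig(config)
+     if err != nil {
+         panic(err)
+     }
+     pods, err := client.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
+     if err != nil {
+         panic(err)
+     }
+     for _, p := range pods.Items {
+         // hostUsers=false means the pod asked for a user namespace.
+         if p.Spec.HostUsers != nil && !*p.Spec.HostUsers && p.Status.Phase == corev1.PodRunning {
+             fmt.Printf("%s/%s runs with a user namespace\n", p.Namespace, p.Name)
+         }
+     }
+ }
+ ```
+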

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

@@ -1373,9 +1433,9 @@ instead of cluster-wide/cluster-scoped.

Some use cases for longer mappings include:

- - running a container tool inside a Pod, where that container tool wants to use a UID range.
- - running an application inside a Pod where the application uses UIDs
-   above 65535 by default.
+ There are no use cases for longer mappings that we are aware of. The 16-bit range (0-65535) is
+ what all POSIX tools we know of assume. If the need arises, longer mappings can be
+ considered in a future KEP.

### Allow runtimes to pick the mapping
