
Commit fc5ea6e

kep-127: address some review comments
Signed-off-by: Giuseppe Scrivano <[email protected]>
1 parent 41a25d5 commit fc5ea6e

1 file changed: +82 −22 lines changed


keps/sig-node/127-user-namespaces/README.md

@@ -288,6 +288,43 @@ message Mount {
 }
 ```

+The CRI runtime reports which runtime handlers support user
+namespaces through the `StatusResponse` message, which gains a new
+field, `runtime_handlers`:
+
+```
+message StatusResponse {
+    // Status of the Runtime.
+    RuntimeStatus status = 1;
+    // Info is extra information of the Runtime. The key could be arbitrary string, and
+    // value should be in json format. The information could include anything useful for
+    // debug, e.g. plugins used by the container runtime.
+    // It should only be returned non-empty when Verbose is true.
+    map<string, string> info = 2;
+
+    // Runtime handlers.
+    repeated RuntimeHandler runtime_handlers = 3;
+}
+```
+
+where `RuntimeHandler` is defined as follows:
+
+```
+message RuntimeHandlerFeatures {
+    // supports_user_namespaces is set to true if the runtime handler supports
+    // user namespaces.
+    bool supports_user_namespaces = 1;
+}
+
+message RuntimeHandler {
+    // Name must be unique in StatusResponse.
+    // An empty string denotes the default handler.
+    string name = 1;
+    // Supported features.
+    RuntimeHandlerFeatures features = 2;
+}
+```
+
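To make the detection flow concrete, here is a minimal Go sketch of the check a kubelet could perform over these handlers. The struct types are simplified stand-ins for the generated CRI protobuf messages, and `handlerSupportsUserns` is a hypothetical helper, not actual kubelet code:

```go
package main

import "fmt"

// Simplified stand-ins for the CRI messages defined above; the real types
// are generated from the protobuf definitions in the cri-api package.
type RuntimeHandlerFeatures struct {
	SupportsUserNamespaces bool
}

type RuntimeHandler struct {
	Name     string // an empty string denotes the default handler
	Features *RuntimeHandlerFeatures
}

// handlerSupportsUserns reports whether the named runtime handler advertises
// user-namespace support. A runtime that predates the runtime_handlers field
// returns no handlers at all, so a missing entry means "not supported".
func handlerSupportsUserns(handlers []RuntimeHandler, name string) bool {
	for _, h := range handlers {
		if h.Name == name && h.Features != nil {
			return h.Features.SupportsUserNamespaces
		}
	}
	return false
}

func main() {
	// Example payload: the default handler supports user namespaces,
	// a hypothetical "kata" handler does not.
	handlers := []RuntimeHandler{
		{Name: "", Features: &RuntimeHandlerFeatures{SupportsUserNamespaces: true}},
		{Name: "kata", Features: &RuntimeHandlerFeatures{SupportsUserNamespaces: false}},
	}
	fmt.Println(handlerSupportsUserns(handlers, ""))     // true
	fmt.Println(handlerSupportsUserns(handlers, "kata")) // false
}
```
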
 ### Support for pods

 Make pods work with user namespaces. This is activated via the
@@ -593,6 +630,7 @@ use container runtime versions that have the needed changes.

 - Gather and address feedback from the community
 - Be able to configure UID/GID ranges to use for pods
+- This feature is not supported on Windows.
 - Get review from VM container runtimes maintainers (not blocker, as VM runtimes should just ignore
   the field, but nice to have)

@@ -603,6 +641,20 @@ use container runtime versions that have the needed changes.

 ### Upgrade / Downgrade Strategy

+Existing pods will still work as intended, as the new field is missing there.
+
+Upgrade will not change any current behaviors.
+
+When the new functionality wasn't yet used, downgrade will not be affected.
+
+On downgrade, when the functionality was used, pods that were created with
+user namespaces and are still running will continue to run with user
+namespaces. Those pods will need to be re-created to stop using the user
+namespace.
+
+Versions of Kubernetes that don't have this feature implemented will
+ignore the new field `pod.spec.hostUsers`.
+
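For illustration, a minimal Go sketch that builds a pod requesting a user namespace, assuming the `HostUsers *bool` field added to `PodSpec` in 1.25; the pod name and image are arbitrary:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	hostUsers := false // false requests a user namespace for the pod
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "userns-demo"},
		Spec: corev1.PodSpec{
			// HostUsers is a *bool: leaving it nil keeps today's behavior,
			// which is why pre-existing pods are unaffected by an upgrade.
			HostUsers: &hostUsers,
			Containers: []corev1.Container{
				{Name: "app", Image: "busybox", Command: []string{"sleep", "infinity"}},
			},
		},
	}
	fmt.Printf("pod %q requests hostUsers=%v\n", pod.Name, *pod.Spec.HostUsers)
}
```
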
 ### Version Skew Strategy

 <!--
@@ -635,11 +687,13 @@ doesn't create them. The runtime can detect this situation as the `user` field
 in the `NamespaceOption` will be seen as nil, [thanks to
 protobuf][proto3-defaults]. We already tested this with real code.

-Old runtime and new kubelet: containers are created without userns. As the
-`user` field of the `NamespaceOption` message is not part of the runtime
-protofiles, that part is ignored by the runtime and pods are created using the
-host userns.
+Old runtime and new kubelet: the runtime won't report that it supports
+user namespaces through the `StatusResponse` message, so the kubelet
+will detect this and return an error if a pod with user namespaces is
+created.

+We added unit tests for the feature gate disabled, and integration
+tests for the feature gate enabled and disabled.

 [proto3-defaults]: https://developers.google.com/protocol-buffers/docs/proto3#default
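The nil-detection described above can be sketched as follows; `NamespaceOption` and `UserNamespace` here are simplified stand-ins for the generated protobuf types, not the real CRI definitions:

```go
package main

import "fmt"

// Simplified stand-ins for the generated protobuf types: in Go, proto3
// message-typed fields are pointers, so a field that an old kubelet never
// set arrives at the runtime as nil.
type UserNamespace struct {
	Mode string
}

type NamespaceOption struct {
	User *UserNamespace
}

func main() {
	var opts NamespaceOption // as decoded from a request sent by an old kubelet
	if opts.User == nil {
		// The runtime falls back to the host user namespace.
		fmt.Println("user field unset: using the host user namespace")
	}
}
```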

@@ -763,15 +817,14 @@ This section must be completed when targeting beta to a release.

 The rollout is just a feature flag on the kubelet and the kube-apiserver.

-If one API server is upgraded while others aren't, the pod will be accepted (if the apiserver is >=
-1.25). If it is scheduled to a node that the kubelet has the feature flag activated and the node
-meets the requirements to use user namespaces, then the pod will be created with the namespace. If
-it is scheduled to a node that has the feature disabled, it will be created without the user
-namespace.
+If one API server is upgraded while others aren't and you are talking to a not yet upgraded one,
+the pod will still be accepted (if the apiserver is >= 1.25). If it is scheduled to a node where
+the kubelet has the feature flag activated and the node meets the requirements to use user
+namespaces, then the pod will be created with the namespace. If it is scheduled to a node that has
+the feature disabled, it will be created without the user namespace.

 On a rollback, pods created while the feature was active (created with user namespaces) will have to
-be restarted to be re-created without user namespaces. Just a re-creation of the pod will do the
-trick.
+be re-created without user namespaces.

 <!--
 Try to be as paranoid as possible - e.g., what if some components will restart
@@ -788,24 +841,31 @@ will rollout across nodes.
 On Kubernetes side, the kubelet should start correctly.

 On the node runtime side, a pod created with pod.spec.hostUsers=false should be in RUNNING state if
-all node requirements are met.
+all node requirements are met. If the CRI runtime or the handler does not support the feature, the
+kubelet returns an error.
+
 <!--
 What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->

 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

-Yes.
+Yes, we tested it locally using `./hack/local-up-cluster.sh`.

-We tested to enable the feature flag, create a deployment with pod.spec.hostUsers=false, and then disable
-the feature flag and restart the kubelet and kube-apiserver.
+We tested enabling the feature flag, creating a deployment with pod.spec.hostUsers=false, and then disabling
+the feature flag and restarting the kubelet and kube-apiserver.

 After that, we deleted the deployment pods (not the deployment object), and the pods were re-created
 without user namespaces just fine, without any modification needed on the deployment yaml.

 We then enabled the feature flag on the kubelet and kube-apiserver, and deleted the deployment pod.
 This caused the pod to be re-created, this time with user namespaces enabled again.
+
+To validate it, exec into a container in the pod and run `cat /proc/self/uid_map`.
+When running in a user namespace the output differs from `0 0 4294967295`, which is what you get
+when running without a user namespace.
+
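A minimal standalone Go sketch of that `uid_map` check (Linux only; the program and its messages are illustrative, not part of the KEP):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Reads /proc/self/uid_map (Linux only) and reports whether this process
// appears to run inside a user namespace. The identity mapping
// "0 0 4294967295" is what the initial (host) user namespace shows.
func main() {
	data, err := os.ReadFile("/proc/self/uid_map")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	mapping := strings.Join(strings.Fields(string(data)), " ")
	if mapping == "0 0 4294967295" {
		fmt.Println("no user namespace, host mapping:", mapping)
	} else {
		fmt.Println("running in a user namespace:", mapping)
	}
}
```
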
 <!--
 Describe manual testing that was done and the outcomes.
 Longer term, we may want to require automated upgrade/rollback tests, but we
@@ -839,10 +899,9 @@ logs or events for this purpose.

 ###### How can someone using this feature know that it is working for their instance?

-Check if any pod has the pod.spec.hostUsers field set to false and is on RUNNING state on a node
-that meets all the requirements.
+If the runtime doesn't support user namespaces, an error is returned by the kubelet.

-There are step-by-step examples in the Kubernetes documentation too.
+There are step-by-step examples in the Kubernetes documentation too: https://kubernetes.io/docs/tasks/configure-pod-container/user-namespaces/

 <!--
 For instance, if this is a pod-related feature, it should be possible to determine if the feature is functioning properly
@@ -860,7 +919,8 @@ Recall that end users cannot usually observe component logs or access metrics.
 - Other field:
 - [x] Other (treat as last resort)
   - Details: check pods with the pod.spec.hostUsers field set to false, and see if they are in RUNNING
-    state.
+    state. Exec into a container and run `cat /proc/self/uid_map` to verify that the mappings differ
+    from the mappings on the host.

 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

@@ -1373,9 +1433,9 @@ instead of cluster-wide/cluster-scoped.

 Some use cases for longer mappings include:

-- running a container tool inside a Pod, where that container tool wants to use a UID range.
-- running an application inside a Pod where the application uses UIDs
-  above 65535 by default.
+There are no known use cases for longer mappings. The 16-bit range (0-65535) is what
+is assumed by all POSIX tools that we are aware of. If the need arises, longer mappings can be
+considered in a future KEP.

 ### Allow runtimes to pick the mapping
