Linux capabilities allow you to break apart the power of root into smaller groups of privileges. The Linux capabilities(7) man page provides a detailed description of how capability management is performed in Linux. In brief, the Linux kernel associates various capability sets with threads and files. A thread's Effective capability set determines its current privileges.
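The kernel reports each thread's capability sets as hexadecimal bitmasks in the CapInh, CapPrm, CapEff, CapBnd and CapAmb fields of /proc/<pid>/status. The following Python sketch decodes those masks into capability names, similar in spirit to `capsh --decode`. The decode_caps and read_cap_sets helpers are hypothetical. The name table assumes the bit numbering of recent kernel headers (up to CAP_CHECKPOINT_RESTORE, bit 40), and any higher bits are reported as UNKNOWN_<n>.

```python
from __future__ import annotations

# Capability bit numbers as defined in <linux/capability.h> (index = bit number).
CAP_NAMES = [
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID", "KILL",
    "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE", "NET_BIND_SERVICE",
    "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK", "IPC_OWNER",
    "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE", "SYS_PACCT",
    "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE", "SYS_TIME",
    "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE", "AUDIT_CONTROL",
    "SETFCAP", "MAC_OVERRIDE", "MAC_ADMIN", "SYSLOG", "WAKE_ALARM",
    "BLOCK_SUSPEND", "AUDIT_READ", "PERFMON", "BPF", "CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask: str) -> list[str]:
    """Translate a hex capability mask (as printed in /proc/<pid>/status) into names."""
    mask = int(hex_mask, 16)
    names = []
    for bit in range(mask.bit_length()):
        if mask & (1 << bit):
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"UNKNOWN_{bit}")
    return names


def read_cap_sets(pid: str = "self") -> dict[str, list[str]]:
    """Return the decoded CapInh/CapPrm/CapEff/CapBnd/CapAmb sets of a process."""
    sets = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key.startswith("Cap"):
                sets[key] = decode_caps(value.strip())
    return sets
```

When you run this inside a container, `read_cap_sets()["CapEff"]` lists the privileges the current process actually holds.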
When a thread executes a binary program, the kernel updates the various thread capability sets according to a set of rules. These rules take into account the UID of the thread before and after the exec system call and the file capabilities of the program being executed. Refer to the blog series [10] for more details about Linux capabilities and some examples. For a Red Hat specific review of capabilities, refer to the Linux Capabilities in OpenShift blog. An additional reference is the Docker Run Reference.
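The exec-time rules from capabilities(7) can be modelled with plain Python sets. The model shows why a non-root process that runs a binary without file capabilities ends up with an empty Effective set. This is a simplified sketch. It ignores securebits such as SECBIT_NOROOT and the kernel's safety check for capability-dumb binaries. The ThreadCaps, FileCaps and caps_after_exec names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadCaps:
    """Capability sets of a thread, using names such as "NET_BIND_SERVICE"."""
    permitted: frozenset[str]
    effective: frozenset[str]
    inheritable: frozenset[str]
    bounding: frozenset[str]
    ambient: frozenset[str]


@dataclass(frozen=True)
class FileCaps:
    """File capabilities of the executed binary (what setcap would store)."""
    permitted: frozenset[str] = frozenset()
    inheritable: frozenset[str] = frozenset()
    effective: bool = False  # files carry a single "effective" bit, not a set


def caps_after_exec(p: ThreadCaps, f: FileCaps, *, ruid: int, euid: int,
                    setid_file: bool = False) -> ThreadCaps:
    """Simplified capabilities(7) execve() transformation.

    ruid/euid are the real and effective UIDs after the exec.
    Securebits and the capability-dumb binary check are not modelled.
    """
    # A "privileged" file has file capabilities or the set-user-ID/set-group-ID bit.
    privileged_file = bool(f.permitted or f.inheritable or f.effective) or setid_file
    ambient = frozenset() if privileged_file else p.ambient
    if ruid == 0 or euid == 0:
        # For root, the file permitted and inheritable sets are treated as all ones.
        permitted = p.inheritable | p.bounding | ambient
    else:
        permitted = (p.inheritable & f.inheritable) | (f.permitted & p.bounding) | ambient
    # With an effective UID of 0, the file effective bit is treated as set.
    effective = permitted if (f.effective or euid == 0) else ambient
    return ThreadCaps(permitted, effective, p.inheritable, p.bounding, ambient)
```

In this model, a non-root thread with an empty ambient set that runs a plain binary gets an empty Effective set. The same thread running a binary with `cap_net_bind_service=+ep` gains NET_BIND_SERVICE, provided that capability is still in its bounding set.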
Users may choose to specify the required permissions for their running application in the Security Context of the pod specification. In OCP, administrators can use the Security Context Constraint (SCC) admission controller plugin to control the permissions allowed for pods deployed to the cluster. If the pod requests permissions that are not allowed by the SCCs available to that pod, the pod will not be admitted to the cluster.
The following runtime and SCC attributes control the capabilities that will be granted to a new container:
-
The values in the SCC for allowedCapabilities, defaultAddCapabilities and requiredDropCapabilities
-
allowPrivilegeEscalation: controls whether a container can acquire extra privileges through setuid binaries or the file capabilities of binaries
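The interplay of the SCC capability fields can be approximated with a small Python function. This is a simplified, hypothetical model for illustration, not the OpenShift admission code. It ignores allowPrivilegeEscalation and treats "*" in allowedCapabilities as allowing any capability. It also rejects a request to add a capability that the SCC requires to be dropped, which is an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SCCCaps:
    """Hypothetical stand-in for the capability fields of an SCC."""
    allowed: frozenset[str] = frozenset()        # allowedCapabilities ("*" allows any)
    default_add: frozenset[str] = frozenset()    # defaultAddCapabilities
    required_drop: frozenset[str] = frozenset()  # requiredDropCapabilities


def resolve_capabilities(scc: SCCCaps, add: set[str], drop: set[str]) -> tuple[set[str], set[str]]:
    """Return the container's final (add, drop) lists, or raise ValueError if it would be rejected."""
    conflicting = add & scc.required_drop
    if conflicting:
        raise ValueError(f"must be dropped by the SCC: {sorted(conflicting)}")
    if "*" not in scc.allowed:
        not_allowed = add - scc.allowed - scc.default_add
        if not_allowed:
            raise ValueError(f"not allowed by the SCC: {sorted(not_allowed)}")
    # Default adds apply unless the container explicitly drops them; required drops always apply.
    final_add = (set(scc.default_add) - drop) | add
    final_drop = set(scc.required_drop) | drop
    return final_add, final_drop
```

For example, with an SCC that allows only NET_BIND_SERVICE, a container requesting NET_ADMIN is rejected in this model. A request for NET_BIND_SERVICE is admitted, and the SCC's required drops are added to the container's drop list.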
The capabilities associated with a new container are determined as follows:
-
If the container runs as UID 0 (root), its Effective capability set is determined by the capability attributes requested by the pod or container security context and allowed by the SCC assigned to the pod. In this case, the SCC provides a way to limit the capabilities of a root container.
-
If the container runs with a non-zero UID (non-root), it starts with an empty Effective capability set (see Kubernetes should configure the ambient capability set). In this case, the SCC assigned to the pod controls only the capabilities the container may acquire through the file capabilities of the binaries it executes.
Because the general recommendation is to avoid running root containers, keep in mind that the capabilities required by non-root containers are controlled by the pod or container security context and by the SCC capability attributes. However, they can only be acquired by properly setting the file capabilities of the container binaries.
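File capabilities are stored in the security.capability extended attribute of a binary, which is what `setcap` writes and `getcap` reads. The Python sketch below decodes that attribute. You can use it to check that a binary in your image really carries, for example, `cap_net_bind_service=+ep` before you rely on it in a non-root container. The layout follows struct vfs_cap_data from <linux/capability.h>. The parse_vfs_cap and file_caps helpers are hypothetical.

```python
from __future__ import annotations

import errno
import os
import struct

VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_REVISION_1 = 0x01000000
VFS_CAP_REVISION_2 = 0x02000000
VFS_CAP_REVISION_3 = 0x03000000  # v3 adds a trailing rootid (user namespaces)
VFS_CAP_FLAGS_EFFECTIVE = 0x000001


def parse_vfs_cap(raw: bytes) -> tuple[bool, int, int]:
    """Decode a security.capability xattr into (effective_bit, permitted_mask, inheritable_mask)."""
    (magic,) = struct.unpack_from("<I", raw, 0)
    revision = magic & VFS_CAP_REVISION_MASK
    if revision == VFS_CAP_REVISION_1:
        words = 1  # 32-bit capability masks
    elif revision in (VFS_CAP_REVISION_2, VFS_CAP_REVISION_3):
        words = 2  # 64-bit masks split into two little-endian 32-bit words
    else:
        raise ValueError(f"unknown capability revision {revision:#x}")
    permitted = inheritable = 0
    for i in range(words):
        perm, inh = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= perm << (32 * i)
        inheritable |= inh << (32 * i)
    return bool(magic & VFS_CAP_FLAGS_EFFECTIVE), permitted, inheritable


def file_caps(path: str) -> tuple[bool, int, int] | None:
    """Read the file capabilities of a binary, or None if it has none."""
    try:
        return parse_vfs_cap(os.getxattr(path, "security.capability"))
    except OSError as err:
        if err.errno in (errno.ENODATA, errno.ENOTSUP):
            return None
        raise
```

For a binary prepared with `setcap cap_net_bind_service=+ep`, `file_caps()` should report the effective bit as set and bit 10 (CAP_NET_BIND_SERVICE) in the permitted mask.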
Refer to Managing security context constraints for more details on how to define and use the SCC.
The default capabilities that are allowed via the restricted SCC are as follows (see default CRI-O Linux capabilities):
-
"CHOWN"
-
"DAC_OVERRIDE"
-
"FSETID"
-
"FOWNER"
-
"SETPCAP"
-
"NET_BIND_SERVICE"
Note
The capabilities "SETGID", "SETUID" and "KILL" have been removed from the default OpenShift capabilities.
The IPC_LOCK capability is required if an application uses any of these functions:
-
mlock()
-
mlockall()
-
shmctl() (with SHM_LOCK)
-
mmap() (with MAP_LOCKED)
Even though mlock() is not necessary on systems where page swap is disabled (for example, on OpenShift), it may still be required because it is built into the DPDK libraries, and DPDK-based applications may call it indirectly through other functions.
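According to the mlock(2) man page, an unprivileged process can lock memory only up to its RLIMIT_MEMLOCK soft limit. The call fails with ENOMEM when that limit is exceeded and with EPERM when the limit is 0. A process holding CAP_IPC_LOCK is not subject to the limit. The Python sketch below uses ctypes to call mlock() directly so you can probe this behaviour from inside a container. try_mlock is a hypothetical helper name.

```python
import ctypes
import ctypes.util
import errno

# Load the C library; use_errno=True lets us read errno after a failed call.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.mlock.restype = ctypes.c_int
libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.munlock.restype = ctypes.c_int


def try_mlock(size: int) -> int:
    """Try to lock `size` bytes of memory; return 0 on success or the errno value on failure."""
    buf = ctypes.create_string_buffer(size)  # kept alive for the duration of the lock
    addr = ctypes.addressof(buf)
    if libc.mlock(addr, size) != 0:
        return ctypes.get_errno()
    libc.munlock(addr, size)  # release the lock again; this is only a probe
    return 0
```

For example, if `try_mlock(64 * 1024 * 1024)` returns `errno.ENOMEM` in a non-root container, the workload would likely need IPC_LOCK (or a larger memlock limit) to lock that much memory.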
See test case access-control-ipc-lock-capability-check
The NET_ADMIN capability is required to perform various network-related administrative operations inside a container, such as:
-
MTU setting
-
Link state modification
-
MAC/IP address assignment
-
IP address flushing
-
Route insertion/deletion/replacement
-
Control network driver and hardware settings via ethtool
This doesn’t include:
-
Adding or deleting a virtual interface inside a container, for example adding a VLAN interface
-
Setting VF device properties
All the administrative operations mentioned above (except ethtool) that require the NET_ADMIN capability should already be supported on the host by the various CNIs in OpenShift.
Important
Workload requirement
Only user-plane applications or applications using SR-IOV or Multicast can request the NET_ADMIN capability.
See test case access-control-net-admin-capability-check
The SYS_ADMIN capability is very powerful and overloaded: it allows the application to perform a range of system administration operations on the host. You should therefore avoid requiring this capability in your application.
Important
Workload requirement
Applications MUST NOT use the SYS_ADMIN Linux capability.
See test case access-control-sys-admin-capability-check
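A simple pre-deployment check can catch manifests that request SYS_ADMIN before they reach the cluster. The Python function below is a hypothetical lint over a pod manifest that has already been parsed into a dict, for example with a YAML loader. It is not how access-control-sys-admin-capability-check is implemented. As assumptions of this sketch, it treats the "SYS_ADMIN" and "CAP_SYS_ADMIN" spellings as equivalent and treats "ALL" in the add list as granting every capability.

```python
from __future__ import annotations


def _normalize(cap: str) -> str:
    """Treat "SYS_ADMIN" and "CAP_SYS_ADMIN" as the same capability."""
    cap = cap.upper()
    return cap[4:] if cap.startswith("CAP_") else cap


def containers_adding(pod: dict, capability: str) -> list[str]:
    """Names of (init) containers in a pod manifest that would receive `capability`."""
    wanted = _normalize(capability)
    spec = pod.get("spec", {})
    offenders = []
    for key in ("initContainers", "containers"):
        for container in spec.get(key, []):
            sc = container.get("securityContext") or {}
            added = {_normalize(c) for c in (sc.get("capabilities") or {}).get("add") or []}
            # Privileged containers receive every capability; "ALL" is assumed to do the same.
            if sc.get("privileged") or wanted in added or "ALL" in added:
                offenders.append(container["name"])
    return offenders
```

A CI step could fail the build whenever `containers_adding(pod, "SYS_ADMIN")` returns a non-empty list.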
If a workload running on a node uses DPDK, the SYS_NICE capability is used to allow the DPDK application to switch to the SCHED_FIFO real-time scheduling policy.
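Switching to SCHED_FIFO is done with a sched_setscheduler(2) call. According to sched(7), an unprivileged thread can select a real-time policy only up to its RLIMIT_RTPRIO limit (usually 0). Without CAP_SYS_NICE, the call therefore normally fails with EPERM. The Python sketch below uses the standard os.sched_setscheduler wrapper to probe whether the current container can make this switch, and then restores the original policy. try_sched_fifo is a hypothetical helper.

```python
import os


def try_sched_fifo(priority: int = 1) -> bool:
    """Try to move the calling thread to SCHED_FIFO, then restore its original policy.

    Returns False if the kernel refuses (typically EPERM without CAP_SYS_NICE).
    """
    old_policy = os.sched_getscheduler(0)
    old_param = os.sched_getparam(0)
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        return False
    # Restore the previous policy so the probe has no lasting effect.
    os.sched_setscheduler(0, old_policy, old_param)
    return True
```

Valid SCHED_FIFO priorities can be queried with `os.sched_get_priority_min(os.SCHED_FIFO)` and `os.sched_get_priority_max(os.SCHED_FIFO)`.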
See test case access-control-sys-nice-realtime-capability
The SYS_PTRACE capability is required when using Process Namespace Sharing. This feature is used when processes from one container need to be exposed to another container, for example, to send signals such as SIGHUP from a process in one container to a process in another container. See Share Process Namespace between Containers in a Pod for more details. For more information on these capabilities, refer to Linux Capabilities in OpenShift.
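With `shareProcessNamespace: true` set in the pod spec, the processes of the other containers in the pod appear under /proc. A helper can therefore look them up by name and signal them. The Python sketch below finds processes by their /proc/<pid>/comm name and sends SIGHUP. The helper names are hypothetical. Whether the kill succeeds still depends on the usual signal permission rules (matching UIDs or CAP_KILL).

```python
from __future__ import annotations

import os
import signal


def find_pids(comm: str) -> list[int]:
    """Return PIDs whose /proc/<pid>/comm matches `comm`."""
    comm = comm[:15]  # the kernel truncates comm to 15 characters
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == comm:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(pids)


def send_sighup(comm: str) -> list[int]:
    """Send SIGHUP to every visible process named `comm`; return the PIDs signalled."""
    signalled = []
    for pid in find_pids(comm):
        os.kill(pid, signal.SIGHUP)
        signalled.append(pid)
    return signalled
```

For example, a sidecar could call `send_sighup("nginx")` to ask a web server in a sibling container to reload its configuration. The process name here is only an example.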
See test case access-control-sys-ptrace-capability