The NVIDIA Driver Manager is a Kubernetes component that assists in seamless upgrades of the NVIDIA driver on each node of the cluster. It ensures that all prerequisites are met before the NVIDIA GPU Driver can perform an upgrade. When an upgrade is required, the component performs the following actions:
- Check for already installed kernel modules.
- Drain the node, ignoring DaemonSet pods.
- Evict GPU Operator components such as the device plugin, GPU Feature Discovery, and DCGM Exporter.
- Unload the kernel modules.
- Unmount the driver root filesystem previously mounted on the host under `/run/nvidia/driver`.
- Uncordon the node.
These steps allow new driver versions to be installed cleanly in the Kubernetes cluster.
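To make the ordering concrete, here is a minimal Python sketch that models the sequence against an in-memory node. It is not the Driver Manager's implementation and calls no Kubernetes API. `Node`, `Pod`, and `prepare_for_driver_upgrade` are hypothetical names, matching modules by an `nvidia` name prefix is an assumption, and unloading in reverse load order is a simplification of module dependencies. The model does show why eviction is a separate step: GPU Operator components are modeled as DaemonSet pods, so a drain that ignores DaemonSets leaves them running.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    daemonset: bool = False     # True if the pod is owned by a DaemonSet
    gpu_operator: bool = False  # True for GPU Operator components (device plugin, GFD, DCGM Exporter)


@dataclass
class Node:
    # Hypothetical in-memory model of a node; not a Kubernetes API object.
    name: str
    pods: list[Pod] = field(default_factory=list)
    loaded_modules: list[str] = field(default_factory=list)  # in load order
    driver_root_mounted: bool = True
    unschedulable: bool = False


def prepare_for_driver_upgrade(node: Node) -> list[str]:
    """Run the pre-upgrade steps in order and return a log of what was done."""
    log = []

    # 1. Check for kernel modules that are already loaded (assumed: names start with "nvidia").
    to_unload = [m for m in node.loaded_modules if m.startswith("nvidia")]
    log.append(f"found modules: {to_unload}")

    # 2. Drain: cordon the node, then evict every pod not owned by a DaemonSet.
    node.unschedulable = True
    drained = [p.name for p in node.pods if not p.daemonset]
    node.pods = [p for p in node.pods if p.daemonset]
    log.append(f"drained: {drained}")

    # 3. Evict GPU Operator components; as DaemonSet pods, they survived the drain.
    evicted = [p.name for p in node.pods if p.gpu_operator]
    node.pods = [p for p in node.pods if not p.gpu_operator]
    log.append(f"evicted: {evicted}")

    # 4. Unload modules in reverse load order so dependent modules go first (simplification).
    unloaded = list(reversed(to_unload))
    for module in unloaded:
        node.loaded_modules.remove(module)
    log.append(f"unloaded: {unloaded}")

    # 5. Unmount the driver root filesystem under /run/nvidia/driver.
    node.driver_root_mounted = False
    log.append("unmounted /run/nvidia/driver")

    # 6. Uncordon the node so regular workloads can be scheduled again.
    node.unschedulable = False
    log.append("uncordoned")
    return log


if __name__ == "__main__":
    node = Node(
        "gpu-node-1",
        pods=[
            Pod("train-job"),
            Pod("kube-proxy", daemonset=True),
            Pod("nvidia-device-plugin", daemonset=True, gpu_operator=True),
        ],
        loaded_modules=["nvidia", "nvidia_modeset", "nvidia_uvm"],
    )
    for line in prepare_for_driver_upgrade(node):
        print(line)
```

Running the demo, the drain removes only `train-job`. The device plugin is removed in the separate eviction step, and unrelated DaemonSet pods such as `kube-proxy` stay on the node throughout.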