However, when I attempt to install v24.9.1, the gpu-operator pod does not start correctly:
[Update] I am seeing the same result with v24.6.2.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 32s default-scheduler Successfully assigned gpu-operator/gpu-operator-86c586f6df-vchlw to <node-name>
Normal Pulled 11s (x3 over 31s) kubelet Container image "nvcr.io/nvidia/gpu-operator:v24.9.1" already present on machine
Normal Created 11s (x3 over 31s) kubelet Created container gpu-operator
Warning Failed 11s (x3 over 31s) kubelet Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "gpu-operator": executable file not found in $PATH: unknown
Warning BackOff 1s (x5 over 29s) kubelet Back-off restarting failed container gpu-operator in pod gpu-operator-86c586f6df-vchlw_gpu-operator(a99e3e8d-47d8-47c9-b3fd-db6d796fa0d5)
Mon Jan 27 16:02:14 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:8B:00.0 Off |                    0 |
| N/A   38C    P8             17W /   72W |       1MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
OS: RHEL 9.3
GPU: NVIDIA L4
GPU Operator: v24.9.1
containerd: 1.6.28
rke2: 1.26.10
nvidia-container-toolkit: 1.17.3
environment: air-gapped
I am able to deploy the helm chart with the following settings using v23.9.1: