Issues: NVIDIA/gpu-operator
NOTICE: Containers losing access to GPUs with error: "Failed ...
#485
opened Feb 7, 2023 by
cdesiniotis
Open
Issues list
container-toolkit fails to start after upgrading to v24.9.0 on k3s cluster
bug
Issue/PR to expose/discuss/fix a bug
#1109
opened Nov 7, 2024 by
logan2211
NVIDIA Device Plugin Only Exposes One GPU Out of Two GPUs Installed on Single Node
#1079
opened Oct 29, 2024 by
amir-bialek
chroot: failed to run command 'nvidia-smi': No such file or directory
#1063
opened Oct 24, 2024 by
vanloswang
ServiceAccount node-feature-discovery should not be included in ClusterRoleBinding when nfd.enabled: false
#1038
opened Oct 14, 2024 by
cmontemuino
Allow adding custom labels and securityContext to the components deployed by ClusterPolicy
#1030
opened Oct 10, 2024 by
inesshz
Not able to view Gpu utilization metrics in openshift dashboard
#1002
opened Sep 20, 2024 by
umeshvw
Following gpu-operator documentation will break RKE2 cluster after reboot
#992
opened Sep 16, 2024 by
aiicore
containerd restart from nvidia-container-toolkit causes other daemonsets to get stuck
#991
opened Sep 13, 2024 by
chiragjn
DCGM_FI_DEV_GPU_UTIL metric giving empty value from prometheus
#983
opened Sep 10, 2024 by
Vijaygawate
Add validate nouveau whether in blacklist
feature
issue/PR that proposes a new feature or functionality
#974
opened Sep 5, 2024 by
lengrongfu