Labels: lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.)
Description
❯ nvidia-ctk --version
NVIDIA Container Toolkit CLI version 1.15.0-rc.3
I'm attempting to use nvidia-ctk to generate a CDI spec under WSL running NixOS, but I get the following error:
❯ nvidia-ctk cdi generate --nvidia-ctk-path /run/current-system/sw/bin/nvidia-ctk --ldconfig-path /run/current-system/sw/bin/ldconfig --mode wsl
INFO[0000] Selecting /dev/dxg as /dev/dxg
ERRO[0000] failed to generate CDI spec: failed to create edits common for entities: failed to create discoverer for WSL driver: failed to initialize dxcore: failed to initialize dxcore context
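For what it's worth, the "failed to initialize dxcore context" error appears to happen when nvidia-ctk cannot load the dxcore library at all. A minimal probe (my own diagnostic sketch, assuming the standard WSL path `/usr/lib/wsl/lib/libdxcore.so`; this is not part of nvidia-ctk) can show whether the library is even loadable from the current environment:

```python
import ctypes
import os

# Standard WSL location for the dxcore library; adjust if your distro
# mounts the Windows driver store somewhere else.
LIBDXCORE = "/usr/lib/wsl/lib/libdxcore.so"

if not os.path.exists(LIBDXCORE):
    print(f"{LIBDXCORE}: not found")
else:
    try:
        # Attempt a plain dlopen of the library.
        ctypes.CDLL(LIBDXCORE)
        print(f"{LIBDXCORE}: loaded OK")
    except OSError as exc:
        print(f"{LIBDXCORE}: load failed: {exc}")
```

If this prints "load failed" even though the file exists, the problem is likely the dynamic-loader search path rather than the toolkit itself.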
If I generate the CDI spec on a different VM and use that config directly (changing only the location of nvidia-ctk), then nvidia-ctk successfully finds the device and I can use it in containers:
nvidia-container-toolkit.json:
{
"cdiVersion": "0.3.0",
"containerEdits": {
"hooks": [
{
"args": [
"nvidia-ctk",
"hook",
"create-symlinks",
"--link",
"/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/nvidia-smi::/usr/bin/nvidia-smi"
],
"hookName": "createContainer",
"path": "/run/current-system/sw/bin/nvidia-ctk"
},
{
"args": [
"nvidia-ctk",
"hook",
"update-ldcache",
"--folder",
"/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69",
"--folder",
"/usr/lib/wsl/lib"
],
"hookName": "createContainer",
"path": "/run/current-system/sw/bin/nvidia-ctk"
}
],
"mounts": [
{
"containerPath": "/usr/lib/wsl/lib/libdxcore.so",
"hostPath": "/usr/lib/wsl/lib/libdxcore.so",
"options": [
"ro",
"nosuid",
"nodev",
"bind"
]
},
{
"containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libcuda.so.1.1",
"hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libcuda.so.1.1",
"options": [
"ro",
"nosuid",
"nodev",
"bind"
]
},
{
"containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libcuda_loader.so",
"hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libcuda_loader.so",
"options": [
"ro",
"nosuid",
"nodev",
"bind"
]
},
{
"containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ml.so.1",
"hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ml.so.1",
"options": [
"ro",
"nosuid",
"nodev",
"bind"
]
},
{
"containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ml_loader.so",
"hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ml_loader.so",
"options": [
"ro",
"nosuid",
"nodev",
"bind"
]
},
{
"containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ptxjitcompiler.so.1",
"hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ptxjitcompiler.so.1",
"options": [
"ro",
"nosuid",
"nodev",
"bind"
]
},
{
"containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/nvcubins.bin",
"hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/nvcubins.bin",
"options": [
"ro",
"nosuid",
"nodev",
"bind"
]
},
{
"containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/nvidia-smi",
"hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/nvidia-smi",
"options": [
"ro",
"nosuid",
"nodev",
"bind"
]
}
]
},
"devices": [
{
"containerEdits": {
"deviceNodes": [
{
"path": "/dev/dxg"
}
]
},
"name": "all"
}
],
"kind": "nvidia.com/gpu"
}
I've also tried populating every other flag with the locations of the files under /usr/lib/wsl/, but that didn't make a difference; I assume that's handled by --mode wsl.
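When hand-editing a spec like the one above, a short script (a hypothetical helper I wrote for this, not part of nvidia-ctk) can sanity-check the fields the runtime actually needs: that `kind` is present, every device has a `name`, and every mount's `hostPath` exists on this machine:

```python
import json
import os


def check_cdi_spec(path):
    """Return a list of problems found in a CDI spec file (empty list = OK)."""
    with open(path) as f:
        spec = json.load(f)

    problems = []
    if "kind" not in spec:
        problems.append("missing top-level 'kind'")
    for dev in spec.get("devices", []):
        if "name" not in dev:
            problems.append("device entry without 'name'")
    # Every bind-mount source must exist on the host, or the container
    # runtime will fail to apply the edit.
    for mount in spec.get("containerEdits", {}).get("mounts", []):
        host = mount.get("hostPath", "")
        if not os.path.exists(host):
            problems.append(f"missing hostPath: {host}")
    return problems


# Example usage (path is an assumption; use wherever your spec lives):
# for p in check_cdi_spec("/etc/cdi/nvidia-container-toolkit.json"):
#     print(p)
```

On a machine where the driver-store paths from a copied spec don't exist, this flags each stale `hostPath` instead of failing silently at container start.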
Here's the relevant Nix config if it helps (omitting the nixos-wsl import section):
{
wsl.enable = true;
environment.systemPackages = with pkgs; [ nvidia-container-toolkit ];
virtualisation.podman.enable = true;
virtualisation.containers.cdi.dynamic.nvidia.enable = true;
programs.nix-ld.enable = true;
environment.variables = lib.mkForce {
NIX_LD_LIBRARY_PATH = "/usr/lib/wsl/lib/";
NIX_LD = "${pkgs.glibc}/lib/ld-linux-x86-64.so.2";
};
}
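One workaround that may be worth trying (an assumption on my part, based on the guess that nvidia-ctk resolves libdxcore.so through the dynamic loader, which on NixOS won't search /usr/lib/wsl/lib by default): prepend that directory to LD_LIBRARY_PATH just for the generate step, for example:

```shell
# Assumption: the WSL driver libraries live in /usr/lib/wsl/lib.
# Prepend it for this shell so the loader can find libdxcore.so.
export LD_LIBRARY_PATH=/usr/lib/wsl/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
echo "$LD_LIBRARY_PATH"
```

Then rerun the same `nvidia-ctk cdi generate --mode wsl ...` command from that shell.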
And here's the GPU working with the manual config:
❯ nvidia-ctk cdi list
INFO[0000] Found 1 CDI devices
nvidia.com/gpu=all
❯ podman run --device nvidia.com/gpu=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -benchmark -gpu
--- cut ---
> Compute 7.5 CUDA device: [NVIDIA GeForce RTX 2070]
36864 bodies, total time for 10 iterations: 65.098 ms
= 208.756 billion interactions per second
= 4175.121 single-precision GFLOP/s at 20 flops per interaction
Let me know if there's any more information I can provide!
j-ckal