WSL NixOS cdi generate Error: failed to initialize dxcore context #452

@Samiser

Description

❯ nvidia-ctk --version
NVIDIA Container Toolkit CLI version 1.15.0-rc.3

I'm attempting to use nvidia-ctk to generate a CDI spec in WSL running NixOS, but I'm getting the following error:

❯ nvidia-ctk cdi generate --nvidia-ctk-path /run/current-system/sw/bin/nvidia-ctk --ldconfig-path /run/current-system/sw/bin/ldconfig --mode wsl
INFO[0000] Selecting /dev/dxg as /dev/dxg
ERRO[0000] failed to generate CDI spec: failed to create edits common for entities: failed to create discoverer for WSL driver: failed to initialize dxcore: failed to initialize dxcore context
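Since the error comes from initializing dxcore rather than from /dev/dxg itself, my first guess is a library lookup problem: if (as I'd assume) nvidia-ctk dlopens libdxcore.so at runtime, the NixOS environment may simply not expose /usr/lib/wsl/lib to it. A quick way to check that the library is present and loadable (python3 here is just a convenient dlopen wrapper, pulled in via nix-shell):

❯ ls -l /usr/lib/wsl/lib/libdxcore.so
❯ nix-shell -p python3
❯ python3 -c 'import ctypes; ctypes.CDLL("/usr/lib/wsl/lib/libdxcore.so")'

If the CDLL call raises an OSError, the loader's message should say which dependency it can't resolve.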

If I generate the CDI spec on a different VM and use that config directly (only changing the location of nvidia-ctk), then nvidia-ctk successfully finds the device and I can use it in containers:

nvidia-container-toolkit.json:
{
    "cdiVersion": "0.3.0",
    "containerEdits": {
        "hooks": [
            {
                "args": [
                    "nvidia-ctk",
                    "hook",
                    "create-symlinks",
                    "--link",
                    "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/nvidia-smi::/usr/bin/nvidia-smi"
                ],
                "hookName": "createContainer",
                "path": "/run/current-system/sw/bin/nvidia-ctk"
            },
            {
                "args": [
                    "nvidia-ctk",
                    "hook",
                    "update-ldcache",
                    "--folder",
                    "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69",
                    "--folder",
                    "/usr/lib/wsl/lib"
                ],
                "hookName": "createContainer",
                "path": "/run/current-system/sw/bin/nvidia-ctk"
            }
        ],
        "mounts": [
            {
                "containerPath": "/usr/lib/wsl/lib/libdxcore.so",
                "hostPath": "/usr/lib/wsl/lib/libdxcore.so",
                "options": [
                    "ro",
                    "nosuid",
                    "nodev",
                    "bind"
                ]
            },
            {
                "containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libcuda.so.1.1",
                "hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libcuda.so.1.1",
                "options": [
                    "ro",
                    "nosuid",
                    "nodev",
                    "bind"
                ]
            },
            {
                "containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libcuda_loader.so",
                "hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libcuda_loader.so",
                "options": [
                    "ro",
                    "nosuid",
                    "nodev",
                    "bind"
                ]
            },
            {
                "containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ml.so.1",
                "hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ml.so.1",
                "options": [
                    "ro",
                    "nosuid",
                    "nodev",
                    "bind"
                ]
            },
            {
                "containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ml_loader.so",
                "hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ml_loader.so",
                "options": [
                    "ro",
                    "nosuid",
                    "nodev",
                    "bind"
                ]
            },
            {
                "containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ptxjitcompiler.so.1",
                "hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ptxjitcompiler.so.1",
                "options": [
                    "ro",
                    "nosuid",
                    "nodev",
                    "bind"
                ]
            },
            {
                "containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/nvcubins.bin",
                "hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/nvcubins.bin",
                "options": [
                    "ro",
                    "nosuid",
                    "nodev",
                    "bind"
                ]
            },
            {
                "containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/nvidia-smi",
                "hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/nvidia-smi",
                "options": [
                    "ro",
                    "nosuid",
                    "nodev",
                    "bind"
                ]
            }
        ]
    },
    "devices": [
        {
            "containerEdits": {
                "deviceNodes": [
                    {
                        "path": "/dev/dxg"
                    }
                ]
            },
            "name": "all"
        }
    ],
    "kind": "nvidia.com/gpu"
}
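For reference, applying the borrowed spec is just a matter of placing it in a CDI spec directory (/etc/cdi is the standard static one, and podman reads it by default):

❯ sudo mkdir -p /etc/cdi
❯ sudo cp nvidia-container-toolkit.json /etc/cdi/
❯ nvidia-ctk cdi list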

I've also tried populating every other flag with the locations of the files in /usr/lib/wsl/, but that didn't make a difference; I assume those paths are handled by --mode wsl.

Here's the relevant Nix config, if it helps (omitting the nixos-wsl import section):

{
  wsl.enable = true;

  environment.systemPackages = with pkgs; [ nvidia-container-toolkit ];

  virtualisation.podman.enable = true;
  virtualisation.containers.cdi.dynamic.nvidia.enable = true;

  programs.nix-ld.enable = true;

  environment.variables = lib.mkForce {
    NIX_LD_LIBRARY_PATH = "/usr/lib/wsl/lib/";
    NIX_LD = "${pkgs.glibc}/lib/ld-linux-x86-64.so.2";
  };
}
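One experiment that might narrow this down (assuming the dxcore failure really is a loader lookup issue rather than anything about /dev/dxg): force the library path when invoking nvidia-ctk directly, bypassing nix-ld entirely:

❯ LD_LIBRARY_PATH=/usr/lib/wsl/lib nvidia-ctk cdi generate --mode wsl \
    --nvidia-ctk-path /run/current-system/sw/bin/nvidia-ctk \
    --ldconfig-path /run/current-system/sw/bin/ldconfig

If that succeeds, it would suggest the config above isn't reaching nvidia-ctk: if I understand nix-ld right, NIX_LD_LIBRARY_PATH only affects binaries that run through the nix-ld loader shim, which wouldn't include the nixpkgs-built nvidia-ctk.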

And here's the GPU working with the manual config:

❯ nvidia-ctk cdi list
INFO[0000] Found 1 CDI devices
nvidia.com/gpu=all

❯ podman run --device nvidia.com/gpu=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -benchmark -gpu
--- cut ---
> Compute 7.5 CUDA device: [NVIDIA GeForce RTX 2070]
36864 bodies, total time for 10 iterations: 65.098 ms
= 208.756 billion interactions per second
= 4175.121 single-precision GFLOP/s at 20 flops per interaction

Let me know if there's any more information I can provide!
