fix: add NVIDIA CDI device for WSL2 GPU support #3895
base: main
Conversation
Signed-off-by: limam-B <[email protected]>
```
  Type: 'bind',
});

devices.push({
```
question: This is a flag, not a path to share; what is the rationale for doing that?
This is a CDI (Container Device Interface) device identifier. Podman uses nvidia.com/gpu=all as a CDI spec name to automatically mount all NVIDIA GPU devices.
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html
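For illustration, here is a minimal sketch (not the extension's actual code) of what such a device entry looks like; the PathOnHost/PathInContainer/CgroupPermissions fields follow the Devices array shown in this diff, and the types around them are assumed.

```typescript
// Minimal sketch, assuming a Docker-compatible Devices array as used in this PR.
// The CDI name goes into PathOnHost; per the discussion here, Podman recognizes
// the vendor.com/class=name format and resolves it against the generated CDI
// spec instead of treating it as a filesystem path. Empty fields are illustrative.
interface DeviceMapping {
  PathOnHost: string;
  PathInContainer: string;
  CgroupPermissions: string;
}

const devices: DeviceMapping[] = [];

devices.push({
  PathOnHost: 'nvidia.com/gpu=all', // CDI device name, not a host path
  PathInContainer: '',
  CgroupPermissions: '',
});
```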

Based on that link (I'm not an expert), I believe the Podman Devices array accepts CDI device names like nvidia.com/gpu=all in PathOnHost, so when Podman sees that format, it automatically resolves it via CDI and mounts all GPU devices.
This is the same pattern used for Linux (see the screenshot above).
The alternative would be the --device CLI flag, but since we're using the API, this is the equivalent approach.
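For comparison, the CLI form of the same request would be something along the lines of podman run --device nvidia.com/gpu=all ..., which is the CDI usage shown in the NVIDIA Container Toolkit docs; the Devices entry here is just the API counterpart of that flag.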
Can confirm adding nvidia.com/gpu=all works (mimicking what is done in the NVIDIA docs and what RamaLama uses), but we already have this as part of the driver enablement. I will investigate more, but NVIDIA GPU passthrough was working via WSL before and no code relating to the GPU has changed, so I have a hunch it could be something else. I know that the QE team was recently checking out GPU support and would appreciate their knowledge on this too. Thank you!
Thanks for testing. Regarding "nvidia gpu passthrough was working via wsl before": that was with the old ai-lab-playground-chat-cuda image. The switch to ramalama/cuda-llama-server in e34d59f changed this; the new image expects CDI injection.
The old image had the CUDA stack baked in; the new one doesn't.
Ah, thanks for the in-depth explanation, that makes sense!
axel7083 left a comment:
We have a pretty old issue #1824 on detecting the NVIDIA CDI.
As of today, we do some magic 🪄 trick to let the container access the GPU on WSL, which is not ideal but works for all users even when they do not have CDI installed.
I am okay with this change, if it is not causing errors for users that do not have it.
```
});

devices.push({
  PathOnHost: 'nvidia.com/gpu=all',
```
question: what happens if the podman machine does not have the nvidia CDI installed?
I guess if CDI isn't configured, Podman will fail to resolve nvidia.com/gpu=all and the container won't start.
But users enabling GPU support should have the nvidia-container-toolkit installed, which generates the CDI spec.
Maybe we should add a check like the Linux case does with isNvidiaCDIConfigured()?
I'll confirm by reproducing this scenario and will drop more details later on.
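To make the suggestion above concrete, here is a rough sketch of what such a guard could look like; isNvidiaCDIConfigured() is only referenced from the Linux code path mentioned earlier, and its signature plus everything else in this snippet is assumed rather than taken from the extension.

```typescript
// Rough sketch of the suggested guard, assuming an async helper similar to the
// Linux-side isNvidiaCDIConfigured() mentioned above. Names, signatures and the
// warning text are illustrative, not the extension's actual API.
interface DeviceMapping {
  PathOnHost: string;
  PathInContainer: string;
  CgroupPermissions: string;
}

async function maybeAddCdiDevice(
  devices: DeviceMapping[],
  isNvidiaCDIConfigured: () => Promise<boolean>,
): Promise<void> {
  if (await isNvidiaCDIConfigured()) {
    // CDI spec present (e.g. /etc/cdi/nvidia.yaml): safe to request all GPUs.
    devices.push({ PathOnHost: 'nvidia.com/gpu=all', PathInContainer: '', CgroupPermissions: '' });
  } else {
    // No CDI spec found: skip the device so the container can still start on CPU.
    console.warn('NVIDIA CDI spec not found; starting the inference server without GPU devices.');
  }
}
```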
Test Scenario: What happens without CDI?
Check the current CDI status in the Podman machine:
podman machine ssh cat /etc/cdi/nvidia.yaml
The file exists, so CDI is configured.
Temporarily disable CDI:
- SSH into the Podman machine: podman machine ssh
- Disable/back up the CDI config: sudo mv /etc/cdi/nvidia.yaml /etc/cdi/nvidia.yaml.disabled
- Exit the SSH session: exit
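(To restore CDI after testing, reverse the backup step inside the Podman machine: sudo mv /etc/cdi/nvidia.yaml.disabled /etc/cdi/nvidia.yaml)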
Test Results
Inference server with [ GPU ENABLED | no CDI ] in AI Lab: fails with a clear error, since Podman cannot resolve nvidia.com/gpu=all.
Inference server with [ GPU DISABLED | no CDI ] in AI Lab: unaffected, inference runs on CPU as before.
Why this behavior is correct:
Thanks to the conditional checks at:
- LlamaCppPython.ts:230 - only gets a GPU if the experimentalGPU setting is enabled
- LlamaCppPython.ts:108 - only adds the CDI device if the gpu object exists
The CDI device is only added when GPU is explicitly enabled in settings.
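As a paraphrase of those two checks (the real logic lives in LlamaCppPython.ts and certainly differs in naming), the flow is roughly the following; `config` and `detectGPU()` are placeholders, not the extension's actual identifiers.

```typescript
// Illustrative paraphrase of the two checks listed above.
interface DeviceMapping {
  PathOnHost: string;
  PathInContainer: string;
  CgroupPermissions: string;
}

async function collectWsl2Devices(
  config: { experimentalGPU: boolean },
  detectGPU: () => Promise<{ vendor: string } | undefined>,
): Promise<DeviceMapping[]> {
  const devices: DeviceMapping[] = [];

  // First check (LlamaCppPython.ts:230 in the list above): only look for a GPU
  // at all when the experimental GPU setting is enabled.
  const gpu = config.experimentalGPU ? await detectGPU() : undefined;

  // Second check (LlamaCppPython.ts:108 in the list above): only add the CDI
  // device when a gpu object actually exists.
  if (gpu) {
    devices.push({ PathOnHost: 'nvidia.com/gpu=all', PathInContainer: '', CgroupPermissions: '' });
  }

  return devices;
}
```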
Conclusion:
This is the correct behavior since RamaLama requires CDI:
https://github.com/containers/ramalama/blob/main/docs/ramalama-cuda.7.md
- CPU mode is unaffected (no CDI device is added when GPU is disabled)
- GPU mode gives a clear error when CDI is missing
- GPU mode works when CDI is properly configured
- RamaLama requires CDI (documented)
- AI Lab extension requires CDI (documented)
Background:
The "magic trick" in #1824 worked with the old ai-lab-playground-chat-cuda image (CUDA embedded).
RamaLama images expect CDI injection instead; this change happened in e34d59f.
Maybe we should update the AI Lab documentation to mention that CDI is required for WSL GPU support?
What does this PR do?
Adds the NVIDIA CDI device (nvidia.com/gpu=all) to WSL2 container creation to enable actual GPU access. Previously, WSL2 containers had the GPU environment variables but no device mounting, causing inference to run on CPU despite showing the "GPU Inference" badge.
Screenshot / video of UI
No UI changes, backend fix only.
What issues does this PR fix or reference?
Fixes #3431
How to test this PR?
1- Windows 11 + WSL2 with an NVIDIA GPU and drivers installed
2- Install the NVIDIA Container Toolkit in WSL2 and generate the CDI config (nvidia-ctk cdi generate)
3- Enable "Experimental GPU" in AI Lab settings
4- Create a new service with any model
5- In WSL2, run nvidia-smi: it should show GPU usage and the llama-server process
6- Verify container devices: podman inspect <container-id> | grep -A5 Devices should show the nvidia.com/gpu device (not empty)