vfio-manage.sh can't bind multi-aux dev in nvidia-vfio-manager #1328

Open
yuntianfeijing opened this issue Mar 12, 2025 · 0 comments · May be fixed by #1329


My GPU is an NVIDIA Corporation TU104GL [Quadro RTX 4000], which has 3 auxiliary devices.

When I set up the GPU for KubeVirt VM passthrough, the vfio-manage.sh script cannot bind all of the auxiliary devices to the vfio-pci driver.

The bug is in https://github.com/NVIDIA/gpu-operator/blob/main/assets/state-vfio-manager/0400_configmap.yaml#L128

The function get_grapcs_aux_dev should not rely on the check if ls "/sys/bus/pci/devices/$aux_dev/" as its criterion, because that handles only a single auxiliary device. It should instead return an array of strings, one per auxiliary device. The functions bind_device and unbind_device should then loop over this array, checking each entry and performing the corresponding operation.
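The proposal above could look roughly like the following. This is only a sketch under stated assumptions, not the script's actual code: the names list_aux_devs, bind_gpu_and_aux, bind_one and the SYSFS_PCI_ROOT variable are hypothetical, and it assumes every sibling PCI function of the GPU (.1 through .7) that exists counts as an auxiliary device.

```shell
#!/usr/bin/env bash
# Sketch only: all names below are hypothetical, not taken from vfio-manage.sh.

# Root of the PCI sysfs tree; overridable so the logic can run against a fake tree.
SYSFS_PCI_ROOT="${SYSFS_PCI_ROOT:-/sys/bus/pci/devices}"

# Print every existing sibling function (.1 to .7) of the given GPU, one per line.
# Example: 0000:52:00.0 -> 0000:52:00.1, 0000:52:00.2, 0000:52:00.3
list_aux_devs() {
    local gpu=$1
    local slot=${gpu%.*}   # strip the function number, e.g. 0000:52:00
    local fn dev
    for fn in 1 2 3 4 5 6 7; do
        dev="$slot.$fn"
        if [ -e "$SYSFS_PCI_ROOT/$dev" ]; then
            echo "$dev"
        fi
    done
}

# Stand-in for whatever binds a single device to vfio-pci.
# It only reports the device so the sketch is safe to run anywhere.
bind_one() {
    echo "bind $1"
}

# Bind the GPU itself, then each auxiliary function in turn.
bind_gpu_and_aux() {
    local gpu=$1
    local -a aux_devs
    local dev
    # Collect the helper's output into an array (requires bash 4 or newer).
    mapfile -t aux_devs < <(list_aux_devs "$gpu")
    bind_one "$gpu"
    for dev in "${aux_devs[@]}"; do
        bind_one "$dev"
    done
}
```

An unbind path would follow the same pattern, looping over the same array. Returning a list rather than a single name means a GPU with zero, one, or several auxiliary functions goes through the same code.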

lspci -Dnnkv -d 10de:

0000:52:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104GL [Quadro RTX 4000] [10de:1eb1] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation Device [10de:12a0]
Physical Slot: 8191-8
Flags: fast devsel, IRQ 16, NUMA node 0, IOMMU group 35
Memory at b3000000 (32-bit, non-prefetchable) [size=16M]
Memory at 20ffe0000000 (64-bit, prefetchable) [size=256M]
Memory at 20fff0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 7000 [size=128]
Expansion ROM at b4000000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024
Capabilities: [900] Secondary PCI Express
Capabilities: [bb0] Physical Resizable BAR
Kernel driver in use: vfio-pci
Kernel modules: nouveau

0000:52:00.1 Audio device [0403]: NVIDIA Corporation TU104 HD Audio Controller [10de:10f8] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:12a0]
Physical Slot: 8191-8
Flags: bus master, fast devsel, latency 0, IRQ 17, NUMA node 0, IOMMU group 35
Memory at b4080000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel

0000:52:00.2 USB controller [0c03]: NVIDIA Corporation TU104 USB 3.1 Host Controller [10de:1ad8] (rev a1) (prog-if 30 [XHCI])
Subsystem: NVIDIA Corporation Device [10de:12a0]
Physical Slot: 8191-8
Flags: fast devsel, IRQ 64, NUMA node 0, IOMMU group 35
Memory at 20fff2000000 (64-bit, prefetchable) [size=256K]
Memory at 20fff2040000 (64-bit, prefetchable) [size=64K]
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [b4] Power Management version 3
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: xhci_hcd

0000:52:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller [10de:1ad9] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:12a0]
Physical Slot: 8191-8
Flags: fast devsel, IRQ 255, NUMA node 0, IOMMU group 35
Memory at b4084000 (32-bit, non-prefetchable) [disabled] [size=4K]
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [b4] Power Management version 3
Capabilities: [100] Advanced Error Reporting
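In the listing above, 0000:52:00.0 is on vfio-pci while 0000:52:00.1 and 0000:52:00.2 are still on snd_hda_intel and xhci_hcd, and all four functions share IOMMU group 35. A quick way to check the same thing without lspci is to read the driver symlink of each function in sysfs. The helper below is a minimal sketch; show_slot_drivers and SYSFS_PCI_ROOT are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: helper and variable names are hypothetical.

# Root of the PCI sysfs tree; overridable for testing against a fake tree.
SYSFS_PCI_ROOT="${SYSFS_PCI_ROOT:-/sys/bus/pci/devices}"

# Print "<device> <driver>" for every function in the given slot,
# or "<device> none" when no driver is bound.
show_slot_drivers() {
    local slot=$1   # e.g. 0000:52:00
    local path drv
    for path in "$SYSFS_PCI_ROOT/$slot".*; do
        # Skip the literal pattern when the glob matches nothing.
        [ -e "$path" ] || continue
        if [ -L "$path/driver" ]; then
            # The driver symlink points at the bound driver's directory.
            drv=$(basename "$(readlink "$path/driver")")
        else
            drv=none
        fi
        echo "$(basename "$path") $drv"
    done
}
```

Calling show_slot_drivers 0000:52:00 on the affected host should print one line per function, which makes it easy to see which functions are not yet on vfio-pci.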

yuntianfeijing changed the title from "vfio-manage.sh can't bind nvidia-vfio-manager" to "vfio-manage.sh can't bind multi-aux dev in nvidia-vfio-manager" on Mar 12, 2025
yuntianfeijing added a commit to yuntianfeijing/gpu-operator that referenced this issue on Mar 12, 2025
yuntianfeijing linked a pull request on Mar 12, 2025 that will close this issue