-
Notifications
You must be signed in to change notification settings - Fork 182
Add guest boot sanity tests with passthrough device support #4372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
0c55772 to
5f97b03
Compare
Add test and configurations to validate the following scenarios: 1. Launch guests in different guest interrupt modes: a. AVIC b. APIC c. x2APIC d. x2AVIC 2. Boot the above guests with vCPU counts: 512, 288, 256, 254, 128, 64 3. Boot guests with PCI passthrough devices. 4. Boot guests with emulated AMD IOMMU and emulated Intel IOMMU. Notes: 1. AVIC and APIC support fewer than 255 vCPUs. 2. x2APIC and x2AVIC modes require extended-apicid=on, or the presence of emulated AMD IOMMU / emulated Intel IOMMU, to support booting with >255 vCPUs. Signed-off-by: Dheeraj Kumar Srivastava <[email protected]>
5f97b03 to
6836866
Compare
WalkthroughIntroduces new QEMU PCI passthrough test infrastructure comprising four configuration files defining test scenarios for IOMMU guest modes and AVIC configurations across multiple CPU topologies (64–1024 vCPUs), and a test implementation file orchestrating guest boot validation with support checks, device attachment, and cleanup. Changes
Sequence DiagramsequenceDiagram
participant Test as Test Framework
participant Host as Host Kernel
participant VM as Guest VM
participant Cleanup as Cleanup
Test->>Host: Check AVIC/x2AVIC support (MSR reads)
Host-->>Test: Support status
alt Unsupported Mode
Test->>Test: Cancel test
end
Test->>Host: Prepare PCI device (vfio-pci binding)
Host-->>Test: Device ready / Error
Test->>VM: Create & boot VM with IOMMU config
VM-->>Test: VM alive
Test->>VM: Login to guest
VM-->>Test: Session established
Test->>VM: Verify APIC mode in dmesg
VM-->>Test: Mode confirmed / Mismatch
Test->>VM: Collect guest system details
VM-->>Test: System info
Test->>VM: Check for kernel crashes
VM-->>Test: Status OK / Crash detected
Test->>Cleanup: Close session
Cleanup->>VM: Session closed
Test->>Cleanup: Destroy VM
Cleanup->>VM: VM destroyed
Test->>Host: Unbind/rebind PCI device
Host-->>Cleanup: Device status restored
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Multiple configuration files follow consistent patterns but require validation of parameter correctness across IOMMU and APIC variants. Test implementation is substantial with multiple helper functions, support detection logic, conditional flows, and device lifecycle management, but follows a clear orchestration pattern. Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (8)
qemu/tests/cfg/emulated_amd_iommu.cfg (2)
21-24: Fill the TODO: add passthrough variant (device required to validate IOMMU).Add a variant that sets pci_device to a valid host BDF and exercises AMD IOMMU with a real device; otherwise this config never validates remapping paths.
I can propose a variant block wired to vfio_pci_passthrough.cfg conventions if you share the target device format.
1-3: Gate all AMD-IOMMU variants on minimum QEMU version.
required_qemu = [8.2.90,)appears only underno_passthrough. The AMD IOMMU device and flags should be version-gated across all variants.Apply at the top-level or duplicate under
x2apic_emul_amd_iommuandx2avic_emul_amd_iommu.Also applies to: 50-57
qemu/tests/cfg/iommu_accelerated_guest_mode.cfg (1)
52-65: Document host prerequisites for AVIC/x2AVIC runs.Given
kvm_probe_module_parameters = "avic=1", add a brief note/guard thatkvm_amdmust be (re)loaded withavic=1and that msr-tools is available for probes, to reduce false positives and test flakiness.qemu/tests/cfg/vfio_pci_passthrough.cfg (1)
12-14: Align irqchip setting with other configs.Consider adding
machine_type_extra_params = "kernel-irqchip=split"here too for consistency with the x86 test matrix, especially when mixing large vCPU counts and x2APIC modes.qemu/tests/cfg/iommu_guest_mode.cfg (1)
63-69: Clarify Intel IOMMU emulation on AMD host.Since
only HostCpuVendor.amdis set, a short comment thatintel_iommu = yesis intentionally emulated on AMD hosts would help avoid confusion.qemu/tests/qemu_pci_passthrough.py (3)
128-139: Make device parsing robust and avoid repeated splits.Use
split()(any whitespace) once, validate, and iterate by value. This simplifies the loop and avoids empty entries.Apply this refactor:
- # Prepare for pci passthrough - for i in range(len(pci_device.split(" "))): - # Check if device input is valid - if pci_device.split(" ")[i] not in pci.get_pci_addresses(): - test.cancel("Please provide valid pci device input.") - - driver_list.append(pci.get_driver(pci_device.split(" ")[i])) - pci.attach_driver(pci_device.split(" ")[i], "vfio-pci") - params["extra_params"] += ( - f" -device vfio-pci,host={pci_device.split(' ')[i]}" - ) + # Prepare for pci passthrough + devices = [d for d in pci_device.split() if d] + for dev in devices: + if dev not in pci.get_pci_addresses(): + test.cancel("Please provide valid pci device input.") + driver_list.append(pci.get_driver(dev)) + pci.attach_driver(dev, "vfio-pci") + params["extra_params"] += f" -device vfio-pci,host={dev}"
57-70: Guard rdmsr calls; cancel gracefully when msr-tools is missing.
process.run()will error ifrdmsris not installed. Convert to a guarded call and cancel with a clear message.Apply pattern to both support checks:
- out = process.run(cmd, sudo=True, shell=True).stdout_text.strip() + try: + out = process.run(cmd, sudo=True, shell=True).stdout_text.strip() + except process.CmdError as e: + test.cancel(f"msr-tools (rdmsr) not available or failed: {e}")
78-85: Avoid stale dmesg matches; clear or bound log window before checks.
check_kernel_logs()can match older lines. Clear dmesg (or start a watcher) before enabling AVIC/x2AVIC, then check.For example, call a dmesg reset before module setup:
- dmesg.clear_dmesg() prior to AVIC enablement
- then run checks for "AVIC enabled" and "x2AVIC enabled"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
qemu/tests/cfg/emulated_amd_iommu.cfg(1 hunks)qemu/tests/cfg/iommu_accelerated_guest_mode.cfg(1 hunks)qemu/tests/cfg/iommu_guest_mode.cfg(1 hunks)qemu/tests/cfg/vfio_pci_passthrough.cfg(1 hunks)qemu/tests/qemu_pci_passthrough.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
qemu/tests/qemu_pci_passthrough.py (1)
provider/storage_benchmark.py (1)
session(83-90)
🔇 Additional comments (1)
qemu/tests/qemu_pci_passthrough.py (1)
113-122: Ensurekvm_amdis actually (re)loaded withavic=1.The test only validates via dmesg; it doesn’t enforce loading
kvm_amdwith the requested parameter. Confirm the harness applieskvm_probe_module_parametersor explicitly (re)loadkvm_amdwith that param here.If not already handled by the framework, add:
- linux_modules.configure_module("kvm_amd", "CONFIG_KVM_AMD", module_parameters=kvm_probe_module_parameters)
| if pci_device != "": | ||
| for i in range(len(pci_device.split(" "))): | ||
| if pci_device.split(" ")[i] not in pci.get_pci_addresses(): | ||
| break | ||
| if driver_list[i] is None: | ||
| cur_driver = pci.get_driver(pci_device.split(" ")[i]) | ||
| if cur_driver is not None: | ||
| pci.unbind(cur_driver, pci_device.split(" ")[i]) | ||
| else: | ||
| pci.attach_driver(pci_device.split(" ")[i], driver_list[i]) | ||
| except ValueError as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix cleanup: original driver not restored; loop breaks early; may leave devices bound to vfio.
- Using
breakskips remaining devices. - Restoring logic runs only when
driver_list[i] is None, which is inverted, and can callattach_driver(..., None).
Apply this diff to correctly unbind vfio and restore original drivers:
- if pci_device != "":
- for i in range(len(pci_device.split(" "))):
- if pci_device.split(" ")[i] not in pci.get_pci_addresses():
- break
- if driver_list[i] is None:
- cur_driver = pci.get_driver(pci_device.split(" ")[i])
- if cur_driver is not None:
- pci.unbind(cur_driver, pci_device.split(" ")[i])
- else:
- pci.attach_driver(pci_device.split(" ")[i], driver_list[i])
+ if pci_device:
+ devices = [d for d in pci_device.split() if d]
+ for idx, dev in enumerate(devices):
+ if dev not in pci.get_pci_addresses():
+ continue
+ cur_driver = pci.get_driver(dev)
+ if cur_driver == "vfio-pci":
+ pci.unbind("vfio-pci", dev)
+ orig_driver = driver_list[idx] if idx < len(driver_list) else None
+ if orig_driver:
+ pci.attach_driver(dev, orig_driver)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if pci_device != "": | |
| for i in range(len(pci_device.split(" "))): | |
| if pci_device.split(" ")[i] not in pci.get_pci_addresses(): | |
| break | |
| if driver_list[i] is None: | |
| cur_driver = pci.get_driver(pci_device.split(" ")[i]) | |
| if cur_driver is not None: | |
| pci.unbind(cur_driver, pci_device.split(" ")[i]) | |
| else: | |
| pci.attach_driver(pci_device.split(" ")[i], driver_list[i]) | |
| except ValueError as e: | |
| if pci_device: | |
| devices = [d for d in pci_device.split() if d] | |
| for idx, dev in enumerate(devices): | |
| if dev not in pci.get_pci_addresses(): | |
| continue | |
| cur_driver = pci.get_driver(dev) | |
| if cur_driver == "vfio-pci": | |
| pci.unbind("vfio-pci", dev) | |
| orig_driver = driver_list[idx] if idx < len(driver_list) else None | |
| if orig_driver: | |
| pci.attach_driver(dev, orig_driver) | |
| except ValueError as e: |
🤖 Prompt for AI Agents
In qemu/tests/qemu_pci_passthrough.py around lines 163-173, the rollback loop
breaks early, repeatedly calls split(" ") and inverts the restore logic (can
call attach_driver with None); fix by precomputing addresses =
pci_device.split(" "), iterate with for i, addr in enumerate(addresses) (no
break — use continue), skip addresses not present via if addr not in
pci.get_pci_addresses(): continue, and on cleanup for each present addr: if
driver_list[i] is not None call pci.attach_driver(addr, driver_list[i]) to
restore the original driver, else obtain cur_driver = pci.get_driver(addr) and
if cur_driver is not None call pci.unbind(cur_driver, addr) to unbind any
temporary driver (this avoids attaching None and ensures all devices are
processed).
This patch depends on