Skip to content

Conversation

hholoubk
Copy link
Contributor

@hholoubk hholoubk commented Sep 24, 2025

This change adds new test to verify virsh shutdown command is working as expected for VM with vIOMMU devices. Also added the optional possibility to start the VM again after the shutdown and parametrize the shutdown_timeout.

Improving test coverage for RHEL-109504.

Tests running and properly failing as the code not in distro yet.

(1/3) type_specific.io-github-autotest-libvirt.vIOMMU.iommu_device_lifecycle.shutdown.virtio_muti_devices.vhost_on.virtio: STARTED
(1/3) type_specific.io-github-autotest-libvirt.vIOMMU.iommu_device_lifecycle.shutdown.virtio_muti_devices.vhost_on.virtio: FAIL: VM failed to shutdown (71.56 s)
(2/3) type_specific.io-github-autotest-libvirt.vIOMMU.iommu_device_lifecycle.shutdown.virtio_muti_devices.vhost_on.smmuv3: STARTED
(2/3) type_specific.io-github-autotest-libvirt.vIOMMU.iommu_device_lifecycle.shutdown.virtio_muti_devices.vhost_on.smmuv3: FAIL: VM failed to shutdown (71.61 s)
(3/3) type_specific.io-github-autotest-libvirt.vIOMMU.iommu_device_lifecycle.shutdown.hostdev_iface.virtio: STARTED
(3/3) type_specific.io-github-autotest-libvirt.vIOMMU.iommu_device_lifecycle.shutdown.hostdev_iface.virtio: FAIL: VM failed to shutdown (91.44 s)
RESULTS : PASS 0 | ERROR 0 | FAIL 3 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML : /var/log/avocado/job-results/job-2025-09-23T06.51-04fee85/results.html
JOB TIME : 237.12 s

Summary by CodeRabbit

  • Tests
    • Expanded vIOMMU device lifecycle scenarios with a new shutdown phase, including configurable shutdown timeout (60s) and optional auto-restart after shutdown.
    • Improved lifecycle orchestration and VM setup to centralize start/stop/save/resume/reboot flows, reducing flakiness.
    • Enhanced post-action verification with more consistent login and network checks, plus clearer warnings for unknown scenarios.

@hholoubk
Copy link
Contributor Author

Hi @dzhengfy and @yalzhang could you please review the changes (I've already integrated coderabbit suggestions, as they make sense and I was forced due technical issues to create new pull request

Copy link

coderabbitai bot commented Sep 24, 2025

Warning

Rate limit exceeded

@hholoubk has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 1 minutes and 52 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 6ccf141 and a2d8060.

📒 Files selected for processing (2)
  • libvirt/tests/cfg/sriov/vIOMMU/iommu_device_lifecycle.cfg (1 hunks)
  • libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py (2 hunks)

Walkthrough

Adds a shutdown step to the vIOMMU lifecycle test config and restructures the test implementation: VM XML setup is centralized, new helpers handle network checks, save/suspend and reboot/reset/shutdown flows (including shutdown + optional restart), and orchestration now delegates to these helpers.

Changes

Cohort / File(s) Summary
Config: lifecycle scenarios
libvirt/tests/cfg/sriov/vIOMMU/iommu_device_lifecycle.cfg
Adds a shutdown step to test_scenario variants with parameters start_after_shutdown = yes and shutdown_timeout = 60.
Test implementation: orchestration & helpers
libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py
Refactors run flow to use setup_vm_xml; adds helper functions check_network_access, test_save_suspend, test_reboot_reset_shutdown, and setup_vm_xml; implements shutdown handling (virsh shutdown, wait for dead, optional restart), unified login/network checks, and cleanup of save state.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Tester
  participant Test as Test Logic
  participant Virsh
  participant VM
  participant Serial
  participant Network

  Tester->>Test: start run()
  Note over Test: setup_vm_xml(...) -> configure VM xml/devices/interfaces
  Test->>Virsh: define/start VM
  Virsh-->>Test: VM started
  Test->>Serial: open serial console
  Serial-->>Test: console ready
  alt scenario in {reboot, reset, shutdown}
    Note over Test,Virsh: lifecycle loop
    Test->>Virsh: shutdown (if shutdown step)
    Virsh-->>Test: ack
    Test->>VM: wait state=dead (timeout)
    VM-->>Test: dead
    alt start_after_shutdown = yes
      Test->>Virsh: start VM
      Virsh-->>Test: ack
      Test->>Serial: recreate / reset console
      Serial-->>Test: ready
      Test->>VM: wait login
      VM-->>Test: login ok
    end
    Test->>Network: check_network_access (ping or SR-IOV check)
    Network-->>Test: ok / fail
  else scenario == save/suspend
    Note over Test: test_save_suspend -> save/restore or suspend/resume + checks
    Test->>VM: save/suspend
    VM-->>Test: state saved/suspended
    Test->>VM: restore/resume
    VM-->>Test: running
    Test->>Network: check_network_access
    Network-->>Test: ok / fail
  end
  Test-->>Tester: finalize, cleanup
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I nibble configs, tweak a loop or two,
A shutdown hop, then back I brew—
Consoles cheer, the network pings,
Save, reboot, the lifecycle sings.
A rabbit’s patch: tests green and true. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly identifies the primary enhancement introduced by this pull request, namely adding a VM shutdown test to the existing iommu_device_lifecycle tests. It concisely conveys the main change without extraneous details or file lists, allowing reviewers to immediately understand the core purpose. Including the ticket reference at the start does not detract from its clarity and aids in traceability. The phrasing is specific enough that a teammate scanning the history will recognize the new shutdown test focus.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🧪 Early access (Sonnet 4.5): enabled

We are currently testing the Sonnet 4.5 model, which is expected to improve code review quality. However, this model may lead to increased noise levels in the review comments. Please disable the early access features if the noise level causes any inconvenience.

Note:

  • Public repositories are always opted into early access features.
  • You can enable or disable early access features from the CodeRabbit UI or by updating the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
libvirt/tests/cfg/sriov/vIOMMU/iommu_device_lifecycle.cfg (1)

44-46: Use a higher default timeout and quote yes for consistency.

  • 60s was not enough in CI; bump to 120s to reduce flakes.
  • Quote yes to stay consistent with surrounding params and avoid parser quirks.
-        - shutdown:
-            start_after_shutdown = yes
-            shutdown_timeout = 60
+        - shutdown:
+            start_after_shutdown = "yes"
+            shutdown_timeout = 120
libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py (3)

94-94: Adopt a longer default shutdown timeout (120s).

Matches the cfg suggestion and reduces flakes on slow guests.

-            shutdown_timeout = int(params.get("shutdown_timeout", "60"))
+            shutdown_timeout = int(params.get("shutdown_timeout", "120"))

101-109: Validate guest health after optional restart.

Reuse existing connectivity checks to ensure the VM is really back after shutdown.

                 session = vm.wait_for_serial_login(
                     timeout=int(params.get('login_timeout', 240)))
-                test.log.info("VM successfully started after shutdown")
+                if need_sriov:
+                    sriov_check_points.check_vm_network_accessed(session, ping_dest=ping_dest)
+                else:
+                    s, o = utils_net.ping(ping_dest, count=5, timeout=10, session=session)
+                    if s:
+                        test.fail("Failed to ping %s after restart! status: %s, output: %s." % (ping_dest, s, o))
+                test.log.info("VM successfully started after shutdown")
                 session.close()

91-98: Optionally gate known failures to avoid noisy CI until the fix lands.

If shutdown is expected to fail before RHEL-109504 is in the distro, consider a temporary guard (e.g., cancel/xfail based on a param or version check) to keep CI signal clean.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 983f459 and 0dc31b9.

📒 Files selected for processing (2)
  • libvirt/tests/cfg/sriov/vIOMMU/iommu_device_lifecycle.cfg (1 hunks)
  • libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py (1 hunks)

virsh.shutdown(vm.name, **VIRSH_ARGS)
shutdown_timeout = int(params.get("shutdown_timeout", "60"))
if not utils_misc.wait_for(lambda: vm.is_dead(), shutdown_timeout):
test.fail("VM failed to shutdown")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test.fail("VM failed to shutdown within %s seconds" % shutdown_timeout)

loop_time = 5
- reset:
- shutdown:
start_after_shutdown = yes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this parameter? I do not see other cases use this parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because it will prolong the test run ... so it will be better to have the possibility to start VM again.
... and other tests reboot and reset are running in loop ... so it is "hidden"


# Optional: Start VM again based on parameter
start_after_shutdown = "yes" == params.get("start_after_shutdown", "no")
if start_after_shutdown:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only for shutdown case, so we do not need to add this parameter checking.
Instead, we can directly start vm after shutdown, right?

if s:
test.fail("Failed to ping %s! status: %s, output: %s." % (ping_dest, s, o))

elif test_scenario == "shutdown":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To refine the code structure, I suggest we create a new function, like test_shutdown(), to include below codes.
And below codes could be moved under line 60 below.
if test_scenario in ["reboot_many_times", "reset", "shutdown"]:
The benifit is your new case can also leverage the network checking codes on line 84 if need_sriov:

It is better to also move other scenarios into new functions. Then the structure will be clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion! ... I'll work on it

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py (4)

19-28: Remove unused vm parameter.

The vm parameter is not used in the function body. The session is passed directly, making vm unnecessary.

Apply this diff:

-def check_network_access(test, vm, session, need_sriov, ping_dest):
+def check_network_access(test, session, need_sriov, ping_dest):
     """
     Check network access in VM
     """

Then update the call sites at lines 75 and 138 to remove the vm argument.


31-46: Remove unused params parameter.

The params parameter is not used anywhere in the function body.

Apply this diff:

-def test_save_suspend(test, vm, params, save_path, test_scenario):
+def test_save_suspend(test, vm, save_path, test_scenario):
     """
     Test save/restore or suspend/resume scenario
     """

Then update the call site at line 140 to remove the params argument.


98-108: Remove redundant iface_dicts assignment.

Line 100 assigns iface_dicts but it's only used for logging at line 101. Line 102 then calls parse_iface_dict() again to update params. This is redundant.

Apply this diff to simplify:

     # Configure network interfaces
     if need_sriov:
         sriov_test_obj = sriov_base.SRIOVTest(vm, test, test_obj.params)
-        iface_dicts = sriov_test_obj.parse_iface_dict()
-        test.log.debug(iface_dicts)
-        test_obj.params["iface_dict"] = str(sriov_test_obj.parse_iface_dict())
+        iface_dict_parsed = sriov_test_obj.parse_iface_dict()
+        test.log.debug(iface_dict_parsed)
+        test_obj.params["iface_dict"] = str(iface_dict_parsed)
 
     iface_dict = test_obj.parse_iface_dict()

85-85: Consider using ast.literal_eval for safer dict parsing.

The eval() calls at lines 85 and 91 are flagged by static analysis. While this is test code where params come from config files rather than untrusted user input, ast.literal_eval would be safer and is considered best practice.

Note: This pattern appears throughout the codebase (see relevant snippets from viommu_base.py and sriov_base.py), so changing it here would create inconsistency unless addressed project-wide.

Also applies to: 91-91

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0dc31b9 and 6ccf141.

📒 Files selected for processing (1)
  • libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-24T08:01:27.899Z
Learnt from: hholoubk
PR: autotest/tp-libvirt#6579
File: libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py:95-97
Timestamp: 2025-09-24T08:01:27.899Z
Learning: In the libvirt test framework used in tp-libvirt, VM cleanup including destroying running VMs is handled by the teardown method (test_obj.teardown_iommu_test()) called in the finally block, so explicit VM destroy calls on timeout are not necessary according to the maintainers.

Applied to files:

  • libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py
🧬 Code graph analysis (1)
libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py (3)
provider/save/save_base.py (2)
  • pre_save_setup (11-36)
  • post_save_check (39-70)
provider/viommu/viommu_base.py (4)
  • setup_iommu_test (219-234)
  • prepare_controller (134-190)
  • update_disk_addr (192-217)
  • parse_iface_dict (112-132)
provider/sriov/sriov_base.py (3)
  • setup_iommu_test (396-420)
  • SRIOVTest (103-436)
  • parse_iface_dict (173-195)
🪛 Ruff (0.13.1)
libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py

19-19: Unused function argument: vm

(ARG001)


31-31: Unused function argument: params

(ARG001)


85-85: Use of possibly insecure function; consider using ast.literal_eval

(S307)


91-91: Use of possibly insecure function; consider using ast.literal_eval

(S307)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Python 3.8
  • GitHub Check: Python 3.11
  • GitHub Check: Python 3.9
  • GitHub Check: Python 3.12
🔇 Additional comments (3)
libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py (3)

14-14: LGTM!

The import of lifecycle_base is properly used in the test_reboot_reset_shutdown function for shutdown and reset operations.


49-76: LGTM!

The function properly handles shutdown, reset, and reboot scenarios with appropriate serial console setup and network verification. The conditional restart after shutdown is well-structured.


127-142: LGTM!

The refactoring improves code structure by centralizing VM XML setup and delegating to scenario-specific test functions. The unknown scenario handling provides good defensive behavior.

This change adds new test to verify virsh shutdown command is working as expected for VM with vIOMMU devices.
Also added the optional possibility to start the VM again after the shutdown and parametrize the shutdown_timeout.

Improving test coverage for RHEL-109504.
@hholoubk hholoubk force-pushed the LIBVIRTAT-22036-add-VM-shutdown branch from 6ccf141 to ed6d92f Compare September 30, 2025 10:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants