Skip to content

vmbus_server: always notify that the connection is modified when disconnecting #1809

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 5, 2025

Conversation

SvenGroot
Copy link
Member

This change fixes an issue where, if all channels are already reset when a disconnect happens, the server would not invoke Notifier::modify_connection. This means that the state such as the interrupt page and monitor pages is not reset, and in the case of OpenHCL the relay is not notified of the disconnect (which can leave host state intact, including monitor pages if MNF is handled by the host).

This caused an issue where Linux would occasionally crash during resume from hibernate. When resuming, Linux makes two connections, first to read the memory image, and then to resume normal operations, both using MNF. When the first connection unloads, the overlay pages for the monitor pages were not removed until the reconnect, leading to memory corruption when Linux proceeds to use these pages as normal memory.

This change also adds some tests ensuring the notifier is invoked for an unload with open channels, without open channels, and a forced disconnect when a new InitiateContact message is received.

@SvenGroot SvenGroot requested review from a team as code owners August 5, 2025 19:07
@SvenGroot SvenGroot changed the title vmbus_server: always notify the connection is modified when disconnecting vmbus_server: always notify that the connection is modified when disconnecting Aug 5, 2025
jstarks
jstarks previously approved these changes Aug 5, 2025
@benhillis benhillis requested a review from Copilot August 5, 2025 19:20
@benhillis benhillis added the backport_2505 Change should be backported to the release/2505 branch label Aug 5, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a bug where the VMBus server would not properly notify connection modifications during disconnection when all channels were already reset. The issue caused memory corruption in Linux during hibernate resume operations due to monitor pages not being properly cleaned up.

Key Changes

  • Always invoke notify_disconnect() when disconnecting, even if all channels are already reset
  • Extracted notify_disconnect() method to consolidate disconnection notification logic
  • Added comprehensive tests for disconnect scenarios including channels with/without open channels and forced disconnects

@benhillis benhillis added the backport_2411 Change should be backported to the release/2411 branch label Aug 5, 2025
@SvenGroot SvenGroot enabled auto-merge (squash) August 5, 2025 19:34
@SvenGroot SvenGroot merged commit 8b953ec into microsoft:main Aug 5, 2025
29 checks passed
SvenGroot added a commit to SvenGroot/openvmm that referenced this pull request Aug 5, 2025
…onnecting (microsoft#1809)

This change fixes an issue where, if all channels are already reset when
a disconnect happens, the server would not invoke
`Notifier::modify_connection`. This means that the state such as the
interrupt page and monitor pages is not reset, and in the case of
OpenHCL the relay is not notified of the disconnect (which can leave
host state intact, including monitor pages if MNF is handled by the
host).

This caused an issue where Linux would occasionally crash during resume
from hibernate. When resuming, Linux makes two connections, first to
read the memory image, and then to resume normal operations, both using
MNF. When the first connection unloads, the overlay pages for the
monitor pages were not removed until the reconnect, leading to memory
corruption when Linux proceeds to use these pages as normal memory.

This change also adds some tests ensuring the notifier is invoked for an
unload with open channels, without open channels, and a forced
disconnect when a new InitiateContact message is received.

---------

Co-authored-by: Copilot <[email protected]>
benhillis pushed a commit that referenced this pull request Aug 5, 2025
…onnecting (#1812)

This change fixes an issue where, if all channels are already reset when
a disconnect happens, the server would not invoke
`Notifier::modify_connection`. This means that the state such as the
interrupt page and monitor pages is not reset, and in the case of
OpenHCL the relay is not notified of the disconnect (which can leave
host state intact, including monitor pages if MNF is handled by the
host).

This caused an issue where Linux would occasionally crash during resume
from hibernate. When resuming, Linux makes two connections, first to
read the memory image, and then to resume normal operations, both using
MNF. When the first connection unloads, the overlay pages for the
monitor pages were not removed until the reconnect, leading to memory
corruption when Linux proceeds to use these pages as normal memory.

This change also adds some tests ensuring the notifier is invoked for an
unload with open channels, without open channels, and a forced
disconnect when a new InitiateContact message is received.

Cherry-picked from #1809

---------

Co-authored-by: Copilot <[email protected]>
@benhillis
Copy link
Member

Backported to release/2411 in #1812

@benhillis benhillis added backported_2411 PR that has been backported to release/2411 and removed backport_2411 Change should be backported to the release/2411 branch labels Aug 5, 2025
SvenGroot added a commit to SvenGroot/openvmm that referenced this pull request Aug 6, 2025
…onnecting (microsoft#1809)

This change fixes an issue where, if all channels are already reset when
a disconnect happens, the server would not invoke
`Notifier::modify_connection`. This means that the state such as the
interrupt page and monitor pages is not reset, and in the case of
OpenHCL the relay is not notified of the disconnect (which can leave
host state intact, including monitor pages if MNF is handled by the
host).

This caused an issue where Linux would occasionally crash during resume
from hibernate. When resuming, Linux makes two connections, first to
read the memory image, and then to resume normal operations, both using
MNF. When the first connection unloads, the overlay pages for the
monitor pages were not removed until the reconnect, leading to memory
corruption when Linux proceeds to use these pages as normal memory.

This change also adds some tests ensuring the notifier is invoked for an
unload with open channels, without open channels, and a forced
disconnect when a new InitiateContact message is received.

---------

Co-authored-by: Copilot <[email protected]>
benhillis pushed a commit that referenced this pull request Aug 6, 2025
…onnecting (#1817)

This change fixes an issue where, if all channels are already reset when
a disconnect happens, the server would not invoke
`Notifier::modify_connection`. This means that the state such as the
interrupt page and monitor pages is not reset, and in the case of
OpenHCL the relay is not notified of the disconnect (which can leave
host state intact, including monitor pages if MNF is handled by the
host).

This caused an issue where Linux would occasionally crash during resume
from hibernate. When resuming, Linux makes two connections, first to
read the memory image, and then to resume normal operations, both using
MNF. When the first connection unloads, the overlay pages for the
monitor pages were not removed until the reconnect, leading to memory
corruption when Linux proceeds to use these pages as normal memory.

This change also adds some tests ensuring the notifier is invoked for an
unload with open channels, without open channels, and a forced
disconnect when a new InitiateContact message is received.

Cherry-picked from #1809

Co-authored-by: Copilot <[email protected]>
@benhillis
Copy link
Member

Backported to release/2505 in #1817

@benhillis benhillis added backported_2505 PR that has been backported to release/2505 and removed backport_2505 Change should be backported to the release/2505 branch labels Aug 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backported_2411 PR that has been backported to release/2411 backported_2505 PR that has been backported to release/2505
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants