NMI for CVM in OpenHCL #2049

jennagoddard · 2025-09-30T17:01:09Z

This PR adds support for injecting a LINT1 debug interrupt into guest VTL0 on both TDX and SNP CVMs. This enables investigation of unresponsive guests.

Copilot

Pull Request Overview

This PR implements NMI support for CVMs (Confidential Virtual Machines) in OpenHCL by adding LINT1 interrupt handling. The implementation allows a debug interrupt to be injected through the Guest Emulation Transport (GET) protocol.

Key changes include:

- Added LINT1 interrupt support to the Local APIC implementation
- Extended GET protocol with debug interrupt notification capability
- Implemented NMI masking and suppression logic for hardware-backed CVMs

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

File	Description
vmm_core/virt_support_apic/src/lib.rs	Added LINT1 interrupt support including statistics, work flags, and request handling
vm/devices/get/guest_emulation_transport/src/process_loop.rs	Added debug interrupt notification handling and callback storage
vm/devices/get/guest_emulation_transport/src/client.rs	Added client method to set debug interrupt callback
vm/devices/get/get_protocol/src/lib.rs	Extended protocol with InjectDebugInterruptNotification structure
openhcl/virt_mshv_vtl/src/processor/tdx/mod.rs	Added NMI masking support for TDX processors
openhcl/virt_mshv_vtl/src/processor/snp/mod.rs	Added NMI suppression logic for SNP processors
openhcl/virt_mshv_vtl/src/processor/mod.rs	Added LAPIC state fields for NMI and LINT1 handling
openhcl/virt_mshv_vtl/src/processor/hardware_cvm/mod.rs	Added cross-VTL NMI tracking
openhcl/virt_mshv_vtl/src/processor/hardware_cvm/apic.rs	Implemented LINT1 and NMI handling with masking support
openhcl/virt_mshv_vtl/src/lib.rs	Added assert_debug_interrupt method to trigger LINT1
openhcl/underhill_core/src/worker.rs	Connected GET callback to partition's debug interrupt method

Comments suppressed due to low confidence (1)

vm/devices/get/guest_emulation_transport/src/process_loop.rs:1

Corrected duplicate word 'the the' to 'the'.

// Copyright (c) Microsoft Corporation.

Copilot · 2025-10-07T03:28:45Z

openhcl/underhill_core/src/worker.rs

        );
    }

+    // Set the the callback in GET to trigger the debug interrupt.


Corrected duplicate word 'the the' to 'the'.

Suggested change

// Set the the callback in GET to trigger the debug interrupt.

// Set the callback in GET to trigger the debug interrupt.

#Resolved

github-actions · 2025-10-07T04:18:49Z

At least one Petri test failed.
#Closed


smalis-msft · 2025-10-07T15:46:24Z

vm/devices/get/guest_emulation_transport/src/process_loop.rs

+        }
+
+        // Trigger the LINT1 interrupt vector on the LAPIC of the BSP.
+        self.set_debug_interrupt


If no callback has been set we should probably trace a warning or something instead of silently doing nothing?

No need: if it's that early, LINT1 is masked and there's no expectation that LINT1 could be delivered at that time. For early-boot diagnostics we'll need to use a true NMI.

Tracing is very low cost. Even if a case is unexpected, that's exactly where a trace could help the most in figuring out something weird down the line. If we ask the VM to do something and then absolutely nothing happens, that's confusing. If we ask it to do something and it prints a warning saying "I can't do that thing", that's helpful.
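A minimal sketch of the reviewer's suggestion, assuming the process loop keeps the registered callback as an `Option` of a boxed closure. `DebugInterruptSink`, its field, and `handle_notification` are hypothetical names, and `eprintln!` stands in for a `tracing::warn!` call so the example has no external dependencies.

```rust
/// Hypothetical stand-in for the state the GET process loop keeps.
pub struct DebugInterruptSink {
    /// Callback registered by the worker; `None` until registration happens.
    pub set_debug_interrupt: Option<Box<dyn Fn(u8) + Send + Sync>>,
}

impl DebugInterruptSink {
    /// Handles an inject-debug-interrupt notification for `vtl`.
    /// Returns `true` if a registered callback was invoked.
    pub fn handle_notification(&self, vtl: u8) -> bool {
        match &self.set_debug_interrupt {
            Some(callback) => {
                callback(vtl);
                true
            }
            None => {
                // Cheap to emit, and it turns a silent no-op into a
                // visible "can't do that" message for whoever is debugging.
                eprintln!(
                    "warning: debug interrupt requested for VTL{vtl}, \
                     but no callback is registered; ignoring"
                );
                false
            }
        }
    }
}
```

The return value is only there to make the behavior observable; a real handler could just as well return `()`.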

smalis-msft · 2025-10-07T15:47:51Z

openhcl/virt_mshv_vtl/src/processor/mod.rs


+// NMI suppression state to prevent duplicate NMI
+#[cfg(guest_arch = "x86_64")]
+const NMI_SUPPRESS_LINT1_DELIVERED: u32 = 1;


These and nmi_suppression should be combined into a bitfield type
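One way to act on this suggestion, sketched as a hand-written newtype so it stays dependency-free (a bitfield-generating crate such as `bitfield-struct` would be the more likely choice in-tree). `NmiSuppression` and its accessors are hypothetical, and the bit positions are illustrative rather than taken from the PR.

```rust
/// Hypothetical replacement for the separate `NMI_SUPPRESS_*` constants
/// plus a raw `u32` field. Bit positions are illustrative only.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct NmiSuppression(u32);

impl NmiSuppression {
    const LINT1_REQUESTED: u32 = 1 << 0;
    const LINT1_DELIVERED: u32 = 1 << 1;

    pub fn lint1_requested(self) -> bool {
        (self.0 & Self::LINT1_REQUESTED) != 0
    }

    pub fn lint1_delivered(self) -> bool {
        (self.0 & Self::LINT1_DELIVERED) != 0
    }

    pub fn with_lint1_requested(self, value: bool) -> Self {
        self.with_bit(Self::LINT1_REQUESTED, value)
    }

    pub fn with_lint1_delivered(self, value: bool) -> Self {
        self.with_bit(Self::LINT1_DELIVERED, value)
    }

    /// Returns a copy with the bits in `mask` set or cleared.
    fn with_bit(self, mask: u32, value: bool) -> Self {
        if value {
            Self(self.0 | mask)
        } else {
            Self(self.0 & !mask)
        }
    }
}
```

With a type like this, the two manual mask operations in the SNP delivery path could collapse to a single `with_lint1_requested(false).with_lint1_delivered(true)` chain, and callers can no longer mix in unrelated `u32` values.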

smalis-msft · 2025-10-07T15:48:50Z

openhcl/virt_mshv_vtl/src/processor/snp/mod.rs

+            // If a LINT1 NMI has been requested, then it is being delivered now,
+            // so no further NMIs can be delivered.
+            self.backing.cvm.lapics[vtl].nmi_suppression &= !NMI_SUPPRESS_LINT1_REQUESTED;
+            self.backing.cvm.lapics[vtl].nmi_suppression |= NMI_SUPPRESS_LINT1_DELIVERED;


Maybe I missed it, but where do we clear DELIVERED? #Resolved

DELIVERED isn't cleared. Once the LINT1 debug interrupt has been delivered, you can't deliver another LINT1 or NMI. The HCL doesn't know when the guest has finished handling the NMI, so it's not safe to signal twice.

Should we trace a warning or something then if we get one of these and delivered is already set?

The guest is already crashing and hopefully writing a crash dump, so there's no need to add another write at that time.

Same comment about tracing being low cost
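The lifecycle described in this thread (a request becomes a delivery, and a delivery is terminal) can be sketched as a small state machine that also emits the warning the reviewer asked for. `Lint1State` and its methods are hypothetical; coalescing a second request while one is still pending is an assumption of this sketch, not something the PR states.

```rust
/// Hypothetical per-VTL LINT1 debug-interrupt state.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Lint1State {
    Idle,
    Requested,
    /// Terminal: the HCL cannot tell when the guest's NMI handler finishes,
    /// so a second LINT1/NMI is never signalled.
    Delivered,
}

impl Lint1State {
    /// Records a new request. Returns `false` (and warns) if it is dropped.
    pub fn request(&mut self) -> bool {
        match *self {
            Lint1State::Idle => {
                *self = Lint1State::Requested;
                true
            }
            // Assumption: a repeat request while pending is coalesced.
            Lint1State::Requested => true,
            Lint1State::Delivered => {
                eprintln!("warning: LINT1 debug interrupt already delivered; ignoring new request");
                false
            }
        }
    }

    /// Called at injection time. Returns `true` only if an injection should
    /// actually happen, moving the state to `Delivered`.
    pub fn deliver(&mut self) -> bool {
        if *self == Lint1State::Requested {
            *self = Lint1State::Delivered;
            true
        } else {
            false
        }
    }
}
```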

smalis-msft · 2025-10-09T17:01:55Z

PR needs a description too. What we're adding and why. #Resolved

jstarks · 2025-10-09T17:28:35Z

vmm_core/virt_support_apic/src/lib.rs

+
+    /// Returns true if the VP should be woken up to scan the APIC.
+    #[must_use]
+    fn request_lint1(


This doesn't make sense.

LINT1 is just a line on the processor that triggers a particular kind of interrupt, based on the APIC's configuration. This is typically but not always configured by the firmware/OS to be NMI. It doesn't make sense for the CPU to get a "LINT1 request" from the APIC.

I needed a way to differentiate between a true NMI and LINT1 delivered as an NMI. Suggestions?

Why do you need to distinguish between those cases?

(I absolutely do not want to add any code to virt_support_apic for non-architectural behavior, other than the Hyper-V auto EOI extensions that we have to support. If we must distinguish between these paths then it must be outside of this crate.)
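To illustrate the architectural point above: asserting LINT1 does not by itself mean "NMI"; the effect depends on how the guest programmed the LVT LINT1 entry (APIC offset 0x360). The decoder below follows the standard LVT layout (vector in bits 7:0, delivery mode in bits 10:8, mask in bit 16). The type and function names are hypothetical and are not part of `virt_support_apic`.

```rust
/// Delivery modes encoded in bits 10:8 of a Local Vector Table entry.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DeliveryMode {
    Fixed,
    Smi,
    Nmi,
    Init,
    ExtInt,
    Reserved(u8),
}

/// Decoded view of the LVT LINT1 register.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LvtLint1 {
    pub vector: u8,
    pub mode: DeliveryMode,
    pub masked: bool,
}

impl LvtLint1 {
    pub fn decode(raw: u32) -> Self {
        let mode = match (raw >> 8) & 0b111 {
            0b000 => DeliveryMode::Fixed,
            0b010 => DeliveryMode::Smi,
            0b100 => DeliveryMode::Nmi,
            0b101 => DeliveryMode::Init,
            0b111 => DeliveryMode::ExtInt,
            other => DeliveryMode::Reserved(other as u8),
        };
        Self {
            vector: (raw & 0xff) as u8,
            mode,
            masked: (raw & (1 << 16)) != 0,
        }
    }
}

/// What asserting the LINT1 line would do given the guest's LVT programming:
/// `None` if the entry is masked, otherwise the configured delivery mode.
pub fn lint1_effect(raw_lvt: u32) -> Option<DeliveryMode> {
    let lvt = LvtLint1::decode(raw_lvt);
    if lvt.masked {
        None
    } else {
        Some(lvt.mode)
    }
}
```

This is why a "LINT1 request" to the CPU is not meaningful: the same pin assertion can be an NMI, a fixed interrupt, or nothing at all.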

jstarks · 2025-10-09T17:33:30Z

openhcl/virt_mshv_vtl/src/processor/hardware_cvm/apic.rs

    if lapic.activity != MpState::WaitForSipi {
-        if nmi || lapic.nmi_pending {
+        if lint1 {
+            if supports_nmi_masking || !lapic.cross_vtl_nmi_requested {


Can you elaborate on this cross_vtl_nmi_requested thing?

VTL0-generated NMIs may occur during normal execution (e.g. due to KD interaction), so we don't want them to block crash dumps. But an NMI injected by VTL1 only occurs due to a bug check, so in that case we don't allow other NMI sources.
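The rule in the reply above can be sketched as follows, mirroring the `supports_nmi_masking || !lapic.cross_vtl_nmi_requested` condition from the diff. `NmiSource`, `LapicNmiState`, `note_nmi`, and `may_deliver_lint1` are hypothetical names; only the gating condition itself comes from the PR.

```rust
/// Hypothetical origin of an NMI, as far as the gating rule cares.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum NmiSource {
    /// Routine VTL0 NMIs (e.g. from KD interaction); never block LINT1.
    Vtl0,
    /// An NMI injected by VTL1, which per the discussion only happens on bug check.
    Vtl1CrossVtl,
}

/// Hypothetical slice of per-VTL LAPIC state.
pub struct LapicNmiState {
    pub cross_vtl_nmi_requested: bool,
}

impl LapicNmiState {
    /// Records an NMI; only a cross-VTL NMI from VTL1 is remembered.
    pub fn note_nmi(&mut self, source: NmiSource) {
        if source == NmiSource::Vtl1CrossVtl {
            self.cross_vtl_nmi_requested = true;
        }
    }

    /// Same condition as the diff: with hardware NMI masking the LINT1 NMI
    /// can always be queued; without it, a prior VTL1 NMI blocks it.
    pub fn may_deliver_lint1(&self, supports_nmi_masking: bool) -> bool {
        supports_nmi_masking || !self.cross_vtl_nmi_requested
    }
}
```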

jennagoddard · 2025-10-09T17:50:08Z

Confirmed known issue.

In reply to: 3375135354

smalis-msft · 2025-10-09T18:06:14Z

openhcl/virt_mshv_vtl/src/lib.rs

+    #[cfg(guest_arch = "x86_64")]
+    pub fn assert_debug_interrupt(&self, _vtl: u8) {
+        let bsp_index = VpIndex::new(0);
+        self.pulse_lint(bsp_index, Vtl::Vtl0, 1)


just use the vtl parameter? is there any technical reason we need to prevent injecting into vtl1, or is it just we don't expect it to happen today? If we just don't expect it to happen, but we can trivially support it, then why don't we? #Resolved

The VTL parameter is in the protocol so we can add VTL2 crash dumps if or when we support the necessary encryption/security measures. From discussion with Andrea, the recommended way to collect a crash dump of VTL1 is to inject an NMI into VTL0.

Is that the case for anything that may ever run in VTL 1, or just for today's implementation of Windows SK though? Some other OS (like say Linux) could add their own code that runs in VTL 1 with a different contract at some point right?

Yes, though we don't know what that contract would be.

Right, but my point is if supporting injection into VTL 1 is trivial for us to do now, why not just link it up?

smalis-msft · 2025-10-09T18:44:01Z

openhcl/virt_mshv_vtl/src/lib.rs

    /// Trigger the LINT1 interrupt vector on the LAPIC of the BSP.
    #[cfg(guest_arch = "x86_64")]
-    pub fn assert_debug_interrupt(&self, _vtl: u8) {
+    pub fn assert_debug_interrupt(&self, vtl: u8) {


The vtl param should probably be a GuestVtl, and the conversion should happen at a higher level. I guess in the callback.
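A sketch of what "convert in the callback" could look like: the GET wire format keeps a raw `u8`, and the callback converts it to a typed VTL before calling into the partition. The `GuestVtl` enum here is a hypothetical local mirror (OpenHCL has its own `GuestVtl`, whose exact definition is not reproduced), and `make_debug_interrupt_callback` is a hypothetical helper.

```rust
use std::convert::TryFrom;

/// Hypothetical local mirror of a guest-VTL type: only VTLs a guest can run in.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GuestVtl {
    Vtl0,
    Vtl1,
}

impl TryFrom<u8> for GuestVtl {
    type Error = u8;

    fn try_from(raw: u8) -> Result<Self, u8> {
        match raw {
            0 => Ok(GuestVtl::Vtl0),
            1 => Ok(GuestVtl::Vtl1),
            other => Err(other),
        }
    }
}

/// Builds the GET-facing callback. The typed handler (standing in for
/// `assert_debug_interrupt`) only ever sees a valid `GuestVtl`.
pub fn make_debug_interrupt_callback<F>(assert_debug_interrupt: F) -> impl Fn(u8)
where
    F: Fn(GuestVtl),
{
    move |raw_vtl: u8| match GuestVtl::try_from(raw_vtl) {
        Ok(vtl) => assert_debug_interrupt(vtl),
        Err(bad) => {
            eprintln!("warning: debug interrupt requested for unsupported VTL {bad}; ignoring")
        }
    }
}
```

Keeping the conversion at this boundary means the partition method cannot be handed a VTL2 (or garbage) value, while the protocol field stays open for the future VTL2 crash-dump case mentioned above.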

Jenna Goddard added 3 commits September 23, 2025 03:15

draft

ab9add3

use callback instead of queue

0a496bb

cross VTL suppression of LINT1 NMIs

4f9a7b1

jennagoddard changed the title ~~Draft: NMI for CVM in OpenHCL~~ NMI for CVM in OpenHCL Oct 7, 2025

jennagoddard marked this pull request as ready for review October 7, 2025 03:28

jennagoddard requested a review from a team as a code owner October 7, 2025 03:28

jennagoddard requested review from Copilot and mebersol October 7, 2025 03:28

Copilot AI reviewed Oct 7, 2025

jennagoddard requested a review from msft-jlange October 7, 2025 03:29

smalis-msft reviewed Oct 7, 2025

openhcl/virt_mshv_vtl/src/lib.rs (outdated)

smalis-msft reviewed Oct 7, 2025

vmm_core/virt_support_apic/src/lib.rs

smalis-msft reviewed Oct 7, 2025

vm/devices/get/guest_emulation_transport/src/process_loop.rs (outdated)

smalis-msft reviewed Oct 7, 2025

jstarks reviewed Oct 9, 2025

address feedback

9bdc3d5

smalis-msft reviewed Oct 9, 2025

fix vtl param

240afba

smalis-msft reviewed Oct 9, 2025