Skip to content

Conversation

chris-oo
Copy link
Member

@chris-oo chris-oo commented Sep 22, 2025

Support save and restore of topology information from a previous instance of OpenHCL to the next openhcl_boot. This allows supporting correctly persisting any private pool allocated from the address space to the next instance, without relying on the allocation logic being exactly the same.

This is done via reserving the first page in the VTL2 region as a persisted header, followed by a protobuf encoded payload containing the previous instance's topology. In the future, we could choose to put more saved state into this region, such as the whole usermode protobuf payload that today goes to the host, via the GET.

@chris-oo chris-oo force-pushed the bootshim-persisted-state branch from dbb0607 to d7f1f2b Compare September 23, 2025 18:20
Comment on lines 258 to 338
pub struct PersistedStateHeader {
/// A magic value. If this is not set to [`PersistedStateHeader::MAGIC`],
/// then the previous instance did not support this region.
pub magic: u64,
/// The gpa for the start of the protobuf region. This must be 4K aligned.
pub protobuf_base: u64,
/// The size of the protobuf region in bytes.
pub protobuf_region_len: u64,
/// The size of the protobuf payload in bytes.
/// This must be less than or equal to `protobuf_region_len`.
pub protobuf_payload_len: u64,
}

impl PersistedStateHeader {
/// "OHCLPHDR" in ASCII.
pub const MAGIC: u64 = 0x4F48434C50484452;
}

/// A local newtype wrapper that represents a [`igvm_defs::MemoryMapEntryType`].
///
/// This is required to make it protobuf deriveable.
#[derive(mesh_protobuf::Protobuf, Clone, Debug, PartialEq)]
#[mesh(package = "openhcl.openhcl_boot")]
pub struct IgvmMemoryType(#[mesh(1)] u16);

impl From<igvm_defs::MemoryMapEntryType> for IgvmMemoryType {
fn from(igvm_type: igvm_defs::MemoryMapEntryType) -> Self {
Self(igvm_type.0)
}
}

impl From<IgvmMemoryType> for igvm_defs::MemoryMapEntryType {
fn from(igvm_type: IgvmMemoryType) -> Self {
igvm_defs::MemoryMapEntryType(igvm_type.0)
}
}

#[derive(mesh_protobuf::Protobuf, Debug)]
#[mesh(package = "openhcl.openhcl_boot")]
pub struct MemoryEntry {
#[mesh(1)]
pub range: MemoryRange,
#[mesh(2)]
pub vnode: u32,
#[mesh(3)]
pub vtl_type: MemoryVtlType,
#[mesh(4)]
pub igvm_type: IgvmMemoryType,
}

#[derive(mesh_protobuf::Protobuf, Debug)]
#[mesh(package = "openhcl.openhcl_boot")]
pub struct MmioEntry {
#[mesh(1)]
pub range: MemoryRange,
#[mesh(2)]
pub vtl_type: MemoryVtlType,
}

/// The format for saved state between the previous instance of OpenHCL and the
/// next.
#[derive(mesh_protobuf::Protobuf, Debug)]
#[mesh(package = "openhcl.openhcl_boot")]
pub struct SavedState {
#[mesh(1)]
pub partition_memory: Vec<MemoryEntry>,
#[mesh(2)]
pub partition_mmio: Vec<MmioEntry>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattkur todo

@chris-oo chris-oo force-pushed the bootshim-persisted-state branch 3 times, most recently from 71229e5 to 69894a7 Compare September 29, 2025 22:44
@chris-oo chris-oo changed the title [WIP] persist and save/restore vtl2 topology from usermode to bootshim openhcl: support save restore of vtl2 topology from usermode to bootshim Oct 1, 2025
@chris-oo chris-oo changed the title openhcl: support save restore of vtl2 topology from usermode to bootshim openhcl: support save restore of partition topology from usermode to next openhcl_boot Oct 1, 2025
@chris-oo chris-oo marked this pull request as ready for review October 1, 2025 22:40
@chris-oo chris-oo requested review from a team as code owners October 1, 2025 22:40
@Copilot Copilot AI review requested due to automatic review settings October 1, 2025 22:40
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for saving and restoring partition topology information from a previous OpenHCL instance to the next openhcl_boot. The implementation reserves the first page in the VTL2 region as a persisted header, followed by a protobuf encoded payload containing the previous instance's topology information.

Key changes include:

  • Introduction of persisted state region management in the bootloader
  • New protobuf-based serialization for topology data
  • Enhanced memory layout to support persisted state across boots
  • Integration of saved state reading during boot process

Reviewed Changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
vm/loader/src/paravisor.rs Reserves 2MB persisted state region in memory layout for both x86 and ARM64
vm/loader/loader_defs/src/shim.rs Defines persisted state structures, protobuf messages, and memory type enums
vm/loader/loader_defs/Cargo.toml Adds mesh_protobuf and memory_range dependencies
openhcl/underhill_core/src/loader/vtl2_config/mod.rs Implements persisted state writing functionality
openhcl/underhill_core/Cargo.toml Adds mesh_protobuf dependency
openhcl/openhcl_boot/src/memory.rs Updates address space manager to handle persisted state regions
openhcl/openhcl_boot/src/main.rs Updates E820 entry generation and test cases for persisted state
openhcl/openhcl_boot/src/host_params/shim_params.rs Adds persisted state region to ShimParams structure
openhcl/openhcl_boot/src/host_params/dt/mod.rs Major refactoring to support topology from persisted state vs host DT
openhcl/openhcl_boot/Cargo.toml Adds mesh_protobuf dependency
openhcl/bootloader_fdt_parser/src/lib.rs Updates FDT parser to handle persisted state regions
Comments suppressed due to low confidence (1)

openhcl/openhcl_boot/src/host_params/dt/mod.rs:1

  • Using format! in error contexts can be inefficient. Consider using a closure with with_context to avoid string formatting unless the error actually occurs: .with_context(|| format!(\"{} missing numa-node-id\", node.name))?
// Copyright (c) Microsoft Corporation.

@chris-oo
Copy link
Member Author

chris-oo commented Oct 1, 2025

attempt fixing #1345, since i think this PR should fix it?

@chris-oo chris-oo force-pushed the bootshim-persisted-state branch from bef3266 to 79cc202 Compare October 1, 2025 22:48
@chris-oo chris-oo force-pushed the bootshim-persisted-state branch from 79cc202 to f200047 Compare October 1, 2025 23:04
Copy link

github-actions bot commented Oct 2, 2025

@chris-oo chris-oo force-pushed the bootshim-persisted-state branch from 1fc27e2 to 0677aff Compare October 2, 2025 22:07
}

/// A description of a protobuf `oneof`.
#[cfg_attr(not(feature = "std"), allow(dead_code))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: expects

);

let ranges = [parsed.vtl2_persisted_header];
let mapping =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inline the two 'ranges' vars and rename the two 'mapping' vars to different names to make it clearer what gets written where.

Vtl2ParamsMap::new_writeable(&ranges).context("failed to map persisted protobuf region")?;

// Create the serialized data to write.
let state = SavedState {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this maybe be a save method on the partition_memory_map instead of being done here?

// as all information we're persisting is currently known. In the future, if
// we plan on putting more usermode specific data such as the full openvmm
// saved state, we should probably move this to a servicing specific call.
write_persisted_info(&parsed_openhcl_boot)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we do this all the time, right? The memory we're writing to will be private on a cvm, so this isn't a security concern correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct we do this all the time. And yes, this will be private on a CVM - I think this realistically is the only way we will get servicing to work in a CVM ie have openhcl launch the next openhcl, and supply parameters/information somehow.

)
};

ALLOCATOR.enable_alloc();
Copy link
Contributor

@smalis-msft smalis-msft Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda wonder if it's worth switching this to be like an ALLOCATOR.with_alloc(closure) instead? That would prevent us from forgetting to disable it. And given our scoping model, that feels safer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds pretty reasonable (ie treat it like how std treats TLS?). I'll take a stab at it on monday.

@chris-oo chris-oo force-pushed the bootshim-persisted-state branch from a4fe858 to 403ef68 Compare October 6, 2025 20:35
Copy link

github-actions bot commented Oct 6, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants