16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -10,8 +10,24 @@ and this project adheres to

### Added

- [#5510](https://github.com/firecracker-microvm/firecracker/pull/5510),
[#5593](https://github.com/firecracker-microvm/firecracker/pull/5593),
[#5564](https://github.com/firecracker-microvm/firecracker/pull/5564): Add
support for the
[VMClock device](https://uapi-group.org/specifications/specs/vmclock). The
implementation supports the snapshot safety features proposed
[here](https://lore.kernel.org/lkml/[email protected]/),
but does not currently provide any clock-specific information to help the
guest synchronize its clocks. More information can be found in
[docs](docs/snapshotting/snapshot-support.md#userspace-notifications-of-loading-virtual-machine-snapshots).

### Changed

- [#5564](https://github.com/firecracker-microvm/firecracker/pull/5564): Adding
  support for VMClock uses one extra GSI for the VMClock device itself, which
  reduces the GSIs available for VirtIO devices. The new maximums are 92
  devices on Aarch64 and 17 devices on x86.

### Deprecated

### Removed
36 changes: 36 additions & 0 deletions docs/snapshotting/snapshot-support.md
@@ -24,6 +24,7 @@
- [Snapshot security and uniqueness](#snapshot-security-and-uniqueness)
- [Secure and insecure usage examples](#usage-examples)
- [Reusing snapshotted states securely](#reusing-snapshotted-states-securely)
- [Userspace notifications of loading Virtual Machine snapshots](#userspace-notifications-of-loading-virtual-machine-snapshots)
- [Vsock device limitation](#vsock-device-limitation)
- [VMGenID device limitation](#vmgenid-device-limitation)
- [Where can I resume my snapshots?](#where-can-i-resume-my-snapshots)
@@ -590,6 +591,41 @@ identifiers, cached random numbers, cryptographic tokens, etc **will** still be
replicated across multiple microVMs resumed from the same snapshot. Users need
to implement mechanisms for ensuring de-duplication of such state, where needed.

## Userspace notifications of loading Virtual Machine snapshots

The VMClock device
([specification](https://uapi-group.org/specifications/specs/vmclock/)) enables
applications running inside Virtual Machines to efficiently synchronize their
clocks against real wallclock time. VMClock also handles situations in which
the clock is disrupted. It does so through fields in the
[`vmclock_abi`](https://uapi-group.org/specifications/specs/vmclock/#the-vmclock_abi-structure)
structure. Currently, it handles two cases:

1. Live migration through the `disruption_marker` field.
1. Virtual machine snapshots through the `vm_generation_counter`.

Whenever a VM starts from a snapshot, VMClock presents a new value (different
from the one previously stored) in the `vm_generation_counter`. This happens
atomically, i.e. `vm_generation_counter` holds the new value as soon as vCPUs
are resumed after the snapshot is loaded.

Userspace libraries, e.g. userspace PRNGs, can mmap() `vmclock_abi` and monitor
changes in `vm_generation_counter` to detect when they need to adapt and/or
recreate state.
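
The monitoring pattern described above can be sketched roughly as follows. This
is a minimal illustration, not the Firecracker or Linux implementation: the
`GenerationWatcher` type is hypothetical, and an `AtomicU64` stands in for the
`vm_generation_counter` field of an mmap()ed `vmclock_abi` page. The sketch
also assumes that `VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT` (256 in the generated
bindings) signals that the counter is valid.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Flag bit taken from the generated bindings; treating it as "counter is
/// valid" is an assumption of this sketch.
const VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT: u64 = 256;

/// Hypothetical watcher. `counter` stands in for the mmap()ed
/// `vm_generation_counter` field, which is stored little-endian (`__le64`).
struct GenerationWatcher<'a> {
    counter: &'a AtomicU64,
    last_seen: u64,
}

impl<'a> GenerationWatcher<'a> {
    /// Returns `None` when `flags` does not advertise the counter.
    fn new(flags: u64, counter: &'a AtomicU64) -> Option<Self> {
        if flags & VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT == 0 {
            return None;
        }
        let last_seen = u64::from_le(counter.load(Ordering::Acquire));
        Some(Self { counter, last_seen })
    }

    /// Returns `true` once for every observed change of the counter, which
    /// is the point where cached state (e.g. a PRNG seed) should be rebuilt.
    fn snapshot_loaded(&mut self) -> bool {
        let current = u64::from_le(self.counter.load(Ordering::Acquire));
        let changed = current != self.last_seen;
        self.last_seen = current;
        changed
    }
}
```

A library would typically call `snapshot_loaded()` before handing out any
state that must be unique per VM instance, and reseed when it returns `true`.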

Moreover, VMClock allows processes to call poll() on the VMClock device and get
notified about changes through an event loop.

> [!IMPORTANT] Support for `vm_generation_counter` and `poll()` is implemented
> in Linux through the patches
> [here](https://lore.kernel.org/lkml/[email protected]/).
> We have backported these patches to the AL 5.10 and 6.1 kernels
> [here](../../resources/patches/vmclock). The kernels suggested in the
> [Getting Started Guide](../getting-started.md) include these patches. Users
> of mainline kernels need to apply the linked patches themselves until they
> are merged upstream.

## Vsock device reset

The vsock device is reset across snapshot/restore to avoid inconsistent state
14 changes: 14 additions & 0 deletions src/vmm/src/arch/aarch64/fdt.rs
@@ -20,6 +20,7 @@ use crate::arch::{
use crate::device_manager::DeviceManager;
use crate::device_manager::mmio::MMIODeviceInfo;
use crate::device_manager::pci_mngr::PciDevices;
use crate::devices::acpi::vmclock::{VMCLOCK_SIZE, VmClock};
use crate::devices::acpi::vmgenid::{VMGENID_MEM_SIZE, VmGenId};
use crate::initrd::InitrdConfig;
use crate::vstate::memory::{Address, GuestMemory, GuestMemoryMmap, GuestRegionType};
@@ -97,6 +98,7 @@ pub fn create_fdt(
create_psci_node(&mut fdt_writer)?;
create_devices_node(&mut fdt_writer, device_manager)?;
create_vmgenid_node(&mut fdt_writer, &device_manager.acpi_devices.vmgenid)?;
create_vmclock_node(&mut fdt_writer, &device_manager.acpi_devices.vmclock)?;
create_pci_nodes(&mut fdt_writer, &device_manager.pci_devices)?;

// End Header node.
@@ -287,6 +289,18 @@ fn create_vmgenid_node(fdt: &mut FdtWriter, vmgenid: &VmGenId) -> Result<(), Fdt
Ok(())
}

fn create_vmclock_node(fdt: &mut FdtWriter, vmclock: &VmClock) -> Result<(), FdtError> {
let vmclock_node = fdt.begin_node(&format!("ptp@{}", vmclock.guest_address.0))?;
fdt.property_string("compatible", "amazon,vmclock")?;
fdt.property_array_u64("reg", &[vmclock.guest_address.0, VMCLOCK_SIZE as u64])?;
fdt.property_array_u32(
"interrupts",
&[GIC_FDT_IRQ_TYPE_SPI, vmclock.gsi, IRQ_TYPE_EDGE_RISING],
)?;
fdt.end_node(vmclock_node)?;
Ok(())
}

fn create_gic_node(fdt: &mut FdtWriter, gic_device: &GICDevice) -> Result<(), FdtError> {
let interrupt = fdt.begin_node("intc")?;
fdt.property_string("compatible", gic_device.fdt_compatibility())?;
Binary file modified src/vmm/src/arch/aarch64/output_GICv3.dtb
Binary file not shown.
Binary file modified src/vmm/src/arch/aarch64/output_initrd_GICv3.dtb
Binary file not shown.
1 change: 0 additions & 1 deletion src/vmm/src/builder.rs
@@ -288,7 +288,6 @@ pub fn build_microvm_for_boot(
)?;

device_manager.attach_vmgenid_device(&vm)?;
#[cfg(target_arch = "x86_64")]
device_manager.attach_vmclock_device(&vm)?;

#[cfg(target_arch = "aarch64")]
53 changes: 29 additions & 24 deletions src/vmm/src/device_manager/acpi.rs
@@ -1,11 +1,11 @@
// Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0

#[cfg(target_arch = "x86_64")]
use acpi_tables::{Aml, aml};
use vm_memory::GuestMemoryError;

use crate::Vm;
#[cfg(target_arch = "x86_64")]
use crate::devices::acpi::vmclock::VmClock;
use crate::devices::acpi::vmgenid::VmGenId;
use crate::vstate::resources::ResourceAllocator;
@@ -23,7 +23,6 @@ pub struct ACPIDeviceManager {
/// VMGenID device
pub vmgenid: VmGenId,
/// VMclock device
#[cfg(target_arch = "x86_64")]
pub vmclock: VmClock,
}

@@ -32,7 +31,6 @@ impl ACPIDeviceManager {
pub fn new(resource_allocator: &mut ResourceAllocator) -> Self {
ACPIDeviceManager {
vmgenid: VmGenId::new(resource_allocator),
#[cfg(target_arch = "x86_64")]
vmclock: VmClock::new(resource_allocator),
}
}
@@ -43,19 +41,19 @@ impl ACPIDeviceManager {
Ok(())
}

#[cfg(target_arch = "x86_64")]
pub fn attach_vmclock(&self, vm: &Vm) -> Result<(), ACPIDeviceError> {
vm.register_irq(&self.vmclock.interrupt_evt, self.vmclock.gsi)?;
self.vmclock.activate(vm.guest_memory())?;
Ok(())
}
}

#[cfg(target_arch = "x86_64")]
impl Aml for ACPIDeviceManager {
fn append_aml_bytes(&self, v: &mut Vec<u8>) -> Result<(), aml::AmlError> {
// AML for [`VmGenId`] device.
self.vmgenid.append_aml_bytes(v)?;
// AML for [`VmClock`] device.
#[cfg(target_arch = "x86_64")]
self.vmclock.append_aml_bytes(v)?;

// Create the AML for the GED interrupt handler
@@ -65,30 +63,37 @@ impl Aml for ACPIDeviceManager {
&aml::Name::new("_HID".try_into()?, &"ACPI0013")?,
&aml::Name::new(
"_CRS".try_into()?,
&aml::ResourceTemplate::new(vec![&aml::Interrupt::new(
true,
true,
false,
false,
self.vmgenid.gsi,
)]),
&aml::ResourceTemplate::new(vec![
&aml::Interrupt::new(true, true, false, false, self.vmgenid.gsi),
&aml::Interrupt::new(true, true, false, false, self.vmclock.gsi),
]),
)?,
// We know that the maximum IRQ number fits in a u8. We have up to
// 32 IRQs in x86 and up to 128 in ARM (look into `vmm::crate::arch::layout::GSI_LEGACY_END`).
// Both `vmgenid.gsi` and `vmclock.gsi` can safely be cast to `u8`
// without truncation, so we let clippy know.
&aml::Method::new(
"_EVT".try_into()?,
1,
true,
vec![&aml::If::new(
// We know that the maximum IRQ number fits in a u8. We have up to
// 32 IRQs in x86 and up to 128 in
// ARM (look into
// `vmm::crate::arch::layout::GSI_LEGACY_END`)
#[allow(clippy::cast_possible_truncation)]
&aml::Equal::new(&aml::Arg(0), &(self.vmgenid.gsi as u8)),
vec![&aml::Notify::new(
&aml::Path::new("\\_SB_.VGEN")?,
&0x80usize,
)],
)],
vec![
&aml::If::new(
#[allow(clippy::cast_possible_truncation)]
&aml::Equal::new(&aml::Arg(0), &(self.vmgenid.gsi as u8)),
vec![&aml::Notify::new(
&aml::Path::new("\\_SB_.VGEN")?,
&0x80usize,
)],
),
&aml::If::new(
#[allow(clippy::cast_possible_truncation)]
&aml::Equal::new(&aml::Arg(0), &(self.vmclock.gsi as u8)),
vec![&aml::Notify::new(
&aml::Path::new("\\_SB_.VCLK")?,
&0x80usize,
)],
),
],
),
],
)
4 changes: 3 additions & 1 deletion src/vmm/src/device_manager/mod.rs
@@ -237,7 +237,6 @@ impl DeviceManager {
Ok(())
}

#[cfg(target_arch = "x86_64")]
pub(crate) fn attach_vmclock_device(&mut self, vm: &Vm) -> Result<(), AttachDeviceError> {
self.acpi_devices.attach_vmclock(vm)?;
Ok(())
@@ -465,6 +464,9 @@ impl<'a> Persist<'a> for DeviceManager {
// Restore ACPI devices
let mut acpi_devices = ACPIDeviceManager::restore(constructor_args.vm, &state.acpi_state)?;
acpi_devices.vmgenid.notify_guest()?;
acpi_devices
.vmclock
.post_load_update(constructor_args.vm.guest_memory());

// Restore PCI devices
let pci_ctor_args = PciDevicesConstructorArgs {
11 changes: 6 additions & 5 deletions src/vmm/src/device_manager/persist.rs
@@ -15,7 +15,6 @@ use super::mmio::*;
#[cfg(target_arch = "aarch64")]
use crate::arch::DeviceType;
use crate::device_manager::acpi::ACPIDeviceError;
#[cfg(target_arch = "x86_64")]
use crate::devices::acpi::vmclock::{VmClock, VmClockState};
use crate::devices::acpi::vmgenid::{VMGenIDState, VmGenId};
#[cfg(target_arch = "aarch64")]
@@ -168,7 +167,6 @@ impl fmt::Debug for MMIODevManagerConstructorArgs<'_> {
#[derive(Default, Debug, Clone, Serialize, Deserialize)]
pub struct ACPIDeviceManagerState {
vmgenid: VMGenIDState,
#[cfg(target_arch = "x86_64")]
vmclock: VmClockState,
}

@@ -180,7 +178,6 @@ impl<'a> Persist<'a> for ACPIDeviceManager {
fn save(&self) -> Self::State {
ACPIDeviceManagerState {
vmgenid: self.vmgenid.save(),
#[cfg(target_arch = "x86_64")]
vmclock: self.vmclock.save(),
}
}
@@ -190,10 +187,14 @@ impl<'a> Persist<'a> for ACPIDeviceManager {
// Safe to unwrap() here, this will never return an error.
vmgenid: VmGenId::restore((), &state.vmgenid).unwrap(),
// Safe to unwrap() here, this will never return an error.
#[cfg(target_arch = "x86_64")]
vmclock: VmClock::restore(vm.guest_memory(), &state.vmclock).unwrap(),
vmclock: VmClock::restore((), &state.vmclock).unwrap(),
};

vm.register_irq(
&acpi_devices.vmclock.interrupt_evt,
acpi_devices.vmclock.gsi,
)?;

acpi_devices.attach_vmgenid(vm)?;
Ok(acpi_devices)
}
7 changes: 6 additions & 1 deletion src/vmm/src/devices/acpi/generated/vmclock_abi.rs
@@ -38,6 +38,8 @@ pub const VMCLOCK_FLAG_PERIOD_MAXERROR_VALID: u64 = 16;
pub const VMCLOCK_FLAG_TIME_ESTERROR_VALID: u64 = 32;
pub const VMCLOCK_FLAG_TIME_MAXERROR_VALID: u64 = 64;
pub const VMCLOCK_FLAG_TIME_MONOTONIC: u64 = 128;
pub const VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT: u64 = 256;
pub const VMCLOCK_FLAG_NOTIFICATION_PRESENT: u64 = 512;
pub const VMCLOCK_STATUS_UNKNOWN: u8 = 0;
pub const VMCLOCK_STATUS_INITIALIZING: u8 = 1;
pub const VMCLOCK_STATUS_SYNCHRONIZED: u8 = 2;
@@ -153,10 +155,11 @@ pub struct vmclock_abi {
pub time_frac_sec: __le64,
pub time_esterror_nanosec: __le64,
pub time_maxerror_nanosec: __le64,
pub vm_generation_counter: __le64,
}
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
["Size of vmclock_abi"][::std::mem::size_of::<vmclock_abi>() - 104usize];
["Size of vmclock_abi"][::std::mem::size_of::<vmclock_abi>() - 112usize];
["Alignment of vmclock_abi"][::std::mem::align_of::<vmclock_abi>() - 8usize];
["Offset of field: vmclock_abi::magic"][::std::mem::offset_of!(vmclock_abi, magic) - 0usize];
["Offset of field: vmclock_abi::size"][::std::mem::offset_of!(vmclock_abi, size) - 4usize];
@@ -198,4 +201,6 @@ const _: () = {
[::std::mem::offset_of!(vmclock_abi, time_esterror_nanosec) - 88usize];
["Offset of field: vmclock_abi::time_maxerror_nanosec"]
[::std::mem::offset_of!(vmclock_abi, time_maxerror_nanosec) - 96usize];
["Offset of field: vmclock_abi::vm_generation_counter"]
[::std::mem::offset_of!(vmclock_abi, vm_generation_counter) - 104usize];
};
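
The layout assertions in the generated bindings rely on array indexing in a
`const` block: each index expression must evaluate to exactly 0, otherwise
compilation fails. The sketch below reproduces the trick on a hypothetical
three-field struct (`Tail`, not the real `vmclock_abi`) and adds an
illustrative helper, `read_generation_counter`, that decodes the little-endian
counter from raw bytes. Both names are assumptions made for this example.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical stand-in for the last three fields of `vmclock_abi`.
#[repr(C)]
struct Tail {
    time_esterror_nanosec: u64,
    time_maxerror_nanosec: u64,
    vm_generation_counter: u64,
}

// A size or offset smaller than expected underflows during constant
// evaluation; a larger one indexes past the one-element array. Either way
// the build breaks, which is how a layout change (such as 104 -> 112 bytes
// in the real struct) gets caught at compile time.
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of Tail"][size_of::<Tail>() - 24usize];
    ["Offset of field: Tail::vm_generation_counter"]
        [offset_of!(Tail, vm_generation_counter) - 16usize];
};

/// Decodes the little-endian counter from the raw bytes of a `Tail`.
/// Returns `None` if the buffer is too short to contain the field.
fn read_generation_counter(bytes: &[u8]) -> Option<u64> {
    let start = offset_of!(Tail, vm_generation_counter);
    let field = bytes.get(start..start + 8)?;
    let mut raw = [0u8; 8];
    raw.copy_from_slice(field);
    Some(u64::from_le_bytes(raw))
}
```

Deriving the read offset from `offset_of!` rather than a literal keeps the
helper in step with the struct definition that the assertions guard.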