@losfair losfair commented Oct 29, 2025

Changes

Add a new machine-config field, enable_thp, that controls whether transparent huge pages (THP) are enabled for the microVM.
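
For illustration, a minimal sketch of a machine-config request body that sets the new field. `vcpu_count` and `mem_size_mib` are existing machine-config fields; treating `enable_thp` as a boolean is an assumption, and `machine_config_body` is a hypothetical helper, not part of this PR.

```python
import json


def machine_config_body(vcpu_count, mem_size_mib, enable_thp):
    """Build a JSON body for a PUT /machine-config request (hypothetical helper)."""
    return json.dumps(
        {
            "vcpu_count": vcpu_count,
            "mem_size_mib": mem_size_mib,
            # Field proposed in this PR; boolean type is an assumption.
            "enable_thp": enable_thp,
        }
    )


body = machine_config_body(vcpu_count=2, mem_size_mib=1024, enable_thp=True)
print(body)
```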

Reason

Currently, even for a VM that is not restored from a snapshot and does not use any vhost-user devices, Firecracker does not attempt to enable transparent huge pages for it. This forces operators to reserve hugetlb pages on hypervisor hosts, which is hard to manage on hosts that run mixed workloads.
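
Whether a process has to opt in to THP at all depends on the host's THP mode. As a rough sketch, the mode can be read from the standard Linux sysfs file, whose content looks like `always [madvise] never` with the active mode in brackets. The helper names below are hypothetical.

```python
from pathlib import Path

# Standard Linux sysfs location for the global THP mode.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed token, e.g. 'madvise' from 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def host_thp_mode(path=THP_ENABLED):
    """Read the host THP mode, or None if the file is missing or unreadable."""
    try:
        return active_thp_mode(path.read_text())
    except OSError:
        return None


print(host_thp_mode())
```

In `madvise` mode, memory only becomes eligible for THP after an explicit `madvise(MADV_HUGEPAGE)` on the region, so a VMM has to opt in for guest memory.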

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

Add a new machine-config field, `enable_thp`, that controls whether
transparent huge pages (THP) are enabled for the microVM.

Signed-off-by: Heyang Zhou <[email protected]>
@xmarcalx
Contributor

xmarcalx commented Nov 9, 2025

Hi @losfair ,

Thanks a lot for your contribution and sorry for the late reply.

We are definitely interested in supporting THP. We attempted to include it when we added support for huge pages, but at the time we realized that it was incompatible with UFFD, which is also the most commonly used feature with Firecracker, so THP would de facto be a dead feature.
Would it be possible to do some more research on how we can add THP support while working with UFFD, maybe using UFFDIO_MOVE or maybe making it work with UFFDIO_COPY?

@piscisaureus

piscisaureus commented Nov 19, 2025

However, it's possible to make UFFD work with THP enabled for guest memory:

  • Snapshot restore uses UFFDIO_COPY (base page granularity)
  • After restore, THPs can form via khugepaged or explicit madvise(MADV_COLLAPSE)
  • Tradeoff: THP benefits are delayed until pages are collapsed (not immediate after restore)
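
The sequence in the list above can be sketched roughly as follows. This is not Firecracker code: plain writes stand in for UFFDIO_COPY (no userfaultfd is registered), the 2 MiB huge page size assumes x86_64, and `MADV_COLLAPSE = 25` is the value from the Linux uapi headers (kernel 6.1 or newer), hard-coded here because the Python `mmap` module may not export it. `populate_then_collapse` is a hypothetical name.

```python
import mmap

HUGE_PAGE = 2 * 1024 * 1024  # PMD-sized THP on x86_64 (assumption)
MADV_COLLAPSE = 25           # Linux uapi value, kernel >= 6.1 (assumption)


def populate_then_collapse(size=2 * HUGE_PAGE):
    """Fill a private anonymous region page by page, then request a collapse."""
    # Twice the huge page size, so the region contains at least one
    # huge-page-aligned window regardless of where the mapping lands.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

    # Stand-in for UFFDIO_COPY: populate one base page at a time.
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1

    try:
        # Ask the kernel to collapse the populated base pages into THPs now,
        # instead of waiting for khugepaged.
        region.madvise(MADV_COLLAPSE, 0, size)
        collapsed = True
    except (OSError, ValueError):
        # Older kernel, THP disabled, or the collapse could not be done.
        collapsed = False
    return region, collapsed


region, collapsed = populate_then_collapse()
print("collapse requested successfully:", collapsed)
```

A successful call only means the kernel accepted and performed the collapse for this range at that moment; all the page faults needed to populate the region have already been taken by then.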

@xmarcalx would this gradual THP formation be acceptable, or do you need THPs immediately after restore?

@JamesC1305
Contributor

Hi @piscisaureus, thank you for the suggestion, and apologies for the late reply.

We need to discuss our position on this internally. One immediate concern I have is the case where we register the UFFD to receive UFFD_EVENT_REMOVE events and the desired handling is to zero pages on removal (i.e. UFFDIO_ZEROPAGE). I believe this will fail, as it uses the same mfill_atomic function used by UFFDIO_COPY.

Regardless, we will have a chat to determine where we stand with your proposed implementation. We'll likely get back to you in early January, as a lot of people will be away for the holidays later this month. Cheers.

@JamesC1305 JamesC1305 self-assigned this Dec 11, 2025
@Manciukic
Contributor

Hey @piscisaureus, thanks for the contribution and sorry for the delays.
We want to make sure there's a use case for THP given the limitation that it is usable only on booted VMs. The main reason we added huge pages support was exactly the snapshot/restore use case, where a bigger page size greatly improves restore latency due to fewer page faults. This is why UFFD support is very important for us. While your suggestion is valid, it doesn't seem to address restore latency, since the cost is paid before the pages are collapsed, if I understood correctly.
To move this forward, we'd like to understand the use case for THP without snapshot/restore support and whether there is a path forward for supporting that (which there doesn't seem to be at the moment). If you can provide some details on how that is useful to you, and some performance measurements supporting those claims, that would be great!
Thanks again for your contribution, and we hope to hear back from you soon!
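
To put the "fewer page faults" point in numbers, here is a back-of-the-envelope sketch. It assumes every page of guest memory is faulted in exactly once and uses a 1 GiB guest with 4 KiB base pages and 2 MiB huge pages as example sizes; none of these figures come from measurements in this thread.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def faults_to_populate(mem_bytes, page_bytes):
    """Upper bound on faults: one per page if every page is touched once."""
    return -(-mem_bytes // page_bytes)  # ceiling division


base = faults_to_populate(1 * GIB, 4 * KIB)
huge = faults_to_populate(1 * GIB, 2 * MIB)
print(base, huge, base // huge)  # 262144 512 512
```

With restore done at base page granularity, the larger count applies during restore even if the pages are collapsed into THPs afterwards.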
