Skip to content

Conversation

ludfjig
Copy link
Contributor

@ludfjig ludfjig commented Sep 5, 2025

This PR improves error handling between host and guest functions to prevent memory leaks and ensure more reliable deserialization.

Previously, when a host function (invoked from the guest) returned an error, the error was reported immediately without unwinding the guest stack. This left guest-side allocations in an inconsistent state, leading to memory leaks on subsequent entries into the guest.

In addition, error reporting relied on a fragile mechanism: guest errors were manually serialized into a buffer, and the host would attempt to detect them by trying to deserialize an error. If deserialization succeeded, an error was assumed to have occurred. This approach is risky because there was nothing preventing GuestError and FunctionCallResult from possibly having the same serialized format since they are completely separate.

Changes

  • Guest/host function calls now always return a FunctionCallResult, which explicitly represents either Ok or Err.
  • If host function returns an error, it's serialized back into the guest, and the guest will properly unwind, and report it back to the host, fixing a memory leak.

TODO:

For C guests: If host function returns an error, guest will panic when trying to read an expected good value (because there is only an error instead), and leak memory as a result. Will fix this in follow up PR

Closes #826
Closes #497

@ludfjig ludfjig force-pushed the host_error_leak_fix branch 2 times, most recently from d346b32 to 2e45037 Compare September 8, 2025 18:07
@ludfjig ludfjig added the kind/bugfix For PRs that fix bugs label Sep 8, 2025
@ludfjig ludfjig force-pushed the host_error_leak_fix branch 6 times, most recently from 8818cf5 to ae355d7 Compare September 9, 2025 18:10
@ludfjig ludfjig marked this pull request as ready for review September 9, 2025 18:20
Copy link
Contributor

@jsturtevant jsturtevant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! overall this looks like an improvement but others should probably take a look as this was my first time reviewing this code.

assert!(
matches!(&res, HyperlightError::GuestError(_, msg) if msg == "Host function error!") // rust guest
|| matches!(&res, HyperlightError::GuestAborted(_, msg) if msg.contains("Host function error!")) // c guest
|| matches!(&res, HyperlightError::StackOverflow()) // c guest. TODO fix this. C guest leaks when host func returns error guest panics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have an issue for this? was this the case prior?

Copy link
Contributor Author

@ludfjig ludfjig Sep 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the case prior. Feedback appreciated how to make this a nicer C-API (for all types, not just i32):

#[unsafe(no_mangle)]
pub extern "C" fn hl_get_host_return_value_as_Int() -> i32 {
    get_host_return_value().expect("Unable to get host return value as int")
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we mimic something like wasmtime does? https://github.com/bytecodealliance/wasmtime/blob/7380932631f7784d944cb0326a6ffaaf5dac29fc/crates/c-api/src/component/val.rs#L180-L185

I don't think this needs to be a blocker as we are in this state today

jsturtevant
jsturtevant previously approved these changes Sep 29, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR restructures error handling between guest and host functions to fix memory leaks and improve reliability. Previously, host function errors would return immediately without unwinding the guest stack, causing memory leaks. Additionally, error detection relied on attempting to deserialize errors, which was fragile.

Key changes:

  • All function calls now return a FunctionCallResult that explicitly represents either Ok or Err
  • Host function errors are properly serialized back to the guest for unwinding before being reported to the host
  • Removed fragile error detection mechanism that relied on attempting deserialization

Reviewed Changes

Copilot reviewed 23 out of 36 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/schema/function_call_result.fbs Modified to use union-based Result pattern with ReturnValueBox and GuestError
src/schema/guest_error.fbs Added HostError enum variant for host function errors
src/schema/all.fbs New consolidated schema file for all FlatBuffer schemas
src/hyperlight_host/src/sandbox/outb.rs Updated host function call handling to return FunctionCallResult
src/hyperlight_host/src/sandbox/initialized_multi_use.rs Replaced error checking with direct FunctionCallResult handling
src/hyperlight_host/src/mem/mgr.rs Updated memory manager to handle FunctionCallResult instead of separate error checking
src/hyperlight_guest/src/guest_handle/host_comm.rs Modified host return value handling for new FunctionCallResult format
src/hyperlight_guest_bin/src/guest_function/call.rs Updated guest function dispatch to use FunctionCallResult
src/hyperlight_common/src/flatbuffer_wrappers/function_types.rs Added FunctionCallResult wrapper and updated serialization
src/tests/rust_guests/simpleguest/src/main.rs Added test function for host error handling

Copy link
Contributor

@danbugs danbugs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work. Mostly LGTM, just got some minor nits.

@ludfjig ludfjig force-pushed the host_error_leak_fix branch from 7489937 to 611e659 Compare October 2, 2025 21:33
adamperlin
adamperlin previously approved these changes Oct 2, 2025
Copy link
Contributor

@adamperlin adamperlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! It's really great that this fixed an additional fragile error handling case as well with error deserialization. One question: have you tried reverting the behavior of host call fuzzing back to how it was before we discovered this issue, i.e., re-using the same sandbox state for each fuzzed call? Maybe we can open a follow up issue to do this?

ludfjig added 12 commits October 2, 2025 16:41
- Add all.fbs to include all schema files in one place
- Restructure function_call_result.fbs to use Result-like union
- Add HostError variant to ErrorCode enum in guest_error.fbs
- Update flatbuffer generation command in Justfile to use all.fbs
- Update documentation for new generation process

Signed-off-by: Ludvig Liljenberg <[email protected]>
Update all generated Rust code based on the new schema definitions.
This includes new types for error handling and result structures.

Signed-off-by: Ludvig Liljenberg <[email protected]>
- Update function_types.rs to handle Result-like return values
- Simplify guest_error.rs wrapper implementation
- Update util.rs for new generated types
- Update mod.rs  for new generated types

Signed-off-by: Ludvig Liljenberg <[email protected]>
- Remove guest_err.rs from hyperlight_host (replaced by new error handling)
- Remove guest_err.rs from hyperlight_guest_bin (replaced by new error handling)
- Update func/mod.rs to remove obsolete import

Signed-off-by: Ludvig Liljenberg <[email protected]>
- Update initialized_multi_use.rs to use new Result-like error handling
- Update mem/mgr.rs to handle host function errors properly
- Update sandbox/outb.rs for new error propagation pattern

Signed-off-by: Ludvig Liljenberg <[email protected]>
- Update guest/host_comm.rs to use new Result-like return values
- Update guest_bin/call.rs to properly handle host function errors
- Update guest_bin/lib.rs to remove obsolete error handling import and make GUEST_HANDLE public (for use in C-API)
- Update guest_capi/error.rs to support new error types

Signed-off-by: Ludvig Liljenberg <[email protected]>
Update sandbox_host_tests.rs to use the new Result-like error handling pattern.

Signed-off-by: Ludvig Liljenberg <[email protected]>
Update Cargo.lock and Cargo.toml files to reflect the dependency changes
needed for the new error handling implementation.

Signed-off-by: Ludvig Liljenberg <[email protected]>
Signed-off-by: Ludvig Liljenberg <[email protected]>
Signed-off-by: Ludvig Liljenberg <[email protected]>
Signed-off-by: Ludvig Liljenberg <[email protected]>
Signed-off-by: Ludvig Liljenberg <[email protected]>
@adamperlin
Copy link
Contributor

This looks good to me! It's really great that this fixed an additional fragile error handling case as well with error deserialization. One question: have you tried reverting the behavior of host call fuzzing back to how it was before we discovered this issue, i.e., re-using the same sandbox state for each fuzzed call? Maybe we can open a follow up issue to do this?

Added #933 to track this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bugfix For PRs that fix bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory Leak in Guest on Error Calling Host Function Confusing ERROR tracing when not actually an error
4 participants