Optimize function call serialization #778


Open
wants to merge 5 commits into main
Conversation

@ludfjig ludfjig commented Aug 11, 2025

Closes #789

This PR optimizes serialization of FunctionCalls by:

  1. Making the serialization call return a &[u8] instead of a Vec, saving a memory allocation when writing it to memory.
  2. Avoiding an unnecessary step where VecBytes(Vec) was converted to a Vec through a useless Iterator.
  3. Preallocating a FlatBufferBuilder with a specific capacity to avoid reallocations as it grows. This comes at a small runtime cost of estimating the capacity that will be needed; in practice the estimation is very fast and almost always worth it.
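The capacity estimation in point 3 can be sketched as a plain function (a minimal sketch mirroring the snippet quoted later in this review; the overhead constants are illustrative approximations of flatbuffer table/vector overheads, not guaranteed to match the merged code exactly):

```rust
// Toy parameter type standing in for hyperlight's ParameterValue.
enum ParameterValue {
    String(String),
    VecBytes(Vec<u8>),
    Int(i32),
}

// Estimate the flatbuffer size for a function call so the builder can be
// preallocated once instead of growing repeatedly mid-build.
fn estimate_capacity(function_name: &str, args: &[ParameterValue]) -> usize {
    let mut estimated = 20; // fixed root/table overhead
    estimated += function_name.len() + 12; // name string + prefix/padding
    estimated += 12 + args.len() * 6; // parameters vector overhead
    for arg in args {
        estimated += 16; // base per-parameter structure
        estimated += match arg {
            ParameterValue::String(s) => s.len() + 20,
            ParameterValue::VecBytes(v) => v.len() + 20,
            ParameterValue::Int(_) => 16,
        };
    }
    // round up the way Vec grows, so the builder never reallocates mid-build
    estimated.next_power_of_two()
}

fn main() {
    let args = vec![ParameterValue::VecBytes(vec![0u8; 24 * 1024])];
    let cap = estimate_capacity("bench", &args);
    assert!(cap >= 24 * 1024);
    assert!(cap.is_power_of_two());
    println!("{cap}"); // 32768 for this input
}
```

An overestimate only wastes a little memory for the lifetime of the builder, while an underestimate triggers the reallocations the PR is trying to avoid, so rounding up to a power of two is the conservative choice.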

Future Todos:

  • This PR only affects the host side; there is a lot of room for improvement on the guest side: guest functions should not need to return a Vec. Instead, they should be passed some kind of Writer to which they write return values.
  • FunctionCall should contain a borrowed &str/&[u8] instead of an owned String and Vec.
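The Writer idea from the first todo can be sketched with std::io::Write (the names write_return_value and the length-prefix framing are hypothetical illustrations, not part of the hyperlight API):

```rust
use std::io::Write;

// Instead of building and returning an owned Vec<u8>, a guest function
// writes its return value directly into a caller-provided sink, avoiding
// the intermediate allocation.
fn write_return_value<W: Write>(out: &mut W, payload: &[u8]) -> std::io::Result<()> {
    // length-prefix the payload so the host knows how many bytes to read back
    out.write_all(&(payload.len() as u32).to_le_bytes())?;
    out.write_all(payload)
}

fn main() -> std::io::Result<()> {
    let mut shared = Vec::new(); // stand-in for guest-visible shared memory
    write_return_value(&mut shared, b"hello")?;
    assert_eq!(&shared[..4], &5u32.to_le_bytes());
    assert_eq!(&shared[4..], b"hello");
    Ok(())
}
```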

Relevant benchmark results compared to the main branch (but with c2e6cdd, which adds the first two benchmarks):

guest_functions_with_large_parameters/guest_call_with_large_parameters
                        time:   [723.24 ms 756.56 ms 793.92 ms]
                        change: [−15.950% −10.778% −5.4238%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  5 (5.00%) high mild
  10 (10.00%) high severe

function_call_serialization/serialize_function_call
                        time:   [5.5803 ms 5.6351 ms 5.7012 ms]
                        change: [−80.248% −79.529% −78.899%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
function_call_serialization/deserialize_function_call
                        time:   [8.2955 ms 8.3794 ms 8.4724 ms]
                        change: [−58.447% −57.639% −56.842%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

sample_workloads/24K_in_8K_out_c
                        time:   [28.390 µs 28.675 µs 29.009 µs]
                        change: [−45.020% −42.425% −39.887%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
sample_workloads/24K_in_8K_out_rust
                        time:   [27.788 µs 27.997 µs 28.237 µs]
                        change: [−38.325% −36.829% −35.245%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

@ludfjig ludfjig force-pushed the big_param_opt branch 2 times, most recently from bf0445e to 4196a2f Compare August 11, 2025 22:45
@ludfjig ludfjig added kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. area/performance Addresses performance labels Aug 11, 2025
@ludfjig ludfjig force-pushed the big_param_opt branch 7 times, most recently from 1ffa58d to 0c83035 Compare August 14, 2025 19:26
Contributor

@jsturtevant jsturtevant left a comment


LGTM, thanks for the extensive tests!

and a c+rust sample workload benchmark

Signed-off-by: Ludvig Liljenberg <[email protected]>
This should save memory allocations, but require that a
FlatBufferBuilder is passed in.

Signed-off-by: Ludvig Liljenberg <[email protected]>
Signed-off-by: Ludvig Liljenberg <[email protected]>

@Copilot Copilot AI left a comment


Pull Request Overview

This PR optimizes FunctionCall serialization to improve performance by reducing memory allocations and improving buffer capacity estimation. The changes focus on making serialization more efficient by returning borrowed slices instead of owned vectors and pre-allocating FlatBufferBuilder capacity.

Key changes:

  • Refactored FunctionCall serialization to use encode() method returning &[u8] instead of TryFrom<FunctionCall> for Vec<u8>
  • Added capacity estimation function to pre-allocate FlatBufferBuilder capacity and avoid reallocations
  • Updated function signatures to accept &[u8] instead of Vec<u8> where possible
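The borrowed-return pattern in the first bullet can be shown with a toy builder (pure Rust, no flatbuffers dependency; Builder and its field are invented stand-ins for FlatBufferBuilder):

```rust
// Toy builder standing in for flatbuffers::FlatBufferBuilder.
struct Builder {
    buf: Vec<u8>,
}

impl Builder {
    fn with_capacity(n: usize) -> Self {
        Builder { buf: Vec::with_capacity(n) }
    }
}

struct FunctionCall {
    name: String,
}

impl FunctionCall {
    // encode() fills the caller-owned builder and returns a slice borrowed
    // from it, so no extra Vec is allocated. The returned slice's lifetime
    // is tied to the builder's, which is the coupling the review below
    // discusses.
    fn encode<'a>(&self, builder: &'a mut Builder) -> &'a [u8] {
        builder.buf.clear();
        builder.buf.extend_from_slice(self.name.as_bytes());
        &builder.buf
    }
}

fn main() {
    let call = FunctionCall { name: "echo".into() };
    let mut b = Builder::with_capacity(64);
    let bytes = call.encode(&mut b);
    assert_eq!(bytes, &b"echo"[..]);
}
```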

Reviewed Changes

Copilot reviewed 11 out of 16 changed files in this pull request and generated 3 comments.

File: Description
src/hyperlight_common/src/flatbuffer_wrappers/function_call.rs: Refactored serialization from TryFrom trait to encode() method returning borrowed slice
src/hyperlight_common/src/flatbuffer_wrappers/util.rs: Added capacity estimation function with comprehensive tests
src/hyperlight_common/src/flatbuffer_wrappers/function_types.rs: Fixed VecBytes deserialization to use .bytes().to_vec() instead of iterator
src/hyperlight_host/src/sandbox/initialized_multi_use.rs: Updated to use new encode method with capacity estimation
src/hyperlight_guest/src/guest_handle/io.rs: Changed parameter from Vec<u8> to &[u8]
src/hyperlight_guest/src/guest_handle/host_comm.rs: Updated to use new encode method and pass references
src/hyperlight_guest_bin/src/guest_function/call.rs: Updated function call to pass reference instead of owned vector
src/hyperlight_host/benches/benchmarks.rs: Added benchmarks for serialization performance and sample workloads
src/tests/rust_guests/simpleguest/src/main.rs: Added benchmark test function for 24K input/8K output scenario
src/tests/c_guests/c_simpleguest/main.c: Added C version of benchmark test function
src/hyperlight_guest/Cargo.toml: Added flatbuffers dependency

}

// match how vec grows
estimated_capacity.next_power_of_two()

Copilot AI Aug 21, 2025


The next_power_of_two() call could potentially over-allocate significantly for large capacities. For example, if the estimated capacity is just over a power of 2, this will double the allocation. Consider adding a maximum threshold or using a more conservative growth strategy for very large estimates.

Suggested change
estimated_capacity.next_power_of_two()
// match how vec grows, but avoid excessive over-allocation for large buffers
const MAX_POWER_OF_TWO_THRESHOLD: usize = 1_048_576; // 1 MiB
if estimated_capacity < MAX_POWER_OF_TWO_THRESHOLD {
    estimated_capacity.next_power_of_two()
} else {
    estimated_capacity
}


})
.transpose()?;
/// Encodes self into the given builder and returns the encoded data.
pub fn encode<'a>(&self, builder: &'a mut FlatBufferBuilder) -> &'a [u8] {

Copilot AI Aug 21, 2025


The encode method mutates the passed FlatBufferBuilder and returns a reference to its internal data. This creates a tight coupling between the builder's lifetime and the returned slice. Consider documenting that the builder should not be reused after this call, or design the API to take ownership of the builder to make this constraint clearer.

Suggested change
pub fn encode<'a>(&self, builder: &'a mut FlatBufferBuilder) -> &'a [u8] {
/// Encodes self into the given builder and returns the encoded data as a Vec<u8>.
/// Consumes the builder, which must not be reused after this call.
pub fn encode(&self, mut builder: FlatBufferBuilder) -> Vec<u8> {


Contributor


this seems like a reasonable suggestion

Contributor

@danbugs danbugs left a comment


Nice optimization! Mostly LGTM. Just a couple of questions/nits here and there 👍

Comment on lines +188 to +206
let mut estimated_capacity = 20;

// Function name overhead
estimated_capacity += function_name.len() + 12;

// Parameters vector overhead
estimated_capacity += 12 + args.len() * 6;

// Per-parameter overhead
for arg in args {
    estimated_capacity += 16; // Base parameter structure
    estimated_capacity += match arg {
        ParameterValue::String(s) => s.len() + 20,
        ParameterValue::VecBytes(v) => v.len() + 20,
        ParameterValue::Int(_) | ParameterValue::UInt(_) => 16,
        ParameterValue::Long(_) | ParameterValue::ULong(_) => 20,
        ParameterValue::Float(_) => 16,
        ParameterValue::Double(_) => 20,
        ParameterValue::Bool(_) => 12,
Contributor

Maybe consider making these values consts to avoid diff magic numbers here and there?

Contributor

@jsturtevant jsturtevant left a comment


Feel free to address the remaining comments but this LGTM

Labels
area/performance Addresses performance kind/enhancement For PRs adding features, improving functionality, docs, tests, etc.

Successfully merging this pull request may close these issues.

FunctionCall should serialize to &[u8] instead of Vec<u8>
3 participants