My initial implementation is far enough along that I can run artificially crafted microbenchmarks. The benchmarks are here, feel free to play with them yourself (with a sufficiently recent build of V8).
Scenario: a Wasm module with 10K pairs of types (a described struct and a descriptor struct), and 10 empty functions (no params, no results, no bodies) to be installed on each descriptor, so 100K functions in total. They are exported/installed with names "f0" through "f99999". In those cases where DescriptorOptions are imported, they use a module name "p" and names "p0" through "p9999" -- these choices are meant to minimize wire bytes size.
I'm aware that 10K types / 100K cumulative methods is more than most folks expect to need at first, but we want a design that scales. Besides, startup-related work is hard to measure because repeating the test in the same process creates unrealistic effects, so a larger scenario tends to yield more reliable numbers.
I'm not creating any constructor functions in this first round of benchmarking.
Happy to flesh any of this out to be somewhat more realistic if anyone has suggestions for how to make it more interesting.
The contestants:
1: "Baseline". This module exports all of its 100K functions, but doesn't set up any prototypes. It reflects what you'd have to do today (i.e. before this proposal) if you wanted that rich of an interface to your Wasm module.
2: "Imperative". This module allocates prototype objects and new WebAssembly.DescriptorOptions()
imperatively in JavaScript and provides them as imports. After instantiation, it again uses imperative JavaScript to install the exported functions on the prototypes that were previously wrapped in the DescriptorOptions.
3: "Modular (named)". The Custom Section based approach approximately (though, for logistical reasons, not exactly) as currently described in Overview.md: the custom section controls implicit creation of DescriptorOptions. Aside from the custom section, the wire bytes are identical to "Imperative".
4. "Modular (indexed)". Same as "Modular (named)", but imports (for materialized DescriptorOptions) and exports (for fetching functions to install on prototypes) are referenced by their import/export index rather than by name. That's more in line with how other module-internal things (types, functions, globals, ...) are referenced, and avoids a bunch of hash map lookups mapping names to indices.
5. "Direct". One possible design of the idea discussed in issue #22: neither prototypes nor DescriptorOptions are explicitly imported by the Wasm module, the custom section abstractly "associates" them with globals. Also, installed functions are not explicitly exported; instead they are referenced by their internal function index.
6. "StartFunc". Per the discussion at the recent CG meeting, methods are installed by a "wasm:js-prototypes" "configureAll"
compile-time import that's called from the start function with three big arrays (one for funcrefs, one for prototypes, one for byte data describing what to do), which in turn are created from element/data segments. My prototype implementation recognizes this pattern and avoids the array allocations, reading the respective segments directly instead, which created some possibilities for further optimizations to reduce overhead. Prototypes are imported, and allocated on demand by a JS Proxy, which is more efficient than building up a huge imports object, and even saves wire bytes because they can all be imported with an empty name (and the Proxy's get
trap simply returns a new object every time it is called, regardless of the name that's being requested). I'm reporting two variants: with DescriptorOptions
(as currently proposed, same as "Imperative"/"Modular"), and without (just passing the prototypes around without wrapping them).
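For concreteness, the JavaScript setup for "Imperative" is shaped roughly like this. This is a minimal sketch, not the benchmark code: moduleBytes, the DescriptorOptions constructor argument, and the method names m0..m9 are assumptions for illustration.

```js
const prototypes = [];
const importObject = { p: {} };
for (let i = 0; i < 10_000; i++) {
  const proto = {};
  prototypes.push(proto);
  // Wrap each prototype and provide it as import "p" / "p<i>".
  // (The constructor argument shape is an assumption.)
  importObject.p["p" + i] = new WebAssembly.DescriptorOptions({ prototype: proto });
}

const { instance } = await WebAssembly.instantiate(moduleBytes, importObject);

// After instantiation: install the exports "f0".."f99999" on the
// prototypes, 10 methods per type (method names assumed).
for (let i = 0; i < 10_000; i++) {
  for (let j = 0; j < 10; j++) {
    prototypes[i]["m" + j] = instance.exports["f" + (i * 10 + j)];
  }
}
```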
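The Proxy trick mentioned under "StartFunc" (item 6) is essentially the following. Again just a sketch; the import module name and the instantiation call are assumptions.

```js
// On-demand prototype allocation: the get trap hands back a fresh object
// for every import lookup, so no huge imports object is built up front,
// and all prototype imports can share the empty field name.
const prototypeNamespace = new Proxy({}, {
  get(_target, _property) {
    return {};  // a new prototype object per lookup
  },
});

// Assumed here: the module imports its prototypes from module name "".
const { instance } = await WebAssembly.instantiate(moduleBytes, {
  "": prototypeNamespace,
});
```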
Results:
| Approach | Instantiation time | Wire bytes size |
|---|---|---|
| Baseline | 73 ms | 2,895,955 bytes |
| Imperative | 310 ms / +324% | 10,021,403 bytes / +246% (includes JS setup code) |
| Modular (named) | (~147 ms / +77%) | 4,651,310 bytes / +60% |
| Modular (indexed) | 105 ms / +44% | 4,206,890 bytes / +45% |
| Direct | 90 ms / +23% | 2,945,860 bytes / +1.7% |
| StartFunc (with DO) | 80 ms / +10% | 3,075,893 bytes / +6.2% |
| StartFunc (no DO) | 77 ms / +5.5% | 3,075,893 bytes / +6.2% |
Update 2025-04-30: Changed "Imperative" to eval() a long sequence of generated straight-line code instead of a loop for setting up the prototypes after instantiation. This might over-estimate the required time now (contrary to the previous under-estimate).
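Roughly, the change amounts to the following (a sketch with hypothetical names; the real string is emitted by the benchmark generator):

```js
// Previously: a plain JS loop installed the methods (as in the first sketch above).
// Now: eval() over a long string of generated straight-line assignments,
// closer to what a toolchain-emitted setup script would look like.
eval(
  'prototypes[0].m0 = instance.exports.f0;\n' +
  'prototypes[0].m1 = instance.exports.f1;\n' +
  // ... 100K generated assignments in total ...
  'prototypes[9999].m9 = instance.exports.f99999;'
);
```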
Update 2025-05-15: Added "Modular (named)" for completeness. Previous "Modular" is now clarified to be "Modular (indexed)".
Update 2025-07-30: Added "StartFunc". Since some of the optimizations I made affect the other scenarios too, I've updated all the numbers (except "Modular (named)", which would take me more time to get back to; it's also not very interesting).
Comments:
"StartFunc" wasn't that fast at first, but with some optimizations ended up being even more efficient than "Direct"; this suggests that the latter probably has room for fine-tuning, but given its lack of popularity I haven't spent the time to explore that. "StartFunc" does need a few more bytes, but that's probably a difference we can live with.
"Modular" (either version) is bigger than "Direct" because it needs both large import/export sections and the method descriptions in the custom section. "Imperative" is biggest because it needs large amounts of (toolchain-generated) JS code; I'm not sure how realistic my simulation of that is. It would likely benefit somewhat from a WebAssembly.MarkAsReceiverIsFirstParam()
method (see issue 31).
"Direct" is so far implemented fully eagerly; it is the only contestant that has the potential to be implemented with a lazy approach (but I haven't had time to do that yet).
The overhead of DescriptorOptions wrapping is not huge but clearly measurable; I think it's big enough that we're better off without the wrapping.
Personally, I'm not holding out much hope that an importable one-method-at-a-time function can be competitive with the everything-at-once approach I've prototyped; but if someone wants to try to prove me wrong on that assumption, I'm happy to implement such a function and let you figure out how to build a sufficiently efficient start function that calls it.