Different approach to JS prototypes: more efficient, less visible #22

@jakobkummerow

I'd like to propose a slightly different approach for specifying JS prototypes: to fully lean into a custom section specified and interpreted by the JS embedding.

Compared to the earlier sketch that went in this direction, this approach does not leave any traces in the core Wasm world. In particular, there are no "weird" externref fields in special positions or with special types or values or annotations that then magically get copied somewhere and seem useless otherwise. In a non-Web embedding where the custom section gets ignored, the rest of the Wasm module looks and behaves exactly as it would if we didn't introduce this mechanism at all.

Full disclosure up front: this approach is not compliant with the restrictions currently described by the embedder spec. However, since (1) those rules were written long before WasmGC came to be (and hence long before wanting to expose Wasm structs to JS in idiomatic/convenient ways was on anyone's radar), and (2) the proposed mechanism doesn't burden or constrain core Wasm or other embeddings in any way (notably including: it doesn't stand in the way of any conceivable future additions to core Wasm), I think it would be fair game to relax those rules a little bit, to allow embedders to reflect Wasm structs on their side of the fence if and how the module requests it (via the custom section).

Compared to the approach currently under discussion in PR #19, the key benefit is improved efficiency: Wasm modules would need to spend far fewer bytes on imports and exports. At the same time, engines would have maximum freedom to explore various eagerness/laziness tradeoffs they find interesting; they are not required (but certainly allowed) to process the entire custom section during instantiation of the module. (To clarify, I don't know yet what would be a good tradeoff there, so allowing freedom in this regard isn't a veiled way of saying that I'd expect any particular engine to make use of this freedom in any particular way. I just think it's good when implementers have options to try different strategies, and also to change their strategies when new needs emerge; for example, leaning towards eagerness might be simpler initially, but if a new generation of larger modules gains popularity, then lazy strategies might become more desirable than they were before.)

Personally (and I realize that this is somewhat subjective), I also find the alternative proposed here to be simpler overall: nothing is magically added to or removed from the module's imports or exports; there is no special time window in which unpopulated prototypes are observable; there is no question of whether DescriptorOptions need to be stored forever or can be discarded after use; and no intermediate wrapper modules are generated on the fly.

The key idea is to do everything declaratively.

In its simplest form, that would mean that the custom section describes a JS prototype for each Wasm type. However, we have reason to believe that it will be very useful (or even strictly required) to have more than one prototype per Wasm type, in particular due to type minimization by optimizers, type canonicalization, and module merging, all of which could result in multiple unrelated surface-language types being represented by the same Wasm type and needing different JS prototypes. So we need a different "key" to describe where each prototype belongs.
It turns out that, in practice, toolchains store vtables and similar type identifiers in Wasm globals (in the future that this proposal envisions, these identifiers would be custom descriptors). So we can embrace globals as our vehicle.

The section will contain a list of entries like the following (here presented in JSON-like syntax for readability, exact encoding TBD):

```
[{
  "global_index": 123,
  "parent_prototype": 89,
  "methods": [
    {
      "name": "foo",
      "func_index": 45,
    }, …
  ],
}, …]
```

(There may be a few more things to specify, this is just a first draft to illustrate the most important parts.)
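To make the draft format a bit more concrete, here is a hypothetical TypeScript model of one entry. The names (`PrototypeEntry`, `MethodEntry`, `indexEntries`) are illustrative only, the wire encoding is still TBD, and rejecting a second entry for the same global is an assumption of this sketch rather than something the draft specifies.

```typescript
// Hypothetical model of the draft custom-section entry; names are illustrative.
interface MethodEntry {
  name: string;      // property name to install on the prototype P
  funcIndex: number; // index into the module's function index space
}

interface PrototypeEntry {
  globalIndex: number;      // global whose initializer creates the descriptor struct
  parentPrototype?: number; // global index of the parent entry, if any
  methods: MethodEntry[];
}

// The example entry from the draft, with the elided parts left out.
const example: PrototypeEntry = {
  globalIndex: 123,
  parentPrototype: 89,
  methods: [{ name: "foo", funcIndex: 45 }],
};

// Entries are keyed by descriptor global rather than by Wasm type, so
// several entries may end up describing structs of the same Wasm type.
// Assumption (not in the draft): at most one entry per global.
function indexEntries(entries: PrototypeEntry[]): Map<number, PrototypeEntry> {
  const byGlobal = new Map<number, PrototypeEntry>();
  for (const entry of entries) {
    if (byGlobal.has(entry.globalIndex)) {
      throw new Error(`duplicate entry for global ${entry.globalIndex}`);
    }
    byGlobal.set(entry.globalIndex, entry);
  }
  return byGlobal;
}

const index = indexEntries([example]);
```

Keying the lookup by global index is what allows two surface-language types that were merged into one Wasm type to keep distinct prototypes.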

The behavior triggered by such an entry is, roughly:
If $global123 is an immutable global,
whose type is (ref $x) for some type $x that describes some other type $y,
and whose initializer ends with a struct.new[_default] $x instruction,
then, when $global123 is initialized, a JS-side prototype P is allocated along with this struct and associated with the RTT contained in this descriptor struct.
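The three conditions can be read as a predicate over the global. The sketch below assumes a drastically simplified, hypothetical module model (`Global`, `StructType`, `Instr`, `describedType` are illustrative names, not any real decoder's API) and ignores the terminating `end` of the constant expression.

```typescript
// Hypothetical, drastically simplified module model; not a real decoder API.
type Instr =
  | { op: "struct.new"; typeIndex: number }
  | { op: "struct.new_default"; typeIndex: number }
  | { op: "other" };

interface StructType {
  describes?: number; // if set, this type is a descriptor for that type index
}

interface Global {
  mutable: boolean;
  refTypeIndex?: number; // set when the global's type is (ref $x)
  init: Instr[];         // initializer, without the trailing `end`
}

// Returns $y if the global triggers prototype allocation, else undefined.
function describedType(g: Global, types: StructType[]): number | undefined {
  if (g.mutable || g.refTypeIndex === undefined) return undefined;
  const x = g.refTypeIndex;
  const y = types[x]?.describes; // $x must describe some other type $y
  if (y === undefined) return undefined;
  const last = g.init[g.init.length - 1];
  if (last === undefined) return undefined;
  // The initializer must end by allocating a struct of type $x.
  const allocatesX =
    (last.op === "struct.new" || last.op === "struct.new_default") && last.typeIndex === x;
  return allocatesX ? y : undefined;
}

// Type 0 is $y; type 1 is $x, a descriptor for $y.
const types: StructType[] = [{}, { describes: 0 }];
const g: Global = {
  mutable: false,
  refTypeIndex: 1,
  init: [{ op: "other" }, { op: "struct.new", typeIndex: 1 }],
};
```

Here `describedType(g, types)` yields type 0; making the global mutable, or ending the initializer with an allocation of some other type, makes it yield `undefined`.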

Note 1: The "association" between the descriptor struct's RTT and P is invisible to Wasm. In implementations, we expect that P will be stored in some hidden/internal field in the descriptor or the RTT (or those two might be the same internal object anyway). P is not directly accessible from Wasm. The reason is twofold: there is no need to access it because the concept of "prototype" has no meaning to Wasm, and it would be in conflict with the goal that the section can be ignored without any observable changes in behavior for core Wasm.
Note 2: The allocation and configuration of P is unobservable. That gives engines the freedom to defer this work until some event makes the presence of P potentially observable (such as a Wasm object crossing over the "boundary" to JS, an actual attempt to access a Wasm struct's prototype, or the first allocation of a Wasm struct that uses the descriptor struct stored in $global123). In fact, the entire initialization of $global123 could be deferred until $global123 is accessed for the first time.
Note 3: This bit is what violates the current embedder restrictions: we allow the embedder to observe initialization of a global, and perform a "mirror" of this action in the embedder world.
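Notes 1 and 2 together suggest one possible implementation strategy: keep P in an engine-internal side table keyed by the descriptor (standing in for its RTT), and create P only on first demand. This is a hypothetical sketch in which a `WeakMap` plays the role of the hidden field; `PrototypeTable` and `prototypeFor` are illustrative names, and an engine is equally free to allocate eagerly.

```typescript
// Hypothetical sketch of Notes 1 and 2; names are illustrative.
type Descriptor = object; // opaque stand-in for a descriptor struct and its RTT

class PrototypeTable {
  // Hidden side table: nothing here is reachable from Wasm code.
  private readonly protos = new WeakMap<Descriptor, object>();
  private readonly build: (d: Descriptor) => object;
  allocations = 0; // counts prototype allocations, for the demo only

  constructor(build: (d: Descriptor) => object) {
    this.build = build;
  }

  // Called only when P may become observable, e.g. when a struct using
  // this descriptor crosses into JS.
  prototypeFor(d: Descriptor): object {
    let p = this.protos.get(d);
    if (p === undefined) {
      p = this.build(d);
      this.protos.set(d, p);
      this.allocations++;
    }
    return p;
  }
}

const table = new PrototypeTable(() => Object.create(Object.prototype));
const descriptor: Descriptor = {}; // created when the global is initialized
const allocatedAtInit = table.allocations; // no prototype allocated yet
const first = table.prototypeFor(descriptor);
const second = table.prototypeFor(descriptor);
```

Because the table is weak and keyed by the descriptor, P lives exactly as long as the descriptor does, and repeated lookups return the same prototype identity.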

This JS prototype P is populated with methods according to the "methods" list. The function indices in this list refer to the module's list of functions. These functions do not need to be exported with the usual exports mechanism (but they may be; they may also be imported). P also gets the specified parent prototype, if present, as its prototype; it is identified by another global index ($global89 in the example). We should probably require that any parent prototype must have a lower index (which corresponds to Wasm supertypes needing to have a lower type index), to rule out cycles by construction.
As far as JavaScript is concerned, when a Wasm struct of type $y (i.e. having descriptor $x) enters the JS realm, it will have P as its prototype. The identity of the struct's prototype is immutable, i.e. the struct's prototype cannot be replaced; this is in line with Wasm structs generally appearing as "sealed" in JS. P itself is mutable, however, so additional methods can be installed on it, or its prototype chain can be modified, e.g. to inject a custom prototype that wasn't available to the declarative mechanism. The preinstalled methods' property attributes are: not writable, not enumerable, not configurable. (This is the conservative choice; we could change it if there is a need.) When JavaScript calls one of the methods that have been declaratively installed on P, the corresponding Wasm function is called, and the receiver (aka "this") is passed as the first parameter. Along with any additional parameters, it is subject to the usual ToWebAssemblyValue rules, just like all other calls of (exported) Wasm functions.
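Populating P could look roughly like the following hypothetical sketch. `buildPrototype` and `WasmFunc` are illustrative names; plain JS functions stand in for Wasm functions, the ToWebAssemblyValue conversions are omitted, and using `Object.prototype` as P's prototype when no parent is given is an assumption, since the draft does not say. A non-extensible object stands in for a Wasm struct to show that its prototype cannot be replaced.

```typescript
// Hypothetical sketch: building P for one section entry.
type WasmFunc = (...args: unknown[]) => unknown;

interface MethodEntry { name: string; funcIndex: number; }
interface PrototypeEntry { globalIndex: number; parentPrototype?: number; methods: MethodEntry[]; }

function buildPrototype(
  entry: PrototypeEntry,
  funcs: WasmFunc[],          // stand-in for the module's function index space
  built: Map<number, object>, // prototypes built so far, keyed by global index
): object {
  // Assumption: Object.prototype when no parent is given (the draft is silent).
  let parent: object = Object.prototype;
  if (entry.parentPrototype !== undefined) {
    // Requiring parents to have a lower global index rules out cycles.
    if (entry.parentPrototype >= entry.globalIndex) {
      throw new Error(`parent ${entry.parentPrototype} must precede ${entry.globalIndex}`);
    }
    const p = built.get(entry.parentPrototype);
    if (p === undefined) throw new Error(`no prototype for global ${entry.parentPrototype}`);
    parent = p;
  }
  const proto: object = Object.create(parent);
  for (const m of entry.methods) {
    const fn = funcs[m.funcIndex];
    if (fn === undefined) throw new RangeError(`no function ${m.funcIndex}`);
    Object.defineProperty(proto, m.name, {
      // The JS receiver ("this") becomes the first Wasm parameter.
      value: function (this: unknown, ...args: unknown[]) {
        return fn(this, ...args);
      },
      writable: false,
      enumerable: false,
      configurable: false,
    });
  }
  built.set(entry.globalIndex, proto);
  return proto;
}

// Demo: funcs[0] returns its receiver, funcs[1] adds one to its argument.
const funcs: WasmFunc[] = [(self) => self, (_self, x) => (x as number) + 1];
const built = new Map<number, object>();
const base = buildPrototype({ globalIndex: 89, methods: [{ name: "id", funcIndex: 0 }] }, funcs, built);
const derived = buildPrototype(
  { globalIndex: 123, parentPrototype: 89, methods: [{ name: "foo", funcIndex: 1 }] },
  funcs,
  built,
);
// Stand-in for a Wasm struct whose descriptor is stored in $global123.
const struct = Object.preventExtensions(Object.create(derived));
```

With this setup, `struct.foo(41)` forwards to `funcs[1]` with `struct` as its first argument, and `struct.id()` reaches the parent's method through the prototype chain. `Reflect.setPrototypeOf(struct, {})` returns `false`, while `derived` itself stays extensible.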

A fun observation is that the contents of the proposed section are actually not JS-specific at all (if you read "parent_prototype" simply as "parent"; that string is just a descriptive label and wouldn't show up in the wire bytes anyway). All it really describes is "when Wasm structs with this descriptor flow out to the embedder, then their 'class' (whatever that means for the respective embedder) should have the following methods". I think a hypothetical Wasm embedder in other GC'ed, OO-friendly, dynamic or reflection-supporting languages (Python, Java, ...) could probably also make good use of this section. But that's not an explicit design goal here, just a welcome side effect. However, if there is interest, we could use this observation as motivation for specifying the section as an "embedder section" in core Wasm rather than in one or more embedders.

WDYT?