Commit 8a42fa4

Change async CABI, add context.{get,set} and waitable sets
1 parent 4b2c906 commit 8a42fa4

6 files changed (+955 additions, -407 deletions)

design/mvp/Async.md

Lines changed: 154 additions & 59 deletions
@@ -15,6 +15,7 @@ summary of the motivation and animated sketch of the design in action.
   * [Sync and Async Functions](#sync-and-async-functions)
   * [Task](#task)
   * [Current task](#current-task)
+  * [Context-Local Storage](#context-local-storage)
   * [Subtask and Supertask](#subtask-and-supertask)
   * [Structured concurrency](#structured-concurrency)
   * [Streams and Futures](#streams-and-futures)
@@ -181,6 +182,38 @@ although there can be multiple live `Task` objects in a component instance,
 "the current one" is always clear: it's the one passed to the current function
 as a parameter.
 
+### Context-Local Storage
+
+Each task contains a distinct mutable **context-local storage** array. The
+current task's context-local storage can be read and written from core wasm
+code by calling the [`context.get`] and [`context.set`] built-ins.
+
+The context-local storage array's length is currently fixed to contain exactly
+2 `i32`s with the goal of allowing this array to be stored inline in whatever
+existing runtime data structure is already efficiently reachable from ambient
+compiled wasm code. Because module instantiation is declarative in the
+Component Model, the imported `context.{get,set}` built-ins can be inlined by
+the core wasm compiler as-if they were instructions, allowing the generated
+machine code to be a single load or store. This makes context-local storage a
+good place to store the linear-memory shadow stack pointer as well as the
+pointer to the struct used to implement [thread-local storage] APIs used by
+guest code.
+
+When [memory64] is integrated into the Component Model's Canonical ABI,
+`context.{get,set}` will be backwards-compatibly relaxed to allow `i64`
+pointers (overlaying the `i32` values like hardware 32/64-bit registers). When
+[wasm-gc] is integrated, these integral context values can serve as indices
+into guest-managed tables of typed GC references.
+
+When [threads are added](#interaction-with-multi-threading), each thread will
+also get its own distinct mutable context-local storage array. This is the
+reason why "context-local" storage is not called "task-local" storage (where a
+"context" is a finer-grained unit of execution than either a "task" or a
+"thread").
+
+For details, see [`context.get`] in the AST explainer and [`canon_context_get`]
+in the Canonical ABI explainer.
+
 ### Subtask and Supertask
 
 Each component-to-component call necessarily creates a new task in the callee.
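The per-task state described in this hunk can be modeled in a few lines of Python. This is a minimal sketch, not the Canonical ABI's `canon_context_get`. The names `Task`, `context_get`, `context_set` and `CONTEXT_SLOTS` are hypothetical. Three behaviors are assumptions made for illustration: storage starts zeroed, values are masked to 32 bits, and a bad slot number raises `IndexError`.

```python
CONTEXT_SLOTS = 2  # the hunk fixes the array at exactly 2 i32s


class Task:
    """Hypothetical stand-in for a task; only its context-local storage is modeled."""

    def __init__(self):
        # Assumption: every task starts with zero-initialized slots.
        self.context = [0] * CONTEXT_SLOTS


def context_get(task, i):
    """Model of context.get: read slot i of the current task."""
    if not 0 <= i < CONTEXT_SLOTS:
        raise IndexError(f"context slot {i} out of range")
    return task.context[i]


def context_set(task, i, value):
    """Model of context.set: write slot i of the current task."""
    if not 0 <= i < CONTEXT_SLOTS:
        raise IndexError(f"context slot {i} out of range")
    task.context[i] = value & 0xFFFFFFFF  # keep only the low 32 bits, like an i32


# Each task has its own array, so writes in one task are invisible to another.
a, b = Task(), Task()
context_set(a, 0, 0x1000)  # e.g. a's linear-memory shadow stack pointer
```

The array length is a constant rather than something the guest chooses. As the text explains, that lets an engine place the two slots inline in a structure it can already reach cheaply, so each built-in can compile to a single load or store.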
@@ -322,32 +355,52 @@ maintained for streams and futures by the Canonical ABI.
 When a component asynchronously lowers an import, it is explicitly requesting
 that, if the import blocks, control flow be returned back to the calling task
 so that it can do something else. Similarly, if `stream.read` or `stream.write`
-would block, they return a "blocked" code so that the caller can continue to
-make progress on other things. But eventually, a task will run out of other
-things to do and will need to **wait** for progress on one of the task's
-subtasks, readable stream ends, writable stream ends, readable future ends or
-writable future ends, which are collectively called its **waitables**. While a
-task is waiting on its waitables, the Component Model runtime can switch to
-other running tasks or start new tasks by invoking exports.
-
-The Canonical ABI provides two ways for a task to wait:
-* The task can call the [`task.wait`] built-in to synchronously wait for
-  progress. This is specified in the Canonical ABI by the [`canon_task_wait`]
-  function.
-* The task can specify a `callback` function (in the `canon lift` definition)
-  and return to the event loop to wait for notification of progress by a call
-  to the `callback` function. This is specified in the Canonical ABI by
-  the `opts.callback` case in [`canon_lift`].
+are called asynchronously and would block, they return a "blocked" code so that
+the caller can continue to make progress on other things. But eventually, a
+task will run out of other things to do and will need to **wait** for progress
+on one of the task's subtasks, reads or writes, which are collectively called
+its **waitables**. The Canonical ABI Python represents waitables with the
+[`Waitable`] base class. While a task is waiting, the Component Model runtime
+can switch to other running tasks or start new tasks by invoking exports.
+
+To avoid the O(N) cost of processing an N-ary list of waitables every time a
+task needs to wait (which is the classic performance bottleneck of, e.g., POSIX
+`select()`), the Canonical ABI allows waitables to be maintained in **waitable
+sets** which (like `epoll()`) can be waited upon as a whole for any one of the
+member waitables to make progress. Waitable sets are independent of tasks;
+tasks can wait on different waitable sets over time and a single waitable set
+can be waited upon by multiple tasks at once. Waitable sets are local to a
+component instance and cannot be shared across component boundaries.
+
+The Canonical ABI provides two ways for a task to wait on a waitable set:
+* Core wasm can pass (the index of) the waitable set as a parameter to the
+  [`waitable-set.wait`] built-in which blocks and returns the event that
+  occurred.
+* If the task uses a `callback` function, core wasm can return (the index of)
+  the waitable set as a return value to the event loop, which will block and
+  then pass the event that occurred as a parameter to the `callback`.
 
 While the two approaches have significant runtime implementation differences
 (the former requires [fibers] or a [CPS transform] while the latter only
-requires storing a small `i32` "context" in the task), semantically they do the
-same thing which, in the Canonical ABI Python code, is factored out into
-[`Task`]'s `wait` method. Thus, the difference between `callback` and
-non-`callback` is mostly one of optimization, not expressivity.
-
-The Canonical ABI Python represents waitables with a common [`Waitable`]
-base class.
+requires storing fixed-size context-local storage and [`Task`] state),
+semantically they do the same thing which, in the Canonical ABI Python code, is
+factored out into the [`Task.wait`] method. Thus, the difference between
+`callback` and non-`callback` is one of optimization, not expressivity.
+
+In addition to waiting for an event to occur, a task can also **poll** for
+whether an event has already occurred. Polling does not block, but does allow
+other tasks to be switched to and executed. Polling is opportunistic, allowing
+the servicing of higher-priority events in the middle of longer-running
+computations; when there is nothing left to do, a task must *wait*. A task
+can poll by either calling [`waitable-set.poll`] or, when using a
+`callback`, by returning the Canonical-ABI-defined "poll" code to the event loop
+along with (the index of) the waitable set to poll.
+
+Lastly, if a long-running task wants to allow other tasks to execute, without
+having any of its own subtasks to wait on, it can **yield**, allowing other
+tasks to be scheduled before continuing execution of the current task. A task
+can yield by either calling [`yield`] or, when using a `callback`, by returning
+the Canonical-ABI-defined "yield" code to the event loop.
 
 ### Backpressure
 
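Waitable sets beat `select()` because progress is recorded when it happens, not rediscovered on every wait by scanning all members. The Python sketch below illustrates that idea under simplifying assumptions. It is not the Canonical ABI's `Waitable` class. The names `WaitableSet`, `Waitable`, `join`, `notify`, `poll` and `wait` are hypothetical. The model assumes a waitable belongs to at most one set and holds at most one undelivered event. Because there is no scheduler, `wait` raises instead of switching to another task.

```python
from collections import deque


class WaitableSet:
    """Hypothetical model: a FIFO of member waitables that have a pending event."""

    def __init__(self):
        self.ready = deque()


class Waitable:
    """Hypothetical model of a subtask, read, or write that can report progress."""

    def __init__(self, name):
        self.name = name
        self.wset = None     # assumption: a waitable is in at most one set
        self.pending = None  # assumption: at most one undelivered event

    def join(self, wset):
        """Move this waitable into wset (or out of any set, if wset is None)."""
        if self.wset is not None and self.pending is not None:
            self.wset.ready.remove(self)  # O(n), but joining is rare
        self.wset = wset
        if wset is not None and self.pending is not None:
            wset.ready.append(self)

    def notify(self, event):
        """Called when progress happens: O(1), no scan of the whole set."""
        if self.pending is None and self.wset is not None:
            self.wset.ready.append(self)
        self.pending = event


def poll(wset):
    """Non-blocking: return (name, event) for a ready member, else None."""
    if not wset.ready:
        return None
    w = wset.ready.popleft()
    event, w.pending = w.pending, None
    return (w.name, event)


def wait(wset):
    """A real runtime would switch to other tasks here; this model has no scheduler."""
    result = poll(wset)
    if result is None:
        raise RuntimeError("would block")
    return result
```

Neither `poll` nor `wait` depends on how many waitables have joined the set. The only linear-time operation is moving a waitable that already has a pending event from one set to another.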
@@ -356,16 +409,16 @@ export calls can start piling up, each consuming some of the component's finite
 private resources (like linear memory), requiring the component to be able to
 exert *backpressure* to allow some tasks to finish (and release private
 resources) before admitting new async export calls. To do this, a component may
-call the `task.backpressure` built-in to set a "backpressure" flag that causes
-subsequent export calls to immediately return in the "starting" state without
-calling the component's Core WebAssembly code.
+call the [`backpressure.set`] built-in to set a component-instance-wide
+"backpressure" flag that causes subsequent export calls to immediately return
+in the "starting" state without calling the component's Core WebAssembly code.
 
 Once task enables backpressure, it can [wait](#waiting) for existing tasks to
 finish and release their associated resources. Thus, a task can choose to
 [wait](#waiting) with or without backpressure enabled, depending on whether it
 wants to accept new accept new export calls while waiting or not.
 
-See the [`canon_task_backpressure`] function and [`Task.enter`] method in the
+See the [`canon_backpressure_set`] function and [`Task.enter`] method in the
 Canonical ABI explainer for the setting and implementation of backpressure.
 
 Once a task is allowed to start according to these backpressure rules, its
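Here is a small Python model of the component-instance-wide flag that this hunk introduces. The normative behavior lives in `canon_backpressure_set` and `Task.enter`. Everything below is an assumption made for illustration: the class and method names, the `held` list, and admitting every held call as soon as the flag clears. The model also reports every admitted call as `"started"`, although a real callee may already have returned by the time the caller observes it.

```python
class ComponentInstance:
    """Hypothetical model of the component-instance-wide backpressure flag."""

    def __init__(self):
        self.backpressure = False  # shared by every task in the instance
        self.held = []             # export calls not yet allowed into core wasm
        self.entered = []          # export calls whose core wasm has been invoked

    def backpressure_set(self, enabled):
        self.backpressure = bool(enabled)
        if not self.backpressure:
            # Simplification: admit every held call as soon as the flag clears.
            held, self.held = self.held, []
            self.entered.extend(held)

    def call_export(self, name):
        """Return the state the caller observes right after making the call."""
        if self.backpressure:
            self.held.append(name)
            return "starting"  # core wasm is not called yet
        self.entered.append(name)  # stand-in for running the export's core wasm
        return "started"
```

The flag is stored on the instance, not on a task, so any task that sets it affects every export call that arrives afterwards.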
@@ -415,18 +468,29 @@ replaced with `...` to focus on the overall flow of function calls.
     (import "libc" "mem" (memory 1))
     (import "libc" "realloc" (func (param i32 i32 i32 i32) (result i32)))
     (import "" "fetch" (func $fetch (param i32 i32) (result i32)))
+    (import "" "waitable-set.new" (func $new_waitable_set (result i32)))
+    (import "" "waitable-set.wait" (func $wait (param i32 i32) (result i32)))
+    (import "" "waitable.join" (func $join (param i32 i32)))
     (import "" "task.return" (func $task_return (param i32 i32)))
-    (import "" "task.wait" (func $wait (param i32) (result i32)))
+    (global $wsi (mut i32))
+    (func $start
+      (global.set $wsi (call $new_waitable_set))
+    )
+    (start $start)
     (func (export "summarize") (param i32 i32)
       ...
       loop
         ...
         call $fetch ;; pass a pointer-to-string and pointer-to-list-of-bytes outparam
         ... ;; ... and receive the index of a new async subtask
+        global.get $wsi
+        call $join ;; ... and add it to the waitable set
+        ...
       end
       loop ;; loop as long as there are any subtasks
         ...
-        call $task_wait ;; wait for a subtask to make progress
+        global.get $wsi
+        call $wait ;; wait for a subtask in the waitable set to make progress
         ...
       end
       ...
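The bookkeeping hidden behind the `...` lines around `call $wait` can be sketched in Python. The function name and the `(subtask_index, state)` event shape are hypothetical. The real event encoding returned by `waitable-set.wait` is defined in the Canonical ABI explainer. The state names follow the "starting" / "started" / "returned" progression described for this example.

```python
def summarize_wait_loop(next_event, num_subtasks):
    """Hypothetical model of the second `loop` in `summarize`.

    next_event() stands in for `global.get $wsi` + `call $wait` and returns a
    (subtask_index, state) pair; subtasks are numbered 0..num_subtasks-1.
    """
    remaining = set(range(num_subtasks))  # subtasks that have not returned yet
    args_reclaimed = set()                # argument memory that may be freed
    results_ready = []                    # outparams that may now be read
    while remaining:                      # "loop as long as there are any subtasks"
        index, state = next_event()
        if state in ("started", "returned"):
            args_reclaimed.add(index)     # the callee has read its arguments
        if state == "returned":
            results_ready.append(index)   # results were written to the outparam
            remaining.discard(index)
    return args_reclaimed, results_ready
```

A subtask can be reported as "returned" without a separate "started" event first. For that reason the sketch also frees argument memory when it sees "returned".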
@@ -438,14 +502,18 @@ replaced with `...` to focus on the overall flow of function calls.
   (alias $libc "mem" (core memory $mem))
   (alias $libc "realloc" (core func $realloc))
   (canon lower $fetch async (memory $mem) (realloc $realloc) (core func $fetch'))
+  (canon waitable-set.new (core func $new))
+  (canon waitable-set.wait async (memory $mem) (core func $wait))
+  (canon waitable.join (core func $join))
   (canon task.return (result string) async (memory $mem) (realloc $realloc) (core func $task_return))
-  (canon task.wait async (memory $mem) (core func $task_wait))
   (core instance $main (instantiate $Main (with "" (instance
     (export "mem" (memory $mem))
     (export "realloc" (func $realloc))
     (export "fetch" (func $fetch'))
+    (export "waitable-set.new" (func $new))
+    (export "waitable-set.wait" (func $wait))
+    (export "waitable.join" (func $join))
     (export "task.return" (func $task_return))
-    (export "task.wait" (func $task_wait))
   ))))
   (canon lift (core func $main "summarize")
     async (memory $mem) (realloc $realloc)
@@ -456,25 +524,21 @@ replaced with `...` to focus on the overall flow of function calls.
 Because the imported `fetch` function is `canon lower`ed with `async`, its core
 function type (shown in the first import of `$Main`) takes pointers to the
 parameter and results (which are asynchronously read-from and written-to) and
-returns the index of a new subtask. `summarize` calls `task.wait` repeatedly
-until all `fetch` subtasks have finished, noting that `task.wait` can return
-intermediate progress (as subtasks transition from "starting" to "started" to
-"returned") which tell the surrounding core wasm code that it can reclaim the
-memory passed arguments or use the results that have now been written to the
-outparam memory.
+returns the index of a new subtask. `summarize` calls `waitable-set.wait`
+repeatedly until all `fetch` subtasks have finished, noting that
+`waitable-set.wait` can return intermediate progress (as subtasks transition
+from "starting" to "started" to "returned") which tell the surrounding core
+wasm code that it can reclaim the memory passed arguments or use the results
+that have now been written to the outparam memory.
 
 Because the `summarize` function is `canon lift`ed with `async`, its core
-function type has no results, since results are passed out via `task.return`.
-It also means that multiple `summarize` calls can be active at once: once the
-first call to `task.wait` blocks, the runtime will suspend its callstack
+function type has no results; results are passed out via `task.return`. It also
+means that multiple `summarize` calls can be active at once: once the first
+call to `waitable-set.wait` blocks, the runtime will suspend its callstack
 (fiber) and start a new stack for the new call to `summarize`. Thus,
 `summarize` must be careful to allocate a separate linear-memory stack in its
-entry point, if one is needed, and to save and restore this before and after
-calling `task.wait`.
-
-(Note that, for brevity this example ignores the `memory` and `realloc`
-immediates required by `canon lift` and `canon lower` to allocate the `list`
-param and `string` result, resp.)
+entry point and store it in context-local storage (via `context.set`) instead
+of simply using a `global`, as in a synchronous function.
 
 This same example can be re-written to use the `callback` immediate (thereby
 avoiding the need for fibers) as follows. Note that the internal structure of
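A small Python simulation shows why a plain `global` breaks once several `summarize` calls can be suspended at the same time. Generators stand in for fibers, and a two-element list stands in for a task's context-local storage. `STACK_SIZE`, the stack layout, and the resume order are hypothetical choices for the demonstration.

```python
STACK_SIZE = 0x1000  # hypothetical size of each call's linear-memory stack


def summarize(storage, name, stack_base, log):
    """One `summarize` call as a generator; each `yield` models a blocking wait."""
    storage[0] = stack_base + STACK_SIZE  # like `context.set` (or `global.set`)
    yield                                 # suspended: another call may run now
    log.append((name, storage[0]))        # stack pointer seen after resuming


def run_two_calls(use_global):
    log = []
    shared = [0, 0]  # a single `global`, shared by every call
    fibers = []
    for i, name in enumerate(["a", "b"]):
        storage = shared if use_global else [0, 0]  # per-task context storage
        fibers.append(summarize(storage, name, i * STACK_SIZE, log))
    for f in fibers:            # start both calls; each blocks at its first wait
        next(f)
    for f in reversed(fibers):  # resume them in the opposite order
        next(f, None)           # the call finishes, so the default None is returned
    return log
```

With per-task storage, each call gets back its own stack pointer when it resumes. With a shared `global`, call "a" resumes holding the pointer that call "b" wrote while "a" was suspended.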
@@ -495,37 +559,55 @@ not externally-visible behavior.
     (import "libc" "mem" (memory 1))
     (import "libc" "realloc" (func (param i32 i32 i32 i32) (result i32)))
     (import "" "fetch" (func $fetch (param i32 i32) (result i32)))
+    (import "" "waitable-set.new" (func $new_waitable_set (result i32)))
+    (import "" "waitable.join" (func $join (param i32 i32)))
     (import "" "task.return" (func $task_return (param i32 i32)))
+    (global $wsi (mut i32))
+    (func $start
+      (global.set $wsi (call $new_waitable_set))
+    )
+    (start $start)
     (func (export "summarize") (param i32 i32) (result i32)
       ...
       loop
         ...
         call $fetch ;; pass a pointer-to-string and pointer-to-list-of-bytes outparam
         ... ;; ... and receive the index of a new async subtask
+        global.get $wsi
+        call $join ;; ... and add it to the waitable set
+        ...
       end
-      ... ;; return a non-zero "cx" value passed to the next call to "cb"
+      (i32.or ;; return (WAIT | ($wsi << 4))
+        (i32.const 2) ;; 2 -> WAIT
+        (i32.shl
+          (global.get $wsi)
+          (i32.const 4)))
     )
-    (func (export "cb") (param $cx i32) (param $event i32) (param $p1 i32) (param $p2 i32)
+    (func (export "cb") (param $event i32) (param $p1 i32) (param $p2 i32)
       ...
-      if ... subtasks remain ...
-        get_local $cx
-        return ;; wait for another subtask to make progress
+      if (result i32) ;; if subtasks remain:
+        i32.const 2 ;; return WAIT
+      else ;; if no subtasks remain:
+        ...
+        call $task_return ;; return the string result (pointer,length)
+        ...
+        i32.const 0 ;; return EXIT
       end
-      ...
-      call $task_return ;; return the string result (pointer,length)
-      ...
-      i32.const 0 ;; return zero to signal that this task is done
     )
   )
   (core instance $libc (instantiate $Libc))
   (alias $libc "mem" (core memory $mem))
   (alias $libc "realloc" (core func $realloc))
   (canon lower $fetch async (memory $mem) (realloc $realloc) (core func $fetch'))
+  (canon waitable-set.new (core func $new))
+  (canon waitable.join (core func $join))
   (canon task.return (result string) async (memory $mem) (realloc $realloc) (core func $task_return))
   (core instance $main (instantiate $Main (with "" (instance
     (export "mem" (memory $mem))
     (export "realloc" (func $realloc))
     (export "fetch" (func $fetch'))
+    (export "waitable-set.new" (func $new))
+    (export "waitable.join" (func $join))
     (export "task.return" (func $task_return))
   ))))
   (canon lift (core func $main "summarize")
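The `callback` form inverts control: core wasm returns a packed `i32`, and the runtime's event loop does the waiting on its behalf. The Python sketch below models such a driver. It handles only the two codes that appear in this example, 0 (EXIT) and 2 (WAIT). The waitable set index sits above the low 4 bits, matching the `i32.or` / `i32.shl` expression. The poll and yield codes are left out because this diff does not show their numeric values. `run_callback_loop` and its three callable parameters are hypothetical.

```python
EXIT, WAIT = 0, 2  # the two callback codes used in the example above


def run_callback_loop(start, cb, wait):
    """Hypothetical driver for a `callback`-lifted export.

    start()           -> packed i32 returned by the initial core wasm call
    cb(event, p1, p2) -> packed i32 returned by each call to the callback
    wait(set_index)   -> (event, p1, p2) for the next event in that waitable set
    """
    packed = start()
    while True:
        code, set_index = packed & 0xF, packed >> 4  # low 4 bits hold the code
        if code == EXIT:
            return
        if code != WAIT:
            raise NotImplementedError(f"callback code {code} is not modeled here")
        event, p1, p2 = wait(set_index)  # the runtime blocks here, not core wasm
        packed = cb(event, p1, p2)
```

Core wasm never blocks in this model: it keeps no stack between events, and the loop's only state is the packed value it last received. That is why the `callback` form needs no fibers.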
@@ -534,6 +616,9 @@ not externally-visible behavior.
     (export "summarize" (func $summarize))
   )
 ```
+For an explanation of the bitpacking of the `i32` callback return value,
+see [`unpack_callback_result`] in the Canonical ABI explainer.
+
 While this example spawns all the subtasks in the initial call to `summarize`,
 subtasks can also be spawned from `cb` (even after the call to `task.return`).
 It's also possible for `summarize` to call `task.return` called eagerly in the
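The bitpacking in the callback example can be written as a pair of Python helpers. `pack_cb_result` and `unpack_cb_result` are hypothetical names. They are not the Canonical ABI's `unpack_callback_result`, which remains the normative definition. The 4-bit code field and the 28-bit index field are inferred from `(i32.shl ... (i32.const 4))` in the example, not taken from the explainer.

```python
CODE_BITS = 4  # the example shifts the waitable set index left by 4


def pack_cb_result(code, set_index):
    """Same computation as (i32.or code (i32.shl set_index 4)) in the example."""
    if not 0 <= code < (1 << CODE_BITS):
        raise ValueError("code must fit in the low 4 bits")
    if not 0 <= set_index < (1 << (32 - CODE_BITS)):
        raise ValueError("waitable set index must fit in the upper 28 bits")
    return code | (set_index << CODE_BITS)


def unpack_cb_result(packed):
    """Split a packed i32 back into (code, waitable_set_index)."""
    packed &= 0xFFFFFFFF  # reinterpret a possibly negative i32 as unsigned
    return packed & ((1 << CODE_BITS) - 1), packed >> CODE_BITS
```

For example, WAIT (2) on waitable set index 3 packs to 50, and unpacking 50 gives back (2, 3).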
@@ -623,25 +708,33 @@ comes after:
 [Event Loop]: https://en.wikipedia.org/wiki/Event_loop
 [Structured Concurrency]: https://en.wikipedia.org/wiki/Structured_concurrency
 [Unit]: https://en.wikipedia.org/wiki/Unit_type
+[Thread-local Storage]: https://en.wikipedia.org/wiki/Thread-local_storage
 
 [AST Explainer]: Explainer.md
 [Lift and Lower Definitions]: Explainer.md#canonical-definitions
 [Lifted]: Explainer.md#canonical-definitions
 [Canonical Built-in]: Explainer.md#canonical-built-ins
+[`context.get`]: Explainer.md#-contextget
+[`context.set`]: Explainer.md#-contextset
+[`backpressure.set`]: Explainer.md#-backpressureset
 [`task.return`]: Explainer.md#-taskreturn
-[`task.wait`]: Explainer.md#-taskwait
+[`yield`]: Explainer.md#-yield
+[`waitable-set.wait`]: Explainer.md#-waitable-setwait
+[`waitable-set.poll`]: Explainer.md#-waitable-setpoll
 [`thread.spawn`]: Explainer.md#-threadspawn
 [ESM-integration]: Explainer.md#ESM-integration
 
 [Canonical ABI Explainer]: CanonicalABI.md
 [`canon_lift`]: CanonicalABI.md#canon-lift
-[`canon_lift`]: CanonicalABI.md#canon-lift
+[`unpack_callback_result`]: CanonicalABI.md#canon-lift
 [`canon_lower`]: CanonicalABI.md#canon-lower
-[`canon_task_wait`]: CanonicalABI.md#-canon-taskwait
-[`canon_task_backpressure`]: CanonicalABI.md#-canon-taskbackpressure
+[`canon_context_get`]: CanonicalABI.md#-canon-contextget
+[`canon_backpressure_set`]: CanonicalABI.md#-canon-backpressureset
+[`canon_waitable_set_wait`]: CanonicalABI.md#-canon-waitable-setwait
 [`canon_task_return`]: CanonicalABI.md#-canon-taskreturn
 [`Task`]: CanonicalABI.md#task-state
 [`Task.enter`]: CanonicalABI.md#task-state
+[`Task.wait`]: CanonicalABI.md#task-state
 [`Waitable`]: CanonicalABI.md#waitable-state
 [`Subtask`]: CanonicalABI.md#subtask-state
 [Stream State]: CanonicalABI.md#stream-state
@@ -657,6 +750,8 @@ comes after:
 [stack-switching]: https://github.com/WebAssembly/stack-switching/
 [JSPI]: https://github.com/WebAssembly/js-promise-integration/
 [shared-everything-threads]: https://github.com/webAssembly/shared-everything-threads
+[memory64]: https://github.com/webAssembly/memory64
+[wasm-gc]: https://github.com/WebAssembly/gc/blob/main/proposals/gc/MVP.md
 
 [WASI Preview 3]: https://github.com/WebAssembly/WASI/tree/main/wasip2#looking-forward-to-preview-3
 [`wasi:http/handler.handle`]: https://github.com/WebAssembly/wasi-http/blob/main/wit-0.3.0-draft/handler.wit