add Wit interfaces #3

Merged (12 commits) on Mar 4, 2023

Conversation

lukewagner (Member)

This PR proposes some initial Wit interfaces and a wasi:http/proxy world that uses them, mostly a synthesis of the existing proposals. As the proposal develops, I expect us to add "bigger" worlds than wasi:http/proxy (either here or in other proposals), but wasi:http/proxy seems like a good minimal, concrete starting point. There's still a lot more to fill in to make this a proper WASI proposal, but perhaps this is enough for us to chew on for now. Lastly, I still consider this very much a draft, so I'm happy to iterate with folks in this PR.

(cc @PiotrSikora @brendandburns @eduardomourar)

@Mossaka (Collaborator) left a comment

Hey Luke, thanks for the PR! This looks great to me!

// `wasi:http/outgoing-handler` into a single `wasi:http/handler` interface
// that takes a `request` parameter and returns a `response` result.
//
default interface incoming-handler {
@brendandburns (Contributor), Feb 17, 2023

Do we want to split this out into separate PRs? I feel like it's going to be a heavy lift to do both client and server implementations in a single PR.

Of course, if it's ok to only have a partial implementation, that works too.

@brendandburns (Contributor)

This looks great for a start. I had a couple of questions:

The first: is it ok to do a partial implementation of the spec when we start adding it to wasmtime? If not, I worry that this is so comprehensive that it will be a large change to get landed.

The second: I'm expecting this API will adapt and change as we (and others) gain experience with it. Thus my review is "is it good enough?" not "is it perfect?" I hope that works.

@lukewagner (Member, Author)

@brendandburns Thanks for the review and good questions.

The first: is it ok to do a partial implementation of the spec when we start adding it to wasmtime?

I think it's pretty normal for Wasmtime to land implementations incrementally and so adding a partial implementation makes sense to me, but I'd be interested to hear from @pchickey or @alexcrichton. It might be useful to open an early issue to discuss the rough shape of the end-state design, though, since I think by now we have a few distinct Wasmtime embeddings that are pretty interested in reusing a common shared HTTP implementation.

Thus my review is "is it good enough?" not "is it perfect?" I hope that works.

Yes, that is totally the appropriate level of review at this early stage. Iterating on the design in response to implementation feedback is an essential and expected part of both the core wasm and WASI phases process.

@alexcrichton (Contributor)

Yes landing incrementally in Wasmtime is ok. The main threshold for landing an experimental feature in Wasmtime is "it doesn't affect anything else or otherwise has consensus about how it should be designed". The consensus part is nontrivial to acquire but I suspect an implementation here wouldn't affect other pieces so consensus shouldn't be necessary.

@brendandburns (Contributor)

Any chance we can merge and iterate here? Looking through the comments, it feels like the major questions (e.g. timeouts) have been resolved, and while minor adjustments remain, those are the kind of thing we can adapt as we implement (and indeed, watching users actually use a specific implementation may help guide the design).

@lukewagner (Member, Author)

Yeah, I was just thinking the same thing; I'll merge later today if there are no objections.


/// Describes messages indicating serious errors.
error,
}
Collaborator

This is probably not the right PR, but should we have a higher log level (e.g. critical, for messages preceding a panic, etc.)?

Member Author

Yep, that's right, wasi-logging would be the repo to file that under.

Member

I filed WebAssembly/wasi-logging#9 to track this.

wit/types.wit Outdated
// standard Contents. With Preview3, all of these fields can be replaced by a
// single type definition:
//
// type body = stream<u8, option<trailers>>
Collaborator

Trailers shouldn't be part of the body; they are a separate thing (think: trailing headers).

Member Author

Initially I was going to simply have a method returning a (pseudo) future<trailers> (as we discussed a while back), but then I realized that this introduces a tricky question: if I poll_oneoff the trailers without concurrently consuming the body stream, does that (1) deadlock (because we're not consuming the body, there is backpressure, so we never get the trailers), or (2) implicitly discard/ignore the body (so that we can get to the trailers)? Putting the trailers as the "final value" of the stream avoids this design problem. But agreed that "body" is a misnomer if the type includes trailers. What do you think of this alternative naming that keeps the same API shape but avoids calling the composite "body"?
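
For illustration, a minimal Rust sketch (not part of the proposal) of the "trailers as the stream's final value" shape, with std types standing in for the Preview 3 stream<u8, option<trailers>>; the point is that the trailers only become reachable once the body has been drained, so the poll-ordering question never arises:

// Hypothetical stand-in for stream<u8, option<trailers>>: body chunks,
// then one terminal item carrying the optional trailers.
enum BodyItem {
    Chunk(Vec<u8>),
    Finished(Option<Vec<(String, String)>>),
}

// Consuming the stream: trailers are only reachable after every body chunk
// has been read, so there is no way to wait on trailers while leaving the
// body (and its backpressure) unconsumed.
fn drain(body: impl Iterator<Item = BodyItem>) -> Option<Vec<(String, String)>> {
    let mut trailers = None;
    for item in body {
        match item {
            BodyItem::Chunk(bytes) => {
                // process body bytes here
                let _ = bytes;
            }
            BodyItem::Finished(t) => trailers = t,
        }
    }
    trailers
}

fn main() {
    let body = vec![
        BodyItem::Chunk(b"hello".to_vec()),
        BodyItem::Finished(Some(vec![("grpc-status".to_string(), "0".to_string())])),
    ];
    println!("{:?}", drain(body.into_iter()));
}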

request: outgoing-request,
options: option<request-options>
) -> future-incoming-response
}
Collaborator

I find those pairings (incoming-request and outgoing-response, and outgoing-request and incoming-response) very non-intuitive and a bit redundant.

Could we consider using the same prefix for both messages on the same connection, e.g. downstream-request and downstream-response, and upstream-request and upstream-response?

Contributor

RFC 9110 suggests inbound and outbound as the preferred terms here, and defines upstream and downstream differently than you are using them: https://www.rfc-editor.org/rfc/rfc9110#name-intermediaries

Collaborator

Right, but the popular implementations (e.g. NGINX and Envoy) disagree with the RFC about upstream and downstream, and I'm not aware of anything that actually uses the RFC terms.

I considered suggesting inbound and outbound, but that's yet another pair of terms, and it's actually used in service mesh environments to describe traffic direction in a global sense, not from the perspective of a local endpoint.

Contributor

Our experience at Fastly was that upstream and downstream were so confusing, both internally and to our customers, that we stopped using those terms throughout our API and docs.

Collaborator

What did you end up replacing it with? I'm not a fan of downstream / upstream myself (but mostly because the RFCs disagree with implementations), so I'm fine with other terms.

Member Author

Agreed that upstream/downstream is extremely confusing in practice (as evidenced by my own misuse of it below). I saw incoming/outgoing in one of the other proposals and it seemed clearer when you think of it from the component author's perspective. That being said, note that all the incoming-/outgoing- prefixes would go away in Preview3 when we get to merge all these resources/interfaces.

Contributor

Most programming languages have standardized on request/response without incoming/outgoing, because the context is pretty obvious from whether you are implementing a server or making a client call, and programmers don't really get confused about that.

I think we should delete incoming/outgoing or downstream/upstream and just stick to request/response with the context providing the direction.

Member Author

That is definitely the goal with Preview 3. However, in a Preview 2 timeframe, where we want to do concurrent streaming of requests/responses and have blocking poll_oneoff, the concrete signatures of the request/response resources have to be different depending on whether you are reading from them or writing to them, so unfortunately we can't wait and infer the direction when the request or response hits a call boundary. I do believe it's possible to implement source-language libraries which are direction-agnostic (e.g., JS Response) in terms of these directional resource types, though, so that this is abstracted from the developer (e.g., when constructing a Response, the impl would internally call new-outgoing-response...). Avoiding this extra work and complexity is one of the main motivations for adding future/stream (and the stack-switching required to implement these in the underlying implementation).
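
As an illustration of that last point, here is a hypothetical Rust sketch of a direction-agnostic Response wrapper over the directional resources; the handle types and new_outgoing_response are stand-ins for whatever the generated bindings end up being called:

// Hypothetical stand-ins for the directional resources and their constructors;
// the real names come from the Wit bindings and will differ.
struct IncomingResponse(u32);
struct OutgoingResponse(u32);
type Headers = Vec<(String, String)>;

fn new_outgoing_response(_status: u16, _headers: &Headers) -> OutgoingResponse {
    OutgoingResponse(0) // stub
}

// A single, direction-agnostic Response type for library users, wrapping
// whichever directional resource applies.
enum Inner {
    Incoming(IncomingResponse),
    Outgoing(OutgoingResponse),
}

pub struct Response {
    inner: Inner,
}

impl Response {
    // Constructing a response in guest code creates the outgoing resource.
    pub fn new(status: u16, headers: Headers) -> Response {
        Response { inner: Inner::Outgoing(new_outgoing_response(status, &headers)) }
    }

    // A response handed to the guest by the host wraps the incoming resource.
    pub fn from_incoming(handle: IncomingResponse) -> Response {
        Response { inner: Inner::Incoming(handle) }
    }
}

fn main() {
    let _constructed = Response::new(200, vec![("content-type".into(), "text/plain".into())]);
    let _received = Response::from_incoming(IncomingResponse(1));
}

The developer only ever sees Response; the library picks the appropriate directional resource internally.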

wit/proxy.wit Outdated
// This is the default handler to use when user code simply wants to make an
// HTTP request (e.g., via `fetch()`) but doesn't otherwise specify a
// particular handler.
import default-upstream-HTTP: pkg.outgoing-handler
Collaborator

The HTTP world is already overloaded with terms regarding direction, with upstream/downstream often meaning opposite things in different documents and software. Could we avoid using multiple terms to describe the same thing in this spec? Maybe let's stick to upstream/downstream, so this should be changed to:
import default-upstream-HTTP: pkg.upstream-handler

Member Author

Yep, my mistake mixing and then inverting terminologies. Fixed.

protocol-error(string),
status-error(u16),
unexpected-error(string)
}
Collaborator

This is kind of a weird enum, since it mixes status codes with non-HTTP errors. What's the intent here?

Also, the list is pretty limited. Perhaps we should include errors from the Proxy-Status RFC (i.e. https://www.iana.org/assignments/http-proxy-status/http-proxy-status.xhtml)?

Member Author

Good question; I mostly just lifted this from one of the other proposals. Aligning with HTTP semantically-defined errors sounds generally good, but don't we already get that from the status-code? Perhaps this is something best hashed out in a separate issue. For now, I've added a TODO to the type definition.

Contributor

I think we should get rid of status-error in general. From the client's perspective, if the HTTP communication worked, that's not an error; obviously the client surfaces the received status code up to the caller, but it's up to the caller to decide whether it is an error.

A classic example of this is whether 404 is an error or not. It really depends on the calling context, and if you force a programmer to do something special when they actually expect a 404 then it becomes problematic.

Member Author

That makes sense to me.

Member Author

Filed #5 for the more general question.

wit/types.wit Outdated
incoming-request-scheme: func(request: incoming-request) -> option<scheme>
incoming-request-authority: func(request: incoming-request) -> string
incoming-request-headers: func(request: incoming-request) -> headers
incoming-request-consume: func(request: incoming-request) -> result<incoming-body>
Collaborator

The body is optional, and it's absent in well over 90% of requests.

However, it seems that this API assumes it's always present (presumably, a missing body is represented as an empty stream with EOF, but I don't think that's a good choice), and it requires calls to consume and finish to get trailers, which adds a lot of unnecessary overhead.

Member Author

Ah, makes sense. So would a solution be to have consume return a result containing a 3-case variant such as:

variant {
  body(incoming-stream),
  trailers(trailers),
  none
}

(and similarly for write, but outgoing)?

Member Author

Filed #6 to consider further.

@lukewagner (Member, Author)

Thanks for the great feedback, Piotr; you're mostly who I was hoping to hear from before merging, so I'll merge once we resolve these new comments.

@brendandburns (Contributor) commented Mar 3, 2023

@lukewagner so I started to implement this, and I discovered two things:

  1. The way it is currently written, you are required to implement the inbound HTTP handler, even if you only want to do outbound HTTP calls.

This code won't compile:

#include "proxy.h"
#include <stdio.h>

int main() {
    default_outgoing_http_outgoing_request_t req;
    default_outgoing_http_request_options_t opts;
    default_outgoing_http_future_incoming_response_t res;

    res = default_outgoing_http_handle(req, &opts);
    printf("FOO: %d\n", res);
}

But this code does:

#include "proxy.h"
#include <stdio.h>

void http_handle(uint32_t arg, uint32_t arg0) {

}


int main() {
    default_outgoing_http_outgoing_request_t req;
    default_outgoing_http_request_options_t opts;
    default_outgoing_http_future_incoming_response_t res;

    res = default_outgoing_http_handle(req, &opts);
    printf("FOO: %d\n", res);
}

That seems problematic.

  2. Also, it seems that linking component model code into wasmtime via the bindgen macro isn't quite working, since the existing linking code uses wasmtime::Linker and the bindgen! macro uses wasmtime::component::Linker.

Thoughts on how to proceed would be useful/interesting.

Thanks!

@Mossaka (Collaborator) commented Mar 4, 2023

Also, it seems that linking component model code into wasmtime via the bindgen macro isn't quite working, since the existing linking code uses wasmtime::Linker and the bindgen! macro uses wasmtime::component::Linker.

@brendandburns can you try using the component-model feature of wasmtime? That will allow you to use wasmtime::component::Linker. You can do that by adding this to your Cargo.toml:

wasmtime = { workspace = true, features = ["component-model"] }
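
As context, a minimal sketch (not this PR's implementation, just an illustration) of loading and instantiating a component with wasmtime::component::Linker once that feature is enabled; the file name app.component.wasm and the unit host state are placeholders:

use wasmtime::component::{Component, Linker};
use wasmtime::{Config, Engine, Store};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The component model must be enabled on the Engine's Config for
    // wasmtime::component::* to be usable.
    let mut config = Config::new();
    config.wasm_component_model(true);
    let engine = Engine::new(&config)?;

    // Load a component (not a core module) and link it.
    let component = Component::from_file(&engine, "app.component.wasm")?;
    let linker = Linker::new(&engine);
    // Host implementations of the world's imports would be added to `linker` here.

    let mut store = Store::new(&engine, ());
    let _instance = linker.instantiate(&mut store, &component)?;
    Ok(())
}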

The way it is currently written, you are required to implement the inbound HTTP handler, even if you only want to do outbound HTTP calls.

I want to hear @lukewagner's opinion on this, but it seems to me that this is a good motivation for the "optional imports and exports" idea I've been hearing about.

@brendandburns (Contributor)

@Mossaka the existing code paths for the wasmtime binary do not make use of the component model linker, and when I tried to instantiate a component model linker (wasmtime::component::Linker) pointed at the same Engine the existing linker was using, my wasm couldn't find the function to link to.

It's possible that there is some name mangling or something going on, but it's hard to debug. I'm happy to share the PR with the implementation if that's useful.

@brendandburns (Contributor) commented Mar 4, 2023

@lukewagner As I've coded against this more, I'm wondering whether we really want to store the request and headers on the host side, rather than just serializing them across as part of the request.

As this is currently written, you cross the guest/host boundary a minimum of 3 times:

  1. Allocate a list of headers
  2. Allocate the Request object (using the headers from 1)
  3. Make the actual request

This doesn't seem ideal to me, as it involves marshalling three different pointers across the boundary. It seems cleaner to just keep everything as strings and only marshal a char* across the boundary once (a sketch of the three crossings follows below).

Thoughts?
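
For concreteness, a hypothetical guest-side sketch of those three crossings; every name below is an illustrative stand-in, not the actual generated bindings:

// Hypothetical stubs standing in for the generated guest bindings; each call
// in main below would cross the guest/host boundary once.
type HeadersHandle = u32;
type RequestHandle = u32;
type FutureResponseHandle = u32;

fn new_fields(_entries: &[(&str, &str)]) -> HeadersHandle { 0 }                       // hypothetical
fn new_outgoing_request(_path: &str, _headers: HeadersHandle) -> RequestHandle { 0 }  // hypothetical
fn handle(_request: RequestHandle) -> FutureResponseHandle { 0 }                      // hypothetical

fn main() {
    // 1: allocate the header list as a host-side resource
    let headers = new_fields(&[("host", "example.com")]);
    // 2: allocate the request as a host-side resource, referencing the headers
    let request = new_outgoing_request("/", headers);
    // 3: actually issue the request
    let _future_response = handle(request);
}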

@danbugs (Collaborator) commented Mar 4, 2023

@Mossaka the existing code paths for the wasmtime binary do not make use of the component model linker, and when I tried to instantiate a component model linker (wasmtime::component::Linker) pointed at the same Engine the existing linker was using, my wasm couldn't find the function to link to.

It's possible that there is some name mangling or something going on, but it's hard to debug. I'm happy to share the PR with the implementation if that's useful.

@brendandburns – If you want to look at a couple of example implementations of other proposals (i.e., wasi-keyvalue, wasi-messaging, and wasi-sql), I recommend checking out the following repos:

~not sure if this helps, but figured I'd drop it here just in case 👍

@danbugs (Collaborator) left a comment

I think this LGTM after Piotr's comments are addressed~ would be good to merge it and further iterate w/ subsequent PRs 👍

@brendandburns (Contributor)

@danbugs I get that I could build my own binary that uses the component linker, but I don't really want to do that. I want this to be lit up in the main wasmtime binary.

It's not good to have lots of fragmented binaries that each implement different WASI specs; we need to centralize them in a single runtime.

@brendandburns (Contributor)

(and just to be clear, I'm a firm +1 on LGTM/merge + iterate)

@lukewagner (Member, Author)

Alright, I'll merge and then file issues for the remaining open questions here, and of course everyone else should feel free to do likewise.

@lukewagner (Member, Author)

And just to follow up on @brendandburns's question above: I think @Mossaka is right that this is related to the issue of optional exports mentioned here. The summary is: before too long, we should be able to mark imports and exports as optional and, until then, perhaps our tooling should simply treat all exports declared in a world as optional (therefore not giving an error if you fail to export one).

That being said, the reason we made world a first-class concept in Wit is so that there could be many of them defined, capturing the heterogeneity of embeddings we're all building here. If a component does not export an HTTP handle method, how does it get invoked by the host? In your example code, the answer is from an exported main() function which could correspond to a standard wasi:cli/main interface export. So this would be an entirely different world from wasi:http/proxy, e.g.:

// app-server.wit
world app-server {
  import default-outgoing-HTTP: pkg.outgoing-handler
  export cli.main
}

This world could be included in the wasi-http proposal alongside the proxy world. What's useful about having these as two different worlds is that it captures the difference between a host that starts and runs a component in a more traditional app-server-y model vs. a host that invokes wasm in response to the arrival of HTTP requests in a more serverless model. (To wit, with the stack suspending/resuming functionality introduced by Preview 3, it should be possible to automatically virtualize each world in terms of the other.)

If anyone's interested, I'm happy to create a new issue to dig into this direction more.

@brendandburns (Contributor)

I have begun an implementation here:

bytecodealliance/wasmtime#5929

It is massively incomplete and also hacky, but it (partially) works for some simple examples.
