Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UTF-8 Invalid sequences for component model strings #224

Closed
rajsite opened this issue Aug 8, 2023 · 2 comments
Closed

UTF-8 Invalid sequences for component model strings #224

rajsite opened this issue Aug 8, 2023 · 2 comments
Labels
question Further information is requested

Comments

@rajsite
Copy link

rajsite commented Aug 8, 2023

Forgive me if I'm missing it, but is there a discussion of how the Unicode UTR 36: UTF-8 Exploits are addressed by the component model strings?

From what I can tell looking at the CanonicalABI it looks like the string lift operation is responsible for validation and trapping on "Unicode Errors".

I'm wondering what guarantees I have as a component author that lowered strings are valid UTF-8 strings from the security perspective of that report. For example, overlong string encodings in the cited UTR 36 document and in the UTF-8 Wikipedia: Invalid Sequences and Error Handling topic are specifically described as being the cause of security issues in web services (a relevant use-case for WASM components) and potentially overlooked by decoders (WASM component authors).

Some concrete questions:

  1. Are there guarantees that can be documented for component authors about strings lowered into a component and errors raised for improperly formatting sequences for strings lifted?

  2. Are there compliance tests for tooling / host runtimes around expected UTF-8 validation (particularly as the security issues there are relevant to server applications of WASM components).

  3. The canon_lower topic has a discussion point on efficient trampolines:

    Since any cross-component call necessarily transits through a statically-known canon_lower+canon_lift call pair, an AOT compiler can fuse canon_lift and canon_lower into a single, efficient trampoline.

    Is there a discussion of the validation expectations of such efficient trampoline optimizations? I'd assume you would still need to run the validation passes associated with a lift on a UTF-8 string to prevent issues like overlong encoding being overlooked.

My goal in the end is to make sure I'm not doing that work twice. If there are strong guarantees clearly described about what validation is done on strings I can skip doing that work or conversely make sure it is done.

@lukewagner
Copy link
Member

Hi, great questions!

  1. Yes. When a string is lowered into your component, the wasm running inside that component can rely on the bytes being valid in the encoding your component specified in the canonopt which would, to your first point, reliably be in the minimal form. And for lifting, invalid encodings are precisely defined to produce a fatal wasm traps that halt execution (so there's no question of what the lowering side receives in case of invalid input).
  2. We don't have a centralized test suite yet, but the intention is to have one in this repo, analogous to what's in the core wasm spec/test directory. Work is started in Initial implementation of resource type validation #192. There's also a bunch of unit tests in wasmtime that we'll want to merge into this repo.
  3. Good point. The expectation is that in this fused trampoline, there is a single-pass loop that both validates and copies. Because import calls are disallowed during this operation (and there are no observable side effects other than through import calls) and memory is not considered observable after a trap (due to the lockdown-on-trap rules), the loop can perform the writes and checks in any order, which should enable efficient vectorization of the loop.

So given all that, core wasm running in the component shouldn't need to validate incoming strings.

@ricochet ricochet added the question Further information is requested label Aug 23, 2023
@sunfishcode
Copy link
Member

The questions here appear to be answered. For any further questions, please file new issues!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants