How to decode potential JavaScript #7

Open
annevk opened this issue Jan 11, 2021 · 6 comments

Owner

annevk commented Jan 11, 2021

We might not always have an encoding, e.g., fetch(..., { mode: "no-cors" }). Is it reasonable to always use UTF-8 for this check?

Owner Author

annevk commented Jan 22, 2021

Looking at this again, and in particular at https://html.spec.whatwg.org/#fetch-a-classic-script, I think the simplest option here is to pass the encoding along with the request. We would then need to abstract or duplicate these steps (and maybe improve them while we're at it, especially getting the charset parameter from the Content-Type header):

  1. If response's Content Type metadata, if any, specifies a character encoding, and the user agent supports that encoding, then set character encoding to that encoding (ignoring the passed-in value).
  2. Let source text be the result of decoding response's body to Unicode, using character encoding as the fallback encoding.
  3. Let script be the result of creating a classic script given source text, settings object, response's url, options, and muted errors.

And then, if script's record is null, parsing failed.

@domenic does that seem right to you?
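
For illustration, the three quoted steps could be sketched in Node.js roughly as follows. This is a sketch under stated assumptions, not the spec algorithm: `charsetFromContentType`, `isSupportedEncoding`, `sniffBom`, and `decodePotentialScript` are hypothetical names, the Content-Type handling is a simplified regex rather than the real MIME type parser, `TextDecoder` support stands in for "the user agent supports that encoding", and compiling with `vm.Script` (without running anything) stands in for "creating a classic script".

```javascript
// Sketch only: approximates the quoted HTML steps in Node.js.
const vm = require("node:vm");

// Hypothetical helper: pull a charset label out of a Content-Type value.
// Simplified on purpose; real MIME type parsing has more edge cases.
function charsetFromContentType(contentType) {
  if (!contentType) return null;
  const match = /;\s*charset\s*=\s*("?)([^";]+)\1/i.exec(contentType);
  return match ? match[2].trim() : null;
}

// Assumption: "supported" means this runtime's TextDecoder accepts the label.
function isSupportedEncoding(label) {
  try {
    new TextDecoder(label);
    return true;
  } catch {
    return false;
  }
}

// A byte order mark takes precedence over the fallback encoding when decoding.
function sniffBom(bytes) {
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "utf-8";
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return "utf-16be";
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return "utf-16le";
  return null;
}

function decodePotentialScript(bodyBytes, contentType, passedInEncoding = "utf-8") {
  // Step 1: a supported charset from Content-Type overrides the passed-in value.
  let encoding = passedInEncoding;
  const charset = charsetFromContentType(contentType);
  if (charset !== null && isSupportedEncoding(charset)) encoding = charset;

  // Step 2: decode the body, using `encoding` only as the fallback.
  const decoder = new TextDecoder(sniffBom(bodyBytes) ?? encoding);
  const sourceText = decoder.decode(bodyBytes);

  // Step 3 (approximation): compile as a script without executing it.
  // A thrown error plays the role of "script's record is null".
  let parsed = true;
  try {
    new vm.Script(sourceText);
  } catch {
    parsed = false;
  }
  return { sourceText, parsed };
}

// Usage: a bare JSON object is not a valid script, so parsing fails.
const body = new TextEncoder().encode('{"a":1}');
console.log(decodePotentialScript(body, "application/json").parsed); // false
```

The passed-in encoding defaults to UTF-8 here only to mirror the question in the first comment; where that value should come from is exactly what this issue is about.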


domenic commented Jan 22, 2021

I don't have the full context on what security guarantees we're trying to preserve here (is it bad to leak information about the Content-Type header?) but in terms of a spec refactoring, that seems reasonable.


domenic commented Jan 22, 2021

> and maybe improve them while we're at it, especially getting the charset parameter from the Content-Type header

Basically every usage of "Content-Type metadata" in HTML could be improved by using the new MIME type getter, I think.

@annevk annevk mentioned this issue May 17, 2021
Owner Author

annevk commented Oct 4, 2021

One risk here is that the attacker has control over the encoding, so this technically gives them more opportunity to find a way to get something parsed as JavaScript. In practice it still seems hard to get something parsed as JavaScript that way, as the majority of significant bytes are in the ASCII range.
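
A small sketch of the ASCII point, assuming a runtime with `TextDecoder` (for example Node.js): bytes in the ASCII range decode to the same text under ASCII-compatible encodings, so switching between those does not change the syntax-relevant characters. The labels and the `decodeWith` helper are arbitrary, hypothetical choices for illustration; labels the runtime does not support are skipped, and UTF-16LE is included as an example of an encoding that is not ASCII-compatible.

```javascript
// Sketch: decode the same ASCII-only bytes under several fallback encodings.
const asciiCompatible = ["utf-8", "windows-1252", "iso-8859-2"];

// Hypothetical helper: returns null when the runtime lacks the encoding.
function decodeWith(label, bytes) {
  try {
    return new TextDecoder(label).decode(bytes);
  } catch {
    return null;
  }
}

const source = "alert(1);";
const bytes = new TextEncoder().encode(source); // ASCII-only bytes

const results = new Map();
for (const label of [...asciiCompatible, "utf-16le"]) {
  const text = decodeWith(label, bytes);
  if (text !== null) results.set(label, text);
}

// Print each decoding and whether it still equals the original source text.
for (const [label, text] of results) {
  console.log(label, JSON.stringify(text), text === source);
}
```

This only illustrates the ASCII-compatible case; it says nothing about what an attacker might achieve with non-ASCII bytes or with encodings such as UTF-16.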

@annevk annevk added the mvp label May 17, 2022
Owner Author

annevk commented May 17, 2022

I included a fix for this in whatwg/fetch#1442 which I think works. The HTML side will need to set it on requests, but that's a very straightforward change.

And while it is unfortunate that the fallback encoding is in the hands of the attacker, this is no different from the status quo.

Owner Author

annevk commented Jun 1, 2022

I forgot that the response itself also carries encoding-related information. whatwg/fetch#1447 tackles the first part of that. Once that lands it should be easy to call from Fetch's ORB PR.
