-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a definition of {{StringContext}} extended attribute. #841
base: main
Are you sure you want to change the base?
Conversation
This overall makes sense to me. I'd like to get @bzbarsky's take on this architecture. In particular, defining the extended attribute here, and having the "validate the string in context" be in HTML, is a clever split; it echoes the "perform a security check" split we already have. But there are other choices., such as defining [StringContext] entirely in HTML or in the Trusted Types spec, and I'd love further opinions on which split is best. |
I think I'd like to see more multi-vendor buy-in on the trusted types spec before we add hooks to IDL for it, honestly... |
That said, the basic principle of defining the extended attribute and where it fits in the processing model in the IDL spec makes sense, as does having the actual checks live in HTML or Trusted Types. The other options sort of involve either monkeypatching IDL or doing the check on the post-IDL level. I have not carefully looked at whether the actual arguments being passed to the checking algorithm make sense in general. |
One thing I would like to understand, though: why is "check on the post-IDL level" not considered desirable here? |
After some conversations with @annevk, we realized that the existing model of running JavaScript in the middle of a DOM operation was a bad idea; w3c/trusted-types#248 walks through the thought process that landed us at this solution as an alternative that should help us ensure that we only run script before, and not during, DOM tree mutation. w3c/trusted-types#248 (comment) is a summary of where we ended up. |
Sure. The question is why it can't run after IDL argument work but before anything else (just like CEReactions run after everything else on methods that have them). |
IDL level is a single place, where the check may be applied effectively via extended attribute (alternatively, multiple places in DOM and HTML would have to be modified separately). Additionally, it solves the issue with the default policy - in short, it's advisable to run TT operations before DOM gets to mutate the tree. |
TT perform data validation, running them after the data was already processed (e.g. inserted into the DOM) in a separate microtask like CEReactions is too late to surface the violation to the developers. |
This was not the case for [CEReactions], right? Why is this case fundamentally different?
As I said above, that does not require doing it as part of IDL argument conversions.
I didn't say to use the same semantics as CEReactions, which is obviously a daft thing to suggest, so I'm not sure why we're discussing that strawman. I asked why we can't have the semantics of "run before any of the operation steps". Sort of like how all CanvasRenderingContext2D methods have a first step of "validate all the doubles". Those are written out by hand in the spec right now, but they could be done via extended attribute (and are implemented that way in Gecko, fwiw). |
That is a possible alternative - in fact this is what we used before. We could have that language in the other specs instead, possibly added via an extended attribute. The move to IDL type mapping is to avoid having type unions and branching on the types in each operation. Additional small advantage to changing this in IDL type conversion is that it aligns with what we implement (and what we think is the best way to implement TT). |
OK. I'd like to understand what the problems are with that.
I'm not quite following this, sorry. Can you explain further what you mean here?
That really depends on how type conversion is implemented and how generic or special-cased it is. For example, the type conversion code may well not have the attribute/operation involved available to it (as trusted types requires), because they are not needed in that code right now. That said, I would really like to avoid diving into implementation specifics, which differ greatly across engines, I suspect, and sort out the conceptual right place to have this live. Now that I think I understand the setup (though I thought that before too!), it seems like the possible places to put this are:
Now I have my own opinions of how these various options compare, but I'd love a summary of what you view as the tradeoffs and what led to choosing option 1 with the "before" behavior, especially if this is already written down somewhere. In terms of author-observable behavior, there are three different behaviors here: "option 1 with 'before'" is one, "option 1 with 'after'" and "option 2" are another, I think (in that they behave identically), and "options 3"/"option 4", which also behave identically. Note that I did read through the two issues listed in the original PR, but there is a lot of back-and-forth in there and it wasn't clear to me exactly why we landed on this outcome. |
Actually, there is an important behavior difference between options 3 and 4 that I just thought of. When other specs invoke the setter or operation directly, not via the JS reflection, option 4 will lead to the values they pass being sanitized while option 3 (or 2 or 1) will not, right? I don't know how common such direct-invoking is for the cases we care about here, but if it happens it would definitely be observable to script via the callbacks, not to mention possibly affecting the sanitization behavior. |
Just to clarify, the intent is that this is valid:
right? But probably not:
per the discussion in #827 ? And in particular, in this case the sanitizer should be invoked if the passed-in thing is not a Node, but not invoked if it is a Node? |
To my understading, previous version required us to use:
and then the setter for
Correct. IIUC doing this "after" https://heycam.github.io/webidl/#es-DOMString step 2 is undesirable, as we'd lose the JS type information. The logic for the sanitization of values is roughly:
so the JS type information needs to be available at the time of the checks (in HTML or TT or CSP).
This seems roughly what the previous iteration did. The downside, apart from the type branching required (we need to preserve the JS type for the sanitization check), was exactly what you point out:
The fact that the invoking the setter from outside the JS bypasses the sanitizing is actually desirable. For example, that allows the HTML parser algorithm to create the DOM elements without needing to carve out the exception for it (we are guarding e.g. innerHTML setter, but when the string actually gets allowed to be parsed - or is parsed after navigation - arbitrary nodes with arbitrary attributes are allowed). In other words, Trusted Type try to prevent the JS code from interacting with risky APIs, not the browser algorithms.
The intended behavior is that the sanitization logic gets access to the JS type used, for the value to skip the sanitization ste). The simplest way I see it to hook that as early as possible, and convert to the IDL type later.
I don't think there's something fundamentally wrong with using type unions in IDL definitions for our purpose - the annotated type route was taken as a preferred option for simplicity (one place to change things, and upstream specs remain to only see DOMStrings). Perhaps @domenic can weigh in here.
Thanks a lot, and sorry for looping you in that late into the discussion. Some of the them were triggered internally, so it's indeed not clear from GitHub alone why certain approaches were taken. I also am definitely not an expert in IDL, and I learn as I go. Thank you for taking your time to review this.
Good point. I didn't consider this case, but indeed it would be fine for the sanitizer to be skipped if the extra type was included in the IDL. In the case of |
Thank you, that is very helpful. So the key is that there is still a concept of "already validated" on the caller side (which was not clear to me from the above) and that we want to only do validation if the thing we have is not already validated. OK, that makes things much clearer.
OK, I see. One thing that's not entirely clear to me is whether the default policy case should be passed the initial value, or a stringification of the original value, or a stringification with the USVString fixups applied, or whether it doesn't matter much. But ok, now I understand why you want to at least invoke this algorithm before stringification of the original value.
OK. That's a reasonable tradeoff, as long as you make sure that all the relevant boundary bits are annotated.
I had been thinking of option 2 as a post-conversion step, so coming at a later point than "option 1 + before". It looks like you're asking about doing it as a pre-conversion step. The main observable difference, I guess, is what happens with a union type. "Option 2" would invoke the sanitization stuff no matter what the value is, and while "option 1 + before" will only invoke it if we determine that we're taking the DOMString branch of the union, right? Presumably we want the latter behavior, in general? |
Er, clicked submit too early.
Yes, this makes sense.
No need to apologize. I don't have the bandwidth to be in all discussions all the time, and certainly didn't for this one... I think I finally understand what the proposed setup here is. From an IDL and implementation point of view (now with my Mozilla hat on for the latter), the main complexity is threading through the information about what operation/setter is involved to the conversion site. If we didn't need to worry about unions we could do the check as in "option 2" before doing the arg conversion, and that's fairly simple to spec and implement, depending on what information about the operation is needed. Specifically, it doesn't allow distinguishing overloads, but neither does any variant of option 1; the only way to do that is options 3 or 4. But if we want to have the semantics that |
Yes, sorry about not being explicit about that early on. I don't think we have a precedent for such behavior in the web platform. It's familiar and intuitive to our team, as we're working on the thing, but understandably, it's a bit of a surprising twist.
It would be preferable if we could get the initial value in the default policy. The current spec draft is worded differently, but we're thinking of changing that, as that allows the authors to allow a selection of "their" types to pass through. It's not a strong requirement though, just a recent idea we had.
👍
Yes, the latter would be preferable.
What we'd be using in the sanitizing callout is:
If that helps, I only see a single case for us where that would be relevant: https://w3c.github.io/webappsec-trusted-types/dist/spec/#enforcement-in-timer-functions, but we can special case it in HTML I think. |
OK. Back with my implementor hat on, this is currently not readily available in the guts of type conversions in Gecko, fwiw... With my spec hat on, it's sort of ambient state, kinda, if you know your type conversion is happening in the right places.... The CSP check doesn't need to know what method is involved?
Again, not really something that's available in the guts of type conversions right now, either in the spec or in (some) impls. This one would be much more clearly available in the "option 2, before" scenario, of course, since that has access to the operation (in the spec). Again, threading it through is possible, but needs to be done; definitely on the spec side.
This is the value of the extended attribute, so is conceptually available wherever we need it, assuming we can tell that we have the extended attribute at all at that point. |
Ah, I guess it just needs to know "the type to convert to". |
the method as well (a name of it). Technically, we call the default policy function, the second argument to which is, say, "Element.innerHTML", or "Document.write" (see Example 14). |
So, summarizing the options the integration as outline is OK given the constraints and the intention of the API, but there's a slight preference to do option 2 + before, at least for the simplicity for the implementors (the operation data is already available there):
Correct? I have a folowup question - would it be preferable to merge with IDL now-ish, or should we monkey-patch IDL in TT? I don't know what the usual convention is for new APIs. |
I have a question.
where I'm now just wondering if this is cleaner or clearer than If this doesn't matter much, please move on. |
@yuki3 that does seem quite a bit better to me. That would also more clearly allow for new endpoints that only take a type. |
@annevk That is quite similar to how it is defined currently: We typedef HTMLString union and set the attribute on the union. We could set it for the The real issue is that, to my understanding, the original design as outlined above caused issues with the default policy for DOM (w3c/trusted-types#248) - which triggered the redesign and this PR. It seems to me like now we're going back to where we were before. Does that not cause default policy issues in the end? The intention is for most spec algos to continue dealing with strings only, and for the sanitization happening before they run. If we can achieve that by rephrasing https://w3c.github.io/webappsec-trusted-types/dist/spec/#!trustedtypes-extended-attribute, and without modifying WebIDL, I'm sold. |
I don't think we can achieve that without introducing a modification to the string type. A union type will always cause specs to branch on both forks of the union. I think the better resolution to @yuki3's concern is to move [StringContext] to another spec, if he thinks it is bad for Web IDL to have knowledge of it. But I stand by my earlier comment that the division of putting [StringContext] here and putting the "validate the string in context" in another spec is a reasonable division. |
The actual operation for the TrustedXYZ though is just stringification. DOMString is already a validated value, if stars are aligned correctly. So the modification to the algoritm could just be to stringify an argument (perhaps that logic could also be provided in an extended attribute definition). |
I don't know how much important this is, but given it's important, I'm fine with the current plan. (I understand that it's itchy to add a line "convert the argument XXX to a DOMString with using a certain process YYY" into each spec, but I don't know how much important it is. Anyway, we need to update each spec to change the argument type, I think.) |
I feel like I'm missing something, but would something like |
The main problem with this solution is that it's a union type. This means that, like all union types, specification writers need to branch on which type they recieved. So specification writers need to write something like:
The desired experience, by using
|
I agree with @domenic here, we're avoiding type unions to avoid branching in other spec algos, and centralize the behavior in WebIDL (to define the extended attribute and callouts) and HTML (to implement the TT-specific check). The current Blink implementation follows the PR above ("option 1 + before" from #841 (comment)), as we need the original type information when checking. In our impl piping through the context ended up being quite simple, and we get the benefits of removing many custom changes at the actual setters (the latter would be possible with other options too, as long as we don't choose the "type unions" approach). I think the decision process for choosing option 1 is neatly summarized in #841 (comment). |
What's the latest status of this change? |
index.bs
Outdated
adhere to the context the string is used in. | ||
|
||
The [{{StringContext}}] extended attribute must [=takes an identifier|take an identifier=]. The [=identifier=] | ||
must be one of "<code>html</code>", "<code>script-url</code>" and "<code>script</code>". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TrustedTypes spec as it currently is uses the TrustedXXX as the identifiers. It would be good to get clarity on whether that is correct or if this is correct?
(DOM|USV)String. This is to hook up the Trusted Types validation during the ES->IDL type conversion to avoid funky issues with its default policy. See w3c/trusted-types#248, w3c/trusted-types#176
…and regular argument types explicitly.
b549db8
to
885ae1e
Compare
[{{EnforceRange}}], and | ||
[{{LegacyNullToEmptyString}}]. | ||
[{{EnforceRange}}], | ||
[{{LegacyNullToEmptyString}}] and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please bring back the Oxford comma.
@@ -7581,6 +7583,13 @@ value when its bit pattern is interpreted as an unsigned 64-bit integer. | |||
A JavaScript value |V| is [=converted to an IDL value|converted=] | |||
to an IDL {{DOMString}} value by running the following algorithm: | |||
|
|||
1. If the conversion is to an IDL type [=extended attribute associated with|associated with=] the | |||
[{{StringContext}}] extended attribute, then set |V| to the result of [=validate the string in context=], passing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validating*
[{{StringContext}}] extended attribute may only annotate a type of a [=regular attribute=] or | ||
a [=regular operation=] argument. A type annotated with the [{{StringContext}}] | ||
extended attribute must not appear in a [=read only=] attribute. The [=regular attribute=] or | ||
a [=regular operation=] argument that the type annotated with the [{{StringContext}}] extended |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a [=regular operation=] argument that the type annotated with the [{{StringContext}}] extended | |
[=regular operation=] argument that the type annotated with the [{{StringContext}}] extended |
[=this=], |V|, the {{StringContext}} extended attribute [=identifier=], and the [=identifier=] | ||
of the [{{StringContext}}] extended attribute [=related construct=]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has some extreme magic that I don't really understand.
As far as I know when we do type conversion there is no "this" and there is no access to something like a "related construct".
I'd like to hear what @domenic and @Ms2ger think, but it seems to me you have to patch algorithms such as https://webidl.spec.whatwg.org/#dfn-attribute-setter instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this doesn't work with the layering as it currently exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you cannot access "this" here. Some call sites don't have a "this".
Putting it in the appropriate call sites instead (e.g., the overload resolution algorithm and attribute setters) seems like the best approach.
[=this=], |V|, the {{StringContext}} extended attribute [=identifier=], and the [=identifier=] | ||
of the [{{StringContext}}] extended attribute [=related construct=]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this doesn't work with the layering as it currently exists.
1. the {{StringContext}} [=identifier=], and | ||
1. the [=identifier=] of the operation or attribute. | ||
|
||
The algorithm returns an ECMAScript String value, or [=ECMAScript/throws=] a {{ECMAScript/TypeError}}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that https://w3c.github.io/trusted-types/dist/spec/#html-validate-the-string-in-context doesn't necessarily return a string, and if it did, the conversion algorithm above does some unnecessary work.
Can the PR preview please be fixed? CC @koto |
A type that is not {{DOMString}} or {{USVString}} must not be [=extended attributes associated with|associated with=] | ||
the [{{StringContext}}] extended attribute. | ||
|
||
See the rules for converting ECMAScript values to the IDL types in [[#es-DOMString]] and [[#es-USVString]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two es-X references are fails in bikeshed which is breaking the preview
@@ -11056,6 +11100,21 @@ allowed. The security check takes the following three inputs: | |||
|
|||
Note: The HTML Standard defines how a security check is performed. [[!HTML]] | |||
|
|||
Certain algorithms in [[#es-type-mapping]] are defined to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This es-X is likewise a bikeshed fail
WIP: This extended attribute can be defined on
DOMString
andUSVString
.This is to hook up the Trusted Types validation during the ES->IDL type
conversion to avoid funky issues with its default policy.
See w3c/trusted-types#248,
w3c/trusted-types#176
The actual logic of the string validation calls into HTML, and will be something similar to https://w3c.github.io/trusted-types/dist/spec/#html-validate-the-string-in-context.
Preview | Diff
💥 Error: 400 Bad Request 💥
PR Preview failed to build. (Last tried on Jan 26, 2024, 12:41 PM UTC).
More
PR Preview relies on a number of web services to run. There seems to be an issue with the following one:
🚨 CSS Spec Preprocessor - CSS Spec Preprocessor is the web service used to build Bikeshed specs.
🔗 Related URL
If you don't have enough information above to solve the error by yourself (or to understand to which web service the error is related to, if any), please file an issue.