In a couple of places, Wasm wire bytes contain UTF-8 encoded strings:
- import names
- export names
- names of custom sections
- contents of the name section
- contents of the sourceMappingURL section
All of these are expected to be "reasonably short" in practice, and there is currently no specified limit for their lengths. So the only effective limit for them is the overall maximum module size, which is 1 GiB on the web.
In practical deployments, this works fine. However, every now and then a fuzzer discovers that it can create a module that consists almost entirely of one massive string (in any of the places listed above, e.g. an import name). V8's internal string length limit is currently between about 256 MiB and 512 MiB, depending on platform and configuration. When a fuzzer creates a module with a string bigger than that, V8 crashes. This is unfortunate because, at the very least, it reduces that fuzzer's signal-to-noise ratio: it takes human time to triage such reports without improving the state of the world in any way.
We could mitigate this in a number of ways. As a very simple and robust solution, I'd like to introduce an upper limit on the number of bytes that the UTF-8 representation of a string in the wire bytes is allowed to have. V8 will reject modules with longer strings as invalid. Using the number of bytes is convenient because wire bytes represent UTF-8 encoded strings as `vec(byte)`, so the length of that vector is immediately available. For real-world modules, even a fairly low limit such as 10,000 or 100,000 bytes would probably suffice; I'm also fine with being very generous and allowing 100,000,000 bytes per string.
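To illustrate why the byte count makes this check cheap, here is a minimal Python sketch of a decoder reading one string from the wire bytes. It is not V8's implementation: `MAX_STRING_BYTES`, `ValidationError`, `read_u32_leb128`, and `read_name` are hypothetical names, and the limit value is just one of the candidates mentioned above. The point is that the length prefix is compared against the limit before any string is decoded or allocated.

```python
# Hypothetical limit; the proposal leaves the exact value open.
MAX_STRING_BYTES = 100_000


class ValidationError(Exception):
    """Raised when the wire bytes are rejected as invalid (hypothetical type)."""


def read_u32_leb128(buf: bytes, pos: int):
    """Decode an unsigned LEB128 u32 at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    for _ in range(5):  # a u32 occupies at most 5 LEB128 bytes
        if pos >= len(buf):
            raise ValidationError("unexpected end of input in length prefix")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            if result > 0xFFFFFFFF:
                raise ValidationError("length prefix does not fit in a u32")
            return result, pos
        shift += 7
    raise ValidationError("length prefix is longer than 5 bytes")


def read_name(buf: bytes, pos: int, limit: int = MAX_STRING_BYTES):
    """Read a length-prefixed UTF-8 string at `pos`; return (string, next_pos)."""
    length, pos = read_u32_leb128(buf, pos)
    # The length of the byte vector is known up front, so the limit check
    # happens before the string contents are touched.
    if length > limit:
        raise ValidationError(
            f"string of {length} bytes exceeds limit of {limit} bytes")
    end = pos + length
    if end > len(buf):
        raise ValidationError("string extends past end of input")
    try:
        return buf[pos:end].decode("utf-8"), end
    except UnicodeDecodeError as exc:
        raise ValidationError("string is not valid UTF-8") from exc
```

For example, `read_name(b"\x03env", 0)` returns `("env", 4)`, while a length prefix above the limit raises `ValidationError` without reading the string's contents.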
I'd be supportive of adding this limit to the list of standardized limits in the JS API spec. Considering how exceedingly rare this edge case is, I'm also comfortable with treating this issue as an "FYI" and just making the change in V8 (with a super high limit).
@eqrion @kmiller68 any thoughts?