Web compatibility issue with various unknown (external) protocols like ed2k #815
Comments
Are these web compatibility issues or issues with extensions? It seems any website breakage would also impact Safari, and I haven't seen any reports about breakage.

These are issues with external applications, not extensions, which are supposed to be opened as external protocol handlers. I assume most users of e.g. eDonkey are on Windows, which might affect Safari less.
I think there are two different issues with these schemes. While I don't have any experience with `ed2k:` …
The ed2k issue seems to come from U+007C (vertical bar) being listed as a forbidden host code point. Personally, I think it would be very low-risk to allow that character in opaque hostnames. Failing that, it would be reasonable to at least percent-encode the character -- it's very possible that the application's processing would be tolerant of such a change.
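For illustration, here is how a WHATWG-conformant parser (e.g. the `URL` class in Node.js or browsers) handles the two options; the ed2k link below is a made-up example, not a real file reference:

```javascript
// A raw ed2k link uses U+007C "|" as a field delimiter, which is a
// forbidden host code point, so parsing fails outright:
let threw = false;
try {
  new URL("ed2k://|file|example.iso|1024|0123456789abcdef0123456789abcdef|/");
} catch (e) {
  threw = true; // TypeError: Invalid URL
}
console.log(threw); // true

// Percent-encoding U+007C as %7C instead yields a parseable URL with an
// opaque hostname, which a tolerant application could decode on its end:
const u = new URL("ed2k://%7Cfile%7Cexample.iso%7C/");
console.log(u.protocol, u.hostname); // "ed2k:" "%7Cfile%7Cexample.iso%7C"
```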
BMO 1878295 has an other example with edit: Live URL Viewer |
It's easy to see why space and code points below it would be forbidden. It's easy to see why DELETE would be forbidden. Also, it's easy to see why square brackets (IPv6) are forbidden. It's easy to see why characters that occur before or after the host are forbidden. Why are ^, |, and % forbidden? (Today I learned that Thunderbird expects the post-parse host to be able to contain %. However, Firefox has not allowed % since 2019, so chances are that it's not a Web compat issue for % to be forbidden.)
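A quick sketch of the current behaviour, using the WHATWG-conformant `URL` class in Node.js (`foo:` standing in for an arbitrary non-special scheme):

```javascript
// Returns whether a WHATWG URL parser accepts the input.
function parses(input) {
  try { new URL(input); return true; } catch { return false; }
}

// "^" and "|" are forbidden host code points, so they are rejected even
// in opaque (non-special) hostnames:
console.log(parses("foo://a^b")); // false
console.log(parses("foo://a|b")); // false

// "%" is only a forbidden *domain* code point: rejected for special
// schemes, but allowed in opaque hostnames:
console.log(parses("http://a%b")); // false
console.log(parses("foo://a%b"));  // true
```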
I suspect they are inherited from RFC 2396:
https://www.ietf.org/rfc/rfc2396.txt (2.4.3. Excluded US-ASCII Characters)

I further suspect the latter part ("they are used as delimiters") is much more common than gateways or other transport agents modifying these characters in URLs. But really, the idea of URLs escaping the delimiter characters of popular enclosing document formats is inherently flawed. Consider that parentheses are allowed without escaping, and by some cruel irony are used by the Markdown document format specifically for delimiting URLs. Rust and Swift source code allow user-customisable delimiters for string literals (e.g. Rust's `r#"…"#` raw strings and Swift's `#"…"#` extended delimiters).

At least for characters where there are not likely to be any web-related delimiter issues (vertical bar, curly braces, etc.), I think we can afford to be more relaxed and allow them to be used without escaping.
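The parentheses point is easy to demonstrate: a conformant serializer (sketched here with Node's `URL`) leaves them unescaped, so a naive Markdown renderer can mis-terminate the link:

```javascript
// "(" and ")" are not in the path percent-encode set, so they survive
// serialization untouched:
const wiki = new URL("https://example.com/wiki/Foo_(bar)");
console.log(wiki.pathname); // "/wiki/Foo_(bar)"

// Embedded as a Markdown link, the URL's trailing ")" is ambiguous with
// the ")" that closes the Markdown link destination:
const md = `[Foo](${wiki.href})`;
console.log(md); // [Foo](https://example.com/wiki/Foo_(bar))
```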
From reading those previous discussions:
#458 seems to indicate that WebKit used to allow it. If I'm reading the Gecko bug report correctly, their implementation of origins included a separator character for internal flags (which just so happened to be `^`).
In my opinion, this seems like rather weak justification for disallowing this character in all URLs. We could at least allow it in non-special URLs (i.e. opaque hostnames), since they do not have defined origins.
Okay, for file URLs it's fair enough, because this standard does actually define a meaning for this character in the hostname of a file URL. But it shouldn't apply to non-file URLs. I think we can at least allow it in opaque hostnames, to solve the `ed2k:` issue.

In general, it usually doesn't matter if we're overly restrictive for domains/special URLs (which is what browsers tend to care about), because those special characters often won't be registered to any actual domains. But when it comes to opaque hostnames (which browsers have had very spotty support for), it does matter a great deal, because they contain arbitrary content that will be processed in an arbitrary way. The changes which forbade these characters strike me as being overly broad.
A couple of updates:

First, Gecko no longer needs to allow `^`.

Second, the remaining Gecko deviation from the spec is that Gecko prohibits `*`.

Additionally, there exists an example in the Web Platform that expects the asterisk not to be part of the normal domain name value space, so that it's legitimate to use it as a wildcard: wildcard certificates. There are other things that deal with origins and explicitly don't allow wildcards. Prohibiting `*` would be consistent with those. Does URL really need to allow `*`?
@hsivonen see #397. The rationale for allowing non-DNS domains is to support non-DNS systems to the widest extent possible. Potentially we could make additional restrictions, but we might well run into issues, and as such it seems safer to allow and potentially reject the domain at a layer further down.
I'm not sure. Note that RFC 3986 allows `*` in reg-name (it's one of the sub-delims).

But since we cannot align it with DNS completely, and the partial DNS-alignment is also somewhat weird for opaque URLs, I'd rather leave things as-is, as any further change seems risky and not worth it.
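For what it's worth, a conformant parser today does accept `*` in a domain, since `*` is neither a forbidden host nor a forbidden domain code point (a sketch using Node's `URL`):

```javascript
// "*" passes domain parsing, so wildcard-looking hosts are valid URLs;
// rejecting them (as Gecko does) is the deviation under discussion.
const wild = new URL("https://*.example.com/");
console.log(wild.hostname); // "*.example.com"
```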
What is the issue with the URL Standard?
After Firefox shipped the new standard URL parser in Firefox 122, we have received multiple bug reports about external protocol handlers that don't work anymore.
The most common seems to be `ed2k:`, a protocol used for the eDonkey file sharing network. It's notable because even the Wikipedia page contains URLs that aren't parseable using a WHATWG URL-conformant implementation.

Various other issues are related to the handling of `://`. For example, `openimstoolkit://http://example.com` is now parsed as `openimstoolkit://http//example.com` (note the missing `:` after http) (Bug 1876729). A similar issue happens for `potplayer:` (Bug 1876731).
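The `://` misparse can be reproduced directly with a WHATWG-conformant parser such as Node's `URL`: after the non-special authority, the inner `http` is taken as the host, the following `:` starts an (empty) port, and the colon is dropped on reserialization:

```javascript
const u = new URL("openimstoolkit://http://example.com");
console.log(u.hostname); // "http"  (the inner scheme became the host)
console.log(u.pathname); // "//example.com"
console.log(u.href);     // "openimstoolkit://http//example.com"
```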