-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix invalid url escaping - replaced url parsing from url.parse to new… #178
base: main
Are you sure you want to change the base?
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This makes sense, and I think in general it's probably the right direction to go. I've left some comments though - this is more complicated than it appears.
One other notable case is websockets - much of this same logic also appears in a very similar format in websocket-handler, and that needs to stay in sync with this.
Once this is raedy, I will probably release it as a major bump, since it might change some behaviour in meaningful ways, but that's OK, I think it's about the right time. Given that though, it would probably be sensible to mop up the many other url.parse
usages in the rest of the codebase, to clean everything up fully and parse URLs consistently & more accurately everywhere. Is that something you'd be happy to look into? That's not strictly required to merge this as a useful first step, but it would be useful if you're interested.
@@ -430,7 +429,7 @@ export class PassThroughHandler extends PassThroughHandlerDefinition { | |||
hostname, | |||
clientReq.remoteIpAddress, | |||
getDnsLookupFunction(this.lookupOptions) | |||
); | |||
) || ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? Currently it looks like we previously assumed hostname really could be null
here, and accepted that (and getClientRelativeHostname
preserves that). That's not great - I think in reality it is always set, since req.url
is always an absolute URL in our case (we preprocess to ensure this in MockttpServer).
However, setting it to ""
isn't very good either and might do all sorts of weird things.
In reality, with new URL()
I think hostname
is now always set, and that's recognized in the types, so really we should change getClientRelativeHostname
here - either to remove the null
support entirely so it's always set, or to improve the types so we know that string hostname in => string hostname out.
@@ -465,7 +464,7 @@ export class PassThroughHandler extends PassThroughHandlerDefinition { | |||
hostHeader[1] = updateHostHeader; | |||
} // Otherwise: falsey means don't touch it. | |||
|
|||
reqUrl = new URL(`${protocol}//${hostname}${(port ? `:${port}` : '')}${path}`).toString(); | |||
reqUrl = new URL(`${protocol}//${hostname}${(port ? `:${port}` : '')}${pathname}${search}`).toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, there are other differences here where the url.parse
output is preferable... One notable example is a simple ?
at the end of the path:
console.log(url.parse('https://example.com/?').search);
// '?'
console.log(new URL('https://example.com/?').search)
// ''
These might be semantically equivalent in theory, but they're not actually the same in practice (a server will literally receive a different string, and can legitimately do different things based on that if it so chooses). With this change, when proxying a request to that URL through Node, the client will send a slightly different request to the server that it expected. For more context, there's a WHATWG discussion on this that you might find interesting at whatwg/url#779 (I've just added a comment there with my perspective).
This is not great, but of course you do make a good argument about the other differences where url.parse
is wrong too! I think maybe we might want to make our own URL parser that wraps the two to use WHATWG parsing as standard but with a few extra checks for details like this that get lost otherwise. What do you think?
This pull requests replaces url handling in request processing.
REASON
Some urls can change due to legacy parsing.
For example, parsing the following url
https://example.com:8080/validate?param=\\/|+-
results in the following:
url.parse
:https://example.com:8080/validate?param=%5C/%7C+-
new URL
:https://example.com:8080/validate?param=\\/|+-
It can lead to incompatibilities with some sites.
As stated by NodeJS docs
url.parse
is deprecated due to non-standard parsing and error-prone behavior. URL should be used instead.