Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix invalid url escaping - replaced url parsing from url.parse to new… #178

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dukei
Copy link

@dukei dukei commented Aug 20, 2024

This pull requests replaces url handling in request processing.

REASON
Some urls can change due to legacy parsing.

For example, parsing the following url https://example.com:8080/validate?param=\\/|+-
results in the following:
url.parse: https://example.com:8080/validate?param=%5C/%7C+-
new URL: https://example.com:8080/validate?param=\\/|+-
It can lead to incompatibilities with some sites.

As stated by NodeJS docs url.parse is deprecated due to non-standard parsing and error-prone behavior. URL should be used instead.

@CLAassistant
Copy link

CLAassistant commented Aug 20, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Member

@pimterry pimterry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This makes sense, and I think in general it's probably the right direction to go. I've left some comments though - this is more complicated than it appears.

One other notable case is websockets - much of this same logic also appears in a very similar format in websocket-handler, and that needs to stay in sync with this.

Once this is raedy, I will probably release it as a major bump, since it might change some behaviour in meaningful ways, but that's OK, I think it's about the right time. Given that though, it would probably be sensible to mop up the many other url.parse usages in the rest of the codebase, to clean everything up fully and parse URLs consistently & more accurately everywhere. Is that something you'd be happy to look into? That's not strictly required to merge this as a useful first step, but it would be useful if you're interested.

@@ -430,7 +429,7 @@ export class PassThroughHandler extends PassThroughHandlerDefinition {
hostname,
clientReq.remoteIpAddress,
getDnsLookupFunction(this.lookupOptions)
);
) || "";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? Currently it looks like we previously assumed hostname really could be null here, and accepted that (and getClientRelativeHostname preserves that). That's not great - I think in reality it is always set, since req.url is always an absolute URL in our case (we preprocess to ensure this in MockttpServer).

However, setting it to "" isn't very good either and might do all sorts of weird things.

In reality, with new URL() I think hostname is now always set, and that's recognized in the types, so really we should change getClientRelativeHostname here - either to remove the null support entirely so it's always set, or to improve the types so we know that string hostname in => string hostname out.

@@ -465,7 +464,7 @@ export class PassThroughHandler extends PassThroughHandlerDefinition {
hostHeader[1] = updateHostHeader;
} // Otherwise: falsey means don't touch it.

reqUrl = new URL(`${protocol}//${hostname}${(port ? `:${port}` : '')}${path}`).toString();
reqUrl = new URL(`${protocol}//${hostname}${(port ? `:${port}` : '')}${pathname}${search}`).toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, there are other differences here where the url.parse output is preferable... One notable example is a simple ? at the end of the path:

console.log(url.parse('https://example.com/?').search);
// '?'

console.log(new URL('https://example.com/?').search)
// ''

These might be semantically equivalent in theory, but they're not actually the same in practice (a server will literally receive a different string, and can legitimately do different things based on that if it so chooses). With this change, when proxying a request to that URL through Node, the client will send a slightly different request to the server that it expected. For more context, there's a WHATWG discussion on this that you might find interesting at whatwg/url#779 (I've just added a comment there with my perspective).

This is not great, but of course you do make a good argument about the other differences where url.parse is wrong too! I think maybe we might want to make our own URL parser that wraps the two to use WHATWG parsing as standard but with a few extra checks for details like this that get lost otherwise. What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants