-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
libutil: relax git URL parsing to accept paths without a user component #14863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Looking over the history of It looks like the tests introduced during that effort made it in 2b310ae and more test work has landed since. Hopefully the impact to the tests made here is palatable as it tries to address some known shortcomings. |
|
Thanks! Generally, not a fan of re-adding regexes once again - using those is what got us into this messy situation. Could we look into re-landing the #13821? That was mostly reverted due to the fact that we wanted tests first before functional changes (also the handling of relative paths was borked). Hardcoding |
|
I agree about the regex usage. Putting the implementation aside, would you be willing to (in)validate some of my assumptions with the test changes? I'd like to get the tests nailed down and then will work on simplifying the implementation (will take a longer look at #13821). Some questions that came up while working with the tests:
The "git+file" and "path with a space" cases were added to cover regressions discovered when running against the full Nix test suite. Thanks for taking a look over the changes! |
Bug fixes are good. All the test changes aside from relative paths look good to me. We can fix the cases when we produce utterly invalid results that git doesn't understand or can't reasonable succeed. One thing is that this is used in
This looks like a bug. Currently nix discards consecutive slashes in libfetchers already in quite a few places (not sure it's the case in the git fetcher), so this change seems good to me from the back-compat side of things.
I'm not sure. Issue is that we can't distinguish scp-style URLs from ligitimate URLs where it's a scheme, but git-style URLs shouldn't have that collision. Can you make sure that fixGitUrl is only used when we know it must be a git one? It might be that it's misused in a few places (hopefully I'm wrong).
Yeah, I'm not sure how to reconcile this. What are the semantics of this? Use the working directory in the remote? We can stick with fixing everything but that and explicitly ban relative paths for now.
Yeah, this doesn't sound like a good idea.
I think that we prefer to use the empty authority case rather than the no-authority. I recall from the RCCs that the second variant is less preferred.
That is good, thanks! Always nice to be thorough about these things. Technically nix URL semantics sit somewhere between WHATWG spec (spaces and ^ are allowed unencoded), but closer to RFC3986. |
|
FWIW, it's only recently that we started doing proper URL parsing in more places and it was all stringly typed everywhere, so that recent big rework is surfacing some bugs. Specifically in the to_string assertions that validate the invariants required by the spec. |
|
Also can you make sure that we also fix #5958? |
I've not familiar with this code base but all the
I'm unable to reason about the semantics either. I like banning relative paths, but that breaks
I'm confused here. You reopened on Oct 20, but this test appears to validate the expected behavior (port number parsed as the authority's port instead of a path component). // https://github.com/NixOS/nix/issues/5958
// Already proper URL with git+ssh
FixGitURLParam{
.input = "git+ssh://user@domain:1234/path",
.expected = "ssh://user@domain:1234/path",
.parsed =
ParsedURL{
.scheme = "ssh",
.authority =
ParsedURL::Authority{
.host = "domain",
.user = "user",
.port = 1234,
},
.path = {"", "path"},
},
}, |
|
I think this is ready for another look. I can rebase+squash if that would be easier to read. Thanks! |
|
IIUC, the primary headache comes from the fact that git itself doesn't treat really do pct encoding, but fetchTree does, so you must ensure that everything is percent encoded when it's converted back into a string, but when rendering into a string when passing to git it doesn't get re-encoded. Special-casing just the spaces doesn't seem like the best approach. So my suggestion (more concise than above):
That should cover all the bases pretty ok-ish. It's not efficient to do this encoding/decoding but we care about correctness here. fitGitURL should only really care about massaging scp-style into ssh one without touching the contents at all and ensuring that ParsedURL gets the contents verbatim. |
|
Thank you for the feedback/insight. I've moved the path %-encoding into fixGitURL and working with a |
|
@tones111, I'm going to push over your changes if there are no objections. |
|
@tones111, also added support for IPv6 and made the percent encoding much less janky. Should be much more consistent now. |
Probably the best way is to stick with rewriting to absolute ones for now. I think we actual git and not git forges using tilde expansion for the home directory is best for the time being. Ideally we'd be able to differentiate between ssh URLs and SCP ones, but that poses a bit of an issue with flake lockfiles and back-compat. |
| /* TODO: What to do about query parameters? Git should pass those to the * http(s) remotes. Ignore for now and | ||
| * just pass through. Will fail later. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, it is parsing query params?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah but downstream users are broken.
src/libutil/url.cc
Outdated
| /* HACK: SCP syntax overlaps with file:/path/to/repo. Git itself doesn't recognize it (or rather treats `file` as | ||
| * the host name), but Nix accepts file:/path/to/repo as well as file:///path/to/repo. */ | ||
| if (schemeOrHost == "file" || schemeOrHost == "git+file") { | ||
| auto res = parseURL(url); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A potential issue with using parseURL here and not for the absolute path above is that they're no longer equivalent. If I modify this test to use a file scheme the output differs.
existing test:
.input = "/repos/git repo",
.expected = "file:///repos/git%20repo",
modified to use file:///
FixGitURLParam{
.input = "file:///repos/git repo",
.expected = "file:///repos/git%20repo",
.parsed =
ParsedURL{
.scheme = "file",
.authority = ParsedURL::Authority{},
.path = {"", "repos", "git repo"},
},
},
fails with
unknown file: Failure
C++ exception with description "�[31;1merror:�[0m '�[35;1mfile:///repos/git repo�[0m' is not a valid URL: �[35;1mleftover�[0m" thrown in the test body.
The git clone url documentation says they should be equivalent
For local repositories, also supported by Git natively, the following syntaxes may be used:
/path/to/repo.git/
file:///path/to/repo.git/
These two syntaxes are mostly equivalent, except the former implies --local option.
While it is a url, git is able to handle the raw space.
$ git clone file:///home/paul/src/my\ repo my\ repo2
Cloning into 'my repo2'...
<...>
$ cd my\ repo2/
$ git remote -v
origin file:///home/paul/src/my repo (fetch)
origin file:///home/paul/src/my repo (push)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is OK for us to be stricter than Git --- the most important thing is that when we do succeed, we agree with Git. It is OK to fail where Git would succeed I think, especially because the point of file:// vs plain paths is to be explicit.
766ab55 to
6cce38b
Compare
|
OK @xokdvium, this is back to you now. |
| * When Git encounters a URL of the form <transport>://<address>, where | ||
| * <transport> is a protocol that it cannot handle natively, it | ||
| * automatically invokes git remote-<transport> with the full URL as the | ||
| * second argument. https://git-scm.com/docs/gitremote-helpers. If the | ||
| * url doesn't look like it would be accepted by the remote helper, | ||
| * treat it as a SCP-style one. Don't do any pct-decoding in that case. | ||
| * Schemes supported by git are excluded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this comment is in the right spot now, as this function is about what to do in the not :// case.
Ericson2314
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ git clone 'user[2001:db8:1::2]:/home/@file'
Cloning into '@file'...
ssh: Could not resolve hostname user[2001: Name or service not known
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
$ git clone 'user:[2001:db8:1::2]:/home/file'
Cloning into 'file'...
ssh: Could not resolve hostname user: Name or service not known
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
$ git clone 'user:@[2001:db8:1::2]:/home/file'
Cloning into 'file'...
^C⏎ # timeout connecting to IP
@xokdvium from some manual testing of truly insane stuff, I don't think the new algorithm is right yet.
|
I added some WIP additinoal tests for these crazy cases. |
This change resolves several deficiencies with the current url normalization. Tests documenting these deficiencies have been corrected and new tests added to cover additional fetchGit test expectations. Co-authored-by: Sergei Zimmerman <[email protected]> Co-authored-by: John Ericson <[email protected]>
Ericson2314
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made all the code changes I want to make
|
@xokdvium Thanks |
Motivation
This change resolves several deficiencies with the current url normalization. Tests documenting these deficiencies have been corrected and new tests added to cover additional fetchGit test expectations.
Context
fixes #14852
fixes #14867
The primary new functionality is that SCP-like paths no longer require specifying the user component of the uri authority. This complicates the implementation as there is now overlap with simple uris. To prevent processing uris as SCP paths we now attempt to filter them, but the regex for uri and scp detection are definitely more complicated.
While this effort keeps the url normalization logic in
fixGitURLit feels like it's poorly duplicating boosts parsing logic. However, I understand parsing urls has a lot of edge cases, so leveraging a high-quality external library is likely still the right strategy. These regex are only trying to identify enough shape to differentiate them from SCP-like paths.I'm very new to Nix/NixOS. I'd like to confirm this build fixes the
nixos-rebuilderror motivating this change but can't figure out how to get it to use my custom build. Having it in the path (fromnix developin the source tree) was not sufficient. Is there any documentation on how to do this?Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.