[WIP] First split wheel variant PEP#35
Conversation
daf9494 to
12ca9ca
Compare
8057baf to
1950523
Compare
Documentation build overview
Show files changed (9 files in total): 📝 6 modified | ➕ 3 added | ➖ 0 deleted
|
1950523 to
c6aa3d3
Compare
rgommers
left a comment
There was a problem hiding this comment.
Thanks @mgorny, this reads quite well. Putting Rationale below Specification is unconventional, but it does work I think - better than the other way around.
Regarding the content in this PR, only one thing really stood out to me: the pylock.toml specification is still unreadable standalone, and there is no explanation under Rationale either. We had a clearer explanation after the previous discussion ("add all variants + metadata, leaving selection of the variant to install time" I believe), so this should be added.
A question that came to mind is what of all the wheel metadata is missing from this PEP. I think the only thing is install-time: and static-properties (for AoT providers)? That seems suboptimal, introducing a format in one PEP, and then immediately extending it in a second PEP. The addition is minimal, so I see no compelling reason to not get the full metadata format included in this PEP.
581cddf to
28740d0
Compare
rgommers
left a comment
There was a problem hiding this comment.
A question that came to mind is what of all the wheel metadata is missing from this PEP. I think the only thing is
install-time:andstatic-properties(for AoT providers)? That seems suboptimal, introducing a format in one PEP, and then immediately extending it in a second PEP. The addition is minimal, so I see no compelling reason to not get the full metadata format included in this PEP.
Looked at this again, for the current state of the PEP and after writing up wheelnext/pep_817_wheel_variants#109. This PEP does not contain any pyproject.toml content, so the only thing we'd need is the static_properties key in the .json metadata. Not even the description under "Installing a package from an index" would have to change. It seems worth adding that now to me, because it's an obvious omission - especially given that the BLAS provider is kept in the example, which is currently misleading.
That said, I can live with going ahead without it and adding it back in in a second revision if we're in a hurry.
That's my only comment, the rest LGTM. The additions in the pylock.toml section make it much easier to understand now.
|
The problem with |
|
I don't think that's relevant: the examples in this PEP already contain "provider" (not explained) as well as exact property values. Unless more is missing from the actual metadata format, leaving out literally a single element doesn't seem like a good choice; we'll then have to add it into this PEP later anyway. |
That's not that "provider" :-). It's just a feature name. That said, maybe I'll change it to |
Yes please. However, there's also "GPU providers" in the text. |
Changed to "vendors". I think that puts the point clearer even. |
rgommers
left a comment
There was a problem hiding this comment.
Okay, I'm still not convinced it's a good idea to leave out only a single element of all format metadata, but I'll approve either way. It's not a blocking concern and can be dealt with later.
| Variant wheels MUST include the variant label component. Conversely, | ||
| wheels without variant label are non-variant wheels. The variant label | ||
| MUST consist only of ``0-9``, ``a-z``, ``_`` and ``.`` ASCII characters, | ||
| and be 1-16 characters long (``^[0-9a-z_.]{1,16}$``). |
There was a problem hiding this comment.
It appears like the set of characters here largely removes the normalization requirements we have elsewhere. I'm not sure that it really matters, but should we limit runs of punctuation? That's the only thing that other areas of packaging normalize that is still possible to do with a variant label, and may make it harder to differentiate between two variant labels?
There was a problem hiding this comment.
I was wondering the same thing, I mainly didn't comment because I couldn't identify a case where this would cause a problem. As in, people shouldn't start with punctuation, shouldn't stop with punctuation and shouldn't use runs of punctuation, but is that something that every tools needs to have logic to enforce this, or is it enough if reasonable maintainers do it naturally?
There was a problem hiding this comment.
I kinda agree with you both. I agree that it would be nice to avoid such names but it feels a bit of a stretch to come up with an awfully complex rule to enforce all that. I don't think we can even do it with a single regex, or at least not within what I'd call reasonably readable regex.
There was a problem hiding this comment.
We could just say what the requirements are, in PEP 503 we just said (of normalization):
This PEP references the concept of a “normalized” project name. As per PEP 426 the only valid characters in a name are the ASCII alphabet, ASCII numbers, ., -, and _. The name should be lowercased with all runs of the characters ., -, or _ replaced with a single - character. This can be implemented in Python with the re module:
import re def normalize(name): return re.sub(r"[-_.]+", "-", name).lower()
Slightly less nice than a single regex, but pretty easy to understand?
Although I think that this regex works for detecting the cases needed ^[a-z0-9]([a-z0-9]|[_.](?=[a-z0-9])){1,14}[a-z0-9]$ but I'll leave it up to the reader if that's a reasonably readable regex or not 😅 .
There was a problem hiding this comment.
^([a-z0-9]|(?<=[a-z0-9])[_.](?=[a-z0-9])){1,16}$ might be more (or less) readable 😅
There was a problem hiding this comment.
I wouldn't exactly call this readable ... I had to inject it inside https://regex101.com/ to understand it :D
I don't think the trade-off to "harden the regex" is positive honestly...
Doesn't matter how hard you go ... there's always a way to "obscurify" the name you pick...
Sooo where do we stop ? Where is too much regex complexity and where it's reasonable ?
If my objective was to create the least readable "variant" I would start by creating a package with the least readable name : https://packaging.python.org/en/latest/specifications/name-normalization/#name-format
^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])\Z
I just don't see the point of "us" trying to mitigate that. If they want to make it unreadable ... So be it, they will lose their users and some other product will benefit...
EDIT: maybe I'll concede that starting & finishing by [a-z0-9] is probably "reasonable"
^[a-z0-9][0-9a-z_.]{0,14}[a-z0-9]$
| select the best wheel. | ||
|
|
||
| Note that steps 4. through 8. are introduced specifically for variant | ||
| wheels. The remaining steps correspond to the current installer |
There was a problem hiding this comment.
I think this is wrong? At least I believe pip does not select a version prior to evaluating the platform tags for a wheel. I'm not sure what uv does here, but I believe for pip the order of operations is basically:
- Query the remote repositories to get a list of wheels for the package.
- Filter available wheels based on platform compatibility tags (technically they also sort them at this stage too).
- Select a package version meeting the version constraints.
- Select the best wheel for that version.
Basically inverting option 2 and 3.
That's not a trivial difference I don't think, because doing them in the order that pip does means the question of "what happens if there are no files for a given version that are supported on this platform" is just sort of implicitly answered-- pip won't even see that as a version that exists to the resolver.
As it stands, with the order of 2/3 in the PEP, it seems to be missing an answer to the question of what should happen if, after selecting a version in (2), all of the wheels are evaluated (both for platform tags and for variants), and there are no wheels compatible with this machine for that version?
There was a problem hiding this comment.
This is indeed derived from poetry/uv (that doesn't cover pip)
There was a problem hiding this comment.
Does poetry/uv just fail if there's a project that only has a linux wheel for the latest release if you're not on linux?
There was a problem hiding this comment.
Just to be clear, the algorithm doesn't prohibit backtracking if there's no suitable wheel for a version. We're just trying to keep it simple.
There was a problem hiding this comment.
Does poetry/uv just fail if there's a project that only has a linux wheel for the latest release if you're not on linux?
For uv it depends on mode (pip-like vs. universal) and settings (are there required platforms)
| The file should not be changed once it is published, as clients may have | ||
| already cached it or locked to the existing hash. For this reason, if | ||
| the index is responsible for generating the file, it should use some | ||
| mechanism to defer publishing it until the release is fully uploaded |
There was a problem hiding this comment.
Even with PEP 694 there's not really a concept on PyPI of a release being "fully uploaded". You can come back a year later and release a new wheel for an existing release (maybe to support new hardware, or a new Python version, etc).
I don't think it's workable to say that the contents of this file cannot change-- unless we require it to come from users-- but even then it feels wrong to say it can't change? We have caching on /simple/ today without requiring things not change, the HTTP caching semantics deal with that in a reasonable-ish way.
There was a problem hiding this comment.
I pushed for this being kinda immutable for client reasons, but I'm find for skipping this part. Until Upload API 2.0 ships the whole uploading story even without variants can't be fixed :/
There was a problem hiding this comment.
It's non-normative section, so it doesn't have to be 100% possible right now.
Following the suggestions given in the previous PEP 817 thread [1], we have decided to split PEP 817 into a series of smaller PEPs, with the hope that this will make it easier to comprehend the concept and discuss it. This is the first split PEP, that specifically focuses on the low-level details necessary for variants wheels to work, that is: - adding variant label to the filename - storing variant properties in the file - exposing variants on the index - ordering/selecting variants - introducing variant-conditional dependencies via environment markers - exposing variant wheels in `pylock.toml` The PEP keeps variant properties abstract, deferring their governance and determining their compatibility to a subsequent PEP, along with building wheels. We've also significantly cut motivation down (the original is kept in PEP 817 for reference). We've tried to make the "specification" part easier to comprehend, and removed the duplicate "rationale-overview", in favor of a more focused "rationale" section. Compared to the previous iteration of PEP 817, we've also corrected the variant ordering algorithm to handle corner cases better. [1] https://discuss.python.org/t/pep-817-wheel-variants-beyond-platform-tags/105860 Signed-off-by: Michał Górny <mgorny@quansight.com> Co-authored-by: Jonathan Dekhtiar <jonathan@dekhtiar.com> Co-authored-by: Konstantin Schütze <konstin@mailbox.org> Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
69e0c98 to
5c995bf
Compare
The specification should be ready. The remaining sections I'll finish later today.
PEP Preview: https://wheelnext-peps--35.org.readthedocs.build/en/35/pep-9999/