stdenv: Add CPE fields to meta #409797
That's not true. Take the following examples, which are hosted on github.com and require pkg:github/xxx instead of a tar.gz download. We therefore need the possibility to specify PURLs as well, to avoid losing information. Workarounds like a long-running nixtract run to gather meta information are not an option, as that is a reverse-engineering approach. jq, fetchurl: https://github.com/NixOS/nixpkgs/blob/nixos-25.05/pkgs/by-name/jq/jq/package.nix#L18 The same applies to a Ruby gem (rubygems.org) or a Python library (PyPI), which can be downloaded through github.com as well (pkg:github vs. pkg:pypi / pkg:gem). Some packages match only "pkg:pypi" and not pkg:github, because SBOM scanners have strict requirements about how to detect new releases (not just tags; tags + releases are mandatory).

@h0nIg Good point. I didn't mean to imply that PURLs would not have an option to override them when needed. I decided to start with CPE just as an example, but I'd expect to have a similar structure for PURLs as well. You would be able to override any part of the PURL, or all of it at once. We could also have tools like Security Tracker suggesting fixes for inconsistent autogenerated PURLs, just like for CPEs.
e48d97e to 9094380
I think that's the right approach: focus on one type of identifier in the beginning. Trying to build the perfect all-encompassing solution will just leave us deadlocked and unable to make any progress.

I've added a section in the manual to cover these meta attributes. There is space to add more identifiers in the future there. Please take a look at it.
pkgs/stdenv/generic/check-meta.nix (outdated)
    language = "";
    other = "";
    } // attrs.meta.identifiers.cpeParts or { };
    cpe =
I would make CPE a list.
Reason: https://nvd.nist.gov/vuln/detail/cve-2024-12084
CPE: cpe:2.3:a:samba:rsync:3.2.7:-:*:*:*:*:*:*
The patch version is not contained in cpeParts.update but in cpeParts.version. You should not make any assumptions about how vendors use CPE; therefore, match both combinations by default.
I disagree. Such ambiguity would only make it harder to support a clean source of truth. If rsync doesn't use the update field, we should specify this in its definition. Supporting all variations of all approaches is not feasible. Splitting the patch version out of semantic versions seems like a good enough default, as it covers many packages.
Why do you want to avoid ambiguity? You probably want to upload the generated CPEs to check for findings. You cannot ask the vendor to change its processes, nor can you support the maintainers of such problem components. Some people have a different understanding of semantic versioning, and it is hard even to cope with the different processes.
I can share some experience with the (here unrelated) PURLs: some components are matched with e.g. pkg:gem/myrubygem and others with pkg:github/org/myrubygem. Conclusion: use a list of multiple identifiers, all of which are valid. Where and why do you see a need for postprocessing or further structured access? In addition, once consumers can access the derivation, they have access to all attributes anyway.
Please share examples or reasons, because I think we need a pragmatic approach here.
That's a good example: take a list and search (e.g. once per day): https://nvdlib.com/en/latest/v2/CVEv2.html#searching-cves
I've added examples for rsync and bash. My assumption is that the semantic version splitting works for the majority of cases. If that's not correct, I'd prefer to remove this assumption instead of trying to accommodate all possible variations.
The goal here is to produce a mapping from the package to the identifier of this package. So far you provided an example of rsync that just uses different values in the CPE fields, so we should specify this just for the rsync package. CPE cpe:2.3:a:samba:rsync:3.2:7:*:*:*:*:*:* is not valid for rsync, just as CPE cpe:2.3:a:gnu:hello:2.12.2:-:*:*:*:*:*:* is not valid for hello, so we shouldn't have these guesses attached to these versions of these packages. Tools like Security Tracker can use data from cpeParts field to query vulnerabilities using different formats. The idea is that in the end there will be only one correct CPE for the specific version of the specific package, and we want to provide it and only it in Nixpkgs. In case Security Tracker finds that this identifier is wrong, we should fix the identifier.
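The default described in this comment can be sketched as follows. This is a minimal illustration, not the code from this PR: `splitVersion` is a hypothetical helper, and the choice of `*` for the update field of versions that do not split is an assumption of the sketch. The rsync override uses the values from the NVD entry quoted earlier in the thread.

```nix
let
  # Hypothetical helper: "2.12.2" becomes { version = "2.12"; update = "2"; }.
  # Versions that are not exactly MAJOR.MINOR.PATCH are kept whole.
  splitVersion =
    version:
    let
      # builtins.match must match the whole string and returns the groups.
      m = builtins.match "([0-9]+\\.[0-9]+)\\.([0-9]+)" version;
    in
    if m == null then
      {
        inherit version;
        update = "*"; # assumption: unknown update is left as a wildcard
      }
    else
      {
        version = builtins.elemAt m 0;
        update = builtins.elemAt m 1;
      };
in
{
  # Default behaviour: the patch component moves to `update`.
  hello = splitVersion "2.12.2"; # { update = "2"; version = "2.12"; }

  # Per-package override for rsync, whose CPE keeps the full version
  # and marks the update field as not applicable ("-").
  rsync = splitVersion "3.2.7" // {
    version = "3.2.7";
    update = "-";
  };
}
```

Saved to a file, the expression can be evaluated with `nix-instantiate --eval --strict`. The point of the sketch is that the exception lives in one package definition rather than in a list of guesses attached to every package.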
> I've added examples for rsync and bash. My assumption is that the semantic version splitting works for the majority of cases. If that's not correct, I'd prefer to remove this assumption instead of trying to accommodate all possible variations.

I have to admit that my proposal to include the (.|p) in the regex is too ambitious, given additional cases like this:
https://github.com/NixOS/nixpkgs/blob/master/pkgs/by-name/su/sudo/package.nix#L20
I like pushing a limited number of special cases to the maintainer if they are not covered by the general approach. Pushing the majority of problems with semantic versioning, including patch versions, to the maintainer is not a limited number. Please use a list instead.

> The goal here is to produce a mapping from the package to the identifier of this package

Wrong assumption: there is no mapping from the package to the identifier. Or rather, there is no single mapping.
As said, the goal is to upload a list of CPEs and check for CVEs. This list is to be understood as a query, which needs to use wildcards as much as possible.

> The idea is that in the end there will be only one correct CPE for the specific version of the specific package, and we want to provide it and only it in Nixpkgs

CPE = query parameters.
You cannot make assumptions about how people will request CVEs. Even under pressure, please use a wildcard, even for "hello":
so instead of
cpe:2.3:a:gnu:hello:2.12.2:-:*:*:*:*:*:*
you should create the search query with *:
cpe:2.3:a:gnu:hello:2.12.2:*:*:*:*:*:*:*

> Tools like Security Tracker can use data from cpeParts field to query vulnerabilities using different formats

@fricklerhandwerk can you please share some insights and/or requirements? Do you really want to parse cpeParts and postprocess it, or do you just want to upload the list of CPEs for vulnerability pre-matching?
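The difference between the two strings above comes down to what is placed in the update field. The following sketch shows how both could be produced from the same parts. It is an illustration only: the `makeCPE` defined here is a stand-in written for this example, not the function of the same name in the PR, and it does no escaping of special characters.

```nix
let
  # Field order of a CPE 2.3 formatted string after the "cpe:2.3" prefix.
  fields = [
    "part"
    "vendor"
    "product"
    "version"
    "update"
    "edition"
    "language"
    "sw_edition"
    "target_sw"
    "target_hw"
    "other"
  ];

  # Stand-in helper: any field that is not given becomes the wildcard "*".
  makeCPE =
    parts: "cpe:2.3:" + builtins.concatStringsSep ":" (map (f: parts.${f} or "*") fields);

  hello = {
    part = "a";
    vendor = "gnu";
    product = "hello";
    version = "2.12.2";
  };
in
{
  # Exact name: update is explicitly "not applicable".
  exact = makeCPE (hello // { update = "-"; });
  # "cpe:2.3:a:gnu:hello:2.12.2:-:*:*:*:*:*:*"

  # Query form: update is left unset and therefore matches anything.
  query = makeCPE hello;
  # "cpe:2.3:a:gnu:hello:2.12.2:*:*:*:*:*:*:*"
}
```

With structured parts available, a consumer can widen or narrow the query itself, which is the argument made for exposing `cpeParts` rather than a list of pre-built strings.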
Could it be that we have two pieces of software in one derivation so that we need two identifiers for them? I think that could happen, so making it a list is just future proof even if it is only used by 5 packages and not even by default.
My belief, after reading through the feedback, is that this is good to merge and iterate on once @infinisil approves it. Is that correct?

yes, can this get merged now? or at least can @infinisil approve?
Add `identifiers` attr to `meta` attribute with following attrs:
* `cpe` with the full CPE string when available
* `possibleCPEs` with the list of potential CPEs when not all information is provided
* `cpeParts` with the destructured CPE string, allowing to override it whenever needed
* `v1` attribute set with `cpe` and `cpeParts` from above and a guarantee of a backwards-compatible interface

Related issue: NixOS#354012
Sorry for the delay, looks good to me too!
Backport failed for Please cherry-pick the changes locally and resolve any conflicts.
git fetch origin release-25.05
git worktree add -d .worktree/backport-409797-to-release-25.05 origin/release-25.05
cd .worktree/backport-409797-to-release-25.05
git switch --create backport-409797-to-release-25.05
git cherry-pick -x b0ce3dc09f5146e831f1116e95b99017fc0f4a64
@YorikSar @fricklerhandwerk @infinisil @nikstur I'd be happy to get a review and approval for the matching PURL parts as well: #421125, thank you.

I did a manual backport to 25.05 here: #438385
Trivial reproducer:

I believe that nix-env dislikes when meta contains

You are correct, the error message comes from here: https://github.com/NixOS/nix/blob/401e7fe3ad2d01bab628c50bb34450e29d95882b/src/nix/nix-env/nix-env.cc#L1227 It’s bad that the error was only discovered after merge since we don’t seem to use nix-env in GitHub CI. Is also a strange requirement on the

The problem in
    cpe = (makeCPE guessedParts);
    }
    ) possibleCPEPartsFuns;
    v1 = { inherit cpeParts cpe possibleCPEs; };
v1 = filterAttrsRecursive (n: v: v != null) {
should solve these nix-env problems?
It would, but it's better to not add null values there in the first place instead. filterAttrsRecursive is quite expensive to run for each derivation.
nix-env writes a warning for each derivation that has null in its meta values, so fields without known values are removed from the result. Fixes issue raised by @K900 in NixOS#409797 (comment)
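The approach of never creating null values, as opposed to filtering them out afterwards, could look roughly like this. It is a sketch under assumptions: `mkIdentifiers` and its arguments are hypothetical names, `optionalAttrs` is a local stand-in for `lib.optionalAttrs` so that the expression needs no imports, and the real change in check-meta.nix may be structured differently.

```nix
let
  # Stand-in for lib.optionalAttrs, so the sketch needs no imports.
  optionalAttrs = cond: attrs: if cond then attrs else { };

  # Hypothetical builder: `vendor` may be unknown (null).
  mkIdentifiers =
    {
      product,
      version,
      vendor ? null,
    }:
    let
      # Known fields only: an unknown vendor is left out, not set to null.
      cpeParts = {
        part = "a";
        inherit product version;
      }
      // optionalAttrs (vendor != null) { inherit vendor; };
    in
    {
      v1 = {
        inherit cpeParts;
      }
      # The full CPE string is added only when the vendor is known.
      // optionalAttrs (vendor != null) {
        cpe = "cpe:2.3:a:${vendor}:${product}:${version}:*:*:*:*:*:*:*";
      };
    };
in
{
  # v1 has both `cpe` and `cpeParts`.
  known = mkIdentifiers {
    product = "hello";
    version = "2.12.2";
    vendor = "gnu";
  };

  # v1 has only `cpeParts`, and no attribute anywhere is null.
  unknown = mkIdentifiers {
    product = "example";
    version = "1.0";
  };
}
```

Each conditional merge costs one `//` per derivation, whereas a recursive filter has to walk every attribute of the result, which is the cost concern raised above.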
PR that reapplies this with a fix for nix-env: #439074
nix-env writes a warning for each derivation that has null in its meta values, so fields without known values are removed from the result. Fixes issue raised by @K900 in NixOS#409797 (comment) (cherry picked from commit a178fd8)
Add `identifiers` attr to `meta` attribute with following attrs:
* `cpe` with the full CPE string when available
* `cpeParts` with the destructured CPE string, allowing to override it whenever needed
* `v1` attribute set with `cpe` and `cpeParts` from above and a guarantee of a backwards-compatible interface

Also add `vendor` part as an example to packages: `hello`, `gcc`, `clang`.

Related issue: #354012
This is the first step towards adding software identifiers to Nixpkgs. See issue for the discussion.
Here's my summary of the discussion and decisions made here:
Proposal
Interested actors
Options
Identifier types
It seems that the most common identifier formats are CPE and PURL.
CPE
CPE comes from NIST, with the official list of CPE names maintained in NVD.
CPE looks like `cpe:2.3:a:gnu:glibc:2.40:1:*:*:*:*:*:*` with parts meaning:
* `2.3` is the CPE version
* `a` for "application"

CPE allows being very specific, but requires knowledge of the vendor and the versioning process of the software to match NVD data.
NVD attaches CPE identifiers to CVE entries, but this mapping is by no means complete. Some CVE entries have no CPE, and some have strangely formatted ones.
If we maintain a good list of CPE identifiers of our own, we could influence NVD to make the CVE database better in this regard.
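To make the positions in a CPE string concrete, here is a small sketch that splits a CPE 2.3 formatted string into named fields. `parseCPE` is a hypothetical helper written for this summary. It assumes that no field contains an escaped colon, which real CPE names may have, so it is not a complete parser.

```nix
let
  # Field order of a CPE 2.3 formatted string after the "cpe:2.3" prefix.
  fields = [
    "part"
    "vendor"
    "product"
    "version"
    "update"
    "edition"
    "language"
    "sw_edition"
    "target_sw"
    "target_hw"
    "other"
  ];

  parseCPE =
    cpe:
    let
      # builtins.split interleaves the text between separators (strings)
      # with the separator matches (lists); keep only the text.
      tokens = builtins.filter builtins.isString (builtins.split ":" cpe);
    in
    builtins.listToAttrs (
      builtins.genList (i: {
        name = builtins.elemAt fields i;
        # Skip the two leading tokens, "cpe" and "2.3".
        value = builtins.elemAt tokens (i + 2);
      }) (builtins.length fields)
    );
in
parseCPE "cpe:2.3:a:gnu:glibc:2.40:1:*:*:*:*:*:*"
```

For the glibc string this yields `part = "a"`, `vendor = "gnu"`, `product = "glibc"`, `version = "2.40"` and `update = "1"`, with every remaining field set to `"*"`.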
PURL
PURL stands for "package URL" (not "persistent URL" from the 1990s). PURLs are adopted by many SBOM-related tools and the OSV database.
PURLs look like `pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie`, with mandatory parts:
* `pkg:` is the URL scheme

While PURLs are less specific, they can be derived completely from the information about package sources.
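As an illustration of deriving a PURL from source information, the sketch below assembles one from a type, an optional namespace, a name and an optional version. `mkPurl` is a hypothetical helper, not part of Nixpkgs. It leaves out qualifiers, subpaths and percent-encoding, and the package names are the placeholders used earlier in this conversation.

```nix
let
  # Hypothetical helper covering only type, namespace, name and version.
  mkPurl =
    {
      type,
      name,
      namespace ? null,
      version ? null,
    }:
    "pkg:${type}/"
    + (if namespace == null then "" else "${namespace}/")
    + name
    + (if version == null then "" else "@${version}");
in
{
  # The same software identified through two different ecosystems.
  viaGithub = mkPurl {
    type = "github";
    namespace = "org";
    name = "myrubygem";
  }; # "pkg:github/org/myrubygem"

  viaRubygems = mkPurl {
    type = "gem";
    name = "myrubygem";
    version = "1.0.0";
  }; # "pkg:gem/myrubygem@1.0.0"
}
```

The two results show why an override is needed: which ecosystem a scanner expects cannot always be inferred from the fetcher alone.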
Structuring
We can write and present identifiers either just as a whole string (e.g. in `hello.meta.cpe` or `hello.meta.purl`), providing authors with functions to generate appropriate values, for example:

The other option is to destructure these values both on input and on output and make all generation logic implicit in `mkDerivation`, for example:

In both cases Nixpkgs authors would be able to provide just the information that cannot be correctly derived from other arguments.
Consumers that want to match these identifiers against upstream databases would only need the final identifier available. However, in cases when direct matching doesn't yield good enough results, they might rely on heuristics that require the identifier's constituent parts. Both formats are easy to parse, but it can be beneficial not to have to do this additional step.
Tools like Security Tracker want to be able to update these identifiers in Nixpkgs. Finding the place to edit the necessary part seems to be easier in the second example.
Versioning
Consumers that have to support multiple versions of Nixpkgs want to distinguish which set of fields they can expect to read and write in Nixpkgs. Currently this is not supported anywhere, but we could start with namespacing package identifiers with their own version. For example:
Note that this forces versioning on Nixpkgs authors as well as consumers, which increases cognitive load while writing package definitions (which version do I need to write? are there other versions? what do I need to do to support them?). Neither Nixpkgs authors nor "single-version" Nixpkgs consumers (ones that don't collect data over multiple Nixpkgs versions) benefit from such versioning.
We could provide a stable versioned output for this information while keeping input simple:
In this example `tracking-information.v1` would always be backwards-compatible, providing fields like `cpe`, `cpeParts`, `purl`, `purlParts` and possibly some new ones in the future.