@YorikSar (Contributor) commented May 22, 2025

Add an identifiers attribute to the meta attribute, with the following attributes:

  • cpe with the full CPE string when available
  • cpeParts with the destructured CPE string, allowing it to be overridden whenever needed
  • v1 attribute set with cpe and cpeParts from above and a guarantee of a backwards-compatible interface

Also add the vendor part, as an example, to the packages hello, gcc, and clang.

Related issue: #354012

This is the first step towards adding software identifiers to Nixpkgs. See the issue for the discussion.

Here's my summary of the discussion and decisions made here:

Proposal

Interested actors

  • Nixpkgs authors
    • want to provide necessary details with as little effort as possible
  • Nixpkgs consumers
    • want to be able to use external tools to find known vulnerabilities
  • Tools
    • SBOM generators (bombon, sbomnix)
      • want to rely less on heuristics and have more precise identifiers in generated SBOMs
    • Security tracker
      • wants to match identifiers against external vulnerability databases, avoid heuristics
      • wants to provide tools to enrich Nixpkgs with more data and more precision (detect inconsistencies, generate PRs)
      • wants to become a point of contact with external vulnerability databases, providing them with more data

Options

Identifier types

It seems that the most common identifier formats are CPE and PURL.

CPE

CPE comes from NIST, with the official list of CPE names maintained in NVD.

CPE looks like cpe:2.3:a:gnu:glibc:2.40:1:::::: with parts meaning:

  • CPE version - current version of CPE is 2.3
  • part - a for "application"
  • vendor - can point to the source of the package, or to Nixpkgs itself
  • product - name of the package
  • version - version of the package
  • update - name of the latest update, can be a patch version as in the example
  • edition - any additional specification about the version
  • many more fields that seem to be generally unused for software

CPE allows for very specific identifiers, but requires knowledge of the vendor and of the software's versioning process in order to match NVD data.
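To make the field layout above concrete, here is a minimal Python sketch that splits a CPE 2.3 formatted string into named parts. It is only an illustration for consumers, not part of this PR; the helper name parse_cpe is hypothetical, and it assumes no field contains an escaped colon.

```python
# Field names of a CPE 2.3 formatted string, in order, after the "cpe:2.3" prefix.
CPE_FIELDS = [
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
]


def parse_cpe(cpe: str) -> dict:
    """Split a CPE 2.3 formatted string into a dict of named parts.

    Assumption: no field contains an escaped colon, so a plain split
    on ":" is sufficient for this illustration.
    """
    pieces = cpe.split(":")
    if len(pieces) != 13 or pieces[0] != "cpe" or pieces[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE_FIELDS, pieces[2:]))


parts = parse_cpe("cpe:2.3:a:gnu:glibc:2.40:1::::::")
print(parts["vendor"], parts["product"], parts["version"], parts["update"])
```

The fields after edition come out as empty strings here, which mirrors the "generally unused" tail of the example.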

NVD attaches CPE identifiers to CVE entries, but this mapping is by no means complete. Some CVE entries have no CPE, and some have strangely formatted ones.

If we maintain a good list of CPE identifiers of our own, we could influence NVD to improve the CVE database in this regard.

PURL

PURL stands for "package URL" (not the "persistent URL" of the 1990s). PURLs are adopted by many SBOM-related tools and by the OSV database.

PURLs look like pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie, with the following parts:

  • pkg: is the URL scheme
  • type - comes from the list of known PURL types and defines what namespace and qualifiers mean.
  • namespace - whatever the type requires, for example, it can be empty or point to a specific repository
  • name - name of the package
  • version - version of the package, in type specific format
  • qualifiers - depend on the type

While PURLs are less specific, they can be derived completely from the information about package sources.
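As a rough illustration of the parts listed above, the following Python sketch takes a PURL apart. The function name parse_purl and the sample PURL are hypothetical, and the parsing is deliberately simplified (it does not implement every rule of the PURL specification, such as type-specific normalization).

```python
from urllib.parse import parse_qsl, unquote


def parse_purl(purl: str) -> dict:
    """Split a PURL into type, namespace, name, version and qualifiers.

    Simplified sketch: subpath is dropped and no type-specific rules apply.
    """
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a PURL: {purl!r}")
    rest, _, _subpath = rest.partition("#")   # optional subpath, ignored here
    rest, _, query = rest.partition("?")      # optional qualifiers
    if "@" in rest:
        path, version = rest.rsplit("@", 1)   # version follows the last "@"
    else:
        path, version = rest, None
    segments = [unquote(s) for s in path.strip("/").split("/")]
    if len(segments) < 2:
        raise ValueError(f"PURL needs at least a type and a name: {purl!r}")
    return {
        "type": segments[0],
        "namespace": "/".join(segments[1:-1]) or None,
        "name": segments[-1],
        "version": unquote(version) if version is not None else None,
        "qualifiers": dict(parse_qsl(query)),
    }


# Hypothetical example value, chosen only to exercise every part.
print(parse_purl("pkg:deb/debian/hello@2.10-3?arch=amd64"))
```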

Structuring

We can write and present identifiers either just as a whole string (e.g. in hello.meta.cpe or hello.meta.purl), providing authors with functions to generate appropriate values, for example:

drv = stdenv.mkDerivation (finalAttrs: {
  pname = "foo";
  version = "1.2.3";
  ...
  meta.cpe = lib.mkCPE finalAttrs {vendor = "foo_upstream";};
})

drv.meta.cpe => "cpe:2.3:a:foo_upstream:foo:1.2:3::::::"

The other option is to destructure these values both on input and on output and make all generation logic implicit in mkDerivation, for example:

drv = stdenv.mkDerivation (finalAttrs: {
  pname = "foo";
  version = "1.2.3";
  ...
  meta.cpeParts.vendor = "foo_upstream";
})

drv.meta.cpe => "cpe:2.3:a:foo_upstream:foo:1.2:3::::::"
drv.meta.cpeParts => {
  vendor = "foo_upstream";
  product = "foo";
  version = "1.2";
  update = "3";
}

In both cases Nixpkgs authors would be able to provide just the information that cannot be correctly derived from other arguments.
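The derivation logic that both variants imply can be sketched in Python as follows. This is not the Nixpkgs implementation: the names default_cpe_parts and make_cpe are hypothetical, and the rule of splitting an X.Y.Z version into version "X.Y" and update "Z" is taken from the examples above, with anything else passed through unchanged as an assumption.

```python
import re

# Trailing CPE fields that the examples above leave empty.
EMPTY_FIELDS = ["edition", "language", "sw_edition", "target_sw", "target_hw", "other"]
FIELD_ORDER = ["part", "vendor", "product", "version", "update"] + EMPTY_FIELDS


def default_cpe_parts(pname: str, version: str) -> dict:
    """Derive CPE parts from pname and version alone.

    Assumption: only versions of the form X.Y.Z are split; any other
    version string is used as-is with an empty update field.
    """
    match = re.fullmatch(r"(\d+\.\d+)\.(\d+)", version)
    if match:
        ver, update = match.group(1), match.group(2)
    else:
        ver, update = version, ""
    parts = {"part": "a", "vendor": "", "product": pname,
             "version": ver, "update": update}
    parts.update({name: "" for name in EMPTY_FIELDS})
    return parts


def make_cpe(parts: dict) -> str:
    """Join the parts into a CPE 2.3 formatted string."""
    return "cpe:2.3:" + ":".join(parts[name] for name in FIELD_ORDER)


# The author supplies only what cannot be derived: the vendor.
parts = {**default_cpe_parts("foo", "1.2.3"), "vendor": "foo_upstream"}
print(make_cpe(parts))
```

The override is a plain dictionary merge, which corresponds to setting a single field such as cpeParts.vendor in the second variant.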

Consumers that want to match these identifiers against upstream databases would only need the final identifier. However, in cases where direct matching doesn't yield good enough results, they might rely on heuristics that require the identifier's constituent parts. Both formats are easy to parse, but it can be beneficial not to have to do this additional step.

Tools like Security Tracker want to be able to update these identifiers in Nixpkgs. Finding the place to edit the necessary part seems to be easier in the second example.

Versioning

Consumers that have to support multiple versions of Nixpkgs want to know which set of fields they can expect to read and write in Nixpkgs. Currently this is not supported anywhere, but we could start by namespacing package identifiers with their own version. For example:

drv = stdenv.mkDerivation (finalAttrs: {
  pname = "foo";
  version = "1.2.3";
  ...
  meta.tracking-information.v1.cpeParts.vendor = "foo_upstream";
})

drv.meta.tracking-information.v1.cpe => "cpe:2.3:a:foo_upstream:foo:1.2:3::::::"

Note that this forces versioning on Nixpkgs authors as well as consumers, which increases cognitive load while writing package definitions (which version do I need to write? are there other versions? what do I need to do to support them?). Neither Nixpkgs authors nor "single-version" Nixpkgs consumers (ones that don't collect data over multiple Nixpkgs versions) benefit from such versioning.

We could provide a stable versioned output for this information while keeping input simple:

drv = stdenv.mkDerivation (finalAttrs: {
  pname = "foo";
  version = "1.2.3";
  ...
  meta.tracking-information.cpeParts.vendor = "foo_upstream";
})

drv.meta.tracking-information.cpe => "cpe:2.3:a:foo_upstream:foo:1.2:3::::::"
drv.meta.tracking-information.v1.cpe => "cpe:2.3:a:foo_upstream:foo:1.2:3::::::"

In this example tracking-information.v1 would always be backwards-compatible, providing fields like cpe, cpeParts, purl, purlParts and possibly some new ones in the future.



@github-actions github-actions bot added 6.topic: stdenv Standard environment 6.topic: llvm/clang Issues related to llvmPackages, clangStdenv and related labels May 22, 2025
@h0nIg (Contributor) commented May 26, 2025

> While PURLs are less specific, they can be derived completely from the information about package sources.

That's not true. Take the following examples, which are hosted on github.com and require pkg:github/xxx instead of a tar.gz download. We therefore need the possibility to specify PURLs as well, to avoid losing information. Workarounds like a long-running nixtract to gather meta information are not an option, as that is a reverse-engineering approach.

jq, fetchurl: https://github.com/NixOS/nixpkgs/blob/nixos-25.05/pkgs/by-name/jq/jq/package.nix#L18
python, fetchurl: https://github.com/NixOS/nixpkgs/blob/nixos-25.05/pkgs/development/interpreters/python/cpython/default.nix#L265

The same applies to a Ruby gem (rubygems.org) or a Python library (PyPI) that can also be downloaded through github.com (pkg:github vs. pkg:pypi / pkg:gems). Some packages match only pkg:pypi and not pkg:github, because SBOM scanners have strict requirements about how to detect new releases (tags alone are not enough; tags plus releases are mandatory).

@YorikSar (Contributor, Author) commented:

@h0nIg Good point. I didn't mean to imply that PURLs would not have an option to override them when needed. I decided to start with CPE just as an example, but I'd expect to have a similar structure for PURLs as well. You would be able to override any part of the PURL, or all of it at once. We could also have tools like Security Tracker suggesting fixes for inconsistent autogenerated PURLs, just like for CPEs.

@YorikSar YorikSar force-pushed the cpe branch 3 times, most recently from e48d97e to 9094380 Compare May 27, 2025 15:02
@github-actions github-actions bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. labels May 27, 2025
@fricklerhandwerk (Contributor) commented:

Pinging @Mic92 @Lassulus @RaitoBezarius @arianvp @blitz @nikstur from the original issue FYI

@nikstur (Contributor) commented Jun 14, 2025

> I decided to start with CPE

I think that's the right approach: focus on one type of identifier in the beginning. Trying to build the perfect all-encompassing solution will just leave us deadlocked and unable to make any progress.

@YorikSar YorikSar marked this pull request as ready for review June 19, 2025 14:01
@YorikSar (Contributor, Author) commented:

I've added a section to the manual to cover these meta attributes. There is room to add more identifiers there in the future. Please take a look at it.

@github-actions github-actions bot added 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-darwin: 1 This PR causes 1 package to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 10.rebuild-linux: 1 This PR causes 1 package to rebuild on Linux. 8.has: documentation This PR adds or changes documentation and removed 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. labels Jun 19, 2025
@github-actions github-actions bot added the 12.approvals: 1 This PR was reviewed and approved by one person. label Jun 19, 2025
language = "";
other = "";
} // attrs.meta.identifiers.cpeParts or { };
cpe =
@h0nIg (Contributor) commented Jun 19, 2025

I would make CPE a list.

Reason: https://nvd.nist.gov/vuln/detail/cve-2024-12084

CPE: cpe:2.3:a:samba:rsync:3.2.7:-:*:*:*:*:*:*

The patch version is not contained in cpeParts.update but in cpeParts.version. You should not make any assumptions about how vendors use CPE; therefore, match both combinations by default.
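The "two combinations" idea can be sketched in Python as below. This is only an illustration of the proposal in this comment, not the code in the PR; the function name candidate_cpes is hypothetical, and it assumes that a version with at least three dot-separated components is the only case worth splitting.

```python
def candidate_cpes(vendor: str, product: str, version: str) -> list[str]:
    """Return the CPE variants a vendor might plausibly have registered.

    First the full version with update "-", then (for versions with at
    least three components) the last component moved into the update field.
    """
    tail = ":*:*:*:*:*:*"  # the remaining six fields, left as wildcards
    prefix = f"cpe:2.3:a:{vendor}:{product}"
    candidates = [f"{prefix}:{version}:-{tail}"]
    head, sep, last = version.rpartition(".")
    if sep and "." in head:
        candidates.append(f"{prefix}:{head}:{last}{tail}")
    return candidates


for cpe in candidate_cpes("samba", "rsync", "3.2.7"):
    print(cpe)
```

For rsync 3.2.7 the first candidate is the string NVD lists for the CVE linked above; the second is the variant produced by splitting off the patch version.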

@YorikSar (Contributor, Author) commented:

I disagree. Such ambiguity would only make it harder to maintain a clean source of truth. If rsync doesn't use the update field, we should specify this in its definition. Supporting all variations of all approaches is not feasible. Splitting out the patch version from semantic versions seems like a good enough default, as it covers many packages.

@h0nIg (Contributor) commented Jun 21, 2025

Why do you want to avoid ambiguity? You probably want to upload the generated CPEs to check for findings. You cannot ask the vendor to change its processes, nor do you support the maintainers of such problem components. Some people have a different understanding of semantic versioning, and it is hard to cope with all the different processes.

I can share some experience with the (here unrelated) PURLs: some components are matched with, e.g., pkg:gems/myrubygem and others with pkg:github/org/myrubygem. Conclusion: a list of multiple identifiers, all of which are valid. Where and why do you see a need for postprocessing or further structured access? In addition, once they can access the derivation, don't they have access to all attributes anyway?

Please share examples or reasons, because I think we need a pragmatic approach here.

Contributor commented:

That's a good example: take a list and search (e.g. once per day): https://nvdlib.com/en/latest/v2/CVEv2.html#searching-cves

@YorikSar (Contributor, Author) commented:

I've added examples for rsync and bash. My assumption is that the semantic version splitting works for the majority of cases. If that's not correct, I'd prefer to remove this assumption instead of trying to accommodate all possible variations.

The goal here is to produce a mapping from the package to the identifier of this package. So far you provided an example of rsync that just uses different values in the CPE fields, so we should specify this just for the rsync package. CPE cpe:2.3:a:samba:rsync:3.2:7:*:*:*:*:*:* is not valid for rsync, just as CPE cpe:2.3:a:gnu:hello:2.12.2:-:*:*:*:*:*:* is not valid for hello, so we shouldn't have these guesses attached to these versions of these packages. Tools like Security Tracker can use data from cpeParts field to query vulnerabilities using different formats. The idea is that in the end there will be only one correct CPE for the specific version of the specific package, and we want to provide it and only it in Nixpkgs. In case Security Tracker finds that this identifier is wrong, we should fix the identifier.

@h0nIg (Contributor) commented Jul 1, 2025

> I've added examples for rsync and bash. My assumption is that the semantic version splitting works for the majority of cases. If that's not correct, I'd prefer to remove this assumption instead of trying to accommodate all possible variations.

I have to admit that my proposal to include (.|p) in the regex is too ambitious, given additional cases like this:
https://github.com/NixOS/nixpkgs/blob/master/pkgs/by-name/su/sudo/package.nix#L20
I like pushing a limited number of special cases to the maintainer if they are not covered by the general approach. Pushing the majority of problems with semantic versioning, including patch versions, to the maintainer is not a limited number. Please use a list instead.

> The goal here is to produce a mapping from the package to the identifier of this package

Wrong assumption: there is no mapping from the package to the identifier. Or rather, there is no single mapping.
As said, the goal is to upload a list of CPEs and check for CVEs. This list is to be understood as a query, which needs to use wildcards as much as possible.

> The idea is that in the end there will be only one correct CPE for the specific version of the specific package, and we want to provide it and only it in Nixpkgs

CPE = query parameters

You cannot make assumptions about how people will query for CVEs, even under pressure. Please use a wildcard even for "hello":

so instead of

cpe:2.3:a:gnu:hello:2.12.2:-:*:*:*:*:*:*

you should create the search query with *:

cpe:2.3:a:gnu:hello:2.12.2:*:*:*:*:*:*:*
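The difference between "-" and "*" in a query can be shown with a small Python sketch. It is a simplified reading of CPE matching, assumed for illustration only: a "*" in the query matches any value, every other value (including "-") must match literally, and escaped colons are not handled. The function names are hypothetical.

```python
def cpe_fields(cpe: str) -> list[str]:
    """Return the 11 attribute fields of a CPE 2.3 formatted string."""
    pieces = cpe.split(":")
    if len(pieces) != 13 or pieces[:2] != ["cpe", "2.3"]:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return pieces[2:]


def query_matches(query: str, target: str) -> bool:
    """True if every query field is "*" or equals the target field."""
    return all(
        q == "*" or q == t
        for q, t in zip(cpe_fields(query), cpe_fields(target))
    )


wildcard_query = "cpe:2.3:a:gnu:hello:2.12.2:*:*:*:*:*:*:*"
strict_query = "cpe:2.3:a:gnu:hello:2.12.2:-:*:*:*:*:*:*"
# Hypothetical database entry whose update field is set to "1".
entry = "cpe:2.3:a:gnu:hello:2.12.2:1:*:*:*:*:*:*"

print(query_matches(wildcard_query, entry))
print(query_matches(strict_query, entry))
```

Under these assumptions the wildcard query finds the entry while the query with "-" in the update field does not, which is the behaviour argued for here.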

> Tools like Security Tracker can use data from cpeParts field to query vulnerabilities using different formats

@fricklerhandwerk can you please share some insights and/or requirements? Do you really want to parse cpeParts and postprocess it, or do you just want to upload the list of CPEs for vulnerability pre-matching?

Member commented:

Could it be that we have two pieces of software in one derivation so that we need two identifiers for them? I think that could happen, so making it a list is just future proof even if it is only used by 5 packages and not even by default.

h0nIg-tailscale

This comment was marked as duplicate.

@philiptaron philiptaron added the 1.severity: significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. label Aug 22, 2025
@philiptaron (Contributor) commented:

My belief, after reading through the feedback, is that this is good to merge and iterate on once @infinisil approves it. Is that correct?

@wegank wegank added the 2.status: merge conflict This PR has merge conflicts with the target branch label Aug 23, 2025
@h0nIg (Contributor) commented Aug 28, 2025

> My belief, after reading through the feedback, is that this is good to merge and iterate on once @infinisil approves it. Is that correct?

Yes. Can this get merged now? Or can @infinisil at least approve?

@oneingan oneingan requested a review from infinisil August 29, 2025 02:07
Add `identifiers` attr to `meta` attribute with following attrs:
* `cpe` with the full CPE string when available
* `possibleCPEs` with the list of potential CPEs when not all
  information is provided
* `cpeParts` with the destructured CPE string, allowing to override it
  whenever needed
* `v1` attribute set with `cpe` and `cpeParts` from above and a
  guarantee of a backwards-compatible interface

Related issue: NixOS#354012
@nixpkgs-ci nixpkgs-ci bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Aug 29, 2025
@infinisil (Member) left a comment:

Sorry for the delay, looks good to me too!

@infinisil infinisil merged commit e83e8da into NixOS:master Aug 29, 2025
28 of 31 checks passed
@infinisil infinisil deleted the cpe branch August 29, 2025 19:47
@github-project-automation github-project-automation bot moved this from In Progress to Done in Stdenv Aug 29, 2025
@nixpkgs-ci bot (Contributor) commented Aug 29, 2025

Backport failed for release-25.05, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release-25.05
git worktree add -d .worktree/backport-409797-to-release-25.05 origin/release-25.05
cd .worktree/backport-409797-to-release-25.05
git switch --create backport-409797-to-release-25.05
git cherry-pick -x b0ce3dc09f5146e831f1116e95b99017fc0f4a64

@h0nIg (Contributor) commented Aug 29, 2025

@YorikSar @fricklerhandwerk @infinisil @nikstur I would be happy to get a review and approval for the matching PURL parts as well: #421125. Thank you.

@YorikSar (Contributor, Author) commented Aug 29, 2025

I did a manual backport to 25.05 here: #438385

@K900 (Contributor) commented Aug 30, 2025

@vcunat (Member) commented Aug 30, 2025

Trivial reproducer:

$ nix-env -f. -qa --meta --xml -A a4 >/dev/null
derivation 'a4-0.2.3' has invalid meta attribute 'identifiers'

@vcunat (Member) commented Aug 30, 2025

I believe that nix-env dislikes when meta contains null values (possibly nested in deeper attributes). The state of the PR right now:

nix-repl> :p a4.meta.identifiers
{
  cpe = null;
  cpeParts = {
    edition = "*";
    language = "*";
    other = "*";
    part = "a";
    product = "a4";
    sw_edition = "*";
    target_hw = "*";
    target_sw = "*";
    update = null;
    vendor = null;
    version = null;
  };
  possibleCPEs = [ ];
  v1 = {
    cpe = null;
    cpeParts = «repeated»;
    possibleCPEs = [ ];
  };
}

@YorikSar (Contributor, Author) commented:

> I believe that nix-env dislikes when meta contains null values (possibly nested in deeper attributes).

You are correct, the error message comes from here: https://github.com/NixOS/nix/blob/401e7fe3ad2d01bab628c50bb34450e29d95882b/src/nix/nix-env/nix-env.cc#L1227

It's bad that the error was only discovered after the merge, since we don't seem to use nix-env in GitHub CI. It is also a strange requirement on nix-env's part. I'll post a fixed version of this later.

@vcunat (Member) commented Aug 30, 2025

The problem in nix-env doesn't manifest until you force it to print meta, so it's easy to miss.

cpe = (makeCPE guessedParts);
}
) possibleCPEPartsFuns;
v1 = { inherit cpeParts cpe possibleCPEs; };
Contributor commented:

v1 = filterAttrsRecursive (n: v: v != null) {
should solve these nix-env problems?

@YorikSar (Contributor, Author) commented:

It would, but it's better to not add null values there in the first place instead. filterAttrsRecursive is quite expensive to run for each derivation.
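The two approaches discussed here can be contrasted in a language-neutral way with a short Python sketch. It is not the Nix code from the PR; drop_none and build_identifiers are hypothetical names, and the attribute layout is reduced to a few fields for illustration.

```python
def drop_none(value):
    """Recursively remove None values: a full extra pass over the result."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    return value


def build_identifiers(product, vendor=None, version=None, update=None):
    """Build the attribute set without ever inserting unknown values."""
    cpe_parts = {"part": "a", "product": product}
    for key, val in (("vendor", vendor), ("version", version), ("update", update)):
        if val is not None:
            cpe_parts[key] = val
    return {"cpeParts": cpe_parts}


# Filtering after the fact walks every nested attribute of every package...
print(drop_none({"cpe": None, "cpeParts": {"product": "a4", "vendor": None}}))
# ...while building without unknown fields needs no second pass.
print(build_identifiers("a4"))
```

Both produce a structure without null values; the second does so without a recursive traversal per derivation, which is the cost concern raised above.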

YorikSar added a commit to tweag/nixpkgs that referenced this pull request Sep 1, 2025
nix-env writes a warning for each derivation that has null in its meta
values, so fields without known values are removed from the result.

Fixes issue raised by @K900 in NixOS#409797 (comment)
@YorikSar (Contributor, Author) left a comment:

PR that reapplies this with a fix for nix-env: #439074

YorikSar added a commit to tweag/nixpkgs that referenced this pull request Sep 12, 2025
nix-env writes a warning for each derivation that has null in its meta
values, so fields without known values are removed from the result.

Fixes issue raised by @K900 in NixOS#409797 (comment)
YorikSar added a commit to tweag/nixpkgs that referenced this pull request Sep 15, 2025
nix-env writes a warning for each derivation that has null in its meta
values, so fields without known values are removed from the result.

Fixes issue raised by @K900 in NixOS#409797 (comment)
YorikSar added a commit to tweag/nixpkgs that referenced this pull request Sep 15, 2025
nix-env writes a warning for each derivation that has null in its meta
values, so fields without known values are removed from the result.

Fixes issue raised by @K900 in NixOS#409797 (comment)

(cherry picked from commit a178fd8)
@mdaniels5757 mdaniels5757 added the 8.has: port to stable This PR already has a backport to the stable release. label Oct 4, 2025