Skip to content

Conversation

weihanglo
Copy link
Member

@weihanglo weihanglo commented Aug 17, 2025

What does this PR try to resolve?

Fixes #2039

rendered lint doc

Add a new lint invalid_spdx_license_expression

How to test and review this PR?

New tests should already cover

  • all valid/invalid examples in Lint docs
  • EditionFuture (2027?) deny-by-default
  • workspace inherited package.license

Should this lint workspace.package.license even nobody inherits from it?

I guess not, and that should be covered by an unused workspace inherit lint.

spdx crate version pinning

This adds a new dependency spdx but I didn't put exact pin (=) on the version. crates.io use (0.10.9). SPDX crates.io uses is still 2.3 but the latest spec version is v3. Anyway it seems that [email protected] is for SPDX spec v2.3, and [email protected] is for SPDX sped v3. I feel like it is fine we don't pin it at this moment.

https://github.com/rust-lang/crates.io/blob/bc421ae1ade7eed6d1c046456f52292ab9829095/src/licenses.rs

Name bikeshedding

  1. invalid_spdx_license_expression
  2. invalid_spdx_license_expr
  3. invalid_license_expr
  4. non_spdx_license_expression
  5. non_spdx_license_expr
  6. ???

BTW I can kinda feel that we need to pull out the lint machinery into a separate helper crate once we have three lints. I am repeating something there…

@rustbot
Copy link
Collaborator

rustbot commented Aug 17, 2025

r? @ehuss

rustbot has assigned @ehuss.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-documenting-cargo-itself Area: Cargo's documentation A-workspaces Area: workspaces S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 17, 2025
@weihanglo
Copy link
Member Author

r? Muscraft

You might be interested in reviewing new lints.

@rustbot rustbot assigned Muscraft and unassigned ehuss Aug 17, 2025
@weihanglo weihanglo added A-lints Area: rustc lint configuration A-license Area: license handling and removed A-workspaces Area: workspaces labels Aug 17, 2025
Comment on lines +283 to +288
[WARNING] invalid SPDX license expression: `MIT / Apache-2.0`
--> Cargo.toml:6:16
|
6 | license = "MIT / Apache-2.0"
| ------------ invalid character(s)
|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be "invalid characters" or "deprecated"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message is from spdx, should we tweak them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to treat / separately (which I thought was what was discussed), we'd likely have to do a parse fallback scheme (parse with strict, and if it fails, parse with / allowed and see if that fails).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was keeping the first implementation simple.

We can certainly perform basic checks for common mistake like lowercase OR/AND/WITH, or slash. If that is a requirement, can do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for an invalid license identifier?

// We must be a subset of crates.io's parse mode:
// https://github.com/rust-lang/crates.io/blob/bc421ae1/src/licenses.rs#L3-L8
// we don't want to allow something that would be rejected by crates.io
let Err(e) = spdx::Expression::parse_mode(license, spdx::ParseMode::STRICT) else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't just checking the validity of the expression but whether the identifier is from a known list or not

https://github.com/EmbarkStudios/spdx/blob/e4eac6db2030a7deeca57cc7f5d0485598d8c9e5/src/lexer.rs#L253-L282

https://github.com/EmbarkStudios/spdx/blob/main/src/identifiers.rs#L18

I thought we only wanted to check syntax (and maybe have a separate lint for being a known identifier). Unfortunately, it looks like there isn't a way to do that with spdx.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good for checking the expression, and yeah spdx doesn't offer that 😞. It might be fine as normally people want known identifiers. The remaining questions are:

  • If we have a new lint for checking known identifier, is it a breaking change if this one turns into a expression-only lint?
  • Will we have per-line configuration so people can allowlist their identifiers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main concern is people using a new identifier that cargo doesn't support yet.

It would be bad to have to turn off the syntax lint "temporarily" (if people remember to turn it back on). These also seem like different levels of concern. For example, we were talking about transitioning this to an error on an edition but we likely shouldn't do that for identifiers, leaving that as a warning or allow.

If we have a new lint for checking known identifier, is it a breaking change if this one turns into a expression-only lint?

Lints are allowed to change, unsure how much though.

Will we have per-line configuration so people can allowlist their identifiers?

Having a way to add new identifers through lint config might be a reasonable way to mitigate this but that also requires a change to spdx iiuc

It sounds like we'd need a change either way to spdx to stabilize this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing spdx is definitely the right approach, totally agreed.

However, in practice,

  • I doubt people will use new license identifier right after it gets into SPDX spec.
  • Even if they do, it is likely pretty rare.
  • Will crates.io fail an upload due to license parse error? Asked because if yes, then they will still fail during publish, and this lint must adhere with what crates.io is doing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will crates.io fail an upload due to license parse error? Asked because if yes, then they will still fail during publish, and this lint must adhere with what crates.io is doing.

We can be less strict than crates.io as we support multiple registries.

We also have different update requirements than crates.io (every 6 weeks, no updates to previous version vs crates.io effectively living at HEAD)

Copy link
Member Author

@weihanglo weihanglo Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had a discussion with @Muscraft. Let's say we have two lints:

  • invalid_spdx_license_expression
  • invalid_spdx_license_identifier

And here we have dependency problem betweewn two lints dependency issue. If the expression is invalid, identifiers can't be meaningfully validated.

To fix this issue, we have some ideas:

  • Have two lints. Run these lints in one function with two passes: first expression, then identifier.
    • Lint level: expression > identifier -> No issue. Just fire lints.
    • Lint level: expression < identifier -> We may need a special diagnostic saying "identifiers lint could not be checked due to invalid expression".
  • Make just one lint but have its failure cases be configurable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'd go with the two lints in one function.

Might also be good to get someone's opinion who has dealt more with licensing, like @joshtriplett

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also the discussion in spdx crate with its author (also the author of cargo-deny): EmbarkStudios/spdx#82.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for the idea of separating "this is a valid SPDX expression" from "all the identifiers in this SPDX expression are known". I'd also love to see this allow for future extensibility, so that people can say "here's a license file, let this new term refer to it" and then write an expression that uses that term. (We should probably check with SPDX at some point about whether there's a reserved portion of the identifier namespace.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put some of my thoughts on the linked issue EmbarkStudios/spdx#82 (comment).

Added a bit more:

  • It is not uncommon one allow a lint because it has false-positive / doesn't meet their their requirements.
  • The more lints the better? Probably yes, but the more lints with similar functionalities, the more confusing (on both implementation and user side).
  • Setting up lint rules is kinda showing a stance of the team. If we being strict on SPDX license, people follow, if we prove invalid identifiers that implies we think that should be a thing people want.
  • Configuration are a bit better as people may want only OSI-approved or copyleft licenses.
  • The initial idea of SPDX dual-license inconsistency #2039 is for SPDX license syntax consistency. I feel like we are drifting to a broader licensing issue now 😆.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-documenting-cargo-itself Area: Cargo's documentation A-license Area: license handling A-lints Area: rustc lint configuration S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SPDX dual-license inconsistency
6 participants