-
Notifications
You must be signed in to change notification settings - Fork 2.6k
feat(lints): new lint invalid_spdx_license_expression #15847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
r? Muscraft You might be interested in reviewing new lints. |
[WARNING] invalid SPDX license expression: `MIT / Apache-2.0` | ||
--> Cargo.toml:6:16 | ||
| | ||
6 | license = "MIT / Apache-2.0" | ||
| ------------ invalid character(s) | ||
| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be "invalid characters" or "deprecated"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message is from spdx
, should we tweak them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to treat /
separately (which I thought was what was discussed), we'd likely have to do a parse fallback scheme (parse with strict, and if it fails, parse with /
allowed and see if that fails).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was keeping the first implementation simple.
We can certainly perform basic checks for common mistake like lowercase OR/AND/WITH, or slash. If that is a requirement, can do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for an invalid license identifier?
// We must be a subset of crates.io's parse mode: | ||
// https://github.com/rust-lang/crates.io/blob/bc421ae1/src/licenses.rs#L3-L8 | ||
// we don't want to allow something that would be rejected by crates.io | ||
let Err(e) = spdx::Expression::parse_mode(license, spdx::ParseMode::STRICT) else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't just checking the validity of the expression but whether the identifier is from a known list or not
https://github.com/EmbarkStudios/spdx/blob/main/src/identifiers.rs#L18
I thought we only wanted to check syntax (and maybe have a separate lint for being a known identifier). Unfortunately, it looks like there isn't a way to do that with spdx
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good for checking the expression, and yeah spdx doesn't offer that 😞. It might be fine as normally people want known identifiers. The remaining questions are:
- If we have a new lint for checking known identifier, is it a breaking change if this one turns into a expression-only lint?
- Will we have per-line configuration so people can allowlist their identifiers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the main concern is people using a new identifier that cargo doesn't support yet.
It would be bad to have to turn off the syntax lint "temporarily" (if people remember to turn it back on). These also seem like different levels of concern. For example, we were talking about transitioning this to an error on an edition but we likely shouldn't do that for identifiers, leaving that as a warning or allow.
If we have a new lint for checking known identifier, is it a breaking change if this one turns into a expression-only lint?
Lints are allowed to change, unsure how much though.
Will we have per-line configuration so people can allowlist their identifiers?
Having a way to add new identifers through lint config might be a reasonable way to mitigate this but that also requires a change to spdx
iiuc
It sounds like we'd need a change either way to spdx
to stabilize this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing spdx
is definitely the right approach, totally agreed.
However, in practice,
- I doubt people will use new license identifier right after it gets into SPDX spec.
- Even if they do, it is likely pretty rare.
- Will crates.io fail an upload due to license parse error? Asked because if yes, then they will still fail during publish, and this lint must adhere with what crates.io is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will crates.io fail an upload due to license parse error? Asked because if yes, then they will still fail during publish, and this lint must adhere with what crates.io is doing.
We can be less strict than crates.io as we support multiple registries.
We also have different update requirements than crates.io (every 6 weeks, no updates to previous version vs crates.io effectively living at HEAD)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just had a discussion with @Muscraft. Let's say we have two lints:
- invalid_spdx_license_expression
- invalid_spdx_license_identifier
And here we have dependency problem betweewn two lints dependency issue. If the expression is invalid, identifiers can't be meaningfully validated.
To fix this issue, we have some ideas:
- Have two lints. Run these lints in one function with two passes: first expression, then identifier.
- Lint level:
expression > identifier
-> No issue. Just fire lints. - Lint level:
expression < identifier
-> We may need a special diagnostic saying "identifiers lint could not be checked due to invalid expression".
- Lint level:
- Make just one lint but have its failure cases be configurable
- For example, exposing ParseMode but add config for invalid identifiers and else.
- Clippy also has lints with lots of configuration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I'd go with the two lints in one function.
Might also be good to get someone's opinion who has dealt more with licensing, like @joshtriplett
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See also the discussion in spdx
crate with its author (also the author of cargo-deny
): EmbarkStudios/spdx#82.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for the idea of separating "this is a valid SPDX expression" from "all the identifiers in this SPDX expression are known". I'd also love to see this allow for future extensibility, so that people can say "here's a license file, let this new term refer to it" and then write an expression that uses that term. (We should probably check with SPDX at some point about whether there's a reserved portion of the identifier namespace.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put some of my thoughts on the linked issue EmbarkStudios/spdx#82 (comment).
Added a bit more:
- It is not uncommon one allow a lint because it has false-positive / doesn't meet their their requirements.
- The more lints the better? Probably yes, but the more lints with similar functionalities, the more confusing (on both implementation and user side).
- Setting up lint rules is kinda showing a stance of the team. If we being strict on SPDX license, people follow, if we prove invalid identifiers that implies we think that should be a thing people want.
- Configuration are a bit better as people may want only OSI-approved or copyleft licenses.
- The initial idea of SPDX dual-license inconsistency #2039 is for SPDX license syntax consistency. I feel like we are drifting to a broader licensing issue now 😆.
What does this PR try to resolve?
Fixes #2039
rendered lint doc
Add a new lint
invalid_spdx_license_expression
How to test and review this PR?
New tests should already cover
package.license
Should this lint workspace.package.license even nobody inherits from it?
I guess not, and that should be covered by an unused workspace inherit lint.
spdx
crate version pinningThis adds a new dependency
spdx
but I didn't put exact pin (=
) on the version. crates.io use (0.10.9). SPDX crates.io uses is still 2.3 but the latest spec version is v3. Anyway it seems that[email protected]
is for SPDX spec v2.3, and[email protected]
is for SPDX sped v3. I feel like it is fine we don't pin it at this moment.https://github.com/rust-lang/crates.io/blob/bc421ae1ade7eed6d1c046456f52292ab9829095/src/licenses.rs
Name bikeshedding
invalid_spdx_license_expression
invalid_spdx_license_expr
invalid_license_expr
non_spdx_license_expression
non_spdx_license_expr
BTW I can kinda feel that we need to pull out the lint machinery into a separate helper crate once we have three lints. I am repeating something there…