[Scancode] dual license handling in the summarizer #280
Comments
@dabutvin sorry for the late reply. ScanCode returns detected licenses as expressions.
In both cases the expressions are made of ScanCode license keys (which are the keys in https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses). If you want instead SPDX ids each Alternatively, we could update ScanCode to also return license_expressions made of SPDX keys. FWIW, expressions are reasonably common in the jQuery world. Many Rust crates have such expressions. And there are several other cases: about 20% of the 7K rules in ScanCode return expressions with more than one license,
so in any case AND'ing license objects is unlikely to be correct, as you are missing the OR and WITH cases.
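To make the OR/WITH point concrete, here is a minimal sketch of combining per-file expressions without flattening them. `joinWithAnd` is a hypothetical helper written for illustration, not the summarizer's actual code, and it assumes each input is already a valid expression string made of ScanCode keys.

```javascript
// Hypothetical helper, not the summarizer's actual code.
// Combines per-file license expressions with AND while keeping each
// expression's own OR/WITH structure intact.
function joinWithAnd(expressions) {
  // Trim, drop empty values, and remove exact duplicates (first-seen order).
  const cleaned = expressions.map(e => (e || '').trim()).filter(Boolean)
  const unique = [...new Set(cleaned)]
  if (unique.length === 0) return null
  if (unique.length === 1) return unique[0]
  // Parenthesize compound expressions so the surrounding ANDs cannot
  // regroup their operands. WITH binds tighter than AND, so it is left alone.
  const wrap = e => (/\s(AND|OR)\s/i.test(e) ? `(${e})` : e)
  return unique.map(wrap).join(' AND ')
}

console.log(joinWithAnd(['mit OR apache-2.0', 'unicode', 'mit OR apache-2.0']))
// prints "(mit OR apache-2.0) AND unicode"
```

Whether AND is the right top-level operator at all is still a policy question; the sketch only shows that an OR inside a file-level expression can survive the combination.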
Thanks @pombredanne
@pombredanne @dabutvin Just checking on this for our upgrade to Scancode 3.0?
This is still an issue with the summarizer: the readme is correctly detected as MIT OR Apache-2.0.
Finally I got some good input for you guys, but it will require us to decide how to handle things. The thing is, our summarizer for Scancode will look into the files array provided by the tool, and not in the (very smart) content.summary.license_expressions. On to the decision-making part.
For reference, ScanCode gives the following summarized counts and values for the package mentioned (the winner, with 117, is the mix we are suggested to take into account):
[{"value":null,"count":159},{"value":"mit OR apache-2.0","count":117},{"value":"public-domain","count":16},{"value":"apache-2.0 OR mit","count":5},{"value":"apache-2.0","count":1},{"value":"mit","count":1},{"value":"mit-synopsys","count":1},{"value":"unknown","count":1}]
Still, how dangerous is it to decide by file count? What if several license expressions tie on file count? All comments appreciated.
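As a rough sketch of the file-count approach and the tie question, the snippet below tallies the counts quoted above. `normalize` and `dominantExpressions` are hypothetical names, and treating `mit OR apache-2.0` and `apache-2.0 OR mit` as the same expression is an assumption made here, not something the tool is confirmed to do.

```javascript
// Hypothetical sketch: pick the most common license expression from
// summary counts and surface ties instead of silently choosing one.
function normalize(expression) {
  // Only flat OR expressions are reordered; anything with AND, WITH or
  // parentheses is left untouched to avoid changing its meaning.
  if (/\(|\)|\sAND\s|\sWITH\s/i.test(expression)) return expression
  return expression.split(/\s+OR\s+/i).sort().join(' OR ')
}

function dominantExpressions(summary) {
  const totals = new Map()
  for (const { value, count } of summary) {
    if (value === null) continue // files with no detected license
    const key = normalize(value)
    totals.set(key, (totals.get(key) || 0) + count)
  }
  const max = Math.max(...totals.values())
  // More than one entry in the result means a tie that needs a policy decision.
  return [...totals]
    .filter(([, n]) => n === max)
    .map(([value, n]) => ({ value, count: n }))
}

// Counts copied from the comment above.
const summary = [
  { value: null, count: 159 },
  { value: 'mit OR apache-2.0', count: 117 },
  { value: 'public-domain', count: 16 },
  { value: 'apache-2.0 OR mit', count: 5 },
  { value: 'apache-2.0', count: 1 },
  { value: 'mit', count: 1 },
  { value: 'mit-synopsys', count: 1 },
  { value: 'unknown', count: 1 }
]

console.log(dominantExpressions(summary))
```

With these counts the two OR spellings merge into a single entry with 122 files. A result with more than one entry would be exactly the equal-file-count case, which the caller still has to resolve.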
The component's license is MIT OR Apache-2.0 and not MIT AND Apache-2.0 as initially reported for https://clearlydefined.io/definitions/crate/cratesio/-/regex/1.0.6
@geneh I am looking at the example you give with cratesio, but initially the harvested info is a lot more complex and for the older version also includes earlier version runs of ScanCode, which may be an explanation (even though I will debug this to make sure this is the case).
@geneh what would you say if we remove the original data, and recrawl? We would lose existing information from ScanCode, but I think that is exactly what is wrong. Just an idea.
@ignacionr Do you mean recrawl all the components? Why would the outcome be any different if the scancode version is the same? Do you mean we should recompute definitions for the affected scancode results?
I just ran scancode on the most recent version of regex and saved the result as a gist. Please note that scancode has different results for...
Compare this to the code snippet above from scancode.js which calls _joinExpressions (that function always combines with Because the resulting package "regex" contains a compilation of Unicode data and Rust source code the likely correct SPDX expression for the ensemble is "(mit OR apache-2.0) AND unicode". NOTE: the parentheses are required because otherwise the So here are the questions:
@jeffmcaffer @pombredanne @iamwillbar Would love to hear your opinions on this one.
When we are gathering up the file.licenses.license or file.licenses.spdx_license_key data we default to AND them together, and this is not always correct.
Given this rust crate
We list these discovered files as Apache-2.0 and MIT, but this is only because scancode found both and we AND them together.
The scancode output has other information we should probably consume, for example:
cc @pombredanne