[Feature request] Improved transparency of managed dependencies (Maven, NPM, etc.) and their licenses. #787

Open
bduranc opened this issue Jul 19, 2019 · 7 comments
Labels
enhancement GSoC-Candidate GSoC candidate ticket

Comments

@bduranc
bduranc commented Jul 19, 2019

Issue Type: Feature Request / Improvement to existing feature

Business Case:
Dependencies have license terms that can affect the core licensing position of a project, or simply be “license incompatible” with the component’s declared license. The most obvious example is a permissively licensed component that has a GPL dependency.

Therefore, it is important to make license information of managed dependencies accessible to ClearlyDefined users.

When we speak of “managed” dependencies, we are referring to dependencies that may not actually be included in a project’s source code distribution but are instead automatically downloaded at a later point in time using technologies like NPM and Maven.

Managed dependencies may not directly be a part of the project’s source tree to start off with, but when a user builds a project themselves or consumes pre-built binaries, these dependencies are downloaded from remote repositories and are considered as a direct part of the project.

ClearlyDefined already extracts some basic information about direct managed dependencies from the project’s package manifest files during harvest. However, this information is not easily accessible to users and not all essential information is extracted.

Feature Description:
Surface whatever data is already available regarding direct managed dependencies (name, version, and scope) plus the corresponding license data and make it easily accessible to all users on the component’s definition.

Requirements:

1). Extend the “dependencies[{}]” section under the raw harvested data to include the dependency’s applicable license. Currently, the following data is available:

  • Dependency Name
  • Dependency Version
  • Dependency Scope

We’d need to add an additional field called “license” to be able to include license details for each dependency. The license data could come from a supported central repository such as NPM or Maven, depending on which one is being used.
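As a rough illustration of requirement 1, the sketch below adds a `license` field to a dependency record. The three existing fields come from the list above; the class name, the field spellings, and the sample values are assumptions made for this example and are not taken from the actual harvest schema.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Dependency:
    """One entry of the harvested dependencies list (field names are assumed)."""
    name: str
    version: str
    scope: str
    # Proposed new field. None means no license data could be obtained
    # from the package repository or the manifest.
    license: Optional[str] = None


def to_harvest_entry(dep: Dependency) -> dict:
    """Serialize a dependency into a plain dict for the raw harvested data."""
    return asdict(dep)


# Illustrative values only.
entry = to_harvest_entry(Dependency("lodash", "^4.17.11", "runtime", "MIT"))
unknown = to_harvest_entry(Dependency("example-lib", "1.0.0", "dev"))
```

Keeping `license` optional means an entry without license data still serializes, which matches the idea that the data is subject to availability.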

2). Surface the existing package management metadata currently available under the raw “Harvested data” tab and normalize it.

Below is a sample of data as it is currently made available and presented for a Maven component:

image

Below is a sample of data as it is currently made available and presented for an NPM component (there are separate sections for each scope).

image

Below is a sample of data as it is currently made available and presented for a Ruby component (again, there are separate sections for each scope):

image
image

A nice improvement would be to create a new tab next to Harvested data, perhaps called “Dependency data”.

This tab would provide a standardized, human-readable summary of dependency information, if any exists for a given component, and would also include the license detail. The underlying data could be obtained from the package manifest files themselves, as illustrated in the examples above, or by using the package repository's API (npmjs.org, etc.). If no such API is available, or if its results differ greatly from the contents of the package manifest file, the package manifest file could be used as the primary data source.
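The choice of data source described above could be sketched as follows. The function name, the dict shape, and in particular the way "differ greatly" is measured (share of dependency names that appear in only one of the two sources, with a hypothetical 0.5 threshold) are all assumptions for illustration; the request does not define them.

```python
from typing import Dict, List, Optional, Tuple

DepList = List[Dict[str, str]]


def choose_source(
    manifest_deps: DepList,
    registry_deps: Optional[DepList],
    max_mismatch: float = 0.5,  # hypothetical threshold for "differ greatly"
) -> Tuple[str, DepList]:
    """Return which source to display and its dependency list."""
    # No repository API (or the call failed): fall back to the manifest.
    if registry_deps is None:
        return "manifest", manifest_deps

    manifest_names = {d["name"] for d in manifest_deps}
    registry_names = {d["name"] for d in registry_deps}
    all_names = manifest_names | registry_names
    if not all_names:
        return "registry", registry_deps

    # Share of names that appear in only one of the two sources.
    mismatch = len(manifest_names ^ registry_names) / len(all_names)
    if mismatch > max_mismatch:
        return "manifest", manifest_deps
    return "registry", registry_deps
```

Returning the source label alongside the data would let the UI state where the displayed dependency information came from.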

image

Alternatively, instead of putting the Dependency data as a tab under the “Raw data” area, we could put it in its own section in between “Files” and “Raw data”.

image

All dependency identifiers should follow the Package URL (PURL) format outlined here: https://github.com/package-url/purl-spec. This format is consistent with the rest of ClearlyDefined.
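For illustration, a minimal sketch of building a PURL string of the form `pkg:type/namespace/name@version` is shown below. It is a simplification: qualifiers, subpaths, and the type-specific normalization rules of the spec are not handled, and the `to_purl` helper is a name invented for this example.

```python
from typing import Optional
from urllib.parse import quote


def to_purl(
    pkg_type: str,
    name: str,
    version: Optional[str] = None,
    namespace: Optional[str] = None,
) -> str:
    """Build a basic Package URL (no qualifiers or subpath)."""
    parts = [pkg_type.lower()]
    if namespace:
        # Each namespace segment is percent-encoded separately.
        parts.extend(quote(segment, safe="") for segment in namespace.split("/"))
    parts.append(quote(name, safe=""))
    purl = "pkg:" + "/".join(parts)
    if version:
        purl += "@" + quote(version, safe="")
    return purl


maven_purl = to_purl("maven", "commons-lang3", "3.9", "org.apache.commons")
npm_purl = to_purl("npm", "core", "8.0.0", "@angular")
```

A real implementation would more likely rely on an existing PURL library than on a hand-rolled helper like this one.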

To ensure dependency versions are consistent and easy to understand, conventions such as those used in Semantic Versioning (semver) should be adopted where possible to normalize version constraints when we display them to users (package_x ^1.0.0, etc.).
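As one possible reading of "normalize version constraints", the sketch below rewrites a single-interval Maven version range into comparator notation (`>=`, `<`, and so on). The choice of target notation and the function name are assumptions; multi-interval ranges are rejected rather than guessed at.

```python
def maven_range_to_comparators(spec: str) -> str:
    """Rewrite a single Maven version range as space-separated comparators."""
    spec = spec.strip()
    # A bare version such as "1.0" is not a range; show it unchanged.
    if not spec or spec[0] not in "[(":
        return spec
    if spec[-1] not in "])":
        raise ValueError(f"unterminated range: {spec!r}")

    body = spec[1:-1]
    if any(ch in body for ch in "[]()") or body.count(",") > 1:
        raise ValueError(f"multi-interval ranges are not handled: {spec!r}")

    if "," not in body:
        # "[1.0]" pins exactly one version.
        if spec[0] == "[" and spec[-1] == "]":
            return "=" + body.strip()
        raise ValueError(f"malformed range: {spec!r}")

    lower, upper = (part.strip() for part in body.split(","))
    parts = []
    if lower:
        parts.append((">=" if spec[0] == "[" else ">") + lower)
    if upper:
        parts.append(("<=" if spec[-1] == "]" else "<") + upper)
    return " ".join(parts) or "*"
```

Other ecosystems (NPM, RubyGems) use their own range syntax, so each would need a comparable translation before the constraints can be shown in one consistent form.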

@bduranc
Author

bduranc commented Jul 22, 2019

I should add that the data here is intended to be read-only, i.e. people do not curate the package's managed dependencies. The data results from whatever package management systems / files are available. If the data is not available, then we perhaps display a generic message in its place: "No dependency data is available for this component."

Someone on our call today also suggested putting up a disclaimer. I think this is a smart idea, as we need to indicate that the dependency data is taken as-is from the package repository or manifest file, is subject to availability, and may be incomplete. Perhaps one of those little round "i" icons next to the relevant section heading would be an appropriate place to put such a disclaimer.

More importantly, this disclaimer can be worded to also give the user some insight into how the data is obtained (either through ScanCode looking through package manifest files, or direct from the repository).

@pombredanne
Member

@bduranc sorry for not providing feedback earlier. It all makes sense to surface the data especially since it is available in the scans. I am not sure a special disclaimer is needed.

Now what would be especially useful is to provide the ability to link to the corresponding definition: either the exact version, if that's what is available with the dependency, or otherwise a browse page listing all the versions, to allow navigation between dependencies.

@storrisi
Contributor

> We’d need to add an additional field called “license” to be able to include license details for each dependency. The license data could come from a supported central repository such as NPM or Maven, depending on which one is being used.

We should understand what this requirement really means. Maybe involving @geneh or @ignacionr in this discussion would help.

> A nice improvement would be to create a new tab next to Harvested data, perhaps called “Dependency data”.

I love this idea, rather than adding a new separate section in the UI.

@bduranc
Author

bduranc commented Aug 12, 2019

> @bduranc sorry for not providing feedback earlier. It all makes sense to surface the data especially since it is available in the scans. I am not sure a special disclaimer is needed.
>
> Now what would be especially useful is to provide the ability to link to the corresponding definition: either the exact version, if that's what is available with the dependency, or otherwise a browse page listing all the versions, to allow navigation between dependencies.

@pombredanne:

Sorry, after reading this last part, now I'm a bit confused.

Are you proposing we link to the CD definition for a specific version of a dependency? I.e., I have dependency X of CD component Y. Let me see dependency X's own definition if it exists in CD; otherwise we harvest it.

This is how I understood it on our call. Please let me know if this is not correct :)

@pombredanne
Member

@bduranc re

> Are you proposing we link to the CD definition for a specific version of a dependency? I.e., I have dependency X of CD component Y. Let me see dependency X's own definition if it exists in CD; otherwise we harvest it.
> This is how I understood it on our call. Please let me know if this is not correct :)

Exactly 👍
We would have a dependency data section for each definition and in that section we should have one of these links for each dep:

  • if the version of a dep is known, we should have a link to that exact definition on CD. If it is not in CD yet, it should be queued for addition/harvest, preferably automatically (no button needed)

  • if the version of a dep is not known (various version range syntaxes), we should have a link to the main page that would pre-query and list all the definition versions we have on CD. If none are in CD yet, they should be queued for addition/harvest, preferably automatically (no button needed)
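The two cases above could be sketched roughly like this. The coordinate layout (`type/provider/namespace/name/revision`), both URL shapes, the exact-version regex, and the in-memory `known` set and `harvest_queue` list are all assumptions for illustration; they stand in for whatever the CD service and website actually use.

```python
import re
from typing import List, Optional, Set
from urllib.parse import urlencode

BASE_URL = "https://clearlydefined.io"  # assumed site root
# Treats dotted numbers with an optional -suffix or +suffix as an exact version.
EXACT_VERSION = re.compile(r"^\d+(\.\d+)*([-+][0-9A-Za-z.+-]+)?$")


def coordinates(type_: str, provider: str, namespace: Optional[str],
                name: str, revision: Optional[str] = None) -> str:
    parts = [type_, provider, namespace or "-", name]
    if revision:
        parts.append(revision)
    return "/".join(parts)


def dependency_link(type_: str, provider: str, namespace: Optional[str],
                    name: str, version_spec: str,
                    known: Set[str], harvest_queue: List[str]) -> str:
    """Return the link for one dependency, queueing a harvest when CD has no data."""
    if version_spec and EXACT_VERSION.match(version_spec):
        # Exact version: link straight to that definition.
        coords = coordinates(type_, provider, namespace, name, version_spec)
        if coords not in known:
            harvest_queue.append(coords)
        return f"{BASE_URL}/definitions/{coords}"

    # Version range: link to a pre-filled query listing the versions CD has.
    prefix = coordinates(type_, provider, namespace, name)
    if not any(k.startswith(prefix + "/") for k in known):
        harvest_queue.append(prefix)
    query = urlencode({"type": type_, "provider": provider,
                       "namespace": namespace or "-", "name": name})
    return f"{BASE_URL}/?{query}"
```

Queueing happens as a side effect of building the link, which matches the "preferably automatically (no button needed)" part of the proposal.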

@bduranc
Author

bduranc commented Oct 24, 2019

Hi @pombredanne :

Sorry for the delayed response.

This sounds fine to me and is a great opportunity to grow the CD ecosystem using existing data. But I think the focus should first be on developing the core functionality proposed in this request, as I see that as a prerequisite to any such downstream effort. If this addition can be included in the initial scope, then even better!

The critical aspect I believe is having the "license" attribute visible on the definition for each identified dependency regardless of whether it's in CD already or not (we pull this data direct from the corresponding package manager / repo, like NPM, if possible) to help data consumers identify license risks associated with managed dependencies.

Has anyone yet been able to look into the feasibility of this request?

Thanks.

@pombredanne added the GSoC-Candidate label on Jan 20, 2020

@pombredanne
Member

I am making this a Google Summer of Code idea
