Data publishers of the Global Biodiversity Information Facility (GBIF) apply a wide range of licenses to their datasets. This is problematic:
- Users of the data need to investigate and understand those licenses before they can use the data.
- Many licenses don't comply with the practices of GBIF and/or open data, limiting the use of the data.
We want to get an overview of the characteristics of the licenses used in all GBIF registered datasets.
- Metadata of all datasets is obtained via the GBIF Registry API and written to datasets.csv.
make data/generated/datasets.csv
- All unique licenses are written to licenses.csv. Rerunning the scripts will append newly found licenses to the file.
make data/licenses.csv
- The characteristics of the licenses are manually interpreted using these guidelines.
- The annotated information is merged with the datasets†.
make data/generated/datasets-annotated.csv
- These data are analyzed.
make analysis
- The results are written to standard-license-data.csv and data.js.
- The latter is used as the basis for charts, which are displayed from the gh-pages branch.
- The results of the analysis were presented in this blog post.
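The license steps in this workflow (collect unique licenses, append only the new ones, merge the manual annotations back onto the datasets) can be sketched as below. This is a minimal illustration and not the repository's actual scripts: the column names `license` and `open_data` and both function names are hypothetical, and it uses only the standard library rather than pandas so that it runs without extra dependencies.

```python
import csv
import io


def append_new_licenses(known_rows, datasets):
    """Return known_rows plus one unannotated row per license text not seen before."""
    seen = {row["license"] for row in known_rows}
    result = list(known_rows)  # existing annotations are never overwritten
    for dataset in datasets:
        text = dataset["license"]
        if text not in seen:
            seen.add(text)
            # "open_data" is a hypothetical annotation column, filled in manually later
            result.append({"license": text, "open_data": ""})
    return result


def annotate_datasets(datasets, license_rows):
    """Left-join the manual annotations onto the datasets by license text."""
    by_text = {row["license"]: row for row in license_rows}
    annotated = []
    for dataset in datasets:
        match = by_text.get(dataset["license"], {})
        annotated.append({**dataset, "open_data": match.get("open_data", "")})
    return annotated


# Stand-in for datasets.csv; the real file has more columns.
datasets = list(csv.DictReader(io.StringIO(
    "key,license\n"
    "a1,CC0\n"
    "b2,Free for non-commercial use\n"
    "c3,CC0\n"
)))

# Stand-in for an existing licenses.csv with one annotated license.
known = [{"license": "CC0", "open_data": "yes"}]

licenses = append_new_licenses(known, datasets)
annotated = annotate_datasets(datasets, licenses)
```

Appending instead of rewriting keeps earlier manual annotations intact when the scripts are rerun, which matches the behaviour described for licenses.csv.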
† You can easily transform the UUID keys to working URLs as follows:
- "http://www.gbif.org/dataset/" & key: http://www.gbif.org/dataset/66f6192f-6cc0-45fd-a2d1-e76f5ae3eab2
- "http://www.gbif.org/publisher/" & owningOrganizationKey: http://www.gbif.org/publisher/1989b627-2a61-44db-83e4-392efc5da0a9
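The `&` above is spreadsheet-style string concatenation. For readers working in Python instead, the same transformation might look like the sketch below. The function names are hypothetical; the base URLs, the column names `key` and `owningOrganizationKey`, and the example UUIDs are taken from the text above.

```python
DATASET_BASE = "http://www.gbif.org/dataset/"
PUBLISHER_BASE = "http://www.gbif.org/publisher/"


def dataset_url(key):
    """Build the dataset page URL from a dataset UUID."""
    return DATASET_BASE + key


def publisher_url(owning_organization_key):
    """Build the publisher page URL from an organization UUID."""
    return PUBLISHER_BASE + owning_organization_key


# One row as it might appear in datasets-annotated.csv (other columns omitted).
row = {
    "key": "66f6192f-6cc0-45fd-a2d1-e76f5ae3eab2",
    "owningOrganizationKey": "1989b627-2a61-44db-83e4-392efc5da0a9",
}

print(dataset_url(row["key"]))
print(publisher_url(row["owningOrganizationKey"]))
```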
These are the requirements for running the analysis:
- Unix make
- Python
  - requests
  - pandas
  - simplejson
These are the libraries used for the charts:
This work (especially the manual interpretation of the licenses) is subject to error. We hope to mitigate this by opening up our workflow in this repository (such as our guidelines), but we disclaim any liability for all uses of this work. As new and updated datasets are published to GBIF all the time, our list of datasets (replaced with each analysis) and our list of licenses (extended with each analysis) will become outdated. Check the last commit timestamp of these files to see how recent they are.
Want to use this work in a scholarly publication? You can cite this repository as:
Desmet P, Aelterman B (2013) Interpreting licenses of GBIF registered data. https://github.com/Datafable/gbif-data-licenses (accessed yyyy-mm-dd)