Import and convert Figshare data and records to a DataONE repository
- Authors: Nesbitt, Ian (http://orcid.org/0000-0001-5828-6070)
- License: Apache 2
- Package source code on GitHub
- Submit bugs and feature requests
- Contact us: [email protected]
- DataONE discussions
This software transports data and translates metadata from Figshare's JSON format to DataONE. It uses a custom translation method to convert the metadata to Ecological Metadata Language (EML), then uploads data, metadata, and resource maps to a DataONE Metacat instance. This workflow may be run during repository setup to move a large corpus from Figshare into a new DataONE repository.
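To make the translation step concrete, here is a minimal sketch of mapping a Figshare-style article record to an EML-like document. It is not the package's actual translation method. The function name `figshare_to_eml` is hypothetical, the input fields (`title`, `doi`, `description`, `authors[].full_name`) are assumed to follow the public Figshare article JSON, and the output omits elements (such as `contact`) that a schema-valid EML document requires.

```python
import xml.etree.ElementTree as ET

# Namespace of EML 2.2.0; the real package may target a different version.
EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"


def figshare_to_eml(article: dict) -> str:
    """Build a minimal, illustrative EML-like XML string from an article dict."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(
        f"{{{EML_NS}}}eml",
        {"packageId": str(article["doi"]), "system": "figshare"},
    )
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = article["title"]

    for author in article.get("authors", []):
        # Naive assumption: the last word of the full name is the surname.
        given, _, surname = author["full_name"].rpartition(" ")
        creator = ET.SubElement(dataset, "creator")
        name = ET.SubElement(creator, "individualName")
        if given:
            ET.SubElement(name, "givenName").text = given
        ET.SubElement(name, "surName").text = surname

    # Figshare descriptions may contain HTML; this sketch copies them verbatim.
    abstract = ET.SubElement(dataset, "abstract")
    ET.SubElement(abstract, "para").text = article.get("description", "")

    return ET.tostring(root, encoding="unicode")
```

A real translation would also need to handle licenses, keywords, funding, and file-level entities, which this sketch leaves out.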
DataONE in general, and figshare-import in particular, are open source, community projects. We welcome contributions in many forms, including code, graphics, documentation, bug reports, testing, etc. Use the DataONE discussions to discuss these contributions with us.
Documentation is a work in progress. All functions have reStructuredText docstrings and are fairly well commented. In the future, a documentation site will be built into the repository.
- Set the config values in ~/.config/figshare-import/config.json. Be mindful to run test operations only on staging servers prior to operating in a production environment:

    {
        "rightsholder_orcid": "http://orcid.org/0000-0001-5828-6070",
        "write_groups": ["CN=Test_Group,DC=dataone,DC=org"],
        "changePermission_groups": ["CN=Test_Group,DC=dataone,DC=org"],
        "nodeid": "urn:node:mnTestKNB",
        "mnurl": "https://dev.nceas.ucsb.edu/knb/d1/mn/",
        "cnurl": "https://cn-stage.test.dataone.org/cn",
        "metadata_json": "~/figshare-import/article-details-test.json",
        "data_root": "/mnt/ceph/repos/si/figshare/FIG-12/"
    }

- Copy your DataONE authentication token to ~/.config/figshare-import/.d1_token.
- Ensure the article-details.json file is in place and noted in the "metadata_json" field of the config file.
- Run the download script ./figshare_import/run_figshare_download.py. This may take a while depending on how much content you are downloading from Figshare.
- Run the upload script ./figshare_import/run_data_upload.py. This may also take a while. Operations will be significantly quicker when run within the same network as the Member Node you are uploading to.
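Before running either script, it can help to check the config file independently. The following is a minimal sketch, not part of the package: `load_config` and `is_staging` are hypothetical helpers, the required keys are taken from the example config above, and the staging check is only a substring heuristic based on assumed markers such as "dev." and "stage".

```python
import json
from pathlib import Path

# Keys taken from the example config in this README.
REQUIRED_KEYS = {
    "rightsholder_orcid", "write_groups", "changePermission_groups",
    "nodeid", "mnurl", "cnurl", "metadata_json", "data_root",
}

# Assumed substrings that suggest a non-production server (heuristic only).
STAGING_MARKERS = ("dev.", "stage", "test", "sandbox")


def load_config(path="~/.config/figshare-import/config.json") -> dict:
    """Read the JSON config, check for missing keys, and expand '~' in paths."""
    config_path = Path(path).expanduser()
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise KeyError(f"config is missing keys: {sorted(missing)}")
    for key in ("metadata_json", "data_root"):
        config[key] = str(Path(config[key]).expanduser())
    return config


def is_staging(config: dict) -> bool:
    """Return True only if both node URLs contain a staging-like marker."""
    urls = (config["mnurl"], config["cnurl"])
    return all(
        any(marker in url.lower() for marker in STAGING_MARKERS)
        for url in urls
    )
```

A cautious run might call `load_config()` first and refuse to continue when `is_staging(config)` is false, unless a production run is explicitly intended.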
- Ensure all config values are correct. Triple-check them.
- Ensure your DataONE authentication token is valid and current, and that you have at least write permission on the Member Node. DataONE tokens expire after 24 hours. Long-lived tokens can be obtained from DataONE support in appropriate cases.
- Ensure the Figshare content is public. The scripts will need to be modified to pass Figshare authentication credentials if you intend to download private datasets.
- File an issue. Be sure to describe your problem in detail, and post the content of your configuration file. DO NOT post your authentication token.
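One way to check whether a token is still current is to inspect its expiry time locally. The sketch below assumes the DataONE token is a JSON Web Token (JWT) whose payload carries a standard `exp` claim; `token_seconds_remaining` is a hypothetical helper, and it decodes the payload without verifying the signature, so it only reports expiry and says nothing about permissions.

```python
import base64
import json
import time
from typing import Optional


def token_seconds_remaining(token: str, now: Optional[float] = None) -> float:
    """Return seconds until the token's 'exp' claim (negative if expired).

    Assumes a JWT of the form header.payload.signature. The signature is
    NOT verified; this is a local sanity check only.
    """
    payload_b64 = token.strip().split(".")[1]
    # Restore the base64 padding that JWTs strip.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    current = time.time() if now is None else now
    return payload["exp"] - current
```

For example, reading the file at ~/.config/figshare-import/.d1_token and passing its contents to this function gives a rough idea of whether a long upload will outlast the token. As noted above, never paste the token itself into an issue.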
In the terminal:
$ figsharedownload
$ figshareimport
In Python:
>>> from figshare_import.run_figshare_download import run_figshare_download
>>> from figshare_import.run_data_upload import run_data_upload
>>> run_figshare_download()
>>> run_data_upload()
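Because the upload depends on a completed download, it may be useful to run the two steps through a small wrapper that stops at the first failure and records how long each step took. This is a sketch only: `run_steps` is a hypothetical helper, not part of the package, and it makes no assumptions about what the two functions return.

```python
import logging
import time


def run_steps(steps):
    """Run (name, callable) pairs in order and return per-step durations.

    Any exception raised by a step propagates immediately, so later steps
    (such as the upload) do not run after a failed earlier step.
    """
    timings = {}
    for name, step in steps:
        logging.info("starting %s", name)
        start = time.monotonic()
        step()
        timings[name] = time.monotonic() - start
        logging.info("finished %s in %.1f s", name, timings[name])
    return timings
```

With the imports shown above, this could be called as `run_steps([("download", run_figshare_download), ("upload", run_data_upload)])`.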
This is a Python package built using the Poetry build tool. To install locally, create a virtual environment for Python 3.9+, install Poetry, and then install or build the package with poetry install or poetry build, respectively.
To run unit tests, navigate to the root directory and run python -m unittest test.py.
Tests have not yet been fully implemented for this software.
Copyright [2024] [Regents of the University of California]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Work on this package was supported by:
- DataONE Network
Additional support was provided for collaboration by the National Center for Ecological Analysis and Synthesis, a Center funded by the University of California, Santa Barbara, and the State of California.