Releases: pangaea-data-publisher/fuji
v3.2.0
Changes from 3.1.0 to 3.2.0
- Integration of FAIR testing for software, for more details see the following pull request:
- Improved DCAT handling, now avoids overwriting existing license and access rights info; fixed incorrect handling of distribution info (bytesize type)
- Re3data metadata lookup is now always performed; previously it was done only when no service endpoint was given.
- Improved RDFa handling: image tags are now ignored
- Upgraded connexion to v3; Python 3.11
- Improved XML handling / scheme recognition e.g. for DDI formats
- Improved handling of non-HTML “landing content” for DOIs, see #492
- Improved handling of CC licenses; previously these were not always correctly recognized as valid
v3.1.0
The main change in this release is the data_harvester behavior, which now uses threads to download data objects/files. This allows more data files to be included in the assessment. In detail, F-UJI now tries to analyse up to 5 files per mime type (as listed in the metadata).
Some other changes to note:
All: Incorrect handling of some landing pages, which caused the evaluator to stop, has been fixed.
R1.1: Licenses packed as lists are now unpacked and correctly identified
I3: In some cases scores for I3 are improved due to the inclusion of schema.org/citation as scanned relation property
R1: File sizes given or interpreted as strings like 'None' were accepted as valid content, which caused incorrect (too high) scoring of R1. In these cases the score might now be lower but is correct.
R1: Improved handling of mime types that include e.g. charset info (text/plain; charset=US-ASCII) may result in a higher score for R1 (FsF-R1-01MD-3)
R1: Improved parsing of content length byte units may improve the scoring.
F2: Improved handling of RDF graphs containing DC or schema.org terms to describe the content may improve findability and other scores
R1.3: F-UJI now uses threads to download more data objects (up to five files/links per claimed content type) which improves its capability to evaluate data content
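The threaded download described above can be sketched roughly as follows. This is a minimal illustration, not F-UJI's actual data_harvester code: the function names `select_files` and `download_all`, the `url`/`type` dictionary keys and the caller-supplied `fetch` function are all assumptions made for this example. Only the limit of five files per claimed content type comes from the release notes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

MAX_FILES_PER_TYPE = 5  # limit stated in the release notes


def select_files(content_links, limit=MAX_FILES_PER_TYPE):
    """Keep at most `limit` links per claimed mime type, preserving order."""
    per_type = defaultdict(list)
    for link in content_links:
        claimed = link.get("type") or "unknown"
        if len(per_type[claimed]) < limit:
            per_type[claimed].append(link)
    return [link for links in per_type.values() for link in links]


def download_all(content_links, fetch, max_workers=4):
    """Fetch the selected links concurrently.

    `fetch` is a hypothetical caller-supplied function that takes a URL
    and returns whatever the caller wants to keep (bytes, headers, ...).
    """
    selected = select_files(content_links)
    urls = [link["url"] for link in selected]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch, urls))
    return dict(zip(urls, results))
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library and makes it easy to try with a stub function.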
v3.0.0
This new release allows configuration of metric YAML which also affects how tests are performed. More documentation about this will be published soon in the README.
Some changes of F-UJI's behaviour have to be mentioned:
The role of the YAML metric definition file is more important now. It also allows defining individual scores and maturity levels, which are no longer hardcoded.
Metrics and tests which are not listed in the YAML files are not performed/assessed; this allows metrics and tests to be switched on or off and community-specific metrics to be defined in dedicated YAML files.
F-UJI is now able to use different metrics: the REST API now has an additional parameter ‘metric_version’ by which the YAML file can be selected (default: metrics_v0.5.yaml)
F-UJI > 2.3.0 implements more tests, which allows metrics and tests to be defined in specific YAML files that are more compatible with RDA and The Evaluator:
- FsF-F1-01DD unique identifier of data
- FsF-F1-02DD persistent identifier of data
- FsF-F1-01M which will replace FsF-F1-01D unique identifier of metadata
- FsF-F1-02M which will replace FsF-F1-02D persistent identifier of metadata
- FsF-F3-02M (metadata include identifier of dataset)
- FsF-F4-01M-2 which tests if OAI-PMH, SPARQL or CSW is used to offer metadata
F-UJI no longer uses the first data object for F3, A1, R1 and R1.3 but the first data object which is accessible (HTTP 200)
Fixed a bug which caused wrong scores for R1 because FsF-R1-01MD-3 was sometimes ignoring matching file sizes and types.
F-UJI now also recognizes resource types for R1 if given as URI e.g. schema.org/Dataset
Fixed a bug due to which, in 2.2.5, signposting links to JSON-LD files were incorrectly accepted as a valid search engine support mechanism.
Fixed a bug which accepted stringified ‘None’ as an entry for file type and size and caused wrong scores for R1
Improved license recognition
Improved JSON-LD handling
F-UJI truncates very large data files prior to testing, which caused R1 test FsF-R1-01MD-3 (Data content matches file type and size specified in metadata) to incorrectly compare the expected file size with the truncated size. For truncated files, F-UJI now compares the expected size with the size given in the HTTP header (if given).
Prior to version 2.3.0, F-UJI correctly detected valid domain-agnostic metadata standards in R1.3 (FsF-R1.3-01M-3) but did not assign any score for this. This bug was fixed for F-UJI >=2.3.0.
Prior to version 3.0.0, F-UJI accepted content negotiation in addition to HTML embedding and microdata as a search-engine-friendly way to offer metadata in FsF-F4-01M (Metadata is offered in such a way that it can be retrieved programmatically). Additionally, F-UJI did not verify the metadata standard and content offered via RDFa/microdata. F-UJI now expects exclusively schema.org, DC or DCAT as search-engine-friendly metadata formats offered via HTML embedding and microdata/RDFa. It no longer considers empty RDFa content as it did before.
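The R1 fixes listed above (stringified ‘None’ values, and size comparison for truncated files) can be illustrated with a small sketch. It is not F-UJI's implementation of FsF-R1-01MD-3: all function names and the `claimed`/`observed` dictionary layout are assumptions chosen for this example, and the mime type normalisation is included only to make the comparison self-contained.

```python
def clean_value(value):
    """Treat missing values and stringified placeholders such as 'None' as absent."""
    if value is None:
        return None
    text = str(value).strip()
    if text.lower() in ("", "none", "null"):
        return None
    return text


def normalize_mime(value):
    """Drop parameters such as '; charset=US-ASCII' and lowercase the type."""
    text = clean_value(value)
    if text is None:
        return None
    return text.split(";", 1)[0].strip().lower()


def parse_size(value):
    """Return the size in bytes as an int, or None if it is not a usable number."""
    text = clean_value(value)
    if text is None:
        return None
    try:
        return int(float(text))
    except (ValueError, OverflowError):
        return None


def content_matches(claimed, observed):
    """Compare type and size claimed in the metadata with what was observed.

    `observed` is a hypothetical dict with the keys 'content_type',
    'content_length' (HTTP header), 'downloaded_bytes' and 'truncated'.
    """
    header_size = parse_size(observed.get("content_length"))
    if observed.get("truncated") and header_size is not None:
        # A truncated download says nothing about the real size; use the header.
        actual_size = header_size
    else:
        actual_size = observed.get("downloaded_bytes")
    claimed_size = parse_size(claimed.get("size"))
    claimed_type = normalize_mime(claimed.get("type"))
    return {
        "size": claimed_size is not None and claimed_size == actual_size,
        "type": claimed_type is not None
        and claimed_type == normalize_mime(observed.get("content_type")),
    }
```

With this shape, a metadata record that claims type ‘None’ and size ‘None’ matches nothing, rather than being counted as valid content.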
v2.2.5
v2.0.2
Full Changelog: v1.4.9...v.2.0.2
This release is the first which is based on the completely restructured metadata_harvesting class. All metadata and PID collecting methods have moved there from fair_check. This allows easier testing and also using the harvester for other purposes.
v.1.4.9
Includes 1.7.9b
This will be the last version which uses metric 0.4
Improvements:
- Improved signposting handling: better recognition in HTML as well as header; now focusses on metadata and identifier related links and ignores e.g. ORCID author links.
- Improved JSON-LD handling, now tries to identify dataset (preferred) or creative work metadata in case several JSON-LD snippets are given (e.g. one for Webpage and another one for Dataset)
- More mime types now recognized
- Content negotiation now adds a preferred type, e.g. the one found in typed links
- Namespace recognition now case insensitive
- Improved Dublin Core parsing, now case insensitive
- Improved XML mime type recognition
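As an illustration of the JSON-LD improvement above (prefer dataset metadata, then creative work metadata, when a page carries several snippets), a selection routine could look like the sketch below. The names `type_names` and `pick_snippet` are hypothetical, and falling back to the first snippet when neither type is present is an assumption of this example, not documented F-UJI behaviour.

```python
import re

# Order of preference: Dataset first, then CreativeWork.
PREFERRED_TYPES = ("dataset", "creativework")


def type_names(snippet):
    """Return the lowercased local names of a snippet's @type value(s)."""
    raw = snippet.get("@type", [])
    if isinstance(raw, str):
        raw = [raw]
    # Strip URI or prefix parts, e.g. 'https://schema.org/Dataset' -> 'dataset'.
    return {re.split(r"[/:#]", str(t))[-1].lower() for t in raw}


def pick_snippet(snippets):
    """Pick the most relevant JSON-LD snippet from a list of parsed objects."""
    for wanted in PREFERRED_TYPES:
        for snippet in snippets:
            if wanted in type_names(snippet):
                return snippet
    # Assumption: fall back to the first snippet if no preferred type is found.
    return snippets[0] if snippets else None
```

Matching on the lowercased local name also covers types given as full URIs or with a namespace prefix.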