Arnaldo Pereira, Rui Pedro Lopes, José Luís Oliveira.
Abstract
The Semantic Web and Linked Data concepts and technologies have empowered the scientific community with solutions to take full advantage of the increasingly available distributed and heterogeneous data in distinct silos. Additionally, FAIR Data principles established guidelines for data to be Findable, Accessible, Interoperable, and Reusable, and they are gaining traction in data stewardship. However, to explore their full potential, we must be able to transform legacy solutions smoothly into the FAIR Data ecosystem. In this paper, we introduce SCALEUS-FD, a FAIR Data extension of a legacy semantic web tool successfully used for data integration and semantic annotation and enrichment. The core functionalities of the solution follow the Semantic Web and Linked Data principles, offering a FAIR REST API for machine-to-machine operations. We applied a set of metrics to evaluate its "FAIRness" and created an application scenario in the rare diseases domain.
Year: 2020 PMID: 32908882 PMCID: PMC7471816 DOI: 10.1155/2020/3041498
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. SCALEUS-FD architecture and implementation technologies. At the file system level are the triplestores for the converted data and the metadata. At the abstraction layer, we used Apache Jena and Eclipse RDF4J to implement the modules for dealing with the semantic data, comprising data integration, inference, and the management engine. Finally, we used a Jetty server to build the services layer.
Figure 2. SCALEUS-FD metadata.
Figure 3. Spreadsheet integration interface.
FAIR principles [27] and SCALEUS-FD evaluation.
| Principle | Evaluation |
|---|---|
| F1. (Meta)data are assigned a globally unique and persistent identifier. | We use HTTP URIs to identify digital resources uniquely. We apply the policy presented in |
| F2. Data are described with rich metadata (defined by R1 below). | The DCAT specification allows us to describe the data considering different layers of machine-readable metadata. |
| F3. Metadata clearly and explicitly include the identifier of the data it describes. | The access URL property of the dcat:Distribution class contains the globally unique and persistent identifier for the digital resource. |
| F4. (Meta)data are registered or indexed in a searchable resource. | We use RDFa to embed the dcat:Dataset class instances within the web documents generated by our application, allowing automatic indexing by the Google Dataset Search engine. |
| A1. (Meta)data are retrievable by their identifier using a standardized communications protocol. | See the evaluation of the following subcriteria. |
| A1.1 The protocol is open, free, and universally implementable. | Data and metadata are retrievable using the Hypertext Transfer Protocol (HTTP), which is a free and open standard protocol. |
| A1.2 The protocol allows for an authentication and authorization procedure, where necessary. | Access authorization: the application provides basic access authorization to perform REST calls that create, update, or delete data and metadata (POST, PUT, and DELETE operations). |
| A2. Metadata are accessible, even when the data are no longer available. | After a dataset is removed, its metadata remains available. |
| I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. | We use the RDF data model and the OWL formal language for knowledge representation. |
| I2. (Meta)data use vocabularies that follow FAIR principles. | We can describe datasets using existing, well-known ontologies such as the HPO or GO. For the metadata, we use the DCAT ontology. |
| I3. (Meta)data include qualified references to other (meta)data. | Following the SW principles, we use ontologies that include semantically rich relationships. |
| R1. (Meta)data are richly described with a plurality of accurate and relevant attributes. | See the evaluation of the following subcriteria. |
| R1.1. (Meta)data are released with a clear and accessible data usage license. | Accessible usage license: we use the “license” property of the dcat:Distribution class to specify the license document by which the distribution is made available. |
| R1.2. (Meta)data are associated with detailed provenance. | We use the dcat:Catalog class to indicate the provenance information associated with the data. |
| R1.3. (Meta)data meet domain-relevant community standards. | We use the W3C SW standards for both data and metadata. |
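
To illustrate the F1, F3, and R1.1 rows above, the following is a minimal sketch of a DCAT description serialized as JSON-LD, using only the Python standard library. It is not SCALEUS-FD's actual code: the `example.org` URIs, the function name `build_dataset_metadata`, and the distribution URI pattern are hypothetical and chosen only for illustration. The property names `dcat:accessURL` and `dct:license` come from the DCAT and Dublin Core Terms vocabularies.

```python
import json

# Namespace IRIs of the DCAT and Dublin Core Terms vocabularies.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def build_dataset_metadata(dataset_uri, title, access_url, license_url):
    """Return a JSON-LD dict describing one dataset with one distribution.

    Hypothetical helper: the name and the URI layout are assumptions,
    not part of SCALEUS-FD.
    """
    return {
        "@context": {"dcat": DCAT, "dct": DCT},
        # F1: the dataset is identified by an HTTP URI.
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:distribution": {
            # Assumed URI pattern for the distribution resource.
            "@id": dataset_uri + "/distribution/1",
            "@type": "dcat:Distribution",
            # F3: the metadata explicitly points at the data it describes.
            "dcat:accessURL": {"@id": access_url},
            # R1.1: the license document under which the data is released.
            "dct:license": {"@id": license_url},
        },
    }


# Example values are placeholders, not real SCALEUS-FD resources.
example = build_dataset_metadata(
    dataset_uri="http://example.org/dataset/rare-diseases",
    title="Rare diseases sample dataset",
    access_url="http://example.org/dataset/rare-diseases/data",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)

print(json.dumps(example, indent=2))
```

A description of this shape could be embedded in a web page or returned by a metadata endpoint; how SCALEUS-FD does this internally is not specified here.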