Javad Chamanara1,2, Jitendra Gaikwad1, Roman Gerlach1, Alsayed Algergawy1, Andreas Ostrowski1, Birgitta König-Ries1,3.
Abstract
BACKGROUND: Obtaining fit-to-use data associated with diverse aspects of biodiversity, ecology and environment is challenging because such data are often fragmented, sub-optimally managed and available in heterogeneous formats. Recently, with the universal acceptance of the FAIR data principles, the requirements and standards of data publications have changed substantially. Researchers are encouraged to manage their data as per the FAIR data principles and to ensure that the raw data, metadata, processed data, software, codes and associated material are securely stored and that the data are made available on completion of the research.

NEW INFORMATION: We have developed BEXIS2 as an open-source, community-driven, web-based research data management system to support the research data management needs of mid to large-scale research projects with multiple sub-projects and up to several hundred researchers. BEXIS2 is a modular and extensible system providing a range of functions to realise the complete data lifecycle from data structure design to data collection, data discovery, dissemination, integration, quality assurance and research planning. It is an extensible and customisable system that allows for the development of new functions and customisation of its various components, from database schemas to the user interface layout, elements and look and feel. During the development of BEXIS2, we aimed to incorporate key aspects of what is encoded in the FAIR data principles. To investigate the extent to which BEXIS2 conforms to these principles, we conducted a self-assessment using the FAIR indicators, definitions and criteria provided in the FAIR Data Maturity Model. Even though the FAIR Data Maturity Model was initially developed to judge the conformance of datasets, the self-assessment results indicated that BEXIS2 conforms to and supports the FAIR indicators to a remarkable degree. BEXIS2 strongly conforms to the indicators Findability and Accessibility. 
The indicator Interoperability is moderately supported as of now; however, for many of the less-supported facets, we have concrete plans for improvement. Reusability (as defined by the FAIR data principles) is partially achieved. This paper also illustrates community deployment examples of BEXIS2 instances as success stories to exemplify its capacity to meet the biodiversity and ecological data management needs of differently sized projects and to serve as an organisational research data management system.
Keywords: FAIR data maturity model; FAIR data principles; biodiversity; data lifecycle; ecology; environmental science; open-source; research data management
Year: 2021 PMID: 34785977 PMCID: PMC8589773 DOI: 10.3897/BDJ.9.e72901
Source DB: PubMed Journal: Biodivers Data J ISSN: 1314-2828
Figure 1. The overall BEXIS2 architecture. The modules on the extensibility side are exemplary; the implemented modules may differ.
Figure 2. The main elements of a dataset in the BEXIS2 system.
Self-assessed scoring of BEXIS2 system’s features in conformance with the Findability principle of the RDA maturity model. [E], [I] and [U] represent Essential, Important and Useful priorities, respectively. In the Availability column, D and C stand for Default and Configurable by the system, respectively.
| Indicator [priority] | BEXIS2 feature | Maturity level | Availability |
|---|---|---|---|
| RDA-F1-01M: Metadata is identified by a persistent identifier [E] | BEXIS2 is interoperable with DataCite (FABRICA) application and thus capable of generating, registering and assigning DOIs to the datasets. | 4 | C |
| RDA-F1-01D: Data is identified by a persistent identifier [E] | BEXIS2 is interoperable with DataCite (FABRICA) application and thus capable of generating, registering, minting and assigning DOIs to the datasets. | 4 | C |
| RDA-F1-02M: Metadata is identified by a globally unique identifier [E] | Every dataset is assigned a unique identifier, which is used to identify metadata as well as the data. Additionally, every version of a dataset has its own identifier. | 4 | D |
| RDA-F1-02D: Data is identified by a globally unique identifier [E] | Every dataset is assigned a unique identifier, which is used to identify metadata, as well as the data. Additionally, every version of a dataset has its own identifier. | 4 | D |
| RDA-F2-01M: Rich metadata is provided to allow discovery [E] | A minimum set of metadata attributes must be maintained in the BEXIS2 system for each dataset. Additionally, data managers are able to ingest into the system and activate larger and/or customised metadata schemas, for example, EML, Darwin Core, amongst others. | 4 | C |
| RDA-F3-01M: Metadata includes the identifier for the data [E] | The metadata and data are packaged as a dataset (Fig. | 3 | D |
| RDA-F4-01M: Metadata is offered in such a way that it can be harvested and indexed [E] | Metadata can be harvested through the APIs or the specific interface that follows the OAI-PMH Standard. | 4 | D |
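
The harvesting route mentioned for RDA-F4-01M can be illustrated with a minimal sketch of an OAI-PMH client. It only builds a `ListRecords` request URL and parses a response that is already in hand. The base URL `https://bexis.example.org/oai`, the helper names and the sample record are hypothetical; the verb, the `oai_dc` metadata prefix and the XML namespaces come from the OAI-PMH standard itself, not from BEXIS2 documentation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 standard and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL (base_url is a hypothetical endpoint)."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response body."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    return records


# Hand-written sample response, for illustration only (not real BEXIS2 output).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://bexis.example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

Against a real instance, the URL would be fetched over HTTP and the response body passed to `parse_records`; the actual endpoint path and supported metadata prefixes depend on how the instance is configured.
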
Self-assessed scoring of BEXIS2 system’s features in conformance with the Accessibility principle of the RDA maturity model. [E], [I] and [U] represent Essential, Important and Useful priorities, respectively. In the Availability column, D and C stand for Default and Configurable by the system, respectively.
| Indicator [priority] | BEXIS2 feature | Maturity level | Availability |
|---|---|---|---|
| RDA-A1-01M: Metadata contain information to enable the user to get access to the data [I] | The information about the access policy and data licences can be provided via the metadata schema. Providing this information is the responsibility of the data manager. | 4 | C |
| RDA-A1-02M: Metadata can be accessed manually (i.e. with human intervention) [E] | Metadata, primary data and their schemas can be created, updated and retrieved through the user interface (GUI), as well as the API. | 4 | D |
| RDA-A1-02D: Data can be accessed manually (i.e. with human intervention) [E] | Metadata, primary data and their schema can be created, updated and retrieved through the user interface (GUI), as well as the API. | 4 | D |
| RDA-A1-03M: Metadata identifier resolves to a metadata record [E] | Identifiers and DOIs resolve to the landing page of the corresponding dataset, which provides access to metadata and data. | 4 | D |
| RDA-A1-03D: Data identifier resolves to a digital object [E] | Identifiers and DOIs resolve to the landing page of the corresponding dataset, which provides access to metadata, data and schemas. | 4 | D |
| RDA-A1-04M: Metadata is accessed through standardised protocol [E] | Metadata can be accessed via HTTP or OAI-PMH protocols in HTML or XML serialisation formats. | 4 | D |
| RDA-A1-04D: Data is accessible through standardised protocol [E] | Data and its schema can be accessed via HTTP or OAI-PMH protocols in HTML, TEXT or JSON serialisation formats. | 4 | D |
| RDA-A1-05D: Data can be accessed automatically (i.e. by a computer program) [I] | Metadata, primary data and their schemas can be accessed via REST API or OAI-PMH interface. | 4 | D |
| RDA-A1.1-01M: Metadata is accessible through a free access protocol [E] | Metadata, primary data and data schema can be accessed via HTTP or OAI-PMH protocols in HTML or XML serialisation formats, all open and free. | 4 | D |
| RDA-A1.1-01D: Data is accessible through a free access protocol [I] | Data and its schema can be accessed via HTTP or OAI-PMH protocols in HTML, TEXT or JSON serialisation formats, all open and free. | 4 | D |
| RDA-A1.2-02D: | Data are accessed via either the GUI or the APIs. Both utilise HTTP, which supports authentication and authorisation. Data access is controlled by fine-grained permissions. | 4 | D |
| RDA-A2-01M: Metadata is guaranteed to remain available after data is no longer available [E] | The system allows for creating datasets with metadata only in the first place. Deleting data elements does not affect the life of the metadata. | 4 | D |
Self-assessed scoring of the BEXIS2 system’s features in conformance with the Interoperability principle of the RDA maturity model. [E], [I] and [U] represent Essential, Important and Useful priorities, respectively. In the Availability column, D and C stand for Default and Configurable by the system, respectively.
| Indicator [priority] | BEXIS2 feature | Maturity level | Availability |
|---|---|---|---|
| RDA-I1-01M: Metadata uses knowledge representation expressed in standardised format [I] | Predefined vocabularies can be incorporated. There is a limited possibility to link terminology terms to the metadata attributes. This is the responsibility of the individual instance. | 3 | C |
| RDA-I1-01D: Data uses knowledge representation expressed in standardised format [I] | Users are able to design data structures by specifying their variables, units and data types and re-use them for any dataset. | 3 | C |
| RDA-I1-02M: Metadata uses machine-understandable knowledge representation [I] | Standard metadata schemas, such as EML, can be installed and used, which provide a moderate degree of machine understandability. The adoption of schema.org is in progress. It is also feasible to map metadata elements to their equivalents in, for example, Dublin Core. | 3 | C |
| RDA-I1-02D: Data uses machine-understandable knowledge representation [I] | Data are shipped with their schema, which defines the variables, their units of measurement and validation rules. | 3 | C |
| RDA-I2-01M: Metadata uses FAIR-compliant vocabularies [I] | The system integration with well-known terminology servers, such as the | 2 | C |
| RDA-I2-01D: Data uses FAIR-compliant vocabularies [U] | In tabular data, variables can be defined according to international standards (e.g. SI units). | 2 | C |
| RDA-I3-01M: Metadata includes references to other metadata [I] | Datasets can have qualified relations to other datasets in the system. Metadata contain links to other internal self-defined entities (e.g. publication). Currently, these references are not part of the metadata as default, but as a part of the dataset package during download. Datasets maintain their version tracking; however, by default, the latest version is served. Linking to other internal datasets inside metadata is under implementation. Metadata may contain references to other metadata depending on the setup of the metadata schema. This property is customer-specific. | 3 | C |
| RDA-I3-01D: Data includes references to other data [U] | There are qualified references such as data versioning and related internal or external data inside the system. However, this is not served alongside the data. Data may contain references to other data depending on the setup of the data structure. This property is customer-specific. | 2 | C |
| RDA-I3-02M: Metadata includes references to other data [U] | Datasets can be linked to other datasets, as well as other internal resources. They are always linked to their previous versions. Metadata may also contain references to any other external resource (e.g. using URL). | 3 | D |
| RDA-I3-02D: Data includes qualified references to other data [U] | There are qualified references, such as data versioning and related internal or external data inside the system. However, this is not served alongside the data. Data may contain references to other data depending on the set-up of the data structure. This property is customer-specific. | 2 | C |
| RDA-I3-03M: Metadata includes qualified references to other metadata [I] | Datasets can have qualified relations to other datasets in the system. Metadata contain links to other internal self-defined entities (e.g. publication). Datasets maintain their version tracking; however, by default, the latest version is served. Linking to other internal datasets inside metadata is also possible. Metadata may contain references to other metadata depending on the set-up of the metadata schema. This customer-specific property is under implementation. | 3 | C |
| RDA-I3-04M: Metadata include qualified references to other data [U] | Metadata can have qualified links to other datasets, as well as other internal resources. They are always linked to their previous versions. Metadata may also contain references to any other external resource (e.g. using URL). | 3 | C |
Self-assessed scoring of BEXIS2 system’s features in conformance with the Reusability principle of the RDA maturity model. [E], [I] and [U] represent Essential, Important and Useful priorities, respectively. In the Availability column, D and C stand for Default and Configurable by the system, respectively.
| Indicator [priority] | BEXIS2 feature | Maturity level | Availability |
|---|---|---|---|
| RDA-R1-01M: Plurality of accurate and relevant attributes are provided to allow Reuse [E] | The system has the capacity to generate rich metadata. Metadata can provide information about the data, contributors, geo-temporal extent, context, schema, software used, licensing, versioning and identification of the designated datasets. The system allows the use of existing metadata schemas, also in co-existence. | 4 | C |
| RDA-R1.1-01M: Metadata includes information about the licence under which the data can be reused [E] | BEXIS2 provides site and tenant-level terms and conditions policies. In addition, each individual dataset can have its own licence. | 4 | D |
| RDA-R1.1-02M: Metadata refers to a standard reuse licence [I] | BEXIS2 provides site and tenant-level terms and conditions policies. In addition, each individual dataset can have its own licence. BEXIS2 is able to restrict the list of licences available to its users to choose from. | 4 | C |
| RDA-R1.1-03M: Metadata refers to a machine-understandable reuse licence [I] | The list of licences can be obtained and chosen from a standard vocabulary, such as SWO ( | 4 | C |
| RDA-R1.2-01M: Metadata includes provenance information according to community-specific standards [I] | Every change to the metadata generates a new version in the system. Therefore, the complete change history is maintained and accessible. However, these provenance data are not yet communicated via a community-specific standard. At the moment, BEXIS2 maintains a linear forward versioning scheme similar to that of source control systems. | 2 | D |
| RDA-R1.2-02M: Metadata includes provenance information according to a cross-community language [U] | Every change to the metadata commits a new version to the system. Therefore, the complete change history is maintained and accessible. However, these provenance data are not yet communicated via a language, such as | 1 | D |
| RDA-R1.3-01M: Metadata complies with a community standard [E] | BEXIS2 is able to ingest multiple community and/or cross-community metadata standards for use. BEXIS2 is shipped with some community standard metadata schemas (e.g. EML, Dublin Core). | 4 | C |
| RDA-R1.3-01D: Data complies with a community standard [E] | BEXIS2 provides tabular data in open formats such as TXT, CSV, TSV, as well as public (community applied) formats, such as EXCEL. BEXIS2 maintains the original file format that is used by the data owner and/or the community. | 4 | C |
| RDA-R1.3-02M: Metadata is expressed in compliance with a machine-understandable community standard [E] | BEXIS2 is able to ingest multiple community and/or cross-community metadata standards for use. BEXIS2 is shipped with some community standard metadata schemas (e.g. EML, Dublin Core). Metadata are accessible via API. | 4 | C |
| RDA-R1.3-02D: Data is expressed in compliance with a machine-understandable community standard [I] | For tabular data, users are able to design data structures by specifying their variables, units and data types and re-use them for any dataset. Thus, users are able to implement community standards. However, data are not yet provided in a machine-understandable way. | 2 | C |
Figure 3. Spider net (Radar) charts illustrate BEXIS2 conformance to the RDA FAIR data maturity model.
Maturity level per indicator (per FAIR area) - 0 – not applicable; 1 – not being considered as yet; 2 – under consideration or in the planning phase; 3 – in implementation phase; 4 – fully implemented.
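
As a rough way to read the four self-assessment tables against this scale, the sketch below averages the maturity levels per FAIR area. The scores are transcribed from the tables above; taking a simple mean is an assumption made here for illustration and is not the aggregation method of the RDA FAIR Data Maturity Model or of the authors. The names `SCORES` and `summarise` are hypothetical.

```python
from statistics import mean

# Maturity levels (0-4) per indicator, transcribed row by row from the tables.
SCORES = {
    "Findability": [4, 4, 4, 4, 4, 3, 4],
    "Accessibility": [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
    "Interoperability": [3, 3, 3, 3, 2, 2, 3, 2, 3, 2, 3, 3],
    "Reusability": [4, 4, 4, 4, 2, 1, 4, 4, 4, 2],
}


def summarise(scores):
    """Return the mean maturity level per FAIR area, rounded to 2 decimals.

    Illustrative aggregation only; not prescribed by the maturity model.
    """
    return {area: round(mean(levels), 2) for area, levels in scores.items()}


if __name__ == "__main__":
    for area, avg in summarise(SCORES).items():
        print(f"{area}: {avg}")
```

With the transcribed scores, the means come out at 3.86 for Findability, 4 for Accessibility, 2.67 for Interoperability and 3.3 for Reusability, in line with the abstract's description of strong, strong, moderate and partial conformance.
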
Figure 4. iDiv data repository home page.
Figure 5. Multimedia module developed by iDiv integrated as part of the core BEXIS2 system.
Figure 6. Use of AD ontology to annotate datasets.
Figure 7. Using AD ontology (ADOn) to link between different datasets.
Figure 8. The data page of BExIS, the previous information system of the BE, showing the distributed search options on datasets in the 2nd menu level.
Figure 9. The search page of BEXIS2, the new information system of the BE, showing one integrated search amongst different entity types (here datasets and publications).