Dominique Batista, Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Philippe Rocca-Serra.
Abstract
Community-developed minimum information checklists are designed to drive the rich and consistent reporting of metadata, underpinning the reproducibility and reuse of the data. These reporting guidelines, however, are usually in the form of narratives intended for human consumption. Modular and reusable machine-readable versions are also needed, for two reasons. First, they provide the quantitative and verifiable measures of the degree to which metadata descriptors meet these community requirements, which is itself a requirement of the FAIR Principles. Second, they encourage the creation of standards-driven templates for metadata authoring, especially when describing complex experiments that require multiple reporting guidelines to be used in combination or extended. We present new functionalities to support the creation and improvement of machine-readable models. We apply the approach to an exemplar set of reporting guidelines in the life sciences and discuss the challenges. Our work, targeted at developers of standards and those familiar with standards, promotes the concept of compositional metadata elements and encourages the creation of community standards that are modular and interoperable from the outset.
Year: 2022 PMID: 36180441 PMCID: PMC9525592 DOI: 10.1038/s41597-022-01707-6
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Fig. 1 Difference in representation of the MIAME checklist in two public repositories: GEO and ArrayExpress. (A) GEO (10.25504/FAIRsharing.5hc8vt) and ArrayExpress (10.25504/FAIRsharing.6k0kwd) are two databases highly recommended by journals' and funders' data policies, and both implement, among others, the community-defined MIAME reporting guideline for describing microarray experiments (10.25504/FAIRsharing.32b10v). MIAME is implemented via several formats used to upload and download datasets from these two databases: SOFT (10.25504/FAIRsharing.3gxr9) and MINiML (10.25504/FAIRsharing.gaegy8) for GEO; and, for ArrayExpress, MAGE-ML (10.25504/FAIRsharing.x964fb), now deprecated and superseded by MAGE-TAB (10.25504/FAIRsharing.ak8p5g). ArrayExpress also uses the EFO terminology (10.25504/FAIRsharing.1gr4tz) to annotate the metadata. (B) Using a few metadata requirements from MIAME as examples (namely study, study title and study description), we illustrate how the metadata labels, along with their level of requirement (must, should, may), vary across the formats used by the two databases.
The set of reporting guidelines we selected to illustrate our approach.
| Reporting guideline name | Domain coverage | Format | Date of creation | FAIRsharing record DOI |
|---|---|---|---|---|
| MIAME | Transcriptomics | Textual, PDF file | 1999 | 10.25504/FAIRsharing.32b10v |
| MIACA | Cellular assay | XSD | 2006 | 10.25504/FAIRsharing.7d0yv9 |
| MIFlowcyt | Flow cytometry | Textual, PDF file | 2007 | 10.25504/FAIRsharing.kcnjj2 |
| MINSEQE | High-throughput nucleotide sequencing | Textual, PDF file | 2008 | 10.25504/FAIRsharing.a55z32 |
| MIXS-MIMARKS | Nucleotide sequencing from environmental samples | Excel, XML file | 2011 | 10.25504/FAIRsharing.zvrep1 |
| MIACME | Cell migration assay | generate | 2016 | 10.25504/FAIRsharing.vh2ye1 |
| MIAPPE | Plant phenotyping | Excel, TSV file | 2018 | 10.25504/FAIRsharing.nd9ce9 |
These encompass examples in both narrative and formalized formats, span older and newer work, and include some guidelines with overlapping domains in order to test composability.
Fig. 2 How to create a reporting guideline that is machine-readable ab initio. (1) A checklist/reporting guideline is formally expressed as JSON schemas. (1*) Quality control step: JSON ScheeLD provides the means to validate the model against the JSON Schema specification, and the JSON Schema Documenter helps visualise models in the browser. (2) JSON ScheeLD creates JSON-LD context file stubs and the user provides the mapping manually. (2*) Quality control step: use the JSON Schema Documenter to verify that all the fields are mapped to an ontology term. (3) Export to the CEDAR API and provide stable identifiers.
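Steps (2) and (2*) of this workflow can be illustrated with a minimal Python sketch that uses only the standard library. It is not the JSON ScheeLD API: the function names `make_context_stub` and `unmapped_fields`, the `STUDY_SCHEMA` example, and the `ex:` prefix with its `http://example.org/terms/` IRI are all hypothetical and chosen for illustration only.

```python
# Hypothetical sketch of steps (2) and (2*): generate a JSON-LD context stub
# from a JSON schema, then check which fields still lack an ontology mapping.
# None of these names belong to JSON ScheeLD; they are assumptions.

# A toy schema with two fields; real guideline schemas are far richer.
STUDY_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "Study (illustrative)",
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
    },
    "required": ["title"],
}


def make_context_stub(schema, prefix="ex", base_iri="http://example.org/terms/"):
    """Return a JSON-LD context with one empty entry per schema property."""
    context = {prefix: base_iri}
    for field in schema.get("properties", {}):
        if field.startswith("@"):  # skip JSON-LD keywords such as @id, @type
            continue
        context[field] = ""  # to be filled in manually by the user
    return {"@context": context}


def unmapped_fields(schema, context):
    """List schema properties that have no ontology term in the context."""
    terms = context.get("@context", {})
    return sorted(
        field
        for field in schema.get("properties", {})
        if not field.startswith("@") and not terms.get(field)
    )


stub = make_context_stub(STUDY_SCHEMA)
before = unmapped_fields(STUDY_SCHEMA, stub)

# Manual mapping step: the user assigns a (hypothetical) term to one field.
stub["@context"]["title"] = "ex:title"
after = unmapped_fields(STUDY_SCHEMA, stub)
```

In this sketch, `before` lists both fields and `after` lists only `description`, which is the kind of gap the quality control step is meant to surface before export.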
Fig. 3 How to merge two existing guidelines into a new set of schemas. (1) A developer uses the JSON Schema Documenter to explore the two guidelines, MIACME and MIACA. (2) JSON ScheeLD relies on the context files to compare the two models and outputs a file readable by the JSON Compare Viewer, which lets the developer see which fields are semantically identical. (3) JSON ScheeLD pulls the fields from the MIACME model, injects those that are missing into the MIACA model, and creates a new set of schemas and context files. Directionality is important: merging MIACME into MIACA will not produce the same result as merging MIACA into MIACME. (4) After the merge is complete, the developer can return to step 2 and compare the new model with the old one for quality control.
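The effect of directionality in step (3) can be shown with a small sketch, assuming that two fields count as semantically identical when their context files map them to the same ontology term. The function `merge_into`, the two toy models and the `ex:` terms are hypothetical; they do not reproduce the MIACME or MIACA schemas or the JSON ScheeLD implementation.

```python
import copy

# Two toy models (hypothetical, not the real MIACME/MIACA schemas).
# Each model is a schema plus a context mapping field names to terms.
MODEL_A = {"properties": {"cell_line": {"type": "string"},
                          "migration_speed": {"type": "number"}}}
CONTEXT_A = {"@context": {"cell_line": "ex:CellLine",
                          "migration_speed": "ex:Speed"}}

MODEL_B = {"properties": {"cellLine": {"type": "string"},
                          "plate": {"type": "string"}}}
CONTEXT_B = {"@context": {"cellLine": "ex:CellLine",
                          "plate": "ex:Plate"}}


def merge_into(target, target_ctx, source, source_ctx):
    """Inject into a copy of `target` the source fields it is missing.

    A source field is skipped when the target already has a field mapped
    to the same ontology term, or a field with the same name.
    """
    merged = copy.deepcopy(target)
    merged_ctx = copy.deepcopy(target_ctx)
    known_terms = {
        term for term in (target_ctx["@context"].get(f)
                          for f in target["properties"]) if term
    }
    for field, spec in source["properties"].items():
        term = source_ctx["@context"].get(field)
        if (term and term in known_terms) or field in merged["properties"]:
            continue  # semantically identical or name clash: keep target's
        merged["properties"][field] = copy.deepcopy(spec)
        if term:
            merged_ctx["@context"][field] = term
    return merged, merged_ctx


a_into_b, _ = merge_into(MODEL_B, CONTEXT_B, MODEL_A, CONTEXT_A)
b_into_a, _ = merge_into(MODEL_A, CONTEXT_A, MODEL_B, CONTEXT_B)
fields_a_into_b = list(a_into_b["properties"])
fields_b_into_a = list(b_into_a["properties"])
```

Both results cover the same three concepts, but the target model's labels win for shared terms: merging A into B keeps `cellLine`, while merging B into A keeps `cell_line`. This is one way the two directions can yield different schemas.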
(*) The CEDAR and FlowRepository APIs require API keys.
| FAIR Dimension | Output characteristics | JSON ScheeLD function |
|---|---|---|
| Findable | • Schemas are identified by W3id identifiers; • Schemas are exportable via the CEDAR API*. | |
| Accessible | • Schemas are retrievable via the HTTPS GET method; • Software is API-ready. | |
| Interoperable | • Schemas are available as JSON with associated JSON-LD context files; • Schemas and instances are validated; • An example of XML to JSON-LD instance conversion is provided using MIFlowCyt data*; • Multiple ontologies are supported to describe the same resources. | |
| Reusable | • Schemas and software are available under the BSD-3 licence; • Schemas support the declaration of data licences; • Schema provenance information is available with PROV from CEDAR; • Schemas can be compared and merged. | |
For more information, see the CEDAR and FlowRepository API documentation: https://metadatacenter.github.io/cedar-manual/advanced_topics/b2_cedars_api/ and https://flowrepository.org/images/pdf/FlowRepositoryAPI.pdf.