Mariya Dimitrova, Raïssa Meyer, Pier Luigi Buttigieg, Teodor Georgiev, Georgi Zhelezov, Seyhan Demirov, Vincent Smith, Lyubomir Penev.
Abstract
BACKGROUND: Data papers have emerged as a powerful instrument for open data publishing, obtaining credit, and establishing priority for datasets generated in scientific experiments. Academic publishing improves data and metadata quality through peer review and increases the impact of datasets by enhancing their visibility, accessibility, and reusability.
Keywords: FAIR principles; MINSEQE; MIxS; data; data paper; genomics; metadata; omics; standards; workflow
Year: 2021; PMID: 33983435; PMCID: PMC8117446; DOI: 10.1093/gigascience/giab034
Source DB: PubMed; Journal: GigaScience; ISSN: 2047-217X; Impact factor: 6.524
Figure 1: The different layers of FAIRness of data and metadata. Describing data and metadata in a data paper publication helps to enhance their FAIRness through provision of better visibility and accessibility.
OMICS data paper sections, their purpose, and ENA metadata fields from which they are populated, if such fields exist
| Article section | Purpose | ENA metadata source field |
|---|---|---|
| **Abstract** | Summary of the value of the study, the experimental design, and the dataset itself | **Study/Project XML**: //abstract |
| Introduction–Value of the dataset–Scientific value–Societal value | Outline of the reason for the study. Authors should put into perspective its value for the scientific and broader communities. Often sequencing studies are part of large-scale genome sequencing projects and this article section allows authors to explain their role in them | Written by the authors |
| Methods | This section is split into 3 major parts to describe how the physical material was collected, processed, and transformed into a dataset. The “Sampling” section allows authors to outline the environmental and geographic characteristics of the locations where their material was collected. Sampling metadata imported from ENA fills in the “Sampling” section but the “Environmental profile” and “Geographic range” subsections remain to be filled in by the author manually. Authors are encouraged to share as much detail as they can (e.g., geographic coordinates, habitats, seasonal information). The sampling methods should be described in the “Technologies used” subsection. “Sample processing” should explain the laboratory procedures involved in the transition of the physical sample into its digital footprint. Finally, the “Data Processing” subsection should mention the steps taken to transform the raw dataset into the one that was published (e.g., normalization steps). None of the subsections are compulsory and the authors can write the Methods in a form outside these topics, but our template provides a detailed best-practices structure to follow | |
| Biodiversity profile–Target–Taxonomic range–Functional range–Traits | This section describes the experimental design of the study. The target refers to the molecular target being studied (i.e., DNA, RNA, protein). The taxonomic range refers to the taxonomy of the studied organism(s) or the taxonomic composition of a metagenomic sample. The authors are encouraged to use a common taxonomy, but they can also provide their own during the authoring process in the ARPHA Writing Tool (AWT). Authors can specify a particular range of biological functions that was the subject of their study (e.g., metabolic functions), as well as specific traits (e.g., pathogenicity) if relevant to the study | Written by the authors |
| | This is the section that contains a link to the dataset(s) (preferably to its permanent resolvable identifier, such as a DOI), as well as any accession numbers and data formats | |
| Data statistics | Quantitative and qualitative description of the dataset (e.g., read depth, coverage, base ratios). This section helps readers to quickly evaluate the dataset by gauging some of its characteristics without having analysed the dataset themselves. Some of the data statistics can be represented as charts and/or short tables | Written by the authors |
| Caveats and limitations | A section to discuss what could be improved in the experiment, what future steps could be taken, and what to consider when reusing the published data | Written by the authors |
| Usage rights | Rights and licenses to use the data. The data paper is open access by default. Authors can read more about Pensoft's recommended data publishing licenses in [ | Written by the authors |
| | Contains imported MIxS checklists for the imported BioSamples. The checklists are in long format. The table can be downloaded as a separate comma-separated value (CSV) file after publication | |

The names of manuscript sections that could be automatically populated by the workflow are boldfaced in the first column. Values in the third column refer to the fields in ENA's XML files that contain the information used to automatically fill in the relevant section of the template. We have pointed to the type of XML (boldface) as well as the XPath used to extract the information.
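
As a rough illustration of the XPath-based extraction described in the note above, the following Python sketch pulls an abstract out of an XML record with the standard library. It is a minimal sketch under stated assumptions, not the workflow's actual implementation: the sample XML is a simplified, made-up stand-in for an ENA Study/Project record, and `extract_first`, `populate_sections`, and `SECTION_XPATHS` are hypothetical names introduced only for this example. Only the `//abstract` path comes from the table.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical, simplified stand-in for an ENA Study/Project XML record.
# The real schema is richer; only the <abstract> element matters here.
SAMPLE_XML = """\
<PROJECT_SET>
  <PROJECT accession="PRJ_EXAMPLE">
    <title>Example sequencing project</title>
    <abstract>
      Shotgun metagenomes from an
      illustrative sampling campaign.
    </abstract>
  </PROJECT>
</PROJECT_SET>
"""

# Assumed mapping from manuscript section to XPath. ElementTree only accepts
# relative paths, so the table's "//abstract" is written as ".//abstract".
SECTION_XPATHS: Dict[str, str] = {"Abstract": ".//abstract"}


def extract_first(xml_text: str, xpath: str) -> Optional[str]:
    """Return the whitespace-normalised text of the first node matching xpath."""
    root = ET.fromstring(xml_text)
    node = root.find(xpath)
    if node is None or node.text is None:
        return None
    return " ".join(node.text.split())


def populate_sections(xml_text: str) -> Dict[str, Optional[str]]:
    """Fill each mapped manuscript section from the XML record."""
    return {
        section: extract_first(xml_text, xpath)
        for section, xpath in SECTION_XPATHS.items()
    }


if __name__ == "__main__":
    print(populate_sections(SAMPLE_XML))
```

Keeping the section-to-XPath mapping in a single dictionary mirrors the layout of the table: sections without an ENA source field are simply absent from the mapping and remain for the authors to write.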
Figure 2: Metadata extraction workflow from ENA, ArrayExpress, and BioSamples.
Figure 3: Automatic metadata import from ENA, ArrayExpress, and BioSamples facilitates the creation of omics data paper manuscripts inside the ARPHA Writing Tool.
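
The table above notes that the imported MIxS checklists are stored in long format and can be downloaded as a CSV file. The sketch below shows one plausible reading of "long format": one row per sample and metadata field, rather than one column per field. The sample identifiers, values, column headers, and the helper names `to_long_rows` and `rows_to_csv` are assumptions made up for illustration and are not taken from the published workflow.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical sample records; identifiers and values are illustrative only.
SAMPLES: Dict[str, Dict[str, str]] = {
    "SAMPLE_A": {"collection_date": "2020-01-15", "depth": "5"},
    "SAMPLE_B": {"collection_date": "2020-02-01"},
}


def to_long_rows(samples: Dict[str, Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Flatten {sample: {field: value}} into (sample, field, value) rows."""
    return [
        (sample_id, field, value)
        for sample_id, fields in samples.items()
        for field, value in fields.items()
    ]


def rows_to_csv(rows: List[Tuple[str, str, str]]) -> str:
    """Serialise long-format rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample", "field", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(to_long_rows(SAMPLES)))
```

A long layout of this kind copes with samples that fill in different checklist fields, since a missing field is just a missing row instead of an empty column.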