Nathan C Sheffield, Michał Stolarczyk, Vincent P Reuter, André F Rendeiro.
Abstract
BACKGROUND: Organizing and annotating biological sample data is critical in data-intensive bioinformatics. Unfortunately, the metadata format supplied by a data provider is often incompatible with the requirements of a processing tool. There is no broadly accepted standard for organizing metadata across biological projects and bioinformatics tools, which restricts the portability and reusability of both annotated datasets and analysis software.
Keywords: interoperability; metadata validation schema; sample annotation table; sample metadata standard
Year: 2021 PMID: 34890448 PMCID: PMC8673555 DOI: 10.1093/gigascience/giab077
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: A data interface links data to analysis. (A) Schematic of a data interface. (B) Each analysis typically describes its own unique data interface. (C) The “one lab, one dataset, one analysis” mode of research tightly couples datasets and analysis. (D) With individual data interfaces, running a dataset through multiple analyses requires reshaping the data for every pairwise connection of data and analysis. (E) The PEP specification provides a standardized interface that reduces reshaping. (F) Using PEP, no reshaping is required to run a dataset through a different analytical tool. (G) A PEP may be used in different contexts and by a variety of tools and programming languages.
Figure 2: The PEP specification. (A) A PEP consists of a YAML configuration file, a sample table, and a subsample table. (B) The YAML file describes project-level attributes. (C) The sample table (and subsample table) describe sample-level attributes. (D) Project modifiers allow the PEP to import values from other PEPs or to embed multiple variations within a single PEP. (E) Sample modifiers can change sample attributes by using the project config YAML file, without actually changing the CSV file.
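The idea in panel (E), changing sample attributes through the project config while leaving the CSV untouched, can be illustrated with a minimal sketch. This is not the PEP reference implementation: the table contents, the `genome` attribute, the `load_samples` helper, and the shape of the config dict (standing in for a parsed YAML file with an "append"-style sample modifier) are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample table; stands in for the CSV file on disk.
SAMPLE_TABLE_CSV = """sample_name,protocol
sample_1,RNA-seq
sample_2,ATAC-seq
"""

# Hypothetical project config, written as the dict a YAML parser might return.
# The "append" modifier adds a constant attribute to every sample.
PROJECT_CONFIG = {
    "sample_table": "sample_table.csv",
    "sample_modifiers": {"append": {"genome": "hg38"}},
}


def load_samples(csv_text, config):
    """Read sample rows, then apply the config's append modifier in memory."""
    samples = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
    to_append = config.get("sample_modifiers", {}).get("append", {})
    for sample in samples:
        for attribute, value in to_append.items():
            sample[attribute] = value
    return samples


samples = load_samples(SAMPLE_TABLE_CSV, PROJECT_CONFIG)
for sample in samples:
    print(sample)
```

The CSV text is never rewritten; the extra attribute exists only on the in-memory sample objects, which is the separation the caption describes.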
Figure 3: PEPs can be validated against generic or specific schemas. (A) Validation uses 2 steps so samples are validated after PEP modification. (B) A generic schema ensures compliance with the PEP specification, while specialized schemas describe requirements for a particular analysis. (C) PEP schemas can import other schemas.
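A small sketch may help show why samples are validated only after modification, and how a specialized schema can build on a generic one. It is one reading of the caption, not the actual PEP validation tooling: the schema layout (plain dicts with `required` and `imports` keys), the attribute names, and all function names are hypothetical.

```python
# Hypothetical generic schema: every sample needs a name.
GENERIC_SCHEMA = {"required": ["sample_name"]}

# Hypothetical analysis-specific schema that imports the generic one
# and additionally requires a genome attribute.
PIPELINE_SCHEMA = {"imports": [GENERIC_SCHEMA], "required": ["genome"]}


def required_attributes(schema):
    """Collect required attributes, resolving imported schemas first."""
    required = []
    for imported in schema.get("imports", []):
        required.extend(required_attributes(imported))
    required.extend(schema.get("required", []))
    return required


def apply_modifiers(samples, append):
    """Return modified copies of the samples with constant attributes added."""
    return [{**sample, **append} for sample in samples]


def validate_samples(samples, schema):
    """Return a list of error messages; an empty list means the samples pass."""
    errors = []
    needed = required_attributes(schema)
    for index, sample in enumerate(samples):
        for attribute in needed:
            if not sample.get(attribute):
                errors.append(f"sample {index}: missing '{attribute}'")
    return errors


raw_samples = [{"sample_name": "s1"}, {"sample_name": "s2"}]

# Validating the raw table fails: 'genome' is supplied only by a modifier.
errors_before = validate_samples(raw_samples, PIPELINE_SCHEMA)

# Validating after modification succeeds.
modified_samples = apply_modifiers(raw_samples, {"genome": "hg38"})
errors_after = validate_samples(modified_samples, PIPELINE_SCHEMA)

print(errors_before)
print(errors_after)
```

Under these assumptions, checking the raw table would reject a project whose required attributes are filled in by modifiers, which is why the sample check comes second.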