Janno Harjes1, Anton Link2, Tanja Weibulat2,3, Dagmar Triebel2,3, Gerhard Rambold1.
Abstract
Repeatability of study setups and reproducibility of research results from the underlying data are major requirements in science. Until now, abstract models for describing the structural logic of studies in environmental sciences have been lacking, and tools for data management have been insufficient. Repeatability and reproducibility require sophisticated data management solutions that go beyond data file sharing. In particular, they imply maintenance of coherent data along workflows. Design data concern elements from the elementary domains of operations: transformation, measurement and transaction. Operation design elements and method information are specified for each consecutive workflow segment from field to laboratory campaigns. The strict linkage of operation design element values, operation values and objects is essential. To keep corresponding objects coherent along consecutive workflow segments, the assignment of unique identifiers and the specification of their relations are mandatory. The abstract model presented here addresses these aspects, and the software DiversityDescriptions (DWB-DD) facilitates the management of the digital data objects and structures connected in this way. DWB-DD allows for an individual specification of operation design elements and their linking to objects. Two workflow design use cases, one for DNA barcoding and another for cultivation of fungal isolates, are given. To publish those structured data, standard schema mapping and XML provision of digital objects are essential. Schemas useful for this mapping include the Ecological Metadata Language, the Schema for Meta-omics Data of Collection Objects and the Standard for Structured Descriptive Data. Data pipelines with DWB-DD include the mapping and conversion between schemas and functions for data publishing and archiving according to the Open Archival Information System standard.
The setting allows for repeatability of study setups, reproducibility of study results and for supporting work groups to structure and maintain their data from the beginning of a study. The theory of 'FAIR++' digital objects is introduced.
Year: 2020 PMID: 32815545 PMCID: PMC7439577 DOI: 10.1093/database/baaa059
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Segment of a (multi-segment) workflow or a (multi-segment) campaign with object identity (ID), operation design elements and method information, as well as measurement values as assigned to physical (and digital) objects. Consecutive segments (indicated by the arrow at the lower right) are linked via parent identifier relations of the preceding physical objects or their digital representatives (parent identity relation). Operations are grouped according to the domains transformation, measurement and transaction (TF: transformation design, referring to domain 1; ME: measurement design, referring to domain 2; TA: transaction design, referring to domain 3) and are assigned to (physical and digital) objects by declaration or selection of descriptor states (categorical, various). Measurement (or observation) values are primarily generated from physical objects (and secondarily from digital objects).
Figure 2. Digital object including information on physical object identity as well as operation design (OD) and method information (MI) structures and values of three elementary operation domains.
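The object structure and parent linkage described in the captions of Figures 1 and 2 can be illustrated with a minimal Python sketch. It is not the DWB-DD data model: the class `DigitalObject`, the function `lineage`, the segment names and all values are hypothetical and chosen only to show how unique identifiers and parent relations keep objects coherent across consecutive segments.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
import uuid

# Hypothetical names throughout; this is not the DWB-DD data model.
DOMAINS = ("TF", "ME", "TA")  # transformation, measurement, transaction


@dataclass
class DigitalObject:
    """Digital representative of one physical object in a workflow segment."""
    segment: str
    parent_id: Optional[str] = None  # ID of the object in the preceding segment
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Operation design (OD) element values, keyed by domain, then element name.
    design: Dict[str, Dict[str, str]] = field(
        default_factory=lambda: {d: {} for d in DOMAINS})
    # Method information (MI): one protocol reference per domain.
    methods: Dict[str, str] = field(default_factory=dict)
    # Measurement (or observation) values generated from the object.
    measurements: Dict[str, float] = field(default_factory=dict)


def lineage(obj_id: str, registry: Dict[str, DigitalObject]) -> List[str]:
    """Follow parent relations back to the first segment; return segment names."""
    chain = []
    current = registry.get(obj_id)
    while current is not None:
        chain.append(current.segment)
        current = registry.get(current.parent_id) if current.parent_id else None
    return list(reversed(chain))


# Assumed segment names and made-up values, for illustration only.
sample = DigitalObject(segment="field sampling")
sample.design["TF"]["site number/ID"] = "S01"
extract = DigitalObject(segment="DNA extraction", parent_id=sample.object_id)
extract.measurements["DNA extract concentration"] = 12.5

registry = {o.object_id: o for o in (sample, extract)}
print(lineage(extract.object_id, registry))
```

Storing only the parent identifier on each object, rather than a full path, keeps each segment's record self-contained while still allowing the whole chain to be recovered from a registry of objects.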
Definitions of core terms
Figure 3. Exemplary label with operation design code (ODC) and identifier (UUID) combined, both additionally represented as a QR code. To facilitate handling during a workflow segment, the labels to be attached to physical object containers (boxes for environmental samples, tubes for laboratory intermediate objects and for storage) need to be created beforehand.
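As a rough illustration of such a label, the following Python sketch combines an operation design code with a UUID into one text payload. The ODC format shown here (element values joined by hyphens, separated from the UUID by `|`) is an assumption made for this example, as are the function names and values; the source does not specify the encoding. The resulting string is what a QR code on the label could carry.

```python
import uuid


def operation_design_code(values):
    """Join operation design element values into one code (format is an assumption)."""
    return "-".join(str(v) for v in values)


def label_text(odc, object_id):
    """Text payload for a label; the same string could be encoded as a QR code."""
    return f"{odc}|{object_id}"


def parse_label(text):
    """Split a label payload back into its ODC and UUID parts."""
    odc, object_id = text.rsplit("|", 1)
    return odc, uuid.UUID(object_id)


# Hypothetical values: site 1, borehole 2, horizon A, replicate 3.
odc = operation_design_code(["S01", "B02", "HA", "R3"])
object_id = uuid.uuid4()
text = label_text(odc, object_id)
print(text)
```

Generating the identifiers before the campaign starts, as the caption suggests, means every container already carries its UUID when the object is first handled.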
Use case 1 for workflow segments (campaigns). Environmental microbial community barcoding. Domains 1–3: transformation (TF), measurement (ME) and transaction design (TA) elements (E) and method information (MI) elements (E)
| Segment | Transformation design (TF) | Measurement design (ME) | Transaction design (TA) | Method information (MI) |
|---|---|---|---|---|
| 1 | E 1: site number/ID | E 1: GPS data | E 1: box ID | E 1 (transformation): sampling protocol |
| | E 2: borehole number/ID | | E 2: container ID | E 2 (measurement): GPS (time, space) of borehole at site protocol |
| | E 3: soil horizon type/depth definition | | | E 3 (transaction): sample into container transfer protocol |
| | E 4: replicate/aliquot number/ID | | | |
| 2 | E 1: DNA extraction | E 1: DNA extract concentration and purity | E 1: storage box ID | E 1: nucleic acids extraction protocol |
| | | | E 2: storage rack ID | E 2: nucleic acids extract quality and quantity determination protocol |
| | | | E 3: microplate ID | E 3: intermediate object transfer into container protocol |
| | | | E 4: microplate-internal position coordinate | |
| 3 | E 1: DNA amplification | E 1: PCR product concentration and product size | E 1: storage box ID | E 1: intermediate object amplification (PCR 1) protocol |
| | | | E 2: storage rack ID | E 2: amplicon quality and quantity determination protocol |
| | | | E 3: microplate ID | E 3: intermediate object transfer into container protocol |
| | | | E 4: microplate-internal position coordinate | |
| 4 | E 1: PCR amplicon purification | E 1: PCR product concentration and product size | E 1: storage box ID | E 1: amplicon purification (ExoSap digestion) protocol |
| | | | E 2: storage rack ID | E 2: purified amplicon quality and quantity determination protocol |
| | | | E 3: microplate ID | E 3: intermediate object transfer into container protocol |
| | | | E 4: microplate-internal position coordinate | |
| 5 | E 1: DNA amplification | E 1: PCR product concentration and product size | E 1: storage box ID | E 1: intermediate object amplification (PCR 2) protocol |
| | | | E 2: storage rack ID | E 2: amplicon quality and quantity determination protocol |
| | | | E 3: microplate ID | E 3: intermediate object transfer into container protocol |
| | | | E 4: microplate-internal position coordinate | |
| 6 | E 1: PCR amplicon purification | E 1: PCR product concentration and product size | E 1: storage box ID | E 1: amplicon purification (magnetic beads) protocol |
| | | | E 2: storage rack ID | E 2: purified amplicon quality and quantity determination protocol |
| | | | E 3: microplate ID | E 3: intermediate object transfer into container protocol |
| | | | E 4: microplate-internal position coordinate | |
| 7 | E 1: PCR amplicon pooling | E 1: PCR product pool concentration | E 1: storage box ID | E 1: amplicon pooling protocol |
| | | | E 2: storage rack ID | E 2: library quality and quantity determination protocol |
| | | | E 3: microplate ID | E 3: intermediate object transfer into container protocol |
| | | | E 4: microplate-internal position coordinate | |
| 8 | E 1: DNA library storing | E 1: library storage temperature | E 1: storage box ID | E 1: DNA library storage protocol |
| | | E 2: library storage humidity | E 2: storage rack ID | E 2: storage parameters control protocol |
| | | E 3: library storage light | E 3: microplate ID | E 3: product transfer into container protocol |
| | | | E 4: microplate-internal position coordinate | |
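One way to make use of such a segment design is to hold it as structured data and check each object record against it. The Python sketch below takes the operation design elements of the DNA extraction segment from the use case 1 table and reports elements without a value. The function `missing_elements`, the record layout and all values are hypothetical; this is an illustration of the strict linkage between design elements and values, not a DWB-DD feature.

```python
# Operation design elements of the DNA extraction segment (use case 1 table).
SEGMENT_DESIGN = {
    "TF": ["DNA extraction"],
    "ME": ["DNA extract concentration and purity"],
    "TA": ["storage box ID", "storage rack ID", "microplate ID",
           "microplate-internal position coordinate"],
}


def missing_elements(design, record):
    """Return (domain, element) pairs that have no value in the record."""
    return [(domain, element)
            for domain, elements in design.items()
            for element in elements
            if record.get(domain, {}).get(element) in (None, "")]


# Made-up values for one object; the plate position is deliberately left out.
record = {
    "TF": {"DNA extraction": "kit A"},
    "ME": {"DNA extract concentration and purity": "12.5 ng/uL; A260/280 1.8"},
    "TA": {"storage box ID": "BX-1", "storage rack ID": "RK-2",
           "microplate ID": "MP-07"},
}

print(missing_elements(SEGMENT_DESIGN, record))
```

A check of this kind could be run when a segment is closed, so that gaps are found before the object is passed on to the next segment.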
Use case 2 for workflow segments (campaigns): Fungal isolates, barcoding and phenotypic trait description: Domains 1–3: transformation (TF), measurement (ME) and transaction design (TA) elements (E 1–n) and method information (MI) elements (E 1–3)
| Segment | Transformation design (TF) | Measurement design (ME) | Transaction design (TA) | Method information (MI) |
|---|---|---|---|---|
| 1 | E 1: site number/ID | E 1: GPS data | E 1: box ID | E 1 (transformation): sampling protocol |
| | E 2: borehole number/ID | | E 2: container ID | E 2 (measurement): GPS (time, space) of borehole at site protocol |
| | E 3: soil horizon type/depth definition | | | E 3 (transaction): sample into container transfer protocol |
| | E 4: replicate/aliquot number/ID | | | |
| 2 | E 1: (sub-)culture generation number | E 1: fungal colony growth rate | E 1: microplate storage rack ID | E 1: fungal strain isolation and cultivation protocol |
| | E 2: culture medium type | | E 2: microplate ID | E 2: fungal culture growth measurement protocol |
| | E 3: culture medium type variation | | E 3: microplate-internal position coordinate | E 3: fungal culture translocation-inoculation measurement protocol |
| | E 4: culture replicate number | | E 4: aliquot number/ID | |
| 3 | E 1: DNA extraction | E 1: DNA extract concentration and purity | E 1: microplate storage rack ID | E 1: nucleic acid extraction protocol |
| | | | E 2: microplate ID | E 2: nucleic acid quantity/quality measurement protocol |
| | | | E 3: microplate-internal position coordinate | E 3: intermediate object transfer into container protocol |
| | | | E 4: aliquot number/ID | |
| 4 | E 1: DNA amplification | E 1: PCR product concentration and purity | E 1: microplate storage rack ID | E 1: DNA amplification protocol |
| | | | E 2: microplate ID | E 2: DNA amplificate quantity/quality measurement protocol |
| | | | E 3: microplate-internal position coordinate | E 3: intermediate object transfer into container protocol |
| | | | E 4: aliquot number/ID | |
| 5 | E 1: DNA isolate storage | E 1: PCR product concentration | E 1: room number/ID | E 1: DNA isolates storage protocol |
| | | | E 2: freezing device ID | E 2: storage parameters control protocol |
| | | | E 3: object/product container ID | E 3: product transfer into container protocol |
| | | | E 4: aliquot number/ID | |
| 6 | E 1: DNA amplicon storage | E 1: PCR product storage temperature | E 1: room number/ID | E 1: DNA amplicon storage protocol |
| | | | E 2: freezing device ID | E 2: storage parameters control protocol |
| | | | E 3: object/product container ID | E 3: product transfer into container protocol |
| | | | E 4: aliquot number/ID | |
| 7 | E 1: fungal culture staining | E 1: fungal trait 1 | E 1: culture storage room number | E 1: culture preparation (staining) for light microscopy protocol |
| | | E 2: fungal trait 2 | E 2: culture storage rack ID | E 2: culture measurement protocol with checklist of morphological traits (to be recorded in measurement values database) |
| | | E 3: fungal trait 3 | E 3: culture storage shelf number | E 3: product transfer onto slide for LM protocol |
| | | E 4: fungal trait 3 + n | E 4: storage box ID | |
| 8 | E 1: fungal culture storing | E 1: culture storage temperature | E 1: culture storage room number | E 1: culture storage protocol |
| | | E 2: culture storage humidity | E 2: culture storage rack ID | E 2: storage parameters control protocol |
| | | E 3: culture storage light | E 3: culture storage shelf number | E 3: culture transfer into container protocol |
| | | | E 4: storage box ID | |
Figure 4. DiversityDescriptions enabling free definition of descriptors and descriptor states for the representation of descriptive data of a study item based on various basic data types and enrichment via ‘description scopes’ by linking Diversity Workbench modules and external web resources.
Figure 5. Research data export from DiversityDescriptions as XML files containing content data with study design-specific vocabulary, provided in various formats: (a) as XML (with local XSD), (b) as XML-EML (with core elements mapped to any further domain standard schema) and (c) as XML-SDD (with core elements mapped to any further domain standard schema).
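To give an idea of what XML provision of a digital object might look like, the sketch below serializes one object with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`digitalObject`, `parent`, `operationDesign`, `element`) are invented for this example; they follow neither the DWB-DD export format nor the EML or SDD schemas, and a real export would be mapped to one of those schemas.

```python
import xml.etree.ElementTree as ET


def to_xml(object_id, parent_id, design):
    """Serialize one digital object; tag names are hypothetical, not EML or SDD."""
    root = ET.Element("digitalObject", id=object_id)
    if parent_id:
        # Parent identifier relation to the object of the preceding segment.
        ET.SubElement(root, "parent", ref=parent_id)
    for domain, elements in design.items():
        dom = ET.SubElement(root, "operationDesign", domain=domain)
        for name, value in elements.items():
            el = ET.SubElement(dom, "element", name=name)
            el.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Made-up identifiers and values.
xml_text = to_xml("obj-2", "obj-1", {"TA": {"microplate ID": "MP-07"}})
print(xml_text)

# Round trip: the exported text parses back into the same structure.
parsed = ET.fromstring(xml_text)
```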
Figure 6. FAIR++: Reusability of physical and digital objects in a research setup is a precondition for the repeatability of operation designs on the respective object type. Analysis of digital objects (according to a certain design) entails the reproducibility of the operation results. Reproducibility of operations on physical objects, however, depends on whether or not environmental parameters are fully controlled. If they are not, a study setup can be repeated, but the results may differ from the initial ones.