Ramona L Walls, John Deck, Robert Guralnick, Steve Baskauf, Reed Beaman, Stanley Blum, Shawn Bowers, Pier Luigi Buttigieg, Neil Davies, Dag Endresen, Maria Alejandra Gandolfo, Robert Hanner, Alyssa Janning, Leonard Krishtalka, Andréa Matsunaga, Peter Midford, Norman Morrison, Éamonn Ó Tuama, Mark Schildhauer, Barry Smith, Brian J Stucky, Andrea Thomer, John Wieczorek, Jamie Whitacre, John Wooley.
Abstract
The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. 
We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.
Year: 2014 PMID: 24595056 PMCID: PMC3940615 DOI: 10.1371/journal.pone.0089606
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Structured sampling schemes.
(A) Biological sampling can be structured in both space and time. Environmental sampling of ocean water often includes sampling along a transect, with samples collected at multiple depths at each location. Additionally, each sample of water collected may be subsampled for metagenomic analysis or measuring chemical content. (B) Sampling schemes in ecological studies are often nested and may include plot; subplot or transect within plot; individual within plot, subplot, or transect; organ (e.g., leaf) within individual; tissue within organ; and DNA or mineral (e.g., C or N) within tissue. DNA extracted from a leaf of a tree that is present in a sub-plot may therefore be characterized by environmental features of the plot.
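The nesting described in panel B can be sketched as a parent-linked data structure in which each sampling unit points to the unit that contains it. The sketch below is illustrative only: the class name `SamplingUnit`, its fields, and the example values (such as the soil type) are assumptions made for this example and do not come from the paper or from any of the ontologies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SamplingUnit:
    """One level in a nested sampling scheme (plot, subplot, individual, ...)."""
    name: str
    kind: str
    parent: Optional["SamplingUnit"] = None
    features: dict = field(default_factory=dict)

    def lineage(self):
        """Return this unit and all enclosing units, innermost first."""
        unit, chain = self, []
        while unit is not None:
            chain.append(unit)
            unit = unit.parent
        return chain

    def inherited_features(self):
        """Merge features from the outermost unit inward; inner values win."""
        merged = {}
        for unit in reversed(self.lineage()):
            merged.update(unit.features)
        return merged


# Hypothetical example data; names and values are illustrative only.
plot = SamplingUnit("plot-1", "plot", features={"soil_type": "sandy loam"})
subplot = SamplingUnit("subplot-1a", "subplot", parent=plot)
tree = SamplingUnit("tree-7", "individual", parent=subplot)
leaf = SamplingUnit("leaf-2", "organ", parent=tree)
tissue = SamplingUnit("tissue-2x", "tissue", parent=leaf)
dna = SamplingUnit("dna-2x", "DNA extract", parent=tissue)

lineage_kinds = [unit.kind for unit in dna.lineage()]
dna_features = dna.inherited_features()
print(lineage_kinds)
print(dna_features)
```

Walking up the parent links is what allows a DNA extract to be characterized by a feature recorded only at the plot level.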
Metrics on current versions of the BCO, ENVO, and PCO.
| Ontology | # of terms: total/in namespace/imported | # of relations: total/subclassOf (a) | # of deprecated terms |
| --- | --- | --- | --- |
| Biological Collections Ontology (BCO) | 102/42/60 (b) | 39/24 | 15 |
| Environment Ontology (ENVO) | 1556/1335/221 (c) | 2077/1868 | 19 |
| Population and Community Ontology (PCO) | 1345/24/1321 (d) | 20/18 | 0 |
(a) For BCO and PCO, the number of relations includes only relations that point to a BCO or PCO term, to adjust for the large proportion of imported terms.
(b) 39 imported from Basic Formal Ontology, 13 imported from Information Artifact Ontology, 10 imported from Ontology for Biomedical Investigations, 1 imported from Common Anatomy Reference Ontology.
(c) 172 imported from Chemical Entities of Biological Interest, 49 from Phenotypic Quality Ontology.
(d) 39 imported from Basic Formal Ontology, 1269 imported from Gene Ontology, 11 imported from Information Artifact Ontology, 2 imported from Common Anatomy Reference Ontology.
Figure 2. Core terms of the Biological Collections Ontology (BCO) and their relations to upper ontologies.
Core BCO terms (in orange) are subclasses of terms from the Basic Formal Ontology (BFO, in yellow) or the Ontology for Biomedical Investigations (OBI, in blue). For example, BCO:material sample is a subclass of BFO:material entity and has role BCO:material sample role (which is a BFO:role), while BCO:material sampling process is a subclass of OBI:planned process and has as specified output BCO:material sample.
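The subclass and output relations in this caption can be sketched as a small lookup structure. This is a minimal illustration, not how the ontologies are actually stored or queried (they are distributed as OWL files and would normally be handled by a reasoner). The function names are hypothetical, the sampling process is given the BCO prefix because the caption lists it among the core BCO terms, and the link from BFO:material entity to BFO:independent continuant is taken from BFO itself rather than from the caption, to give the example a second level to traverse.

```python
# Direct subclass assertions. The first two come from the caption; the third
# is taken from BFO itself (an addition for this sketch, not from the caption).
SUBCLASS_OF = {
    "BCO:material sample": "BFO:material entity",
    "BCO:material sampling process": "OBI:planned process",
    "BFO:material entity": "BFO:independent continuant",
}

# Other relations named in the caption, stored as (subject, relation, object).
RELATIONS = [
    ("BCO:material sampling process", "has_specified_output", "BCO:material sample"),
]


def ancestors(term):
    """Follow subclass links upward and return every superclass of term."""
    found = []
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        found.append(term)
    return found


def is_a(term, candidate):
    """True if term is candidate or a (transitive) subclass of it."""
    return term == candidate or candidate in ancestors(term)


def outputs_of(process):
    """Return the specified outputs recorded for a process class."""
    return [obj for subj, rel, obj in RELATIONS
            if subj == process and rel == "has_specified_output"]


print(is_a("BCO:material sample", "BFO:independent continuant"))
print(outputs_of("BCO:material sampling process"))
```

Because subclass links are followed transitively, anything asserted about an upper-ontology class also applies to the BCO terms beneath it.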
Figure 3. Linking samples and derivatives from the Moorea Biocode project.
(A) Biodiversity data from the Moorea Biocode project were collected at many different levels that are connected to one another in biologically meaningful ways, such as an Essig Museum specimen collected as part of a Biocode bioinventory event, a tissue sample submitted to the Smithsonian Institution, a metagenomic gut sample collected from the specimen and registered with the CAMERA portal, or DNA extracted from either the tissue or metagenomic sample. (B) A graphical representation of how part of the workflow shown in A (from field collection to tissue sampling to DNA extraction) can be annotated with terms from multiple, coordinated ontologies and queried via an ontology-based data store. Ontology classes are shown as ovals and instances are shown as rectangles, with instances color-coded to match their parent classes. This figure shows how, for example, TaxonID B resulting from the BLAST identification process on GenBank sequence B can be linked back to the original Moorea Biocode sampling process, or how a chain of inputs and outputs can be used to infer that an instance of DNA molecules is derived from an instance of an insect specimen.
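The inference described in panel B, that an instance of DNA molecules is derived from an insect specimen, amounts to following process outputs back to their inputs. The sketch below shows one way to do this under simplifying assumptions: the process records, the instance labels, and the function names are hypothetical and stand in for what an ontology-based data store would hold.

```python
# Hypothetical process records for the workflow named in the caption:
# field collection -> tissue sampling -> DNA extraction.
PROCESSES = [
    {"name": "field collection", "inputs": [], "outputs": ["insect specimen"]},
    {"name": "tissue sampling", "inputs": ["insect specimen"], "outputs": ["tissue sample"]},
    {"name": "DNA extraction", "inputs": ["tissue sample"], "outputs": ["DNA molecules"]},
]


def derived_from(item, processes):
    """Return every upstream instance that item derives from, nearest first."""
    sources = []
    frontier = [item]
    while frontier:
        current = frontier.pop(0)
        for proc in processes:
            if current in proc["outputs"]:
                for src in proc["inputs"]:
                    if src not in sources:
                        sources.append(src)
                        frontier.append(src)
    return sources


def process_chain(item, processes):
    """Return the processes that led to item, most recent first."""
    chain = []
    frontier = [item]
    while frontier:
        current = frontier.pop(0)
        for proc in processes:
            if current in proc["outputs"] and proc["name"] not in chain:
                chain.append(proc["name"])
                frontier.extend(proc["inputs"])
    return chain


dna_sources = derived_from("DNA molecules", PROCESSES)
dna_history = process_chain("DNA molecules", PROCESSES)
print(dna_sources)
print(dna_history)
```

The same traversal would link an identification result back to the original sampling process, provided each intermediate step records its inputs and outputs.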
Figure 4. Linking data across sites in the Genomic Observatories network's Ocean Sampling Day.
(A) Ocean Sampling Day involves the simultaneous sampling of the world's oceans on a single day, as represented by the red stars on the map of the earth. Multiple ocean water sampling processes take place at each location. Those water samples are filtered to produce samples of organismal communities that are submitted to the bioarchive at the Smithsonian Institution. A subsample of the filtered material is analyzed to produce a metagenomic sequence, which may be stored in the Genomes Online Database (GOLD). To be useful in comparative studies, data from each process at each location must be accessible and interpretable. (B) A graphical representation of how part of the workflow shown in A (from ocean water sampling to filtering to metagenomic sequencing) can be annotated with terms from multiple, coordinated ontologies and queried via an ontology-based data store. Ontology classes are shown as ovals and instances are shown as rectangles, with instances color-coded to match their parent classes. This figure shows how a metagenomic sequence and the taxa associated with it can be linked back to the original Ocean Sampling Day collecting event through a chain of inputs and outputs.
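For cross-site comparison, each metagenomic sequence has to be traced back to the location where its water sample was collected. The sketch below illustrates that lookup over a small set of subject, predicate, object statements. The identifiers, the predicate names `collected_at` and `derived_from`, and the helper functions are all hypothetical. They are not terms from BCO, ENVO, or PCO, and a real deployment would query a triple store instead.

```python
from collections import defaultdict

# Hypothetical statements: water samples collected at sites, filtered samples
# derived from water samples, and sequences derived from filtered samples.
TRIPLES = [
    ("water-1", "collected_at", "site-A"),
    ("filtrate-1", "derived_from", "water-1"),
    ("seq-1", "derived_from", "filtrate-1"),
    ("water-2", "collected_at", "site-B"),
    ("filtrate-2", "derived_from", "water-2"),
    ("seq-2", "derived_from", "filtrate-2"),
    ("water-3", "collected_at", "site-A"),
    ("filtrate-3", "derived_from", "water-3"),
    ("seq-3", "derived_from", "filtrate-3"),
]


def objects(subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def site_of(item):
    """Walk derived_from links back to a water sample and return its site."""
    seen = set()
    while item not in seen:
        seen.add(item)
        sites = objects(item, "collected_at")
        if sites:
            return sites[0]
        parents = objects(item, "derived_from")
        if not parents:
            return None
        item = parents[0]
    return None


def sequences_by_site(sequences):
    """Group sequence identifiers by the site their water sample came from."""
    grouped = defaultdict(list)
    for seq in sequences:
        grouped[site_of(seq)].append(seq)
    return dict(grouped)


by_site = sequences_by_site(["seq-1", "seq-2", "seq-3"])
print(by_site)
```

Grouping by the recovered site is only possible because every step between sampling and sequencing is recorded with a link to its source material.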