G Droege, K Barker, O Seberg, J Coddington, E Benson, W G Berendsohn, B Bunk, C Butler, E M Cawsey, J Deck, M Döring, P Flemons, B Gemeinholzer, A Güntsch, T Hollowell, P Kelbert, I Kostadinov, R Kottmann, R T Lawlor, C Lyal, J Mackenzie-Dodds, C Meyer, D Mulcahy, S Y Nussbeck, É O'Tuama, T Orrell, G Petersen, T Robertson, C Söhngen, J Whitacre, J Wieczorek, P Yilmaz, H Zetzsche, Y Zhang, X Zhou.
Abstract
Genomic samples of non-model organisms are becoming increasingly important in a broad range of studies, from developmental biology and biodiversity analyses to conservation. Genomic sample definition, description, quality, voucher information and metadata all need to be digitized and disseminated across scientific communities. In today's ever-increasing bioinformatic era, this information needs to be concise and consistent so that complementary data aggregators can easily map their databases to one another. To facilitate the exchange of information on genomic samples and their derived data, the Global Genome Biodiversity Network (GGBN) Data Standard is intended to provide a platform, based on a documented agreement, that promotes the efficient and consistent sharing and use of genomic sample material and associated specimen information. The new data standard presented here builds upon existing standards commonly used within the community, extending them with the capability to exchange data on tissue, environmental and DNA samples as well as sequences. The GGBN Data Standard will reveal and democratize the hidden contents of biodiversity biobanks, for the convenience of everyone in the wider biobanking community. Technical tools exist for data providers to easily map their databases to the standard.
Database URL: http://terms.tdwg.org/wiki/GGBN_Data_Standard
Year: 2016 PMID: 27694206 PMCID: PMC5045859 DOI: 10.1093/database/baw125
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
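As a rough illustration of what mapping a database to the standard involves, the sketch below renames the columns of one local database row to standard term names. The local column names (`sample_kind`, `record_basis`, `cat_no`, `taxon`, `freezer`) and the function `map_row` are hypothetical; the target names are the Darwin Core terms `basisOfRecord`, `catalogNumber` and `scientificName` plus `materialSampleType` from the GGBN vocabulary. The tools GGBN data providers actually use (BioCASe and IPT, see Figure 2) configure such mappings through their own interfaces, not through code like this.

```python
# Hypothetical mapping from invented local column names to standard term names.
FIELD_MAP = {
    "sample_kind": "materialSampleType",  # GGBN term
    "record_basis": "basisOfRecord",      # Darwin Core term
    "cat_no": "catalogNumber",            # Darwin Core term
    "taxon": "scientificName",            # Darwin Core term
}


def map_row(local_row: dict) -> dict:
    """Rename the keys of one local row to standard term names.

    Columns without an entry in FIELD_MAP are dropped rather than guessed.
    """
    return {FIELD_MAP[k]: v for k, v in local_row.items() if k in FIELD_MAP}


if __name__ == "__main__":
    row = {"sample_kind": "DNA", "record_basis": "MaterialSample",
           "cat_no": "X-0001", "taxon": "Quercus robur", "freezer": "F3"}
    print(map_row(row))
```

Dropping unmapped columns (here the invented `freezer` field) keeps purely internal data out of the shared record; a real mapping would decide case by case which local fields correspond to a standard term.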
Explanation of specific terms used in this article
| Term | Explanation |
|---|---|
| Genomic sample | Any biological material preserved to keep its molecular properties (in general excluding human material). Examples include DNA, RNA, tissue and environmental samples |
| Genome quality | High-molecular-weight DNA or RNA |
| Material sample | The physical result of a sampling (or subsampling) event. In biological collections, the material sample is typically collected and then either preserved or destructively processed |
| Environmental sample | A material sample that (i) represents taxonomic biodiversity from across the tree of life (e.g. blood, gut), (ii) represents an abiotic substrate or environment (e.g. soil, water, ice core) or (iii) represents an assemblage of both |
| Environmental DNA | The physical result of DNA extraction of an environmental sample containing DNA of multiple taxa. Often completely consumed during sequencing |
| Ancient environmental DNA | The physical result of DNA extraction of an environmental sample older than 100 years (e.g. teeth) containing DNA of multiple taxa. Usually completely processed during sequencing |
| Tissue sample | A material sample dedicated to a single taxon (e.g. leaf, muscle, leg), often chemically or physically treated to prevent biomolecules from degrading. May contain tissue/DNA of other taxa, e.g. endosymbionts, pathogens, destruents |
| Genomic DNA sample | The physical result of DNA extraction of a tissue sample containing DNA from a single taxon. Usually not completely consumed during sequencing and deposited in a biodiversity biobank |
| Ancient genomic DNA sample | The physical result of DNA extraction of a tissue sample older than 100 years (e.g. bones) containing DNA from a single taxon. Usually not completely consumed during sequencing and deposited in a biodiversity biobank |
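In the glossary, the four DNA-sample terms differ only in two properties: whether the DNA was extracted from a tissue sample (single taxon) or an environmental sample (multiple taxa), and whether that source material is older than 100 years. The sketch below expresses this as a small decision function. The names `Source` and `dna_sample_term` are hypothetical and not part of the GGBN Data Standard; the 100-year threshold is taken from the glossary definitions.

```python
from enum import Enum


class Source(Enum):
    """Kind of material the DNA was extracted from (per the glossary)."""
    TISSUE = "tissue sample"               # material dedicated to a single taxon
    ENVIRONMENTAL = "environmental sample"  # multiple taxa and/or abiotic substrate


# The glossary calls source material "ancient" when it is older than 100 years.
ANCIENT_THRESHOLD_YEARS = 100


def dna_sample_term(source: Source, age_years: float) -> str:
    """Return the glossary term for DNA extracted from `source` of the given age."""
    ancient = age_years > ANCIENT_THRESHOLD_YEARS  # strictly "older than"
    if source is Source.TISSUE:
        return "Ancient genomic DNA sample" if ancient else "Genomic DNA sample"
    return "Ancient environmental DNA" if ancient else "Environmental DNA"
```

For example, DNA extracted from a 150-year-old bone (a tissue sample) maps to "Ancient genomic DNA sample", while DNA from a recent soil sample maps to "Environmental DNA".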
Figure 1. Bridging the gaps. Schematic representation of (1) the low percentage of sequence data in public repositories that includes proper information on where the voucher and/or sample is deposited, (2) existing tools and platforms for standardized management of and access to biodiversity data, (3) major gaps identified by GGBN and (4) what GGBN has developed to fill these gaps.
Vocabularies used within the GGBN Data Standard. basisOfRecord and materialSampleType serve as the top-level classification for each record.
| Vocabulary | Description |
|---|---|
| basisOfRecord/RecordBasis | Darwin Core/ABCD term: The specific nature of the data record. Controlled vocabulary: PreservedSpecimen, FossilSpecimen, LivingSpecimen, HumanObservation, MachineObservation, MaterialSample |
| materialSampleType | Classification of the kind of physical sample, in addition to basisOfRecord/RecordBasis and Preparation Type. Recommended vocabulary: tissue, culture strain, specimen, DNA, RNA, Protein, environmental sample |
| Material Sample | basic lab facts about a physical DNA or tissue sample; contains terms from MIxS and terms matching some of those in BRISQ Tier 1 |
| Loan | aspects of loan information on specimens, tissue or DNA samples |
| Permit | legal aspects of sample acquisition, loans and use |
| Preparation | aspects of specimen or tissue sample preparation or DNA extraction (handled as a preparation); contains terms from SPREC and terms matching some of those in BRISQ Tier 1 |
| Preservation | aspects of sample preservation in a physical collection; contains terms matching some of those in BRISQ Tier 1 |
| Amplification | aspects of amplification, sequencing and genetic accession numbers; contains terms from MIxS |
| DNA Cloning | aspects of DNA cloning and NGS libraries; contains terms from MIxS |
| Gel Image | gel image facts |
| Single Read | aspects of a single read, including chromatograms and primers |
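The table distinguishes a controlled vocabulary for basisOfRecord from a merely recommended vocabulary for materialSampleType. The sketch below checks a record's top-level classification with that distinction in mind: a basisOfRecord value outside the list is treated as an error, while a missing or unlisted materialSampleType only produces a warning. The function name `check_classification`, the dictionary record format and the error/warning split are assumptions made for illustration, not rules stated by the standard.

```python
from typing import List, Tuple

# Controlled vocabulary for basisOfRecord, as listed in the table above.
BASIS_OF_RECORD = {
    "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
    "HumanObservation", "MachineObservation", "MaterialSample",
}

# Recommended (not strictly controlled) vocabulary for materialSampleType.
MATERIAL_SAMPLE_TYPES = {
    "tissue", "culture strain", "specimen", "DNA", "RNA", "Protein",
    "environmental sample",
}


def check_classification(record: dict) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a record's top-level classification."""
    errors: List[str] = []
    warnings: List[str] = []

    basis = record.get("basisOfRecord")
    if basis not in BASIS_OF_RECORD:
        errors.append(f"basisOfRecord not in controlled vocabulary: {basis!r}")

    sample_type = record.get("materialSampleType")
    if sample_type is None:
        warnings.append("materialSampleType is missing")
    elif sample_type not in MATERIAL_SAMPLE_TYPES:
        warnings.append(
            f"materialSampleType outside recommended vocabulary: {sample_type!r}"
        )
    return errors, warnings
```

Keeping the two checks at different severities mirrors the "controlled" versus "recommended" wording in the table; a stricter consumer could promote the warnings to errors.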
Figure 2. Implementation of the GGBN Data Standard within the GGBN Data Portal. (1) Data are provided by our members using the GGBN Data Standard with BioCASe or IPT. (2) Records are harvested with B-HIT in compliance with the mandatory and highly recommended terms defined by GGBN. (3) Scientific names are checked against the GBIF checklist bank (http://www.gbif.org/dataset/search?type=CHECKLIST) and CITES (Convention on International Trade in Endangered Species of Wild Fauna and Flora, https://www.cites.org/). In addition to the (4) MySQL database of B-HIT, a (5) Solr instance (http://lucene.apache.org/solr/) is used to speed up queries. Finally, (6) the full record is displayed in the portal with all GGBN Data Standard terms provided by the respective repository, as well as associated voucher specimen information, multimedia data and related information from (7) external sources such as GBIF and INSDC.
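Steps (2) and (3) of Figure 2 can be sketched as a filter followed by a name check. This is not B-HIT's implementation: the function `harvest` and the `HarvestReport` class are hypothetical, the mandatory and recommended term sets are supplied by the caller (the actual GGBN lists are not reproduced here), and the GBIF checklist bank and CITES lookups are replaced by in-memory name sets. Rejecting records that lack a mandatory term while only warning about missing recommended terms is an assumed policy.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class HarvestReport:
    accepted: List[dict] = field(default_factory=list)
    rejected: List[dict] = field(default_factory=list)  # lacked a mandatory term
    warnings: List[str] = field(default_factory=list)   # recommended terms, name checks


def harvest(records: Iterable[dict],
            mandatory: Set[str],
            recommended: Set[str],
            checklist_names: Set[str],
            cites_names: Set[str]) -> HarvestReport:
    """Sketch of term-compliance filtering (step 2) and name checks (step 3).

    All term and name sets are caller-supplied stand-ins; nothing here queries
    GBIF or CITES.
    """
    report = HarvestReport()
    for i, rec in enumerate(records):
        # Step (2): a record lacking any mandatory term is not harvested.
        if any(not rec.get(t) for t in mandatory):
            report.rejected.append(rec)
            continue
        # Missing recommended terms only produce warnings in this sketch.
        for t in sorted(recommended):
            if not rec.get(t):
                report.warnings.append(f"record {i}: recommended term {t} missing")
        # Step (3): compare the scientific name with the reference name sets.
        name = rec.get("scientificName")
        if name:
            if name not in checklist_names:
                report.warnings.append(f"record {i}: {name!r} not in checklist")
            if name in cites_names:
                report.warnings.append(f"record {i}: {name!r} is CITES-listed")
        report.accepted.append(rec)
    return report
```

Passing the term and name sets in as arguments keeps the sketch independent of any particular term list or external service, so the same logic could be tested against whichever lists a portal actually uses.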