Supratim Mukherjee, Dimitri Stamatis, Jon Bertsch, Galina Ovchinnikova, Olena Verezemska, Michelle Isbandi, Alex D Thomas, Rida Ali, Kaushal Sharma, Nikos C Kyrpides, T B K Reddy.
Abstract
The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is a manually curated data management system that catalogs sequencing projects from around the world together with their associated metadata. In the current version of GOLD (v.6), all projects are organized under a four-level classification system: Study, Organism (for isolates) or Biosample (for environmental samples), Sequencing Project and Analysis Project. Currently, GOLD provides information for 26 117 Studies, 239 100 Organisms, 15 887 Biosamples, 97 212 Sequencing Projects and 78 579 Analysis Projects. These are integrated with over 312 metadata fields, of which 58 are controlled vocabularies with 2067 terms. The web interface facilitates submission of a diverse range of Sequencing Projects (such as isolate genome, single-cell genome, metagenome, metatranscriptome) and complex Analysis Projects (such as genome from metagenome, or combined assembly from multiple Sequencing Projects). GOLD provides a seamless interface with the Integrated Microbial Genomes (IMG) system and supports and promotes the Genomic Standards Consortium (GSC) Minimum Information standards. This paper describes the data updates and additional features added during the last two years. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Year: 2016 PMID: 27794040 PMCID: PMC5210664 DOI: 10.1093/nar/gkw992
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Four-level classification system of the Genomes OnLine Database (GOLD). A Study sits at the top of the project classification system in GOLD and comprises either Biosamples or Organisms, which in turn form their respective Sequencing Projects. The assembly and analysis of GOLD Sequencing Projects culminate in Analysis Projects, which are passed on to the Integrated Microbial Genomes (IMG) data management and analysis system.
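The hierarchy in Figure 1 can be sketched as a small data model. This is only an illustration of the relationships described in the caption, not GOLD's actual schema: every class name, field name and sample value below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SequencingProject:
    name: str


@dataclass
class Organism:
    """Source of isolate Sequencing Projects (hypothetical model)."""
    name: str
    sequencing_projects: List[SequencingProject] = field(default_factory=list)


@dataclass
class Biosample:
    """Source of environmental Sequencing Projects (hypothetical model)."""
    name: str
    sequencing_projects: List[SequencingProject] = field(default_factory=list)


@dataclass
class AnalysisProject:
    name: str
    # One or more Sequencing Projects feed an Analysis Project,
    # e.g. a combined assembly draws on several of them.
    inputs: List[SequencingProject] = field(default_factory=list)


@dataclass
class Study:
    name: str
    # A Study is made up of Organisms or Biosamples.
    sources: List[Union[Organism, Biosample]] = field(default_factory=list)


def count_sequencing_projects(study: Study) -> int:
    """Count the Sequencing Projects reachable from a Study."""
    return sum(len(src.sequencing_projects) for src in study.sources)


# Hypothetical example: one Biosample, two runs, one combined assembly.
sp1 = SequencingProject("hypothetical metagenome run 1")
sp2 = SequencingProject("hypothetical metagenome run 2")
sample = Biosample("hypothetical bioreactor sample", [sp1, sp2])
combined = AnalysisProject("hypothetical combined assembly", [sp1, sp2])
study = Study("hypothetical study", [sample])
```

Keeping the Analysis Project as a separate object that references Sequencing Projects, rather than nesting it under a single one, is a design choice made here so that a combined assembly can point at several inputs.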
Figure 2. Geographic distribution of GOLD Biosamples and Organisms. Organism isolation sites are marked in pink; Biosample collection sites are marked with blue dots.
Sequencing Project types in GOLD
| Sequencing Strategy | No. of Sequencing Projects (SPs) |
|---|---|
| Whole Genome Sequencing | 78 246 |
| Metagenome | 13 417 |
| Metatranscriptome | 2320 |
| Transcriptome | 1595 |
| Genome fragments | 1185 |
| Targeted Gene Survey | 198 |
| Methylation | 66 |
| Transposon Mutagenesis | 60 |
| Chloroplast | 52 |
| Others | 69 |
Figure 3. Sequencing projects across top sequencing centers. Comparison of the total number of GOLD Sequencing Projects and the corresponding number of unique Organisms per sequencing center. Bar color identifies the sequencing center, as shown in the legend. Unique Organisms are counted by unique genus and species name.
Types of Analysis Projects in GOLD
| Type of Analysis Project | No. of Analysis Projects (APs) |
|---|---|
| Genome Analysis | 56 386 |
| Metagenome Analysis | 10 814 |
| Metatranscriptome mapping | 5827 |
| Genome from Metagenome | 1713 |
| Metatranscriptome Analysis | 1684 |
| Single Cell Analysis (screened) | 1185 |
| Single Cell Analysis (unscreened) | 840 |
| Combined Assembly | 109 |
| Transcriptome Analysis | 12 |
| Targeted Gene Survey | 9 |
Number of metadata fields and controlled vocabulary (CV) fields in GOLD
| GOLD Classification Level | No. of fields | No. of CV-based fields |
|---|---|---|
| Study | 26 | 6 |
| Biosample | 83 | 11 |
| Organism | 124 | 31 |
| Sequencing Project | 44 | 8 |
| Analysis Project | 35 | 2 |
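As a quick cross-check, the per-level counts in this table can be summed and compared with the 312 metadata fields and 58 controlled vocabularies quoted in the abstract. The dictionary below simply transcribes the table; the variable names are illustrative.

```python
# (total fields, CV-based fields) per GOLD classification level,
# transcribed from the table above.
fields_per_level = {
    "Study": (26, 6),
    "Biosample": (83, 11),
    "Organism": (124, 31),
    "Sequencing Project": (44, 8),
    "Analysis Project": (35, 2),
}

total_fields = sum(total for total, _ in fields_per_level.values())
total_cv_fields = sum(cv for _, cv in fields_per_level.values())

print(total_fields, total_cv_fields)  # 312 58
```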
Figure 4. Advanced Search feature in GOLD. (A) Advanced Search launch page in GOLD with a brief explanation of how to conduct an advanced search. (B) Advanced Search results after applying six different search filters across three GOLD levels. (C) List of GOLD Analysis Projects obtained from the Advanced Search.
Figure 5. Description of a GOLD Metadata Package. A Biosample populated using the Biogas/Reactor metadata package, listing the metadata categories that are unique to bioreactor samples.