Victor Lobanov, Angélique Gobet, Alyssa Joyce.
Abstract
The rapid development of sequencing methods over the past decades has accelerated both the potential scope and depth of microbiota and microbiome studies. Recent developments in the field have been marked by an expansion away from purely categorical studies towards a greater investigation of community functionality. As in-depth genomic and environmental coverage is often distributed unequally across major taxa and ecosystems, it can be difficult to identify or substantiate relationships within microbial communities. Generic databases containing datasets from diverse ecosystems have opened a new era of data accessibility despite costs in terms of data quality and heterogeneity. This challenge is readily embodied in the integration of meta-omics data alongside habitat-specific standards which help contextualise datasets both in terms of sample processing and background within the ecosystem. A special case of large genomic repositories, ecosystem-specific databases (ES-DBs), have emerged to consolidate and better standardise sample processing and analysis protocols around individual ecosystems under study, allowing independent studies to produce comparable datasets. Here, we provide a comprehensive review of this emerging tool for microbial community analysis in relation to current trends in the field. We focus on the factors leading to the formation of ES-DBs, their comparison to traditional microbial databases, the potential for ES-DB integration with meta-omics platforms, as well as inherent limitations in the applicability of ES-DBs.
Keywords: Community ecology; Data curation; Database management; Ecosystem-specific database; Meta-omics; Microbiome; Microbiota
Year: 2022 PMID: 35842686 PMCID: PMC9287977 DOI: 10.1186/s40793-022-00433-1
Source DB: PubMed Journal: Environ Microbiome ISSN: 2524-6372
Fig. 1 Interrelationships between multiple depths of biome characterisation, all of which can be unified through microbial database collections
Examples of public databases for microbial community analysis. Prevalent microbial sequence databases are listed below with indications of their omics integration and functional assignment integration where applicable
| Database name | Data type | Meta-omics approach included | Target organisms | URL | References |
|---|---|---|---|---|---|
| China National GeneBank (CNGB) | rRNA subunits, genomes, transcriptomes, proteomes, environmental/contextual data | Sanger sequencing, metabarcoding, metagenomics, metatranscriptomics, metaproteomics, environmental measurements | All microorganisms | [ | |
| ConsensusPathDB | rRNA subunits, genomes, transcriptomes, proteomes, environmental/contextual data | Sanger sequencing, metabarcoding, metagenomics, metatranscriptomics, metaproteomics | Animal (human, mouse), fungi (yeast) | [ | |
| DNA DataBank of Japan (DDBJ) | rRNA subunits, genomes, transcriptomes, proteomes, environmental/contextual data | Sanger sequencing, metabarcoding, metagenomics, metatranscriptomics, metaproteomics | All microorganisms | [ | |
| European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI); European Life-Science Infrastructure (ELIXIR) | rRNA subunits, genomes, transcriptomes, proteomes, metabolomes, environmental/contextual data | Sanger sequencing, metabarcoding, metagenomics, metatranscriptomics, metaproteomics, metabolomics, environmental measurements | All microorganisms | [ | |
| EzBioCloud | rRNA subunits, genomes, environmental/contextual data | Sanger sequencing, metabarcoding, metagenomics, environmental measurements | Bacteria and Archaea | [ | |
| International Nucleotide Sequence Database Collaboration (INSDC) | rRNA subunits, genomes | Sanger sequencing, metabarcoding, metagenomics | All microorganisms | [ | |
| Joint Genome Institute Integrated Microbial Genomes (JGI-IMG) | rRNA subunits, genomes, transcriptomes, proteomes | Sanger sequencing, metabarcoding, metagenomics, metatranscriptomics, metaproteomics | All microorganisms | [ | |
| Metagenomic Rapid Annotations using Subsystems Technology (MG-RAST) | rRNA subunits, genomes, transcriptomes | Sanger sequencing, metabarcoding, metagenomics, metatranscriptomics | All microorganisms | [ | |
| National Center for Biotechnology Information collections (NCBI RefSeq, NCBI BLAST, NCBI Entrez, NCBI GenBank) | rRNA subunits, genomes, transcriptomes, proteomes, metabolomes, environmental/contextual data | Sanger sequencing, metabarcoding, metagenomics, metatranscriptomics, metaproteomics, metabolomics, environmental measurements | All microorganisms | [ | |
| Protist ribosomal reference database (PR2) | rRNA subunits | Sanger sequencing, metabarcoding | All eukaryotes | [ | |
| SILVA | rRNA subunits | Sanger sequencing, metabarcoding | All microorganisms | [ | |
| University of California, Santa Cruz Genome Browser | rRNA subunits, genomes, transcriptomes | Sanger sequencing, metabarcoding, metagenomics | All microorganisms | [ | |
| Ribosomal RNA operon copy number database (rrnDB) | rRNA subunits | Sanger sequencing, metabarcoding | Bacteria and Archaea | [ | |
| The Microbe Directory (TMD) | rRNA subunits, genomes, environmental/contextual data | Sanger sequencing, metabarcoding, metagenomics, environmental measurements | Microbial prokaryotes and eukaryotes | [ | |
| Vienna Metabolomics Center (VIME) | rRNA subunits, genomes, transcriptomes, proteomes, metabolomes | Sanger sequencing, metabarcoding, metatranscriptomics, metaproteomics, metabolomics | All microorganisms | [ | |
A selection of published ecosystem-specific databases
| Ecosystem-specific database | Target ecosystem(s) | Target organisms | Meta-omics approach used | References |
|---|---|---|---|---|
| Biomes of Australian Soil Environments (BASE) | Australian subcontinent, terrestrial systems | Prokaryotes and fungal-specific eukaryotes | Sanger sequencing, metabarcoding, metagenomics, environmental measurements | [ |
| Dictyopteran gut microbiota reference Database (DictDb) | Dictyopteran gut microbiota | All microorganisms | Sanger sequencing, metabarcoding, metagenomics | [ |
| Earth Microbiome Project (EMP) | EMP Ontology (EMPO) ecosystems | All microorganisms | Sanger sequencing, metabarcoding, metagenomics, metatranscriptomics, metaproteomics, metabolomics, environmental measurements | [ |
| Genome Repository of Oiled Systems (GROS) | Crude oil contaminated environments | All microorganisms | Sanger sequencing, metabarcoding, metagenomics, metatranscriptomics, environmental measurements | [ |
| Global Ocean Sampling (GOS) | Open ocean ecosystems | All microorganisms | Sanger sequencing, metabarcoding, metagenomics, metatranscriptomics, metaproteomics, metabolomics, environmental measurements | [ |
| Human Food Project | Human gastrointestinal tract | All prokaryotes | Sanger sequencing, metabarcoding, metagenomics, metatranscriptomics, metaproteomics, metabolomics, environmental measurements | [ |
| Integrative Human Microbiome Project (HMP) | Human body microbiome environments | All microorganisms | Sanger sequencing, metabarcoding, metagenomics, metatranscriptomics, metaproteomics, metabolomics, environmental measurements | [ |
| Human Oral Microbiome Database (HOMD) | Human oral environment | All microorganisms | Sanger sequencing, metabarcoding, metagenomics, metatranscriptomics, metaproteomics, metabolomics | [ |
| Maarja Öpik arbuscular mycorrhiza database (MaarjAM) | Arbuscular mycorrhizal fungi associated environments | All microorganisms | Sanger sequencing, metabarcoding, metagenomics, environmental measurements | [ |
| Marine databases: MarRef, MarDB, MarCat | Open ocean ecosystems | All microorganisms | Sanger sequencing, metabarcoding, metagenomics, metatranscriptomics, metaproteomics, metabolomics, environmental measurements | [ |
| METAgenomics of the Human Intestinal Tract (MetaHIT) | Human gastrointestinal tract | All microorganisms | Metagenomics, metatranscriptomics, metaproteomics, metabolomics | [ |
| Microbial Database for Activated Sludge (MiDAS) | Activated sludge | All microorganisms | Sanger sequencing, metabarcoding, metagenomics, metabolomics, environmental measurements | [ |
| Rumen and Intestinal Methanogen-DB (RIM-DB) | Ruminant gastrointestinal tract | All microorganisms | Sanger sequencing, metabarcoding, metagenomics | [ |
| Tara Oceans project | Open ocean ecosystems | All microorganisms | Sanger sequencing, metabarcoding, metagenomics, metatranscriptomics, metaproteomics, metabolomics, environmental measurements | [ |
| Unified Human Gastrointestinal Genome (UHGG) collection | Human gut | All microorganisms | Sanger sequencing, metabarcoding, metagenomics, metaproteomics | [ |
A non-exhaustive list of organisational databases pooling data from other sources as an analytical tool
| Functional database | Purpose | Description | References |
|---|---|---|---|
| Functional Ontology Assignments for Metagenomes (FOAM) | Functional analysis | Groups environmental metagenomic sequences based on gene functionality instead of taxonomy | [ |
| EXPath | Functional analysis | Groups microarray expression profiles used to infer metabolic pathways for six model plants | [ |
| Ecopath with Ecosim (EwE) (now grouped under EcoBase) | Functional analysis | Information repository of EwE models (modelling software for ecological phenomena) | [ |
| Gulf of Mexico Ecosystem Services Valuation Database (GecoServ) (now called BlueValue) | Ecosystem service evaluation | Worldwide depository of ecosystem valuation data | [ |
| Open access database on climate change effects on littoral and oceanic ecosystems (OCLE) | Ecosystem service evaluation | Ecological-driven database of present and future hazards for European marine life | [ |
| Biofuel Ecophysiological Traits and Yields Database (BETYdb) | Functional analysis | Open-access repository to facilitate the organisation, discovery, and exchange of information about plant traits, crop yields, and ecosystem functions | [ |
| jae-f-database | Functional analysis | Global database and ‘state of the field’ review of research into ecosystem engineering by land animals | [ |
| Genomes OnLine Database (GOLD) | Metadatabase | Collection of genome projects and associated metadata | [ |
| Omics Discovery Index (OmicsDI) | Metadatabase | Groups datasets across multiple public meta-omics data resources | [ |
| Omics database generator (ODG) | Metadatabase | Groups genomics data, integrates with experimental data to create a comparative, multi-dimensional graphical database | [ |