Samuel Piquer-Esteban (1,2), Susana Ruiz-Ruiz (2,3), Vicente Arnau (1), Wladimiro Diaz (1), Andrés Moya (1,2,3).
Abstract
The human gut holds a special place among microbial environments because of growing evidence that the gut microbiota is linked to host health. However, despite extensive research, knowledge of the core taxa forming the gut microbiota is still lacking, and the available information, in both genome databases and most core taxa studies, is biased towards western microbiomes. To address these limitations, we tested a database enrichment strategy and analyzed public whole-genome shotgun datasets generated from 545 fecal samples covering three levels of a westernization gradient. The NT database was selected as a baseline of biological diversity and was subsequently combined with genomes from various studies of interest related to the human microbiota. This enrichment strategy improved classification capacity relative to the original, unenriched database across the lifestyles and populations studied. The effect of metagenome-assembled genomes with incomplete taxonomy on genome database enrichment was also examined: such genomes are helpful, but they should be used with caution depending on the taxonomic level of interest. Moreover, the core analysis, based on high prevalence, revealed a conserved set of bacterial taxa in the healthy human gut microbiota worldwide, despite apparent lifestyle differences. These taxa share a set of traits, metabolic roles, and an ancestral status that make them suitable candidates for a hypothetical phylogenetic core of mutualistic microorganisms co-evolving with the human species.
Keywords: Core microbiota; Enrichment strategies; Genome databases; Human gut microbiota; Metagenomics; NCBI, National Center for Biotechnology Information; OTU, Operational Taxonomic Unit; Western bias
Year: 2021 PMID: 35035791 PMCID: PMC8749183 DOI: 10.1016/j.csbj.2021.12.035
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Kraken 2 results comparing the enriched and original NT databases. Database performance was assessed in terms of classification capacity for all samples together at different taxonomic levels (A) and for samples separated by lifestyle and country at the genus (B) and species (C) levels. In the box plots, the black line within each box marks the median and the red triangle the mean; outliers are shown as red dots. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
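Fig. 1 compares the databases by classification capacity. The following is a minimal sketch of how a per-sample value could be computed from a Kraken 2 report, assuming that classification capacity means the percentage of all fragments assigned at a given rank or deeper (the exact definition used in the study is not given here). It assumes the standard six-column, tab-separated Kraken 2 report; the function name `classification_capacity` and the toy counts are hypothetical.

```python
def classification_capacity(report_text: str, rank: str) -> float:
    """Percentage of fragments classified at `rank` or any rank below it.

    Assumes the standard six-column, tab-separated Kraken 2 report
    (percent, clade fragments, direct fragments, rank code, taxid, name).
    """
    unclassified = root = at_rank = 0
    for line in report_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        clade_fragments, rank_code = int(fields[1]), fields[3]
        if rank_code == "U":
            unclassified = clade_fragments
        elif rank_code == "R":
            root = clade_fragments
        # A clade count already includes every fragment assigned to its
        # descendants, and clades of the same rank do not nest, so summing
        # rows with exactly this rank code counts each fragment once.
        if rank_code == rank:
            at_rank += clade_fragments
    total = unclassified + root
    return 100.0 * at_rank / total if total else 0.0


# Toy report (hypothetical counts; intermediate ranks omitted for brevity).
report = "\n".join([
    "10.00\t100\t100\tU\t0\tunclassified",
    "90.00\t900\t100\tR\t1\troot",
    "60.00\t600\t300\tG\t816\t  Bacteroides",
    "30.00\t300\t300\tS\t817\t    Bacteroides fragilis",
    "20.00\t200\t200\tG\t1678\t  Bifidobacterium",
])

print(classification_capacity(report, "G"))  # 80.0
print(classification_capacity(report, "S"))  # 30.0
```

Comparing two databases then amounts to running both classifications on the same sample and comparing the two percentages at each rank, as in panel A.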
Fig. 2. Description of the universal core genera. (A) Intersections between the cores of interest. (B) Taxonomic relationships between universal core genera; the number of core genera assigned to a particular level is given in square brackets. (C) Prevalence-abundance heatmap. Taxa are sorted by average relative abundance, and their NCBI taxIDs are given in parentheses.
Fig. 3. Taxonomic relationships between intersect core genera along a prevalence gradient. Additional intersect cores were computed by defining soft- and medium-prevalence genus cores, which were compared with the corresponding universal genus core over a prevalence range of 0.5 to 1.
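Figs. 2 and 3 rest on prevalence-based cores: a taxon's prevalence in a group is the fraction of that group's samples in which it is detected, a group core keeps taxa whose prevalence reaches a threshold, and an intersect core keeps only taxa present in every group's core. The sketch below illustrates that logic under stated assumptions: detection means a relative abundance above `min_abundance` (0 by default), the thresholds 0.5, 0.75 and 1.0 are illustrative points in the 0.5 to 1 range rather than the study's soft and medium cut-offs, and all function names, group labels and data are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical layout: each sample maps taxon name -> relative abundance.
Sample = Dict[str, float]


def prevalence(samples: List[Sample], taxon: str, min_abundance: float = 0.0) -> float:
    """Fraction of samples in which `taxon` exceeds `min_abundance`."""
    detected = sum(1 for s in samples if s.get(taxon, 0.0) > min_abundance)
    return detected / len(samples)


def group_core(samples: List[Sample], threshold: float) -> Set[str]:
    """Taxa whose prevalence within one group is at least `threshold`."""
    taxa = set().union(*(s.keys() for s in samples))
    return {t for t in taxa if prevalence(samples, t) >= threshold}


def intersect_core(groups: Dict[str, List[Sample]], threshold: float) -> Set[str]:
    """Taxa that belong to the core of every group."""
    return set.intersection(*(group_core(s, threshold) for s in groups.values()))


groups = {
    "group_A": [
        {"Bacteroides": 0.30, "Prevotella": 0.10, "Faecalibacterium": 0.20},
        {"Bacteroides": 0.40, "Faecalibacterium": 0.10},
        {"Bacteroides": 0.20, "Prevotella": 0.05, "Faecalibacterium": 0.30},
        {"Bacteroides": 0.50},
    ],
    "group_B": [
        {"Prevotella": 0.40, "Faecalibacterium": 0.20},
        {"Prevotella": 0.50, "Faecalibacterium": 0.10, "Bacteroides": 0.05},
    ],
}

# Raising the threshold shrinks the intersect core.
for threshold in (0.5, 0.75, 1.0):
    print(threshold, sorted(intersect_core(groups, threshold)))
# 0.5 ['Bacteroides', 'Faecalibacterium', 'Prevotella']
# 0.75 ['Faecalibacterium']
# 1.0 []
```

The toy output shows why a prevalence gradient is informative: the intersect core shrinks as the threshold rises, and a taxon survives at 1.0 only if it is detected in every sample of every group.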
Fig. 4. Abundance clustering analysis for universal core taxa. Average relative abundances for the different group combinations at the genus (A) and species (B) levels. Groups and taxa were clustered using the k-means algorithm. NCBI taxIDs are given in parentheses.
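Figs. 4 and 5 cluster groups and taxa with k-means. Below is a minimal pure-Python sketch of Lloyd's k-means algorithm applied to a hypothetical taxa × groups matrix of average relative abundances. It seeds the centroids with the first k rows for reproducibility, whereas library implementations typically use k-means++ seeding with several restarts; the number of clusters and the implementation used in the study are not stated here.

```python
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm; returns (cluster label per point, centroids)."""
    # Deterministic seeding from the first k points keeps the sketch reproducible.
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical taxa x groups matrix of average relative abundances.
taxa = ["Bacteroides", "Prevotella", "Faecalibacterium", "Roseburia"]
avg_abundance = [
    [0.30, 0.25, 0.05],
    [0.02, 0.10, 0.35],
    [0.28, 0.22, 0.08],
    [0.03, 0.12, 0.30],
]
labels, _ = kmeans(avg_abundance, k=2)
print(dict(zip(taxa, labels)))
# {'Bacteroides': 0, 'Prevotella': 1, 'Faecalibacterium': 0, 'Roseburia': 1}
```

To cluster groups rather than taxa, the same function can be applied to the transposed matrix, `[list(col) for col in zip(*avg_abundance)]`.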
Fig. 5. Pattern analysis for universal core taxa. Z-scored average relative abundances for the different groups at the genus (A) and species (B) levels. Groups and taxa were clustered using the k-means algorithm. NCBI taxIDs are given in parentheses.
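Fig. 5 works with z-scored average relative abundances. Z-scoring each taxon across groups, z = (x - mean) / SD, removes differences in overall magnitude, so abundant and rare taxa are compared by their pattern across groups rather than by their size. This is a minimal sketch that assumes row-wise z-scores with the population standard deviation (ddof = 0, the default of `scipy.stats.zscore`). R's `scale()` divides by the sample SD (n - 1) instead, and the convention used in the study is not stated. The function name `zscore_rows` and the data are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_rows(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each row (one taxon) across its columns (groups)."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)  # population SD, i.e. ddof = 0
        # A taxon with identical abundance in every group has no pattern to show.
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled


matrix = [
    [0.30, 0.25, 0.05],  # hypothetical taxon 1
    [0.02, 0.10, 0.35],  # hypothetical taxon 2
    [0.10, 0.10, 0.10],  # constant row -> all zeros
]
for row in zscore_rows(matrix):
    print([round(z, 3) for z in row])
```

Each non-constant output row has mean 0 and population variance 1, which is what makes the rows directly comparable before clustering.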
Fig. 6. Principal component analysis (PCA). Results are shown by lifestyle (A) and by country of origin (B); ellipses represent 95% confidence intervals. (C) Top 20 genus scores for the first two principal components. NCBI taxIDs are given in parentheses.
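Fig. 6 summarizes the samples with PCA. The following is a minimal, dependency-free sketch of PCA via the covariance matrix, power iteration and deflation, assuming a samples × variables (e.g. genera) abundance matrix. Any transformation the study may have applied before PCA is not stated here and is omitted. Library routines normally use an SVD instead; power iteration is chosen only to keep the example self-contained. All names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _center(data: Sequence[Sequence[float]]) -> Matrix:
    """Subtract each column (variable) mean."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]


def _covariance(centered: Matrix) -> Matrix:
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def _top_eigenpair(cov: Matrix, iters: int = 500) -> Tuple[float, List[float]]:
    """Power iteration: repeatedly multiply and normalize a start vector."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the matching eigenvalue (explained variance).
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v


def pca(data: Sequence[Sequence[float]], n_components: int = 2):
    """Return (sample scores, component loadings, explained variances)."""
    centered = _center(data)
    cov = _covariance(centered)
    loadings, variances = [], []
    for _ in range(n_components):
        lam, v = _top_eigenpair(cov)
        loadings.append(v)
        variances.append(lam)
        # Deflation: remove the component just found so the next power
        # iteration converges to the next-largest one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in loadings]
              for row in centered]
    return scores, loadings, variances


# Hypothetical samples x 2 variables, lying close to the line y = 2x.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.0]]
scores, loadings, variances = pca(data)
print(loadings[0], variances)
```

Each loading vector holds one weight per variable. In a samples × genera matrix, ranking genera by the absolute value of their weights is one common way to pick a "top 20" list; whether this matches the genus scores in panel C is an assumption. The uniform start vector would fail if it happened to be orthogonal to the leading eigenvector, which random initialization avoids in practice.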