Benjamin S Beresford-Jones¹, Samuel C Forster², Mark D Stares², George Notley², Elisa Viciani², Hilary P Browne², Daniel J Boehmler³, Amelia T Soderholm¹, Nitin Kumar², Kevin Vervier², Justin R Cross³, Alexandre Almeida⁴, Trevor D Lawley⁵, Virginia A Pedicord⁶.
Abstract
Human health and disease have increasingly been shown to be impacted by the gut microbiota, and mouse models are essential for investigating these effects. However, the compositions of human and mouse gut microbiotas are distinct, limiting translation of microbiota research between these hosts. To address this, we constructed the Mouse Gastrointestinal Bacteria Catalogue (MGBC), a repository of 26,640 high-quality mouse microbiota-derived bacterial genomes. This catalog enables species-level analyses for mapping functions of interest and identifying functionally equivalent taxa between the microbiotas of humans and mice. We have complemented this with a publicly deposited collection of 223 bacterial isolates, including 62 previously uncultured species, to facilitate experimental investigation of individual commensal bacteria functions in vitro and in vivo. Together, these resources provide the ability to identify and test functionally equivalent members of the host-specific gut microbiotas of humans and mice and support the informed use of mouse models in human microbiota research.
Keywords: bacteria culture collection; butyrate; commensal bacteria; functionally equivalent species; gut microbiota; microbial drug metabolism; mouse gut metagenomes; mouse models; public database; translation between mouse and human
Year: 2021 PMID: 34971560 PMCID: PMC8763404 DOI: 10.1016/j.chom.2021.12.003
Source DB: PubMed Journal: Cell Host Microbe ISSN: 1931-3128 Impact factor: 21.023
Figure 1. Isolates of the Mouse Culture Collection (MCC)
(A) Maximum likelihood phylogenetic tree of the 276 bacterial isolate genomes of the MCC. Genome labels indicate the taxon assigned to each genome by GTDB-Tk; where a genome could not be assigned at species level, the lowest assigned taxonomic rank is shown. Labels are colored by phylum, and the outer ring indicates genomes with no previously cultured representative. Tree distances were calculated from an alignment of 120 core genes using the BLOSUM45 amino acid similarity matrix.
(B) Abundance and prevalence profiles for the 62 previously uncultured species of the MCC based on 2,446 mouse gut metagenomes. Each datapoint represents the percentage of reads assigned to a species for a single sample. Prevalence is calculated as the percentage of samples with species abundance ≥0.01%. Colors represent phyla.
(C) Scatterplot of mean abundance against prevalence for all 132 species of the MCC. Color indicates whether each species had previously been cultured.
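The prevalence definition in panel (B) can be made concrete with a short Python sketch. It assumes each species' abundances are supplied as percentages of reads, one value per metagenome sample; the function names and example values are hypothetical, and averaging over all samples (zeros included) for the mean abundance in panel (C) is an assumption rather than something the legend states.

```python
from statistics import mean

# Detection threshold from the legend: a species counts as present in a
# sample when its relative abundance is >= 0.01% of reads.
DETECTION_THRESHOLD_PCT = 0.01


def prevalence_pct(abundances_pct, threshold=DETECTION_THRESHOLD_PCT):
    """Percentage of samples in which one species reaches the threshold.

    abundances_pct: relative abundance (% of reads) of a single species,
    one value per metagenome sample.
    """
    if not abundances_pct:
        raise ValueError("need at least one sample")
    present = sum(1 for a in abundances_pct if a >= threshold)
    return 100.0 * present / len(abundances_pct)


def mean_abundance_pct(abundances_pct):
    """Mean relative abundance over all samples, zeros included (assumption)."""
    return mean(abundances_pct)


# Hypothetical abundances of one species in five samples (not real data).
example = [0.0, 0.005, 0.01, 0.2, 1.5]
print(prevalence_pct(example))  # 3 of 5 samples pass -> 60.0
print(mean_abundance_pct(example))
```

Scaling the same two functions across the 2,446 metagenomes and all 132 species would give the coordinates plotted in panel (C), under the assumptions above.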
Figure 2. Genomes of the Mouse Gastrointestinal Bacteria Catalogue
(A) Maximum likelihood tree of representative genomes for the 1,094 species of the MGBC. Color range indicates whether a species cluster is represented by metagenome-assembled genomes (MAGs) only (light red), isolates only (light green), or both (light blue). For each species, the innermost color ring represents phylum, the second ring indicates species that could not be assigned at species level by GTDB-Tk (dark blue), the third ring denotes the cultured status of each species (blue), and the outer ring indicates the 62 species that have been uniquely cultured in the MCC (brown). The circumferential bar plot (green) shows the number of high-quality genomes representing each species in the MGBC. Tree distances were calculated from an alignment of 120 core genes using the BLOSUM45 amino acid similarity matrix.
(B) Phylum-level distribution of the 26,640 high-quality genomes of the MGBC (left) and percentage of species clusters not assigned to a species-level taxonomy by GTDB-Tk (right).
(C) Stacked bar plots comparing the phylum-level composition of the 276 MCC isolates (MCC isolates) and the 26,640 genomes of the MGBC with that of the average mouse microbiome (microbiome; n = 2,446). The distributions of each stacked bar were compared using a chi-square test of independence: MCC versus microbiome, p = 0.015 (significantly different); MGBC versus microbiome, p = 1 (not significantly different).
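The chi-square test of independence in panel (C) asks whether phylum composition depends on which collection the genomes come from. The sketch below computes the Pearson statistic and degrees of freedom for a contingency table with rows as collections and columns as phyla; how the compositions were actually tabulated is not stated in the legend, so the input layout, function name, and example counts are assumptions. A p-value can then be read from the chi-square distribution, for example with SciPy's `scipy.stats.chi2_contingency`.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    table: rows of non-negative counts, e.g. rows = genome collections and
    columns = phyla. All row and column totals must be positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("row and column totals must be positive")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if collection and phylum were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical phylum counts for two collections (not the paper's data).
counts = [
    [120, 90, 40],   # collection 1: phylum A, phylum B, phylum C
    [100, 110, 30],  # collection 2
]
stat, dof = chi_square_independence(counts)
print(f"chi2 = {stat:.3f}, dof = {dof}")
```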
Figure 3. Genome quality evaluation and benchmarking of the MGBC
(A) Completeness and contamination of MAGs of the MGBC. Using modified MIMAG criteria (≥90% completeness, ≤5% contamination, and metrics of genome fragmentation), 26,640 MAGs were defined as high quality (blue). Quality estimates were generated using CheckM.
(B) Phylum-level distribution of high-quality and medium-plus MAGs.
(C) UpSet plot illustrating the intersections of species between the contributing isolate collections and the MAGs of the MGBC (blue). The iMGMC is included for comparison (gray).
(D) Comparison of representative genome quality for shared species between the MGBC and iMGMC. Genome quality score: QS = Completeness − 5 × Contamination. Color represents phylum.
(E) Read classification rates for 64 independent mouse gut metagenome samples using different custom Kraken2 databases. Box plot color indicates the origin of the genomes used to build each database. Only genomes meeting the high-quality criteria were used to build databases, except where indicated (purple). Number of genomes per database: miBC, n = 43; mGMB, n = 100; public (combination of all mouse gut-derived isolates from NCBI), n = 288; MCC, n = 276; MCC+public, n = 564; MGCv1, n = 239; iMGMC, n = 8,509; MGBC, n = 26,640; mq iMGMC, n = 18,306; mq MGBC, n = 65,907; NCBI (standard database), n = 97,603; human (representative genomes of the UHGG), n = 3,006. Significance was determined for selected comparisons using paired t tests; ∗∗∗∗p < 0.0001.
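Three quantities in this figure reduce to simple arithmetic: the quality score QS in panel (D), the completeness/contamination thresholds in panel (A), and the read classification rate in panel (E). The Python sketch below is a minimal illustration under stated assumptions: the fragmentation criteria are not given in the legend and are left out, and the classification rate assumes Kraken2's standard per-read output, whose first tab-separated column is `C` (classified) or `U` (unclassified). Function names and example values are hypothetical.

```python
def quality_score(completeness, contamination):
    """QS = Completeness - 5 x Contamination (percentages, e.g. from CheckM)."""
    return completeness - 5 * contamination


def meets_hq_thresholds(completeness, contamination):
    """Completeness/contamination part of the high-quality definition.

    The genome-fragmentation metrics in the legend are not specified there,
    so they are intentionally not modelled here.
    """
    return completeness >= 90 and contamination <= 5


def classification_rate(kraken2_lines):
    """Percentage of reads that Kraken2 reported as classified.

    Assumes Kraken2's standard per-read output, whose first tab-separated
    column is 'C' (classified) or 'U' (unclassified).
    """
    status = [line.split("\t", 1)[0] for line in kraken2_lines if line.strip()]
    if not status:
        raise ValueError("no reads found")
    return 100.0 * status.count("C") / len(status)


# Hypothetical CheckM values for one MAG.
print(quality_score(96.5, 1.2), meets_hq_thresholds(96.5, 1.2))
# Hypothetical Kraken2 output lines for four reads.
demo = [
    "C\tread1\t816\t150\t816:116",
    "U\tread2\t0\t150\t0:116",
    "C\tread3\t1239\t150\t1239:116",
    "C\tread4\t816\t150\t816:116",
]
print(classification_rate(demo))  # 3 of 4 classified -> 75.0
```

The factor of 5 in QS penalizes contamination more heavily than incompleteness, so a more complete but more contaminated genome can still score lower.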
Figure 4. Taxonomy-function relationships between species of the human and mouse microbiotas
For a Figure360 author presentation of this figure, see https://doi.org/10.1016/j.chom.2021.12.003.
(A) Principal coordinate analyses for functional (left) and taxonomic (right) relationships between all species of the human and mouse gut microbiotas. Each data point represents a single species cluster, and point color denotes phylum. Functional analyses use Jaccard distances between the pangenomic functional profiles of each species. Taxonomic distances represent phylogenetic branch lengths between species calculated from an alignment of 120 core genes. The distance matrices used for ordination were compared using the Mantel test (r = 0.7416, p = 0.001).
(B) Taxonomy-function relationships between human- and mouse-derived bacterial species, stratified by shared taxonomic level. Bars show the distribution of the taxonomic rank shared by each species and its closest taxonomic relative in the other host. Colored bars and the statistics shown on the bars give the number and percentage, respectively, of species pairs at each rank for which the closest functionally related species belongs to the same taxon as the closest taxonomic relative.
(C) Scatterplot comparing taxonomic distance with functional distance for each human-derived species and the closest taxonomically related mouse-derived species. Color indicates the shared taxonomic rank between these species.
(D) Inverted maximum likelihood tree of the 4,100 species of the human and mouse gut microbiotas. External branches represent phylogenetic relationships between the representative genomes of each species. Internal connections link each species to its closest functionally related species in the other host; connections are shown only when the closest taxonomically and functionally related taxa differ. Clade color represents the phylum of each species, and the inner color bar denotes the host. The color of each internal connection indicates the taxonomic rank shared with the closest functionally related species.
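The comparisons in panels (A) and (B) can be sketched in plain Python: Jaccard distances between functional profiles, a permutation Mantel test between two distance matrices, and the fraction of species whose closest taxonomic and closest functional relatives coincide. The legend does not give the number of permutations or the correlation type, so Pearson correlation, 999 permutations, and a one-sided p-value are assumptions that mirror common defaults (for example, vegan's `mantel()`). The agreement function compares exact species rather than shared taxa at each rank, a deliberate simplification. All names and data below are hypothetical.

```python
import random


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of functional annotations."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def _pearson(x, y):
    # Pearson correlation; undefined if either input has zero variance.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def _upper(d, order):
    # Upper-triangle entries of d after reordering rows and columns by `order`.
    n = len(order)
    return [d[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, permutations=999, seed=0):
    """One-sided permutation Mantel test between two symmetric distance matrices."""
    identity = list(range(len(d1)))
    x = _upper(d1, identity)
    r_obs = _pearson(x, _upper(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute species labels of d2 (rows and columns together)
        if _pearson(x, _upper(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


def closest_match_agreement(tax_dist, func_dist):
    """Fraction of query species whose closest taxonomic relative in the other
    host is also its closest functional relative ({query: {target: distance}})."""
    same = 0
    for query, targets in tax_dist.items():
        closest_tax = min(targets, key=targets.get)
        closest_func = min(func_dist[query], key=func_dist[query].get)
        same += closest_tax == closest_func
    return same / len(tax_dist)


# Hypothetical functional profiles and taxonomic distances for four species.
profiles = [{"K1", "K2"}, {"K2", "K3"}, {"K4"}, {"K1", "K4"}]
func = [[jaccard_distance(a, b) for b in profiles] for a in profiles]
tax = [[0.0, 0.2, 0.9, 0.7], [0.2, 0.0, 0.8, 0.9],
       [0.9, 0.8, 0.0, 0.3], [0.7, 0.9, 0.3, 0.0]]
r, p = mantel(tax, func, permutations=199)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```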
Figure 5. Taxonomic locations of drug metabolism genes between host microbiotas
(A–C) Representative examples of taxonomic locations of drug metabolism genes between host microbiotas. Data illustrate the species-level contribution of genomes encoding the indicated drug metabolism gene (≥95% sequence identity). Genes and associated predicted functions are either (A) shared with a conserved taxonomic location, (B) shared with a different taxonomic location, or (C) not shared between hosts.
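The three categories in panels (A) to (C) can be expressed as a small decision rule. The sketch below is a simplification under assumptions not stated in the legend: a gene counts as present in a host when any genome carries a hit at ≥95% sequence identity, and a shared gene has a "conserved" location if at least one taxon (at whatever rank is chosen) carries it in both hosts. The input format, function names, and example hits are hypothetical.

```python
def taxa_with_gene(hits, min_identity=95.0):
    """Taxa carrying a hit to the gene at or above the identity cut-off.

    hits: iterable of (taxon, percent_identity) pairs, e.g. parsed from a
    sequence-search table; the input format is an assumption.
    """
    return {taxon for taxon, identity in hits if identity >= min_identity}


def classify_gene_location(human_taxa, mouse_taxa):
    """Assign a gene to one of the three Figure 5 categories (simplified rule)."""
    if not human_taxa and not mouse_taxa:
        raise ValueError("gene not found in either host")
    if not human_taxa or not mouse_taxa:
        return "not shared"
    if human_taxa & mouse_taxa:
        return "shared, conserved taxonomic location"
    return "shared, different taxonomic location"


# Hypothetical genus-level hits for one gene (not the paper's data).
human = taxa_with_gene([("Bacteroides", 99.1), ("Blautia", 93.0)])
mouse = taxa_with_gene([("Bacteroides", 97.4), ("Duncaniella", 96.0)])
print(classify_gene_location(human, mouse))  # shared, conserved taxonomic location
```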
Figure 6. Identification and validation of butyrate-producing species between hosts
(A and B) The most dominant butyrate-producing species of the human (A) and mouse (B) gut microbiotas, using either the BCoAT (top) or PTB/BUK (bottom) pathway. Color indicates the lowest taxonomic rank assigned to each species by GTDB-Tk: known species (light blue), novel species (dark blue), or novel genera (green).
(C) Maximum likelihood tree of the representative genomes for species of the Firmicutes_A phylum. Color range represents order-level taxonomy, and the innermost color bar denotes the host organism. The outer color bars indicate species predicted to produce butyrate via the BCoAT pathway (purple) or the PTB/BUK pathway (orange). The five most dominant species encoding a butyrogenic pathway in each host are marked with a colored triangle (mouse) or star (human).
(D) Butyrate production by bacterial isolates in broth monoculture. Bar color indicates the encoded pathway for butyrate synthesis.
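Predicting which butyrate pathway a species encodes, as in panel (C), can be sketched as a check that all marker genes for a pathway's terminal step are present. This is a minimal illustration, not the authors' annotation procedure: the marker symbols (`but` for butyryl-CoA:acetate CoA-transferase, `ptb` for phosphotransbutyrylase, `buk` for butyrate kinase) are assumed gene names, and the upstream butyryl-CoA synthesis genes a full prediction would also require are ignored.

```python
# Hypothetical marker-gene symbols for the terminal step of each pathway:
# 'but' (butyryl-CoA:acetate CoA-transferase), 'ptb' (phosphotransbutyrylase)
# and 'buk' (butyrate kinase). Upstream butyryl-CoA synthesis genes are ignored.
PATHWAY_MARKERS = {
    "BCoAT": {"but"},
    "PTB/BUK": {"ptb", "buk"},
}


def predict_butyrate_pathways(gene_symbols):
    """Return the butyrate pathways whose marker genes are all present."""
    genes = set(gene_symbols)
    return [name for name, markers in PATHWAY_MARKERS.items() if markers <= genes]


print(predict_butyrate_pathways(["ptb", "buk", "gyrA"]))  # ['PTB/BUK']
print(predict_butyrate_pathways(["but"]))                  # ['BCoAT']
```

Requiring both `ptb` and `buk` reflects that the PTB/BUK route needs two enzymes, whereas the BCoAT route converts butyryl-CoA to butyrate in a single step.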
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Bacterial isolates of the Mouse Culture Collection | This paper | |
| Faecal samples from mouse colonies | This paper | N/A |
| FastDNA SPIN Kit for Soil | MPBio | Cat#6560200 |
| MasterPure Complete DNA and RNA Purification Kit | Lucigen | Cat#MC85200 |
| Whole-genome sequencing data (Mouse Culture Collection) | This paper | SRA: PRJEB18589 |
| Metagenomic sequencing data (mouse faeces) | This paper | SRA: PRJEB44285 |
| Metagenomic sequencing data (mouse faeces) | This paper | SRA: PRJEB44286 |
| Genome assemblies (Mouse Culture Collection) | This paper | SRA: PRJEB45232 |
| Genome assemblies (representative MAGs) | This paper | SRA: PRJEB45234 |
| Data for all genomes | This paper | |
| Custom MGBC Kraken2/Bracken database | This paper | |
| Global mouse metagenome cohort data | This paper | |
| MGBC protein catalogues | This paper | |
| Unified Human Gastrointestinal Genome (UHGG) collection | ||
| Unified Human Gastrointestinal Protein (UHGP) catalogue | ||
| Genome Reference Consortium Mouse Build 39 (GRCm39) | NCBI | BioProject: PRJNA20689 |
| Coliphage phi-X174 complete genome | NCBI | BioProject: PRJNA14015 |
| Mouse Gut Gene Catalog (MGCv1) | ||
| Integrated Mouse Gut Metagenomic Catalog (iMGMC) | ||
| Mouse Intestinal Bacterial Collection (miBC) | | SRA: PRJEB10572 |
| Mouse Gut Microbial Biobank (mGMB) | | SRA: PRJNA486904 |
| Primer: Universal 16S rRNA Forward (7F): | N/A | |
| Primer: Universal 16S rRNA Reverse (1510R): | N/A | |
| R version 4.0.2 | ||
| mothur version 1.46.1 | ||
| NCBI BLAST | ||
| Velvet version 1.2 | ||
| VelvetOptimiser version 2.2.5 | N/A | |
| SSPACE version 2.1.1 | ||
| GapFiller | ||
| Prokka version 1.14.5 | ||
| MetaWRAP version 1.2.3 | ||
| KneadData version 0.7.3 | The Huttenhower Lab | |
| Bowtie2 version 2.3.5 | ||
| MetaSPAdes version 3.10.1 | ||
| MEGAHIT version 1.1.1-2-g02102e1 | ||
| MetaBAT2 version 2.9.1 | ||
| MaxBin 2.0 version 2.2.4 | ||
| CONCOCT version 0.4.0 | ||
| CheckM version 1.1.2 | ||
| dRep version 2.5.4 | ||
| GTDB-Tk version 1.3-r95 | ||
| Mash version 2.2.2 | ||
| FastANI version 1.3 | ||
| Panaroo version 1.2.4 | ||
| Kraken2 version 2.0.8 | ||
| Bracken version 2.5.2 | ||
| zCompositions R package version 1.3.4 | ||
| Vegan R package version 2.5-6 | ||
| MMseqs2 version 10-6d92c | ||
| InterProScan version 5.39-77.0 | ||
| Genome Properties version 2.0.1 | | |
| EggNOG-mapper version 2.0.1 | ||
| FastTree version 2.1.10 | ||
| IQ-TREE version 1.6.10 | ||
| Interactive Tree Of Life (iTOL) version 5.6.3 | ||
| Ape R package version 5.5 | ||
| MGBC-Toolkit | This paper | |
| BLAST+ version 2.7.1 | ||
| CMseq version | ||
| GUNC version 1.0.4 | ||
| European Nucleotide Archive (ENA) | ||
| FastPrep-24 Classic bead beating grinder and lysis system | MPBio | Cat#6004500 |
| RefSeq Release 205 | ||
| UniProt | The UniProt Consortium (Sao Paulo) | |
| Code for the MGBC | This paper | |
| Code for the MGBC-Toolkit | This paper | |