Ilias Lagkouvardos, Till R Lesker, Thomas C A Hitch, Eric J C Gálvez, Nathiana Smit, Klaus Neuhaus, Jun Wang, John F Baines, Birte Abt, Bärbel Stecher, Jörg Overmann, Till Strowig, Thomas Clavel.
Abstract
BACKGROUND: Bacteria within family S24-7 (phylum Bacteroidetes) are dominant in the mouse gut microbiota and detected in the intestine of other animals. Because they had not been cultured until recently and the family classification is still ambiguous, interaction with their host was difficult to study and confusion still exists regarding sequence data annotation.
Keywords: Bacterial diversity; Bacteroidetes; Cultivation; Family S24-7; Homeothermaceae; Metagenomic species; Mouse gut microbiota; Muribaculaceae
Year: 2019 PMID: 30782206 PMCID: PMC6381624 DOI: 10.1186/s40168-019-0637-2
Source DB: PubMed Journal: Microbiome ISSN: 2049-2618 Impact factor: 14.650
Fig. 1 Diversity and ecology of family S24-7 species. In all figure panels, only samples with S24-7 matches ≥ 0.25% relative abundance were considered for plotting, a threshold of confidence below which the risk of including spurious OTUs in the analysis increases substantially. a Cladogram of the family’s diversity (based on a Neighbor-Joining tree) as supported by existing full-length 16S rRNA gene sequences (n = 7784 after quality checks). The colored ring (color code corresponds to panel b) indicates the origin of samples recorded in IMNGS-derived samples (www.imngs.org) with the highest relative abundance for the given molecular species. The outer black bars represent the values of these maximum relative abundances (see scale). Terminal tree branches colored in red indicate species with existing genomic information, corresponding to reconstructed metagenomic species, short-term isolates, or those able to be maintained in culture and taxonomically described in this study (yellow stars). b Summary of sample origins recorded in IMNGS with the highest relative abundance of sequences for the 685 molecular species shown in panel a. The category “unknown” corresponds to S24-7 species for which the full-length 16S rRNA gene sequence returned no match to any of the amplicon sequences in existing IMNGS samples at > 97% sequence similarity over > 90% sequence length. c Distribution of highest relative abundances across all molecular species. Only positive species were considered for plotting, i.e., species of unknown origin (no sequences detected in amplicon datasets) or occurring at < 0.25% relative abundance were ignored. d Prevalence of S24-7 spp. in samples of different origins. Each bar represents the percentage of samples in the given category that were positive for S24-7 species. The column labeled “#inDB” provides the total number of samples per category.
e Number of co-occurring S24-7 species in the tested human and mouse gut samples and distribution of the cumulative relative abundance of the family in the mouse gut (positive samples only)
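The prevalence calculation described for panel d can be sketched as follows. This is a minimal illustration, not the authors' pipeline: only the 0.25% positivity threshold comes from the caption, while the function name `prevalence_by_category`, the input layout, and the toy values are assumptions made for the example.

```python
from collections import defaultdict

# Positivity threshold in percent relative abundance, as stated in the caption.
THRESHOLD = 0.25


def prevalence_by_category(samples, threshold=THRESHOLD):
    """Return the percentage of positive samples per sample category.

    `samples` is an iterable of (category, relative_abundance_percent) pairs;
    this input layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for category, abundance in samples:
        totals[category] += 1
        # A sample counts as positive only at or above the threshold.
        if abundance >= threshold:
            positives[category] += 1
    return {c: 100.0 * positives[c] / totals[c] for c in totals}


# Hypothetical toy data, not taken from the study.
toy_samples = [
    ("mouse gut", 12.0),
    ("mouse gut", 0.1),
    ("mouse gut", 0.25),
    ("human gut", 0.3),
    ("human gut", 0.0),
    ("soil", 0.0),
]
result = prevalence_by_category(toy_samples)
for category, pct in sorted(result.items()):
    print(f"{category}: {pct:.1f}% positive")
```

The per-category totals in this sketch play the role of the “#inDB” column: prevalence is always expressed relative to the number of samples recorded for that category.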
Fig. 2 Functional features of family S24-7. a Quality plot of the genome sequences used for analysis. Assemblies generated in the present study (isolates and metagenomic species) were considered if the marker gene completeness minus contamination was ≥ 80% [45]. Other reconstructed genomes are from two studies previously published [44, 46]. b Non-supervised clustering based on glycoside hydrolases (GH) occurrences across all 153 genomes available (numbered in increments of 10 on the right-hand side) as performed previously [44]. Already published entries are labeled in blue (“Homeothermaceae,” n = 30) or violet (UBA, n = 37) letters. Those from the present study are in black (MGS, n = 59), gray (short-term isolates, n = 21), or gold (cultured strains; n = 5 novel and 1 type species previously published). For the isolates in gray, names in brackets indicate their facility/vendor of origin (HZI, Helmholtz Center for Infection Research, Braunschweig, Germany; NCI, National Cancer Institute, Maryland, USA; Harlan; Janvier). GH categories (labels on top) considered discriminative between the different functional guilds (α-glucan, host or plant glycans) are colored accordingly (green, orange, and brown, respectively). c Multidimensional plotting of family S24-7 members and those from neighboring families based on KEGG orthology (KO). d Family-specific functions. The plots depict the prevalence (%) of single KOs across the different genome categories (top labels with numbers in brackets). The twelve KOs in violet (top) are specific for the S24-7 family, the seven bluish KOs (bottom) for the members of other families (see panel c) within the order Bacteroidales.
KO definitions are as follows (from top to bottom): K01577, oxalyl-CoA decarboxylase [EC:4.1.1.8]; K07749, formyl-CoA transferase [EC:2.8.3.16]; K01821, 4-oxalocrotonate tautomerase [EC:5.3.2.6]; K07088, uncharacterized protein; K01058, phospholipase A1/A2 [EC:3.1.1.32 3.1.1.4]; K07054, uncharacterized protein; K11921, LysR family transcriptional regulator, cyn operon transcriptional activator; K21993, formate transporter fdhC; K07126, uncharacterized protein; K06143, inner membrane protein creD; K00283, glycine dehydrogenase subunit 2 [EC:1.4.4.2]; K00282, glycine dehydrogenase subunit 1 [EC:1.4.4.2]; K03685, ribonuclease III [EC:3.1.26.3]; K03284, magnesium transporter; K03575, A/G-specific adenine glycosylase [EC:3.2.2.31]; K01938, formate-tetrahydrofolate ligase [EC:6.3.4.3]; K13993, HSP20 family protein; K00656, formate C-acetyltransferase [EC:2.3.1.54]
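The assembly inclusion criterion in panel a (marker gene completeness minus contamination ≥ 80%) can be written as a simple filter. The sketch below is illustrative only: the helper name `passes_quality`, the genome labels, and the completeness/contamination values are hypothetical, and the caption does not say which tool produced the estimates.

```python
def passes_quality(completeness, contamination, min_score=80.0):
    """Apply the caption's criterion: completeness - contamination >= 80 (percent)."""
    return (completeness - contamination) >= min_score


# Hypothetical genomes as (label, completeness %, contamination %).
toy_genomes = [
    ("genome_A", 98.5, 1.2),   # score 97.3, kept
    ("genome_B", 85.0, 6.0),   # score 79.0, dropped
    ("genome_C", 82.0, 2.0),   # score 80.0, kept (threshold is inclusive)
]

kept = [label for label, comp, cont in toy_genomes if passes_quality(comp, cont)]
print(kept)
```

Using the difference as a single score means a highly complete but contaminated assembly can still be excluded, which is the behavior the caption's wording implies.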
Fig. 3 Occurrence, glycan degradation capacities, and specific functional features of selected S24-7 species. a UPGMA tree showing the phylogenetic position of the 34 species with both a 16S rRNA gene sequence (used to calculate the tree; see accession numbers) and a draft genome available. Yellow stars indicate cultured species. b Occurrence of the species selection as determined by large-scale amplicon analysis using IMNGS (www.imngs.org). Colored bars (gray, blue, violet) indicate the type of samples positive for the given species, the prevalence indicating the number of corresponding samples out of a total of 93,045, including 10,350 from the mouse gut. A sample was considered positive only if sequence similarity matches occurred at ≥ 0.25% relative abundance, a threshold of confidence below which the risk of including spurious OTUs in the analysis increases substantially. Relative abundances shown as box plots (median with interquartile range) include data from the positive samples only and are color-coded according to glycan guilds (see panel c). c Binary presence (black)/absence (white) map of target pathways and single KOs with increased prevalence in S24-7 family members (Fig. 2d and Additional file 3: Table S2). KO and pathway designations are given on the top of the map; KO numbers at the bottom. Blue and red data bars on the right-hand side of the map indicate completeness and contamination values (%) for each of the genomes analyzed
Fig. 4 Phylogeny of S24-7 species and novel isolates within the family. a Phylogenomic placement of the Muribaculaceae family within the phylum Bacteroidetes was conducted using PhyloPhlAn [57]. Representative genomes within Bacteroidetes were used to place the Muribaculaceae isolates and MGS genomes generated in the present study; Fibrobacter succinogenes was used as out-group. The tree is drawn to scale with branch lengths measuring the number of amino acid substitutions per site. Local support values were calculated using the Shimodaira-Hasegawa test with 1000 resamples; only those values < 100% are shown. For the sake of clarity, clades with a branch length to nodes < 2 have been collapsed with the size of the triangle being proportional to the number of genomes within the corresponding clade (see Additional file 4: Figure S2 for the original, not collapsed tree structure within Muribaculaceae). Clades are named based on the internal genomes taxonomy, with corresponding numbers of genomes indicated in brackets. b Tree based on 16S rRNA gene sequences showing the phylogenetic position of cultured species within Muribaculaceae compared with members of most closely related families. The evolutionary history was inferred using the Neighbor-Joining method [53]. The optimal tree with the sum of branch length 2.97236459 is shown. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (100 replicates) are shown next to the branches (values equal to 100% are not shown) [15]. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Maximum Composite Likelihood method [64] and are in the units of the number of base substitutions per site. The analysis involved 64 nucleotide sequences. All ambiguous positions were removed for each sequence pair.
There were a total of 1518 positions in the final dataset. Evolutionary analyses were conducted in MEGA7 [28]