Nicolai Karcher1, Edoardo Pasolli2, Francesco Asnicar1, Kun D Huang1,3, Adrian Tett1, Serena Manara1, Federica Armanini1, Debbie Bain4, Sylvia H Duncan4, Petra Louis4, Moreno Zolfo1, Paolo Manghi1, Mireia Valles-Colomer1, Roberta Raffaetà5, Omar Rota-Stabelli3, Maria Carmen Collado6, Georg Zeller7, Daniel Falush8, Frank Maixner9, Alan W Walker4, Curtis Huttenhower10,11, Nicola Segata12.
Abstract
BACKGROUND: Eubacterium rectale is one of the most prevalent human gut bacteria, but its diversity and population genetics are not well understood because large-scale whole-genome investigations of this microbe have not been carried out.
Year: 2020 PMID: 32513234 PMCID: PMC7278147 DOI: 10.1186/s13059-020-02042-y
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Reconstruction of 1321 high-quality (HQ) E. rectale genomes from 6775 fecal metagenomes. a The parameters for the binning step of our reference-based workflow (average identity and fraction of contig aligned) were chosen using E. rectale-free metagenomic assemblies spiked with E. rectale sequences obtained from isolate genomes (“Materials and methods”). We report the median number of false positive (FP) bases (binned contigs not coming from spike-in) and false negative (FN) bases (contigs coming from spike-in that were not binned). FP and FN values are scaled with respect to the average E. rectale isolate genome size. The red square indicates the parameter value combination used in this study. b Estimation of completeness and contamination for all extracted genomes using CheckM [17]. c Comparison of genome characteristics for E. rectale isolate genomes, genomes from metagenomes reconstructed with a semi-supervised approach (MCR), and the large set of automatically reconstructed genomes (HQ). d, e Completeness and contamination estimates for bins extracted using the reference-based binning approach used in this study and bins produced by a reference-independent pipeline using metaBAT2 [2, 18]. Only genomes with > 90% completeness and < 5% contamination in both approaches are shown. f The sizes of the E. rectale genomes reconstructed with the reference-based pipeline are very consistent with the genome sizes from cultured isolate sequencing (gray shading), while the reference-independent pipeline produces smaller genomes. g Pan-genome characteristics for seven E. rectale isolate genomes from NCBI available at the time of processing (Additional file 3: Table S2) as well as seven genomes from the reference-based binning and from Pasolli et al. [2]. For both binning methods, we considered the same seven randomly selected European metagenomes as well as all seven cultured isolate genomes originating from studies in Europe/North America
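The two numeric conventions in this caption (the > 90% completeness and < 5% contamination cutoff, and FP/FN bases expressed relative to the average isolate genome size) can be illustrated with a minimal sketch. The function names and the example numbers are hypothetical and are not taken from the study's pipeline.

```python
def is_high_quality(completeness_pct, contamination_pct):
    """Apply the caption's cutoff: > 90% completeness and < 5% contamination."""
    return completeness_pct > 90.0 and contamination_pct < 5.0


def scaled_bases(n_bases, mean_isolate_genome_size):
    """Express a count of FP or FN bases as a fraction of the average isolate genome size."""
    return n_bases / mean_isolate_genome_size


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(is_high_quality(95.2, 1.3))
    print(scaled_bases(34_000, 3_400_000))
```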
Fig. 2 E. rectale consists of four geographically stratified subspecies. a Maximum likelihood phylogenetic tree of all E. rectale genomes, built from a concatenated core gene alignment using PhyloPhlAn2 (“Materials and methods”) and rooted based on a phylogenetic tree including E. rectale sister species. b Non-metric multidimensional scaling plot of pairwise genetic distances between all E. rectale genomes. c Distribution of intra- and inter-subspecies core gene genetic distances. p values were obtained using bidirectional Wilcoxon rank-sum tests. d Subspecies assignment using PAM clustering with k = 4 (“Materials and methods”). Black points indicate genomes obtained from cultured isolate sequencing
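Panel d refers to PAM (partitioning around medoids) clustering on a distance matrix. The sketch below shows the general idea on a toy matrix: a greedy BUILD phase followed by a SWAP phase. It is a simplification that accepts the first improving swap rather than the best one, and it is not the implementation used in the study. The function name `pam` and the toy data are assumptions.

```python
def pam(dist, k):
    """Cluster items around k medoids given a symmetric distance matrix (list of lists)."""
    n = len(dist)

    def total_cost(meds):
        # Sum over all items of the distance to their nearest medoid.
        return sum(min(dist[j][m] for m in meds) for j in range(n))

    # BUILD: start with the most central item, then greedily add medoids.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(medoids + [c])))

    # SWAP: replace a medoid with a non-medoid whenever that lowers the cost.
    best = total_cost(medoids)
    improved = True
    while improved:
        improved = False
        for mi in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:mi] + [c] + medoids[mi + 1:]
                cost = total_cost(trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True

    # Assign every item to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda m: dist[j][medoids[m]]) for j in range(n)]
    return medoids, labels


if __name__ == "__main__":
    # Toy one-dimensional data forming two obvious groups (k = 2 here, not the study's k = 4).
    points = [0, 1, 2, 10, 11, 12]
    dist = [[abs(a - b) for b in points] for a in points]
    print(pam(dist, 2))
```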
Fig. 3 Eubacterium rectale subspecies distribution suggests subspecies are isolated by distance. a Relative prevalence of E. rectale subspecies per country (European countries are aggregated). The size of the pie charts is proportional to the total number of genomes obtained per region/country. For a map of Europe, see Additional file 1: Fig. S13. b Pairwise approximated geographic distances between subspecies (considering representative locations) correlate with their median genetic distances (see “Materials and methods” for details). A Mantel test between pairwise genetic and geographic distances using the Pearson correlation coefficient yielded a correlation of 0.73 and a p value of 0.041
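A Mantel test correlates the off-diagonal entries of two distance matrices and assesses significance by permuting the rows and columns of one of them. The following is a minimal permutation-based sketch with the Pearson coefficient. It assumes a one-sided test (counting permutations with a correlation at least as large as the observed one) and random rather than exhaustive permutations, which may differ from what the authors used. The matrices are toy values, not the study's distances.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after relabelling rows and columns."""
    n = len(matrix)
    if order is None:
        order = list(range(n))
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(a, b, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p value) for two distance matrices."""
    rng = random.Random(seed)
    flat_b = upper_triangle(b)
    observed = pearson(upper_triangle(a), flat_b)
    hits = 0
    for _ in range(permutations):
        order = list(range(len(a)))
        rng.shuffle(order)
        if pearson(upper_triangle(a, order), flat_b) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Toy symmetric matrices standing in for genetic and geographic distances.
    genetic = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
    geographic = [[0, 2, 4, 7], [2, 0, 2, 5], [4, 2, 0, 3], [7, 5, 3, 0]]
    print(mantel(genetic, geographic, permutations=99))
```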
Fig. 4 ErEurope is consistently immotile due to loss of motility operons. a No genes from the four motility operons of E. rectale [25] are detected in ErEurope strains, and only a very small fraction of non-ErEurope genomes are lacking some or all of these genes (Additional file 1: Fig. S18). Asterisks denote cultured isolate genomes. b Differentially abundant, non-operon potentially motility-associated KOs between ErEurope and the remaining subspecies. csrA was added despite being present in the flgM/csrA operon because it can be found elsewhere in some E. rectale genomes as well. We annotated genes using eggNOG-mapper [26] and only KOs of the E. rectale reference genome annotated by KEGG [27] are considered. Potentially motility-associated KOs were defined as being part of at least one of the following KEGG pathways: quorum sensing, bacterial chemotaxis, flagellar assembly, and two-component system. p values were calculated using a two-sided Wilcoxon test and corrected for multiple testing at 5% FDR using the Benjamini-Hochberg method. c Core gene sequence and flgB/fliA operon sequence genetic clustering for all motile strains (those belonging to either ErAfrica, ErEurasia or ErAsia). d In vitro motility characterization via phase-contrast microscopy of six E. rectale isolates (“Materials and methods”). Asterisk marks strain L2–21, which is the only immotile ErEurasia strain, presumably as a consequence of the specific lack of the flgB/fliA motility operon we found in its genome
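The Benjamini-Hochberg correction mentioned in panel b can be sketched in a few lines: each p value is multiplied by the number of tests divided by its rank, and monotonicity is enforced from the largest p value downward. The function name and the example p values are illustrative and do not come from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # made-up p values
    adjusted = benjamini_hochberg(raw)
    # Tests with an adjusted p value below 0.05 pass a 5% FDR threshold.
    print([round(q, 4) for q in adjusted])
    print([q < 0.05 for q in adjusted])
```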
Fig. 5 The immotile subspecies ErEurope exhibits a comparatively strong shift in carbohydrate-active enzyme (CAZy) gene repertoire. a ErEurope exhibits higher carbohydrate-active enzyme (CAZy) family counts than the other subspecies. b Density estimates of the number of CAZy genes per 10^6 nucleotides in the genome for each subspecies. c Non-metric multidimensional scaling plot based on pairwise Manhattan distances between CAZy gene family abundances. d Left: Differentially abundant carbohydrate-active gene families between genomes of ErEurope and ErEurasia. p values were corrected at 5% family-wise error rate using the Bonferroni method. Color-scale is logarithmic. Middle: Effect size and direction of association (difference in mean copy number between ErEurope and ErEurasia). Right: Putative links between catabolic carbohydrate-active enzyme families (CBM, CE, GH) and their substrates. CBM = carbohydrate-binding module, CE = carbohydrate esterase, GH = glycoside hydrolase, GT = glycosyltransferase
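Two quantities in this caption are simple to compute: the number of CAZy genes normalized per million nucleotides (panel b) and the Manhattan distance between two genomes' CAZy family count profiles (panel c). A minimal sketch follows, assuming each profile is a dictionary that maps a family name to its copy number. The family counts and genome length shown are invented for illustration.

```python
def cazy_density(n_cazy_genes, genome_length_nt):
    """Number of CAZy genes per one million nucleotides of genome."""
    return n_cazy_genes / genome_length_nt * 1_000_000


def manhattan(profile_a, profile_b):
    """Manhattan (L1) distance between two family-count profiles.

    Families absent from a profile are treated as having zero copies.
    """
    families = set(profile_a) | set(profile_b)
    return sum(abs(profile_a.get(f, 0) - profile_b.get(f, 0)) for f in families)


if __name__ == "__main__":
    # Invented example profiles, not data from the study.
    genome_1 = {"GH13": 5, "GT2": 3}
    genome_2 = {"GH13": 2, "CE4": 1}
    print(cazy_density(150, 3_000_000))
    print(manhattan(genome_1, genome_2))
```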
In vitro carbohydrate growth assays (“Materials and methods”). The symbols represent growth (measured by OD at 650 nm after 48 h) as follows: “−”: OD less than 0.1, “+”: OD between 0.1 and 0.3, “++”: OD between 0.3 and 0.7, “+++”: OD greater than 0.7
| Subspecies | Strain | Negative control | Glucose | Raffinose | Sucrose | SPS | Inulin (chicory) | Inulin (dahlia) | Beta-glucan | Arabinan | Xylan | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ErEurope | T3WBE13 | − | ++ | ++ | + | +++ | ++ | − | ++ | ++ | − | ++ | + |
| ErEurope | T1–815 | − | ++ | ++ | − | ++ | ++ | − | ++ | ++ | − | − | + |
| ErEurasia | A1–86 | − | + | ++ | ++ | +++ | ++ | − | + | + | − | − | + |
| ErEurasia | L2–21 | − | ++ | ++ | +++ | +++ | ++ | − | + | + | − | − | − |
| ErEurasia | ATCC 33656 | − | ++ | ++ | ++ | ++ | ++ | − | − | + | − | − | − |
| ErEurasia | M104/1 | − | ++ | ++ | ++ | +++ | ++ | − | + | + | − | − | − |
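The symbols in the table encode optical density ranges given in the caption. A minimal sketch of that mapping is shown below. The caption does not say which symbol applies at exactly 0.3, so assigning it to “++” is an assumption, and the function name is hypothetical.

```python
MINUS = "\u2212"  # the minus sign used in the table


def growth_symbol(od650):
    """Map an OD reading at 650 nm (after 48 h) to the table's growth symbol."""
    if od650 < 0.1:
        return MINUS   # OD less than 0.1
    if od650 < 0.3:
        return "+"     # OD between 0.1 and 0.3
    if od650 <= 0.7:
        return "++"    # OD between 0.3 and 0.7 (0.3 itself placed here by assumption)
    return "+++"       # OD greater than 0.7


if __name__ == "__main__":
    for od in (0.05, 0.2, 0.5, 0.9):  # made-up readings
        print(od, growth_symbol(od))
```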
Fig. 6 A newly discovered genomic island enriched for glycosyltransferase genes in ErEurope. a Genome-wide counts of the GT2, GT4, and GT32 families by subspecies. b Annotated open reading frames of the GT-enriched part of a representative example of the genomic island specific to ErEurope. c Comparative genomic analysis of the genomic island (“Materials and methods”). The top five ErEurope strains contain the genomic island, whereas the bottom five do not. Colored segments connecting pairs of genes indicate orthologous genes inferred using progressiveMauve [33]. d GC content along the four contigs from ErEurope strains containing the ErEurope genomic island (“Materials and methods”). YSZC12003_37103 is not shown here because another genomic insertion would misalign the sequences. e Pairwise genetic distances between strains using orthologous genes from the genomic island are lower than those based on core genes. All 56 ErEurope strains with a fully extracted genomic island are considered here
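GC content along a contig, as plotted in panel d, is commonly computed in sliding windows. The sketch below shows one way to do this. The window and step sizes are assumptions, since the caption does not state the values used, and the sequence is a toy string.

```python
def gc_windows(sequence, window=1000, step=500):
    """Return (start position, GC fraction) for each full window along a sequence."""
    seq = sequence.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / window))
    return results


if __name__ == "__main__":
    # Toy sequence with a GC-rich half followed by an AT-rich half.
    print(gc_windows("GGCCGGCCAATTAATT", window=8, step=4))
```

A drop or rise in the windowed GC fraction relative to the rest of the contig is one of the signals typically inspected when looking for horizontally acquired regions.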