Falk Hildebrand, Toni I Gossmann, Clémence Frioux, Ezgi Özkurt, Pernille Neve Myers, Pamela Ferretti, Michael Kuhn, Mohammad Bahram, Henrik Bjørn Nielsen, Peer Bork.
Abstract
Human gut bacterial strains can co-exist with their hosts for decades, but little is known about how these microbes persist and disperse, and evolve thereby. Here, we examined these processes in 5,278 adult and infant fecal metagenomes, longitudinally sampled in individuals and families. Our analyses revealed that a subset of gut species is extremely persistent in individuals, families, and geographic regions, represented often by locally successful strains of the phylum Bacteroidota. These "tenacious" bacteria show high levels of genetic adaptation to the human host but a high probability of loss upon antibiotic interventions. By contrast, heredipersistent bacteria, notably Firmicutes, often rely on dispersal strategies with weak phylogeographic patterns but strong family transmissions, likely related to sporulation. These analyses describe how different dispersal strategies can lead to the long-term persistence of human gut microbes with implications for gut flora modulations.
Keywords: antibiotics; bacterial dispersal; gut microbiome; metagenomics; population genetics; strain resolution
Year: 2021 PMID: 34111423 PMCID: PMC8288446 DOI: 10.1016/j.chom.2021.05.008
Source DB: PubMed Journal: Cell Host Microbe ISSN: 1931-3128 Impact factor: 21.023
Figure 1. Bioinformatic workflow leading to strain-resolved metagenomic species
(A) 5,278 longitudinal metagenomes were co-assembled per individual host (n = 2,089) (Figure S1). From these co-assemblies, a gene catalog with 23,137,742 genes was created and used to cluster 2,474 canopies. In parallel, metagenomic assembled genomes (MAGs) were calculated from the co-assemblies. MAGs and canopy clusters were combined and dereplicated to 1,144 high quality (>80% completeness, <5% contamination) MGS.
(B) Phylogeny and taxonomic assignment of all 1,144 MGS. The outer circle indicates missing taxonomic assignment levels (species, genus, family, order, and class); all MGS had at least phylum-level assignments, and 83% were named at the genus level. Branches with >90 bootstrap support have gray circles.
(C) Intraspecific phylogeny exemplified for Prevotella copri. 859 sMGS were reconstructed from 859 metagenomic samples with ≥2X P. copri coverage; tree tips are randomly colored by the host individual. Monophyletic sMGS within the same host or host family were used to identify strains persisting in individuals or families.
(D) Identified strains were used to benchmark sMGS precision. The average nucleotide identity (ANI) was calculated between genetic sequences of strains found recurrently in individuals or families, using core genes of a species (see STAR Methods). 55% of these sequences were completely identical (100% ANI), with 95% of strains having >99.9% ANI in their representative sequences. For brevity, MGS and sMGS will be referred to as species and strains, respectively, in the main text. MAG, metagenomic assembled genome; compl., cont., completeness and contamination of genomic bin; MGS, metagenomic species; sMGS, strain-delineated MGS; ANI, average nucleotide identity.
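The identity comparison described in (D) can be illustrated with a minimal sketch. It assumes that the core-gene sequences of two strains have already been aligned to equal length, and it counts identity only over columns where both sequences carry an unambiguous base. The function name and the handling of gaps and ambiguous bases are illustrative assumptions, not the procedure documented in the STAR Methods.

```python
def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned nucleotide sequences.

    Assumption: columns containing gaps or ambiguous bases in either
    sequence are skipped rather than counted as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            matches += a == b  # True counts as 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 9 comparable columns, 8 of them identical.
identity = pairwise_identity("ACGT-ACGTA", "ACGTTACGTC")
```

Under this definition, two recurrent observations of the same strain would be expected to score at or very near 100%.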
Terminology used throughout the manuscript to define different forms of host-bacterial association at differing spatial scales
| | Primary | Secondary | Measure |
|---|---|---|---|
| Tenacity (bacterial persistence within a host, family and geographic region) | persistence | strain persistence | percent of observed time spanned by longitudinal samples with identical strain |
| | | strain resilience | fraction of consecutive longitudinal samples harboring identical strain |
| | | annual persistence | annual strain survival (Kaplan-Meier analysis) |
| | family association | horizontal strain transmission (parent-parent) | fraction of identical strains in host families, where species were present in pairwise samples |
| | | vertical strain transmission (child-parent) | fraction of identical strains in host families, where species were present in pairwise samples |
| | phylogeography | country association | within-country compared with between-country strain phylo. dist. (perMANOVA) |
| | | geographic associations | correlation of strain phylo. dist. to geographic distance of samples (Mantel) |
Identical strains were defined as groups of monophyletic strains in intraspecific phylogenies. PerMANOVA, permutational multivariate analysis of variance; Mantel, Mantel test for comparing two distance matrices; phylo. dist., phylogenetic distance based on intraspecific phylogeny.
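To make the two persistence measures in the table concrete, the sketch below computes them for one species in one host. The input format (a list of sampling day and strain label pairs) and the function names are hypothetical. The persistence function encodes one possible reading of "observed time spanned by samples with identical strain", namely the longest interval between the first and last observation of the same strain; the authors' exact calculation may differ.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, str]  # (sampling day, strain label), both hypothetical


def strain_persistence(samples: List[Sample]) -> float:
    """Percent of the observed time spanned by one identical strain.

    Assumption: the span of a strain is the time between its first and
    last observation; the longest such span is reported.
    """
    if len(samples) < 2:
        raise ValueError("need at least two longitudinal samples")
    ordered = sorted(samples)
    total = ordered[-1][0] - ordered[0][0]
    if total <= 0:
        raise ValueError("samples must cover a positive time span")
    first_seen: Dict[str, float] = {}
    last_seen: Dict[str, float] = {}
    for day, strain in ordered:
        first_seen.setdefault(strain, day)
        last_seen[strain] = day
    longest = max(last_seen[s] - first_seen[s] for s in first_seen)
    return 100.0 * longest / total


def strain_resilience(samples: List[Sample]) -> float:
    """Fraction of consecutive sample pairs that harbor the same strain."""
    if len(samples) < 2:
        raise ValueError("need at least two longitudinal samples")
    ordered = sorted(samples)
    pairs = list(zip(ordered, ordered[1:]))
    same = sum(a[1] == b[1] for a, b in pairs)
    return same / len(pairs)


# Hypothetical host sampled on days 0, 50 and 200.
host = [(0, "a"), (50, "a"), (200, "b")]
persistence = strain_persistence(host)  # strain "a" spans 50 of 200 days
resilience = strain_resilience(host)    # 1 of 2 consecutive pairs identical
```

In this toy example the strain is replaced between the second and third sample, so it covers a quarter of the observed time and half of the consecutive sample pairs.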
Figure 2. Gut bacterial persistence extends beyond the individual host association
(A) Strain persistence consistently increased with host age. The black line is the average, colored lines the six most abundant phyla. Average persistence was highest in Bacteroidota strains, especially in infants (green line). Dots are the average values in each age window, lines are smoothed splines of data points. Each individual host is represented as their median age. See Figure S2 for delineation of antibiotic exposed hosts. The same taxa colors are used in all panels unless otherwise noted.
(B) Species that are persistent in an individual have a higher probability of being transmitted within a family.
(C) The frequency of vertical transmission in families (parent-child, n = 203 pairs) was often higher than horizontal transmission (parent-parent, n = 13 pairs). Species with <2 potential transmissions (total, vertically and horizontally; an arbitrary threshold) were excluded.
(D) For most of the 440 microbial species, geographic associations were only significant at a local scale (<150 km, 142/440 species, orange bars). The strength of geographic association decreased on average at higher distances (measured as the correlation coefficient between genetic and geographic distance, blue boxplots). Boxplot centers represent the median; the edges represent first and third quartiles.
(E) Persistence and geographic association (across all distance classes, only significant values included) were highly correlated; Bacteroidota (green) and Actinobacteriota (ochre) were notable for their steep correlations. Only species with significant geographic associations were included.
(F) Correlogram of the most important population genetic parameters (synonymous nucleotide diversity [πS], excess of rare alleles [Tajima’s Ds at synonymous sites], non-synonymous to synonymous substitutions [dN/dS]), and how they correlate with different forms of bacterial persistence, family, and phylogeography as well as species’ mean abundance. Stars denote multiple testing corrected Spearman correlation tests: ∗q < 0.05, ∗∗q < 0.01, ∗∗∗q < 0.001. Only species with significant country or geographic associations were included in correlations involving these.
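The geographic association in (D) is measured as a correlation between genetic and geographic distance, which the terminology table attributes to a Mantel test. The following is a minimal, dependency-free sketch of a one-sided permutation Mantel test using Pearson correlation. The function names, the number of permutations, and the toy matrices are assumptions for illustration; it is not a reproduction of the authors' analysis, which lists the vegan R package among its resources.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _upper(m: Matrix, order: Sequence[int]) -> List[float]:
    """Upper-triangle entries after jointly reordering rows and columns."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(genetic: Matrix, geographic: Matrix,
           permutations: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return (correlation, one-sided permutation p value)."""
    n = len(genetic)
    identity = list(range(n))
    geo_vec = _upper(geographic, identity)
    observed = _pearson(_upper(genetic, identity), geo_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel samples in the genetic matrix
        if _pearson(_upper(genetic, order), geo_vec) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical samples on a line; genetic distance mirrors geography exactly.
positions = [0.0, 1.0, 3.0, 7.0]
dist = [[abs(a - b) for b in positions] for a in positions]
r, p = mantel(dist, dist, permutations=199, seed=1)
```

Rows and columns are permuted together so that each permutation corresponds to relabeling samples, which preserves the dependence structure within a distance matrix.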
Figure 3. Dispersal patterns of gut bacteria and their link to bacterial evolution
(A) Dispersal patterns of bacteria were reflected in their associations: strong geographic and host association (spatiopersistent); strong family and host association (heredipersistent); no associations (non-persistent); all associations (tenacious); and average geographic or family and host associations (average persistent). The analysis was restricted to 50 highly abundant genera.
(B) Tenacious taxa could be genetically well-adapted (high purifying selection [dN/dS], fewer selective sweeps [Tajima’s Ds], high population sizes [πS]), while non-persistent taxa show opposing population genetics. p values are calculated with a non-parametric Kruskal-Wallis test, comparing all six groups.
(C) Tenacious bacteria are significantly more often transmitted vertically than horizontally in families, having a higher likelihood to be inherited between generations. Antibiotic usage usually reduces strain persistence, especially in adult hosts and tenacious bacteria. The color reflects the log10 OR (odds ratio); the value in the squares is the rounded, multiple testing corrected q value of Fisher’s exact test conducted separately for each square; squares with q ≥ 0.1 are shown in white. Children are hosts <3 years old. dN/dS, non-synonymous to synonymous nucleotide substitutions; πS, synonymous nucleotide diversity; OR, odds ratio in Fisher’s exact test. Boxplot centers represent the median; the edges represent first and third quartiles.
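Each square in (C) reports a log10 odds ratio together with a Fisher's exact test result. As a rough illustration of those two quantities for a single 2x2 table, the sketch below uses only the standard library. The counts are the textbook "tea tasting" example, not data from this study, and the multiple testing correction that yields the q values is not shown.

```python
import math
from typing import Tuple


def fisher_exact(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, p value). The p value sums the
    hypergeometric probabilities of all tables that are no more likely
    than the observed one, given fixed margins.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return odds_ratio, min(p_value, 1.0)


# Textbook example table (not from the study): [[3, 1], [1, 3]].
odds, p_value = fisher_exact(3, 1, 1, 3)
log10_or = math.log10(odds)
```

A positive log10 OR indicates enrichment of the first row in the first column, and a negative value indicates depletion, which is how a diverging color scale such as the one in the figure would typically be read.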
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Metagenomic sequences of longitudinal samples | This paper | SRA: PRJEB41102 |
| Gene catalogue nucleotide and amino acid sequences | This paper | |
| | | SRA: PRJEB17632 |
| | | SRA: PRJEB10391 |
| | | SRA: PRJEB6456 |
| | | SRA: PRJNA322188 |
| | | SRA: PRJNA339914 |
| | | SRA: PRJNA353655 |
| | | SRA: PRJNA289586 |
| HMP webpage | N/A | |
| N/A | | |
| | | SRA: PRJNA290381 |
| | | SRA: PRJNA290380 |
| | | SRA: PRJNA352475 |
| | | SRA: PRJNA354235 |
| | | SRA: PRJEB11532 |
| | | SRA: PRJEB7369 |
| | | SRA: PRJEB8094 |
| | | SRA: ERP022986 |
| | | SRA: PRJNA475246 |
| | | SRA: PRJNA63661 |
| | | SRA: PRJEB12357 |
| R version 3.6.2 | R Core Team 2017 | |
| Rarefaction scripts | | |
| Shotgun metagenomic data processing pipeline | | |
| Read depth windows calculation | | |
| Frameshifts fixing in MSAs program | This paper | |
| MetaBAT2 version 2.15 | | |
| sdm version 1.47 | | |
| MegaHit version 1.2.3 beta | | |
| Prodigal version 2.6.1 | | |
| Bowtie2 version 2.3.4.1 | | |
| Samtools version 1.3.1 | | |
| BedTools version 2.21.0 | | |
| CD-HIT version 4.6.1 | | |
| MMseqs2 version f5a1cdb44c9 | | |
| CheckM version 1.0.11 | | |
| MAFFT version 7.245 | | |
| Trimal version 1.4.rev22 | | |
| IQ-TREE version 1.6.3a | | |
| iTOL | | |
| GTDB-TK version 1.3.0 | | |
| Kraken2 version 2.0.9-beta | | |
| Lambda version 1.9.3 | | |
| EggNOG-mapper version 2.0.0 | | |
| Diamond version 0.9.24.125 | | |
| Bcftools mpileup version 1.9 | | |
| Vegan R package | | |
| FUBAR | | |
| APE | | |