Marguerite Lapierre, Camille Blin, Amaury Lambert, Guillaume Achaz, Eduardo P C Rocha.
Abstract
Recent studies have linked demographic changes and epidemiological patterns in bacterial populations using coalescent-based approaches. We identified 26 studies using skyline plots and found that 21 inferred overall population expansion. This surprising result led us to analyze the impact of natural selection, recombination (gene conversion), and sampling biases on demographic inference using skyline plots and site frequency spectra (SFS). Forward simulations based on biologically relevant parameters from Escherichia coli populations showed that theoretical arguments on the detrimental impact of recombination and especially natural selection on the reconstructed genealogies cannot be ignored in practice. In fact, both processes systematically lead to spurious interpretations of population expansion in skyline plots (and in SFS for selection). Weak purifying selection, and especially positive selection, had important effects on skyline plots, showing patterns akin to those of population expansions. State-of-the-art techniques to remove recombination further amplified these biases. We simulated three common sampling biases in microbiological research: uniform, clustered, and mixed sampling. Alone, or together with recombination and selection, they further mislead demographic inferences, producing almost any possible skyline shape or SFS. Interestingly, sampling sub-populations also affected skyline plots and SFS, because the coalescent rates of populations and their sub-populations had different distributions. This study suggests that extreme caution is needed to infer demographic changes solely based on reconstructed genealogies. We suggest that the development of novel sampling strategies and the joint analyses of diverse population genetic methods are strictly necessary to estimate demographic changes in populations where selection, recombination, and biased sampling are present.
Keywords: Escherichia coli; bacteria; gene conversion; natural selection; population genomics; population size
Year: 2016 PMID: 26931140 PMCID: PMC4915353 DOI: 10.1093/molbev/msw048
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Published Works Using Skyline Plots to Estimate Demographic Changes in Bacteria.
| Species | Conclusion | TMRCA | Authors’ Comments |
|---|---|---|---|
| | Expansion | 200 Y | Surprisingly, vaccination was followed by increase not decrease in |
| | Expansion | 35 Y | Population expansion coincides with the first reports of hospital outbreaks ( |
| | Expansion | 140 MY | A population bottleneck had a founding effect by purging diversity and leading to the formation of the extant major groups of |
| | Expansion | 20 Y | Correlation between population and reported number of clinical cases ( |
| | Expansion | 50 MY | The populations of antibiotic resistant isolates expand faster than those of sensitive bacteria ( |
| | All expansion | 70 KY, 6.6 KY, 40 Y | (1) Concludes about a parallel evolution between human (mitochondria) and this clade’s |
| | Expansion | 17 Y | Population expansion ( |
| | Expansion, contraction | 40 Y | (1) Population expansion measured in housekeeping functions parallels the number of clinical cases, but not when measured in an antibiotic resistance gene, suggesting it has been subject to positive selection. Results could be used in managing resistance ( |
| | Expansion | 0.005/nt | Assigns the presence of a recent selective sweep ( |
| | Stable | 0.07/nt | Suggests ancient rapid growth followed by stabilization, but very close strains are absent ( |
| | Stable | 0.1/nt | Suggests it is an endemic pathogen ( |
| | Expansion | 450 Y | Population contraction associated with the introduction of antibiotics, followed by expansion that would be associated with environmental changes ( |
| | All expansion | 10–71 KY, 25 Y | (1) Steady increase in population size in the last 3,000 years. Recombinant SNPs removed and strong selection checked ( |
| | Stable | 500 Y | The population size was found to be constant through time ( |
| | Expansion | 20 Y, 50 Y, 30 Y | (1) Rampant expansion might have followed trans-Atlantic spread ( |
| | Contraction | 15 Y | Population expansion and then contraction fits the observed number of clinical cases ( |
| | Expansion | 80 Y | Associates population expansion with the acquisition of super-antigens ( |
| | Expansion | 90 Y | Correlates population expansion with the introduction of new methods used for improved pig genetics ( |
| | Expansion | 7 MY | The demographic history matches the glacial cycles ( |
| | Expansion | 3 Y | Association with the history of the progression of an epidemic ( |
Note: We show the TMRCA, the conclusion of the work, and the authors' justifications of the results. Multiple studies published for a given species are indicated as multiple lines in the column TMRCA and by the respective numbers in the last column.
(a) TMRCA not indicated. The value indicates the span of the X-axis on the skyline plot.
(b) Studies did not perform time calibration and present only the number of mutations per site.
Parameters for E. coli Populations Used in the Simulations.
| Parameter | Value | Reference |
|---|---|---|
| Effective population size (Ne) | 1.8 × 10⁸ | |
| Genomic adaptive mutation rate | 1 × 10⁻⁵ | |
| Genomic deleterious mutation rate | 2 × 10⁻⁴ | |
| Average value of s | ±7 × 10⁻³ | |
| Mutation rate per generation (u) | 8.9 × 10⁻¹¹ | |
| Genome size (nt) | 5 × 10⁶ | |
| Recombination/mutation rate | 1 | |
| Size of recombination tracts | 542 | |
| SNPs recombination/mutation | 2.5 | |
| Weak selection ( | 5 | |
| Strong recombination/mutation rate | 10 |
(a) Because the absolute values of s for adaptive and deleterious mutations are of the same order of magnitude, we used a single average for both.
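The skyline plots described in the figure captions report the product Ne.u, so it can help to multiply out the table values once. The sketch below is only arithmetic on the listed parameters. It assumes that u is expressed per nucleotide, and it does not claim that the simulations used these raw values rather than rescaled ones. The constant and function names are illustrative.

```python
# Values copied from the parameter table above; names are illustrative.
NE = 1.8e8          # effective population size
U = 8.9e-11         # mutation rate per generation (assumed to be per nucleotide)
GENOME_SIZE = 5e6   # genome size in nucleotides
S_ABS = 7e-3        # average absolute selection coefficient


def scaled_parameters(ne=NE, u=U, genome_size=GENOME_SIZE, s_abs=S_ABS):
    """Return a few population-scaled quantities derived from the table."""
    return {
        "ne_u_per_site": ne * u,            # the quantity on a skyline plot axis
        "u_per_genome": u * genome_size,    # mutations per genome per generation
        "ne_s": ne * s_abs,                 # population-scaled selection strength
    }


if __name__ == "__main__":
    for name, value in scaled_parameters().items():
        print(f"{name}: {value:.4g}")
```

With these inputs Ne.u is about 0.016 per site, which gives a sense of scale for the Y-axis of an unscaled skyline plot.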
Figure. The effect of recombination on skyline plots and SFS. The simulations used the E. coli population parameters (Recombination), ten times higher recombination rates (10× Recombination), or no recombination (Neutral). Top: The simulations in the skyline plots are represented as dotted lines. The thick lines represent the smooth kernel fit (respectively R² = 0.81, R² = 0.87, and R² = 0.38). Bottom: SFS (distribution of the frequencies of all nucleotide polymorphisms in the sample) for each condition. The thick line indicates the average SFS over 1,000 replicates, whereas the thin shaded lines are the observed SFS for ten random replicates. All SFS were transformed and normalized (see section “Methods”). Colors match the same datasets in both plots.
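The Methods section that defines the SFS transformation is not part of this excerpt. As a hedged illustration, the sketch below uses one common choice: each frequency class i is weighted by i and the result is normalized to sum to one, so that the standard neutral expectation (counts proportional to 1/i) becomes a flat line. Whether this matches the authors' exact transformation is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """Count sites per frequency class.

    derived_counts: for each polymorphic site, the number of sampled
    sequences carrying the derived allele. n: sample size.
    Returns a list xi where xi[i - 1] is the number of sites in class i,
    for i = 1 .. n - 1. Fixed or absent sites (0 or n copies) are ignored.
    """
    tally = Counter(c for c in derived_counts if 0 < c < n)
    return [tally.get(i, 0) for i in range(1, n)]


def transformed_sfs(xi):
    """Weight class i by i, then normalize so the values sum to 1."""
    weighted = [i * count for i, count in enumerate(xi, start=1)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("no segregating sites in the sample")
    return [w / total for w in weighted]


if __name__ == "__main__":
    # Toy spectrum with counts proportional to 1/i (neutral-like shape).
    neutral_like = [60, 30, 20, 15, 12]
    print(transformed_sfs(neutral_like))
```

Under this weighting, an excess of low-frequency variants shows up as values above the flat line on the left of the spectrum, which is the kind of departure the captions compare against the neutral expectation.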
Figure. Distribution of the number of segregating sites and Tajima D values in each set of 1,000 simulations. The gray line in the top panel corresponds to the expected number of segregating sites under the standard neutral model, $E[S] = \theta \sum_{i=1}^{n-1} \frac{1}{i}$, where $\theta$ is the population-scaled mutation rate and $n$ is the sample size. The gray line in the bottom panel corresponds to the expected Tajima D under the neutral model (D = 0).
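The two neutral reference lines in this caption can be computed from textbook formulas. The sketch below uses Watterson's expectation for the number of segregating sites and Tajima's (1989) definition of D. It is a generic illustration, not the authors' code, and the parameter values used for the figure are not given in this excerpt. Function names are hypothetical.

```python
from math import sqrt


def watterson_a1(n):
    """Harmonic sum a1 = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def expected_segregating_sites(theta, n):
    """Expected number of segregating sites under the standard neutral model."""
    return theta * watterson_a1(n)


def tajima_d(pi, num_segregating, n):
    """Tajima's D from the mean pairwise difference pi, the number of
    segregating sites, and the sample size n (Tajima 1989)."""
    if num_segregating <= 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1 = watterson_a1(n)
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = num_segregating
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    print(expected_segregating_sites(theta=10.0, n=5))
    print(tajima_d(pi=2.0, num_segregating=12, n=5))
```

D is zero when the two estimators of θ agree, and negative when low-frequency variants are in excess, which is the pattern usually read as population expansion.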
Figure. Boxplots of the ratios between the maximal and minimal Ne.u values for skyline plots (ten simulations each), across the different types of simulations. All other categories were significantly different from Neutral (Wilcoxon tests, P < 0.01 for every comparison except Neutral versus Mixed, for which P = 0.0102).
Figure. The effect of selection on ten skyline plots (top) and 1,000 SFS (bottom). Top: The simulations are represented as dotted lines. The thick lines represent the smooth kernel fit for strong and weak selection (respectively R² = 0.78 and R² = 0.79). For the analysis of selection and recombination, only the kernel fits are indicated (R² = 0.80). The grey box indicates the range of variation of the Neutral simulations in figure 1. Bottom: The thick lines represent the average SFS over 1,000 simulations. In all SFS plots, the horizontal black line indicates the neutral expectation. Colors match the same datasets in both plots.
Figure. Analysis of three types of sampling biases. Top: Schematic representation of the different types of sampling biases in a species tree (see section “Methods” for a precise definition). Center: Skyline plots for each set of ten simulations. The dotted lines represent the simulations. The thick line represents the smooth kernel fit (respectively Clustered R² = 0.63, Uniform R² = 0.86, Mixed R² = 0.40). The grey box indicates the range of variation of the Neutral simulations in figure 1. See supplementary figure S5, Supplementary Material online, for a zoom on values of clustered bias close to zero. Bottom: Average SFS for the three datasets (1,000 simulations for each). Colors match the same datasets in both plots.
Figure. Top: Skyline plots for clustered, uniform, and mixed sampling on simulations with weak selection and recombination (each point is an average of the ten simulations). The grey box indicates the range of variation of the Neutral simulations in figure 1. Bottom: Average SFS for the same three datasets (1,000 simulations). Colors match the same datasets in both plots.
Figure. Analysis of the core genome of E. coli. (A) Values of dN/dS versus dS. Each point represents a comparison between two strains using the concatenate of alignments of genes of the core genome. (B) Skyline plot. We made ten analyses of the dataset, each time randomly sampling a tenth of the core genome. The orange line represents the skyline of the concatenate of genes with reconstructed genealogies not significantly different from those of the core genome (passed the SH test at P < 0.01). The inset represents the ratio between the maximum and minimum values of Ne.u for the 11 skyline plots (10 with the 1/10th samples of the core genome and one with the analysis of the concatenate of genes passing the SH test). (C) The observed SFS is indicated by the dashed red line and the corrected SFS (with Kimura’s two-parameter model) by the solid red line. The horizontal black line indicates the neutral expectation. The corrected SFS with the JC69 model (not shown here) is similar to the SFS corrected with Kimura’s two-parameter model except for the last point, which is slightly higher. (D) E. coli distance-based phenetic tree with the major clades indicated on the right. A similar tree indicating all strains used in the analysis is in supplementary figure S6, Supplementary Material online.