William R Shoemaker, Jay T Lennon.
Abstract
The degree to which independent populations subjected to identical environmental conditions evolve in similar ways is a fundamental question in evolution. To address this question, microbial populations are often experimentally passaged in a given environment and sequenced to examine the tendency for similar mutations to repeatedly arise. However, there remains the need to develop an appropriate statistical framework to identify genes that acquired more mutations in one environment than in another (i.e., divergent evolution), genes that serve as genetic candidates of adaptation. Here, we develop a mathematical model to evaluate evolutionary outcomes among replicate populations in the same environment (i.e., parallel evolution), which can then be used to identify genes that contribute to divergent evolution. Applying this approach to data sets from evolve-and-resequence experiments, we found that the distribution of mutation counts among genes can be predicted as an ensemble of independent Poisson random variables with zero free parameters. Building on this result, we propose that the degree of divergent evolution at a given gene between populations from two different environments can be modeled as the difference between two Poisson random variables, known as the Skellam distribution. We then propose and apply a statistical test to identify specific genes that contribute to divergent evolution. By focusing on predicting patterns among replicate populations in a given environment, we are able to identify an appropriate test for divergence between environments that is grounded in first principles.
IMPORTANCE There is currently no universally accepted framework for identifying genes that contribute to molecular divergence between microbial populations in different environments. To address this absence, we developed a null model to describe the distribution of mutation counts among genes. We find that divergent evolution within a given gene can be modeled as the absolute difference in the total number of mutations observed between two environments. This quantity is effectively captured by a probability distribution known as the Skellam distribution, providing an appropriate statistical test for researchers seeking to identify the set of genes that contribute to divergent evolution in microbial evolution experiments.
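The test described above can be illustrated with a minimal sketch. It assumes that the expected mutation counts for a gene in each environment (`mu1`, `mu2`) are already known; how those expectations are obtained is not specified in the abstract, so they are hypothetical inputs here, as are all function names. The Skellam probability is computed directly as a convolution of two Poisson distributions, and the P value is the probability of an absolute difference at least as large as the one observed.

```python
import math

def poisson_pmf(k, mu):
    """P(N = k) for N ~ Poisson(mu), computed in log space for stability."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def skellam_pmf(k, mu1, mu2, tail=200):
    """P(N1 - N2 = k) for independent N1 ~ Poisson(mu1), N2 ~ Poisson(mu2).

    The infinite sum over N2 is truncated well past both means (an
    assumption that holds for modest expected counts).
    """
    upper = int(max(mu1, mu2) + abs(k)) + tail
    return sum(poisson_pmf(j + k, mu1) * poisson_pmf(j, mu2)
               for j in range(max(0, -k), upper))

def abs_diff_p_value(delta_obs, mu1, mu2):
    """P(|N1 - N2| >= |delta_obs|) under the Skellam null."""
    delta_obs = abs(delta_obs)
    # Probability mass strictly inside (-delta_obs, delta_obs).
    below = sum(skellam_pmf(k, mu1, mu2)
                for k in range(-(delta_obs - 1), delta_obs))
    return max(0.0, 1.0 - below)

# Hypothetical gene: expected counts of 2 and 3, observed |delta n| of 6.
p = abs_diff_p_value(6, mu1=2.0, mu2=3.0)
```

A small P value would flag the gene as a candidate contributor to divergent evolution; in practice a correction for testing many genes would also be needed.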
Keywords: adaptation; evolution; experimental evolution; microbial evolution; parallel evolution
Year: 2022 PMID: 35138123 PMCID: PMC8826959 DOI: 10.1128/msphere.00672-21
Source DB: PubMed Journal: mSphere ISSN: 2379-5042 Impact factor: 4.389
FIG 1. (a) A typical evolve-and-resequence experiment is performed by splitting a culture that has been grown from a single colony, inoculating cells into replicate flasks constituting one or more environmental conditions (e.g., purple or orange), and propagating the population over time by periodically transferring cells into new flasks with fresh medium. (b and c) After a given number of generations has elapsed, replicate populations are often sequenced, allowing the number of de novo mutations at a given gene to be calculated. (d to f) The degree of parallel evolution within each environment is quantified by taking the sum of mutation counts across replicate populations for a given gene (d and e), while the degree of divergent evolution is quantified by taking the absolute difference in mutation counts between environments (|Δn|) (f).
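The two summary quantities in panels d to f can be sketched in a few lines. The counts below are invented for illustration, and the function names are hypothetical rather than taken from the authors' code.

```python
def total_count(replicate_counts):
    """Parallel evolution: sum of mutation counts at one gene across replicates."""
    return sum(replicate_counts)

def divergence(counts_env_a, counts_env_b):
    """Divergent evolution: |delta n|, the absolute difference between totals."""
    return abs(total_count(counts_env_a) - total_count(counts_env_b))

# Hypothetical counts for one gene in three replicates per environment.
purple = [1, 0, 2]
orange = [0, 0, 1]
delta_n = divergence(purple, orange)  # |3 - 1| = 2
```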
FIG 2. (a) Using the Poisson distribution, we were able to predict the occupancy of nonsynonymous mutations for a given gene among 115 replicate E. coli populations. (b) Using the same data set, we were able to subsample replicate populations to examine how the level of error in our prediction decreased as the number of replicate populations increased. (c) The degree of covariance between genes is summarized by the primary eigenvalue of the gene-by-population matrix of mutation counts (dashed black line). By generating null count matrices, we simulated a null distribution of primary eigenvalues to calculate the P value for the observed degree of covariance. (d) Similar to the analysis in panel c, we examined how the ability to detect covariance changes as the number of replicate populations increases by calculating the fraction of observed primary eigenvalues greater than the null.
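The occupancy prediction in panel a can be sketched as follows, under the assumption that occupancy means the fraction of replicate populations carrying at least one mutation in the gene and that the Poisson rate is estimated as the mean count per population. The authors' exact estimator may differ; the function names and data are hypothetical.

```python
import math

def observed_occupancy(counts):
    """Fraction of replicate populations with at least one mutation in the gene."""
    return sum(1 for c in counts if c > 0) / len(counts)

def predicted_occupancy(counts):
    """Poisson prediction 1 - exp(-rate), with rate set to the mean count.

    No parameter is fitted to the occupancy itself: the rate comes
    directly from the observed mean (an assumption of this sketch).
    """
    rate = sum(counts) / len(counts)
    return 1.0 - math.exp(-rate)

# Hypothetical mutation counts for one gene across eight populations.
counts = [0, 1, 0, 2, 0, 0, 1, 0]
obs = observed_occupancy(counts)
pred = predicted_occupancy(counts)
```

Comparing `obs` and `pred` across many genes gives a rough picture of how well an independent-Poisson null describes the data.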