Susan F Bailey, Qianyun Guo, Thomas Bataillon.
Abstract
Parallel evolution, defined as identical changes arising in independent populations, is often attributed to similar selective pressures favoring the fixation of identical genetic changes. However, some level of parallel evolution is also expected if mutation rates are heterogeneous across regions of the genome. Theory suggests that mutation and selection can have equal impacts on patterns of parallel evolution; however, empirical studies have yet to jointly quantify the importance of these two processes. Here, we introduce several statistical models to examine the contributions of mutation and selection heterogeneity to shaping parallel evolutionary changes at the gene level. Using this framework, we analyze published data from forty experimentally evolved Saccharomyces cerevisiae populations. We can partition the effects of a number of genomic variables into those affecting patterns of parallel evolution via effects on the rate of arising mutations, and those affecting the retention versus loss of the arising mutations (i.e., selection). Our results suggest that gene-to-gene heterogeneity in both mutation and selection, associated with gene length, recombination rate, and number of protein domains, drives parallel evolution at both synonymous and nonsynonymous sites. Although a number of parallel changes are still not well described, we show that allowing for heterogeneous rates of mutation and selection can improve predictions of the prevalence and degree of parallel evolution.
Year: 2018 PMID: 30252076 PMCID: PMC6200314 DOI: 10.1093/gbe/evy210
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
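The abstract's point that heterogeneous mutation rates alone produce some parallelism can be made concrete with a toy calculation. If each of two independent populations acquires a single mutation, and that mutation lands in gene i with probability p_i, the two populations hit the same gene with probability Σ p_i². The sketch below assumes neutrality, exactly one mutation per population, and mutation probabilities proportional to hypothetical gene lengths; none of these numbers come from the study.

```python
def probability_same_gene(weights):
    """Chance that two independent single mutations land in the same gene.

    Gene i is hit with probability p_i = w_i / sum(w), so two populations that
    each carry one mutation share a gene with probability sum_i p_i**2.
    """
    total = sum(weights)
    return sum((w / total) ** 2 for w in weights)


# Hypothetical gene lengths in bp; mutation probability is assumed proportional to length.
uniform_lengths = [1500, 1500, 1500, 1500]
heterogeneous_lengths = [300, 600, 1500, 3600]

print(f"uniform rates:       {probability_same_gene(uniform_lengths):.4f}")        # 0.2500
print(f"heterogeneous rates: {probability_same_gene(heterogeneous_lengths):.4f}")  # 0.4350
```

With G genes, Σ p_i² is smallest (equal to 1/G) when all p_i are equal, so any heterogeneity in mutation rates raises the neutral baseline for shared hits even before selection is considered.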
Fig. 1.—Schematic showing how the mutation count data are generated and the general assumptions underlying these data.
Genomic Variables Used in This Study
| Variable Name | Description | Reference |
|---|---|---|
| dS | Number of synonymous substitutions per synonymous site, estimated from gene alignments of … | Estimated for this study. |
| dN | Number of nonsynonymous substitutions per nonsynonymous site, estimated from … | Estimated for this study. |
| Gene length | The number of nucleotides. | … |
| % GC content | Percentage of nucleotides in the gene sequence that are either guanine or cytosine. | … |
| Multifunctionality | Number of different GO slim categories assigned to a gene. | … |
| Degree of protein–protein interaction | The number of physical interactions reported by BioGRID (…). | … |
| Codon adaptation index | A measure of bias in the usage of synonymous codons, based on a comparison between codon frequencies in the gene and frequencies observed in a set of highly expressed genes (…). | … |
| Number of domains | The number of regions that Pfam (…) … | … |
| Level of expression | A measure of each gene's mRNA level in cells grown under standard laboratory conditions. | … |
| Local recombination rate | Mean recombination rate for a given gene, calculated from recombination rate estimates at 0.5 kb intervals using … | … |
| Essential genes | A true/false indicator variable denoting whether or not a gene is essential, based on growth assays of deletion strains. | … |
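Two of the variables in the table above, % GC content and the codon adaptation index (CAI), are simple functions of a gene sequence. The sketch below follows the usual CAI definition: each codon's relative adaptiveness w is its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon, and CAI is the geometric mean of w over the gene's codons. The reference counts, the two-amino-acid family table, and the rule of skipping codons without a positive w are hypothetical simplifications, not the values or procedure behind the study's variables.

```python
import math


def gc_content(seq):
    """Percentage of nucleotides in `seq` that are G or C."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def relative_adaptiveness(reference_counts, synonymous_families):
    """w_c = count(c) / count of the most used synonymous codon in the reference set."""
    w = {}
    for codons in synonymous_families.values():
        top = max(reference_counts.get(c, 0) for c in codons)
        if top == 0:
            continue  # family absent from the reference set
        for c in codons:
            w[c] = reference_counts.get(c, 0) / top
    return w


def codon_adaptation_index(seq, w):
    """Geometric mean of w over the gene's codons that have a positive w value.

    Codons missing from `w` (e.g. Met, Trp, stops) or with w == 0 are skipped,
    a simplification assumed here.
    """
    codons = [seq[i:i + 3].upper() for i in range(0, len(seq) - 2, 3)]
    logs = [math.log(w[c]) for c in codons if w.get(c, 0) > 0]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Hypothetical reference counts for two amino acids only (not real yeast data).
families = {"Lys": ["AAA", "AAG"], "Gly": ["GGT", "GGC", "GGA", "GGG"]}
reference = {"AAA": 30, "AAG": 70, "GGT": 80, "GGC": 20, "GGA": 10, "GGG": 10}
w = relative_adaptiveness(reference, families)

gene = "AAGGGTAAAGGC"
print(f"GC content: {gc_content(gene):.1f}%")               # 50.0%
print(f"CAI:        {codon_adaptation_index(gene, w):.3f}")  # 0.572
```

Working in log space for the geometric mean avoids underflow when a real gene has hundreds of codons with w below 1.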
“MS” Models Testing Assumptions with the Synonymous Mutation Data
| Model | Distribution | Log-lik. | No. param. | AIC |
|---|---|---|---|---|
| MS0.P | Pois(…) | −283.0 | 1 | 568.2 |
| MS0.NB | NB(…) | −283.0 | 2 | 569.9 |
| MS1.P | Pois(…) | −273.9 | 2 | 551.8 |
| MS1.NB | NB(…) | −273.9 | 3 | 553.8 |
| MS2.P | Pois(…) | −274.0 | 1 | 549.9 |
| MS2.NB | NB(…) | −274.0 | 2 | 551.9 |
Note.—Log-likelihoods and AIC values are provided. The best model as determined by the lowest AIC with the fewest parameters is MS2.P.
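The AIC column follows from the log-likelihoods and parameter counts as AIC = 2k − 2 ln L. The sketch below recomputes it from the table and applies the note's rule (lowest AIC, then fewest parameters). Because the tabulated log-likelihoods are rounded to one decimal, recomputed values can differ from the table in the last digit (for example 550.0 versus 549.9 for MS2.P); the ranking is unchanged.

```python
# Log-likelihoods and parameter counts transcribed from the "MS" table above.
MS_MODELS = {
    "MS0.P": (-283.0, 1),
    "MS0.NB": (-283.0, 2),
    "MS1.P": (-273.9, 2),
    "MS1.NB": (-273.9, 3),
    "MS2.P": (-274.0, 1),
    "MS2.NB": (-274.0, 2),
}


def aic(log_lik, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_lik


def best_model(models):
    """Lowest AIC; exact ties are broken in favour of fewer parameters."""
    return min(models, key=lambda name: (round(aic(*models[name]), 6), models[name][1]))


for name, (log_lik, k) in MS_MODELS.items():
    print(f"{name:7s} AIC = {aic(log_lik, k):.1f}")
print("best:", best_model(MS_MODELS))  # best: MS2.P
```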
Fig. 2.—Distribution of (A) synonymous and (B) nonsynonymous mutations per gene (totaled over all 40 populations in the data set) and predicted model distributions from M0.P (gray circles), M1.P (black points), M2.P (green triangles), MN.NB (blue squares), and MN.NBPC (orange diamonds).
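Predicted distributions like those in Fig. 2 can be obtained by summing, over genes, each gene's predicted probability of carrying exactly k mutations. The sketch below assumes a per-gene mean proportional to gene length, one plausible form of a length-dependent model, and compares a Poisson with a negative binomial (NB) whose variance is mean + mean²/size. The gene lengths, the rate constant, and size = 0.5 are hypothetical; this is not the authors' fitting code.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson distribution with the given (positive) mean."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def nbinom_pmf(k, mean, size):
    """Negative binomial P(X = k) with mean `mean` and variance mean + mean**2 / size."""
    p = size / (size + mean)
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def expected_gene_counts(means, pmf, k_max):
    """Expected number of genes carrying exactly k mutations, for k = 0..k_max."""
    return [sum(pmf(k, m) for m in means) for k in range(k_max + 1)]


# Hypothetical inputs: gene lengths (bp) and a per-bp rate summed over populations.
gene_lengths = [500, 1000, 2000, 4000]
rate_per_bp = 5e-4
means = [rate_per_bp * length for length in gene_lengths]  # 0.25, 0.5, 1.0, 2.0

poisson_expected = expected_gene_counts(means, poisson_pmf, 5)
nb_expected = expected_gene_counts(means, lambda k, m: nbinom_pmf(k, m, size=0.5), 5)

for k, (pois, nb) in enumerate(zip(poisson_expected, nb_expected)):
    print(f"k={k}: Poisson {pois:.3f} genes, NB {nb:.3f} genes")
```

With the same per-gene means, the NB spreads probability toward both zero and large counts, which is how it can accommodate more unmutated genes and more heavily hit genes than a Poisson model.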
“MN” Model Parameter Estimates (constant, α1, α2, etc.) and P-values for Those Estimates
| Parameter | Estimate | P-value |
|---|---|---|
| MN.NB: NB(…) | | |
| … | … | 0.001 |
| … | … | 0.004 |
| … | … | 0.041 |
| constant | 8.084×10⁻⁶ | <0.001 |
| … | 0.3806 | <0.001 |
| MN.NBPC: NB(…) | | |
| exp(PC10) | … | <0.001 |
| constant | 8.846×10⁻⁵ | <0.001 |
| … | 0.3988 | <0.001 |
Note.—Only those variables that significantly improved model fit are included.
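The note states that only variables that significantly improved model fit were retained. One standard way to judge whether adding a single covariate improves a likelihood-based model is a likelihood-ratio test. The source does not say which test was used, so the sketch below is an assumed approach that covers only nested models differing by one parameter; the log-likelihoods in the usage line are hypothetical.

```python
import math


def lrt_pvalue_1df(loglik_reduced, loglik_full):
    """Likelihood-ratio test p-value when the full model adds one parameter.

    D = 2 * (lnL_full - lnL_reduced) is referred to a chi-square distribution
    with 1 degree of freedom, whose survival function is erfc(sqrt(D / 2)).
    """
    d = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return math.erfc(math.sqrt(d / 2.0))


# Hypothetical log-likelihoods for a model without and with one extra covariate.
p = lrt_pvalue_1df(loglik_reduced=-960.0, loglik_full=-957.0)
print(f"p = {p:.4f}")  # about 0.0143, so the covariate would be kept at alpha = 0.05
```

The closed form via `math.erfc` avoids a SciPy dependency; for more than one extra parameter a general chi-square survival function would be needed.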
Fig. 3.—Loadings of the 11 genomic variables on PC10, the only principal component that significantly explains variation in nonsynonymous mutation counts. Genomic variables are ordered from largest to smallest in terms of the absolute value of their loading.
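Ordering variables by the absolute value of their loading, as in Fig. 3, treats strong positive and strong negative contributions alike; this also sidesteps the fact that the overall sign of a principal component is arbitrary. The sketch below shows that ordering and how a gene's score on one component is formed from its standardized variables, the kind of score that enters a covariate such as exp(PC10) in the MN.NBPC rows above. The variable names, loadings, and z-scores are hypothetical, not the values plotted in Fig. 3, and PCA on z-scored variables is an assumption.

```python
def order_by_abs_loading(loadings):
    """(variable, loading) pairs sorted from largest to smallest |loading|."""
    return sorted(loadings.items(), key=lambda item: abs(item[1]), reverse=True)


def pc_score(z_scores, loadings):
    """A gene's score on one component: sum of its z-scored variables times the loadings."""
    return sum(z_scores[name] * loading for name, loading in loadings.items())


# Hypothetical loadings and z-scores, not the values plotted in Fig. 3.
loadings = {"var_a": 0.2, "var_b": -0.7, "var_c": 0.5}
gene_z = {"var_a": 1.0, "var_b": 0.5, "var_c": -2.0}

print(order_by_abs_loading(loadings))        # [('var_b', -0.7), ('var_c', 0.5), ('var_a', 0.2)]
print(round(pc_score(gene_z, loadings), 6))  # -1.15
```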
Log-Likelihoods and AIC Values for the “MN” Models
| Model | Log-lik. | No. param. | AIC |
|---|---|---|---|
| MN0.P: | −1,159.9 | 1 | 2,321.7 |
| MN0.NB: | −1,022.5 | 2 | 2,048.9 |
| MN2.P: | −1,050.3 | 1 | 2,102.7 |
| MN2.NB: | −956.6 | 2 | 1,917.3 |
| MN.P | −1,021.4 | 4 | 2,050.8 |
| MN.PPC | −1,013.0 | 2 | 2,030.1 |
| MN.NB | −944.8 | 5 | 1,899.7 |
| MN.NBPC | −947.9 | 3 | 1,901.8 |
Note.—The best model as determined by the lowest AIC with the fewest parameters is MN.NB.
Fig. 4.—Distribution of the degree of parallelism (estimated as the pairwise Jaccard Index, J) from the real data (yellow bars) and simulated data from the best-fit models (blue bars). Overlapping regions appear green. Panels (A) and (B) show real and simulated data for synonymous and nonsynonymous mutations, respectively.
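The pairwise Jaccard index in Fig. 4 compares the sets of genes mutated in two populations: J = |A ∩ B| / |A ∪ B|, ranging from 0 (no shared genes) to 1 (identical sets). The sketch below computes J for every pair of populations; with 40 populations that gives 40 × 39 / 2 = 780 values per data set. The population and gene names are hypothetical, and defining J = 0 for two empty sets is an assumed convention.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index J = |A ∩ B| / |A ∪ B| between two sets of mutated genes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0  # two empty sets: defined as 0 here


def pairwise_jaccard(populations):
    """J for every unordered pair of populations (dict: population -> set of genes)."""
    return {
        (p, q): jaccard(populations[p], populations[q])
        for p, q in combinations(sorted(populations), 2)
    }


# Hypothetical gene sets for three populations.
pops = {
    "pop1": {"gene1", "gene2", "gene3"},
    "pop2": {"gene2", "gene3", "gene4"},
    "pop3": {"gene5"},
}
for pair, j in pairwise_jaccard(pops).items():
    print(pair, round(j, 3))
```

Running the same function on populations simulated from a fitted model yields the comparison distribution, so real and simulated histograms can be overlaid as in Fig. 4.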