Benjamin C Haller, Philipp W Messer.
Abstract
The McDonald-Kreitman (MK) test is a widely used method for quantifying the role of positive selection in molecular evolution. A key shortcoming of this test is its sensitivity to slightly deleterious mutations, which can severely bias its estimates. An asymptotic version of the MK test was recently introduced to address this problem: it evaluates polymorphism levels separately for different mutation frequencies and then extrapolates a function fitted to those data. Here, we present asymptoticMK, a web-based implementation of this asymptotic MK test. Our web service provides a simple R-based interface into which the user can upload the required data (polymorphism and divergence data for the genomic test region and for a neutrally evolving reference region). The web service then analyzes the data and provides plots of the test results. This service is free to use, open-source, and available at http://benhaller.com/messerlab/asymptoticMK.html. We provide results from simulations to illustrate the performance and robustness of the asymptoticMK test under a wide range of model parameters.
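To make the extrapolation idea concrete, here is a minimal Python sketch. It assumes the per-bin definition α(x) = 1 − (d0/d)·(p(x)/p0(x)) associated with Messer and Petrov (2013), where d and d0 are divergence counts and p(x) and p0(x) are polymorphism counts in frequency bin x for the test and neutral regions, and it uses only a linear fit extrapolated to x = 1. The function names and the default cutoff interval are hypothetical; this is not the asymptoticMK source code.

```python
from typing import List, Sequence, Tuple

def alpha_by_bin(x: Sequence[float], p: Sequence[float], p0: Sequence[float],
                 d: float, d0: float) -> List[Tuple[float, float]]:
    """Per-bin alpha(x) = 1 - (d0/d) * (p(x)/p0(x)) (assumed definition)."""
    points = []
    for xi, pi, p0i in zip(x, p, p0):
        if p0i > 0:  # a bin with no neutral polymorphisms has no defined ratio
            points.append((xi, 1.0 - (d0 / d) * (pi / p0i)))
    return points

def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares for alpha_fit(x) = a + b*x; returns (a, b)."""
    n = len(points)
    mx = sum(px for px, _ in points) / n
    my = sum(py for _, py in points) / n
    sxx = sum((px - mx) ** 2 for px, _ in points)
    sxy = sum((px - mx) * (py - my) for px, py in points)
    b = sxy / sxx
    return my - b * mx, b

def alpha_asymptotic_linear(x, p, p0, d, d0, x_low=0.1, x_high=0.9) -> float:
    """Fit alpha(x) on bins inside [x_low, x_high], then evaluate the line at x = 1."""
    pts = [(xi, ai) for xi, ai in alpha_by_bin(x, p, p0, d, d0)
           if x_low <= xi <= x_high]
    a, b = fit_line(pts)
    return a + b * 1.0

def alpha_original(p, p0, d, d0) -> float:
    """Classic MK estimate: pool the polymorphism counts of all bins passed in."""
    return 1.0 - (d0 / d) * (sum(p) / sum(p0))

# Hypothetical data constructed so that alpha(x) = 0.2 + 0.3x exactly, with d = d0.
x = [0.1, 0.3, 0.5, 0.7, 0.9]
p = [77, 71, 65, 59, 53]
p0 = [100] * 5
print(alpha_asymptotic_linear(x, p, p0, d=100, d0=100))  # approximately 0.5
print(alpha_original(p, p0, d=100, d0=100))              # approximately 0.35
```

In this toy input α(x) rises from 0.23 to 0.47 across the bins, so pooling all frequencies gives 0.35, while extrapolating the fitted line to x = 1 recovers 0.5. This is the kind of downward bias from low-frequency polymorphisms that the asymptotic approach is meant to remove.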
Keywords: molecular evolution; positive selection; web service
Year: 2017 PMID: 28341700 PMCID: PMC5427504 DOI: 10.1534/g3.117.039693
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. A screenshot of the asymptoticMK web page. After entering values for d and d0 (divergence in the test region and in the neutral reference region), choosing an input file with binned values of x, p, and p0, and choosing the x interval to fit, the user clicks the Submit button, and asymptoticMK displays its results in a new browser window or tab.
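The inputs shown in Figure 1 (d and d0 entered as numbers, plus a file of binned x, p, and p0 values, and a fitting interval) could be handled as sketched below. The tab-separated layout with a header row naming x, p, and p0 is an assumption for illustration, not a documented specification of asymptoticMK's upload format, and read_bins and within_cutoff are hypothetical helpers.

```python
import csv
import io
from typing import List, Tuple

Bin = Tuple[float, float, float]  # (x, p, p0)

def read_bins(text: str) -> List[Bin]:
    """Parse tab-separated rows with an assumed header line 'x<TAB>p<TAB>p0'."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [(float(r["x"]), float(r["p"]), float(r["p0"])) for r in reader]

def within_cutoff(bins: List[Bin], x_low: float, x_high: float) -> List[Bin]:
    """Keep only bins whose frequency x lies in the fitting interval [x_low, x_high]."""
    return [b for b in bins if x_low <= b[0] <= x_high]

# Hypothetical three-bin input file.
sample = "x\tp\tp0\n0.05\t120\t200\n0.15\t60\t90\n0.95\t5\t8\n"
bins = read_bins(sample)
print(within_cutoff(bins, 0.1, 0.9))  # [(0.15, 60.0, 90.0)]
```

Filtering by the interval before fitting mirrors the caption's description of choosing the x interval to fit; bins outside it remain available for plotting but do not enter the fit.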
Results from asymptoticMK for simulation runs conducted with SLiM 2
| Model | αtrue | αasymptotic | αoriginal | Δasymptotic | Δoriginal | ρexp |
|---|---|---|---|---|---|---|
| Baseline | 0.329 ± 0.015 | 0.307 ± 0.058 | 0.164 ± 0.035 | 0.045 | 0.165 | 0.75 |
| | 0.327 ± 0.008 | 0.301 ± 0.013 | 0.174 ± 0.012 | 0.025 | 0.152 | 1.00 |
| | 0.321 ± 0.067 | 0.246 ± 0.134 | 0.142 ± 0.141 | 0.120 | 0.191 | 0.15 |
| | 0.306 ± 0.005 | 0.287 ± 0.016 | 0.173 ± 0.009 | 0.019 | 0.132 | 1.00 |
| | 0.317 ± 0.057 | 0.288 ± 0.169 | 0.145 ± 0.074 | 0.134 | 0.173 | 0.05 |
| | 0.493 ± 0.018 | 0.481 ± 0.045 | 0.378 ± 0.025 | 0.041 | 0.114 | 0.70 |
| | 0.091 ± 0.014 | 0.115 ± 0.080 | −0.103 ± 0.053 | 0.071 | 0.194 | 0.55 |
| | 0.477 ± 0.016 | 0.451 ± 0.032 | 0.366 ± 0.025 | 0.029 | 0.111 | 0.70 |
| | 0.096 ± 0.011 | 0.090 ± 0.068 | −0.119 ± 0.047 | 0.057 | 0.215 | 0.50 |
| | 0.424 ± 0.024 | 0.422 ± 0.042 | 0.289 ± 0.036 | 0.032 | 0.135 | 0.60 |
| | 0.233 ± 0.011 | 0.234 ± 0.057 | 0.104 ± 0.039 | 0.045 | 0.129 | 0.50 |
| | 0.324 ± 0.006 | 0.302 ± 0.014 | 0.173 ± 0.012 | 0.022 | 0.151 | 1.00 |
| | 0.345 ± 0.063 | 0.369 ± 0.183 | 0.225 ± 0.113 | 0.126 | 0.120 | 0.05 |
The first row shows the averaged results from 20 replicate runs of the baseline SLiM model supplied on GitHub (see text). These runs used a mutation rate of μ = 10⁻⁹ per base position per generation, chromosome length L = 10⁷ base positions, beneficial mutation rate rb = 0.0005, beneficial mutation selection coefficient sb = 0.1, deleterious mutation selection coefficient sd = −0.02, and time after burn-in T = 2 × 10⁵ generations. Each subsequent row shows the results from 20 replicate runs using the nonbaseline parameter value shown. αtrue is the true value of α averaged across the 20 replicates in each row; αasymptotic and αoriginal are the asymptoticMK estimate and the estimate from the original test, respectively. SDs across the 20 replicates of each row are shown as ± values. Δasymptotic = |αasymptotic − αtrue| and Δoriginal = |αoriginal − αtrue| are the absolute errors of the asymptoticMK and original-test estimates, computed in each run and then averaged over the 20 replicates (i.e., mean absolute errors). ρexp is the fraction of runs in which the exponential fit was chosen.
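Because Δasymptotic and Δoriginal average per-run absolute errors, they need not equal the gap between the averaged α columns. For the baseline row, |0.307 − 0.329| = 0.022, whereas Δasymptotic = 0.045, since over- and underestimates in individual runs do not cancel. The sketch below computes the three summary columns from per-replicate results; the record layout and the replicate values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

def summarize(runs: List[Dict]) -> Tuple[float, float, float]:
    """Return (delta_asymptotic, delta_original, rho_exp) over replicate runs."""
    # Absolute error per run first, then the average over replicates.
    delta_asym = mean(abs(r["alpha_asym"] - r["alpha_true"]) for r in runs)
    delta_orig = mean(abs(r["alpha_orig"] - r["alpha_true"]) for r in runs)
    # Fraction of runs in which the exponential fit was selected.
    rho_exp = sum(1 for r in runs if r["used_exp"]) / len(runs)
    return delta_asym, delta_orig, rho_exp

# Two hypothetical replicates whose asymptotic estimates average exactly to the truth.
runs = [
    {"alpha_true": 0.3, "alpha_asym": 0.2, "alpha_orig": 0.1, "used_exp": True},
    {"alpha_true": 0.3, "alpha_asym": 0.4, "alpha_orig": 0.2, "used_exp": False},
]
print(summarize(runs))  # roughly (0.1, 0.15, 0.5)
```

Here the mean asymptotic estimate equals the true α, yet its mean absolute error is 0.1.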
Figure 2. Results from asymptoticMK for three test data sets. (A) Normalized site frequency spectrum (SFS) for the Drosophila data set used in Messer and Petrov (2013). Points show normalized binned polymorphism frequencies for the neutral region (black) and the test region (red). (B) Result of asymptoticMK’s analysis of that data set. The two vertical blue lines mark the limits of the frequency cutoff interval used for fitting. Points indicate binned values of α(x), estimated according to Equation 2; points are gray if they lie outside the cutoff interval (and were thus not used in fitting). The solid red curve shows the fitted αfit(x) (here, exponential). The dashed red line shows the estimate of αasymptotic, obtained from the fitted function according to Equation 3. The gray band indicates the 95% C.I. around this αasymptotic estimate. The dotted gray line shows, for comparison, the estimate of αoriginal from the original (nonasymptotic) McDonald–Kreitman (MK) test, also calculated using only the data within the cutoff interval. (C) and (D) show corresponding results from one SLiM simulation run, and (E) and (F) from another; in each pair, the first panel shows the result of an automated fit by asymptoticMK, and the second shows the improvement after hand-tailoring the fit (see Results and Discussion). Note that in all four cases, asymptoticMK judged the linear fit more appropriate. Finally, the solid green horizontal lines show the true value of α in the simulation runs.
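The caption distinguishes an exponential from a linear αfit(x). The Python sketch below assumes the exponential form αfit(x) = a + b·exp(−c·x) associated with Messer and Petrov (2013). It fits that form by scanning c over a fixed grid and solving for a and b by least squares at each c, then picks between the exponential and linear forms with AIC. The grid, the AIC criterion, and all function names are assumptions for illustration; this section does not state how asymptoticMK itself fits the curve or decides which form to report, and the 95% C.I. band is not reproduced.

```python
import math
from typing import Sequence, Tuple

def ols(z: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least squares for y = a + b*z; returns (a, b, residual sum of squares)."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    b = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / szz
    a = my - b * mz
    rss = sum((yi - a - b * zi) ** 2 for zi, yi in zip(z, y))
    return a, b, rss

def fit_exponential(x, y, c_grid=None):
    """Fit y = a + b*exp(-c*x): scan c, solve (a, b) linearly for each c."""
    if c_grid is None:
        c_grid = [0.1 * k for k in range(1, 301)]  # hypothetical search range 0.1..30
    best = None
    for c in c_grid:
        a, b, rss = ols([math.exp(-c * xi) for xi in x], y)
        if best is None or rss < best[3]:
            best = (a, b, c, rss)
    return best  # (a, b, c, rss)

def aic(rss: float, n: int, k: int) -> float:
    """Gaussian-error AIC up to an additive constant; k = fitted parameters."""
    return n * math.log(rss / n) + 2 * k

def choose_fit(x, y) -> Tuple[str, float]:
    """Return the preferred form and its extrapolated alpha at x = 1."""
    n = len(x)
    a_l, b_l, rss_l = ols(x, y)
    a_e, b_e, c_e, rss_e = fit_exponential(x, y)
    if aic(rss_e, n, 3) < aic(rss_l, n, 2):
        return "exponential", a_e + b_e * math.exp(-c_e)
    return "linear", a_l + b_l

# Hypothetical binned alpha(x) values: an exponential approach to 0.5 plus small noise.
xs = [0.1 * i for i in range(1, 10)]
ys = [0.5 - 0.4 * math.exp(-5 * xi) + 0.002 * (-1) ** i for i, xi in enumerate(xs)]
print(choose_fit(xs, ys))  # exponential form preferred, alpha near 0.497
```

Scanning c while solving a and b in closed form keeps the sketch dependency-free and avoids the start-value sensitivity of a general nonlinear optimizer, at the cost of resolution limited by the grid spacing.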