Vladimir Shchur, Jesper Svedberg, Paloma Medina, Russell Corbett-Detig, Rasmus Nielsen.
Abstract
Admixture is increasingly being recognized as an important factor in evolutionary genetics. The distribution of genomic admixture tracts, and the resulting effects on admixture linkage disequilibrium, can be used to date the timing of admixture between species or populations. However, the theory used for such prediction assumes selective neutrality despite the fact that many famous examples of admixture involve natural selection acting for or against admixture. In this paper, we investigate the effects of positive selection on the distribution of tract lengths. We develop a theoretical framework that relies on approximating the trajectory of the selected allele using a logistic function. By numerically calculating the expected allele trajectory, we also show that the approach can be extended to cases where the logistic approximation is poor due to the effects of genetic drift. Using simulations, we show that the model is highly accurate under most scenarios. We use the model to show that positive selection on average will tend to increase the admixture tract length. However, perhaps counter-intuitively, conditional on the allele frequency at the time of sampling, positive selection will actually produce shorter expected tract lengths. We discuss the consequences of our results in interpreting the timing of the introgression of EPAS1 from Denisovans into the ancestors of Tibetans.
Keywords: EPAS1; adaptation; adaptive introgression; admixture; selection; tract length
Year: 2020 PMID: 32763953 PMCID: PMC7534438 DOI: 10.1534/g3.120.401616
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Distribution of the distance from the selected locus to one end of the introgressed tract. Selection coefficient , admixture fraction and time since introgression generations. The observed allele frequency is . The first panel shows the probability density functions for the empirical distribution obtained by simulations, the distribution calculated under the deterministic approximation, and the exponential distribution with the mean set equal to the simulated mean. The three other panels are qq-plots showing all three pairs of the presented distributions.
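The comparison in Figure 1 between an empirical distribution and an exponential distribution with the same mean can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `exponential_quantile` and `qq_points` are hypothetical, and the sample is synthetic, drawn from an exponential distribution purely so that the script runs.

```python
import math
import random


def exponential_quantile(p, mean):
    """Quantile function of an exponential distribution with the given mean."""
    return -mean * math.log(1.0 - p)


def qq_points(sample, n_points=9):
    """Pair exponential quantiles (mean matched to the sample) with empirical ones."""
    data = sorted(sample)
    mean = sum(data) / len(data)
    points = []
    for k in range(1, n_points + 1):
        p = k / (n_points + 1)
        # Simple empirical quantile: the order statistic at position p * n.
        idx = min(int(p * len(data)), len(data) - 1)
        points.append((exponential_quantile(p, mean), data[idx]))
    return points


# Synthetic stand-in for simulated distances (NOT data from the paper).
random.seed(1)
sample = [random.expovariate(50.0) for _ in range(5000)]
for theoretical, empirical in qq_points(sample):
    print(f"{theoretical:.4f}  {empirical:.4f}")
```

Points lying close to the diagonal indicate that the exponential with the matched mean describes the sample well; systematic curvature indicates a different shape.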
The accuracy of the deterministic approximation for the expected tract length under adaptive introgression compared to estimates from simulations. For every set of parameters (introgression fraction, selection coefficient, and time of introgression), we performed replicate simulations. The haploid effective population size was 10,000 chromosomes, with 100 chromosomes sampled from each population. The relative error was calculated by comparing the simulated expected tract length to the prediction given by the deterministic model.
| Proportion | Selection | Time | Expected tract length (simulations) | Expected tract length (deterministic approximation) | Relative error |
|---|---|---|---|---|---|
| 0.01 | 0.001 | 50 | 0.0417436 | 0.0401907 | 3.7% |
| 0.01 | 0.001 | 100 | 0.0197123 | 0.0200985 | 2.0% |
| 0.01 | 0.001 | 500 | 0.00407317 | 0.00402525 | 1.1% |
| 0.01 | 0.001 | 1000 | 0.00206252 | 0.00201644 | 2.2% |
| 0.01 | 0.01 | 50 | 0.0415686 | 0.0402287 | 3.3% |
| 0.01 | 0.01 | 100 | 0.0202129 | 0.02014 | 0.4% |
| 0.01 | 0.01 | 500 | 0.00423004 | 0.00411598 | 2.8% |
| 0.01 | 0.01 | 1000 | 0.00237283 | 0.00227902 | 4.0% |
| 0.05 | 0.001 | 50 | 0.0427474 | 0.0419025 | 2.0% |
| 0.05 | 0.001 | 100 | 0.0210069 | 0.0209646 | 0.2% |
| 0.05 | 0.001 | 500 | 0.00428813 | 0.00421551 | 1.7% |
| 0.05 | 0.001 | 1000 | 0.00217073 | 0.00212356 | 2.2% |
| 0.05 | 0.01 | 50 | 0.0428456 | 0.0420981 | 1.7% |
| 0.05 | 0.01 | 100 | 0.021438 | 0.0211766 | 1.2% |
| 0.05 | 0.01 | 500 | 0.00470731 | 0.00462923 | 1.7% |
| 0.05 | 0.01 | 1000 | 0.00290033 | 0.00286404 | 1.3% |
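The relative-error column can be checked with a few lines of Python. The sketch below assumes the error is the absolute difference divided by the simulated value; the source does not state which value serves as the denominator, and the helper name `relative_error` is hypothetical. The inputs are the first two rows of the table above.

```python
def relative_error(simulated, predicted):
    """Absolute difference as a fraction of the simulated value (assumed convention)."""
    return abs(simulated - predicted) / simulated


# (time, simulated, deterministic) for proportion 0.01, selection 0.001
rows = [
    (50, 0.0417436, 0.0401907),
    (100, 0.0197123, 0.0200985),
]

for time, simulated, deterministic in rows:
    error = relative_error(simulated, deterministic)
    print(f"t={time}: {100 * error:.1f}%")
# prints:
# t=50: 3.7%
# t=100: 2.0%
```

Other rows may differ from the tabulated percentages in the last digit, presumably because the published errors were computed from unrounded values.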
The accuracy of the deterministic approximation for the expected tract length under adaptive introgression, compared to estimates from simulations, when the trajectory is estimated numerically. For scenarios with relatively small admixture fractions (Proportion = 0.0006), the logistic function does not accurately describe the allele frequency trajectory, so we numerically estimated the mean trajectory using stochastic simulations. The relative error is for the deterministic approximation with numerically estimated trajectories relative to the simulation estimates.
| Proportion | Selection | Time | Expected tract length (simulations) | Deterministic approximation (numerical) | Deterministic approximation (logistic) | Relative error |
|---|---|---|---|---|---|---|
| 0.0006 | 0.01 | 1500 | 0.00176 | 0.00173 | 0.00142 | 1.7% |
| 0.0006 | 0.01 | 1750 | 0.00158 | 0.00160 | 0.00129 | 1.3% |
| 0.0006 | 0.01 | 2000 | 0.001503 | 0.001500 | 0.00120 | 0.2% |
| 0.0006 | 0.01 | 2250 | 0.00141 | 0.00142 | 0.00114 | 0.7% |
| 0.0006 | 0.02 | 1500 | 0.00234 | 0.00230 | 0.00198 | 1.7% |
| 0.0006 | 0.02 | 1750 | 0.00222 | 0.00215 | 0.00186 | 3.2% |
| 0.0006 | 0.02 | 2000 | 0.00210 | 0.00203 | 0.00175 | 3.3% |
| 0.0006 | 0.02 | 2250 | 0.00205 | 0.00193 | 0.00167 | 5.9% |
| 0.025 | 0.001 | 1500 | 0.001415 | 0.001436 | 0.00138 | 1.5% |
| 0.025 | 0.001 | 1750 | 0.001227 | 0.001244 | 0.001189 | 1.4% |
| 0.025 | 0.001 | 2000 | 0.001136 | 0.001101 | 0.001043 | 3.1% |
| 0.025 | 0.001 | 2250 | 0.001020 | 0.000990 | 0.000930 | 2.9% |
| 0.025 | 0.001 | 2500 | 0.000883 | 0.000902 | 0.000839 | 2.2% |
| 0.025 | 0.005 | 1500 | 0.00164 | 0.00158 | 0.00155 | 4.9% |
| 0.025 | 0.005 | 1750 | 0.00146 | 0.00142 | 0.00139 | 2.7% |
| 0.025 | 0.005 | 2000 | 0.00133 | 0.00130 | 0.00128 | 2.3% |
| 0.025 | 0.005 | 2250 | 0.00126 | 0.00121 | 0.00120 | 4.0% |
| 0.025 | 0.005 | 2500 | 0.00118 | 0.00114 | 0.00114 | 3.4% |
Deterministic prediction for the expected tract length in a neutral model compared to the theoretical expectation under the SMC’ model.
| Proportion | Time | Expected tract length (deterministic approximation) | Expected tract length (theoretical) | Relative error |
|---|---|---|---|---|
| 0.01 | 10 | 0.20194 | 0.200923 | 0.5% |
| 0.01 | 100 | 0.0201956 | 0.0200946 | 0.5% |
| 0.01 | 1000 | 0.00202122 | 0.00201177 | 0.5% |
| 0.05 | 10 | 0.210491 | 0.209387 | 0.5% |
| 0.05 | 100 | 0.021054 | 0.0209442 | 0.5% |
| 0.05 | 1000 | 0.00211024 | 0.0020999 | 0.5% |
| 0.1 | 10 | 0.222252 | 0.222278 | 0.01% |
| 0.1 | 100 | 0.022233 | 0.0222778 | 0.2% |
| 0.1 | 1000 | 0.00223105 | 0.00227824 | 2.1% |
Simulated across-population and within-population variance, and the across-population variance estimated from the deterministic model. The percentages show the relative error of the deterministic approximation compared to the simulations, for both the across-population and within-population values. An admixture proportion of 0.0006 (0.06%) was chosen to approximate Denisovan introgression into Tibetans, and an admixture proportion of 0.025 (2.5%) was chosen to approximate Neanderthal introgression into non-Africans.
| Proportion | Selection | Time | Deterministic approximation (SD of tract length) | Simulated, across populations (SD) | Relative error | Simulated, within population, all replicates (SD) | Relative error |
|---|---|---|---|---|---|---|---|
| 0.01 | 0.001 | 50 | 0.028513 | 0.029397 | 3.01% | 0.027405 | 4.04% |
| 0.01 | 0.001 | 100 | 0.014260 | 0.013484 | 5.75% | 0.013376 | 6.61% |
| 0.01 | 0.001 | 500 | 0.002858 | 0.002795 | 2.25% | 0.002334 | 22.45% |
| 0.01 | 0.001 | 1000 | 0.001433 | 0.001454 | 1.44% | 0.001131 | 26.7% |
| 0.01 | 0.01 | 50 | 0.028534 | 0.028719 | 0.64% | 0.027571 | 3.49% |
| 0.01 | 0.01 | 100 | 0.014283 | 0.013762 | 3.79% | 0.013481 | 5.95% |
| 0.01 | 0.01 | 500 | 0.002904 | 0.003050 | 4.79% | 0.002621 | 10.8% |
| 0.01 | 0.01 | 1000 | 0.001548 | 0.001548 | 0.0% | 0.001438 | 7.65% |
| 0.05 | 0.001 | 50 | 0.029728 | 0.030492 | 2.51% | 0.029590 | 0.47% |
| 0.05 | 0.001 | 100 | 0.014875 | 0.014932 | 0.38% | 0.014642 | 1.59% |
| 0.05 | 0.001 | 500 | 0.002993 | 0.003035 | 1.38% | 0.002757 | 8.56% |
| 0.05 | 0.001 | 1000 | 0.001509 | 0.001554 | 2.9% | 0.001336 | 12.95% |
| 0.05 | 0.01 | 50 | 0.029834 | 0.030620 | 2.57% | 0.029731 | 0.35% |
| 0.05 | 0.01 | 100 | 0.014989 | 0.014980 | 0.06% | 0.014830 | 1.07% |
| 0.05 | 0.01 | 500 | 0.003196 | 0.003246 | 1.54% | 0.003120 | 2.44% |
| 0.05 | 0.01 | 1000 | 0.001835 | 0.001888 | 2.81% | 0.001785 | 2.8% |
| 0.0006 | 0.01 | 1500 | 0.001125 | 0.001173 | 4.09% | 0.000962 | 16.94% |
| 0.0006 | 0.01 | 1750 | 0.001014 | 0.000998 | 1.6% | 0.000889 | 14.06% |
| 0.0006 | 0.01 | 2000 | 0.000930 | 0.000954 | 2.52% | 0.000811 | 14.67% |
| 0.0006 | 0.01 | 2250 | 0.000864 | 0.000884 | 2.26% | 0.000751 | 15.05% |
| 0.0006 | 0.02 | 1500 | 0.001350 | 0.001423 | 5.13% | 0.001241 | 8.78% |
| 0.0006 | 0.02 | 1750 | 0.001230 | 0.001312 | 6.25% | 0.001132 | 8.66% |
| 0.0006 | 0.02 | 2000 | 0.001136 | 0.001221 | 6.96% | 0.001050 | 8.19% |
| 0.0006 | 0.02 | 2250 | 0.001061 | 0.001179 | 10.01% | 0.000974 | 8.93% |
| 0.025 | 0.001 | 1500 | 0.001015 | 0.001052 | 3.52% | 0.000806 | 25.93% |
| 0.025 | 0.001 | 1750 | 0.000878 | 0.000840 | 4.52% | 0.000677 | 29.69% |
| 0.025 | 0.001 | 2000 | 0.000776 | 0.000835 | 7.07% | 0.000589 | 31.75% |
| 0.025 | 0.001 | 2250 | 0.000696 | 0.000691 | 0.72% | 0.000535 | 30.09% |
| 0.025 | 0.001 | 2500 | 0.000633 | 0.000608 | 4.11% | 0.000477 | 32.7% |
| 0.025 | 0.005 | 1500 | 0.001082 | 0.001136 | 4.75% | 0.000991 | 9.18% |
| 0.025 | 0.005 | 1750 | 0.000956 | 0.000970 | 1.44% | 0.000875 | 9.26% |
| 0.025 | 0.005 | 2000 | 0.000863 | 0.000887 | 2.71% | 0.000784 | 10.08% |
| 0.025 | 0.005 | 2250 | 0.000793 | 0.000836 | 5.14% | 0.000726 | 9.23% |
| 0.025 | 0.005 | 2500 | 0.000737 | 0.000779 | 5.39% | 0.000672 | 9.67% |
Figure 2. Dependence of the expected tract length on the proportion of introgression. Different panels correspond to different times of introgression (10, 100 and 1000 generations respectively). Different colors correspond to different selection coefficient values (0.01, 0.05, 0.1). Note that as the time since introgression changes by an order of magnitude, the tract lengths also change by an order of magnitude, so the y-axes are shown on different scales.
Figure 3. Dependence of the expected tract length on the strength of selection for different times of introgression. Different panels correspond to different times of introgression (10, 100 and 1000 generations respectively). Different colors correspond to different introgression proportion values (0.01, 0.05, 0.1).
Figure 4. Dependence of expected tract length on the strength of selection conditioned on the allele frequency at the time of sampling. Different panels correspond to different times of introgression (10, 100 and 1000 generations respectively). Different colors correspond to different allele-frequency values at the time of sampling (0.1, 0.2, 0.5, 0.9).
The effect of assumptions regarding the strength of selection on estimates of the time of the Denisovan introgression into Tibetans for EPAS1. We assume an initial introgression fraction of 0.06% and a present-day frequency of the EPAS1 allele of 85% in Tibetans. Under the deterministic approximation, the value of the selection coefficient then determines the time of introgression needed for the allele to reach the observed allele frequency at the time of sampling (the present). We calculate the expected length of introgressed Denisovan tracts overlapping the EPAS1 allele for each such scenario. The column “expected tract length (no selection)” shows the expected tract length for the given introgression proportion and time since introgression under the hypothesis of no selection, using formula 3. The last column shows the relative difference between the expected tract length estimated while taking selection into account and while ignoring it.
| Selection | Time (in generations) | Expected tract length | Expected tract length (no selection) | Relative error |
|---|---|---|---|---|
| 0.005 | 2672 | 0.00101 | 0.00080 | 20.8% |
| 0.006 | 2282 | 0.00117 | 0.00093 | 20.5% |
| 0.007 | 2030 | 0.00130 | 0.00104 | 20.0% |
| 0.008 | 1791 | 0.00146 | 0.00117 | 19.9% |
| 0.009 | 1619 | 0.00160 | 0.00129 | 19.4% |
| 0.01 | 1467 | 0.00175 | 0.00141 | 19.4% |
| 0.011 | 1350 | 0.00190 | 0.00153 | 19.5% |
| 0.012 | 1264 | 0.00201 | 0.00163 | 18.9% |
| 0.013 | 1170 | 0.00217 | 0.00176 | 18.9% |
| 0.014 | 1100 | 0.00230 | 0.00187 | 18.7% |
| 0.015 | 1041 | 0.00242 | 0.00197 | 18.6% |
| 0.016 | 983 | 0.00256 | 0.00209 | 18.4% |
| 0.017 | 928 | 0.00270 | 0.00221 | 18.1% |
| 0.018 | 883 | 0.00283 | 0.00232 | 18.0% |
| 0.019 | 844 | 0.00296 | 0.00242 | 18.2% |
| 0.02 | 806 | 0.00309 | 0.00253 | 18.1% |
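To illustrate how a selection coefficient pins down the time needed to reach an observed frequency, the sketch below uses the textbook logistic solution of dx/dt = s·x·(1 − x). This parameterization (continuous time, coefficient s per generation) and the function names `logistic_frequency` and `time_to_frequency` are assumptions for illustration, not the authors' implementation. The source notes that for an admixture fraction of 0.0006 the logistic function does not describe the trajectory accurately, so this closed form should not be expected to reproduce the times in the table above.

```python
import math


def logistic_frequency(x0, s, t):
    """Allele frequency after t generations under dx/dt = s*x*(1-x), starting at x0."""
    return x0 / (x0 + (1.0 - x0) * math.exp(-s * t))


def time_to_frequency(x0, s, x_target):
    """Generations for the logistic trajectory to rise from x0 to x_target."""
    odds_ratio = (x_target * (1.0 - x0)) / (x0 * (1.0 - x_target))
    return math.log(odds_ratio) / s


# Values quoted in the caption: initial fraction 0.06%, present frequency 85%.
x0, x_now = 0.0006, 0.85
for s in (0.005, 0.01, 0.02):
    t = time_to_frequency(x0, s, x_now)
    # Round trip: plugging t back in should recover the target frequency.
    print(f"s={s}: t={t:.0f} generations, check x(t)={logistic_frequency(x0, s, t):.2f}")
```

In this simplified model the time scales exactly as 1/s, which makes the qualitative point of the table: stronger assumed selection implies a more recent introgression and therefore longer expected tracts.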