Christian D Huber, Magnus Nordborg, Joachim Hermisson, Ines Hellmann.
Abstract
Detecting positive selection in species with heterogeneous habitats and complex demography is notoriously difficult and prone to statistical biases. The model plant Arabidopsis thaliana exemplifies this problem: in spite of the large amounts of data, little evidence for classic selective sweeps has been found. Moreover, many aspects of the demography are unclear, which makes it hard to judge whether the few signals are indeed signs of selection or false positives caused by demographic events. Here, we focus on Swedish A. thaliana and find that the demography can be approximated as a two-population model. Careful analysis of the data shows that such a two-island model is characterized by a very old split time that significantly predates the last glacial maximum, followed by secondary contact with strong migration. We evaluate selection based on this demography and find that this secondary contact model strongly affects the power to detect sweeps. Moreover, it affects the power differently for northern Sweden (more false positives) than for southern Sweden (more false negatives). However, even when the demographic history is accounted for, sweep signals in northern Sweden are stronger than in southern Sweden, with little or no positional overlap. Further simulations including the complex demography and selection confirm that this is not compatible with global selection acting on both populations, and thus can be taken as evidence for local selection within subpopulations of Swedish A. thaliana. This study demonstrates the necessity of combining demographic analyses and sweep scans for the detection of selection, particularly when selection acts predominantly locally.
Keywords: Arabidopsis thaliana; demography; local adaptation; selective sweeps
Year: 2014 PMID: 25158800 PMCID: PMC4209139 DOI: 10.1093/molbev/msu247
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Figure: Geographical position of plants on a map of Sweden and PCA plot of the first two principal coordinates. A subcluster of 25 closely related plants in southern Sweden is encircled by an ellipse.
Parameter Estimates for Different Models.
| Model Name | No. | AIC | Growth | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| splitSymMig4 | 4 | 4,085 | 29,000 | 568,000 | 0.41 | 5.66 | 1.67 | ||||||||||
| splitMig5 | 5 | 2,896 | 25,000 | 498,000 | 0.19 | 1.21 | 8.00 | 0.74 | |||||||||
| secondaryContact6 | 6 | 2,609 | 124,000 | 153,000 | 39,000 | =0 | =0 | 1.23 | 5.90 | 1.66 | 0.17 | ||||||
| splitMigBottleneck7 | 7 | 3,019 | 47,000 | 433,000 | 63,000 | 0.45 | 2.15 | 3.73 | 198 | 0.41 | |||||||
| splitMigBottleneck8 | 8 | 2,737 | 92,000 | 310,000 | 34,000 | 19,000 | 0.61 | 4.72 | 2.17 | 0.23 | 0.08 | =NN1 | |||||
| splitMigBottleneck9 | 9 | 2,791 | 45,000 | 470,000 | 45,000 | 0.42 | 1.91 | 2.74 | 4.10 | 4.75 | 0.21 | 0.59 | S&N | ||||
| splitMigGrowth7 | 7 | 3,012 | 23,000 | 467,000 | 0.17 | 1.24 | 26.50 | 7.49 | 0.05 | 0.79 | S&N | ||||||
| splitMigRecentGrowth7 | 7 | 4,395 | 51,000 | 358,000 | 14,000 | 0.37 | 2.05 | 3.22 | 200 | 0.41 | S | ||||||
| secondaryContactRecentGrowth7 | 7 | 2,772 | 125,000 | 154,000 | 34,000 | =0 | =0 | 1.40 | 6.19 | 1.89 | 1.20 | 0.17 | S | ||||
| splitMigRecentGrowth8 | 8 | 4,644 | 97,000 | 216,000 | 14,000 | 0.62 | 4.19 | 1.72 | 196 | 0.17 | 0.26 | S&N | |||||
| secondaryContactRecentGrowth8 | 8 | 2,852 | 132,000 | 132,000 | 20,000 | =0 | =0 | 2.44 | 8.39 | 1.66 | 1.03 | 0.32 | 0.09 | S&N | |||
| splitAdmixture5-1 | 5 | 7,080 | 153,000 | 20,000 | 19,000 | =0 | =0 | 0.75 | =0 | 2.88 | 0.16 | ||||||
| splitAdmixture5-2 | 5 | 6,018 | 154,000 | 33,000 | 8,000 | =0 | =0 | 0.90 | =0 | 1.61 | 0.25 | ||||||
| splitAdmixture9 | 9 | 2,768 | 132,000 | 130,000 | 11,470 | 11,410 | 0.07 | 10.69 | 1.25 | 0.10 | |||||||
Note: Parameters are defined as in fig. 2a. Times are given in years, population sizes in units of N0, and migration rates in units of . The number at the end of each model acronym is the number of parameters; for example, secondaryContact6 is a six-parameter model.
a. Southern Sweden appears at time T2 as an admixture between northern Sweden and an additional ancestral (unsampled) population.
b. Values are admixture proportions instead of migration rates.
c. The admixture event is modeled as a pulse of migrating northern Swedish plants to southern Sweden at T2.
d. Model with three demes; for details see supplementary figure S10, Supplementary Material online.
e. Time of the admixture of middle European plants into southern Sweden.
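As a reading aid for the AIC column, the following minimal sketch ranks a subset of the models by AIC and reports each model's difference from the best one. The AIC values are copied from the table above; the dictionary layout and the function name `rank_by_aic` are illustrative assumptions, not part of the study's code.

```python
# AIC values copied from the table above (lower AIC indicates a better
# trade-off between fit and number of parameters).
aic = {
    "splitSymMig4": 4085,
    "splitMig5": 2896,
    "secondaryContact6": 2609,
    "splitMigBottleneck7": 3019,
    "splitMigBottleneck8": 2737,
    "splitMigBottleneck9": 2791,
    "splitAdmixture9": 2768,
}


def rank_by_aic(aic_values):
    """Return (model, AIC, delta-AIC) tuples sorted from best to worst."""
    best = min(aic_values.values())
    ranked = sorted(aic_values.items(), key=lambda item: item[1])
    return [(name, value, value - best) for name, value in ranked]


ranking = rank_by_aic(aic)
for name, value, delta in ranking:
    print(f"{name:22s} AIC={value:5d}  dAIC={delta:5d}")
```

In this subset, secondaryContact6 has the lowest AIC, which matches its designation as the best-fitting model in the figure caption for the model diagrams.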
Figure: Model diagrams. (a) General model setup. (b) Model scheme and parameter estimates of backward migration rates, times (in units of ) and population sizes (in units of N0) for the best-fitting model (secondaryContact6).
P Values for Model Comparison Based on Parametric Bootstrapping.
| Null Model | Alternative: splitMig5 | Alternative: secondaryContact6 | Alternative: splitMigBottleneck8 |
|---|---|---|---|
| splitMig5 | — | <0.01 | 0.05 |
| secondaryContact6 | 0.96 | — | 0.87 |
| splitMigBottleneck8 | 0.75 | <0.01 | — |
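The P values in this table come from parametric bootstrapping. The general idea can be sketched as follows: data sets are simulated under the null model, a fit-improvement statistic (for example, the gain in log-likelihood of the alternative over the null model) is computed for each, and the P value is the fraction of simulated statistics at least as large as the observed one. The exact statistic and number of replicates used in the study are not given here, so the function name, the plain-fraction estimator, and the toy numbers below are all assumptions.

```python
import random


def bootstrap_p_value(observed_stat, null_stats):
    """Fraction of null-simulated statistics that are >= the observed one."""
    if not null_stats:
        raise ValueError("need at least one bootstrap replicate")
    exceed = sum(1 for stat in null_stats if stat >= observed_stat)
    return exceed / len(null_stats)


# Toy illustration with made-up numbers (NOT data from the study):
# 100 statistics drawn as if simulated under a null model.
rng = random.Random(1)
toy_null_stats = [rng.gauss(10.0, 3.0) for _ in range(100)]
print(bootstrap_p_value(25.0, toy_null_stats))
```

With 100 replicates, the smallest nonzero P value such an estimator can return is 0.01, so an observed statistic exceeding every replicate would be reported as P < 0.01.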
Figure: Population recombination rates for the data and the model. ρ is estimated in 1-Mb windows by fitting a curve to the decay of r² with distance, following Hill and Weir (1988). On average, the effective recombination rate in the data is about 8-fold lower in northern than in southern Sweden. Simulations from the secondaryContact6 model predict a similar ratio.
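The curve fitting mentioned in this caption can be illustrated with a minimal sketch. It uses the commonly cited Hill and Weir (1988) expectation of r² for a sample of n chromosomes, E[r²] = [(10 + C) / ((2 + C)(11 + C))] × [1 + (3 + C)(12 + 12C + C²) / (n(2 + C)(11 + C))], where C = ρ × distance. The least-squares criterion, the golden-section search on log10(ρ), the search bounds, and all function names are assumptions made for illustration; the fitting routine used in the study is not described here.

```python
import math


def expected_r2(rho, dist_bp, n):
    """Hill and Weir (1988) expectation of r^2 at distance dist_bp (in bp)."""
    c = rho * dist_bp
    base = (10 + c) / ((2 + c) * (11 + c))
    correction = 1 + (3 + c) * (12 + 12 * c + c * c) / (n * (2 + c) * (11 + c))
    return base * correction


def fit_rho(distances, r2_values, n, lo=1e-8, hi=1e-1, iters=100):
    """Least-squares estimate of rho (per bp) via golden-section search.

    The search runs on log10(rho) between the assumed bounds lo and hi.
    """
    def sse(log_rho):
        rho = 10 ** log_rho
        return sum((expected_r2(rho, dist, n) - obs) ** 2
                   for dist, obs in zip(distances, r2_values))

    a, b = math.log10(lo), math.log10(hi)
    phi = (math.sqrt(5) - 1) / 2
    x1, x2 = b - phi * (b - a), a + phi * (b - a)
    for _ in range(iters):
        if sse(x1) < sse(x2):
            # Minimum lies in [a, x2]; shrink from the right.
            b, x2 = x2, x1
            x1 = b - phi * (b - a)
        else:
            # Minimum lies in [x1, b]; shrink from the left.
            a, x1 = x1, x2
            x2 = a + phi * (b - a)
    return 10 ** ((a + b) / 2)
```

Applied to each 1-Mb window separately, a routine of this kind yields one ρ estimate per window, and the north-to-south comparison in the caption is then a ratio of such estimates.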
Figure: Point estimates and confidence intervals of robust parameters. (a) Split time between northern and southern Sweden in generations (with mutation rate μ = 7 × 10⁻⁹ per bp per generation; Ossowski et al. 2010), (b) ratio of current population sizes (South over North), and (c) ratio of migration rates, South to North over North to South (forward in time), for fitted models with AIC < 4,000. Where no confidence intervals are shown, they could not be calculated.
Figure: Distribution of maximum CLR values in 1-Mb windows. Simulations based on the standard neutral model (constant population size) for southern Sweden (a) and northern Sweden (b). Simulations based on the secondaryContact6 model for southern Sweden (c) and northern Sweden (d). The dashed line indicates the 99% statistical cutoff (223, 303, 110, and 1,667 in a, b, c, and d, respectively).
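The 99% cutoffs in this caption are empirical quantiles of the maximum CLR across neutral simulations. A minimal sketch of that step is shown below, together with the fraction of values exceeding a cutoff, which gives a false-positive rate when applied to neutral simulations and a power estimate when applied to simulations with selection. The quantile convention (smallest value with at least the requested fraction of simulations at or below it) and the function names are assumptions, not the study's implementation.

```python
import math


def empirical_cutoff(null_values, quantile=0.99):
    """Smallest simulated value with at least `quantile` of values <= it."""
    if not null_values:
        raise ValueError("need at least one simulated value")
    ordered = sorted(null_values)
    rank = max(1, math.ceil(quantile * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


def exceedance_rate(values, cutoff):
    """Fraction of values strictly above the cutoff."""
    return sum(1 for value in values if value > cutoff) / len(values)


# Toy illustration with made-up maximum-CLR values (NOT from the study).
toy_neutral = list(range(1, 101))
toy_cutoff = empirical_cutoff(toy_neutral)
print(toy_cutoff, exceedance_rate(toy_neutral, toy_cutoff))
```

Because the cutoff depends on the demographic model used for the neutral simulations, the same observed CLR value can be significant under one model and not under another, which is why the caption lists four different thresholds.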
Comparison of Statistics Derived from the Data and Three Models.
| Statistics | Data | secondaryContact6 | splitMig5 | splitMigBottleneck |
|---|---|---|---|---|
| | 8.3 | 7.9 | 9.2 | 8.5 |
| | 1.65 | 1.68 | 1.76 | 1.68 |
| | 0.17 | 0.17 | 0.16 | 0.16 |
| Tajima’s | −0.52 | −0.48 | −0.41 | −0.40 |
| Tajima’s | 0.10 | 0.26 | 0.29 | 0.20 |
| CLR | 174 | 148 | 121 | 172 |
| CLR | 49 | 23 | 16 | 19 |
a. For synonymous sites.
b. Median of 1-Mb windows.
c. From jSFS.
d. Median of maximal CLR scores per 1-Mb window.
Characterization of the Significant Sweep Regions.
| Chr | Position ×10³ | CLR | | | Population | Significant Enrichment of Gene Environment Correlations ( | Mean Experimental Recombination Rate |
|---|---|---|---|---|---|---|---|
| 1 | 11,417 | 133 | 0.22 | 47.22e-06 | South | — | 0.029 |
| 1 | 12,855 | 194 | 0.31 | 33.31e-06 | South | — | 0.037 |
| 1 | 19,020 | 1,845 | 0.45 | 2.02e-06 | North | — | 0.016 |
| 1 | 20,009 | 6,217 (N), 127 (S) | 0.68 | 0.62e-06 (N), 33.08e-06 (S) | North and South | — | 0.014 |
| 1 | 24,521 | 141 | 0.16 | 40.81e-06 | South | — | 0.027 |
| 2 | 13,549 | 135 | 0.14 | 23.22e-06 | South | Consecutive cold days, relative humidity | 0.024 |
| 3 | 14,961 | 123 | 0.15 | 31.76e-06 | South | — | 0.009 |
| 4 | 5,552 | 228 | 0.11 | 36.92e-06 | South | — | 0.041 |
| 4 | 6,637 | 1,748 | 0.25 | 3.21e-06 | North | — | 0.045 |
| 4 | 9,374 | 120 | 0.16 | 34.85e-06 | South | — | 0.026 |
| 5 | 2,228 | 1,658 | 0.55 | 1.62e-06 | North | Aridity, length of growing season, max. precipitation, min. precipitation, relative humidity | 0.017 |
| 5 | 5,780 | 122 | 0.32 | 60.59e-06 | South | — | 0.014 |
| 5 | 6,748 | 2,118 | 0.54 | 1.69e-06 | North | Consecutive cold days, daylength | 0.025 |
| 5 | 19,815 | 135 | 0.68 | 51.30e-06 | South | — | 0.015 |
| 5 | 26,166 | 1,829 | 0.37 | 1.91e-06 | North | Consecutive cold days, daylength, max. temperature, min. temperature, PAR | 0.021 |
a. Positions of the putative sweep regions are rounded to kb.
b. Largest value of 100-kb windows within 1 Mb around the CLR peak.
c. Contains a transposition which is collapsed to a single bp for calculation of FST and CLR.
Power and Fixation Probability for Selective Sweeps Assuming the secondaryContact6 Demographic Model.
| s | Start Time of Sweep | Population | Origin of Selected Mutation | Mean CLR North | Power North | Mean CLR South | Power South | Fixation Probability |
|---|---|---|---|---|---|---|---|---|
| 0.0025 | 0.12 | Global | South | 250 | 0 | 210 | 0.54 | 0.004 |
| 0.0025 | 0.12 | Global | North | 460 | 0.01 | 270 | 0.7 | 0.004 |
| 0.0025 | 0.12 | Local south | South | 260 | 0 | 250 | 0.65 | 0.004 |
| 0.0025 | 0.12 | Local south | North | 360 | 0.04 | 150 | 0.34 | 0 |
| 0.0025 | 0.12 | Local north | South | 300 | 0.01 | 30 | 0 | 0.001 |
| 0.0025 | 0.12 | Local north | North | 640 | 0.08 | 25 | 0 | 0.007 |
| 0.01 | 0.05 | Global | South | 400 | 0.02 | 1,100 | 0.99 | 0.021 |
| 0.01 | 0.05 | Global | North | 630 | 0.05 | 900 | 1 | 0.019 |
| 0.01 | 0.05 | Local south | South | 260 | 0 | 1,030 | 0.98 | 0.02 |
| 0.01 | 0.05 | Local south | North | 290 | 0 | 930 | 0.96 | 0.001 |
| 0.01 | 0.05 | Local north | South | 510 | 0.08 | 30 | 0.02 | 0.003 |
| 0.01 | 0.05 | Local north | North | 880 | 0.19 | 30 | 0.02 | 0.025 |
| S:0.0025 N:0.01 | 0.05 | Global | South | 493 | 0.05 | 51 | 0.07 | 0.002 |
| S:0.0025 N:0.01 | 0.05 | Global | North | 746 | 0.09 | 50 | 0.1 | 0.022 |
| S:0.0025 N:0.01 | 0.12 | Global | South | 335 | 0 | 242 | 0.68 | 0.013 |
| S:0.0025 N:0.01 | 0.12 | Global | North | 531 | 0.05 | 269 | 0.69 | 0.014 |
Figure: CLR and FST behavior under global and local selection. (a) and (b) are results of a strong sweep (s = 0.01) starting in northern Sweden, (b) and (d) are results of a weak sweep (s = 0.0025) starting in southern Sweden. Green points indicate results under global selection; cyan points indicate results under local selection. The blue and red circles show the CLR and FST values of the significant sweep regions in northern and southern Sweden, respectively. A single sweep region (violet circle) is significant in both northern and southern Sweden. The colored lines are 95% confidence contours. Upper 99% confidence cutoffs for FST and CLR are calculated from 2,400 neutral simulations of 1-Mb sequences and are indicated by the dashed lines. The dotted line indicates the lower 99% cutoff of FST. The msms code for all simulations can be found in supplementary table S5, Supplementary Material online.