Abstract
The detection of genomic regions involved in local adaptation is an important topic in current population genetics. Several detection strategies are available, depending on the kind of genetic and demographic information at hand. A common drawback is a high risk of false positives. In this study we introduce two complementary methods for detecting divergent selection in populations connected by migration. Both methods were developed with the aim of being robust to false positives. The first method combines haplotype information with inter-population differentiation (FST). Divergent selection is inferred only when both the haplotype pattern and the FST value support it. The second method is designed for independently segregating markers, i.e., when no haplotype information is available. In this case, power to detect selection comes from a new outlier test based on detecting a bimodal distribution: the test first identifies the FST outliers and then assumes that the outliers of interest form a distinct mode. We demonstrate the utility of the two methods through simulations and the analysis of real data. In several of the simulated scenarios, power ranged from 60% to 95% while the false positive rate was kept below the nominal level. The real samples consisted of phased data from the HapMap project and unphased data from intertidal marine snail ecotypes. The results illustrate that the proposed methods could be useful for detecting locally adapted polymorphisms. The software HacDivSel implements the methods described in this manuscript.
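Both methods build on per-SNP inter-population differentiation (FST). As a minimal sketch, assuming a biallelic SNP and two equally weighted populations, the function below computes a Nei-style (G_ST) FST from allele frequencies. The estimator actually used in HacDivSel may differ (for example, a sample-size-corrected one), so this only illustrates the quantity and is not the authors' implementation.

```python
def fst_nei(p1: float, p2: float) -> float:
    """Nei-style (G_ST) F_ST for one biallelic SNP in two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    Returns 0.0 for a monomorphic site, where total heterozygosity is zero.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity of the pooled (total) population.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within the two populations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_t == 0.0:
        return 0.0
    return (h_t - h_s) / h_t
```

For example, `fst_nei(0.9, 0.1)` gives H_T = 0.5 and H_S = 0.18, so FST is about 0.64, while identical frequencies give 0.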
Year: 2017 PMID: 28423003 PMCID: PMC5397020 DOI: 10.1371/journal.pone.0175944
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Positive predictive values for the nvdFST, SvdM, EOS and BayeScan methods (BayeScan with Bayes factor 3, BSBF3, and Bayes factor 100, BSBF100).
There are 6 points (2 mutation rates × 3 recombination rates) in the curves corresponding to the haplotype-based method (left panel) and 2 points (high recombination and independent markers) in the curves for the outlier-based methods (EOS and BayeScan, right panel).
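The positive predictive value (PPV) is the fraction of significant calls that are true selective sites, PPV = TP / (TP + FP). A minimal sketch follows; the helper `ppv_from_rates` is hypothetical and assumes that expected counts can be built from a power, a false positive rate, and numbers of selective and neutral replicates. How the paper itself tallied true and false positives is not stated in this section.

```python
def positive_predictive_value(true_positives: float, false_positives: float) -> float:
    """PPV = TP / (TP + FP): share of significant calls that are real selective sites."""
    called = true_positives + false_positives
    if called == 0:
        raise ValueError("PPV is undefined when no test is significant")
    return true_positives / called


def ppv_from_rates(power: float, fpr: float, n_selective: int, n_neutral: int) -> float:
    """Hypothetical helper: expected PPV from power and false positive rate.

    Assumes n_selective replicates with selection and n_neutral without it.
    """
    expected_tp = power * n_selective  # selective replicates correctly flagged
    expected_fp = fpr * n_neutral      # neutral replicates wrongly flagged
    return positive_predictive_value(expected_tp, expected_fp)
```

With equal numbers of selective and neutral replicates, a power of 87% and an FPR of 2.1% give a PPV of 87 / 89.1, roughly 0.98; because FP scales with the number of neutral tests, PPV drops as neutral tests come to dominate.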
Performance of the combined method (nvdF) with n = 1 selective site located at the center of the chromosome or n = 5 (see S1 Appendix, section A-6).
Selection was α = 4Ns = 600 and migration Nm = 10. Mean localization is given in distance (kb) from the real selective position.
| ∑ | θ | ρ | n | %Power | %FPR (…) | q-value | Localization (kb) |
|---|---|---|---|---|---|---|---|
| 65 | 12 | 0 | 1 | 87 | 2.1 | 0.0058 | ±458 |
| 63 | 12 | 4 | 1 | 94 | 2.7 | 0.0008 | ±200 |
| 60 | 12 | 12 | 1 | 90 | 1.0 | 0.0003 | ±33 |
| 251 | 60 | 0 | 1 | 79 | 1.8 | 0.0048 | ±60 |
| 232 | 60 | 4 | 1 | 84 | 6.2 | 0.0011 | ±17 |
| 249 | 60 | 60 | 1 | 86 | 2.4 | 0.0002 | <±1 |
| 282 | 60 | 60 | 5 | 99 | 2.4 | 0.0002 | <±1 |
| 318 | 60 | ∞ | 1 | 0 | 0 | - | - |
∑: Mean number of shared SNPs per Mb. θ: Mutation rate. ρ: Recombination rate. FPR: false positive rate. q-value: mean estimated q-value for the significant tests. ∞: Independently segregating sites.
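The tables report mean q-values over the significant tests. The q-value estimator used by the authors is not specified here; as an assumption, the sketch below uses Benjamini-Hochberg adjusted p-values, which coincide with Storey q-values when the proportion of true nulls (π0) is fixed at 1. The significance threshold in `mean_q_of_significant` is likewise a hypothetical choice.

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (Storey q-values with pi0 fixed at 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def mean_q_of_significant(qvalues: Sequence[float], threshold: float = 0.05) -> float:
    """Mean q-value over tests called significant at a hypothetical threshold."""
    significant = [q for q in qvalues if q <= threshold]
    return sum(significant) / len(significant) if significant else float("nan")
```

For p-values (0.01, 0.04, 0.03, 0.5) the adjusted values are (0.04, 0.0533, 0.0533, 0.5); only the first passes 0.05, so the mean q-value of the significant tests is 0.04.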
Fig 2. Effect of phasing accuracy (%) on the power of the nvdF test.
Performance of the combined method (nvdF) with a single selective site in the short-term strong (α = 6000) and the long-term weak (α = 140) selection scenarios.
Nm was 10. Mean localization is given in distance (kb) from the real selective position.
| ∑ | θ | ρ | α | t | %Power | %FPR (…) | q-value | Localization (kb) |
|---|---|---|---|---|---|---|---|---|
| 112 | 60 | 0 | 6000 | 500 | 44 | 0 | 0 | ±66 |
| 32 | 60 | 4 | 6000 | 500 | 63 | 0 | 0.0014 | ±5 |
| 62 | 60 | 60 | 6000 | 500 | 67 | 0 | 0.0008 | ±93 |
| 165 | 60 | 0 | 140 | 5,000 | 36 | 1 | 0.007 | ±2 |
| 156 | 60 | 4 | 140 | 5,000 | 30 | 1.5 | 0.003 | ±13 |
| 135 | 60 | 60 | 140 | 5,000 | 17 | 2 | 0.000 | ±1 |
∑: mean number of shared SNPs per Mb. θ: mutation rate. ρ: recombination rate. α: scaled selection intensity (4Ns). t: number of generations. FPR: false positive rate. q-value: mean estimated q-value, computed only over the significant tests.
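Selection and migration are given on scaled scales (α = 4Ns and Nm). Converting them to per-generation coefficients requires the population size N, which this section does not state; the sketch below therefore uses a hypothetical N purely for illustration. The range check also shows why a strong scenario such as α = 6000 implies a large N, since s cannot exceed 1.

```python
def selection_coefficient(alpha: float, n: int) -> float:
    """Per-generation selection coefficient s obtained by inverting alpha = 4*N*s."""
    s = alpha / (4.0 * n)
    if not 0.0 < s <= 1.0:
        raise ValueError(f"alpha={alpha} with N={n} gives s={s}, outside (0, 1]")
    return s


def migration_rate(nm: float, n: int) -> float:
    """Per-generation migration rate m from the scaled rate Nm."""
    return nm / n
```

With a hypothetical N = 10,000: α = 600 gives s = 0.015, α = 6000 gives s = 0.15, α = 140 gives s = 0.0035, and Nm = 10 gives m = 0.001. With N = 1,000, α = 6000 would imply s = 1.5 and is rejected.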
Performance of the extreme outlier test (EOS) with n = 1 selective site located at the center of the chromosome or n = 5.
Selection was α = 600 and Nm = 10. Mean localization is given in distance (kb) from the real selective position.
| ∑ | θ | ρ | n | %Power EOS | %FPR (…) | q’-value | Localization (kb) |
|---|---|---|---|---|---|---|---|
| 65 | 12 | 0 | 1 | 0 | 0 | - | - |
| 63 | 12 | 4 | 1 | 0.2 | 0 | 0.46 | ±3 |
| 60 | 12 | 12 | 1 | 1.1 | 0 | 0.45 | ±77 |
| 251 | 60 | 0 | 1 | 0.7 | 0 | 0.10 | 0 |
| 232 | 60 | 4 | 1 | 1.3 | 0 | 0.20 | ±150 |
| 249 | 60 | 60 | 1 | 58 | 0.4 | 0.5 | <±1 |
| 282 | 60 | 60 | 5 | 1.6 | 0.4 | 0.49 | ±5 |
| 318 | 60 | ∞ | 1 | 61 | 1.2 | 3×10⁻⁶ | ±7 |
∑: mean number of shared SNPs per Mb. θ: mutation rate. ρ: recombination rate. FPR: false positive rate. q’-value: mean corrected (see S1 appendix, section A-4) estimated q-value in the significant tests. ∞: independently segregating sites.
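The EOS idea, as summarized in the abstract, is that FST outliers produced by divergent selection form a mode of their own, separate from the remaining outliers. The actual bimodality test is described in the paper and its S1 Appendix and is not reproduced here. As a hypothetical illustration of the underlying idea only, the sketch below splits outlier FST values into a lower and an upper group by minimizing the within-group sum of squares (a one-dimensional two-means split, swapped in for the paper's test); the upper group plays the role of an extreme outlier set.

```python
from typing import List, Sequence, Tuple


def split_upper_mode(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split values into a lower and an upper group (1-D two-means split).

    The cut point is chosen to minimize the total within-group sum of
    squared deviations; both groups are always non-empty.
    """
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two values to split")

    def sse(group: Sequence[float]) -> float:
        mean = sum(group) / len(group)
        return sum((x - mean) ** 2 for x in group)

    best_cut, best_cost = 1, float("inf")
    for cut in range(1, len(xs)):
        cost = sse(xs[:cut]) + sse(xs[cut:])
        if cost < best_cost:
            best_cut, best_cost = cut, cost
    return xs[:best_cut], xs[best_cut:]
```

This split always returns two groups, even for unimodal data; a real test, like the one implemented in HacDivSel, must also decide whether the second mode is statistically supported.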
Performance of nvdF and EOS with a single selective site located at different positions.
Selection was α = 600 and Nm = 10. Mean localization is given in distance (kb) from the real selective position. FPRs are the same as in Table 1. q-value refers to the mean q-value for the significant nvdFST tests.
| ∑ | θ | ρ | %Power (nvdF, EOS) | Position (kb) | q-value | Localization (kb) (nvdF, EOS) |
|---|---|---|---|---|---|---|
| 259 | 60 | 0 | 81, 1 | 0 | 0.0044 | +483, +457 |
| 255 | 60 | 0 | 81, 1.5 | 10 | 0.0049 | +433, +496 |
| 256 | 60 | 0 | 82, 0.9 | 100 | 0.0041 | +350, +413 |
| 255 | 60 | 0 | 78, 0.6 | 250 | 0.0039 | ±194, ±185 |
| 230 | 60 | 4 | 75, 2.5 | 0 | 0.0014 | +324, +127 |
| 226 | 60 | 4 | 77, 3.5 | 10 | 0.0016 | +326, +142 |
| 233 | 60 | 4 | 80, 1.8 | 100 | 0.0017 | +227, +140 |
| 229 | 60 | 4 | 83, 1.6 | 250 | 0.0009 | ±123, ±20 |
| 262 | 60 | 60 | 63, 93 | 0 | 0.0014 | +122, +40 |
| 261 | 60 | 60 | 68, 91 | 10 | 0.0014 | +113, +34 |
| 257 | 60 | 60 | 81, 84 | 100 | 0.0006 | ±44, ±6 |
| 252 | 60 | 60 | 87, 67 | 250 | 0.0004 | ±10, ±0.06 |
∑: Mean number of shared SNPs per Mb. θ: Mutation rate. ρ: Recombination rate. Position: real position of the selective site.
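Localization is reported as the distance between the inferred and the real selective position, sometimes with a sign (+) and sometimes as ±. The exact definition behind these two notations is not given in this section; as an assumption, the sketch below reports both the mean signed offset and the mean absolute offset across replicates.

```python
from statistics import mean
from typing import Sequence, Tuple


def localization_error(estimated_kb: Sequence[float], true_kb: float) -> Tuple[float, float]:
    """Return (mean signed offset, mean absolute offset), in kb, of the
    estimated selective positions relative to the real position."""
    offsets = [e - true_kb for e in estimated_kb]
    return mean(offsets), mean(abs(o) for o in offsets)
```

For estimates at 10, -30 and 50 kb around a real site at 0 kb, the signed mean is 10 kb and the absolute mean is 30 kb. A consistently positive signed mean, as when the real site lies at one end of the simulated region, would show up as a "+" offset.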
Performance of nvdF in the short term (500 generations) with a single selective site.
Selection was α = 600 and Nm = 50. Mean localization is given in distance (kb) from the real selective position.
| ∑ | θ | ρ | %Power | %FPR (…) | q-value | Localization (kb) |
|---|---|---|---|---|---|---|
| 116 | 60 | 0 | 47 | 0 | 0.0098 | ±152 |
| 180 | 60 | 4 | 57 | 0 | 0.0042 | ±123 |
| 178 | 60 | 60 | 31 | 0.1 | 0.0096 | ±4 |
∑: Mean number of shared SNPs per Mb. θ: Mutation rate. ρ: Recombination rate. FPR: false positive rate. q-value: mean estimated q-value for the significant tests.
Top significant divergent selection regions of the human chromosome 2 based on the nvdF test for the 0.1% highest nvd values.
| Populations | # significant SNPs | Positions (Mb) | FST | Genes |
|---|---|---|---|---|
| ASN-CEU | 67 | | | |
| | | 15.49–15.54 | 0.12–0.17 | NBAS |
| | | 27.95–28.1 | 0.21 | RBKS, MRPL33, BRE |
| | | 63.6–63.98 | 0.15–0.21 | WDPCP, UGP2, ACA59 |
| | | 83.23–83.24 | 0.16 | Intergenic |
| ASN-YRI | 46 | | | |
| | | 27.98–28.09 | 0.25 | MRPL33, BRE |
| | | 56.89 | 0.16 | intergenic |
| | | 135.69–135.72 | 0.23–0.35 | ZRANB3 |
| | | 203.4–203.9 | 0.18–0.29 | ICA1L, WDR12, CARF, CYP20A1 |
| | | 117.28–117.30 | 0.22–0.36 | intergenic |
| | | 210.26 | 0.19 | MAP2 |
| | | 218.99–219.2 | 0.29–0.53 | CTDSP, VIL1, AK302678, NHEJ1-XLF, SLC23A3, piR-39082, ABCB6, AK091345, ATG9A |
| CEU-YRI | 116 | | | |
| | | 15.52–15.53 | 0.29 | NBAS |
| | | 27.9 | 0.25 | RBKS-MRPL33 |
| | | 122.02–122.04 | 0.22–0.23 | CLASP1 |
| | | 135.4–136.9 | 0.29–0.63 | ACMSD, CCNT2-AS1, MAP3K19, RAB3GAP1, ZRANB3, LCT, MCM6, R3HDM1 |
| | | 178.08 | 0.41 | AGP |
| | | 218.8–219.4 | 0.28–0.43 | ARPC2, LINC00608, VIL1, USP37, NHEJ1-XLF, RQCD1, SLC23A3, IL34, ZNF142, BCS1L, GLB1L, STK36, TTLL4, PTPRN, CYP27A1 |
The positions correspond to the minimum and maximum locations for each given region. The FST range corresponds to the minimum and maximum FST for each region.
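The regions in this table are summarized from the SNPs with the 0.1% highest nvd values, with each region reported by its minimum and maximum position and its FST range. The rule used to merge significant SNPs into regions is not stated here; the sketch below assumes a hypothetical maximum gap (`max_gap_mb`) between neighbouring selected SNPs and is meant only to show the bookkeeping, not the authors' procedure.

```python
from typing import List, Sequence, Tuple


def top_fraction_regions(
    positions_mb: Sequence[float],
    scores: Sequence[float],
    fst: Sequence[float],
    top_fraction: float = 0.001,
    max_gap_mb: float = 0.5,  # hypothetical merging distance
) -> List[Tuple[float, float, float, float, int]]:
    """Keep SNPs in the top `top_fraction` of scores, merge neighbours closer
    than `max_gap_mb` into regions, and return for each region
    (min position, max position, min FST, max FST, number of SNPs)."""
    if not scores:
        return []
    n_keep = max(1, round(len(scores) * top_fraction))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_keep]
    top.sort(key=lambda i: positions_mb[i])  # walk the selected SNPs along the chromosome

    regions: List[List[int]] = []
    current = [top[0]]
    for i in top[1:]:
        if positions_mb[i] - positions_mb[current[-1]] <= max_gap_mb:
            current.append(i)
        else:
            regions.append(current)
            current = [i]
    regions.append(current)

    return [
        (
            min(positions_mb[i] for i in r),
            max(positions_mb[i] for i in r),
            min(fst[i] for i in r),
            max(fst[i] for i in r),
            len(r),
        )
        for r in regions
    ]
```

On a genome-wide scan the default `top_fraction=0.001` corresponds to the 0.1% threshold used above; larger fractions are only convenient for small toy inputs.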
Outliers detected after EOS analysis of the individual-island filtered loci from L. saxatilis data.
Numbers in parentheses refer to the results in the original analysis [53].
| Island | Unique | Only with Jutholmen | Only with Ramsö | Only with Saltö | Shared all | Total |
|---|---|---|---|---|---|---|
| Jutholmen | 7 (59) | __ | 2 (13) | 0 (16) | 1 (9) | 10 (97) |
| Ramsö | 24 (86) | 2 (13) | __ | 2 (21) | 1 (9) | 29 (129) |
| Saltö | 6 (134) | 0 (16) | 2 (21) | __ | 1 (9) | 9 (180) |
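With three islands, each row decomposes as Total = Unique + Only with each other island + Shared all (for Jutholmen, 7 + 2 + 0 + 1 = 10; in the original analysis, 59 + 13 + 16 + 9 = 97). A minimal sketch of this bookkeeping with set operations follows; the locus identifiers in the example are hypothetical.

```python
from typing import Dict, Set


def sharing_table(outliers: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """Count, per island, outlier loci that are unique, shared with exactly one
    other island, or shared by all islands, plus the island's total."""
    islands = list(outliers)
    shared_all = set.intersection(*outliers.values())
    table: Dict[str, Dict[str, int]] = {}
    for isl in islands:
        others = [o for o in islands if o != isl]
        union_others = set().union(*(outliers[o] for o in others))
        row = {"unique": len(outliers[isl] - union_others)}
        for o in others:
            # Loci shared with island o but with none of the remaining islands.
            rest = set().union(*(outliers[x] for x in others if x != o))
            row["only_with_" + o] = len((outliers[isl] & outliers[o]) - rest)
        row["shared_all"] = len(shared_all)
        row["total"] = len(outliers[isl])
        table[isl] = row
    return table
```

For example, with Jutholmen = {a, b, c, x}, Ramsö = {b, d, x} and Saltö = {c, e, x}, the Jutholmen row has one unique locus, one shared only with Ramsö, one shared only with Saltö and one shared by all, for a total of 4.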
Summary of EOS analysis for the between-ecotype L. saxatilis data [53].
| Island | Non-outliers | Outliers not in EOS | EOS | FST | FST_EOS | pvalEOS | qvalEOS |
|---|---|---|---|---|---|---|---|
| Jutholmen | 4564 | 112 | 10 | 0.045 | 0.48 | 0.002 | 0.51 |
| Ramsö | 4602 | 82 | 29 | 0.064 | 0.53 | 0.005 | 0.63 |
| Saltö | 4632 | 51 | 9 | 0.060 | 0.60 | 0.002 | 0.76 |
FST: Mean FST for the analyzed loci. FST_EOS: Mean FST for the loci included in the extreme outlier set. pvalEOS: Mean p-values across the loci included in the extreme outlier set. qvalEOS: Mean q-values across the loci included in the extreme outlier set.