Simon Boitard, Christian Schlötterer, Viola Nolte, Ram Vinay Pandey, Andreas Futschik.
Abstract
Due to its cost effectiveness, next-generation sequencing of pools of individuals (Pool-Seq) is becoming a popular strategy for characterizing variation in population samples. Because Pool-Seq provides genome-wide SNP frequency data, it is possible to use them for demographic inference and/or the identification of selective sweeps. Here, we introduce a statistical method that is designed to detect selective sweeps from pooled data by accounting for statistical challenges associated with Pool-Seq, namely sequencing errors and random sampling among chromosomes. This allows for an efficient use of the information: all base calls are included in the analysis, but the higher credibility of regions with higher coverage and base calls with better quality scores is accounted for. Computer simulations show that our method efficiently detects sweeps even at very low coverage (0.5× per chromosome). Indeed, the power of detecting sweeps is similar to what we could expect from sequences of individual chromosomes. Since the inference of selective sweeps is based on the allele frequency spectrum (AFS), we also provide a method to accurately estimate the AFS provided that the quality scores for the sequence reads are reliable. Applying our approach to Pool-Seq data from Drosophila melanogaster, we identify several selective sweep signatures on chromosome X that include some previously well-characterized sweeps like the wapl region.
Year: 2012 PMID: 22411855 PMCID: PMC3424412 DOI: 10.1093/molbev/mss090
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Selective Sweep Detection Power.
| | Sample Size | | | |
|---|---|---|---|---|
| Pooled NGS data | 0.91 | 0.90 | 0.91 | 0.90 |
| Sequence data—all sites | 0.89 | 0.88 | 0.87 | 0.87 |
| Sequence data—segregating sites | 0.57 | 0.60 | 0.64 | 0.62 |
Detection power for pools of n = 25 to n = 200 chromosomes of length L = 100 kb simulated under a constant population size coalescent model with θ = 0.003, ρ = 0.003, and α = 500. NGS data sets were simulated with an expected coverage, λ = 100.
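The two sources of noise named in the abstract, random sampling of reads among chromosomes and sequencing errors, can be illustrated with a minimal single-site sketch. It is not the authors' simulation code. The Poisson coverage model, the symmetric two-allele error model, the single PHRED score shared by all reads, and every function name below are assumptions made for illustration.

```python
import math
import random


def phred_to_error(q):
    """Standard PHRED definition: error probability = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method.

    Adequate for lam around 100; exp(-lam) underflows for very large lam.
    """
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_site(derived_count, n, lam, phred, rng):
    """Simulate base calls at one biallelic site in a pool of n chromosomes.

    derived_count: chromosomes in the pool carrying the derived allele.
    lam: expected coverage (assumed Poisson here).
    Returns (coverage, number of reads showing the derived allele).
    """
    coverage = sample_poisson(lam, rng)
    err = phred_to_error(phred)
    derived_reads = 0
    for _ in range(coverage):
        # Each read is assumed to come from a uniformly chosen chromosome.
        is_derived = rng.random() < derived_count / n
        # Assumed error model: an error flips the observed allele.
        if rng.random() < err:
            is_derived = not is_derived
        derived_reads += is_derived
    return coverage, derived_reads


rng = random.Random(1)
coverage, derived_reads = simulate_site(derived_count=5, n=50, lam=100, phred=30, rng=rng)
print(coverage, derived_reads)
```

With n = 50 and an expected coverage of 100, each chromosome is read about twice on average, so the observed read frequency can differ noticeably from the true pool frequency of 5/50.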
AFS estimation. Pools of n = 25 (a) and n = 50 (b) chromosomes of length L = 100 kb were simulated under a constant population size coalescent model with θ = 0.003 and ρ = 0.003. Solid lines show the AFS extracted from the complete sequence information and averaged over 100 simulated samples (it closely fits the AFS expected from coalescent theory). Diamonds and error bars represent the average estimated AFS and the average absolute deviation respectively using the same 100 samples. The estimates were obtained from pooled NGS data with 100× expected coverage using the EM algorithm.
Selective Sweeps Detected on Chromosome X in Drosophila melanogaster.
| Region | Start | End | Length | CI | Genes within the Window |
|---|---|---|---|---|---|
| 1 | 19 | 460 | 441 | Inf | CG17636, RhoGAP1A, CG17707, SP71, CG3038, CG2995, cin, CG13377, CG13376, ewg, CG3777, CG13375, CG12470, Or1a, CG32816, y, ac, sc, l(1)sc, pcl, ase, Cyp4g1, Exp6, CG13373, CG18275, CG32817, CG18166, CG3176, CG18273, CG3156, CG17896, CG17778, svr, arg, elav, CG4293, Appl |
| 2 | 530 | 669 | 139 | Inf | su(s), CG13367, Roc1a, Suv4-20, skpA, sdk, CG13362, CG13361, CG5254, CG5273, RpL22, fz3 |
| 3 | 1,046 | 1,144 | 98 | Inf | eIF4E-7, CG34320, CG11378, CG11384, CG11379, CG14627, CG14626, CG11380, CG14625, CG11381, CG14624, CG11382, CG11398, CG3638, CG11403, A3-3 |
| 4 | 1,179 | 1,312 | 133 | Inf | CG32812, DAAM, CG18091, fs(1)N, CG11409, CG11412, CG11418, Tsp2A, CG12773, CG11417, png, CG14770, CG3056, SNF1A, CG3719, CG32813, CG11448, futsch |
| 5 | 1,338 | 1,369 | 31 | 6.8 | futsch, Gr2a, CG14785, CG14786, CG14787, l(1)G0431, O-fut2, CG14777, CG32808, CG14778, pck, CG14780, Rab27 |
| 6 | 1,373 | 1,408 | 35 | 33.7 | CG14782, sta, Nmdar2, CG14795, CG32810 |
| 7 | 1,456 | 1,484 | 28 | 5.7 | no gene |
| 8 | 1,658 | 1,693 | 35 | 28.1 | Adar, CG32806 |
| 9 | 1,728 | 1,809 | 81 | 33.8 | CG14801, CG14812, deltaCOP, CG14814, MED18, CG14815, CG14803, CG14816, CG14804, CG14817, CG14805, CG14818, CG14806, trr, mRpL16, arm, CG32803, CG32801, Edem1, mip130, CG17766 |
| 10 | 1,995 | 2,069 | 74 | 33.1 | csw, ph-d, ph-p, CG3835, Pgd, bcn92, wapl, Cyp4d1, CG3630, CG3621, Cyp4d14 |
| 11 | 2,092 | 2,118 | 26 | 19.8 | Mct1, CG18031, msta, Vinc, CG14052 |
| 12 | 3,662 | 3,681 | 19 | 12.7 | Tlk |
| 13 | 5,766 | 5,784 | 18 | 7.9 | CG3033, mof, CG3016, CG16721 |
| 14 | 6,023 | 6,061 | 38 | 30.6 | Ca-alpha1T |
| 15 | 7,028 | 7,054 | 26 | 27.7 | no gene |
| 16 | 7,152 | 7,191 | 39 | 14.4 | CG1958, CG1677, CG2059, unc-119 |
| 17 | 7,336 | 7,419 | 83 | 32.4 | CG11368, CG32719 |
| 18 | 7,821 | 7,848 | 27 | 31.1 | CG10777, CG10778, RpS14a, RpS14b, CG1530, l(1)G0193, CG1531, CG15332 |
| 19 | 10,358 | 10,383 | 25 | 16.3 | CG17255, CG2889, CG2887, PPP4R2r, CG32687 |
| 20 | 11,371 | 11,407 | 36 | 31.3 | Cyp4g15, CG1749, Spase25, CG33235, CG32666 |
| 21 | 11,441 | 11,499 | 58 | 32.4 | CG32666, CG1572, PGRP-SA, RpII215, CG11699, l(1)G0237, CG11697, CG11696, e(y)2, CG11695, nod, CG1561, rho-4, CG2533 |
| 22 | 11,868 | 11,893 | 26 | 15 | cac, gd, tsg, CG18130, fw |
| 23 | 13,098 | 13,123 | 25 | 15 | sno, REG, mew |
| 24 | 14,937 | 14,953 | 16 | 6.6 | hiw, CG5541 |
| 25 | 15,696 | 15,716 | 20 | 14.7 | PGRP-LE, sd, CG8509 |
| 26 | 15,824 | 15,846 | 22 | 16.5 | Ranbp16, Stim, CG8924, CG8928, CG15603, CG15604 |
| 27 | 17,743 | 17,764 | 21 | 17.1 | CG15814, CG6506, CG32554, CG32557, CG6762, Arp8, CG6769, mnb |
| 28 | 17,924 | 17,956 | 32 | 32.2 | Sh, CG15373 |
| 29 | 18,539 | 18,559 | 20 | 13.2 | l(1)G0003, CG6540, CG6617, Ing3, CG6659, fu, CG6696 |
| 30 | 19,455 | 19,479 | 24 | 17.5 | Grip84, car, Tao-1, CG14218, CG14204 |
| 31 | 20,978 | 21,009 | 31 | 19.6 | CG11566, stg1, unc, CG15445, CG34120 |
| 32 | 21,234 | 21,266 | 32 | 15.5 | waw, bbx, slgA, Hlc, mst |
Start, End, and Length are given in kilobases along the X chromosome.
CI (Confidence Index): maximum of −log(1−q) over the window, where q is the posterior probability of the hidden state “Selection.”
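As a worked illustration of the confidence index, the sketch below takes the per-position posterior probabilities q of the “Selection” state within one window and returns the maximum of −log(1−q). The base-10 logarithm and the function name `confidence_index` are assumptions; the definition above does not state the base. Under this reading, an entry of Inf in the table would correspond to a window where q is numerically equal to 1 at some position.

```python
import math


def confidence_index(posteriors):
    """Maximum of -log10(1 - q) over a window of posterior probabilities q.

    The base-10 logarithm is an assumption; the source does not give the base.
    """
    if not posteriors:
        raise ValueError("window must contain at least one position")
    best = 0.0
    for q in posteriors:
        if not 0.0 <= q <= 1.0:
            raise ValueError(f"posterior probability out of range: {q}")
        # q == 1 gives an infinite index, matching the 'Inf' notation.
        value = math.inf if q == 1.0 else -math.log10(1.0 - q)
        best = max(best, value)
    return best


# Hypothetical posteriors for a short window (illustrative values only).
print(confidence_index([0.50, 0.90, 0.99]))
print(confidence_index([0.20, 1.00]))
```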
AFS in Drosophila melanogaster. Estimated from all base calls (a) or only those with PHRED score greater than 35 (b). As we consider the folded AFS, the probabilities for allele frequencies 98/194 to 193/194 (not shown) can be deduced by symmetry from those for allele frequencies 1/194 to 96/194.
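The symmetry mentioned in this caption can be sketched as follows: given the probabilities for the lower allele counts, the value at count n − k is copied from count k. The function name and the dictionary representation are assumptions for illustration, and the example values are invented, not estimates from the paper.

```python
def mirror_folded_afs(low_half, n):
    """Extend a folded AFS to all counts 1..n-1 using p[n - k] = p[k].

    low_half: dict mapping allele count k (1 <= k <= n // 2) to its probability.
    n: number of chromosomes in the pool (194 in the caption above).
    """
    full = {}
    for k, p in low_half.items():
        if not 1 <= k <= n // 2:
            raise ValueError(f"count {k} is outside 1..{n // 2}")
        full[k] = p
        # The midpoint k = n / 2 is its own mirror image.
        full[n - k] = p
    return full


# Invented values for three allele counts in a pool of 194 chromosomes.
full = mirror_folded_afs({1: 0.30, 2: 0.10, 97: 0.01}, n=194)
print(full[193], full[192], full[97])
```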
Selective sweeps detected on the X chromosome of D. melanogaster. We used either all base calls or base calls with PHRED score greater than 35. The x-axis labels give the physical position of each sweep window (in kilobases).