Zachary A Szpiech, Ryan D Hernandez.
Abstract
Haplotype-based scans to detect natural selection are useful to identify recent or ongoing positive selection in genomes. As both real and simulated genomic data sets grow larger, spanning thousands of samples and millions of markers, there is a need for a fast and efficient implementation of these scans for general use. Here, we present selscan, an efficient multithreaded application that implements Extended Haplotype Homozygosity (EHH), Integrated Haplotype Score (iHS), and Cross-population EHH (XPEHH). selscan accepts phased genotypes in multiple formats, including TPED, and performs extremely well on both simulated and real data, running over an order of magnitude faster than existing available implementations. It calculates iHS on chromosome 22 (22,147 loci) across 204 CEU haplotypes in 353 s on one thread (33 s on 16 threads) and calculates XPEHH for the same data relative to 210 YRI haplotypes in 578 s on one thread (52 s on 16 threads). Source code and binaries (Windows, OSX, and Linux) are available at https://github.com/szpiech/selscan.
Year: 2014 PMID: 25015648 PMCID: PMC4166924 DOI: 10.1093/molbev/msu211
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
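For readers unfamiliar with the EHH statistic named in the abstract, the following is a minimal sketch of the standard whole-sample definition: the probability that two haplotypes drawn at random from the sample are identical over the span between a core marker and a second marker. It is not selscan's implementation, and the function name `ehh`, its arguments, and the toy haplotypes are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, end):
    """Whole-sample EHH over the markers between `core` and `end` (inclusive).

    `haplotypes` is a list of equal-length phased sequences (here, strings of
    '0'/'1' alleles). This is an illustrative sketch, not selscan's code.
    """
    lo, hi = sorted((core, end))
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    # Count how many sampled chromosomes carry each distinct haplotype
    # across the span [lo, hi].
    counts = Counter(tuple(h[lo:hi + 1]) for h in haplotypes)
    # Fraction of chromosome pairs that are identical over the span.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy data (hypothetical): four phased haplotypes over four markers.
haps = ["0110", "0110", "0111", "1110"]
ehh_at_core = ehh(haps, core=1, end=1)  # all four agree at marker 1
ehh_right = ehh(haps, core=1, end=3)    # three of four agree over markers 1..3
```

EHH starts at 1 at the core and decays as the span widens. iHS and XPEHH are built from the integral of that decay curve, which is why the cost of a genome-wide scan grows with both the number of haplotypes and the number of loci.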
Runtime Performance (in seconds) of ihs, rehh, and selscan for Calculating Unstandardized iHS for Various Data Sets.
| Data Set | ihs | rehh (a) | selscan, 1 thread | selscan, 2 threads | selscan, 4 threads | selscan, 8 threads | selscan, 16 threads |
|---|---|---|---|---|---|---|---|
| IHS250 | 19,275 | 563 | 618 | 306 | 162 | 84 | 58 |
| IHS500 | 45,547 | 1,652 | 1,554 | 782 | 399 | 220 | 150 |
| IHS1000 | aborted | 4,834 | 4,018 | 2,019 | 1,040 | 566 | 380 |
| IHS2000 | aborted | 12,652 | 7,054 | 3,633 | 1,869 | 1,046 | 752 |
| CEU22 | 19,434 | 588 | 353 | 182 | 93 | 50 | 33 |
Note.—Calculations running over 100,000 s were aborted.
a: rehh integrates over a physical map instead of a genetic map. Using a physical map does not affect selscan’s runtime (data not shown).
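The thread-scaling behavior in the CEU22 row can be summarized as speedup and parallel efficiency relative to the single-thread run. The sketch below uses only the selscan timings reported for CEU22 (353, 182, 93, 50, and 33 s); the helper name `scaling` is hypothetical.

```python
# selscan runtimes in seconds for the CEU22 iHS scan, keyed by thread count
# (values taken from the CEU22 row of the runtime table).
ceu22_seconds = {1: 353, 2: 182, 4: 93, 8: 50, 16: 33}


def scaling(times):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    base = times[1]
    return {
        threads: (base / secs, base / secs / threads)
        for threads, secs in times.items()
    }


for threads, (speedup, efficiency) in scaling(ceu22_seconds).items():
    print(f"{threads:>2} threads: speedup {speedup:5.2f}x, efficiency {efficiency:4.0%}")
```

On these numbers, 16 threads give a speedup of about 10.7x, so scaling is close to linear at low thread counts and tapers off as more threads are added.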
Figure: (A) Unstandardized iHS scores calculated on the CEU22 data set for selscan and ihs (Pearson’s r = 0.9946). (B) Unstandardized XPEHH scores calculated on the CEUYRI22 data set for selscan and xpehh (Pearson’s r = 0.9999).
Runtime Performance (in seconds) of xpehh and selscan for Calculating Unstandardized XPEHH for Various Data Sets.
| Data Set | xpehh | selscan, 1 thread | selscan, 2 threads | selscan, 4 threads | selscan, 8 threads | selscan, 16 threads |
|---|---|---|---|---|---|---|
| XP250 | 11,113 | 287 | 141 | 71 | 38 | 25 |
| XP500 | 57,006 | 766 | 403 | 194 | 104 | 67 |
| XP1000 | aborted | 2,037 | 1,018 | 515 | 274 | 180 |
| XP2000 | aborted | 5,683 | 2,798 | 1,471 | 763 | 493 |
| CEUYRI22 | 37,271 | 578 | 291 | 150 | 78 | 52 |
Note.—Calculations running over 100,000 s were aborted.