Ya'ara Arkin, Elior Rahmani, Marcus E Kleber, Reijo Laaksonen, Winfried März, Eran Halperin.
Abstract
MOTIVATION: Gene-gene interactions are of potential biological and medical interest, as they can shed light on both the inheritance mechanism of a trait and the underlying biological mechanisms. Evidence of epistatic interactions has been reported in both humans and other organisms. Unlike single-locus genome-wide association studies (GWAS), which have proved efficient in detecting numerous genetic loci associated with various traits, interaction-based GWAS have so far produced very few reproducible discoveries. Such studies introduce a great computational and statistical burden, because they require testing a large number of hypotheses, including all pairs of single nucleotide polymorphisms (SNPs). Thus, many software tools have been developed for interaction-based case-control studies, some leading to reliable discoveries. For quantitative data, on the other hand, only a handful of tools exist, and the computational burden is still substantial.
Year: 2014 PMID: 24931983 PMCID: PMC4229902 DOI: 10.1093/bioinformatics/btu261
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The new test statistic: (a) GLR versus τ2. Data were generated with n = 5000, MAF ; marginal and epistatic effects were sampled uniformly from the range (0, 1); r² = 0.99. (b) r² of the linear correlation between GLR and τ2, as a function of n: τ2 is highly correlated with the original test statistic for all tested sample sizes. (c) The τ2 distribution is proportional to the chi-square distribution with 1 degree of freedom; it passed a chi-square goodness-of-fit test with a P-value of 0.396
Fig. 2. (a) An illustration of the ab′ score distribution for interacting pairs (blue) and non-interacting pairs (gray): interacting pairs have a higher probability of passing a threshold t during the filtering stage. (b) Variance of ab′ as a function of MAFs, for an interacting pair with a corrected P-value of 0.05
Runtime of the C++ implementation of EPIQ, compared with other programs available
| Tool | Computational method | Statistical test | Cores | Runtime |
|---|---|---|---|---|
| PLINK | Exhaustive search | OLS | 1 | ∼10 years |
| FastEpistasis | Exhaustive search | OLS | 8 | 381 h |
| EpiGPU | Exhaustive search | | – | 9.3–90 h |
| EpiGPUHSIC | Exhaustive search | HSIC | – | 194 h |
| EPIQ | Random projections | OLS on binary SNPs | 8 | 3.2 h |
EPIQ was run with the success-rate parameter set to 80%; its runtime is therefore compared against testing 80% of the pairs in the exhaustive search algorithms (n = 1000, m = 10⁶).
a. Times were extrapolated according to a test of 1000 SNPs performed on the same 2.5 GHz processor, scaling linearly with the number of SNP pairs.
b. Times were extrapolated according to self-reported performance.
c. Runtime varies with the chosen GPU.
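The linear extrapolation described in the table notes can be illustrated with a short sketch. This is not the authors' benchmarking code: the function names and the measured time of 0.5 s are hypothetical placeholders, and the only assumptions taken from the text are that runtime scales linearly with the number of SNP pairs and that 80% of the pairs are tested.

```python
def n_pairs(m: int) -> int:
    """Number of unordered SNP pairs among m SNPs."""
    return m * (m - 1) // 2


def extrapolate_hours(measured_seconds: float, m_test: int, m_target: int,
                      fraction: float = 1.0) -> float:
    """Scale a measured runtime linearly with the number of SNP pairs tested.

    measured_seconds: time to test all pairs among m_test SNPs (hypothetical input)
    fraction: share of the target pairs that is actually tested (0.8 for 80%)
    """
    seconds_per_pair = measured_seconds / n_pairs(m_test)
    return seconds_per_pair * n_pairs(m_target) * fraction / 3600.0


# Hypothetical measurement: 0.5 s for all pairs among 1000 SNPs.
hours = extrapolate_hours(0.5, m_test=1000, m_target=10**6, fraction=0.8)
print(f"{n_pairs(10**6):,} pairs in total; roughly {hours:,.0f} h at 80% coverage")
```

Because the number of pairs grows quadratically with m, going from 1000 SNPs to 10⁶ SNPs multiplies the pair count by about a million, which is why exhaustive searches end up in the range of hundreds of hours or more.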
Fig. 3. Runtime of EPIQ for different settings: (a) runtime for various numbers of SNP pairs, n = 1000; EPIQ scales linearly with the number of pairs. (b) Runtime of EPIQ for different power thresholds; nearly 100% power can be achieved in a matter of hours (n = 1000, m = 10⁶)
Fig. 4. Evaluating the power of EPIQ: (a–c) The power of the different algorithms (EPIQ, all-pairs search and the two-stage search) to discover the true interacting SNP pair. P and P are and , respectively. Under all settings, the relative power of EPIQ compared with the exhaustive search exceeds the requested success rate of 80%. (d) The power of EPIQ compared with the full linear model used by PLINK: each dot represents a distinct model of interaction from Li and Reich (2000); the x axis is the model number, and the y axis is the average power of EPIQ minus the power of PLINK
Fig. 5. Results on the LURIC dataset. (a and b) Manhattan plots of 100 SNPs up- and down-stream of rs436969 and rs9385393. The epistasis option of PLINK was used to test for interactions in all 40 401 pairs, and the smallest P-value for each SNP was recorded. Note that the P-value for the top-scoring pair is slightly higher than the one calculated by EPIQ, as EPIQ was run on the binary representation of the SNPs. (c) A QQ-plot of the P-value distribution shows negligible inflation. P-values were calculated for a sample of 10 000 SNP pairs
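The figure of 40 401 pairs is consistent with two windows of 201 SNPs each (the focal SNP plus 100 SNPs on either side), since 201 × 201 = 40 401; this reading is an inference from the caption, not something the caption states. A minimal sketch of the bookkeeping, with random placeholder P-values standing in for the actual pairwise interaction tests and all variable names hypothetical:

```python
import random

WINDOW = 100                      # SNPs taken on each side of a focal SNP
window_size = 2 * WINDOW + 1      # focal SNP plus both flanks
n_tests = window_size ** 2        # every pair with one SNP from each window

random.seed(0)
# Placeholder values only: rows index SNPs near the first focal SNP,
# columns index SNPs near the second focal SNP.
pvals = [[random.random() for _ in range(window_size)]
         for _ in range(window_size)]

# Smallest P-value recorded for each SNP, as plotted in a Manhattan plot.
min_p_first = [min(row) for row in pvals]
min_p_second = [min(col) for col in zip(*pvals)]

print(window_size, n_tests)
```

Taking the per-SNP minimum means each point in the Manhattan plot reflects that SNP's best interaction partner in the other window, so the two panels share the same overall minimum.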