Tony Kam-Thong, Benno Pütz, Nazanin Karbalai, Bertram Müller-Myhsok, Karsten Borgwardt.
Abstract
MOTIVATION: In recent years, numerous genome-wide association studies have been conducted to identify the genetic makeup that explains phenotypic differences observed in human populations. Analytical tests on single loci are readily available and embedded in common genome analysis software toolsets. The search for significant epistasis (gene-gene interactions) still poses a computational challenge for modern computing systems because of the large number of hypotheses that have to be tested.
Year: 2011 PMID: 21685073 PMCID: PMC3117340 DOI: 10.1093/bioinformatics/btr218
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. −log10 linear regression P-values versus the HSIC for 50 SNPs (1225 pairs); r^2 = 0.9764.
Fig. 2. GPU runtime versus the number of SNP pairs in simulation data.
Results from the full HSIC model
| SNPs | Pairs | Runtime [s] | Interactions [1/s] | Speedup factor versus Single CPU |
|---|---|---|---|---|
| 4000 | 7 998 000 | 111.79 | 71 546.78 | 89.93 |
| 3500 | 6 123 250 | 86.27 | 70 976.10 | 88.72 |
| 3000 | 4 498 500 | 63.86 | 70 444.26 | 88.06 |
| 2500 | 3 123 750 | 44.73 | 69 834.12 | 87.29 |
| 2000 | 1 999 000 | 27.16 | 73 600.88 | 92.00 |
| 1500 | 1 124 250 | 16.86 | 66 669.63 | 83.34 |
| 1000 | 499 500 | 7.87 | 63 476.93 | 79.35 |
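The pair counts in this table follow from exhaustively enumerating unordered SNP pairs, n(n−1)/2, and the interaction rate is the pair count divided by the runtime. A minimal sketch of that arithmetic is given below; the function names are illustrative and do not come from the paper.

```python
def n_pairs(n_snps: int) -> int:
    """Number of unordered SNP pairs in an exhaustive two-locus scan."""
    return n_snps * (n_snps - 1) // 2


def interactions_per_second(n_snps: int, runtime_s: float) -> float:
    """Throughput implied by a pair count and a runtime given in seconds."""
    return n_pairs(n_snps) / runtime_s


# (number of SNPs, runtime in seconds) taken from three rows of the table
reported = [(4000, 111.79), (3000, 63.86), (1000, 7.87)]

for snps, runtime in reported:
    rate = interactions_per_second(snps, runtime)
    print(f"{snps} SNPs: {n_pairs(snps)} pairs, about {rate:.0f} pairs/s")
```

Rates recomputed this way can differ slightly from the tabulated values, presumably because the runtimes are reported to two decimals only.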
Hamilton Rating Scale: data and performance summary
| HSIC runtime, GPU [min] | F, 48 CPUs [min] |
|---|---|
| 2 408.92 (~40 h) | 3 440.30 (~57 h) |
Checking 536 750 SNPs (1.44×10^11 pairs) in 491 subjects. 1 137 450 interactions below a threshold of P < 10^−5 were found.
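The scale of the genome-wide scan can be checked with a few lines of arithmetic. The sketch below assumes that the GPU runtime in the summary table (2 408.92 min) refers to this same scan; the variable and function names are illustrative only.

```python
def n_pairs(n_snps: int) -> int:
    """Number of unordered SNP pairs, n * (n - 1) / 2."""
    return n_snps * (n_snps - 1) // 2


N_SNPS = 536_750        # SNPs tested, from the text
GPU_MINUTES = 2408.92   # GPU runtime from the summary table (assumed to be this scan)
N_HITS = 1_137_450      # interactions reported below the P-value threshold

total_pairs = n_pairs(N_SNPS)
pairs_per_second = total_pairs / (GPU_MINUTES * 60)
hit_fraction = N_HITS / total_pairs

print(f"pairs tested:     {total_pairs:.3e}")
print(f"pairs per second: {pairs_per_second:.3e}")
print(f"fraction of hits: {hit_fraction:.2e}")
```

Under these assumptions the pair count agrees with the 1.44×10^11 quoted in the text.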
Fig. 3. Overall fit on the top one million matching pairs: −log10 linear regression interaction model versus HSIC.
Fig. 4. Overall fit on the top one million matching pairs: −log10 linear regression interaction model versus −log10 HSIC P-values.
Fig. 5. Matching pairs capture rate across the first 1000 ranked pairs between the standard linear regression fit and the proposed HSIC method.
weakness in the method, as HSIC neglects univariate effects in its assessment.
Fig. 6. −log10 linear regression interaction P-values versus −log10 HSIC P-values. The −log10 univariate SNP1 P-value of each pair is shown on a color scale running from red (insignificant) to blue (significant).
Hamilton Rating Scale: data and performance summary
| Univariate | Number of pairs | Correlation coefficient, HSIC versus linear regression |
|---|---|---|
| Low–low | 934 611 | 0.94 |
| Low–medium | 176 099 | 0.83 |
| Low–high | 16 807 | 0.70 |
| Medium–medium | 8239 | 0.74 |
| Medium–high | 1471 | 0.58 |
| High–high | 79 | 0.56 |
Top ten results from Hamilton-score.
| SNP1 | SNP2 | HSIC value | HSIC P-value | Linear regression P-value, SNP1 | Linear regression P-value, SNP2 | Linear regression P-value, interaction |
|---|---|---|---|---|---|---|
| rs11580794 | rs11812623 | 7.70·10^−02 | 4.42·10^−10 | 0.9601 | | 1.87·10^−11 |
| rs12910772 | rs2338712 | 8.22·10^−02 | 1.35·10^−10 | 0.3682 | 0.77860 | 2.44·10^−11 |
| rs13028359 | rs2888542 | 8.24·10^−02 | 7.39·10^−11 | 0.2710 | 0.83590 | 2.45·10^−11 |
| rs13401572 | rs6130852 | 7.87·10^−02 | 3.58·10^−10 | 0.1360 | 0.40590 | 2.54·10^−11 |
| rs2105126 | rs1885418 | 7.90·10^−02 | 2.62·10^−10 | 0.2911 | 0.56040 | 2.71·10^−11 |
| rs861256 | rs11864516 | 7.93·10^−02 | 1.78·10^−10 | 0.4486 | 0.91110 | 2.90·10^−11 |
| rs6442323 | rs13186058 | 7.74·10^−02 | 2.61·10^−10 | 0.2621 | 0.66000 | 3.29·10^−11 |
| rs6442323 | rs4958287 | 7.74·10^−02 | 2.61·10^−10 | 0.2621 | 0.66000 | 3.29·10^−11 |
| rs6442323 | rs4958505 | 7.80·10^−02 | 2.24·10^−10 | 0.2621 | 0.58860 | 3.43·10^−11 |
| rs7797027 | rs1031912 | 7.87·10^−02 | 2.72·10^−10 | 0.7797 | 0.85880 | 3.46·10^−11 |
Bold P-values indicate significance at the 0.05 level.
Physical annotation of the top ten Hamilton-score results
| SNP | Position (chr:kb) | Gene | Distance [kb] |
|---|---|---|---|
| rs11580794 | 1:163 120 | PBX1 | +40 |
| rs11812623 | 10:79 860 | SNORA71 | +60 |
| rs12910772 | 15:34 960 | MEIS2 | +10 |
| rs2338712 | 22:47 210 | | |
| rs13028359 | 2:19 940 | TTC32 | +20 |
| | | WDR35 | +30 |
| rs2888542 | 2:37 760 | CDC42EP3 | −10 |
| rs13401572 | 2:157 660 | | |
| rs6130852 | 20:43 575 | SPINT3 | 0 |
| | | WFDC6 | +20 |
| | | SPINLW1 | +30 |
| | | WFDC8 | +40 |
| | | WFDC2 | +30 |
| rs2105126 | 1:80 980 | | |
| rs1885418 | 14:95 150 | TCL2 | −40 |
| rs861256 | 11:33 700 | | |
| rs11864516 | 16:725 | NARFL | 0 |
| | | HAGHL | +5 |
| | | CCDC78 | −10 |
| | | C16orf24 | +10 |
| | | METRN | +20 |
| | | FBXL16 | −30 |
| | | MSLN | −25 |
| | | MPFL | +30 |
| | | RPUSD1 | +50 |
| rs6442323 | 3:12 700 | RAF1 | −20 |
| rs13186058 | 5:151 311 | GLRA1 | −30 |
| rs6442323 | 3:12 700 | RAF1 | −20 |
| rs4958287 | 5:151 310 | GLRA1 | −30 |
| rs6442323 | 3:12 700 | RAF1 | −20 |
| rs4958505 | 5:151 325 | GLRA1 | −40 |
| rs7797027 | 7:15 455 | FLJ16327 | 0 |
| rs1031912 | 15:92 390 | | |
In the distance column ‘−’ indicates upstream of the gene, ‘+’ downstream of the gene, a distance of 0 means in the gene.