Abstract
BACKGROUND: While the importance of gene-gene interactions in human diseases has been well recognized, identifying them has been a great challenge, especially in association studies with millions of genetic markers and thousands of individuals. Computationally efficient and powerful tools are greatly needed to identify new gene-gene interactions in high-dimensional association studies. RESULTS: We develop C++ software for genome-wide gene-gene interaction analyses (GWGGI). GWGGI uses tree-based algorithms to search a large number of genetic markers for a joint association with disease, taking high-order interactions into account, and then uses non-parametric statistics to test that joint association. The package includes two functions: Likelihood Ratio Mann-Whitney (LRMW) and Tree Assembling Mann-Whitney (TAMW). We optimize the data storage and computational efficiency of the software, making it feasible to run a genome-wide analysis on a personal computer. The use of GWGGI is demonstrated on two real datasets with nearly 500 K genetic markers.
Year: 2014 PMID: 25318532 PMCID: PMC4201693 DOI: 10.1186/s12863-014-0101-z
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
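The abstract describes a two-step idea: search the markers for a combination jointly associated with disease, then score that combination with a Mann-Whitney statistic. The sketch below illustrates the idea in a likelihood-ratio Mann-Whitney style, but it is not GWGGI's code. It uses a plain greedy forward selection in place of GWGGI's tree-based search. It assumes genotypes coded 0/1/2, cases labeled 1 and controls 0, and a pseudo-count to avoid dividing by zero. The names `forward_select`, `likelihood_ratio_scores`, `pseudo` and `min_gain` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Genotypes = List[List[int]]  # rows = samples, columns = SNPs coded 0/1/2 (assumption)


def mann_whitney_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney U / (n_cases * n_controls): share of case-control pairs
    in which the case scores higher; ties count as 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            wins += 1.0 if c > k else 0.5 if c == k else 0.0
    return wins / (len(cases) * len(controls))


def likelihood_ratio_scores(genotypes: Genotypes, labels: Sequence[int],
                            snps: Sequence[int], pseudo: float = 0.5) -> List[float]:
    """Score each sample by P(multi-locus genotype | case) / P(genotype | control),
    estimated from cell counts; the pseudo-count is a hypothetical smoothing choice."""
    keys = [tuple(row[j] for j in snps) for row in genotypes]
    n_case = sum(1 for y in labels if y == 1)
    n_ctrl = len(labels) - n_case
    case_counts = Counter(k for k, y in zip(keys, labels) if y == 1)
    ctrl_counts = Counter(k for k, y in zip(keys, labels) if y == 0)
    return [((case_counts[k] + pseudo) / n_case) / ((ctrl_counts[k] + pseudo) / n_ctrl)
            for k in keys]


def forward_select(genotypes: Genotypes, labels: Sequence[int],
                   max_snps: int = 3, min_gain: float = 0.005) -> Tuple[List[int], float]:
    """Greedily add the SNP whose inclusion gives the largest AUC; stop when
    the gain falls below min_gain or max_snps SNPs have been chosen."""
    selected: List[int] = []
    best_auc = 0.5  # AUC of an uninformative score
    n_snps = len(genotypes[0])
    while len(selected) < max_snps:
        best_candidate, candidate_auc = None, best_auc
        for j in range(n_snps):
            if j in selected:
                continue
            scores = likelihood_ratio_scores(genotypes, labels, selected + [j])
            auc = mann_whitney_auc(scores, labels)
            if auc > candidate_auc:
                best_candidate, candidate_auc = j, auc
        if best_candidate is None or candidate_auc - best_auc < min_gain:
            break
        selected.append(best_candidate)
        best_auc = candidate_auc
    return selected, best_auc
```

In this sketch the AUC is computed on the same data used to build the genotype cells, so it is optimistic. This is why the tables that follow report P-values from applying a model built on training data to separate testing data.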
Figure 1. A screenshot of the GWGGI software.
Characteristics of GWGGI
| | T1D, LRMW | T1D, TAMW | 459 K-SNP dataset, LRMW | 459 K-SNP dataset, TAMW |
|---|---|---|---|---|
| # of SNPs | 2184 | 2184 | 459 K | 459 K |
| # of samples | 4901 | 4901 | 4864 | 4864 |
| Memory usage | 7 MB | 7 MB | 731 MB | 738 MB |
| Loading time | <1 s | <1 s | 3 min | 3 min |
| Analysis time | 1.5 min | 3 min | 3.5 hr | 10 hr |
| Selected SNPs | 7 SNPs | 472 SNPs | 6 SNPs | 57 SNPs |
| AUC | 0.844 | 0.776 | 0.672 | 0.719 |
| P-value* | 1.12e-268 | 3.06e-138 | 1.78e-33 | 4.11e-65 |
*P-values were calculated by applying the model built from the training dataset to the testing dataset.
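The AUC and P-value rows are closely related. The AUC equals the Mann-Whitney U statistic for cases versus controls divided by the number of case-control pairs, and the same U can be converted to a P-value. The sketch below shows this on held-out scores. It assumes that each test sample already has a risk score from a model fitted on the training data. It uses the large-sample normal approximation without a tie correction, which makes it slightly conservative when scores are heavily tied. It also assumes a two-sided test, which the table does not specify. The function names are hypothetical, and the sketch does not reproduce the table's numbers, which require the original data.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_test(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (AUC, two-sided p-value) for case (1) vs control (0) scores,
    using the normal approximation to U without tie correction."""
    ranks = average_ranks(scores)
    n1 = sum(1 for y in labels if y == 1)
    n0 = len(labels) - n1
    rank_sum_cases = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_cases - n1 * (n1 + 1) / 2   # Mann-Whitney U for the cases
    auc = u / (n1 * n0)                        # AUC is U rescaled to [0, 1]
    mean_u = n1 * n0 / 2
    sd_u = math.sqrt(n1 * n0 * (n1 + n0 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc, p_two_sided
```

Using `math.erfc` instead of `1 - Phi(z)` keeps very small tail probabilities, such as those in the table, from being rounded to zero. For extremely large |z| it still underflows to 0.0, so a log-scale tail would be needed.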
Comparison of GWGGI with other software on the T1D dataset
| | GWGGI (LRMW) | GWGGI (TAMW) | | | | Random Jungle |
|---|---|---|---|---|---|---|
| Memory usage | 7 MB | 7 MB | 7 MB | 5 MB | 56 MB | 110 MB |
| Time | 1.5 min | 3 min | 5.5 hr | 14 s | 2 min | 12 min |
| Model | 7 SNPs | 472 SNPs | 2 SNPs | 2 SNPs | 2 SNPs | 1674 SNPs* |
| P-value | 1.12e-268 | 3.06e-138 | 1.74e-30 | 3.31e-62 | 1.27e-20 | 1.27e-103 |
| Selected SNPs** | rs3957146 | rs9273363 | rs9272723 | rs9270986 | rs9273363 | rs9273363 |
| | rs377763 | rs3957146 | rs9469220 | rs9469220 | rs9275418 | rs3957146 |
| | rs9270986 | rs9270986 | | | | rs9275523 |
| | rs9273363 | rs3135377 | | | | rs9275418 |
| | rs3177928 | rs9275523 | | | | rs9469220 |
*For Random Jungle, we chose the SNPs with permutation importance scores greater than 0.
**SNPs chosen by each method. If a method chose more than five SNPs, only the top five are listed.
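The Random Jungle footnote keeps SNPs whose permutation importance is above 0 and reports at most five of them. The sketch below illustrates that filter under simplifying assumptions and is not Random Jungle's implementation. Random-forest importance is typically computed on each tree's out-of-bag samples, but here a single held-out set is used, together with a generic scoring function `score_fn` and AUC as the accuracy measure. The names `permutation_importance`, `select_snps`, `n_repeats` and `seed` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = List[int]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise Mann-Whitney AUC; ties count as 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def permutation_importance(score_fn: Callable[[Row], float], genotypes: List[Row],
                           labels: Sequence[int], n_repeats: int = 5,
                           seed: int = 0) -> List[float]:
    """Mean drop in held-out AUC after shuffling one SNP column at a time."""
    rng = random.Random(seed)
    baseline = auc([score_fn(row) for row in genotypes], labels)
    importances = []
    for j in range(len(genotypes[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in genotypes]
            rng.shuffle(column)  # break the link between SNP j and the phenotype
            permuted = [list(row[:j]) + [v] + list(row[j + 1:])
                        for row, v in zip(genotypes, column)]
            drops.append(baseline - auc([score_fn(r) for r in permuted], labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def select_snps(importances: Sequence[float], snp_ids: Sequence[str],
                top_k: int = 5) -> Tuple[List[str], List[str]]:
    """Keep SNPs with importance > 0, ranked high to low; also return the top_k."""
    kept = sorted(((imp, sid) for imp, sid in zip(importances, snp_ids) if imp > 0),
                  key=lambda t: t[0], reverse=True)
    ranked = [sid for _, sid in kept]
    return ranked, ranked[:top_k]
```

A SNP that the model never uses gets an importance of exactly 0, because shuffling it leaves every score unchanged. The strict "> 0" threshold in the footnote therefore drops such SNPs, while any SNP that affects the held-out AUC can still pass. This is one plausible reason a forest-based filter can keep many SNPs, as in the 1674-SNP Random Jungle model.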