Chao-Feng Li, Fu-Tian Luo, Yi-Xin Zeng, Wei-Hua Jia.
Abstract
Determining the complex relationships between diseases, polymorphisms in human genes and environmental factors is challenging. Multifactor dimensionality reduction (MDR) has been proven capable of effectively detecting the statistical patterns of epistasis, although the approach depends on classification accuracy. Imbalanced datasets can seriously degrade classification accuracy. Moreover, MDR methods cannot quantitatively assess the disease risk of genotype combinations. Hence, we introduce a novel weighted risk score-based multifactor dimensionality reduction (WRSMDR) method that uses the Bayesian posterior probability of polymorphism combinations as a new quantitative measure of disease risk. First, we compared WRSMDR to the MDR method in simulated datasets. Our results showed that the WRSMDR method had reasonable power to identify high-order gene-gene interactions, and it was more effective than MDR at detecting four-locus models. Moreover, WRSMDR revealed more information regarding the effect of genotype combinations on disease risk, and its results were easier to determine and apply than those of MDR. Finally, we applied WRSMDR to a nasopharyngeal carcinoma (NPC) case-control study and identified a statistically significant high-order interaction among three polymorphisms: rs2860580, rs11865086 and rs2305806.
Year: 2014 PMID: 24933637 PMCID: PMC4100176 DOI: 10.3390/ijms150610724
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Power comparison of the MDR and weighted risk score-based multifactor dimensionality reduction (WRSMDR) methods in balanced datasets.
| Evaluation Indicator | Two-Locus WRSMDR | Two-Locus MDR | Three-Locus WRSMDR | Three-Locus MDR | Four-Locus WRSMDR | Four-Locus MDR |
|---|---|---|---|---|---|---|
| Specific Detection Rate | 0.87 | 0.83 | 0.74 | 0.83 | 0.92 | 0.46 |
| Detection Rate | 1 | 1 | 1 | 1 | 0.97 | 0.56 |
| Error Rate | 0 | 0 | 0 | 0 | 0.01 | 0.44 |
| No Detection Rate | 0 | 0 | 0 | 0 | 0.02 | 0 |
Power comparison of the MDR and WRSMDR methods in imbalanced datasets.
| Evaluation Indicator | Two-Locus WRSMDR | Two-Locus MDR | Three-Locus WRSMDR | Three-Locus MDR | Four-Locus WRSMDR | Four-Locus MDR |
|---|---|---|---|---|---|---|
| Specific Detection Rate | 0.96 | 0.61 | 0.57 | 0.66 | 0.94 | 0.68 |
| Detection Rate | 1 | 0.81 | 0.85 | 0.85 | 0.98 | 0.79 |
| Error Rate | 0 | 0.19 | 0.03 | 0.15 | 0.01 | 0.21 |
| No Detection Rate | 0 | 0 | 0.12 | 0 | 0.01 | 0 |
Summary of the results for applying the WRSMDR method to the nasopharyngeal carcinoma (NPC) dataset.
| Number of Loci | SNPs | Weighted Risk Score | Consistency | p |
|---|---|---|---|---|
| 2 | rs2860580-rs11865086 | 1.324 | 10 | <0.001 |
| 3 | rs2860580-rs11865086-rs2305806 * | 1.332 | 10 | <0.001 |
| 4 | rs2860580-rs11865086-rs836475-rs4976028 | 1.266 | 4 | <0.001 |
| 5 | rs2860580-rs11865086-rs836475-rs4976028-rs6488297 | 1.236 | 7 | <0.001 |
* The three-locus combination was selected as the best model by the WRSMDR method.
Summary of the disease probability estimated using Bayes’ posterior probability.
| Genotype Combination of the Three SNPs a | Disease Probability b | Fold Increase in Risk c | Weight of Genotype d |
|---|---|---|---|
| GG-CC-AG | 0.00077 | 3.07 | 0.03 |
| GG-CC-AA | 0.00045 | 1.78 | 0.03 |
| GG-AC-AA | 0.00038 | 1.51 | 0.09 |
| GG-AC-AG | 0.00037 | 1.49 | 0.11 |
| AG-CC-AG | 0.00036 | 1.43 | 0.03 |
| GG-AA-AA | 0.00034 | 1.36 | 0.08 |
| AG-AC-AA | 0.00032 | 1.29 | 0.09 |
| GG-AA-AG | 0.00031 | 1.24 | 0.08 |
| GG-AA-GG | 0.00031 | 1.23 | 0.02 |
| GG-AC-GG | 0.00027 | 1.07 | 0.03 |
| AG-CC-AA | 0.00026 | 1.03 | 0.02 |
| AG-AC-GG | 0.00019 | 0.77 | 0.03 |
| AG-AC-AG | 0.00019 | 0.77 | 0.09 |
| AG-AA-AG | 0.00017 | 0.67 | 0.07 |
| AG-AA-GG | 0.00016 | 0.66 | 0.02 |
| AG-AA-AA | 0.00016 | 0.62 | 0.08 |
| AA-AC-AG | 0.00013 | 0.52 | 0.02 |
| AA-AC-AA | 0.00010 | 0.39 | 0.02 |
| AA-AA-AG | 0.00008 | 0.33 | 0.01 |
| AA-AA-AA | 0.00006 | 0.26 | 0.01 |
a The three SNPs = rs2860580-rs11865086-rs2305806; b the disease probability is calculated by Bayes’ posterior probability formula, which represents the disease probability of an individual who carries a specific multi-locus genotype combination; c the fold increase in risk compared to the cumulative risk of NPC; d the weight of the genotype is the proportion of samples with the specific genotype combination.
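The footnotes above describe the disease probability as a Bayes posterior and the fold increase as that posterior relative to the cumulative risk of NPC. A minimal Python sketch of that calculation follows. The function names, the genotype frequencies and the prevalence value are all hypothetical and chosen only for illustration; the prevalence of 0.00025 is roughly what the table's own ratios suggest (for example, 0.00077 / 3.07), but it is an assumption here, not a figure stated in the record, and the authors' exact estimation procedure may differ.

```python
def posterior_disease_probability(p_geno_given_case, p_geno_given_control, prevalence):
    """P(disease | genotype combination) via Bayes' theorem."""
    numerator = p_geno_given_case * prevalence
    denominator = numerator + p_geno_given_control * (1.0 - prevalence)
    return numerator / denominator


def fold_increase(posterior, prevalence):
    """Risk of a genotype combination relative to the cumulative (population) risk."""
    return posterior / prevalence


# Hypothetical inputs for illustration only (not taken from the study).
PREVALENCE = 0.00025   # assumed cumulative risk of disease
P_CASE = 0.06          # assumed frequency of the combination among cases
P_CONTROL = 0.02       # assumed frequency of the combination among controls

post = posterior_disease_probability(P_CASE, P_CONTROL, PREVALENCE)
fold = fold_increase(post, PREVALENCE)
print(f"posterior = {post:.5f}, fold increase = {fold:.2f}")
```

When a combination is equally frequent in cases and controls, the posterior equals the prevalence and the fold increase is 1, which matches the reading of values above and below 1 in the table.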
Summary of results applying the MDR method to the NPC dataset.
| Number of Loci | SNPs | Prediction Error (%) | Cross-Validation Consistency | p |
|---|---|---|---|---|
| 2 | rs2860580-rs11865086 | 41.65 | 9/10 | <0.001 |
| 3 | rs2860580-rs11865086-rs2305806 * | 40.48 | 10/10 | <0.001 |
| 4 | rs2860580-rs11865086-rs2305806-rs2115485 | 41.31 | 8/10 | <0.001 |
| 5 | rs2860580-rs11865086-rs2305806 | 45.35 | 5/10 | <0.022 |
* The three-locus combination was selected as the best model by MDR.
Figure 1. The procedure for the WRSMDR and MDR methods.
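Since the figure itself is not reproduced here, the core step of MDR as it is commonly described can be sketched as follows: each multi-locus genotype combination is labelled high or low risk by comparing its case:control ratio with a threshold, and the resulting two-group classifier is scored by its error rate. This is a minimal sketch of the generic technique, not the authors' implementation; the function names and toy data are hypothetical, cross-validation is omitted, and the default threshold (the overall case:control ratio) is an assumption.

```python
from collections import Counter


def mdr_label_cells(genotypes, is_case, threshold=None):
    """Label each genotype combination 'high' or 'low' risk.

    genotypes: list of tuples, one tuple of genotype codes per individual
    is_case:   list of bools, True for cases (both classes must be present)
    threshold: case:control ratio cutoff; defaults to the overall ratio
    """
    cases = Counter(g for g, c in zip(genotypes, is_case) if c)
    controls = Counter(g for g, c in zip(genotypes, is_case) if not c)
    if threshold is None:
        threshold = sum(cases.values()) / sum(controls.values())
    labels = {}
    for cell in set(cases) | set(controls):
        n_case, n_ctrl = cases[cell], controls[cell]
        # A cell containing cases but no controls is treated as high risk.
        ratio = float("inf") if n_ctrl == 0 else n_case / n_ctrl
        labels[cell] = "high" if ratio > threshold else "low"
    return labels


def classification_error(genotypes, is_case, labels):
    """Fraction of individuals whose status disagrees with their cell label."""
    wrong = sum((labels.get(g) == "high") != c for g, c in zip(genotypes, is_case))
    return wrong / len(genotypes)


# Toy two-locus data (genotypes coded 0/1/2); purely illustrative.
toy_genotypes = [(0, 1), (0, 1), (0, 1), (1, 1), (1, 1), (2, 0),   # cases
                 (0, 1), (0, 0), (0, 0), (1, 1), (2, 0), (2, 0)]   # controls
toy_is_case = [True] * 6 + [False] * 6

toy_labels = mdr_label_cells(toy_genotypes, toy_is_case)
toy_error = classification_error(toy_genotypes, toy_is_case, toy_labels)
print(toy_labels, toy_error)
```

The binary labelling discards how far each cell's ratio lies from the threshold, which is the limitation the abstract refers to when it says MDR cannot quantitatively assess the disease risk of genotype combinations.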
The parameter settings of the three models.
| Parameters | Two-Locus Model | Three-Locus Model | Four-Locus Model |
|---|---|---|---|
| Number of predictive SNPs | 2 | 3 | 4 |
| Number of non-predictive SNPs | 8 | 7 | 6 |
| Heritability | 0.05 | 0.05 | 0.05 |
| MAF of predictive SNPs | 0.2 | 0.2 | 0.2 |
| MAF of non-predictive SNPs | (0.01~0.5) | (0.01~0.5) | (0.01~0.5) |
MAF = minor allele frequency.
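To make the MAF settings concrete, the sketch below draws genotypes for one simulated individual set under the two-locus configuration (2 predictive SNPs at MAF 0.2, 8 non-predictive SNPs with MAF between 0.01 and 0.5). It assumes Hardy-Weinberg equilibrium and a uniform draw of the non-predictive MAFs, neither of which is stated in this record, and it does not model heritability or the penetrance function that would assign case-control status. All names are hypothetical.

```python
import random


def hwe_genotype_frequencies(maf):
    """Frequencies of 0, 1 and 2 copies of the minor allele under Hardy-Weinberg."""
    q = maf
    p = 1.0 - q
    return (p * p, 2.0 * p * q, q * q)


def simulate_genotypes(maf, n, rng):
    """Draw n genotypes (coded as minor-allele counts) for a single SNP."""
    return rng.choices([0, 1, 2], weights=hwe_genotype_frequencies(maf), k=n)


rng = random.Random(2014)        # fixed seed so the sketch is repeatable
N_INDIVIDUALS = 100              # hypothetical sample size

# Two-locus setting from the table: 2 predictive + 8 non-predictive SNPs.
mafs = [0.2, 0.2] + [rng.uniform(0.01, 0.5) for _ in range(8)]
dataset = [simulate_genotypes(maf, N_INDIVIDUALS, rng) for maf in mafs]

print(hwe_genotype_frequencies(0.2))
print(len(dataset), len(dataset[0]))
```

With a MAF of 0.2 the expected genotype frequencies are about 0.64, 0.32 and 0.04, so the rarest two-locus combinations of predictive SNPs will be sparsely populated in modest samples.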
NK cell pathway SNPs involved in this study.
| SNP | Chr. | Locus | MA | Chi-Square Value |
|---|---|---|---|---|
| rs2860580 | 6 | HLA-A | A | 89.95 |
| rs11865086 | 16 | MAPK3 | C | 14.96 |
| rs4976028 | 5 | PIK3R1 | G | 9.98 |
| rs11150675 | 16 | LAT | A | 7.47 |
| rs6488297 | 12 | KLRC1 | A | 7.05 |
| rs941831 | 10 | ITGB1 | G | 5.88 |
| rs836475 | 7 | RAC1 | A | 4.80 |
| rs2733840 | 12 | KLRC4 | G | 3.02 |
| rs2733840 | 12 | KLRC3 | G | 3.02 |
| rs10109834 | 8 | PTK2B | C | 2.71 |
| rs2115485 | 9 | SYK | A | 2.68 |
| rs2305806 | 19 | VAV1 | G | 2.57 |
| rs7166547 | 15 | MAP2K1 | A | 2.35 |
| rs744167 | 12 | PTPN6 | A | 1.97 |
| rs7301582 | 12 | KLRC2 | A | 1.45 |
| rs3019238 | 11 | PAK1 | G | 1.23 |
| rs7645550 | 3 | PIK3CA | A | 0.76 |
| rs11214093 | 11 | IL18 | G | 0.70 |
| rs12310310 | 12 | KLRD1 | A | 0.58 |
| rs4780 | 15 | B2M | G | 0.23 |
Chr., chromosome; MA, minor allele.