Cheng-Hong Yang, Yu-Da Lin, Li-Yeh Chuang, Jin-Bor Chen, Hsueh-Wei Chang.
Abstract
BACKGROUND: Determining the complex relationship between diseases, polymorphisms in human genes, and environmental factors is challenging. Multifactor dimensionality reduction (MDR) has proven capable of effectively detecting statistical patterns of epistasis. However, MDR is weak at accurately assigning multi-locus genotypes to either the high-risk or the low-risk group, and it generally does not provide accurate error rates when the case and control data sets are imbalanced. Consequently, the classification error rates and odds ratios (OR) can take unexpected values, because the true positive (TP) value is often small.
Year: 2013 PMID: 24236125 PMCID: PMC3827354 DOI: 10.1371/journal.pone.0079387
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
MDR pseudo-code.
| 01: divide data into 10 subsets |
| 02: |
| 03: classify |
| 04: |
| 05: |
| 06: |
| 07: determine the high/low risk groups in |
| 08: |
| 09: compute the misclassification error |
| 10: |
| 11: choose the best combination with the minimum misclassification error |
| 12: |
| 13: |
| 14: compute the prediction error of the best combination in the test data |
| 15: |
| 16: collect the best combination into |
| 17: |
| 18: compute cross-validation consistency from |
| 19: choose the best combination with the minimum prediction error |
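The surviving steps of the pseudo-code can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`risk_table`, `error_rate`, `mdr`), the fixed threshold of 1 on the case/control ratio, the use of the raw misclassification rate, and the handling of empty or unseen cells are all assumptions made for illustration, and the loop bounds lost from the listing are not reconstructed here. The demo data are synthetic, with the label defined by an interaction between loci 0 and 2.

```python
import itertools
import random
from collections import Counter, defaultdict

def risk_table(rows, combo, threshold=1.0):
    """Label each multi-locus genotype cell high-risk (True) or low-risk (False)."""
    cases, controls = Counter(), Counter()
    for genotypes, label in rows:
        cell = tuple(genotypes[i] for i in combo)
        (cases if label == 1 else controls)[cell] += 1
    table = {}
    for cell in set(cases) | set(controls):
        if controls[cell] == 0:
            table[cell] = True  # assumption: cells with cases only are high-risk
        else:
            table[cell] = cases[cell] / controls[cell] > threshold
    return table

def error_rate(rows, combo, table):
    """Fraction of rows whose label disagrees with the risk group of their cell."""
    wrong = 0
    for genotypes, label in rows:
        cell = tuple(genotypes[i] for i in combo)
        predicted = 1 if table.get(cell, False) else 0  # assumption: unseen cells are low-risk
        wrong += predicted != label
    return wrong / len(rows)

def mdr(rows, n_loci, order=2, folds=10, seed=0):
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    subsets = [rows[i::folds] for i in range(folds)]  # divide data into subsets
    chosen = []                       # best combination found in each fold
    pred_errors = defaultdict(list)   # prediction errors per chosen combination
    for k in range(folds):
        test = subsets[k]
        train = [r for j, s in enumerate(subsets) if j != k for r in s]
        best = None
        for combo in itertools.combinations(range(n_loci), order):
            table = risk_table(train, combo)       # high/low risk groups
            err = error_rate(train, combo, table)  # misclassification error
            if best is None or err < best[0]:
                best = (err, combo, table)
        _, combo, table = best
        chosen.append(combo)
        pred_errors[combo].append(error_rate(test, combo, table))
    consistency = Counter(chosen)     # cross-validation consistency
    mean_pred = {c: sum(e) / len(e) for c, e in pred_errors.items()}
    winner = min(mean_pred, key=lambda c: (mean_pred[c], -consistency[c]))
    return winner, consistency[winner], mean_pred[winner]

# Synthetic demo: 5 loci with genotypes 0/1/2; the label depends only on loci 0 and 2.
rng = random.Random(1)
demo_rows = []
for _ in range(400):
    g = tuple(rng.randint(0, 2) for _ in range(5))
    demo_rows.append((g, (g[0] + g[2]) % 2))

winner, cvc, pred_err = mdr(demo_rows, n_loci=5, order=2)
print(winner, cvc, pred_err)
```

Because the sketch scores models with the raw misclassification rate, it inherits the imbalance problem described in the abstract: when controls greatly outnumber cases, labelling every cell low-risk can already look accurate.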
Figure 1. MDR flowchart.
Estimated effect (odds ratio and 95% CI) of individual SNPs in 23 steroid hormone metabolism and signalling-related genes on the occurrence of breast cancer in patients.
| Methods | Best candidate model | Consistency | TP | TN | Accuracy | OR (95% CI) |
| 2-locus | | | | | | |
| MDR-E | 55, 64 | 100/100 | 19 | 689 | 0.54 | 5.02 (2.50–10.07) |
| MDR-ER | 40, 56 | 66/100 | 131 | 307 | 0.56 | 1.63 (1.17–2.29) |
| 3-locus | | | | | | |
| MDR-E | 3, 55, 64 | 96/100 | 19 | 691 | 0.54 | 5.80 (2.81–11.98) |
| MDR-ER | 21, 59, 64 | 26/100 | 111 | 393 | 0.57 | 1.71 (1.24–2.36) |
| 4-locus | | | | | | |
| MDR-E | 5, 17, 43, 64 | 33/100 | 22 | 689 | 0.55 | 5.91 (3.00–11.63) |
| MDR-ER | 21, 59, 64, 71 | 64/100 | 108 | 423 | 0.58 | 1.91 (1.39–2.64) |
| 5-locus | | | | | | |
| MDR-E | 5, 17, 34, 43, 64 | 100/100 | 28 | 686 | 0.56 | 6.47 (3.49–11.97) |
| MDR-ER | 8, 21, 31, 59, 64 | 22/100 | 80 | 538 | 0.59 | 2.29 (1.64–3.21) |
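The Accuracy and OR columns can be checked from TP and TN alone once the class totals are known. The sketch below assumes 193 cases and 704 controls, which is what the 2-locus cell frequencies reported in this document sum to, and it assumes that "Accuracy" is the balanced accuracy (mean of sensitivity and specificity) and that the interval is the usual Wald interval on the log odds ratio. The function name `summarize` is hypothetical.

```python
import math

def summarize(tp, tn, n_cases=193, n_controls=704, z=1.96):
    """Return (odds ratio, CI lower, CI upper, balanced accuracy) from TP and TN."""
    fn = n_cases - tp        # cases placed in the low-risk group
    fp = n_controls - tn     # controls placed in the high-risk group
    odds_ratio = (tp * tn) / (fp * fn)
    # Wald standard error of log(OR); requires all four counts to be non-zero
    se = math.sqrt(1 / tp + 1 / tn + 1 / fp + 1 / fn)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    balanced_accuracy = 0.5 * (tp / n_cases + tn / n_controls)
    return odds_ratio, lower, upper, balanced_accuracy

# 2-locus rows of the table above
mdr_e = summarize(tp=19, tn=689)
mdr_er = summarize(tp=131, tn=307)
print([round(v, 2) for v in mdr_e])   # [5.02, 2.5, 10.07, 0.54]
print([round(v, 2) for v in mdr_er])  # [1.63, 1.17, 2.29, 0.56]
```

Under these assumptions the 2-locus rows are reproduced to two decimals. The comparison also shows why a large OR can coexist with a small TP: the MDR-E model labels only 19 of 193 cases as high-risk, so its interval is wide.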
Figure 2. Power analysis of the four methods in 10-fold cross-validation.
Analysis results of the difference between MDR-E and MDR-ER in 2-locus genotypes.
| SNPs | Cell frequency | MDR-E | MDR-ER | ||||||||
| Strategy | Class | TP | TN | Error rate | Strategy | Class | TP | TN | Error rate | ||
| 40, 56 | 0.5 | 0.44 | |||||||||
| 114∶342 | 0.33 | Low-risk | 0 | 342 | 1.20 | High-risk | 114 | 0 | |||
| 57∶280 | 0.20 | Low-risk | 0 | 280 | 0.74 | Low-risk | 0 | 280 | |||
| 5∶27 | 0.19 | Low-risk | 0 | 27 | 0.75 | Low-risk | 0 | 27 | |||
| 17∶55 | 0.31 | Low-risk | 0 | 55 | 1.13 | High-risk | 17 | 0 | |||
| Total | 0 | 704 | 131 | 307 | |||||||
| SNPs | |||||||||||
| 55, 64 | 0.46 | 0.46 | |||||||||
| 174∶689 | 0.25 | Low-risk | 0 | 689 | 0.92 | Low-risk | 0 | 530 | |||
| 5∶4 | 1.25 | High-risk | 5 | 0 | 4.56 | High-risk | 5 | 163 | |||
| 14∶11 | 1.27 | High-risk | 14 | 0 | 4.64 | High-risk | 14 | 0 | |||
| 0∶0 | 0 | 0 | |||||||||
| Total | 19 | 689 | 19 | 689 | |||||||
Cell frequency: the left number is the number of cases and the right number is the number of controls.
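The contrast between the two strategies can be sketched from the cell frequencies alone. In the sketch below, the unadjusted strategy compares cases/controls in each cell with 1, while the adjusted strategy first rescales that ratio by (total controls / total cases). This rescaling is an assumption that reproduces the Total rows and error rates of the table. It is not confirmed here as the exact balancing function used by the authors, and individual rounded strategy values may differ slightly. The error rate is taken to be one minus the balanced accuracy, and `classify_cells` is a hypothetical name.

```python
def classify_cells(cells, adjust):
    """Return (TP, TN, error rate) for a list of (cases, controls) genotype cells."""
    total_cases = sum(cases for cases, _ in cells)
    total_controls = sum(controls for _, controls in cells)
    # Assumed adjustment: rescale so that a ratio of 1 means "same share of cases as of controls"
    scale = total_controls / total_cases if adjust else 1.0
    tp = tn = 0
    for cases, controls in cells:
        if cases == 0 and controls == 0:
            continue                      # empty cell: nothing to classify
        ratio = float("inf") if controls == 0 else (cases / controls) * scale
        if ratio > 1.0:
            tp += cases                   # high-risk cell: its cases are true positives
        else:
            tn += controls                # low-risk cell: its controls are true negatives
    error = 1.0 - 0.5 * (tp / total_cases + tn / total_controls)
    return tp, tn, error

# Cell frequencies (cases, controls) copied from the table above
snps_40_56 = [(114, 342), (57, 280), (5, 27), (17, 55)]
snps_55_64 = [(174, 689), (5, 4), (14, 11), (0, 0)]

for name, cells in [("40, 56", snps_40_56), ("55, 64", snps_55_64)]:
    for adjust in (False, True):
        tp, tn, err = classify_cells(cells, adjust)
        print(name, "adjusted" if adjust else "unadjusted", tp, tn, round(err, 2))
```

For SNPs 40, 56 the unadjusted rule labels every cell low-risk (TP = 0, TN = 704, error 0.5), whereas the adjusted rule moves two cells to high-risk (TP = 131, TN = 307, error 0.44). For SNPs 55, 64 both rules give TP = 19 and TN = 689.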
Figure 3. Frequency analysis of TP, TN, classification error, and numbers of high-risk and low-risk groups in 2-locus genotypes.