Alison A Motsinger, Marylyn D Ritchie.
Abstract
The detection of gene-gene and gene-environment interactions associated with complex human disease or pharmacogenomic endpoints is a difficult challenge for human geneticists. Unlike rare, Mendelian diseases that are associated with a single gene, most common diseases are caused by the non-linear interaction of numerous genetic and environmental variables. The dimensionality involved in evaluating combinations of many such variables quickly diminishes the usefulness of traditional, parametric statistical methods. Multifactor dimensionality reduction (MDR) is a novel and powerful statistical tool for detecting and modelling epistasis. MDR is a non-parametric and model-free approach that has been shown to have reasonable power to detect epistasis in both theoretical and empirical studies. MDR has detected interactions in diseases such as sporadic breast cancer, multiple sclerosis and essential hypertension. As this method is applied more frequently and has gained acceptance in the study of human disease and pharmacogenomics, it is becoming increasingly important that the implementation of the MDR approach is properly understood. As with all statistical methods, MDR is only powerful and useful when implemented correctly. Careful attention to dataset structure, configuration parameters and the proper execution of permutation testing for a particular dataset and configuration is essential to the method's effectiveness. The detection, characterisation and interpretation of gene-gene and gene-environment interactions are expected to improve the diagnosis, prevention and treatment of common human diseases. MDR can be a powerful tool in reaching these goals when used appropriately.
Year: 2006 PMID: 16595076 PMCID: PMC3500181 DOI: 10.1186/1479-7364-2-5-318
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Figure 1. Summary of the general steps to implement the MDR method (adapted from Ritchie [9]). In step one, the data are divided into a training set and an independent testing set for cross-validation. In step two, a set of n factors is selected from the pool of all factors. In step three, the n factors and their possible multifactor cells are represented in n-dimensional space. In step four, each multifactor cell in the n-dimensional space is labelled as high risk if the ratio of affected individuals to unaffected individuals exceeds a threshold of one, and as low risk if the threshold is not exceeded. In steps five and six, the model with the lowest misclassification error is selected and the prediction error of the model is estimated using the independent test data. Steps one to six are repeated for each possible cross-validation interval. Bars represent hypothetical distributions of cases (left) and controls (right) with each multifactor combination. Dark-shaded cells represent high-risk genotype combinations, whereas light-shaded cells represent low-risk genotype combinations. White cells represent genotype combinations for which no data were observed.
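Steps four to six of the caption above can be illustrated with a minimal Python sketch. It is not the authors' software: the function names `label_cells` and `error_rate`, the toy data, the treatment of a cell with cases but no controls as high risk, and the default of low risk for cells unseen in training are all assumptions made for illustration.

```python
from collections import Counter

def label_cells(genotypes, status, threshold=1.0):
    """Label each multifactor cell as high risk (1) or low risk (0).

    genotypes: list of tuples, one genotype combination per individual
    status:    list of 1 (affected) or 0 (unaffected), same length
    """
    cases = Counter(g for g, s in zip(genotypes, status) if s == 1)
    controls = Counter(g for g, s in zip(genotypes, status) if s == 0)
    labels = {}
    for cell in set(cases) | set(controls):
        n_case, n_ctrl = cases[cell], controls[cell]
        if n_ctrl == 0:
            # Assumption: cases but no controls means an unbounded ratio, so high risk.
            labels[cell] = 1
        else:
            labels[cell] = 1 if n_case / n_ctrl > threshold else 0
    return labels

def error_rate(labels, genotypes, status, unseen=0):
    """Fraction of individuals whose status differs from their cell's label.

    Assumption: cells never seen in training are scored as `unseen` (low risk).
    """
    wrong = sum(labels.get(g, unseen) != s for g, s in zip(genotypes, status))
    return wrong / len(status)

# Toy two-locus data (hypothetical): a training set and an independent test set.
train_g = [(0, 0), (0, 0), (0, 0), (0, 1), (0, 1), (1, 1), (1, 1), (1, 1)]
train_s = [1, 1, 0, 0, 0, 1, 0, 0]
test_g = [(0, 0), (1, 1), (2, 2)]
test_s = [1, 1, 1]

labels = label_cells(train_g, train_s)
classification_error = error_rate(labels, train_g, train_s)  # error on training data
prediction_error = error_rate(labels, test_g, test_s)        # error on held-out data
```

In a full analysis this would be repeated for every candidate set of factors and every cross-validation interval, which the sketch leaves out.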
Figure 2. Example of the trend in classification error (2A) and prediction error (2B) as the number of loci in a model increases. The classification error continues to decrease as loci are added, which indicates over-fitting. The prediction error averages around 50 per cent and drops for the best model.
Figure 3. Flow chart of the multifactor dimensionality reduction (MDR) procedure. The flow chart outlines the thought process that must be completed for each data analysis with MDR. The steps in analysis vary with different dataset structures and characteristics, and the flow chart guides the user through the decision-making process associated with any particular analysis.
Figure 4. Flow chart of the permutation testing procedure. The flow chart carries the user through the process of permutation testing step by step, for any type of dataset analysis.
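The flow chart itself is not reproduced here, so the following is only a generic sketch of a permutation test, not the specific procedure in Figure 4. The callable `analyse` is hypothetical: it stands for the complete analysis, run with the same configuration each time, and returns an error where lower is better. The add-one correction in the p-value is also an assumption.

```python
import random

def permutation_p_value(analyse, genotypes, status, n_permutations=1000, seed=0):
    """Empirical p-value for an error statistic where lower is better.

    analyse: hypothetical callable (genotypes, status) -> error in [0, 1],
             standing for the whole analysis with a fixed configuration.
    """
    rng = random.Random(seed)
    observed = analyse(genotypes, status)
    shuffled = list(status)
    as_good = 0
    for _ in range(n_permutations):
        # Shuffling case/control labels breaks any genotype-status association.
        rng.shuffle(shuffled)
        if analyse(genotypes, shuffled) <= observed:
            as_good += 1
    # Assumption: add-one correction so the estimate is never exactly zero.
    return (as_good + 1) / (n_permutations + 1)

# Toy statistic (hypothetical): error of predicting status directly from one locus.
def single_locus_error(genotypes, status):
    return sum(g != s for g, s in zip(genotypes, status)) / len(status)

toy_genotypes = [0] * 10 + [1] * 10
toy_status = [0] * 10 + [1] * 10
p_value = permutation_p_value(single_locus_error, toy_genotypes, toy_status,
                              n_permutations=99)
```

Passing the whole analysis as `analyse`, rather than only the final model, means each permuted dataset goes through the same model search as the real data.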
Three-locus penetrance table where values in bold indicate genotype frequencies and table values indicate penetrance.
| CC | | | Cc | | | cc | | |
|---|---|---|---|---|---|---|---|---|
| 0.07 | 0.02 | 0.01 | 0.00 | 0.08 | 0.07 | 0.04 | 0.02 | 0.00 |
| 0.00 | 0.07 | 0.06 | 0.09 | 0.03 | 0.08 | 0.06 | 0.07 | 0.01 |
| 0.05 | 0.01 | 0.08 | 0.06 | 0.01 | 0.10 | 0.10 | 0.02 | 0.05 |
Penetrance is probability of disease given a particular genotype combination.
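As a worked illustration of this definition, the sketch below combines a penetrance table with genotype frequencies to obtain the overall probability of disease in a population. Everything in it is hypothetical: the two-locus table and its values, the 0/1/2 genotype coding, and the assumptions of Hardy-Weinberg proportions and independence between loci. It does not use the values from the table above.

```python
# Hypothetical two-locus penetrance table: P(disease | genotype at A, genotype at B).
# Genotypes are coded 0, 1, 2 (copies of the minor allele). Values are illustrative only.
PENETRANCE = {
    (0, 0): 0.00, (0, 1): 0.10, (0, 2): 0.00,
    (1, 0): 0.10, (1, 1): 0.00, (1, 2): 0.10,
    (2, 0): 0.00, (2, 1): 0.10, (2, 2): 0.00,
}

def genotype_frequencies(p):
    """Genotype frequencies for minor-allele frequency p.

    Assumption: Hardy-Weinberg proportions.
    """
    q = 1 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}

def prevalence(penetrance, p_a, p_b):
    """Population probability of disease: sum of frequency times penetrance.

    Assumption: the two loci are independent, so joint frequencies multiply.
    """
    freq_a = genotype_frequencies(p_a)
    freq_b = genotype_frequencies(p_b)
    return sum(freq_a[a] * freq_b[b] * f for (a, b), f in penetrance.items())

overall_risk = prevalence(PENETRANCE, 0.5, 0.5)
```

With both allele frequencies at 0.5, the four cells with penetrance 0.10 each have frequency 0.125, so the overall risk in this toy example is 0.05.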
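
| Number of loci | Loci in model | Cross-validation consistency | Prediction error |
|---|---|---|---|
| 1 | 1 | 10 | 20.00% |
| 2 | 7, 8 | 7 | 21.28% |
| 3 | 3, 5, 10 | 5 | 22.90% |
| 4 | 2, 3, 5, 10 | 4 | 26.60% |
| 5 | 1, 4, 7, 8, 9 | 4 | 29.66% |
| Final model: locus 1 -- INCORRECT MODEL | | | |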
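
| Number of loci | Loci in model | Cross-validation consistency | Prediction error |
|---|---|---|---|
| 1 | 5 | 5 | 40.80% |
| 2 | 4, 9 | 3 | 30.53% |
| 3 | 4, 5, 7 | 7 | 31.74% |
| 4 | 3, 5, 8, 10 | 4 | 36.60% |
| 5 | 1, 2, 4, 5, 9 | 5 | 40.00% |
| Final model: loci 4 and 9 -- INCORRECT MODEL | | | |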
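
| Number of loci | Loci in model | Cross-validation consistency | Prediction error |
|---|---|---|---|
| 1 | 8 | 6 | 49.50% |
| 2 | 10, 5 | 4 | 45.75% |
| 3 | 10, 5, 3 | 10 | 29.97% |
| 4 | 10, 5, 3, 2 | 7 | 33.69% |
| 5 | 10, 8, 5, 4, 3 | 6 | 41.86% |
| Final model: loci 3, 5, 10 -- CORRECT MODEL | | | |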
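
| Number of loci | Loci in model | Cross-validation consistency | Prediction error |
|---|---|---|---|
| 1 | 8 | 9 | 46.25% |
| 2 | 5, 10 | 3 | 46.75% |
| 3 | 3, 5, 10 | 10 | 25.84% |
| 4 | 3, 5, 8, 10 | 7 | 29.83% |
| 5 | 3, 5, 6, 8, 10 | 5 | 25.88% |
| Final model: loci 3, 5, 10 -- CORRECT MODEL | | | |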
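
| Number of loci | Loci in model | Cross-validation consistency | Prediction error |
|---|---|---|---|
| 1 | 6 | 5 | 60.00% |
| 2 | 5, 10 | 10 | 32.56% |
| 3 | 3, 5, 10 | 10 | 28.00% |
| 4 | 3, 5, 7, 10 | 7 | 32.53% |
| 5 | 3, 4, 5, 6, 10 | 4 | 39.95% |
| Final model: loci 3, 5, 10 -- CORRECT MODEL | | | |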

| Number of loci | Loci in model | Classification error |
|---|---|---|
| 1 | 1 | 20.00% |
| 2 | 7, 8 | 18.79% |
| 3 | 3, 5, 10 | 18.00% |
| 4 | 2, 3, 5, 10 | 16.80% |
| 5 | 1, 4, 7, 8, 9 | 15.60% |
| Final model: loci 1, 4, 7, 8, 9 -- INCORRECT MODEL | | |