Alison A Motsinger-Reif, Sushamna Deodhar, Stacey J Winham, Nicholas E Hardison.
Abstract
BACKGROUND: A fundamental goal of human genetics is the discovery of polymorphisms that predict common, complex diseases. It is hypothesized that complex diseases are due to a myriad of factors including environmental exposures and complex genetic risk models, including gene-gene interactions. Such epistatic models present an important analytical challenge, requiring that methods perform not only statistical modeling, but also variable selection to generate testable genetic model hypotheses. This challenge is amplified by recent advances in genotyping technology, as the number of potential predictor variables is rapidly increasing.
Year: 2010 PMID: 21087514 PMCID: PMC3000379 DOI: 10.1186/1756-0381-3-8
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Figure 1. An example of a decision tree generated by GEDT. The corresponding parse string for this tree is also shown, which is obtained by using the mapping process. Here, decision nodes V1, V2 and V3 correspond to the SNP attributes of the data. Case and control values are represented as classes '+' and '-', respectively.
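As a rough illustration of the kind of tree Figure 1 describes, the sketch below stores a decision tree over SNP genotypes as nested Python tuples and classifies one individual. The genotype coding (0, 1 or 2 copies of the minor allele), the tuple layout, the example tree itself, and the names `classify` and `to_parse_string` are assumptions made for illustration; the actual GEDT grammar and parse-string format are not reproduced here.

```python
# Hypothetical representation: an internal node is (snp_name, {genotype: subtree}),
# a leaf is the class label '+' (case) or '-' (control).
tree = (
    "V1",
    {
        0: "-",
        1: ("V2", {0: "-", 1: "+", 2: "+"}),
        2: ("V3", {0: "+", 1: "-", 2: "+"}),
    },
)


def classify(node, genotypes):
    """Walk the tree from the root until a leaf label is reached."""
    while not isinstance(node, str):
        snp, branches = node
        node = branches[genotypes[snp]]
    return node


def to_parse_string(node):
    """Flatten the tree into a prefix-order string (illustrative format only)."""
    if isinstance(node, str):
        return node
    snp, branches = node
    children = " ".join(to_parse_string(branches[g]) for g in sorted(branches))
    return f"({snp} {children})"


individual = {"V1": 1, "V2": 2, "V3": 0}
print(classify(tree, individual))
print(to_parse_string(tree))
```

Keeping the tree as plain nested data makes it easy to evaluate and to serialize, which is the property a grammar-driven mapping between strings and trees relies on.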
Figure 2. The GEDT Algorithm. An overview of the GEDT process, showing its six steps: initialization, cross-validation, training, fitness evaluation using balanced error, natural selection (tournament), and testing (evaluating prediction error). The steps are as described in the Algorithm section.
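Two of the steps named in Figure 2, fitness evaluation by balanced error and tournament selection, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: balanced error is taken here as the mean of the per-class error rates (a common definition), and the tournament size `k`, the function names, and the toy labels are all hypothetical.

```python
import random


def balanced_error(y_true, y_pred):
    """Mean of the per-class error rates; labels are '+' (case) and '-' (control)."""
    errors = []
    for label in ("+", "-"):
        idx = [i for i, y in enumerate(y_true) if y == label]
        if not idx:
            continue  # class absent from this split
        wrong = sum(1 for i in idx if y_pred[i] != label)
        errors.append(wrong / len(idx))
    return sum(errors) / len(errors)


def tournament_select(population, fitness, k=3, rng=random):
    """Return the individual with the lowest error among k random entrants."""
    entrants = rng.sample(range(len(population)), k)
    best = min(entrants, key=lambda i: fitness[i])
    return population[best]


# Toy example: 3 cases and 1 control, with one error in each class.
y_true = ["+", "+", "+", "-"]
y_pred = ["+", "+", "-", "+"]
print(balanced_error(y_true, y_pred))
```

Averaging the two class error rates keeps the fitness from rewarding trees that simply predict the majority class when cases and controls are unequal in number.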
Penetrance patterns for 2-locus epistatic models
| XOR | | | Box | | | Mod | | |
|---|---|---|---|---|---|---|---|---|
| y | x | y | x | x | x | x | y | y |
| x | z | x | x | y | y | x | x | y |
| y | x | y | x | y | y | y | y | x |
Cells marked "x" represent genotype combinations with lower risk. The values "x," "y," and "z" represent penetrance values with 0 < x < y ≤ z < 1 which were chosen to achieve the desired heritability. For XOR models with MAF = 0.5, z = y; for XOR models with MAF = 0.25, z > y to achieve no marginal effects at either locus.
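The condition on z can be checked numerically. The sketch below computes the marginal penetrance at one locus of the XOR pattern by averaging over the genotypes of the other locus. It assumes Hardy-Weinberg genotype frequencies, the same minor allele frequency at both loci, and independence between loci; the values of `x` and `y` are illustrative and are not those used in the study.

```python
def genotype_freqs(maf):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of the minor allele."""
    p = 1.0 - maf
    return [p * p, 2 * p * maf, maf * maf]


def marginal_penetrances(table, maf):
    """Average each row of a 3x3 penetrance table over the second locus."""
    freqs = genotype_freqs(maf)
    return [sum(f * pen for f, pen in zip(freqs, row)) for row in table]


def xor_table(x, y, z):
    """XOR pattern from the penetrance table: y at the corners, z in the centre."""
    return [[y, x, y], [x, z, x], [y, x, y]]


x, y = 0.1, 0.3  # illustrative values only

# MAF = 0.5: setting z = y gives identical marginals for all three genotypes.
print(marginal_penetrances(xor_table(x, y, y), 0.5))

# MAF = 0.25: equal marginals require z > y; under the assumptions above,
# solving the equal-marginals condition gives z = (5y - 2x) / 3.
z = (5 * y - 2 * x) / 3
print(z > y, marginal_penetrances(xor_table(x, y, z), 0.25))
```

Because the XOR pattern is symmetric, the same marginals hold at the second locus, so neither SNP shows a main effect on its own.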
Power Results for Simulated Models, n = 250
| Model Number | Heritability (%) | Minor Allele Frequency | Genetic Model | Power (%), GEDT | Power (%), Random Search | Power (Lib %), GEDT | Power (Lib %), Random Search |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.25 | XOR | 0 | 0 | 1 | 1 |
| 2 | 1 | 0.5 | XOR | 0 | 0 | 2 | 1 |
| 3 | 2.5 | 0.25 | XOR | 0 | 0 | 1 | 0 |
| 4 | 2.5 | 0.5 | XOR | 0 | 0 | 2 | 0 |
| 5 | 5 | 0.25 | XOR | 1 | 0 | 2 | 0 |
| 6 | 5 | 0.5 | XOR | 1 | 1 | 2 | 0 |
| 7 | 7.5 | 0.25 | XOR | 3 | 0 | 4 | 0 |
| 8 | 7.5 | 0.5 | XOR | 2 | 0 | 6 | 0 |
| 9 | 10 | 0.25 | XOR | 0 | 0 | 1 | 1 |
| 10 | 10 | 0.5 | XOR | 4 | 0 | 7 | 1 |
| 11 | 1 | 0.25 | Box | 2 | 2 | 28 | 0 |
| 12 | 1 | 0.5 | Box | 5 | 1 | 24 | 0 |
| 13 | 2.5 | 0.25 | Box | 13 | 0 | 59 | 0 |
| 14 | 2.5 | 0.5 | Box | 16 | 0 | 69 | 1 |
| 15 | 5 | 0.25 | Box | 57 | 0 | 90 | 2 |
| 16 | 5 | 0.5 | Box | 35 | 0 | 82 | 0 |
| 17 | 7.5 | 0.25 | Box | 72 | 0 | 95 | 1 |
| 18 | 7.5 | 0.5 | Box | 53 | 0 | 93 | 4 |
| 19 | 10 | 0.25 | Box | 83 | 0 | 100 | 2 |
| 20 | 10 | 0.5 | Box | 69 | 1 | 95 | 1 |
| 21 | 1 | 0.25 | Mod | 1 | 1 | 15 | 3 |
| 22 | 1 | 0.5 | Mod | 1 | 0 | 9 | 4 |
| 23 | 2.5 | 0.25 | Mod | 7 | 0 | 49 | 8 |
| 24 | 2.5 | 0.5 | Mod | 6 | 0 | 2 | 12 |
| 25 | 5 | 0.25 | Mod | 40 | 0 | 89 | 8 |
| 26 | 5 | 0.5 | Mod | 20 | 0 | 46 | 7 |
| 27 | 7.5 | 0.25 | Mod | 79 | 0 | 96 | 4 |
| 28 | 7.5 | 0.5 | Mod | 47 | 0 | 65 | 5 |
| 29 | 10 | 0.25 | Mod | 81 | 0 | 99 | 5 |
| 30 | 10 | 0.5 | Mod | 60 | 0 | 78 | 9 |
Summary characteristics for the models simulated are listed, including the minor allele frequency simulated, the heritability of the simulated model, and the genetic model used. Complete penetrance functions are available from the authors upon request.
Power Results for Purely Epistatic Simulated Models, n = 500
| Model Number | Heritability (%) | Minor Allele Frequency | Genetic Model | Power (%), GEDT | Power (%), Random Search | Power (%), C4.5 | Power (Lib %), GEDT | Power (Lib %), Random Search | Power (Lib %), C4.5 |
|---|---|---|---|---|---|---|---|---|---|
| 31 | 1 | 0.25 | XOR | 37 | 1 | 0 | 67 | 4 | 0 |
| 32 | 1 | 0.5 | XOR | 45 | 2 | 0 | 79 | 3 | 0 |
| 33 | 2.5 | 0.25 | XOR | 68 | 1 | 0 | 82 | 4 | 0 |
| 34 | 2.5 | 0.5 | XOR | 75 | 0 | 0 | 93 | 6 | 0 |
| 35 | 5 | 0.25 | XOR | 83 | 0 | 0 | 98 | 7 | 0 |
| 36 | 5 | 0.5 | XOR | 90 | 1 | 0 | 99 | 8 | 0 |
| 37 | 7.5 | 0.25 | XOR | 95 | 0 | 0 | 93 | 2 | 0 |
| 38 | 7.5 | 0.5 | XOR | 95 | 4 | 0 | 99 | 7 | 0 |
| 39 | 10 | 0.25 | XOR | 96 | 3 | 0 | 100 | 11 | 0 |
| 40 | 10 | 0.5 | XOR | 95 | 0 | 0 | 100 | 10 | 0 |
Summary characteristics for the models simulated are listed, including the minor allele frequency simulated, the heritability of the simulated model, and the genetic model used. Complete penetrance functions are available from the authors upon request.
Analysis Times for Larger Datasets
| Number of SNPs | Time for Analysis (hours) |
|---|---|
| 1000 | 0.15 |
| 10000 | 0.5 |
| 100000 | 6.9 |
| 500000 | 33.9 |
Times are given in hours, for a single cross-validation interval.