Stacey J Winham, Colin L Colby, Robert R Freimuth, Xin Wang, Mariza de Andrade, Marianne Huebner, Joanna M Biernacka.
Abstract
BACKGROUND: Identifying variants associated with complex human traits in high-dimensional data is a central goal of genome-wide association studies. However, complicated etiologies such as gene-gene interactions are ignored by the univariate analysis usually applied in these studies. Random Forests (RF) are a popular data-mining technique that can accommodate a large number of predictor variables and allow for complex models with interactions. RF analysis produces measures of variable importance that can be used to rank the predictor variables. Thus, single nucleotide polymorphism (SNP) analysis using RFs is gaining popularity as a potential filter approach that considers interactions in high-dimensional data. However, the impact of data dimensionality on the power of RF to identify interactions has not been thoroughly explored. We investigate the ability of rankings from variable importance measures (VIMs) to detect gene-gene interaction effects and their potential effectiveness as filters compared to p-values from univariate logistic regression, particularly as the data become increasingly high-dimensional.
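The comparison described in the abstract, ranking SNPs by an RF importance score versus ranking them by a univariate p-value, can be illustrated with a minimal sketch. It assumes that a SNP counts as "detected" in a replicate when it falls within the top `k` ranks; that threshold rule, the function names, and all scores below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def top_k(scores: Dict[str, float], k: int, higher_is_better: bool) -> List[str]:
    """Return the names of the k best-ranked SNPs."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return ordered[:k]


def detection_probability(replicates: List[Dict[str, float]], snp: str,
                          k: int, higher_is_better: bool) -> float:
    """Fraction of replicates in which `snp` ranks within the top k (assumed rule)."""
    hits = sum(snp in top_k(r, k, higher_is_better) for r in replicates)
    return hits / len(replicates)


# Made-up importance scores from three simulated replicates (larger = better).
vim_replicates = [
    {"snp1": 0.9, "snp2": 0.1, "snp3": 0.4, "snp4": 0.2},
    {"snp1": 0.3, "snp2": 0.8, "snp3": 0.5, "snp4": 0.1},
    {"snp1": 0.7, "snp2": 0.2, "snp3": 0.6, "snp4": 0.0},
]
# Made-up univariate p-values for the same replicates (smaller = better).
pval_replicates = [
    {"snp1": 0.001, "snp2": 0.40, "snp3": 0.03, "snp4": 0.70},
    {"snp1": 0.200, "snp2": 0.01, "snp3": 0.04, "snp4": 0.90},
    {"snp1": 0.550, "snp2": 0.50, "snp3": 0.02, "snp4": 0.60},
]

vim_power = detection_probability(vim_replicates, "snp1", k=2, higher_is_better=True)
lr_power = detection_probability(pval_replicates, "snp1", k=2, higher_is_better=False)
print(vim_power, lr_power)
```

The only difference between the two filters in this sketch is the sort direction, which makes the two rankings directly comparable at the same `k`.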
Year: 2012 PMID: 22793366 PMCID: PMC3463421 DOI: 10.1186/1471-2105-13-164
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Summary of the objectives and design of simulations 1-3
| Simulation 1 | Simulation 2 | Simulation 3 |
| To compare RF VIMs for main and interaction effect detection. | To compare RF measures with p-values from logistic regression for main and interaction effect detection. | Examine RF performance in presence of realistic patterns of LD and MAF. |
| Yes | Yes | No (LD) |
| 10, 100, 500, 1000 | 10, 100, 500, 1000 | Fixed at 1000 |
| 4 | 2 | 2 |
| Fixed at 0.1, 0.2, 0.3, or 0.4 | Fixed at 0.3 | Varies (0.01–0.50) |
| 5 | 3 | 4 |
| Varying effect sizes, $H^2_{X_1X_2}$ vs. $H^2_{X_3X_4}$ | Two interacting SNPs with 0, 1, or 2 having main effects. | Causal SNPs chosen in blocks of strong vs. weak LD with non-causal SNPs. |
| Phenotype is a dichotomized quantitative (normally distributed) trait. | Phenotype is based on direct penetrance functions. | Phenotypes are generated as in Simulation 1. |
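The table states that the Simulation 1 phenotype is a dichotomized, normally distributed quantitative trait. A rough sketch of that kind of data generation follows. The linear trait model, the effect sizes `beta_main` and `beta_inter`, the median split used as the case/control cutoff, and the Hardy-Weinberg sampling of genotypes are all assumptions made for illustration; the paper's actual generating model is not given in this record.

```python
import random
from statistics import median


def simulate_genotypes(n, maf, rng):
    """Minor-allele counts (0/1/2) from two independent allele draws (HWE assumed)."""
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]


def simulate_case_control(n, maf, beta_main, beta_inter, seed=0):
    """Two SNPs, a quantitative trait with Gaussian noise, then a median split."""
    rng = random.Random(seed)
    x1 = simulate_genotypes(n, maf, rng)
    x2 = simulate_genotypes(n, maf, rng)
    # Hypothetical trait model: main effect of SNP 1 plus an x1*x2 interaction.
    trait = [beta_main * a + beta_inter * a * b + rng.gauss(0.0, 1.0)
             for a, b in zip(x1, x2)]
    cutoff = median(trait)  # assumed dichotomization threshold
    status = [int(t > cutoff) for t in trait]
    return x1, x2, status


x1, x2, status = simulate_case_control(n=1000, maf=0.3, beta_main=0.2,
                                       beta_inter=0.5, seed=1)
print(sum(status), "cases out of", len(status))
```

Splitting at the median gives a balanced case/control sample; a different cutoff would be needed to mimic a specific disease prevalence.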
Figure 1. Simulation 2 penetrance functions. Penetrance functions for the two-locus interactions in the three models used in Simulation 2, with corresponding total, marginal, and interaction heritabilities.
Figure 2. Simulation 1 results. Probability of detection for ‘main’, ‘interacting’, and ‘null’ SNPs plotted against the number of total SNPs for select RF VIMs and logistic regression (LR). Top row shows results for the “main effects greater” Model 2; bottom row shows results for the “interaction effects greater” Model 4. Results are plotted separately across MAF. Average PE estimates range between 0.430 and 0.476 (Additional file 2, Table B3).
Figure 3. Simulation 2 results. Probability of detection for SNP1 and SNP2 plotted against total number of SNPs by VIM for models with interactions and two main effects (Model 6, left), one main effect (Model 7, center), and no main effects (Model 8, right). Average PE estimates range between 0.465 and 0.508 (Additional file 2, Table B4).
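To make the idea of a two-locus penetrance function concrete, the sketch below takes a 3×3 table of disease probabilities indexed by the genotypes at two SNPs and derives the prevalence, the marginal penetrance at the first SNP, and a total heritability. The heritability used here is one common observed-scale definition (variance of penetrance across genotypes divided by K(1−K), with K the prevalence); whether it matches the definition behind Figure 1 is an assumption, and the penetrance values are invented.

```python
from typing import List, Tuple


def genotype_freqs(maf: float) -> List[float]:
    """Hardy-Weinberg frequencies for 0, 1, 2 copies of the minor allele."""
    q = maf
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]


def summarize(penetrance: List[List[float]], maf_a: float,
              maf_b: float) -> Tuple[float, List[float], float]:
    """Return (prevalence, marginal penetrance at SNP A, total heritability)."""
    fa, fb = genotype_freqs(maf_a), genotype_freqs(maf_b)
    cells = [(i, j) for i in range(3) for j in range(3)]
    prevalence = sum(fa[i] * fb[j] * penetrance[i][j] for i, j in cells)
    # Marginal penetrance at SNP A: average over SNP B genotypes.
    marginal_a = [sum(fb[j] * penetrance[i][j] for j in range(3)) for i in range(3)]
    variance = sum(fa[i] * fb[j] * (penetrance[i][j] - prevalence) ** 2
                   for i, j in cells)
    heritability = variance / (prevalence * (1.0 - prevalence))
    return prevalence, marginal_a, heritability


# Hypothetical table: risk is raised only when both SNPs carry a minor allele.
example = [
    [0.1, 0.1, 0.1],
    [0.1, 0.3, 0.3],
    [0.1, 0.3, 0.3],
]
prevalence, marginal_a, h2 = summarize(example, maf_a=0.3, maf_b=0.3)
print(prevalence, marginal_a, h2)
```

A table with identical entries in every cell yields zero heritability under this definition, which is a quick sanity check on any penetrance model.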
Simulation 3 results, Model 3
| 1 | Strong | .294 | Causal SNP | 0.13 | 0.12 | 0.14 | 0.08 | 0.21 |
| | | | Causal Region | 0.38 | 0.3 | 0.4 | 0.29 | 0.49 |
| | Strong | .309 | Causal SNP | 0.25 | 0.18 | 0.28 | 0.26 | 0.33 |
| | | | Causal Region | 0.56 | 0.48 | 0.55 | 0.58 | 0.48 |
| 2 | Weak | .294 | Causal SNP | 0.72 | 0.58 | 0.73 | 0.78 | 0.73 |
| | Weak | .281 | Causal SNP | 0.66 | 0.56 | 0.71 | 0.79 | 0.76 |
| 3 | Strong | .294 | Causal SNP | 0.15 | 0.13 | 0.11 | 0.08 | 0.21 |
| | | | Causal Region | 0.5 | 0.46 | 0.52 | 0.39 | 0.71 |
| | Weak | .294 | Causal SNP | 0.59 | 0.52 | 0.63 | 0.78 | 0.5 |
| 4 | None | .294 | Causal SNP | 0.67 | 0.57 | 0.72 | 0.73 | 0.75 |
| | None | .294 | Causal SNP | 0.68 | 0.6 | 0.67 | 0.7 | 0.76 |
Detection probability with and without LD under the main-effects-only Model 3, for RF VIMs and logistic regression (LR). Total number of SNPs = 1,000, MAF ≈ 0.3. Average PE estimates range from 0.458 to 0.477 (Additional file 2, Table B6).
Simulation 3 results, Model 5
| 1 | Strong | .294 | Causal SNP | 0.05 | 0.09 | 0.05 | 0.02 | 0.09 |
| | | | Causal Region | 0.2 | 0.19 | 0.21 | 0.1 | 0.3 |
| | Strong | .309 | Causal SNP | 0.2 | 0.17 | 0.15 | 0.16 | 0.14 |
| | | | Causal Region | 0.47 | 0.42 | 0.38 | 0.41 | 0.33 |
| 2 | Weak | .294 | Causal SNP | 0.43 | 0.34 | 0.49 | 0.52 | 0.4 |
| | Weak | .281 | Causal SNP | 0.35 | 0.25 | 0.35 | 0.42 | 0.29 |
| 3 | Strong | .294 | Causal SNP | 0.06 | 0.09 | 0.04 | 0.04 | 0.12 |
| | | | Causal Region | 0.32 | 0.27 | 0.27 | 0.14 | 0.41 |
| | Weak | .294 | Causal SNP | 0.51 | 0.45 | 0.4 | 0.62 | 0.3 |
| 4 | None | .294 | Causal SNP | 0.28 | 0.21 | 0.3 | 0.31 | 0.33 |
| | None | .294 | Causal SNP | 0.29 | 0.27 | 0.3 | 0.31 | 0.29 |
Detection probability with and without LD under the interaction-effects-only Model 5, for RF VIMs and logistic regression (LR). Total number of SNPs = 1,000, MAF ≈ 0.3. Average PE estimates range from 0.479 to 0.496 (Additional file 2, Table B6).
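The Simulation 3 tables report detection separately for the "Causal SNP" and the "Causal Region". One way to read that distinction is sketched below, assuming the region is the set of SNPs in the same LD block as the causal SNP and that detection means appearing among the top `k` ranked SNPs. Both rules, and all SNP names, are hypothetical.

```python
from typing import List, Set


def snp_detected(ranking: List[str], causal: str, k: int) -> bool:
    """True if the causal SNP itself is among the top k ranked SNPs."""
    return causal in ranking[:k]


def region_detected(ranking: List[str], region: Set[str], k: int) -> bool:
    """True if any SNP from the causal SNP's LD block is among the top k."""
    return any(snp in region for snp in ranking[:k])


# Hypothetical ranking in which a correlated proxy outranks the causal SNP.
ranking = ["proxy_a", "null_7", "causal", "proxy_b", "null_2"]
block = {"causal", "proxy_a", "proxy_b"}

print(snp_detected(ranking, "causal", k=2))   # causal SNP misses the top 2
print(region_detected(ranking, block, k=2))   # its LD block is still picked up
```

Under these assumptions region-level detection can never be lower than SNP-level detection, which is consistent with the strong-LD rows above, where the "Causal Region" values exceed the "Causal SNP" values.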