Abstract
Genetic analysis of complex diseases demands novel analytical methods to interpret data collected on thousands of variables by genome-wide association studies. The complexity of such analysis is multiplied when one has to consider interaction effects, be they among the genetic variations (G × G) or with environmental risk factors (G × E). Several statistical learning methods seem quite promising in this context. Herein we consider applications of two such methods, random forest and Bayesian networks, to the simulated dataset for Genetic Analysis Workshop 16 Problem 3. Our evaluation study showed that an iterative search based on the random forest approach has potential for selecting important variables, while Bayesian networks can capture some of the underlying causal relationships.
Year: 2009 PMID: 20018065 PMCID: PMC2795972 DOI: 10.1186/1753-6561-3-s7-s70
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Figure 1. Rank of risk SNPs in random forest as noise level increases. The five risk SNPs (τ1-τ5) for CAC were tested with three environmental factors and different numbers of noise SNPs (see text). At each noise level, the test was repeated 100 times. We define the relative rank of a variable as its variable-importance rank normalized by the total number of predictors. A lower value indicates that the variable is easier for random forest to detect. The plot shows quantiles of the relative rank for the five risk SNPs. The three kinds of curves represent the 50th, 30th, and 10th percentiles, respectively (marked as "50%", "30%" and "10%" in the plot).
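The relative rank defined in the caption can be illustrated with a minimal Python sketch. The importance scores, predictor names, and the helper names `relative_ranks` and `summarize` are hypothetical, and the quantile estimator (the default of `statistics.quantiles`) is an assumption, since the caption does not say which estimator was used.

```python
from statistics import quantiles


def relative_ranks(importances):
    """Return rank / number of predictors for each predictor.

    Rank 1 is the most important predictor, so smaller values mean
    the predictor stands out more clearly.
    """
    ordered = sorted(importances, key=importances.get, reverse=True)
    n = len(ordered)
    return {name: (i + 1) / n for i, name in enumerate(ordered)}


def summarize(rel_ranks):
    """10%, 30% and 50% quantiles of one SNP's relative rank over repeated runs."""
    deciles = quantiles(rel_ranks, n=10)  # nine cut points: 10%, 20%, ..., 90%
    return {"10%": deciles[0], "30%": deciles[2], "50%": deciles[4]}


# Hypothetical importance scores from a single random forest run.
scores = {"tau1": 0.30, "age": 0.25, "noise1": 0.05, "noise2": 0.10}
ranks = relative_ranks(scores)
print(ranks["tau1"])  # most important of four predictors
```

In this sketch, `summarize` would be called once per risk SNP and noise level, on the list of relative ranks collected from the repeated runs.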
Bootstrapping results of detected edges from predictors to phenotypes
| Predictors and SNPs | Original: MIevent2 | Original: cac2 | + 7 LD-free SNPs: MIevent2 | + 7 LD-free SNPs: cac2 | + 7 SNPs with LD: MIevent2 | + 7 SNPs with LD: cac2 | + 50 random SNPs: MIevent2 | + 50 random SNPs: cac2 |
|---|---|---|---|---|---|---|---|---|
| cac2 | 0.000 | 0.000 | 0.000 | 0.000 | ||||
| smoke2 | ||||||||
| age2 | 0.196 | 0.181 | ||||||
| chol2 | 0.173 | 0.176 | 0.174 | 0.122 | ||||
| hdl2 | 0.162 | 0.124 | ||||||
| sex | 0.183 | 0.092 | 0.099 | |||||
| τ1 | 0.146 | 0.126 | 0.144 | 0.122 | 0.063 | 0.067 | ||
| τ2 | 0.171 | 0.154 | 0.175 | 0.156 | 0.102 | 0.059 | ||
| τ3 | 0.192 | 0.136 | 0.134 | 0.113 | 0.135 | 0.090 | 0.085 | |
| τ4 | 0.179 | 0.185 | 0.179 | 0.057 | 0.084 | |||
| τ5 | 0.195 | 0.168 | 0.107 | 0.084 | ||||
| τ6 | 0.169 | 0.106 | 0.096 | 0.099 | 0.076 | |||
| τ7 | 0.158 | 0.182 | 0.136 | 0.133 | 0.110 | 0.063 | ||
| n1 | 0.067 | 0.063 | 0.079 | 0.106 | 0.040 | 0.036 | ||
| n2 | 0.083 | 0.093 | 0.005 | 0.005 | 0.012 | 0.010 | ||
| n3 | 0.041 | 0.051 | 0.127 | 0.097 | 0.013 | 0.016 | ||
| n4 | 0.079 | 0.064 | 0.015 | 0.007 | 0.033 | 0.042 | ||
| n5 | 0.064 | 0.065 | 0.068 | 0.061 | 0.007 | 0.015 | ||
| n6 | 0.040 | 0.034 | 0.113 | 0.143 | 0.022 | 0.032 | ||
| n7 | 0.032 | 0.032 | 0.022 | 0.019 | 0.028 | 0.018 | ||
a. Bold font indicates a relationship found in >30% of the 200 bootstrap replicates.
b. Italic, underlined font indicates a relationship found in >20% (≤ 30%) of the 200 bootstrap replicates.
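The thresholds in the footnotes can be sketched as follows: count how often each predictor-to-phenotype edge appears across bootstrap replicates and bin the resulting frequency. This is an illustrative sketch only. The function names, the edge representation as `(predictor, phenotype)` tuples, and the example data are assumptions, and the network learning step that would produce the edges for each replicate is not shown.

```python
from collections import Counter


def edge_frequencies(replicate_edges):
    """Fraction of bootstrap replicates in which each directed edge appears."""
    counts = Counter(edge for edges in replicate_edges for edge in set(edges))
    total = len(replicate_edges)
    return {edge: count / total for edge, count in counts.items()}


def support_level(frequency, high=0.30, low=0.20):
    """Bin an edge frequency using the two thresholds from the table footnotes."""
    if frequency > high:
        return ">30%"
    if frequency > low:
        return ">20% (<=30%)"
    return "<=20%"


# Hypothetical edges learned in 10 replicates (the study used 200).
replicates = [{("tau1", "cac2")}] * 4 + [{("age2", "MIevent2")}] * 2 + [set()] * 4
for edge, freq in edge_frequencies(replicates).items():
    print(edge, freq, support_level(freq))
```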