Rui Tang, Jason P Sinnwell, Jia Li, David N Rider, Mariza de Andrade, Joanna M Biernacka.
Abstract
Random forest (RF) analysis of genetic data does not require specification of the mode of inheritance, and provides measures of variable importance that incorporate interaction effects. In this paper we describe RF-based approaches for assessment of gene and haplotype importance, and apply these approaches to a subset of the North American Rheumatoid Arthritis Consortium case-control data provided by Genetic Analysis Workshop 16. The RF analyses of 37 genes identified many of the same genes as logistic regression, but also suggested importance of certain single-nucleotide polymorphisms and genes that were not ranked highly by logistic regression. A new permutation method did not reveal strong evidence of gene-gene interaction effects in these data. Although RFs are a promising approach for genetic data analysis, extensions beyond simple single-nucleotide polymorphism analyses and modifications to improve computational feasibility are needed.
Year: 2009 PMID: 20018062 PMCID: PMC2795969 DOI: 10.1186/1753-6561-3-s7-s68
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Top ten ranked SNPs based on logistic regression p-values and RF VI (MDA)
| SNP (gene), logistic regression p-value | SNP (gene), RF VI (MDA) |
|---|---|
| rs2074488 ( | rs2249742 ( |
| rs9461680 ( | rs2523619 ( |
| rs2476601 ( | rs833069 ( |
| rs2523619 ( | rs2501787 ( |
| rs3093662 ( | rs10116271 ( |
| rs3761847 ( | rs2596503 ( |
| rs2156875 ( | rs12685344 ( |
| rs2395471 ( | rs2395471 ( |
| rs13207315 ( | rs3093662 ( |
| rs7026551 ( | rs2596501 ( |
Top five ranked genes based on alternative RF approaches
| Original RF, max VI | Original RF, mean VI | Haplotype RF, max VI | Haplotype RF, mean VI |
|---|---|---|---|
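The "max VI" and "mean VI" column headings suggest that SNP-level importances are summarized per gene before genes are ranked. The following is a minimal sketch of that idea, assuming a gene's score is simply the maximum or the mean of the VI values of its SNPs. The function names, SNP labels, gene labels, and VI values are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def gene_importance(snp_vi, snp_to_gene):
    """Summarize SNP-level VI per gene as max and mean (assumed definitions)."""
    per_gene = defaultdict(list)
    for snp, vi in snp_vi.items():
        per_gene[snp_to_gene[snp]].append(vi)
    return {
        gene: {"max_vi": max(values), "mean_vi": mean(values)}
        for gene, values in per_gene.items()
    }


def rank_genes(summary, key, top=5):
    """Return up to `top` gene names, ordered by the chosen summary, highest first."""
    return sorted(summary, key=lambda g: summary[g][key], reverse=True)[:top]


# Hypothetical toy values, for illustration only.
snp_vi = {"snp1": 0.5, "snp2": 0.0, "snp3": 0.3, "snp4": 0.25}
snp_to_gene = {"snp1": "GENE_A", "snp2": "GENE_A", "snp3": "GENE_B", "snp4": "GENE_B"}

summary = gene_importance(snp_vi, snp_to_gene)
by_max = rank_genes(summary, "max_vi")
by_mean = rank_genes(summary, "mean_vi")
```

In this toy input the two summaries disagree: one strong SNP puts GENE_A first by max VI, while GENE_B ranks first by mean VI because both of its SNPs carry moderate importance.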
Figure 1. Application of the gene-permutation method to investigate SNP and interaction importance in simulated data. Labels along the x-axis identify the permuted SNP. Darker shades of green represent a larger change in variable importance (DVI). The first column of each plot shows the changes in variable importance of all SNPs after permuting SNP1 (DVI1), the second column shows the changes in importance after permuting SNP2 (DVI2), etc. Thus, the diagonal shows the DVI for k = g, while the off-diagonal cells show the DVI for k ≠ g. A, SNP1 and SNP2 have marginal effects but no interaction effect. B, SNPs 1 and 2 interact to influence the probability of disease.
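The permutation loop behind such a plot can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the sign convention (importance before permutation minus importance after) is assumed, and `marginal_importance` is a simple stand-in score rather than an RF measure, so that the example runs with the standard library alone. In practice an RF importance such as MDA, recomputed after each permutation, would be passed as `importance_fn`. All names and the toy data are hypothetical.

```python
import random


def marginal_importance(X, y):
    """Stand-in score (NOT a random forest measure): absolute difference in
    mean genotype between cases (y == 1) and controls (y == 0), per SNP."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    return [
        abs(sum(r[j] for r in cases) / len(cases)
            - sum(r[j] for r in controls) / len(controls))
        for j in range(len(X[0]))
    ]


def dvi_matrix(X, y, importance_fn, seed=0):
    """dvi[g][k] = importance of SNP g before permuting SNP k, minus after
    (assumed sign convention). Column k corresponds to the permuted SNP."""
    rng = random.Random(seed)
    baseline = importance_fn(X, y)
    n_snps = len(X[0])
    dvi = [[0.0] * n_snps for _ in range(n_snps)]
    for k in range(n_snps):
        # Shuffle the genotypes of SNP k across subjects, leaving other SNPs intact.
        column = [row[k] for row in X]
        rng.shuffle(column)
        X_perm = [row[:k] + [v] + row[k + 1:] for row, v in zip(X, column)]
        permuted = importance_fn(X_perm, y)
        for g in range(n_snps):
            dvi[g][k] = baseline[g] - permuted[g]
    return dvi


# Hypothetical toy data: SNP 0 has a marginal effect, SNPs 1 and 2 are noise.
rng = random.Random(1)
y = [i % 2 for i in range(200)]
X = [
    [rng.choice([1, 2]) if label == 1 else rng.choice([0, 1]),
     rng.choice([0, 1, 2]),
     rng.choice([0, 1, 2])]
    for label in y
]
dvi = dvi_matrix(X, y, marginal_importance)
```

Because the stand-in score treats each SNP separately, every off-diagonal entry is zero by construction, which resembles the no-interaction pattern of panel A. Seeing off-diagonal changes like those described for panel B requires an importance measure that depends jointly on the SNPs, such as one from a refitted RF.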