Anne-Laure Boulesteix, Silke Janitza, Alexander Hapfelmeier, Kristel Van Steen, Carolin Strobl.
Abstract
In an interesting and quite exhaustive review of Random Forests (RF) methodology in bioinformatics, Touw et al. address, among other topics, the problem of detecting interactions between variables based on RF methodology. We feel that some important statistical concepts, such as 'interaction', 'conditional dependence' or 'correlation', are sometimes employed inconsistently in the bioinformatics literature in general and in the literature on RF in particular. In this letter to the Editor, we aim to clarify some of the central statistical concepts and point out some confusing interpretations concerning RF given by Touw et al. and other authors.
Keywords: conditional inference trees; conditional variable importance; correlation; interaction; random forest; statistics
Year: 2014 PMID: 24723569 PMCID: PMC4364067 DOI: 10.1093/bib/bbu012
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1: Idealized tree in the presence of two predictor variables, X1 and X2, with main effects only (no interaction). The bars at the bottom of the tree denote the proportion of observations with Y = 0 and Y = 1 in the respective leaves.
Figure 2: Idealized tree in the presence of two predictor variables, X1 and X2, with interaction. (A) Different predictor variables are selected on the left and on the right. (B) Splitting stops after the first split on the right but not on the left. (C) The same predictor variable is selected on the left and on the right, but the effect is different. The bars at the bottom of the tree denote the proportion of observations with Y = 0 and Y = 1 in the respective leaves.
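
The contrast between the two figure captions, main effects only versus interaction, can be illustrated with a small numerical sketch. It is not taken from the letter. It assumes binary predictors X1 and X2, a logistic model for P(Y = 1), and hypothetical coefficient values chosen only for illustration. It also measures interaction on the logit scale, which is one possible choice: whether an interaction is present generally depends on the scale on which effects are measured.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Inverse of the logistic function."""
    return math.log(p / (1.0 - p))


def leaf_probabilities(b0, b1, b2, b12=0.0):
    """P(Y = 1 | X1, X2) for binary X1, X2 under an assumed logistic model.

    All coefficients are hypothetical; b12 is the interaction term.
    Each (x1, x2) pair plays the role of one leaf of an idealized tree.
    """
    return {
        (x1, x2): sigmoid(b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2)
        for x1 in (0, 1)
        for x2 in (0, 1)
    }


def interaction_contrast(probs):
    """Effect of X2 at X1 = 1 minus effect of X2 at X1 = 0 (logit scale).

    Zero means the effect of X2 does not depend on X1 on this scale.
    """
    effect_at_x1_0 = logit(probs[(0, 1)]) - logit(probs[(0, 0)])
    effect_at_x1_1 = logit(probs[(1, 1)]) - logit(probs[(1, 0)])
    return effect_at_x1_1 - effect_at_x1_0


# Main effects only: the effect of X2 is the same in both X1 branches.
main_only = leaf_probabilities(b0=-1.0, b1=1.0, b2=1.5)

# With interaction: the effect of X2 differs between the X1 branches.
with_interaction = leaf_probabilities(b0=-1.0, b1=1.0, b2=1.5, b12=-3.0)

if __name__ == "__main__":
    for name, probs in (("main effects only", main_only),
                        ("with interaction", with_interaction)):
        print(name)
        for leaf, p in sorted(probs.items()):
            print(f"  X1={leaf[0]}, X2={leaf[1]}: P(Y=1) = {p:.3f}")
        print(f"  interaction contrast = {interaction_contrast(probs):.3f}")
```

In this sketch the contrast recovers the assumed interaction coefficient, so it is zero in the first setting and nonzero in the second. The leaf proportions in the second setting correspond loosely to the situation in which the same variable is split on in both branches but with a different effect.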