Merle Behr, Yu Wang, Xiao Li, Bin Yu.
Abstract
Random Forests (RFs) are at the cutting edge of supervised machine learning in terms of prediction performance, especially in genomics. Iterative RFs (iRFs) use a tree ensemble from iteratively modified RFs to obtain predictive and stable nonlinear or Boolean interactions of features. They have shown great promise for Boolean biological interaction discovery that is central to advancing functional genomics and precision medicine. However, theoretical studies into how tree-based methods discover Boolean feature interactions are missing. Inspired by the thresholding behavior in many biological processes, we first introduce a discontinuous nonlinear regression model, called the “Locally Spiky Sparse” (LSS) model. Specifically, the LSS model assumes that the regression function is a linear combination of piecewise constant Boolean interaction terms. Given an RF tree ensemble, we define a quantity called “Depth-Weighted Prevalence” (DWP) for a set of signed features S±. Intuitively speaking, DWP(S±) measures how frequently features in S± appear together in an RF tree ensemble. We prove that, with high probability, DWP(S±) attains a universal upper bound that does not involve any model coefficients, if and only if S± corresponds to a union of Boolean interactions under the LSS model. Consequently, we show that a theoretically tractable version of the iRF procedure, called LSSFind, yields consistent interaction discovery under the LSS model as the sample size goes to infinity. Finally, simulation results show that LSSFind recovers the interactions under the LSS model, even when some assumptions are violated.
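To make the LSS model concrete, the following is a minimal sketch of a regression function that is a linear combination of piecewise-constant Boolean interaction terms, as the abstract describes. The coefficient, feature indices, and thresholds below are illustrative choices, not values from the paper.

```python
import numpy as np

def lss_response(X, interactions):
    """Evaluate an LSS-style regression function on rows of X.

    `interactions` is a list of (beta, {feature_index: threshold}) pairs;
    each pair contributes beta * prod_j 1[x_j > threshold_j], i.e., a
    piecewise-constant Boolean interaction term. Illustrative sketch only.
    """
    y = np.zeros(len(X))
    for beta, terms in interactions:
        indicator = np.ones(len(X), dtype=bool)
        for j, tau in terms.items():
            indicator &= X[:, j] > tau  # thresholding behavior per feature
        y += beta * indicator
    return y

# Toy order-2 interaction: f(x) = 2 * 1[x_1 > 0.5] * 1[x_2 > 0.5]
rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 5))
y = lss_response(X, [(2.0, {0: 0.5, 1: 0.5})])
```

The response jumps between exactly two levels (0 and 2) over the feature space, which is the "locally spiky" discontinuity the model is named for.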
Keywords: consistency; decision trees; ensemble methods; interaction selection; interpretable machine learning
Year: 2022 PMID: 35609192 PMCID: PMC9295780 DOI: 10.1073/pnas.2118636119
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779
Fig. 1. Exemplary RF decision trees trained on data as in Eq. 9, illustrating the results of Theorem 2. (Upper) Response surface of the regression function, as in Eq. 2, with X1 on the x axis and X2 on the y axis. (Lower Left) A decision tree that splits on feature X1 at the root node, with the respective regions and conditional response surfaces for the left and right children of the root node. (Lower Right) A decision tree that splits on feature X2 at the root node. The red-marked decision paths contain all signed features from the basic signed interaction S± of an LSS model, as in Eq. 9. For both trees, if one starts at the root node and randomly goes left or right at every node, then the probability that the basic signed interaction S± appears on the path is 1/4. In contrast, for any other set of signed features, this probability is strictly smaller than 1/4. This provides a simple example of the more general result in Theorem 2.
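The random-walk probability described in the caption can be sketched directly: encode a small binary tree, walk left or right with probability 1/2 at each internal node, and ask how likely it is that the walk collects every signed feature in a target set. The tree encoding and the convention that going right at a split on feature j collects (j, '+') are assumptions for illustration, not the paper's exact DWP definition.

```python
from fractions import Fraction

def path_prob(node, target, collected=frozenset()):
    """Probability that a uniformly random root-to-leaf walk collects
    every signed feature in `target`.

    A node is a tuple (feature, left_child, right_child); leaves are None.
    Going left at a split on feature j collects (j, '-'); going right
    collects (j, '+'). Illustrative sketch of the caption's calculation.
    """
    if node is None:  # reached a leaf: success iff all targets collected
        return Fraction(1) if target <= collected else Fraction(0)
    j, left, right = node
    return Fraction(1, 2) * (
        path_prob(left, target, collected | {(j, '-')})
        + path_prob(right, target, collected | {(j, '+')})
    )

# A tree like Fig. 1 (Lower Left): split on X1 at the root,
# then on X2 in the right child; all other children are leaves.
tree = (1, None, (2, None, None))
p = path_prob(tree, {(1, '+'), (2, '+')})  # probability 1/4
```

Repeating this with the root split on X2 instead gives the same 1/4 for the basic signed interaction, while any other signed-feature set of the same order attains a strictly smaller probability, matching the caption.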