| Literature DB >> 31527280 |
José Marcio Luna, Efstathios D Gennatas, Lyle H Ungar, Eric Eaton, Eric S Diffenderfer, Shane T Jensen, Charles B Simone, Jerome H Friedman, Timothy D Solberg, Gilmer Valdes.
Abstract
The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization for the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it can also produce decision trees represented by hybrid models between CART and boosted stumps that outperform either of these approaches.
Keywords: CART; additive tree; decision tree; gradient boosting; interpretable machine learning
Year: 2019 PMID: 31527280 PMCID: PMC6778203 DOI: 10.1073/pnas.1816748116
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. A depiction of the continuum relating CART, GBS, and our AddTree. Each algorithm has been given the same 4 training instances (blue and red symbols); a symbol's size depicts its weight when used to train the adjacent node.
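The continuum in Fig. 1 arises from how instance weights are propagated to child nodes: instances falling on the opposite side of a split keep only a fraction of their weight, controlled by a mixing parameter (written `lam` below; the name and exact form are our illustration, not taken verbatim from the paper). Setting it to 0 gives CART's hard partition, and 1 gives GBS's fully shared sample. A minimal sketch of this weight update:

```python
def child_weights(weights, in_left, lam):
    """Split instance weights between the left and right children.

    lam = 0.0 -> hard partition (CART-like): each instance trains one side only.
    lam = 1.0 -> no partition (GBS-like): both children see every instance.
    Intermediate lam values give the hybrid models in between.
    """
    w_left = [w * (1.0 if L else lam) for w, L in zip(weights, in_left)]
    w_right = [w * (lam if L else 1.0) for w, L in zip(weights, in_left)]
    return w_left, w_right

# Four unit-weight instances, the first two falling on the left of the split.
w = [1.0, 1.0, 1.0, 1.0]
side = [True, True, False, False]
child_weights(w, side, 0.0)  # ([1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0])
child_weights(w, side, 1.0)  # ([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0])
```

At the extremes the update reproduces the two classic regimes depicted in the figure; intermediate values shrink, rather than zero out, the influence of off-side instances.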
Fig. 2. Frequency with which an algorithm (rows) achieves higher average BACC than each of the remaining algorithms (columns) under study across 83 binary classification tasks. Each learning algorithm was tuned to maximize BACC. Ties are excluded for clarity.
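Balanced accuracy (BACC), the comparison metric used throughout, is the unweighted mean of per-class recall, which prevents a majority-class predictor from looking strong on imbalanced tasks. A minimal sketch in plain Python (not the authors' evaluation code):

```python
def balanced_accuracy(y_true, y_pred):
    """BACC: the unweighted mean of recall over the classes in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# A classifier that always predicts class 1 on a 4:1 imbalanced sample:
y_true = [0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 1, 1]
# plain accuracy = 0.2, but BACC = (0.0 + 1.0) / 2 = 0.5
```

Equivalent functionality is available as `sklearn.metrics.balanced_accuracy_score`.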
Performance indexes for all of the learning algorithms under comparison across 83 PMLB classification tasks

| Performance measure | CART | AddTree | GBS | RF | GBM |
|---|---|---|---|---|---|
| Mean BACC | | | | | |
| SD BACC | | | | | |
| Mean F1 score | | | | | |
| SD F1 score | | | | | |
| Mean no. of nodes | 55.4 | 50.7 | 4,692.0 | 412,458.4 | 316,728.8 |
| SD no. of nodes | 86.2 | 84.1 | 1,614.1 | 752,654.8 | 700,981.6 |
Fig. 3. Bar chart showing the per-task difference in BACC between AddTree and CART. AddTree exhibits significantly better BACC than CART in 55 of the 83 PMLB classification tasks. The one outlier task in favor of CART is a synthetic parity problem, which is ill suited to any soft regression-like method such as AddTree and is better solved by the hard binary logic structure of CART.
Fig. 4. Estimate of the variance of AddTree as a function of the interpretability parameter. Notice that AddTree reduces the variance relative to CART. The error bars represent 1 SE.
| 1: If the stopping criterion is not met: |
| 2: Create a new subtree root |
| 3: Compute negative gradients: $r_i = -\partial L(y_i, F(x_i)) / \partial F(x_i)$ |
| 4: Fit weak classifier (a stump) $h(x)$ to $\{(x_i, r_i)\}$ using instance weights $w_i$ |
| 5: Let $t_L$ and $t_R$ be the left and right regions induced by the split of $h$ |
| 6: Line search: $\gamma = \arg\min_{\gamma} \sum_i w_i\, L(y_i, F(x_i) + \gamma h(x_i))$ |
| 7: Define the simple regressor: $f(x) = \gamma h(x)$ |
| 8: Update the current function estimation: $F(x) \leftarrow F(x) + f(x)$ |
| 9: Update the left and right subtree instance weights: $w_i^{L} = w_i(\mathbb{1}[x_i \in t_L] + \lambda\, \mathbb{1}[x_i \in t_R])$, $w_i^{R} = w_i(\lambda\, \mathbb{1}[x_i \in t_L] + \mathbb{1}[x_i \in t_R])$ |
| 10: If $t_L$ is not terminal, recurse on the left subtree with weights $w^{L}$ |
| 11: If $t_R$ is not terminal, recurse on the right subtree with weights $w^{R}$ |
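The recursion above can be sketched end to end in plain Python. This is our illustration under simplifying assumptions, not the authors' implementation: we use squared-error regression, so the gradient and line-search steps collapse into closed-form weighted means on each side of the stump, and all names (`fit_stump`, `add_tree`, `lam`) are ours.

```python
def fit_stump(X, y, w):
    """Weighted least-squares decision stump: pick the (feature, threshold)
    pair minimising weighted squared error, with one constant per side."""
    n, best = len(X), None
    for j in range(len(X[0])):
        for thr in sorted(set(row[j] for row in X))[:-1]:
            left = {i for i in range(n) if X[i][j] <= thr}

            def wmean(idx):
                sw = sum(w[i] for i in idx)
                return (sum(w[i] * y[i] for i in idx) / sw) if sw > 0 else 0.0

            cl, cr = wmean(left), wmean(set(range(n)) - left)
            err = sum(w[i] * (y[i] - (cl if i in left else cr)) ** 2
                      for i in range(n))
            if best is None or err < best[0]:
                best = (err, j, thr, cl, cr)
    return best[1:] if best else None  # (feature, threshold, c_left, c_right)


def add_tree(X, residual, w, lam, depth):
    """Grow one AddTree node: fit a stump to the current residuals, update
    them, then recurse with softly partitioned weights. lam = 0 recovers a
    CART-like hard partition; lam = 1 recovers GBS-like shared weights."""
    if depth == 0 or sum(w) <= 1e-12:
        return None
    stump = fit_stump(X, residual, w)
    if stump is None:
        return None
    j, thr, cl, cr = stump
    in_left = [x[j] <= thr for x in X]
    # Subtract this node's contribution so children fit what remains.
    new_res = [r - (cl if L else cr) for r, L in zip(residual, in_left)]
    wl = [wi * (1.0 if L else lam) for wi, L in zip(w, in_left)]
    wr = [wi * (lam if L else 1.0) for wi, L in zip(w, in_left)]
    return {"j": j, "thr": thr, "cl": cl, "cr": cr,
            "L": add_tree(X, new_res, wl, lam, depth - 1),
            "R": add_tree(X, new_res, wr, lam, depth - 1)}


def predict(tree, x):
    """Sum the per-node stump contributions along the root-to-leaf path."""
    out, node = 0.0, tree
    while node is not None:
        left = x[node["j"]] <= node["thr"]
        out += node["cl"] if left else node["cr"]
        node = node["L"] if left else node["R"]
    return out


# Toy usage: a step function on one feature, grown as a CART-like tree.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.0, 1.0, 1.0]
tree = add_tree(X, y, [1.0] * 4, lam=0.0, depth=2)
predict(tree, [0.5])  # -> 0.0
predict(tree, [2.5])  # -> 1.0
```

Note that prediction accumulates an additive contribution at every node on the path, which is what makes a single AddTree readable like CART while behaving like a boosted model along each branch.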