Christoph F Kurz, Werner Maier, Christian Rink.
Abstract
OBJECTIVE: Because it is impossible to know in advance which statistical learning algorithm will perform best on a given prediction task, it is common to use stacking methods to combine individual learners into a single, more powerful learner. Stacking algorithms are usually based on linear models, which can run into problems, especially when the learners' predictions are highly correlated. In this study, we develop a greedy algorithm for model stacking that overcomes this issue while remaining very fast and easy to interpret. We evaluate our greedy algorithm on seven data sets from various biomedical disciplines and compare it with linear stacking, genetic algorithm stacking, and a brute-force approach in different prediction settings. We further apply the algorithm to optimize the weighting of the individual domains (e.g., income, education) that make up the German Index of Multiple Deprivation (GIMD), so that the index is highly correlated with mortality.
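One way to realize a greedy stacking scheme is forward selection with replacement: start with no weight on any learner, then repeatedly add one unit of weight to whichever learner most improves the score of the averaged prediction. The sketch below assumes this scheme; it is not necessarily the authors' exact procedure, and the names `greedy_weights`, `neg_mse`, and `pearson`, the fixed `n_steps`, and the tie-breaking rule are all hypothetical. A plausible reason such a scheme copes with correlated predictions is that its weights stay non-negative and sum to one, so they cannot diverge in opposite directions the way unconstrained regression coefficients can.

```python
import math
from typing import Callable, Sequence

Score = Callable[[Sequence[float], Sequence[float]], float]


def neg_mse(pred: Sequence[float], y: Sequence[float]) -> float:
    """Negative mean squared error (higher is better)."""
    return -sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; assumes neither vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(va * vb)


def greedy_weights(preds: Sequence[Sequence[float]], y: Sequence[float],
                   score: Score, n_steps: int = 100) -> list[float]:
    """Hypothetical greedy stacking: forward selection with replacement.

    Each step adds one unit of weight to the learner whose inclusion gives
    the best score for the averaged prediction. Returned weights are
    selection counts / n_steps, so they are non-negative and sum to 1.
    """
    k, n = len(preds), len(y)
    counts = [0] * k
    total = [0.0] * n  # running sum of the selected prediction vectors
    for step in range(1, n_steps + 1):
        best_j, best_s = 0, -math.inf
        for j in range(k):
            # average of the (step - 1) earlier picks plus learner j
            cand = [(total[i] + preds[j][i]) / step for i in range(n)]
            s = score(cand, y)
            if s > best_s:  # ties go to the lower index
                best_j, best_s = j, s
        counts[best_j] += 1
        total = [total[i] + preds[best_j][i] for i in range(n)]
    return [c / n_steps for c in counts]


if __name__ == "__main__":
    # Two perfectly anti-correlated toy "learners" whose average equals y.
    a = [1.0, -1.0, 1.0, -1.0]
    b = [-1.0, 1.0, -1.0, 1.0]
    y = [0.0, 0.0, 0.0, 0.0]
    print(greedy_weights([a, b], y, neg_mse, n_steps=10))  # [0.5, 0.5]
```

For a domain-weighting task like the GIMD one, `preds` would hold one score vector per domain and `score` could be `pearson` against mortality; because correlation is scale-invariant, dividing by `step` does not change which domain is picked.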
Keywords: Greedy algorithm; Machine learning; Model ensembling; Optimization
Year: 2020 PMID: 32051022 PMCID: PMC7017540 DOI: 10.1186/s13104-020-4931-7
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Fig. 1 Schematic overview of both example cases for the greedy weighting algorithm. Numbers in the plots are for illustration only. a Logistic regression, Random Forest, and naive Bayes learners are combined into a more accurate ensemble learner for classification. For regression tasks, Random Forest, linear regression, and support vector regression were used. b The GIMD is a weighted combination of different domains of deprivation.
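Panel b can be written as a weighted sum. As a hedged formalization, assuming region-level domain scores $x_{r,d}$ for region $r$ and domain $d = 1, \dots, D$, and non-negative weights that sum to one (the notation and the constraint are assumptions, not taken from the paper), the domain-weighting task amounts to:

```latex
\mathrm{GIMD}_r(w) = \sum_{d=1}^{D} w_d \, x_{r,d},
\qquad
\hat{w} = \operatorname*{arg\,max}_{w_d \ge 0,\ \sum_{d} w_d = 1}
  \operatorname{corr}\!\left(\mathrm{GIMD}(w),\ \mathrm{SMR}\right)
```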
Fig. 2 Results of the different weighting approaches for all data sets. Classification tasks use Random Forest (RF), naive Bayes (NB), and logistic regression (LR) learners. Regression tasks are based on Random Forest, linear regression (LinR), and support vector regression (SVR). Stacking is based on the greedy, genetic, linear, and brute-force methods.
Comparison of different weighting approaches for the correlation of the weighted GIMD domains with the standardized mortality ratio (SMR)
| Weighting approach | SMR correlation | Computation time |
|---|---|---|
| Expert | 0.578 | NA |
| Brute force | 0.616 | 23 h^a |
| Greedy | 0.615 | |
| Genetic | 0.614 | |
| QP | 0.449 | |

^a This computation was performed on a high-performance computer.
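The long brute-force run reflects how quickly exhaustive search over weights grows. A minimal sketch, assuming the brute-force approach scores every weight vector on a regular grid over the simplex (non-negative multiples of `1/steps` that sum to one); the grid resolution, the stars-and-bars enumeration, and the names `simplex_grid` and `brute_force_weights` are hypothetical, not the authors' documented setup. The number of grid points is $\binom{\text{steps}+k-1}{k-1}$ for $k$ learners or domains.

```python
import itertools
from math import comb
from typing import Callable, Iterator, Sequence

Score = Callable[[Sequence[float], Sequence[float]], float]


def neg_mse(pred: Sequence[float], y: Sequence[float]) -> float:
    """Negative mean squared error (higher is better)."""
    return -sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def simplex_grid(k: int, steps: int) -> Iterator[tuple[float, ...]]:
    """Yield every k-vector of non-negative multiples of 1/steps summing to 1.

    Stars and bars: choosing k-1 bar positions among steps+k-1 slots
    splits `steps` units into k (possibly empty) groups.
    """
    slots = steps + k - 1
    for bars in itertools.combinations(range(slots), k - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)  # units between consecutive bars
            prev = b
        counts.append(slots - prev - 1)  # units after the last bar
        yield tuple(c / steps for c in counts)


def brute_force_weights(preds: Sequence[Sequence[float]], y: Sequence[float],
                        score: Score, steps: int = 10) -> tuple[float, ...]:
    """Score every grid point and return the best weight vector."""
    best_w: tuple[float, ...] = ()
    best_s = float("-inf")
    for w in simplex_grid(len(preds), steps):
        pred = [sum(wj * p[i] for wj, p in zip(w, preds)) for i in range(len(y))]
        s = score(pred, y)
        if s > best_s:
            best_w, best_s = w, s
    return best_w


if __name__ == "__main__":
    # Grid size at a 0.01 resolution for a growing number of components.
    for k in (3, 5, 7):
        print(k, comb(100 + k - 1, k - 1))
```

Each grid point costs a full evaluation of the score, whereas the greedy sketch needs only about `n_steps * k` evaluations, which is why a greedy search can approach the brute-force optimum at a small fraction of the cost.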