Laura Acion, Diana Kelmansky, Mark van der Laan, Ethan Sahker, DeShauna Jones, Stephan Arndt.
Abstract
There are several methods for building prediction models. The wealth of currently available modeling techniques usually forces the researcher to judge, a priori, which method is likely to perform best. Super learning (SL) is a methodology that facilitates this decision by combining all identified prediction algorithms pertinent to a particular prediction problem. SL generates a final model that is at least as good as any of the other models considered for predicting the outcome. The overarching aim of this work is to introduce SL to analysts and practitioners. This work compares the performance of logistic regression, penalized regression, random forests, deep learning neural networks, and SL in predicting successful substance use disorder (SUD) treatment. A nationwide database including 99,013 SUD treatment patients was used. All algorithms were evaluated using the area under the receiver operating characteristic curve (AUC) in a test sample that was not included in the training sample used to fit the prediction models. AUC for the models ranged between 0.793 and 0.820. SL was superior to all but one of the algorithms compared. An explanation of the SL steps is provided. SL is the first step in targeted learning, an analytic framework that yields doubly robust effect estimation and inference with fewer assumptions than the usual parametric methods. Different aspects of SL depending on the context, its function within the targeted learning framework, and the benefits of this methodology in the addiction field are discussed.
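The general idea described in the abstract (fit several candidate algorithms, combine them through cross-validation, and compare AUC on a held-out test sample) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn is available, uses its `StackingClassifier` as a stand-in for SL, replaces the patient data with synthetic data, and uses a small, arbitrary set of candidate learners and settings. The 20% test split mirrors the 19,802 of 99,013 patients held out in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): the study's patient records are not used here.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, random_state=0
)

# Hold out 20% of the sample as a test set that is never used for fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Candidate learners (illustrative choices, not the study's exact library).
candidates = [
    ("logistic", LogisticRegression(C=1e6, max_iter=2000)),  # nearly unpenalized
    ("ridge", LogisticRegression(C=1.0, max_iter=2000)),     # L2-penalized
    ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
]

results = {}

# Fit each candidate on its own and record its test-set AUC.
for name, model in candidates:
    model.fit(X_train, y_train)
    results[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Stacked ensemble: 5-fold cross-validated predictions from each candidate
# become the inputs of a meta-learner that weights the candidates.
ensemble = StackingClassifier(
    estimators=candidates,
    final_estimator=LogisticRegression(),
    cv=5,
    stack_method="predict_proba",
)
ensemble.fit(X_train, y_train)
results["super_learner"] = roc_auc_score(
    y_test, ensemble.predict_proba(X_test)[:, 1]
)

# Print the models from highest to lowest test-set AUC.
for name, auc_value in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} AUC = {auc_value:.3f}")
```

On any single finite test set the ensemble is not guaranteed to rank first, which is consistent with the abstract's report that SL was superior to all but one of the algorithms compared.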
Year: 2017 PMID: 28394905 PMCID: PMC5386258 DOI: 10.1371/journal.pone.0175383
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Analytic workflow.
Sample characteristics (N = 99,013).
| Characteristic | Total N (%) | |
|---|---|---|
| Yes | 44,748 (45.2%) | |
| No | 54,265 (54.8%) | |
| Male | 77,123 (77.9%) | |
| Female | 21,890 (22.1%) | |
| Puerto Rican | 31,047 (31.4%) | |
| Mexican | 25,190 (25.4%) | |
| Cuban | 2,683 (2.7%) | |
| Other/Unspecified | 40,093 (40.5%) | |
| 18–20 | 11,479 (11.6%) | |
| 21–24 | 16,886 (17.1%) | |
| 25–29 | 19,625 (19.8%) | |
| 30–34 | 15,121 (15.3%) | |
| 35–39 | 11,725 (11.8%) | |
| 40–44 | 9,436 (9.5%) | |
| 45–49 | 6,988 (7.1%) | |
| 50–54 | 4,172 (4.2%) | |
| 55+ | 3,581 (3.6%) | |
| <9 | 17,170 (17.3%) | |
| 9–11 | 31,507 (31.8%) | |
| 12 | 34,329 (34.7%) | |
| 13–15 | 13,062 (13.2%) | |
| 16+ | 2,945 (3.0%) | |
| Full Time | 34,586 (34.9%) | |
| Part Time | 10,392 (10.5%) | |
| Unemployed | 29,635 (29.9%) | |
| Not in Labor Force | 24,400 (24.6%) | |
| Alcohol | 50,782 (51.3%) | |
| Marijuana | 26,269 (26.5%) | |
| Cocaine | 8,554 (8.6%) | |
| Non-Prescription Opiates | 7,791 (7.9%) | |
| Methamphetamine | 2,312 (2.3%) | |
| Prescription Opiates and Synthetics | 2,145 (2.2%) | |
| Hallucinogens | 293 (0.3%) | |
| Other Sedatives | 320 (0.3%) | |
| Other Stimulants | 234 (0.2%) | |
| Other | 313 (0.3%) | |
| Not in the past month | 41,529 (41.9%) | |
| 1–3 times past month | 20,766 (21.0%) | |
| 1–2 times past week | 11,770 (11.9%) | |
| 3–6 times past week | 8,272 (8.4%) | |
| Daily | 16,676 (16.8%) | |
| <10 | 4,825 (4.9%) | |
| 12–14 | 17,577 (17.8%) | |
| 15–17 | 31,306 (31.6%) | |
| 18–20 | 22,949 (23.2%) | |
| 21–24 | 10,895 (11.0%) | |
| 25–29 | 5,768 (5.8%) | |
| 30–34 | 2,535 (2.6%) | |
| 35–39 | 1,573 (1.6%) | |
| 40–44 | 816 (0.8%) | |
| 45–49 | 435 (0.4%) | |
| 50–54 | 211 (0.2%) | |
| 55+ | 123 (0.1%) | |
| Alcohol Only | 34,827 (35.2%) | |
| Other Drugs Only | 29,887 (30.2%) | |
| Alcohol and Drugs | 34,299 (34.6%) | |
| Self | 16,910 (17.1%) | |
| Alcohol/Drug Abuse Care Provider | 3,655 (3.7%) | |
| Other Health Care Provider | 4,013 (4.1%) | |
| School | 341 (0.3%) | |
| Employer | 1,350 (1.4%) | |
| Other Community Referral | 15,503 (15.7%) | |
| Criminal Justice Referral | 57,241 (57.8%) | |
| 1–30 | 19,942 (20.1%) | |
| 31–60 | 15,296 (15.4%) | |
| 61–90 | 12,476 (12.6%) | |
| 91–120 | 12,397 (12.5%) | |
| 121 or longer | 38,902 (39.3%) |
AUC in the test set (N = 19,802) for each algorithm and algorithm parametrization used.
| Model | AUC | |
|---|---|---|
| Super Learning | 0.820 | 0.165 |
| Random Forests All Predictors | 0.816 | 0.173 |
| Lasso All Predictors + 2-Way Interactions | 0.805 | 0.185 |
| Lasso All Predictors | 0.805 | 0.185 |
| Elastic Net All Predictors | 0.805 | 0.185 |
| Logistic Regression All Predictors | 0.805 | 0.185 |
| Ridge Regression All Predictors | 0.805 | 0.185 |
| ANN Top 10 Predictors | 0.805 | 0.185 |
| Elastic Net All Predictors + 2-Way Interactions | 0.804 | 0.186 |
| ANN All Predictors | 0.803 | 0.186 |
| Lasso Top 10 Predictors | 0.801 | 0.189 |
| Elastic Net Top 10 Predictors | 0.801 | 0.189 |
| Ridge Regression Top 10 Predictors | 0.801 | 0.189 |
| Logistic Regression Top 10 Predictors | 0.801 | 0.189 |
| Random Forests Top 10 Predictors | 0.797 | 0.191 |
| Ridge Regr. All Predictors + 2-Way Interactions | 0.793 | 0.197 |
| Logistic Regr. All Predictors + 2-Way Interactions | 0.793 | 0.197 |
Fig 2. 95% confidence intervals for the AUC of each model compared.
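For readers less familiar with the metric reported above, the AUC equals the probability that a randomly chosen positive case (here, a successful treatment) receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition in plain Python. The function name `pairwise_auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(y_true, scores) if label == 1]
    negatives = [s for label, s in zip(y_true, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example (made-up labels and predicted probabilities).
labels = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(labels, predicted))  # 0.75: three of four pairs ranked correctly
```

The double loop is quadratic in the sample size, so for a test set of the size used here a rank-based implementation would be preferable in practice; the explicit pairs are shown only because they make the definition visible.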