| Literature DB >> 32435212 |
Ronald C Kessler1, Mark S Bauer2,3, Todd M Bishop4, Olga V Demler5,6, Steven K Dobscha7,8, Sarah M Gildea1, Joseph L Goulet9,10, Elizabeth Karras4, Julie Kreyenbuhl11,12, Sara J Landes13,14, Howard Liu1,4, Alex R Luedtke15,16, Patrick Mair17, William H B McAuliffe1, Matthew Nock17, Maria Petukhova1, Wilfred R Pigeon4,18, Nancy A Sampson1, Jordan W Smoller19, Lauren M Weinstock20, Robert M Bossarte4,21.
Abstract
There is a very high suicide rate in the year after psychiatric hospital discharge. Intensive postdischarge case management programs can address this problem but are not cost-effective for all patients. This issue can be addressed by developing a risk model to predict which inpatients might need such a program. We developed such a model for the 391,018 short-term psychiatric hospital admissions of US veterans in Veterans Health Administration (VHA) hospitals in 2010-2013. Records were linked with the National Death Index to determine suicide within 12 months of hospital discharge (n=771). The Super Learner ensemble machine learning method was used to predict these suicides for time horizons between 1 week and 12 months after discharge in a 70% training sample. Accuracy was validated in the remaining 30% holdout sample. Predictors included VHA administrative variables and small-area geocode data linked to patient home addresses. The models had AUC=.79-.82 for time horizons between 1 week and 6 months and AUC=.74 for 12 months. An analysis of operating characteristics showed that 22.4%-32.2% of patients who died by suicide would have been reached if intensive case management had been provided to the 5% of patients with highest predicted suicide risk. Positive predictive value (PPV) at this threshold ranged from 1.2% over 12 months to 3.8% per case manager year over 1 week. Focusing on the low end of the risk spectrum, the 40% of patients classified as having lowest risk accounted for 0%-9.7% of suicides across time horizons. Variable importance analysis shows that 51.1% of model performance is due to psychopathological risk factors, 26.2% to social determinants of health, 14.8% to prior history of suicidal behaviors, and 6.6% to physical disorders. The paper closes with a discussion of next steps in refining the model and prospects for developing a parallel precision treatment model.
Keywords: intensive case management; machine learning; predictive analytics; suicide; super learner
Year: 2020 PMID: 32435212 PMCID: PMC7219514 DOI: 10.3389/fpsyt.2020.00390
Source DB: PubMed Journal: Front Psychiatry ISSN: 1664-0640 Impact factor: 4.157
Overview of the algorithms used in the Super Learner ensemble.
| Algorithm | R package | Description |
|---|---|---|
| Logistic regression | stats | Traditional parametric logistic regression. Prone to overfitting if independent variables are highly collinear. Optimal functional form of independent variables unknown (e.g., linear versus nonlinear). |
| Elastic net regularization | glmnet | Penalized regression that reduces overfitting due to collinear independent variables. Ridge regression shrinks coefficients for collinear independent variables toward zero but does not fully eliminate any independent variable. Elastic net allows a range of penalties in which coefficients for collinear independent variables are shrunk toward zero (retaining their contributions to the predicted probability) and/or to zero (eliminating their contributions to the predicted probability); the mixing-parameter penalty (alpha) is set between .01 and .99. Least Absolute Shrinkage and Selection Operator (LASSO) regression shrinks coefficients for collinear independent variables to zero, eliminating their contributions to the predicted probability. |
| Random forest decision trees | ranger | Decision tree methods capture interactions and nonlinear associations. Independent variables are partitioned (based on values) and stacked to build decision trees that are ensembled into an aggregate "forest." Random forest builds numerous trees in bootstrapped samples and generates an aggregate tree by averaging across trees (reducing overfitting). Suitable for large data sets, but individual trees may be unstable and prone to overfitting. |
| Bayesian additive regression trees | bartMachine | Bayesian trees are based on an underlying probability model: priors for the tree structure and a likelihood for the data in terminal nodes. The aggregate tree is generated by averaging across tree posteriors (reducing overfitting). |
| Extreme gradient boosting | xgboost | Gradient-boosted decision tree algorithm. Final predictions are formed from models built sequentially (using a gradient descent algorithm to minimize loss) to resolve the residual error of the existing models. |
| Support vector machines | ksvm | Treats independent variables as dimensions in a high-dimensional space and attempts to identify the best hyperplane separating the sample into classes (e.g., cases and noncases). The goal is to find the hyperplane with the maximum margin between the two closest points in space. A linear kernel captures linear associations, but alternate kernels can be used to capture nonlinearities (polynomial and radial basis kernels were used here). |
| — Linear kernel |  |  |
| — Polynomial kernel |  |  |
| — Radial kernel |  |  |
| Neural networks | nnet | Connections between predictors and the outcome are modeled as a network in which predictors affect the outcome through intermediate layers. Weights are assigned to the connections. Captures interactions and nonlinear associations. Low interpretability. |
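The Super Learner idea behind the table above can be sketched in a few lines: base learners are fit with cross-validation, and their out-of-fold predicted probabilities are combined by a meta-learner. This is an illustrative Python/scikit-learn sketch, not the authors' implementation (which used the R packages listed above, e.g., glmnet, ranger, xgboost, nnet); the synthetic data and all parameter values are assumptions chosen only to mimic a rare-event outcome.

```python
# Illustrative Super Learner-style stacked ensemble (scikit-learn analogue).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a rare outcome (~2% positives), not the VHA data.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.98], random_state=0)
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

base_learners = [
    # Parametric baseline (analogous to stats::glm).
    ("logit", LogisticRegression(max_iter=1000)),
    # Elastic net penalty (analogous to glmnet); l1_ratio mixes L1/L2.
    ("enet", LogisticRegression(penalty="elasticnet", solver="saga",
                                l1_ratio=0.5, max_iter=2000)),
    # Bagged decision trees (analogous to ranger).
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
]

# The meta-learner is fit on 5-fold out-of-fold predicted probabilities,
# which is the cross-validated stacking step at the core of Super Learner.
super_learner = StackingClassifier(estimators=base_learners,
                                   final_estimator=LogisticRegression(),
                                   cv=5, stack_method="predict_proba")
super_learner.fit(X_tr, y_tr)
auc = roc_auc_score(y_ho, super_learner.predict_proba(X_ho)[:, 1])
print(f"holdout AUC = {auc:.2f}")
```

The design point is that the meta-learner never sees in-fold predictions, so a base learner that overfits the training data receives a correspondingly small weight in the ensemble.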
Figure 1Monthly suicide hazard rates and cumulative incidence rates over the 12 months after psychiatric hospital discharge in (A) the training sample (January 1, 2010–October 22, 2012) and (B) holdout sample (October 23, 2012–December 31, 2013).
Area under the receiver operating characteristic curve (AUC) of the Super Learner models developed in the training sample for each time horizon when applied in the holdout sample to predict suicides over each of the five time horizons.
| Model development horizon (training sample) | Time horizon for prediction in the holdout sample |  |  |  |  |
|---|---|---|---|---|---|
|  | 1 week | 1 month | 3 months | 6 months | 12 months |
| 1-week | .67 | .63 | .61 | .62 | .60 |
| 1-month | .71 | .68 | .70 | .70 | .67 |
| 3-month | .77 | .76 | .73 | .72 | .69 |
| 6-month | .75 | .77 | .73 | .74 | .71 |
| 12-month | .79 | .82 | .78 | .80 | .74 |
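The cross-horizon evaluation in the table above amounts to scoring one set of predicted risks against outcome labels redefined at each horizon. The sketch below illustrates that calculation on synthetic data; `days_to_suicide` and `scores` are hypothetical stand-ins, not the study's variables.

```python
# Evaluating one risk score against outcomes defined at several horizons.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 10_000
# Hypothetical days from discharge to suicide (inf = no suicide observed).
event = rng.random(n) < 0.05
days_to_suicide = np.where(event, rng.integers(1, 366, size=n), np.inf)
# Hypothetical model scores, loosely associated with the outcome.
scores = rng.random(n) + 0.8 * np.isfinite(days_to_suicide)

# Each horizon just re-dichotomizes the same time-to-event outcome.
horizons = {"1 week": 7, "1 month": 30, "3 months": 91,
            "6 months": 182, "12 months": 365}
aucs = {label: roc_auc_score(days_to_suicide <= d, scores)
        for label, d in horizons.items()}
for label, auc_h in aucs.items():
    print(f"{label}: AUC = {auc_h:.2f}")
```

Because shorter horizons reclassify later deaths as non-cases, AUC can shift across columns even though the underlying scores are unchanged, which is the pattern visible in the table.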
Figure 2Receiver operating characteristic (ROC) curve for the best Super Learner model (to predict suicides within 12 months of hospital discharge) applied in the holdout sample to predict suicides over each of the five time horizons.
Operating characteristics at a range of thresholds of the best Super Learner model (developed to predict suicides within 12 months of hospital discharge) applied in the holdout sample to predict suicides over each of the five time horizons.
| Threshold P | SN |  | SP |  | PPV |  | Adjusted PPV |  |
|---|---|---|---|---|---|---|---|---|
|  | % | (SE) | % | (SE) | S/100 | (SE) | % | (SE) |
| 1-week horizon |  |  |  |  |  |  |  |  |
| .05 | 24.1 | (0.1) | 94.9 | (0.1) | 0.12 | (0.01) | 6.1 | (0.5) |
| .10 | 44.8 | (0.2) | 89.9 | (0.1) | 0.11 | (0.01) | 5.7 | (0.5) |
| .20 | 55.2 | (0.2) | 79.9 | (0.1) | 0.07 | (0.01) | 3.5 | (0.4) |
| .60 | 100.0 | (0.0) | 39.9 | (0.1) | 0.04 | (0.01) | 2.1 | (0.3) |
| 1-month horizon |  |  |  |  |  |  |  |  |
| .05 | 32.2 | (0.1) | 94.9 | (0.1) | 0.3 | (0.0) | 3.8 | (0.2) |
| .10 | 52.5 | (0.2) | 89.9 | (0.1) | 0.3 | (0.0) | 3.1 | (0.2) |
| .20 | 66.1 | (0.1) | 79.9 | (0.1) | 0.2 | (0.0) | 2.0 | (0.1) |
| .60 | 98.3 | (0.0) | 39.9 | (0.1) | 0.1 | (0.0) | 1.0 | (0.1) |
| 3-month horizon |  |  |  |  |  |  |  |  |
| .05 | 26.9 | (0.1) | 94.9 | (0.1) | 0.5 | (0.0) | 2.1 | (0.1) |
| .10 | 46.2 | (0.2) | 89.9 | (0.1) | 0.5 | (0.0) | 1.9 | (0.1) |
| .20 | 63.0 | (0.1) | 79.9 | (0.1) | 0.3 | (0.0) | 1.3 | (0.1) |
| .60 | 95.8 | (0.1) | 39.9 | (0.1) | 0.2 | (0.0) | 0.6 | (0.0) |
| 6-month horizon |  |  |  |  |  |  |  |  |
| .05 | 26.4 | (0.1) | 94.9 | (0.1) | 0.9 | (0.0) | 1.8 | (0.1) |
| .10 | 40.4 | (0.1) | 89.9 | (0.1) | 0.7 | (0.0) | 1.4 | (0.0) |
| .20 | 61.5 | (0.1) | 79.9 | (0.1) | 0.5 | (0.0) | 1.1 | (0.0) |
| .60 | 93.8 | (0.1) | 39.9 | (0.1) | 0.3 | (0.0) | 0.6 | (0.0) |
| 12-month horizon |  |  |  |  |  |  |  |  |
| .05 | 22.4 | (0.1) | 94.9 | (0.1) | 1.2 | (0.0) | 1.2 | (0.0) |
| .10 | 35.0 | (0.1) | 90.0 | (0.1) | 1.0 | (0.0) | 1.0 | (0.0) |
| .20 | 55.3 | (0.2) | 80.0 | (0.1) | 0.8 | (0.0) | 0.8 | (0.0) |
| .60 | 90.3 | (0.1) | 40.0 | (0.1) | 0.4 | (0.0) | 0.4 | (0.0) |
Threshold P, the proportion of patients classified at or above the clinical threshold based on their predicted probabilities of suicide; SN, sensitivity; SP, specificity; PPV, positive predictive value, stated in terms of suicides expected per 100 patients over the time horizon; S, suicides; Adjusted PPV, PPV adjusted for the length of the time horizon to reflect the expected ratio of suicides to the number of person-years of intervention over that time horizon; SE, standard error.
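The operating characteristics above follow from a simple 2x2 classification once the top proportion (threshold P) of patients by predicted risk is flagged. A minimal sketch of that calculation, on toy data (the event rate, score distribution, and function name are illustrative assumptions, not the study's values):

```python
# Sensitivity, specificity, and PPV when the top `threshold_p` share of
# patients (by predicted risk) is flagged for intervention.
import numpy as np

def operating_characteristics(y_true, risk_scores, threshold_p):
    """Return (SN, SP, PPV) for a flag-the-top-P classification rule."""
    cutoff = np.quantile(risk_scores, 1 - threshold_p)
    flagged = risk_scores >= cutoff
    tp = np.sum(flagged & (y_true == 1))   # suicides correctly flagged
    fn = np.sum(~flagged & (y_true == 1))  # suicides missed
    tn = np.sum(~flagged & (y_true == 0))
    fp = np.sum(flagged & (y_true == 0))
    sn = tp / (tp + fn)    # sensitivity: share of suicides reached
    sp = tn / (tn + fp)    # specificity
    ppv = tp / (tp + fp)   # suicides per flagged patient
    return sn, sp, ppv

# Toy data: 2% event rate, scores mildly predictive of the outcome.
rng = np.random.default_rng(1)
y = (rng.random(20_000) < 0.02).astype(int)
scores = rng.random(20_000) + 0.8 * y
sn, sp, ppv = operating_characteristics(y, scores, threshold_p=0.05)
print(f"SN={sn:.1%}  SP={sp:.1%}  PPV={ppv:.1%}")
```

The table's Adjusted PPV additionally rescales PPV by the horizon length (e.g., roughly x52 for the 1-week horizon) so that thresholds can be compared in suicides per person-year of case management.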
Predictor variable importance1 overall by category and for the predictors in the top 10, 11–25, and 26–50 in the best Super Learner model (to predict suicides within 12 months of hospital discharge)2.
| 4 in the top 10: 1Y suicide attempt (D); intake suicide attempt (D); 30D and 2Y suicide attempt (D) |
| 3 in the top 10: LT psychiatric hospitalizations (C); LT outpx cocaine dependence (D); LT outpx drug dependence (C) |
| 2 in the top 10: Non-Hispanic Black (D); Age (C) |
| 1 in the top 11–25: 2Y housing problem (C) |
| 1 in the top 10: BG % Non-Hispanic White x Px Non-Hispanic White (C) |
| 1 in the top 11–25: LT pain diagnosis (D) |
| 0 in the top 50 |
| 0 in the top 50 |
ICD-9-CM, International Classification of Diseases, Ninth Revision, Clinical Modification; FDA, US Food and Drug Administration.
1Variable importance was defined in terms of the Extreme Gradient Boosting (XGBoost) “gain” measure (77). We did not consider time since hospital discharge, which was category (iv) in our list of predictor categories, because the model was designed to predict suicide at any time in the 12 months after hospital discharge.
2Geocode: BG = The small-area geocode variable was defined over the Block Group of the patient’s residence; County = The small-area geocode variable was defined over the County of the patient’s residence; Variable metric: D = A yes-no dichotomous variable; C = A continuous count variable; Q = A truncated continuous variable transformed to quintiles of the count distribution; Time: 30D = The predictor was assessed over the 30 days before hospital admission; 1Y = The predictor was assessed over the 1 year (365 days) before hospital admission; 2Y = The predictor was assessed over the 2 years (730 days) before hospital admission; 3Y = The predictor was assessed over the 3 years (1,095 days) before hospital admission; LT = The predictor was assessed over the lifetime of the patient’s contact with the VHA system beginning January 1, 2000; Treatment sector: Intake = At the time of the focal hospitalization; ED = Emergency department; Inpx = Psychiatric inpatient; Outpx = Any outpatient treatment; Specialty outpx = Outpatient treatment by a mental health treatment provider; PCP outpx = Outpatient treatment by anyone other than a mental health treatment provider; No mention of treatment sector = Aggregation of the diagnosis or treatment across all sectors.
3High-risk physical disorders were defined based on Ahmedani et al. (32) as asthma, back pain, cancer (esophageal; head and neck; Hodgkin lymphoma; lung; mesothelioma; pancreatic; prostate; stomach; testicular), congestive heart failure, chronic obstructive pulmonary disease, diabetes mellitus, epilepsy, fibromyalgia, heart disease, HIV/AIDS, hypertension, migraine, multiple sclerosis, osteoporosis, Parkinson’s disease, psychogenic pain, renal disease, sleep disorders, and traumatic brain injury.