Jose R Zubizarreta1,2,3, John C Umhau4, Patricia A Deuster5, Lisa A Brenner6,7, Andrew J King1, Maria V Petukhova1, Nancy A Sampson1, Boris Tizenberg8, Sanjaya K Upadhyaya8, Jill A RachBeisel9, Elizabeth A Streeten10, Ronald C Kessler1, Teodor T Postolache7,8,11.
Abstract
OBJECTIVES: To illustrate the use of machine learning methods to search for heterogeneous effects of a target modifiable risk factor on suicide in observational studies. The illustration focuses on a secondary analysis of a matched case-control study of vitamin D deficiency predicting subsequent suicide.
Keywords: causal forest algorithm; heterogeneity of treatment effects (HTE); lasso penalized regression; precision medicine; prescriptive predictors; suicide; super learner
Year: 2021 PMID: 34739164 PMCID: PMC8886287 DOI: 10.1002/mpr.1897
Source DB: PubMed Journal: Int J Methods Psychiatr Res ISSN: 1049-8931 Impact factor: 4.035
Algorithms used in the super learner ensemble
| Algorithm | Description |
|---|---|
| I. Super learner | Super learner is an ensemble machine learning approach that uses cross‐validation (CV) to select a weighted combination of predicted outcome scores across a collection of candidate algorithms (learners) to yield an optimal combination according to a pre‐specified criterion that performs at least as well as the best component algorithm. |
| II. Linear algorithms in the super learner library | |
| A. Generalized linear models | Maximum likelihood estimation with logistic link function. |
| B. Elastic Net | Elastic net is a regularization method that minimizes the problem of overlap (collinearity) among predictors by explicitly penalizing over‐fitting with a composite penalty that mixes the lasso (L1) and ridge (L2) penalties, governed by an overall strength λ and a mixing parameter. |
| C. Adaptive splines | Adaptive spline regression flexibly captures both linear and piecewise nonlinear associations, as well as interactions among them, by joining linear segments (splines) of varying slopes smoothly into piece‐wise curves (basis functions). The final fit is built using a stepwise procedure that selects the optimal combination of basis functions. |
| D. Adaptive polynomial splines | Adaptive polynomial splines are like adaptive splines but differ in the order in which basis functions (e.g., linear vs. nonlinear) are added to build the final model. |
| III. Tree‐based algorithms | |
| A. Bagging | Random forest. Decision trees are built by recursively partitioning the independent variables (on contiguous value ranges), and many such short trees are combined in an ensemble to create an aggregate “forest”. Random forest grows numerous trees in bootstrapped samples and averages their predictions, thereby reducing over‐fitting. |
| B. Gradient boosting | Gradient boosting algorithms build a sequential ensemble of shallow successive regression trees that iteratively learn the residuals from prior trees. This is a flexible method, where the number of trees, interaction depth, and shrinkage are leveraged to build flexible models. |
| C. Extreme gradient boosting | A fast and efficient implementation of gradient boosting. |
| D. DBARTS | Fits Bayesian additive regression trees. |
Abbreviation: DBARTS, Discrete Bayesian Additive Regression Trees Sampler.
Each linear algorithm was estimated separately with five different lasso screeners where dfmax = 10, 15, 20, 30, and all predictors. Each tree algorithm was estimated separately with five different ranger screeners for number of predictors equal to 10, 15, 20, 30, and all predictors. Hyperparameter tuning was achieved by treating different specifications of individual algorithms as separate learners in the ensemble, as detailed in the body of the table.
Hyperparameters: Default values were used unless otherwise noted.
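The super learner's weighting step described above can be sketched directly. A minimal illustration, assuming scipy is available: non-negative least squares followed by normalization is a common practical stand-in for the constrained convex-combination optimization (the paper's actual implementation and loss function may differ, and all names here are illustrative).

```python
import numpy as np
from scipy.optimize import nnls

def super_learner_weights(cv_preds, y):
    """Meta-learning step of a super learner (sketch): given cross-validated
    predictions from K candidate learners (one column per learner) and the
    observed outcome y, find non-negative weights minimizing squared error,
    then normalize them to sum to 1 (a common practical shortcut for the
    convex-combination constraint)."""
    w, _ = nnls(np.asarray(cv_preds, float), np.asarray(y, float))
    total = w.sum()
    return w / total if total > 0 else w
```

A well-calibrated candidate learner receives most of the weight, while uninformative candidates are driven toward zero; weak learners are not discarded outright, which is why the table above shows many small nonzero weights.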
Nonzero super learner algorithm weights in the training sample
| Algorithm | Feature selection | Hyperparameter tuning | Weight |
|---|---|---|---|
| I. Linear algorithms | | | |
| Generalized linear model | All | | 0.9% |
| Elastic net | 15 | | 2.6% |
| | 20 | | 5.5% |
| Adaptive splines | All | degree = 1 | 20.6% |
| | All | degree = 3 | 9.1% |
| | All | degree = 5 | 5.8% |
| Adaptive polynomial splines | 10 | | 33.9% |
| | 15 | | 15.0% |
| Total linear | ‐ | ‐ | 93.4% |
| II. Tree‐based algorithms | | | |
| Extreme gradient boosting | 10 | #3 | 1.3% |
| | 10 | #5 | 0.1% |
| DBARTS | 10 | | 4.8% |
| | All | | 0.5% |
| Total tree‐based | ‐ | ‐ | 6.7% |
Abbreviation: DBARTS, Discrete Bayesian Additive Regression Trees Sampler.
The 3rd and 5th specifications in Table 1.
Interactions between vitamin D deficiency and the super learner estimate of composite predicted odds (developed in the training sample) in predicting subsequent suicide based on a conditional logistic regression model estimated in the test sample (n = 331 matched pairs)
| | Model 1 | | | Model 2 | | |
|---|---|---|---|---|---|---|
| | OR | (95% CI) | χ²₁ | OR | (95% CI) | χ²₁ |
| Main effects | | | | | | |
| Vitamin D deficiency | 1.4 | (0.7–2.6) | 1.1 | 2.7 | (0.8–9.0) | 2.6 |
| SL predicted odds | | | | | | |
| Continuous | 3.9 | (2.8–5.3) | 72.3 | ‐ | ‐ | ‐ |
| Q1 | ‐ | ‐ | ‐ | 1.0 | | |
| Q2 | ‐ | ‐ | ‐ | 1.1 | (0.6–2.0) | 0.2 |
| Q3 | ‐ | ‐ | ‐ | 1.9 | (1.0–3.5) | 4.3 |
| Q4 (highest) | ‐ | ‐ | ‐ | 62.2 | (18.5–209.6) | 44.5 |
| χ²₃ | ‐ | ‐ | ‐ | | | 46.8 |
| Interactions | | | | | | |
| SL predicted odds | | | | | | |
| Continuous | 0.7 | (0.3–1.4) | 1.0 | ‐ | ‐ | ‐ |
| Q1 | ‐ | ‐ | ‐ | 1.0 | | |
| Q2 | ‐ | ‐ | ‐ | 0.5 | (0.1–2.5) | 0.6 |
| Q3 | ‐ | ‐ | ‐ | 0.4 | (0.1–2.0) | 1.1 |
| Q4 (highest) | ‐ | ‐ | ‐ | 0.2 | (0.0–2.8) | 1.4 |
| χ²₃ | ‐ | ‐ | ‐ | | | 1.8 |
Abbreviations: CI, confidence interval; OR, odds ratio; SL, super learner.
Significant at the 0.05 level, two‐sided test.
Predicted odds from the SL model were standardized in the test sample.
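The conditional logistic regression used above conditions on matched pairs. For 1:1 matching, the conditional likelihood reduces to intercept-free logistic regression on within-pair covariate differences, with every "outcome" set to 1. A minimal numpy-only sketch of that reduction (function name and simulated check are illustrative, not from the paper):

```python
import numpy as np

def conditional_logit_1to1(x_case, x_control, n_iter=25):
    """Conditional logistic regression for 1:1 matched case-control pairs.
    Fits the conditional likelihood by Newton's method on within-pair
    covariate differences d_i = x_case_i - x_control_i."""
    d = np.asarray(x_case, float) - np.asarray(x_control, float)
    beta = np.zeros(d.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-d @ beta))   # P(observed member is the case)
        grad = d.T @ (1.0 - p)                # score vector
        w = p * (1.0 - p)
        hess = (d * w[:, None]).T @ d         # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta
```

On simulated matched pairs the estimator recovers the generating coefficients; the matching structure is what lets the model absorb all pair-level confounding without estimating per-pair intercepts.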
Lasso on the training subsample of deployed matched pairs (N = 328 observations, 164 matched pairs)
| | Odds ratio |
|---|---|
| Intercept | 1.1 |
| Main effects | |
| Race and ethnicity | |
| Identified as Black on race and ethnicity variable | 1.0 |
| Race: American Indian/Alaskan Native | 2.1 |
| Military | |
| Rank: Officer | 0.9 |
| History of military deployment coded in MH encounter | 0.8 |
| Total months on military deployment | 1.0 |
| Mental health | |
| Alcohol use disorder not otherwise specified | 0.8 |
| Number of DOD Inpatient mental health encounters | 1.0 |
| Number of non‐Personality disorder MH diagnoses in record | 1.1 |
| Other, mixed, or unspecified drug abuse, unspecified | 0.8 |
| Number of inpatient mental health encounters | 1.1 |
| Number of mental health encounters in the 30 days preceding suicide | 1.1 |
| Any encounters for occupational therapy | 1.1 |
| Obesity, unspecified | 1.4 |
| Any mental health visits | 1.4 |
| Any Personality disorder | 1.1 |
| Biomarkers | |
| Docosapentaenoic acid (DPA; 22:5 n–6) μg/cl in serum | 1.0 |
| Stearic acid or octadecanoic acid (18:0) as Percent of total fatty acids | 0.2 |
| Standard score of stearic acid (18:0) as Percent of total fatty acids | 1.0 |
| Palmitoleic acid as Percent of total fatty acids, expressed as a Z‐score | 1.0 |
| Palmitoleic acid (16:1 n–7) as Percent of total fatty acids | |
| Activity of delta‐9 desaturase (ratio of palmitoleic to palmitic acid) | 1.0 |
| FA cluster: Risky versus protective | 2.8 |
| Dihomo‐γ‐linolenic acid (DGLA; 20:3 n–6) as Percent of total fatty acids (Percent of DGLA) | 0.9 |
| Ratio of stearic acid to palmitic acid | 0.9 |
| Magnesium (Mg) μg/ml in serum | 1.0 |
| Zinc μg/ml in serum | 0.9 |
Abbreviations: DGLA, dihomo‐γ‐linolenic acid; FA, fatty acid; OR, odds ratio.
Interactions between vitamin D deficiency and four variables selected by Least Absolute Shrinkage and Selection Operator (Lasso; in the training sample) in predicting subsequent suicide based on a conditional logistic regression model estimated in the test sample (n = 662; matched pairs = 331)
| Model 1 | Model 2 | Model 3 | Model 4 | |||||
|---|---|---|---|---|---|---|---|---|
| OR | (95% CI) | OR | (95% CI) | OR | (95% CI) | OR | (95% CI) | |
| FA cluster: Risky versus protective | 1.0 | (0.3–3.1) | 0.6 | (0.1–2.4) | 0.7 | (0.2–2.5) | 0.9 | (0.4–1.9) |
| Rank: Officer | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ |
| Percent of DGLA | 1.0 | (0.6–1.7) | 1.0 | (0.5–1.9) | 0.8 | (0.4–1.6) | 0.8 | (0.4–1.6) |
| Ratio of stearic acid to palmitic acid | 0.7 | (0.4–1.2) | 0.5* | (0.2–1.0) | 0.5* | (0.3–1.0) | 0.5* | (0.3–1.0) |
| χ2 3 | 2.3 | 4.4 | 4.8 | 4.8 | ||||
Abbreviations: CI, confidence interval; DGLA, dihomo‐γ‐linolenic acid; FA, fatty acid; OR, odds ratio.
Model 1 included as predictors only the dummy variable for vitamin D deficiency, the main effects of the 4 variables with interactions in the LASSO model, and the interactions of vitamin D deficiency with these 4 variables. Only interaction coefficients are shown here. Model 2 added controls for the 23 other variables with main effects in the LASSO model. Model 3 deleted the main effects of the four interacting variables, none of which was in the LASSO model; Model 4 additionally deleted the main effect of vitamin D deficiency, which was also not in the LASSO model.
Risky and protective fatty acid clusters were defined based on the clusters discovered by Ryan et al., 2021.
Only n = 8 of the n = 52 officers in the test sample had vitamin D deficiency. Seven of these eight were suicide cases. This compares to n = 19 cases and n = 25 controls among officers without vitamin D deficiency, for a gross OR of 9.2. The comparable gross OR among others in the sample (i.e., those who were not officers) was 1.1 (n = 46 cases and n = 41 controls among those with vitamin D deficiency; n = 259 cases and n = 264 controls among those without vitamin D deficiency), resulting in a gross interaction OR of 8.3. However, this coefficient became unstable in the multivariate model and could not be estimated. It is noteworthy that the comparable OR in the training sample LASSO model had the opposite sign (OR = 0.9).
Standardized variable.
No χ2 tests were significant (p = 0.31–0.68).
* Significant at the 0.05 level, two‐sided test.
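The gross odds ratios in the officer footnote above can be reproduced from the reported 2 × 2 counts. This is only an arithmetic sanity check, not an analysis from the paper (function name is illustrative):

```python
def gross_or(case_exp, ctrl_exp, case_unexp, ctrl_unexp):
    """Crude odds ratio from a 2x2 exposure-by-outcome table:
    (cases/controls among exposed) / (cases/controls among unexposed)."""
    return (case_exp / ctrl_exp) / (case_unexp / ctrl_unexp)

# Officers: 7 cases / 1 control with deficiency vs. 19 / 25 without
or_officers = gross_or(7, 1, 19, 25)       # ≈ 9.2, as reported
# Non-officers: 46 / 41 with deficiency vs. 259 / 264 without
or_others = gross_or(46, 41, 259, 264)     # ≈ 1.1, as reported
# Ratio of the two gross ORs; the footnote reports 8.3 (rounding differs slightly)
interaction = or_officers / or_others
```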
Within‐quartile associations between vitamin D deficiency and subsequent suicide based on quartiles of the causal forest estimate of log‐odds differences (developed in the training sample) based on a conditional logistic regression model estimated in the test sample (n = 662; matched pairs = 331)
| Model 1 (All) | Model 2 (Top 25) | Model 3 (Top 5) | ||||
|---|---|---|---|---|---|---|
| OR | (95% CI) | OR | (95% CI) | OR | (95% CI) | |
| Q1 (lowest) protective | 1.1 | (0.5–2.6) | 1.0 | (0.4–2.5) | 1.0 | (0.4–2.4) |
| Q2 | 1.5 | (0.6–3.8) | 1.6 | (0.7–3.9) | 1.3 | (0.5–3.4) |
| Q3 | 1.1 | (0.4–2.9) | 1.0 | (0.3–3.1) | 4.0* | (1.3–12.7) |
| Q4 (highest) palmitic acid | 1.8 | (0.7–4.8) | 1.5 | (0.6–3.7) | 0.8 | (0.3–2.0) |
| χ2 3 | 2.4 | 2.2 | 6.2 | |||
Abbreviations: CI, confidence interval; OR, odds ratio.
Model 1 was based on the causal forest algorithm that used all 149 predictors with nonzero variable importance values in the training sample. Model 2 was based on a separate causal forest model that used only the 25 predictors with the highest SHAP values in Model 1. Model 3 was based on a separate causal forest model that used only the five predictors with the highest SHAP values in Model 1.
No χ2 tests were significant (p = 0.18–0.70).
* Significant at the 0.05 level, two‐sided test.
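The quartile-based analysis above can be made concrete with a small sketch: given a per-observation effect score (a causal-forest log-odds difference in the paper; an arbitrary score in this illustration), split observations into quartiles and compute a crude within-quartile exposure-outcome odds ratio. All names and the simulated data are illustrative, and the paper fits conditional logistic models rather than crude ORs.

```python
import numpy as np

def within_quartile_ors(scores, exposed, case):
    """Assign observations to quartiles of an individual effect score and
    compute the crude exposure-outcome odds ratio inside each quartile."""
    scores = np.asarray(scores, float)
    cuts = np.quantile(scores, [0.25, 0.5, 0.75])
    q = np.searchsorted(cuts, scores)  # quartile index 0..3
    ors = []
    for k in range(4):
        m = q == k
        a = np.sum(m & exposed & case)     # exposed cases
        b = np.sum(m & exposed & ~case)    # exposed controls
        c = np.sum(m & ~exposed & case)    # unexposed cases
        d = np.sum(m & ~exposed & ~case)   # unexposed controls
        ors.append((a * d) / (b * c) if b * c > 0 else np.nan)
    return ors
```

If the score truly captures effect heterogeneity, the exposure-outcome OR should rise across quartiles; in the paper's test sample no such significant gradient emerged.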