Gareth Harman, Dakota Kliamovich, Angelica M Morales, Sydney Gilbert, Deanna M Barch, Michael A Mooney, Sarah W Feldstein Ewing, Damien A Fair, Bonnie J Nagel.
Abstract
The objective of the current study was to build predictive models for suicidal ideation in a sample of children aged 9-10 using features previously implicated in risk among older adolescent and adult populations. This case-control analysis used baseline data from the Adolescent Brain Cognitive Development (ABCD) Study, collected from 21 research sites across the United States (N = 11,369). Several regression and ensemble learning models were compared on their ability to distinguish individuals with suicidal ideation and/or attempt from healthy controls, as assessed by the Kiddie Schedule for Affective Disorders and Schizophrenia-Present and Lifetime Version. When comparing control participants (mean age: 9.92±0.62 years; 4944 girls [49%]) to participants with suicidal ideation (mean age: 9.89±0.63 years; 451 girls [40%]), both logistic regression with feature selection and elastic net without feature selection predicted suicidal ideation with an AUC of 0.70 (95% CI: 0.70-0.71). The random forest with feature selection, trained to predict suicidal ideation, distinguished a holdout set of children with a history of suicidal ideation and attempt (mean age: 9.96±0.62 years; 79 girls [41%]) from controls with an AUC of 0.77 (95% CI: 0.76-0.77). Important features from these models included feelings of loneliness and worthlessness, impulsivity, prodromal psychosis symptoms, and behavioral problems. This investigation provided an unprecedented opportunity to identify suicide risk in youth. The use of machine learning to examine a large number of predictors spanning a variety of domains provides novel insight into transdiagnostic factors important for risk classification.
Year: 2021 PMID: 34033672 PMCID: PMC8148349 DOI: 10.1371/journal.pone.0252114
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Nested cross-fold validation data partitioning scheme.
A. First, data were partitioned by class label. B. Next, the outer loop split the suicidal ideation (SI) class into five folds, setting aside one fold and a randomly subsampled set of controls as a holdout test set for each iteration. The group of individuals endorsing concomitant suicidal ideation and attempt (SA), along with another randomly subsampled set of controls, was also set aside as a holdout test set. C. The inner loop then divided the remaining control sample into 11 folds (N_Control / N_SI) and combined each fold with the SI training set to create class-balanced training datasets. These balanced training sets avoid common class-imbalance problems in machine learning while also mitigating the sampling bias introduced by techniques such as down-sampling, because every remaining control participant is used in one of the training sets instead of being discarded. The reported performance for each model therefore spans all 55 folds (5 outer folds × 11 inner folds).
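The partitioning scheme described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code; several details are assumptions not stated in the source: each control holdout set is the same size as the case set it accompanies, the SA control set is drawn once and excluded from training, and the number of inner folds is computed as round(N_Control / N_SI) over the controls and SI cases left for training. The function name `nested_splits` is hypothetical.

```python
import random


def nested_splits(ctrl_ids, si_ids, sa_ids, n_outer=5, seed=0):
    """Build class-balanced train/test splits in the spirit of Fig 1.

    Assumptions (not from the source): control holdout sets match the size of
    the case set they accompany, the SA control set is drawn once and kept out
    of training, and inner folds = round(remaining controls / SI train cases).
    """
    rng = random.Random(seed)
    ctrl = list(ctrl_ids)
    si = list(si_ids)
    rng.shuffle(ctrl)
    rng.shuffle(si)

    # SA holdout: all SA cases plus an equally sized random set of controls.
    sa_test = {"cases": list(sa_ids), "controls": ctrl[:len(sa_ids)]}
    ctrl_pool = ctrl[len(sa_ids):]

    # Outer loop: split the SI cases into n_outer folds.
    si_folds = [si[i::n_outer] for i in range(n_outer)]
    splits = []
    for k in range(n_outer):
        si_test = si_folds[k]
        si_train = [x for j, fold in enumerate(si_folds) if j != k for x in fold]

        # Hold out a random control subsample alongside the SI test fold.
        shuffled = rng.sample(ctrl_pool, len(ctrl_pool))
        ctrl_test = shuffled[:len(si_test)]
        ctrl_rest = shuffled[len(si_test):]

        # Inner loop: divide the remaining controls into roughly
        # N_Control / N_SI folds so each training set is close to 1:1.
        n_inner = max(1, round(len(ctrl_rest) / len(si_train)))
        for m in range(n_inner):
            splits.append({
                "outer": k,
                "inner": m,
                "train_cases": si_train,
                "train_controls": ctrl_rest[m::n_inner],
                "test_cases": si_test,
                "test_controls": ctrl_test,
            })
    return splits, sa_test


splits, sa_test = nested_splits(range(1000), range(1000, 1100), range(2000, 2020))
print(len(splits), len(splits[0]["train_cases"]), len(splits[0]["train_controls"]))
```

With the study's group sizes (10060 controls, 1116 SI, 193 SA), this rounding rule yields 11 inner folds for every outer fold, i.e. 55 training splits, which matches the count reported in the caption. Because every remaining control lands in exactly one inner fold, no control data are thrown away, which is the advantage over plain down-sampling.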
Participant demographics.
| | CTRL N | CTRL Prop. | SI N | SI Prop. | SA N | SA Prop. | CTRL/SI | CTRL/SA | SI/SA |
|---|---|---|---|---|---|---|---|---|---|
| N | 10060 | | 1116 | | 193 | | | | |
| Age, mean ± SD (years) | 9.92 ± 0.62 | | 9.89 ± 0.63 | | 9.96 ± 0.62 | | | | |
| Sex | | | | | | | | | |
| Male | 5116 | 0.51 | 665 | 0.60 | 114 | 0.59 | *** | | |
| Female | 4944 | 0.49 | 451 | 0.40 | 79 | 0.41 | | | |
| Race/ethnicity | | | | | | | | | |
| Asian | 214 | 0.02 | 24 | 0.02 | 3 | 0.02 | | | |
| Black | 1534 | 0.15 | 149 | 0.13 | 46 | 0.24 | * | | |
| Hispanic | 2048 | 0.20 | 215 | 0.19 | 47 | 0.24 | | | |
| White | 5250 | 0.52 | 589 | 0.53 | 74 | 0.38 | ** | * | |
| Other | 968 | 0.10 | 138 | 0.12 | 23 | 0.12 | | | |
| Parents married | 6873 | 0.68 | 703 | 0.63 | 97 | 0.50 | * | *** | ** |
| Highest parental education | | | | | | | | | |
| < High school | 508 | 0.05 | 49 | 0.04 | 10 | 0.05 | | | |
| High school / GED | 965 | 0.10 | 94 | 0.08 | 27 | 0.14 | | | |
| Some college | 2542 | 0.25 | 316 | 0.28 | 72 | 0.37 | ** | | |
| Bachelor's | 2562 | 0.25 | 280 | 0.25 | 48 | 0.25 | | | |
| Graduate | 3474 | 0.35 | 375 | 0.34 | 36 | 0.19 | *** | ** | |
| Household income | | | | | | | | | |
| Income ≤ 50k | 2677 | 0.27 | 304 | 0.27 | 89 | 0.46 | *** | *** | |
| 50k < Income < 100k | 2582 | 0.26 | 311 | 0.28 | 48 | 0.25 | | | |
| Income ≥ 100k | 3945 | 0.39 | 396 | 0.35 | 41 | 0.21 | *** | ** | |
Several demographic characteristics differed significantly among the control (CTRL), suicidal ideation (SI), and co-occurring suicidal ideation and attempt (SA) groups. Sex, race/ethnicity, parental marital status, highest level of parental education, and total household income all differed significantly between at least two of the groups. Significance after Bonferroni correction is denoted by *** (p < .001), ** (p < .01), and * (p < .05).
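The caption does not say which test produced these p-values. As an illustration only, the sketch below assumes a Pearson chi-square test on a 2×2 count table (1 degree of freedom, no continuity correction) followed by a Bonferroni adjustment, and assumes m = 3 pairwise group comparisons; the paper may have used a different test or a different number of comparisons. The function names are hypothetical.

```python
import math


def chi2_2x2_pvalue(a, b, c, d):
    """Pearson chi-square p-value (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def bonferroni(p, m):
    """Bonferroni-adjusted p-value for m comparisons, capped at 1."""
    return min(1.0, p * m)


def stars(p_adj):
    """Map an adjusted p-value to the table's significance markers."""
    if p_adj < 0.001:
        return "***"
    if p_adj < 0.01:
        return "**"
    if p_adj < 0.05:
        return "*"
    return ""


# Sex, CTRL vs SI, using counts from the demographics table:
# rows = group (CTRL, SI), columns = (male, female).
p = chi2_2x2_pvalue(5116, 4944, 665, 451)
p_adj = bonferroni(p, m=3)  # m = 3 pairwise comparisons is an assumption
print(f"p = {p:.2e}, adjusted p = {p_adj:.2e}, marker = {stars(p_adj)!r}")
```

Under these assumptions the sex comparison between controls and SI participants receives the `***` marker, in line with the Male row of the table. Bonferroni simply multiplies each p-value by the number of comparisons, which is conservative but easy to audit.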
Fig 2. Model performance.
Overall model performance in predicting suicidal ideation (SI) and concomitant suicidal ideation and attempt (SA) for each method, with and without Boruta feature selection. Logistic regression with feature selection and elastic net without feature selection were the top-performing models for classifying SI versus controls (AUC = 0.70; 95% CI: 0.70–0.71), but neither performed significantly better than the other models trained after feature selection. The random forest with feature selection, trained only on SI versus controls, distinguished a smaller holdout test set of individuals endorsing both SI and SA from controls (AUC = 0.77; 95% CI: 0.76–0.77).
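The AUC values above can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control. The sketch below computes AUC directly from that definition (the Mann-Whitney formulation, with ties counted as one half). The confidence-interval helper is a hypothetical stand-in: it assumes a normal approximation over per-fold AUCs treated as independent, which is not necessarily how the authors computed their intervals, and overlapping training folds make it only approximate.

```python
import math


def auc(pos_scores, neg_scores):
    """ROC AUC as P(case score > control score), counting ties as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mean_ci(values, z=1.96):
    """Mean and normal-approximation CI of per-fold AUCs (assumes independent folds)."""
    k = len(values)
    mean = sum(values) / k
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (k - 1))
    half = z * sd / math.sqrt(k)
    return mean, mean - half, mean + half


# Toy scores (hypothetical): 8 of the 9 case-control pairs are ranked correctly.
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

The pairwise loop is O(n_cases × n_controls), which is fine for illustration; rank-based formulas give the same value faster on large test sets.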
Fig 3. Feature importance.
Boxplots display the log10 p-value of the most important features from each training fold for logistic regression after Boruta feature selection. Vertical lines mark significance thresholds (p < 0.01, p < 0.001). Bar plots show the mean permutation importance of features that were more important than a known noise feature in every training fold of the random forest. In both models, feeling unloved, loneliness, measures of impulsivity, prodromal symptoms, and behavioral problems were important. Colors indicate the theoretical construct for each feature and whether the feature was derived from a caregiver- or child-report measure.
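Permutation importance measures how much a model's performance drops when one feature's values are shuffled, breaking its link to the outcome. The sketch below pairs it with the noise-feature filter described in the caption: a feature is kept only if its importance exceeds that of a column known to be pure noise. This is a simplified illustration, not Boruta itself (Boruta compares against shuffled "shadow" copies of every feature over repeated iterations). It uses accuracy as the metric, and the toy data and the `toy_model` stand-in for a fitted random forest are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    base = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def beats_noise(importances, noise_index):
    """Indices of features whose importance exceeds that of the known noise column."""
    threshold = importances[noise_index]
    return [j for j, imp in enumerate(importances)
            if j != noise_index and imp > threshold]


def toy_model(rows):
    """Hypothetical stand-in for a fitted classifier: uses only feature 0."""
    return [int(r[0] > 0) for r in rows]


# Toy data: feature 0 determines the label, feature 1 is ignored by the model,
# and feature 2 is an added pure-noise column.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [int(row[0] > 0) for row in X]

imps = permutation_importance(toy_model, X, y)
print(beats_noise(imps, noise_index=2))  # [0]
```

In the study's setting the filter would be applied at every training fold and only features that beat the noise column each time would be reported, as the caption describes.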