Jeongyoon Lee, Tae-Young Pak.
Abstract
Background: Suicide remains the leading cause of premature death in South Korea. This study aims to develop machine learning algorithms for screening Korean adults at risk for suicidal ideation and suicide planning or attempt.
Keywords: Machine learning; Predictive modeling; Self-harm; Suicidal ideation; Suicide planning
Year: 2022 PMID: 36263295 PMCID: PMC9573904 DOI: 10.1016/j.ssmph.2022.101231
Source DB: PubMed Journal: SSM Popul Health ISSN: 2352-8273
Descriptive statistics.
| Variable | Full sample: Mean | Full sample: SD | Suicidal ideation: Mean | Suicidal ideation: SD | Suicide planning or attempt: Mean | Suicide planning or attempt: SD |
|---|---|---|---|---|---|---|
| Age (19–64) | 43.9 | 12.2 | 45.7 | 12.2 | 45.9 | 12.4 |
| Female (0,1) | 0.54 | | 0.57 | | 0.60 | |
| Education background (0,1) | 0.45 | | 0.34 | | 0.31 | |
| Marital status (0,1) | 0.67 | | 0.58 | | 0.57 | |
| No. of household members | 3.28 | 1.21 | 3.01 | 1.29 | 2.98 | 1.26 |
| Employment status (0,1) | 0.71 | | 0.62 | | 0.56 | |
| Region of residence (0,1) | 0.40 | | 0.41 | | 0.43 | |
| Religion (0,1) | 0.46 | | 0.45 | | 0.46 | |
| Household income (in 2019 KRW) | 5962.5 | 5900.4 | 4803.3 | 3776.4 | 4598.4 | 3603.6 |
| Household consumption (in 2019 KRW) | 423.3 | 249.5 | 360.2 | 238.0 | 344.3 | 230.2 |
| Household net worth (in 2019 KRW) | 13625.9 | 36446.7 | 10024.8 | 31557.9 | 9378.9 | 32144.1 |
| Social welfare receipt (0,1) | 0.06 | | 0.15 | | 0.19 | |
| No. of outpatient visits | 10.2 | 19.4 | 16.7 | 30.0 | 18.6 | 32.6 |
| Poor self-rated health (0,1) | 0.25 | | 0.40 | | 0.45 | |
| Disability (0,1) | 0.06 | | 0.12 | | 0.14 | |
| Any chronic disease (0,1) | 0.37 | | 0.49 | | 0.55 | |
| Smoking (0,1) | 0.22 | | 0.26 | | 0.27 | |
| Drinking (0,1) | 0.59 | | 0.55 | | 0.51 | |
| CESD score (0–33) | 2.68 | 4.03 | 6.68 | 7.10 | 8.31 | 8.42 |
| Self-esteem score (0–30) | 21.3 | 3.81 | 18.8 | 5.15 | 17.8 | 5.86 |
| Experience of physical abuse from spouse (0,1) | 0.67 | | 0.60 | | 0.58 | |
Notes: N, number of observations; SD, standard deviation.
Fig. 1. Recursive feature elimination results.
Fig. 2. Data construction and model development.
Tuned hyperparameters.
| Algorithm | Tuned hyperparameters |
|---|---|
| Panel A: suicidal ideation | |
| SVM | kernel = linear, cost = 1.5 |
| RF | split rule = Gini, minimal node size = 2, number of randomly selected predictors = 6 |
| XGBoost | number of boosting iterations = 120, max tree depth = 3, shrinkage = 0.04, minimum loss reduction = 2, subsample ratio of columns = 0.55, minimum sum of instance weight = 5, and subsample percentage = 1 |
| Panel B: suicide planning or attempt | |
| SVM | kernel = linear, cost = 0.5 |
| RF | split rule = Gini, minimal node size = 2, number of randomly selected predictors = 7 |
| XGBoost | number of boosting iterations = 60, max tree depth = 5, shrinkage = 0.04, minimum loss reduction = 3, subsample ratio of columns = 0.5, minimum sum of instance weight = 6, and subsample percentage = 1 |
Notes: SVM, support vector machine; RF, random forest; XGBoost, extreme gradient boosting.
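The XGBoost rows above read like the library's standard tuning knobs. As a minimal sketch, the settings can be written as a configuration keyed by XGBoost's native parameter names. The mapping from the table's descriptive labels to parameter names (`eta`, `gamma`, `colsample_bytree`, `min_child_weight`, `subsample`) is an assumption made here for illustration, and `TUNED_XGBOOST` and `to_train_args` are hypothetical names; this is not the authors' code.

```python
from typing import Dict, Tuple, Union

Number = Union[int, float]

# Values copied from the "Tuned hyperparameters" table.
# The parameter names are an assumed mapping onto XGBoost's native names.
TUNED_XGBOOST: Dict[str, Dict[str, Number]] = {
    "suicidal_ideation": {
        "num_boost_round": 120,    # number of boosting iterations
        "max_depth": 3,            # max tree depth
        "eta": 0.04,               # shrinkage
        "gamma": 2,                # minimum loss reduction
        "colsample_bytree": 0.55,  # subsample ratio of columns
        "min_child_weight": 5,     # minimum sum of instance weight
        "subsample": 1.0,          # subsample percentage
    },
    "planning_or_attempt": {
        "num_boost_round": 60,
        "max_depth": 5,
        "eta": 0.04,
        "gamma": 3,
        "colsample_bytree": 0.5,
        "min_child_weight": 6,
        "subsample": 1.0,
    },
}


def to_train_args(cfg: Dict[str, Number]) -> Tuple[Dict[str, Number], int]:
    """Split a config into (booster params, number of boosting rounds)."""
    params = dict(cfg)  # copy so the table values stay untouched
    rounds = int(params.pop("num_boost_round"))
    return params, rounds


params, rounds = to_train_args(TUNED_XGBOOST["suicidal_ideation"])
print(rounds, params)
```

If the `xgboost` Python package is available, the resulting pair could be passed as `xgboost.train(params, dtrain, num_boost_round=rounds)`, where `dtrain` is a training matrix that this sketch does not construct.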
Algorithm performance.
| Metric | Logistic | SVM | RF | XGBoost |
|---|---|---|---|---|
| Panel A: suicidal ideation | | | | |
| Area under the curve | 0.837 | 0.844 | 0.851 | 0.861 |
| Sensitivity | 0.808 | 0.811 | 0.850 | 0.853 |
| Specificity | 0.867 | 0.877 | 0.852 | 0.869 |
| Positive predictive value | 0.808 | 0.820 | 0.799 | 0.819 |
| Negative predictive value | 0.867 | 0.870 | 0.891 | 0.895 |
| Accuracy | 0.843 | 0.850 | 0.851 | 0.863 |
| Panel B: suicide planning or attempt | | | | |
| Area under the curve | 0.872 | 0.872 | 0.857 | 0.880 |
| Sensitivity | 0.861 | 0.861 | 0.814 | 0.861 |
| Specificity | 0.883 | 0.883 | 0.900 | 0.900 |
| Positive predictive value | 0.841 | 0.841 | 0.854 | 0.861 |
| Negative predictive value | 0.898 | 0.898 | 0.871 | 0.900 |
| Accuracy | 0.874 | 0.874 | 0.864 | 0.884 |
Notes: SVM, support vector machine; RF, random forest; XGBoost, extreme gradient boosting.
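For readers who want the definitions behind the rows of the performance table, the sketch below computes the five threshold-based metrics from a confusion matrix and the area under the curve from predicted scores. The counts and scores are hypothetical values chosen for illustration, not data from the study, and the function names are introduced here.

```python
from typing import Dict, Sequence


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Threshold-based metrics from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases flagged
        "specificity": tn / (tn + fp),  # share of non-cases cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Area under the ROC curve as the probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical confusion matrix and scores, for illustration only.
metrics = confusion_metrics(tp=80, fp=20, tn=180, fn=20)
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
print(metrics, example_auc)
```

Unlike the area under the curve, positive and negative predictive values depend on the share of positive cases in the evaluation sample, so they are not directly comparable across samples with different outcome prevalence.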
Algorithm performance, train and test set randomly drawn from the full sample.
| Metric | Logistic | SVM | RF | XGBoost |
|---|---|---|---|---|
| Panel A: suicidal ideation | | | | |
| Area under the curve | 0.829 | 0.837 | 0.832 | 0.830 |
| Sensitivity | 0.797 | 0.806 | 0.819 | 0.815 |
| Specificity | 0.860 | 0.867 | 0.844 | 0.846 |
| Positive predictive value | 0.835 | 0.844 | 0.824 | 0.825 |
| Negative predictive value | 0.826 | 0.834 | 0.839 | 0.837 |
| Accuracy | 0.830 | 0.838 | 0.832 | 0.831 |
| Panel B: suicide planning or attempt | | | | |
| Area under the curve | 0.852 | 0.843 | 0.869 | 0.869 |
| Sensitivity | 0.789 | 0.788 | 0.824 | 0.824 |
| Specificity | 0.915 | 0.898 | 0.915 | 0.915 |
| Positive predictive value | 0.931 | 0.918 | 0.933 | 0.933 |
| Negative predictive value | 0.750 | 0.747 | 0.783 | 0.783 |
| Accuracy | 0.840 | 0.833 | 0.861 | 0.861 |
Notes: SVM, support vector machine; RF, random forest; XGBoost, extreme gradient boosting.
Algorithm performance, using one observation for each participant.
| Metric | Logistic | SVM | RF | XGBoost |
|---|---|---|---|---|
| Panel A: suicidal ideation | | | | |
| Area under the curve | 0.827 | 0.836 | 0.847 | 0.838 |
| Sensitivity | 0.814 | 0.822 | 0.814 | 0.838 |
| Specificity | 0.861 | 0.871 | 0.881 | 0.878 |
| Positive predictive value | 0.803 | 0.817 | 0.810 | 0.804 |
| Negative predictive value | 0.861 | 0.867 | 0.883 | 0.874 |
| Accuracy | 0.839 | 0.849 | 0.855 | 0.847 |
| Panel B: suicide planning or attempt | | | | |
| Area under the curve | 0.842 | 0.842 | 0.858 | 0.842 |
| Sensitivity | 0.800 | 0.800 | 0.800 | 0.800 |
| Specificity | 0.883 | 0.883 | 0.917 | 0.883 |
| Positive predictive value | 0.821 | 0.821 | 0.865 | 0.821 |
| Negative predictive value | 0.869 | 0.869 | 0.873 | 0.869 |
| Accuracy | 0.850 | 0.850 | 0.870 | 0.850 |
Notes: SVM, support vector machine; RF, random forest; XGBoost, extreme gradient boosting.
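The table above restricts the sample to one observation per participant. This record does not say how that observation was chosen, so the sketch below assumes a seeded random draw within each participant. The field name `pid`, the function name, and the toy rows are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List


def one_row_per_participant(
    rows: List[Dict[str, Hashable]], id_key: str = "pid", seed: int = 0
) -> List[Dict[str, Hashable]]:
    """Keep a single randomly chosen record for each participant ID.

    Assumption: random selection. Other rules (first wave, last wave)
    would be equally consistent with the table caption.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[id_key]].append(row)
    return [rng.choice(group) for group in by_id.values()]


# Toy panel data: participant 1 appears in three waves, participant 2 in one.
panel = [
    {"pid": 1, "wave": 2012, "ideation": 0},
    {"pid": 1, "wave": 2015, "ideation": 1},
    {"pid": 1, "wave": 2018, "ideation": 0},
    {"pid": 2, "wave": 2015, "ideation": 0},
]
deduplicated = one_row_per_participant(panel)
print(deduplicated)
```

Keeping one record per person prevents the same participant from appearing in both the training and the test set, which is one plausible motivation for this robustness check.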
Top five most important predictors.
| Algorithm: | Logistic | SVM | RF | XGBoost |
|---|---|---|---|---|
| Panel A: suicidal ideation | | | | |
| Feature 1 | CESD score | CESD score | CESD score | CESD score |
| Feature 2 | Self-esteem | Self-esteem | Self-esteem | Self-esteem |
| Feature 3 | Satisfaction with family relation | Income | Income | Income |
| Feature 4 | Smoking | Life satisfaction | Consumption | Satisfaction with family relation |
| Feature 5 | Unpaid rent | Consumption | Net worth | Life satisfaction |
| Panel B: suicide planning or attempt | | | | |
| Feature 1 | CESD score | CESD score | CESD score | CESD score |
| Feature 2 | Mother's education | Self-esteem | Self-esteem | Self-esteem |
| Feature 3 | Smoking | Income | Income | Life satisfaction |
| Feature 4 | Religion | Life satisfaction | Consumption | Income |
| Feature 5 | Age | Consumption | Life satisfaction | Smoking |
Notes: SVM, support vector machine; RF, random forest; XGBoost, extreme gradient boosting.
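One way to read the table above is to count how many of the four algorithms place each predictor in their top five. The sketch below does that tally with the feature names copied from the table; the variable and function names are introduced here for illustration.

```python
from collections import Counter
from typing import Dict, List

# Top-five predictors per algorithm, copied from the table above.
PANEL_A: Dict[str, List[str]] = {
    "Logistic": ["CESD score", "Self-esteem",
                 "Satisfaction with family relation", "Smoking", "Unpaid rent"],
    "SVM": ["CESD score", "Self-esteem", "Income",
            "Life satisfaction", "Consumption"],
    "RF": ["CESD score", "Self-esteem", "Income", "Consumption", "Net worth"],
    "XGBoost": ["CESD score", "Self-esteem", "Income",
                "Satisfaction with family relation", "Life satisfaction"],
}
PANEL_B: Dict[str, List[str]] = {
    "Logistic": ["CESD score", "Mother's education", "Smoking",
                 "Religion", "Age"],
    "SVM": ["CESD score", "Self-esteem", "Income",
            "Life satisfaction", "Consumption"],
    "RF": ["CESD score", "Self-esteem", "Income",
           "Consumption", "Life satisfaction"],
    "XGBoost": ["CESD score", "Self-esteem", "Life satisfaction",
                "Income", "Smoking"],
}


def feature_counts(panel: Dict[str, List[str]]) -> Counter:
    """Number of algorithms that rank each feature in their top five."""
    return Counter(f for features in panel.values() for f in features)


def shared_by_all(panel: Dict[str, List[str]]) -> List[str]:
    """Features that appear in every algorithm's top five."""
    counts = feature_counts(panel)
    return sorted(f for f, c in counts.items() if c == len(panel))


print(shared_by_all(PANEL_A))
print(shared_by_all(PANEL_B))
```

The tally only reflects top-five membership; it ignores rank order and says nothing about the size of each predictor's contribution.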
Algorithm performance for the reduced model.
| Metric | Logistic | SVM | RF | XGBoost |
|---|---|---|---|---|
| Panel A: suicidal ideation | | | | |
| Area under the curve | 0.835 | 0.838 | 0.845 | 0.843 |
| Sensitivity | 0.801 | 0.804 | 0.839 | 0.829 |
| Specificity | 0.869 | 0.872 | 0.850 | 0.857 |
| Positive predictive value | 0.809 | 0.813 | 0.795 | 0.801 |
| Negative predictive value | 0.863 | 0.865 | 0.884 | 0.878 |
| Accuracy | 0.841 | 0.844 | 0.846 | 0.846 |
| Panel B: suicide planning or attempt | | | | |
| Area under the curve | 0.869 | 0.877 | 0.865 | 0.869 |
| Sensitivity | 0.837 | 0.837 | 0.814 | 0.837 |
| Specificity | 0.900 | 0.917 | 0.917 | 0.900 |
| Positive predictive value | 0.857 | 0.878 | 0.875 | 0.857 |
| Negative predictive value | 0.885 | 0.887 | 0.873 | 0.885 |
| Accuracy | 0.874 | 0.884 | 0.874 | 0.874 |
Notes: SVM, support vector machine; RF, random forest; XGBoost, extreme gradient boosting.