Jun Su Jung, Sung Jin Park, Eun Young Kim, Kyoung-Sae Na, Young Jae Kim, Kwang Gi Kim.
Abstract
OBJECTIVE: Suicide in adolescents is a major problem worldwide, and a previous history of suicidal ideation and attempts is the strongest predictor of future suicidal behavior. The aim of this study was to develop a prediction model to identify Korean adolescents at high risk of suicide (defined as a history of suicidal ideation or attempt in the previous year) using machine learning techniques.
Year: 2019 PMID: 31170212 PMCID: PMC6553749 DOI: 10.1371/journal.pone.0217639
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Scheme of prediction model development.
Optimal parameters for each machine learning model, selected through grid search.
| Model | Optimal parameters |
|---|---|
| LR | Penalty: 'l2', C: 0.1 |
| SVM | C: 0.1, gamma: 0.01, kernel: ‘rbf’ |
| RF | n_estimators: 3000, max depth: 5, min samples leaf: 4, min samples split: 10 |
| ANN | Optimizer: ‘Adam’, learning rate: 0.0001, batch size: 200, epoch: 60 |
| XGB | n_estimators: 5000, learning rate: 0.05, colsample bytree: 0.3, max depth: 4, gamma: 1, lambda: 0.5, alpha: 0.5 |
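The grid search named in the caption can be sketched as an exhaustive loop over candidate values. This is a minimal, library-free illustration and not the authors' code: the candidate lists in `svm_grid` and the `toy_score` function are hypothetical. Only C = 0.1, gamma = 0.01, and the 'rbf' kernel come from the table, and the toy score is built purely so that the example returns that row. In practice the score would be a cross-validated metric such as AUC.

```python
from itertools import product


def expand_grid(grid):
    """Yield every parameter combination in a {name: [values]} grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(grid, score_fn):
    """Return (best_params, best_score), maximizing score_fn over the grid."""
    best_params, best_score = None, float("-inf")
    for params in expand_grid(grid):
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values; the source does not list the searched grid.
svm_grid = {
    "C": [0.01, 0.1, 1.0],
    "gamma": [0.001, 0.01, 0.1],
    "kernel": ["rbf"],
}


def toy_score(params):
    """Placeholder for a cross-validated metric (assumption, not real data)."""
    return -abs(params["C"] - 0.1) - abs(params["gamma"] - 0.01)


best_params, best_score = grid_search(svm_grid, toy_score)
print(best_params)
```

The number of model fits grows as the product of the list lengths, so a grid for a model with as many tuned parameters as XGB becomes expensive quickly.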
Characteristics of the high-risk suicide group (n = 7,443) and the no high-risk suicide group (n = 52,541).
| Characteristic | No high-risk suicide (n = 52,541) | High-risk suicide (n = 7,443) | P value* |
|---|---|---|---|
| Sex, boy | 27493 (52.3%) | 2891 (38.8%) | <0.001 |
| Age (yrs.) | 15.0±1.7 | 15.0±1.8 | 0.695 |
| School | | | <0.001 |
| Middle school | 25876 (49.2%) | 3869 (52.0%) | |
| High school | 26665 (50.8%) | 3574 (48.0%) | |
| School grade | | | <0.001 |
| G1 | 8800 (16.7%) | 1039 (14.0%) | |
| G2 | 8520 (16.2%) | 1435 (19.3%) | |
| G3 | 8556 (16.3%) | 1395 (18.7%) | |
| G4 | 8760 (16.7%) | 1057 (14.2%) | |
| G5 | 9071 (17.3%) | 1327 (17.8%) | |
| G6 | 8834 (16.8%) | 1190 (16.0%) | |
| City type | | | 0.654 |
| countryside | 4094 (7.8%) | 563 (7.6%) | |
| small/medium-sized cities | 25154 (47.9%) | 3545 (47.6%) | |
| big cities | 23293 (44.3%) | 3335 (44.8%) | |
| Academic achievement | | | <0.001 |
| high | 7221 (13.7%) | 878 (11.8%) | |
| high middle | 13830 (26.3%) | 1632 (21.9%) | |
| middle | 15286 (29.1%) | 1926 (25.9%) | |
| low middle | 11439 (21.8%) | 1922 (25.8%) | |
| low | 4765 (9.1%) | 1085 (14.6%) | |
| Family structure | | | <0.001 |
| live with both parents | 43647 (83.1%) | 5740 (77.1%) | |
| live with one parent | 4955 (9.4%) | 938 (12.6%) | |
| neither parent | 3939 (7.5%) | 765 (10.3%) | |
| Family SES | | | <0.001 |
| high | 5663 (10.8%) | 700 (9.4%) | |
| high middle | 15518 (29.5%) | 1987 (26.7%) | |
| middle | 24500 (46.6%) | 3103 (41.7%) | |
| low middle | 5790 (11.0%) | 1260 (16.9%) | |
| low | 1070 (2.0%) | 393 (5.3%) | |
| Education, father | | | <0.001 |
| unknown | 11405 (21.7%) | 1614 (21.7%) | |
| middle school graduate or less | 932 (1.8%) | 190 (2.6%) | |
| high school graduate | 13488 (25.7%) | 1902 (25.6%) | |
| college or graduate degree | 26716 (50.8%) | 3737 (50.2%) | |
| Education, mother | | | <0.001 |
| unknown | 10695 (20.4%) | 1515 (20.4%) | |
| middle school graduate or less | 779 (1.5%) | 175 (2.4%) | |
| high school graduate | 16530 (31.5%) | 2254 (30.3%) | |
| college or graduate degree | 24537 (46.7%) | 3499 (47.0%) | |
| Current smoking (yes) | 2748 (5.2%) | 753 (10.1%) | <0.001 |
| Current alcohol drinking (yes) | 7474 (14.2%) | 1659 (22.3%) | <0.001 |
| Substance use (yes) | 303 (0.6%) | 196 (2.6%) | <0.001 |
| Physical activity | | | <0.001 |
| active | 20243 (38.5%) | 2689 (36.1%) | |
| inactive | 32298 (61.5%) | 4754 (63.9%) | |
| Body mass index (kg/m2) | 21.1±3.4 | 21.2±3.4 | 0.033 |
| Obesity | | | 0.228 |
| underweight | 4088 (7.8%) | 621 (8.3%) | |
| normal | 39827 (75.8%) | 5648 (75.9%) | |
| overweight | 1326 (2.5%) | 178 (2.4%) | |
| obesity | 7300 (13.9%) | 996 (13.4%) | |
| Sexual experience (yes) | 2128 (4.1%) | 616 (8.3%) | <0.001 |
| Internet addiction (yes) | 1766 (3.4%) | 563 (7.6%) | <0.001 |
| Sadness (yes) | 9548 (18.2%) | 5389 (72.4%) | <0.001 |
| Stress | | | <0.001 |
| very high | 3648 (6.9%) | 2545 (34.2%) | |
| high | 13050 (24.8%) | 3086 (41.5%) | |
| middle | 23913 (45.5%) | 1507 (20.2%) | |
| low | 9621 (18.3%) | 227 (3.0%) | |
| very low | 2309 (4.4%) | 78 (1.0%) | |
| Self-rated health | | | <0.001 |
| very good | 15525 (29.5%) | 1204 (16.2%) | |
| good | 23892 (45.5%) | 2695 (36.2%) | |
| normal | 10561 (20.1%) | 2336 (31.4%) | |
| poor | 2438 (4.6%) | 1081 (14.5%) | |
| very poor | 125 (0.2%) | 127 (1.7%) | |
| Sleep satisfaction | | | <0.001 |
| very high | 4570 (8.7%) | 323 (4.3%) | |
| high | 9869 (18.8%) | 755 (10.1%) | |
| middle | 17375 (33.1%) | 1975 (26.5%) | |
| low | 14392 (27.4%) | 2428 (32.6%) | |
| very low | 6335 (12.1%) | 1962 (26.4%) | |
| Self-rated weight | | | <0.001 |
| very thin | 2144 (4.1%) | 351 (4.7%) | |
| thin | 11176 (21.3%) | 1409 (18.9%) | |
| normal | 19141 (36.4%) | 2281 (30.6%) | |
| fat | 16914 (32.2%) | 2667 (35.8%) | |
| very fat | 3166 (6.0%) | 735 (9.9%) | |
| Distorted weight perception (yes) | 16701 (31.8%) | 2867 (38.5%) | <0.001 |
| School injury (yes) | 12105 (23.0%) | 2382 (32.0%) | <0.001 |
| Violence (yes) | 893 (1.7%) | 529 (7.1%) | <0.001 |
| Asthma (yes) | 4343 (8.3%) | 827 (11.1%) | <0.001 |
| Allergic rhinitis (yes) | 18073 (34.4%) | 2906 (39.0%) | <0.001 |
| Atopic dermatitis (yes) | 12839 (24.4%) | 2152 (28.9%) | <0.001 |
Note. Values are means ± standard deviation, median (range), or number (percentages).
*Chi-squared test or Student's t test.
SES: socio-economic status
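As a rough check on how a chi-squared P value for a binary characteristic arises, the sketch below recomputes the test for the "Sex, boy" row. It assumes a Pearson chi-squared test without continuity correction, which the note does not specify, and it derives the number of girls as group size minus boys. The function name `chi2_2x2` is introduced here for illustration.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) and p-value
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Group sizes and boy counts are taken from the table above.
n_low, n_high = 52541, 7443
boys_low, boys_high = 27493, 2891

chi2, p_value = chi2_2x2(
    boys_low, n_low - boys_low,      # no high-risk group: boys, girls
    boys_high, n_high - boys_high,   # high-risk group: boys, girls
)
print(f"chi2 = {chi2:.1f}, p < 0.001: {p_value < 0.001}")
```

With samples this large, even small differences in proportions yield very small P values, so the size of the difference matters more than the P value when reading the table.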
Multivariate logistic regression analysis to identify factors associated with high risk of suicide.
| Variable | Adjusted OR | 95% CI | P value |
|---|---|---|---|
| Sex | |||
| boy | Reference | ||
| girl | 1.250 | 1.174 to 1.330 | <0.001 |
| School grade | |||
| G1 | Reference | ||
| G2 | 0.911 | 0.829 to 1.000 | 0.051 |
| G3 | 0.767 | 0.697 to 0.844 | <0.001 |
| G4 | 0.531 | 0.479 to 0.590 | <0.001 |
| G5 | 0.532 | 0.480 to 0.589 | <0.001 |
| G6 | 0.447 | 0.403 to 0.497 | <0.001 |
| City type | |||
| countryside | Reference | ||
| small/medium-sized cities | 0.655 | 0.595 to 0.720 | <0.001 |
| big cities | 0.674 | 0.612 to 0.741 | <0.001 |
| Academic achievement | |||
| high | Reference | ||
| high middle | 0.766 | 0.695 to 0.844 | <0.001 |
| middle | 0.801 | 0.728 to 0.882 | <0.001 |
| low middle | 0.909 | 0.824 to 1.004 | 0.059 |
| low | 0.859 | 0.765 to 0.964 | 0.010 |
| Family structure | |||
| live with both parents | Reference | ||
| live with one parent | 1.116 | 1.020 to 1.222 | 0.017 |
| neither | 1.081 | 0.971 to 1.204 | 0.155 |
| Family SES | |||
| high | Reference | ||
| high middle | 0.822 | 0.743 to 0.909 | <0.001 |
| middle | 0.804 | 0.728 to 0.882 | <0.001 |
| low middle | 1.023 | 0.910 to 1.152 | 0.699 |
| low | 1.094 | 0.920 to 1.300 | 0.308 |
| Education, father | |||
| unknown | 0.844 | 0.767 to 0.929 | 0.001 |
| middle school graduate or less | 1.003 | 0.817 to 1.230 | 0.981 |
| high school graduate | 0.967 | 0.894 to 1.046 | 0.406 |
| college or graduate degree | Reference | ||
| Education, mother | |||
| unknown | 0.911 | 0.827 to 1.005 | 0.062 |
| middle school graduate or less | 1.075 | 0.868 to 1.331 | 0.508 |
| high school graduate | 0.852 | 0.790 to 0.918 | <0.001 |
| college or graduate degree | Reference | ||
| Current smoking (yes) | 1.235 | 1.097 to 1.391 | <0.001 |
| Current alcohol drinking (yes) | 1.184 | 1.093 to 1.282 | <0.001 |
| Substance use (yes) | 1.932 | 1.523 to 2.450 | <0.001 |
| Physical activity | |||
| active | Reference | ||
| inactive | 0.879 | 0.827 to 0.935 | <0.001 |
| Obesity | |||
| normal | Reference | ||
| underweight | 1.089 | 0.980 to 1.210 | 0.113 |
| overweight | 0.767 | 0.636 to 0.924 | 0.005 |
| obesity | 0.937 | 0.862 to 1.018 | 0.122 |
| Sexual experience (yes) | 1.193 | 1.054 to 1.351 | 0.005 |
| Internet addiction (yes) | 1.230 | 0.911 to 1.660 | 0.177 |
| Sadness (yes) | 6.464 | 6.083 to 6.868 | <0.001 |
| Stress | |||
| very high | 1.626 | 1.398 to 1.892 | <0.001 |
| high | 0.843 | 0.729 to 0.975 | 0.021 |
| middle | 0.360 | 0.311 to 0.416 | <0.001 |
| low | 0.182 | 0.151 to 0.218 | <0.001 |
| very low | Reference | ||
| Self-rated health | |||
| very good | Reference | ||
| good | 1.116 | 1.030 to 1.208 | 0.007 |
| normal | 1.537 | 1.409 to 1.677 | <0.001 |
| poor | 2.009 | 1.794 to 2.249 | <0.001 |
| very poor | 2.901 | 2.131 to 3.950 | <0.001 |
| Sleep satisfaction | |||
| very high | Reference | ||
| high | 0.690 | 0.604 to 0.788 | <0.001 |
| middle | 0.777 | 0.688 to 0.877 | <0.001 |
| low | 0.851 | 0.753 to 0.961 | 0.010 |
| very low | 0.883 | 0.776 to 1.005 | 0.059 |
| Self-rated weight | |||
| very thin | Reference | ||
| thin | 0.446 | 0.395 to 0.503 | <0.001 |
| normal | 0.426 | 0.380 to 0.478 | <0.001 |
| fat | 0.501 | 0.421 to 0.598 | <0.001 |
| very fat | 0.578 | 0.476 to 0.703 | <0.001 |
| Distorted weight perception (yes) | 0.967 | 0.828 to 1.129 | 0.671 |
| School injury (yes) | 1.078 | 1.012 to 1.148 | 0.020 |
| Violence (yes) | 2.317 | 2.014 to 2.666 | <0.001 |
| Asthma (yes) | 1.022 | 0.929 to 1.124 | 0.653 |
| Allergic rhinitis (yes) | 0.988 | 0.930 to 1.050 | 0.702 |
| Atopic dermatitis (yes) | 1.053 | 0.987 to 1.122 | 0.117 |
Note. SES: socio-economic status
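An adjusted odds ratio and its interval relate to the underlying logistic regression coefficient through the exponential function. The sketch below assumes Wald-type 95% intervals, exp(beta ± 1.96·SE), which the source does not state, and back-calculates the coefficient and standard error for the "Sadness (yes)" row. The helper names `coef_from_or` and `or_with_ci` are hypothetical.

```python
import math

Z95 = 1.96  # normal quantile for a two-sided 95% interval


def coef_from_or(odds_ratio, ci_low, ci_high):
    """Recover the log-odds coefficient and its standard error,
    assuming a Wald interval that is symmetric on the log scale."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return beta, se


def or_with_ci(beta, se):
    """Convert a coefficient and standard error back to OR and 95% CI."""
    return (
        math.exp(beta),
        math.exp(beta - Z95 * se),
        math.exp(beta + Z95 * se),
    )


# Values from the "Sadness (yes)" row of the table above.
beta, se = coef_from_or(6.464, 6.083, 6.868)
odds_ratio, low, high = or_with_ci(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.4f}")
print(f"OR = {odds_ratio:.3f} ({low:.3f} to {high:.3f})")
```

Because the interval is symmetric only on the log scale, the reported bounds sit unevenly around the odds ratio itself.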
Confusion matrix for prediction models (Test set).
| Model | Sensitivity | Specificity | PPV | NPV | Accuracy | AUC |
|---|---|---|---|---|---|---|
| LR | 78.2% | 77.6% | 77.7% | 78.0% | 77.9% | 0.851 |
| SVM | 78.4% | 78.9% | 78.8% | 78.5% | 78.7% | 0.853 |
| RF | 77.5% | 78.0% | 77.9% | 77.6% | 77.8% | 0.857 |
| ANN | 77.3% | 77.8% | 77.7% | 77.4% | 77.5% | 0.851 |
| XGB | 78.5% | 79.4% | 79.2% | 78.7% | 79.0% | 0.863 |
Note. LR: logistic regression; SVM: support vector machine; RF: random forest; ANN: artificial neural network; XGB: extreme gradient boosting; PPV: positive predictive value; NPV: negative predictive value; AUC: area under ROC curve
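The threshold-based metrics in the table all derive from the four cells of a confusion matrix. The sketch below shows those definitions. The counts are hypothetical and not taken from the paper: they assume a balanced test set of 1,000 cases per class, with sensitivity and specificity set near the XGB row. AUC is not included because it requires the predicted scores rather than a single thresholded matrix.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute threshold-based metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical counts (assumption): 1,000 high-risk and 1,000 other cases.
counts = {"tp": 785, "fn": 215, "fp": 206, "tn": 794}
metrics = classification_metrics(**counts)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

PPV and NPV depend on the class balance of the evaluated set, so the same sensitivity and specificity would give a much lower PPV at the roughly 12% prevalence seen in the full sample (7,443 of 59,984).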
Fig 2. Receiver operating characteristic (ROC) curve.