Junggu Choi1, Seoyoung Cho1, Inhwan Ko2, Sanghoon Han1,2.
Abstract
Investigating suicide risk factors is critical from both socioeconomic and public health perspectives, and many researchers have tried to identify factors associated with suicide. In this study, the risk factors for suicidal ideation were compared, and the contributions of different factors to suicidal ideation and suicide attempts were investigated. To reflect the diverse characteristics of the population, the large-scale longitudinal dataset used in this study included both socioeconomic and clinical variables collected from the Korean public. Three machine learning algorithms (XGBoost classifier, support vector classifier, and logistic regression) were used to detect the risk factors for both suicidal ideation and suicide attempts. The importance of the variables was determined using the model with the best classification performance. In addition, a novel risk-factor score, calculated from the rank and importance score of each variable, was proposed. Socioeconomic and sociodemographic factors showed a high correlation with the risk of both ideation and attempts. In suicide attempts, which pose a relatively higher suicide risk than ideation, mental health variables ranked higher than other factors. These trends were further validated under both the integrated and yearly dataset conditions. This study provides novel insights into risk factors for suicidal ideation and suicide attempts.
Keywords: longitudinal survey dataset; machine learning algorithm; suicidal ideation; suicidal risk factor; suicide attempt
Year: 2021 PMID: 34886497 PMCID: PMC8657265 DOI: 10.3390/ijerph182312772
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Overview of the research scheme adopted in this study.
Categories of variables in the KNHANES dataset.
| No | Categories | Type of Variables |
|---|---|---|
| 1 | Health behavior | Categorical |
| 2 | Blood pressure measurement | Continuous |
| 3 | Blood test | Continuous |
| 4 | Grip strength test | Continuous |
| 5 | Dietary life survey | Categorical |
| 6 | Food safety investigation | Categorical |
| 7 | Food intake frequency survey | Categorical |
| 8 | Food intake survey | Continuous |
| 9 | Dietary life evaluation index | Continuous |
Baseline characteristics of the KNHANES dataset.
| Characteristic | | KNHANES |
|---|---|---|
| Age (years), mean (SD) | | 48.5 (18.0) |
| No. of participants (n) | | 78,796 |
| Gender, n (%) | Male | 34,230 (43.5%) |
| | Female | 44,566 (56.5%) |
| Height (cm), mean (SD) | | 156.1 (19.7) |
| Weight (kg), mean (SD) | | 57.1 (18.0) |
| BMI, mean (SD) | | 22.6 (4.2) |
Figure 2. Distributions of variables used in our study: (a) distribution of “age”, (b) distribution of “sex”, (c) distribution of “ainc”, (d) distribution of “incm”, and (e) distribution of “ho_incm”.
Dimensions and number of participants for datasets from 2007 to 2019.
| Year | Dimension | No. of Participants |
|---|---|---|
| 2007 | (2839, 49) | 2839 |
| 2008 | (6585, 49) | 6585 |
| 2009 | (7399, 49) | 7399 |
| 2010 | (6175, 49) | 6175 |
| 2011 | (5977, 49) | 5977 |
| 2012 | (6125, 49) | 6125 |
| 2013 | (5941, 49) | 5941 |
| 2014 | (5655, 49) | 5655 |
| 2015 | (5899, 49) | 5899 |
| 2016 | (6542, 49) | 6542 |
| 2017 | (6608, 49) | 6608 |
| 2018 | (6403, 49) | 6403 |
| 2019 | (6648, 49) | 6648 |
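As a quick consistency check, the yearly participant counts in the table above can be summed and compared with the 78,796 participants reported in the baseline characteristics. The sketch below is plain Python; the variable names are illustrative and do not come from the source.

```python
# Participants per survey year, copied from the table above.
participants_by_year = {
    2007: 2839, 2008: 6585, 2009: 7399, 2010: 6175, 2011: 5977,
    2012: 6125, 2013: 5941, 2014: 5655, 2015: 5899, 2016: 6542,
    2017: 6608, 2018: 6403, 2019: 6648,
}

# Total reported in the baseline-characteristics table.
reported_total = 78_796

total = sum(participants_by_year.values())
largest_year = max(participants_by_year, key=participants_by_year.get)
smallest_year = min(participants_by_year, key=participants_by_year.get)

print("sum of yearly counts:", total)
print("matches reported total:", total == reported_total)
print("largest / smallest survey year:", largest_year, smallest_year)
```

The yearly counts add up to the reported total, so the integrated dataset appears to be the plain concatenation of the 13 yearly datasets.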
Hyperparameters applied in the machine learning classifiers.
| Algorithm | Hyperparameter | Value |
|---|---|---|
| XGBoost classifier | Eta | 0.3 |
| | Gamma | 0 |
| | max_depth | 6 |
| | min_child_weight | 1 |
| Support vector classifier | Kernel | rbf |
| | Gamma | auto |
| Logistic regression | Penalty | L2 |
| | Solver | newton-cg |
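The settings in the table above could be written down as follows. This is a minimal sketch that assumes the scikit-learn-style interfaces of `xgboost` and `scikit-learn`; the source does not say which libraries or versions were used. `HYPERPARAMS` and `build_classifiers` are hypothetical names, and `learning_rate` is used as the scikit-learn-style alias of XGBoost's `eta`.

```python
# Hyperparameters copied from the table above, keyed by the keyword-argument
# names of the scikit-learn-style APIs (the choice of libraries is an assumption).
HYPERPARAMS = {
    "xgboost": {
        "learning_rate": 0.3,  # "Eta" in the table
        "gamma": 0,
        "max_depth": 6,
        "min_child_weight": 1,
    },
    "svc": {"kernel": "rbf", "gamma": "auto"},
    "logistic_regression": {"penalty": "l2", "solver": "newton-cg"},
}


def build_classifiers(params=HYPERPARAMS):
    """Instantiate the three classifiers; needs xgboost and scikit-learn installed."""
    # Imported lazily so the settings can be inspected without the libraries.
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from xgboost import XGBClassifier

    return {
        "xgboost": XGBClassifier(**params["xgboost"]),
        "svc": SVC(**params["svc"]),
        "logistic_regression": LogisticRegression(**params["logistic_regression"]),
    }
```

The XGBoost values listed here coincide with that library's documented defaults, so passing them explicitly mainly serves to pin the configuration across library versions.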
Classification performance results for classifiers with yearly datasets (2007–2013).
| Year | Dependent | Classifier | Precision | Recall | F1-Score | Accuracy | AUC 1 |
|---|---|---|---|---|---|---|---|
| 2007 | BP6_10 2 | XGBoost | 0.850 | 0.866 | 0.859 | 0.886 | 0.920 |
| | | SVC 4 | 0.846 | 0.605 | 0.600 | 0.856 | 0.843 |
| | | LR 5 | 0.707 | 0.711 | 0.727 | 0.816 | 0.859 |
| | BP6_31 3 | XGBoost | 0.883 | 0.935 | 0.911 | 0.935 | 0.958 |
| | | SVC | 0.868 | 0.530 | 0.523 | 0.658 | 0.656 |
| | | LR | 0.527 | 0.600 | 0.595 | 0.758 | 0.798 |
| 2008 | BP6_10 | XGBoost | 0.869 | 0.841 | 0.851 | 0.887 | 0.893 |
| | | SVC | 0.721 | 0.586 | 0.523 | 0.815 | 0.805 |
| | | LR | 0.704 | 0.758 | 0.706 | 0.784 | 0.829 |
| | BP6_31 | XGBoost | 0.893 | 0.941 | 0.911 | 0.938 | 0.955 |
| | | SVC | 0.471 | 0.506 | 0.485 | 0.459 | 0.649 |
| | | LR | 0.530 | 0.600 | 0.496 | 0.710 | 0.692 |
| 2009 | BP6_10 | XGBoost | 0.894 | 0.857 | 0.879 | 0.883 | 0.898 |
| | | SVC | 0.743 | 0.648 | 0.644 | 0.824 | 0.807 |
| | | LR | 0.688 | 0.737 | 0.708 | 0.783 | 0.828 |
| | BP6_31 | XGBoost | 0.908 | 0.933 | 0.913 | 0.932 | 0.958 |
| | | SVC | 0.549 | 0.537 | 0.484 | 0.419 | 0.731 |
| | | LR | 0.551 | 0.682 | 0.567 | 0.751 | 0.761 |
| 2010 | BP6_10 | XGBoost | 0.916 | 0.924 | 0.911 | 0.937 | 0.920 |
| | | SVC | 0.726 | 0.554 | 0.497 | 0.840 | 0.809 |
| | | LR | 0.676 | 0.757 | 0.698 | 0.806 | 0.833 |
| | BP6_31 | XGBoost | 0.906 | 0.937 | 0.913 | 0.937 | 0.963 |
| | | SVC | 0.667 | 0.540 | 0.527 | 0.814 | 0.725 |
| | | LR | 0.550 | 0.673 | 0.565 | 0.763 | 0.752 |
| 2011 | BP6_10 | XGBoost | 0.948 | 0.965 | 0.948 | 0.959 | 0.925 |
| | | SVC | 0.789 | 0.533 | 0.523 | 0.817 | 0.830 |
| | | LR | 0.685 | 0.782 | 0.709 | 0.818 | 0.848 |
| | BP6_31 | XGBoost | 0.913 | 0.945 | 0.925 | 0.942 | 0.956 |
| | | SVC | 0.469 | 0.503 | 0.487 | 0.728 | 0.637 |
| | | LR | 0.509 | 0.589 | 0.577 | 0.692 | 0.601 |
| 2012 | BP6_10 | XGBoost | 0.943 | 0.954 | 0.937 | 0.943 | 0.924 |
| | | SVC | 0.793 | 0.860 | 0.799 | 0.861 | 0.786 |
| | | LR | 0.857 | 0.791 | 0.801 | 0.790 | 0.819 |
| | BP6_31 | XGBoost | 0.919 | 0.951 | 0.932 | 0.951 | 0.965 |
| | | SVC | 0.476 | 0.500 | 0.488 | 0.865 | 0.710 |
| | | LR | 0.535 | 0.644 | 0.515 | 0.765 | 0.731 |
| 2013 | BP6_10 | XGBoost | 0.930 | 0.948 | 0.936 | 0.948 | 0.978 |
| | | SVC | 0.538 | 0.507 | 0.499 | 0.525 | 0.833 |
| | | LR | 0.586 | 0.778 | 0.602 | 0.833 | 0.842 |
| | BP6_31 | XGBoost | 0.985 | 0.992 | 0.988 | 0.992 | 0.988 |
| | | SVC | 0.496 | 0.500 | 0.498 | 0.604 | 0.852 |
| | | LR | 0.519 | 0.800 | 0.502 | 0.864 | 0.826 |
1 AUC: area under curve; 2 BP6_10: suicidal ideation in the previous year; 3 BP6_31: suicide attempts within the last year; 4 SVC: support vector classifier; 5 LR: logistic regression.
Classification performance results for classifiers with yearly datasets (2014–2019).
| Year | Dependent | Classifier | Precision | Recall | F1-Score | Accuracy | AUC 1 |
|---|---|---|---|---|---|---|---|
| 2014 | BP6_10 2 | XGBoost | 0.914 | 0.953 | 0.932 | 0.953 | 0.981 |
| | | SVC 4 | 0.475 | 0.500 | 0.487 | 0.516 | 0.865 |
| | | LR 5 | 0.587 | 0.749 | 0.599 | 0.839 | 0.828 |
| | BP6_31 3 | XGBoost | 0.960 | 0.980 | 0.969 | 0.980 | 0.977 |
| | | SVC | 0.490 | 0.500 | 0.495 | 0.785 | 0.687 |
| | | LR | 0.560 | 0.677 | 0.585 | 0.683 | 0.745 |
| 2015 | BP6_10 | XGBoost | 0.927 | 0.943 | 0.932 | 0.943 | 0.980 |
| | | SVC | 0.653 | 0.555 | 0.549 | 0.464 | 0.851 |
| | | LR | 0.616 | 0.817 | 0.646 | 0.858 | 0.870 |
| | BP6_31 | XGBoost | 0.983 | 0.991 | 0.987 | 0.991 | 0.989 |
| | | SVC | 0.496 | 0.500 | 0.498 | 0.535 | 0.742 |
| | | LR | 0.518 | 0.757 | 0.501 | 0.844 | 0.854 |
| 2016 | BP6_10 | XGBoost | 0.917 | 0.942 | 0.929 | 0.955 | 0.989 |
| | | SVC | 0.481 | 0.500 | 0.490 | 0.637 | 0.797 |
| | | LR | 0.593 | 0.743 | 0.600 | 0.746 | 0.888 |
| | BP6_31 | XGBoost | 0.988 | 0.994 | 0.991 | 0.994 | 0.990 |
| | | SVC | 0.497 | 0.500 | 0.498 | 0.884 | 0.789 |
| | | LR | 0.509 | 0.676 | 0.479 | 0.850 | 0.744 |
| 2017 | BP6_10 | XGBoost | 0.939 | 0.952 | 0.941 | 0.952 | 0.980 |
| | | SVC | 0.501 | 0.501 | 0.491 | 0.652 | 0.866 |
| | | LR | 0.605 | 0.811 | 0.634 | 0.861 | 0.881 |
| | BP6_31 | XGBoost | 0.988 | 0.994 | 0.991 | 0.994 | 0.990 |
| | | SVC | 0.497 | 0.500 | 0.498 | 0.395 | 0.862 |
| | | LR | 0.513 | 0.730 | 0.494 | 0.862 | 0.844 |
| 2018 | BP6_10 | XGBoost | 0.918 | 0.958 | 0.938 | 0.958 | 0.972 |
| | | SVC | 0.479 | 0.500 | 0.498 | 0.583 | 0.837 |
| | | LR | 0.560 | 0.725 | 0.564 | 0.838 | 0.829 |
| | BP6_31 | XGBoost | 0.984 | 0.992 | 0.988 | 0.992 | 0.990 |
| | | SVC | 0.497 | 0.500 | 0.498 | 0.836 | 0.876 |
| | | LR | 0.512 | 0.712 | 0.485 | 0.852 | 0.864 |
| 2019 | BP6_10 | XGBoost | 0.937 | 0.950 | 0.939 | 0.950 | 0.982 |
| | | SVC | 0.475 | 0.500 | 0.487 | 0.504 | 0.866 |
| | | LR | 0.599 | 0.805 | 0.624 | 0.851 | 0.879 |
| | BP6_31 | XGBoost | 0.991 | 0.995 | 0.993 | 0.995 | 0.990 |
| | | SVC | 0.498 | 0.500 | 0.499 | 0.549 | 0.806 |
| | | LR | 0.509 | 0.727 | 0.490 | 0.872 | 0.821 |
1 AUC: area under curve; 2 BP6_10: suicidal ideation in the previous year; 3 BP6_31: suicide attempts within the last year; 4 SVC: support vector classifier; 5 LR: logistic regression.
Classification performance results for classifiers with integrated datasets.
| Dependent | Classifier | Precision | Recall | F1-score | Accuracy | AUC 1 |
|---|---|---|---|---|---|---|
| BP6_10 2 | XGBoost | 0.874 | 0.893 | 0.878 | 0.893 | 0.950 |
| | SVC 4 | 0.442 | 0.500 | 0.470 | 0.885 | 0.811 |
| | LR 5 | 0.653 | 0.779 | 0.677 | 0.808 | 0.853 |
| BP6_31 3 | XGBoost | 0.977 | 0.986 | 0.981 | 0.986 | 0.990 |
| | SVC | 0.493 | 0.500 | 0.497 | 0.682 | 0.766 |
| | LR | 0.524 | 0.794 | 0.493 | 0.805 | 0.850 |
1 AUC: area under curve; 2 BP6_10: suicidal ideation in the previous year; 3 BP6_31: suicide attempts within the last year; 4 SVC: support vector classifier; 5 LR: logistic regression.
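The source does not say how precision, recall, and F1-score were averaged over the two classes. The sketch below assumes unweighted (macro) averaging, which is only one possible reading. `macro_metrics` and the counts are hypothetical: a classifier that labels every sample negative on 1000 samples with 115 positives. Under that assumption the output resembles the SVC row for BP6_10 above (precision 0.442, recall 0.500, accuracy 0.885).

```python
def macro_metrics(tp, fp, fn, tn):
    """Macro-averaged precision, recall, F1, plus accuracy, from binary counts."""

    def per_class(hit, false_alarm, miss):
        # Metrics for one class treated as the "positive" class;
        # undefined ratios (zero denominators) are reported as 0.0.
        precision = hit / (hit + false_alarm) if hit + false_alarm else 0.0
        recall = hit / (hit + miss) if hit + miss else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        return precision, recall, f1

    positive = per_class(tp, fp, fn)
    negative = per_class(tn, fn, fp)  # roles of the two error types swap
    precision, recall, f1 = ((p + n) / 2 for p, n in zip(positive, negative))
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


# Hypothetical counts: every sample is predicted negative.
precision, recall, f1, accuracy = macro_metrics(tp=0, fp=0, fn=115, tn=885)
print(precision, recall, f1, accuracy)
```

With these counts the macro precision is 0.4425, the macro recall is exactly 0.5, and the accuracy is 0.885. A recall of exactly 0.500 is what this averaging yields when one class is never predicted, which may be worth keeping in mind when reading the SVC rows.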
Important features and risk-factor score in the integrated dataset.
| Rank | DV | Variable | Description | Risk-Factor Score | DV | Variable | Description | Risk-Factor Score |
|---|---|---|---|---|---|---|---|---|
| 1 | BP6_10 1 | ainc | Average monthly income | 0.2023 | BP6_31 2 | ainc | Average monthly income | 0.2030 |
| 2 | | age | Age of participant | 0.1653 | | age | Age of participant | 0.1472 |
| 3 | | BD2 | Drinking age | 0.1088 | | BD2 | Drinking age | 0.1107 |
| 4 | | BP8 | Average sleep time per day | 0.0729 | | BP8 | Average sleep time per day | 0.0969 |
| 5 | | educ | Education level | 0.0518 | | educ | Education level | 0.0392 |
| 6 | | BO1 | Subjective body type recognition | 0.4209 | | BO1 | Subjective body type recognition | 0.0389 |
| 7 | | D_1_1 | Subjective health status | 0.0377 | | D_1_1 | Subjective health status | 0.0330 |
| 8 | | BP1 | Awareness of usual stress | 0.0301 | | BO1_1 | Weight change in past 1 year | 0.0292 |
| 9 | | BO1_1 | Weight change in past 1 year | 0.0277 | | BP1 | Awareness of usual stress | 0.0274 |
| 10 | | incm | Personal income | 0.0244 | | house | Home ownership | 0.0231 |
| 11 | | house | Home ownership | 0.0244 | | DF2_lt | Prevalence of depression | 0.0216 |
| 12 | | DF2_lt | Prevalence of depression | 0.0220 | | LQ_5EQL | EuroQoL: anxiety/depression | 0.0208 |
| 13 | | EC1_1 | Economic activity | 0.0202 | | incm | Personal income | 0.0208 |
| 14 | | LQ_4EQL | EuroQoL: pain/discomfort | 0.0188 | | LQ_4EQL | EuroQoL: pain/discomfort | 0.0190 |
| 15 | | ho_incm | Household income | 0.0178 | | BP5 | Depression for 2 weeks | 0.0189 |
| 16 | | D_2_1 | Uncomfortable experience | 0.0167 | | EC1_1 | Economic activity | 0.0171 |
| 17 | | sex | Sex of participant | 0.0166 | | sex | Sex of participant | 0.0163 |
| 18 | | LQ_5EQL | EuroQoL: anxiety/depression | 0.0149 | | D_2_1 | Uncomfortable experience | 0.0140 |
| 19 | | BP5 | Depression for 2 weeks | 0.0142 | | ho_incm | Household income | 0.0140 |
| 20 | | LQ_1EQL | EuroQoL: athletic ability | 0.0111 | | LQ_3EQL | EuroQoL: daily activity | 0.0118 |
1 BP6_10: suicide ideation within the last year; 2 BP6_31: suicide attempts within the last year.
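One way to read the integrated-dataset table above is to compare each variable's rank for ideation (BP6_10) with its rank for attempts (BP6_31). The sketch below uses ranks copied from that table. The grouping into "income" and "mental_health" variables and the names `rank_shift` and `mean_shift` are illustrative assumptions, not the source's taxonomy or method.

```python
# Ranks copied from the integrated-dataset table above (1 = most important).
ideation_rank = {"ainc": 1, "incm": 10, "ho_incm": 15,
                 "DF2_lt": 12, "LQ_5EQL": 18, "BP5": 19}
attempt_rank = {"ainc": 1, "incm": 13, "ho_incm": 19,
                "DF2_lt": 11, "LQ_5EQL": 12, "BP5": 15}

# Illustrative grouping of variables (an assumption made for this sketch).
groups = {
    "income": ["ainc", "incm", "ho_incm"],
    "mental_health": ["DF2_lt", "LQ_5EQL", "BP5"],
}


def rank_shift(variable):
    """Positive: the variable ranks higher (closer to 1) for attempts."""
    return ideation_rank[variable] - attempt_rank[variable]


# Average shift per group.
mean_shift = {
    name: sum(rank_shift(v) for v in members) / len(members)
    for name, members in groups.items()
}

for name, shift in mean_shift.items():
    print(f"{name}: mean rank shift {shift:+.2f}")
```

For these six variables, the depression-related items move up for attempts (for example, LQ_5EQL goes from rank 18 to rank 12), while personal and household income move down. This matches the direction of the trend described in the abstract, although only a subset of the 20 listed variables is considered here.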
Important features and risk-factor score in the yearly dataset condition (2009).
| Rank | DV | Variable | Description | Risk-Factor Score | DV | Variable | Description | Risk-Factor Score |
|---|---|---|---|---|---|---|---|---|
| 1 | BP6_10 1 | ainc | Average monthly income | 0.1700 | BP6_31 2 | age | Age of participant | 0.2000 |
| 2 | | age | Age of participant | 0.1651 | | ainc | Average monthly income | 0.1545 |
| 3 | | LQ_VAS | EuroQoL: total score | 0.1168 | | LQ_VAS | EuroQoL: total score | 0.1241 |
| 4 | | BD2 | Drinking age | 0.1045 | | BD2 | Drinking age | 0.1049 |
| 5 | | BP8 | Average sleep time per day | 0.0516 | | BP8 | Average sleep time per day | 0.0337 |
| 6 | | educ | Education level | 0.0447 | | educ | Education level | 0.0299 |
| 7 | | BO1 | Subjective body type recognition | 0.0375 | | BO1_1 | Weight change in past 1 year | 0.0297 |
| 8 | | D_1_1 | Subjective health status | 0.0350 | | BO1 | Subjective body type recognition | 0.0297 |
| 9 | | BO1_1 | Weight change in past 1 year | 0.0310 | | DF2_lt | Prevalence of depression | 0.0278 |
| 10 | | BP1 | Awareness of usual stress | 0.0288 | | incm | Personal income | 0.0248 |
| 11 | | incm | Personal income | 0.0266 | | ho_incm | Household income | 0.0231 |
| 12 | | house | Home ownership | 0.0228 | | LQ_5EQL | EuroQoL: anxiety/depression | 0.0210 |
| 13 | | EC1_1 | Economic activity | 0.0203 | | house | Home ownership | 0.0202 |
| 14 | | ho_incm | Household income | 0.0184 | | BP5 | Depression for 2 weeks | 0.0168 |
| 15 | | sex | Sex of participant | 0.0147 | | EC1_1 | Economic activity | 0.0161 |
| 16 | | LQ_4EQL | EuroQoL: pain/discomfort | 0.0136 | | D_1_1 | Subjective health status | 0.0159 |
| 17 | | BP5 | Depression for 2 weeks | 0.0135 | | LQ4_22 | Activity restriction: old age | 0.0145 |
| 18 | | D_2_1 | Uncomfortable experience | 0.0127 | | D_2_1 | Uncomfortable experience | 0.0134 |
| 19 | | DF2_lt | Prevalence of depression | 0.0114 | | LQ_4EQL | EuroQoL: pain/discomfort | 0.0125 |
| 20 | | LQ_1EQL | EuroQoL: athletic ability | 0.0085 | | LQ_1EQL | EuroQoL: athletic ability | 0.0097 |
1 BP6_10: suicide ideation within the last year; 2 BP6_31: suicide attempts within the last year.
Important features and risk-factor score in the yearly dataset condition (2010).
| Rank | DV | Variable | Description | Risk-Factor Score | DV | Variable | Description | Risk-Factor Score |
|---|---|---|---|---|---|---|---|---|
| 1 | BP6_10 1 | ainc | Average monthly income | 0.2007 | BP6_31 2 | ainc | Average monthly income | 0.2158 |
| 2 | | age | Age of participant | 0.1594 | | age | Age of participant | 0.1703 |
| 3 | | LQ_VAS | EuroQoL: total score | 0.1232 | | LQ_VAS | EuroQoL: total score | 0.1505 |
| 4 | | BD2 | Drinking age | 0.1014 | | BD2 | Drinking age | 0.1420 |
| 5 | | BP8 | Average sleep time per day | 0.0486 | | BP8 | Average sleep time per day | 0.0342 |
| 6 | | educ | Education level | 0.0442 | | D_1_1 | Subjective health status | 0.0223 |
| 7 | | BO1 | Subjective body type recognition | 0.0372 | | BO1 | Subjective body type recognition | 0.0197 |
| 8 | | D_1_1 | Subjective health status | 0.0305 | | educ | Education level | 0.0196 |
| 9 | | BP1 | Awareness of usual stress | 0.0258 | | LQ4_08 | Activity restriction: high blood pressure | 0.0186 |
| 10 | | BO1_1 | Weight change in past 1 year | 0.0248 | | BO1_1 | Weight change in past 1 year | 0.0185 |
| 11 | | house | Home ownership | 0.0222 | | BP1 | Awareness of usual stress | 0.0184 |
| 12 | | incm | Personal income | 0.0212 | | incm | Personal income | 0.0163 |
| 13 | | EC1_1 | Economic activity | 0.0206 | | LQ_5EQL | EuroQoL: anxiety/depression | 0.0156 |
| 14 | | LQ4_08 | Activity restriction: high blood pressure | 0.0175 | | DF2_lt | Prevalence of depression | 0.0145 |
| 15 | | D_2_1 | Uncomfortable experience in past 2 weeks | 0.0156 | | LQ4_10 | Activity restriction: cancer | 0.0136 |
| 16 | | LQ_4EQL | EuroQoL: pain/discomfort | 0.0156 | | sex | Sex of participant | 0.0133 |
| 17 | | ho_incm | Household income | 0.0143 | | house | Home ownership | 0.0119 |
| 18 | | sex | Sex of participant | 0.0140 | | BP5 | Depression for 2 weeks or more | 0.0119 |
| 19 | | BP5 | Depression for 2 weeks or more | 0.0140 | | EC1_1 | Economic activity | 0.0102 |
| 20 | | DF2_lt | Prevalence of depression | 0.0114 | | LQ4_01 | Activity restriction: fracture/joint injury | 0.0098 |
1 BP6_10: suicide ideation within the last year; 2 BP6_31: suicide attempts within the last year.
Important features and risk-factor score in the yearly dataset condition (2011).
| Rank | DV | Variable | Description | Risk-Factor Score | DV | Variable | Description | Risk-Factor Score |
|---|---|---|---|---|---|---|---|---|
| 1 | BP6_10 1 | ainc | Average monthly income | 0.2107 | BP6_31 2 | ainc | Average monthly income | 0.2338 |
| 2 | | age | Age of participant | 0.1574 | | age | Age of participant | 0.2127 |
| 3 | | LQ_VAS | EuroQoL: total score | 0.1208 | | LQ_VAS | EuroQoL: total score | 0.1690 |
| 4 | | BD2 | Drinking age | 0.1085 | | BD2 | Drinking age | 0.0694 |
| 5 | | BP8 | Average sleep time per day | 0.0482 | | ho_incm | Household income | 0.0412 |
| 6 | | educ | Education level | 0.0400 | | educ | Education level | 0.0281 |
| 7 | | BO1 | Subjective body type recognition | 0.0336 | | BP8 | Average sleep time per day | 0.0261 |
| 8 | | D_1_1 | Subjective health status | 0.0270 | | BO1 | Subjective body type recognition | 0.0203 |
| 9 | | BP1 | Awareness of usual stress | 0.0264 | | BO1_1 | Weight change in past 1 year | 0.0185 |
| 10 | | BO1_1 | Weight change in past 1 year | 0.0245 | | DF2_lt | Prevalence of depression | 0.0169 |
| 11 | | incm | Personal income | 0.0244 | | EC1_1 | Economic activity | 0.0163 |
| 12 | | EC1_1 | Economic activity | 0.0207 | | D_1_1 | Subjective health status | 0.0152 |
| 13 | | house | Home ownership | 0.0168 | | BP1 | Awareness of usual stress | 0.0152 |
| 14 | | sex | Sex of participant | 0.0168 | | LQ_5EQL | EuroQoL: anxiety/depression | 0.0136 |
| 15 | | BP5 | Depression for 2 weeks | 0.0159 | | D_2_1 | Uncomfortable experience | 0.0117 |
| 16 | | ho_incm | Household income | 0.0155 | | BP5 | Depression for 2 weeks | 0.0116 |
| 17 | | D_2_1 | Uncomfortable experience | 0.0143 | | incm | Personal income | 0.0094 |
| 18 | | LQ_4EQL | EuroQoL: pain/discomfort | 0.0143 | | LQ1_sb | Lying in a sickbed in past 1 month | 0.0083 |
| 19 | | DF2_lt | Prevalence of depression | 0.0111 | | LQ_4EQL | EuroQoL: pain/discomfort | 0.0082 |
| 20 | | LQ_5EQL | EuroQoL: anxiety/depression | 0.0094 | | LQ4_05 | Activity restriction: | 0.0077 |
1 BP6_10: suicide ideation within the last year; 2 BP6_31: suicide attempts within the last year.