
Use of a Machine Learning Algorithm to Predict Individuals with Suicide Ideation in the General Population.

Seunghyong Ryu¹, Hyeongrae Lee¹, Dong-Kyun Lee¹, Kyeongwoo Park¹.

Abstract

OBJECTIVE: In this study, we aimed to develop a model predicting individuals with suicide ideation within a general population using a machine learning algorithm.
METHODS: Among 35,116 individuals aged over 19 years from the Korea National Health & Nutrition Examination Survey, we selected 11,628 individuals via random down-sampling. This included 5,814 suicide ideators and the same number of non-suicide ideators. We randomly assigned the subjects to a training set (n=10,466) and a test set (n=1,162). In the training set, a random forest model was trained with 15 features selected with recursive feature elimination via 10-fold cross validation. Subsequently, the fitted model was used to predict suicide ideators in the test set and among the total of 35,116 subjects. All analyses were conducted in R.
RESULTS: The prediction model achieved a good performance [area under receiver operating characteristic curve (AUC)=0.85] in the test set and predicted suicide ideators among the total samples with an accuracy of 0.821, sensitivity of 0.836, and specificity of 0.807.
CONCLUSION: This study shows the possibility that a machine learning approach can enable screening for suicide risk in the general population. Further work is warranted to increase the accuracy of prediction.


Keywords: Machine learning algorithm; Prediction; Public health data; Suicide ideation

Year: 2018 PMID: 30301301 PMCID: PMC6258996 DOI: 10.30773/pi.2018.08.27

Source DB: PubMed Journal: Psychiatry Investig ISSN: 1738-3684 Impact factor: 2.505

INTRODUCTION

In Korea, suicide has become the fifth leading cause of death following cancer, stroke, cardiovascular disease, and pneumonia [1]. The suicide rate in Korea is the highest among the Organization for Economic Cooperation and Development (OECD) countries, with up to 40 people taking their own life every day [2]. This high suicide rate is significantly linked to the avoidance of psychiatric treatment due to social stigma associated with mental illness in the country [3]. Many studies have indicated that the majority of suicide completers have diagnosable psychiatric illnesses, such as depression and alcohol use disorder [4,5]. However, in Korea, only one fourth of suicide completers had seen a psychiatrist before taking their own life, but a greater number had visited a general physician or a Korean medicine doctor to address symptoms of indigestion or insomnia [6]. This underlines the importance of suicide prevention strategies in community-based settings, including gatekeeper training, screening programs, public education, and restricting access to lethal means [7]. It is estimated that 15.4% of Koreans have thought about suicide at some point in their lives, with 2.9% reporting having engaged in suicide ideation in the previous year [8]. The lifetime prevalence of suicide ideation in Korea is considerably higher than the cross-national lifetime prevalence of suicide ideation (9.2%) [9]. Suicide ideation is regarded as a major predictor of committing suicide and therefore assessing suicide ideation is an important step in suicide prevention strategies. Even though individuals who think about suicide do not all subsequently commit suicide, people experiencing persistent and severe suicide ideation are at increased risk of attempting suicide [10,11]. According to a cross-national study, 60% of transitions from suicide ideation to suicide plan and attempt occur within the first year after onset of suicide ideation [9]. 
In addition, suicide ideation has been found to be associated with clinically significant symptoms of mental illnesses such as depression and bipolar disorder [12,13]. There are several known socio-demographic, physical, and psychological factors influencing suicide ideation and behaviors [14]. Predicting which individuals are at high risk of suicide by screening risk factors at the population level might be an effective approach to reduce suicide rates [15]. Such an approach requires analytic techniques that can classify individuals at high risk by integrating multiple risk factors. Recently, several studies have applied machine learning to medical and healthcare big data for disease diagnosis, treatment, and prevention [16]. Machine learning is a branch of artificial intelligence in which a computer generates rules underlying or based on raw data. We expected that machine learning analysis of public health data also could be used to predict individuals at high risk of suicide in the general population. In this study, we aimed to develop a model predicting individuals with suicide ideation in the general population of Korea by using a machine learning algorithm.

METHODS

Study population

This study was performed with data from the Korea National Health and Nutrition Examination Survey (KNHANES), which was conducted between 2007 and 2012 (total n=50,405). The KNHANES is a nationwide survey of the health and nutritional status of non-institutionalized civilians in Korea, and is conducted every year by the Korea Center for Disease Control and Prevention [17]. Each year, the survey uses a stratified and multistage probability sampling design to include a new sample of about 8,000 individuals. All KNHANES participants provide written consent to participate in the survey and for their personal data to be used. Among the 38,005 individuals aged over 19 years, 35,116 subjects answered the following survey question about suicide ideation: “During the past year, have you ever felt that you were willing to die?” Among the 35,116 respondents, 5,814 (16.6%) reported experiencing suicide ideation (suicide ideators), while the remaining 29,302 respondents (83.4%) denied any suicide ideation (non-suicide ideators) (Table 1).

Table 1.

Characteristics of suicide ideators (N=5,814) and non-suicide ideators (N=29,302)

	Suicide ideator[*]	Non-suicide ideator[*]	Statistics[†]
Age, years	54.13 (17.73)	49.00 (16.26)	T=20.43, p<0.01
Sex			χ²=553.10, p<0.01
Male	1,654 (28.4)	13,225 (45.1)
Female	4,160 (71.6)	16,077 (54.9)
Education			χ²=1,345.13, p<0.01
Village school	41 (0.7)	72 (0.2)
Uneducated	834 (14.4)	1,451 (5.0)
Elementary school	1,585 (27.4)	4,896 (16.7)
Middle school	673 (11.6)	3,419 (11.7)
High school	1,342 (23.2)	8,587 (29.4)
Two- or three-year college	471 (8.1)	3,497 (12.0)
Four-year university	738 (12.7)	6,221 (21.3)
Graduate school	107 (1.8)	1,108 (3.8)
Reasons for unemployment			χ²=1,296.16, p<0.01
Do not feel the need	297 (5.1)	2,054 (7.0)
Schooling	119 (2.1)	774 (2.6)
Retired	83 (1.4)	846 (2.9)
Having health problems	1,471 (25.4)	2,674 (9.2)
Looking for a job	350 (6.1)	1,514 (5.2)
Parenting or nursing	507 (8.8)	2,818 (9.6)
etc.	171 (3.0)	755 (2.6)
Employed	2,787 (48.2)	17,773 (60.8)
Average work week, hours	24.28 (26.86)	29.73 (25.88)	T=-14.15, p<0.01
Subjective health status			χ²=2,340.76, p<0.01
Very good	126 (2.2)	1,432 (4.9)
Good	1,154 (19.9)	10,041 (34.3)
Fair	1,982 (34.2)	12,571 (43.0)
Poor	1,857 (32.0)	4,552 (15.6)
Very poor	679 (11.7)	657 (2.2)
Days of feeling sick or discomfort, days	4.46 (6.08)	1.91 (4.39)	T=30.43, p<0.01
Limitation of daily life and social activities			χ²=1,585.42, p<0.01
Yes	1,858 (32.1)	3,399 (11.6)
No	3,937 (67.9)	25,854 (88.4)
EQ-5D: mobility			χ²=1,574.23, p<0.01
No problems	3,760 (64.9)	25,099 (85.8)
Some problems	1,889 (32.6)	4,046 (13.8)
Confined to bed	148 (2.6)	110 (0.4)
EQ-5D: usual activities			χ²=1,910.65, p<0.01
No problems	4,191 (72.3)	26,825 (91.7)
Some problems	1,325 (22.9)	2,225 (7.6)
Unable to perform	278 (4.8)	203 (0.7)
EQ-5D: pain/discomfort			χ²=1,812.66, p<0.01
No	3,148 (54.3)	22,789 (77.9)
Moderate	2,058 (35.5)	5,862 (20.0)
Extreme	590 (10.2)	603 (2.1)
EQ-5D: anxiety/depression			χ²=3,746.10, p<0.01
No	3,647 (62.9)	26,866 (91.8)
Moderate	1,887 (32.6)	2,280 (7.8)
Extreme	262 (4.5)	109 (0.4)
EQ-VAS	63.76 (21.81)	75.03 (16.55)	T=-37.125, p<0.01
Depressed mood over 2 weeks			χ²=6,316.11, p<0.01
Yes	2,802 (48.2)	2,321 (7.9)
No	3,011 (51.8)	26,980 (92.1)
Stress level in daily life			χ²=3,295.15, p<0.01
Extremely	837 (14.4)	844 (2.9)
Stressful	2,429 (41.8)	5,524 (18.9)
Moderately	2,085 (35.9)	17,541 (59.9)
Minimally	457 (7.9)	5,389 (18.4)

[*] N (%) or mean (SD),

[†] chi-square test or independent t-test.

EQ-5D: EuroQoL-5D, EQ-VAS: EuroQoL-Visual Analogue Scale

The institutional review board of the National Center for Mental Health approved the protocol of this study (IRB approval number: 116271-2018-36).

Set assignment

Inputting all the data into the classifier to build the learning model will usually lead to a learning bias towards the majority class of non-suicide ideators (known as the “class imbalance problem”) [18]. Therefore, to create two classes of the same size, we randomly selected 5,814 individuals from the 29,302 non-suicide ideators via down-sampling. Thus, 11,628 individuals (5,814 suicide ideators and the same number of non-suicide ideators) were finally included in this study. We assigned the 11,628 subjects to a training set (n=10,466) and a test set (n=1,162), preserving the ratio of 1:1 between the two classes.
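The set assignment described above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the authors' R code: the function name `downsample_and_split`, the 10% test fraction, and the seed are assumptions chosen so that the resulting set sizes match the reported ones.

```python
import random

def downsample_and_split(labels, test_fraction=0.1, seed=0):
    """Balance two classes by random down-sampling, then make a stratified split.

    labels: list of 0/1 class labels, one per subject (1 = suicide ideator).
    Returns (train_idx, test_idx) as lists of indices into `labels`.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]

    # Down-sample the larger class to the size of the smaller one.
    # rng.sample returns the chosen indices in random order.
    n = min(len(pos), len(neg))
    pos = rng.sample(pos, n)
    neg = rng.sample(neg, n)

    # Hold out the same number from each class so both sets stay 1:1.
    n_test = round(n * test_fraction)
    train_idx = pos[n_test:] + neg[n_test:]
    test_idx = pos[:n_test] + neg[:n_test]
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return train_idx, test_idx

# Synthetic labels with the class counts reported above.
labels = [1] * 5814 + [0] * 29302
train_idx, test_idx = downsample_and_split(labels)
print(len(train_idx), len(test_idx))  # 10466 1162
```

Holding out the same number of subjects from each class is what keeps the 1:1 ratio in both the training set and the test set.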

Data preprocessing and feature selection

We manually selected 47 variables that were likely to be related to suicide risk. Subsequently, we imputed missing data with the Multiple Imputation by Chained Equations (MICE) method and numeric data were normalized by z-scoring. To select the smallest subset of features that most accurately classifies suicide ideators, we performed recursive feature elimination with a random forest on the training set. We observed that a model trained with 39 features achieved the highest value of Kappa. However, to reduce the dimensionality as much as possible, we decided to use a simpler model trained with the last 15 features in the backward selection, for which the Kappa was not much lower than that of the 39-feature model (Figure 1). The 15 selected features, in order of importance, were as follows: “depressed mood over two weeks,” “stress level in daily life,” “EuroQoL-5D (EQ-5D): anxiety/depression,” “EuroQoL-Visual Analogue Scale (EQ-VAS),” “sex,” “education,” “subjective health status,” “age,” “EQ-5D: mobility,” “reasons for unemployment,” “EQ-5D: pain/discomfort,” “days of feeling sick or discomfort,” “EQ-5D: usual activities,” “average work week,” and “limitation of daily life and social activities.”
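Two quantities in this step, z-scoring and Cohen's Kappa, are simple enough to write out. The sketch below is illustrative Python rather than the authors' R code. It assumes the sample standard deviation for z-scoring (the source does not say which form was used), and the working-hours values are hypothetical. The Kappa example uses the test-set counts reported in Table 2, so it is not the cross-validated Kappa plotted in Figure 1.

```python
from statistics import mean, stdev

def z_score(values):
    """Standardize a numeric feature to mean 0 and sample SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def cohen_kappa(tp, tn, fp, fn):
    """Cohen's Kappa for a 2x2 confusion matrix: agreement beyond chance."""
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predicted and actual classes.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected)

hours = [0, 20, 40, 40, 60]  # hypothetical weekly working hours
print([round(z, 2) for z in z_score(hours)])

# Test-set counts from Table 2.
print(round(cohen_kappa(tp=448, tn=460, fp=121, fn=133), 3))  # 0.563
```

Because the test set is balanced, chance agreement is exactly 0.5 here, so Kappa is simply twice the accuracy in excess of 0.5.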

Figure 1.

A plot of recursive feature elimination with feature selection in the training set.

Machine learning analysis

For the machine learning algorithm, we utilized a random forest model, which is based on ensembles of classification trees. The random forest approach builds numerous trees in bootstrapped samples and aggregates the predictions of the individual trees. For model development, 10-fold cross validation was used to avoid overfitting and to increase the generalization of the model. In the 10-fold cross validation, data in the training set are partitioned into 10 equally sized folds and each fold is used once as a validation set while the other 9 folds are used for training (Figure 2). In addition, we performed hyperparameter optimization using the grid search method. Subsequently, the fitted model was used to predict the classes in the test set and the predicted classes were compared with the actual classes. The model’s performance in predicting the classes was evaluated by using the area under receiver operating characteristic (ROC) curve (AUC). We calculated the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value from the confusion matrix. To verify the model’s performance in a real population, we applied the model to the sample of 35,116 subjects who were aged over 19 years and had answered the question about suicide ideation in the KNHANES.
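The fold partitioning and grid search described above can be sketched as follows. This is an illustrative Python outline, not the caret implementation used in the study. `k_fold_indices`, `grid_search`, and the `fit_and_score` callback are hypothetical names, and the source does not state which hyperparameters or candidate values were searched.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]

def grid_search(candidates, n_samples, fit_and_score, k=10):
    """Return the candidate with the best mean validation score.

    fit_and_score(candidate, train_idx, val_idx) is a user-supplied
    (hypothetical) callback that fits a model and returns its score.
    """
    def mean_cv_score(candidate):
        scores = [fit_and_score(candidate, tr, va)
                  for tr, va in k_fold_indices(n_samples, k)]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_cv_score)

splits = list(k_fold_indices(10466, k=10))
print(len(splits), sorted({len(val) for _, val in splits}))  # 10 [1046, 1047]

# Placeholder scorer that simply prefers the value 4; a real one would fit a model.
best = grid_search([2, 4, 8], 200, lambda c, tr, va: -abs(c - 4))
print(best)  # 4
```

With 10,466 training subjects the folds cannot be exactly equal, so in this sketch six folds hold 1,047 subjects and four hold 1,046.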

Figure 2.

Scheme of prediction model development.

All analyses were conducted in R version 3.4.3 (https://www.r-project.org/) and its packages, including “mice” for imputation of missing data and “caret” for down-sampling, feature selection, and cross validation.

RESULTS

The random forest model trained with 15 features showed a good performance (AUC=0.85), comparable to that of the model trained with 39 features, in predicting suicide ideators (Figure 3). The confusion matrices are presented in Table 2. In the test set, the 15-feature model predicted 448 subjects as suicide ideators from among the 581 actual suicide ideators. Meanwhile, among the 581 non-suicide ideators, 460 were classified correctly. Therefore, the model achieved an accuracy of 0.781, sensitivity of 0.771, specificity of 0.792, positive predictive value of 0.787, and negative predictive value of 0.776. When applying the model to the total of 35,116 subjects, the 15-feature model predicted suicide ideators with an accuracy of 0.821, sensitivity of 0.836, specificity of 0.807, positive predictive value of 0.462, and negative predictive value of 0.961.
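The gap between the positive predictive values in the balanced test set and in the total sample follows from Bayes' rule, since predictive values depend on prevalence as well as on sensitivity and specificity. The sketch below is illustrative Python, and `predictive_values` is a hypothetical helper. It plugs in the rounded whole-sample sensitivity and specificity reported above, so its results agree with the tabulated values only up to rounding.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by Bayes' rule for a given prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# Balanced 1:1 sample versus the 16.6% prevalence in the total sample.
for prevalence in (0.5, 0.166):
    ppv, npv = predictive_values(0.836, 0.807, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At a prevalence of 16.6% the positive predictive value comes out at roughly 0.46, against roughly 0.81 for a balanced sample with the same sensitivity and specificity.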

Figure 3.

Receiver operating characteristic (ROC) curves. *15-feature model, †39-feature model. AUC: Area under ROC curve.
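For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen suicide ideator receives a higher predicted score than a randomly chosen non-ideator. The sketch below is illustrative Python that computes it directly from that definition. The six labels and scores are hypothetical and are not data from this study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half. This pairwise form is fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities for six subjects (1 = suicide ideator).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))  # 0.889
```

In this toy example eight of the nine ideator and non-ideator pairs are ranked correctly, giving 8/9.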

Table 2.

Confusion matrix and prediction scores

	Test set (N=1,162)	Entire population (N=35,116)
True positive	448	4,860
True negative	460	23,641
False positive	121	5,661
False negative	133	954
Accuracy	0.781	0.821
Sensitivity	0.771	0.836
Specificity	0.792	0.807
Positive predictive value	0.787	0.462
Negative predictive value	0.776	0.961
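The prediction scores in Table 2 can be recomputed from the four cell counts. The following is a small illustrative Python sketch, with `confusion_metrics` as a hypothetical helper name.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard scores derived from a 2x2 confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Cell counts taken from Table 2.
test_set = confusion_metrics(tp=448, tn=460, fp=121, fn=133)
entire = confusion_metrics(tp=4860, tn=23641, fp=5661, fn=954)
for name in test_set:
    print(f"{name:12s} {test_set[name]:.3f} {entire[name]:.3f}")
```

Recomputed this way, the test-set scores and the whole-sample sensitivity, specificity, and predictive values agree with Table 2. The whole-sample accuracy works out to about 0.812 from the tabulated counts, slightly below the reported 0.821.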

DISCUSSION

In this study, we applied a machine learning algorithm to public health data to develop a model predicting individuals with suicide ideation in the general population. When predicting suicide ideators in the test set, the machine learning model showed a good performance (AUC=0.85) with an accuracy of 78.1%. Moreover, we identified that the model could predict suicide ideators among the total population of about 35,000 with an accuracy of 82%. The predictive ability of the machine learning model is comparable to that of suicide risk assessment tools used in the clinical setting [19,20]. Some studies have been performed to predict suicide risk in clinical settings by using machine learning approaches. Passos et al. [21] distinguished suicide attempters from non-suicide attempters among patients with mood disorders with an accuracy of 65–72%, using machine learning algorithms based on demographic and clinical data. Oh et al. [22] classified individuals with a history of suicide attempts among patients with depression or anxiety disorders by applying artificial neural networks to multiple psychiatric scales and sociodemographic data with an accuracy of 87–91%. Moreover, a recent study investigated the probability of death by suicide using general characteristics and insurance data from the National Health Insurance Service cohort in Korea, showing fair performance (AUC=0.68) of machine learning models in predicting death by suicide [23]. In the present study, we intended to develop a machine learning model predicting suicide risk in the general population. To this end, we analyzed big data from annual nationwide surveys on health and nutrition status in the general population. To ensure the prediction model could learn more information, we chose suicide ideation, rather than rarer suicide attempt, as an indicator of potential suicide risk.
This study showed that machine learning based prediction models can successfully classify suicide ideators among the general population by using simple information about physical and mental health status, as well as demographic characteristics. As the number of features grows, the amount of data we need to generalize accurately grows exponentially [24]. Therefore, to avoid the so-called “curse of dimensionality” and to increase the generalization of our model, we selected as few features as possible via feature selection to train the prediction model for suicide ideators. In the training set, the model trained with 39 features showed the highest value of Kappa. However, we chose the simpler model trained with 15 features for which performance was not worse than that of the 39-feature model. We expected that the simpler model would enable easier interpretation of the results and application to other new population data. In this study, we used variables related to mental and physical health, as well as demographic characteristics, as features classifying suicide ideators. In our prediction model, depression, anxiety, and stress were the most important features predicting suicide ideators. According to a recent survey of mental disorder in Korea, about 40% of suicide ideators were found to experience mood or anxiety disorders [8]. Several studies have suggested that academic, work, and life event stresses are associated with suicide ideation [25,26]. Physical factors such as having somatic symptoms or medical illnesses also can be important features that distinguish suicide ideators [27]. Indeed, suicide ideation is often accompanied by somatic symptoms in patients with depression [28]. Moreover, the burden of physical health conditions itself is a major risk factor for suicide [29]. In relation to both mental and physical health, the score for quality of life (“EQ-VAS”) played an important role in classifying suicide ideators in our prediction model. 
Among sociodemographic factors, “sex,” “education,” “age,” “reasons for unemployment,” and “average work week” were included in a set of 15 features for the prediction model. It is known that there are age and gender differences in factors associated with suicide ideation and behaviors [30,31]. Furthermore, some studies have reported an association between educational level and suicide risk [32]. There is also evidence that working-related factors may be related to suicide outcomes [33,34].
This study is subject to some methodological limitations. First, data from the KNHANES included information about suicide ideation and psychological status that was examined by using very simple questions and scales, which might affect the performance of the prediction model. Second, the 1-year prevalence of suicide ideation in this study (16.6%) was much higher than that of an epidemiological survey of mental disorders in Korea in 2016 (2.9%) [8]. This is because the definition of suicide ideation in the KNHANES included mild, fleeting forms. Third, when applying the model to the total sample, the positive predictive value remained at 46.2%. This was due to a low ratio of suicide ideators among the total subjects. Fourth, we used only one machine learning algorithm, a random forest model. Additional analyses are warranted to compare the performance of prediction models with other machine learning algorithms, such as support vector machines and artificial neural networks.
In conclusion, this study showed that a machine learning model based on public health data can successfully predict individuals with suicide ideation among the general population. Further studies are needed to apply machine learning techniques to public health data, clinical data, and biomarkers to develop prediction models of more critical suicide risk such as self-harm and suicide attempt.

30 in total

1. A learning method for the class imbalance problem with medical data sets.

Authors: Der-Chiang Li; Chiao-Wen Liu; Susan C Hu
Journal: Comput Biol Med Date: 2010-03-26 Impact factor: 4.589

2. Is suicidal ideation linked to working hours and shift work in Korea?

Authors: Chang-Gyo Yoon; Kyu-Jung Bae; Mo-Yeol Kang; Jin-Ha Yoon
Journal: J Occup Health Date: 2015-03-06 Impact factor: 2.708

3. Suicidal ideation and suicide attempts in general medical illnesses.

Authors: B Druss; H Pincus
Journal: Arch Intern Med Date: 2000-05-22

4. Age and sex-related differences in risk factors for elderly suicide: Differentiating between suicide ideation and attempts.

Authors: Hyuk Lee; Ki Ho Seol; Jun Won Kim
Journal: Int J Geriatr Psychiatry Date: 2017-10-02 Impact factor: 3.485

5. Lifetime prevalence and correlates of suicidal ideation, plan, and single and multiple attempts in a Korean nationwide study.

Authors: Hong Jin Jeon; Jun-Young Lee; Young Moon Lee; Jin Pyo Hong; Seung-Hee Won; Seong-Jin Cho; Jin-Yeong Kim; Sung Man Chang; Dongsoo Lee; Hae Woo Lee; Maeng Je Cho
Journal: J Nerv Ment Dis Date: 2010-09 Impact factor: 2.254

6. Predicting the Future - Big Data, Machine Learning, and Clinical Medicine.

Authors: Ziad Obermeyer; Ezekiel J Emanuel
Journal: N Engl J Med Date: 2016-09-29 Impact factor: 91.245

7. Gender differences in factors associated with suicidal ideation and depressive symptoms among middle-aged workers in Japan.

Authors: Norio Sugawara; Norio Yasui-Furukori; Giro Sasaki; Osamu Tanaka; Takashi Umeda; Ippei Takahashi; Kazuma Danjo; Masashi Matsuzaka; Sunao Kaneko; Shigeyuki Nakaji
Journal: Ind Health Date: 2012-12-26 Impact factor: 2.179

Review 8. Suicide prevention strategies revisited: 10-year systematic review.

Authors: Gil Zalsman; Keith Hawton; Danuta Wasserman; Kees van Heeringen; Ella Arensman; Marco Sarchiapone; Vladimir Carli; Cyril Höschl; Ran Barzilay; Judit Balazs; György Purebl; Jean Pierre Kahn; Pilar Alejandra Sáiz; Cendrine Bursztein Lipsicas; Julio Bobes; Doina Cozman; Ulrich Hegerl; Joseph Zohar
Journal: Lancet Psychiatry Date: 2016-06-08 Impact factor: 27.083

9. Gender Differences in Somatic Symptoms and Current Suicidal Risk in Outpatients with Major Depressive Disorder.

Authors: Hong Jin Jeon; Jong-Min Woo; Hyo-Jin Kim; Maurizio Fava; David Mischoulon; Seong Jin Cho; Sung Man Chang; Doo-Heum Park; Jong Woo Kim; Ikki Yoo; Jung-Yoon Heo; Jin Pyo Hong
Journal: Psychiatry Investig Date: 2016-11-24 Impact factor: 2.505

10. Classification of Suicide Attempts through a Machine Learning Algorithm Based on Multiple Systemic Psychiatric Scales.

Authors: Jihoon Oh; Kyongsik Yun; Ji-Hyun Hwang; Jeong-Ho Chae
Journal: Front Psychiatry Date: 2017-09-29 Impact factor: 4.157
