
Machine learning prediction of suicidal ideation, planning, and attempt among Korean adults: A population-based study.

Abstract

Background: Suicide remains the leading cause of premature death in South Korea. This study aims to develop machine learning algorithms for screening Korean adults at risk for suicidal ideation and suicide planning or attempt.
Methods: Two sets of balanced data for Korean adults aged 19-64 years were drawn from the 2012-2019 waves of the Korea Welfare Panel Study using the random down-sampling method (N = 3292 for the prediction of suicidal ideation, N = 488 for the prediction of suicide planning or attempt). Demographic, socioeconomic, and psychosocial characteristics were used to predict suicidal ideation and suicide planning or attempt. Four machine-learning classifiers (logistic regression, random forest, support vector machine, and extreme gradient boosting) were tuned and cross-validated.
Results: All four algorithms demonstrated satisfactory classification performance in predicting suicidal ideation (sensitivity 0.808-0.853, accuracy 0.843-0.863) and suicide planning or attempt (sensitivity 0.814-0.861, accuracy 0.864-0.884). Extreme gradient boosting was the best-performing algorithm for predicting both suicidal outcomes. The most important predictors were depressive symptoms, self-esteem, income, consumption, and life satisfaction. The algorithms trained with only the top two predictors, depressive symptoms and self-esteem, showed comparable classification performance in predicting suicidal ideation (sensitivity 0.801-0.839, accuracy 0.841-0.846) and suicide planning or attempt (sensitivity 0.814-0.837, accuracy 0.874-0.884).
Limitations: Suicidal ideation and behaviors may be under-reported due to social desirability bias. Causality is not established.
Discussion: More than 80% of individuals at risk for suicidal ideation and suicide planning or attempt could be correctly identified from a set of mental health and socioeconomic characteristics of respondents. This finding suggests the potential for developing a quick screening tool based on known risk factors and applying it in primary care or community settings for early intervention.


Keywords: Machine learning; Predictive modeling; Self-harm; Suicidal ideation; Suicide planning

Year: 2022 PMID: 36263295 PMCID: PMC9573904 DOI: 10.1016/j.ssmph.2022.101231

Source DB: PubMed Journal: SSM Popul Health ISSN: 2352-8273

Introduction

Suicide is a major public health concern in South Korea (Korea hereafter). In 2019, there were 13195 cases of suicide, or roughly one death by suicide every 38 min (Korean Statistical Information Service, 2020). A survey of more than 18000 Korean adults indicated that about 3.8% seriously considered suicide in a given year and 0.5% planned or attempted suicide (Pak & Choung, 2020). The high suicide rate in Korea is often termed a “suicide epidemic,” highlighting the large fraction of the population at risk for suicide (Raschke et al., 2022). Previous studies have indicated that suicide completers rarely seek psychiatric treatment before their death, although they often visit primary care providers for physical symptoms associated with poor mental health (Zalsman et al., 2016). This suggests the potential of a prospective screening tool for use in primary care or community settings to identify potential recipients of preventive interventions (Raue et al., 2014).
Suicide is a complex, multifaceted phenomenon in which a constellation of risk markers mediates its pathogenic mechanisms (De Berardis et al., 2018). Shneidman's (1993) theory of suicide identifies psychache (i.e., psychological and emotional pain of intolerable intensity) as the primary motivator of suicidal desire, highlighting the role of mental illness. Baumeister's (1990) escape theory posits that self-blame for difficult life events and negative self-perceptions interact to produce the desire for suicide. Empirical studies have found elevated levels of mental health symptoms, including anhedonia, anxiety disorders, depression, and self-esteem deficits, among those who reported suicidal ideation or suicide attempts (Bhar et al., 2008; Ducasse et al., 2018; Fredriksen et al., 2017). Emerging evidence also points to subjective assessments of life conditions as a possible mechanism through which other risk markers may operate (Heisel & Flett, 2004; Suh et al., 2021).
In the Korean context, various dimensions of economic insecurity have been associated with suicidal ideation and behaviors (Kim & You, 2019; Raschke et al., 2022; Yoon et al., 2017). Researchers have found that no single factor is sufficient to explain suicide; rather, several interacting factors jointly contribute to suicide risk (Kuroki & Tilley, 2012). Recent studies have called for a multifocal approach to suicide prediction that considers the full spectrum of risk markers validated by theory and empirical research (Choi et al., 2018; Kuroki & Tilley, 2012; Passos et al., 2016).
Machine learning (ML) has emerged as a promising analytic tool for integrating complex risk markers into clinical signatures of suicide. The idea behind ML is to learn the typical characteristics of a class (e.g., those with suicidal symptoms) from past data and apply this knowledge to unseen data to identify individuals with similar characteristics. Typically, estimating a reliable ML algorithm requires a large number of inputs that contain useful information for distinguishing one group of subjects from others, as well as iterative tuning of the algorithm to achieve higher predictive accuracy. Early attempts to predict suicide used unstructured text data, such as social media posts (Cash et al., 2013; Jashinsky et al., 2014) and counseling transcripts (Pestian et al., 2010), to detect distinctive patterns in natural language related to suicide. While these approaches demonstrated the methodological potential of ML, they are difficult to apply in primary care or community settings because they rely on unconventional big data that are not available at the time of screening.
Subsequent research showed that ML algorithms based on survey or clinical data could achieve a satisfactory level of accuracy in classifying individuals who are likely to think about suicide (Hill et al., 2017; Ryu et al., 2018), attempt suicide (Bae et al., 2015; Oh et al., 2017; Passos et al., 2016), and complete suicide (Choi et al., 2018; Jiang et al., 2021). This study extends the previous literature by providing an empirical foundation for such prediction models through ML modeling of population-based longitudinal data. Specifically, this study explores whether the predictive signature of suicide derived from easily accessible demographic, socioeconomic, and psychosocial variables in population data can help distinguish suicide-prone individuals from their nonsuicidal peers. Using a representative community sample of Korean adults, we estimated a series of ML algorithms that link individual- and household-level predictors to suicidal ideation and suicide planning or attempt. Predictors were selected based on a priori knowledge and related theories. Our ML models integrate information from multiple predictors to estimate an individual's probability of being a suicide ideator or a suicide planner/attempter over a 12-month period. The prediction models developed here may help identify priority targets for prevention and intervention efforts.
Suicidal ideation is an important precursor of suicide planning or attempt, with 15.6% of ideators going on to make an attempt within 12 months (Borges et al., 2006) and 31.8% making an attempt at some point in their lifetime (Nock et al., 2008). Among those who attempt suicide, a significant portion re-attempt suicide and eventually die (Suominen et al., 2004). Clinical studies commonly find that a substantial percentage of suicide ideators experience psychiatric illness or poor psychosocial conditions as they progress to attempt (see O'Connor et al., 2013).
This progression from ideation to planning and attempt offers an important window in which to prospectively screen people at risk for suicide and intervene with appropriate prevention measures. The application of ML to large-scale population data may help develop an efficient screening system for the general population.

Methods

Data description

This study used the 2012–2019 waves of the Korea Welfare Panel Study (KoWePS), conducted by the Korea Institute for Health and Social Affairs and Seoul National University. The KoWePS is a longitudinal cohort study that annually follows a nationally representative sample of South Korean households. The first wave was conducted in 2006, with 18856 participants from 7072 households. The initial sample was selected from 16 provincial districts in proportion to the population size of each district using stratified multistage cluster sampling. Trained interviewers conducted the interviews at participants’ homes using computer-assisted personal interviewing. The interviewers were aged 18 or older and had experience with or interest in social surveys. After each survey, the KoWePS randomly selected 10% of the recorded responses for a post-interview quality check; if an interview had not been conducted according to the guidelines, an additional interview was conducted by phone. The topics covered in the KoWePS included demographic background, economic characteristics, social service needs, health status, healthcare utilization patterns, and psychosocial well-being. All participants provided informed consent before participating in the survey. Details of the survey protocol and sampling design are available elsewhere (https://www.koweps.re.kr/).
The study sample was restricted to individuals aged 19–64 years. Age 19 is the age at which a Korean citizen is legally recognized as an adult. Those older than 64 years were excluded because their health and socioeconomic characteristics might differ from those of younger cohorts in unobserved ways.¹ The baseline data included 60568 observations from 11114 individuals with no missing data. Each observation comprised 57 features (including the two measures of suicidality) that were considered for predictive modeling.

Suicidal ideation and suicide planning or attempt

A binary indicator of suicidal ideation was based on an affirmative response to the interview question, “Have you seriously considered committing suicide in the past year?” A binary indicator of suicide planning or attempt was based on an affirmative response to either of the questions, “Have you made a concrete plan to commit suicide in the past year?” or “Have you made an attempt to commit suicide in the past year?” This coding scheme yielded 1646 person-level observations of suicidal ideation and 244 person-level observations of suicide planning or attempt. The two binary measures were used to label each observation in the classification problems described below.
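The labelling rule above can be written as two small functions. This is a minimal sketch: the `Response` record and its field names are hypothetical stand-ins for the three interview items, not the actual KoWePS variable codes.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One observation's answers to the three suicidality items (field names are hypothetical)."""
    considered: bool  # seriously considered committing suicide in the past year
    planned: bool     # made a concrete plan to commit suicide in the past year
    attempted: bool   # made an attempt to commit suicide in the past year


def label_ideation(r: Response) -> int:
    """Return 1 if the respondent reported suicidal ideation, else 0."""
    return int(r.considered)


def label_planning_or_attempt(r: Response) -> int:
    """Return 1 if the respondent reported a plan OR an attempt, else 0."""
    return int(r.planned or r.attempted)
```

Because the second label is an OR over two items, a respondent who reported an attempt without a prior plan is still counted as a positive case.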

Class imbalance problem

Learning from data with a severely imbalanced target variable poses empirical challenges for ML research (Libbrecht & Noble, 2015). The issue is particularly salient in suicide prediction because the no-risk group far outnumbers the at-risk group (Passos et al., 2016). In this setting, classifying observations by comparing predicted probabilities of a suicidal outcome with the default cutoff of 0.5 yields high specificity but low sensitivity, making it difficult to assess the algorithm's classification performance. One way to circumvent this issue is to under-sample the majority class (i.e., those with no risk of suicide) so that the sample is balanced across target labels (Passos et al., 2016; Ryu et al., 2018).² In this study, we created two balanced datasets: one for predicting suicidal ideation and one for predicting suicide planning or attempt. Specifically, 1646 observations were randomly drawn from the pooled observations without suicidal ideation, yielding a balanced dataset of 3292 observations (data A). We also randomly selected 244 observations from the pooled observations without suicide planning or attempt to create a balanced dataset of 488 observations (data B). Table 1 presents mean sample characteristics for a selection of representative features.
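The random down-sampling step can be sketched as follows. The sketch assumes each observation is a dict carrying a 0/1 label under a caller-chosen key. The seed is an illustrative choice, not the one used in the study.

```python
import random
from typing import Dict, List


def downsample_majority(rows: List[Dict], label_key: str, seed: int = 2022) -> List[Dict]:
    """Balance a binary-labelled dataset by randomly under-sampling the majority class.

    rows: observations as dicts; label_key: name of the 0/1 label field (hypothetical).
    """
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)
    # Draw as many majority-class rows as there are minority-class rows, without replacement.
    kept = rng.sample(majority, len(minority))
    balanced = minority + kept
    rng.shuffle(balanced)  # avoid ordering by class
    return balanced
```

Applied to the 1646 ideation cases, this would return those 1646 positives plus 1646 randomly drawn negatives, 3292 rows in total, matching the size of data A.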

Table 1

Descriptive statistics.

Variable	Full sample (N = 60568)		Data A: suicidal ideation (N = 3292)		Data B: suicide planning or attempt (N = 488)
	Mean	SD	Mean	SD	Mean	SD
Age (19–64)	43.9	12.2	45.7	12.2	45.9	12.4
Female (0,1)	0.54		0.57		0.60
Education background (0,1)	0.45		0.34		0.31
Marital status (0,1)	0.67		0.58		0.57
No. of household members	3.28	1.21	3.01	1.29	2.98	1.26
Employment status (0,1)	0.71		0.62		0.56
Region of residence (0,1)	0.40		0.41		0.43
Religion (0,1)	0.46		0.45		0.46
Household income (in 2019 KRW)	5962.5	5900.4	4803.3	3776.4	4598.4	3603.6
Household consumption (in 2019 KRW)	423.3	249.5	360.2	238.0	344.3	230.2
Household net worth (in 2019 KRW)	13625.9	36446.7	10024.8	31557.9	9378.9	32144.1
Social welfare receipt (0,1)	0.06		0.15		0.19
No. of outpatient visits	10.2	19.4	16.7	30.0	18.6	32.6
Poor self-rated health (0,1)	0.25		0.40		0.45
Disability (0,1)	0.06		0.12		0.14
Any chronic disease (0,1)	0.37		0.49		0.55
Smoking (0,1)	0.22		0.26		0.27
Drinking (0,1)	0.59		0.55		0.51
CESD score (0–33)	2.68	4.03	6.68	7.10	8.31	8.42
Self-esteem score (0–30)	21.3	3.81	18.8	5.15	17.8	5.86
Experience of physical abuse from spouse (0,1)	0.67		0.60		0.58

Notes: N, number of observations; SD, standard deviation.

Predictor variables

A common practice in ML research is to select predictors based on expert knowledge, ideally informed by published studies. Following Passos et al. (2016), we conducted a structured search of the PubMed database to identify published articles that reported determinants of increased suicide risk. In all, 55 predictors were selected based on the literature review and their availability in the KoWePS (Supplementary Table S1). These include demographic, socioeconomic, health and well-being, and early-life characteristics of participants and their households that underlying theories predict to be correlated with suicide risk (Baumeister, 1990; Shneidman, 1993). A recursive feature elimination (RFE) algorithm was then used to identify the subset of predictors that yielded the highest classification performance. RFE is a feature selection method that recursively eliminates weak predictors to reduce dependencies and collinearity that may exist in the model. This study used logistic regression and 10-fold cross-validation with three repeats to evaluate the model's classification performance during the elimination process. When the target label was suicidal ideation, the model trained with 39 predictors achieved the highest kappa (Cohen's kappa, i.e., agreement between predicted and observed labels corrected for chance) (Fig. 1). For the model that predicted suicide planning or attempt, 26 predictors yielded the highest kappa. The selected predictors are shaded gray in Supplementary Table S2.
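The elimination loop and its selection criterion can be sketched generically. This is the textbook backward variant: it scores the current subset, re-ranks, and drops the lowest-ranked predictor. The `rank` and `score` callables are hypothetical placeholders. In the study they would be a logistic regression's importance ranking and the repeated 10-fold cross-validated kappa. caret's `rfe()` differs in details, for example by evaluating a user-supplied vector of subset sizes, so this is not a reproduction of the authors' run.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Cohen's kappa for binary (0/1) labels: observed agreement corrected for chance."""
    n = len(y_true)
    p_obs = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_pos_rate = sum(y_true) / n
    pred_pos_rate = sum(y_pred) / n
    # Agreement expected if predictions were independent of the true labels.
    p_exp = true_pos_rate * pred_pos_rate + (1 - true_pos_rate) * (1 - pred_pos_rate)
    if p_exp == 1:
        return math.nan  # undefined when both vectors contain a single class
    return (p_obs - p_exp) / (1 - p_exp)


def recursive_feature_elimination(
    features: Sequence[str],
    rank: Callable[[List[str]], List[str]],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float, List[Tuple[int, float]]]:
    """Backward elimination: score the subset, drop the lowest-ranked feature, repeat.

    rank(subset)  -> subset ordered from most to least important
    score(subset) -> cross-validated performance of that subset (e.g., mean kappa)
    Returns the best subset, its score, and the (size, score) history.
    """
    subset = list(features)
    best_subset, best_score = list(subset), -math.inf
    history = []
    while subset:
        s = score(subset)
        history.append((len(subset), s))
        if s > best_score:
            best_subset, best_score = list(subset), s
        subset = rank(subset)[:-1]  # re-rank, then remove the weakest predictor
    return best_subset, best_score, history
```

The `history` list plays the role of Fig. 1: it records performance against subset size, and the selected model is the size with the highest score.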

Fig. 1

Recursive feature elimination results.

Machine learning algorithms

We employed four predictive algorithms for comparison: logistic regression, support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost). Logistic regression assumes a linear relationship between the logit of the outcome and the predictor variables, and has been widely used in ML research on suicide prediction (Jung et al., 2019; Kessler et al., 2017; van Mens et al., 2020; Walsh et al., 2017). The SVM finds the hyperplane that maximizes the separation, or margin, between two classes (Choi et al., 2018). Through a kernel function, it can implicitly map the original data into a higher-dimensional feature space, so that a hyperplane in that space forms a nonlinear decision surface in the original space. RF is a tree-based algorithm that ensembles a number of decision trees grown on bootstrapped samples and aggregates the votes (predicted classes) of the individual trees (Breiman, 2001). It improves upon a single classification tree by considering a random subset of predictors at each split, which produces a diverse set of trees that together improve classification performance. XGBoost is a scalable tree-boosting algorithm proposed by Chen and Guestrin (2016). It builds on the idea of boosting, which trains a sequence of “weak” learners (here, shallow trees), each fitted to correct the errors of the ensemble built so far, and combines them into a single “strong” learner. It offers several desirable features, including regularization, handling of missing values, flexible evaluation criteria, optimized computation, and high classification performance. Logistic regression has the advantage of being fully interpretable and efficient to train. The other algorithms considered here can provide more accurate classification than logistic regression when the data are not linearly separable. We trained and tuned these four ML algorithms independently and compared their classification performance.
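For reference, the standard textbook forms of three of the models are given below. These are the generic formulations, not equations reported by the study. The SVM “cost” in Table 2 corresponds to C in the soft-margin objective, and the SVM labels are recoded as y_i ∈ {−1, +1}. For XGBoost, “shrinkage” is η, “minimum loss reduction” is γ, and “number of boosting iterations” is M. T_m and w_m are the number of leaves and the leaf weights of tree m.

```latex
% Logistic regression: linear in the logit of p_i = \Pr(y_i = 1 \mid \mathbf{x}_i)
\log\frac{p_i}{1 - p_i} = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_i

% Soft-margin SVM; \phi(\mathbf{x}) = \mathbf{x} for the linear kernel, margin width = 2 / \lVert\mathbf{w}\rVert
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \; \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad y_i\bigl(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\bigr) \ge 1 - \xi_i,\;\; \xi_i \ge 0

% Gradient-boosted trees (XGBoost): additive updates with shrinkage and a regularized objective
\hat{y}_i^{(m)} = \hat{y}_i^{(m-1)} + \eta\, f_m(\mathbf{x}_i), \quad m = 1, \dots, M
\mathcal{L} = \sum_{i=1}^{n} \ell\bigl(y_i, \hat{y}_i^{(M)}\bigr) + \sum_{m=1}^{M}\Bigl(\gamma T_m + \tfrac{1}{2}\lambda\lVert\mathbf{w}_m\rVert^{2}\Bigr)
```

A larger C penalizes margin violations more heavily and yields a narrower margin. A larger γ requires each new split to reduce the loss by more before it is kept.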
All analyses were performed in R version 4.1.0 using RStudio (Integrated Development Environment for R, Boston, MA) and the caret package (Kuhn, 2008).

Algorithm development and validation

A key challenge in ML is that an algorithm's generalizability to unseen data cannot be fully assessed until future data arrive. Techniques such as cross-validation help estimate an algorithm's out-of-sample performance by setting aside a portion of the data as “unseen” and using the remaining data for algorithm training. In this study, we followed the strategy of Xue et al. (2018) of using more recent survey waves for algorithm testing and earlier survey waves for algorithm training. The implicit assumption is that recent data are more reflective of future data and are therefore more suitable for evaluating the algorithm's predictive performance. Specifically, we set aside the last two waves (2018 and 2019 surveys) as the test set and used the earlier waves as the training set.
For SVM, RF, and XGBoost, the hyperparameters were optimized using a grid search with repeated cross-validation on the training set. Grid search is a tuning technique that seeks the optimal hyperparameter values through an exhaustive search over a prespecified set of candidate values. Evaluating each hyperparameter setup requires the training data to be partitioned into training and validation sets. Here, we created 10 equally sized random folds of data, where each fold was used once as the validation set and the other nine folds were used for training. This process was repeated three times, and the classification performance of the algorithm was averaged over the repeats (10-fold cross-validation with three repeats). Every candidate set of hyperparameters underwent this evaluation, and the set with the highest classification accuracy was selected (Fig. 2). SVM, RF, and XGBoost at their optimal settings, along with logistic regression, were then used to predict suicidal outcomes in the test set. The classification performance of the final algorithms was assessed using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value, and accuracy.
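The validation design (temporal hold-out, then repeated 10-fold cross-validation inside a grid search) can be sketched as below. This is a minimal sketch under stated assumptions. Observations are dicts with a hypothetical `wave` key. `evaluate` is a placeholder that fits a model with the given hyperparameters on the training indices and returns validation accuracy. In caret this role is played by `train()` with repeated cross-validation, which this code does not call.

```python
import itertools
import random
from statistics import mean
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def temporal_split(rows: Sequence[dict], test_waves=(2018, 2019), wave_key: str = "wave"):
    """Hold out the most recent survey waves as the test set (wave_key is hypothetical)."""
    train = [r for r in rows if r[wave_key] not in test_waves]
    test = [r for r in rows if r[wave_key] in test_waves]
    return train, test


def repeated_kfold(n: int, k: int = 10, repeats: int = 3,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, valid_idx) for k random folds, reshuffled on each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
        for i in range(k):
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield train, folds[i]


def grid_search(n: int, grid: Dict[str, Sequence], evaluate: Callable[..., float],
                k: int = 10, repeats: int = 3):
    """Exhaustively score every hyperparameter combination by mean validation accuracy.

    evaluate(params, train_idx, valid_idx) -> accuracy on the validation fold.
    """
    splits = list(repeated_kfold(n, k, repeats))  # identical splits for every candidate
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = mean(evaluate(params, tr, va) for tr, va in splits)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Reusing the same 30 splits for every candidate means that differences in mean accuracy reflect the hyperparameters rather than different random partitions.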
Our interpretation places greater emphasis on sensitivity because suicide prevention aims to minimize false negatives.
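The evaluation metrics listed above follow directly from the confusion matrix. The sketch below computes them from 0/1 labels. AUC is computed as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. This equals the trapezoidal area under the empirical ROC curve. The O(n²) pairwise loop is for clarity, not efficiency.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, NPV, and accuracy from 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(a: int, b: int) -> float:
        return a / b if b else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of at-risk cases detected
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Because each false negative lowers sensitivity directly, this metric is the one that tracks missed at-risk individuals.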

Fig. 2

Data construction and model development.

Results

Our data comprised 3292 observations for predicting suicidal ideation and 488 observations for predicting suicide planning or attempt (Table 1). Of the 3292 observations, 57% were female, 34% were college graduates, and 58% were married. The mean age was 45.7 years (SD 12.2), and the mean number of household members was 3.01. The sample was predominantly employed (62%), non-smoking (74%), and non-disabled (88%). The sample for suicide planning or attempt exhibits similar characteristics, except that it includes a greater share of welfare beneficiaries, disabled respondents, and those with poor self-rated health. This difference is consistent with the expectation that suicide planning or attempt is more prevalent in disadvantaged populations.
Optimal hyperparameters were obtained for each ML algorithm using the grid search method (Table 2). We trained the SVM with three different kernels (linear, radial, and polynomial) and found that the linear kernel at the optimal cost achieved the highest accuracy on the validation sets. The optimal setup for RF was determined through exhaustive evaluation of the algorithm over two possible split rules (Gini and extratrees) and varying values of the minimal node size and the number of randomly selected predictors considered at each split. XGBoost was tuned over the number of boosting iterations, maximum tree depth, shrinkage, minimum loss reduction, subsample ratio of columns, minimum sum of instance weight, and subsample percentage. The classification results below were generated by each algorithm using the optimal hyperparameters.

Table 2

Tuned hyperparameters.

Panel A: suicidal ideation (N = 3292)
SVM	kernel = linear, cost = 1.5
RF	split rule = Gini, minimal node size = 2, number of randomly selected predictors = 6
XGBoost	number of boosting iterations = 120, max tree depth = 3, shrinkage = 0.04, minimum loss reduction = 2, subsample ratio of columns = 0.55, minimum sum of instance weight = 5, and subsample percentage = 1
Panel B: suicide planning or attempt (N = 488)
SVM	kernel = linear, cost = 0.5
RF	split rule = Gini, minimal node size = 2, number of randomly selected predictors = 7
XGBoost	number of boosting iterations = 60, max tree depth = 5, shrinkage = 0.04, minimum loss reduction = 3, subsample ratio of columns = 0.5, minimum sum of instance weight = 6, and subsample percentage = 1
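Table 2 uses descriptive labels. The sketch below records the same values under the tuning-parameter names of caret's "svmLinear", "ranger", and "xgbTree" methods. The use of these caret methods is an assumption, not something the study states. The split rules Gini and extratrees match the ranger implementation, and the seven XGBoost settings match the xgbTree tuning grid.

```python
# Hypothetical re-expression of Table 2 with caret tuning-parameter names
# (assumes methods "svmLinear", "ranger", and "xgbTree"; values copied from Table 2).
TUNED = {
    "A_suicidal_ideation": {
        "svmLinear": {"C": 1.5},  # cost
        "ranger": {"splitrule": "gini", "min.node.size": 2, "mtry": 6},
        "xgbTree": {
            "nrounds": 120,           # number of boosting iterations
            "max_depth": 3,           # max tree depth
            "eta": 0.04,              # shrinkage
            "gamma": 2,               # minimum loss reduction
            "colsample_bytree": 0.55, # subsample ratio of columns
            "min_child_weight": 5,    # minimum sum of instance weight
            "subsample": 1,           # subsample percentage
        },
    },
    "B_suicide_planning_or_attempt": {
        "svmLinear": {"C": 0.5},
        "ranger": {"splitrule": "gini", "min.node.size": 2, "mtry": 7},
        "xgbTree": {
            "nrounds": 60, "max_depth": 5, "eta": 0.04, "gamma": 3,
            "colsample_bytree": 0.5, "min_child_weight": 6, "subsample": 1,
        },
    },
}
```

The larger panel (data A) selected more boosting rounds with shallower trees, while the smaller panel (data B) selected fewer rounds with deeper trees.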