Literature DB >> 33156863

Predicting suicide attempt or suicide death following a visit to psychiatric specialty care: A machine learning study using Swedish national registry data.

Qi Chen1, Yanli Zhang-James2, Eric J Barnett3,4, Paul Lichtenstein1, Jussi Jokinen5,6, Brian M D'Onofrio1,7, Stephen V Faraone2,3, Henrik Larsson1,8, Seena Fazel9.   

Abstract

BACKGROUND: Suicide is a major public health concern globally. Accurately predicting suicidal behavior remains challenging. This study aimed to use machine learning approaches to examine the potential of the Swedish national registry data for prediction of suicidal behavior. METHODS AND
FINDINGS: The study sample consisted of 541,300 inpatient and outpatient visits by 126,205 Sweden-born patients (54% female and 46% male) aged 18 to 39 (mean age at the visit: 27.3) years to psychiatric specialty care in Sweden between January 1, 2011 and December 31, 2012. The most common psychiatric diagnoses at the visit were anxiety disorders (20.0%), major depressive disorder (16.9%), and substance use disorders (13.6%). A total of 425 candidate predictors covering demographic characteristics, socioeconomic status (SES), electronic medical records, criminality, as well as family history of disease and crime were extracted from the Swedish registry data. The sample was randomly split into an 80% training set containing 433,024 visits and a 20% test set containing 108,276 visits. Models were trained separately for suicide attempt/death within 90 and 30 days following a visit using multiple machine learning algorithms. Model discrimination and calibration were both evaluated. Among all eligible visits, 3.5% (18,682) were followed by a suicide attempt/death within 90 days and 1.7% (9,099) within 30 days. The final models were based on ensemble learning that combined predictions from elastic net penalized logistic regression, random forest, gradient boosting, and a neural network. The area under the receiver operating characteristic (ROC) curves (AUCs) on the test set were 0.88 (95% confidence interval [CI] = 0.87-0.89) and 0.89 (95% CI = 0.88-0.90) for the outcome within 90 days and 30 days, respectively, both being significantly better than chance (i.e., AUC = 0.50) (p < 0.01). Sensitivity, specificity, and predictive values were reported at different risk thresholds. A limitation of our study is that our models have not yet been externally validated, and thus, the generalizability of the models to other populations remains unknown.
CONCLUSIONS: By combining the ensemble method of multiple machine learning algorithms and high-quality data solely from the Swedish registers, we developed prognostic models to predict short-term suicide attempt/death with good discrimination and calibration. Whether novel predictors can improve predictive performance requires further investigation.


Year:  2020        PMID: 33156863      PMCID: PMC7647056          DOI: 10.1371/journal.pmed.1003416

Source DB:  PubMed          Journal:  PLoS Med        ISSN: 1549-1277            Impact factor:   11.069


Introduction

Suicide is a major public health concern globally. Predicting suicidal behavior is challenging both at the population level and among high-risk groups. The accuracy of predicting suicidal behavior based on clinical judgment varies considerably across clinicians. Risk factors known to be strongly associated with suicidal behaviors are weak predictors on their own [1]. One meta-analysis assessing the sensitivity and specificity of 15 different instruments for suicide and suicide attempt concluded that none of these instruments provided sufficient diagnostic accuracy as defined by the authors (i.e., 80% for sensitivity and 50% for specificity) [2]. However, using a lower threshold for discrimination measures, a prediction model could achieve the specified diagnostic accuracy, though most likely at the cost of a reduced positive predictive value (PPV). Another meta-analysis assessed the performance of previously reported psychological scales, biological tests, and third-generation scales derived from statistical modeling (mostly conventional multivariable regression) for prediction of suicidal behavior [3]. The authors, who did not synthesize performance metrics other than PPV, reported a pooled PPV of 39% for the third-generation scales for predicting suicide attempts/deaths. One potential explanation for the modest predictive performance is that the data used for previous model development did not contain enough information to make accurate predictions. It is also possible that prediction of suicidal behavior is too complex to be based on a few simplified theoretical hypotheses [4]. Machine learning algorithms have been applied to large-scale data such as electronic medical records for predicting suicidal behavior, although such models are currently not feasible to implement, can be difficult for clinicians to understand, and often lack transparency [5]. 
In machine learning analytics, selecting candidate predictors may benefit from established theories and clinical expertise. When given access to large amounts of new data, machine learning may serve as an efficient and flexible approach to exploring the predictive potential of those data. Meanwhile, machine learning algorithms usually identify far more complex data patterns than conventional methods, though at the cost of decreased interpretability [4]. Belsher and colleagues systematically reviewed 64 machine learning–based prediction models for suicide and suicide attempts in 17 studies and found that, despite good overall discrimination accuracy, the PPVs remained low, with inadequate information on negative predictive value (NPV) and calibration [6]. In their subsequent simulation analyses, they demonstrated that the achievable PPV was limited by the rarity of suicide even when sensitivity and specificity were hypothetically set to be nearly perfect. They thus recommended that future research focus on predicting more common outcomes such as suicide attempts [6]. Most prior studies on predicting suicide or suicide attempts have been limited by small sample sizes. Very few provided a comprehensive report of model discrimination, including sensitivity, specificity, PPV, and NPV, as well as calibration, and most selected only a single type of model (e.g., a random forest or a regression model). Data from multiple Swedish national registers have been used to develop a multivariable regression model for predicting suicide among patients with schizophrenia spectrum disorders and bipolar disorder [7], but these data have never been combined with machine learning to predict suicidal behavior in the setting of psychiatric specialty care. In this study, we aimed to examine the achievable performance of models trained by several machine learning algorithms using the Swedish registry data. 
We developed prognostic prediction models for suicide attempt/death within 90 and 30 days following an in-/outpatient visit to psychiatric specialty care, using predictors generated via linkage between multiple Swedish national registers.

Methods

Ethics approval

The study was approved by the Regional Ethical Review Board in Stockholm, Sweden (reference number: 2013/862-31/5). The requirement for informed consent was waived because of the retrospective nature of the registry-based study.

Data sources

Each individual registered as resident in Sweden is assigned a unique personal identity number, enabling linkage between the Swedish national registers [8]. The registers used in the current study are listed as follows: The Medical Birth Register covers nearly all deliveries in Sweden since 1973 [9]; The Total Population Register was established in 1968 and contains data on sex, birth, death, migration, and family relationships for Swedish residents born since 1932 [10]; The Multi-Generation Register, as part of the Total Population Register, links individuals born since 1932 and registered as living in Sweden since 1961 to their biological parents [11]; The longitudinal integration database for health insurance and labor market studies (LISA) was launched in 1990 and contains annually updated data on socioeconomic status (SES) such as education, civil status, unemployment, social benefits, family income, and many other variables for all Swedish residents aged 16 years or older [12]; The National Patient Register covers inpatient care since 1964 (psychiatric care since 1973) and outpatient visits to specialty care since 2001, with a PPV of 85% to 95% for most disorders [13]; The Prescribed Drug Register contains data on all prescribed medications dispensed at pharmacies in Sweden since July 2005 [14]; Active ingredients of medications are coded according to the anatomical therapeutic chemical (ATC) classification system [14]; The National Crime Register provides data on violent and nonviolent crime convictions since 1973 [15].

Study sample

The study sample consisted of any in-/outpatient visits by a patient aged 18 to 39 to psychiatric specialty care in Sweden between January 1, 2011 and December 31, 2012, with a primary diagnosis of any mental and behavioral disorder according to the International Classification of Diseases, 10th revision (ICD-10: F00–F99). To ensure reasonable data quality and minimize missingness of the predictors, only Sweden-born patients were included in the study. Patients who emigrated before the visit, died on the same day of the visit, or lacked information on the identity of either parent were excluded. A flowchart for identification of the study sample can be found in the online supplement (S1 Fig). The final study sample included 541,300 eligible visits by 126,205 patients during the study period.

Outcome

In the current study, the outcome of interest was suicidal event, either suicide attempt or death by suicide. Consistent with previous research [16], suicide attempt was defined as intentional self-harm (ICD-10: X60–X84) or self-harm of undetermined intent (ICD-10: Y10-Y34) in the National Patient Register. Only unplanned inpatient or outpatient visits with a recorded self-harm were labeled as incident suicide attempts. Planned visits were likely to be follow-up healthcare appointments following an incident suicide attempt and thus were not classified as suicide attempts for our analysis. Any hospitalization, including stay at emergency department, stretching over more than 1 day is only registered once (as it is based on a discharge date) and thus was coded as 1 visit. Suicide was defined as any recorded death from intentional self-harm (ICD-10: X60–X84) or self-harm of undetermined intent (ICD-10: Y10-Y34) in the Cause of Death Register. We chose to predict suicide attempt/death within 2 time windows, namely 90 and 30 days following a visit to psychiatric specialty care, given the time windows are likely to be meaningful for short- to medium-term interventions. These 2 outcomes were selected also to ensure a certain proportion of positive cases (more than 1%) and to facilitate comparison with prior studies [16,17].
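As a rough sketch of this labeling step (the actual extraction was done against the registers in SAS/R; the table layout and column names below are hypothetical), a visit can be flagged when a qualifying self-harm record for the same patient falls within the outcome window:

```python
import pandas as pd

# ICD-10 blocks for intentional self-harm (X60-X84) and self-harm of
# undetermined intent (Y10-Y34), matched on the first three characters.
SELF_HARM = {f"X{c}" for c in range(60, 85)} | {f"Y{c}" for c in range(10, 35)}

def label_outcome(visits: pd.DataFrame, events: pd.DataFrame,
                  window_days: int) -> pd.Series:
    """Flag each visit with 1 if the same patient has a qualifying
    self-harm record within `window_days` after the visit, else 0.
    Hypothetical columns: visits needs visit_id, patient_id, date;
    events needs patient_id, icd10, date."""
    harm = events[events["icd10"].str[:3].isin(SELF_HARM)]
    merged = visits.merge(harm, on="patient_id", suffixes=("", "_ev"))
    delta = (merged["date_ev"] - merged["date"]).dt.days
    # Days 1..window: a same-day record belongs to the index visit itself.
    hits = set(merged.loc[delta.between(1, window_days), "visit_id"])
    return visits["visit_id"].isin(hits).astype(int)
```

Running the same function with `window_days=90` and `window_days=30` yields the two outcome labels used throughout the paper.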

Predictors

When selecting potential predictors, we took into consideration previous studies on suicidal behavior [7,16,18,19] and the availability and quality of the information in the Swedish national registers. Predictors prior to or at each visit covered demographic characteristics (sex and age at the visit), SES (family income, educational attainment, civil status, unemployment, and social benefits), electronic medical records (planned/unplanned visit, in-/outpatient visit, clinical diagnoses of psychiatric and somatic disorders, intentional self-harm and self-harm of undetermined intent, methods used for self-harm, and dispensed medications), criminality (violent and nonviolent criminal offenses), and family history of disease and crime. To better utilize information on timing of the predictors, we generated predictors related to clinical diagnoses using several arbitrary time windows (i.e., at the index visit, within 1 month, 1 to 3 months, 3 to 6 months, 6 to 12 months, 1 to 3 years, and 3 to 5 years before the index visit), assuming better predictive power by events occurring more recently. Prior intentional self-harm and self-harm of undetermined intent were treated as separate predictors. Methods for prior self-harm were first categorized according to the first 3 digits of the ICD-codes (ICD-10: X60–X84, Y10-Y34). Intentional self-poisoning (X60–X69) and self-poisoning of undetermined intent (Y10–Y19) were then combined into 2 separate predictors (S3 Table). Predictors related to dispensed medications within different time windows (i.e., within 1 month, 1 to 3 months, 3 to 6 months, 6 to 12 months, 1 to 3 years, and 3 to 5 years before the visit) were generated in the same way as for clinical diagnoses. The complete ICD and ATC codes used for ascertainment of clinical diagnoses and dispensed medications are listed in S2 and S4 Tables. Age at the visit and family income were treated as continuous variables and rescaled to the range of 0 to 1. 
The other predictors were treated as binary or categorical. Missing predictors appeared to co-occur in the same person. Therefore, no imputation was done as missing at random could not be assumed. Missingness on the predictors ranged from 0.6% to 12.5% (S1 Table) and was coded as a separate category for each predictor. All categorical predictors were converted to dummy indicators. In total, 425 predictors were included for the subsequent model derivation (S1 Table).
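A minimal sketch of the time-windowed predictor generation and the 0-to-1 rescaling, under an assumed long-format history table of diagnoses or dispensations (all column and function names hypothetical):

```python
import pandas as pd

# Look-back windows (days) before the index visit: within 1 month,
# 1-3 months, 3-6 months, 6-12 months, 1-3 years, and 3-5 years.
WINDOWS = [(0, 30), (30, 90), (90, 180), (180, 365), (365, 1095), (1095, 1825)]

def windowed_flags(visits: pd.DataFrame, history: pd.DataFrame) -> pd.DataFrame:
    """One binary column per (code, window): 1 if the patient had that
    diagnosis/dispensation in the window before the index visit.
    Hypothetical columns: visit_id, patient_id, date, code."""
    out = visits[["visit_id"]].copy()
    merged = visits.merge(history, on="patient_id", suffixes=("", "_h"))
    days_before = (merged["date"] - merged["date_h"]).dt.days
    for lo, hi in WINDOWS:
        in_win = merged[days_before.between(lo + 1, hi)]
        for code, grp in in_win.groupby("code"):
            col = f"{code}_{lo + 1}_{hi}d"
            out[col] = out["visit_id"].isin(grp["visit_id"]).astype(int)
    return out

def rescale_01(s: pd.Series) -> pd.Series:
    """Min-max rescaling to [0, 1], as applied to age and family income."""
    return (s - s.min()) / (s.max() - s.min())
```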

Model derivation and evaluation

We treated the visits by each patient as a cluster and randomly split the entire study sample by patient cluster into an 80% training set containing 433,024 visits by 100,964 patients and a 20% test set containing 108,276 visits by 25,241 patients. The purpose of splitting by patient cluster was to prevent a model from performing artificially well on the test set owing to redundancy between the training and test sets. We first trained 4 models using elastic net penalized logistic regression [20], random forest [21], gradient boosting [22], and a neural network [23]. These 4 algorithms were selected, first, because they have been used repeatedly in previous research but never applied to the same data in the same study; and second, because the models are diverse in analytic approach, which makes it possible to aggregate them using an ensemble method and thereby achieve predictive performance better than that of each individual model. For each model, we grid-searched for the optimal set of hyperparameters via 10-fold cross-validation, using the area under the receiver operating characteristic (ROC) curve (AUC) as the evaluation metric. We then compared the performance of the best models trained by the 4 algorithms, together with ensemble models that used the average predicted risk of 2 or more of the best models for making predictions [24]. Based on the results of cross-validation, among the models giving the highest average validation AUC (values unchanged to the fourth decimal place after rounding were considered the same), the one showing the smallest difference between the training and validation AUCs was selected and applied to the entire training set to obtain the final model parameters. The test set was reserved only for the final model validation. The AUC on the test set was used to evaluate model discrimination (i.e., the extent to which a prediction model can separate those who will experience suicidal events from those who will not). 
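The patient-level split and the risk-averaging ensemble could be sketched with scikit-learn as follows (an illustration of the general approach, not the authors' code; the hyperparameter grids and the neural network are omitted):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

def split_by_patient(n_visits, patient_ids, seed=0):
    """80/20 split that keeps all visits by the same patient on one side,
    so no patient leaks from the training set into the test set."""
    gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=seed)
    return next(gss.split(np.zeros((n_visits, 1)), groups=patient_ids))

def ensemble_predict(models, X):
    """Ensemble prediction: the average of the fitted models' predicted risks."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
```

Wrapping each base learner in `GridSearchCV(estimator, param_grid, scoring="roc_auc", cv=10)` before fitting reproduces the general shape of the grid search described above.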
The confidence interval (CI) of the test AUC was estimated using DeLong's method [25]. Additional metrics, including sensitivity, specificity, PPV, and NPV, were reported over a series of risk thresholds. Sensitivity measures the proportion of predicted positives among all actual positives, and specificity measures the proportion of predicted negatives among all actual negatives [26]. These 2 metrics represent characteristics of a prediction model that are not affected by the prevalence of the predicted outcome. PPV measures the proportion of true positives among all predicted positives, and NPV measures the proportion of true negatives among all predicted negatives [26]. These 2 metrics are directly relevant to clinical decisions about whether specific interventions should be given to patients predicted to be at high risk. For an outcome of low prevalence, PPV tends to be low, whereas NPV tends to be high [27]. We then employed a nonparametric approach based on isotonic regression to calibrate the model [28]. The Brier score (equal to 0 under perfect calibration), along with calibration plots, was used to assess model calibration in the test set (i.e., the agreement between the observed proportion of positives and the mean predicted risk of the outcome in different risk strata) [29]. The top 30 predictors were reported separately for the elastic net penalized logistic regression, random forest, and gradient boosting models; for the neural network, there is no standard solution for ranking predictors. The selection of the top predictors was based on the absolute magnitude of the predictor coefficient for the elastic net penalized logistic regression and on the predictor importance score for the random forest and gradient boosting models. The predictor importance score measures the contribution of each predictor to the overall prediction, and the sum of all scores equals 100%. 
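The threshold metrics and the isotonic recalibration described above can be sketched as follows (function names are ours; a separate calibration sample is assumed):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss

def metrics_at_percentile(y_true, risk, pct):
    """Sensitivity/specificity/PPV/NPV when flagging every visit whose
    predicted risk is at or above the pct-th percentile of risks."""
    pred = risk >= np.percentile(risk, pct)
    tp = np.sum(pred & (y_true == 1)); fn = np.sum(~pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0)); fp = np.sum(pred & (y_true == 0))
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp),
            "PPV": tp / (tp + fp), "NPV": tn / (tn + fn)}

def isotonic_recalibrate(y_cal, risk_cal, risk_new):
    """Nonparametric recalibration: a monotone map from raw scores to
    observed event frequencies, learned on a calibration sample."""
    iso = IsotonicRegression(out_of_bounds="clip").fit(risk_cal, y_cal)
    return iso.predict(risk_new)
```

`brier_score_loss(y_test, calibrated_risk)` then gives the Brier score used to summarize calibration.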
Learning curve analysis was performed to evaluate the bias-variance trade-off and to assess whether future work would benefit from a larger sample size, greater model capacity, or both [30]. Finally, we fitted additional models using predictors restricted to sex, age at the visit, diagnoses, and dispensed medications only, and tested for statistical significance (p < 0.05, two-sided) of the decrease in AUC using the method proposed by Hanley and McNeil [31]. These analyses were conducted to explore the predictive potential of the electronic medical records system alone, as it is more feasible to integrate computer algorithms into a single system than to create complex linkages between registries for making predictions in real life. This study is reported as per the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) guideline (S1 Checklist). The analyses were planned prior to data retrieval, but we did not register or prepublish the analysis plan. Results outlined in S8–S10 Tables were reported in response to peer review comments. SAS software 9.4 (https://www.sas.com/) and R software 3.6.1 (https://www.r-project.org/) were used for constructing the datasets and for descriptive analyses. The scikit-learn (https://scikit-learn.org) and XGBoost (https://xgboost.readthedocs.io/en/latest/python/) packages for the Python programming language 3.6.7 (https://www.python.org/) were used for the machine learning analyses during model derivation and evaluation. The code has been placed in a repository on GitHub (https://github.com/qichense/suicide_ml/).
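A learning-curve diagnostic of the kind described can be reproduced with scikit-learn on synthetic rare-outcome data (the roughly 3% prevalence loosely mimics the 90-day event rate; nothing else is drawn from the study data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Rare-outcome toy data (~3% positives), standing in for the registry data.
X, y = make_classification(n_samples=4000, n_features=20, weights=[0.97],
                           random_state=0)

# Training vs. cross-validated AUC at increasing training-set sizes.
sizes, train_auc, val_auc = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.2, 1.0, 5), cv=5, scoring="roc_auc")

# A shrinking train-validation gap as `sizes` grows indicates low variance;
# a flat validation curve suggests more samples alone will not help.
gap = train_auc.mean(axis=1) - val_auc.mean(axis=1)
```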

Results

Among 541,300 eligible visits to psychiatric specialty care, 18,682 (3.45%) were followed by suicidal outcomes within 90 days and 9,099 (1.68%) within 30 days. Descriptive characteristics of the entire study sample are shown in Table 1.
Table 1

Baseline characteristics of 541,300 eligible visits by 126,205 patients to psychiatric specialty care during 2011 and 2012.

Characteristic | Training set, n (%) | Test set, n (%)
Visits | 433,024 | 108,276
    Inpatient | 49,077 (11.3) | 12,293 (11.4)
    Outpatient | 383,947 (88.7) | 95,983 (88.6)
    Unplanned | 94,988 (21.9) | 23,343 (21.6)
    Planned | 335,494 (77.5) | 84,398 (77.9)
    Unknown if planned or not | 2,544 (0.6) | 535 (0.5)
Unique patients | 100,964 | 25,241
Female | 242,944 (56.1) | 62,355 (57.6)
Mean (standard deviation) age at the visit, years | 27.3 (6.1) | 27.2 (6.1)
Primary diagnosis (a) | |
    Substance use disorders | 59,178 (13.7) | 14,427 (13.3)
    Schizophrenia spectrum disorders | 27,467 (6.3) | 6,073 (5.6)
    Bipolar disorder | 33,005 (7.6) | 8,412 (7.8)
    Major depressive disorder | 72,876 (16.8) | 18,676 (17.2)
    Anxiety disorders | 86,246 (19.9) | 21,933 (20.3)
    Borderline personality disorder | 17,248 (4.0) | 4,626 (4.3)
    Attention-deficit/hyperactivity disorder | 53,048 (12.3) | 12,991 (12.0)
    Autism | 20,831 (4.8) | 5,170 (4.8)
    Others | 63,125 (14.6) | 15,968 (14.7)
Visits followed by | |
    Suicide attempt/death within 90 days (b) | 14,675 (3.4) | 4,007 (3.7)
        Intentional self-harm | 13,308 (3.1) | 3,696 (3.4)
        Self-harm with undetermined intent | 1,277 (0.3) | 353 (0.3)
        Death from intentional self-harm | 379 (0.1) | 56 (0.1)
        Death from self-harm of undetermined intent | 164 (0.04) | 39 (0.04)
    Suicide attempt/death within 30 days (b) | 7,188 (1.7) | 1,911 (1.8)
        Intentional self-harm | 6,596 (1.5) | 1,775 (1.6)
        Self-harm with undetermined intent | 505 (0.1) | 142 (0.1)
        Death from intentional self-harm | 144 (0.03) | 19 (0.02)
        Death from self-harm of undetermined intent | 55 (0.01) | 14 (0.01)

(a) Primary diagnosis with an ICD-10 code in the range F00 to F99.

(b) Different types of events may occur during the same outcome window.

ICD-10, International Classification of Diseases, 10th revision.


Model selection

S5 Table shows the mean (standard deviation) training and validation AUCs of the 4 best models trained via 4 machine learning algorithms, namely elastic net penalized logistic regression, random forest, gradient boosting, and a neural network, as well as the ensemble of different combinations of these best models. S6 Table lists the optimized hyperparameters for each model. Results from cross-validation showed that the validation AUCs of the best models of each type were very similar. The ensemble of the 4 best models gave a higher validation AUC and a smaller difference between the training and the validation AUCs relative to the other models and thus was selected and applied to the entire training set to obtain the final model parameters. The subsequent results, therefore, were solely based on ensemble models for both outcomes.
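The selection rule applied here, as described in the Methods (highest mean validation AUC to 4 decimal places, ties broken by the smallest train-validation gap), amounts to:

```python
def select_model(results):
    """results: list of (name, train_auc, mean_val_auc) tuples.
    Pick the highest validation AUC rounded to 4 decimal places;
    among ties, prefer the smallest train-validation gap."""
    best = max(round(val, 4) for _, _, val in results)
    tied = [r for r in results if round(r[2], 4) == best]
    return min(tied, key=lambda r: abs(r[1] - r[2]))[0]
```

For example (hypothetical numbers), `select_model([("EN", 0.90, 0.8800), ("ensemble", 0.89, 0.88004)])` returns `"ensemble"`: both validation AUCs round to 0.8800, and the ensemble has the smaller gap.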

Model discrimination

The models for predicting suicide attempt/death within 90 and 30 days following a visit to psychiatric specialty care demonstrated good discrimination accuracy. The test AUCs were 0.88 (95% CI = 0.87 to 0.89) and 0.89 (95% CI = 0.88 to 0.90), respectively (Fig 1). At the 95th percentile risk threshold, the sensitivities were 47.2% and 52.9%, the specificities were 96.6% and 95.9%, the PPVs were 34.9% and 18.7%, and the NPVs were 97.9% and 99.1%, respectively (Table 2 and S7 Table). Table 2 also shows sensitivity, specificity, and predictive values over a series of risk thresholds.
Fig 1

ROC curves illustrating model discrimination accuracy in the test set for predicting suicide attempt/death within 90 (A) and 30 days (B) following a visit to psychiatric specialty care during 2011 and 2012. The figure was based on the discrimination accuracy of the ensemble models. The solid line in brown represents the ROC curves achieved by the models. The dotted line in black represents the ROC curves when AUC equals 50%. AUC, area under the receiver operating characteristic curves; ROC, receiver operating characteristic.

Table 2

Model performance metrics at various risk thresholds for predicting suicide attempt/death within 90 and 30 days following a visit to psychiatric specialty care during 2011 and 2012.

Risk threshold | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%)
Suicide attempt/death within 90 days following a visit
99.5th | 10.0 | 99.9 | 74.2 | 96.7
99th | 17.5 | 99.6 | 64.6 | 96.9
98th | 27.7 | 99.0 | 51.2 | 97.3
97th | 36.5 | 98.3 | 45.0 | 97.6
96th | 42.0 | 97.5 | 38.9 | 97.8
95th | 47.2 | 96.6 | 34.9 | 97.9
90th | 62.1 | 92.0 | 23.0 | 98.4
85th | 71.5 | 87.2 | 17.6 | 98.8
80th | 77.4 | 82.2 | 14.3 | 99.0
70th | 85.5 | 72.1 | 10.6 | 99.2
60th | 90.5 | 61.9 | 8.4 | 99.4
50th | 94.0 | 51.7 | 7.0 | 99.6
Suicide attempt/death within 30 days following a visit
99.5th | 12.8 | 99.7 | 44.9 | 98.5
99th | 21.1 | 99.4 | 37.3 | 98.6
98th | 33.6 | 98.6 | 29.7 | 98.8
97th | 42.6 | 97.7 | 25.1 | 99.0
96th | 48.1 | 96.8 | 21.2 | 99.0
95th | 52.9 | 95.9 | 18.7 | 99.1
90th | 67.7 | 91.0 | 12.0 | 99.4
85th | 75.4 | 86.1 | 8.9 | 99.5
80th | 80.7 | 81.1 | 7.1 | 99.6
70th | 87.9 | 71.0 | 5.2 | 99.7
60th | 92.6 | 60.9 | 4.1 | 99.8
50th | 95.2 | 50.8 | 3.4 | 99.8

NPV, negative predictive value; PPV, positive predictive value.

Model performance metrics were based on ensemble models.


Model calibration

Brier scores were estimated to be 0.028 and 0.015 (both close to 0) for the models predicting outcome events within 90 days and 30 days, respectively, indicating good model calibration. The calibration plots (Fig 2) further illustrate high agreement between the observed proportion of positives and mean predicted risk of the outcomes. More details can be found in S9 Table.
Fig 2

Calibration plots comparing observed proportion of positives and mean predicted risk of suicide attempt/death within 90 (A) and 30 days (B) following a visit to psychiatric specialty care during 2011 and 2012. The figure was based on the calibration of the ensemble models. Each solid dot in blue represents the observed proportion of index visits followed by a suicidal event in a bin of the sample, plotted against the mean predicted risk in the same bin. More details can be found in S9 Table. The rug plot in pink represents the distribution of the study sample across different predicted risk levels.


Learning curve analyses: bias versus variance

Fig 3 illustrates the learning curves for assessing the bias and variance trade-off. As the training sample size gradually increases, the 2 curves representing training and validation AUCs end up very close to each other and converge at AUCs of 0.88 and 0.89 for the models separately predicting outcome events within 90 and 30 days, suggesting relatively low bias (relatively high AUCs) and low variance (eventual convergence). Because the validation curve is no longer increasing with increased training sample size, future improvements to the model may require more informative predictors and higher model capacity rather than a larger sample size. The convergence of the training and validation AUCs indicates no model overfitting.
Fig 3

Learning curves illustrating the bias and variance trade-off in the training set for predicting suicide attempt/death within 90 (A) and 30 days (B) following a visit to psychiatric specialty care during 2011 and 2012. The figure was based on the learning curves of the ensemble models. AUC, area under the receiver operating characteristic curve.


Importance of predictors

Table 3 shows the top 30 predictors with the highest importance in the best elastic net penalized logistic regression, random forest, and gradient boosting models; there was substantial overlap in top predictors between these models. In general, temporally close predictors tended to be ranked higher than temporally remote predictors. Intentional self-harm during the past year (i.e., within 1 month, 1 to 3 months, 3 to 6 months, or 6 to 12 months before the visit), unplanned visit to psychiatric specialty care during the past 1 to 3 months, diagnosis of borderline personality disorder during the past 3 months (i.e., within 1 month or 1 to 3 months), diagnosis of depressive disorder during the past month, and recent dispensation of antidepressants (i.e., 3 to 6 months), anxiolytics (i.e., 6 to 12 months), benzodiazepines (i.e., within 1 month or 6 to 12 months), and antipsychotics (i.e., 1 to 3 years) were ranked as top predictors. In addition, prior intentional self-harm by poisoning or sharp object, family history of suicide attempt, family history of substance use disorder, and family history of borderline personality disorder were also ranked as top predictors by more than 1 model. The intercepts and coefficients of the elastic net penalized logistic regression models can be found in S10 Table.
Table 3

Predictors ranked top 30 by the best models of elastic net penalized logistic regression, random forest, and gradient boosting.

Predictor | EN90 | RF90 | GB90 | EN30 | RF30 | GB30
Intentional self-harm 3–6 months before the visit | 1st | 2nd | 1st | 1st | 1st | 1st
Intentional self-harm within 1 month before the visit | 2nd | 7th | 6th | 2nd | 5th | 2nd
Intentional self-harm 1–3 months before the visit | 3rd | 5th | 3rd | 3rd | 2nd | 3rd
Unplanned visit | 4th | 17th | 14th | 8th | 14th | 12th
Family history of intentional self-harm | 5th | 1st | 2nd | 6th | 3rd | 5th
Intentional self-harm 6–12 months before the visit | 6th | 3rd | 5th | 7th | 4th | 7th
Prior intentional self-harm by sharp object | 7th | 8th | 7th | 4th | 9th | 6th
Prior intentional self-harm by poisoning | 8th | 4th | 4th | 5th | 6th | 4th
Unplanned visit 1–3 months before the visit | 9th | 9th | 10th | 12th | 12th | 11th
Unplanned visit within 1 month before the visit | 11th | 6th | 12th | 9th | 7th | 9th
Hospitalization within 1 month before the visit | 12th | 12th | 11th | 11th | 8th | 10th
Family history of substance use disorder | 13th | 23rd | 17th | 17th | 23rd | 25th
Family history of borderline personality disorder | 14th | 16th | 13th | 14th | 15th | 8th
Intentional self-harm 1–3 years before the visit | 15th | 10th | 8th | 16th | 10th | 16th
Family history of self-harm of undetermined intent | 16th | 24th | 21st | 15th | 24th | 23rd
Hospitalization 1–3 months before the visit | 19th | 11th | 9th | 19th | 13th | 14th
In-/outpatient visit | 27th | 15th | 16th | 27th | 16th | 20th
Family history of anxiety disorders | 26th, 30th, 20th, 22nd, 30th
Planned visit | 18th, 18th, 10th, 17th, 13th
Age at the visit | 10th, 26th, 13th
Diagnosis of major depressive disorder within 1 month before the visit | 17th, 24th, 24th
Sex | 21st, 30th, 23rd
Family history of major depressive disorder | 24th, 27th, 25th
Diagnosis of anxiety within 1 month before the visit | 18th, 20th
Intentional self-harm at the visit | 20th, 18th
Diagnosis of ADHD at the visit | 22nd, 26th
Presence of study income | 23rd, 29th
Dispensed benzodiazepines 6–12 months before the visit | 25th
Diagnosis of other personality disorders than ASPD and BLPD 6–12 months before the visit | 28th
Diagnosis of substance use disorder within 1 month before the visit | 29th
Dispensed benzodiazepines within 1 month before the visit | 30th
Diagnosis of borderline personality disorder within 1 month before the visit | 13th, 19th, 11th, 15th
Diagnosis of borderline personality disorder 6–12 months before the visit | 14th, 23rd, 18th
Diagnosis of borderline personality disorder 1–3 months before the visit | 20th, 15th, 22nd, 26th
Diagnosis of borderline personality disorder 3–6 months before the visit | 21st, 22nd, 19th, 19th
Diagnosis of borderline personality disorder at the visit | 22nd, 29th, 21st
Diagnosis of borderline personality disorder 1–3 years before the visit | 25th, 25th, 27th
Intentional self-harm 3–5 years before the visit | 19th, 20th
Prior self-harm by poisoning of undetermined intent | 26th, 26th, 27th
Diagnosis of other personality disorders than ASPD and BLPD at the visit | 27th
Diagnosis of substance use disorder 3–6 months before the visit | 28th, 30th
Diagnosis of borderline personality disorder 3–5 years before the visit | 29th, 29th
Dispensed antidepressants 3–6 months before the visit | 28th
Dispensed anxiolytics 6–12 months before the visit | 21st
Dispensed antipsychotics 1–3 years before the visit | 28th
Diagnosis of epilepsy 3–6 months before the visit | 29th
Father's education ≤ 9 years | 30th
Family history of other personality disorders than ASPD and BLPD | 25th
Intentional self-harm by unspecified means | 28th
Diagnosis of substance use disorder 6–12 months before the visit | 17th
Diagnosis of other personality disorders than ASPD and BLPD 1–3 months before the visit | 18th
Diagnosis of type 2 diabetes mellitus 1–3 months before the visit | 21st
Diagnosis of asthma 3–5 years before the visit | 22nd
Diagnosis of substance use disorder at the visit | 24th
Diagnosis of epilepsy within 1 month before the visit | 28th

For predictors ranked in the top 30 by only some of the 6 models, the available ranks are listed together in original column order; the blank cells of the original table could not be recovered.

EN: The best elastic net penalized logistic regression model.

RF: The best random forest model.

GB: The best gradient boosting model.

Subscript 90: Model for predicting suicide attempt/death within 90 days following a visit to psychiatric specialty care.

Subscript 30: Model for predicting suicide attempt/death within 30 days following a visit to psychiatric specialty care.

ADHD, attention-deficit/hyperactivity disorder; ASPD, antisocial personality disorder; BLPD, borderline personality disorder.


Predictive potential of electronic medical records system alone

When the candidate predictors were restricted to sex, age at the visit, and those identified from the National Patient Register as well as the Prescribed Drug Register (S1 Table), the AUCs for predicting the outcome events within 90 and 30 days were 0.86 (95% CI: 0.86 to 0.87) and 0.88 (95% CI: 0.87 to 0.88), respectively. These AUCs were slightly but significantly lower than those of the main models (p < 0.001). S8 Table shows the sensitivity, specificity, and predictive values at different risk thresholds.

Discussion

To the best of our knowledge, this is the first study using machine learning to determine the potential of the Swedish national registry data for relatively short-term prediction of suicidal behavior in general psychiatric specialty care. Based on ensemble learning of elastic net penalized logistic regression, random forest, gradient boosting, and a neural network, the final models achieved both good discrimination (AUC of 0.88 [0.87 to 0.89] for the 90-day outcome and 0.89 [0.88 to 0.90] for the 30-day outcome) and calibration (Brier score of 0.028 for the 90-day outcome and 0.015 for the 30-day outcome). The AUCs achieved by our models were higher than those of prior studies that predicted suicidal behavior within the 90-/30-day windows in the review by Belsher and colleagues [6]. One model in a prior study among United States Army soldiers with suicidal ideation demonstrated a higher AUC (0.93) [32]. The authors of that study, however, did not specify the time window of the predicted outcome, and they used cross-validation rather than a separate dataset for the final model evaluation, which tends to overestimate the AUC, as indicated by the previous review and simulations [33,34]. To date, there is a lack of agreement on what risk threshold would signal sufficient clinical utility of a prediction model for deployment in clinics. We therefore reported 4 additional model metrics at varying risk thresholds instead of focusing on a single preselected threshold. In the current study, at the 95th percentile risk threshold, the model would correctly identify approximately half of all suicide attempts/deaths within 90 days (sensitivity 47.2%); among those psychiatric visits that were predicted to be at high risk, around one-third were actually followed by suicidal events within 90 days (i.e., PPV 34.7%). Although a higher sensitivity could be achieved at a lower threshold, this would come at the cost of a reduced PPV.
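The ensemble approach described above can be sketched as a simple average of the base models' predicted probabilities. This is an illustrative sketch only, not the authors' actual pipeline: the synthetic data, hyperparameters, and the probability-averaging rule are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the registry predictors; ~3.5% outcome prevalence
# mirrors the 90-day event rate reported in the study.
X, y = make_classification(n_samples=5000, n_features=30, weights=[0.965], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# The four base learners named in the text (illustrative hyperparameters).
models = [
    LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5, C=1.0, max_iter=5000),
    RandomForestClassifier(n_estimators=200, random_state=0),
    GradientBoostingClassifier(random_state=0),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
]

# Ensemble by averaging each model's predicted probability of the outcome.
probs = np.mean([m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in models], axis=0)

# Discrimination (AUC) and calibration (Brier score), as reported in the paper.
print("AUC:", roc_auc_score(y_te, probs))
print("Brier score:", brier_score_loss(y_te, probs))
```

The averaged probabilities can then be thresholded at chosen risk percentiles to obtain sensitivity, specificity, and predictive values.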
At the 80th percentile risk threshold, nearly 80% of all suicide attempts/deaths within 90 days would be correctly identified (sensitivity 77.4%); among those visits that were predicted to be at high risk, 1 in 7 was followed by suicidal events within 90 days (PPV 14.3%). Meanwhile, most visits without suicidal events within 90 days would be correctly identified, given the high specificity estimates (96.6% and 82.2% at the 95th and 80th percentile risk thresholds, respectively), while most visits predicted to be low risk were not followed by suicidal events, given the high NPVs (97.9% and 99.0% at the 95th and 80th percentile risk thresholds, respectively). Selecting a reasonable risk threshold for predictive values depends on the resources available for subsequent intervention strategies and their implications for individuals and services. When effective intervention strategies are taken into consideration, model performance metrics at other risk thresholds may also be informative. Future research on cost-effective interventions with negligible or no adverse effects (e.g., increased frequency of follow-up) is warranted, as such interventions could be allocated to false-positive patients using models with low PPVs. In addition, prior studies tended to put more emphasis on the role of PPV in assisting clinical decision-making, although NPV might be useful in confirming clinical assessment of the low-risk group and helpful in optimizing the allocation of healthcare resources. One prior study, using Swedish registry data collected during 2001 to 2008 and multivariable regression, developed and validated a prediction model for suicide within 1 year after discharge among in- and outpatients diagnosed with schizophrenia spectrum or bipolar disorder [7]. The final model achieved an AUC of 0.71 on the final validation set.
That study prespecified a risk threshold of 1% (i.e., the 99th percentile risk threshold), which was close to the prevalence of the predicted outcome (approximately 0.6%), for evaluating the model metrics, although it concluded that this risk threshold should not be used clinically and that probability scores were more informative. At the 1% threshold, the sensitivity, specificity, PPV, and NPV were estimated to be 55%, 75%, 2%, and 99%, respectively. Despite a very low PPV, the model achieved a high NPV. Similarly, in the current study, at the risk thresholds close to the prevalence estimates of the predicted outcomes, the NPVs were also high for suicide attempt/death within 90 days (97.8%) and 30 days (98.8%). However, it is difficult to directly compare the models from the 2 studies, given the differences in definition of the predicted outcome (suicide death versus suicide attempt or death) and in the time window of interest. Currently, it is unclear under which circumstances a seemingly very high NPV may have clinical utility for an outcome with a low prevalence. Simulation studies are therefore required to determine how NPV varies with the values of PPV, outcome prevalence, and potential cost of intervention. Moreover, screening out on the basis of high NPVs will be limited to the outcome of interest and will not be appropriate if those individuals are at increased risk of other adverse outcomes (e.g., accidents). This suggests that if tools are used in this way, clinicians need to consider risks for other outcomes before these individuals are not further assessed or considered for treatment. In the current study, the difference in predictive performance between the initial 4 types of models, namely elastic net penalized logistic regression, random forest, gradient boosting, and a neural network, was very small. This is consistent with the findings in the review by Belsher and colleagues suggesting that no 1 model seems clearly better than another [6].
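The percentile-threshold metrics discussed above (sensitivity, specificity, PPV, NPV) follow directly from the confusion matrix once a risk-score cutoff is chosen. A minimal sketch, using synthetic data with an assumed 2% outcome prevalence rather than the study's actual scores:

```python
import numpy as np

def metrics_at_percentile(y_true, risk_scores, percentile):
    """Flag the top (100 - percentile)% of risk scores as high risk and
    report sensitivity, specificity, PPV, and NPV."""
    threshold = np.percentile(risk_scores, percentile)
    pred = risk_scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0))
    return {
        "sensitivity": tp / (tp + fn),  # share of true events flagged high risk
        "specificity": tn / (tn + fp),  # share of non-events flagged low risk
        "ppv": tp / (tp + fp),          # share of high-risk flags that are events
        "npv": tn / (tn + fn),          # share of low-risk flags that are non-events
    }

# Toy example: 2% outcome prevalence with an informative risk score.
rng = np.random.default_rng(0)
y = rng.random(100_000) < 0.02
score = 0.7 * y + 0.3 * rng.random(100_000)
print(metrics_at_percentile(y, score, 95))
print(metrics_at_percentile(y, score, 80))
```

As in the text, lowering the percentile cutoff raises sensitivity while diluting PPV, because more non-events fall above the threshold.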
Future research using the same predictors as in our study may employ elastic net penalized logistic regression alone to obtain better model interpretability at a limited loss of predictive performance. The learning curves suggest that more informative predictors and higher model capacity are likely required to further improve the predictive power. Informative predictors could come from creating transformations of existing predictors (e.g., the frequency of self-harm over a certain time period) or from incorporating completely novel predictors such as data from primary medical care services, clinical documentation text, audio and video data from clinical interviews, and vital physiological parameters continuously monitored via wearable devices. Novel and previously untested data may open opportunities for deep learning analytics to improve prediction of suicidal behavior through identifying highly complex data patterns [35,36]. Deep learning analytics may also improve model capacity by creating a better representation of the predictors used in the current study [37]. There was a substantial overlap in top-ranked predictors between different models. While borderline personality disorder, substance use disorder, depression, and dispensed benzodiazepines and/or antidepressants have already been identified as risk factors for suicide [38,39], the timing of these factors may play a vital role in predicting subsequent suicidal behavior [40]. Our study showed that temporally close predictors tended to be ranked higher than distal ones. These results call for more research on the timing of diagnoses of psychiatric disorders and the use of psychotropic medications in relation to suicidal behavior. Using Danish registry data, a recent study predicting sex-specific suicide risk reported that diagnoses occurring long (e.g., 48 months) before suicide were more important for suicide prediction than diagnoses occurring shortly (e.g., 6 months) before suicide [41].
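The elastic-net-only option mentioned above retains coefficient-based interpretability while combining L1 (sparsity) and L2 (shrinkage) penalties. A minimal sketch with scikit-learn; the data, `C`, and `l1_ratio` below are illustrative and would normally be tuned by cross-validated grid search:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data standing in for the registry predictors.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5, random_state=0)

# Elastic net = mixed L1/L2 penalty; l1_ratio=0.5 weights them equally.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5, C=0.1, max_iter=5000),
)
model.fit(X, y)

# Rank predictors by |coefficient|. As the text cautions, shrunken
# coefficients support ranking, not effect-size or causal interpretation.
coefs = model[-1].coef_.ravel()
ranking = np.argsort(-np.abs(coefs))
print([(int(i), round(float(coefs[i]), 3)) for i in ranking[:5]])
```

The L1 component can drive uninformative coefficients exactly to zero, which is what makes the selected-predictor list in the elastic net model relatively compact.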
Since the authors defined the predictors using the time of suicide, which could not be known beforehand, the models would not be implementable in clinical systems. One Swedish registry-based cohort study of 48,649 individuals admitted to hospital after a suicide attempt found that use of hanging, drowning, firearms or explosives, jumping from a height, or gassing at the suicide attempt better predicted subsequent suicide than poisoning and cutting did [42]. Our study, however, showed that poisoning and cutting were ranked higher than other methods when predicting a broadly defined behavior including both suicide attempt and suicide. As expected [43], family histories of several psychiatric disorders (i.e., anxiety disorders, borderline personality disorder, major depressive disorder, substance use disorder) were also among the top predictors, whereas family histories of somatic disorders were not. It should be noted, however, that in predictive modeling, the importance of a predictor reflects the extent to which permuting the value of the predictor will increase prediction error [44]. Unlike causal risk factors, predictors with higher importance may or may not have a direct impact on the outcome. Two highly correlated predictors may have very similar univariate predictive ability yet differ greatly in model-based importance scores: the less predictive one would be given low importance because its importance value is conditional on the other, highly correlated predictor being in the model. The coefficients in the elastic net penalized logistic regression, unlike in ordinary logistic regression, were shrunk in magnitude. Although the size of the coefficients could be used for ranking predictor importance, it does not represent the effect size of a specific predictor on suicidal behavior. Moreover, the sign (positive/negative) of a coefficient should not be interpreted as an increase or decrease in the risk of suicidal behavior.
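The permutation-based notion of importance described above (how much shuffling one predictor's values degrades held-out performance [44]) can be illustrated with scikit-learn's `permutation_importance`. The data, model, and AUC scoring choice below are assumptions for illustration, not the study's configuration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: only 3 of 10 features carry signal.
X, y = make_classification(n_samples=3000, n_features=10, n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Permutation importance: shuffle one predictor at a time on held-out data
# and measure the resulting drop in AUC, averaged over repeats.
result = permutation_importance(clf, X_te, y_te, scoring="roc_auc", n_repeats=10, random_state=0)
for i in np.argsort(-result.importances_mean)[:3]:
    print(f"feature {i}: mean AUC drop = {result.importances_mean[i]:.3f}")
```

Note that with two highly correlated predictors, permuting one leaves much of its signal available through the other, so both can receive deceptively low importance; this is the caveat the text raises.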
This is because, in predictive modeling, the predictors are not necessarily causal risk factors of the predicted outcome, and the direction of effect of a specific predictor is influenced by other model-selected predictors that are not necessarily confounders. Of more relevance in predictive modeling is the overall predictive performance rather than individual model coefficients. Accurate estimation of the magnitude and direction of risk factors requires different research designs, such as cohort or case-control investigations. When candidate predictors were restricted to those related to the patient’s history of diseases and dispensed medications only, the AUCs were only slightly worse than those achieved by the models using all predictors obtained via complex linkages among registers, suggesting limited incremental value of the predictors from registers other than the National Patient and the Prescribed Drug Registers in predicting suicidal behavior. The results have pragmatic implications, because it is more feasible to integrate data from within the electronic medical records system than it is to create complex linkages among registries. Our study is subject to several limitations. To ensure reasonable quality of data on both individual health and socioeconomic information, as well as family history of disease and crime, the sample was limited to patients who were adults at the visit and covered by the Swedish Medical Birth Register. By the end of 2012, the oldest in the study population were 39 years old. The derived models are therefore less likely to generalize well to adult patients older than 40 years or to children and adolescents. The candidate predictors were limited by the data sources. The derived models are not clinically based, as hospital services are not in a position to link disparate registers. Our outcomes have limited validation.
Nevertheless, according to an external review on the validity of diagnoses in the Swedish national inpatient register [13], the PPVs of the register-based diagnostic codes varied between 85% and 95%. Notably, the PPV of injury, including both accidental injury and self-injury, was 95%. On this basis, we think that the PPV of our outcome is likely to be high, despite the lack of a specific validation study. Our definition of suicidal behavior was based on the ICD-10 criteria, which do not distinguish between suicidal and nonsuicidal self-harm. Although this definition has been widely used in prior research, whether nonsuicidal self-harm represents actual suicide risk remains controversial. A certain number of nonsuicidal self-harm events were labeled as positive outcome events, which may have led to an overestimation of the PPVs. On the other hand, underestimation of the PPVs is also possible, as planned visits with a recorded self-harm were not labeled as positive outcome events. When defining the outcome, we assumed that unplanned inpatient or outpatient visits after the index hospitalization with a recorded self-harm were incident suicidal events. This was based on the selection of the study sample, which was restricted to patients who, at the index visit, received a primary diagnosis with an ICD-10 code ranging from F00 to F99. This means that the index visits were not primarily due to suicide attempt/self-harm (which has a different ICD code, i.e., ICD-10: X60–X84 and Y10–Y34), and thus, the identified suicidal events after the index visit were very unlikely to be the same visit as the index visit. Future validation studies could examine how accurately the identified events in the Patient Register represent true incident suicidal events. The preliminary selection of 425 candidate predictors and their time windows was somewhat arbitrary. In theory, many more candidate predictors could be generated from the registry data or via transformation of existing ones.
As a result, it is unclear whether the model performance achieved in the current study is the best achievable. Our models have not yet been externally validated, and thus, the generalizability of the models to other populations remains unknown. In future work, guidelines will be critical in improving the transparency and reproducibility of research in predictive modeling and should be followed once they are finalized. In our study, the models were derived to predict suicidal behaviors rather than the rare outcome of suicide death. In relation to deaths, the model performance would be quite different, with higher specificities and lower sensitivities, and the calibration would be poor. As some metrics of discrimination will be low, 1 promising approach would be to solely use probability scores rather than risk thresholds to investigate suicide deaths [7]. Future research could compare ordinary logistic regression models with machine learning models; in particular, the former could focus on the top-ranked predictors identified in this study.

Conclusions

By applying an ensemble of multiple machine learning algorithms to high-quality data solely from the Swedish registers, we developed prognostic models that predict short-term suicide attempt/death with good discrimination and calibration. Whether novel predictors can improve predictive performance requires further investigation.

Supporting information

Prediction model development. (PDF)

Flowchart for study sample identification. (TIF)

A total of 425 candidate predictors generated from the Swedish national registers. (DOCX)

International Classification of Diseases (ICD) codes for identifying clinical diagnoses from the National Patient Register. (DOCX)

International Classification of Diseases (ICD)-10 codes for identifying methods used for prior self-harm from the National Patient Register. (DOCX)

Prior use of medications identified from the Prescribed Drug Register according to the Anatomical Therapeutic Chemical (ATC) Classification System. (DOCX)

Training and validation AUCs obtained from 10-fold cross-validation for the best models trained by each selected machine learning algorithm and the ensemble of different combinations of these best models. (DOCX)

Optimization of hyperparameters using grid search. (DOCX)

Number of true vs. predicted outcomes at the 95th percentile risk threshold. (DOCX)

Model performance metrics at various risk thresholds for predicting suicide attempt/death within 90 and 30 days following a visit to psychiatric specialty care during 2011 and 2012, predictors being restricted to sex, age at the visit, and those identified from the National Patient Register as well as the Prescribed Drug Register. (DOCX)

Statistics underlying model calibration curves for predicting suicide attempt/death within 90 and 30 days following a visit to psychiatric specialty care during 2011 and 2012. (DOCX)

Elastic net logistic regression model selected predictors and coefficients. (DOCX)

(DOCX) Click here for additional data file. 15 Jan 2020 Dear Dr Chen, Thank you for submitting your manuscript entitled "Predicting suicide attempt/death following a visit to psychiatric specialty care by applying machine learning to the Swedish national registry data" for consideration by PLOS Medicine. Your manuscript has now been evaluated by the PLOS Medicine editorial staff and I am writing to let you know that we would like to send your submission out for external peer review. However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire. Please re-submit your manuscript within two working days, i.e. by . Login to Editorial Manager here: https://www.editorialmanager.com/pmedicine Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. Once your manuscript has passed all checks it will be sent out for review. **Please be aware that, due to the voluntary nature of our reviewers and academic editors, manuscript assessment may be subject to delays during the holiday season. Thank you for your patience.** Feel free to email us at plosmedicine@plos.org if you have any queries relating to your submission. Kind regards, Helen Howard, for Clare Stone PhD Acting Editor-in-Chief PLOS Medicine plosmedicine.org 11 Feb 2020 Dear Dr. Chen, Thank you very much for submitting your manuscript "Predicting suicide attempt/death following a visit to psychiatric specialty care by applying machine learning to the Swedish national registry data" (PMEDICINE-D-20-00084R1) for consideration at PLOS Medicine. 
Your paper was evaluated by a senior editor and discussed among all the editors here. It was also discussed with an academic editor with relevant expertise, and sent to independent reviewers, including a statistical reviewer. The reviews are appended at the bottom of this email and any accompanying reviewer attachments can be seen via the link below: [LINK] In light of these reviews, I am afraid that we will not be able to accept the manuscript for publication in the journal in its current form, but we would like to consider a revised version that addresses the reviewers' and editors' comments. Obviously we cannot make any decision about publication until we have seen the revised manuscript and your response, and we plan to seek re-review by one or more of the reviewers. In your rebuttal letter you should indicate your response to the reviewers' and editors' comments and the changes you have made in the manuscript, and include either an excerpt of the revised text or the location (e.g., page and line number) where each change can be found. Please submit a clean version of the paper as the main article file; a version with changes marked should be uploaded as a marked-up manuscript. We expect to receive your revised manuscript by Mar 03 2020 11:59PM. Please email us (plosmedicine@plos.org) if you have any questions or concerns. Please ensure that the paper adheres to the PLOS Data Availability Policy (see http://journals.plos.org/plosmedicine/s/data-availability), which requires that all data underlying the study's findings be provided in a repository or as Supporting Information. For data residing with a third party, authors are required to provide instructions with contact information for obtaining the data. PLOS journals do not allow statements supported by "data not shown" or "unpublished results." For such statements, authors must provide supporting data or cite public sources that include it. We look forward to receiving your revised manuscript. Sincerely, Caitlin Moyer, Ph.D. Associate Editor PLOS Medicine plosmedicine.org ----------------------------------------------------------- Requests from the editors: Specific point – You will note that it is repeatedly raised among the referees that the study is not reproducible and that, without the code and parameters for others to replicate it, this study is severely limited. We are unable to consider a resubmission unless the code is placed in a repository such as GitHub and all parameters and associated data are made freely available. This is regardless of other revisions undertaken. General points: Abstract – please add some demographic information to the abstract as well as the mean ages of males and females; please add p values where 95% CIs are given; please provide a sentence on the limitations of the study as the final sentence of the ‘Methods and Findings’ section; ‘models outperforming existing models in predicting relatively short-term suicide attempt/death.’ (also page 10) – unless you perform side-by-side comparisons and show superiority you need to remove such statements. Any comparisons added in revision would be assessed by referees.
Data – I would remove this ‘they are subject to secrecy I’ and on resubmission state where code can be accessed and this needs to be live for reviewers to access. At this stage, we ask that you include a short, non-technical Author Summary of your research to make findings accessible to a wide audience that includes both scientists and non-scientists. The Author Summary should immediately follow the Abstract in your revised manuscript. This text is subject to editorial change and should be distinct from the scientific abstract. Please see our author guidelines for more information: https://journals.plos.org/plosmedicine/s/revising-your-manuscript#loc-author-summary Page 7 – please remove the URL (http://vassarstats.net/roc_comp.html) and simply say the name of the test and provide the reference. Comments from the reviewers: Reviewer #1: This is a well-written manuscript, analyzing big data on suicide attempt/death to develop prediction models based on several machine learning algorithms and describing its methodology in detail. In particular, the models developed in this study showed relatively higher PPVs despite the very low prevalence of the suicide outcome, compared with other machine learning studies. My comments are as follows. 1. In this study, suicide attempt was defined as intentional self-harm (ICD-10: X60–X84) or self-harm of undetermined intent (ICD-10: Y10–Y34). Considering that the study population were young adults, a considerable portion of self-harm events might correspond to non-suicidal self-injury. It is controversial whether non-suicidal self-injury represents actual suicide risk. Although the authors already mention self-harm behavior as a limitation in the discussion section, non-suicidal self-injury should be addressed to clarify the suicide outcome more clearly. 2. Prediction models based on machine learning algorithms may be useful in screening at-risk populations.
However, the models developed in this study achieved relatively low sensitivity despite high specificity and good AUC performance. Reviewer #2: This paper reports the development of algorithms to predict suicide-related outcomes using data from a large registry. The approaches taken appear sound to me, and the contribution to knowledge is important, given the relatively high predictive performance achieved compared to other initiatives. The authors are appropriately aware of the limitations here, in particular the need for cross-site algorithm replication and the distance between the data resources used here and real-world health record capabilities. My comments are minor. 1. The section in the Results on the predictive potential of electronic medical records alone felt a little under-described (e.g. it would have been helpful to have the standard list of performance metrics rather than just the AUCs), and the final sentence on p9 looks unfinished. This is an important section because it is likely to represent the element that might most readily be replicated elsewhere, so it would be helpful to have more detail. 2. I think it might be helpful if the text in the 3rd paragraph of the discussion were re-ordered so that the sensitivities and PPVs (and possibly NPVs) of the 95th and 80th percentile alternatives were more explicitly linked and discussed together, as these are most reflective of potential clinical applicability. 3. On p11, the statement about NPV being more useful for directing resources is a little problematic, as the consideration here is just focused on a single, albeit important, outcome. In order for NPV to be used in this way, it would have to be established that patients falling into the negative group were at lower risk of all other adverse outcomes; otherwise they might receive suboptimal care just on the grounds of lower suicide risk. 4.
Although I can understand the reason for restrictions on data access, it would be helpful to have a more explicit statement from the authors about what they intend to do with their algorithms. Clearly there is a need for replication here, but I don't believe that this is going to be possible from the information provided. Reviewer #4: "Predicting suicide attempt/death following a visit to psychiatric specialty care by applying machine learning to the Swedish national registry data" trained and validated various machine learning models on a comprehensive Swedish dataset of 541,300 visits from 126,205 patients. The final ensemble model achieved AUCs of about 0.88-0.89 for predicting 30-day and 90-day suicide attempts. Various ancillary analyses were made to establish the degree of calibration and the impact of additional training data and of the electronic medical records predictor subset. In particular, the claimed PPVs from the validated model are very promising (0.349/0.187, with corresponding sensitivities of 0.472/0.529, at the 95th percentile risk threshold for 90-day/30-day outcomes respectively); a recent review (Belsher et al, JAMA Psychiatry, citation 6 in the manuscript) reported that most previous prediction models had extremely low PPVs of <0.01. That said, it should be noted that expected PPV appears to be significantly affected by the population the model(s) are applied to, with general populations with very low attempt/mortality rates being particularly challenging. As such, the performance achieved in this manuscript might be considered in the context of the target population already seeking psychiatric specialty care. Nonetheless, it remains an interesting study for the scale involved and the comprehensiveness of the data features available, made possible by the quality of the Swedish national registers. Separation between training and test data also seems well-respected. There remain, however, a number of issues that might be addressed: 1.
While the study focuses on various machine learning models for data analysis (in particular: elastic net penalized logistic regression, random forest, gradient boosting [XGBoost], and a neural network), there are next to no pertinent details provided about these methods. For example, model hyperparameters are acknowledged as being grid-searched over, but there is no mention of what these hyperparameters were, nor their ranges (e.g. for random forests, how many trees, what were the maximum tree depths, etc.? For neural networks, what was the architecture, activation functions, learning rate used, etc.?) Although the datasets cannot be made public, there appears to be no reason why a reproducible description of the machine learning models cannot be provided in the supplementary material. 2. The detailed training and validation AUC results presented in eTable 5 suggest that the relatively high PPV/AUC performance is largely due to the specific data examined, and less to the specific state-of-the-art machine learning models used. In particular, logistic regression alone (albeit elastic net penalized, EN) achieved validation AUCs of 0.8721/0.8883 for 90-day/30-day suicide attempt prediction. The improvement with an ensemble of all four models (to 0.8751/0.8909, respectively) is extremely small. As such, it appears reasonable for the authors to focus on EN here, given that it maintains the interpretability of standard logistic regression (e.g. via odds ratios; see, for example, "Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers", Deist et al., Med Phys 2018), as opposed to the other methods. 3. It appears that the self-harm predictors were binary ("Age at the visit and family income were treated as continuous variables and rescaled to the range of 0-1. 
The other predictors were treated as binary or categorical variables"); was information about the frequency of self-harm episodes within each time period available, and if so, might this frequency be reasonably considered? 4. Given the greater importance of PPV due to the large number of negatives, the authors might consider exploring precision-recall curves and reporting the AUPRC metric too (e.g. alongside the ROC curves in Figure 1). 5. For Table 1, the figures in brackets might be explained. 6. For eFigure 1, the sum of the first-stage exclusions (1028907+210250) and the qualifying visits after that exclusion (570945) does not appear to tally with the original total (1810602). 7. For eTable 1, the number of predictors accounted for at finer levels of granularity might also be listed, for convenience (e.g. "Diagnosis at the visit [37 predictors?]"). 8. eTable 6 further suggests that the bulk of predictive performance might be attained from a relatively small subset of predictors. In particular, "Intentional self-harm 3-6 months before the visit" was the top predictor for nearly all models, with "Family history of intentional self-harm" and "Intentional self-harm 1-3 months before the visit" probably the next most consistently important predictors. Given this, the authors might examine whether predictors relating to individual/family history of self-harm alone would yield comparable performance to the full 425-predictor models, or indeed to the electronic medical record-only predictor subsets (which again are able to achieve similar AUCs of 0.86/0.88). 9. The authors might consider providing another table with the data from eTable 6, but sorted by (mean?) rank to show predictors in general order of importance. 10. 
In the Model derivation and evaluation section, it is stated that "Based on the results of cross-validation, the model giving the highest average validation AUC and showing the smallest difference between the training and validation AUCs was selected and applied to the entire training set to obtain the final model parameters". However, the model having the highest average validation AUC might not necessarily also show the smallest difference between the training and validation AUCs. How were these objectives balanced? 11. A "risk threshold" at various percentiles is referenced frequently, and appears to be achievable by varying the operating point of each machine learning model. A brief explanation of what the various percentiles of risk threshold signify might be given. 12. Certain assertions appear too definite, e.g. "Because the validation curve is no longer increasing with increased training sample size, we can conclude that future improvements to the model will require more informative predictors and higher model capacity rather than a larger sample size". 13. Some analysis/comment on the prediction of attempts vs. the prediction of deaths would be interesting. 14. Finally, there are some minor textual issues: "The Medical Birth Register covers nearly all deliverys in Sweden..." -> "nearly all deliveries"; "Despite statistically significant decrease in the value (p<0.05)." -> incomplete sentence fragment? In summary, while the authors have systematically investigated various ensemble combinations of machine learning models on a large set of 425 predictors in their main analyses, the presented evidence suggests that virtually the same performance might have been achieved with just logistic regression and a much smaller set of the most informative predictors relating to individual/family history of self-harm. 
Therefore, while the contribution to suicide prediction remains welcome, the applicability of less interpretable machine learning models (random forests, XGBoost, neural networks) seems not yet well established for this particular study. Reviewer #5: The authors applied an ensemble learning method including four types of models to predict suicide attempt/death within a 90-day/30-day window for patients aged 18-39 after a visit to psychiatric specialty care, using Swedish national registry data and electronic health records (EHR). 425 candidate predictors were carefully selected, covering all relevant information in the registry. The models were trained using a large sample of patient visits, and model parameters were selected using cross-validation. Results showed that the ensemble model outperformed existing models on discrimination and calibration, with high AUC, specificity, and negative predictive values. Main findings include a comparison of the top 30 predictors from each model, and prediction using only EHR data also showed good performance. Strengths: The study investigated an urgent and important topic in mental health. The authors applied a suite of machine learning methods to high-quality national registry data. Analyses were well done, with clear communication of the findings. The top predictors could provide insightful information to practitioners. Weaknesses: The developed ensemble model has not been externally validated, so it is unclear how well it will perform prospectively or in a different patient population. Sensitivity and positive predictive values are relatively low, which decreases the potential impact of this model as a prevention/monitoring tool. I have the following comments for the authors: 1. It is unclear why these four particular types of machine learning models were selected in this study. Could you provide a brief rationale and an explanation of the advantages/disadvantages of each type of model? 
Since they performed similarly, why include all four models? 2. It is unclear how missing data on the predictors were handled in the study. Could you provide the procedure for handling missing data? Was any imputation done? 3. Why did you select the 80th-99th risk thresholds in Table 2? 4. The predictors listed in eTable 6 are very important results and should not be in the appendix. How about swapping them with Table 3 or Figure 3, which are not very interesting? 5. The paper is written for an audience with a background in machine learning who know the concepts of model discrimination and calibration. It would be helpful to include definitions of these concepts when they first appear, for a broader readership. For example, the explanations of sensitivity and specificity in the 3rd paragraph on page 10 should appear earlier in the paper. 6. The last sentence on page 9, starting with "Despite statistically…", is unclear as to what it is referring to. Any attachments provided with reviews can be seen via the following link: [LINK] Submitted filename: Review_plos one medicine.docx. Submitted filename: Suicide risk prediction review.docx. 29 Mar 2020 Submitted filename: Response_to_reviewers.docx. 21 May 2020 Dear Dr. Chen, Thank you very much for submitting your revised manuscript "Predicting suicide attempt/death following a visit to psychiatric specialty care by applying machine learning to the Swedish national registry data" (PMEDICINE-D-20-00084R2) for consideration at PLOS Medicine. Your paper was re-evaluated by a senior editor and discussed among all the editors here. It was also sent to three of the original reviewers, including a statistical reviewer. 
The reviews are appended at the bottom of this email and any accompanying reviewer attachments can be seen via the link below: [LINK] In light of the comments of Reviewer 3, I am afraid that we will not be able to accept the manuscript for publication in the journal in its current form, but we would like to consider a further revised version that addresses the reviewers' and editors' comments. Obviously we cannot make any decision about publication until we have seen the revised manuscript and your response, and we plan to seek re-review by one or more reviewers. In revising the manuscript for further consideration, your revisions should address the specific points made by each reviewer and the editors. Please also check the guidelines for revised papers at http://journals.plos.org/plosmedicine/s/revising-your-manuscript for any that apply to your paper. In your rebuttal letter you should indicate your response to the reviewers' and editors' comments, the changes you have made in the manuscript, and include either an excerpt of the revised text or the location (eg: page and line number) where each change can be found. Please submit a clean version of the paper as the main article file; a version with changes marked should be uploaded as a marked up manuscript. In addition, we request that you upload any figures associated with your paper as individual TIF or EPS files with 300dpi resolution at resubmission; please read our figure guidelines for more information on our requirements: http://journals.plos.org/plosmedicine/s/figures. While revising your submission, please upload your figure files to the PACE digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. 
If you encounter any issues or have any questions when using PACE, please email us at PLOSMedicine@plos.org. We expect to receive your revised manuscript by May 28 2020 11:59PM. Please email us (plosmedicine@plos.org) if you have any questions or concerns. ***Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.*** We ask every co-author listed on the manuscript to fill in a contributing author statement, making sure to declare all competing interests. If any of the co-authors have not filled in the statement, we will remind them to do so when the paper is revised. If all statements are not completed in a timely fashion this could hold up the re-review process. If new competing interests are declared later in the revision process, this may also hold up the submission. Should there be a problem getting one of your co-authors to fill in a statement we will be in contact. YOU MUST NOT ADD OR REMOVE AUTHORS UNLESS YOU HAVE ALERTED THE EDITOR HANDLING THE MANUSCRIPT TO THE CHANGE AND THEY SPECIFICALLY HAVE AGREED TO IT. You can see our competing interests policy here: http://journals.plos.org/plosmedicine/s/competing-interests. Please use the following link to submit the revised manuscript: https://www.editorialmanager.com/pmedicine/ Your article can be found in the "Submissions Needing Revision" folder. To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/plosmedicine/s/submission-guidelines#loc-methods. 
Please ensure that the paper adheres to the PLOS Data Availability Policy (see http://journals.plos.org/plosmedicine/s/data-availability), which requires that all data underlying the study's findings be provided in a repository or as Supporting Information. For data residing with a third party, authors are required to provide instructions with contact information for obtaining the data. PLOS journals do not allow statements supported by "data not shown" or "unpublished results." For such statements, authors must provide supporting data or cite public sources that include it. We look forward to receiving your revised manuscript. Sincerely, Caitlin Moyer, Ph.D. Associate Editor PLOS Medicine plosmedicine.org ----------------------------------------------------------- Requests from the editors: 1.Response to reviewers: Please respond to the remaining concerns of Reviewer 3. Please describe in the text the details for the model parameters and equations (these can be supplemental files) and if possible/relevant please include a diagram illustrating the models. 2.Data Analysis Plan: Did your study have a prospective protocol or analysis plan? Please state this (either way) early in the Methods section. a) If a prospective analysis plan (from your funding proposal, IRB or other ethics committee submission, study protocol, or other planning document written before analyzing the data) was used in designing the study, please include the relevant prospectively written document with your revised manuscript as a Supporting Information file to be published alongside your study, and cite it in the Methods section. A legend for this file should be included at the end of your manuscript. b) If no such document exists, please make sure that the Methods section transparently describes when analyses were planned, and when/why any data-driven changes to analyses took place. 
c) In either case, changes in the analysis -- including those made in response to peer review comments -- should be identified as such in the Methods section of the paper, with rationale. 3.Abstract: Please make the last sentence of the Methods and Findings section (on limitations) more transparent; based on what you wrote in the Discussion we suggest: "A limitation of our study is that our models have not yet been externally validated and thus the generalizability of the models to other populations remains unknown." or similar. 4. In the first paragraph of the Discussion, AUC values with subscripts "30" and "90" are used, presumably to represent the AUC corresponding to 30-day and 90-day post-visit predictions; however, these abbreviations are not defined earlier in the text. 5. Table 1: Please define the abbreviation "SD" in the legend. 6. Table 3: Please make it clear in the legend that the "30" and "90" subscripts represent the models at 30 and 90 days post-visit, respectively. 7. Figure 1 and Figure 2: In the figure legends, please describe what each line in the plot represents. 8. Checklist: Please ensure that the study is reported according to the TRIPOD guideline, and include the completed TRIPOD checklist as Supporting Information. When completing the checklist, please use section and paragraph numbers, rather than page numbers. Please add the following statement, or similar, to the Methods: "This study is reported as per the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) guideline (S1 Checklist)." Comments from the reviewers: Reviewer #3: Thank you to the authors for your careful response to the reviewer comments and clarification of many questions. I still have some outstanding concerns, which I detail below. My biggest concern is that of items 1 and 2. 
While the code is helpful for others who want to do the same analysis, it is not helpful for assessing the performance of these models in other settings, which should be the goal of this work. Additionally, without more detailed quality tests on the outcome, I have concerns that some self-harm events are actually events from the index visit that, due to administrative reasons in the database, appear as separate visits, thus falsely giving the impression of high performance, especially for unplanned index visits. 1. While the authors have made the code available on GitHub, which is great, the models themselves are still not available. At a bare minimum, the authors should make the logistic regression model available: all variables selected and the coefficients that go with those variables. The code itself has limited use without the data it was run on, which for good reasons is unavailable to readers. 2. I am still concerned about measurement error following an unplanned visit. The authors should report the proportion of unplanned visits in Table 1. Have the authors done data quality checks on the self-harm outcome after unplanned visits? For example, it is common for ED visits that lead to a hospitalization to be counted as two separate visits; these would both be unplanned. Would the code associated with a hospitalization in this case be counted as an outcome for the ED visit? Additionally, ED visits can cover more than one day; if this is the case, do the data you have on hand ensure that ED visits that stretch over more than one day are coded as one visit? a. Small comment for clarification: I would recommend changing this sentence: "Planned visits were likely to be follow-up healthcare appointments following an incident suicide attempt and thus were not included." to "Planned visits were likely to be follow-up healthcare appointments following a suicide attempt and thus were not classified as suicide attempts for our analysis." 
If I understand correctly, planned visits were included in the analysis as non-self-harm events; by saying they were not included, it sounds like they were removed from the analysis altogether. 3. I remain concerned about differences in the performance of the model in predicting suicide attempt after unplanned and planned visits. In my previous comments I referred to unplanned visits as inpatient/ED visits; I much prefer the terminology "unplanned" - thank you! I remain concerned that many outcomes after an unplanned visit are actually the same visit or care for the same self-harm event. Without some information about the QA performed to ensure that self-harm events after an unplanned visit are true self-harm events (rather than an ED visit lasting two days and billed as such, or an ED visit followed by a hospital visit, etc.), it is difficult to assess if this performance is true (I hope it is; this would be great to get into clinical practice if it proves replicable!) or due to misclassification of the outcome. In the list of predictors, a top predictor across all models is "unplanned" visit - which is another reason I am concerned about this. 4. The calibration plots presented in the paper need to be described more carefully. The authors respond to my question about the plots in the response to reviewer comments (#12 on page 10), but this is not described in the paper, and the general audience for this paper is not computer scientists, who are the people the authors say are most used to looking at these plots. The plots need to be described much more carefully because, to a statistician, this is not describing calibration. To me, and in the statistical literature, calibration is how well your predicted probability aligns with observed probability. I think this is more informative for assessing performance as well. It is straightforward to look at the average predicted probability in a bin compared to the average observed probability. 
This is what I would expect in a calibration plot. The current calibration plot I do not find helpful, especially since the definition of a "positive" is not well described. 5. Table 1 is really helpful to look at. While self-harm likely represents a small proportion of diagnoses for the index visit, it is very relevant in this context. I think it would be good to add the proportion of visits with a self-harm primary diagnosis to Table 1. 6. I appreciate the authors' efforts to add understanding of the models by including a table with the top predictors of the models. The authors should cite which variable importance measure was used for ranking predictors in the random forest and gradient boosting models. I am assuming the magnitude of the coefficients was used for logistic regression, but it would be useful to provide this information. Are the predictors listed in Table 3 only predictors of increased risk, or are some predictors of decreased risk as well? It might be worth elaborating on that in the table or text. Reviewer #4: The authors have adequately responded to our previous comments. The availability of the code on GitHub is appreciated. The major reservation remains that standard (elastic net penalized) logistic regression produces performance comparable to the other, more complex and less readily interpretable machine learning techniques. We do, however, agree with the authors that the above observation on the efficacy of logistic regression could not have been known without empirical evaluation. Reviewer #5: The authors did a good job addressing my comments and concerns. I am satisfied with the revision and have no additional comments. Any attachments provided with reviews can be seen via the following link: [LINK] 28 Jul 2020 Dear Dr. 
Chen, Thank you very much for submitting your revised manuscript "Predicting suicide attempt/death following a visit to psychiatric specialty care by applying machine learning to the Swedish national registry data" (PMEDICINE-D-20-00084R3) for consideration at PLOS Medicine. Your paper was evaluated by a senior editor and discussed among all the editors here. It was also seen again by one of the reviewers, and the comments are appended at the bottom of this email. In the report, Reviewer 3 notes that you have addressed most of the issues raised; however, additional clarification is requested pertaining to the construction of the calibration plots. Although we will not be able to accept the manuscript for publication in the journal in its current form, we would like to consider a revised version that addresses the remaining points raised by Reviewer 3 and the editors' comments. Obviously we cannot make any decision about publication until we have seen the revised manuscript and your response. In revising the manuscript for further consideration, your revisions should address the specific points made by each reviewer and the editors. Please also check the guidelines for revised papers at http://journals.plos.org/plosmedicine/s/revising-your-manuscript for any that apply to your paper. In your rebuttal letter you should indicate your response to the reviewers' and editors' comments, the changes you have made in the manuscript, and include either an excerpt of the revised text or the location (eg: page and line number) where each change can be found. Please submit a clean version of the paper as the main article file; a version with changes marked should be uploaded as a marked up manuscript. In addition, we request that you upload any figures associated with your paper as individual TIF or EPS files with 300dpi resolution at resubmission; please read our figure guidelines for more information on our requirements: http://journals.plos.org/plosmedicine/s/figures. 
While revising your submission, please upload your figure files to the PACE digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at PLOSMedicine@plos.org. We expect to receive your revised manuscript by Aug 04 2020 11:59PM. Please email us (plosmedicine@plos.org) if you have any questions or concerns. ***Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.*** We ask every co-author listed on the manuscript to fill in a contributing author statement, making sure to declare all competing interests. If any of the co-authors have not filled in the statement, we will remind them to do so when the paper is revised. If all statements are not completed in a timely fashion this could hold up the re-review process. If new competing interests are declared later in the revision process, this may also hold up the submission. Should there be a problem getting one of your co-authors to fill in a statement we will be in contact. YOU MUST NOT ADD OR REMOVE AUTHORS UNLESS YOU HAVE ALERTED THE EDITOR HANDLING THE MANUSCRIPT TO THE CHANGE AND THEY SPECIFICALLY HAVE AGREED TO IT. You can see our competing interests policy here: http://journals.plos.org/plosmedicine/s/competing-interests. Please use the following link to submit the revised manuscript: https://www.editorialmanager.com/pmedicine/ Your article can be found in the "Submissions Needing Revision" folder. 
To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/plosmedicine/s/submission-guidelines#loc-methods. Please ensure that the paper adheres to the PLOS Data Availability Policy (see http://journals.plos.org/plosmedicine/s/data-availability), which requires that all data underlying the study's findings be provided in a repository or as Supporting Information. For data residing with a third party, authors are required to provide instructions with contact information for obtaining the data. PLOS journals do not allow statements supported by "data not shown" or "unpublished results." For such statements, authors must provide supporting data or cite public sources that include it. We look forward to receiving your revised manuscript. Sincerely, Caitlin Moyer, Ph.D. Associate Editor PLOS Medicine plosmedicine.org ----------------------------------------------------------- Requests from the editors: 1.Response to reviewers: Please fully respond to the comments of reviewer 3, including whether the calibration plot was derived from the training data set or the validation set and in the methods or the legend of Figure 2 please describe how the plots were created/what is represented in the plots. 2.Please revise your title according to PLOS Medicine's style. Your title must be nondeclarative and not a question. It should begin with main concept if possible. Please place the study design ("A randomized controlled trial," "A retrospective study," "A modelling study," etc.) in the subtitle (ie, after a colon). We suggest: “Predicting suicide attempt or death following a visit to psychiatric specialty care: A machine learning study of Swedish national registry data” or similar. 
3.Abstract: Methods and Findings: If possible, please present the p values for both the 30-day and 90-day models. 4.Author Summary: First bullet point under "What do these findings mean?": Please revise to: "Our findings suggest that combining machine learning with registry data has potential to accurately predict short-term suicidal behavior." 5.Discussion: Middle of the paragraph on page 13: Please rephrase the term "completed suicide" in the following sentence; we suggest: "However, it is difficult to directly compare the models from the two studies, given the differences in definition of the predicted outcome (suicide death vs suicidal attempt or death) and time window of interest between the studies." or similar. 6.Figure 1: Please change the colors/patterns of the solid and dotted lines to make them easier to differentiate. 7.Supporting information file: eTable 7: The word "days" is missing from the legend following "90" and "30". 8.TRIPOD Guideline: S1 Checklist is not present in the file inventory; please provide the TRIPOD checklist. When completing the checklist, please use section names and paragraph numbers to refer to locations within the text, rather than page numbers. Comments from the reviewers: Reviewer #3: The authors have done a good job in making clarifications and addressing reviewer concerns. The additional limitations added to the discussion are important. I am still confused by the calibration plot. I cannot find in the paper whether the calibration plot is created using the training data or the validation data set. It should be created in the validation data set using percentile bins from the training data. It appears that deciles were used for the calibration plot (but why are there only 9 dots instead of 10?). I am still very surprised that the observed probability of a suicide attempt in the highest risk group is 100%. 
The math doesn't make sense here: if this plot was created in the validation data set, then there would be about 10,000 visits in the highest risk decile; given the graph, nearly 100% of those visits were observed to have a suicide attempt following the visit. That would be about 10,000 suicide attempts, but there should only be about 3,726 suicide attempts in the entire validation data set. The math continues to be a problem if the calibration plot was created with the training data set. Please provide more details (not just the function that was used) on how these calibration plots were created. A common approach is to divide your visits into deciles; these deciles are on the x-axis, with the mean predicted risk in each decile. Then on the y-axis is the observed proportion of visits followed by a suicide attempt. At the end of the day, a calibration plot needs to indicate, in specific bins of people defined by risk, how similar their predicted risk (from the model) is to their observed risk (the proportion of those visits with an event following the visit). Any attachments provided with reviews can be seen via the following link: [LINK] 31 Jul 2020 Submitted filename: Response to Reviewer (3).pdf. 17 Sep 2020 Dear Dr. Chen, Thank you very much for re-submitting your manuscript "Predicting suicide attempt or death following a visit to psychiatric specialty care: A machine learning study of the Swedish national registry data" (PMEDICINE-D-20-00084R4) for review by PLOS Medicine. I have discussed the paper with my colleagues and the academic editor, and it was also seen again by one of the reviewers. I am pleased to say that, provided the remaining editorial and production issues are dealt with, we are planning to accept the paper for publication in the journal. The remaining issues that need to be addressed are listed at the end of this email. Any accompanying reviewer attachments can be seen via the link below. 
Please take these into account before resubmitting your manuscript: [LINK]

Our publications team (plosmedicine@plos.org) will be in touch shortly about the production requirements for your paper, and the link and deadline for resubmission. DO NOT RESUBMIT BEFORE YOU'VE RECEIVED THE PRODUCTION REQUIREMENTS.

***Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.***

In revising the manuscript for further consideration here, please ensure you address the specific points made by each reviewer and the editors. In your rebuttal letter you should indicate your response to the reviewers' and editors' comments and the changes you have made in the manuscript. Please submit a clean version of the paper as the main article file. A version with changes marked must also be uploaded as a marked-up manuscript file. Please also check the guidelines for revised papers at http://journals.plos.org/plosmedicine/s/revising-your-manuscript for any that apply to your paper.

If you haven't already, we ask that you provide a short, non-technical Author Summary of your research to make findings accessible to a wide audience that includes both scientists and non-scientists. The Author Summary should immediately follow the Abstract in your revised manuscript. This text is subject to editorial change and should be distinct from the scientific abstract.

We expect to receive your revised manuscript within 1 week. Please email us (plosmedicine@plos.org) if you have any questions or concerns.

We ask every co-author listed on the manuscript to fill in a contributing author statement. If any of the co-authors have not filled in the statement, we will remind them to do so when the paper is revised.
If all statements are not completed in a timely fashion, this could hold up the re-review process. Should there be a problem getting one of your co-authors to fill in a statement, we will be in contact. YOU MUST NOT ADD OR REMOVE AUTHORS UNLESS YOU HAVE ALERTED THE EDITOR HANDLING THE MANUSCRIPT TO THE CHANGE AND THEY SPECIFICALLY HAVE AGREED TO IT.

Please ensure that the paper adheres to the PLOS Data Availability Policy (see http://journals.plos.org/plosmedicine/s/data-availability), which requires that all data underlying the study's findings be provided in a repository or as Supporting Information. For data residing with a third party, authors are required to provide instructions with contact information for obtaining the data. PLOS journals do not allow statements supported by "data not shown" or "unpublished results." For such statements, authors must provide supporting data or cite public sources that include it.

If you have any questions in the meantime, please contact me or the journal staff on plosmedicine@plos.org. We look forward to receiving the revised manuscript by Sep 24 2020 11:59PM.

Sincerely,
Caitlin Moyer, Ph.D.
Associate Editor
PLOS Medicine
plosmedicine.org

------------------------------------------------------------

Requests from Editors:

1. Reviewer comments: Supporting information file eTable 9: As suggested by reviewer 3, please do update the table with the number of unique individuals at each row.

2. Title: Please revise the title: “Predicting suicide attempt or death following a visit to psychiatric specialty care: A machine learning study using Swedish national registry data”

3. Abstract: Methods and Findings: Given the similarity of mean ages of males and females, it would be more informative if you could please include the % of males and females, and the overall mean age. It would also be helpful to have a breakdown of mental/behavioral disorders represented, as well as percentages for participant ethnicity.
4. Methods: Page 9, line 15: Please specify the significance level used (eg, P<0.05, two-sided) where you describe the Hanley and McNeil test used to derive a p value.

5. Methods: Page 9, line 19: Please change “algorisms” to “algorithms”.

6. Methods: Page 9, lines 22-24: Thank you for noting your analyses were preplanned, and that there is no analysis plan documented. “The analyses were planned in April 2019 by the project team, and then revised after further discussions between QC, HL, and SF in September 2019, but we did not publish a study protocol.” Please make sure that the Methods section transparently describes which analyses were planned, and when/why any data-driven changes to analyses took place. Any changes in the analysis -- including those made in response to peer review comments -- should be identified as such in the Methods section of the paper, with rationale.

7. Results, Page 11, Line 24: Please present the exact p value or p<0.001.

8. Discussion, Page 14, Lines 1-4: Please clarify the following sentence: “Novel and previously untested data may open opportunity opportunities for deep learning analytics to improve prediction of suicidal behavior through identifying highly complex data pattern [35, 36]. especially if the complexity is beyond human comprehension for the time being.”

9. TRIPOD Checklist: Please refer to sections rather than page numbers in the Checklist. For example, for “Title” please refer to the “Title Page” and for “Abstract” please refer to Abstract as the section.

10. Table 3: Please note in the legend the meaning of “...” entries in the table.

Comments from Reviewers:

Reviewer #3: Thank you for your explanation of the calibration plots. eTable 9, which was added to the supplementary information, was very helpful in understanding and interpreting the plots. It would strengthen your understanding, as well as readers' understanding, of the models if you added information about the number of unique people in each row in eTable 9.
It is difficult in this setting, but those 31 visits in the bottom row of eTable 9 could be to the same person and all be associated with the *same* suicide attempt. Alternatively, they could be visits by the same person (or a small group of patients) with multiple suicide attempts per person, or they could represent 31 different visits by 31 different people. My guess is that this is a small group of people (maybe only one person) in this highest threshold. This makes me really want to see what the performance of this model would be in a different data set, since this tight performance at the highest risk could be overfitting, or it could be a model that is able to identify the highest-risk individuals well.

Any attachments provided with reviews can be seen via the following link: [LINK]

8 Oct 2020

Dear Dr. Chen,

On behalf of my colleagues and the academic editor, Dr. Alexander C. Tsai, I am delighted to inform you that your manuscript entitled "Predicting suicide attempt or suicide death following a visit to psychiatric specialty care: A machine learning study using Swedish national registry data" (PMEDICINE-D-20-00084R5) has been accepted for publication in PLOS Medicine.

PRODUCTION PROCESS

Before publication you will see the copyedited Word document (within 5 business days) and a PDF proof shortly after that. The copyeditor will be in touch shortly before sending you the copyedited Word document. We will make some revisions at the copyediting stage to conform to our general style, and for clarification. When you receive this version you should check and revise it very carefully, including figures, tables, references, and supporting information, because corrections at the next stage (proofs) will be strictly limited to (1) errors in author names or affiliations, (2) errors of scientific fact that would cause misunderstandings to readers, and (3) printer's (introduced) errors.
Please return the copyedited file within 2 business days in order to ensure timely delivery of the PDF proof. If you are likely to be away when either this document or the proof is sent, please ensure we have contact information for a second person, as we will need you to respond quickly at each point. Given the disruptions resulting from the ongoing COVID-19 pandemic, there may be delays in the production process. We apologise in advance for any inconvenience caused and will do our best to minimize impact as far as possible.

PRESS

A selection of our articles each week is press released by the journal. You will be contacted nearer the time if we are press releasing your article, in order to approve the content and check that the contact information for journalists is correct. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact.

PROFILE INFORMATION

Now that your manuscript has been accepted, please log into EM and update your profile. Go to https://www.editorialmanager.com/pmedicine, log in, and click on the "Update My Information" link at the top of the page. Please update your user information to ensure an efficient production and billing process.

Thank you again for submitting the manuscript to PLOS Medicine. We look forward to publishing it.

Best wishes,
Caitlin Moyer, Ph.D.
Associate Editor
PLOS Medicine
plosmedicine.org
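The calibration procedure Reviewer #3 recommends above (bins fixed by training-set percentiles, event rates observed on the validation set) can be sketched in Python. This is a minimal illustration of that general approach, not the authors' actual code; the function and variable names (`calibration_points`, `train_pred`, `valid_pred`, `valid_obs`) are assumptions made for the example.

```python
import numpy as np

def calibration_points(train_pred, valid_pred, valid_obs, n_bins=10):
    """Return (mean predicted risk, observed event rate) per risk bin,
    i.e., the dots a calibration plot displays."""
    train_pred = np.asarray(train_pred, dtype=float)
    valid_pred = np.asarray(valid_pred, dtype=float)
    valid_obs = np.asarray(valid_obs, dtype=float)

    # Percentile cut points come from the TRAINING predictions, so the
    # bins are fixed before the validation set is ever evaluated.
    interior = np.percentile(train_pred, np.linspace(0, 100, n_bins + 1))[1:-1]
    bin_idx = np.digitize(valid_pred, interior)  # values in 0 .. n_bins-1

    points = []
    for b in range(n_bins):
        in_bin = bin_idx == b
        if not in_bin.any():
            # An empty bin yields no dot -- one way a "decile" plot can
            # show fewer than 10 points.
            continue
        points.append((valid_pred[in_bin].mean(),  # x: mean predicted risk
                       valid_obs[in_bin].mean()))  # y: observed proportion
    return points
```

For a well-calibrated model, each dot lies near the diagonal: the observed proportion of visits followed by an event in a bin matches the mean predicted risk in that bin.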
References: 30 in total

1.  The Swedish Multi-generation Register.

Authors:  Anders Ekbom
Journal:  Methods Mol Biol       Date:  2011

2.  Machine Learning for Suicide Research-Can It Improve Risk Factor Identification?

Authors:  Seena Fazel; Lauren O'Reilly
Journal:  JAMA Psychiatry       Date:  2020-01-01       Impact factor: 21.596

3.  The meaning and use of the area under a receiver operating characteristic (ROC) curve.

Authors:  J A Hanley; B J McNeil
Journal:  Radiology       Date:  1982-04       Impact factor: 11.105

4.  Risk factors for the transition from suicide ideation to suicide attempt: Results from the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS).

Authors:  Matthew K Nock; Alexander J Millner; Thomas E Joiner; Peter M Gutierrez; Georges Han; Irving Hwang; Andrew King; James A Naifeh; Nancy A Sampson; Alan M Zaslavsky; Murray B Stein; Robert J Ursano; Ronald C Kessler
Journal:  J Abnorm Psychol       Date:  2018-02

5.  Method of attempted suicide as predictor of subsequent successful suicide: national long term cohort study.

Authors:  Bo Runeson; Dag Tidemalm; Marie Dahlin; Paul Lichtenstein; Niklas Långström
Journal:  BMJ       Date:  2010-07-13

Review 6.  Suicide risk assessment and intervention in people with mental illness.

Authors:  James M Bolton; David Gunnell; Gustavo Turecki
Journal:  BMJ       Date:  2015-11-09

7.  External review and validation of the Swedish national inpatient register.

Authors:  Jonas F Ludvigsson; Eva Andersson; Anders Ekbom; Maria Feychting; Jeong-Lim Kim; Christina Reuterwall; Mona Heurgren; Petra Otterblad Olausson
Journal:  BMC Public Health       Date:  2011-06-09       Impact factor: 3.295

Review 8.  The longitudinal integrated database for health insurance and labour market studies (LISA) and its use in medical research.

Authors:  Jonas F Ludvigsson; Pia Svedberg; Ola Olén; Gustaf Bruze; Martin Neovius
Journal:  Eur J Epidemiol       Date:  2019-03-30       Impact factor: 8.082

Review 9.  Instruments for the assessment of suicide risk: A systematic review evaluating the certainty of the evidence.

Authors:  Bo Runeson; Jenny Odeberg; Agneta Pettersson; Tobias Edbom; Ingalill Jildevik Adamsson; Margda Waern
Journal:  PLoS One       Date:  2017-07-19       Impact factor: 3.240

10.  Suicide Risk and Mental Disorders.

Authors:  Louise Brådvik
Journal:  Int J Environ Res Public Health       Date:  2018-09-17       Impact factor: 3.390

Cited by: 11 in total

Review 1.  A Comprehensive Review of Computer-Aided Diagnosis of Major Mental and Neurological Disorders and Suicide: A Biostatistical Perspective on Data Mining.

Authors:  Mahsa Mansourian; Sadaf Khademi; Hamid Reza Marateb
Journal:  Diagnostics (Basel)       Date:  2021-02-25

2.  Suicide prediction among men and women with depression: A population-based study.

Authors:  Tammy Jiang; Dávid Nagy; Anthony J Rosellini; Erzsébet Horváth-Puhó; Katherine M Keyes; Timothy L Lash; Sandro Galea; Henrik T Sørensen; Jaimie L Gradus
Journal:  J Psychiatr Res       Date:  2021-08-11       Impact factor: 5.250

3.  Suicidal Behaviours During Covid-19 Pandemic: A Review.

Authors:  Nadia Barberis; Marco Cannavò; Francesca Cuzzocrea; Valeria Verrastro
Journal:  Clin Neuropsychiatry       Date:  2022-04

4.  Ensemble machine learning classification of daily living abilities among older people with HIV.

Authors:  Robert Paul; Torie Tsuei; Kyu Cho; Andrew Belden; Benedetta Milanini; Jacob Bolzenius; Shireen Javandel; Joseph McBride; Lucette Cysique; Samantha Lesinski; Victor Valcour
Journal:  EClinicalMedicine       Date:  2021-05-07

5.  Prospective Validation of an Electronic Health Record-Based, Real-Time Suicide Risk Model.

Authors:  Colin G Walsh; Kevin B Johnson; Michael Ripperger; Sarah Sperry; Joyce Harris; Nathaniel Clark; Elliot Fielstein; Laurie Novak; Katelyn Robinson; William W Stead
Journal:  JAMA Netw Open       Date:  2021-03-01

6.  Suicidal ideation in patients with mental illness and concurrent substance use: analyses of national census data in Norway.

Authors:  Helle Wessel Andersson; Solfrid E Lilleeng; Torleif Ruud; Solveig Osborg Ose
Journal:  BMC Psychiatry       Date:  2022-01-04       Impact factor: 3.630

Review 7.  Artificial intelligence and suicide prevention: a systematic review.

Authors:  Alban Lejeune; Aziliz Le Glaz; Pierre-Antoine Perron; Johan Sebti; Enrique Baca-Garcia; Michel Walter; Christophe Lemey; Sofian Berrouiguet
Journal:  Eur Psychiatry       Date:  2022-02-15       Impact factor: 5.361

8.  Structured data vs. unstructured data in machine learning prediction models for suicidal behaviors: A systematic review and meta-analysis.

Authors:  Danielle Hopkins; Debra J Rickwood; David J Hallford; Clare Watsford
Journal:  Front Digit Health       Date:  2022-08-02

9.  Incidence Trends and Risk Prediction Nomogram for Suicidal Attempts in Patients With Major Depressive Disorder.

Authors:  Sixiang Liang; Jinhe Zhang; Qian Zhao; Amanda Wilson; Juan Huang; Yuan Liu; Xiaoning Shi; Sha Sha; Yuanyuan Wang; Ling Zhang
Journal:  Front Psychiatry       Date:  2021-06-23       Impact factor: 4.157

Review 10.  The Potential Impact of Adjunct Digital Tools and Technology to Help Distressed and Suicidal Men: An Integrative Review.

Authors:  Luke Balcombe; Diego De Leo
Journal:  Front Psychol       Date:  2022-01-04
