Tao Yu1, Xulai Zhang1, Xiuyan Liu1, Chunyuan Xu1, Chenchen Deng2.
Abstract
Background: Early identification of male schizophrenia patients at risk of violence is important for targeted interventions and closer monitoring, but it is difficult to achieve with conventional risk factors. This study aimed to use machine learning (ML) algorithms combined with routine data to predict violent behavior among male schizophrenia patients. Moreover, the best-performing model might be used to calculate the probability of an individual committing violence.

Method: We enrolled 397 male schizophrenia patients and randomly split them, with stratification, into a training set and a testing set in a 7:3 ratio. We used eight ML algorithms to develop the predictive models. The variables selected by the least absolute shrinkage and selection operator (LASSO) and logistic regression (LR) were used as input features for the models predicting violence. In the training set, 10 × 10-fold cross-validation was conducted to tune the parameters. In the testing set, we evaluated and compared the predictive performance of the eight ML algorithms in terms of the area under the receiver operating characteristic curve (AUC).

Result: The prevalence of violence among male schizophrenia patients was 36.8%. LASSO and LR identified the main risk factors for violent behavior that were integrated into the predictive models [odds ratio (95% confidence interval)]: lower education level [0.556 (0.378–0.816)], cigarette smoking [2.121 (1.191–3.779)], higher positive syndrome score [1.016 (1.002–1.031)], and higher Social Disability Screening Schedule (SDSS) score [1.081 (1.026–1.139)]. The neural network (nnet), with an AUC of 0.6673 (0.5599–0.7748), had better predictive ability than the other algorithms.
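The split-and-evaluate procedure described in the Method can be illustrated with a minimal sketch. It is not the authors' code: the function names `stratified_split` and `auc` are hypothetical, the labels and scores are toy data invented for illustration, and only the Python standard library is used. The sketch shows a 7:3 split that preserves the class ratio, and AUC computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices into train/test so each class keeps the same ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        # Shuffle the indices of one class, then cut off its test share.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (hypothetical): 40 positive and 60 negative cases.
toy_labels = [1] * 40 + [0] * 60
train_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(test_idx))

# Toy scores (hypothetical) for four cases: three of four pairs are ranked correctly.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Splitting each class separately is what keeps the proportion of violent patients roughly equal in both sets, which matters when the outcome is as imbalanced as it is here.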
Keywords: factor; machine learning; male; schizophrenia; violence
Year: 2022 PMID: 35360130 PMCID: PMC8962616 DOI: 10.3389/fpsyt.2022.799899
Source DB: PubMed Journal: Front Psychiatry ISSN: 1664-0640 Impact factor: 4.157
Main demographic and clinical characteristics of patients.
| Variable | Violence (n = 146) | Non-violence (n = 251) | t/χ² | P |
|---|---|---|---|---|
| Age | 37.63 ± 12.56 | 41.16 ± 14.62 | −2.543 | 0.011 |
| Education level | ||||
| Primary school | 29 (19.86) | 35 (13.94) | 8.979 | 0.011 |
| Junior or senior high school | 101 (69.18) | 160 (63.75) | ||
| College | 16 (10.96) | 56 (22.31) | ||
| Marital status | ||||
| Not single | 24 (16.44) | 49 (19.52) | 0.585 | 0.444 |
| Single | 122 (83.56) | 202 (80.48) | ||
| Duration of disease | 13.20 ± 9.47 | 15.54 ± 10.61 | −2.271 | 0.024 |
| Positive syndrome | 27.14 ± 15.53 | 22.25 ± 15.26 | 3.058 | 0.002 |
| Negative syndrome | 44.58 ± 23.19 | 46.22 ± 22.72 | −0.685 | 0.494 |
| BPRS | 31.63 ± 10.97 | 31.22 ± 28.68 | 0.532 | 0.867 |
Figure 1. Prediction variables identified by LASSO. The x-axis represents the log of lambda, and the y-axis represents the mean squared error. The first dotted line marks the minimum mean squared error, corresponding to the optimal number of variables. The numbers at the top of the plot give the number of variables retained.
The variables selected by LASSO.
| Variable | Coefficient |
|---|---|
| Age | −0.0031 |
| Education level | −0.1198 |
| Situation at birth | 0.6040 |
| Suicidal ideation | 0.1864 |
| Cigarette smoking | −0.1583 |
| Duration of disease | −0.0035 |
| Positive syndrome | 0.0034 |
| SDSS | 0.0152 |
| Uric acid | 0.0003 |
Independent factors associated with violence by logistic regression.
| Variable | Level | β | SE | Wald χ² | P | | OR (95% CI) |
|---|---|---|---|---|---|---|---|
| Education level | | −0.587 | 0.196 | 8.961 | 0.003 | 0.015 | 0.556 (0.378–0.816) |
| Suicidal ideation | yes | −1.063 | 0.539 | 3.892 | 0.049 | 0.049 | 0.345 (0.120–0.993) |
| | no | | | | | | 1 |
| Cigarette smoking | yes | 0.752 | 0.295 | 6.517 | 0.011 | 0.018 | 2.121 (1.191–3.779) |
| | no | | | | | | 1 |
| Positive syndrome | | 0.016 | 0.007 | 4.886 | 0.027 | 0.034 | 1.016 (1.002–1.031) |
| SDSS | | 0.078 | 0.027 | 8.657 | 0.003 | 0.008 | 1.081 (1.026–1.139) |
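The first two numeric values in each row of the table above behave like a logistic regression coefficient and its standard error: exponentiating the coefficient gives the reported odds ratio, and the squared ratio of coefficient to standard error gives the reported Wald statistic. The sketch below checks this arithmetic for three rows. The helper name `odds_ratio_summary` is hypothetical, and the normal quantile 1.96 for a 95% interval is an assumption, since the source does not state how the interval was computed.

```python
import math


def odds_ratio_summary(beta, se, z=1.96):
    """Return (odds ratio, CI lower, CI upper, Wald chi-square) for one coefficient."""
    odds_ratio = math.exp(beta)
    ci_lower = math.exp(beta - z * se)   # assumed Wald-type interval
    ci_upper = math.exp(beta + z * se)
    wald = (beta / se) ** 2
    return odds_ratio, ci_lower, ci_upper, wald


# Coefficient and standard error as printed in the table.
rows = {
    "Education level": (-0.587, 0.196),
    "Cigarette smoking (yes)": (0.752, 0.295),
    "SDSS": (0.078, 0.027),
}

for name, (beta, se) in rows.items():
    odds_ratio, lo, hi, wald = odds_ratio_summary(beta, se)
    print(f"{name}: OR {odds_ratio:.3f} ({lo:.3f}-{hi:.3f}), Wald {wald:.2f}")
```

Because the printed coefficients and standard errors are rounded to three decimals, the recomputed interval bounds and Wald statistics agree with the table only approximately, typically to the second or third decimal place.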
Prediction ability of ML algorithms in testing set.
| Model | AUC (95% CI) | Balanced accuracy | Kappa | Sensitivity | Specificity |
|---|---|---|---|---|---|
| glm | 0.6454 (0.5327–0.7581) | 0.5027 | 0.0061 | 0.1667 | 0.8387 |
| rpart | 0.6351 (0.5351–0.7350) | 0.5757 | 0.1608 | 0.3611 | 0.7903 |
| nnet | 0.6673 (0.5599–0.7748) | 0.6416 | 0.3007 | 0.4444 | 0.8387 |
| knn | 0.5661 (0.4436–0.6886) | 0.7352 | 0.4934 | 0.5833 | 0.8871 |
| rf | 0.6353 (0.5218–0.7488) | 0.7155 | 0.4605 | 0.5278 | 0.9032 |
| glmnet | 0.6449 (0.5323–0.7576) | 0.5188 | 0.0432 | 0.1667 | 0.8710 |
| svm | 0.6400 (0.5223–0.7578) | 0.5336 | 0.0826 | 0.0833 | 0.9839 |
| nb | 0.6288 (0.5143–0.7433) | 0.5963 | 0.2152 | 0.3056 | 0.8871 |
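The four values that follow the AUC in each row of the table above are internally consistent if the last two are read as sensitivity and specificity and the first two as balanced accuracy and Cohen's kappa. The sketch below reproduces the glm and nnet rows under that reading. It rests on an assumption that is not stated in the source: that the testing set contained 36 violent and 62 non-violent patients, counts inferred from the fact that every sensitivity is a multiple of 1/36 and every specificity a multiple of 1/62. The function names are hypothetical.

```python
def confusion_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Recover (TP, FN, TN, FP) counts from rates and assumed class sizes."""
    tp = round(sensitivity * n_pos)
    tn = round(specificity * n_neg)
    return tp, n_pos - tp, tn, n_neg - tn


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def cohen_kappa(tp, fn, tn, fp):
    """Agreement between prediction and truth, corrected for chance agreement."""
    n = tp + fn + tn + fp
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (observed - expected) / (1 - expected)


N_POS, N_NEG = 36, 62  # assumed testing-set class sizes (inferred, not reported)

for model, sens, spec in [("glm", 0.1667, 0.8387), ("nnet", 0.4444, 0.8387)]:
    counts = confusion_from_rates(sens, spec, N_POS, N_NEG)
    print(model, round(balanced_accuracy(*counts), 4), round(cohen_kappa(*counts), 4))
```

Read this way, the table shows that most models reach a high specificity at the cost of a low sensitivity, which is why kappa stays close to zero for glm, glmnet, and svm despite AUC values near 0.64.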
Figure 2. Comparison of the performance of the eight ML algorithms in the testing set.