Theyazn H H Aldhyani, Saleh Nagi Alsubari, Ali Saleh Alshebami, Hasan Alkahtani, Zeyad A T Ahmed.
Abstract
Individuals who experience suicidal ideation frequently express their views and thoughts on social media, and several studies have found that people contemplating suicide can be identified by analyzing their social media posts. However, finding and interpreting patterns of suicidal ideation is a challenging task. It is therefore important to develop a machine learning system for the automated early detection of suicidal ideation, or of abrupt changes in a user's behavior, by analyzing that user's social media posts. In this paper, we propose an experimental methodology for building a suicidal ideation detection system. The system uses publicly available Reddit datasets, two text representations (TF-IDF and Word2Vec), and hybrid deep learning and machine learning algorithms for classification. In two experiments, one using textual features and one using LIWC-22-based features, a hybrid convolutional neural network and bidirectional long short-term memory (CNN–BiLSTM) model and the XGBoost machine learning model were used to classify posts as suicidal or non-suicidal. Model performance was assessed with the standard metrics of accuracy, precision, recall, and F1-score. With textual features, the CNN–BiLSTM model outperformed XGBoost, detecting suicidal ideation with 95% accuracy versus 91.5% for XGBoost. Conversely, with LIWC features, XGBoost outperformed CNN–BiLSTM (86.9% vs. 84.5% accuracy).
Keywords: LIWC-22; artificial intelligence; machine learning; suicidal ideation
Year: 2022 PMID: 36231935 PMCID: PMC9565132 DOI: 10.3390/ijerph191912635
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
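The abstract names TF-IDF as one of the two text representations. The sketch below is a rough illustration of what that representation computes. Each term in a tokenized post is weighted by its length-normalized frequency multiplied by the unsmoothed inverse document frequency log(N/df). This TF-IDF variant, the whitespace tokenization, and the example posts are all assumptions made for illustration. The paper's actual preprocessing and library are not described here, and common libraries such as scikit-learn use a smoothed idf by default.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df)
    (unsmoothed). Other TF-IDF variants exist.
    """
    n_docs = len(docs)
    # Document frequency: in how many posts does each term occur at least once?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty posts
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical example posts, tokenized by whitespace.
posts = [
    "i feel so alone tonight".split(),
    "i feel great after the game".split(),
]
weights = tfidf(posts)
print(weights[0]["alone"], weights[0]["i"])
```

Terms that appear in every post (here "i" and "feel") receive zero weight. TF-IDF therefore down-weights generic vocabulary relative to distinctive words such as "alone".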
Figure 1. Framework of the proposed suicidal ideation detection system.
Figure 2. Structure of the CNN–BiLSTM model.
Parameters and their values used in the CNN–BiLSTM model.
| Parameter | Value |
|---|---|
| Input sequence length | 430 |
| Embedding dimension | 32 |
| Vocabulary size | 30,000 |
| Number of filters | 100 |
| LSTM units | 100 |
| Dropout | 0.3 |
| Batch size | 64 |
| Number of epochs | 5 |
| Activation function | ReLU |
| Optimizer | RMSprop (textual features); Adam (LIWC features) |
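The sketch below makes the tabulated hyperparameters concrete by counting the trainable weights of one plausible CNN–BiLSTM stack built from them. Several details are assumptions not stated in the table:

- the layer ordering (Embedding, a single Conv1D, MaxPooling1D, one Bidirectional LSTM, then a one-unit sigmoid output);
- the kernel size of 3;
- the pool size of 2.

The formulas are the standard per-layer counts, the same ones Keras reports in `model.summary()`. Dropout adds no weights.

```python
# Values from the hyperparameter table.
SEQ_LEN = 430
VOCAB = 30_000
EMB_DIM = 32
FILTERS = 100
LSTM_UNITS = 100
# Assumptions (not in the table): hypothetical layer sizes for illustration.
KERNEL = 3
POOL = 2


def cnn_bilstm_params() -> dict[str, int]:
    """Trainable weights per layer for the assumed stack."""
    emb = VOCAB * EMB_DIM                          # one vector per vocabulary entry
    conv = KERNEL * EMB_DIM * FILTERS + FILTERS    # kernel weights + one bias per filter
    # One LSTM direction: 4 gates, each with input, recurrent and bias weights.
    lstm_one_dir = 4 * LSTM_UNITS * (FILTERS + LSTM_UNITS + 1)
    bilstm = 2 * lstm_one_dir                      # forward + backward directions
    dense = 2 * LSTM_UNITS + 1                     # concatenated BiLSTM output -> 1 unit
    return {"embedding": emb, "conv1d": conv, "bilstm": bilstm, "dense": dense}


def timesteps_after_conv_pool() -> int:
    """Sequence length seen by the BiLSTM ('valid' convolution, then pooling)."""
    conv_len = SEQ_LEN - KERNEL + 1
    return conv_len // POOL


params = cnn_bilstm_params()
print(params, sum(params.values()), timesteps_after_conv_pool())
```

Under these assumptions the model has 1,130,701 weights, about 85% of them in the 30,000 × 32 embedding table. Vocabulary size and embedding dimension therefore dominate model size.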
Dataset splitting.
| Dataset | Total Samples | Training (70%) | Validation (10%) | Testing (20%) |
|---|---|---|---|---|
| Reddit (SuicideWatch) | 232,074 | 162,452 | 23,207 | 46,415 |
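The split sizes in the table follow from two steps: round 70% and 10% of 232,074 to the nearest integer, then assign the remainder to the test set. The sketch below reproduces those counts. It also shows one simple way to draw such a split: a single seeded shuffle of post indices. The source does not state whether the authors shuffled, stratified by class, or used a library helper. The seed and function names here are hypothetical.

```python
import random


def split_sizes(total: int, train_pct: int = 70, val_pct: int = 10) -> tuple[int, int, int]:
    """Round train/validation sizes to the nearest integer; the test set gets the rest."""
    train = (total * train_pct + 50) // 100  # integer round-half-up, no float error
    val = (total * val_pct + 50) // 100
    return train, val, total - train - val


def shuffled_split(n: int, seed: int = 42) -> tuple[list[int], list[int], list[int]]:
    """Shuffle post indices once (hypothetical seed), then slice into three partitions."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_val, _ = split_sizes(n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


print(split_sizes(232_074))  # (162452, 23207, 46415)
```

A plain shuffle keeps the classes only approximately balanced in each partition. A stratified split would fix the suicidal/non-suicidal ratio exactly.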
Figure 3. Confusion matrices of (a) CNN–BiLSTM and (b) XGBoost using textual features.
Figure 4. Confusion matrices of (a) CNN–BiLSTM and (b) XGBoost using LIWC features.
Test results using textual features.
| Algorithm | Precision (%) | Recall (%) | Specificity (%) | F-score (%) | Accuracy (%) |
|---|---|---|---|---|---|
| CNN–BiLSTM | 94.3 | 94.9 | 94.3 | 95 | 95 |
| XGBoost | 93.5 | 89.1 | 93.8 | 91.3 | 91.5 |
Test results using LIWC-based features.
| Algorithm | Precision (%) | Recall (%) | Specificity (%) | F-score (%) | Accuracy (%) |
|---|---|---|---|---|---|
| CNN–BiLSTM | 85.2 | 83.5 | 85.6 | 84.3 | 84.5 |
| XGBoost | 88.6 | 84.7 | 89.1 | 86.6 | 86.9 |
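The metrics in the two results tables follow the usual definitions for a binary classifier, taking "suicidal" as the positive class. The sketch below computes them from confusion-matrix counts. The counts in the example call are made up for illustration and are not read from Figures 3 or 4. The sketch also includes two identities useful for sanity-checking the tables:

- F1 is the harmonic mean of precision and recall.
- Accuracy equals the mean of recall and specificity when both classes are equally represented.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Precision, recall, specificity, F1 and accuracy (positive class = suicidal)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # also called sensitivity
    specificity = tn / (tn + fp)         # true-negative rate
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "accuracy": accuracy}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (works on percentages too)."""
    return 2 * precision * recall / (precision + recall)


def balanced_accuracy(recall: float, specificity: float) -> float:
    """Equals accuracy when positives and negatives are equally numerous."""
    return (recall + specificity) / 2


# Illustrative counts only (not taken from the paper's confusion matrices).
print(binary_metrics(tp=90, fp=10, tn=80, fn=20))
# Cross-check the LIWC-based XGBoost row: P = 88.6, R = 84.7, specificity = 89.1.
print(round(f1_from(88.6, 84.7), 1), round(balanced_accuracy(84.7, 89.1), 1))
```

Applied to the LIWC-based rows, these identities reproduce the reported F-scores (86.6 for XGBoost, 84.3 for CNN–BiLSTM) and accuracies (86.9 and 84.5) to within rounding. This is consistent with the dataset's equal class sizes (116,037 posts per class).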
Figure 5. Graphical representation of the statistical analysis of (a) non-suicidal and (b) suicidal posts, based on LIWC features.
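Figure 5 summarizes LIWC-derived features for the two classes. LIWC reports most of its categories as the percentage of a text's words that match a category dictionary, and entries ending in an asterisk match any word with that prefix. The LIWC-22 dictionaries are proprietary, so the sketch below illustrates the computation with two tiny, hypothetical word lists. Its category names and words are not LIWC's, and its simple regex tokenizer is an assumption.

```python
import re

# Hypothetical mini-dictionaries for illustration only; these are NOT the
# proprietary LIWC-22 categories or word lists.
CATEGORIES: dict[str, list[str]] = {
    "first_person": ["i", "me", "my", "myself"],
    "negative_emotion": ["sad*", "alone", "hopeless", "hurt*"],
}


def matches(word: str, pattern: str) -> bool:
    """Exact match, or prefix match when the pattern ends with '*'."""
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def category_percentages(text: str) -> dict[str, float]:
    """Share of the text's words (in %) that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid division by zero on empty posts
    return {
        name: 100.0 * sum(any(matches(w, p) for p in patterns) for w in words) / total
        for name, patterns in CATEGORIES.items()
    }


print(category_percentages("I feel so alone and sad, my days are hopeless"))
```

Because each score is normalized by post length, short and long posts can be compared on the same scale. This is one reason dictionary-based features suit a tree model such as XGBoost.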
Figure 6. Training and validation (a) accuracy and (b) loss using textual features.
Figure 7. Training and validation (a) accuracy and (b) loss using LIWC features.
Figure 8. Word cloud based on the dataset.
Comparative analysis of the performance of the proposed model and other existing methods.
| Paper Id | Dataset Distribution | Word Representation Approach | Model | Results |
|---|---|---|---|---|
|  | 3,549 suicide-indicative and 3,652 non-suicidal posts | Word2Vec | LSTM–CNN | 93% accuracy |
|  | 3,549 suicidal and 3,652 non-suicidal posts | Word2Vec | LSTM | 92% accuracy |
|  | 785 suicidal and 785 non-suicidal posts | TF-IDF | SVM | 92% accuracy |
| Proposed model | 116,037 suicidal and 116,037 non-suicidal posts | Word2Vec | CNN–BiLSTM | 95% accuracy |