Zepeng Li1, Jiawei Zhou1, Zhengyi An1, Wenchuan Cheng1, Bin Hu1,2,3.
Abstract
Suicide is a serious worldwide problem that often causes huge and irreversible losses to families and society, so it is necessary to detect and help individuals with suicidal ideation in time. In recent years, the rapid development of social media has provided new perspectives on suicide detection, but related research still faces difficulties such as data imbalance and implicit expression. In this paper, we propose a Deep Hierarchical Ensemble model for Suicide Detection (DHE-SD) based on a hierarchical ensemble strategy, and construct a dataset from Sina Weibo that contains more than 550 thousand posts from 4521 users. To verify the effectiveness of the model, we also conduct experiments on a public Weibo dataset containing the posts of 7329 users. The proposed model achieves the best performance on both the constructed dataset and the public dataset. In addition, to make the model applicable to a wider population, we use the proposed sentence-level mask mechanism to delete user posts with strong suicidal ideation. Experiments show that the proposed model can still effectively identify social media users with suicidal ideation even when the performance of the baseline models decreases significantly.
Keywords: China; Sina Weibo; deep neural network; imbalanced data; social media; suicide ideation detection
Year: 2022 PMID: 35455105 PMCID: PMC9029105 DOI: 10.3390/e24040442
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.738
Figure 1. The architecture of the DHE-SD model.
Figure 2. Hierarchical ensemble. (a) An example of a classical ensemble method, and (b) an example of a hierarchical ensemble method, where a red rectangle represents a base classifier with a wrong classification result and a green rectangle represents a base classifier with a correct classification result. Taking nine base classifiers as an example, the classical ensemble method cannot give a correct prediction when fewer than half of the base classifiers are correct. By repeatedly combining base classifiers, the hierarchical ensemble method can still give a correct prediction even if fewer than half of the classifiers are correct.
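The contrast described in Figure 2 can be illustrated with a small sketch. It assumes that the base classifiers are combined in consecutive groups of three and that each level uses plain majority voting; the source does not specify the combination rule used in DHE-SD, so the function names and the grouping are illustrative assumptions only.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(votes: Sequence[int]) -> int:
    """Return the most frequent label; ties resolve to the smallest label."""
    if not votes:
        raise ValueError("votes must not be empty")
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


def hierarchical_vote(votes: Sequence[int], group_size: int = 3) -> int:
    """Vote inside consecutive groups, then vote again over the group results."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    level: List[int] = list(votes)
    if not level:
        raise ValueError("votes must not be empty")
    while len(level) > 1:
        # Each pass replaces every group by its majority label (assumed rule).
        level = [
            majority_vote(level[i:i + group_size])
            for i in range(0, len(level), group_size)
        ]
    return level[0]


# Nine base classifiers, true label 1; only four of the nine are correct.
base_predictions = [1, 1, 0, 1, 1, 0, 0, 0, 0]
flat_result = majority_vote(base_predictions)      # 0: the flat vote is wrong
hier_result = hierarchical_vote(base_predictions)  # 1: the grouped vote is right
print(flat_result, hier_result)
```

In this sketch the outcome depends on how the correct classifiers are distributed over the groups: the four correct votes are concentrated in two groups, which is what allows the second-level vote to succeed.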
Figure 3. An example of the sentence-level mask mechanism. (a) User posts before masking, and (b) user posts after masking.
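A minimal sketch of the sentence-level mask idea follows: posts judged to carry strong suicidal ideation are removed from a user's post list, and the remaining posts are kept unchanged. How the source scores "strong suicidal ideation" is not stated here, so the cue list `STRONG_CUES` and the helper `has_strong_ideation` are hypothetical stand-ins, not the authors' criterion.

```python
from typing import Callable, List

# Hypothetical cue phrases, for illustration only; not taken from the paper.
STRONG_CUES = ("cut arteries", "want to die", "jumping from a building")


def has_strong_ideation(post: str) -> bool:
    """Assumed scorer: flag a post if it contains any cue phrase."""
    text = post.lower()
    return any(cue in text for cue in STRONG_CUES)


def mask_posts(
    posts: List[str],
    is_strong: Callable[[str], bool] = has_strong_ideation,
) -> List[str]:
    """Drop the posts flagged by the scorer and keep the rest in order."""
    return [post for post in posts if not is_strong(post)]


posts = [
    "Loop this song infinitely, crying while listening.",
    "I want to see my blood pouring out from the cut arteries",
    "Mom, happy holidays! I miss you so much.",
]
masked = mask_posts(posts)
print(masked)
```

Passing the scorer as a parameter keeps the masking step independent of whichever detection rule or model is actually used to flag posts.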
Details of the normal group and suicidal group data before and after applying the sentence-level mask mechanism.
| Dataset | #Users | #Posts | Avg_Post | Avg_Length |
|---|---|---|---|---|
| SWOM | 1606 | 132,654 | 83.65 | 58.71 |
| SWM | 1606 | 98,680 | 62.48 | 55.18 |
| Normal | 2915 | 426,161 | 147.20 | 62.77 |
Note: #Users represents the number of users. #Posts represents the number of posts. Avg_Post represents the average number of posts posted by each user. Avg_Length represents the average number of characters per user’s posts. SWOM and SWM are suicide group datasets before and after using sentence-level mask mechanism. Normal represents the normal group dataset.
Examples of user posts in different datasets.
| Dataset | Example |
|---|---|
| SWOM | Loop this song infinitely, crying while listening. I want to see my blood pouring out from the cut arteries, and I feel my body Mom, happy holidays! I miss you so much. |
| SWM | Loop this song infinitely, crying while listening. He laughs really well. |
| Normal | Rather than want to prove that I can do anything, it’s better to admit that I am a waste. I hope my daughter is healthy and happy… |
Examples of high-frequency words in different datasets.
| SWOM | SWM | Normal |
|---|---|---|
| want | uncomfortable | laugh |
The results of three deep learning models on the SWOM and normal group dataset.
| Model | Accuracy | F1-Score |
|---|---|---|
| DPCNN | 90.58% | 86.45% |
| FastText | 92.10% | 88.33% |
| TextCNN | 93.94% | 91.18% |
| Average | 92.21% | 88.65% |
The results of different methods of dealing with data imbalance on the SWOM and normal group dataset.
| Method | Accuracy | F1-Score |
|---|---|---|
| Baseline | 93.94% | 91.18% |
| Oversampling | 94.40% | 92.09% |
| Undersampling | 93.67% | 90.98% |
| DHE-SD | | |
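The oversampling and undersampling rows in the table above can be illustrated with a short sketch. The source does not state which resampling variant was used, so plain random resampling is assumed here; the function names are illustrative, and the class sizes reuse the user counts reported for the suicidal group (1606) and the normal group (2915).

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, int]  # (user id, label); label 1 marks the minority class


def random_oversample(samples: Sequence[Sample], minority_label: int = 1,
                      seed: int = 0) -> List[Sample]:
    """Duplicate random minority samples until both classes have equal size."""
    rng = random.Random(seed)
    minority = [s for s in samples if s[1] == minority_label]
    majority = [s for s in samples if s[1] != minority_label]
    if not minority or len(minority) >= len(majority):
        return list(samples)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra


def random_undersample(samples: Sequence[Sample], minority_label: int = 1,
                       seed: int = 0) -> List[Sample]:
    """Keep a random majority subset the same size as the minority class."""
    rng = random.Random(seed)
    minority = [s for s in samples if s[1] == minority_label]
    majority = [s for s in samples if s[1] != minority_label]
    if not minority or len(minority) >= len(majority):
        return list(samples)
    kept = rng.sample(majority, k=len(minority))
    return kept + minority


# Placeholder users with the class sizes reported in the dataset table.
users = [(f"s{i}", 1) for i in range(1606)] + [(f"n{i}", 0) for i in range(2915)]
oversampled = random_oversample(users)
undersampled = random_undersample(users)
print(len(oversampled), len(undersampled))
```

In practice, resampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class ratio.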
The results of the three deep learning models on the SWM and normal group dataset.
| Model | Accuracy | F1-Score |
|---|---|---|
| DPCNN | 81.53% (−9.05%) | 74.75% (−11.70%) |
| FastText | 84.87% (−7.23%) | 78.28% (−10.05%) |
| TextCNN | ||
| Average | 84.51% (−7.69%) | 78.36% (−10.29%) |
Note: The values in parentheses are the differences between the results of the same model on the SWM and SWOM datasets.
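The parenthesized differences can be reproduced from the SWOM and SWM result tables. The sketch below uses only the DPCNN and FastText rows, whose values are present in both tables; the dictionary layout and the helper name `drop` are illustrative choices.

```python
from typing import Dict, Tuple

# (accuracy %, F1-score %) copied from the SWOM and SWM result tables.
swom: Dict[str, Tuple[float, float]] = {
    "DPCNN": (90.58, 86.45),
    "FastText": (92.10, 88.33),
}
swm: Dict[str, Tuple[float, float]] = {
    "DPCNN": (81.53, 74.75),
    "FastText": (84.87, 78.28),
}


def drop(model: str) -> Tuple[float, ...]:
    """Return the SWM minus SWOM difference in percentage points."""
    return tuple(round(after - before, 2)
                 for after, before in zip(swm[model], swom[model]))


for name in swom:
    print(name, drop(name))
```

The differences are absolute percentage points, not relative changes.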
Performance comparison of the DHE-SD model on the SWM and normal group dataset. ↑ represents the performance improvement of DHE-SD compared with baseline.
| Method | Accuracy | F1-Score |
|---|---|---|
| Baseline | 87.15% | 82.04% |
| DHE-SD | ↑ 2.52% | ↑ 4.11% |
Examples of case studies of different models.
| User | Text | Label | Baseline | DHE-SD |
|---|---|---|---|---|
| 1 | I am tired with crying. Maybe I can sleep after taking more pills. Taking medicine, jumping from a building, cutting my wrists, drinking too much… I am too tired. I do not want to die anymore. I want to live well. Death is actually a relief | 1 | 1 | 1 |
| 2 | They are too rational and weigh the pros and cons every second. You are aphasia alone and hold on to the advanced stage of cancer. Last night, I dreamed of working overtime to collapse, and then said that I quit, fxxk, and then quit to relax… It is nice that so many people love you. Happy birthday! | 0 | 1 | 0 |
| 3 | The boy I love, happy 520, always healthy and happy! Fortunately, I was vaccinated. If I am unhappy, I will have a cup of milk tea. If I am not in a good mood after drinking, I will have another cup! I hope the firemen will return safely! | 1 | 0 | 0 |
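To make the Accuracy and F1-Score columns used throughout the result tables concrete, the three case-study rows above can be scored by hand. The sketch assumes the usual binary definitions with the suicidal class (label 1) as the positive class; the source does not spell out its exact metric definitions, and three users are far too few to say anything about model quality.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int],
                    positive: int = 1) -> Tuple[float, float]:
    """Compute accuracy and the F1-score of the positive class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Label, Baseline and DHE-SD columns of the case-study table.
labels = [1, 0, 1]
baseline_pred = [1, 1, 0]
dhe_sd_pred = [1, 0, 0]

baseline_scores = accuracy_and_f1(labels, baseline_pred)
dhe_sd_scores = accuracy_and_f1(labels, dhe_sd_pred)
print(baseline_scores, dhe_sd_scores)
```

On these three rows the baseline makes one false positive (user 2) and one false negative (user 3), while DHE-SD makes only the false negative on user 3.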
The results of different methods of dealing with data imbalance on the public Weibo data.
| Method | Accuracy | F1-Score |
|---|---|---|
| Baseline | 91.07% | 85.87% |
| Oversampling | 91.16% | 86.48% |
| Undersampling | 90.93% | 86.56% |
| DHE-SD | | |