Ahmet Emre Aladağ1,2, Serra Muderrisoglu3, Naz Berfu Akbas4, Oguzhan Zahmacioglu5, Haluk O Bingol1.
Abstract
BACKGROUND: In 2016, 44,965 people in the United States died by suicide. It is common to see people with suicidal ideation seek help or leave suicide notes on social media before attempting suicide. Many prefer to express their feelings with longer passages on forums such as Reddit and blogs. Because these expressive posts follow regular language patterns, potential suicide attempts can be prevented by detecting suicidal posts as they are written.
Keywords: artificial intelligence; classification model; detection; machine learning; prevention; suicidal ideation; suicidal surveillance; suicidality; suicide; text mining
Year: 2018 PMID: 29929945 PMCID: PMC6035349 DOI: 10.2196/jmir.9840
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Suicidality label distribution of posts in subreddits.
| Subreddit | Nonsuicidal, n | Suicidal, n | Total, n |
| SuicideWatch | 25 | 150 | 175 |
| Depression | 152 | 48 | 200 |
| Anxiety | 193 | 7 | 200 |
| ShowerThoughts | 210 | 0 | 210 |
| Total | 580 | 205 | 785 |
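The totals and the class balance in this table can be checked with a few lines of Python. This is a minimal sketch: the counts are copied from the table, while the names `counts` and `suicidal_share` are illustrative and do not come from the study.

```python
# Post counts per subreddit, copied from the table: (nonsuicidal, suicidal).
counts = {
    "SuicideWatch": (25, 150),
    "Depression": (152, 48),
    "Anxiety": (193, 7),
    "ShowerThoughts": (210, 0),
}


def suicidal_share(nonsuicidal, suicidal):
    """Fraction of posts labeled suicidal (hypothetical helper)."""
    return suicidal / (nonsuicidal + suicidal)


# Column totals, to compare against the "Total" row of the table.
total_non = sum(n for n, _ in counts.values())
total_sui = sum(s for _, s in counts.values())

for name, (n, s) in counts.items():
    print(f"{name}: {n + s} posts, {suicidal_share(n, s):.1%} suicidal")
print(f"Total: {total_non + total_sui} posts, "
      f"{suicidal_share(total_non, total_sui):.1%} suicidal")
```

The annotated set is imbalanced: 205 of the 785 posts are labeled suicidal, and 150 of those come from SuicideWatch.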
Hypothetical dataset (Di) matrix and corresponding label vector (Li) for an experiment with two sample posts. A table in this form was generated for each experiment with different posts.
| Post ID (Di) | Subreddit (Di) | Title (Di) | Body (Di) | Label (Li) |
| 1 | SuicideWatch | I don’t wanna live anymore | Since the day I was born,... | 1 |
| 2 | ShowerThoughts | Why are the oceans blue? | I have always wondered... | 0 |
Sample table representing concatenated Ci | Li matrix containing 590 corpus feature columns (Ci) plus one label column (Li) that were provided to machine learning algorithms for classification. A matrix in this form was generated for each experiment with different posts.
| Post IDa | Wit1…Wit93b | Wib1…Wib93c | Sitpd | Sitj | Sibpe | Sibj | Tit1…Tit200f | Tib1…Tib200g | Li |
| 1 | 0.3…0.00 | 0.15…0.22 | -0.75 | 0.70 | 0.25 | 0.35 | 0.15…0.54 | 0.14…0.32 | 1 |
| 2 | 0.11…0.08 | 0.00…0.00 | 0.20 | 0.90 | -0.45 | 0.78 | 0.07…0.93 | 0.01…0.63 | 0 |
| Column # | 1…93 | 94…186 | 187 | 188 | 189 | 190 | 191…390 | 391…590 | 1 |
aPost IDs are hypothetical.
bWit: Linguistic inquiry and word count (LIWC) matrix for title.
cWib: LIWC matrix for body.
dSit: sentiment score matrix for title.
eSib: sentiment score matrix for body.
fTit: document term matrix for title.
gTib: document term matrix for body.
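The column layout of the concatenated Ci | Li matrix follows from the block widths in the table header. The sketch below derives the 1-based column ranges and assembles one row. The block widths are taken from the header (for example Wit1…Wit93 and Tit1…Tit200); the names `blocks`, `column_ranges`, and `build_row` are hypothetical, and this is not the authors' code.

```python
# Feature blocks in the order of the table header, with their widths.
# The p and j suffixes on the sentiment columns are not defined in the
# footnotes, so they are kept as opaque names here.
blocks = [
    ("Wit", 93),   # LIWC features for the title
    ("Wib", 93),   # LIWC features for the body
    ("Sitp", 1),   # sentiment score for the title
    ("Sitj", 1),   # sentiment score for the title
    ("Sibp", 1),   # sentiment score for the body
    ("Sibj", 1),   # sentiment score for the body
    ("Tit", 200),  # document term features for the title
    ("Tib", 200),  # document term features for the body
]


def column_ranges(blocks):
    """Return {name: (first, last)} using 1-based, inclusive column indices."""
    ranges, start = {}, 1
    for name, width in blocks:
        ranges[name] = (start, start + width - 1)
        start += width
    return ranges


def build_row(parts, label):
    """Concatenate per-block feature lists into one Ci row and append Li."""
    row = []
    for name, width in blocks:
        values = parts[name]
        if len(values) != width:
            raise ValueError(f"{name}: expected {width} values, got {len(values)}")
        row.extend(values)
    return row + [label]


ranges = column_ranges(blocks)
n_features = sum(width for _, width in blocks)

# Usage: an all-zero placeholder post with label 0.
example = build_row({name: [0.0] * width for name, width in blocks}, label=0)
```

Summing the widths gives the 590 feature columns named in the caption, and `example` has 591 entries once the label is appended.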
Summary of post distribution used in experiments (E).
| Subreddit | Whole data (10-fold), E1, n | Whole data (10-fold), E2, n | Train data, E3, n | Train data, E4, n | Test data, E3, n | Test data, E4, n |
| SuicideWatch | 175a | 175a | 5000 | 5000 | 175a | 175a | |
| ShowerThoughts | 210a | 210a | 5000 | 5000 | 210a | 210a | |
| Depression | 200a | 200a | |||||
| Anxiety | 200a | 200a | |||||
aAnnotated post.
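For the experiments evaluated with 10-fold cross-validation, the annotated posts are split into ten folds, and each fold serves once as the test set. The sketch below illustrates one way to do this with a stratified split in plain Python. Several points are assumptions rather than details from the study: the use of stratification, the function name `stratified_kfold`, the seed, and the example label vector, which combines the 175 SuicideWatch and 210 ShowerThoughts annotated posts using the suicidal counts from the label distribution table.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs, spreading each class across k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Hypothetical label vector: 150 suicidal and 25 nonsuicidal SuicideWatch
# posts, followed by 210 nonsuicidal ShowerThoughts posts.
labels = [1] * 150 + [0] * 25 + [0] * 210
splits = list(stratified_kfold(labels))

for fold, (train, test) in enumerate(splits, start=1):
    positives = sum(labels[i] for i in test)
    print(f"fold {fold}: {len(train)} train, {len(test)} test, "
          f"{positives} suicidal in test")
```

Stratifying keeps the share of suicidal posts roughly equal in every fold, which matters when one class is smaller than the other. An unstratified random split would also be a valid reading of "10-fold".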
Figure 1. Prediction performance evaluation for the four experiments with different combinations of posts from the SuicideWatch (SW), Depression (D), Anxiety (A), and ShowerThoughts (ST) subreddits.