Diana Ramírez-Cifuentes, Ana Freire, Ricardo Baeza-Yates, Joaquim Puntí, Pilar Medina-Bravo, Diego Alejandro Velazquez, Josep Maria Gonfaus, Jordi Gonzàlez.
Abstract
BACKGROUND: Suicide risk assessment usually involves an interaction between doctors and patients. However, a significant number of people with mental disorders receive no treatment for their condition because of limited access to mental health care facilities; reduced availability of clinicians; lack of awareness; and the stigma, neglect, and discrimination surrounding mental disorders. In contrast, internet access and social media usage have increased significantly, providing experts and patients with a means of communication that may contribute to the development of methods to detect mental health issues among social media users.
Keywords: machine learning; mental health; risk assessment; social media; suicidal ideation
Year: 2020 PMID: 32673256 PMCID: PMC7381053 DOI: 10.2196/17758
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Description of behavioral features.
| Feature | Description | Source |
| Working week tweets count ratio | Total number of tweets on weekdays (Monday to Friday) normalized by the total amount of tweets | SPVa tweets |
| Weekend tweets count ratio | Total number of tweets on weekend days (Saturday and Sunday) normalized by the total amount of tweets | SPV tweets |
| Median time between tweets | Median of the time (in seconds) that passes between the publication of each tweet | SPV tweets |
| Sleep time tweets ratio | Ratio of tweets posted during the inferred sleep period of the user | Full profile tweets |
| Daytime tweets ratio | Ratio of tweets posted during the period the user is usually awake | Full profile tweets |
| Normalized tweet count per quarter (4 features) | Number of tweets posted by the user within each quarter of the year, normalized by the total amount of tweets generated by the user during the year | SPV tweets |
aSPV: short profile version.
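As a rough illustration of the timestamp-based features in the table above, the following Python sketch derives the weekday and weekend ratios, the median time between tweets, and the per-quarter ratios from a list of tweet timestamps. The function name `behavioral_features` and the dictionary keys are hypothetical, and the sketch pools all tweets into one period rather than normalizing per year, which is a simplifying assumption. The sleep time and daytime ratios are left out because they depend on an inferred sleep period that is not specified here.

```python
from datetime import datetime
from statistics import median


def behavioral_features(timestamps):
    """Compute a few timestamp-based features (illustrative sketch only)."""
    if not timestamps:
        raise ValueError("at least one timestamp is required")
    ordered = sorted(timestamps)
    total = len(ordered)

    # weekday() is 0-4 for Monday-Friday and 5-6 for Saturday-Sunday.
    weekday_count = sum(1 for t in ordered if t.weekday() < 5)

    # Gaps, in seconds, between consecutive tweets.
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]

    # Tweets per calendar quarter (assumption: all tweets pooled together).
    quarters = [0, 0, 0, 0]
    for t in ordered:
        quarters[(t.month - 1) // 3] += 1

    return {
        "working_week_ratio": weekday_count / total,
        "weekend_ratio": (total - weekday_count) / total,
        "median_seconds_between_tweets": median(gaps) if gaps else None,
        "quarter_ratios": [q / total for q in quarters],
    }
```

By construction the two day-of-week ratios sum to 1, as do the four quarter ratios.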
Description of features based on tweet statistics.
| Feature | Description | Source |
| Suicide-related tweets ratio | Ratio of tweets retained by the SPVCa over all the tweets of the full profile | SPVb and full profile tweets |
| Median SPVC score | Median of the scores obtained by the tweets that are part of the SPV after applying the SPVC | SPV tweets |
| Median tweet length | Median length of all the user tweets (word level) | SPV tweets |
| Number of SPV tweets | Number of tweets | SPV tweets |
| Number of user tweets | Number of tweets posted by the user since the creation of the account | Tweet metadata |
aSPVC: short profile version classifier.
bSPV: short profile version.
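The tweet statistics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SPV is taken as given, as a list of `(text, score)` pairs already retained by the SPVC, so the retention rule itself is not modeled. The function name `tweet_statistics`, the dictionary keys, and whitespace splitting as the word-level tokenizer are all hypothetical choices.

```python
from statistics import median


def tweet_statistics(full_profile_tweets, spv_tweets):
    """Summarize a profile from its full tweet list and its SPV.

    full_profile_tweets: list of tweet texts (the full profile).
    spv_tweets: list of (text, score) pairs retained by the SPVC
                (assumed to be computed elsewhere).
    """
    if not full_profile_tweets or not spv_tweets:
        raise ValueError("both tweet lists must be non-empty")

    scores = [score for _, score in spv_tweets]
    # Word-level length, using whitespace splitting as a stand-in tokenizer.
    lengths = [len(text.split()) for text, _ in spv_tweets]

    return {
        "suicide_related_ratio": len(spv_tweets) / len(full_profile_tweets),
        "median_spvc_score": median(scores),
        "median_tweet_length": median(lengths),
        "number_of_spv_tweets": len(spv_tweets),
    }
```

The number of user tweets is omitted because the table sources it from tweet metadata rather than from the tweets themselves.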
Description of relational features.
| Feature | Description | Source |
| Followers number | Number of followers | Tweet metadata |
| Friends number | Number of accounts followed by the user | Tweet metadata |
| Favorites given | Total number of favorites given by the user | Tweet metadata |
| Median favorites count | Median of the favorites received by the user | SPVa tweets |
| Median retweets count | Median of the retweets received by the user | SPV tweets |
aSPV: short profile version.
Models and features.
| Model | Features | Number of features (Task 1) | Number of features (Task 2) |
| BoWa model | BoW features generated with the Tf.Idf vectorizer with 1- to 5-gram features | 24,645 | 24,336 |
| Embeddings model | Word embeddings representations as input for a text-based convolutional neural network model | 200 | 200 |
| SNPSYb model | SNPSY features=behavioral+relational+tweets statistics+lexicons+suicide risk factors vocabulary+sentiment analysis features | 112 | 112 |
| BoW+SNPSY model | BoW outputted feature+SNPSY features | 24,757 | 24,448 |
| Images+BoW model | Images user score+BoW features | 24,646 | 24,337 |
| Images+SNPSY model | Images user score+SNPSY features | 113 | 113 |
| Images+BoW+SNPSY model 1 | Ensemble model=images user score+BoW outputted feature+SNPSY outputted feature | 24,758 | 24,449 |
| Images+BoW+SNPSY model 2 | SNPSY features+images user score+BoW outputted feature | 114 | 114 |
| Selected features model 1 | Selected features from all the feature types with | 5807 | 14,882 |
| Selected features model 2 | Selected features from all the feature types with | 522 | 3250 |
aBoW: bag of words.
bSNPSY: social networks and psychological features.
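The feature counts of the combined models in the table are consistent with simple sums of their components. The sketch below reproduces that arithmetic; it is an observation about the reported numbers, not a description of how the models were built. The constant and function names are hypothetical, and treating the images user score and the BoW outputted feature as one feature each is an assumption inferred from the counts.

```python
# Component sizes as reported in the table.
BOW = {"task1": 24645, "task2": 24336}
SNPSY = 112
IMAGE_SCORE = 1    # assumption: the images user score is a single feature
SINGLE_OUTPUT = 1  # assumption: an "outputted feature" is a single value


def combined_counts(task):
    """Return the feature counts implied by summing component sizes."""
    bow = BOW[task]
    return {
        "BoW+SNPSY": bow + SNPSY,
        "Images+BoW": IMAGE_SCORE + bow,
        "Images+SNPSY": IMAGE_SCORE + SNPSY,
        "Images+BoW+SNPSY model 1": IMAGE_SCORE + bow + SNPSY,
        "Images+BoW+SNPSY model 2": SNPSY + IMAGE_SCORE + SINGLE_OUTPUT,
    }


for task in ("task1", "task2"):
    print(task, combined_counts(task))
```

The selected features models are not included because their counts come from a selection step rather than from a sum.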
Full dataset labeled group statistics.
| Description | Suicidal ideation risk group | Focused control group | Generic control group |
| Number of users | 84 | 84 | 84 |
| Number of tweets | 313,791 | 766,437 | 134,246 |
| Median number of tweets per user | 2797.5 | 2984 | 716 |
| Median tweet length | 11 | 19 | 14 |
| Number of images | 37,801 | 251,830 | 16,006 |
Medians and Distribution Overlapping Index for some of the attributes with the most significant differences between the Suicidal ideation and Focused control groups.
| Attribute | Suicidal ideation median | Focused control median | Overlapping index |
| Anxiety | 10.94 | 0 | 0.25 |
| Cursing terms | 21.52 | 7.68 | 0.43 |
| To die (self-reference) | 5.45 | 0 | 0.25 |
| I feel | 46.25 | 6.71 | 0.32 |
| Self-loathing | 0.03 | 0 | 0.35 |
| Verb I | 22.66 | 12.11 | 0.41 |
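The overlapping index is not defined in this record. Assuming it measures the area shared by two distributions (0 for disjoint distributions, 1 for identical ones), a minimal histogram-based estimate could look like the sketch below. The function name `overlapping_index` and the `bins` parameter are hypothetical, and the histogram is a simple stand-in for a smoother density estimate, so the values it gives will not necessarily match the table.

```python
def overlapping_index(a, b, bins=10):
    """Estimate the overlap of two samples with shared-range histograms."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    if hi == lo:
        return 1.0  # every value is identical, so the overlap is complete
    width = (hi - lo) / bins

    def histogram(values):
        counts = [0] * bins
        for v in values:
            # Clamp so that the maximum value falls in the last bin.
            index = min(int((v - lo) / width), bins - 1)
            counts[index] += 1
        return [c / len(values) for c in counts]

    pa, pb = histogram(a), histogram(b)
    # Shared mass: the smaller of the two proportions in each bin.
    return sum(min(p, q) for p, q in zip(pa, pb))
```

Under this reading, a low index such as 0.25 means the two groups share little of their value range for that attribute, which is why attributes with low overlap are the more discriminative ones.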
Medians and Overlapping Index for some of the attributes with the most significant differences between the Suicidal ideation and Generic control groups.
| Attribute | Suicidal ideation median | Generic control median | Overlapping index |
| Median classifier score | 0.72 | 0.65 | 0.46 |
| To die | 19.5 | 0 | 0.25 |
| Number of user tweets | 2076.5 | 453 | 0.38 |
| Health | 17.19 | 8.18 | 0.44 |
| Work | 35.46 | 49.59 | 0.44 |
| I | 41.32 | 9.60 | 0.23 |
Medians and Overlapping Index for the images score between the suicidal ideation, focused control and generic control classes.
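| Attribute | Comparison | Group | Median value | Overlapping index |
| Images score | Suicidal ideation versus focused control | Suicidal ideation | 0.24 | 0.64 |
| Images score | Suicidal ideation versus focused control | Focused control | 0.23 | 0.64 |
| Images score | Suicidal ideation versus generic control | Suicidal ideation | 0.24 | 0.52 |
| Images score | Suicidal ideation versus generic control | Generic control | 0.23 | 0.52 |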
Predictive task results.
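| Model | Pra (versus focused control) | Rb (versus focused control) | F1c (versus focused control) | Acd (versus focused control) | AUCe (versus focused control) | Classifier (versus focused control) | Pr (versus generic control) | R (versus generic control) | F1 (versus generic control) | Ac (versus generic control) | AUC (versus generic control) | Classifier (versus generic control) | |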
| BoWf model—full profile (baseline 1) | 0.78 | 0.81 | 0.79 | 0.78 | 0.81 | MLPg | 0.79 | 0.85 | 0.81 | 0.80 | 0.91 | MLP | |
| Embeddings model—full profile (baseline 2) | 0.76 | 0.81 | 0.79 | 0.77 | 0.82 | CNNh | 0.78 | 0.87 | 0.82 | 0.80 | 0.84 | CNN | |
| BoW model—SPVi (baseline 3) | 0.81 | 0.85 | 0.83 | 0.82 | 0.85 | LRj | 0.80 | 0.92k | 0.86 | 0.84 | 0.89 | MLP | |
| Embeddings model—SPV (baseline 4) | 0.79 | 0.85 | 0.82 | 0.80 | 0.83 | CNN | 0.77 | 0.87 | 0.82 | 0.80 | 0.82 | CNN | |
| SNPSYl model | 0.85 | 0.85 | 0.85 | 0.84 | 0.86 | SVMm | 0.85 | 0.88 | 0.87 | 0.86 | 0.94 | LR | |
| BoW+SNPSY model | 0.82 | 0.88k | 0.85 | 0.84 | 0.89 | RFn | 0.85 | 0.88 | 0.87 | 0.86 | 0.94 | LR | |
| Images+BoW model | 0.79 | 0.88k | 0.84 | 0.82 | 0.86 | MLP | 0.82 | 0.88 | 0.85 | 0.84 | 0.90 | LR | |
| Images+SNPSY model | 0.88k | 0.85 | 0.86k | 0.86k | 0.91 | SVM | 0.88 | 0.88 | 0.88k | 0.88k | 0.94 | LR | |
| Images+BoW+SNPSY model 1 | 0.85 | 0.85 | 0.85 | 0.83 | 0.87 | LR | 0.85 | 0.92k | 0.88k | 0.88k | 0.92 | MLP | |
| Images+BoW+SNPSY model 2 | 0.88k | 0.81 | 0.84 | 0.84 | 0.92k | SVM | 0.85 | 0.88 | 0.87 | 0.86 | 0.94 | LR | |
| Selected features model 1 ( | 0.85 | 0.85 | 0.85 | 0.84 | 0.90 | MLP | 0.91k | 0.77 | 0.83 | 0.84 | 0.94 | SVM | |
| Selected features model 2 ( | 0.83 | 0.77 | 0.80 | 0.80 | 0.92k | SVM | 0.91k | 0.81 | 0.86 | 0.86 | 0.95k | SVM | |
aPr: precision.
bR: recall.
cF1: F1 score.
dAc: accuracy.
eAUC: area under the curve.
fBoW: bag of words.
gMLP: multilayer perceptron.
hCNN: convolutional neural network.
iSPV: short profile version.
jLR: logistic regression.
kThe best results for each of the evaluation measures.
lSNPSY: Social networks and psychological features.
mSVM: support vector machine.
nRF: random forest.
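For readers who want to relate the measures in the table, the sketch below uses the standard definitions of precision, recall, F1, and accuracy from confusion-matrix counts. The function names are hypothetical and the counts in the usage line are made up for illustration. The table values are rounded to two decimals, so an F1 recomputed from a rounded precision and recall can differ from the reported one by about 0.01.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from true/false positive and negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + fn + tn
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Hypothetical counts, not taken from the study.
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))

# Consistency check against one table row (Pr 0.78, R 0.81, F1 0.79).
print(round(f1_from(0.78, 0.81), 2))
```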
Figure 1. Features more correlated with the class to predict for both tasks: Suicidal ideation risk vs Focused control (left), and Suicidal ideation risk vs Generic control (right).
Figure 2. Most predictive features for both tasks: Suicidal ideation risk vs Focused control (left), and Suicidal ideation risk vs Generic control (right).