Hannah Metzler, Hubert Baginski, Thomas Niederkrotenthaler, David Garcia.
Abstract
BACKGROUND: Research has repeatedly shown that exposure to suicide-related news media content is associated with suicide rates, with some content characteristics likely having harmful and others potentially protective effects. Although good evidence exists for a few selected characteristics, systematic and large-scale investigations are lacking. Moreover, the growing importance of social media, particularly among young adults, calls for studies on the effects of the content posted on these platforms.
Keywords: Twitter; deep learning; machine learning; social media; suicide prevention
Year: 2022 PMID: 35976193 PMCID: PMC9434391 DOI: 10.2196/34705
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 7.076
Figure 1. Creating the labeled data set and annotation scheme. Each box describes how tweets were selected from the large pool of available tweets, how many tweets were added to the training data set in each step (after removing duplicates), and how many coders labeled each tweet. When we used preliminary model predictions to identify potential candidates for each category, we deleted the model labels before manual coding. After rounds with 2 coders, we checked interrater reliability, adapted the annotation scheme until all disagreements were clarified, and relabeled the respective sample.
Annotation scheme of content categories organized along two dimensions: message type (rows) and underlying perspective about suicide (columns: problem and suffering vs solution and coping).

| Message type | Problem and suffering | Solution and coping |
| --- | --- | --- |
| Personal experiences (first or third person) | Suicidal ideation and attemptsa | Coping (Papageno)a |
| News about experiences and behavior | News suicidal ideation and attempts | News coping |
| Experience of bereaved | Bereaved negative | Bereaved coping |
| Case reports | Suicide cases (Werther)a | Lives saved |
| Calls for action | Awarenessa | Preventiona |
| Suicide other | Murder-suicides, history, fiction, not being suicidal, and opinions (no perspective distinction) | |
| Off-topicb | Bombings, euthanasia, jokes, metaphors, and band or song names (no perspective distinction) | |

aThe 6 main categories classified in machine learning task 1.
bTask 2 distinguished the off-topic category from all other categories (see Classification Tasks).
Figure 2. Overview of characteristics of data sets. Each box describes the purpose of the data set, further details on how it was used or created, and the sample size. Only the predictions data set includes retweets, as it aims to capture the full volume of tweets posted on a given day. BERT: Bidirectional Encoder Representations from Transformers.
Distribution of tweets across categories for manual labels and model predictions.

| Category label | Total labeled sample (n=3202), n (%) | Randomly selected subset of labeled tweets (n=1000), n (%) | Estimated frequency in predictions data set, task 1 (including retweets; n=7.15 million), n (%)a | Estimated frequency in predictions data set, task 2 (including retweets; n=7.15 million), n (%)a |
| --- | --- | --- | --- | --- |
| Suicidal ideation and attempts | 284 (8.87) | 63 (6.33) | 367,135.56 (5.13) | 5,471,499 (76.52) |
| Coping | 205 (6.4) | 26 (2.71) | 90,328.99 (1.26) | 5,471,499 (76.52) |
| Awareness | 314 (9.81) | 126 (12.54) | 1,577,650 (22.06) | 5,471,499 (76.52) |
| Prevention | 457 (14.27) | 71 (7.13) | 1,109,223.6 (15.51) | 5,471,499 (76.52) |
| Suicide cases | 514 (16.05) | 129 (12.95) | 1,155,277.92 (16.16) | 5,471,499 (76.52) |
| Irrelevant | 1428 (44.5) | 581 (58.33) | 2,850,994 (39.88) | 5,471,499 (76.52) |
| News suicidal | 68 (2.12) | 20 (2.01) | 2,850,994 (39.88) | 5,471,499 (76.52) |
| News coping | 27 (0.84) | 5 (0.5) | 2,850,994 (39.88) | 5,471,499 (76.52) |
| Bereaved negative | 34 (1.06) | 7 (0.7) | 2,850,994 (39.88) | 5,471,499 (76.52) |
| Bereaved coping | 34 (1.06) | 5 (0.5) | 2,850,994 (39.88) | 5,471,499 (76.52) |
| Lives saved | 13 (0.41) | 2 (0.2) | 2,850,994 (39.88) | 5,471,499 (76.52) |
| Suicide other | 440 (13.74) | 206 (20.68) | 2,850,994 (39.88) | 5,471,499 (76.52) |
| Off-topic | 812 (25.36) | 336 (33.73) | 2,850,994 (39.88) | 1,679,111 (23.48) |

aFor the predictions data set, absolute values and percentages were weighted (ie, divided) by the model's recall (the proportion of all true cases the model detects). The n and percentage for the irrelevant category were calculated by subtracting the sum of all other categories from the total sample size and from 100, respectively. If several cells contain the same values, this is because those categories were subsumed under one higher-level category (irrelevant in task 1, about suicide in task 2) in the respective classification task.
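The recall weighting described in footnote a amounts to a single division per category: the raw count of tweets the model labeled with a category is divided by the model's recall for that category, to estimate how many such tweets truly exist. A minimal sketch with illustrative numbers (the helper name and values are ours, not from the table):

```python
def recall_weighted_count(predicted_count: float, recall: float) -> float:
    """Estimate the true number of tweets in a category by dividing the
    raw predicted count by the model's recall (the share of true cases
    the model detects). Assumes missed cases dominate the error."""
    if not 0 < recall <= 1:
        raise ValueError("recall must be in (0, 1]")
    return predicted_count / recall

# Illustrative: a category with 180,000 predicted tweets and recall 0.72
# is estimated at roughly 250,000 true tweets. Division by a recall
# below 1 is also why the weighted cells can contain fractional counts.
print(recall_weighted_count(180_000, 0.72))
```
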
Macroaveraged performance metrics and accuracy across all 6 categories on the validation set (n=513) and test set (n=641).

| Model | Precision (validation) | Recall (validation) | F1 (validation) | Accuracy (validation) | Precision (test) | Recall (test) | F1 (test) | Accuracy (test) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Majority classifier | 0.07 | 0.17 | 0.10 | 0.45 | 0.07 | 0.17 | 0.10 | 0.44 |
| TF-IDFa and SVMb | 0.61 | 0.63 | 0.62 | 0.66 | 0.61 | 0.65 | 0.62 | 0.66 |
| BERTc,d | 0.73 | 0.71 | 0.71 | 0.76 | 0.72 | 0.69 | 0.70 | 0.73 |
| XLNetd | 0.74 | 0.73 | 0.73 | 0.77 | 0.71 | 0.71 | 0.71 | 0.74 |
aTF-IDF: term frequency-inverse document frequency.
bSVM: support vector machine.
cBERT: Bidirectional Encoder Representations from Transformers.
dGiven that the performance of both deep learning models with fixed seeds and parameters varied slightly from run to run owing to internal segmentation, we ran these models 5 times. We report the average of all 5 runs in this section and include the metrics for each individual run in Table S2, in Multimedia Appendix 1.
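Macroaveraged scores weight each of the 6 categories equally, regardless of size. That is why the majority classifier's macro recall is exactly 1/6 ≈ 0.17: it recalls 100% of the majority class and 0% of the other five. A pure-Python sketch of the computation (function name is ours, not from the paper):

```python
from collections import Counter

def macro_metrics(y_true, y_pred):
    """Macro-averaged precision, recall, and F1, plus overall accuracy.
    Each class contributes equally to the macro averages."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[t] += 1  # true class gets a false negative
    prec, rec, f1 = [], [], []
    for c in classes:
        p_c = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r_c = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        prec.append(p_c)
        rec.append(r_c)
        f1.append(f_c)
    k = len(classes)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return sum(prec) / k, sum(rec) / k, sum(f1) / k, acc
```

With 6 classes, `macro_metrics` applied to a constant majority prediction yields macro recall 1/6, matching the table's 0.17.
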
Intraclass performance metrics on the test set. 95% CIs are given as percentages.

| Category | TF-IDFa+SVMb precision (95% CI) | TF-IDF+SVM recall (95% CI) | TF-IDF+SVM F1 | BERTc,d precision (95% CI) | BERT recall (95% CI) | BERT F1 | XLNetd precision (95% CI) | XLNet recall (95% CI) | XLNet F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Suicidal ideation (n=57) | 0.32 (21.93-43.58) | 0.44 (30.74-57.64) | 0.37 | 0.58 (43.25-73.66) | 0.45 (32.36-59.34) | 0.51 | 0.60 (46.11-74.16) | 0.54 (40.66-67.64) | 0.55 |
| Coping (n=42) | 0.44 (31.55-57.55) | 0.64 (48.03-78.45) | 0.52 | 0.76 (59.76-88.56) | 0.69 (52.91-82.38) | 0.72 | 0.71 (54.80-83.24) | 0.74 (57.96-86.14) | 0.73 |
| Awareness (n=63) | 0.65 (51.60-76.87) | 0.62 (48.80-73.85) | 0.63 | 0.71 (58.05-81.80) | 0.70 (56.98-80.77) | 0.70 | 0.69 (56.74-79.76) | 0.74 (62.06-84.73) | 0.72 |
| Prevention (n=91) | 0.83 (74.00-90.36) | 0.82 (73.02-89.60) | 0.83 | 0.81 (71.93-88.16) | 0.89 (80.72-94.60) | 0.85 | 0.82 (72.27-88.62) | 0.87 (78.10-93.00) | 0.84 |
| Suicide cases (n=103) | 0.70 (60.82-78.77) | 0.74 (64.20-81.96) | 0.72 | 0.75 (65.14-82.49) | 0.77 (67.34-84.46) | 0.76 | 0.78 (68.31-85.52) | 0.75 (65.24-82.80) | 0.76 |
| Irrelevant (n=285) | 0.74 (67.78-79.18) | 0.63 (57.27-68.77) | 0.68 | 0.64 (57.76-69.11) | 0.65 (59.06-70.45) | 0.64 | 0.68 (61.96-73.46) | 0.64 (57.99-69.44) | 0.66 |
aTF-IDF: term frequency-inverse document frequency.
bSVM: support vector machine.
cBERT: Bidirectional Encoder Representations from Transformers.
dScores are averages across 5 model runs for BERT and XLNet. Table S3 in Multimedia Appendix 1 shows separate runs.
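The record does not state how the per-class 95% CIs were derived; a percentile bootstrap over the test set is one standard way to obtain such intervals, sketched below for recall (function name and resampling scheme are our assumptions, not the paper's method):

```python
import random

def bootstrap_recall_ci(y_true, y_pred, positive,
                        n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) CI for one class's recall:
    resample (true, predicted) pairs with replacement, recompute recall
    each time, and take the alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    pairs = list(zip(y_true, y_pred))
    stats = []
    for _ in range(n_boot):
        sample = [pairs[rng.randrange(len(pairs))] for _ in pairs]
        tp = sum(t == positive and p == positive for t, p in sample)
        fn = sum(t == positive and p != positive for t, p in sample)
        if tp + fn:  # skip resamples with no positive examples
            stats.append(tp / (tp + fn))
    stats.sort()
    lo = stats[int(len(stats) * alpha / 2)]
    hi = stats[min(int(len(stats) * (1 - alpha / 2)), len(stats) - 1)]
    return lo, hi
```
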
Figure 3. Performance scores per category for Bidirectional Encoder Representations from Transformers (BERT) for the 6 main categories (A) and for tweets about actual suicide versus off-topic tweets (B).
Macroaveraged performance metrics and accuracy for task 2 (about suicide vs off-topic) on the validation set (n=513) and test set (n=641).

| Model | Precision (validation) | Recall (validation) | F1 (validation) | Accuracy (validation) | Precision (test) | Recall (test) | F1 (test) | Accuracy (test) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Majority classifier | 0.37 | 0.50 | 0.43 | 0.75 | 0.37 | 0.50 | 0.43 | 0.75 |
| TF-IDFa and SVMb | 0.74 | 0.77 | 0.75 | 0.80 | 0.75 | 0.77 | 0.76 | 0.81 |
| BERTc | 0.85 | 0.81 | 0.83 | 0.88 | 0.85 | 0.81 | 0.83 | 0.88 |
| XLNet | 0.84 | 0.78 | 0.81 | 0.87 | 0.83 | 0.80 | 0.81 | 0.87 |
aTF-IDF: term frequency-inverse document frequency.
bSVM: support vector machine.
cBERT: Bidirectional Encoder Representations from Transformers.
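The TF-IDF and SVM baseline represents each tweet as term frequency-inverse document frequency weights before classification. Below is a minimal sketch of the featurization step only; the whitespace tokenizer and this exact weighting variant are our simplifications, since the record does not give the paper's preprocessing or SVM settings:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each document to a sparse {term: tf-idf weight} dict.
    tf = term count / document length; idf = ln(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # count each term once per document
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors
```

These sparse vectors would then be fed to a linear SVM; note that a term occurring in every document gets weight 0, since its idf is ln 1.
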
Intraclass performance metrics for deep learning models in task 2 (about suicide vs off-topic) on the test set. 95% CIs are given as percentages.

| Model | About suicide (n=478): precision (95% CI) | About suicide: recall (95% CI) | About suicide: F1 | Off-topic (n=163): precision (95% CI) | Off-topic: recall (95% CI) | Off-topic: F1 |
| --- | --- | --- | --- | --- | --- | --- |
| TF-IDFa and SVMb | 0.89 (85.74-91.71) | 0.85 (80.96-87.64) | 0.87 | 0.60 (53.03-67.49) | 0.69 (61.63-76.30) | 0.65 |
| BERTc,d | 0.90 (87.42-92.81) | 0.94 (91.64-96.07) | 0.92 | 0.80 (71.62-85.67) | 0.68 (60.35-75.17) | 0.73 |
| XLNetd | 0.90 (87.12-92.59) | 0.93 (90.68-95.38) | 0.92 | 0.76 (68.60-83.06) | 0.67 (59.72-74.60) | 0.71 |
aTF-IDF: term frequency-inverse document frequency.
bSVM: support vector machine.
cBERT: Bidirectional Encoder Representations from Transformers.
dScores are averages across 5 model runs for BERT and XLNet. Table S5 in Multimedia Appendix 1 shows separate runs.
Figure 4. Confusion matrix of true and predicted labels in the reliability data set: (A) percentages and (B) counts of tweets per true and predicted category. The diagonal from bottom left to top right represents correct predictions. True labels are those assigned by coder 1, and predicted labels are those assigned by Bidirectional Encoder Representations from Transformers (BERT).
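A confusion matrix like the one in Figure 4 is simply a tally of (true, predicted) label pairs. A minimal sketch, with illustrative category names:

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

labels = ["coping", "prevention", "suicide cases"]
y_true = ["coping", "coping", "prevention", "suicide cases"]
y_pred = ["coping", "prevention", "prevention", "suicide cases"]
# Diagonal entries are correct predictions, as in the figure.
print(confusion_matrix(y_true, y_pred, labels))
# → [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
```
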
Figure 5. Daily percentage of tweets per predicted category in the predictions data set (n=7.15 million). The daily value subsumes original tweets and retweets per category. Keywords for event peaks are explained in the main text.