| Literature DB >> 32401222 |
Derek Howard1,2, Marta M Maslej1,2, Justin Lee3, Jacob Ritchie1,4, Geoffrey Woollard5,6, Leon French1,2,7,8.
Abstract
BACKGROUND: Mental illness affects a significant portion of the worldwide population. Online mental health forums can provide a supportive environment for those afflicted and also generate a large amount of data that can be mined to predict mental health states using machine learning methods.
Keywords: classification; data interpretation, statistical; machine learning; mental health; natural language processing; social support; transfer learning; triage
Year: 2020 PMID: 32401222 PMCID: PMC7254287 DOI: 10.2196/15371
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Benchmarking by features, automated machine learning methods, and datasets with the macro-F1 metric.
| Feature set | Feature count | Tree-based Pipeline Optimization Tool (TPOT): Train (10-fold, 5 times) | TPOT: Test | TPOT: Reddit validation | Auto-Sklearn: Train (10-fold, 5 times) | Auto-Sklearn: Test | Auto-Sklearn: Reddit validation |
| Empath (post) | 195 | 0.280 | 0.253 | 0.385a | 0.292 | 0.344 | 0.321 |
| Linguistic Inquiry and Word Count | 70 | 0.434 | 0.354 | 0.346a | 0.433 | 0.380 | 0.315 |
| Valence Aware Dictionary and sEntiment Reasoner (sentence) | 12 | 0.363 | 0.263 | 0.356a | 0.340 | 0.263 | 0.353a |
| Emoji 64 | 192 | 0.425 | 0.369 | 0.280 | 0.424 | 0.461 | 0.308 |
| DeepMoji | 6912 | 0.442 | 0.452 | 0.345a | 0.391 | 0.437 | 0.351a |
| Universal Sentence Encoder | 1536 | 0.457 | 0.446 | 0.300 | 0.484 | 0.479 | 0.236 |
| GPTb default | 2304 | 0.373 | 0.334 | 0.344a | 0.396 | 0.383 | 0.402a |
| GPT fine-tuned | 2304 | 0.510 | 0.559 | 0.320 | 0.492 | 0.572 | 0.324 |
aReddit validation performance better than chance.
bGPT: Generative Pretrained Transformer.
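The macro-F1 metric used throughout the table averages the per-class F1 scores with equal weight per class, so rare severity labels count as much as common ones. A minimal pure-Python sketch (the four severity labels and the example posts below are illustrative, not from the paper's data):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then average with equal class weight."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical labels mirroring the severity scheme in Figure 4 (green/amber/red/crisis)
y_true = ["green", "amber", "red", "crisis", "green", "amber"]
y_pred = ["green", "amber", "amber", "red", "green", "green"]
print(round(macro_f1(y_true, y_pred), 3))  # → 0.325
```

Because every class contributes equally, a model that ignores the rare crisis class is penalized even if its overall accuracy is high.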
Figure 1. Confusion matrices for 2 models trained with Auto-Sklearn. Each cell counts the posts with the corresponding predicted label (row) and true label (column). Counts are colored from the highest cell (blue) to the lowest (white); the top-left to bottom-right diagonal cells count correctly predicted posts. Panel A was trained with Valence Aware Dictionary and sEntiment Reasoner (VADER) features; Panel B with features from a fine-tuned Generative Pretrained Transformer (GPT) language model.
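The layout described for Figure 1 (rows index the predicted label, columns the true label) can be sketched as a small helper; the label set and posts below are hypothetical:

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a confusion matrix with rows = predicted label, columns = true label,
    matching the row/column orientation of Figure 1."""
    idx = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[p]][idx[t]] += 1
    return m

labels = ["green", "amber", "red", "crisis"]  # assumed severity ordering
m = confusion_matrix(["green", "amber", "red"], ["green", "amber", "amber"], labels)
# Diagonal cells count correctly predicted posts
correct = sum(m[i][i] for i in range(len(labels)))
print(correct)  # → 2
```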
Figure 2. Macro-F1 test scores versus the number of posts used for Generative Pretrained Transformer-1 fine-tuning. The two automated machine learning methods are marked with a continuous red line (Auto-Sklearn) and a dashed blue line (Tree-based Pipeline Optimization Tool, TPOT).
Mantel correlations between the extracted feature sets.
| Feature Set | VADERa | Empath | LIWCb | Universal Sentence | Emoji 64 | DeepMoji | GPTc default | GPT fine-tuned |
| VADER | 1.000 | 0.003 | 0.098 | 0.453 | 0.211 | 0.422 | 0.430 | 0.429 |
| Empath | 0.003 | 1.000 | 0.009 | 0.006 | −0.005 | −0.008 | 0.004 | 0.001 |
| LIWC | 0.098 | 0.009 | 1.000 | 0.148 | 0.403 | 0.507 | 0.267 | 0.253 |
| Universal Sentence | 0.453 | 0.006 | 0.148 | 1.000 | 0.193 | 0.509 | 0.823 | 0.823 |
| Emoji 64 | 0.211 | −0.005 | 0.403 | 0.193 | 1.000 | 0.523 | 0.302 | 0.335 |
| DeepMoji | 0.422 | −0.008 | 0.507 | 0.509 | 0.523 | 1.000 | 0.632 | 0.631 |
| GPT default | 0.430 | 0.004 | 0.267 | 0.823 | 0.302 | 0.632 | 1.000 | 0.799 |
| GPT fine-tuned | 0.429 | 0.001 | 0.253 | 0.823 | 0.335 | 0.631 | 0.799 | 1.000 |
aVADER: Valence Aware Dictionary and sEntiment Reasoner.
bLIWC: Linguistic Inquiry and Word Count.
cGPT: Generative Pretrained Transformer.
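A Mantel correlation compares two feature sets by correlating their pairwise post-to-post distance matrices: take the upper triangle of each symmetric matrix and compute the Pearson correlation between the flattened values (the full Mantel test additionally permutes rows and columns to obtain a p value, omitted here). A minimal sketch with toy distance matrices:

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def mantel_r(d1, d2):
    """Mantel statistic: Pearson r between the upper triangles of two
    symmetric distance matrices of the same size."""
    n = len(d1)
    upper = lambda d: [d[i][j] for i in range(n) for j in range(i + 1, n)]
    return pearson(upper(d1), upper(d2))

# Toy 3x3 distance matrices; the second is a scaled copy of the first,
# so the Mantel correlation is exactly 1.0
d1 = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
d2 = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
print(mantel_r(d1, d2))  # → 1.0
```

This matches the table's pattern: feature sets that place the same posts close together (eg, GPT default vs Universal Sentence Encoder at 0.823) score high, while Empath is nearly uncorrelated with everything else.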
Figure 3. Violin plot showing the distributions of the 10 most discriminative emoji features across the labeled classes, with crisis in gray. The y-axis shows the predicted score for each emoji, scaled to the 0-1 interval. The emojis along the x-axis are marked with their images and their official Unicode text labels, ranked from the most to the least important feature (left to right).
Figure 4. Predictions and highlights of suicide-related composite quotes from Furqan and colleagues. Words that changed predictions are color coded. Replacing a yellow or red word with an unknown word shifts the prediction to a less severe class by 1 or 2 levels, respectively (eg, in text classified as crisis, replacing a yellow word would change the prediction to red, whereas replacing a red word would change it to amber). In contrast, replacing green words results in more severe predictions.