Michael L Birnbaum, Sindhu Kiranmai Ernala, Asra F Rizvi, Munmun De Choudhury, John M Kane.
Abstract
BACKGROUND: Linguistic analysis of publicly available Twitter feeds has achieved success in differentiating individuals who self-disclose online as having schizophrenia from healthy controls. To date, limited efforts have included expert input to evaluate the authenticity of diagnostic self-disclosures.
Keywords: Twitter; linguistic analysis; machine learning; online social networks; psychotic disorders; schizophrenia
Year: 2017 PMID: 28807891 PMCID: PMC5575421 DOI: 10.2196/jmir.7956
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Descriptive statistics of acquired Twitter data.
| Results | Schizophrenia group (n=146) | Control group (n=146) |
|---|---|---|
| Total tweets by unique users, n | 1,940,921 | 791,092 |
| Mean tweets per user, mean (SD) | 13,293.93 (18,134.83) | 5418.43 (11,403.54) |
| Median tweets per user, median (IQR) | 5542.5 (14,651.8) | 1660.0 (4402.3) |
| Range of tweets per user (min-max) | 8-88,169 | 1-82,985 |
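Each row of the descriptive table is a standard summary of one number per user: that user's tweet count. The following is a minimal sketch, assuming the counts are available as a plain Python list; the function name `describe_tweet_counts` is hypothetical. This excerpt does not state which SD or quartile conventions the authors used, so the sketch assumes the sample SD (n - 1 denominator) and the `statistics` module's default ("exclusive") quartile method; other tools may report a slightly different IQR.

```python
import statistics


def describe_tweet_counts(counts):
    """Summaries mirroring the table rows, for a list of per-user tweet counts.

    Hypothetical helper for illustration; requires at least two users.
    """
    # statistics.quantiles uses the 'exclusive' method unless told otherwise
    q1, _, q3 = statistics.quantiles(counts, n=4)
    return {
        "total": sum(counts),                 # total tweets by unique users
        "mean": statistics.mean(counts),
        "sd": statistics.stdev(counts),       # sample SD (assumption, see lead-in)
        "median": statistics.median(counts),
        "iqr": q3 - q1,                       # reported as a single width, as in the table
        "min": min(counts),
        "max": max(counts),
    }


# Hypothetical counts for five users, not data from the study
print(describe_tweet_counts([8, 120, 5542, 20000, 88169]))
```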
Mann-Whitney U test results comparing the linguistic differences between users with schizophrenia and the control datasets.
| LIWC category | Difference in mean LIWC scores between groups | Mann-Whitney U | P value^a |
|---|---|---|---|
| Positive affect | 0.262 | 8517.5 | .002 |
| Negative affect | 0.283 | 7873.5 | <.001 |
| Sadness | 0.241 | 5301.5 | <.001 |
| Swear | 0.164 | 8557.5 | .002 |
| Auxiliary verbs | 0.319 | 5712.5 | <.001 |
| Preposition | 0.186 | 7162.0 | <.001 |
| Article | 0.426 | 5812.0 | <.001 |
| Inclusive | 0.410 | 8262.5 | <.001 |
| Exclusive | 0.347 | 4753.0 | <.001 |
| Quantifier | 0.079 | 991.0 | <.001 |
| Past tense | 0.194 | 7809.5 | <.001 |
| Present tense | 0.304 | 7501.0 | <.001 |
| Future tense | 0.185 | 4130.5 | <.001 |
| First-person singular | 0.024 | 3387.0 | <.001 |
| First-person plural | 0.006 | 8401.5 | <.001 |
| Third person | 0.243 | 7329.5 | <.001 |
| Indefinite pronoun | 0.265 | 2691.5 | <.001 |
| Cognitive mechanisms | 0.307 | 9418.0 | .04 |
| Discrepancies | 0.220 | 8975.5 | .01 |
| Inhibition | 0.257 | 7738.5 | <.001 |
| Negation | 0.187 | 9318.5 | .03 |
| Causation | 0.353 | 8023.5 | <.001 |
| Certainty | 0.110 | 6101.5 | <.001 |
| Tentativeness | 0.266 | 1841.5 | <.001 |
| Hear | 0.163 | 1796.5 | <.001 |
| Feel | 0.270 | 7555.5 | <.001 |
| Perception | 0.257 | 3340.5 | <.001 |
| Insight | 0.396 | 7918.5 | <.001 |
| Friends | –0.068 | 3269.0 | <.001 |
| Work | 0.036 | 5917.5 | <.001 |
| Health | 1.143 | 6775.0 | <.001 |
| Humans | 0.039 | 2963.5 | <.001 |
| Biological processes | 0.427 | 7587.5 | <.001 |
| Body | 0.150 | 8021.5 | <.001 |
| Achievement | 0.087 | 6057.5 | <.001 |
| Home | 0.134 | 6261.5 | <.001 |
| Sexual | 0.494 | 8898.5 | .007 |

^a Based on Bonferroni correction.
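The Mann-Whitney comparison follows a three-step pattern: score each user's text per LIWC category, compare the two groups' score distributions with a rank test, and correct the P values for the number of categories tested. The following is a minimal sketch of that pattern, not the authors' pipeline. The two-category `LEXICON` is hypothetical (the real LIWC dictionaries are licensed, much larger, and use wildcard stems). `mann_whitney_u` uses the two-sided normal approximation with a tie correction rather than an exact test; `scipy.stats.mannwhitneyu` is a library alternative. All function names here are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; not the LIWC dictionary.
LEXICON = {
    "negative_affect": {"sad", "hate", "angry", "hurt"},
    "first_person_singular": {"i", "me", "my", "mine"},
}


def category_scores(text, lexicon=LEXICON):
    """LIWC-style score: percentage of a user's tokens that fall in each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {cat: 100.0 * sum(tok in words for tok in tokens) / total
            for cat, words in lexicon.items()}


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation with tie correction."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    order = sorted(range(n1 + n2), key=pooled.__getitem__)
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for k in range(i, j + 1):  # tied values share the average 1-based rank
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)  # report the smaller U, a common convention
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:  # every value identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value


def bonferroni(p_values):
    """Multiply each P value by the number of tests performed, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

In use, one would compute `category_scores` for every user, collect one list of scores per group for each category, run `mann_whitney_u` on each pair of lists, and pass all resulting P values to `bonferroni`. With 146 users per group, n1·n2/2 = 10,658; every U in the table falls below that value, which appears consistent with reporting the smaller of the two U statistics.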
Classification results to distinguish between schizophrenia users and control users.
| Results | Accuracy | Precision | Recall | F1 score | ROC AUC |
|---|---|---|---|---|---|
| Best performance | 0.90 | 0.92 | 0.87 | 0.90 | 0.95 |
| Average over 10 folds, mean (SD) | 0.81 (0.07) | 0.80 (0.09) | 0.82 (0.05) | 0.80 (0.07) | 0.88 (0.04) |
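The classification table reports per-fold metrics summarized as a mean (SD) over 10 folds, together with a best value. The following is a minimal sketch of that bookkeeping, assuming each fold yields confusion counts with schizophrenia as the positive class. `FoldCounts`, `fold_metrics`, and `summarize_folds` are hypothetical names, and reading "best" as the per-metric maximum across folds is an assumption; the table may instead report the single best-performing fold.

```python
import statistics
from dataclasses import dataclass


@dataclass
class FoldCounts:
    """Confusion counts for one cross-validation fold (hypothetical structure)."""
    tp: int
    fp: int
    tn: int
    fn: int


def fold_metrics(c):
    """Accuracy, precision, recall, and F1 for one fold."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (c.tp + c.tn) / total, "precision": precision,
            "recall": recall, "f1": f1}


def summarize_folds(folds):
    """Per-metric best (max), mean, and sample SD across folds (needs >= 2 folds)."""
    per_fold = [fold_metrics(c) for c in folds]
    summary = {}
    for name in per_fold[0]:
        values = [m[name] for m in per_fold]
        summary[name] = {"best": max(values),
                         "mean": statistics.mean(values),
                         "sd": statistics.stdev(values)}
    return summary
```

Because F1 is computed inside each fold and then averaged, the averaged F1 column need not equal the harmonic mean of the averaged precision and recall columns.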
Figure 1. Receiver operating characteristic (ROC) curves for the classification task.
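An ROC curve plots the true positive rate against the false positive rate as the decision threshold on the classifier's score is lowered. The area under it (ROC AUC) equals the probability that a randomly chosen positive user receives a higher score than a randomly chosen negative user, with ties counted as one half; this is the Mann-Whitney U for the positive scores divided by n1·n2. The following minimal sketch computes the AUC both ways to show they agree. The function names are hypothetical, and `sklearn.metrics.roc_auc_score` is a library alternative.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs, sweeping the threshold down through every distinct score."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = schizophrenia) and classifier scores
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(trapezoid_auc(roc_points(labels, scores)), pairwise_auc(labels, scores))
```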
Confusion matrix showing agreement and disagreement between the machine learning classifier and the experts.
| Machine label | Expert annotation: Yes | Expert annotation: No |
|---|---|---|
| Yes | 14 | 37 |
| No | 4 | 45 |
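The confusion matrix between the classifier and the experts reports only raw counts. One common way to summarize such a 2×2 agreement table is observed agreement plus Cohen's kappa, which discounts the agreement expected by chance from the row and column totals. The following sketch applies both to the counts shown; kappa is an illustrative extra statistic that this excerpt does not report, and `agreement_stats` is a hypothetical name. Both quantities are unchanged if the table is transposed, so the result does not depend on which label is read as rows.

```python
def agreement_stats(table):
    """Observed agreement and Cohen's kappa for a 2x2 count table.

    table[i][j] = number of users with machine label i and expert label j
    (index 0 = Yes, 1 = No).
    """
    n = sum(sum(row) for row in table)
    observed = (table[0][0] + table[1][1]) / n
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    # Chance agreement: product of marginal proportions, summed over categories
    expected = sum(row_totals[k] * col_totals[k] for k in range(2)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


observed, kappa = agreement_stats([[14, 37], [4, 45]])
print(f"observed agreement = {observed:.2f}, Cohen's kappa = {kappa:.3f}")
# prints: observed agreement = 0.59, Cohen's kappa = 0.190
```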