Kimberly McManus, Emily K Mallory, Rachel L Goldfeder, Winston A Haynes, Jonathan D Tatum.
Abstract
Individuals who suffer from schizophrenia comprise 1 percent of the United States population and are four times more likely to die of suicide than the general US population. Identification of at-risk individuals with schizophrenia is challenging when they do not seek treatment. Microblogging platforms allow users to share their thoughts and emotions with the world in short snippets of text. In this work, we leveraged the large corpus of Twitter posts and machine-learning methodologies to detect individuals with schizophrenia. Using features from tweets such as emoticon use, posting time of day, and dictionary terms, we trained, built, and validated several machine learning models. Our support vector machine model achieved the best performance with 92% precision and 71% recall on the held-out test set. Additionally, we built a web application that dynamically displays summary statistics between cohorts. This enables outreach to undiagnosed individuals, improved physician diagnoses, and destigmatization of schizophrenia.
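The abstract names three kinds of tweet features: emoticon use, posting time of day, and dictionary terms. A minimal sketch of how such features might be computed for a single tweet follows. The emoticon pattern, the `tweet_features` function, and the example term set are all assumptions made for illustration; the abstract does not specify the authors' actual extraction code or dictionary.

```python
import re
from datetime import datetime

# Hypothetical pattern for common Western-style emoticons such as :) ;-( =D
EMOTICON_RE = re.compile(r"[:;=8][\-o\*']?[\)\]\(\[dDpP/\\|]")
# Simple lowercase word tokenizer (assumption, not the authors' tokenizer)
TOKEN_RE = re.compile(r"[a-z']+")


def tweet_features(text, posted_at, dictionary):
    """Return a small feature dict for one tweet.

    text       -- raw tweet text
    posted_at  -- datetime of the post
    dictionary -- set of lowercase terms of interest (hypothetical)
    """
    tokens = TOKEN_RE.findall(text.lower())
    return {
        "emoticon_count": len(EMOTICON_RE.findall(text)),
        "hour_of_day": posted_at.hour,
        "dictionary_hits": sum(1 for tok in tokens if tok in dictionary),
    }


# Illustrative call with made-up input
example = tweet_features(
    "Can't sleep again :( hearing things",
    datetime(2015, 1, 1, 3, 30),
    {"sleep", "hearing"},
)
print(example)
```

Per-tweet dictionaries like this would then be aggregated per user before being passed to a classifier; the aggregation step is not described in the abstract and is left out here.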
Year: 2015 PMID: 26306253 PMCID: PMC4525233
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Approaches for classifying depression using social media.
| Ref | Media | Cohort Acquisition | Features | Approach | Results |
|---|---|---|---|---|---|
| | | Clinical depression surveys | Interactions, emoticons, vocabulary, drugs, linguistic style, behaviors | Support vector machine | 0.74 precision, 0.63 recall |
| | Bulletin boards | Prozac post, doctor curation | Vocabulary | 2-step support vector machine | 0.82 accuracy, 0.88 AUC |
| | Sina | Psychologist curation | Pronoun use, emoticons, interactions, behaviors | Weka, BayesNet | 0.91 accuracy, 0.90 AUC |
Figure 1. Analysis workflow for feature extraction, model building, and evaluation.
Figure 2. 5-fold cross-validation results.
Testing data results.
| Model | Precision | Recall | Accuracy | F1 |
|---|---|---|---|---|
| SVM + PCA | | 0.706 | | |
| ANN + PCA | 0.813 | | 0.875 | 0.788 |
| NB + Log | 0.688 | 0.647 | 0.803 | 0.667 |
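The four columns of the table are related through the standard definitions of precision, recall, accuracy, and F1. The sketch below shows those definitions and checks the F1 value of the NB + Log row from its reported precision and recall. The function names are illustrative, and the confusion-matrix counts in the second example are made-up toy values, not figures from the study.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp, fp, fn, tn):
    """Compute the four table metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1_score(precision, recall),
    }


# Check the NB + Log row: precision 0.688 and recall 0.647 as reported
print(round(f1_score(0.688, 0.647), 3))  # 0.667, matching the F1 column

# Toy counts (not from the study) to show the count-based form
print(metrics_from_counts(tp=8, fp=2, fn=2, tn=8))
```

Because F1 depends only on precision and recall, a row's F1 can be cross-checked this way whenever both values are present.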