Benjamin J Ricard, Lisa A Marsch, Benjamin Crosier, Saeed Hassanpour.
Abstract
BACKGROUND: The content produced by individuals on various social media platforms has been successfully used to identify mental illness, including depression. However, most of the previous work in this area has focused on user-generated content, that is, content created by the individual, such as an individual's posts and pictures. In this study, we explored the predictive capability of community-generated content, that is, the data generated by a community of friends or followers, rather than by a sole individual, to identify depression among social media users.
Keywords: depression; machine learning; mental health; social media
Year: 2018 PMID: 30522991 PMCID: PMC6302231 DOI: 10.2196/11817
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Our cohort characteristics and their associated features. The last column specifies which models, if any, contain the variable.
| Characteristic | Statistics | Model inclusion (user/community) |
| --- | --- | --- |
| Subjects (n) | 749 | Both |
| Emoji sentiment, captions | 0.39 (0.25) | User-based |
| Emoji sentiment, comments | 0.47 (0.17) | Community-based |
| ANEW^a valence, captions | 6.55 (0.4) | User-based |
| SD ANEW valence, captions | 1.05 (0.36) | User-based |
| ANEW domination, captions | 5.66 (0.25) | User-based |
| SD ANEW domination, captions | 0.66 (0.23) | User-based |
| ANEW arousal, captions | 5.36 (0.25) | User-based |
| SD ANEW arousal, captions | 0.65 (0.2) | User-based |
| LabMT^b score, captions | 5.81 (0.23) | User-based |
| SD LabMT score, captions | 0.57 (0.21) | User-based |
| ANEW valence, comments | 6.83 (0.55) | Community-based |
| SD ANEW valence, comments | 0.99 (0.5) | Community-based |
| ANEW domination, comments | 5.77 (0.32) | Community-based |
| SD ANEW domination, comments | 0.63 (0.3) | Community-based |
| ANEW arousal, comments | 5.51 (0.3) | Community-based |
| SD ANEW arousal, comments | 0.59 (0.23) | Community-based |
| LabMT score, comments | 0.62 (0.29) | Community-based |
| SD LabMT score, comments | 5.91 (0.34) | Community-based |
| Number of posts | 333.55 (476.59) | Both |
| Number of likes | 27.25 (55.46) | Both |
| Number of comments per post | 1.63 (1.8) | Both |
| Number of comments, total | 245.25 (616.41) | Both |
| Fraction of posts with no captions | 0.03 (0.07) | User-based |
| Fraction of posts with no comments | 0.48 (0.24) | Community-based |
| Caption length by word | 12.39 (10.07) | User-based |
| Comment length by word | 10.09 (13.21) | Community-based |
| Age (years), mean (SD) | 26.7 (7.29) | Neither |
| Female, n (%) | 515 (68.8) | Both |
| Male, n (%) | 234 (31.2) | Both |
| Asian, n (%) | 51 (6.8) | Neither |
| Black, n (%) | 143 (19.1) | Neither |
| Hispanic/Latino, n (%) | 91 (12.1) | Neither |
| Native American/Alaskan Native, n (%) | 10 (1.3) | Neither |
| Native Hawaiian/Pacific Islander, n (%) | 2 (0.2) | Neither |
| Other, n (%) | 27 (3.6) | Neither |
| White, n (%) | 425 (56.7) | Neither |
| PHQ-8^c score, mean (SD) | 6.62 (5.22) | Neither |
| PHQ-8 ≥15, n (%) | 69 (9.2) | Neither |
^a ANEW: Affective Norms for English Words.
^b LabMT: Language assessment by Mechanical Turk.
^c PHQ-8: Patient Health Questionnaire-8.
Figure 1. Overview of our machine learning methodology. From the original 749 participating individuals, 78 (ie, 10% of the dataset) were randomly selected and held out for testing. The remaining 671 cases were used for training and parameter-tuning through cross-validation. AUC: area under curve.
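The split described in this caption can be sketched as follows. This is a minimal illustration rather than the authors' code: only the counts 749, 78, and 671 come from the caption, while the function names `holdout_split` and `k_folds`, the random seed, and the choice of 10 folds are assumptions made for the example.

```python
import random


def holdout_split(n_total, n_test, seed=0):
    """Shuffle indices 0..n_total-1 and hold out n_test of them for testing."""
    rng = random.Random(seed)  # fixed seed (assumed) so the split is reproducible
    indices = list(range(n_total))
    rng.shuffle(indices)
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_folds(train_indices, k):
    """Partition the training indices into k nearly equal validation folds."""
    return [train_indices[i::k] for i in range(k)]


# Counts taken from the caption: 749 participants, 78 held out, 671 for training.
train_idx, test_idx = holdout_split(n_total=749, n_test=78, seed=0)

# k=10 is an assumption; the caption only says "cross-validation".
folds = k_folds(train_idx, k=10)

for fold_number, validation in enumerate(folds):
    validation_set = set(validation)
    fitting = [i for i in train_idx if i not in validation_set]
    # A model would be fit on `fitting` and scored on `validation` here.
    print(fold_number, len(fitting), len(validation))
```

The held-out indices are never touched during tuning, so performance measured on them is not biased by the parameter search.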
Figure 2. Receiver operating characteristic curves for models trained on user-generated data, community-generated data, and the combination of both, predicting major depressive disorder in 78 social media users. The models that included community-generated data were significantly better than random classification, as measured with a Mann-Whitney U test (P=.03 and P=.02 for community-generated and combined, respectively), whereas the model trained on only user-generated data was not (P=.11). AUC: area under curve.
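The AUC equals the Mann-Whitney U statistic divided by the number of positive-negative pairs, which is why a Mann-Whitney U test can assess whether a classifier ranks cases better than chance. The sketch below illustrates that relationship on made-up scores. It is not the authors' analysis: the toy data, the function name `auc_mann_whitney`, the one-sided alternative, and the normal approximation without tie correction are all assumptions.

```python
import math


def auc_mann_whitney(scores, labels):
    """Return (AUC, one-sided p-value) for the hypothesis AUC > 0.5.

    The p-value uses the normal approximation to the U statistic without
    a tie correction (an assumption made to keep the sketch short).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    n_pos, n_neg = len(pos), len(neg)

    # U counts the pairs where the positive case outranks the negative case;
    # ties count as half a pair.
    u = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = u / (n_pos * n_neg)

    # Under random ranking, U has mean n_pos*n_neg/2 and the variance below.
    mean_u = n_pos * n_neg / 2
    sd_u = math.sqrt(n_pos * n_neg * (n_pos + n_neg + 1) / 12)
    z = (u - mean_u) / sd_u
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return auc, p_one_sided


# Hypothetical predicted probabilities and true labels (1 = depressed).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]

auc, p_value = auc_mann_whitney(toy_scores, toy_labels)
print(f"AUC={auc:.3f}, one-sided p={p_value:.3f}")
```

With a sample as small as the toy data, an exact test would be preferable to the normal approximation.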
Figure 3. Minimum-maximum normalized linear regression coefficients for the model based on (A) user-generated data, (B) community-generated data, and (C) both. “Gender” variable indicates if the individual is male. These weights indicate the relative importance of each feature in the corresponding model. ANEW: Affective Norms for English Words; LabMT: Language assessment by Mechanical Turk.
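Minimum-maximum normalization rescales a set of values so that the smallest maps to 0 and the largest to 1. A minimal sketch of how coefficients could be rescaled this way is shown below. The coefficient values and feature names are hypothetical, and the [0, 1] target range is an assumption, since the caption does not state the range the authors used.

```python
def min_max_normalize(coefficients):
    """Rescale a {feature: coefficient} mapping linearly onto [0, 1]."""
    lowest = min(coefficients.values())
    highest = max(coefficients.values())
    if highest == lowest:
        # All coefficients identical: no spread to rescale, so return zeros.
        return {name: 0.0 for name in coefficients}
    span = highest - lowest
    return {name: (value - lowest) / span for name, value in coefficients.items()}


# Hypothetical raw coefficients, not values from the paper.
raw = {
    "number_of_posts": -0.4,
    "gender_male": 0.1,
    "anew_valence_comments": 0.6,
}

normalized = min_max_normalize(raw)
for name, weight in sorted(normalized.items(), key=lambda item: item[1]):
    print(f"{name}: {weight:.2f}")
```

Note that this rescaling preserves the ordering of the coefficients but not their signs, so a normalized weight near 0 marks the most negative coefficient rather than an unimportant feature.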
Optimal cutoffs using the highest observed F score for user-generated, community-generated, combined, and bag-of-words models and comparison with physician rates.
| Method | F score | Sensitivity | Specificity |
| --- | --- | --- | --- |
| Physician (meta-analysis) | 0.62 | 0.50 | 0.81 |
| Baseline feature set (BOW^a) | 0.58 | 0.50 | 0.69 |
| User-generated | 0.66 | 0.57 | 0.77 |
| Community-generated | 0.69 | 0.57 | 0.87 |
| Community- and user-generated | 0.70 | 0.57 | 0.92 |
^a BOW: bag-of-words.
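The first numeric value in each row of the table above matches the harmonic mean of that row's sensitivity and specificity, so the sketch below treats the F score that way. That definition is inferred from the tabulated numbers and is not stated in this record. The threshold sweep in `best_cutoff`, the function names, and the toy scores are likewise assumptions for illustration.

```python
def f_score(sensitivity, specificity):
    """Harmonic mean of sensitivity and specificity (assumed definition)."""
    total = sensitivity + specificity
    return 2 * sensitivity * specificity / total if total else 0.0


def best_cutoff(scores, labels):
    """Try every observed score as a cutoff and return (best F, cutoff).

    Cases with score >= cutoff are predicted positive. Both classes must
    be present in `labels`.
    """
    best_f, best_t = 0.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        f = f_score(tp / (tp + fn), tn / (tn + fp))
        if f > best_f:
            best_f, best_t = f, t
    return best_f, best_t


# (sensitivity, specificity, reported score) copied from the table.
table = {
    "Physician": (0.50, 0.81, 0.62),
    "Baseline (BOW)": (0.50, 0.69, 0.58),
    "User-generated": (0.57, 0.77, 0.66),
    "Community-generated": (0.57, 0.87, 0.69),
    "Community- and user-generated": (0.57, 0.92, 0.70),
}
for method, (sens, spec, reported) in table.items():
    print(method, round(f_score(sens, spec), 2), reported)

# Hypothetical predicted probabilities and true labels (1 = depressed).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
best_f, cutoff = best_cutoff(toy_scores, toy_labels)
print(best_f, cutoff)
```

Because every model row shares the same sensitivity of 0.57, the differences in the score column are driven entirely by specificity.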