Amir Hossein Yazdavar1,2, Mohammad Saeid Mahdavinejad3,4, Goonmeet Bajaj5, William Romine6, Amit Sheth7, Amir Hassan Monadjemi3, Krishnaprasad Thirunarayan4, John M Meddar2, Annie Myers2, Jyotishman Pathak2, Pascal Hitzler1.
Abstract
Depression is a major public health concern in the U.S. and globally. While successful early identification and treatment can lead to many positive health and behavioral outcomes, depression remains undiagnosed, untreated, or undertreated for several reasons, including denial of the illness as well as cultural and social stigma. With the ubiquity of social media platforms, millions of people now share their online persona by expressing their thoughts, moods, emotions, and even their daily struggles with mental health on social media. Unlike traditional observational cohort studies conducted through questionnaires and self-reported surveys, we explore the reliable detection of depressive symptoms from tweets obtained unobtrusively. In particular, we examine and exploit multimodal big (social) data to discern depressive behaviors using a wide variety of features, including individual-level demographics. By developing a multimodal framework and employing statistical techniques to fuse heterogeneous sets of features obtained by processing visual, textual, and user interaction data, we significantly improve on current state-of-the-art approaches for identifying depressed individuals on Twitter (improving the average F1-score by 5 percent) and facilitate demographic inferences from social media. Besides providing insights into the relationship between demographics and mental health, our research assists in the design of a new breed of demographic-aware health interventions.
Year: 2020 PMID: 32275658 PMCID: PMC7147779 DOI: 10.1371/journal.pone.0226248
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Sample of depressive-indicative phrases collected from tweets.
| Clinical Depression Symptoms | Depressive-indicative phrases in tweets |
|---|---|
| | “People hate me,” “I am Ugly,” “I am depressed” |
| | “we will never sleep,” “we’re fuxx dead” |
| | “I’m that tired,” “why can’t I sleep” |
| | “0 energy to do anything” |
| | “cba with work,” “I just want to snuggle up all day in bed” |
| | “Must not.eat,” “must.be.thin” |
| | “94lbs, urgh I disgust myself” |
| | “Obssessed with my weight,” “I just want be skinny” |
| | “I feel like a failure” |
| | “Im a piece of shix,” |
| | “I just don’t want to wake up tomorrow morning” |
| | “all my blades are so fuxx blunt” |
| | “Thinking hanging myself,” “I’ve never been so sure about suicide” |
| | “how much blood can bleed from a cut into a vain” |
Fig 1. The age distribution for depressed and control users in the ground-truth dataset.
Fig 2. Gender and depressive behavior association (Chi-square test; color code: blue = association, red = repulsion; size: amount of each cell’s contribution).
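The association/repulsion coloring in the Fig 2 caption matches the usual reading of Pearson residuals from a chi-square test of independence: a positive residual means a cell is observed more often than independence predicts, and a negative one means less often. The following is a minimal sketch of that computation, not the authors' code; the function name `chi_square_residuals` and the 2x2 counts are hypothetical and chosen only for illustration.

```python
def chi_square_residuals(table):
    """Return (chi-square statistic, Pearson residuals) for a contingency table.

    `table` is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            residual = (observed - expected) / expected ** 0.5
            residual_row.append(residual)
            # Each cell contributes its squared residual to the statistic.
            chi2 += residual ** 2
        residuals.append(residual_row)
    return chi2, residuals


# Hypothetical counts (rows: two gender groups, columns: depressed / control).
# These numbers are NOT taken from the study.
example_table = [[30, 10], [10, 30]]
chi2, residuals = chi_square_residuals(example_table)
```

In a plot like the one the caption describes, the sign of each residual would set the color and its squared value the marker size.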
Facial presence comparison in profile/posted images for depressed and control users—*** alpha = 0.05.
| Face_Found_in | % Of Users | ||
|---|---|---|---|
| Depressed | Control | ||
| 72% | 81% | 163.52*** | |
| 4% | 12% | 167.2*** | |
| 8% | 7% | 2.55 | |
Statistics of processed shared/profile images.
| # of Processed Prof. Images | # of Processed Shared Images | ||
|---|---|---|---|
| Depressed | Control | Depressed | Control |
| 3466 | 4127 | 265785 | 401435 |
Fig 3. The Pearson correlation between the average emotions derived from facial expressions in the shared images and emotions from textual content for depressed (a) and control (b) users.
Pairs without a statistically significant correlation are crossed out (significance level: p-value < 0.05).
Statistical significance (t-statistic) of the mean of salient features for both depressed and control classes—** alpha = 0.05, *** alpha = 0.05/223.
| Feature | Depressed ( | Control ( | 95 percent Conf. interval | T-stat | |
|---|---|---|---|---|---|
| Profile_colorfulness | 108.05 | 118.85 | (-15.38, -6.22) | -4.62*** | |
| Profile_averageRGB | 134.39 | 139.00 | (2.3, 6.92) | -3.92*** | |
| Profile_naturalness | 0.37 | 0.61 | (-0.304, -0.192) | -12.72*** | |
| Profile_hueVAR | 0.0517 | 0.072 | (-0.027, -0.008) | -4.56*** | |
| Profile_saturationVAR | 0.032 | 0.040 | (-0.015, -0.003) | -3.92*** | |
| Profile_saturationMean | 0.21 | 0.31 | (-0.122, -0.078) | -8.95*** | |
| Shared_imageBlueChan.Mean | 119.53 | 134.09 | (-19.28, -9.82) | -6.04*** | |
| Shared_imageGrayScaleMean | 0.54 | 0.49 | (0.03, 0.068) | 5.47*** | |
| Shared_imageColorfulness | 106.12 | 122.37 | (-14.98, -10.753) | -11.94*** | |
| Shared_imageSaturationVAR | 0.033 | 0.047 | (-0.01, -0.010) | -9.26*** | |
| Shared_imageSaturationMean | 0.198 | 0.289 | (-0.106, -0.074) | -10.95*** | |
| Shared_imageNaturalness | 0.486 | 0.651 | (-0.193, -0.136) | -16.28*** | |
| Friends_count | 610.196 | 1380.25 | (-1023, -516) | -5.98*** | |
| Followers_count | 589.47 | 1340.83 | (-1148.08, -354) | -3.727** | |
| Statuses_count | 3722 | 7766 | (-6281, -1806) | -3.55** | |
| Avg_tweet_favorite_count | 0.22 | 0.67 | (-0.781, -0.103) | -2.57** | |
| Avg_tweet_retweet_count | 876.75 | 2720 | (-2673, -1013) | -4.36*** | |
| Favourites_count | 2021 | 5199.67 | (-5038, -1317) | -3.35** |
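The stricter threshold in the caption, alpha = 0.05/223, has the form of a Bonferroni correction across 223 comparisons, which works out to roughly 2.2e-4 per test. The sketch below shows that threshold together with a two-sample t-statistic. It assumes Welch's unequal-variance form, which this record does not confirm as the authors' choice, and the sample values and function names are hypothetical.

```python
import math
from statistics import mean, variance


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests


def welch_t(sample_a, sample_b):
    """Welch's t-statistic for the difference in means (a minus b)."""
    standard_error = math.sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical per-user feature values; NOT data from the study.
depressed_values = [0.30, 0.35, 0.40, 0.42, 0.38]
control_values = [0.55, 0.60, 0.65, 0.62, 0.58]

t_stat = welch_t(depressed_values, control_values)
per_test_alpha = bonferroni_alpha(0.05, 223)
```

A negative `t_stat` here means the first group's mean is lower, which is the sign convention most rows in the table follow (depressed minus control).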
Statistical significance test of linguistic patterns/visual attributes for different age groups with one-way ANOVA, *** alpha = 0.001, ** alpha = 0.01.
| Feature | Mean (SD) | F-value | |||||
|---|---|---|---|---|---|---|---|
| [11,19) | [19,23) | [23,34) | [34,46) | [46,60) | |||
| 27.62 | 38.61 | 47.28 | 67.88 | 72.05 | 84*** | ||
| 58.54 | 55.04 | 49.21 | 33.99 | 28.39 | 22*** | ||
| 51.6 | 53.43 | 56.27 | 70.28 | 71.21 | 9*** | ||
| 85.04 | 82.63 | 80.48 | 75.87 | 74.09 | 37*** | ||
| 3.52 | 3.92 | 4.00 | 4.52 | 5.13 | 35*** | ||
| 15.48 | 16.58 | 18.65 | 20.88 | 21.33 | 52*** | ||
| 12.17 | 11.24 | 10.99 | 8.36 | 8.75 | 28*** | ||
| 14.13 | 12.45 | 10.96 | 9.05 | 7.55 | 85*** | ||
| 0.96 | 0.89 | 0.57 | 0.36 | 0.33 | 18*** | ||
| 0.27 | 0.38 | 0.45 | 0.52 | 0.78 | 15*** | ||
| 0.80 | 1.09 | 1.31 | 1.67 | 2.02 | 69*** | ||
| 37.80 | 48.05 | 52.33 | 64.33 | 68.07 | 10*** | ||
| 20.31 | 23.27 | 29.78 | 38.76 | 33.13 | 9*** | ||
| 106.47 | 107.95 | 111.01 | 113.97 | 123.60 | 0.89 | ||
| 139.20 | 140.45 | 131.55 | 133.74 | 139.02 | 3** | ||
| 0.471 | 0.474 | 0.456 | 0.470 | 0.450 | 0.12 | ||
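The F-values in the table come from a one-way ANOVA, which compares the variance between age-group means with the variance within groups. The following is a minimal sketch of that ratio in plain Python; the function name and the toy groups are hypothetical and unrelated to the study's data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F-statistic for a one-way ANOVA over a list of samples."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    n_groups = len(groups)
    n_total = len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (value - mean(group)) ** 2 for group in groups for value in group
    )

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    return ms_between / ms_within


# Toy data for three groups; NOT values from the study.
f_value = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

A large F-value indicates that group means differ by more than within-group spread alone would explain. Turning it into a p-value additionally requires the F distribution with (k - 1, n - k) degrees of freedom, which this sketch leaves out.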
Fig 4. Characterizing linguistic patterns in two aspects: depressive behavior and age distribution.
Age prediction performance from visual and textual content for different age groups (in years).
| Group | Measure | Text-based | Image-based (Profile) | Image-based (Media) | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (11,19] | (19,23] | (23,34] | (34,46] | (11,19] | (19,23] | (23,34] | (34,46] | (11,19] | (19,23] | (23,34] | (34,46] | ||
| 0.23 | 0.38 | 0.65 | 0.33 | 0.29 | 0.29 | 0.22 | 1.0 | 0.11 | 0.1 | 0.19 | 0.22 | ||
| 0.95 | 0.53 | 0.69 | 0.96 | 0.92 | 0.92 | 0.57 | 0.80 | 0.96 | 0.94 | 0.72 | 0.58 | ||
| 0.59 | 0.46 | 0.67 | 0.65 | 0.47 | 0.46 | 0.40 | 0.900 | 0.50 | 0.49 | 0.46 | 0.40 | ||
| 0.14 | 0.31 | 0.62 | 0.69 | 0.12 | 0.1 | 0.40 | 0.25 | 0.15 | 0.30 | 0.63 | 0.64 | ||
| 0.98 | 0.63 | 0.61 | 0.90 | 0.90 | 0.95 | 0.53 | 0.75 | 0.98 | 0.62 | 0.60 | 0.91 | ||
| 0.56 | 0.47 | 0.62 | 0.80 | 0.49 | 0.48 | 0.47 | 0.51 | 0.56 | 0.46 | 0.62 | 0.77 | ||
Gender prediction performance through visual and textual content.
| Face found in | Image-based Predictor | Content-based Predictor | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Depressed | Control | Depressed | Control | |||||||||
| Sens. | Spec. | ACC (95% CI) | Sens. | Spec. | ACC (95% CI) | Sens. | Spec. | ACC (95% CI) | Sens. | Spec. | ACC (95% CI) | |
| 0.90 | 1.0 | 0.92 | 0.91 | 0.87 | 0.90 | 0.87 | 0.50 | 0.82 | 0.86 | 0.76 | 0.82 | |
| 0.57 | 0.70 | 0.58 | 0.46 | 0.65 | 0.51 | |||||||
Facial presentation distribution for different age groups (in years) in profile and media.
| % Users Faces_Found_in_Profile | % Users Faces_Found_in_Media | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| [11,19) | [19,23) | [23,34) | [34,46) | [46,60) | [11,19) | [19,23) | [23,34) | [34,46) | [46,60) | |
| 4.55 | 9.58 | 13.84 | 17.85 | 21.42 | 89.70 | 88.35 | 78.46 | 67.85 | 78.57 | |
| 2.71 | 5.88 | 10.52 | 8.33 | 14.28 | 90.21 | 90.58 | 76.31 | 83.33 | 85.71 | |
Fig 5. Ranking features obtained from different modalities with an ensemble algorithm.
Model performance for identifying depressed users on Twitter using different data modalities.
| Model# | Data Source | Ref. | Year | Features | Model | Spec. | Sens. | F-1 | Acc. | ||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N-grams | LIWC | Sentiment | Topics | Metadata | |||||||||
| I | Content | [ | 2016 | X | NB | 0.69 | 0.70 | 0.69 | 0.70 | ||||
| II | [ | 2016 | X | X | User Acti. | N/A (LR) | 0.73 | 0.74 | 0.73 | 0.74 | |||
| III | [ | 2015 | X | X | X | User Acti. | Log-linear | 0.83 | 0.80 | 0.81 | 0.82 | ||
| IV | [ | 2015 | X | X | X | X | LR | 0.84 | 0.83 | 0.84 | 0.84 | ||
| V | [ | 2015 | X | X | X | X | User Acti. | SVM | 0.86 | 0.84 | 0.85 | 0.85 | |
| VI | N/A | N/A | X | SVM(Pre. embed.) | 0.72 | 0.72 | 0.72 | 0.72 | |||||
| VII | N/A | N/A | X | SVM(Train w2vec) | 0.70 | 0.70 | 0.70 | 0.70 | |||||
| VIII | Cont., Net. | [ | 2013 | X | X | X | SVM, PCA | 0.84 | 0.80 | 0.83 | 0.85 | ||
| IX | Image | N/A | N/A | N/A | LR | 0.68 | 0.67 | 0.67 | 0.68 | ||||
| X | N/A | N/A | SVM | 0.69 | 0.67 | 0.67 | 0.69 | ||||||
| XI | N/A | N/A | RF | 0.72 | 0.70 | 0.69 | 0.71 | ||||||
| Cont.,Image,Net. | N/A | X | X | X | X | X | X | N/A | |||||
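The Spec., Sens., F-1, and Acc. columns are standard binary-classification metrics. As a reminder of how they relate, here is a minimal sketch that derives all four from confusion-matrix counts. The counts are hypothetical, and the table may report F1 averaged over both classes, which this sketch does not do.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute specificity, sensitivity, F1, and accuracy from raw counts.

    tp/fn: depressed users classified correctly / missed
    tn/fp: control users classified correctly / flagged in error
    """
    sensitivity = tp / (tp + fn)   # recall on the depressed class
    specificity = tn / (tn + fp)   # recall on the control class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "specificity": specificity,
        "sensitivity": sensitivity,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical confusion-matrix counts; NOT results from the study.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
```

Because F1 combines precision and sensitivity, a model can raise accuracy without raising F1 when the classes are imbalanced, which is one reason the table lists the two separately.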
Fig 6. The explanation of the log-odds prediction of outcome (0.31) for a sample user (the y-axis shows the outcome probability (depressed or control); the bar labels indicate the log-odds impact of each feature).
Fig 7. Word usage difference of likely vulnerable individuals versus random profiles.