Michael L Birnbaum, Raquel Norel, Anna Van Meter, Asra F Ali, Elizabeth Arenare, Elif Eyigoz, Carla Agurto, Nicole Germano, John M Kane, Guillermo A Cecchi
Abstract
Prior research has identified associations between social media activity and psychiatric diagnoses; however, diagnoses are rarely clinically confirmed. Toward the goal of applying novel approaches to improve outcomes, research using real patient data is necessary. We collected 3,404,959 Facebook messages and 142,390 images from 223 participants (mean age = 23.7; 41.7% male): individuals with schizophrenia spectrum disorders (SSD), individuals with mood disorders (MD), and healthy volunteers (HV). Using machine learning, we analyzed features from content uploaded up to 18 months before the first hospitalization and built classifiers that distinguished SSD and MD from HV, and SSD from MD. Classification achieved an AUC of 0.77 (HV vs. MD), 0.76 (HV vs. SSD), and 0.72 (SSD vs. MD). SSD used more perception words (hear, see, feel) than MD or HV (P < 0.01). SSD and MD used more swear words than HV (P < 0.01). SSD were more likely than HV to express negative emotions (P < 0.01). MD used more words related to biological processes (blood/pain) than HV (P < 0.01). Photos posted by SSD and MD were smaller in height and width than those posted by HV (P < 0.01). MD photos contained more blue and less yellow (P < 0.01). Closer to hospitalization, use of punctuation increased (SSD vs. HV), use of negative emotion words increased (MD vs. HV), and use of swear words increased (P < 0.01) for SSD and MD compared to HV. Machine-learning algorithms can differentiate SSD and MD using Facebook activity alone more than a year in advance of hospitalization. Integrating Facebook data with clinical information could one day inform clinical decision-making.
Year: 2020 PMID: 33273468 PMCID: PMC7713057 DOI: 10.1038/s41537-020-00125-0
Source DB: PubMed Journal: NPJ Schizophr ISSN: 2334-265X
Participant demographics.
| Characteristic | SSD | MD | HV | Full sample |
|---|---|---|---|---|
| N | 79 | 74 | 70 | 223 |
| Age, mean (SD) | 23.9 (4.17) | 22.01 (3.72) | 25.34 (5.53) | 23.72 (4.69) |
| Male | 53 (67.09) | 20 (27.03) | 20 (28.57) | 93 (41.70) |
| African American/Black | 39 (49.36) | 16 (21.62) | 10 (14.28) | 65 (29.14) |
| Asian | 12 (15.19) | 14 (18.92) | 13 (18.57) | 39 (17.48) |
| Caucasian | 19 (24.05) | 31 (41.89) | 44 (62.86) | 94 (42.15) |
| Mixed race/other | 9 (11.39) | 13 (17.56) | 3 (4.28) | 25 (11.21) |
| Hispanic | 12 (15.19) | 12 (16.22) | 2 (2.86) | 26 (11.65) |
| Schizophrenia | 39 (49.37) | – | – | 39 (49.37) |
| Schizophreniform | 15 (18.99) | – | – | 15 (18.99) |
| Schizoaffective | 14 (17.72) | – | – | 14 (17.72) |
| Unspecified SSD | 11 (13.92) | – | – | 11 (13.92) |
| Bipolar disorder (manic episode) | – | 8 (10.81) | – | 8 (10.81) |
| Bipolar disorder (depressed episode) | – | 2 (2.70) | – | 2 (2.70) |
| Bipolar disorder (mixed episode) | – | 2 (2.70) | – | 2 (2.70) |
| Major depressive disorder | – | 62 (83.78) | – | 62 (83.78) |

Age is given as mean (SD); all other rows are n (%).
Number of participants with Facebook Messenger and image data, and mean activity, per trimester (3-month period) per group.
| Group | Trimester | Participants with messages | Participants with images | Messages, mean (SD) | Word count, mean (SD) | Images, mean (SD) |
|---|---|---|---|---|---|---|
| HV | 1T | 66 | 32 | 489 (1042) | 3546 (6379) | 22 (56) |
| HV | 2T | 69 | 24 | 566 (1231) | 4068 (7967) | 9 (11) |
| HV | 3T | 67 | 25 | 517 (1238) | 3384 (7293) | 80 (282) |
| HV | 4T | 64 | 26 | 609 (1379) | 4151 (8932) | 98 (368) |
| HV | 5T | 62 | 27 | 601 (1222) | 3993 (7679) | 99 (377) |
| HV | 6T | 58 | 26 | 672 (1183) | 4504 (6979) | 43 (73) |
| MD | 1T | 56 | 39 | 680 (953) | 4895 (7609) | 33 (81) |
| MD | 2T | 63 | 37 | 653 (1139) | 4201 (6125) | 25 (83) |
| MD | 3T | 62 | 40 | 752 (1260) | 4497 (7050) | 16 (33) |
| MD | 4T | 57 | 37 | 737 (1220) | 4509 (6109) | 37 (89) |
| MD | 5T | 56 | 38 | 663 (973) | 4559 (6036) | 27 (77) |
| MD | 6T | 52 | 34 | 725 (1082) | 4090 (5734) | 72 (226) |
| SSD | 1T | 59 | 35 | 865 (2117) | 4796 (9182) | 3 (2) |
| SSD | 2T | 57 | 30 | 690 (1747) | 3808 (7559) | 17 (60) |
| SSD | 3T | 55 | 31 | 766 (2245) | 4009 (8599) | 63 (314) |
| SSD | 4T | 56 | 26 | 601 (1134) | 3926 (6965) | 11 (39) |
| SSD | 5T | 57 | 32 | 569 (934) | 4055 (5895) | 10 (21) |
| SSD | 6T | 53 | 33 | 552 (836) | 3874 (6081) | 12 (30) |
Fig. 1 Classification performance for the binary classification case, using fivefold cross-validation split by participant.
a Separation of the data into training and test sets. b Bars show the mean of 50 runs; vertical lines denote the standard deviation. Light cyan bars show results using all features; cyan bars show results using only linguistic features.
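Splitting "by the participant" means that every sample from one person lands in the same fold, so a classifier is never tested on someone it was trained on. A minimal stdlib sketch of such a split, assuming participants are shuffled and dealt round-robin into five folds with no stratification by group (the excerpt does not say how folds were balanced). `participant_folds` is a hypothetical name; scikit-learn's `GroupKFold` is an existing library alternative, though the source does not say which tool was used.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def participant_folds(
    participant_ids: Sequence[Hashable], k: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, test) pairs so that all samples
    from one participant fall into the same test fold."""
    unique = sorted(set(participant_ids), key=str)  # fixed order before shuffling
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Deal shuffled participants round-robin into k folds.
    fold_of = {pid: i % k for i, pid in enumerate(unique)}
    splits = []
    for fold in range(k):
        test = [i for i, pid in enumerate(participant_ids) if fold_of[pid] == fold]
        train = [i for i, pid in enumerate(participant_ids) if fold_of[pid] != fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    ids = ["p1", "p1", "p2", "p3", "p3", "p4", "p5", "p6"]
    # One repetition; averaging over 50 runs could use seeds 0..49.
    for train, test in participant_folds(ids, k=5, seed=0):
        assert not {ids[i] for i in train} & {ids[i] for i in test}
        print(sorted({ids[i] for i in test}))
```

Repeating the split with different seeds and averaging the resulting metrics is one way to obtain a mean and standard deviation over runs like those plotted in the figure.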
Fig. 2 Classification performance for binary classification in each trimester.
a Separation of the data into training and test sets. b Bars show the mean of 50 runs; vertical lines denote the standard deviation. The top bars in panel b show results for HV vs. MD classification, the middle bars for HV vs. SSD, and the bottom bars for schizophrenia spectrum disorders (SSD) vs. MD.
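The performance reported in the abstract is an AUC. A minimal sketch of how an AUC and its mean and standard deviation over repeated runs can be computed, using the rank (Mann–Whitney) form of the AUC. Which group is treated as the positive class, and whether scores are per participant or per message, are assumptions not settled by this excerpt; `roc_auc` and `summarize_runs` are illustrative names.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via its rank (Mann-Whitney) form: the probability that a
    randomly chosen positive scores above a randomly chosen negative,
    counting ties as one half. O(n_pos * n_neg), fine for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_runs(aucs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across repeated CV runs."""
    return mean(aucs), stdev(aucs)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to chance ranking, which gives a reference point for the 0.72 to 0.77 values reported.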
Performance metrics for the three-way classification.
| Group | Accuracy | F1 | Chance |
|---|---|---|---|
| SSD | 0.52 | 0.53 | 0.33 |
| MD | 0.57 | 0.57 | 0.37 |
| HV | 0.56 | 0.54 | 0.29 |
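The three-way table reports a per-class accuracy and F1. A minimal sketch of one-vs-rest metrics per class from true and predicted labels, assuming "accuracy" for a class means one-vs-rest accuracy (the excerpt does not define it, nor how the chance column was derived, so chance is not computed here). `per_class_metrics` is a hypothetical helper.

```python
from typing import Dict, Hashable, Sequence


def per_class_metrics(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    classes: Sequence[Hashable],
) -> Dict[Hashable, Dict[str, float]]:
    """One-vs-rest accuracy, precision, recall and F1 for each class."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    out: Dict[Hashable, Dict[str, float]] = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        tn = n - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        out[c] = {
            "ovr_accuracy": (tp + tn) / n,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return out


if __name__ == "__main__":
    y_true = ["SSD", "SSD", "MD", "MD", "HV", "HV"]
    y_pred = ["SSD", "MD", "MD", "MD", "HV", "SSD"]
    for cls, m in per_class_metrics(y_true, y_pred, ["SSD", "MD", "HV"]).items():
        print(cls, round(m["f1"], 2))
```

Per-class F1 balances precision and recall, so a class that is over-predicted (like MD in the toy example) can still score well on recall while losing precision.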
Fig. 3 Boxplots showing significantly different feature distributions for images posted to Facebook.
Each box spans the 25th to the 75th percentile; the median (50th percentile) is indicated with a horizontal line, and the mean is represented by a square inside the box.
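The image findings concern photo height and width and the amount of blue and yellow. A minimal sketch of such features from raw RGB pixels using the standard-library `colorsys` module. The hue bands for "blue" and "yellow" and the saturation cut-off are assumptions, not values from the paper, and `image_features` is a hypothetical helper. With Pillow, the pixel rows could come from `Image.open(...).convert("RGB")` and its `getdata()` output.

```python
import colorsys
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB

# Assumed hue bands in degrees; not taken from the paper.
HUE_BANDS = {"yellow": (45.0, 75.0), "blue": (195.0, 255.0)}
MIN_SATURATION = 0.2  # assumption: ignore near-gray pixels when assigning hues


def image_features(rows: List[List[Pixel]]) -> Dict[str, float]:
    """Height, width, and the fraction of all pixels falling in each hue band."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    counts = {name: 0 for name in HUE_BANDS}
    total = 0
    for row in rows:
        for r, g, b in row:
            h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            total += 1
            if s < MIN_SATURATION:
                continue  # grays still count toward the denominator
            hue_deg = h * 360.0
            for name, (lo, hi) in HUE_BANDS.items():
                if lo <= hue_deg < hi:
                    counts[name] += 1
    feats = {"height": float(height), "width": float(width)}
    for name, c in counts.items():
        feats[f"{name}_fraction"] = c / total if total else 0.0
    return feats


if __name__ == "__main__":
    img = [[(0, 0, 255), (255, 255, 0)], [(0, 0, 255), (128, 128, 128)]]
    print(image_features(img))
```

Hue-based bands are only one way to operationalize "more blue, less yellow"; other color spaces would give different feature values.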
Fig. 4 Statistically significant differences in linguistic features among groups, depicted in a conceptual grouping.
Boxplots show distributions for the three classes for a representative case of each “cloud”.
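The linguistic features are LIWC categories (for example perception words such as hear, see, feel, and biological-process words such as blood, pain). A minimal sketch of a LIWC-style category rate, computed as the percentage of tokens that match a category's word list with `*` as a prefix wildcard. The real LIWC dictionaries are licensed and far larger; `CATEGORIES` here is a hypothetical mini-dictionary built only from the example words in the abstract, and `category_rates` is an illustrative name.

```python
import re
from typing import Dict, List

# Hypothetical mini-dictionary; real LIWC categories contain many more entries.
CATEGORIES: Dict[str, List[str]] = {
    "perception": ["hear*", "see*", "feel*"],
    "biological": ["blood*", "pain*"],
}
TOKEN_RE = re.compile(r"[a-z']+")


def _matches(token: str, pattern: str) -> bool:
    """A trailing '*' matches any token with that prefix; otherwise exact match."""
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern


def category_rates(text: str) -> Dict[str, float]:
    """Percentage of tokens in `text` that fall into each category."""
    tokens = TOKEN_RE.findall(text.lower())
    rates: Dict[str, float] = {}
    for cat, patterns in CATEGORIES.items():
        hits = sum(1 for t in tokens if any(_matches(t, p) for p in patterns))
        rates[cat] = 100.0 * hits / len(tokens) if tokens else 0.0
    return rates


if __name__ == "__main__":
    print(category_rates("I can hear them and I feel pain"))
```

Prefix wildcards are crude: `see*` also matches "seem", so a real dictionary needs curated entries.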
Fig. 5 Boxplots showing statistically significantly different feature distributions.
Each box spans the 25th to the 75th percentile; the median (50th percentile) is indicated with a horizontal line, and the mean is represented by a square inside the box.
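The boxplots summarize each feature by its quartiles plus the mean. A minimal sketch of those summary values with the standard-library `statistics` module, assuming the linear-interpolation ("inclusive") quartile method; the method used for the figures isn't stated, and other methods shift the box edges slightly. `box_summary` is an illustrative name.

```python
from statistics import mean, quantiles
from typing import Dict, Sequence


def box_summary(values: Sequence[float]) -> Dict[str, float]:
    """Box edges (25th/75th percentiles), median line, and mean marker."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3, "iqr": q3 - q1, "mean": mean(values)}


if __name__ == "__main__":
    print(box_summary([1, 2, 3, 4, 5, 6, 7, 8]))
```

Showing the mean alongside the median makes skew visible: the two coincide for symmetric data but separate for the heavy-tailed counts typical of social media activity.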
Kolmogorov–Smirnov (KS) test results, significance, and effect sizes for features that are statistically significant in at least one of the three comparisons.
| LIWC feature | HV vs. MD: KS | HV vs. MD: P | HV vs. MD: Cohen’s d | HV vs. SSD: KS | HV vs. SSD: P | HV vs. SSD: Cohen’s d | MD vs. SSD: KS | MD vs. SSD: P | MD vs. SSD: Cohen’s d |
|---|---|---|---|---|---|---|---|---|---|
| Total pronouns | 0.10 | 0.03 | 0.08 | 1.77E-01 | 0.07 | ||||
| Personal pronouns | 0.23 | 4.47E-09 | 0.16 | 0.15 | 0.05 | 6.39E-01 | 0.00 | ||
| First person singular | 0.26 | 0.17 | 1.32E-05 | 0.17 | 0.08 | 1.61E-01 | 0.09 | ||
| Second person | 0.09 | 9.55E-02 | 0.04 | 0.28 | 0.20 | 3.39E-07 | 0.31 | ||
| Negation | 0.53 | 0.23 | 0.14 | 7.20E-04 | 0.27 | ||||
| Adjectives | 0.09 | 1.03E-01 | 0.04 | 0.22 | 0.16 | 1.41E-04 | 0.17 | ||
| Interrogatives | 0.20 | 7.25E-07 | 0.16 | 0.32 | 0.17 | 3.74E-05 | 0.17 | ||
| Numerals | 0.53 | 0.15 | 2.30E-04 | 0.08 | 0.11 | 1.68E-02 | 0.27 | ||
| Negative emotions | 0.41 | 0.14 | 8.70E-04 | 0.09 | 0.15 | 2.03E-04 | 0.27 | ||
| Anger | 0.62 | 0.28 | 0.13 | 4.21E-03 | 0.36 | ||||
| Family | 0.17 | 2.70E-05 | 0.05 | 0.15 | 0.12 | 7.53E-03 | 0.25 | ||
| Perception | 0.07 | 2.27E-01 | 0.00 | 0.08 | 0.05 | ||||
| Biological process | 0.29 | 0.10 | 3.89E-02 | 0.11 | 0.18 | 4.71E-06 | 0.42 | ||
| Sexual | 0.54 | 0.32 | 0.13 | 3.75E-03 | 0.14 | ||||
| Informal | 0.23 | 2.53E-09 | 0.28 | 0.42 | 0.14 | 8.75E-04 | 0.15 | ||
| Swearing | 0.58 | 0.33 | 0.12 | 6.95E-03 | 0.25 | ||||
Cases that pass FDR correction (alpha = 0.01) are shown in bold.
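The table reports, for each LIWC feature and group pair, a two-sample KS statistic, its significance, and Cohen's d, with FDR correction at alpha = 0.01. A minimal plain-Python sketch of these three quantities, assuming Cohen's d uses the pooled standard deviation and the FDR procedure is Benjamini–Hochberg (the excerpt names neither). KS p-values are not computed here; in practice a library routine such as SciPy's `scipy.stats.ks_2samp` returns both the statistic and a p-value. All function names are illustrative.

```python
import math
from bisect import bisect_right
from statistics import mean, variance
from typing import List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS D = max |F_a(x) - F_b(x)|, checked at every observed x
    (the empirical CDFs only change value at observed points)."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean difference divided by the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled


def benjamini_hochberg(pvals: Sequence[float], alpha: float = 0.01) -> List[bool]:
    """Reject flags under Benjamini-Hochberg FDR control at level alpha:
    find the largest rank k with p_(k) <= k/m * alpha, reject ranks 1..k."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            cutoff = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= cutoff
    return reject


if __name__ == "__main__":
    print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))          # 0.5
    print(cohens_d([1, 2, 3], [2, 3, 4]))                     # -1.0
    print(benjamini_hochberg([0.001, 0.02, 0.004, 0.5]))      # [True, False, True, False]
```

The KS statistic is sensitive to any difference in distribution shape, while Cohen's d only reflects a shift in means, so the two can disagree for the same feature.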
Fig. 6 Statistically significant differences among groups as a function of time.
Kolmogorov–Smirnov (KS) statistic showing how the differences among groups change as a function of time (distance to the hospitalization date). Each panel represents a single feature.