Nicholas J L Brown, James C Coyne.
Abstract
We comment on Eichstaedt et al.'s (2015a) claim to have shown that language patterns among Twitter users, aggregated at the level of US counties, predicted county-level mortality rates from atherosclerotic heart disease (AHD), with "negative" language being associated with higher rates of death from AHD and "positive" language associated with lower rates. First, we examine some of Eichstaedt et al.'s apparent assumptions about the nature of AHD, as well as some issues related to the secondary analysis of online data and to considering counties as communities. Next, using the data files supplied by Eichstaedt et al., we reproduce their regression- and correlation-based models, substituting mortality from an alternative cause of death (namely, suicide) as the outcome variable, and observe that the purported associations between "negative" and "positive" language and mortality are reversed. We identify numerous other conceptual and methodological limitations that call into question the robustness and generalizability of Eichstaedt et al.'s claims, even when these are based on the results of their ridge regression/machine learning model. We conclude that there is no good evidence that analyzing Twitter data in bulk in this way can add anything useful to our ability to understand geographical variation in AHD mortality rates.
Keywords: Artifacts; Big data; Emotions; False positives; Heart disease; Language; Risk factors; Social media; Well-being
Year: 2018 PMID: 30258732 PMCID: PMC6152451 DOI: 10.7717/peerj.5656
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Correlations between self-harm and Twitter language measured by dictionaries.
| Language variable | r | p | 95% CI |
|---|---|---|---|
| Risk factors | |||
| Anger | −0.169 | <.001 | [−0.238, −0.099] |
| Negative relationships | −0.095 | .010 | [−0.166, −0.023] |
| Negative emotions | −0.102 | .005 | [−0.173, −0.030] |
| Disengagement | 0.008 | .831 | [−0.064, 0.080] |
| Anxiety | −0.045 | .219 | [−0.117, 0.027] |
| Protective factors | |||
| Positive relationships | −0.001 | .976 | [−0.073, 0.071] |
| Positive emotions | 0.059 | .110 | [−0.013, 0.131] |
| Positive engagement | −0.031 | .393 | [−0.103, 0.041] |
Notes.
Following Eichstaedt et al. (2015a), the word love was removed from the dictionary for the Positive relationships variable. See discussion in the text.
Figure 1. Twitter topics highly correlated with age-adjusted mortality from self-harm; cf. Eichstaedt et al. (2015a).
(A–C) Topics positively correlated with county-level self-harm mortality: (A) Friends and family, r = .175. (B) Romantic love, r = .176. (C) Time spent with nature, r = .214. (D–F) Topics negatively correlated with county-level self-harm mortality: (D) Watching reality TV, r = −.200. (E) Binge drinking, r = −.249. (F) Baseball, r = −.317.
Partial correlations between atherosclerotic heart disease (AHD) mortality and Twitter language measured by dictionaries, across the northern and southern halves of the United States.
| Language variable | North: partial r | North: p | North: 95% CI | South: partial r | South: p | South: 95% CI |
|---|---|---|---|---|---|---|
| Risk factors | ||||||
| Anger | 0.240 | <.001 | [0.168, 0.310] | −0.020 | .604 | [−0.095, 0.056] |
| Negative relationships | 0.156 | <.001 | [0.081, 0.229] | 0.060 | .121 | [−0.016, 0.135] |
| Negative emotions | 0.108 | .005 | [0.032, 0.182] | 0.028 | .462 | [−0.047, 0.104] |
| Disengagement | 0.166 | <.001 | [0.092, 0.239] | −0.012 | .750 | [−0.088, 0.063] |
| Anxiety | 0.017 | .654 | [−0.058, 0.093] | 0.104 | .007 | [0.028, 0.178] |
| Protective factors | ||||||
| Positive relationships | −0.032 | .411 | [−0.107, 0.044] | 0.111 | .004 | [0.035, 0.185] |
| Positive emotions | −0.166 | <.001 | [−0.238, −0.091] | 0.082 | .034 | [0.006, 0.156] |
| Positive engagement | −0.192 | <.001 | [−0.264, −0.119] | 0.041 | .288 | [−0.035, 0.116] |
Notes.
Partial r: partial correlation coefficients obtained from a regression predicting AHD from the Twitter theme represented by the language variable, with county-level education and income as control variables.
Following Eichstaedt et al. (2015a), the word love was removed from the dictionary for the Positive relationships variable. See discussion in the text.
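The partial correlations described in the notes above can be illustrated with a minimal sketch. It uses the standard recursion on the number of control variables, which gives the same value as correlating the residuals of the two variables after each is regressed on the controls. The function names `pearson` and `partial_corr` and the eight-county toy values are assumptions made for illustration; they are not the authors' code or data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, controls):
    """Partial correlation of x and y given a list of control variables.

    Peels off one control at a time with the first-order formula.
    Undefined (raises an error) if a control is perfectly collinear
    with x or y.
    """
    if not controls:
        return pearson(x, y)
    *rest, z = controls
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy, made-up values for eight hypothetical counties (not the study data).
education = [1, 1, -1, -1, 1, 1, -1, -1]
income = [1, -1, -1, 1, 1, -1, -1, 1]
language = [2, 2, 0, 0, 0, 0, -2, -2]
mortality = [2, 0, 0, -2, 2, 0, 0, -2]

raw_r = pearson(language, mortality)
partial_r = partial_corr(language, mortality, [education, income])
print(f"raw r = {raw_r:.3f}, partial r = {partial_r:.3f}")
```

In this toy data the raw correlation between `language` and `mortality` is .5, but it is driven entirely by the shared dependence on `education`, so the partial correlation is zero up to floating-point rounding. This is the kind of confounding that the control variables are meant to remove.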
Figure 2. Difference between the two maps of AHD mortality rates (CDC-reported and Twitter-predicted) from Eichstaedt et al.’s (2015a) Figure 3.
Note: Green indicates a difference of 0–2 color-scale points (see discussion in the text) between the two maps; yellow, a difference of 3–5 points; red, a difference of 6 or more points. Of the 608 colored areas, 123 (20.2%) are red and 200 (32.9%) are yellow.
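The banding rule and percentages in the note can be checked with a short sketch. The function name `color_band` is hypothetical, and the green count is not stated in the note: it is inferred here by subtraction, on the assumption that every colored area falls into exactly one of the three bands.

```python
def color_band(diff):
    """Map a color-scale difference (in points) to the band named in the note."""
    if diff < 0:
        raise ValueError("difference must be non-negative")
    if diff <= 2:
        return "green"
    if diff <= 5:
        return "yellow"
    return "red"


TOTAL_AREAS = 608
counts = {"red": 123, "yellow": 200}  # counts reported in the note
# Assumption: the remaining areas are all green.
counts["green"] = TOTAL_AREAS - sum(counts.values())

# Share of colored areas in each band, in percent.
shares = {band: 100 * n / TOTAL_AREAS for band, n in counts.items()}
for band in ("green", "yellow", "red"):
    print(f"{band}: {counts[band]} areas ({shares[band]:.2f}%)")
```

Under that assumption, 285 areas are green, so more than half of the colored areas (323 of 608) differ between the two maps by three or more color-scale points.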