Joseph Gibbons, Robert Malouf, Brian Spitzberg, Lourdes Martinez, Bruce Appleyard, Caroline Thompson, Atsushi Nara, Ming-Hsiang Tsou.
Abstract
Several recent studies have applied sentiment lexicons to Twitter data to gauge local sentiment and, through it, to understand health behaviors and outcomes in local areas. While this research has demonstrated the potential of the approach, questions remain about the validity of Twitter mining and surveillance in local health research. First, how well does this approach predict health outcomes at very local scales, such as neighborhoods? Second, how robust are findings based on sentiment signals once spatial effects are taken into account? To evaluate these questions, we link 2,076,025 tweets from 66,219 distinct users in the city of San Diego, collected from 2014-12-06 to 2017-05-24, to the 500 Cities Project data and 2010-2014 American Community Survey data. We determine how well sentiment predicts self-rated mental health, sleep quality, and heart disease at the census tract level, controlling for neighborhood characteristics and spatial autocorrelation. We find that sentiment is related to some outcomes on its own, but these relationships disappear when controlling for other neighborhood factors. Evaluating our encoding strategy more closely, we discuss the limitations of existing measures of neighborhood sentiment and call for more attention to how race/ethnicity and socio-economic status play into inferences drawn from such measures.
Year: 2019 PMID: 31295294 PMCID: PMC6622529 DOI: 10.1371/journal.pone.0219550
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Number of tweets collected by census tract.
Descriptive statistics.
| Variable | N | Mean | St. Dev. | Min | Max |
|---|---|---|---|---|---|
| Self-Rated Mental Health | 281 | 10.694 | 3.750 | 0.000 | 20.600 |
| Poor Sleep | 281 | 33.139 | 7.940 | 0.000 | 44.200 |
| Chronic Heart Disease | 281 | 4.540 | 1.829 | 0.000 | 13.500 |
| havg (Hedonometer) | 281 | 5.985 | 0.100 | 5.566 | 6.262 |
| VADER | 281 | 0.000 | 1.000 | -2.140 | 3.640 |
| Insurance | 281 | 0.848 | 0.123 | 0.000 | 1.000 |
| Proportion Over 40 | 281 | 0.294 | 0.125 | 0.000 | 0.952 |
| Socio-economic Status | 281 | 0.605 | 2.287 | -4.359 | 5.667 |
| Proportion Nonwhite | 281 | 53.585 | 26.557 | 1.942 | 95.472 |
| Automobile Access | 281 | 566,819.200 | 232,542.900 | 21,912.440 | 1,269,017.000 |
| Rail Access | 281 | 22,828.960 | 24,012.750 | 0.000 | 170,026.200 |
| Intersection Density | 281 | 273.018 | 169.934 | 2.391 | 1,398.743 |
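The VADER row above has a mean of 0.000 and a standard deviation of 1.000, which is what a z-scored variable looks like. As a minimal sketch of that kind of standardization (not necessarily the authors' exact procedure), the following uses invented tract-level scores; `standardize` and `raw_scores` are hypothetical names introduced only for illustration.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


# Hypothetical tract-level sentiment scores, invented for illustration only.
raw_scores = [0.12, 0.18, 0.05, 0.22, 0.15]
z_scores = standardize(raw_scores)

# After standardization the values are in standard-deviation units.
print([round(z, 3) for z in z_scores])
```

After this transformation, a regression coefficient on the variable reads as the change in the outcome per one standard deviation of sentiment.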
Fig 2. havg by census tract.
Fig 3. Exploratory spatial data analysis.
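The abstract refers to controlling for spatial autocorrelation. One common exploratory statistic for this is global Moran's I; whether the authors used exactly this statistic is not stated in this record, so the sketch below is only an illustration. It assumes binary adjacency weights, and the `morans_i` function, the `chain` neighbor structure, and the values are all hypothetical.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary (0/1) adjacency weights.

    values: list of floats, one per areal unit.
    neighbors: dict mapping a unit index to the list of its neighbor indices.
    """
    n = len(values)
    mean_value = sum(values) / n
    dev = [v - mean_value for v in values]
    # Sum of deviation cross-products over all neighboring pairs (w_ij = 1).
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    # Sum of all weights (each adjacency is counted once per direction).
    total_weight = sum(len(nbrs) for nbrs in neighbors.values())
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Hypothetical layout: four tracts in a row, each adjacent to the next.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
smooth = morans_i([1.0, 2.0, 3.0, 4.0], chain)
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)
print(smooth, alternating)
```

A smooth gradient yields a positive value (similar neighbors), while an alternating pattern yields a negative one (dissimilar neighbors).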
Multiple regression results for health outcomes—Hedonometer.
| | Self-Rated Mental Health | | | Poor Sleep | | | Chronic Heart Disease | | |
|---|---|---|---|---|---|---|---|---|---|
| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) |
| havg (Hedonometer) | -1.294 | -0.421 | 0.093 | -2.118 | -0.931 | -0.060 | 0.094 | 0.110 | -0.025 |
| | (0.211) | (0.155) | (0.165) | (0.458) | (0.395) | (0.446) | (0.109) | (0.100) | (0.091) |
| Insurance | | | -0.203 | | | 0.333 | | | -0.495 |
| | | | (0.201) | | | (0.544) | | | (0.111) |
| Percent 50 and Over | | | -0.514 | | | -1.384 | | | 1.321 |
| | | | (0.164) | | | (0.446) | | | (0.098) |
| SES | | | -1.215 | | | -0.053 | | | -0.387 |
| | | | (0.294) | | | (0.759) | | | (0.158) |
| Percent Nonwhite | | | -0.244 | | | -2.023 | | | -0.019 |
| | | | (0.230) | | | (0.635) | | | (0.124) |
| Auto Transit Access | | | 0.068 | | | 0.392 | | | 0.121 |
| | | | (0.216) | | | (0.586) | | | (0.120) |
| Public Transit Access | | | 0.110 | | | 0.562 | | | -0.0004 |
| | | | (0.219) | | | (0.589) | | | (0.120) |
| Intersection Density | | | -0.025 | | | 0.022 | | | -0.053 |
| | | | (0.202) | | | (0.548) | | | (0.112) |
| Constant | 10.694 | 2.576 | 6.080 | 33.139 | 12.278 | 19.090 | 4.540 | 2.366 | 3.440 |
| | (0.210) | (0.486) | (0.678) | (0.457) | (1.941) | (2.365) | (0.109) | (0.338) | (0.320) |
| Observations | 281 | 281 | 281 | 281 | 281 | 281 | 281 | 281 | 281 |
| Log Likelihood | | -682.847 | -640.769 | | -939.967 | -920.524 | | -550.137 | -470.290 |
Note: *p<0.1; **p<0.05; ***p<0.01. Predictors are standardized. Standard errors in parentheses.
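As a rough reading aid for the Hedonometer table, the sketch below divides a coefficient by its standard error and converts the ratio to a two-sided p-value. It assumes a normal approximation, which may differ from the test the authors used; `z_and_p` is a hypothetical helper. The inputs are the sentiment coefficients and standard errors of models (1) and (3) for self-rated mental health, with model (3) read here as the specification that includes the neighborhood controls.

```python
import math


def z_and_p(coef, se):
    """Return (z, two-sided p-value) under a normal approximation."""
    z = coef / se
    # P(|Z| > |z|) for a standard normal equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# (coefficient, standard error) taken from the Hedonometer table above.
hedonometer = {"(1)": (-1.294, 0.211), "(3)": (0.093, 0.165)}
results = {model: z_and_p(c, s) for model, (c, s) in hedonometer.items()}

for model, (z, p) in results.items():
    print(model, round(z, 2), "p < 0.05" if p < 0.05 else "p >= 0.05")
```

Under this approximation the sentiment coefficient is clearly distinguishable from zero in model (1) but not in model (3), in line with the abstract's statement that the relationship is not present once other neighborhood factors are controlled for.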
Multiple regression results for health outcomes—VADER.
| | Self-Rated Mental Health | | | Poor Sleep | | | Chronic Heart Disease | | |
|---|---|---|---|---|---|---|---|---|---|
| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) |
| VADER | -0.892 | -0.285 | 0.102 | -1.411 | -0.558 | 0.158 | 0.137 | 0.147 | 0.029 |
| | (0.224) | (0.158) | (0.157) | (0.479) | (0.404) | (0.425) | (0.112) | (0.103) | (0.087) |
| Insurance | | | -0.205 | | | 0.341 | | | -0.492 |
| | | | (0.201) | | | (0.543) | | | (0.111) |
| Percent 50 and Over | | | -0.513 | | | -1.414 | | | 1.313 |
| | | | (0.164) | | | (0.444) | | | (0.097) |
| SES | | | -1.206 | | | -0.111 | | | -0.403 |
| | | | (0.290) | | | (0.746) | | | (0.155) |
| Percent Nonwhite | | | -0.248 | | | -2.047 | | | -0.026 |
| | | | (0.230) | | | (0.635) | | | (0.124) |
| Auto Transit Access | | | 0.068 | | | 0.388 | | | 0.120 |
| | | | (0.216) | | | (0.585) | | | (0.119) |
| Public Transit Access | | | 0.109 | | | 0.528 | | | -0.009 |
| | | | (0.218) | | | (0.589) | | | (0.120) |
| Intersection Density | | | -0.023 | | | 0.013 | | | -0.055 |
| | | | (0.202) | | | (0.547) | | | (0.112) |
| Constant | 10.694 | 2.350 | 6.109 | 33.139 | 11.428 | 19.013 | 4.540 | 2.367 | 3.438 |
| | (0.218) | (0.467) | (0.677) | (0.467) | (1.877) | (2.358) | (0.109) | (0.338) | (0.319) |
| Observations | 281 | 281 | 281 | 281 | 281 | 281 | 281 | 281 | 281 |
| Log Likelihood | | -684.661 | -640.718 | | -941.605 | -920.464 | | -549.719 | -470.272 |
Note: *p<0.1; **p<0.05; ***p<0.01. Predictors are standardized. Standard errors in parentheses.
Fig 4. Word shift graph.