Quynh C Nguyen, Kimberly D Brunisholz, Weijun Yu, Matt McCullough, Heidi A Hanson, Michelle L Litchman, Feifei Li, Yuan Wan, James A VanDerslice, Ming Wen, Ken R Smith.
Abstract
Neighborhood characteristics are increasingly connected with health outcomes. Social processes affect health through the maintenance of social norms, stimulation of new interests, and dispersal of knowledge. We created zip code level indicators of happiness, food, and physical activity culture from geolocated Twitter data to examine the relationship between these neighborhood characteristics and obesity and diabetes diagnoses (Type 1 and Type 2). We collected 422,094 tweets sent from Utah between April 2015 and March 2016. We leveraged administrative and clinical records on 1.86 million individuals aged 20 years and older in Utah in 2015. Individuals living in zip codes with the greatest percentage of happy and physically active tweets had lower obesity prevalence, accounting for individual age, sex, nonwhite race, Hispanic ethnicity, education, and marital status, as well as zip code population characteristics. More happy tweets and lower caloric density of food tweets in a zip code were associated with lower individual prevalence of diabetes. Results were robust in sibling random effects models that account for family background characteristics shared between siblings. Findings suggest the possible influence of sociocultural factors on individual health. The study demonstrates the utility and cost-effectiveness of utilizing existing big data sources to conduct population health studies.
Year: 2017 PMID: 29180792 PMCID: PMC5703998 DOI: 10.1038/s41598-017-16573-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic diagram of data collection and processing of Twitter data. Twitter data were collected using Twitter's Streaming API. Each tweet was processed to extract its sentiment as well as its food and physical activity mentions. The location from which the tweet was sent was used to assign the tweet to its corresponding zip code. We created neighborhood indicators by averaging, for each zip code, the sentiment of tweets, the frequency and type of food tweets, and the frequency of physical activity tweets. Blue arrows point to steps in processing individual tweets, while red arrows point to steps in processing aggregated tweets at regional levels.
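The aggregation step described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name and the record fields (`zip_code`, `is_happy`, `is_physical_activity`, `caloric_density`) are hypothetical, and each tweet is assumed to have already been classified and assigned to a zip code.

```python
from collections import defaultdict

def summarize_by_zip(tweets):
    """Aggregate tweet-level labels into zip code level indicators.

    Each tweet is a dict with hypothetical keys:
      zip_code (str), is_happy (bool), is_physical_activity (bool),
      caloric_density (float, or None when the tweet mentions no food).
    """
    # Group tweets by the zip code they were assigned to.
    by_zip = defaultdict(list)
    for tweet in tweets:
        by_zip[tweet["zip_code"]].append(tweet)

    summary = {}
    for zip_code, group in by_zip.items():
        n = len(group)
        food = [t["caloric_density"] for t in group
                if t["caloric_density"] is not None]
        summary[zip_code] = {
            "n_tweets": n,
            "pct_happy": 100 * sum(t["is_happy"] for t in group) / n,
            "pct_physical_activity":
                100 * sum(t["is_physical_activity"] for t in group) / n,
            # Left undefined when the zip code has no food tweets.
            "caloric_density": sum(food) / len(food) if food else None,
        }
    return summary

# Illustrative records only; these are not data from the study.
example = [
    {"zip_code": "84101", "is_happy": True, "is_physical_activity": False,
     "caloric_density": 200.0},
    {"zip_code": "84101", "is_happy": False, "is_physical_activity": True,
     "caloric_density": None},
    {"zip_code": "84601", "is_happy": True, "is_physical_activity": True,
     "caloric_density": 300.0},
]
result = summarize_by_zip(example)
```

Returning `None` for zip codes without food tweets is a design choice of this sketch; it keeps a missing caloric density distinct from a value of zero.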
Descriptive statistics for neighborhood and individual characteristics.
| | N | Mean (SD) |
|---|---|---|
| Neighborhood characteristics (zip code level) | | |
| % of tweets that are happy | 216 | 22.06 (11.13) |
| % of tweets about physical activity | 183 | 3.06 (3.18) |
| Caloric density of food tweets | 216 | 231.89 (94.94) |
| Individual characteristics | | |
| Age (years) | 1,968,451 | 46.23 (17.37) |
| % Female | 1,964,485 | 49.55 (50.00) |
| % Married | 1,786,137 | 61.82 (48.58) |
| % Nonwhite | 1,832,286 | 4.73 (21.23) |
| % White | 1,832,286 | 95.27 (21.23) |
| % Hispanic ethnicity | 1,703,412 | 11.05 (31.35) |
| % Less than high school | 1,208,101 | 9.11 (28.77) |
| % High school | 1,208,101 | 30.00 (45.83) |
| % Some college | 1,208,101 | 31.90 (46.61) |
| % College or greater | 1,208,101 | 28.99 (45.37) |
| % Obese | 1,952,993 | 28.75 (45.26) |
| % Diabetes | 1,968,451 | 5.22 (22.24) |
| Body mass index (kg/m²) | 1,952,993 | 27.67 (6.52) |
| Fasting glucose (mg/dL) | 137,589 | 94.91 (32.15) |
| Hemoglobin A1c (%) | 382,100 | 6.06 (2.17) |
Data sources: 422,904 geolocated tweets from Utah aggregated to the zip code level; Utah Population Database, University of Utah Health Science Center Data Warehouse; Intermountain Healthcare Data Warehouse.
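The SDs listed for the percentage rows of the table are what one would expect if each row is the mean of a 0/1 indicator expressed in percent, in which case the SD is 100·√(p(1−p)). The short check below illustrates that arithmetic. It is an interpretation of the table rather than something the source states, and the helper name `binary_sd_percent` is hypothetical.

```python
import math

def binary_sd_percent(pct):
    """SD, on the percent scale, of a 0/1 indicator whose mean is `pct` percent.

    Uses the population formula sqrt(p * (1 - p)); with samples in the
    millions the n - 1 correction is negligible.
    """
    p = pct / 100
    return 100 * math.sqrt(p * (1 - p))

# Means taken from the table above.
for label, pct in [("Female", 49.55), ("Obese", 28.75), ("Diabetes", 5.22)]:
    print(f"% {label}: {binary_sd_percent(pct):.2f}")
# prints:
# % Female: 50.00
# % Obese: 45.26
# % Diabetes: 22.24
```

These values agree with the tabulated SDs, so for the binary rows the SD carries no information beyond the percentage itself.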
Figure 2. Geographic distribution of Twitter-derived zip code characteristics, Utah. Zip code summaries of caloric density of tweets, percent of tweets about physical activity, and percent of tweets that are happy. Choropleth maps were created using ArcGIS Desktop Version 10.5 (ESRI, Redlands CA, http://www.esri.com/arcgis/about-arcgis) and the 2016 U.S. Census TIGER/Line Shapefiles (https://www.census.gov/geo/maps-data/data/tiger-line.html).
Twitter-derived predictors of adult obesity and diabetes (Type 1 and Type 2).ᵃ
| | Obese | Diabetes | Body mass index (kg/m²) | Fasting glucose (mg/dL) | HbA1c (%) |
|---|---|---|---|---|---|
| Model | Log Poisson regression (dichotomous outcome) | Log Poisson regression (dichotomous outcome) | Linear regression (continuous outcome) | Linear regression (continuous outcome) | Linear regression (continuous outcome) |
| Estimate | Prevalence ratio (95% CI)ᵇ | Prevalence ratio (95% CI)ᵇ | Beta (95% CI)ᵇ | Beta (95% CI)ᵇ | Beta (95% CI)ᵇ |
| Happy tweets | |||||
| 3rd tertile (highest) | 0.88 (0.81, 0.96) | 0.91 (0.83, 0.99) | −0.57 (−0.94, −0.21) | −1.44 (−2.43, −0.44) | −0.01 (−0.08, 0.07) |
| 2nd tertile | 0.94 (0.90, 0.98) | 0.97 (0.95, 0.99) | −0.34 (−0.56, −0.12) | −0.86 (−1.44, −0.28) | 0.01 (−0.09, 0.11) |
| Physical activity tweets | |||||
| 3rd tertile (highest) | 0.91 (0.85, 0.97) | 0.96 (0.88, 1.05) | −0.39 (−0.71, −0.07) | −0.87 (−2.30, 0.56) | −0.08 (−0.19, 0.03) |
| 2nd tertile | 0.96 (0.93, 1.00) | 1.00 (0.95, 1.05) | −0.16 (−0.33, 0.01) | −0.70 (−1.18, −0.21) | 0.02 (−0.06, 0.10) |
| Caloric density of food tweets | |||||
| 3rd tertile (highest) | 1.03 (0.98, 1.08) | 1.08 (1.01, 1.16) | 0.13 (−0.07, 0.34) | 0.80 (0.10, 1.49) | 0.02 (−0.04, 0.08) |
| 2nd tertile | 1.04 (1.00, 1.07) | 1.13 (1.06, 1.20) | 0.17 (0.03, 0.31) | 0.76 (0.11, 1.41) | 0.02 (−0.02, 0.06) |
| N | 1,855,768 | 1,866,509 | 1,855,768 | 131,015 | 362,035 |
ᵃ Data source for health outcome: Utah Population Database and Intermountain Healthcare Enterprise Data Warehouse on Utah adults 20 years and older.

ᵇ Adjusted regression models were run for each outcome separately. For dichotomous outcomes such as obesity and diabetes (0 = no; 1 = yes), log Poisson models were used. For continuous variables such as body mass index, linear regression was used. Models controlled for age, sex, nonwhite race, Hispanic ethnicity, education, and marital status, as well as the following zip code area characteristics: population density, percent of the population 65 years and older, percent Hispanic, percent black, and median household income. Indicator variables were created for missing data on covariates. Twitter-derived characteristics were categorized into tertiles, with the lowest tertile serving as the referent group. Standard errors were adjusted for clustering of values at the county level.
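Two steps from the footnote lend themselves to a short sketch: coding a zip code characteristic into tertiles with the lowest tertile as referent, and turning a log-scale coefficient into a prevalence ratio with a 95% confidence interval. The sketch below rests on assumptions: the source names neither the software nor the cut-point rule, all function names are hypothetical, and the interval is a plain Wald interval, whereas the footnote describes standard errors adjusted for county-level clustering.

```python
import math

def tertile_cutpoints(values):
    """Two cut points splitting the sorted values into three roughly equal groups.

    The cut-point rule is an assumption; the source does not specify one.
    """
    ordered = sorted(values)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]

def tertile_dummies(value, cutpoints):
    """Return (is_2nd_tertile, is_3rd_tertile); the lowest tertile is (0, 0)."""
    low_cut, high_cut = cutpoints
    if value < low_cut:
        return (0, 0)   # referent group
    if value < high_cut:
        return (1, 0)
    return (0, 1)

def prevalence_ratio(beta, se, z=1.96):
    """Exponentiate a log-scale coefficient and its Wald confidence limits."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Illustrative zip code values only; these are not data from the study.
pct_happy = [10.0, 15.0, 20.0, 25.0, 30.0, 35.0]
cuts = tertile_cutpoints(pct_happy)
coded = [tertile_dummies(v, cuts) for v in pct_happy]
```

With dummy coding of this kind, each reported prevalence ratio compares the 2nd or 3rd tertile with the lowest tertile, so a value below 1 indicates lower prevalence than in the referent zip codes.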
Sibling random effects model results: Twitter predictors of individual health outcomes.
| | Obese | Diabetes | Body mass index (kg/m²) |
|---|---|---|---|
| Model | Log Poisson regression | Log Poisson regression | Linear regression |
| Estimate | Prevalence ratio (95% CI)ᵇ | Prevalence ratio (95% CI)ᵇ | Beta (95% CI)ᵇ |
| Happy tweets | |||
| 3rd tertile (highest) | 0.92 (0.91, 0.93) | 0.93 (0.90, 0.95) | −0.40 (−0.44, −0.36) |
| 2nd tertile | 0.96 (0.95, 0.96) | 0.98 (0.96, 1.00) | −0.24 (−0.27, −0.20) |
| Physical activity tweets | |||
| 3rd tertile (highest) | 0.93 (0.92, 0.94) | 0.99 (0.96, 1.02) | −0.29 (−0.34, −0.25) |
| 2nd tertile | 0.96 (0.95, 0.97) | 1.00 (0.98, 1.03) | −0.15 (−0.19, −0.12) |
| Caloric density of food tweets | |||
| 3rd tertile (highest) | 1.02 (1.01, 1.03) | 1.06 (1.03, 1.08) | 0.07 (0.04, 0.11) |
| 2nd tertile | 1.03 (1.02, 1.04) | 1.10 (1.07, 1.13) | 0.16 (0.13, 0.19) |
| N | 944,309 | 946,324 | 944,309 |
ᵃ Data source for health outcome: Utah Population Database and Intermountain Healthcare Enterprise Data Warehouse on Utah adults 20 years and older.

ᵇ Adjusted regression models were run for each outcome separately. For dichotomous outcomes such as obesity and diabetes (0 = no; 1 = yes), log Poisson models were used. For continuous variables such as body mass index, linear regression was used. Models controlled for age, sex, nonwhite race, Hispanic ethnicity, education, and marital status, as well as the following zip code area characteristics: population density, percent of the population 65 years and older, percent Hispanic, percent black, and median household income. Twitter-derived characteristics were categorized into tertiles, with the lowest tertile serving as the referent group.
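One common way to write a sibling random effects model for a dichotomous outcome is a log Poisson model with a random intercept shared by siblings. The formulation below is an illustrative assumption, not the specification reported in the source: the symbols, the normal distribution of the random intercept, and the indexing (individual i in sibship j, living in zip code z) are all introduced here for exposition.

```latex
\log \mathrm{E}\left[ Y_{ij} \mid u_j \right]
  = \beta_0
  + \beta_2 \, T^{(2)}_{z(ij)}
  + \beta_3 \, T^{(3)}_{z(ij)}
  + \boldsymbol{\gamma}^{\top} \mathbf{X}_{ij}
  + u_j,
\qquad u_j \sim \mathcal{N}\!\left(0, \sigma_u^2\right)

\mathrm{PR}_{k} = \exp(\beta_k), \qquad k \in \{2, 3\}
```

Here T⁽²⁾ and T⁽³⁾ are indicators for the 2nd and 3rd tertiles of a Twitter-derived characteristic (lowest tertile as referent), X collects the individual and zip code covariates listed in the footnote, and u_j absorbs family background shared between siblings. For body mass index, the analogous linear model replaces the log link with the identity link, and β_k is read directly as a difference in kg/m².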