Ruchit Nagar, Qingyu Yuan, Clark C Freifeld, Mauricio Santillana, Aaron Nojima, Rumi Chunara, John S Brownstein.
Abstract
BACKGROUND: Twitter has shown some usefulness in predicting influenza cases on a weekly basis in multiple countries and on different geographic scales. Recently, Broniatowski and colleagues suggested Twitter's relevance at the city level for New York City. Here, we examine the case of New York City more closely by analyzing daily Twitter data from temporal and spatiotemporal perspectives. Through manual coding of all tweets, we also seek qualitative insights that can help direct future automated searches.
Keywords: Google Flu Trends; New York City; Twitter; influenza; infodemiology; mHealth; medical informatics; social media, natural language processing; spatiotemporal
Year: 2014 PMID: 25331122 PMCID: PMC4259880 DOI: 10.2196/jmir.3416
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Quality of the classified tweets and search query data.
| Tweet group^a | Percentage of tweets | Time series | Pearson correlation |
| Relevant | 0.907 | Infection | .763 |
| Self | 0.689 | RISH | .689 |
| Infection | 0.628 | Relevant | .687 |
| High | 0.497 | GSQ | .683 |
| Awareness | 0.279 | Self | .677 |
| Medium | 0.223 | Medium | .668 |
| Other | 0.219 | Other | .666 |
| Low | 0.188 | High | .665 |
| Irrelevant | 0.082 | RISM | .655 |
| | | RIOH | .616 |
| | | RAOM | .587 |
| RISH | 0.399 | Awareness | .549 |
| RASL | 0.107 | RASM | .545 |
| RISM | 0.100 | RISL | .542 |
| RAOM | 0.058 | RIOM | .511 |
| RIOH | 0.054 | Low | .451 |
| RISL | 0.041 | RAOH | .411 |
| RAOH | 0.040 | RASL | .351 |
| RASM | 0.037 | RASH | .322 |
| RAOL | 0.032 | RAOL | .277 |
| RIOM | 0.027 | RIOL | .254 |
| RIOL | 0.007 | Irrelevant | .213 |
| RASH | 0.005 | | |
^a Relevant (R), Awareness (A), Infection (I), Self (S), Other (O), High (H), Medium (M), Low (L).
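The Pearson correlations listed above compare one time series against another (presumably each tweet category against ILI-ED visits). A minimal sketch of that calculation follows; the function name `pearson_r` and the sample counts are hypothetical and are not taken from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical daily counts, for illustration only (not data from the study).
infection_tweets = [3, 5, 8, 13, 9, 6, 4]
ili_ed_visits = [40, 52, 70, 95, 80, 61, 45]
r = pearson_r(infection_tweets, ili_ed_visits)
print(round(r, 3))
```

The coefficient is unitless, so raw tweet counts and visit counts can be compared without rescaling.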
Figure 1. Time series comparisons between Tweet categories and ILI-ED visits.
Figure 2. Comparison of Infection tweets and Awareness-based data.
Augmented Dickey-Fuller (ADF) test of ILI^a, Twitter, and Google search query data.
| | ILI | | Twitter infection | | Google search query | |
| | t statistic | probability | t statistic | probability | t statistic | probability |
| ADF test | −1.902 | 0.331 | −2.569 | 0.101 | −2.844 | 0.054 |
| Critical value, 1% level | −3.462 | | −3.463 | | −3.463 | |
| Critical value, 5% level | −2.876 | | −2.876 | | −2.876 | |
| Critical value, 10% level | −2.574 | | −2.574 | | −2.574 | |
| | Non-stationary | | Non-stationary | | Non-stationary | |
^a ILI: influenza-like illness
^b Degrees of freedom=203
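The conclusions in the table above follow from comparing each ADF statistic with a critical value: the unit-root null is rejected only when the statistic is more negative than the critical value. The sketch below applies that rule to the reported numbers. Using the 5% level as the threshold is an assumption, and the helper name `adf_conclusion` is hypothetical.

```python
def adf_conclusion(test_statistic, critical_value):
    """Reject the unit-root null only if the statistic is below the critical value."""
    return "Stationary" if test_statistic < critical_value else "Non-stationary"

# ADF statistics and the 5% critical value as reported in the table above.
# Choosing the 5% level as the decision threshold is an assumption.
CRITICAL_5PCT = -2.876
reported = {
    "ILI": -1.902,
    "Twitter infection": -2.569,
    "Google search query": -2.844,
}

conclusions = {name: adf_conclusion(stat, CRITICAL_5PCT) for name, stat in reported.items()}
for name, verdict in conclusions.items():
    print(name, verdict)
```

Note that the Google search query statistic (−2.844) lies below the 10% critical value (−2.574) but above the 5% value, so its classification depends on the chosen level.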
Augmented Dickey-Fuller (ADF) test of ILI^a, Twitter, and Google search query data with first order lag.
| | ΔILI^b | | ΔTwitter infection^b | | ΔGoogle search query^b | |
| | t statistic | probability | t statistic | probability | t statistic | probability |
| ADF test | −12.544 | 0.000 | −19.358 | 0.000 | −6.920 | 0.000 |
| Critical value, 1% level | −3.463 | | −3.463 | | −3.463 | |
| Critical value, 5% level | −2.876 | | −2.876 | | −2.876 | |
| Critical value, 10% level | −2.574 | | −2.574 | | −2.574 | |
| | Stationary | | Stationary | | Stationary | |
^a ILI: influenza-like illness
^b Δ=first order lag
^c Degrees of freedom=202
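A minimal sketch of the Δ transformation, assuming that Δ denotes the first difference x_t − x_{t−1} (the footnote describes it as a first order lag). The function name `first_difference` and the sample series are hypothetical.

```python
def first_difference(series):
    """Return x_t - x_(t-1) for t = 1..n-1; the result is one element shorter."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

# Hypothetical trending series, for illustration only (not data from the study).
levels = [10, 12, 15, 19, 24]
changes = first_difference(levels)
print(changes)
```

Differencing drops one observation, which may be why the reported degrees of freedom fall from 203 to 202 between the two ADF tables.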
Results of model (1).
| Variable | Coefficient | Standard error | t statistic | Probability |
| Infection(−1) | −2.174 | 1.016 | −2.140 | 0.036 |
| ILI^a(−14) | 0.224 | 0.142 | 1.576 | 0.120 |
| AR^b(1) | 1.007 | 0.016 | 61.676 | 0.000 |
^a ILI: influenza-like illness
^b AR: auto-regressive
^c Degrees of freedom=188
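The value reported between the standard error and the probability for each variable is consistent with a t statistic, that is, the coefficient divided by its standard error. The sketch below recomputes that ratio from the rounded table entries; the helper name `t_statistic` is hypothetical, and small gaps are expected because the published coefficients and standard errors are rounded to three decimals.

```python
def t_statistic(coefficient, standard_error):
    """Ratio of a regression coefficient to its standard error."""
    if standard_error == 0:
        raise ValueError("standard error must be non-zero")
    return coefficient / standard_error

# (variable, coefficient, standard error, reported statistic) from the table above.
model_1 = [
    ("Infection(-1)", -2.174, 1.016, -2.140),
    ("ILI(-14)", 0.224, 0.142, 1.576),
    ("AR(1)", 1.007, 0.016, 61.676),
]

recomputed = {}
for name, coef, se, reported_t in model_1:
    recomputed[name] = t_statistic(coef, se)
    print(name, round(recomputed[name], 3), "reported:", reported_t)
```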
Results of model (3).
| Variable | Coefficient | Standard error | t statistic | Probability |
| GSQ^a(−3) | 0.069 | 0.031 | 2.218 | 0.030 |
| ILI^b(−14) | 0.212 | 0.147 | 1.444 | 0.154 |
| AR^c(1) | 0.690 | 0.125 | 5.515 | 0.000 |
| AR(2) | 0.315 | 0.127 | 2.476 | 0.016 |
^a GSQ: Google Trends search query
^b ILI: influenza-like illness
^c AR: auto-regressive
^d Degrees of freedom=188
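Both model tables use regressors written as Name(−k). Assuming this notation means the series shifted back by k daily observations, the regressors could be built as sketched below. The helper names `lag` and `design_rows` and the sample values are hypothetical; this is not the authors' estimation code.

```python
def lag(series, k):
    """Shift a series back by k steps; the first k positions have no lagged value."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return ([None] * k + list(series))[:len(series)]

def design_rows(target, regressors):
    """Pair each target value with its regressors, skipping rows with missing lags."""
    rows = []
    for t, y in enumerate(target):
        xs = [column[t] for column in regressors]
        if all(value is not None for value in xs):
            rows.append((y, xs))
    return rows

# Hypothetical daily values, for illustration only (not data from the study).
gsq = [1, 2, 3, 4, 5, 6]
ili = [10, 20, 30, 40, 50, 60]

gsq_lag3 = lag(gsq, 3)   # stands in for GSQ(-3)
ili_lag2 = lag(ili, 2)   # a shorter lag than ILI(-14), to keep the example small
rows = design_rows(ili, [gsq_lag3, ili_lag2])
print(rows)
```

Rows are kept only where every lagged value exists, so the longest lag determines how many leading observations are lost.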
MAPE^a scores for Infection tweet and GSQ^b models for ILI^c predictions.
| | Twitter model | | GSQ models | |
| | Durbin-Watson statistic | MAPE (static) | Durbin-Watson statistic | MAPE (static) |
| 1/06-1/12 | 2.00 | 4.7 | 2.04 | 5.5 |
| 1/13-1/19 | 2.11 | 6.9 | 2.13 | 15.8 |
| 1/20-1/26 | 2.16 | 11.8 | 2.16 | 12.4 |
| 1/27-2/02 | 2.07 | 10.4 | 2.04 | 11.3 |
| 2/03-2/09 | 2.09 | 8.2 | 2.06 | 7.9 |
| 2/10-2/16 | 2.08 | 14.8 | 2.05 | 15.2 |
| 2/17-2/23 | 2.08 | 15.3 | 2.05 | 14.5 |
^a MAPE: mean absolute percent error
^b GSQ: Google Trends search query
^c ILI: influenza-like illness
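The two diagnostics in this table have standard definitions: MAPE averages the absolute forecast error as a percentage of the observed value, and the Durbin-Watson statistic divides the sum of squared successive residual differences by the sum of squared residuals (values near 2 suggest little first-order autocorrelation). A minimal sketch under those definitions follows; the function names and sample numbers are hypothetical and are not data from the study.

```python
def mape(actual, forecast):
    """Mean absolute percent error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals must not all be zero")
    numerator = sum((curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:]))
    return numerator / denominator

# Hypothetical observed and predicted visit counts, for illustration only.
observed = [100, 200]
predicted = [110, 180]
print(round(mape(observed, predicted), 1))
print(durbin_watson([1, -1, 1, -1]))
```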
Figure 3. Predicted ILI-ED visits in red using the Infection tweets model (Model 2).
Figure 4. Right: Retrospective primary space-time cluster (p < .001) for high risk of tweeting flu infection-based content, determined by a Poisson model with cases as High Probability Flu Tweets and controls as Medium Probability Flu Tweets, aggregated by week, and with content-specific covariate weight in NYC during 10/15/2012-5/10/2013. Top left: Epicenter located at (40.685, -79.983) with 0.48 mile radius including places of mass gathering such as Barclays Center and Atlantic Avenue Terminal. Bottom left: Prospective approach to modeling weekly changes in local Infection-tweet spread.
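The cluster in Figure 4 is described by an epicenter and a radius. The sketch below shows only the geometric step of testing whether a geotagged tweet falls inside such a circular window, using the haversine great-circle distance. It is not the space-time scan statistic itself. The function names are hypothetical, the mean Earth radius of 3958.8 miles is an assumed constant, and the epicenter coordinates are copied as printed in the caption.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def in_cluster(lat, lon, center, radius_miles):
    """True if the point lies within radius_miles of the cluster center."""
    return haversine_miles(lat, lon, center[0], center[1]) <= radius_miles

# Epicenter and radius as printed in the Figure 4 caption.
EPICENTER = (40.685, -79.983)
RADIUS_MILES = 0.48

# Hypothetical tweet locations, for illustration only.
near = in_cluster(40.690, -79.983, EPICENTER, RADIUS_MILES)
far = in_cluster(40.695, -79.983, EPICENTER, RADIUS_MILES)
print(near, far)
```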