Chris Allen¹, Ming-Hsiang Tsou¹, Anoshe Aslam², Anna Nagel², Jean-Mark Gawron³
Abstract
Traditional methods for monitoring influenza are haphazard and lack fine-grained details regarding the spatial and temporal dynamics of outbreaks. Twitter gives researchers and public health officials an opportunity to examine the spread of influenza in real time and at multiple geographical scales. In this paper, we introduce an improved framework for monitoring influenza outbreaks using the social media platform Twitter. Relying on techniques from geographic information science (GIS) and data mining, we collected, filtered, and analyzed Twitter messages for the thirty most populous cities in the United States during the 2013-2014 flu season. The results of this procedure are compared with national, regional, and local flu outbreak reports, revealing a statistically significant correlation between the two data sources. The main contribution of this paper is a comprehensive data mining process that enhances previous attempts to accurately identify tweets related to influenza. Additionally, geographical information systems allow us to target, filter, and normalize Twitter messages.
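The abstract only outlines the filtering and normalization steps. As a rough illustration, the sketch below assumes a regular-expression keyword filter with a small exclusion list and defines a weekly tweet rate as flu-related tweets per 10,000 tweets geolocated to a city. The patterns, the `Tweet` record, and the per-10,000 scale are hypothetical choices made for this example; they are not the authors' data mining process or normalization.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical patterns for illustration only; they are not the paper's filter.
FLU_PATTERN = re.compile(r"\b(flu|influenza)\b", re.IGNORECASE)
# Vaccination mentions are excluded on the assumption that they rarely report illness.
EXCLUDE_PATTERN = re.compile(r"\bflu (shot|vaccine)s?\b", re.IGNORECASE)


@dataclass
class Tweet:
    city: str  # study city assigned by a geographic filter (assumed to run upstream)
    week: int  # epidemiological week number
    text: str


def is_flu_related(text: str) -> bool:
    """Keep tweets that mention flu or influenza but are not about vaccination."""
    if EXCLUDE_PATTERN.search(text):
        return False
    return FLU_PATTERN.search(text) is not None


def weekly_tweet_rates(tweets, per=10_000):
    """Map (city, week) to flu-related tweets per `per` tweets in that city and week."""
    totals = Counter((t.city, t.week) for t in tweets)
    flu = Counter((t.city, t.week) for t in tweets if is_flu_related(t.text))
    # Counter returns 0 for (city, week) keys with no flu-related tweets.
    return {key: per * flu[key] / n for key, n in totals.items()}
```

Word boundaries keep words such as "fluent" from matching. A production filter would need a much richer rule set or a trained classifier; this sketch only shows where filtering and normalization sit in the pipeline.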
Year: 2016; PMID: 27455108; PMCID: PMC4959719; DOI: 10.1371/journal.pone.0157734
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Summary of correlations between tweet rates and influenza-like illness (ILI) rates at the local, regional, and national levels.
| City | Correlation with local emergency ILI* | Correlation with local sentinel ILI | Correlation with regional ILI | Correlation with national ILI |
|---|---|---|---|---|
| Atlanta | NA | NA | 0.657 | 0.679 |
| Austin | NA | NA | 0.919 | 0.830 |
| Baltimore | NA | NA | 0.031 | -0.116 |
| Boston | 0.804 | 0.105 | 0.395 | 0.433 |
| Chicago | 0.804 | 0.636 | 0.771 | 0.782 |
| Cleveland | 0.784 | 0.605 | 0.819 | 0.822 |
| Columbus | 0.877 | -0.235 | 0.771 | 0.776 |
| Dallas | NA | NA | 0.702 | 0.797 |
| Denver | NA | 0.690 | 0.599 | 0.589 |
| Detroit | NA | 0.757 | 0.846 | 0.878 |
| El Paso | NA | NA | 0.422 | 0.563 |
| Fort Worth | NA | 0.855 | 0.659 | 0.734 |
| Houston | NA | NA | 0.845 | 0.663 |
| Indianapolis | NA | NA | 0.750 | 0.777 |
| Jacksonville | NA | NA | 0.787 | 0.778 |
| Los Angeles | NA | NA | 0.793 | 0.690 |
| Memphis | NA | NA | 0.850 | 0.854 |
| Milwaukee | NA | NA | 0.761 | 0.779 |
| Nashville | NA | 0.827 | 0.869 | 0.875 |
| New Orleans | NA | NA | 0.858 | 0.886 |
| New York | NA | 0.555 | 0.630 | 0.639 |
| Oklahoma City | NA | NA | 0.463 | 0.658 |
| Philadelphia | NA | NA | 0.718 | 0.624 |
| Phoenix | NA | NA | 0.820 | 0.685 |
| Portland | NA | NA | 0.837 | 0.725 |
| San Antonio | NA | NA | 0.824 | 0.809 |
| San Diego | 0.916 | 0.693 | 0.750 | 0.626 |
| San Francisco | NA | NA | 0.707 | 0.616 |
| San Jose | NA | NA | 0.715 | 0.653 |
| Seattle | NA | 0.830 | 0.807 | 0.665 |
| Washington DC | NA | NA | 0.756 | 0.578 |
* Emergency ILI reports were incomplete, so these correlations compare tweet rates only with the weeks for which ILI data were available. Boston is missing weeks 36–46, 48, 50, and 6–10; Chicago is missing weeks 36–40 and 6–10; Cleveland is missing weeks 36–39 and 4–10; Columbus is missing weeks 36–39 and 6–10; and San Diego is missing weeks 6–10.
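The footnote implies that each emergency-ILI correlation uses only the weeks with reported ILI values. A minimal sketch of such a pairwise-complete correlation follows, assuming Pearson's r (the table does not name the coefficient) and encoding missing ILI weeks as `None`; `pearson_available` is a hypothetical helper name.

```python
import math


def pearson_available(tweet_rates, ili_rates):
    """Pearson r between two weekly series, skipping weeks where ILI is missing.

    Both lists cover the same weeks in the same order; a missing ILI week is None.
    Returns None when fewer than two usable weeks remain or a series is constant.
    """
    if len(tweet_rates) != len(ili_rates):
        raise ValueError("series must cover the same weeks")
    # Drop weeks with no ILI report, keeping the tweet rate and ILI value aligned.
    pairs = [(x, y) for x, y in zip(tweet_rates, ili_rates) if y is not None]
    if len(pairs) < 2:
        return None
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)
```

Because each city loses a different set of weeks, the emergency-ILI coefficients are computed over different sample sizes and are not strictly comparable with one another.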
Fig 1. National ILI compared to the aggregated tweet rates for all study cities.
Fig 2. Local sentinel-provided ILI compared to the tweet rate for Fort Worth (a), Nashville (b), Cleveland (c), and Boston (d).
Correlation coefficients aggregated by region.
Regional ILI data provided by the CDC.
| Region | Correlation with regional ILI |
|---|---|
| Region 1 (Boston) | 0.445 |
| Region 2 (New York) | 0.643 |
| Region 3 (Baltimore, Philadelphia, Washington DC) | 0.504 |
| Region 4 (Atlanta, Jacksonville, Memphis, Nashville) | 0.899 |
| Region 5 (Chicago, Columbus, Cleveland, Detroit, Indianapolis, Milwaukee) | 0.903 |
| Region 6 (Austin, Dallas, El Paso, Fort Worth, Houston, Oklahoma City, New Orleans, San Antonio) | 0.892 |
| Region 7 (no data) | NA |
| Region 8 (Denver) | 0.541 |
| Region 9 (Los Angeles, Phoenix, San Diego, San Francisco, San Jose) | 0.887 |
| Region 10 (Portland, Seattle) | 0.928 |
Fig 3. Map showing the correlation rank for each region.
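Fig 3 maps a correlation rank for each region. Assuming rank 1 denotes the strongest correlation, the ranks can be derived from the regional table as follows; the values are rounded to three decimals, and `rank_regions` is a hypothetical helper name.

```python
# Regional correlations from the regional table, rounded to three decimals.
# Region 7 has no data and is represented as None.
REGION_CORR = {
    "Region 1": 0.445,
    "Region 2": 0.643,
    "Region 3": 0.504,
    "Region 4": 0.899,
    "Region 5": 0.903,
    "Region 6": 0.892,
    "Region 7": None,
    "Region 8": 0.541,
    "Region 9": 0.887,
    "Region 10": 0.928,
}


def rank_regions(corr):
    """Return {region: rank}, with 1 for the highest correlation; regions without data are skipped."""
    available = [(name, r) for name, r in corr.items() if r is not None]
    ordered = sorted(available, key=lambda item: item[1], reverse=True)
    return {name: rank for rank, (name, _) in enumerate(ordered, start=1)}
```

Under this assumption Region 10 (Portland, Seattle) ranks first and Region 1 (Boston) ranks last, while Region 7 stays unranked because it has no study city.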