Anoshé A Aslam, Ming-Hsiang Tsou, Brian H Spitzberg, Li An, J Mark Gawron, Dipak K Gupta, K Michael Peddecord, Anna C Nagel, Christopher Allen, Jiue-An Yang, Suzanne Lindsay.
Abstract
BACKGROUND: Existing influenza surveillance in the United States is focused on the collection of data from sentinel physicians and hospitals; however, the compilation and distribution of reports are usually delayed by up to 2 weeks. With the popularity of social media growing, the Internet is a source for syndromic surveillance due to the availability of large amounts of data. In this study, tweets, or posts of 140 characters or less, from the website Twitter were collected and analyzed for their potential as surveillance for seasonal influenza.
Keywords: Internet; Twitter; influenza; infodemiology; infoveillance; syndromic surveillance; tweets
Year: 2014 PMID: 25406040 PMCID: PMC4260066 DOI: 10.2196/jmir.3532
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Examples of valid and invalid tweets from the machine-learning classifier.
| Tweet text | Valid or Invalid |
| “I hate being sick with the flu” | Valid |
| “Not a good time to be hit by a flu” | Valid |
| “Been home sick with the flu the last 2 days” | Valid |
| “Getting my flu shot” | Invalid |
| “Now it’s my turn to have the stomach flu. Ugh” | Invalid |
| “Recipes for Foods That Fight The Flu [URL]” | Invalid |
Correlations between tweets and sentinel-provided ILI^a rates.^b
| City | All tweets (r) | Non-retweets (r) | Retweets (r) | P value^c | Tweets without a URL (r) | Tweets with a URL (r) | P value^d | Number of tweets |
| Boston | −.05 | −.19 | .08 | <.001 | .04 | −.13 | <.001 | 17,370 |
| Chicago | .33 | .50 | .04 | <.001 | .49^e | .25 | <.001 | 21,655 |
| Cleveland | .63^e | .74^e | .42 | <.001 | .56^e | .55^e | .703 | 6632 |
| Columbus | .01 | .05 | −.06 | .019 | −.04 | .08 | .001 | 3206 |
| Denver | .76^e | .64^e | .74^e | <.001 | .81^e | .63^e | <.001 | 5706 |
| Detroit | .81^e | .84 | .44 | <.001 | .62^e | .78^e | <.001 | 8417 |
| Fort Worth | .69^e | .73^e | .45^e | <.001 | .81^e | .62^e | <.001 | 4755 |
| Nashville-Davidson | .77^e | .74^e | .54^e | <.001 | .70^e | .66^e | <.001 | 5805 |
| New York | .44^e | .42 | .39 | <.001 | .32 | .44 | <.001 | 64,340 |
| San Diego | .78^e | .73^e | .41^e | <.001 | .69^e | .73^e | <.001 | 8002 |
^a ILI: influenza-like illness.
^b Correlation coefficients of all tweets and tweet categories with sentinel-provided ILI rates for each city. Comparisons between tweets and ILI began in Weeks 36-49 (weeks starting September 1, 2013 to starting November 24, 2013) as ILI data became available by city and ended in Week 9 (ending March 1, 2014).
^c This column displays the P values from Fisher’s z transformation comparing the correlation coefficients of non-retweets to retweets.
^d This column displays the P values from Fisher’s z transformation comparing the correlation coefficients of tweets without a URL to tweets with a URL.
^e Significant correlation coefficient (P<.05).
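The P values described in the footnotes above come from Fisher's z transformation, which compares two correlation coefficients. The following is a minimal sketch of one common form of that test, for two independent samples. This record does not state which variant or which sample sizes the authors used, so the function name `fisher_z_compare`, the sample sizes `n1` and `n2`, and the example inputs are illustrative assumptions.

```python
import math

def fisher_z_compare(r1, r2, n1, n2):
    """Two-sided test that two independent Pearson correlations differ.

    Assumption: the two samples are independent and n1, n2 > 3.
    Returns (z statistic, two-sided P value).
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 observations")
    z1 = math.atanh(r1)  # Fisher's z transformation of r1
    z2 = math.atanh(r2)  # Fisher's z transformation of r2
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail of the standard normal
    return z, p

# Hypothetical inputs: two coefficients, each assumed to come from 26 weekly points.
z, p = fisher_z_compare(0.73, 0.41, 26, 26)
print(f"z = {z:.2f}, P = {p:.3f}")
```

Tweet categories within a city are all correlated with the same ILI series, so a test for dependent correlations may fit better than this independent-samples form.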
Correlations between valid tweets and sentinel-provided ILI^a rates.^b
| City | All tweets (r) | All tweets (number) | All tweets (P value) | Valid tweets (r) | Valid tweets (number) | Valid tweets (P value) | P value (all vs valid tweets) |
| Boston | −.05 | 17,370 | .834 | .10 | 3813 | .67 | <.001 |
| Chicago | .33 | 21,655 | .139 | .64 | 5116 | .002 | <.001 |
| Cleveland | .63 | 7152 | .002 | .60 | 1497 | .003 | .064 |
| Columbus | .01 | 3288 | .978 | −.24 | 1034 | .274 | <.001 |
| Denver | .76 | 5706 | .003 | .69 | 1942 | .009 | <.001 |
| Detroit | .81 | 8417 | .001 | .76 | 2195 | <.001 | <.001 |
| Fort Worth | .69 | 4755 | .001 | .85 | 1236 | <.001 | <.001 |
| Nashville-Davidson | .77 | 5805 | .001 | .83 | 1630 | <.001 | <.001 |
| New York | .44 | 64,340 | .047 | .55 | 12,632 | .01 | <.001 |
| San Diego | .78 | 8002 | .001 | .88 | 1808 | <.001 | <.001 |
^a ILI: influenza-like illness.
^b Correlation coefficients between all tweets and valid tweets, as identified by the machine-learning classifier, with sentinel-provided ILI rates for each city. Comparisons between tweets and ILI began in Weeks 36-49 (weeks starting September 1, 2013 to starting November 24, 2013) as ILI data became available by city and ended in Week 9 (ending March 1, 2014).
Figure 1. “Valid” tweet rates per 100,000 versus sentinel-provided influenza-like illness rates by city, 2013-14 influenza season.
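The comparison in this figure and in the correlation tables rests on two small computations: converting weekly tweet counts to a rate per 100,000 population, and correlating that series with the weekly ILI rate. The sketch below assumes a Pearson coefficient, which this record does not specify. The helper names `rate_per_100k` and `pearson_r`, the population figure, and the weekly values are all hypothetical and are not taken from the study.

```python
import math

def rate_per_100k(count, population):
    """Convert a raw count to a rate per 100,000 population."""
    return count / population * 100_000

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weekly data for one city (illustrative values only).
population = 1_300_000
weekly_valid_tweets = [12, 18, 25, 40, 66, 58, 31]
weekly_ili_rate = [1.1, 1.4, 1.9, 2.8, 4.5, 4.1, 2.3]

tweet_rates = [rate_per_100k(c, population) for c in weekly_valid_tweets]
print(f"r = {pearson_r(tweet_rates, weekly_ili_rate):.2f}")
```

Normalizing by population puts cities of very different size on a common scale for plotting; it does not change the within-city correlation, because Pearson's r is unaffected by rescaling either series by a positive constant.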
Correlations between tweet rates and emergency department ILI^a rates by city.^b
| City | All tweets (r) | Non-retweets (r) | Retweets (r) | P value^c | Tweets without a URL (r) | Tweets with a URL (r) | P value^d | Number of tweets |
| Boston | .23 | .47 | −.004 | <.001 | .03 | .41 | <.001 | 17,370 |
| Chicago | .51^e | .54^e | .23 | <.001 | .59^e | .45^e | <.001 | 21,655 |
| Cleveland | .68^e | .87^e | .39 | <.001 | .62^e | .58^e | .005 | 7152 |
| Columbus | .62^e | .54 | .61 | .018 | .62^e | .47^e | <.001 | 3288 |
| San Diego | .80^e | .92^e | .40^e | <.001 | .88^e | .79^e | <.001 | 8002 |
| Seattle | .72^e | .71^e | .67^e | .001 | .62^e | .71^e | <.001 | 9735 |
^a ILI: influenza-like illness.
^b Correlation coefficients of all tweets and tweet categories with emergency department ILI rates for each city. Comparisons between tweets and ILI began in Weeks 40-41 (weeks starting September 29, 2013 to starting October 6, 2013) as ILI data became available by city and ended in Week 9 (ending March 1, 2014).
^c This column displays the P values from Fisher’s z transformation comparing the correlation coefficients of non-retweets to retweets.
^d This column displays the P values from Fisher’s z transformation comparing the correlation coefficients of tweets without a URL to tweets with a URL.
^e Significant correlation coefficient (P<.05).
Correlations between valid tweets and emergency department ILI^a rates by city.^b
| City | All tweets (r) | All tweets (number) | All tweets (P value) | Valid tweets (r) | Valid tweets (number) | Valid tweets (P value) | P value (all vs valid tweets) |
| Boston | .23 | 17,370 | .411 | .61 | 3813 | .016 | <.001 |
| Chicago | .51 | 21,655 | .017 | .80 | 5116 | <.001 | <.001 |
| Cleveland | .68 | 7152 | <.001 | .75 | 1497 | <.001 | <.001 |
| Columbus | .62 | 3288 | .002 | .87 | 1034 | <.001 | <.001 |
| San Diego | .80 | 8002 | <.001 | .88 | 1808 | <.001 | <.001 |
| Seattle | .72 | 9735 | <.001 | .82 | 2941 | <.001 | <.001 |
^a ILI: influenza-like illness.
^b Correlation coefficients between all tweets and valid tweets, as identified by the machine-learning classifier, with emergency department ILI rates for each city. Comparisons between tweets and ILI began in Weeks 40-41 (weeks starting September 29, 2013 to starting October 6, 2013) as ILI data became available by city and ended in Week 9 (ending March 1, 2014).
Figure 2. “Valid” tweet rates per 100,000 versus emergency department influenza-like illness rates by city, 2013-14 influenza season.
Correlations between tweets and number of laboratory-confirmed influenza cases in San Diego.^a
| All tweets (r) | All tweets (P value) | Non-retweets (r) | Non-retweets (P value) | Retweets (r) | Retweets (P value) | Tweets without a URL (r) | Tweets without a URL (P value) | Tweets with a URL (r) | Tweets with a URL (P value) | Valid tweets (r) | Valid tweets (P value) |
| .88 | <.001 | .92 | <.001 | .40 | <.001 | .88 | <.001 | .79 | <.001 | .93 | <.001 |
^a Correlation coefficients for all tweets and all categories of tweets, including valid tweets with the number of laboratory-confirmed influenza cases in San Diego starting Week 40 (beginning October 6, 2013) through Week 9 (ending March 1, 2014).