Nigel Collier, Nguyen Truong Son, Ngoc Mai Nguyen.
Abstract
BACKGROUND: Micro-blogging services such as Twitter offer the potential to crowdsource epidemics in real time. However, Twitter posts ('tweets') are often ambiguous and reactive to media trends. To ground user messages in epidemic response, we focused on tracking reports of self-protective behaviour, such as avoiding public gatherings or increased sanitation, as the basis for further risk analysis.
Year: 2011 PMID: 22166368 PMCID: PMC3239309 DOI: 10.1186/2041-1480-2-S5-S9
Source DB: PubMed Journal: J Biomed Semantics
Example messages
| n | Message | A | I | P | W | S |
|---|---|---|---|---|---|---|
| e1 | home this weekend? i’ve been off work all week with the flu | + | - | - | - | + |
| e2 | there is alot more to preparing for Swine Flu than just washing your hands | - | - | - | - | - |
| e3 | everyone wash your hands.. no one wants swine flu | - | + | - | - | - |
| e4 | awl u need to go get to the doc so u dnt past da swine flu | - | - | - | - | - |
| e5 | it’s 2:10pm, I have flu and I’m still wearing my pajama | - | - | - | - | + |
| e6 | I have the flu. I had a normal flu shot | - | - | + | - | + |
| e7 | This guy has a nasty cough! Thank god he’s sitting far away from me - the swine flu travels | + | - | - | - | - |
| e8 | I’m sick too… cold or flu, I don’t know… I couldn’t go to work today… | + | - | - | - | + |
| e9 | Trivia for tonight has been cancelled due to flu bug | + | - | - | - | - |
| e10 | Feel like I’ve washed my hands a 1000 times Gotta love workin during cold & flu season | - | + | - | - | - |
| e11 | overhyped public scare. I want to remove this mask | - | - | - | + | - |
| e12 | i don’t know. she just keeps getting sick, but it’s not the flu. i hate keeping her off school | - | - | - | - | - |
| e13 | i feel terrible, don’t want to be at work, wish id never had the h1n1 jab | - | - | + | - | - |
| e14 | Some cleaning products were especially made to kill the H1N1 … | - | - | - | - | - |
| e15 | She has a surgical mask on in the movies I’m nervous hope it’s not h1n1 | - | - | - | - | - |
| e16 | regretting not getting a flu shot this year | - | - | - | - | - |
Positive (+) and negative (-) examples of classified messages.
Message frequency by class
| | A | I | P | W | S |
|---|---|---|---|---|---|
| Positive | 251 | 37 | 499 | 32 | 741 |
| Negative | 632 | 43 | 974 | 230 | 1873 |
| Total | 883 | 80 | 1443 | 262 | 2614 |
| Mean length | 109.2 | 118.8 | 107.0 | 117.3 | 100.9 |
| Sd. length | 28.9 | 21.9 | 30.6 | 27.7 | 33.4 |
| P/N ratio | 0.40 | 0.86 | 0.51 | 0.14 | 0.40 |
Message frequency in the training/testing corpus for self-protection classes. Mean message length, the standard deviation of message length, and the ratio of positive to negative messages are also shown.
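The ratio row can be rechecked directly from the positive and negative counts. The following is a minimal sketch, not part of the original study; the dictionary layout and the function name `summarise_class` are illustrative assumptions. Dividing positives by negatives reproduces the tabulated ratios. Summing the two counts for class P gives 1473 rather than the stated total of 1443, which suggests that one of those three figures may contain a typo.

```python
# Counts copied from the table above: class -> (positive, negative, stated total).
CLASS_COUNTS = {
    "A": (251, 632, 883),
    "I": (37, 43, 80),
    "P": (499, 974, 1443),
    "W": (32, 230, 262),
    "S": (741, 1873, 2614),
}


def summarise_class(positive, negative, stated_total):
    """Return (positive/negative ratio, recomputed total, totals agree?)."""
    recomputed = positive + negative
    ratio = round(positive / negative, 2)
    return ratio, recomputed, recomputed == stated_total


for label, counts in CLASS_COUNTS.items():
    ratio, total, consistent = summarise_class(*counts)
    note = "" if consistent else "  <- differs from stated total"
    print(f"{label}: ratio={ratio:.2f} total={total}{note}")
```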
Results for Naive Bayes classification
| Class / features | Precision | Recall | F1 |
|---|---|---|---|
| A | | | |
| UNI | 0.55 | 0.77 | 0.64 |
| UNI+SRL | 0.56 | 0.80 | 0.66 |
| UNI+BI | 0.54 | 0.80 | 0.65 |
| UNI+BI+SRL | 0.56 | 0.80 | 0.66 |
| I | | | |
| UNI | 0.54 | 0.57 | 0.55 |
| UNI+BI | 0.48 | 0.43 | 0.46 |
| P | | | |
| UNI | 0.60 | 0.80 | 0.68 |
| UNI+SRL | 0.61 | 0.81 | 0.70 |
| UNI+BI | 0.61 | 0.83 | 0.70 |
| UNI+BI+SRL | 0.62 | 0.84 | 0.71 |
| W | | | |
| UNI | 0.24 | 0.63 | 0.35 |
| UNI+SRL | 0.29 | 0.78 | 0.42 |
| UNI+BI | 0.25 | 0.72 | 0.37 |
| UNI+BI+SRL | 0.26 | 0.72 | 0.38 |
| S | | | |
| UNI | 0.53 | 0.70 | 0.61 |
| UNI+SRL | 0.59 | 0.74 | 0.65 |
| UNI+BI | 0.54 | 0.78 | 0.64 |
| UNI+BI+SRL | 0.57 | 0.78 | 0.66 |
Precision, recall and F1 results for tweet classification using Naive Bayes. UNI = unigram, BI = bigram, SRL = Simple Rule Language regular expression.
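To make the three feature families concrete, the sketch below builds a bag of unigram, bigram and rule-match features for a single message. It is only an illustration under stated assumptions: the tokenizer, the feature-name prefixes and the `hand_wash` pattern are hypothetical, since the study's actual preprocessing and SRL rules are not given in this section.

```python
import re
from collections import Counter

# Hypothetical rule standing in for an SRL pattern; not taken from the paper.
EXAMPLE_RULES = {
    "hand_wash": re.compile(r"\bwash(?:ed|ing)?\b.*\bhands?\b"),
}


def tokenize(message):
    """Lowercase the message and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", message.lower())


def extract_features(message, use_bigrams=True, rules=None):
    """Return a Counter of UNI=, BI= and SRL= features for one message."""
    tokens = tokenize(message)
    features = Counter("UNI=" + token for token in tokens)
    if use_bigrams:
        # Adjacent token pairs, joined with an underscore.
        features.update(
            "BI=" + first + "_" + second
            for first, second in zip(tokens, tokens[1:])
        )
    for name, pattern in (rules or {}).items():
        # A rule contributes one feature when its pattern matches anywhere.
        if pattern.search(message.lower()):
            features["SRL=" + name] += 1
    return features


example = "everyone wash your hands.. no one wants swine flu"
print(extract_features(example, rules=EXAMPLE_RULES))
```

Feature counts of this kind could then be passed to a Naive Bayes or SVM learner; a rule match is only one more signal for the classifier and does not decide the label by itself.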
Results for SVM classification
| Class / features | Precision | Recall | F1 |
|---|---|---|---|
| A | | | |
| UNI | 0.70 | 0.66 | 0.68 |
| UNI+SRL | 0.72 | 0.69 | 0.70 |
| UNI+BI | 0.71 | 0.70 | 0.70 |
| UNI+BI+SRL | 0.71 | 0.71 | 0.71 |
| I | | | |
| UNI | 0.62 | 0.70 | 0.66 |
| UNI+BI | 0.61 | 0.59 | 0.60 |
| P | | | |
| UNI | 0.65 | 0.84 | 0.73 |
| UNI+SRL | 0.65 | 0.85 | 0.74 |
| UNI+BI | 0.67 | 0.77 | 0.72 |
| UNI+BI+SRL | 0.67 | 0.78 | 0.72 |
| W | | | |
| UNI | 0.15 | 0.06 | 0.09 |
| UNI+SRL | 0.25 | 0.16 | 0.19 |
| UNI+BI | 0.15 | 0.06 | 0.09 |
| UNI+BI+SRL | 0.31 | 0.16 | 0.21 |
| S | | | |
| UNI | 0.64 | 0.59 | 0.61 |
| UNI+SRL | 0.68 | 0.72 | 0.70 |
| UNI+BI | 0.74 | 0.54 | 0.63 |
| UNI+BI+SRL | 0.78 | 0.60 | 0.67 |
Precision, recall and F1 results for tweet classification using SVM. UNI = unigram, BI = bigram, SRL = Simple Rule Language regular expression.
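F1 is the harmonic mean of precision and recall, so each F1 value in these tables can be approximately recovered from the other two columns. The sketch below assumes that standard definition; the function name `f1_score` is an illustrative choice. Because the tabulated precision and recall are already rounded to two decimals, recomputed values can differ from the printed F1 by about 0.01.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two SVM rows from the table above: (precision, recall, reported F1).
rows = {
    "A / UNI": (0.70, 0.66, 0.68),
    "W / UNI": (0.15, 0.06, 0.09),
}

for name, (precision, recall, reported) in rows.items():
    recomputed = f1_score(precision, recall)
    print(f"{name}: recomputed={recomputed:.3f} reported={reported:.2f}")
```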
Twitter positives versus CDC cases
| Wk | A | S | I | P | CDC |
|---|---|---|---|---|---|
| 46 | 49 | 48 | 22 | 222 | 2715 |
| 47 | 32 | 72 | 30 | 258 | 1408 |
| 48 | 24 | 49 | 9 | 181 | 997 |
| 49 | 35 | 41 | 10 | 199 | 610 |
| 50 | 35 | 39 | 10 | 154 | 480 |
| 51 | 21 | 35 | 12 | 150 | 251 |
| 52 | 19 | 26 | 4 | 37 | 285 |
| 1 | 25 | 32 | 6 | 63 | 266 |
| 2 | 25 | 32 | 5 | 81 | 261 |
| 3 | 29 | 31 | 7 | 73 | 317 |
| 4 | 29 | 20 | 7 | 62 | 268 |
| 5 | 29 | 23 | 6 | 46 | 290 |
Positively identified Tweets in the Edinburgh corpus shown against Influenza Positive tests reported to CDC by U.S. WHO/NREVSS collaborating laboratories, National Summary, 2009-2010. Counts for W were zero throughout and are therefore not shown. For week 46 we only have partial Twitter data available in the Edinburgh corpus.
Correlation between Twitter positives and CDC cases
| Category | Spearman’s Rho | p-value |
|---|---|---|
| A | 0.66 | 0.020 |
| S | 0.66 | 0.021 |
| I | 0.58 | 0.048 |
| P | 0.67 | 0.017 |
| A+I+P | 0.68 | 0.008 |
| A+I+P+S | 0.67 | 0.017 |
Correlation between the frequency of CDC A(H1N1) laboratory data for influenza 2009-2010 and aggregated counts of self-protection behaviour and self-reported diagnosis from tweets, measured with Spearman’s rank-order correlation coefficient. p-values are reported for a two-tailed test. Calculations were done using VassarStats (http://faculty.vassar.edu/lowry/corr_rank.html).
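The coefficients can be approximately rechecked from the weekly counts tabulated earlier. The sketch below is not the authors' procedure (they used VassarStats); it assumes the usual definition of Spearman's rho as the Pearson correlation of average ranks, and the function names are illustrative. It omits the p-value calculation. Run on the tabulated counts, it gives roughly 0.67 for P and 0.68 for A+I+P, in line with the table.

```python
from math import sqrt

# Weekly counts copied from the table of Twitter positives versus CDC cases
# (weeks 46-52 of 2009, then weeks 1-5 of 2010).
A = [49, 32, 24, 35, 35, 21, 19, 25, 25, 29, 29, 29]
I = [22, 30, 9, 10, 10, 12, 4, 6, 5, 7, 7, 6]
P = [222, 258, 181, 199, 154, 150, 37, 63, 81, 73, 62, 46]
CDC = [2715, 1408, 997, 610, 480, 251, 285, 266, 261, 317, 268, 290]


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


combined = [a + i + p for a, i, p in zip(A, I, P)]
print(f"P vs CDC:     rho = {spearman_rho(P, CDC):.2f}")
print(f"A+I+P vs CDC: rho = {spearman_rho(combined, CDC):.2f}")
```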