Ashlynn R Daughton, Michael J Paul.
Abstract
BACKGROUND: An estimated 3.9 billion individuals live in a location endemic for common mosquito-borne diseases. The emergence of Zika virus in South America in 2015 marked the largest known Zika outbreak and caused hundreds of thousands of infections. Internet data have shown promise in identifying human behaviors relevant for tracking and understanding other diseases.
Keywords: behavior; communicable diseases; epidemiology; information science; public health; social media; travel; travel-related illness; zika virus
Year: 2019 PMID: 31094347 PMCID: PMC6535980 DOI: 10.2196/13090
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Data processing and experimental overview. Dotted boxes show datasets and corresponding sizes where applicable. Solid boxes show methods used and reference relevant text figures or tables. Black arrows show the flow of data through the pipeline. The gray arrows denote that the final classifiers were used to identify first person, travel consideration, and travel change tweets from the keyword filtered tweets.
Label frequency (%), annotator agreement (Cohen’s κ), and example tweets for each classification category.
| Category | Example (paraphrased) | % (n/N) | κ |
| First person | When Zika explodes after the Olympics, I’m going to say I told you so! | 41.15% (823/2000) | .52 |
| Travel consideration | Thinking about going to Rio for honeymoon. Will I be safe with Zika? | 17.5% (350/2000) | .76 |
| Travel change | So mad I had to cancel my island babymoon because of Zika | 10.8% (216/2000) | .66 |
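The κ column reports Cohen's kappa, which corrects raw annotator agreement for the agreement expected by chance. As a minimal sketch of how such a value can be computed for two annotators, the function below uses only the standard library. The function name and the toy labels are illustrative assumptions and are not taken from the study's annotation data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations (1 = travel change, 0 = not); NOT study data.
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy case the annotators agree on 8 of 10 items, but because half of that agreement is expected by chance, kappa is noticeably lower than the raw 80% agreement.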
Final precision, recall, and F1 of the 3 classifiers.
| Classifier | Precision | Recall | F1 | F1 (no pipeline) |
| First person | 0.89 | 0.94 | 0.92 | 0.92 |
| Travel consideration | 0.61 | 0.74 | 0.67 | 0.63 |
| Travel change | 0.66 | 0.81 | 0.73 | 0.65 |
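F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the rounded precision and recall values in the table above; the function and variable names are illustrative assumptions. Because the inputs are already rounded to 2 decimals, the recomputed values may differ from the reported F1 in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
reported = {
    "First person": (0.89, 0.94),
    "Travel consideration": (0.61, 0.74),
    "Travel change": (0.66, 0.81),
}
recomputed = {
    name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()
}
for name, f1 in recomputed.items():
    print(f"{name}: F1 = {f1:.2f}")
```

Recomputing from the rounded inputs gives 0.91, 0.67, and 0.73. The 0.01 gap from the reported 0.92 for the first person classifier is consistent with rounding of the published precision and recall.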
Figure 2. Temporal trends in classifications by week.
Figure 3. Temporal trends in decisions to change international (outside of the United States) and domestic (within the United States) travel.
Figure 4. Weighted volume of classified tweets by modified US Department of Health and Human Services Region. Bars show median weighted volume. Error bars represent 95% confidence intervals obtained using weighted bootstrapped sampling.
Figure 5. Relative percent of women in a sample of Twitter (red), English Zika dataset (orange), travel consideration dataset (yellow), and the travel change dataset (blue). Bars show 95% weighted bootstrapped confidence intervals.
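The captions for Figures 4 and 5 mention 95% confidence intervals from weighted bootstrapped sampling without giving the procedure. One common way to obtain such an interval is a percentile bootstrap in which each observation is resampled with probability proportional to its weight. The sketch below illustrates that idea only; the function name, the weights, the number of resamples, and the toy values are all assumptions and are not the authors' implementation.

```python
import random
import statistics


def weighted_bootstrap_ci(values, weights, statistic=statistics.median,
                          n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI; items are drawn in proportion to weights."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    stats = []
    for _ in range(n_boot):
        # Resample n items with replacement, weighted by `weights`.
        sample = rng.choices(values, weights=weights, k=n)
        stats.append(statistic(sample))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.median(stats), lower, upper


# Hypothetical per-user tweet volumes and weights; NOT study data.
toy_values = [3, 5, 2, 8, 6, 4, 7, 5, 9, 1]
toy_weights = [1.0, 0.5, 2.0, 1.5, 1.0, 0.8, 1.2, 1.0, 0.6, 1.4]
point, lower, upper = weighted_bootstrap_ci(toy_values, toy_weights)
print(f"median = {point}, 95% CI = ({lower}, {upper})")
```

Passing a different `statistic`, such as a function that returns the share of women in a resample, would give an interval for a proportion rather than a median.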
Average percent of Linguistic Inquiry and Word Count category prevalence per group.
| Type | Category | All Twitter | Consideration | Change |
| Linguistic processes | Personal pronouns | 0.6080 | 0.7501 | |
| Linguistic processes | 1st singular | 0.2788 | 0.3214 | |
| Linguistic processes | 1st plural | 0.0458 | 0.0699 | |
| Linguistic processes | 3rd singular | 0.0692 | 0.0699 | |
| Linguistic processes | 3rd plural | 0.0474 | 0.0561 | 0.0571 |
| Linguistic processes | Past tense | 0.1794 | | |
| Linguistic processes | Future tense | 0.0648 | 0.0842 | 0.0871 |
| Linguistic processes | Present tense | 0.6053 | | |
| Psychological processes | Social processes | 0.7181 | | |
| Psychological processes | Affective processes | 0.6648 | 0.7362 | 0.7587 |
| Psychological processes | Positive emotion | 0.4323 | 0.5106 | 0.5105 |
| Psychological processes | Negative emotion | 0.2290 | 0.2225 | 0.2440 |
| Psychological processes | Anxiety | 0.0246 | 0.0331 | |
| Psychological processes | Tentativeness | 0.1556 | 0.2019 | 0.2075 |
| Psychological processes | Certainty | 0.1203 | 0.1437 | 0.1375 |
| Psychological processes | Inhibition | 0.0470 | 0.0633 | |
| Psychological processes | Biological processes | 0.2230 | 0.2712 | 0.2401 |
| Psychological processes | Body | 0.0705 | 0.0787 | 0.0674 |
| Psychological processes | Health | 0.0495 | 0.0744 | 0.0734 |
| Psychological processes | Sexual | 0.0857 | 0.0648 | 0.0526 |
| Other (non-Linguistic Inquiry and Word Count) | Pregnancy | 0.0004 | 0.0106 | |
a: Instances where there are significant differences from the random sample. Significance is estimated using an unpaired 2-sided t test with a significance level of P<.05 after Bonferroni correction.
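The Bonferroni correction mentioned in the footnote divides the significance level by the number of comparisons, so each individual test must clear a stricter threshold. The sketch below shows only that correction step and assumes the per-category P values have already been obtained from the t tests. The function name, category names, and P values are hypothetical and are not results from the study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests that remain significant after Bonferroni correction."""
    m = len(p_values)  # number of comparisons being corrected for
    threshold = alpha / m  # stricter per-test significance level
    return {name: p < threshold for name, p in p_values.items()}


# Hypothetical P values for three comparisons; NOT study results.
toy_p_values = {"category_a": 0.0004, "category_b": 0.20, "category_c": 0.03}
flags = bonferroni_significant(toy_p_values)
for name, is_significant in flags.items():
    print(f"{name}: significant after correction = {is_significant}")
```

With three comparisons the per-test threshold is 0.05/3, so a P value of .03 that would pass an uncorrected test is no longer counted as significant.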
The number of followees an individual user has who are also in the dataset, and the number of tweets that followees tweeted that are also in the dataset. We normalized to the number of total followees for each individual. Values in italics are significant (P≤.05).
| Metric | All Twitter, median (95% CI) | Consideration, median (95% CI) | Change, median (95% CI) |
| Number of followees (raw) | 92.8 (58.3-135.4) | 111.6 (71.1-170.9) | 122.2 (82.3-177.4) |
| Number of followees (normalized) | 0.08 (0.06-0.11) | | |
| Number of tweets (raw) | 93.6 (56.2-141.2) | 111.3 (67.7-169.8) | 122.7 (79.6-179.0) |
| Number of tweets (normalized) | 1.71 (1.02-2.62) | | |