Dasha Pruss, Yoshinari Fujinuma, Ashlynn R Daughton, Michael J Paul, Brad Arnot, Danielle Albers Szafir, Jordan Boyd-Graber.
Abstract
This work examines Twitter discussion surrounding the 2015 outbreak of Zika, a virus that is most often mild but has been associated with serious birth defects and neurological syndromes. We introduce and analyze a collection of 3.9 million tweets mentioning Zika geolocated to North and South America, where the virus is most prevalent. Using a multilingual topic model, we automatically identify and extract the key topics of discussion across the dataset in English, Spanish, and Portuguese. We examine the variation in Twitter activity across time and location, finding that rises in activity tend to follow major events, and that geographic rates of Zika-related discussion are moderately correlated with Zika incidence (ρ = .398).
Year: 2019 PMID: 31120935 PMCID: PMC6532961 DOI: 10.1371/journal.pone.0216922
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
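The number of tweets from each country or territory in our Americas dataset, along with the percentage in each language.
| Country or territory | Tweets | English (%) | Spanish (%) | Portuguese (%) |
|---|---|---|---|---|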
| United States | 2,275,072 | 90.69 | 4.23 | 0.91 |
| Venezuela | 402,489 | 3.45 | 90.93 | 0.58 |
| Brazil | 392,600 | 9.92 | 3.24 | 68.67 |
| Canada | 121,412 | 91.26 | 1.72 | 0.59 |
| Mexico | 98,121 | 11.10 | 80.77 | 0.66 |
| Argentina | 94,508 | 14.07 | 78.20 | 1.08 |
| Colombia | 93,338 | 10.12 | 83.37 | 0.62 |
| Chile | 55,376 | 8.54 | 85.92 | 0.71 |
| Dominican Republic | 48,199 | 8.82 | 85.63 | 0.46 |
| Ecuador | 47,415 | 8.24 | 87.96 | 0.42 |
| Puerto Rico | 41,032 | 25.33 | 67.67 | 1.96 |
| Honduras | 34,747 | 4.73 | 92.80 | 0.20 |
| Cuba | 27,898 | 5.74 | 89.63 | 1.23 |
| El Salvador | 26,692 | 8.32 | 87.63 | 0.37 |
| Jamaica | 22,201 | 90.60 | 1.24 | 2.22 |
| Peru | 18,354 | 12.49 | 76.44 | 5.36 |
| Paraguay | 16,230 | 10.02 | 81.84 | 2.21 |
| Guatemala | 15,887 | 9.42 | 84.21 | 0.57 |
| Uruguay | 14,892 | 6.13 | 83.11 | 3.24 |
| Nicaragua | 14,494 | 6.80 | 86.44 | 0.85 |
| Costa Rica | 14,358 | 14.88 | 79.89 | 0.73 |
| Panama | 10,483 | 19.27 | 76.22 | 0.34 |
| Bolivia | 6,174 | 11.71 | 82.22 | 1.00 |
| Trinidad & Tobago | 5,272 | 93.27 | 2.37 | 0.13 |
| Haiti | 3,835 | 50.38 | 1.75 | 0.23 |
| Martinique | 3,238 | 24.52 | 0.80 | 0.19 |
| Guadeloupe | 2,508 | 27.31 | 0.68 | 0.16 |
| Bahamas | 2,215 | 71.38 | 22.17 | 0.59 |
| Barbados | 2,147 | 87.61 | 4.15 | 2.10 |
| Suriname | 1,404 | 72.29 | 2.35 | 0.21 |
| Grenada | 1,111 | 93.61 | 2.70 | 0.00 |
| US Virgin Islands | 1,038 | 95.28 | 0.19 | 0.00 |
| Aruba | 987 | 31.81 | 25.63 | 1.11 |
| Guyana | 824 | 94.30 | 1.09 | 0.12 |
| St. Lucia | 776 | 86.47 | 0.64 | 0.90 |
| Cayman Islands | 749 | 91.99 | 0.93 | 0.27 |
| Belize | 652 | 82.36 | 12.58 | 0.31 |
| Antigua & Barbuda | 484 | 85.33 | 1.65 | 4.75 |
| Dominica | 457 | 94.31 | 1.31 | 0.22 |
| Turks & Caicos Islands | 183 | 93.99 | 1.09 | 0.00 |
| British Virgin Islands | 181 | 84.53 | 2.76 | 0.00 |
| Saint Vincent & Grenadines | 82 | 93.90 | 1.22 | 0.00 |
Topic model evaluation (NPMI and MTA) for different training conditions and different numbers of topics (K).
The two “MT” settings use a full machine translation system, while the “word replacement” approach approximates machine translation by simply replacing the words with entries in a bilingual dictionary.
| Training condition | K | NPMI Mean | NPMI SD | NPMI Max | MTA Mean | MTA SD | MTA Max |
|---|---|---|---|---|---|---|---|
| MT (all tweets) | 10 | .185 | .063 | .291 | 3.15 | 1.85 | 6.00 |
| MT (translations only) | 10 | .101 | .058 | .202 | 7.45 | 1.72 | 9.00 |
| Word replacement | 10 | .130 | .075 | .289 | 3.30 | 2.07 | 6.50 |
| MT (all tweets) | 25 | .112 | .082 | .295 | 3.30 | 2.44 | 7.50 |
| MT (translations only) | 25 | .111 | .062 | .247 | 7.44 | 1.41 | 9.50 |
| Word replacement | 25 | .126 | .086 | .327 | 3.28 | 2.13 | 7.50 |
| MT (all tweets) | 50 | .096 | .097 | .342 | 3.22 | 2.05 | 8.50 |
| MT (translations only) | 50 | .098 | .070 | .306 | 7.37 | 1.42 | 10.00 |
| Word replacement | 50 | .126 | .085 | .361 | 3.39 | 2.12 | 8.50 |
| MT (all tweets) | 75 | .150 | .086 | .424 | 2.77 | 1.89 | 9.00 |
| MT (translations only) | 75 | .089 | .076 | .346 | 7.12 | 1.51 | 10.00 |
| Word replacement | 75 | .124 | .085 | .363 | 3.61 | 2.08 | 9.00 |
| MT (all tweets) | 100 | .127 | .082 | .404 | 2.26 | 1.58 | 7.00 |
| MT (translations only) | 100 | .078 | .068 | .327 | 6.58 | 1.47 | 10.00 |
| Word replacement | 100 | .113 | .086 | .384 | 3.49 | 1.94 | 9.00 |
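The NPMI columns above report topic coherence. For a word pair, NPMI is the pointwise mutual information divided by -log p(w1, w2), which bounds it to [-1, 1]. The following is a minimal sketch, assuming probabilities are estimated from document-level co-occurrence and that a topic's score is the average over all pairs of its top words. The function name `topic_npmi`, the toy corpus, and the handling of pairs that never co-occur (scored -1) are illustrative assumptions, not details taken from the paper.

```python
import math
from itertools import combinations


def topic_npmi(top_words, documents):
    """Average NPMI over all pairs of a topic's top words.

    documents: list of token lists. Probabilities are the fraction of
    documents containing a word (or both words of a pair).
    Assumption: pairs that never co-occur are scored -1.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words")
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p12 = prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # never co-occur (assumed convention)
            continue
        denom = -math.log(p12)
        if denom == 0:
            scores.append(0.0)  # degenerate: both words in every document
            continue
        pmi = math.log(p12 / (prob(w1) * prob(w2)))
        scores.append(pmi / denom)
    return sum(scores) / len(scores)


# Toy corpus, invented for illustration only.
toy_docs = [
    ["zika", "virus", "mosquito"],
    ["zika", "virus"],
    ["olympics", "rio"],
    ["zika", "mosquito"],
]
print(round(topic_npmi(["zika", "virus"], toy_docs), 3))
```

Higher values indicate that a topic's top words tend to appear in the same documents more often than chance would predict.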
Fig 1. Crosslingual consistency versus number of translation pairs.
Learning curves showing how crosslingual consistency of topics, measured by MTA, varies with the number of pairs of translated documents in the corpus.
The top words representing 14 topics (labeled manually) aligned across three languages.
The numbers in parentheses after each language indicate the overall topic proportion (i.e., the average value of θ across documents in that language, where a higher value means it appears more in the corpus). The rank correlation (ρ) with per-country incidence rates is shown after each topic number.
| Top words (English) |
|---|
| bill conspiracy gates foundation rockefeller fear media theories people vaccine mosquitoes hoax |
| mosquitoes spraying genetically fight millions modified bees south florida mosquitos spray |
| microcephaly cases linked link syndrome study guillain disorder barre evidence colombia |
| blood test fda testing urine donations emergency saliva florida supply screening donated areas |
| mosquito repellent bug spray insect protect prevent bites mosquitoes repellents news deet |
| rio olympics fears olympic games concerns due mcilroy rory athletes day world golfer janeiro |
| brain study infection cells fetal damage microcephaly babies scientists researchers brains adult |
| vaccine trials human vaccines scientists develop race testing news development biotech research |
| bill funding senate gop house congress democrats planned parenthood pass republicans dems |
| abortion women pope contraception latin america birth crisis countries access rights control |
| birth defects born baby microcephaly linked babies defect related link brazil brain cdc severe |
| miami florida beach cases scott gov local county dade rick travel officials zone area governor |
| pregnant women travel areas cdc avoid affected countries sex pregnancy airlines latin hit |
| health emergency world public global organization declares international declared spread |
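The caption above refers to two quantities: the overall topic proportion (the mean of θ over the documents in one language) and a rank correlation ρ between a topic's per-country prevalence and per-country incidence. A minimal sketch of both follows, assuming ρ is Spearman's rank correlation with tied values given their average rank. The function names and the example numbers are hypothetical and are not the paper's data.

```python
def topic_proportion(thetas, topic):
    """Mean of theta[topic] over the documents of one language."""
    return sum(theta[topic] for theta in thetas) / len(thetas)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-country values, for illustration only.
topic_prevalence = [0.02, 0.05, 0.03, 0.08]
incidence_rate = [1.0, 12.0, 4.0, 30.0]
print(spearman_rho(topic_prevalence, incidence_rate))
```

Because the correlation is computed on ranks, it is insensitive to the very different scales of topic probabilities and incidence rates.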
Fig 2. Volume of Zika tweets across time and place.
Top: The adjusted volume of Zika-related tweets in five geographic regions per week. The country flags below the line plots indicate the time window in which each country reported its first Zika case. Bottom: Tweet volume in each country during six time windows spanning our data collection. Darker shading indicates higher volume; the color scale is logarithmic (base 2).
Fig 3. Topic prevalence by location.
Geospatial distribution of the topics from Table 3. Darker shading indicates a higher average probability of the topic in that location.
The top three topics with the highest average topic probabilities in each location.
| Country | Top Three Topics |
|---|---|
| Antigua and Barbuda | Microcephaly, Advisories, Environment |
| Argentina | Conspiracies, Environment, Olympics |
| Aruba | Emergency Declaration, Negative Effects, Microcephaly |
| Barbados | Conspiracies, Advisories, Vaccination |
| Belize | Florida Outbreak, Microcephaly, Advisories |
| Bolivia | Reproductive Health, Viral Testing, Emergency Declaration |
| Brazil | Viral Testing, Olympics, Research |
| British Virgin Islands | Florida Outbreak, Conspiracies, Environment |
| Canada | Olympics, Environment, Viral Testing |
| Cayman Islands | Environment, Florida Outbreak, Viral Testing |
| Chile | Reproductive Health, Mosquitos, Conspiracies |
| Colombia | Negative Effects, Reproductive Health, Olympics |
| Costa Rica | Reproductive Health, Olympics, Research |
| Cuba | Emergency Declaration, Vaccination, Olympics |
| Dominica | Emergency Declaration, Research, Negative Effects |
| Dominican Republic | Olympics, US Politics, Florida Outbreak |
| Ecuador | Olympics, Vaccination, Research |
| El Salvador | Olympics, Reproductive Health, Negative Effects |
| Grenada | Reproductive Health, Negative Effects, Advisories |
| Guadeloupe | Research, Advisories, Negative Effects |
| Guatemala | Negative Effects, Reproductive Health, Research |
| Guyana | Emergency Declaration, Advisories, Research |
| Haiti | Florida Outbreak, Microcephaly, Reproductive Health |
| Honduras | Mosquitos, US Politics, Environment |
| Jamaica | Negative Effects, Mosquitos, Microcephaly |
| Martinique | Advisories, Emergency Declaration, Florida Outbreak |
| Mexico | Viral Testing, Vaccination, Conspiracies |
| Nicaragua | Microcephaly, Vaccination, Viral Testing |
| Panama | Microcephaly, Vaccination, Negative Effects |
| Paraguay | Conspiracies, Olympics, Emergency Declaration |
| Peru | Reproductive Health, Vaccination, Research |
| Puerto Rico | US Politics, Environment, Florida Outbreak |
| Saint Vincent and the Grenadines | Mosquitos, Florida Outbreak, Reproductive Health |
| St. Lucia | Conspiracies, Mosquitos, Viral Testing |
| Suriname | Advisories, Research, Emergency Declaration |
| The Bahamas | Florida Outbreak, Environment, Mosquitos |
| Trinidad and Tobago | Emergency Declaration, Microcephaly, Mosquitos |
| Turks and Caicos Islands | Mosquitos, Advisories, Florida Outbreak |
| US Virgin Islands | Viral Testing, Advisories, Florida Outbreak |
| United States | US Politics, Florida Outbreak, Environment |
| Uruguay | Conspiracies, Vaccination, Olympics |
| Venezuela | Negative Effects, Vaccination, Research |
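A ranking like the one in the table above can be derived by averaging each topic's probability over the documents from a location and keeping the highest-scoring topics. The sketch below assumes each document comes with a location and a topic-probability vector; `top_topics_by_location`, the label list, and the sample vectors are illustrative and not taken from the paper.

```python
def top_topics_by_location(doc_topics, topic_labels, k=3):
    """Return the k topics with the highest mean probability per location.

    doc_topics: iterable of (location, theta) pairs, where theta is a list
    of topic probabilities aligned with topic_labels.
    """
    sums = {}
    counts = {}
    for location, theta in doc_topics:
        if location not in sums:
            sums[location] = [0.0] * len(topic_labels)
            counts[location] = 0
        for t, p in enumerate(theta):
            sums[location][t] += p
        counts[location] += 1

    result = {}
    for location, totals in sums.items():
        means = [s / counts[location] for s in totals]
        ranked = sorted(range(len(means)), key=lambda t: means[t], reverse=True)
        result[location] = [topic_labels[t] for t in ranked[:k]]
    return result


# Hypothetical documents, for illustration only.
labels = ["Olympics", "Research", "Vaccination", "Mosquitos"]
docs = [
    ("A", [0.1, 0.5, 0.3, 0.1]),
    ("A", [0.3, 0.3, 0.3, 0.1]),
    ("B", [0.7, 0.1, 0.1, 0.1]),
]
print(top_topics_by_location(docs, labels))
```

Locations with few tweets yield noisy averages, so rankings for the smallest territories should be read with more caution than those for large countries.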
Fig 4. Topic prevalence by time.
The volume and distribution of the topics from Table 3 per week. Top: the adjusted counts of each topic; dashed lines are added for readability and have no special meaning. Bottom: the adjusted counts normalized to sum to 1; the shaded gray area represents the combined proportion of the 36 topics other than the 14 topics in Table 3.
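The normalization described in this caption can be sketched as follows: for each week, every topic's adjusted count is divided by the week's total, and the topics not shown individually are pooled into a single share. The function name `weekly_topic_shares`, the `"other"` key, and the sample counts are assumptions for illustration; how the adjusted counts themselves are computed is not covered here.

```python
def weekly_topic_shares(weekly_counts, shown_topics):
    """Convert per-week topic counts into proportions that sum to 1.

    weekly_counts: list of dicts mapping topic name -> adjusted count.
    shown_topics: topics reported individually; the rest are pooled
    under the key "other".
    """
    shown = set(shown_topics)
    shares = []
    for counts in weekly_counts:
        total = sum(counts.values())
        if total == 0:
            shares.append({})  # no activity that week; nothing to normalize
            continue
        row = {t: counts.get(t, 0.0) / total for t in shown_topics}
        row["other"] = sum(v for t, v in counts.items() if t not in shown) / total
        shares.append(row)
    return shares


# Hypothetical adjusted counts for a single week.
week = {"Olympics": 30.0, "Research": 10.0, "topic_15": 60.0}
print(weekly_topic_shares([week], ["Olympics", "Research"]))
```

Plotting the shares rather than the raw counts makes shifts in what is being discussed visible even in weeks when overall volume is low.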