Abstract
As of July 17, 2020, more than thirteen million people have been diagnosed with the Novel Coronavirus (COVID-19), and half a million people have already lost their lives due to this infectious disease. The World Health Organization declared the COVID-19 outbreak a pandemic on March 11, 2020. Since then, social media platforms have experienced an exponential rise in content related to the pandemic. In the past, Twitter data have proved indispensable for extracting situational awareness information during crises. This paper presents the COV19Tweets Dataset (Lamsal 2020a), a large-scale Twitter dataset with more than 310 million COVID-19 specific English language tweets and their sentiment scores. The dataset's geo version, the GeoCOV19Tweets Dataset (Lamsal 2020b), is also presented. The paper discusses the design of the datasets in detail and analyzes the tweets in both datasets. The datasets are released publicly in the expectation that they will contribute to a better understanding of the spatial and temporal dimensions of the public discourse related to the ongoing pandemic. According to the access statistics, the datasets (Lamsal 2020a, 2020b) have collectively been accessed over 74.5k times.
© Springer Science+Business Media, LLC, part of Springer Nature 2020.
Keywords: Crisis computing; Network analysis; Sentiment analysis; Social computing; Twitter data
Year: 2020 PMID: 34764561 PMCID: PMC7646503 DOI: 10.1007/s10489-020-02029-z
Source DB: PubMed Journal: Appl Intell (Dordr) ISSN: 0924-669X Impact factor: 5.086
Fig. 1 Daily distribution of tweets in the COV19Tweets Dataset
Overview of the filtering keywords as of July 17, 2020
| In use since | Keywords |
|---|---|
| March 20, 2020 | corona, #corona, coronavirus, #coronavirus |
| April 18, 2020 | covid, #covid, covid19, #covid19, covid-19, #covid-19, sarscov2, #sarscov2, sars cov2, sars cov 2, covid_19, #covid_19, #ncov, ncov, #ncov2019, ncov2019, 2019-ncov, #2019-ncov, #2019ncov, 2019ncov |
| May 16, 2020 | pandemic, #pandemic, quarantine, #quarantine, flatten the curve, flattening the curve, #flatteningthecurve, #flattenthecurve, hand sanitizer, #handsanitizer, #lockdown, lockdown, social distancing, #socialdistancing, work from home, #workfromhome, working from home, #workingfromhome, ppe, n95, #ppe, #n95 |
A keyword preceded by a hash sign (#) is a hashtag.
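As a rough illustration of how a keyword list like the one above can be applied, the following minimal sketch checks whether a tweet's text matches any filtering keyword. It is not the authors' collection pipeline: the matching rule (every term of a keyword must appear as a whole token, and a plain keyword also matches the corresponding hashtag) is an assumption, and the names `KEYWORDS`, `tokens`, and `matches` are hypothetical.

```python
import re

# Subset of the filtering keywords listed in the table above.
KEYWORDS = [
    "corona", "#corona", "coronavirus", "#coronavirus",
    "covid19", "#covid19", "covid-19",
    "lockdown", "#lockdown",
    "social distancing", "flatten the curve", "n95",
]

def tokens(text):
    """Return the set of lowercase tokens in text.

    Hashtags are stored both with and without the leading '#', so that a
    plain keyword also matches its hashtag form (an assumed rule).
    """
    found = set()
    for tok in re.findall(r"#?[\w-]+", text.lower()):
        found.add(tok)
        found.add(tok.lstrip("#"))
    return found

def matches(text, keywords=KEYWORDS):
    """True if every term of at least one keyword appears in the text."""
    toks = tokens(text)
    return any(all(term in toks for term in kw.split()) for kw in keywords)

print(matches("New #COVID19 cases today"))        # hashtag matches 'covid19'
print(matches("Lovely weather in Kathmandu"))     # no keyword present
```

Matching on whole tokens rather than substrings avoids false positives such as "coronation" for the keyword "corona".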
Overview of the VM
| Resource | Description |
|---|---|
| CPU | Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Width: 64 bits, 2 vCPUs |
| Memory | Size: 4GiB |
| Disk type | Solid State Drive |
| Bandwidth (based on Speedtest CLI) | Download avg.: 2658.182 Mb/s Upload avg.: 2149.294 Mb/s |
Fig. 2 Resource utilization graphs for the VM (24 hours)
Fig. 3 Daily distribution of tweets in the GeoCOV19Tweets Dataset
Fig. 4 COVID-19 sentiment trend from April 24, 2020 to July 17, 2020
Trending unigrams and bigrams
| Date | Score | Unigrams | Bigrams |
|---|---|---|---|
| May 28, 2020 | −0.03 | deaths, people, trump, pandemic, cases, world, US, virus, health, UK, death, government, china, police | nursing_homes, covid_deaths, bad_gift, tested_positive, gift_china, death_rate, supreme_court, new_york, real_virus, covid_racism |
| June 01, 2020 | −0.05 | people, US, health, protests, care, cases, pandemic, home, testing, trump, black, virus, please, masks, curfew, tests | covid_testing, stay_home, testing_centers, impose_curfew, eight_pm, curfue_impose, fighting_covid, peaceful_protests, health_care, enough_masks, masks_ppe |
| June 14, 2020 | −0.11 | pandemic, people, children, cases, virus, staff, US, deaths, killed, worst, disease, beat, unbelievable | covid_blacks, latinx_children, unbelievable_asians, systematically_killed, exposed_corona, going_missing, staff_sitting, recovered_covid, worst_disease |
| June 21, 2020 | −0.02 | trump, people, pandemic, masks, rally, tulsa, cases, social, distancing, lockdown, died, hospital, mask, call | wearing_masks, social_distancing, wake_call, mother_died, still_arguing, tested_positive, trump_campaign, tulsa_rally, trump_rally |
| June 24, 2020 | −0.01 | pandemic, people, trump, cases, US, testing, lockdown, positive, lindsay, world, social, masks, president | covid_cases, social_distancing, last_year, drunk_driving, lindsay_richardson, tested_positive, wear_mask, america_recovering |
| July 06, 2020 | −0.02 | pandemic, people, trump, cases, lockdown, positive, US, virus, wear, social, distancing, mask | social_distancing, got_covid, severe_respiratory, respiratory_cardiovascular, wear_mask, kimberly_guilfoyle, donald_trump |
| July 10, 2020 | −0.01 | pandemic, coronavirus, people, cases, trump, control, lockdown, US, schools, students, deaths, masks, virus, home, government | control_covid, covid_cases, covid_schools, social_distancing, shake_hands, kneel_bow, hands_hug, vs_right, left_vs |
Score: the lowest average sentiment reached on the particular date. Unigrams: excluding the significantly dominating unigrams COVID, corona, coronavirus, and other terms such as SARS, nCoV, and SARS-CoV-2.
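The unigram and bigram columns above can be approximated with a simple frequency count. The sketch below is a minimal illustration under stated assumptions, not the paper's procedure: the stop-word list, the tokenization, and the names `STOPWORDS`, `DOMINATING`, and `trending_ngrams` are hypothetical. Following the table note, the dominating terms are dropped from the unigram counts only, since bigrams such as covid_deaths still appear in the table.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "are"}
# Dominating terms excluded from the unigram counts, per the table note.
DOMINATING = {"covid", "corona", "coronavirus", "sars", "ncov"}

def trending_ngrams(tweets, top=10):
    """Return the most common unigrams and bigrams across the tweets."""
    unigrams, bigrams = Counter(), Counter()
    for text in tweets:
        words = [w for w in re.findall(r"[a-z0-9]+", text.lower())
                 if w not in STOPWORDS]
        unigrams.update(w for w in words if w not in DOMINATING)
        # Adjacent word pairs, joined with '_' as in the table.
        bigrams.update("_".join(pair) for pair in zip(words, words[1:]))
    return unigrams.most_common(top), bigrams.most_common(top)

sample = [
    "Social distancing saves lives",
    "Keep social distancing in the park",
    "COVID deaths rising",
]
uni, bi = trending_ngrams(sample)
print(uni[:3])
print(bi[:3])
```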
Fig. 5 Network Analysis: Overview of the GeoCOV19Tweets Dataset
Fig. 6 Country specific outlier hashtags detected using Network Analysis
Fig. 7 Network diagram in Fig. 5 expanded by a scale factor
Communities in the GeoCOV19Tweets dataset
| S No. | C | Color | N | Countries (ISO) |
|---|---|---|---|---|
| 1 | 0 | Medium Red | 55.56% | US, AU, NG, ZA, AE, ES, ID, IE, MX, PK, SG, FR, BE, GH, KE, TH, SE, AT, SA, PT, LB, UG, EG, CO, MA, LK, EC, HK, KW, RO, PE, FI, HR, NO, ZW, PA, TZ, VN, BS, PG, HU, BH, CR, BB, OM, SX, RS, TW, BG, DO, ZM, AW, KH, GU, BT, BW, CM, CG, CD, FJ, AQ, SV, AL, ET, JO, UY |
| 2 | 4 | Cyan | 17.12% | GB, MV, MK, MU, SK |
| 3 | 3 | Yellow Green | 11.55% | IN, TT |
| 4 | 2 | Blush Pink | 4.79% | CA |
| 5 | 1 | Cameo | 4.52% | PH, MY, BR, TR, AR, IL, DK, RU, DX, GT, CY, IQ |
| 6 | 6 | Buddha Gold | 2.25% | IT, SI, VE, MC |
| 7 | 5 | Caribbean Green | 2.21% | DE, NL, CZ, UA, AO |
| 8 | 9 | Pine Green | 0.74% | JP, PL |
| 9 | 8 | Fern Frond | 0.55% | NZ, NP, MT, IR |
| 10 | 7 | Eggplant | 0.42% | QA |
| 11 | 10 | Paarl | 0.19% | BM |
| 12 | 11 | Melrose | 0.1% | KR |
C: community; N: percentage of total nodes. Italicized ISO codes indicate countries associated with fewer than 25 different hashtags.
Top 40 hashtags and their communities
| Hashtag | Weighted in-degree | C | Hashtag | Weighted in-degree | C |
|---|---|---|---|---|---|
| covid19 | 31,414 | 0 | isolation | 799 | 4 |
| coronavirus | 15,709 | 0 | india | 716 | 3 |
| corona | 11,338 | 0 | savetheworld | 708 | 0 |
| lockdown | 5,300 | 4 | facemask | 704 | 0 |
| quarantine | 5,242 | 0 | workfromhome | 655 | 3 |
| socialdistancing | 4,438 | 0 | stayhealthy | 634 | 0 |
| stayhome | 4,198 | 0 | savetheworldthanksdoc | 617 | 3 |
| covid | 4,074 | 0 | london | 568 | 4 |
| staysafe | 3,393 | 0 | health | 533 | 2 |
| pandemic | 2,206 | 0 | italy | 471 | 6 |
| billionshieldschallenge | 2,129 | 0 | wearamask | 459 | 0 |
| billionshields | 1,957 | 0 | fitness | 450 | 4 |
| stayathome | 1,675 | 1 | exoworldnow | 437 | 0 |
| faceshield | 1,524 | 0 | besafe | 435 | 0 |
| love | 1,442 | 0 | newnormal | 410 | 1 |
| quarantinelife | 1,323 | 0 | stayhomestaysafe | 393 | 3 |
| mask | 1,212 | 0 | selfisolation | 391 | 4 |
| 2020 | 1,192 | 0 | washyourhands | 390 | 0 |
| virus | 1,148 | 0 | coronamemes | 383 | 3 |
| lockdown2020 | 853 | 4 | workingfromhome | 364 | 4 |
C: community
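To make the "Weighted in-degree" column concrete: in a directed, weighted network, a node's weighted in-degree is the sum of the weights on its incoming edges. The sketch below assumes edges that point from a country to a hashtag, with the weight being how often the hashtag was used in tweets from that country. That edge definition, the function name `weighted_in_degree`, and the sample edges are illustrative assumptions rather than values from the dataset.

```python
from collections import Counter

def weighted_in_degree(edges):
    """Sum incoming edge weights per target node.

    edges: iterable of (source, target, weight) tuples, here assumed to be
    (country ISO code, hashtag, number of uses).
    """
    totals = Counter()
    for _source, target, weight in edges:
        totals[target] += weight
    return totals

# Hypothetical edges, for illustration only.
edges = [
    ("US", "covid19", 5),
    ("GB", "covid19", 3),
    ("GB", "lockdown", 4),
]
degrees = weighted_in_degree(edges)
print(degrees.most_common())
```

Ranking hashtags by this total yields a list in the same form as the table above.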
Distribution of tweets in the GeoCOV19Tweets Dataset (top 7)
| S No. | Country | # of tweets |
|---|---|---|
| 1 | United States | 60,016 (43.71%) |
| 2 | United Kingdom | 20,847 (15.18%) |
| 3 | Canada | 10,688 (7.78%) |
| 4 | India | 10,082 (7.34%) |
| 5 | Nigeria | 4,246 (3.09%) |
| 6 | Australia | 2,893 (2.11%) |
| 7 | South Africa | 2,824 (2.06%) |
As of July 17, 2020, 10:10 (GMT+5:45)
Fig. 8 World view of COVID-19 Sentiment
Fig. 9 Region-specific view of COVID-19 Sentiment (the color scale for this figure is the same as in Fig. 8)