Rabindra Lamsal, Aaron Harwood, Maria Rodriguez Read.
Abstract
As of writing this paper, COVID-19 (Coronavirus disease 2019) has spread to more than 220 countries and territories. Following the outbreak, the pandemic's seriousness has made people more active on social media, especially on microblogging platforms such as Twitter and Weibo. The pandemic-specific discourse has remained on-trend on these platforms for months now. Previous studies have confirmed the contributions of such socially generated conversations towards situational awareness of crisis events. Early forecasts of cases are essential for authorities to estimate the resources needed to cope with the outgrowths of the virus. Therefore, this study attempts to incorporate the public discourse in the design of forecasting models particularly targeted at the steep-hill region of an ongoing wave. We propose a sentiment-involved topic-based latent variables search methodology for designing forecasting models from publicly available Twitter conversations. As a use case, we implement the proposed methodology on Australian COVID-19 daily cases and Twitter conversations generated within the country. Experimental results (i) show the presence of latent social media variables that Granger-cause the daily COVID-19 confirmed cases, and (ii) confirm that those variables offer additional prediction capability to forecasting models. Further, the results show that the inclusion of social media variables yields 48.83%-51.38% improvements in RMSE over the baseline models. We also release the large-scale COVID-19 specific geotagged global tweets dataset, MegaGeoCOV, to the public, anticipating that geotagged data of this scale will aid in understanding the conversational dynamics of the pandemic through other spatial and temporal contexts.
Keywords: ARIMAX models; Granger causality; Pandemic forecast; Social media analytics; Time series analysis; Twitter analytics; VAR models
Year: 2022 PMID: 36092470 PMCID: PMC9444159 DOI: 10.1016/j.asoc.2022.109603
Source DB: PubMed Journal: Appl Soft Comput ISSN: 1568-4946 Impact factor: 8.263
Fig. 1 Daily (new) and total (cumulative) COVID-19 cases reported in Australia between January 25, 2020 (first COVID-19 case reported), and September 9, 2020.
Fig. 2 The overall view of the Twitter-based COVID-19 cases forecast methodology.
Descriptive statistics of the daily COVID-19 public discourse on Twitter.
| | All Tweets | Geotagged Tweets (global) | Tweets from | % of tweets geotagged (global) |
|---|---|---|---|---|
| mean | 4.62 million | 33.2 k | 493 | 0.694538 |
| std | 3.74 million | 30.8 k | 337 | 0.129555 |
| minimum | 59.6 k | 615 | 7 | 0.449497 |
| 25% | 3.06 million | 18.9 k | 272 | 0.595030 |
| median | 3.77 million | 24.4 k | 408 | 0.682665 |
| 75% | 4.64 million | 33.7 k | 660 | 0.781970 |
| maximum | 25.8 million | 183 k | 2297 | 1.439504 |
Overview of MegaGeoCOV.
| Total tweets (unique ids) | 21,305,691 |
| Duplicate tweets (exact copy) | 137,836 |
| Countries and territories | 245 |
| Cities and states | 260,732 |
| Languages | 64 (and undefined) |
Top 15 global locations in MegaGeoCOV.
| (a) Top countries/territories (N | |
| Country/territory | Tweets |
| United States | 7,375,997 |
| United Kingdom | 2,279,064 |
| India | 1,563,017 |
| Brazil | 1,379,733 |
| Canada | 756,466 |
| Spain | 625,599 |
| Indonesia | 509,498 |
| Argentina | 434,454 |
| Mexico | 430,478 |
| Philippines | 383,215 |
| Australia | 366,033 |
| South Africa | 357,674 |
| France | 339,001 |
| Italy | 324,028 |
| Nigeria | 293,242 |
| (b) Top cities and states (N | |
| City/state | Tweets |
| Los Angeles | 240,374 |
| Rio de Janeiro | 192,986 |
| Manhattan | 185,021 |
| New Delhi | 173,854 |
| Mumbai | 155,855 |
| Sao Paulo | 148,202 |
| Toronto | 141,963 |
| Florida | 122,370 |
| Chicago | 120,930 |
| Brooklyn | 112,231 |
| Houston | 111,836 |
| Melbourne | 111,038 |
| Washington | 98,907 |
| Madrid | 96,592 |
| Buenos Aires | 95,759 |
Most frequent languages (N = 64) in MegaGeoCOV.
| Language | ISO | Tweets |
|---|---|---|
| English | en | 13,854,642 |
| Spanish | es | 2,545,726 |
| Portuguese | pt | 1,389,951 |
| Indonesian | in | 708,023 |
| Undefined | – | 689,301 |
| French | fr | 415,434 |
| Italian | it | 280,087 |
| Tagalog | tl | 274,845 |
| Hindi | hi | 221,280 |
| Turkish | tr | 157,962 |
| German | de | 143,874 |
| Other languages: nl, ca, ja, th, ar, pl, et, ru, sv, ht, lt, mr, ro, cs, fi, da, el, ur, ta, zh, sl, ne, gu, bn, lv, no, vi, cy, te, kn, uk, hu, ko, or, fa, is, eu, si, ml, iw, bg, sr, pa, dv, km, my, am, sd, ckb, ps, lo, hy, ka, bo | | |
ISO 639-1 Language Code
Fig. 3 Daily distribution of COVID-19 specific tweets between January 1, 2020, and September 9, 2020.
Top Australian locations in MegaGeoCOV.
| (a) Top small regions (N | |
| Small regions | Tweets |
| Melbourne | 94,330 |
| Sydney | 70,118 |
| Brisbane | 21,298 |
| Perth | 18,143 |
| Adelaide | 13,372 |
| Canberra | 8,366 |
| Gold Coast | 6,483 |
| Newcastle | 3,574 |
| Sunshine Coast | 2,843 |
| Central Coast | 2,190 |
| (b) Top larger regions (N | |
| Larger regions | Tweets |
| Victoria | 107,560 |
| New South Wales | 89,142 |
| Queensland | 37,107 |
| Western Australia | 20,259 |
| South Australia | 15,301 |
| Australia | 14,703 |
| Australian Capital Territory | 8,373 |
| Not Available | 4,678 |
| Tasmania | 3,280 |
| Northern Territory | 2,141 |
20 most frequent unigrams and bigrams used by Australian Twitter users in the COVID-19 discourse.
| (a) Unigrams | |
| Unigram | Frequency |
| covid | 46,942 |
| lockdown | 34,016 |
| people | 30,936 |
| virus | 21,844 |
| vaccine | 19,380 |
| covid-19 | 18,132 |
| #covid-19 | 17,231 |
| quarantine | 16,618 |
| pandemic | 15,842 |
| australia | 13,456 |
| mask | 12,936 |
| time | 12,891 |
| coronavirus | 12,602 |
| health | 12,111 |
| cases | 11,444 |
| #coronavirus | 8,859 |
| government | 8,679 |
| nsw | 8,481 |
| home | 8,202 |
| work | 8,104 |
| (b) Bigrams | |
| Bigram | Frequency |
| (‘hotel’, ‘quarantine’) | 3,161 |
| (‘wear’, ‘mask’) | 2,037 |
| (‘2’, ‘weeks’)/(‘14’, ‘days’) | 1,974 |
| (‘aged’, ‘care’) | 1,970 |
| (‘wearing’, ‘mask’) | 1,517 |
| (‘new’, ‘cases’) | 1,463 |
| (‘social’, ‘distancing’) | 1,424 |
| (‘public’, ‘health’) | 1,303 |
| (‘new’, ‘daily’) | 1,302 |
| (‘many’, ‘people’) | 1,239 |
| (‘mental’, ‘health’) | 1,221 |
| (‘federal’, ‘government’) | 1,154 |
| (‘covid’, ‘cases’) | 1,098 |
| (‘last’, ‘year’) | 1,077 |
| (‘vaccine’, ‘rollout’) | 1,074 |
| (‘stay’, ‘home’) | 1,037 |
| (‘face’, ‘mask’) | 1,002 |
| (‘tested’, ‘positive’) | 907 |
| (‘covid’, ‘vaccine’) | 858 |
| (‘covid’, ‘test’) | 805 |
Variables in that Granger-cause at most lags (only 10 listed).
| Variable | Variable | sig. | Variable | Variable | sig. |
|---|---|---|---|---|---|
| 14 | 12 | ||||
| 14 | 12 | ||||
| 14 | 12 | ||||
| 14 | 11 | ||||
| 14 | 11 | ||||
| 13 | 11 | ||||
| 13 | 10 |
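The Granger-causality screening behind this table can be illustrated with a small sketch. It is not the authors' implementation: it assumes the common F-test formulation, in which an autoregression of the target on its own lags (restricted) is compared with one that also includes lags of the candidate variable (unrestricted). The series `x` and `y` are synthetic, and the helper names `ols_rss` and `granger_f` are hypothetical.

```python
import random

def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit of y on the columns of X."""
    n, k = len(X), len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(b * v for b, v in zip(beta, X[r]))) ** 2
               for r in range(n))

def granger_f(y, x, lags):
    """F statistic for H0: the first `lags` lags of x do not help predict y."""
    ts = range(lags, len(y))
    target = [y[t] for t in ts]
    # Restricted model: intercept plus y's own lags.
    restricted = [[1.0] + [y[t - i] for i in range(1, lags + 1)] for t in ts]
    # Unrestricted model: the same regressors plus lags of x.
    unrestricted = [row + [x[t - i] for i in range(1, lags + 1)]
                    for row, t in zip(restricted, ts)]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    dof = len(target) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Synthetic example (assumption): x leads y by two steps.
random.seed(0)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.0, 0.0]
for t in range(2, n):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 2] + random.gauss(0, 0.1))

f_x_to_y = granger_f(y, x, lags=3)  # expected to be large
f_y_to_x = granger_f(x, y, lags=3)  # expected to be small
print(f_x_to_y, f_y_to_x)
```

In practice the F statistic would be compared against a critical value at the chosen significance level (1% or 5% in this study), and the test repeated for each candidate variable and each lag.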
Fig. 4 Plots of the variables (listed in Table 7) in that Granger-cause at most lags (10). For each subplot, the vertical axis represents the count of tweets, and the horizontal axis represents the date.
Best forecasting model for .
| Approach | Avg. RMSE |
|---|---|
| Traditional model | 135.387 |
| Additive model (FB Prophet) | 236.427 |
| Machine learning model (XGBoost) | 341.8 |
Note: the traditional approach also involves models such as AR, MA, ARIMA, and SARIMA. The traditional models and their mathematical structures are discussed later in Section 4.2.1.
Results from training. Models are ranked based on their AIC scores.
| (a) Top 5 baseline models. | ||
| (p,d,q) | AIC | RMSE |
| (6, 2, 7) | 6118.50 | 37.78 |
| (5, 2, 8) | 6118.80 | 37.81 |
| (7, 2, 5) | 6119.06 | 37.89 |
| (7, 2, 8) | 6120.12 | 37.70 |
| (7, 2, 6) | 6120.21 | 37.87 |
| (b) Top 5 social media models. | ||
| (p,d,q) | AIC | RMSE |
| (2, 2, 3) | 5941.08 | 32.97 |
| (1, 2, 4) | 5942.43 | 33.05 |
| (2, 2, 2) | 5945.95 | 33.26 |
| (4, 2, 3) | 5957.88 | 33.12 |
| (4, 2, 2) | 5960.05 | 33.41 |
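For reference, the AIC used to rank the models above is conventionally defined as follows. This is the standard textbook definition and is assumed here rather than quoted from the paper; $k$ denotes the number of estimated parameters and $\hat{L}$ the maximized likelihood of the fitted model.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

A lower AIC indicates a better trade-off between goodness of fit and model complexity, which is why the $(2, 2, 3)$ social media model (AIC 5941.08) ranks above the $(6, 2, 7)$ baseline (AIC 6118.50).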
Results (upper values) from test data. Baseline model versus Social media model at 1% and 5% significance.
| Date | Cases | Baseline (at 5%) | Baseline (at 1%) | Social media (at 5%) | Social media (at 1%) |
|---|---|---|---|---|---|
| 2021-08-27 | 1119 | 1068 | 1092 | 1116 | 1138 |
| 2021-08-28 | 1321 | 1090 | 1114 | 1143 | 1166 |
| 2021-08-29 | 1355 | 1074 | 1099 | 1171 | 1195 |
| 2021-08-30 | 1257 | 1114 | 1144 | 1219 | 1244 |
| 2021-08-31 | 1225 | 1161 | 1195 | 1242 | 1272 |
| 2021-09-01 | 1467 | 1120 | 1159 | 1289 | 1325 |
| 2021-09-02 | 1648 | 1194 | 1240 | 1358 | 1399 |
| 2021-09-03 | 1741 | 1230 | 1280 | 1413 | 1459 |
| 2021-09-04 | 1670 | 1221 | 1276 | 1447 | 1496 |
| 2021-09-05 | 1536 | 1261 | 1320 | 1472 | 1525 |
| 2021-09-06 | 1466 | 1279 | 1342 | 1529 | 1586 |
| 2021-09-07 | 1696 | 1326 | 1393 | 1572 | 1634 |
| 2021-09-08 | 1725 | 1323 | 1394 | 1568 | 1635 |
| 2021-09-09 | 1870 | 1334 | 1410 | 1661 | 1731 |
| RMSE | | 342.58 | 295.68 | 175.31 | 143.76 |
| MAPE | | 19.36% | 16.29% | 9.24% | 7.61% |
| R2 | | 0.67 | 0.68 | 0.75 | 0.75 |
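The error metrics in this table can be approximately recomputed from its rows. The sketch below uses the standard definitions of RMSE and MAPE (assumed, since the formulas are not given here) on the 5% columns. Because the tabulated predictions are rounded to whole cases, the results should land close to, but not exactly on, the reported values.

```python
import math

# Values copied from the table: actual cases and the 5%-level forecasts.
actual = [1119, 1321, 1355, 1257, 1225, 1467, 1648,
          1741, 1670, 1536, 1466, 1696, 1725, 1870]
baseline_5 = [1068, 1090, 1074, 1114, 1161, 1120, 1194,
              1230, 1221, 1261, 1279, 1326, 1323, 1334]
social_5 = [1116, 1143, 1171, 1219, 1242, 1289, 1358,
            1413, 1447, 1472, 1529, 1572, 1568, 1661]

def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))

def mape(y, y_hat):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - p) / a for a, p in zip(y, y_hat)) / len(y)

rmse_base = rmse(actual, baseline_5)
rmse_social = rmse(actual, social_5)
# Relative RMSE improvement of the social media model over the baseline.
improvement = 100 * (rmse_base - rmse_social) / rmse_base

print(f"baseline RMSE: {rmse_base:.2f}, MAPE: {mape(actual, baseline_5):.2f}%")
print(f"social media RMSE: {rmse_social:.2f}, MAPE: {mape(actual, social_5):.2f}%")
print(f"RMSE improvement: {improvement:.2f}%")
```

The same calculation on the reported RMSE values gives (342.58 - 175.31) / 342.58 ≈ 48.83% at the 5% level and (295.68 - 143.76) / 295.68 ≈ 51.38% at the 1% level, matching the range quoted in the abstract.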
Fig. 5 COVID-19 confirmed cases versus the cases predicted by the baseline and social media models at 1% and 5% significance levels.
VAR order selection—fitting VAR models on . Lowest AIC score is highlighted.
| parameter | AIC |
|---|---|
| 0 | 28.60 |
| 1 | 23.33 |
| 2 | 22.90 |
| 3 | 22.69 |
| ... | ... |
| 15 | |
| 16 | 22.52 |
Fig. 6 Forecast of COVID-19 cases for the next 7 days with the VAR model: overall, and excluding the sudden rise on 9/10/2021.
Fig. 7 Search interest data retrieved from Google Trends for the period January 1, 2020, to September 9, 2021.
Comparison of our latent variables search methodology with existing studies that use social media-based volumetric features.
| | RMSE (at 5%) | MAPE (at 5%) | R2 (at 5%) | RMSE (at 1%) | MAPE (at 1%) | R2 (at 1%) |
|---|---|---|---|---|---|---|
| Baseline | 342.58 | 19.36% | 0.67 | 295.68 | 16.29% | 0.68 |
| Search index (dry cough) | 326.93 | 17.76% | 0.68 | 277.22 | 14.74% | 0.7 |
| Search index (coronavirus) | 307.48 | 16.98% | 0.7 | 258.716 | 13.75% | 0.7 |
| Search index (fever) | 344.15 | 19.49% | 0.67 | 297.28 | 16.4% | 0.68 |
| Search index (pneumonia) | 266.13 | 14.51% | 0.67 | 223.15 | 11.79% | 0.68 |
| Search indexes Combined | 241.23 | 13.1% | 0.66 | 200.40 | 10.62% | 0.67 |
| Sick posts | 283.44 | 15.71% | 0.68 | 239.68 | 12.72% | 0.69 |
| Sick posts + Search indexes combined | 198.52 | 10.29% | 0.7 | 160.62 | 8.52% | 0.70 |
| Overall posts | 289.16 | 16.07% | 0.73 | 241.44 | 12.84% | 0.73 |
| Latent variables search | 175.31 | 9.24% | 0.75 | 143.76 | 7.61% | 0.75 |
fitted solely on . Exogenous variables: [36], [37], [38], [40], [6]. this study.
Results from fitting the exogenous variables listed in Table 12 and their respective 14 days’ lags against 84 weeks of data (January 15, 2020, to August 26, 2021).
| | Best fitted model | Exo. variables count | AIC | RMSE |
|---|---|---|---|---|
| Baseline | ARIMA(6,2,7) | – | 6118.50 | 37.78 |
| Search index (dry cough) | ARIMAX(9,2,9) | 1 and its 14 lags | 6019.93 | 37.46 |
| Search index (coronavirus) | ARIMAX(7,2,5) | 1 and its 14 lags | 6013.5 | 37.51 |
| Search index (fever) | ARIMAX(5,2,8) | 1 and its 14 lags | 5993.47 | 37.55 |
| Search index (pneumonia) | ARIMAX(6,2,9) | 1 and its 14 lags | 6001.28 | 37.52 |
| Search indexes Combined | ARIMAX(7,2,8) | 4 and respective 14 lags | 6085.15 | 36.53 |
| Sick posts | ARIMAX(8,2,7) | 1 and its 14 lags | 5989.78 | 37.12 |
| Sick posts + Search indexes combined | ARIMAX(3,2,9) | 5 and respective 14 lags | 6069.28 | 35.77 |
| Overall posts | ARIMAX(4,2,5) | 1 and its 14 lags | 5991.94 | 37.34 |
| Latent variables search | ARIMAX(2,2,3) | 14 and respective 14 lags | 5941.08 | 32.97 |
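Each exogenous variable in the table above enters the model together with its 14 daily lags. A minimal sketch of how such a lagged design matrix might be built is shown below. The helper `add_lags` and the series `daily_counts` are hypothetical and only illustrate the idea, not the authors' code.

```python
def add_lags(series, max_lag):
    """Return rows [x_t, x_{t-1}, ..., x_{t-max_lag}] for every t with a full history."""
    return [[series[t - k] for k in range(max_lag + 1)]
            for t in range(max_lag, len(series))]

# Hypothetical daily tweet counts for one topic variable (30 days).
daily_counts = list(range(100, 130))

# One variable and its 14 lags gives 15 columns per row.
exog = add_lags(daily_counts, max_lag=14)
print(len(exog), len(exog[0]))
```

The first 14 days are dropped because they lack a full lag history, so the target series has to be trimmed by the same number of leading observations before fitting.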
| Topic | Salient words |
|---|---|
| 0 | lockdown, pm, rule, idea, message, panic, detail, move, meeting, announcement, restriction, location, gathering, situation, detention, notice, prime_minister, mess, looking_forward, regulation |
| 1 | food, order, book, shop, market, price, water, store, delivery, supermarket, restaurant, paper, supply, stock, demand, sale, stuff, trade, cafe, list, product, customer, shortage, grocery |
| 2 | family, friend, hope, love, mate, house, shit, thought, heart, member, visit, guy, daughter, wife, mother, girl, party, partner, son, movie, dad, anxiety, brother, memory, sister, colleague, loved_one, neighbor, kind, hug, spirit, song, prayer, soul, sunshine |
| 3 | time, thing, moment, ship, air, lung, fire, cruise, trip, passenger, winter, crew, plane, hell, summer, pain, island, tip, weather, get_back, spring, ruby_princess, hope, quality, doubt, trouble, board, tour, track, smoke, breathe, omg, port, storm, boat |
| 4 | year, team, game, event, show, season, video, player, sport, club, tv, challenge, watch, fan, race, art, music, crowd, training, play, session, ground, tennis, football, ticket, court, venue, ball, goal, episode, win, series, cricket, artist, film, star, host, horse, content, performance, league, competition, song, entertainment, gig |
| 5 | death, people, number, rate, infection, risk, population, datum, transmission, protest, freedom, idiot, conspiracy, spread, exposure, control, theory, toll, site, suicide, factor, stat, mortality, prevent, evidence, confirmed_case, statistic, count, analysis, victim, every_day, protester, survivor, fatality, cases_death, surge |
| 6 | day, today, test, case, person, hour, testing, isolation, yesterday, symptom, contact, tomorrow, area, period, week, clinic, morning, site, line, act, last_week, lab, drive, delay, queue, household, isolate, swab, fever, throat, trace, temperature, quarantine, positive, tracer, carrier, diagnosis, pathology, caution, vitamin |
| 7 | people, world, country, life, rest, war, leader, million, threat, earth, around_world, citizen, pressure, stop, die, moron, stupidity, worry, shit, covidiot, problem, spanish_flu, kill, narrative, planet, prison, mentalhealth, years_ago, faith, enemy, weapon, danger, livelihood, estate, liberty, bullet, fighting, destruction, frustration |
| 8 | health, issue, system, advice, problem, expert, science, effect, research, safety, emergency, condition, treatment, mental_health, concern, evidence, disease, management, trial, effort, scientist, solution, trust, officer, report, process, authority, drug, brain, damage |
| 9 | news, media, story, fact, article, election, app, information, fear, comment, answer, view, truth, tweet, info, twitter, vote, source, journalist, ad, opinion, page, claim, president, statement |
| 10 | mask, hospital, hand, patient, doctor, ace, staff, care, nurse, centre, phone, distance, shopping, icu, eye, ppe, pace, practice, bed, work, line, nose, capacity, folk, guideline, mouth, limit, nursing, lady |
| 11 | work, job, business, worker, support, service, money, company, industry, cost, office, pay, healthcare, access, economy, leave, loss, bill, payment, driver, bus, sector, university, frontline, income, tax, |
| 12 | state, case, border, vic, nsw, travel, restriction, outbreak, premier, flight, new_case, control, record, sa, victorian, wave, gladys, hotel_quarantine, cluster, traveler, arrival, closure, bubble, community_transmission, update, ban, region, hotspot, territory, exemption |
| 13 | community, response, measure, part, change, decision, level, impact, economy, nation, point, situation, law, recovery, strategy, opportunity, crisis, sense, society, step, term, history, experience, reality, role, behavior, contract, thread, lock, model |
| 14 | vaccine, flu, vaccination, pfizer, group, risk, age, jab, dose, study, delta, choice, blood, chance, strain, shot, astrazeneca, variant, type, appointment, get_vaccine, protection, immunity, pfizer_vaccine, reaction, virus, clot, target, gp, supply |
| 15 | lockdown, week, month, melbourne, sydney, end, city, weekend, beach, stage, road, street, exercise, adelaide, town, restriction, suburb, stayhome, start, curfew, first_time, half, last_year, staysafe, apartment, rock, melb, sight, stage_lockdown |
| 16 | quarantine, home, hotel, police, student, security, parent, facility, place, hotel_quarantine, room, care, program, staff, purpose, force, airport, guard, adult, member, breach, station, protocol, inquiry, two_week, standard, fine, requirement |
| 17 | government, auspol, morrison, plan, australian, govt, failure, policy, leadership, power, responsibility, labor, disaster, leader, action, blame, federal_government, politician, deal, attack, excuse, crisis, insider, lack, lie, climate, minister, vaccine_rollout, credit, recession |