Lateef Babatunde Amusa, Hossana Twinomurinzi, Chinedu Wilfred Okonkwo.
Abstract
Infodemiological methods could enhance infectious disease modeling, and it is of interest to verify their utility using a Nigerian case study. We used Google Trends (GT) data to track COVID-19 incidence and assessed whether they could complement traditional models based solely on reported case numbers. Data on weekly COVID-19 cases in Nigeria from March 1, 2020, to May 31, 2021, were matched with internet search data from Google Trends. The reported weekly incidence numbers and the GT data were split into training and testing sets. ARIMA models were fitted to the reported weekly COVID-19 cases using the training set. Several COVID-related search terms were theoretically and empirically assessed for initial screening. The selected GT variable was added to the ARIMA model as a regressor. Model forecasts, both with and without GT data, were compared with weekly cases in the test set over 13 weeks. Forecast accuracies were compared visually and using the root mean squared error (RMSE) and mean absolute error (MAE). Statistical significance of the difference in predictions was determined with the two-sided Diebold-Mariano test. Preliminary contemporaneous correlations between COVID-related search terms and weekly COVID-19 cases revealed "loss of smell," "loss of taste," and "fever" (in order of magnitude) as significantly associated with the official cases. Predictions of the ARIMA model using solely reported case numbers resulted in an RMSE of 411.4 and an MAE of 354.9. The GT-expanded model achieved better forecasting accuracy (RMSE = 388.7, MAE = 340.1). The corrected Akaike information criterion also favored the GT-expanded model (869.4 vs. 872.2). The difference in predictive performance over the 13 weeks was significant under the two-sided Diebold-Mariano test (DM = 6.75, p < 0.001).
Google Trends data enhanced the predictive ability of a traditionally based model and should be considered a suitable method to enhance infectious disease modeling.
Keywords: ARIMA; Big Data; COVID-19; Google Trends; infectious disease modeling
Year: 2022 PMID: 36186843 PMCID: PMC9520600 DOI: 10.3389/frma.2022.1003972
Source DB: PubMed Journal: Front Res Metr Anal ISSN: 2504-0537
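The abstract compares the two sets of forecasts with a two-sided Diebold-Mariano test. The following is a minimal sketch of that idea, not the authors' implementation: it assumes one-step-ahead forecasts, a squared-error loss, a normal approximation for the p-value, and no autocovariance or small-sample correction. The function name `diebold_mariano` and the example numbers are hypothetical.

```python
import math


def diebold_mariano(actual, forecast_a, forecast_b, power=2):
    """Two-sided Diebold-Mariano statistic and p-value (simplified sketch).

    Assumptions (not taken from the source): horizon h = 1, loss = |error|**power,
    normal approximation, no autocovariance or small-sample correction.
    """
    if not (len(actual) == len(forecast_a) == len(forecast_b)):
        raise ValueError("all series must have the same length")

    # Loss differential: positive values mean forecast_a had the larger loss.
    d = [
        abs(y - fa) ** power - abs(y - fb) ** power
        for y, fa, fb in zip(actual, forecast_a, forecast_b)
    ]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    if var_d == 0:
        raise ValueError("loss differential has zero variance")

    dm = mean_d / math.sqrt(var_d / n)
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(dm) / math.sqrt(2))
    return dm, p_value


# Hypothetical weekly case counts and forecasts, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
without_gt = [13.0, 24.0, 27.0, 45.0, 46.0]
with_gt = [11.0, 18.0, 31.0, 38.0, 51.0]
dm_stat, p = diebold_mariano(observed, without_gt, with_gt)
print(f"DM = {dm_stat:.2f}, p = {p:.4g}")
```

A positive statistic here indicates that the second forecast series has the smaller average loss. For multi-step horizons such as the 13-week test window, an established implementation with the appropriate variance correction would be the safer choice.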
Grouping of COVID-19-related GT search terms.
| Category | Search terms |
|---|---|
| Disease-related | “corona”, “COVID”, “COVID-19”, “coronavirus” |
| Symptoms | “cough”, “fever”, “loss of smell”, “loss of taste”, “sore throat” |
| Government instructions | “lockdown”, “quarantine” |
| Non-pharmaceutical interventions (NPI) | “hand wash”, “mask”, “sanitizer”, “social distancing” |
Figure 1. Time plot of weekly GT RSVs for some COVID-19-related search terms in Nigeria. GT RSVs, Google Trends Relative Search Volumes; NPI, Non-pharmaceutical interventions.
Figure 2. Spearman correlations among GT data and the weekly COVID-19 cases in Nigeria. Blank cells indicate correlations that are not significant at the 5% level (p ≥ 0.05).
Figure 3. Time plot of weekly GT RSVs of the “loss of smell” search term (the most strongly correlated) and the weekly COVID-19 cases in Nigeria. GT RSVs, Google Trends Relative Search Volumes.
Figure 4. Plot of the autocorrelation and partial autocorrelation functions of the weekly COVID-19 cases.
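The search-term screening summarized in Figure 2 rests on Spearman rank correlations between each GT series and the weekly case counts. A minimal sketch of the coefficient is given here, assuming tied values receive average ranks; the helper names `average_ranks`, `pearson`, and `spearman` and the sample series are hypothetical, and the significance test behind the figure's blank cells is not reproduced.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical relative search volumes and weekly cases, for illustration only.
rsv = [12, 30, 45, 45, 80, 61]
cases = [150, 420, 600, 640, 1500, 900]
print(round(spearman(rsv, cases), 3))
```

Because the coefficient depends only on ranks, it is insensitive to the 0 to 100 rescaling that Google Trends applies to relative search volumes.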
Comparative performance assessment of the model without GT and the GT-enhanced model.
| | Model without GT | | GT-enhanced model | |
|---|---|---|---|---|
| AICc | 872.2 | | 869.4 | |
| Training set RMSE | 253.7 | | 231.8 | |
| Test set RMSE | 411.4 | | 388.7 | |
| Training set MAE | 190.8 | | 176.6 | |
| Test set MAE | 354.9 | | 340.1 | |
| Coefficient | Estimate (SE) | p-value | Estimate (SE) | p-value |
| AR1 | 0.255 (0.139) | 0.066 | 0.249 (0.139) | 0.072 |
| AR2 | 0.361 (0.143) | 0.012 | 0.378 (0.144) | 0.009 |
| GT | NA | | 9.065 (12.034) | 0.451 |
AR, Autoregressive; SE, Standard error; MAE, Mean absolute error; RMSE, Root mean squared error; AICc, corrected Akaike information criterion.
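To put the reported error metrics in perspective, the sketch below defines RMSE and MAE and computes the relative reduction in each metric when moving from the model without GT to the GT-enhanced model. The metric values are copied from the table above; the function names and the `reported` dictionary are illustrative assumptions, not part of the original analysis.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline, enhanced):
    """Fractional reduction of a metric relative to the baseline value."""
    return (baseline - enhanced) / baseline


# (model without GT, GT-enhanced model), values as reported in the table.
reported = {
    "Training set RMSE": (253.7, 231.8),
    "Test set RMSE": (411.4, 388.7),
    "Training set MAE": (190.8, 176.6),
    "Test set MAE": (354.9, 340.1),
}

for name, (baseline, enhanced) in reported.items():
    pct = 100 * relative_reduction(baseline, enhanced)
    print(f"{name}: {pct:.1f}% lower with GT")
```

On the test set this works out to reductions of roughly 5.5% in RMSE and 4.2% in MAE, which gives a sense of the size of the improvement alongside the significance test.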
Figure 5. Forecasts of the optimal ARIMA model (red curve) compared with the Google Trends-enhanced model (blue curve) and the actual weekly COVID-19 cases (black curve).