Atina Husnayain¹, Eunha Shim², Anis Fuad³, Emily Chia-Yu Su¹˒⁴.
Abstract
BACKGROUND: Given the ongoing COVID-19 pandemic, accurate predictions could greatly help health resource management in future waves. However, because COVID-19 is a new disease, its dynamics seemed difficult to predict. External factors, such as internet search data, need to be included in the models to increase their accuracy. However, it remains unclear whether incorporating online search volumes into models leads to better predictive performance for long-term prediction.
Keywords: COVID-19; South Korea; infodemiology; internet search; prediction
Year: 2021 PMID: 34762064 PMCID: PMC8698803 DOI: 10.2196/34178
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Data set description.
| Data setᵃ | Data description | Use |
| --- | --- | --- |
| Case-based data | Daily cumulative cases and deaths; used to calculate new daily cases and deaths | Time series graph, correlation, and prediction analysis |
| Google Community Mobility data | Daily changes in time spent in six categories of places (retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, and residential areas) compared with baseline days; the baseline is the median value for the corresponding day of the week from January 3 to February 6, 2020 | Correlation and prediction analysis |
| Apple Mobility Trends data | Daily relative volume of direction requests for driving and walking in Apple Maps, compared with a baseline volume on January 13, 2020 | Correlation and prediction analysis |
| NAVER search volumes | Daily online searches made through the NAVER search engine; values range from 0 to 100; queries were based on 12 terms used in our previous study [ | Correlation and prediction analysis |
ᵃAll data sets include country-level data.
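The case-based row states that new daily cases and deaths were calculated from daily cumulative counts, and the Google row describes changes relative to a median baseline over January 3 to February 6, 2020. The following is a minimal sketch of both steps under stated assumptions, not the authors' pipeline. The first step uses plain first differencing. The source does not say how the first day or downward data corrections were handled, so the first value is kept as-is and negative differences are not clipped. The second step computes a percentage change against a weekday-specific median baseline, which is how Google describes its Community Mobility baseline. Google publishes these percentages directly, so that function only illustrates the arithmetic. All function names are hypothetical.

```python
from datetime import date, timedelta
from statistics import median


def daily_new_counts(cumulative):
    """First differences of a cumulative series; the first day keeps its own value.

    Negative values (e.g., after a downward data correction) are NOT clipped here.
    """
    return [cumulative[0]] + [curr - prev for prev, curr in zip(cumulative, cumulative[1:])]


def weekday_baselines(values_by_date, start=date(2020, 1, 3), end=date(2020, 2, 6)):
    """Median value for each weekday (0 = Monday) over the inclusive baseline window."""
    buckets = {}
    day = start
    while day <= end:
        if day in values_by_date:
            buckets.setdefault(day.weekday(), []).append(values_by_date[day])
        day += timedelta(days=1)
    return {weekday: median(vals) for weekday, vals in buckets.items()}


def percent_change_from_baseline(value, day, baselines):
    """Percentage change of one day's value relative to the baseline for its weekday."""
    base = baselines[day.weekday()]
    return 100.0 * (value - base) / base


# Usage with made-up numbers (not data from the study):
print(daily_new_counts([1, 3, 6, 6, 10]))  # [1, 2, 3, 0, 4]

window = {date(2020, 1, 3) + timedelta(days=i): 100 for i in range(35)}
baselines = weekday_baselines(window)
print(percent_change_from_baseline(80, date(2020, 3, 2), baselines))  # -20.0
```

The baseline window from January 3 to February 6, 2020 spans exactly 35 days, so each weekday contributes five values to its median.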
Figure 1. Time series of new daily COVID-19 cases and deaths in South Korea from January 20, 2020, to July 31, 2021. The information at the bottom of the figure describes the percentage of terms related to COVID-19 per month, from April 2020 to July 2021, out of the monthly top 10 terms for the life and health category (N=10). The list of terms is provided in Multimedia Appendix 1.
Figure 2. Time series of new daily COVID-19 cases, mobility data (top plots), and NAVER searches (bottom plots) in South Korea from January 20, 2020, to July 31, 2021.
Figure 3. Time series of new daily COVID-19 deaths, mobility data (top plots), and NAVER searches (bottom plots) in South Korea from January 20, 2020, to July 31, 2021.
Assessment of the performance of the models.
| Model | Subset 1ᵃ, training RMSEᵇ | Subset 1ᵃ, test RMSE | Subset 2ᵃ, training RMSE | Subset 2ᵃ, test RMSE | Subset 3ᵃ, training RMSE | Subset 3ᵃ, test RMSE | Subset 4ᵃ, training RMSE | Subset 4ᵃ, test RMSE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| New daily cases | | | | | | | | |
| GLM1ᶜ | 62.22 | 66.92 | 53.04 | 32.70ᵈ | 48.01 | 378.94 | 85.75 | 219.22 |
| GLM2ᵉ | 43.71 | 29.29ᵈ | 36.80 | 569,037.92 | 48.19 | 495.88 | 120.76 | 429.51 |
| GLM3ᶠ | 982.42 | 587.65 | 329.49 | 8,247,155.77 | 184.59 | 543.20 | 330.15 | 4161.61 |
| LR1ᵍ | 58.57 | 60.17 | 50.90 | 44.92 | 48.20 | 373.58 | 85.09 | 216.22ᵈ |
| LR2ʰ | 56.88 | 79.57 | 49.41 | 78.32 | 48.00 | 366.19ᵈ | 84.52 | 216.70 |
| LR3ⁱ | 56.51 | 69.13 | 50.90 | 44.92 | 48.20 | 373.58 | 84.42 | 217.81 |
| New daily deaths | | | | | | | | |
| GLM1 | 3.10 | 4.89 | 2.52 | 1.04 | 2.08 | 6.79 | 2.80 | 4.89 |
| GLM2 | 3.24 | 5.52 | 2.71 | 0.47 | 2.23 | 7.65 | 2.82 | 5.26 |
| GLM3 | 3.25 | 3.79ᵈ | 2.72 | 0.19ᵈ | 2.24 | 17.02 | 3.81 | 4.64ᵈ |
| LR1 | 3.05 | 4.95 | 2.62 | 1.71 | 2.16 | 5.21 | 2.75 | 5.23 |
| LR2 | 3.04 | 4.50 | 2.61 | 0.70 | 2.19 | 4.82ᵈ | 2.75 | 5.38 |
| LR3 | 3.05 | 4.95 | 2.62 | 1.71 | 2.16 | 5.23 | 2.75 | 5.23 |
ᵃSubsets 1 to 4: 3, 6, 12, and 18 months after the first case was reported in South Korea, respectively.
ᵇRMSE: root mean square error.
ᶜGLM1: generalized linear model with a normal distribution.
ᵈThe lowest test-set RMSE for each subset and outcome.
ᵉGLM2: generalized linear model with a Poisson distribution.
ᶠGLM3: generalized linear model with a negative binomial distribution.
ᵍLR1: linear regression model with lasso regularization.
ʰLR2: linear regression model with adaptive lasso regularization.
ⁱLR3: linear regression model with elastic net regularization.
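The table compares models by root mean square error (RMSE), and the lowest test-set value is flagged in each column. Below is a minimal sketch of that criterion. The helper names are hypothetical. The example dictionary reuses the subset 1 test-set values from the first block of models in the table, so the selection should return the model flagged there (GLM2, 29.29).

```python
import math


def rmse(observed, predicted):
    """Root mean square error: square root of the mean squared difference."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed))


def lowest_test_rmse(test_rmse_by_model):
    """Name of the model with the smallest test-set RMSE (ties: first in dict order)."""
    return min(test_rmse_by_model, key=test_rmse_by_model.get)


# Subset 1 test-set RMSE values from the first block of the table.
subset1_test = {"GLM1": 66.92, "GLM2": 29.29, "GLM3": 587.65,
                "LR1": 60.17, "LR2": 79.57, "LR3": 69.13}
print(lowest_test_rmse(subset1_test))  # GLM2
print(round(rmse([10, 12, 15], [11, 12, 13]), 4))  # 1.291
```

Because RMSE is in the units of the outcome, values are comparable only within one outcome (cases or deaths), not across the two blocks.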
Figure 4. Time series of new daily COVID-19 cases in South Korea from January 20, 2020, to July 31, 2021, and predicted values in the generalized linear models (GLMs) and linear regression (LR) models. GLM1: GLM with a normal distribution; GLM2: GLM with a Poisson distribution; GLM3: GLM with a negative binomial distribution; LR1: LR model with lasso regularization; LR2: LR model with adaptive lasso regularization; LR3: LR model with elastic net regularization; RMSE: root mean square error.
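GLM2 is described only as a generalized linear model with a Poisson distribution. A common way to fit such a model is iteratively reweighted least squares (IRLS) with the canonical log link. The sketch below assumes that link and a design matrix with an explicit intercept column. It is not the authors' code, and the function names are hypothetical. GLM1 (normal distribution, identity link) reduces to ordinary least squares. GLM3 (negative binomial) uses a different variance function with an extra dispersion parameter, which this sketch does not estimate.

```python
import math


def solve_linear(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson_glm(X, y, max_iter=100, tol=1e-10):
    """Fit log(E[y]) = X beta by IRLS; each row of X starts with 1.0 for the intercept."""
    p = len(X[0])
    mu = [yi + 0.1 for yi in y]  # start from the data, shifted so log() is defined at y = 0
    eta = [math.log(v) for v in mu]
    beta = None
    for _ in range(max_iter):
        # Log link: working response z = eta + (y - mu) / mu and working weight w = mu.
        z = [e + (yi - v) / v for e, yi, v in zip(eta, y, mu)]
        xtwx = [[sum(v * row[i] * row[j] for v, row in zip(mu, X)) for j in range(p)]
                for i in range(p)]
        xtwz = [sum(v * row[i] * zi for v, row, zi in zip(mu, X, z)) for i in range(p)]
        new_beta = solve_linear(xtwx, xtwz)  # weighted least squares step
        eta = [sum(bj * xj for bj, xj in zip(new_beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        if beta is not None and max(abs(nb - ob) for nb, ob in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


# Usage with synthetic, noise-free data (not from the study): E[y] = exp(0.5 + 0.3 x).
xs = range(10)
X = [[1.0, float(x)] for x in xs]
y = [math.exp(0.5 + 0.3 * x) for x in xs]
beta = fit_poisson_glm(X, y)
print([round(b, 6) for b in beta])  # [0.5, 0.3]
```

Because the synthetic outcomes equal their expected values exactly, the Poisson score equations are solved by the generating coefficients, so the fit should recover them.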
Figure 5. Time series of new daily COVID-19 deaths in South Korea from January 20, 2020, to July 31, 2021, and predicted values in the generalized linear models (GLMs) and linear regression (LR) models. GLM1: GLM with a normal distribution; GLM2: GLM with a Poisson distribution; GLM3: GLM with a negative binomial distribution; LR1: LR model with lasso regularization; LR2: LR model with adaptive lasso regularization; LR3: LR model with elastic net regularization; RMSE: root mean square error.
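The three LR models (lasso, adaptive lasso, and elastic net) differ only in their penalty. The source does not give the penalty parameterization, the software, or how the tuning parameters were chosen. The sketch below therefore assumes a glmnet-style objective and solves it by cyclic coordinate descent with soft-thresholding. The adaptive lasso is shown as a lasso with per-coefficient weights 1/|b_init|^gamma; the initial estimator and gamma are assumptions left to the caller. All function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def penalized_linear_regression(X, y, lam, alpha=1.0, weights=None, max_iter=1000, tol=1e-10):
    """Coordinate descent for
    (1/2n) * ||y - b0 - X b||^2 + lam * sum_j w_j * (alpha * |b_j| + (1 - alpha) / 2 * b_j^2).

    alpha = 1 with unit weights: lasso; alpha < 1: elastic net;
    alpha = 1 with data-driven weights: adaptive lasso. The intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    w = weights if weights is not None else [1.0] * p
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    cols = [[row[j] - xbar[j] for row in X] for j in range(p)]  # centered predictors
    resid = [yi - ybar for yi in y]  # residuals while every b_j = 0
    b = [0.0] * p
    for _ in range(max_iter):
        max_step = 0.0
        for j, col in enumerate(cols):
            sq = sum(c * c for c in col) / n
            if sq == 0.0:
                continue  # constant column: leave b_j at 0
            # Correlation of x_j with the partial residual that excludes x_j's own term.
            rho = sum(c * r for c, r in zip(col, resid)) / n + sq * b[j]
            new = soft_threshold(rho, lam * alpha * w[j]) / (sq + lam * (1 - alpha) * w[j])
            step = new - b[j]
            if step != 0.0:
                resid = [r - step * c for r, c in zip(resid, col)]
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    b0 = ybar - sum(bj * xj for bj, xj in zip(b, xbar))
    return b0, b


def adaptive_lasso_weights(initial_b, gamma=1.0, eps=1e-12):
    """Weights 1 / |b_init_j|^gamma; eps guards against division by zero."""
    return [1.0 / max(abs(bj), eps) ** gamma for bj in initial_b]


# Usage with synthetic data (not from the study): y = 1 + 2x exactly.
X = [[float(x)] for x in range(5)]
y = [1.0 + 2.0 * x for x in range(5)]
print(penalized_linear_regression(X, y, lam=0.0))    # (1.0, [2.0]): no penalty gives least squares
print(penalized_linear_regression(X, y, lam=100.0))  # (5.0, [0.0]): the penalty zeroes the slope
```

Solving all three variants with one routine makes the comparison in the table easier to read. Only alpha and the weights change between LR1, LR2, and LR3.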