Lei Qin, Qiang Sun, Yidan Wang, Ke-Fei Wu, Mingchih Chen, Ben-Chang Shia, Szu-Yuan Wu.
Abstract
Predicting the number of new suspected or confirmed cases of novel coronavirus disease 2019 (COVID-19) is crucial for preventing and controlling the COVID-19 outbreak. Social media search indexes (SMSI) for dry cough, fever, chest distress, coronavirus, and pneumonia were collected from 31 December 2019 to 9 February 2020, and data on new suspected COVID-19 cases were collected from 20 January 2020 to 9 February 2020. We used the lagged series of SMSI to predict the number of new suspected COVID-19 cases during this period. To avoid overfitting, five methods (subset selection, forward selection, lasso regression, ridge regression, and elastic net) were used to estimate the coefficients. We selected the optimal method to predict new suspected COVID-19 case numbers from 20 January 2020 to 9 February 2020, and further validated it on new confirmed COVID-19 cases from 31 December 2019 to 17 February 2020. The number of new suspected COVID-19 cases correlated significantly with the lagged series of SMSI, and SMSI could be detected 6-9 days earlier than new suspected cases. The optimal method was subset selection, which had the lowest estimation error and a moderate number of predictors. After validation, the subset selection method also correlated significantly with new confirmed COVID-19 cases, and SMSI on lag day 10 were significantly correlated with new confirmed cases. SMSI could therefore be a significant and effective early predictor of the number of COVID-19 infections, enabling government health departments to locate potential and high-risk outbreak areas.
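Forward selection, one of the five estimators named above, adds predictors one at a time. The block below is a minimal pure-Python sketch, not the authors' implementation: it assumes ordinary least squares with an intercept and uses the training residual sum of squares (RSS) as the selection criterion, whereas the abstract does not state the criterion or stopping rule actually used. The function names `ols_rss` and `forward_select` and the toy series are hypothetical; in this setting the candidates would be lagged keyword series (for example, a fever index shifted by five days).

```python
from typing import Dict, List, Sequence


def ols_rss(columns: List[Sequence[float]], y: Sequence[float]) -> float:
    """Fit y on an intercept plus `columns` by ordinary least squares; return the RSS.

    Solves the normal equations (X^T X) beta = X^T y by Gaussian elimination
    with partial pivoting; returns inf if the design is numerically singular.
    """
    n = len(y)
    # Design matrix: a leading 1.0 (intercept) followed by one value per column.
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y], one row per coefficient.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(p)]
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        if abs(A[k][k]) < 1e-12:
            return float("inf")  # collinear candidate: treat as unusable
        for r in range(k + 1, p):
            factor = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= factor * A[k][c]
    # Back substitution for the coefficients.
    beta = [0.0] * p
    for k in range(p - 1, -1, -1):
        beta[k] = (A[k][p] - sum(A[k][j] * beta[j] for j in range(k + 1, p))) / A[k][k]
    return sum((y[i] - sum(X[i][j] * beta[j] for j in range(p))) ** 2 for i in range(n))


def forward_select(candidates: Dict[str, Sequence[float]], y: Sequence[float],
                   max_predictors: int) -> List[str]:
    """Greedily add the candidate that most reduces training RSS (assumed criterion)."""
    selected: List[str] = []
    remaining = dict(candidates)
    while remaining and len(selected) < max_predictors:
        best = min(remaining, key=lambda name: ols_rss(
            [candidates[s] for s in selected] + [remaining[name]], y))
        selected.append(best)
        del remaining[best]
    return selected


if __name__ == "__main__":
    toy = {"a": [1, 2, 3, 4, 5, 6], "b": [2, 1, 4, 3, 6, 5], "c": [3, 1, 4, 1, 5, 9]}
    print(forward_select(toy, [8, 7, 18, 17, 28, 27], max_predictors=2))
```

Best-subset selection, which the abstract reports as the optimal method, differs in that it scores every subset of a given size rather than growing one subset greedily, so its cost grows combinatorially with the number of lagged predictors.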
Keywords: COVID-19; new case; outbreak; predictor; social media
Year: 2020 PMID: 32244425 PMCID: PMC7177617 DOI: 10.3390/ijerph17072365
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Table 1. Correlation between the number of new suspected coronavirus disease 2019 (COVID-19) cases and the lagged values of five keywords in the Baidu search index (BSI).
| Variables | Dry Cough | Fever | Chest Distress | Coronavirus | Pneumonia |
|---|---|---|---|---|---|
| Lag 1 Day | −0.1070 (0.6445) | 0.3586 (0.1105) | 0.6493 (0.0014) | −0.2094 (0.3623) | 0.1922 (0.4039) |
| Lag 2 Day | 0.1488 (0.5198) | 0.5650 (0.0076) | 0.7468 (0.0001) | 0.0626 (0.7876) | 0.4111 (0.0641) |
| Lag 3 Day | 0.4183 (0.0591) | 0.7856 (<0.0001) | 0.8590 (<0.0001) | 0.3828 (0.0868) | 0.6517 (0.0014) |
| Lag 4 Day | 0.5868 (0.0052) | 0.8596 (<0.0001) | 0.9007 (<0.0001) | 0.5847 (0.0054) | 0.7824 (<0.0001) |
| Lag 5 Day | 0.6920 (0.0005) | 0.9147 (<0.0001) | 0.9175 (<0.0001) | 0.7352 (0.0001) | 0.8813 (<0.0001) |
| Lag 6 Day | 0.7779 (<0.0001) | 0.9124 (<0.0001) | 0.8920 (<0.0001) | 0.7831 (<0.0001) | 0.9030 (<0.0001) |
| Lag 7 Day | 0.8288 (<0.0001) | 0.8896 (<0.0001) | 0.8396 (<0.0001) | 0.8301 (<0.0001) | 0.8886 (<0.0001) |
| Lag 8 Day | 0.8418 (<0.0001) | 0.8361 (<0.0001) | 0.7766 (<0.0001) | 0.8795 (<0.0001) | 0.8832 (<0.0001) |
| Lag 9 Day | 0.7758 (<0.0001) | 0.7381 (0.0001) | 0.6935 (0.0005) | 0.8325 (<0.0001) | 0.8130 (<0.0001) |
| Lag 10 Day | 0.7077 (0.0003) | 0.6647 (0.0010) | 0.6044 (0.0037) | 0.7732 (<0.0001) | 0.7306 (0.0002) |
Table 1 reports the correlation between the current series of new suspected case numbers and the lagged series of the five Baidu indices, i.e., $\operatorname{Corr}(y_t, s_{t-l})$, where $y_t$ is the number of new suspected cases on day $t$, $l$ is the lag, and $s$ is the daily time series of the Baidu Index.
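The lagged correlation pairs each day's case count $y_t$ with the search index $s_{t-l}$ observed $l$ days earlier. A minimal pure-Python sketch follows; it assumes both series are daily and aligned so that the same position refers to the same calendar day, and it simply drops the first $l$ case days that have no earlier index value. The names `pearson` and `lagged_corr` are hypothetical, and the paper may have aligned its series differently (for instance, by using index data that start before the first case day).

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_corr(cases: Sequence[float], index: Sequence[float], lag: int) -> float:
    """Corr(y_t, s_{t-lag}) over the days on which both values exist.

    Assumption: position i is the same calendar day in `cases` and `index`;
    the first `lag` case days have no matching earlier index value and are dropped.
    """
    if len(cases) != len(index):
        raise ValueError("series must have equal length")
    if not 1 <= lag < len(cases):
        raise ValueError("lag must satisfy 1 <= lag < len(cases)")
    return pearson(cases[lag:], index[:-lag])
```

Calling `lagged_corr` for lags 1 through 10 on one keyword series would produce one column of coefficients in the layout of Table 1. If SciPy is available, `scipy.stats.pearsonr(x, y)` returns both the coefficient and a two-sided p-value for the same aligned slices.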
Figure 1. New suspected cases of COVID-19 and lag days of dry cough, fever, and chest distress.
Figure 2. New suspected cases of COVID-19 and lag days of coronavirus and pneumonia.
Table 2. Comparison of the five estimation methods.
| Method | RMSE | MAE | MAPE | Correlation | Correlation of Increment | Number of Predictors |
|---|---|---|---|---|---|---|
| Subset Selection | 51.6671 | 34.0739 | 0.0107 | 0.9996 | 0.9963 | 10 |
| Forward Selection | 70.0168 | 39.9790 | 0.0113 | 0.9993 | 0.9913 | 15 |
| Ridge Regression | 415.2922 | 279.6788 | 0.0827 | 0.9741 | 0.6937 | 51 |
| Lasso Regression | 519.7440 | 358.0979 | 0.1032 | 0.9597 | 0.4858 | 9 |
| Elastic Net (alpha = 0.2) | 527.4250 | 360.9563 | 0.1085 | 0.9585 | 0.4831 | 24 |
| Elastic Net (alpha = 0.4) | 516.1075 | 347.5939 | 0.1041 | 0.9602 | 0.5037 | 18 |
| Elastic Net (alpha = 0.6) | 514.7714 | 347.7290 | 0.1036 | 0.9604 | 0.4906 | 14 |
| Elastic Net (alpha = 0.8) | 510.1201 | 348.5859 | 0.1033 | 0.9611 | 0.5023 | 11 |
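The comparison table scores each method by RMSE, MAE, MAPE, the correlation between actual and predicted counts, and a "correlation of increment". The sketch below computes these under stated assumptions: MAPE is returned as a fraction rather than a percentage (the tabulated values such as 0.0107 suggest this, but the source does not define it), and "correlation of increment" is assumed to mean the correlation between day-over-day changes of the actual and predicted series. The function name `evaluate` and the toy numbers are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def evaluate(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Error and agreement metrics for one fitted series."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Assumption: MAPE as a fraction; requires every actual value to be nonzero.
    mape = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    # Assumption: "correlation of increment" = correlation of first differences.
    d_actual = [b - a for a, b in zip(actual, actual[1:])]
    d_pred = [b - a for a, b in zip(predicted, predicted[1:])]
    return {
        "rmse": rmse,
        "mae": mae,
        "mape": mape,
        "corr": pearson(actual, predicted),
        "corr_increment": pearson(d_actual, d_pred),
    }
```

A level correlation close to 1 can coexist with a much weaker increment correlation, because two series that share an upward trend correlate strongly even when their daily changes disagree; this is one reason to report both.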
Figure 3. The prediction by subset selection and the error term.
Table 3. Correlation between the number of new confirmed cases and the lagged time series of five Baidu indices.
| Variables | Dry Cough | Fever | Chest Distress | Coronavirus | Pneumonia |
|---|---|---|---|---|---|
| Lag 1 Day | −0.2444 (0.1930) | −0.1588 (0.4020) | 0.0852 (0.6544) | −0.3125 (0.0927) | −0.2046 (0.2781) |
| Lag 2 Day | −0.1130 (0.5523) | −0.0186 (0.9221) | 0.1971 (0.2964) | −0.1861 (0.3248) | −0.0720 (0.7055) |
| Lag 3 Day | −0.0235 (0.9017) | 0.0479 (0.8014) | 0.2392 (0.2030) | −0.0968 (0.6108) | 0.0276 (0.8849) |
| Lag 4 Day | 0.0257 (0.8929) | 0.1169 (0.5386) | 0.2954 (0.1130) | 0.0144 (0.9397) | 0.1360 (0.4737) |
| Lag 5 Day | 0.1299 (0.4938) | 0.2169 (0.2496) | 0.3900 (0.0331) | 0.1134 (0.5506) | 0.2269 (0.2279) |
| Lag 6 Day | 0.1659 (0.3809) | 0.2663 (0.1549) | 0.3895 (0.0334) | 0.1863 (0.3243) | 0.2861 (0.1253) |
| Lag 7 Day | 0.2190 (0.2449) | 0.3271 (0.0776) | 0.4128 (0.0234) | 0.2442 (0.1934) | 0.3368 (0.0688) |
| Lag 8 Day | 0.2729 (0.1446) | 0.3757 (0.0407) | 0.4440 (0.0140) | 0.2891 (0.1213) | 0.3621 (0.0493) |
| Lag 9 Day | 0.3422 (0.0641) | 0.4381 (0.0155) | 0.4879 (0.0062) | 0.3461 (0.0610) | 0.4061 (0.0260) |
| Lag 10 Day | 0.3823 (0.0371) | 0.4666 (0.0093) | 0.4998 (0.0049) | 0.3843 (0.0360) | 0.4363 (0.0159) |
Table 3 shows the correlation between the current series of new confirmed case numbers and the lagged series of the five Baidu indices, i.e., $\operatorname{Corr}(y_t, s_{t-l})$, where $y_t$ is the number of new confirmed cases on day $t$, $l$ is the lag, and $s$ is the daily time series of the Baidu Index.
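To read off the lag at which each keyword correlates most strongly with new confirmed cases, the coefficients of the table for new confirmed cases (lags 1 to 10, copied from the source) can be scanned directly. The names `TABLE3` and `peak_lag` are illustrative, not from the paper.

```python
from typing import Dict, List

# Correlation coefficients for new confirmed cases, lags 1..10 (copied from Table 3).
TABLE3: Dict[str, List[float]] = {
    "Dry Cough": [-0.2444, -0.1130, -0.0235, 0.0257, 0.1299, 0.1659, 0.2190, 0.2729, 0.3422, 0.3823],
    "Fever": [-0.1588, -0.0186, 0.0479, 0.1169, 0.2169, 0.2663, 0.3271, 0.3757, 0.4381, 0.4666],
    "Chest Distress": [0.0852, 0.1971, 0.2392, 0.2954, 0.3900, 0.3895, 0.4128, 0.4440, 0.4879, 0.4998],
    "Coronavirus": [-0.3125, -0.1861, -0.0968, 0.0144, 0.1134, 0.1863, 0.2442, 0.2891, 0.3461, 0.3843],
    "Pneumonia": [-0.2046, -0.0720, 0.0276, 0.1360, 0.2269, 0.2861, 0.3368, 0.3621, 0.4061, 0.4363],
}


def peak_lag(correlations: List[float]) -> int:
    """Return the 1-based lag whose correlation is largest."""
    best = max(range(len(correlations)), key=lambda i: correlations[i])
    return best + 1


if __name__ == "__main__":
    for keyword, values in TABLE3.items():
        print(keyword, peak_lag(values))
```

All five keywords reach their maximum at lag 10, which matches the abstract's statement about lag day 10. Because lag 10 is also the largest lag tabulated, the table alone cannot show whether the correlation would keep rising at longer lags.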
Table A1. Correlation between the number of new confirmed cases and the lagged time series of five non-specific COVID-19 features.
| Variables | Angina Pectoris | Difficulty Urinating | Impotence | Urinary Incontinence | Dizziness |
|---|---|---|---|---|---|
| Lag 1 Day | 0.3243 (0.1515) | 0.7197 (0.2382) | 0.7327 (0.1137) | 0.2646 (0.2464) | 0.8089 (0.4781) |
| Lag 2 Day | 0.1428 (0.5368) | 0.6323 (0.7821) | 0.6309 (0.4522) | 0.0359 (0.8772) | 0.8702 (0.1603) |
| Lag 3 Day | 0.0086 (0.9705) | 0.5699 (0.6870) | 0.6210 (0.1927) | −0.0479 (0.8367) | 0.9599 (0.9775) |
| Lag 4 Day | −0.2584 (0.2581) | 0.3913 (0.0794) | 0.5375 (0.1120) | −0.3196 (0.1578) | 0.9445 (0.8720) |
| Lag 5 Day | −0.4884 (0.0747) | 0.2344 (0.3065) | 0.3950 (0.5764) | −0.4854 (0.4257) | 0.9082 (0.0861) |
| Lag 6 Day | −0.5826 (0.1156) | 0.1215 (0.5998) | 0.3021 (0.1833) | −0.6054 (0.1136) | 0.8637 (0.1561) |
| Lag 7 Day | −0.6768 (0.7438) | −0.0797 (0.7313) | 0.2362 (0.3026) | −0.7190 (0.9922) | 0.8054 (0.4460) |
| Lag 8 Day | −0.7272 (0.0965) | −0.1196 (0.6055) | 0.1444 (0.5322) | −0.7358 (0.3351) | 0.7309 (0.1172) |
| Lag 9 Day | −0.6612 (0.9211) | −0.3142 (0.1654) | −0.0412 (0.8594) | −0.7723 (0.9945) | 0.6429 (0.1779) |
| Lag 10 Day | −0.6386 (0.6418) | −0.2417 (0.2912) | −0.0971 (0.6754) | −0.6962 (0.6625) | 0.5584 (0.2485) |
Table A1 shows the correlation between the current series of new confirmed case numbers and the lagged series of the Baidu indices for the five non-specific features, i.e., $\operatorname{Corr}(y_t, s_{t-l})$, where $y_t$ is the number of new confirmed cases on day $t$, $l$ is the lag, and $s$ is the daily time series of the Baidu Index.
Figure 4. New confirmed COVID-19 cases and lag days of dry cough, fever, and chest distress.
Figure 5. New confirmed COVID-19 cases and lag days of coronavirus and pneumonia.