Ligui Wang, Mengxuan Lin, Jiaojiao Wang, Hui Chen, Mingjuan Yang, Shaofu Qiu, Tao Zheng, Zhenjun Li, Hongbin Song.
Abstract
Numerous studies have proposed search engine-based estimation of COVID-19 prevalence during the COVID-19 pandemic; however, their estimation models do not consider the impact of urban socioeconomic indicators (USIs). This study quantitatively analysed the impact of USIs on search engine-based estimation of COVID-19 prevalence using 15 USIs (including total population, gross regional product (GRP), and population density) from 369 cities in China. The results suggested that 13 USIs affected either the correlation (SC-corr) or the time lag (SC-lag) between search engine query volume and new COVID-19 cases (p < 0.05). Total population and GRP had the largest impact on SC-corr, with correlation coefficients r of 0.65 and 0.59, respectively. Total population, GRP per capita, and the proportion of the population with a high school diploma or higher had simultaneous positive impacts on SC-corr and SC-lag (p < 0.05); these three indicators explained 37–50% of the total variation in SC-corr and SC-lag. Estimations for different urban agglomerations revealed that the goodness of fit, R², for search engine-based estimation exceeded 0.6 only when total urban population, GRP per capita, and the proportion of the population with a high school diploma or higher exceeded 11.08 million, 120,700, and 38.13%, respectively. A greater urban size indicated higher accuracy of search engine-based estimation of COVID-19 prevalence. Therefore, the accuracy and time lag of search engine-based estimation of infectious disease prevalence can be improved only when the total urban population, GRP per capita, and the proportion of the population with a high school diploma or higher are greater than the aforementioned thresholds.
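To make the two quantities in the abstract concrete, the sketch below computes a correlation and a time lag between a daily search-volume series and a daily new-case series. It is an illustration under stated assumptions, not the authors' procedure: it assumes Pearson correlation, assumes that the lag is the shift (in days) by which search volume leads cases, and uses a hypothetical function name `lagged_correlation` and a hypothetical search window `max_lag`.

```python
import numpy as np

def lagged_correlation(search, cases, max_lag=14):
    """Return (best_r, best_lag) over lags 0..max_lag.

    For each lag, search[t] is paired with cases[t + lag], i.e. search
    volume is assumed to lead new cases by `lag` days. Both series must
    have the same length. Pearson correlation is an assumption here.
    """
    search = np.asarray(search, dtype=float)
    cases = np.asarray(cases, dtype=float)
    if search.shape != cases.shape:
        raise ValueError("series must have the same length")

    best_r, best_lag = -np.inf, 0
    for lag in range(max_lag + 1):
        s = search[: len(search) - lag]   # drop the last `lag` days
        c = cases[lag:]                   # drop the first `lag` days
        r = np.corrcoef(s, c)[0, 1]
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag

# Synthetic example: cases repeat the search signal three days later.
rng = np.random.default_rng(0)
base = rng.normal(size=60)
search_demo = base[5:55]
cases_demo = base[2:52]
best_r_demo, best_lag_demo = lagged_correlation(search_demo, cases_demo)
```

Under these assumptions, the returned pair plays the role of (SC-corr, SC-lag) for one city; the synthetic series above is constructed so that the recovered lag is three days.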
Keywords: Effectiveness evaluation; Search engine-based estimation; Urban socioeconomic indicators
Year: 2022 PMID: 35475256 PMCID: PMC9020494 DOI: 10.1016/j.idm.2022.04.003
Source DB: PubMed Journal: Infect Dis Model ISSN: 2468-0427
Fig. 1. Baidu index for the five keywords with the maximum correlation coefficient and new COVID-19 cases in 2020. Because the Baidu index and confirmed cases differ widely in unit and order of magnitude, both were standardised to a scale ranging from 0 to 100 for an intuitive comparison.
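The caption states only that both series were put on a 0 to 100 scale; it does not say how. One common way to do this is min-max scaling, sketched below as an assumption. The function name `scale_0_100` is hypothetical.

```python
import numpy as np

def scale_0_100(values):
    """Min-max scale a series to the range [0, 100].

    Assumption: min-max scaling; the source does not name the method.
    A constant series has no spread, so it is mapped to all zeros.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.zeros_like(values)
    return 100.0 * (values - lo) / (hi - lo)

# Two series on very different scales become directly comparable.
index_scaled = scale_0_100([1200, 4800, 3000])
cases_scaled = scale_0_100([3, 15, 9])
```

Because Pearson correlation is unchanged by a positive linear rescaling, this step would affect only how the curves look when plotted together, not the correlation between them.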
Fig. 2. Correlation and time lag between the Baidu index and new cases in cities heavily affected by COVID-19 in China. A) SC-corr distribution. B) SC-lag distribution.
Fig. 3. Impact of USIs on correlation (SC-corr) and time lag (SC-lag). A) Impact of USIs on SC-corr. The solid red line indicates the linear fitting results, and the dashed red line indicates the dividing line between high correlation and medium correlation. B) Impact of USIs on SC-lag. The solid blue line indicates the linear fitting results, and the dashed blue line indicates the dividing line between high hysteresis and medium hysteresis. C) Impact of USIs on SC-corr and SC-lag. The yellow surface indicates the fitting results, and the grey area indicates the 95% confidence band.
Fitted model for USIs and SC-corr as well as SC-lag.
| Indicators | Total population (million) | Population density (relative value) | GRP (billion) | Proportion of primary sector in GRP (%) | Proportion of secondary sector in GRP (%) | Proportion of tertiary sector in GRP (%) | GRP per capita (thousand) | Public budget revenue (relative value) | Public budget expenditure (relative value) | Education expenditure (relative value) | Science and technology expenditure (relative value) | Rate of natural increase (%) | Proportion of population with a high school diploma or higher (%) | Urbanisation rate (%) | Proportion of population aged 0–39 years (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Strong correlation (critical value: | |||||||||||||||
| Threshold | 14.01 | 3.271 | 1593 | NA | NA | 62.56 | 143.5 | 7.28 | 7.48 | 6.63 | 6.13 | NA | 40.87 | 82.70 | NA |
| Threshold city | Guangzhou (14.90) | Guangzhou (3.302) | Tianjin (1881) | NA | NA | Hangzhou (63.90) | Guangzhou (155.4) | Chongqing (7.36) | Shenzhen (7.63) | Guangzhou (6.64) | Guangzhou (6.21) | NA | Shenzhen (41.55) | Tianjin (83.15) | NA |
| r | 0.65 | 0.41 | 0.59 | −0.27 | −0.36 | 0.49 | 0.39 | 0.50 | 0.47 | 0.50 | 0.47 | 0.09 | 0.45 | 0.45 | 0.29 |
| Fitting R² | 0.4195 | 0.1663 | 0.3484 | 0.1435 | 0.1070 | 0.2209 | 0.1490 | 0.2279 | 0.2028 | 0.2320 | 0.2049 | 0.0082 | 0.1997 | 0.2018 | 0.0835 |
| Fitted p | <0.0001 | <0.05 | <0.0001 | Not significant | Not significant | <0.05 | <0.05 | <0.05 | <0.05 | <0.05 | <0.05 | Not significant | <0.05 | <0.05 | Not significant |
| Strong hysteresis (critical value: | |||||||||||||||
| Threshold | NA | 3.239 | 1984 | NA | NA | NA | 127.7 | NA | NA | NA | NA | NA | 37.71 | 79.9 | 64.74 |
| Threshold city | NA | Guangzhou (3.302) | Chongqing (2036) | NA | NA | NA | Ningbo (132.6) | NA | NA | NA | NA | NA | Tianjin (38.13) | Wuhan (80.04) | Guangzhou (65.01) |
| r | 0.17 | 0.33 | 0.32 | −0.21 | −0.08 | 0.17 | 0.39 | 0.07 | 0.03 | 0.02 | 0.10 | 0.30 | 0.45 | 0.40 | 0.32 |
| Fitting R² | 0.0277 | 0.1103 | 0.1020 | 0.0229 | 0.0178 | 0.0073 | 0.1556 | 0.0097 | 0.0042 | 0.0007 | 0.0121 | 0.0873 | 0.2018 | 0.1563 | 0.1044 |
| Fitted p | Not significant | <0.05 | <0.05 | Not significant | Not significant | Not significant | <0.05 | Not significant | Not significant | Not significant | Not significant | Not significant | <0.05 | <0.05 | <0.05 |
| Strong correlation and strong hysteresis | |||||||||||||||
| Threshold | Guangzhou (14.90) | NA | NA | NA | NA | NA | Guangzhou (155.4) | NA | NA | NA | NA | NA | Xi'an (42.67) | NA | NA |
| Fitting R² | 0.50 | 0.17 | 0.26 | 0.09 | 0.08 | 0.21 | 0.45 | 0.22 | 0.20 | 0.23 | 0.19 | 0.03 | 0.37 | 0.27 | 0.09 |
| Fitted p | <0.05 | Not significant | Not significant | Not significant | Not significant | Not significant | <0.05 | Not significant | Not significant | Not significant | Not significant | Not significant | <0.05 | Not significant | Not significant |
Fig. 4. Performance of search engine-based estimation of COVID-19 prevalence in cities of different levels. A) Proportions of USIs from cities of different levels. The medians of the 15 USIs for Level Ⅰ–Ⅴ cities were used to limit the influence of outliers and allow an objective comparison. B) Estimation results for Level Ⅴ cities. C) Estimation results for Level Ⅳ cities. D) Estimation results for Level Ⅲ cities. E) Estimation results for Level Ⅱ cities. F) Estimation results for Level Ⅰ cities.
Size range of urban agglomerations of different levels and model performance during the actual estimation.
| Model performance | Level Ⅰ cities | Level Ⅱ cities | Level Ⅲ cities | Level Ⅳ cities | Level Ⅴ cities |
|---|---|---|---|---|---|
| Classification criteria | 1.9 < | 2.8 < | 3.7 < | ||
| Total population (million) | 0.97–10.01 | 1.07–9.52 | 8.15–30.75 | 11.08–15.57 | 14.90–24.18 |
| GRP per capita (thousand) | 21.6–79.5 | 59.2–132.6 | 65.9–140.2 | 120.7–135.1 | 135.0–190.0 |
| Proportion of population with a high school diploma or higher (%) | 9.90–27.98 | 19.74–34.67 | 21.70–42.67 | 38.13–46.97 | 41.55–52.72 |
| Lasso regression | 0.28 | 0.36 | 0.27 | 0.16 | 0.1 |
| p | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| Adjusted R² | 0.460 | 0.493 | 0.508 | 0.601 | 0.711 |
| RMSE | 0.575 | 0.527 | 0.346 | 0.290 | 0.305 |
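The last two rows of the table report an adjusted goodness of fit and the RMSE for each city level. For readers unfamiliar with these metrics, the sketch below shows their standard definitions. It is generic: the function names are hypothetical, and the number of predictors (`n_predictors`) used by the authors is not stated in this record, so it is left as a parameter.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and estimated values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def adjusted_r2(y_true, y_pred, n_predictors):
    """Adjusted R^2: R^2 penalised for the number of predictors.

    Uses the standard formula 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    where n is the sample size and p is `n_predictors`.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more samples than predictors + 1")
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1))

# Toy data, not taken from the study.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
estimated = [1.1, 1.9, 3.2, 3.8, 5.0]
demo_rmse = rmse(observed, estimated)
demo_adj_r2 = adjusted_r2(observed, estimated, n_predictors=1)
```

Read against the table, a higher adjusted value together with a lower RMSE indicates a better fit between the estimated and the observed series.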