Yifei Han1, Jinliang Huang2, Rendong Li2, Qihui Shao1, Dongfeng Han1, Xiyue Luo3, Juan Qiu4.
Abstract
As a highly contagious disease, COVID-19 caused a worldwide pandemic that is still ongoing. In China, however, the infection has been successfully controlled, although its initial transmission was also nationwide and caused a serious public health crisis. The early-stage transmission of COVID-19 in China is worth investigating because it can guide prevention in other countries and regions. In this study, we conducted experiments from the perspectives of COVID-19 occurrence and intensity. We eliminated unimportant factors from 113 variables and applied four machine learning-based classification and regression models to predict COVID-19 occurrence and intensity, respectively. The influence of each important factor was analysed where applicable. Our optimal model for COVID-19 occurrence prediction achieved an accuracy of 91.91%, and the best R² for intensity prediction reached 0.778. The linear regression-based model was unable to fit and predict the intensity, and thus only the variable influence on COVID-19 occurrence can be explained. We found that (1) COVID-19 was more likely to occur in prosperous cities closer to the epicentre and located at higher altitudes; (2) occurrence was higher under extreme weather and high minimum relative humidity; (3) most air pollutants increased the risk of COVID-19 occurrence, except NO2 and O3, and there was a lag effect of 6-7 days; and (4) NPIs (non-pharmaceutical interventions) did not show an apparent effect until two weeks later.
Keywords: Air pollutants; COVID-19; Machine learning; Meteorology; Non-pharmaceutical interventions; Social data
Year: 2022 PMID: 35065932 PMCID: PMC8776626 DOI: 10.1016/j.envres.2022.112761
Source DB: PubMed Journal: Environ Res ISSN: 0013-9351 Impact factor: 8.431
Fig. 1. Overall workflow on COVID-19 occurrence classification.
Fig. 2. Overall workflow on COVID-19 intensity estimation.
The last 50 COV_O influence factors calculated by GBDT and RF.
| GBDT factor | GBDT influence | GBDT factor | GBDT influence | RF factor | RF influence | RF factor | RF influence |
|---|---|---|---|---|---|---|---|
| Reslevel17 | <0.0001 | AQI_2 | <0.0001 | Popmob1 | 0.0011 | SO2 | 0.0026 |
| Reslevel6 | <0.0001 | SO2_6 | <0.0001 | Popmob0 | 0.0013 | PM10_8 | 0.0026 |
| Reslevel14 | <0.0001 | CO_1 | 0.0001 | Popmob2 | 0.0015 | AQI_2 | 0.0026 |
| Reslevel7 | <0.0001 | Popmob3 | 0.0001 | Popmob3 | 0.0015 | PM25 | 0.0026 |
| Reslevel8 | <0.0001 | PM10_2 | 0.0001 | CO_1 | 0.0020 | Popmob4 | 0.0026 |
| Reslevel11 | <0.0001 | AQI_3 | 0.0001 | Popmob5 | 0.0020 | AQI_6 | 0.0027 |
| Reslevel9 | <0.0001 | PM25_4 | 0.0001 | Popmob7 | 0.0020 | AQI_3 | 0.0027 |
| AQI_4 | <0.0001 | Reslevel20 | 0.0001 | CO_3 | 0.0021 | SO2_1 | 0.0027 |
| AQI_9 | <0.0001 | CO_4 | 0.0001 | CO_9 | 0.0021 | AQI_9 | 0.0027 |
| Popmob6 | <0.0001 | PM25_6 | 0.0001 | CO | 0.0021 | PM10_7 | 0.0027 |
| Popmob5 | <0.0001 | AQI_1 | 0.0001 | SO2_3 | 0.0022 | Popmob9 | 0.0027 |
| Popmob2 | <0.0001 | PM10 | 0.0001 | CO_6 | 0.0022 | PM10_2 | 0.0027 |
| Reslevel3 | <0.0001 | AQI | 0.0001 | CO_2 | 0.0023 | PM10_3 | 0.0027 |
| Reslevel13 | <0.0001 | PM10_1 | 0.0001 | CO_5 | 0.0023 | AQI_7 | 0.0027 |
| Reslevel5 | <0.0001 | PM10_5 | 0.0001 | SO2_5 | 0.0023 | PM10_6 | 0.0027 |
| Reslevel12 | <0.0001 | SO2 | 0.0001 | SO2_6 | 0.0024 | SO2_9 | 0.0028 |
| Reslevel16 | <0.0001 | CO_9 | 0.0001 | SO2_2 | 0.0024 | AQI_4 | 0.0028 |
| Reslevel10 | <0.0001 | NO2_9 | 0.0001 | SO2_7 | 0.0024 | AQI_1 | 0.0028 |
| CO_8 | <0.0001 | O3_8 | 0.0001 | CO_8 | 0.0024 | PM10_9 | 0.0028 |
| Reslevel4 | <0.0001 | NO2_1 | 0.0001 | SO2_4 | 0.0024 | PM10_1 | 0.0028 |
| Rh | <0.0001 | PM25_8 | 0.0001 | Popmob6 | 0.0025 | Popmob8 | 0.0028 |
| PM10_8 | <0.0001 | Popmob1 | 0.0001 | CO_4 | 0.0025 | PM10_5 | 0.0028 |
| AQI_6 | <0.0001 | AQI_5 | 0.0001 | AQI_8 | 0.0025 | PM25_6 | 0.0029 |
| AQI_8 | <0.0001 | PM25_9 | 0.0001 | CO_7 | 0.0026 | AQI_5 | 0.0029 |
| Popmob8 | <0.0001 | CO_3 | 0.0001 | SO2_8 | 0.0026 | PM10_4 | 0.0029 |
26 cross-selected COV_O influence factors to be deleted.
| AQI_1 | AQI_8 | CO_9 | Popmob1 | SO2 |
| AQI_2 | AQI_9 | PM10_1 | Popmob2 | SO2_6 |
| AQI_3 | CO_1 | PM10_2 | Popmob3 | |
| AQI_4 | CO_3 | PM10_5 | Popmob5 | |
| AQI_5 | CO_4 | PM10_8 | Popmob6 | |
| AQI_6 | CO_8 | PM25_6 | Popmob8 | |
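Comparing this list with the preceding table, the 26 factors appear to be exactly those that fall among the last 50 of both GBDT and RF. The following is a minimal sketch of that cross-selection, assuming each model's influence scores are available as a name-to-score mapping; the function names and the toy scores are hypothetical and are not taken from the study.

```python
def least_influential(importance, k):
    """Return the names of the k factors with the smallest influence score."""
    ranked = sorted(importance, key=importance.get)  # ascending by score
    return set(ranked[:k])


def cross_select(gbdt_importance, rf_importance, k=50):
    """Factors that fall in the bottom k of BOTH models, sorted by name."""
    common = least_influential(gbdt_importance, k) & least_influential(rf_importance, k)
    return sorted(common)


# Hypothetical toy scores (illustration only, not values from the study).
gbdt_toy = {"GDP": 0.30, "DisWH": 0.20, "AQI_1": 0.0001, "Popmob1": 0.0001, "Rh": 0.00005}
rf_toy = {"GDP": 0.25, "DisWH": 0.15, "AQI_1": 0.0028, "Popmob1": 0.0011, "Rh": 0.2}

# With k=3, only factors in the bottom three of both models are kept for deletion.
to_delete = cross_select(gbdt_toy, rf_toy, k=3)
print(to_delete)  # ['AQI_1', 'Popmob1']
```

Requiring agreement between the two models is a conservative choice: a factor that only one model ranks as unimportant (here `Rh` for GBDT, `DisWH` for RF) is retained.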
The last 50 COV_I influence factors calculated by GBDT and RF.
| GBDT factor | GBDT influence | GBDT factor | GBDT influence | RF factor | RF influence | RF factor | RF influence |
|---|---|---|---|---|---|---|---|
| Reslevel6 | <0.0001 | SO2 | 0.0002 | Reslevel12 | 0.0000 | CO_3 | 0.0008 |
| PM25_9 | <0.0001 | Reslevel7 | 0.0002 | Reslevel14 | 0.0000 | SO2_5 | 0.0008 |
| Reslevel20 | <0.0001 | Popmob5 | 0.0002 | Reslevel10 | 0.0001 | SO2_4 | 0.0008 |
| Reslevel5 | <0.0001 | NO2_2 | 0.0002 | Reslevel13 | 0.0001 | SO2_3 | 0.0008 |
| Reslevel13 | <0.0001 | SO2_6 | 0.0002 | Reslevel11 | 0.0001 | CO_6 | 0.0009 |
| SO2_2 | <0.0001 | Popmob7 | 0.0002 | Reslevel9 | 0.0001 | PM25_6 | 0.0009 |
| CO_2 | <0.0001 | PM10_1 | 0.0002 | Reslevel6 | 0.0001 | PM25_1 | 0.0009 |
| CO_5 | <0.0001 | Reslevel11 | 0.0002 | Reslevel15 | 0.0001 | AQI_1 | 0.0009 |
| Reslevel12 | <0.0001 | PM25_6 | 0.0003 | Reslevel8 | 0.0002 | PM25 | 0.0009 |
| CO | <0.0001 | AQI_8 | 0.0003 | Reslevel17 | 0.0002 | SO2_9 | 0.0009 |
| CO_8 | <0.0001 | PM10_9 | 0.0003 | Reslevel7 | 0.0002 | Reslevel20 | 0.0010 |
| Reslevel1 | <0.0001 | PM10 | 0.0003 | Reslevel5 | 0.0002 | CO_9 | 0.0010 |
| SO2_8 | <0.0001 | SO2_4 | 0.0003 | Reslevel18 | 0.0002 | PM25_7 | 0.0010 |
| CO_1 | 0.0001 | AQI_2 | 0.0003 | Popmob8 | 0.0003 | AQI_3 | 0.0010 |
| AQI_3 | 0.0001 | NO2_8 | 0.0003 | Popmob9 | 0.0003 | PM25_2 | 0.0010 |
| NO2_4 | 0.0001 | PM25_8 | 0.0003 | Reslevel16 | 0.0003 | SO2_8 | 0.0010 |
| Reslevel15 | 0.0001 | PM25_2 | 0.0003 | Popmob7 | 0.0004 | SO2_2 | 0.0010 |
| Reslevel10 | 0.0001 | Reslevel18 | 0.0003 | Popmob6 | 0.0004 | NO2_9 | 0.0011 |
| AQI_1 | 0.0001 | NO2_3 | 0.0003 | Popmob5 | 0.0005 | PM25_5 | 0.0011 |
| SO2_3 | 0.0001 | O3_6 | 0.0003 | CO_4 | 0.0006 | PM25_3 | 0.0011 |
| Popmob6 | 0.0002 | SO2_5 | 0.0003 | CO_5 | 0.0006 | AQI_4 | 0.0012 |
| Reslevel0 | 0.0002 | NO2_1 | 0.0003 | SO2_6 | 0.0007 | Reslevel1 | 0.0012 |
| CO_7 | 0.0002 | CO_9 | 0.0004 | AQI_2 | 0.0007 | PM25_4 | 0.0012 |
| CO_6 | 0.0002 | PM25_1 | 0.0004 | SO2_7 | 0.0007 | AQI_6 | 0.0012 |
| NO2_9 | 0.0002 | MaxT | 0.0004 | AQI_5 | 0.0008 | PM10_2 | 0.0013 |
30 cross-selected COV_I influence factors to be deleted.
| AQI_1 | NO2_9 | Popmob7 | Reslevel15 | SO2_2 |
| AQI_2 | PM25_1 | Reslevel1 | Reslevel18 | SO2_3 |
| AQI_3 | PM25_2 | Reslevel10 | Reslevel20 | SO2_4 |
| CO_5 | PM25_6 | Reslevel11 | Reslevel5 | SO2_5 |
| CO_6 | Popmob5 | Reslevel12 | Reslevel6 | SO2_6 |
| CO_9 | Popmob6 | Reslevel13 | Reslevel7 | SO2_8 |
Fig. 3. Confusion matrices of COV_O classification results by Ridge classifier (a), Gradient Boosting Decision Tree (b), Random Forest (c), and 3-layer Artificial Neural Network (d).
Assessment indexes of COV_O classification results by four models.
| Index | Ridge | GBDT | RF | 3-layer ANN |
|---|---|---|---|---|
| Accuracy | 88.02% | 91.91% | 91.54% | 88.40% |
| Precision | 89.28% | 95.01% | 94.60% | 88.95% |
| Recall | 87.96% | 89.44% | 89.12% | 89.18% |
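For readers who want to relate these indexes to the confusion matrices in Fig. 3, the sketch below uses the standard definitions of accuracy, precision, and recall for a binary classifier. The counts are hypothetical, since the excerpt does not reproduce the confusion-matrix cells.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification indexes from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all records classified correctly
        "precision": tp / (tp + fp),     # share of predicted occurrences that were real
        "recall": tp / (tp + fn),        # share of real occurrences that were detected
    }


# Hypothetical counts (illustration only, not the study's confusion matrix).
metrics = classification_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

A model with higher precision than recall, as GBDT and RF show in the table, raises few false alarms but misses a somewhat larger share of true occurrences.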
38 COVID-19 occurrence factors that have high coefficients.
| Factor | Coefficient | Factor | Coefficient |
|---|---|---|---|
| AQI_7 | 0.5675 | PM25_1 | 0.8735 |
| CO_2 | 0.1655 | PM25_2 | 0.1621 |
| CO_5 | −0.3688 | PM25_3 | 0.7964 |
| DEM | −0.1584 | PM25_5 | 0.9568 |
| DisWH | −0.4398 | PM25_7 | −0.7619 |
| GDP | 1.1081 | PD | 0.3344 |
| NO2_1 | −0.8857 | Popmob4 | 0.6645 |
| NO2_4 | −0.1488 | Popmobsum | 0.5297 |
| NO2_5 | −0.1183 | Reslevel1 | 0.7163 |
| NO2_6 | −0.2154 | Reslevel2 | 0.1711 |
| NO2_8 | 0.1998 | Reslevel15 | −0.1343 |
| O3_1 | −0.1249 | Reslevel19 | −0.2020 |
| O3_6 | −0.1668 | Reslevel20 | −0.4461 |
| O3_7 | −0.1183 | MinRh | 0.1623 |
| O3_8 | −0.1001 | SO2_5 | 0.5294 |
| PM10_4 | 0.3215 | SO2_7 | −0.1303 |
| PM10_6 | 0.2028 | MaxT | 0.3439 |
| PM10_7 | −1.0103 | MinT | −0.1128 |
| PM10_9 | −0.2273 | TIME | −1.1932 |
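One way to read such a coefficient table is to rank factors by absolute magnitude and take the sign as the direction of influence. The sketch below does this for five coefficients copied from the table; it assumes the inputs were standardized so that magnitudes are comparable, which the excerpt does not state.

```python
# Five coefficients copied from the table above.
coefficients = {
    "GDP": 1.1081,
    "DisWH": -0.4398,
    "MinRh": 0.1623,
    "TIME": -1.1932,
    "PD": 0.3344,
}

# Rank by absolute magnitude (assumes standardized inputs; see lead-in).
ranked = sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=True)

for name, coef in ranked:
    direction = "raises" if coef > 0 else "lowers"
    print(f"{name:>6}: {coef:+.4f} ({direction} the predicted chance of occurrence)")
```

Under this reading, the negative `DisWH` coefficient corresponds to the finding that cities closer to the epicentre were more likely to report cases, and the positive `GDP` coefficient corresponds to the finding about prosperous cities.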
Fig. 4. Estimated values and residuals of COV_I by Elastic Net (a), Gradient Boosting Decision Tree (b), Random Forest (c), and 2-layer Artificial Neural Network (d).
Assessment metrics of COV_I regression model on training and test set.
| Metric | EN (training set) | EN (test set) | GBDT (training set) | GBDT (test set) | RF (training set) | RF (test set) | 2-layer ANN (training set) | 2-layer ANN (test set) |
|---|---|---|---|---|---|---|---|---|
| MSE | 1.182 | 1.126 | 0.332 | 0.415 | 0.295 | 0.431 | 0.307 | 0.454 |
| R² | 0.340 | 0.397 | 0.815 | 0.778 | 0.835 | 0.770 | 0.829 | 0.757 |
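The two metrics in this table follow the usual definitions of mean squared error and the coefficient of determination. A minimal sketch with hypothetical values (not data from the study) shows how they are computed from observed and estimated intensities.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and estimated values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed and estimated values (illustration only).
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [1.5, 2.0, 2.5, 4.0]

print(f"MSE = {mse(observed, estimated):.3f}")
print(f"R2  = {r2(observed, estimated):.3f}")
```

Comparing training-set and test-set values, as the table does, is a quick check for overfitting: a large drop in R² from training to test would suggest the model memorized the training data.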
| Data type | Variable | Abbreviation | Units | Scale | Data source |
|---|---|---|---|---|---|
| COVID-19 data | Case incidence ratios (cases per 10,000,000 persons) | CIR | 0.00001% | Daily | National Health Commission of the People's Republic of China |
| Meteorological data | Minimum temperature | MinT | 0.1 °C | Daily | http://data.cma.cn |
| | Maximum temperature | MaxT | 0.1 °C | Daily | |
| | Mean temperature | MeanT | 0.1 °C | Daily | |
| | Relative humidity | Rh | % | Daily | |
| | Minimum relative humidity | MinRh | % | Daily | |
| Atmospheric environmental quality data | Carbon monoxide, CO | CO, CO_1, …, CO_9 | mg/m³ | Daily | |
| | Nitrogen dioxide, NO2 | NO2, NO2_1, …, NO2_9 | μg/m³ | Daily | |
| | Ozone, O3 | O3, O3_1, …, O3_9 | μg/m³ | Daily | |
| | Fine particles, PM2.5 | PM25, PM25_1, …, PM25_9 | μg/m³ | Daily | |
| | Inhalable coarse particles, PM10 | PM10, PM10_1, …, PM10_9 | μg/m³ | Daily | |
| | Sulfur dioxide, SO2 | SO2, SO2_1, …, SO2_9 | μg/m³ | Daily | |
| | Air Quality Index | AQI, AQI_1, …, AQI_9 | | Daily | |
| Geographical data | Mean DEM | MeanDEM | m | Daily | http://www.gscloud.cn |
| Socio-economic data | Household population | Pop | 10,000 persons | 2019, Fixed value | http://tjj.shandong.gov.cn/tjnj/nj2020/zk/indexch.htm, etc. |
| | Population density | PD | Person/km² | 2019, Fixed value | http://www.mohurd.gov.cn/xytj/index.html |
| | GDP per capita | GDP | 100 million RMB | Yearly | http://tjj.shandong.gov.cn/tjnj/nj2020/zk/indexch.htm, etc. |
| | Destination migration scale flow from Wuhan = destination proportion in population flow from Wuhan × migration scale | Popmob, Popmob1, …, Popmob9, Popmobsum | | Daily | |
| | The distance of each city from Wuhan | DisWH | km | Fixed value | Distance measurement based on GIS |
| | National emergency response | Reslevel, Reslevel1, …, Reslevel20 | | Daily | National Health Commission of the People's Republic of China |
| Temporal data | Days from Jan 10, 2020 | Time | | Daily | Corresponding to each record |
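The suffixed abbreviations in this table (for example CO_1, …, CO_9) are presumably lagged copies of the daily series, which would be consistent with the 6-7 day lag effect reported in the abstract. The sketch below builds such columns under the assumption that `CO_k` holds the value observed k days earlier; this naming interpretation, the function name, and the sample readings are all hypothetical.

```python
def lagged_features(name, series, max_lag=9):
    """Build one row per day with the current value and its lags.

    `series` is a list of daily values ordered from oldest to newest.
    ASSUMPTION: `<name>_k` means the value k days before the current day.
    Days without enough history get None for the missing lags.
    """
    rows = []
    for day, value in enumerate(series):
        row = {name: value}
        for lag in range(1, max_lag + 1):
            row[f"{name}_{lag}"] = series[day - lag] if day >= lag else None
        rows.append(row)
    return rows


# Hypothetical daily CO readings for one city (mg/m3), oldest first.
daily_co = [0.8, 0.9, 1.1, 1.0]
rows = lagged_features("CO", daily_co, max_lag=2)

print(rows[0])  # first day: no history, so both lags are None
print(rows[3])  # fourth day: CO_1 is the previous day, CO_2 is two days back
```

In practice the rows with missing lags would need to be dropped or imputed before model fitting; which option the authors chose is not stated in this excerpt.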