Suman Chakraborti, Arabinda Maiti, Suvamoy Pramanik, Srikanta Sannigrahi, Francesco Pilla, Anushna Banerjee, Dipendra Nath Das.
Abstract
Coronavirus disease 2019 (COVID-19), a novel severe acute respiratory syndrome, has become a global health concern due to its unpredictable nature and the lack of adequate medicines. Machine Learning (ML) models could be effective in identifying the most critical factors responsible for the overall fatalities caused by COVID-19. The functional capabilities of ML models in epidemiological research, especially for COVID-19, have not been substantially explored. To bridge this gap, this study adopted two advanced ML models, viz. Random Forest (RF) and Gradient Boosted Machine (GBM), to perform regression modelling and provide subsequent interpretation. The analysis followed five successive steps: (1) identification of relevant key explanatory variables; (2) application of data dimensionality reduction to eliminate redundant information; (3) use of ML models to measure the relative influence (RI) of the explanatory variables; (4) evaluation of interconnections between and among the key explanatory variables and COVID-19 case and death counts; (5) time series analysis to examine the incidence rates of COVID-19 cases and deaths. Among the explanatory variables considered in this study, air pollution, migration, economy, and demographic factors were found to be the most significant controlling factors. Because very little research has examined the advantages of ML models for identifying the key determinants of COVID-19, this study could serve as a reference for future public health research. Additionally, all the models and data used in this study are open source and freely available, so the analysis can be readily reproduced and replicated.
Keywords: Air pollution; COVID-19; Machine learning; Pandemic; Relative importance; Socioeconomic
Year: 2020; PMID: 33077215; PMCID: PMC7537593; DOI: 10.1016/j.scitotenv.2020.142723
Source DB: PubMed; Journal: Sci Total Environ; ISSN: 0048-9697; Impact factor: 7.963
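
Step (3) of the abstract, measuring the relative influence (RI) of each explanatory variable, can be illustrated with a small sketch. This is not the authors' RF or GBM pipeline: it swaps in model-agnostic permutation importance (shuffle one variable, measure how much the prediction error grows) and rescales the scores to sum to 100, which is one common way to report RI. The model `toy_model`, the variable names `pm25`, `gdp`, and `wind`, and the data are all synthetic assumptions made up for illustration.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def mse(model: Callable[[Row], float], rows: Sequence[Row], targets: Sequence[float]) -> float:
    """Mean squared error of the model over the given rows."""
    return sum((model(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)


def relative_influence(
    model: Callable[[Row], float],
    rows: Sequence[Row],
    targets: Sequence[float],
    features: Sequence[str],
    n_repeats: int = 20,
    seed: int = 0,
) -> Dict[str, float]:
    """Permutation importance per feature, rescaled so the scores sum to 100."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    raw: Dict[str, float] = {}
    for name in features:
        increases: List[float] = []
        for _ in range(n_repeats):
            # Shuffle one column to break its link with the target.
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            permuted = [{**r, name: v} for r, v in zip(rows, shuffled)]
            increases.append(mse(model, permuted, targets) - baseline)
        # Negative averages carry no information; clip them to zero.
        raw[name] = max(0.0, sum(increases) / n_repeats)
    total = sum(raw.values())
    if total == 0:
        return {name: 0.0 for name in features}
    return {name: 100.0 * value / total for name, value in raw.items()}


# Hypothetical stand-in for a fitted regressor; "wind" is deliberately ignored.
def toy_model(row: Row) -> float:
    return 3.0 * row["pm25"] + 0.5 * row["gdp"]


# Synthetic data only; these are NOT the study's variables or values.
data_rng = random.Random(42)
rows = [
    {"pm25": data_rng.random(), "gdp": data_rng.random(), "wind": data_rng.random()}
    for _ in range(50)
]
targets = [toy_model(r) for r in rows]

ri = relative_influence(toy_model, rows, targets, ["pm25", "gdp", "wind"])
for name, score in sorted(ri.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```

Because `toy_model` weights `pm25` most heavily and never reads `wind`, the ranking should place `pm25` first and give `wind` a score of zero. With a real fitted RF or GBM, the same function could be called by passing the model's prediction function in place of `toy_model`.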
Fig. 1. Spatial distribution of global COVID-19 cases and deaths (per 100,000 persons). The continent-specific daily progression of COVID-19 cases and deaths is shown in the bottom-left corner.
Continent-wise ordinary least squares (OLS) model estimates for COVID-19 cases.
| Source | Value | Standard error | t | Pr > \|t\| | Lower bound (95%) | Upper bound (95%) |
|---|---|---|---|---|---|---|
| Africa | | | | | | |
| Intercept | −975.375 | 1171.583 | −0.833 | 0.410 | −3336.546 | 1385.795 |
| TotalCO2 | 0.138 | 0.008 | 16.807 | <0.0001 | 0.122 | 0.155 |
| CO2PerCap | −1726.210 | 464.354 | −3.717 | 0.001 | −2662.054 | −790.365 |
| PM2.5 | 54.080 | 24.737 | 2.186 | 0.034 | 4.226 | 103.934 |
| GDP | 0.621 | 0.234 | 2.650 | 0.011 | 0.149 | 1.094 |
| R² | 0.915 | | | | | |
| America | | | | | | |
| Intercept | −9799.119 | 13,987.621 | −0.701 | 0.489 | −38,407.016 | 18,808.779 |
| TotalCO2 | 0.183 | 0.034 | 5.425 | <0.0001 | 0.114 | 0.252 |
| NetMig | 0.036 | 0.019 | 1.936 | 0.063 | −0.002 | 0.074 |
| Refuge pop | −0.954 | 0.499 | −1.911 | 0.066 | −1.975 | 0.067 |
| TotPop | 0.003 | 0.000 | 7.162 | <0.0001 | 0.002 | 0.003 |
| R² | 0.968 | | | | | |
| Asia | | | | | | |
| Intercept | −252.500 | 17,538.584 | −0.014 | 0.989 | −36,837.345 | 36,332.346 |
| N2O | −0.010 | 0.073 | −0.131 | 0.897 | −0.162 | 0.143 |
| PM2.5 | 497.056 | 313.876 | 1.584 | 0.129 | −157.678 | 1151.789 |
| GDP.gr | −2795.361 | 2849.423 | −0.981 | 0.338 | −8739.153 | 3148.432 |
| NetMig | −0.067 | 0.014 | −4.732 | 0.000 | −0.096 | −0.037 |
| R² | 0.744 | | | | | |
| Europe | | | | | | |
| Intercept | −308,530.573 | 123,353.089 | −2.501 | 0.016 | −557,132.388 | −59,928.758 |
| LEB | 3862.814 | 1578.644 | 2.447 | 0.018 | 681.266 | 7044.362 |
| TotPop | 0.003 | 0.000 | 11.664 | <0.0001 | 0.002 | 0.003 |
| WS | −1587.322 | 2675.880 | −0.593 | 0.556 | −6980.203 | 3805.560 |
| R² | 0.785 | | | | | |
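
The columns of the table above are linked by standard OLS relationships: the t statistic is the estimate divided by its standard error, and the 95% bounds are symmetric about the estimate with a half-width equal to a critical t value times the standard error. A minimal sketch, using the Africa PM2.5 row as input, checks that the reported figures are mutually consistent. The helper names `t_statistic` and `implied_critical_value` are hypothetical, and the sample size behind the table is not given here, so the critical value is backed out from the reported interval rather than looked up.

```python
def t_statistic(estimate: float, std_error: float) -> float:
    """t value for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


def implied_critical_value(lower: float, upper: float, std_error: float) -> float:
    """Critical t value implied by a symmetric confidence interval."""
    return (upper - lower) / (2.0 * std_error)


# Africa, PM2.5 row of the COVID-19 cases table.
estimate, std_error = 54.080, 24.737
lower, upper = 4.226, 103.934

t_value = t_statistic(estimate, std_error)
midpoint = (lower + upper) / 2.0
t_crit = implied_critical_value(lower, upper, std_error)

print(f"t = {t_value:.3f}")                      # table reports 2.186
print(f"interval midpoint = {midpoint:.3f}")     # should equal the estimate
print(f"implied critical t = {t_crit:.3f}")      # roughly 2.0 for a 95% interval
```

The same check can be run on any other row by substituting its estimate, standard error, and bounds. Small discrepancies are expected where the table rounds to three decimals, for example in rows whose standard error is printed as 0.000.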
Continent-wise ordinary least squares (OLS) model estimates for COVID-19 deaths.
| Source | Value | Standard error | t | Pr > \|t\| | Lower bound (95%) | Upper bound (95%) |
|---|---|---|---|---|---|---|
| Africa | | | | | | |
| Intercept | −68.055 | 100.040 | −0.680 | 0.500 | −270.090 | 133.979 |
| TotalCO2 | 0.003 | 0.000 | 10.417 | <0.0001 | 0.002 | 0.003 |
| Diabetes | 10.583 | 4.577 | 2.312 | 0.026 | 1.338 | 19.827 |
| GDP.gr | 2.737 | 7.317 | 0.374 | 0.710 | −12.040 | 17.514 |
| TropPress | 0.297 | 0.881 | 0.337 | 0.738 | −1.483 | 2.077 |
| R² | 0.786 | | | | | |
| America | | | | | | |
| Intercept | −33,669.459 | 12,314.726 | −2.734 | 0.010 | −58,819.484 | −8519.433 |
| TotGHG | 0.007 | 0.001 | 4.512 | <0.0001 | 0.004 | 0.010 |
| TotalCO2 | 0.010 | 0.001 | 8.716 | <0.0001 | 0.008 | 0.012 |
| TotPop | 0.000 | 0.000 | 3.631 | 0.001 | 0.000 | 0.000 |
| Tmin | 111.479 | 41.986 | 2.655 | 0.013 | 25.733 | 197.225 |
| R² | 0.992 | | | | | |
| Asia | | | | | | |
| Intercept | 1745.681 | 4577.871 | 0.381 | 0.705 | −7547.891 | 11,039.253 |
| TotPop | 0.000 | 0.000 | 3.690 | 0.001 | 0.000 | 0.000 |
| NetMig | 0.001 | 0.000 | 1.951 | 0.059 | 0.000 | 0.002 |
| N2O | −0.012 | 0.007 | −1.663 | 0.105 | −0.027 | 0.003 |
| LEB | −16.218 | 61.790 | −0.262 | 0.794 | −141.659 | 109.223 |
| R² | 0.471 | | | | | |
| Europe | | | | | | |
| Intercept | 110,378.223 | 26,759.609 | 4.125 | 0.000 | 55,265.777 | 165,490.668 |
| AirTran | 0.014 | 0.005 | 2.883 | 0.008 | 0.004 | 0.023 |
| Diabetes | −2557.616 | 514.081 | −4.975 | <0.0001 | −3616.385 | −1498.846 |
| TropPress | −333.409 | 87.589 | −3.807 | 0.001 | −513.801 | −153.017 |
| Preci | −2237.303 | 856.040 | −2.614 | 0.015 | −4000.352 | −474.255 |
| TotPop | 0.000 | 0.000 | 3.612 | 0.001 | 0.000 | 0.001 |
| AgeGroup | −232.852 | 189.825 | −1.227 | 0.231 | −623.803 | 158.100 |
| N2O | −0.566 | 0.223 | −2.539 | 0.018 | −1.026 | −0.107 |
| R² | 0.892 | | | | | |
Fig. 2. Alluvial plot showing the strength of interconnection between the explanatory variables and the predicted COVID-19 cases, derived from the random forest algorithm. Relative influence (RI) values of each variable are shown on the right side of each plot.
Fig. 3. Alluvial plot showing the strength of interconnection between the explanatory variables and the predicted COVID-19 deaths, derived from the random forest algorithm. Relative influence (RI) values of each variable are shown on the right side of each plot.
Fig. 4. Predictive power of the explanatory variables for COVID-19 cases, derived from the random forest algorithm.
Fig. 5. Predictive power of the explanatory variables for COVID-19 deaths, derived from the random forest algorithm.
Fig. 6. A comprehensive global pandemic preparedness path highlighting the strategies that need to be prioritized.