Giovanni Scabbia¹, Antonio Sanfilippo¹, Annamaria Mazzoni¹, Dunia Bachour¹, Daniel Perez-Astudillo¹, Veronica Bermudez¹, Etienne Wey², Mathilde Marchand-Lasserre², Laurent Saboret².
Abstract
A growing number of studies suggest that climate may impact the spread of COVID-19. This hypothesis is supported by data from similar viral contagions, such as SARS and the 1918 Flu Pandemic, and corroborated by US influenza data. However, the extent to which climate may affect COVID-19 transmission rates and help model COVID-19 risk is still not well understood. This study demonstrates that such an understanding is attainable through the development of regression models that verify how climate contributes to modeling COVID-19 transmission, and the use of feature importance techniques that assess the relative weight of meteorological variables compared to epidemiological, socioeconomic, environmental, and global health factors. The ensuing results show that meteorological factors play a key role in regression models of COVID-19 risk, with ultraviolet radiation (UV) as the main driver. These results are corroborated by statistical correlation analyses and a panel data fixed-effect model confirming that UV radiation coefficients are significantly negatively correlated with COVID-19 transmission rates.
Year: 2022 PMID: 36070304 PMCID: PMC9451080 DOI: 10.1371/journal.pone.0273078
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Fig 1. Accuracy of the base model (left, blue) compared to the optimized model (right, orange) on the test set. Only locations with more than 90 data records, and only records with more than 10 cases per day, are considered.
XGBoost hyperparameter tuning results.
| Hyperparameter | Tuning range | Best value |
|---|---|---|
| Learning rate (eta) | 0.001–0.3 | 0.1 |
| Maximum depth | 3–10 | 10 |
| Minimum sum of instance weight (hessian) | 1–10 | 7–8 |
| Gamma | 0–0.4 | 0.2 |
| Subsample ratio of the training instances | 0.5–1 | 1 |
| Subsample ratio of columns when constructing each tree | 0.3–1 | 1 |
| Lambda | 1 | 1 |
| Alpha | 0 | 0 |
| Number of boosting rounds (validated) | ~30 | |
| Learning objective function (regression) | Squared error | |
| Custom evaluation metric (for training and validation) | SMAPE | |
| Early stopping rounds | 10 | |
*Irrelevant to the model performance.
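As a rough illustration of how the tuned values in the table might be expressed in code, the sketch below maps each row onto what appear to be the matching native XGBoost parameter names. The mapping, the variable names `xgb_params`, `tuning_ranges`, and `EARLY_STOPPING_ROUNDS`, and the choice of 7 for the "7–8" entry are assumptions made for illustration; none of this is taken from the authors' code.

```python
# Assumed mapping of the table's "Best value" column to XGBoost parameter names.
xgb_params = {
    "eta": 0.1,               # learning rate
    "max_depth": 10,          # maximum tree depth
    "min_child_weight": 7,    # table reports 7-8; 7 is an arbitrary pick here
    "gamma": 0.2,             # minimum loss reduction needed to split a node
    "subsample": 1.0,         # share of training rows sampled per tree
    "colsample_bytree": 1.0,  # share of columns sampled per tree
    "lambda": 1.0,            # L2 regularization term
    "alpha": 0.0,             # L1 regularization term
    "objective": "reg:squarederror",
}
EARLY_STOPPING_ROUNDS = 10

# Tuning ranges from the table, used only to sanity-check the chosen values.
tuning_ranges = {
    "eta": (0.001, 0.3),
    "max_depth": (3, 10),
    "min_child_weight": (1, 10),
    "gamma": (0.0, 0.4),
    "subsample": (0.5, 1.0),
    "colsample_bytree": (0.3, 1.0),
}

# True for every parameter whose chosen value lies inside its tuning range.
checks = {
    name: low <= xgb_params[name] <= high
    for name, (low, high) in tuning_ranges.items()
}
print(checks)
```

With the `xgboost` package installed, a dictionary like this would typically be passed to `xgboost.train` together with `early_stopping_rounds=EARLY_STOPPING_ROUNDS` and a validation set in `evals`.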
Regression modeling performance comparison.
| Model | Mean SMAPE (train) | Mean SMAPE (validation) | Mean SMAPE (test) |
|---|---|---|---|
| GBRT (with hyperparameter optimization) | 28.3% | 30.6% | |
| GBRT (no optimization) | 33.5% | - | |
| Lasso | 45.7% | 41.9% | |
| Elastic Net | 60.2% | 60.5% | |
| Random Forest | 11.9% | 31.2% | |
| 1-week moving average | - | - | |
| Persistence (previous day) | - | - | |
*With hyperparameter optimization
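The comparison above relies on SMAPE and on two naive baselines (previous-day persistence and a 1-week moving average). The record does not say which SMAPE variant was used, so the sketch below assumes the common form, the mean of 2|y − ŷ| / (|y| + |ŷ|) expressed in percent. The function names and the case counts are made up for illustration.

```python
def smape(actual, predicted):
    """Symmetric MAPE in percent (assumed variant, bounded by 0 and 200)."""
    terms = []
    for y, yhat in zip(actual, predicted):
        denom = abs(y) + abs(yhat)
        # Treat a 0/0 term as a perfect prediction.
        terms.append(0.0 if denom == 0 else 2.0 * abs(y - yhat) / denom)
    return 100.0 * sum(terms) / len(terms)


def persistence_forecast(series):
    """Predict each day with the previous day's value (aligned with series[1:])."""
    return series[:-1]


def moving_average_forecast(series, window=7):
    """Predict each day with the mean of the preceding `window` days
    (aligned with series[window:])."""
    return [sum(series[i - window:i]) / window for i in range(window, len(series))]


# Made-up daily case counts, for illustration only.
cases = [100, 110, 120, 130, 125, 140, 150, 160, 155, 170]

persistence_score = smape(cases[1:], persistence_forecast(cases))
moving_average_score = smape(cases[7:], moving_average_forecast(cases, window=7))
print(persistence_score, moving_average_score)
```

Scoring a learned model against baselines of this kind shows how much it gains over simply extrapolating recent case counts.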
Fig 2. Boxplot of the SMAPE distribution by interval of daily COVID-19 case counts.
Fig 3. Examples of modeling performance for the optimized GBRT model.
Fig 4. Spearman’s coefficient.
Fig 5. Kendall’s coefficient.
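For readers unfamiliar with the two rank correlation coefficients named in Figs 4 and 5, a minimal pure-Python sketch follows. It computes Spearman's coefficient as the Pearson correlation of average ranks, and Kendall's tau in its simplest "tau-a" form. The record does not state which Kendall variant the study used, and the `uv` and `transmission` values are invented for illustration.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) / 2)


# Invented values: higher UV paired with mostly lower transmission.
uv = [1, 2, 3, 4, 5]
transmission = [0.9, 0.7, 0.8, 0.4, 0.2]
rho = spearman(uv, transmission)
tau = kendall_tau_a(uv, transmission)
print(rho, tau)
```

Both coefficients depend only on the ordering of the values, which makes them suited to monotonic but non-linear relationships.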
Fig 6. Feature importance summary plot.
Mean absolute SHAP value (in log scale) of each variable, showing the average magnitude of its impact on the model output.
Fig 7. Feature impact scatter plot.
SHAP value of each variable for every individual observation, as a function of its relative value. The color transition on the vertical axis indicates value strength (red/high to blue/low).
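The mean absolute SHAP value used for the summary plot can be sketched as follows, assuming the per-observation SHAP values are already available as a matrix with one row per observation and one column per feature. The function name `mean_abs_shap` and the SHAP numbers are invented for illustration; only the feature names are borrowed from the tables in this record.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Average |SHAP| per feature, returned as (name, value) pairs
    sorted from most to least important."""
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for k, value in enumerate(row):
            totals[k] += abs(value)
    n = len(shap_rows)
    importance = {name: total / n for name, total in zip(feature_names, totals)}
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


features = ["UV_7", "Temperature_7", "Pressure_7"]
# Made-up SHAP values: one row per observation, one column per feature.
shap_rows = [
    [-0.4, 0.1, 0.0],
    [0.6, -0.1, 0.2],
    [-0.5, 0.2, -0.1],
]

ranking = mean_abs_shap(shap_rows, features)
for name, value in ranking:
    print(f"{name}: {value:.3f}")
```

Taking the absolute value before averaging keeps positive and negative contributions from cancelling out, so the ranking reflects magnitude of impact rather than direction.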
Results of the Fisher-type unit-root test analysis based on augmented Dickey-Fuller tests at 0 and 1 lag for each variable considered in the econometric study.
| Variable | I(0), 0 lags: inverse chi-squared (P) | I(0), 0 lags: inverse normal (Z) | I(0), 0 lags: inverse logit t (L*) | I(0), 0 lags: modified inv. chi-squared (Pm) | I(1), 1 lag: inverse chi-squared (P) | I(1), 1 lag: inverse normal (Z) | I(1), 1 lag: inverse logit t (L*) | I(1), 1 lag: modified inv. chi-squared (Pm) |
|---|---|---|---|---|---|---|---|---|
| Daily cases (log) | 9101.3 | -74.1 | -106.8 | 171.2 | 5631.1 | -53.5 | -67.5 | 101.6 |
| Temperature_7 | 1868.7 | -14.4 | -14.2 | 16.5 | 3157.1 | -33.9 | -35.7 | 47.0 |
| Absolute Humidity_7 | 1912.3 | -15.2 | -14.8 | 17.4 | 3546.0 | -36.7 | -40.4 | 55.6 |
| Pressure_7 | 3112.6 | -29.9 | -31.9 | 43.1 | 8835.9 | -69.5 | -106.1 | 172.4 |
| Wind speed_7 | 4163.5 | -40.5 | -46.5 | 65.6 | 7312.3 | -62.4 | -87.8 | 138.8 |
| Rainfall_7 | 4974.1 | -43.9 | -54.1 | 82.9 | 6797.6 | -57.4 | -80.1 | 127.4 |
| Short-wave irradiation_7 | 2599.7 | -26.5 | -26.9 | 32.2 | 3156.0 | -34.0 | -36.0 | 47.0 |
| PM2P5_7 | 3763.2 | -36.7 | -40.9 | 57.1 | 8004.9 | -65.7 | -96.0 | 154.0 |
| PM10_7 | 3765.8 | -36.2 | -40.9 | 57.1 | 7778.1 | -63.4 | -92.9 | 149.0 |
| UV_7 | 2516.5 | -25.4 | -25.7 | 30.4 | 3184.9 | -33.6 | -36.2 | 47.7 |
| High_stringency_7 | 3333.0 | -40.3 | -50.0 | 73.7 | 2805.7 | -36.5 | -43.6 | 62.9 |
| High_containment_7 | 4123.7 | -45.5 | -59.62 | 90.4 | 3302.7 | -39.9 | -50.2 | 74.4 |
For each test, we report the inverse chi-squared, inverse normal, inverse logit t, and modified inverse chi-squared statistics.
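The four statistics reported in the table combine the p-values of per-location augmented Dickey-Fuller tests. The sketch below uses the formulas commonly given for Fisher-type panel unit-root tests; it is assumed, not confirmed by this record, that the study's software used the same definitions. The function name and the example p-values are invented for illustration.

```python
import math
from statistics import NormalDist


def fisher_type_statistics(p_values):
    """Combine per-panel unit-root p-values (each strictly between 0 and 1)."""
    n = len(p_values)
    # Inverse chi-squared: P = -2 * sum(ln p_i)
    P = -2.0 * sum(math.log(p) for p in p_values)
    # Inverse normal: Z = sum(Phi^-1(p_i)) / sqrt(N)
    Z = sum(NormalDist().inv_cdf(p) for p in p_values) / math.sqrt(n)
    # Inverse logit: L = sum(ln(p_i / (1 - p_i))), scaled to L* = sqrt(k) * L
    L = sum(math.log(p / (1.0 - p)) for p in p_values)
    k = 3.0 * (5 * n + 4) / (math.pi ** 2 * n * (5 * n + 2))
    L_star = math.sqrt(k) * L
    # Modified inverse chi-squared: Pm = (P - 2N) / (2 * sqrt(N))
    Pm = (P - 2.0 * n) / (2.0 * math.sqrt(n))
    return {"P": P, "Z": Z, "L*": L_star, "Pm": Pm}


# Made-up p-values for four locations.
stats = fisher_type_statistics([0.01, 0.03, 0.20, 0.04])
print(stats)
```

Small p-values push P and Pm up and Z and L* down, which matches the sign pattern in the table: large positive P and Pm alongside large negative Z and L*.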
Panel data fixed-effects model.
| Dependent variable: daily cases (log) | (1) T = 5 | (2) T = 7 | (3) T = 10 | (4) T = 12 | (5) T = 14 |
|---|---|---|---|---|---|
| Days_from_start | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 |
| | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) |
| Temperature_T | 0.022 | 0.025 | 0.028 | 0.030 | 0.032 |
| | (0.01) | (0.01) | (0.01) | (0.01) | (0.01) |
| Absolute Humidity_T | -0.011 | -0.012 | -0.013 | -0.015 | -0.016 |
| | (0.02) | (0.02) | (0.02) | (0.02) | (0.02) |
| Pressure_T | -0.004 | -0.004 | -0.005 | -0.005 | -0.005 |
| | (0.00) | (0.00) | (0.01) | (0.01) | (0.01) |
| Windspeed_T | -0.031 | -0.036 | -0.040 | -0.043 | -0.046 |
| | (0.02) | (0.02) | (0.03) | (0.03) | (0.04) |
| Rainfall_T | -0.010 | -0.009 | -0.008 | -0.007 | -0.006 |
| | (0.00) | (0.01) | (0.01) | (0.01) | (0.01) |
| Shortwave Irradiation_T | -0.000 | -0.000 | -0.000 | -0.000 | -0.000 |
| | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) |
| PM 2.5_T | -0.014 | -0.016 | -0.018 | -0.019 | -0.020 |
| | (0.00) | (0.00) | (0.00) | (0.00) | (0.01) |
| PM 10_T | 0.003 | 0.003 | 0.004 | 0.004 | 0.004 |
| | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) |
| UV_T | -0.492 | -0.543 | -0.607 | -0.634 | -0.657 |
| | (0.09) | (0.11) | (0.13) | (0.14) | (0.15) |
| Constant | 7.283 | 7.648 | 8.128 | 8.473 | 8.860 |
| | (3.12) | (3.51) | (4.06) | (4.37) | (4.69) |
| Observations | 65,369 | 65,369 | 65,369 | 65,369 | 65,369 |
| R² | 0.734 | 0.735 | 0.736 | 0.736 | 0.737 |
| Adjusted R² | 0.733 | 0.734 | 0.735 | 0.735 | 0.736 |
The suffix _T denotes a T-day moving average. Standard errors, in parentheses, are clustered at the location (country/region) level; * p < 0.10, ** p < 0.05, *** p < 0.01.
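The core idea of a fixed-effects (within) estimator can be sketched for a single regressor: subtract each location's own mean from the regressor and the outcome, then fit a slope on the demeaned values. This is a deliberately simplified illustration and not the study's specification, which has several regressors, a time trend, and clustered standard errors. The function name, locations, and data are invented.

```python
from collections import defaultdict


def within_slope(locations, x, y):
    """Fixed-effects slope for one regressor via the within transformation."""
    # Accumulate per-location sums of x and y, plus the observation count.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for loc, xi, yi in zip(locations, x, y):
        entry = sums[loc]
        entry[0] += xi
        entry[1] += yi
        entry[2] += 1
    # OLS through the origin on the location-demeaned values.
    numerator = denominator = 0.0
    for loc, xi, yi in zip(locations, x, y):
        sum_x, sum_y, count = sums[loc]
        x_dev = xi - sum_x / count
        y_dev = yi - sum_y / count
        numerator += x_dev * y_dev
        denominator += x_dev * x_dev
    return numerator / denominator


# Invented panel: two locations with different baselines (5 and 8)
# and a shared slope of -0.5 on a UV-like regressor.
locations = ["A", "A", "A", "B", "B", "B"]
uv = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
log_cases = [4.5, 4.0, 3.5, 7.0, 6.0, 5.0]

slope = within_slope(locations, uv, log_cases)
print(slope)
```

Because each location is demeaned separately, time-invariant differences between locations (the baselines 5 and 8 here) drop out and do not bias the slope.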
Panel data fixed-effects model—testing the effect of restrictions.
| Dependent variable: daily cases (log) | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| days_from_start | 0.006 | 0.006 | 0.006 | 0.006 |
| | (0.00) | (0.00) | (0.00) | (0.00) |
| Temperature_7 | 0.024 | 0.020 | 0.022 | 0.017 |
| | (0.01) | (0.01) | (0.01) | (0.01) |
| Absolute Humidity_7 | -0.028 | -0.023 | -0.028 | -0.021 |
| | (0.02) | (0.02) | (0.02) | (0.02) |
| Pressure_7 | -0.024 | -0.025 | -0.025 | -0.023 |
| | (0.01) | (0.00) | (0.00) | (0.00) |
| Wind speed_7 | 0.030 | 0.030 | 0.029 | 0.031 |
| | (0.02) | (0.02) | (0.02) | (0.02) |
| Rainfall_7 | -0.013 | -0.010 | -0.013 | -0.013 |
| | (0.01) | (0.01) | (0.01) | (0.01) |
| Short-wave irradiation_7 | -0.000 | -0.000 | -0.000 | -0.000 |
| | (0.00) | (0.00) | (0.00) | (0.00) |
| PM2P5_7 | 0.002 | 0.004 | 0.001 | 0.001 |
| | (0.01) | (0.01) | (0.01) | (0.01) |
| PM10_7 | -0.003 | -0.002 | -0.004 | -0.004 |
| | (0.01) | (0.01) | (0.01) | (0.01) |
| UV_7 | -0.376 | -0.374 | -0.372 | -0.362 |
| | (0.17) | (0.17) | (0.17) | (0.17) |
| High_stringency_7 | -0.087 | | | |
| | (0.07) | | | |
| High_stringency_14 | | -0.273 | | |
| | | (0.07) | | |
| High_containment_7 | | | -0.176 | |
| | | | (0.07) | |
| High_containment_14 | | | | -0.388 |
| | | | | (0.07) |
| Constant | 28.414 | 28.456 | 28.351 | 26.469 |
| | (4.28) | (4.23) | (4.36) | (4.20) |
| Observations | 19,289 | 19,289 | 19,289 | 19,289 |
| R² | 0.733 | 0.736 | 0.734 | 0.741 |
| Adjusted R² | 0.732 | 0.736 | 0.733 | 0.740 |
Results only for the Canadian territories and the United States. Standard errors, in parentheses, are clustered at the location (country/region) level; * p < 0.10, ** p < 0.05, *** p < 0.01.
Coefficients and relative rank describing the impact of climate factors on COVID-19 transmission across the three types of analysis carried out in this study.
| Variable | Statistical analysis (Spearman’s coefficient) | Machine learning analysis (SHAP values) | Econometric analysis (Panel data fixed effects model) |
|---|---|---|---|
| Temperature | | | |
| Absolute Humidity | | | |
| Pressure | | 14.5 (3) | |
| Wind speed | | 10.4 (8) | |
| Rainfall | | | |
| Short-wave irradiation | | | p > 0.10 (11) |
| PM2.5 | | 10.5 (7) | |
| PM10 | | | |
| UV | | | |
| Stringency | | | |
| Containment | | | |
Red fonts indicate negative correlations, blue positive correlations, and black undetermined polarity. The integers enclosed in parentheses describe relative rank (1 = highest, 11 = lowest). Stringency and containment results are not available in the statistical analysis.