Harshit Gujral, Adwitiya Sinha.
Abstract
This study aims to find the association between short-term exposure to air pollutants, such as particulate matter and ground-level ozone, and SARS-CoV-2 confirmed cases. Generalized linear models (GLM), a typical choice for ecological modeling, have well-established limitations. These limitations include a priori assumptions, an inability to handle multicollinearity, and the treatment of differential effects as fixed effects. We propose an Ensemble-based Dynamic Emission Model (EDEM) to address these limitations. EDEM is developed at the intersection of network science and ensemble learning, i.e., a specialized approach of machine learning. A Generalized Additive Model (GAM), i.e., a variant of GLM, and EDEM are tested in Los Angeles and Ventura counties of California, which form one of the biggest SARS-CoV-2 clusters in the US. GAM indicates that a 1 μg/m3, 1 μg/m3, and 1 ppm increase (lag 0-7) in PM 2.5, PM 10, and O3 is associated with a 4.51% (CI: 7.01 to -2.00) decrease, a 1.62% (CI: 2.23 to -1.022) decrease, and a 4.66% (CI: 0.85 to 8.47) increase in daily SARS-CoV-2 cases, respectively. Subsequent increments in lag resulted in a negative association between pollutants and SARS-CoV-2 cases. EDEM achieves an R2 score of 90.96% and 79.16% on the training and testing datasets, respectively. EDEM confirmed the negative association between particulates and SARS-CoV-2 cases, whereas O3 shows a positive association; however, the positive association observed through GAM is not statistically significant. In addition, the county-level analysis of pollutant concentration interactions suggests that increased emissions from other counties positively affect SARS-CoV-2 cases in adjoining counties as well. The results reiterate the significance of uniformly adhering to air pollution mitigation strategies, especially those related to ground-level ozone.
Keywords: Air pollution; COVID-19; California; Centrality measures; Ensemble learning; Machine learning; Network science
Year: 2021 PMID: 33417905 PMCID: PMC7836725 DOI: 10.1016/j.envres.2020.110704
Source DB: PubMed Journal: Environ Res ISSN: 0013-9351 Impact factor: 6.498
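The abstract reports associations for a "lag 0-7" exposure window. One common reading, assumed here and not confirmed by this record, is the mean pollutant concentration over the current day and the preceding seven days. The sketch below illustrates that reading only; the function name `lagged_mean` and the sample readings are hypothetical.

```python
from typing import List, Optional


def lagged_mean(series: List[float], max_lag: int) -> List[Optional[float]]:
    """Mean of each day's value and the preceding `max_lag` days.

    Days without a full window of history get None.
    """
    window = max_lag + 1  # lag 0 through lag max_lag, inclusive
    out: List[Optional[float]] = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = series[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# Hypothetical daily PM 2.5 readings (not data from the study).
pm25 = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0]
exposure_0_7 = lagged_mean(pm25, max_lag=7)
```

Under this assumption, the first seven days have no exposure value because the eight-day window is incomplete; whether the authors handled the start of the series this way is not stated.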
Data Sources
| Data | Source | Access link |
|---|---|---|
| COVID-19 incidences | Johns Hopkins University | GitHub Repository: CSSEGISandData/COVID-19 |
| Air Pollution and Meteorological data | US EPA – AQS API | API documentation: |
| Demographic data | 2014-18 release of the American Community Survey | Module documentation: |
Fig. A.1 Los Angeles County COVID-19 Statistics
Fig. A.2 Ventura County COVID-19 Statistics
Descriptive statistics of air pollutants and meteorological variables.
| | PM 2.5 | PM 10 | O3 | Pressure | RH | Temperature | Wind |
|---|---|---|---|---|---|---|---|
| count | 170 | 170 | 170 | 170 | 170 | 170 | 170 |
| mean | 14.68 | 42.48 | 0.05 | 961.57 | 61.08 | 56.06 | 5.11 |
| std | 7.55 | 28.2 | 0.01 | 26.7 | 17.56 | 7.1 | 2.13 |
| min | 4 | 4 | 0.03 | 921.04 | 21.38 | 36.07 | 1.48 |
| 25% | 9.18 | 25.25 | 0.04 | 930.59 | 47.64 | 51.32 | 3.96 |
| 50% | 13.4 | 37 | 0.05 | 980.35 | 65.15 | 55.02 | 5.18 |
| 75% | 17 | 55 | 0.06 | 984.42 | 76.2 | 59.36 | 6.01 |
| max | 48 | 213 | 0.08 | 991.04 | 91.25 | 81.33 | 12.55 |
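The row labels above (count, mean, std, min, quartiles, max) resemble the layout of a standard numerical summary. As an illustration only, the sketch below computes the same eight statistics with Python's standard library. It assumes a sample standard deviation (n - 1 denominator) and linearly interpolated quartiles; the record does not say which conventions the authors used, and the `describe` helper and sample readings are hypothetical.

```python
import statistics
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Summary statistics in the same row order as the table."""
    # method="inclusive" interpolates linearly between sorted data points.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


# Hypothetical wind readings (not data from the study).
wind = [1.0, 2.0, 3.0, 4.0, 5.0]
summary = describe(wind)
```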
Fig. 1 Observed correlation among pollutants and meteorological variables.
Intercept – P value table depicting relationship between lagged exposure to air pollutants and COVID-19 incidences.
| Pollutant | Feature | Cases, lag(0–7) days | Mortality, lag(0–7) days | Cases, lag(0–14) days | Mortality, lag(0–14) days | Cases, lag(0–21) days | Mortality, lag(0–21) days |
|---|---|---|---|---|---|---|---|
| PM 2.5 | Intercept | −0.045. | −0.015. | −0.089* | −0.031* | −0.182*** | −0.067*** |
| | P value | 0.0733 | 0.0882 | 0.0164 | 0.0223 | 0.0003 | 0.0004 |
| PM 10 | Intercept | −0.016** | −0.003 | −0.035*** | −0.007** | −0.051*** | −0.011*** |
| | P value | 0.0080 | 0.1582 | 3.71e-06 | 0.0077 | 2.68e-08 | 0.0003 |
| O3 | Intercept | 0.046 | 2.468 | −55.00* | −3.931 | −72.14** | 0.057. |
| | P value | 0.2227 | 0.6891 | 0.0122 | 0.6183 | 0.0029 | 0.0910 |
'***', '**', '*', and '.' denote significance at the 0.1%, 1%, 5%, and 10% levels, respectively, as implied by the P values reported in the table.
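The markers printed beside the coefficients appear to follow the thresholds conventionally used in R model summaries (0.001, 0.01, 0.05, and 0.1). This is an inference from the table's P values, not something the record states explicitly. Under that assumption, a minimal sketch of the mapping (the function name `significance_marker` is hypothetical) is:

```python
def significance_marker(p_value: float) -> str:
    """Return the marker for a P value under the assumed thresholds."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    if p_value < 0.1:
        return "."
    return ""  # not significant at the 10% level


# P values copied from the PM 2.5 and PM 10 rows of the table above.
p_values = (0.0733, 0.0164, 0.0003, 0.0080, 0.1582)
markers = {p: significance_marker(p) for p in p_values}
```

With these thresholds, each sampled P value maps to the same marker that the table prints next to its coefficient.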
Fig. A.3 Association between short-term exposure to pollutants and COVID-19 incidences across lags
Fig. 2 The results obtained from EDEM. (a) Represents an emission network of PM 2.5 for an arbitrary day. The nodes (in red) represent the monitoring stations in Los Angeles and Ventura counties. The influence of emissions from adjoining counties (grey nodes) is represented by the edges. A stronger edge represents a higher influence. (b) Represents the emission network of PM 10 for an arbitrary day. (c) Represents the emission network of O3 for an arbitrary day. (d) Depicts the impact, direction, and relative importance of the association between short-term exposure to pollutants and COVID-19 cases (also refer to Fig. A.4). Emission centrality reflects the county-level effects obtained from Equation (8) through the current flow betweenness centrality measure. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
Fig. A.4 Results of the 4th sensitivity analysis of the proposed EDEM. COVID-19 cases were transformed to a log scale.
Generalized linear models – 1st Sensitivity Analysis. The lagged term for cases and deaths is removed from the model since we are analyzing daily COVID cases and not cumulative COVID cases.
| Pollutant | Feature | Cases, lag(0–7) | Deaths, lag(0–7) | Cases, lag(0–14) | Deaths, lag(0–14) | Cases, lag(0–21) | Deaths, lag(0–21) |
|---|---|---|---|---|---|---|---|
| PM 2.5 | Intercept | −0.057* | −0.025** | −0.122** | −0.050*** | −0.239*** | −0.097*** |
| | P-value | 0.0277 | 0.0092 | 0.0014 | 0.0004 | 1.67E-06 | 1.76E-07 |
| PM 10 | Intercept | −0.022*** | −0.004* | −0.043*** | −0.010*** | −0.058*** | −0.016*** |
| | P-value | 0.0001 | 0.0367 | 4.00E-10 | 9.17E-05 | 5.16E-13 | 1.86E-07 |
| O3 | Intercept | −29.19 | 2.5875 | −76.61*** | 0.0464* | −1.046e+02*** | −21.80* |
| | P-value | 0.1073 | 0.6983 | 0.0007 | 0.0491 | 2.13E-05 | 0.0195 |
Generalized linear models – 2nd Sensitivity Analysis. Los Angeles has the most COVID cases in California; thus, we exclude Los Angeles from our data and conduct the analysis only on Ventura County.
| Pollutant | Feature | Cases, lag(0–7) | Deaths, lag(0–7) | Cases, lag(0–14) | Deaths, lag(0–14) | Cases, lag(0–21) | Deaths, lag(0–21) |
|---|---|---|---|---|---|---|---|
| PM 2.5 | Intercept | 0.014 | −0.006 | 0.0464 | 0.015 | −0.006 | 0.001 |
| | P-value | 0.6484 | 0.5448 | 0.3497 | 0.3417 | 0.9219 | 0.9647 |
| PM 10 | Intercept | −0.015 | −0.004 | −0.034*** | −0.004 | −0.057*** | −0.008* |
| | P-value | 0.0517 | 0.1078 | 0.0004 | 0.1157 | 1.01E-06 | 0.0165 |
| O3 | Intercept | −0.012*** | −0.014 | −67.73* | −0.009 | −1.287e+02*** | −19.13 |
| | P-value | 9.10E-05 | 0.6415 | 0.0117 | 0.6347 | 7.98E-05 | 0.0602 |
Evaluation metrics of the 1st – 4th sensitivity analyses for EDEM. The four sensitivity analyses are as follows. First, we create three separate ensemble learning models, one for each of the three pollutants. Second, we create two separate models for Los Angeles and Ventura counties because, out of 170 total data points, 100 belong to Ventura and 70 belong to Los Angeles. Third, an incremental term is added to our original ensemble learning model, similar to Equations (1) and (2). Fourth, the dependent variable, i.e., coronavirus cases, was transformed into the logarithmic domain.
| Sensitivity Analysis | Details | Training R2 score | Testing R2 score |
|---|---|---|---|
| 1 | Three models for three pollutants | 0.7871 | 0.6869 |
| 2 | Two models for Los Angeles and Ventura | 0.8820 | 0.6210 |
| 3 | Added incremental term, i.e., day of the year. | 0.9576 | 0.9229 |
| 4 | Logarithmic transformation of COVID-19 cases | 0.9999 | 0.9635 |
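The training and testing R2 scores above are presumably the usual coefficient of determination, 1 - SS_res / SS_tot; the record does not spell out the formula, so this is an assumption. A minimal sketch of that definition follows, with a hypothetical `r2_score` helper and made-up case counts rather than study data.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily case counts and model predictions (not study data).
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
score = r2_score(observed, predicted)
```

A gap between the training and testing scores, as in the second sensitivity analysis, is what this measure would show if a model fits the training data more closely than unseen data.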