Literature DB >> 34619131

The effect of known and unknown confounders on the relationship between air pollution and Covid-19 mortality in Italy: A sensitivity analysis of an ecological study based on the E-value.

Valeria Aloisi¹, Andrea Gatto², Gabriele Accarino³, Francesco Donato⁴, Giovanni Aloisio⁵.

Abstract

Back in December 2019, the novel coronavirus disease 2019 (Covid-19) started rapidly spreading worldwide, especially in Italy that was among the most affected countries. The geographical distribution of air pollution and Covid-19 mortality in Italy suggested atmospheric pollution as a worsening factor of severe Covid-19 health outcomes. The present nationwide ecological study focused on all 107 Italian territorial areas, aiming to assess the potential association between Particulate Matter concentration, less than 2.5 μm in diameter (exposure), and Covid-19 mortality rate (outcome) throughout 2020, by looking at 28 potential confounders. A potential positive association between exposure and outcome was observed when performing a multivariate regression analysis with a Negative Binomial model, suggesting that an increase of 1 μg/m3 in the exposure is associated with an increase of 9.0% (95% CI: 6.5%-11.6%) in the average Covid-19 mortality rate, conditional on all 28 potential confounders. A sensitivity analysis, based on the E-value, shows that a hypothetical unmeasured confounder would have to be associated with both PM2.5 concentration and Covid-19 mortality rate by a rate ratio of at least 1.40-fold each to explain away the exposure-outcome association, conditional on all 28 covariates included in the main analysis model. Moreover, the Observed Covariate E-value (OCE) was reported to provide a contextualization of the E-value on the observed covariates included in the study. The OCE sensitivity analysis shows that a set of unknown confounders similar in size and magnitude to the set of the considered climatic factors could potentially explain away the estimated exposure-outcome association. Consequently, the role of climatic factors in the Covid-19 pandemic is worth of further investigation.

Entities: Chemical

Keywords: Air pollution; Confounding factors; Covid-19; Ecological study; Sensitivity analysis

Mesh：

Substances：

Year: 2021 PMID： 34619131 PMCID： PMC8487852 DOI： 10.1016/j.envres.2021.112131

Source DB: PubMed Journal: Environ Res ISSN： 0013-9351 Impact factor: 6.498

Introduction

In December 2019, several pneumonia cases were suddenly identified in Wuhan (Hubei Province, China) as a result of an infection to a novel coronavirus, called “SARS-CoV-2” (Zhu N. et al., 2020). This virus is mostly transmitted through close contact from person to person (approximately 2 m), and by aerosol respiratory droplets smaller than 5 μm in diameter (Copat et al., 2020). The most severe symptoms, which require intensive care, have generally been observed in older individuals with previous comorbidities, such as cardiovascular, endocrine, digestive and respiratory diseases (Wang et al., 2020). The World Health Organization (WHO) has defined this new syndrome with the acronym Covid-19, which stands for Corona VIrus Disease 2019 (Sohrabi et al., 2020; WHO, 2020a). Despite the drastic containment measures adopted by the Chinese government, SARS-CoV-2 spread globally in a few weeks and the WHO declared Covid-19 a global pandemic on March 11th, 2020 (WHO, 2020b). Covid-19 incidence and mortality rates are still rising in the world, with more than 190 million cases and about 4 million deaths as of July 19th, 2021 (JH-CRC, 2021). Several publications of epidemiological studies about the potential association between air pollution exposure and different Covid-19 variables (e.g., confirmed cases, hospitalizations, deaths, etc …) have given rise to a rapidly expanding area of research. A comprehensive review focused on assessing the role of various air pollutants on the Covid-19 pandemic was reported in (Marquès and Domingo, 2021). Many studies used a descriptive analysis based on several correlation indices (e.g., Pearson, Spearman, Kendall, etc …), focusing on the exposure to different air pollutants (e.g., PM2.5, PM10, NO2) in various countries, such as Italy, Spain, France, Germany, Mexico and the USA (Fattorini and Regoli, 2020; Accarino et al., 2021; Ogen, 2020; Zoran et al., 2020a, 2020b; Bontempi, 2020; Frontera et al., 2020; Bashir et al., 2020; Cucciniello et al., 2021; Tello-Leal and Macías Fernández, 2021). Other studies used a regression analysis to evaluate the potential impact of the exposure to several air pollutants on Covid-19 confirmed cases and deaths. In particular, a Generalized Additive Model (GAM) was applied to examine the effects of meteorological factors and air pollution on Covid-19 incidence in 120 cities of China (Zhu et al., 2020), whereas a multivariate Poisson regression was exploited for the same purpose, focusing on three of the most affected cities in China: Wuhan, XiaoGan and HuangGang (Jiang et al., 2020). Moreover, a simple linear regression model was used to compare Covid-19 incidence with Particulate Matter (PM) concentrations in Wuhan and XiaoGan (Li et al., 2020). The spatial distribution of PM and case fatality rate of Covid-19 in China was also studied by applying a multiple linear regression adjusted for several effect modifiers and confounding factors (Yao et al., 2020). Furthermore, a hierarchical multiple regression model was applied to analyze the impact of PM10 on Covid-19 spread in 55 Italian province capitals (Coccia, 2020; Copat et al., 2020). In addition, an ecological study was carried out to assess the impact of long-term exposure to PM and NO2 on Covid-19 incidence and all-cause mortality, focusing on Lombardy municipalities, in Italy (De Angelis et al., 2021). In particular, multivariable negative binomial mixed regression models were applied accounting for demographic, socio-economic and meteorological variables. As a result, the work showed that exposure to PM was significantly associated with Covid-19 incidence and excess mortality during the 1st pandemic wave in Lombardy. Focusing on the role played by PM2.5 in Covid-19 pandemic, a multivariate regression model was applied to assess the relationship between PM2.5 and Covid-19 cases, deaths, and case-fatality rates in 24 districts of Lima, Peru (Vasquez-Apestegui et al., 2021). Additionally, an ecological study was conducted on 400 counties in Germany to evaluate the correlation between long-term PM exposure and Covid-19 cases, as well as fatalities, by means of OLS regressions including socioeconomic and demographic covariates (Prinz and Richter, 2021). A more extensive study was conducted in the United States (Wu et al., 2020). In this case, the association between the average long-term exposure to PM2.5 and Covid-19 death rate was addressed by considering approximately 3000 counties. A negative binomial mixed model was fit by using Covid-19 mortality rates as the outcome and the average long-term exposure to PM2.5 as variable of interest, and accounting for 20 socio-demographic, socio-economic, behavioral, and meteorological potential confounders at county level. As a result, this study showed that a 1 μg/m3 increase in long-term exposure to PM2.5 is associated with the 11% increase (95% CI: 6%–17%) in the county's Covid-19 mortality rate. More than 80 sensitivity analyses and the E-value computation were performed to assess the robustness of these finding to various modeling assumptions and to unmeasured confounding. Italy was the first country in Europe to be hit by Covid-19 and it is still significantly affected by the pandemic, with more than 4 million cases and nearly 128,000 deaths as of July 19th, 2021 (JH-CRC, 2021). During the 1st wave of the pandemic, the mortality trend in Italy showed strong regional differences with most deaths concentrated in the north of the country (Fattorini and Regoli, 2020). The geographical distribution of Covid-19 mortality during the 1st pandemic wave and air quality in Italy suggested atmospheric pollution as a worsening factor for Covid-19 severity (Fattorini and Regoli, 2020; Accarino et al., 2021). In the present study, this intuition has encouraged the Authors to investigate the relationship between air pollution and Covid-19 mortality rate in Italy. As an extension of a previous study (Accarino et al., 2021), the present work focuses on a nationwide ecological study about the 107 Italian territorial areas, aiming to assess the potential relationship between PM2.5 concentration and Covid-19 mortality rate throughout 2020, by also controlling for 28 known (i.e., measured) confounding factors. Thus, the Authors propose a multivariate regression analysis with a Negative Binomial model, followed by the analysis of the sensitivity to unknown (i.e., unmeasured) confounding, based on the computation of the E-value (VanderWeele and Ding, 2017) and of the Observed Covariate E-value (OCE), that contextualizes the E-value with respect to the confounding factors observed in the study (D'Agostino McGowan and Greevy, 2020). To the best of our knowledge, the joint use of the E-value and the OCE represents the novelty of the present study with respect to the current state-of-the-art, making the interpretation of the sensitivity analysis to unmeasured confounding easier.

Materials and methods

Data sources

This study is based on publicly available data. All variables involved in the analyses, along with the data sources are listed in Table S1 reported in the Supplementary Material. Data sources were integrated using the Python 3.8.2 programming language and analyzed with data science, statistical and plotting libraries such as Pandas, GeoPandas, Shapely, Xarray, NumPy, SciPy, Matplotlib and Seaborn.

Outcome of interest

Covid-19 death counts for each Italian territorial area were periodically reported by the Italian National Institute of Statistics (ISTAT) and the Italian National Institute of Health (ISS). As outcome of interest, the present study considered the cumulative number of Covid-19 deaths in 2020 at the resolution of territorial areas, based on the ISTAT-ISS report published on June 10th, 2021 (ISTAT, 2021a). Covid-19 mortality rates at the aforementioned resolution were computed as the ratio of Covid-19 deaths to the population size in territorial areas on January 1st, 2020.

Exposure of interest

The exposure variable in the present study refers to data about PM2.5 concentration levels provided by the Copernicus Atmosphere Monitoring Service (CAMS), which allows users to retrieve data from several archives based on the monitored atmospheric variables (CAMS, 2021a). In particular, the “CAMS European Air Quality Forecasts (CAMS-EAQF)” dataset provides hourly air quality analysis data for the European domain (CAMS, 2021b). Analysis data are obtained through various data assimilation techniques that combine model simulations with actual observations retrieved from the European Environment Agency (EEA). The CAMS-EAQF dataset was used to gather PM2.5 concentrations in Italy from January 1st, 2018 to February 29th, 2020 on a 0.1° × 0.1° (∼10 km × 10 km) spatial resolution grid. This time period was selected in order to reduce the biased effect on the atmospheric pollutant levels due to Covid-19 containment measures, i.e., lockdown, imposed by the Italian government in early March 2020 (PMD, 2020). First, data concerning PM2.5 concentration levels were averaged over the period from January 1st, 2018 to February 29th, 2020 in order to obtain a PM2.5 temporally averaged value for each point of the grid covering the Italian country. Then, each geographical grid point was associated with the Italian territorial area that contains the grid point within the administrative boundary. Data about the boundaries of the Italian territorial areas as of January 1st, 2020 are provided by ISTAT (ISTAT, 2021b). Last, the PM2.5 level was computed for each Italian territorial area by averaging PM2.5 concentrations over all grid points that had previously been associated with the considered territorial area. As a result, each Italian territorial area is associated with a PM2.5 level obtained through a spatiotemporal average.

Potential confounders

To adjust for confounding bias, a set of 74 demographic, socio-economic, behavioral, mobility, climatic, and health-related variables was collected from different public data sources for various time periods, as shown in Table S1 reported in the Supplementary Material. As indicated in this table, many variables were natively retrieved at the resolution of the single territorial areas in Italy, whereas others were available only at a regional resolution. Thus, for the purposes of the study, a specific procedure was applied to convert regional distributed variables to the level of territorial areas. In particular, the following proportion was used:where “Reg. value” represents the variable value at a regional resolution, “Reg. pop.” is the population of the region, “x” is the unknown value of the variable at the territorial area resolution, whereas “Territorial area pop.” represents the population of a territorial area in the considered region. All these quantities must be referred to the same time period. For the sake of comprehension, when “Reg. value” is available for a certain age range in a certain time period, both “Reg. pop.” and “Territorial area pop.” represent the population size of that age range for the same time period at regional and territorial area resolution, respectively. This proportion is based on the assumption that the distribution of the phenomenon described by each variable is homogeneous within regional boundaries. In order to gather information about air travelers in Italy and from foreign countries, data concerning the number of arriving passengers were provided by ISTAT for each Italian airport (ISTAT, 2021c). As a consequence, these data required a further association procedure between each airport location and the territorial area they belong to, based on the assumption that a person arrives at the airport nearest to the territorial area of destination. For all the 74 collected variables, a preliminary analysis of data distribution was performed with the aim of selecting the proper transformation of each variable in view of the Negative Binomial multivariate regression analysis, which is the core of the present study. In particular, a log-transformation or a percentage conversion was applied to most variables in order to reduce or remove the skewness of their original distribution. Moreover, all 74 variables were standardized to avoid the structural multicollinearity issue potentially affecting multivariate regression analysis. Additionally, a further analysis was carried out prior to the regression analysis to select the variables to be included in the regression model, aiming at minimizing the multicollinearity exhibited by data. Particularly, all variables were categorized into nine groups based on their characteristics, that is demographic, social, economic, educational, behavioral, health, mobility, climatic, and geolocation factors (see Table S1 reported in the Supplementary Material). First, the multicollinearity among all the variables belonging to each group was evaluated by computing the Variance Inflation Factor (VIF) metric, thus selecting the set of variables that showed no intra-group multicollinearity (VIF <10). Then, a further inter-groups multicollinearity analysis was performed on the overall variables of the resulting sets. Specifically, 28 variables were selected to be included in the multivariate regression model as covariates, as detailed in the following Section. These variables assured the absence of inter-groups multicollinearity with the largest VIF value of 6.29. However, the addition of the exposure variable (PM2.5 concentration) to the set of 28 covariates increases the largest VIF to 7.99, which remains under the selected threshold of 10, therefore denoting the absence of multicollinearity.

Statistical analysis

The present study is aimed at assessing the potential relationship between PM2.5 concentration and Covid-19 mortality rate, by means of a multivariate regression analysis (from now on called “Main analysis”) adjusted for the 28 potential confounding factors shown in Table 1 . In particular, this table reports the descriptive statistics of the variables (i.e., exposure, outcome of interest and potential confounding factors) involved in the Main analysis, as well as the indication of the group each potential confounder belongs to. As already said in the previous Section, a log-transformation was applied to some variables in order to reduce or remove the skewness of their original distribution.

Table 1

Descriptive statistics.

	Mean (μ)	Standard Deviation (σ)	Group Factor
Covid-19 deaths (inhabitants)	720.48	974.55	Outcome
PM_2.5 (μg/m³)	11.51	3.47	Exposure
% Male population	48.80	0.47	Demographic
% Over 65 years of age population	24.10	2.35	Demographic
log₂(Population Density (inhabitants/km²))	4.16	1.18	Demographic
Cinema events (per 1000 inhabitants)	48.86	22.58	Social
log₂(1 + Concerts (per 1000 inhabitants))	0.67	0.33	Social
log₂(Bookshops (per 100,000 inhabitants))	2.85	0.44	Social
log₂(1 + Exhibitions (per 1000 inhabitants))	0.79	0.69	Social
log₂(1 + Farmhouses (per 1000 km²))	5.45	1.48	Social
log₂(Restaurants/Bars (per 100,000 inhabitants))	9.31	0.29	Social
Sport events (per 1000 inhabitants)	2.46	2.32	Social
log₂(Theater events (per 1000 inhabitants))	0.77	0.70	Social
log₂(All events expenditure per-capita (€))	4.55	1.13	Economic
Median income (€) a	23,302.04	2950.4	Economic
log₂(Poverty Incidence Rate)	−5.93	1.56	Economic
log₂(People with at least one chronic disease (inhabitants))	17.35	1.01	Health
Consumption of medicines for asthma and COPD (minimum per capita units)	6.42	1.09	Health
Consumption of medicines for diabetes (minimum per capita units)	41.37	7.23	Health
Consumption of medicines for hypertension (minimum per capita units)	145.02	14.53	Health
Primary care physicians (per 1000 inhabitants)	0.93	0.16	Health
Hospital beds (per 1000 inhabitants)	3.41	0.88	Health
Days since 1st Covid-19 case	303.80	4.74	Health
log₂(Passengers arriving from Italy by plane)	18.33	2.85	Mobility
log₂(Passengers arriving from foreign countries by plane)	18.81	3.34	Mobility
log₂(Foreign customers arrivals at the accommodation facilities)	17.47	2.17	Mobility
Mean Temperature (°C)	13.38	2.94	Climatic
Mean Precipitation (mm)	2.98	0.89	Climatic
Mean Relative humidity (%)	72.96	2.05	Climatic
log₂(Average number of annual days with wind gusts of over 25 knots)	3.76	1.81	Climatic
log₂(14+ years of age Smokers (inhabitants)) b	16.09	1.03	Behavioral
Latitude (°N) b	42.91	2.64	Geolocation
Longitude (°E) b	12.10	2.67	Geolocation

COPD = Chronic Obstructive Pulmonary Disease.

Per capita.

Potential confounder exploited only for further sensitivity analyses reported in Section 2.4.

Descriptive statistics. COPD = Chronic Obstructive Pulmonary Disease. Per capita. Potential confounder exploited only for further sensitivity analyses reported in Section 2.4. A Negative Binomial (NB) regression model adjusted for all the 28 confounding factors was exploited using Covid-19 deaths as outcome and PM2.5 as the exposure of interest. Since the outcome variable includes count data which exhibited over dispersion, the NB model proved more appropriate compared to the Poisson regression model. Moreover, the population size as of January 1st, 2020 was considered as the offset of the NB regression model. The model was fit on a dataset made of 107 samples corresponding to all the Italian territorial areas. Then, the estimated regression coefficients were exponentiated in order to compute the Rate Ratios (RRs) along with the corresponding 95% Confidence Interval (CI). Thus, a type I significance level α = 0.05 was considered as the threshold for assessing statistical significance. For each predictor (i.e., the exposure of interest or a covariate of the model), the related RR represents the effect size measure in the predictor-outcome association. In particular, the RR for PM2.5 can be interpreted as the relative decrease (if lower than one) or increase (if greater than one) in the average Covid-19 mortality rate associated with a 1 μg/m3 increase in PM2.5 concentration, by keeping the values of all other covariates included into the main analysis unchanged. Furthermore, a RR lower than one denotes a negative association between the predictor and the outcome, whereas a RR greater than one indicates a positive predictor-outcome association. When the RR is equal to one, which represents the null value at the RR scale, then the predictor-outcome association does not exist. However, when the RR CI crosses the null value, the interpretation of the predictor-outcome association is uncertain. All analyses performed in the present study were carried out by using the Python 3.8.2 programming language. Specifically, the statistical analysis implementation was carried out by exploiting the “Statsmodels” library.

Sensitivity to unmeasured confounding analyses

As widely known, the potential influence of unmeasured confounding variables is a limiting factor for the strength of evidence provided by observational studies. As a consequence, unmeasured confounding is a concern in the performed analyses. Therefore, the sensitivity analyses carried out in the present study aimed to assess the robustness of the estimated exposure-outcome relationship to unmeasured confounding.

E-value

The first sensitivity to unmeasured confounding analysis involves the computation of the “E-value”, which is a concise metric used to evaluate the potential impact of unmeasured confounding on the results of an observational study (VanderWeele and Ding, 2017). According to the definition, the E-value quantifies the minimum strength of association, on the rate ratio scale, that a hypothetical unmeasured confounder U would need to have with both the outcome and the exposure, to explain away the estimated exposure-outcome association, conditional on all the covariates included in the regression model. In other words, the E-value quantifies the magnitude of the unmeasured confounding required to make the estimated exposure-outcome association uncertain at the chosen significance level. Indeed, this happens when the estimated rate ratio CI includes the null value (i.e., RR = 1) at the rate ratio scale. In particular, the lower or upper bound of the rate ratio CI closest to the null is defined as “Limiting Bound (LB)” (D'Agostino McGowan and Greevy, 2020). For example, a RR of 1.25 (95% CI: 1.1–1.5) would no longer be significant at the α = 0.05 level if adjusting for a hypothetical unmeasured confounder shifted the rate ratio CI to include the null value. Furthermore, the higher the E-value, the stronger the associations of U with both the exposure and the outcome must be to explain away the estimated exposure-outcome association (VanderWeele and Ding, 2017). Consequently, higher E-values correspond to situations in which the existence of such unmeasured confounder U, able to tip the estimated exposure-outcome relationship, become more unlikely. Moreover, according to (VanderWeele and Ding, 2017), even if U existed, it would not necessarily explain away the estimated association, but it could be possible to construct scenarios in which U actually could do so. In addition, beside assessing the robustness to unmeasured confounding of a given study, the E-value could provide a basis for comparison with similar ecological studies. Hence, if two studies with the same exposure and outcome of interest reported the same E-value, yet adjusting for a different number of covariates, the study that checked for the highest number of covariates would be more robust. As suggested in (VanderWeele and Ding, 2017), reporting the E-value for the LB, as well as the E-value for the estimated RR, is a good practice in observational studies. Therefore, the following equations provide the E-value computation in both cases: where LL is the Lower Limit of the RR CI and UL is the Upper Limit of the RR CI. At the end of the main analysis, the estimated RR related to the exposure of interest (PM2.5) was exploited in order to compute the corresponding E-value and the E-value for the LB. Although the E-value offers a simple quantified sensitivity analysis, without additional context, readers will have difficulty judging whether the tipping point E-value is large or small for a given study (D'Agostino McGowan and Greevy, 2020).

Observed Covariate E-value (OCE)

Many of the existing methods attempt to quantify the minimum strength of unmeasured confounding needed to tip a given analysis. Unfortunately, they do not provide context for how likely it is that an unmeasured confounder of such a magnitude exists. A hybrid approach used the VanderWeele and Ding's E-value to quantify the unmeasured confounding needed to tip the analysis, and subsequently “grounds” this value in the observed covariates (D'Agostino McGowan and Greevy, 2020). This gives some context to help understand the type of confounder that could have the tipping impact on a given study. In particular, a new metric, the Observed Covariate E-value (OCE), was suggested to quantify the impact of the observed covariates, individually or in groups, on the E-value scale. Thus, the OCE contextualizes the sensitivity analysis' E-value, grounding a seemingly abstract measure in the environment of the study observed covariates. Formally, the OCE is defined by the following equation:where LB is the confidence limit closest to the null of the effect observed by leaving out a given confounder from the regression model, whereas LB represents the limiting bound of the effect estimated by keeping that confounder within the regression model. When the limiting bound LB is less than one, both LB and LB must be replaced with LB * = 1/LB and LB * = 1/LB , respectively. The OCE quantifies the effect of moving the observed limiting bound LB to the adjusted limiting bound LB . Moreover, an omitted covariate which has individually a low impact on the effect estimated by the adjusted model, could have a greater impact if combined with other covariates in a group. Thus, considering the OCE related to groups of covariates is useful to observe the effect of leaving out a group of confounders. In this case, LB represents the confidence limit closest to the null of the effect estimated without a given group of confounders. Practically, LB is the estimated exposure-outcome effect adjusting for all the observed confounders (main analysis model), whereas LB can be computed for each measured confounder by using the observed bias effect (D'Agostino McGowan and Greevy, 2020). The observed bias effect represents the triplet made of the exposure RR and its lower and upper CI bounds obtained by leaving out one covariate (or group of covariates) in turn from the main analysis model. Therefore, the observed bias effect allows demonstrating how the effect of the exposure of interest would change if a covariate (or a group of covariates) had not been observed. In particular, if the observed bias effect RR deviates from the exposure RR of the main analysis, the corresponding omitted covariate (or group of covariates) has a great impact on the estimated exposure-outcome association, thus meaning that its omission would lead to an underestimate or an overestimate of that association. On the other hand, if an observed bias effect RR is close to the main analysis exposure RR, the corresponding omitted covariate (or group of covariates) has a low impact on the exposure-outcome relationship, suggesting that the covariates still remaining in the main analysis model can already take its effect into account.

Further sensitivity analyses

Besides the analyses detailed in the previous Sections, 13 additional sensitivity analyses were performed to assess the robustness of the main analysis results to the data and modelling choices. In particular, 9 out of 13 were carried out by acting on the dataset used for fitting the main analysis model, whereas the remaining analyses involved the modification of the model structure and of the set of potential confounders considered as predictors. First, since the Covid-19 pandemic exhibited a heterogeneous geographical spread in Italy throughout 2020, a further sensitivity analysis was performed by excluding from the dataset the territorial areas of Lombardy, the most affected Italian region (i). Indeed, the aim was to avoid a potential higher influence of the most affected territorial areas on the main analysis results. On the other side, four different sensitivity analyses were carried out in order to assess the impact of the less affected territorial areas. Specifically, for each of these analyses, the territorial areas with the following characteristics were removed in turn from the dataset: less than 50 Covid-19 deaths (ii), less than 100 Covid-19 deaths (iii), less than 503 Covid-19 deaths per 100,000 inhabitants (iv), less than 751 Covid-19 deaths per 100,000 inhabitants (v). Second, in order to avoid the influence of the most populated territorial areas on the study results, the areas with more than 1 million residents were excluded (vi). Third, since the Covid-19 pandemic spread differently among northern, central and southern regions in Italy, three independent sensitivity analyses were carried out by removing the northern (vii), central (viii) and southern (ix) territorial areas, respectively. Additionally, a mixed effects Negative Binomial model was fit on the dataset to evaluate the sensitivity to modeling choices, by still adjusting for the same set of predictors included in the main analysis model (x). The mixed effects model involved a random intercept by region to account for potential correlation in territorial areas within the same region of belonging. In the end, some important potential confounders that were initially excluded after the preliminary multicollinearity analysis, were then added to the main analysis model in turn to assess their impact on the exposure-outcome association. In particular, a sensitivity analysis was carried out by including the “14+ years of age Smokers” variable (log2-transformed and standardized) in the main regression model (xi). Clearly, as shown by the multicollinearity analysis, this new variable was strongly correlated with the “People with at least one chronic disease” covariate. An additional sensitivity analysis was therefore performed by leaving out the latter covariate and including the former variable in the regression model (xii). Moreover, to evaluate how potential spatial residual confounding might affect the main analysis results, the latitude and longitude of the territorial areas centroid were standardized and then included as covariates in the main model, thus giving rise to an additional sensitivity analysis (xiii). Even in this case, the addition of both latitude and longitude leads to an increase of the multicollinearity among the covariates.

Results

The results of the main analysis are reported in Table 2 . A positive association was found between PM2.5 concentration (exposure) and Covid-19 mortality rate (outcome), controlling for 28 potential confounders, with a statistically significant RR of 1.090 (95% CI: 1.065–1.116). Thus, an increase of 1 μg/m3 in PM2.5 concentration resulted to be associated with a statistically significant increase of 9.0% in the average Covid-19 mortality rate, by keeping the values of all the 28 covariates included into the main analysis unchanged. As reported in Table 2, 12 out of 28 covariates corresponding to demographic, social, economic, health, mobility, and climatic factors showed a statistically significant RR, which makes them important predictors of the Covid-19 mortality rate in the model.

Table 2

Rate ratio (RR), 95% confidence interval (CI), and P-value for each covariate included in the main analysis model.

	RR	95% CI	P-value
PM_2.5(μg/m³)	1.090	(1.065–1.116)	0.0000
% Male population a	1.044	(0.975–1.117)	0.2156
% Over 65 years of age population a	1.186	(1.106–1.272)	0.0000
Cinema events (per 1000 inhabitants) a	1.030	(0.980–1.083)	0.2438
log₂(1 + Concerts (per 1000 inhabitants)) a	0.962	(0.908–1.019)	0.1853
log₂(Bookshops (per 100,000 inhabitants)) a	0.981	(0.930–1.035)	0.4764
log₂(1 + Exhibitions (per 1000 inhabitants)) a	1.092	(1.039–1.149)	0.0006
log₂(1 + Farmhouses (per 1000 km²)) a	0.958	(0.911–1.007)	0.0913
log₂(Restaurants/Bars (per 100,000 inhabitants)) a	1.119	(1.047–1.195)	0.0009
Sport events (per 1000 inhabitants) a	0.977	(0.927–1.030)	0.3916
log₂(Theater events (per 1000 inhabitants)) a	1.029	(0.972–1.089)	0.3212
log₂(All events expenditure per-capita (€)) a	0.926	(0.867–0.989)	0.0215
Median income (€) b^,a	1.013	(0.948–1.083)	0.6926
log₂(People with at least one chronic disease (inhabitants)) a	1.035	(0.951–1.126)	0.4259
Consumption of medicines for asthma and COPD (minimum per capita units) a	1.015	(0.963–1.070)	0.5772
Consumption of medicines for diabetes (minimum per capita units) a	1.020	(0.958–1.085)	0.533
Consumption of medicines for hypertension (minimum per capita units) a	0.872	(0.822–0.926)	0.0000
Primary care physicians (per 1000 inhabitants) a	0.966	(0.922–1.013)	0.1557
Hospital beds (per 1000 inhabitants) a	1.075	(1.025–1.127)	0.003
log₂(Passengers arriving from Italy by plane) a	0.942	(0.891–0.996)	0.0346
log₂(Passengers arriving from foreign countries by plane) a	1.036	(0.981–1.094)	0.2026
log₂(Foreign customers arrivals at the accommodation facilities) a	0.899	(0.843–0.958)	0.0011
Mean Temperature (°C) a	0.724	(0.658–0.797)	0.0000
Mean Precipitation (mm) a	1.031	(0.951–1.117)	0.4644
Mean Relative humidity (%) a	0.931	(0.881–0.984)	0.0116
log₂(Average number of annual days with wind gusts of over 25 knots) a	0.968	(0.914–1.025)	0.2686
log₂(Population Density (inhabitants/km²)) a	1.024	(0.948–1.105)	0.5507
log₂(Poverty Incidence Rate) a	0.887	(0.834–0.943)	0.0001
Days since 1st Covid-19 case a	1.226	(1.159–1.297)	0.0000

COPD = Chronic Obstructive Pulmonary Disease.

Standardized.

Per capita.

Rate ratio (RR), 95% confidence interval (CI), and P-value for each covariate included in the main analysis model. COPD = Chronic Obstructive Pulmonary Disease. Standardized. Per capita. Regarding the main analysis and the exposure of interest, the E-value for the LB resulted equal to 1.33, whereas the E-value for the estimated RR was 1.40. Thus, for a hypothetical unmeasured confounder to explain away the estimated exposure-outcome association, it would have to be associated with both PM2.5 concentration and Covid-19 mortality rate by a rate ratio of at least 1.40-fold each, conditional on all 28 covariates included in the main analysis model. Fig. 1 depicts the “observed bias plot”, showing the results of the sensitivity to unmeasured confounding analyses carried out in the present study. In particular, Panel A reports the RR for PM2.5 concentration (i.e., the RR for the exposure) estimated through the main analysis and the observed bias effects, which reveal the effect that a covariate (or a group of covariates) would have on the observed exposure-outcome relationship if it had acted as an unmeasured variable. The groups involved in the computation of the observed bias effects were chosen by grouping the 28 covariates included in the main analysis model with respect to the initial categorization reported in Table 1. All RRs and the corresponding CI bounds are depicted by means of blue error bars.

Fig. 1

Observed Bias Plot. Panel A shows the main analysis results for PM2.5 concentration and the observed bias effects, in terms of RRs and related confidence intervals (blue error bars). The black dashed line represents the null value on the RR scale. The red dashed line is the RR for PM2.5 concentration in the main analysis model (1.090), whereas the solid red lines, filled with the light red region, represent the 95% CI for the exposure-outcome association estimated by the main analysis model (95% CI: 1.065–1.116). Panel B illustrates the E-value for the exposure LB related to the main analysis and the OCEs for the LB of the observed bias effects. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.) As shown in Panel A of Fig. 1, mean temperature (individual factor) and climatic factors (group factor) have a greater impact on the estimated exposure-outcome association. Indeed, the aforementioned factors have an observed bias effect RR which falls out of the CI related to the exposure RR of the main analysis, thus meaning that their omission from the multivariable regression model would lead to an underestimate of the exposure-outcome association. For the remaining regression analyses, the observed bias effects RR resulted close to the main analysis exposure RR, thus showing that the corresponding omitted covariate (or group of covariates) has a lower impact on the exposure-outcome relationship. Moreover, Panel B shows the E-value for the LB of the exposure RR, estimated through the main analysis (red cross), and the OCEs derived for the LB of the observed bias effects (purple asterisks for the omission of one covariate in turn and empty green bullets for the omission of a group of covariates in turn). It can be noted that when omitting the climatic factors, the computed OCE is close to the E-value (red cross). The role of climatic factors and their potential impact on the exposure-outcome association are highlighted in the Discussion section. The results of the additional 13 sensitivity analyses, aimed at assessing the robustness of the main analysis results to data and modelling choices, are shown in Fig. 2 . This figure illustrates the exposure RRs, along with their CI, obtained through both the main analysis and the 13 additional analyses, as detailed in Section 2.4. If the exposure RR of a sensitivity analysis deviates from the exposure RR of the main analysis (RR = 1.090, 95% CI: 1.065–1.116), then the modification to data or modelling choices considered in that sensitivity analysis leads to an underestimate or an overestimate of the exposure-outcome association. This effect is more evident when the Northern and the Central Italian territorial areas are excluded in turn, as well as when both latitude and longitude of each territorial area centroid are added to the main analysis model.

Fig. 2

Further sensitivity analyses results. The RRs related to PM2.5 concentration, along with their CI, are shown for the main analysis and for each of the additional 13 sensitivity analyses. The black dashed line represents the null value on the RR scale. The red dashed line indicates the RR for PM2.5 concentration in the main analysis model (1.090), whereas the solid red lines, filled with the light red region, represent the 95% CI (1.065–1.116) for the association between PM2.5 concentration and Covid-19 mortality rate, estimated by the main analysis model. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

Discussion

The present ecological study aimed to assess the potential relationship between PM2.5 concentration and Covid-19 mortality rate at the resolution of the single territorial areas in Italy, by looking at several confounding factors. The considered PM2.5 concentration refers to the time period from January 1st, 2018, to February 29th, 2020, prior to the containment measures imposed by the Italian government in early March 2020 (PMD, 2020). This allowed reducing as much as possible the biased effect on the atmospheric pollutant levels due to Covid-19 containment measures. As main finding, a potential positive association between PM2.5 concentration and Covid-19 mortality rate was found, with a statistically significant RR equal to 1.090 (95% CI: 1.065–1.116). This rate ratio suggests that an increase of 1 μg/m3 in PM2.5 concentration is associated with an increase of 9.0% (95% CI: 6.5%–11.6%) in the average Covid-19 mortality rate, conditional on all 28 covariates included in the main analysis model. This finding agrees with the results emerged from the ecological study proposed in (Wu et al., 2020), which reports a statistically significant positive association (RR = 1.11, 95% CI: 1.06–1.17) between long-term PM2.5 exposure and Covid-19 mortality rate, focusing on the USA counties and adjusting for 20 county-level potential confounding factors. Considering that the potential influence of unmeasured confounding is a concern in observational studies, the robustness to unmeasured confounding of the resulting potential positive association between PM2.5 concentration and Covid-19 mortality rate was assessed. For this purpose, sensitivity analyses were performed by exploiting the E-value and its contextualization to the observed covariates of the main analysis model, through the OCE computation. The E-value for the LB related to PM2.5 concentration and obtained through the main analysis resulted equal to 1.33, whereas the E-value for the estimated RR was 1.40 (see Table S2 reported in the Supplementary Material). This means that a hypothetical unmeasured confounder U would need to meet the following two criteria simultaneously: i) a 1-unit increase in U would have to lead to at least a 40% increase in Covid-19 mortality rate and ii) a 1-unit increase in U would need to be associated with at least a 40% increase in PM2.5 concentration. As detailed in Section 2, the Observed Covariate E-value was also computed in order to quantify the impact of the observed covariates, individually or in groups, on the E-value scale, thus contextualizing the sensitivity analysis' E-value. According to the OCE interpretation provided in (D'Agostino McGowan and Greevy, 2020), this metric gives context to help understand the type of unmeasured confounder that could have the tipping impact on a given study, i.e., that could render the study results statistically uncertain. Indeed, an OCE close to the main analysis E-value and related to an omitted covariate means that a potential unmeasured confounder similar in size and magnitude to the omitted covariate could be able to tip the main analysis results. The sensitivity analysis based on the OCE highlighted an interesting finding related to the temperature and the climatic factors. Indeed, Panel A of Fig. 1 shows that the observed bias effects RRs computed for the exposure by omitting the temperature and the group of climatic factors individually, that in turn act as unmeasured confounders, fall out of the CI related to the exposure RR of the main analysis. This suggests that their omission from the multivariable regression model would lead to an underestimate of the exposure-outcome association. This underestimation is greater when omitting climatic factors as a whole (RR = 1.022, 95% CI: 1.003–1.042) than that obtained by omitting only the temperature (RR = 1.053, 95% CI: 1.031–1.076). Moreover, if the LB of the observed bias effect for the climatic factors were less than or equal to the null value (i.e., OCE E-value for the LB related to PM2.5 concentration), the exposure-outcome association, estimated through the main analysis, would be statistically uncertain. In this scenario, the climatic factors would have acted as unmeasured confounders able to tip the obtained exposure-outcome relationship. As shown in Panel B of Fig. 1, the LB of the observed bias effect for the climatic factors resulted to be very close to the null value, leading to an OCE (1.32) almost equal to the VanderWeele and Ding's E-value (1.33). Consequently, according to the OCE interpretation, a set of unknown confounders (i.e., not yet included in the main analysis) similar to the set of the considered climatic factors could potentially be able to tip the estimated exposure-outcome association. Indeed, the role of climatic factors on Covid-19 spread and mortality have been investigated since the beginning of the pandemic, as shown in several recent studies (Srivastava, 2021; Mecenas et al., 2020; Islam et al., 2021; Yuan et al., 2021; Guo et al., 2021). Additionally, as shown in Fig. 3 (IPCC, 2013), the worrying increase in the global surface temperature expected in the next decades gives rise to a key question: “What would happen considering different hypothetical climatic scenarios?“. This question also stresses the need for further investigation into climatic factors to examine their role in the Covid-19 pandemic spread.

Fig. 3

Time series of global annual mean surface air temperature anomalies (relative to 1986–2005) from CMIP5 concentration-driven experiments. Projections are shown for each RCP for the multi-model mean (solid lines) and the 5–95% range (±1.64 standard deviation) across the distribution of individual models (shading). Discontinuities at 2100 are due to different numbers of models performing the extension runs beyond the 21st century and have no physical meaning. Only one ensemble member is used from each model and numbers in the figure indicate the number of different models contributing to the different time periods. No ranges are given for the RCP6.0 projections beyond 2100 as only two models are available. (Courtesy by IPCC). As described in the Section 2, 13 additional sensitivity analyses were performed to assess the robustness to data and modeling choices of the association between PM2.5 concentration and Covid-19 mortality rate, estimated through the main analysis. As shown in Fig. 2, it resulted that excluding the Central territorial areas leads to an underestimate of the obtained association between PM2.5 concentration and Covid-19 mortality rate. On the contrary, the omission of the Northern territorial areas brought to an uncertain estimate (wide CI) of the aforementioned association, due to the fitting of the model on only 60 out of 107 samples. Therefore, these results suggest that it is important to include these territorial areas in the dataset in order to reduce the bias on the estimation of the exposure-outcome association, as performed in the main analysis. Fig. 2 also highlights that the addition to the main analysis model of both the latitude and longitude of the territorial areas centroid (initially not included due to multicollinearity issues) results in a strong underestimate of the exposure-outcome association. Although the present study is affected by the inherent limitations of an ecological study design, it also has the following strengths. First, the main analysis model was adjusted for a large set of potential confounders, i.e., demographic, socio-economic, behavioral, climatic, mobility and health related. This is crucial because a higher number of covariates could reduce the confounding bias which potentially affects the estimated exposure-outcome association in an ecological design. Second, the present study was carried out by following a methodological procedure based on two preliminary steps: i) a descriptive analysis for each of the initial 74 collected variables with the aim of selecting the proper transformation of each variable in view of multivariate regression; ii) a multicollinearity analysis based on the VIF computation, in order to select the set of variables to be included in the regression model for the sake of minimizing multicollinearity exhibited by data. Moreover, the robustness of the main analysis to unmeasured confounding, data and modeling choices was carefully assessed through several sensitivity analyses by fitting a total of 47 multivariate regression models. Third, an additional sensitivity analysis to unmeasured confounding was performed, based on the E-value computation, in order to evaluate the potential impact of unmeasured confounding on the main analysis results. The novelty of the present study is the contextualization of the E-value with respect to the observed confounders included in the main analysis model, by computing the OCE that was recently theorized in (D'Agostino McGowan and Greevy, 2020). The study has also some limitations because ecological regression analyses are by design subject to a potential ecological bias due to their inability to adjust for individual-level risk factors, which are known to be affecting factors in the context of Covid-19 health outcomes (Wu et al., 2020). Therefore, when individual-level data are unavailable, the ecological regression analysis only allows significant conclusions at the considered resolution without providing information about individual-level associations, since a representative sample of individual-level data would be necessary in this case. Moreover, the resulting positive association between PM2.5 concentration and Covid-19 mortality rate might have been influenced by the heterogeneous spreading of the pandemic in Italy during 2020. In addition, a weak point could be the sample size of the dataset used to fit the main analysis model, as it only consists of 107 samples corresponding to all the Italian territorial areas. The Authors meant to conduct a nationwide study and the size of the sample is due to the lack of publicly available data regarding most of the variables included in the main analysis model with a finer geographic resolution, according to the Authors’ best knowledge. If data concerning all the variables involved in the present study were available with a finer geographic resolution for the whole Italian country (e.g., municipality resolution), the corresponding results might actually be more reliable. Additionally, even though most of the variables included in the main analysis model were natively available at the resolution of the single territorial area, four variables needed further preprocessing steps to reach the desired resolution (i.e., people with at least one chronic disease, number of passengers arriving from Italy by plane, number of passengers arriving from foreign countries by plane, and poverty incidence rate). Preprocessing led to the assumptions detailed in Section 2. As already mentioned, a higher number of covariates could reduce the uncertainty which potentially affects the estimated exposure-outcome association. For example, the inclusion of other variables (e.g., other health comorbidities, or occupational exposure to air pollutants, etc …) could help to give a more insightful view of their potential role in the association between PM2.5 concentration and Covid-19 mortality rate. Unfortunately, this type of data was not publicly available for all the 107 Italian territorial areas when the study was conducted, thus underlining the need of unified nationwide public databases which would allow the implementation of occupation-specific interventions to tackle the spread and severity of pandemic diseases such as Covid-19. Data concerning family clusters, if they were available, would also represent some very useful information to be considered, since they could map the personal contacts inside families and allow better monitoring of the Covid-19 pandemic. Anyway, family clusters could be studied only if individual-level data were available since the index case and the corresponding secondary cases would need to be identified. It is difficult to map this information in ecological studies, since there are no variables related to these aspects at the community level. The benefit which would derive from considering a higher number of covariates within ecological studies should encourage the scientific community to develop new internationally recognized databases that should be continuously updated. Indeed, the existence of such databases would allow researchers to make their ecological studies more reliable, comparable, and extensive. The inherent limitations of the ecological study design prevent the Authors from making conclusions about causality relationships. Improving the scientific rigor of research in this area depends on the availability of representative and individual-level data on Covid-19 health outcomes and potential confounders.

Conclusions

The association resulting from the ecological regression analysis provides statistical evidence to suggest the existence of a potential positive relationship between PM2.5 concentration and Covid-19 mortality rate throughout 2020, considering all the Italian territorial areas. The reported E-value and its contextualization through the OCE computation highlighted that the climatic factors might play an important role, as their omission from the main analysis model leads to an underestimate of the exposure-outcome association. Moreover, the sensitivity to unmeasured confounding analyses showed that a set of unknown confounders similar in size and magnitude to the set of the considered climatic factors could potentially tip the estimated exposure-outcome association. Consequently, the role of climatic factors in the Covid-19 pandemic is worth of further investigation.

Funding sources

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

30 in total

1. Sensitivity Analysis in Observational Research: Introducing the E-Value.

Authors: Tyler J VanderWeele; Peng Ding
Journal: Ann Intern Med Date: 2017-07-11 Impact factor: 25.391

2. Role of the chronic air pollution levels in the Covid-19 outbreak risk in Italy.

Authors: Daniele Fattorini; Francesco Regoli
Journal: Environ Pollut Date: 2020-05-04 Impact factor: 8.071

3. Assessing nitrogen dioxide (NO₂) levels as a contributing factor to coronavirus (COVID-19) fatality.

Authors: Yaron Ogen
Journal: Sci Total Environ Date: 2020-04-11 Impact factor: 7.963

4. Effect of ambient air pollutants and meteorological variables on COVID-19 incidence.

Authors: Ying Jiang; Xiao-Jun Wu; Yan-Jun Guan
Journal: Infect Control Hosp Epidemiol Date: 2020-05-11 Impact factor: 3.254

5. Meteorological factors and COVID-19 incidence in 190 countries: An observational study.

Authors: Cui Guo; Yacong Bo; Changqing Lin; Hao Bi Li; Yiqian Zeng; Yumiao Zhang; Md Shakhaoat Hossain; Jimmy W M Chan; David W Yeung; Kin-On Kwok; Samuel Y S Wong; Alexis K H Lau; Xiang Qian Lao
Journal: Sci Total Environ Date: 2020-11-23 Impact factor: 7.963

6. Association of environmental and meteorological factors on the spread of COVID-19 in Victoria, Mexico, and air quality during the lockdown.

Authors: Edgar Tello-Leal; Bárbara A Macías-Hernández
Journal: Environ Res Date: 2020-11-11 Impact factor: 6.498

7. COVID-19 incidence and mortality in Lombardy, Italy: An ecological study on the role of air pollution, meteorological factors, demographic and socioeconomic variables.

Authors: Elena De Angelis; Stefano Renzetti; Marialuisa Volta; Francesco Donato; Stefano Calza; Donatella Placidi; Roberto G Lucchini; Matteo Rota
Journal: Environ Res Date: 2021-01-22 Impact factor: 6.498

8. A Novel Coronavirus from Patients with Pneumonia in China, 2019.

Authors: Na Zhu; Dingyu Zhang; Wenling Wang; Xingwang Li; Bo Yang; Jingdong Song; Xiang Zhao; Baoying Huang; Weifeng Shi; Roujian Lu; Peihua Niu; Faxian Zhan; Xuejun Ma; Dayan Wang; Wenbo Xu; Guizhen Wu; George F Gao; Wenjie Tan
Journal: N Engl J Med Date: 2020-01-24 Impact factor: 91.245

9. Regional air pollution persistence links to COVID-19 infection zoning.

Authors: Antonio Frontera; Claire Martin; Kostantinos Vlachos; Giovanni Sgubin
Journal: J Infect Date: 2020-04-10 Impact factor: 6.072

10. Assessing correlations between short-term exposure to atmospheric pollutants and COVID-19 spread in all Italian territorial areas.

Authors: Gabriele Accarino; Stefano Lorenzetti; Giovanni Aloisio
Journal: Environ Pollut Date: 2020-10-15 Impact factor: 8.071

2 in total

1. A correlational analysis of COVID-19 incidence and mortality and urban determinants of vitamin D status across the London boroughs.

Authors: Mehrdad Borna; Maria Woloshynowych; Rosa Schiano-Phan; Emanuela V Volpi; Moonisah Usman
Journal: Sci Rep Date: 2022-07-11 Impact factor: 4.996

2. Economic expectations and anxiety during the COVID-19 pandemic: a one-year longitudinal evaluation on Italian university students.

Authors: Giovanni Busetta; Maria Gabriella Campolo; Demetrio Panarello
Journal: Qual Quant Date: 2022-02-28

2 in total