
Performance of multi-city land use regression models for nitrogen dioxide and fine particles.

Meng Wang¹, Rob Beelen, Tom Bellander, Matthias Birk, Giulia Cesaroni, Marta Cirach, Josef Cyrys, Kees de Hoogh, Christophe Declercq, Konstantina Dimakopoulou, Marloes Eeftens, Kirsten T Eriksen, Francesco Forastiere, Claudia Galassi, Georgios Grivas, Joachim Heinrich, Barbara Hoffmann, Alex Ineichen, Michal Korek, Timo Lanki, Sarah Lindley, Lars Modig, Anna Mölter, Per Nafstad, Mark J Nieuwenhuijsen, Wenche Nystad, David Olsson, Ole Raaschou-Nielsen, Martina Ragettli, Andrea Ranzi, Morgane Stempfelet, Dorothea Sugiri, Ming-Yi Tsai, Orsolya Udvardy, Mihaly J Varró, Danielle Vienneau, Gudrun Weinmayr, Kathrin Wolf, Tarja Yli-Tuomi, Gerard Hoek, Bert Brunekreef.

Abstract

BACKGROUND: Land use regression (LUR) models have been developed mostly to explain intraurban variations in air pollution based on often small local monitoring campaigns. Transferability of LUR models from city to city has been investigated, but little is known about the performance of models based on large numbers of monitoring sites covering a large area.
OBJECTIVES: We aimed to develop European and regional LUR models and to examine their transferability to areas not used for model development.
METHODS: We evaluated LUR models for nitrogen dioxide (NO2) and particulate matter (PM; PM2.5, PM2.5 absorbance) by combining standardized measurement data from 17 (PM) and 23 (NO2) ESCAPE (European Study of Cohorts for Air Pollution Effects) study areas across 14 European countries for PM and NO2. Models were evaluated with cross-validation (CV) and hold-out validation (HV). We investigated the transferability of the models by successively excluding each study area from model building.
RESULTS: The European model explained 56% of the concentration variability across all sites for NO2, 86% for PM2.5, and 70% for PM2.5 absorbance. The HV R2s were only slightly lower than the model R2 (NO2, 54%; PM2.5, 80%; PM2.5 absorbance, 70%). The European NO2, PM2.5, and PM2.5 absorbance models explained a median of 59%, 48%, and 70% of within-area variability in individual areas. The transferred models predicted a modest-to-large fraction of variability in areas that were excluded from model building (median R2: NO2, 59%; PM2.5, 42%; PM2.5 absorbance, 67%).
CONCLUSIONS: Using a large data set from 23 European study areas, we were able to develop LUR models for NO2 and PM metrics that predicted measurements made at independent sites and areas reasonably well. This finding is useful for assessing exposure in health studies conducted in areas where no measurements were conducted.


Year: 2014 PMID: 24787034 PMCID: PMC4123024 DOI: 10.1289/ehp.1307271

Source DB: PubMed Journal: Environ Health Perspect ISSN: 0091-6765 Impact factor: 9.031

Introduction

Many studies have documented adverse health effects associated with long-term exposure to air pollutants (e.g., Brunekreef and Holgate 2002). With the improvement of the accuracy of geographical data, air pollution models incorporating data from geographical information systems (GIS) are of increasing interest in exposure assessment (Hoek et al. 2008; Jerrett et al. 2005). Land use regression (LUR) modeling is a popular method used for exposure assessment in health studies (Cesaroni et al. 2013; Estarlich et al. 2011; Gehring et al. 2011). LUR modeling is a GIS- and statistics-based method that exploits land use, geographic, and traffic characteristics (e.g., traffic intensity, road length, population density) to explain spatial concentration variations at monitoring sites. Land use regression models were constructed and used mostly to predict concentrations within metropolitan areas (Hoek et al. 2011; Madsen et al. 2007; Marshall et al. 2008) or small regions (Brauer et al. 2003; Henderson et al. 2007). Often, models have been based on measurements made at a relatively small number of sampling sites (20 to ~ 80 sites). Our recent study showed a positive association between the number of sampling sites and the prediction capability of models for NO2 based on 144 sites in the Netherlands (Wang et al. 2012), in agreement with observations for Girona, Spain (Basagaña et al. 2012). At least for some of the reported studies, there is still room to improve the model performances if more sampling sites were selected (Hoek et al. 2008). Several studies have reported the possibilities of building models in large areas in Europe, United States, and Canada (Beelen et al. 2009; Hart et al. 2009; Hystad et al. 2011; Vienneau et al. 2009, 2013). With a large number of sites, these models explained large fractions of NO2 variability (61% to ~ 90%) and a modest fraction of the variability of PM (40% to ~ 50%) across all sites. 
The large-area studies were all based on routine monitoring data. National routine monitoring networks may include only a small number of sites within individual cities. Therefore it may be difficult to evaluate how well a large-area model explains within-city variability. This is relevant for epidemiological studies based in individual cities. A study in Switzerland based on study-specific monitoring suggested that a countrywide model did not perform well within six of the eight geographically diverse study areas (Liu et al. 2012).
The applicability of LUR models can be increased by transferring them to adjacent areas with similar geography and GIS databases where no or few measurements were conducted. The transferability of models has been investigated for local and national models (Allen et al. 2011; Poplawski et al. 2009; Vienneau et al. 2010). Most of the earlier studies recommended using the locally built models, even though transferred models explained variations in concentrations fairly well. This was recommended because all the transferred models were city–city or country–country transfers for which local specific variables were not available, and there was no advantage in the number of sampling sites compared with the locally developed models. So far, few studies have attempted to explore the performance of LUR models with combined geographical areas in terms of prediction ability and transferability at independent sites and areas, mainly because sufficient, comparable measurement data are lacking.
In the context of the European Study of Cohorts for Air Pollution Effects (ESCAPE 2013), we applied a standardized approach for measurements, GIS variable collection, and model development for nitrogen dioxide (NO2) and particulate matter (PM) in 36 study areas in Europe (Beelen et al. 2013; Cyrys et al. 2012; Eeftens et al. 2012a, 2012b). We recently published LUR models developed within individual study areas for NO2 and PM (Beelen et al. 2013; Eeftens et al. 2012a). The ESCAPE database provides a unique opportunity to address important questions regarding application of LUR models developed for even larger areas. Therefore, the aims of this study are a) to develop LUR models for NO2, PM2.5 (PM with diameter ≤ 2.5 μm), and PM2.5 absorbance based on combining the ESCAPE study areas across Europe and across four regions of Europe; b) to evaluate the model performances systematically in terms of model fitting and prediction ability; and c) to investigate the transferability of the regional and European models to monitoring sites and areas not included in the model building.

Methods

Study areas and air pollution measurements. Details of the ESCAPE study design and the measurement campaign have been described previously (Cyrys et al. 2012; Eeftens et al. 2012b). Briefly, an intensive monitoring campaign was conducted in 36 European study areas between October 2008 and May 2011. ESCAPE included 20 areas with simultaneous measurements of both PM and NO2 at 20 sites per area, and at 20 sites where only NO2 was measured. In an additional 16 areas, where PM measurements were not available, only NO2 measurements were conducted at 40 sites per area. The number of measurement sites was doubled in the large study area of the Netherlands and Belgium. In each area, we chose sampling sites at street, urban background, and regional background locations. Sites were also selected to cover locally important variation—for example, presence of a port or altitude. These sites were selected to represent the spatial distribution of air pollution and residential addresses of participants of cohort studies in these areas. The background sites have been carefully selected to the locations not influenced by local traffic and other local emissions (e.g., industry and port) (Beelen et al. 2013; Eeftens et al. 2012a). Annual average concentrations were calculated from three 2-week samples in the cold, warm, and intermediate seasons. Because the number of samplers was limited, five sites and the references site were measured simultaneously. The measured values were adjusted for temporal trends with data from the continuous reference site in each area by calculating absolute differences between concentrations at monitoring sites and reference sites and using that as adjustment factor (Cyrys et al. 2012; Eeftens et al. 2012b). For this paper, we selected the 23 areas (Figure 1) in which traffic intensity variables were available for LUR model building in line with the importance of traffic intensity variables in model development (Beelen et al. 2013). 
This included 17 of the 20 PM/NO2 areas and 6 of the 16 NO2-only areas. We allocated the areas to four regions according to geographic location, climate characteristics, traffic intensity levels, and the configuration of the cities/country. These regions included five areas in north Europe (Oslo, Norway; Stockholm and Umeå, Sweden; Copenhagen, Denmark; Helsinki/Turku, Finland), seven in the west (Netherlands and Belgium; London, Manchester, and Bradford, UK; Ruhr area and Erfurt, Germany; Paris, France), six in the center (Munich, Germany; Vorarlberg, Austria; Györ, Hungary; Lugano, Switzerland; Grenoble and Lyon, France), and five in the south (Turin and Rome, Italy; Athens, Greece; Barcelona, Spain; Marseille, France) (Figure 1, Table 1).

Figure 1

Map of study areas including region indication. Symbols: black, West Europe; +, North Europe; ×, Central Europe; open, South Europe

Table 1

Study areas.

Code	Type	Region	Study area
NOS	PM/NO₂	North	Oslo, Norway
SST	PM/NO₂	North	Stockholm, Sweden
FIH	PM/NO₂	North	Helsinki/Turku, Finland
DCO	PM/NO₂	North	Copenhagen, Denmark
SUM	NO₂	North	Umeå, Sweden
UKM	PM/NO₂	West	Manchester, UK
UKO	PM/NO₂	West	London, Oxford, UK
BNL	PM/NO₂	West	Netherlands and Belgium
GRU	PM/NO₂	West	Ruhr area, Germany
GRE	NO₂	West	Erfurt, Germany
UKB	NO₂	West	Bradford, UK
FPA	PM/NO₂	West	Paris, France
GMU	PM/NO₂	Central	Munich, Germany
AUV	PM/NO₂	Central	Vorarlberg, Austria
FLY	NO₂	Central	Lyon, France
HUG	PM/NO₂	Central	Györ, Hungary
SWL	PM/NO₂	Central	Lugano, Switzerland
FGR	NO₂	Central	Grenoble, France
ITU	PM/NO₂	South	Turin, Italy
IRO	PM/NO₂	South	Rome, Italy
SPB	PM/NO₂	South	Barcelona, Spain
FMA	NO₂	South	Marseille, France
GRA	PM/NO₂	South	Athens, Greece

For this study we selected NO2 and PM2.5 absorbance to represent traffic-related air pollution, and PM2.5 to represent a more complex mixture of sources. NO2 was measured using Ogawa badges following the Ogawa analysis protocol (V 3.98; Ogawa & Co., Pompano Beach, FL, USA). PM2.5 samples were collected on preweighed filters using Harvard impactors, and were then used to measure absorbance (Cyrys et al. 2012; Eeftens et al. 2012b).
Predictor variables. We extracted values for the GIS predictor variables at the locations of sampling sites using ArcGIS (ESRI, Redlands, CA, USA). Details of the predictor variables have been described in previous papers (Beelen et al. 2013; Eeftens et al. 2012a). Briefly, the predictor variables were derived from both centrally available Europe-wide GIS databases and GIS data collected by the local centers using standard definitions. Central GIS predictor variables included road network, land use, population density, and altitude data. The digital road network was obtained from EuroStreets version 3.1 (EuroStreets 2013) for the year 2008. The total lengths of all roads and major roads were calculated within buffers of 25, 50, 100, 300, 500, and 1,000 m. Traffic intensity data were not available for this road network. Land use variables were derived from the European Corine Land Cover (European Environment Agency 2000) database for the year 2000 for buffer sizes of 100, 300, 500, 1,000, and 5,000 m. Digital elevation data were obtained through the Shuttle Radar Topographic Mission (SRTM) (CGIAR Consortium for Spatial Information 2013). Detailed road networks with linked traffic intensity for all road links were obtained from local sources for all 23 areas. Local land use, population density, altitude, and other local variables were also locally extracted for modeling.
For the regional and European models, we pooled the data by including all the central GIS predictors and the local traffic variables with traffic intensity. We combined the centrally available high and low residence density land use variables, and likewise the natural and urban green variables, because not all the areas contained them separately. We made efforts to incorporate more local common variables for specific regions to capture regional variations. We included regional background concentrations of NO2, PM2.5 absorbance, and PM2.5 as the mean of the measured concentrations at ESCAPE regional background sites (1–20) in each local study area to characterize the spatial differences between study areas. In the Netherlands, regional background concentrations were interpolated from regional background sites throughout the country because background concentrations may vary at such a large scale. In total, 49 variables were evaluated at the European level and 54, 53, 54, and 64 variables in the north, west, central, and south regions, respectively (see Supplemental Material, Table S1).
Model development. A total of 960 NO2 sites and 356 PM sites (four sites were missing due to failed campaign) were available for modeling from 23 and 17 study areas, respectively. Detailed procedures of the NO2 and PM model development have been published elsewhere (Beelen et al. 2013; Eeftens et al. 2012a). The regional and European models were developed using the same strictly standardized approaches. Briefly, a supervised stepwise regression was used to develop the LUR model. We first evaluated univariate regressions of the annual concentrations against all potential predictor variables. We forced the regional background concentration variable into the model in the first step (for the European and regional models). Then the variable that produced the highest adjusted R2 and that had the a priori–defined direction of effect (e.g., positive for traffic intensity) was selected as the second predictor.
Second, the remaining variables were added separately, and we assessed whether the variable with the highest increase in adjusted R2 improved the model by at least 1%. This process continued until no more variables with the a priori–specified sign could increase the model-adjusted R2 by at least 1%. In the final step, we excluded variables that had a p-value > 0.1. We checked whether the variance inflation factor was < 3 to avoid multicollinearity.
Model evaluations. We used three approaches for model evaluation.
We investigated the model fit at individual study areas by applying the European/regional model to the sites of each area that were used for modeling. The Modelintra R2 shows the within-area variations explained by the European/regional models, which are directly comparable with the R2 of city-specific models. The Modelintra R2 is important for studies conducted within individual cities that use the European/regional model, including European studies such as ESCAPE, in which cohorts were located within a city or small area and cohort-specific epidemiological analyses were conducted. The overall R2 is relevant for multi-city studies that exploit both within- and between-city variability of air pollution contrasts.
Cross-validation (CV) is an internal validation for testing the stability of model fit. We conducted leave-one-area-out cross-validation (LOAOCV) by leaving out all observations from one complete area of the M study areas (M = 23 for NO2 and 17 for PM), refitting the model based on the remaining M − 1 areas, and investigating the agreement between predicted and observed concentrations for the area that was left out. This was iterated M times, and the LOAOCV reflects the heterogeneity of model fit due to regional variations between study areas. We do not report LOAOCV that was almost identical to the model R2 probably because of the large training data set.
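The supervised stepwise procedure described under Model development can be illustrated with a minimal sketch. It is not the ESCAPE implementation: the function names (`ols`, `adjusted_r2`, `forward_select`, `vif`), the predictor names, and the synthetic data are all assumptions made for illustration. The sketch forces one predictor first, then adds the candidate with the largest adjusted R2 gain if its coefficient has the expected sign and the gain is at least 0.01, and finally checks variance inflation factors. The p-value exclusion step is omitted.

```python
import numpy as np

def ols(y, X):
    """Least-squares fit with intercept; returns (coefficients, R2)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2

def adjusted_r2(y, X):
    """Adjusted R2 and coefficients for a model with the columns of X."""
    n, p = X.shape
    beta, r2 = ols(y, X)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1), beta

def forward_select(y, X, names, signs, forced, min_gain=0.01):
    """Forced first predictor, then greedy additions with sign and gain checks."""
    chosen = [names.index(forced)]
    best, _ = adjusted_r2(y, X[:, chosen])
    while len(chosen) < X.shape[1]:
        trials = []
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            adj, beta = adjusted_r2(y, X[:, chosen + [j]])
            # beta[-1] is the candidate's coefficient; keep it only if the
            # sign matches the a priori direction of effect.
            if np.sign(beta[-1]) == signs[names[j]]:
                trials.append((adj, j))
        if not trials:
            break
        adj, j = max(trials)
        if adj - best < min_gain:  # stop when the best gain is below 1%
            break
        chosen.append(j)
        best = adj
    return [names[j] for j in chosen], best

def vif(X):
    """Variance inflation factor of each column given the other columns."""
    out = []
    for j in range(X.shape[1]):
        _, r2 = ols(X[:, j], np.delete(X, j, axis=1))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic example (hypothetical predictors, not ESCAPE data).
rng = np.random.default_rng(0)
names = ["background", "traffic_load", "green_space", "unrelated"]
signs = {"background": 1, "traffic_load": 1, "green_space": -1, "unrelated": 1}
X = rng.normal(size=(300, 4))
y = 1.0 * X[:, 0] + 2.0 * X[:, 1] - 1.0 * X[:, 2] + rng.normal(size=300)

selected, adj = forward_select(y, X, names, signs, forced="background")
vifs = vif(X[:, [names.index(v) for v in selected]])
print(selected, round(float(adj), 2), vifs.round(2))
```

In this toy setting the unrelated predictor adds almost nothing to the adjusted R2, so the 1% threshold keeps it out of the model.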
The hold-out validation (HV) is an evaluation of model predictive power to independent sites not used for model building. In contrast with CV, HV reflects the prediction ability of models to the cohort addresses within the areas on which the models had been established. As a test, we divided the full set into two parts; the training sets were used for modeling and the remaining test sets were used for external evaluation. For NO2, we developed models using the PM/NO2 sites with 20–40 sites per area (480 sites in total) as training sets and the remaining 480 NO2-only sites as test sets. For PM2.5 and PM2.5 absorbance, a randomly selected 25% of the PM sites stratified by study area were used for validation purpose because we had fewer sites available for PM model building than for NO2 model building. The HV R2 is the squared Pearson correlation between predictions and observations at the independent sites throughout the whole study area. We calculated the HV R2 by truncating the values of predictors in the test data sets that were outside the range of the values observed in the data set for model development, to prevent unrealistic predictions based on model extrapolations (Wang et al. 2012). Prediction errors were estimated by root mean squared error (RMSE). In our previous study, the same NO2 training and test sets were used for the ESCAPE city-specific model evaluations individually in each study area (Wang et al. 2013). Therefore, a fair comparison of prediction ability (HV R2) between the European model and the city-specific models can be conducted using the same test sets for HV. The comparison was available only for NO2 due to relatively large number of sampling sites. Transferability of LUR models. 
To evaluate the prediction abilities of the regional/European models to independent individual study areas, we developed the regional and European models by excluding one area at a time and applied the transferred models directly to the sites of the area that was left out. Therefore, 23 NO2 models and 17 PM models were built until each of the study areas had been excluded once from model building. The TRANSintra R2 is the squared Pearson correlation between observed and predicted values in each of the remaining areas that was excluded from modeling. The TRANSintra R2 is different from the Modelintra and the LOAOCV R2 because the measurements conducted in the respective validation areas were completely left out from model development.
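The transfer evaluation can be sketched as below, assuming a fixed set of predictors. This is a simplification: in the study the model was rebuilt, including variable selection, each time an area was excluded. The area codes, predictor names, and data are synthetic and hypothetical. Each area is held out in turn, the regression is refit on the remaining areas, and the squared Pearson correlation and RMSE are computed within the held-out area.

```python
import numpy as np

def fit(X, y):
    """Ordinary least squares with intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def transfer_by_area(X, y, area):
    """Refit without each area in turn, then score predictions in that area."""
    scores = {}
    for a in np.unique(area):
        held_out = area == a
        beta = fit(X[~held_out], y[~held_out])
        pred = beta[0] + X[held_out] @ beta[1:]
        r2 = np.corrcoef(pred, y[held_out])[0, 1] ** 2   # squared Pearson correlation
        rmse = np.sqrt(np.mean((pred - y[held_out]) ** 2))
        scores[str(a)] = (r2, rmse)
    return scores

# Synthetic data: six hypothetical areas with 40 sites each.
rng = np.random.default_rng(42)
codes = ["A1", "A2", "A3", "A4", "A5", "A6"]
area = np.repeat(codes, 40)
background = np.repeat(np.linspace(10, 30, len(codes)), 40)  # constant within an area
traffic = rng.gamma(2.0, 1.0, area.size)                     # varies within an area
X = np.column_stack([background, traffic])
y = 5 + 0.9 * background + 3.0 * traffic + rng.normal(scale=2.0, size=area.size)

scores = transfer_by_area(X, y, area)
r2s = np.array([s[0] for s in scores.values()])
median_r2 = np.median(r2s)
iqr_r2 = np.percentile(r2s, 75) - np.percentile(r2s, 25)
print(f"median within-area R2 = {median_r2:.2f}, IQR = {iqr_r2:.2f}")
```

Because the background term is constant within a held-out area, the within-area R2 in this sketch depends only on how well the local predictor transfers.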

Results

Table 2 shows the concentration distributions of NO2 and PM metrics across the study areas by site types. Substantial spatial variations were found for all the pollutants across Europe. The variability was larger for NO2 than for PM2.5. The spatial variability for PM2.5 absorbance was intermediate between PM2.5 and NO2. Concentration contrasts were larger at the street sites for NO2 and PM2.5 absorbance than at the urban and regional background sites. Concentration contrasts for PM2.5 were more similar at all the site types, suggesting an influence of multiple sources in addition to traffic.

Table 2

Distributions of measured annual average NO2 and PM concentrations across Europe.

Pollutant and site type	n^a	Minimum	25th	Median	75th	Maximum
NO₂ (μg/m³)
Street sites	454	11.80	25.48	33.98	49.90	109.00
Urban background	414	3.03	15.38	22.88	30.67	57.63
Regional background	92	1.53	9.56	15.48	17.98	32.87
PM_2.5 (μg/m³)
Street sites	166	7.87	12.03	17.18	21.17	36.30
Urban background	144	5.62	10.97	15.87	18.62	32.59
Regional background	47	4.42	11.20	13.86	16.64	23.24
PM_2.5 absorbance (10^–5/m)
Street sites	166	0.78	1.63	2.16	2.81	5.09
Urban background	144	0.53	1.23	1.67	2.01	3.03
Regional background	47	0.33	0.92	1.16	1.45	2.37
25th and 75th are percentiles. ^aTotal number of sites in the study areas.

Models in combined areas. Table 3 shows the model details of NO2, PM2.5, and PM2.5 absorbance combining all the European study areas. The NO2, PM2.5, and PM2.5 absorbance models explained 56%, 86%, and 70%, respectively, of the variation across all sites, which includes both within- and between-area variations (overall model R2). The LOAOCV R2 was 5% and 6% lower than the model R2 for NO2 and PM2.5, respectively, and was identical to the model R2 for PM2.5 absorbance. The HV R2s (50% training vs. 50% test sites for NO2, 75% training vs. 25% test sites for PM metrics) were slightly smaller than or nearly identical to the model R2s, explaining 54%, 80%, and 70% for NO2, PM2.5, and PM2.5 absorbance, respectively, at the independent validation sites (see Supplemental Material, Table S2). The HV R2 did not change if the predictor range was not truncated because only one site for the NO2 model was truncated. The HV RMSE values were close to the values of LOAOCV RMSE for NO2 and PM metrics. The RMSE values were relatively small compared with the range of measurements as shown in Supplemental Material, Table S2. The median HV R2 of the European NO2 model at individual study areas was identical to that of the city-specific models reported by Wang et al. (2013) (see Supplemental Material, Figure S1). In the Turin and Paris areas with a low hold-out evaluation R2, for example, the HV R2s of the European model were considerably larger than those of the city-specific models. All the models in Table 3 included traffic intensity variables. The regional background concentration explained a large fraction (71%) of variation in PM2.5, documenting the importance of between-area differences for PM2.5 compared with that for the more traffic-related pollutants NO2 and PM2.5 absorbance.

Table 3

European models for NO2, PM2.5, and PM2.5 absorbance.

Predictors	Partial R²	β^a	Model_intra^b R²/IQR	LOAOCV R²/RMSE
NO₂ (n^c = 960, final model R² = 0.56)			0.59/0.19	0.50/8.49 (μg/m³)
Regional background concentration	0.08	2.63E-01
Traffic load in 50 m	0.35	2.44E-06
Road length in 1,000 m	0.50	2.74E-04
Natural and green in 5,000 m	0.55	–2.84E-07
Traffic intensity on the nearest road	0.56	2.21E-04
Intercept		1.38E+01
PM_2.5 (n^c = 356, final model R² = 0.86)			0.48/0.16	0.81/2.38 (μg/m³)
Regional background concentration	0.71	9.73E-01
Traffic load between 50 m and 1,000 m	0.81	4.75E-09
Traffic load in 50 m	0.84	5.28E-07
Road length in 100 m	0.86	2.12E-03
Intercept		3.06E-01
PM_2.5 absorbance (n^c = 356, final model R² = 0.70)			0.70/0.19	0.70/0.45 (10^–5/m)
Regional background concentration	0.28	9.06E-01
Traffic load in 50 m	0.58	2.07E-07
Road length in 500 m	0.67	2.90E-05
Natural and green in 5,000 m	0.69	–9.63E-09
Traffic load between 50 m and 1,000 m	0.70	4.20E-10
Intercept		2.95E-01
^aCoefficients of predictor variables in the models. ^bThe Model_intra R²s show the median and interquartile range (IQR) of the within-area variability explained by the European model in individual areas. ^cNumber of monitored sites available for model building.

The regional models performed as well as the European models in all regions except Southern Europe, where none of the models performed well in terms of the predictions to the independent sites (HV R2: 0–0.23) (see Supplemental Material, Table S3). Reassigning Turin from south Europe to the central Europe region only slightly changed the results. As shown in Table 3, the median within-area variability (Modelintra R2) explained by the European model for NO2 and PM2.5 absorbance at individual study areas was similar to the overall model R2, suggesting predominant sources of local emissions. For PM2.5, the median Modelintra R2 was much lower than the overall model R2 (0.48 vs. 0.86). Figure 2 (see also Supplemental Material, Figure S2) presents the correlation between predicted and measured PM2.5, PM2.5 absorbance, and NO2 by study areas. As the figures show, the variation of PM2.5 between areas was substantial compared with the within-area variation (e.g., low PM2.5 values in northern European cities such as Stockholm and high PM2.5 values in southern European cities such as Rome). In contrast, for NO2 and PM2.5 absorbance, variation within areas was substantial compared with the variation between areas (see Supplemental Material, Figure S2). The observations are more underpredicted within individual areas for PM metrics (median regression slope: PM2.5, 0.47; PM2.5 absorbance, 0.57; NO2, 0.56) than across the whole European study areas (regression slope: PM2.5, 0.85; PM2.5 absorbance, 0.70; NO2, 0.57).
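The gap between the overall R2 and the within-area R2 for PM2.5 can be reproduced qualitatively with a small synthetic example. The numbers below are invented for illustration and are not ESCAPE data. When most of the concentration contrast lies between areas (here a hypothetical `background` term), a pooled model can reach a high overall R2 while explaining far less of the variation inside any single area.

```python
import numpy as np

def squared_corr(pred, obs):
    """Squared Pearson correlation between predictions and observations."""
    return np.corrcoef(pred, obs)[0, 1] ** 2

rng = np.random.default_rng(1)
n_areas, n_sites = 10, 20
area = np.repeat(np.arange(n_areas), n_sites)

# Large between-area contrast (background) and a small within-area signal (traffic).
background = np.repeat(np.linspace(8, 25, n_areas), n_sites)
traffic = rng.normal(size=area.size)
y = background + 1.0 * traffic + rng.normal(scale=1.0, size=area.size)

# Pooled model fitted on all sites together.
A = np.column_stack([np.ones(area.size), background, traffic])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ beta

overall_r2 = squared_corr(pred, y)
within_r2 = np.array(
    [squared_corr(pred[area == a], y[area == a]) for a in range(n_areas)]
)
median_within_r2 = np.median(within_r2)
print(f"overall R2 = {overall_r2:.2f}, median within-area R2 = {median_within_r2:.2f}")
```

Shrinking the spread of `background` across areas moves the two statistics closer together, which is the pattern the text describes for NO2 and PM2.5 absorbance.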

Figure 2

Scatter plots of predicted and measured PM2.5 with study areas color and symbol coded and two city-specific examples, Stockholm (SST) and Rome (IRO). See Table 1 for study area codes.

Transferability. Table 4 shows the performance of the models that used all monitoring data excluding one area at a time. These models explained on average 57%, 84%, and 69% of the variability of NO2, PM2.5, and PM2.5 absorbance, respectively. The model structures and R2s were similar to the models in Table 3, which were based on all study areas. They included the same variable categories but with, to some extent, different buffer sizes. The models predicted the spatial variations of NO2 and PM2.5 absorbance well in the areas not used for model building, with median TRANSintra R2s of 0.59 for NO2 and 0.67 for PM2.5 absorbance. Transferability was lower for PM2.5, with a median R2 of 0.42. The same pattern was found for the model R2 focusing on within-area variability only (Modelintra). The variation in prediction R2s was relatively small for NO2, with an interquartile range (IQR) of 0.09, but larger for PM2.5 (IQR, 0.17) and PM2.5 absorbance (IQR, 0.21), showing that predictions were less comparable for the two PM metrics. The variation is shown in Figure 3 (see also Supplemental Material, Figure S3). Interestingly, this did not depend so much on area as on the specific combination of area and component. For example, the areas in Hungary (Györ), Germany (Munich), and Austria (Vorarlberg) showed decent model fit and predictability for NO2 and PM2.5 absorbance, but almost no model fit and predictability for PM2.5. The transferred regional models showed similar characteristics to those of the European models, although the median TRANSintra R2 was slightly lower (see Supplemental Material, Table S4).

Table 4

Transferability of European models to areas that were not used for model building [median (IQR)]

Pollutant	Model R²	Model_intra^a R²	TRANS_intra^b R²	TRANS_intra^b RMSE
NO₂	0.57 (0.01)	0.59 (0.19)	0.59 (0.09)	5.58 (2.28)
PM_2.5	0.84 (0.01)	0.48 (0.16)	0.42 (0.17)	1.14 (0.58)
PM_2.5 absorbance	0.69 (0.01)	0.70 (0.19)	0.67 (0.21)	0.23 (0.07)
IQR, interquartile range. ^aModel_intra R²: R² of within-area variation explained by European model, with the same data as in Table 3. ^bTRANS_intra: squared correlations and RMSE between the predictions and observations at independent areas.

Figure 3

Transferability (TRANSintra R2) of the European models for NO2 and PM in the 23 study areas. See Table 1 for study area codes.


Discussion

In this study we developed LUR models for NO2, PM2.5, and PM2.5 absorbance, with combined measurement data from 23 study areas across Europe. For NO2 and PM2.5 absorbance, these models predicted spatial variations well in areas not used for model building. For PM2.5, prediction R2s were moderate for intraurban variation, though in some areas in central Europe prediction R2s were low. The overall R2 including both between- and within-study area variability was high for PM2.5 and PM2.5 absorbance and more moderate for NO2.
Comparisons with other large area studies. Our European models performed comparably or even better in predictions of NO2 and PM2.5 than other large area studies (see Supplemental Material, Table S5) (Beckerman et al. 2013; Beelen et al. 2009; Bergen et al. 2013; Hystad et al. 2011; Novotny et al. 2011; Sampson et al. 2013; Vienneau et al. 2013). For PM2.5 absorbance, this is the first report of LUR models in such a large geographical area. Model R2s are difficult to compare because studies differed in study area, model development strategies, scale of prediction, offered predictor variables, and number of sites. Consistent across studies, the regional background predicted a small fraction of variability for NO2 and a large fraction for PM2.5. For intracity model R2, our NO2 European model exhibited performance (Modelintra R2 = 0.59) comparable with that of the Canadian national model in seven specific areas (Edmonton, Montreal, Sarnia, Toronto, Victoria, Vancouver, and Winnipeg), with Modelintra R2 of 0.43 (Hystad et al. 2011). We observed no heterogeneity of model fit across study areas in the European model (LOAOCV R2s were close to the model R2).
Our European and regional models have several strengths compared with previous European models that modeled concentration in 1 × 1 km grids (Beelen et al. 2009; Vienneau et al. 2009) or more recently 100 × 100 m (Vienneau et al. 2013): a) We modeled small-scale variation using sampling sites that were selected according to a standard method to cover intraurban concentration contrasts. b) We included multiple pollutants (PM2.5, PM2.5 absorbance), which were much less available or measured with different methods from routine monitoring networks in Europe. c) We incorporated local traffic intensity data not available in Europe-wide databases (land use and road length data only). All the models included traffic intensity variables, improving prediction ability (HV R2) over models not having local traffic intensity data (but potentially road length); for example, from 0.46 to 0.54 for NO2.
The poor performance of the south European model may be attributed to the large heterogeneity of model fit (low LOAOCV R2) across south European study areas in which the concentrations in Athens were overestimated more than those of the other study areas. More formal methods, such as hierarchical cluster analysis to define regions, could be explored to improve comparability of regions.
Our PM2.5 European model explained a median of 48% within-area variations compared with the overall model R2 of 86%, which was largely explained by substantial differences in regional background concentrations. This was consistent with the R2s of the Canadian and American PM2.5 model (46% and 63%), of which the satellite data alone explained 41% and 52% of the variability, respectively (Beckerman et al. 2013; Hystad et al. 2011) (see Supplemental Material, Table S5). PM2.5 is well known to be a regional pollutant with a large fraction of secondary aerosol, not explained well by the local GIS and traffic variables typically available for LUR model building. This suggested that for pollutants (e.g., PM2.5) with much larger overall than within-city R2, joint analyses of cohorts including between-city exposure components might be advisable. This does require the assumption of sufficient comparability of cohorts across Europe.
Other methods such as partial least squares regression may help to increase the prediction ability of models (Sampson et al. 2013).

Comparison with ESCAPE city-specific models. For NO2 and PM models based on small training sets and a large number of predictor variables, the model R2 overestimates predictive ability in independent test sets, although such models still explain fairly large fractions (50% to about 60%) of spatial variability (Wang et al. 2012, 2013). HV R2s of the European models, which were developed on a large number of sites, were very similar to the model R2s. The average differences between the model R2s and HV R2s were just 2%, 6%, and 0% for NO2, PM2.5, and PM2.5 absorbance, respectively. The slightly larger drop for PM2.5 could be attributable to more sources affecting PM2.5 compared with NO2 and PM2.5 absorbance.

The previously published ESCAPE city-specific models, which used locally specific variables, explained a median of 82%, 71%, and 89% of the concentration variations for NO2, PM2.5, and PM2.5 absorbance (Beelen et al. 2013; Eeftens et al. 2012a). This is higher than the R2 of within-area variability explained by the European models in Table 2 (Modelintra R2: 59%, 48%, and 70%, respectively). The average differences between the individual city-specific model R2 (Beelen et al. 2013; Eeftens et al. 2012a) and the intraurban R2 (see Supplemental Material, Figure S3) are 24%, 24%, and 17% for NO2, PM2.5, and PM2.5 absorbance, respectively. Because model R2 overestimates predictive ability, especially when a model is developed on a relatively small number of sites (Wang et al. 2012, 2013), the comparison between local and European models should be based not on model R2 but on HV R2 at independent sites. Comparison of the prediction ability between the European and city-specific models was feasible only for NO2; it suggested that the European and city-specific models had similar median prediction ability at external sites not used for modeling.
The HV R2 in some cities [e.g., Turin (ITU) and Paris (FPA)] that were poorly predicted by the city-specific model may be improved substantially by the European model. We cannot draw a firm conclusion about one or the other approach being more reliable because comparisons for PM models were infeasible. The European model may reduce bias in health estimates because of its relatively large number of sampling sites and small number of variables compared with the city-specific models (Basagaña et al. 2013).

Most of the combined models included traffic variables in both large (≥ 500 m) and small (≤ 50 m) buffers, representing general area characteristics as well as localized influences. In contrast to the study-area specific ESCAPE models (Beelen et al. 2013; Eeftens et al. 2012a), none of our European models included population/residence density; instead, they selected road length in large buffers, which likely also represents urban–rural differences in population distribution (de Hoogh et al. 2013). In our GIS data set, the squared correlation (R2) between road length and population density is 0.46 within a 1,000-m buffer but only 0.13 within a 100-m buffer. Road length variables in large buffers therefore represent various aspects of “total human activity” such as traffic, heating, and population density.

Transferability of combined models. Previous studies on the transferability of LUR models focused mainly on city-to-city or country-to-country transferability. Briggs et al. (2000) concluded that the SAVIAH (Small-Area Variations In Air Quality and Health) models could be applied to other UK cities after calibrating with data from a few monitoring sites. Poplawski et al. (2009) and Allen et al. (2011) observed that local calibration may improve the predictions of the Canadian city-specific models when applied to a few other comparable cities in Canada and the United States. Vienneau et al. (2010) found reasonable transferability of the British and Dutch models between these two countries. All the previous studies concluded that the transferred models performed worse than the local source models. Our results show prediction capabilities for the traffic-related pollutants NO2 and PM2.5 absorbance that are, in terms of HV R2s, on par with those documented in previous local exercises (Basagaña et al. 2012; Wang et al. 2012). This might be attributable to the fact that the ESCAPE study used highly standardized procedures for measurement, GIS data collection, and model building across all areas. This suggests that our combined models can be carefully applied to other areas in Europe with common predictors, similar geographies, and availability of consistent regional background concentration within the region. Because the locations are well characterized, any candidate background location in a new area can be judged against the same criteria. Obviously, this will only work when the pollution characteristics or components are actually measured in the new area. In practice, this means that modeling of new areas will in most cases be restricted to NO2/nitrogen oxides and PM10 (PM ≤ 10 μm) and, in fewer areas (in Europe), to PM2.5 and PM absorbance. Satellite data have large spatial coverage and have improved NO2 and PM10 European models based upon routine monitor data by 5% and 11%, respectively (Vienneau et al. 2013). Satellite data could be used in the future to estimate background concentrations in new locations.

In some individual areas of central Europe, however, the European model performed poorly for PM2.5, probably because an important local predictor variable was lacking (e.g., residential density in Munich and Vorarlberg, industry in Hungary, or altitude in Vorarlberg).
Therefore, caution is needed when transferring the European models to cities for which the European model lacks predictor variables that are known to be important sources of variation locally. The poor performance in a few areas suggests that the European model is especially valuable in multicenter analyses such as ESCAPE rather than in studies of individual areas.

Applications in epidemiological studies. The overall R2 of the European model was highest for PM2.5 and lowest for NO2. In contrast, for within-city variation, the model had the lowest predictive ability for PM2.5, though it was still fairly high (median R2 = 0.48). The PM2.5 absorbance model explained large fractions of variability both overall and within cities. The high overall R2 suggests that the model can be used in pooled analyses of health data, exploiting exposure contrasts between study areas. Using between-city comparisons would be especially useful to increase PM2.5 contrasts. For ESCAPE, where the health findings based on the local exposure models are currently being published (Beelen et al. 2014; Raaschou-Nielsen et al. 2013), the model offers the possibility of pooled analyses. Pooled analyses have not been conducted so far, partly because of concerns about the comparability of the diverse cohorts across Europe. There is also the possibility of including new study populations from areas where local measurements were never conducted but relevant predictor variables are available. For exposure assessment with LUR models, efforts are mainly in the sampling campaign and GIS data collection.
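
The transferability check discussed above, in which each study area is excluded in turn from model building and then predicted from the remaining areas, can be sketched as follows. This is a minimal illustration on synthetic data, assuming an ordinary least squares model with three hypothetical predictors (`background`, `traffic`, `road_length`) and an R2 defined as the squared Pearson correlation in the held-out area. It does not reproduce the variable selection or the predictor set used in this study.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_ols(coef, X):
    """Apply fitted coefficients (intercept first) to a predictor matrix."""
    design = np.column_stack([np.ones(len(X)), X])
    return design @ coef


def leave_one_area_out_r2(X, y, area):
    """Fit on all areas but one, then score predictions in the held-out area."""
    scores = {}
    for a in np.unique(area):
        held_out = area == a
        coef = fit_ols(X[~held_out], y[~held_out])
        pred = predict_ols(coef, X[held_out])
        scores[a.item()] = float(np.corrcoef(y[held_out], pred)[0, 1] ** 2)
    return scores


# Synthetic illustration (assumed values and predictor names, not study data).
rng = np.random.default_rng(1)
n_areas, n_sites = 8, 25
n = n_areas * n_sites
area = np.repeat(np.arange(n_areas), n_sites)

background = np.repeat(rng.uniform(10.0, 40.0, n_areas), n_sites)  # one value per area
traffic = rng.gamma(2.0, 1.0, n)       # hypothetical small-buffer traffic variable
road_length = rng.gamma(2.0, 1.0, n)   # hypothetical large-buffer road length

X = np.column_stack([background, traffic, road_length])
y = background + 3.0 * traffic + 1.5 * road_length + rng.normal(0.0, 3.0, n)

scores = leave_one_area_out_r2(X, y, area)
median_transfer_r2 = float(np.median(list(scores.values())))
print(f"median R2 in held-out areas: {median_transfer_r2:.2f}")
```

Because the background term is constant within an area, the held-out R2 in this sketch reflects only how well the traffic and road length coefficients estimated elsewhere transfer to the excluded area.
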

Conclusions

European LUR models for NO2, PM2.5, and PM2.5 absorbance were found to have reasonable power to predict spatial variations of these components in areas not used for model building.

26 in total

Review 1. Air pollution and health.

Authors: Bert Brunekreef; Stephen T Holgate
Journal: Lancet Date: 2002-10-19 Impact factor: 79.321

2. Application of land use regression to estimate long-term concentrations of traffic-related nitrogen oxides and fine particulate matter.

Authors: Sarah B Henderson; Bernardo Beckerman; Michael Jerrett; Michael Brauer
Journal: Environ Sci Technol Date: 2007-04-01 Impact factor: 9.028

3. Measurement error in epidemiologic studies of air pollution based on land-use regression models.

Authors: Xavier Basagaña; Inmaculada Aguilera; Marcela Rivera; David Agis; Maria Foraster; Jaume Marrugat; Roberto Elosua; Nino Künzli
Journal: Am J Epidemiol Date: 2013-09-05 Impact factor: 4.897

4. A GIS-based method for modelling air pollution exposures across Europe.

Authors: D Vienneau; K de Hoogh; D Briggs
Journal: Sci Total Environ Date: 2009-10-28 Impact factor: 7.963

5. Development of Land Use Regression models for PM(2.5), PM(2.5) absorbance, PM(10) and PM(coarse) in 20 European study areas; results of the ESCAPE project.

Authors: Marloes Eeftens; Rob Beelen; Kees de Hoogh; Tom Bellander; Giulia Cesaroni; Marta Cirach; Christophe Declercq; Audrius Dėdelė; Evi Dons; Audrey de Nazelle; Konstantina Dimakopoulou; Kirsten Eriksen; Grégoire Falq; Paul Fischer; Claudia Galassi; Regina Gražulevičienė; Joachim Heinrich; Barbara Hoffmann; Michael Jerrett; Dirk Keidel; Michal Korek; Timo Lanki; Sarah Lindley; Christian Madsen; Anna Mölter; Gizella Nádor; Mark Nieuwenhuijsen; Michael Nonnemacher; Xanthi Pedeli; Ole Raaschou-Nielsen; Evridiki Patelarou; Ulrich Quass; Andrea Ranzi; Christian Schindler; Morgane Stempfelet; Euripides Stephanou; Dorothea Sugiri; Ming-Yi Tsai; Tarja Yli-Tuomi; Mihály J Varró; Danielle Vienneau; Stephanie von Klot; Kathrin Wolf; Bert Brunekreef; Gerard Hoek
Journal: Environ Sci Technol Date: 2012-10-01 Impact factor: 9.028

6. Effects of long-term exposure to air pollution on natural-cause mortality: an analysis of 22 European cohorts within the multicentre ESCAPE project.

Authors: Rob Beelen; Ole Raaschou-Nielsen; Massimo Stafoggia; Zorana Jovanovic Andersen; Gudrun Weinmayr; Barbara Hoffmann; Kathrin Wolf; Evangelia Samoli; Paul Fischer; Mark Nieuwenhuijsen; Paolo Vineis; Wei W Xun; Klea Katsouyanni; Konstantina Dimakopoulou; Anna Oudin; Bertil Forsberg; Lars Modig; Aki S Havulinna; Timo Lanki; Anu Turunen; Bente Oftedal; Wenche Nystad; Per Nafstad; Ulf De Faire; Nancy L Pedersen; Claes-Göran Östenson; Laura Fratiglioni; Johanna Penell; Michal Korek; Göran Pershagen; Kirsten Thorup Eriksen; Kim Overvad; Thomas Ellermann; Marloes Eeftens; Petra H Peeters; Kees Meliefste; Meng Wang; Bas Bueno-de-Mesquita; Dorothea Sugiri; Ursula Krämer; Joachim Heinrich; Kees de Hoogh; Timothy Key; Annette Peters; Regina Hampel; Hans Concin; Gabriele Nagel; Alex Ineichen; Emmanuel Schaffner; Nicole Probst-Hensch; Nino Künzli; Christian Schindler; Tamara Schikowski; Martin Adam; Harish Phuleria; Alice Vilier; Françoise Clavel-Chapelon; Christophe Declercq; Sara Grioni; Vittorio Krogh; Ming-Yi Tsai; Fulvio Ricceri; Carlotta Sacerdote; Claudia Galassi; Enrica Migliore; Andrea Ranzi; Giulia Cesaroni; Chiara Badaloni; Francesco Forastiere; Ibon Tamayo; Pilar Amiano; Miren Dorronsoro; Michail Katsoulis; Antonia Trichopoulou; Bert Brunekreef; Gerard Hoek
Journal: Lancet Date: 2013-12-09 Impact factor: 79.321

7. Intercity transferability of land use regression models for estimating ambient concentrations of nitrogen dioxide.

Authors: Karla Poplawski; Timothy Gould; Eleanor Setton; Ryan Allen; Jason Su; Timothy Larson; Sarah Henderson; Michael Brauer; Perry Hystad; Christy Lightowlers; Peter Keller; Marty Cohen; Carlos Silva; Mike Buzzelli
Journal: J Expo Sci Environ Epidemiol Date: 2008-04-09 Impact factor: 5.563

8. Creating national air pollution models for population exposure assessment in Canada.

Authors: Perry Hystad; Eleanor Setton; Alejandro Cervantes; Karla Poplawski; Steeve Deschenes; Michael Brauer; Aaron van Donkelaar; Lok Lamsal; Randall Martin; Michael Jerrett; Paul Demers
Journal: Environ Health Perspect Date: 2011-03-31 Impact factor: 9.031

9. Spatial modeling of PM10 and NO2 in the continental United States, 1985-2000.

Authors: Jaime E Hart; Jeff D Yanosky; Robin C Puett; Louise Ryan; Douglas W Dockery; Thomas J Smith; Eric Garshick; Francine Laden
Journal: Environ Health Perspect Date: 2009-06-29 Impact factor: 9.031

10. A national prediction model for PM2.5 component exposures and measurement error-corrected health effect inference.

Authors: Silas Bergen; Lianne Sheppard; Paul D Sampson; Sun-Young Kim; Mark Richards; Sverre Vedal; Joel D Kaufman; Adam A Szpiro
Journal: Environ Health Perspect Date: 2013-06-11 Impact factor: 9.031
