Literature DB >> 35262348

Mixed-Effects Modeling Framework for Amsterdam and Copenhagen for Outdoor NO₂ Concentrations Using Measurements Sampled with Google Street View Cars.

Jules Kerckhoffs¹, Jibran Khan^2,3, Gerard Hoek¹, Zhendong Yuan¹, Thomas Ellermann², Ole Hertel⁴, Matthias Ketzel^2,5, Steen Solvang Jensen², Kees Meliefste¹, Roel Vermeulen^1,6.

Abstract

High-resolution air quality (AQ) maps based on street-by-street measurements have become possible through large-scale mobile measurement campaigns. Such campaigns have produced data-only maps and have been used to produce empirical models [i.e., land use regression (LUR) models]. Assuming that all road segments are measured, we developed a mixed model framework that predicts concentrations by an LUR model, while allowing road segments to deviate from the LUR prediction based on between-segment variation as a random effect. We used Google Street View cars, equipped with high-quality AQ instruments, and measured the concentration of NO2 on every street in Amsterdam (n = 46.664) and Copenhagen (n = 28.499) on average seven times over the course of 9 and 16 months, respectively. We compared the data-only mapping, LUR, and mixed model estimates with measurements from passive samplers (n = 82) and predictions from dispersion models in the same time window as mobile monitoring. In Amsterdam, mixed model estimates correlated rs (Spearman correlation) = 0.85 with external measurements, whereas the data-only approach and LUR model estimates correlated rs = 0.74 and 0.75, respectively. Mixed model estimates also correlated higher rs = 0.65 with the deterministic model predictions compared to the data-only (rs = 0.50) and LUR model (rs = 0.61). In Copenhagen, mixed model estimates correlated rs = 0.51 with external model predictions compared to rs = 0.45 and rs = 0.50 for data-only and LUR model, respectively. Correlation increased for 97 locations (rs = 0.65) with more detailed traffic information. This means that the mixed model approach is able to combine the strength of data-only mapping (to show hyperlocal variation) and LUR models by shrinking uncertain concentrations toward the model output.

Entities: Chemical

Keywords: Google Street View; LUR; NO2 measurements; hyperlocal variation; mixed-effect model

Mesh：

Substances：

Year: 2022 PMID： 35262348 PMCID： PMC9178915 DOI： 10.1021/acs.est.1c05806

Source DB: PubMed Journal: Environ Sci Technol ISSN： 0013-936X Impact factor: 11.357

Introduction

Most air pollutants exhibit small-scale spatial variation that cannot be captured by traditional routine monitoring networks. Exposure assessment of air pollution has, therefore, been revolutionized via mobile monitoring platforms during the past decade.[1−16] With advancements in air monitoring instrumentation, such as higher time resolution and greater portability, mobile monitoring platforms can directly measure spatial gradients and peaks in urban air pollution. Li et al.[17] showed that quantifying spatial variation of NO2 within urban areas with high fidelity (<4 μg/m3 NO2) is not likely attainable unless the sampling network is dense, having more than one or two sensors per km.[2] Whereas mobile sampling is great in measuring the local variation in concentration levels, a fundamental limitation is that such measurements only consist of a limited number of seconds per street segment.[17] To reduce this problem, most mobile monitoring designs used land-use regression (LUR) models to develop concentration maps. Alternatively, when a significant number of repeated measurements are available, these could be used to create measurement-only concentration maps.[3,16,18] Both approaches have strengths and limitations. Regarding data-only mapping, Robinson et al.[18] considered 15 days as the minimum threshold of the daily visits required to produce representative long-term air pollution concentrations. This value is based on the work conducted by Apte et al.,[16] who designed a mobile sampling campaign using Google Street View (GSV) cars to measure air pollution levels on all streets in Oakland, USA. In Apte’s study, each street segment was measured around 50 times to develop a high-resolution measurement-only air pollution map of the city. However, measuring each street segment in a region of interest requires a significant amount of time, which might not be feasible for many locations, particularly in bigger cities (e.g., >100 km2). Therefore, many researchers have combined mobile monitoring with empirical LUR models to produce air pollution maps.[3,4,15] To compare data-only maps with LUR models, Messier et al.[3] measured all streets in Oakland at least 50 times and assumed that driving 50 times on different days generates “robust” long-term average concentrations. The authors then reduced the number of measuring days and compared data-only maps with the LUR models. They found that data-only mapping performed poorly with a few repeated drives, for example, one to two drives, but obtained R2 values better than the LUR approach within four to eight repeated drive days per road segment. A limitation of LUR models is however the loss of the very high spatial resolution as LUR models tend to “smooth” concentration levels over a wider terrain.[19] Therefore, in this paper, we propose a mixed modeling framework that combines the strengths of both data-only mapping and LUR models. This framework can deal with limited mobile monitoring data per street segment and “preserve” the high spatial resolution as much as possible. This method uses all measurements on all street segments to develop a LUR model but allows individual measurements to influence the output based on the between and within-segment variation. All measurements and models were averaged over street segments as the goal is to create a spatial map with long-term exposure predictions. We used mobile NO2 measurements collected with GSV cars in Amsterdam and Copenhagen to test and evaluate this framework. We compared data-only NO2 concentrations, LUR, and mixed model estimates with measurements from passive samplers and routine monitoring networks. We additionally compared with deterministic model predictions.

Materials and Methods

Study Sites

Amsterdam (hereafter, AMS) is the capital and the largest city of the Netherlands (see Figure a). AMS is the most populous city and has one of the densest road networks in the Netherlands. The city center has a mix of residential and commercial mid- and high-rise buildings and is bound by major interstate highways (Figure a). Amsterdam airport is located south-west of the city. AMS is flat (with surrounding flat land) and has an oceanic climate, significantly affected by its proximity to the North Sea to the west, with prevailing westerly winds.

Figure 1

Study sites: (a) City of Amsterdam and (b) Copenhagen metropolitan area containing three municipalities, Copenhagen, Frederiksberg, and Tårnby. The bold black lines show the border of the study sites. Background maps ESRI. The second study site consists of three municipalities, Copenhagen, Frederiksberg, and Tårnby, in the Copenhagen metropolitan area with a large commuter belt surrounding Copenhagen (see Figure b). Copenhagen (hereafter, CPH) is the largest and most populous city in Denmark and the Danish capital located on the eastern shore of the island of Zealand and Amager. The central part of CPH is relatively flat. The urban area stretches up to pprox.. 292 km2. CPH is interspersed with residential and commercial blocks containing low-, mid-, and high-rise buildings including major highways. The Copenhagen airport is in the south of the metropolitan area (Figure b).

Data Collection

Three GSV cars were equipped with 1 Hz nitrogen dioxide (NO2) monitors (CAPS, Aerodyne Research Inc, USA) and used to measure NO2 concentrations on each street segment in AMS and CPH. The instrument directly measures NO2 concentrations based on optical scattering and absorption. The geographical location of the car was recorded via a Global Positioning Unit (GPS; G-Star IV, GlobalSat, Taiwan), which was linked to the NO2 monitor in the GSV car using date and time. We used two GSV cars to monitor concentrations in AMS from 25 May 2019 to 15 March 2020. The third car was used to monitor NO2 concentrations in CPH from 15 October 2018 to 15 March 2020. Both measurement campaigns were stopped on 15 March 2020 due to COVID-19 lockdown restrictions. Measurements were collected between 08:00 and 22:00 on weekdays, with most measurements between 10:00 and 16:00. During data collection, the GSV cars measured in different parts of AMS and CPH as much as possible to reduce the spatial–temporal autocorrelation. NO2 concentrations higher than 500 μg/m3 and lower than 0 μg/m3 were removed from the data set as they are unrealistic and clear outliers. The final data set consisted of 5.9 million and 5.1 million 1 Hz measurements of NO2 in AMS and CPH, respectively. All processing steps, including subsequent model developments and analyses, are done in R software, version 4.0.4.

Data Processing and Aggregation

As street segments were measured at different times of the day and week, we applied a temporal correction to the measured data using nearby reference stations (one per city), explained in detail in the Supporting Information. In brief, the difference between the overall average concentration and the average of specific time windows at the reference station was used to correct all mobile measurements in corresponding time windows. The reference station measured concentrations for the full time period (all days of the week and day and night) of the mobile monitoring campaign, so corrected measurements can be used to reflect long-term concentrations. All measurements were assigned to the nearest street. The assigned values were then averaged over 50 m street segments per individual driving day (hereafter, drive-pass). Subsequently, we computed a mean of all drive-passes to get a single “mean of means” for all street segments. On average, each street segment consisted of 8 [interquartile range (IQR): 3–10] seconds per drive-pass and seven unique drive-passes, with some streets having multiple hours of data. There were 46,664 and 28,499 total street segments in the road network of AMS and CPH, respectively. Data of all drive-passes were used to develop the mixed-effects model for AMS and CPH. The “mean of means” data were used for data-only mapping and as inputs to develop LUR models for AMS and CPH. LUR models were developed by a supervised linear forward stepwise regression model. The criteria used in the development of the LUR models and coefficients for each city can be found in the Supporting Information.

Mixed Model Development

We developed a mixed modeling framework, also known as a linear “mixed-effects” model. The term comes from the coexistence of both fixed and random effects. The fixed effects are obtained from the standard coefficients of the LUR model. As all road segments are measured, we can use the measurements on all street segments as a random effect (cluster-specific effect). This allows the inclusion of cluster-specific effects while borrowing strength/stability from the fixed effects. This borrowing is stronger when data are closer to the average effect or for clusters that have less data. This way, the measured hyperlocal variation is preserved while uncertain low or high concentrations are shrunken toward the LUR model output. The mixed-effect model can be expressed aswhere Y is the mixed model prediction. The second part starts with the fixed effect where β is a (p, 1) vector of fixed effects attached to a matrix (X) of (n, p) covariates. Then, the random effects are added where b is a (q, 1) vector of random effects attached to a matrix (Z) of (n, p) covariates. The regression parameters, β (the fixed effects parameters), are the same for all individual drive-passes. If the vector of random effects b has mean zero, the mixed model estimates are fully based on the fixed effects (LUR model). Mixed model results were then averaged per street segment, similar to the average of the data-only approach and the LUR model.

Comparison with External Monitoring and Modeling

To evaluate the mixed model performance for AMS and CPH, we compared data-only measurements, LUR, and mixed model estimates with monitoring networks and deterministic model predictions. Hereafter, the data-only, LUR, and mixed model estimates, altogether, are referred to as Amsterdam Air View (AAV) and Copenhagen Air View (CAV) data. All comparison data sets are listed in Table , and their details are provided below.

Table 1

Overview of GSV Data and Comparison Data Sets in Amsterdam and Copenhagen

city	data	number of sites	year	name
AMS	Amsterdam Air View data (data-only, LUR, mixed model)	46,664	2019–2020	AAV
	Palmes tubes measurements[20]	82	2019–2020a	Palmes
	model predictions by the National Institute for Public Health and the Environment[21]	7004	2019	NSL
	Dutch National Air Quality Monitoring Programme	7	2019	LML
CPH	Copenhagen Air View data (data-only, LUR, mixed model)	28,499	2018–2020	CAV
	AirGIS model predictions (2019)[22,23]	58,234	2019	LPDV
	AirGIS model predictions along streets	97	2019	CPH-97
	Danish National Air Quality Monitoring Programme[24]	3	2018–2020a	NOVANA

Matches exact time window of GSV measurements. AMS: Amsterdam; CPH: Copenhagen.

Matches exact time window of GSV measurements. AMS: Amsterdam; CPH: Copenhagen. For AMS, the AAV data were compared with measurements from a passive sampler network using Palmes tubes at facades of buildings, which are maintained by the Municipal Health Service (GGD).[20] The Palmes tubes data consisted of repeated 4-weekly measurements throughout the whole year, covering all AMS and its surroundings. A total of 82 sites were within 20 m of the AAV measurements and had measured data available in the exact time window of the AAV campaign. The AAV data were also compared with the model predictions from the Dutch National Collaboration Programme on Air Quality [In Dutch: “Nationaal Samenwerkingsprogramma Luchtkwaliteit” (NSL)].[21] Model predictions from this framework are calculated for each major road at 100 m intervals on both sides, approximately 10 m from the roadside. We compared AAV data with the nearest NSL prediction within 20 m (n = 7004). In addition, we also compared AAV NO2 concentrations, Palmes, and NSL, where all three data sources were available (n = 47, overlapping sites). To assess the “absolute levels” of NO2 concentrations across the city, we compared mixed model estimates with annual average NO2 concentrations collected by the Dutch National Air Quality Monitoring Programme (LML). For CPH, the CAV data were compared with three air quality datasets. The first comparison dataset was based on recently updated Air Quality at Your Street address-level NO2 concentrations, annual average, 2019 (hereafter, LPDV).[23] LPDV is a high-resolution public map of air quality for each address location in Denmark. The air pollution levels were estimated using the Danish multiscale dispersion modeling system (DEHM-UBM-AirGIS), a standard toolkit to calculate pollution levels at any address location in Denmark. The modeled concentrations are representative of close to the building façade. The details of the DEHM-UBM-AirGIS system as well as its detailed inputs are provided in the study by Khan et al., 2019. CAV data were compared to the nearest LPDV point within 20 m (n = 58,234). The second comparison dataset was based on high-quality DEHM-UBM-AirGIS[22] predictions of NO2 concentrations and point locations along 97 busy streets in Copenhagen. Air pollution (e.g., NO2) is usually calculated for these streets as part of the Danish National Monitoring and Assessment Programme for the Aquatic and Terrestrial Environment (NOVANA). Again, the nearest neighbor analysis, as described above, was performed to compare NO2 estimates. This comparison dataset will be referred to as CPH-97. This dataset is based on more detailed and validated traffic data than LPDV as traffic data originate from the traffic monitoring program of the Municipality of Copenhagen. The third comparison dataset (NO2, 2019 annual averages) was acquired from two traffic monitoring stations and two background stations. The monitoring stations are part of the Danish Air Quality Monitoring Network in four major cities of Denmark; see Ellermann et al.[24] for more details. These data (hereafter, NOVANA) are used to assess the “absolute levels” of NO2 concentrations across the city.

Results

In the results section, we split the analyses by city and combine interpretations in the Discussion section.

Amsterdam

The LUR model based on measurements for AMS is shown in Supporting Information Table A2. The model mainly includes variables that describe local traffic intensity. Furthermore, the model includes a large-scale population density variable and the area of ports within a 1000 and 5000 m buffer. The model was able to explain the average concentrations per street segment moderately well (R2 value 0.49). Figure shows the data-only map (a), followed by the variance map with the standard error of the mean (b). This map indicates that multiple street segments have a large absolute and/or relative uncertainty. Figure c shows the predictions by the LUR model, which is much more smoothed than the data-only map. The mixed model prediction map (Figure d) is more smoothed than the data-only map but incorporates the variance of the data-only map in the random effect. This leads to more preservation of the local effects. Figure e shows, via the random effects (i.e., the difference between the mixed model and the LUR model), that the overall variance is comparable to the variance of the data-only map. In Figure f, we show the distribution of the measurements and model predictions by the LUR model and the mixed-effect model. High-resolution NO2 maps are available in the Supporting Information (Figure A1) and the mixed model estimates via Google’s Environmental Insights Explorer (https://insights.sustainability.google/labs/airquality). For all datasets, the concentrations are higher along the highways/major roads and vary generally smoothly along less busy roads. The same variation of pollution was also observed in the city center of AMS. Figure f shows that the variation in data-only NO2 is higher than the LUR and mixed model estimates.

Figure 2

Maps of measurements, predictions, and variance in Amsterdam. (a) Data-only map, (b) standard error of the mean, (c) LUR model (fixed effects), (d) mixed-effect model, (e) random components, and (f) distribution of NO2 measurements and predictions. Note: Boxes represent the IQR; the horizontal line is the median; vertical lines extend to IQR times 1.5 (limited to data); dots are individual outliers; the black squared dot is the mean. Full size maps in the Supporting Information (Figure A1). In Table , we present the summary statistics and Spearman correlation coefficients of all datasets (including external datasets) with matching locations. Measurements by the GSV car (and subsequent mixed model output) are on average higher than measurements and predictions by the Palmes tubes and NSL. Concentration distributions for the external datasets are given in Supporting Information Figure A2.

Table 2

Summary Statistics, Correlation, and Bias Scores for all Comparisons in AMSa

	summary statistics					correlation and bias
	min	Q1	med	Q3	max	r_s	RMSE	mean bias	mean relative bias (%)
Comparison to NSL Predictions, n = 7004
NSL	12	23	26	28	41
data-only	7	23	29	37	116	0.50	11.35	5.2	20
LUR model	16	27	31	34	84	0.61	6.76	5.1	20
mixed model	15	27	30	35	69	0.65	7.47	5.5	21
Comparison to Palmes Measurements, n = 82
Palmes	17	23	28	32	44
data-only	13	23	34	42	58	0.74	10.23	6.3	23
LUR model	20	28	32	35	52	0.75	6.49	4.5	16
mixed model	17	28	34	38	57	0.85	7.67	6.1	22
Comparison to Palmes Measurements with Overlapping NSL Sites, n = 47 (Major Roads Only)
Palmes	19	26	31	33	41
NSL	23	29	33	35	40	0.54	4.84	2.1	7
data-only	16	32	36	43	54	0.74	9.95	6.5	22
LUR model	24	32	35	37	50	0.45	6.93	4.7	16
mixed model	23	33	35	41	51	0.72	8.04	6.4	21

Summary statistics, RMSE, and mean bias in μg/m3. Min = minimum, Q1 = the 25th percentile, med = median or the 50th percentile, Q3 = the 75th percentile, max = maximum, rs = Spearman’s rank correlation, RMSE = root-mean square error, mean bias calculated as mean [(ref – test), mean relative bias calculated as mean bias/ref] where: ref = NSL, Palmes, and test = AAV data. Correlations between all data sets were moderately high, with the highest correlation between the mixed model and Palmes tubes (rs = 0.85). For data-only and LUR model predictions, correlations were 0.74 and 0.75, respectively. Furthermore, at Palmes sites with overlapping NSL predictions (n = 47), the mixed model explained measured concentrations at major roads modestly better than the national dispersion model predictions. Since NSL only makes predictions on major roads, the total variation in concentrations drops, resulting in overall lower correlation scores compared to the full set of monitoring locations. It also shows that a LUR model has more difficulties predicting concentration levels within that higher category, whereas the data-only approach is able to achieve a similar performance compared to the complete validation set. For the entire NSL dataset, we also found slightly higher correlations for the mixed model output than data-only and LUR model outputs. Supporting Information Figure A4 shows the scatterplots and Bland Altman plots for all comparisons. Of note, AAV data and mixed model predictions were on average 6.3 and 6.1 3 μg/m3 higher than the measurements from the Palmes tubes (Table ). The main reason for this difference is the fact that AAV data are measured and predicted on the road, while Palmes measurements were performed on the façade of buildings and expected to be lower due to dilution from road to building façade. Comparing the absolute concentration levels from the mixed model with mean concentrations from the seven routine measurement stations (LML) in AMS over 2019, we found a difference of 3 μg/m3, which is about 10%. In Supporting Information Figure A3, we show a bar chart for all seven LML sites. We found no apparent differences for sites close to traffic and sites in an urban background environment. Both data sets do not exactly overlap as the GSV was conducted from May 2019 till February 2020, and the routine measurements are the annual averages of 2019.

Copenhagen

The developed LUR model based on measurements in CPH is shown in Supporting Information Table B2, and like the AMS LUR model, it mainly includes variables that describe the local traffic intensity. However, the CPH LUR model also includes traffic intensity variables with bigger buffers and the average building height within 100 m. As noted in Section , the estimated/average building height was only available for CPH. The R2 value of the model was slightly higher than that of the AMS LUR model, that is, R2 = 0.54. Figure shows the data-only map of Copenhagen (a), followed by the variance map with the standard error of the mean (b). Like AMS, there are differences in absolute and/or relative uncertainties between street segments. Figure c shows the predictions by the LUR model, which is much more smoothed than the data-only map. The mixed model prediction map (Figure d) is more smoothed than the data-only map but incorporates the variance of the data-only map in the random effect. This leads to increased hyperlocal variability of concentrations. The random effects are shown in Figure e, showing that the overall variance is comparable to the variance of the data-only map. In Figure f, we show the distribution of the measurements and model predictions by the LUR model and mixed-effect model. High-resolution maps are available in the Supporting Information (Figure B1) and the mixed model estimates also via Google’s Environmental Insights Explorer (https://insights.sustainability.google/labs/airquality). Like Amsterdam, the concentrations are higher along the highways/major roads and vary generally smoothly along less busy roads, with variation in data-only NO2 being slightly higher than the other datasets.

Figure 3

Maps of measurements, predictions, and variance in Copenhagen. (a) Data-only map, (b) standard error of the mean, (c) LUR model (fixed effects), (d) mixed-effect model, (e) random components, and (f) distribution of NO2 measurements and predictions. Note: Boxes represent the IQR; the horizontal line is the median; vertical lines extend to IQR times 1.5 (limited to data); dots are individual outliers; the black squared dot is the mean. Full size maps in the Supporting Information (Figure B1). In Table , we present the distribution of measurements and models and Spearman’s correlation coefficients for CPH. Correlations with CAV data were moderately high for the CPH-97 data set but decreased when CAV data were compared with the LPDV data. CPH-97 has higher concentrations than LPDV data because CPH-97 only includes near-traffic locations. Mixed model estimates agreed better with the dispersion model approaches (i.e., LPDV, CPH-97) than the data-only and LUR models. Table also shows that the mixed model is able to lean to a LUR model when this generates better predictions (for LPDV) and uses more data-only measurements when they are more robust (for CPH-97). Supporting Information Figure B4 shows the scatterplots and Bland Altman plots for all comparisons.

Table 3

Summary Statistics, Correlation, and Bias Scores for all Comparisons in CPHa

	summary statistics					correlation and bias
	min	Q1	med	Q3	max	r_s	RMSE	mean bias	mean relative bias
Comparison to LPDV Model Predictions (N = 58,234)
LPDV	11	14	15	18	47
data-only	5	10	13	17	128	0.45	7.18	–1.87	–11
LUR model	8	12	15	18	50	0.50	4.06	–1.08	–6
mixed model	8	12	15	18	53	0.51	4.15	–1.14	–7
Comparison to CPH-97 Model Predictions (N = 97)
CPH-97	17	22	25	28	39
data-only	12	19	25	31	53	0.67	7.75	0.81	3
LUR model	16	24	27	30	50	0.55	5.16	1.78	7
mixed model	16	23	27	30	52	0.65	5.94	2.04	8

Summary statistics, RMSE and mean bias in μg/m3. min = minimum, Q1 = the 25th percentile, med = median or the 50th percentile, Q3 = the 75th percentile, max = maximum, rs = Spearman’s rank correlation, RMSE = root-mean square error, mean bias calculated as mean [(ref – test), mean relative bias calculated as mean bias/ref] where: ref = LPDV, CPH-97, and test = CAV data. In CPH, we did not find significant higher measurements and predictions by the CAV car compared to the LPDV data (Table ), though we found similar differences between CAV mixed model estimates and four stationary sites (Figure B3). Differences were between 10 and 20% in terms of absolute values, except for H.C. Andersen’s Boulevard, where concentrations differ by about 30%.

Discussion

In one of the largest mobile monitoring campaigns to date, we have shown that mobile monitoring can be used to develop accurate air pollution maps. The applied mixed model approach uses the advantages of a data-only and an empirical (LUR) model approach, outperforming the two individual approaches when compared to external measurements and different national dispersion models. Since all road segments are measured, the mixed models use the hyperlocal variation that can be picked up by a data-only approach while borrowing the stability from the LUR model estimates. This way, the measured hyperlocal variation is preserved, while uncertain low or high concentrations are shrunken toward the LUR model output.

Mobile Monitoring

Studies based on mobile monitoring usually face one out of two problems: the high variance (noise) in mobile measurements for specific locations (road segments) or loss of hyperlocal spatial variation by the creation of a LUR model. Figures b and 3b show that the variance (standard error of the mean) differs significantly between streets and neighborhoods. For example, 15% of the street segments in both cities have a standard error of the mean bigger than 5 μg/m3. While for some streets, four to eight repeats will be enough to characterize long-term concentration, some streets remain uncertain. Interpretation of hyperlocal effects is therefore very difficult. Only a few mobile monitoring campaigns have been able to measure such a significant amount of repeated measurements on street segments in a specific area, and there was no need to build a LUR model in order to create an air pollution concentration map.[3,7,16] While Messier et al.[3] found that 4–8 repeats were sufficient to create a data-only map for black carbon and nitrogen oxide (NO) better, or at par with a LUR model, Miller et al.[7] sampled each street segment (n = approx. 10,500) in Harris County, Texas 15–44 times and Apte et al.[16] needed 1 year to sample each street segment (n = approx. 21,000) in different parts of Oakland at least 30 times. It takes a lot of time and effort to create such rich data (>15 drives). For AMS and CPH with 46,664 and 28,499 street segments, respectively, it would take much more time or cars to achieve, let alone scaling up to bigger and more areas. Nevertheless, data-only mapping in AMS correlated highly with external measurements (rs = 0.74; Table ). On the other hand, data-only mapping in CPH correlated poorly with the national model predictions (rs = 0.45; Table ).

LUR Model Development

In previous work,[4] we showed that LUR models based on only two to three visits per street segment could predict external long-term measurements with moderately high accuracy. In Messier et al.,[3] the authors found that even with 2 drive days per road segment, the R2 value, that is 0.52, was within 15% of models developed on 45+ drive-passes. Hatzopoulou et al.[15] decreased the number of road segments from 611 to 100 in steps of 50, and R2 values remained stable up until 200 road segments. Even LUR models based on 100 segments predicted on average 73% of the variation (opposed to 74% for the entire dataset), albeit with a wider confidence interval (55–85% opposed to 70–78% for the entire dataset). Two other studies in Canada also found that increasing the visits (or total measurement time) quickly stabilizes LUR model predictions based on mobile measurements.[25,26] In this study, R2 values for the LUR models in AMS and CPH were also moderately high (0.49 and 0.54; see Tables A1 and B1). Of note, R2 values depend not only on the number of drive-passes or total time spent on a road segment but also on the urban topography of a city and the type of input data available. European cities tend to be more spatially diverse than North American cities, making it harder for LUR models to explain the variability of air pollution.[27] Nonetheless, predictions made with LUR models correlated high with external measurements (rs = 0.75) and moderately, that is, rs = 0.50–0.61, with model predictions (Tables and 3). This is similar to correlations with data-only mapping. Because of the smoothing of LUR models, RMSE and mean bias values are lower than data-only mapping and the mixed model approach, especially in Amsterdam. This mainly happens at the higher end of the concentration scale; see Figures A4 and B4. This relates to LUR models typically less able to capture small-scale variation compared to data-only mapping. This is mainly due to the fact our LUR models incorporated traffic intensity but not features like the composition of traffic and speed. Other local features like street configuration and small industrial sources are also missing. The balance between data-only and LUR-model maps depends on how extensive and detailed predictor variables are available. More and better predictors likely increase the performance of LUR models, especially predictor data that can explain the very local variation of air pollution. By using a mixed modeling framework, we were able to take advantage of both measured concentrations per road segment and LUR modeling at the same time. LUR models are generally more stable but not so well at detecting local features. In Table , we show that the mixed modeling estimates correlated higher with external measurements (rs = 0.85) compared to the data-only (rs = 0.74) and LUR model output (rs = 0.75). Mixed model estimates also correlated higher with external model predictions compared to the data-only and LUR model output (Tables and 3). Spearman correlations were 0.65, 0.51, and 0.65, on average 0.1 higher than data-only mapping and LUR model estimates. A mixed model approach in air pollution research is not new. Several studies used this method to assess spatial and temporal variations of air pollution at the same time.[28−31] For these studies, the main goal was to create a model that can predict concentrations at other locations or at other time points. In our mixed model framework, we only used spatial land use information to create a long-term average map and do not need to predict concentrations at other locations or time points. The mixed-effect model was specifically used to bring in the hyperlocal variation in concentrations that is missed by a typical LUR model. Figures e and 3e show the difference between the LUR model and the mixed model. In other words, it shows the influence of the data-only mapping (random components). On about 10% of the street segments in CPH and 20% in AMS, there is difference of at least 3 μg/m3 between the LUR model and the mixed-effect model. The variance that is lost by the LUR model, compared to data-only map, is brought back by the random components of the mixed model.

Bias

For most comparisons, we found higher NO2 values for the data-only mapping, LUR, and mixed model method compared to all other external measurements and predictions, except the LPDV data. Several studies already reported that mobile monitoring studies create higher output values because these measurements are done in the middle of the road, while all external measurements and predictions are sampled on the side of the road or façade of buildings. In previous studies to UFP (ultrafine particles) and BC (black carbon), we showed that predictions made by models based on mobile monitoring are about 20–30% higher than external home-outdoor stationary measurements.[4,32] For NO2, the number seems to be slightly less, probably because NO2 is slightly less heterogeneously dispersed compared to UFP and BC due to photochemical reactions between NO and ozone-forming NO2, where NO emissions from the road are dispersed to the building façade. Experiments in real-world data also found steeper gradients for UFP and BC compared to NO2.[33−35] In Tables and 3, we show that NO2 predictions made by the mixed model output are about 15–20% higher than the external measurements and predictions. This is also shown in the Bland–Altman plots in the Supporting Information, where a larger bias is observed with higher concentration levels in all comparisons. Also, compared to official monitoring stations in AMS and CPH, the difference is about 15–30% (Figures A3 and B3). The same on-road/off-road ratio was found in a study by Richmond-Bryant et al.[36] in Las Vegas. They found that NO2 concentrations declined from on-road to 10 m from the road by a median of 16% (and 21% for a 20 m distance). This gradient of NO2 concentrations in the vicinity of roads (on-road/off-road ratio) depends on the wind direction and urban topography, making the exact ratio for each road segment individually hard to predict. The most practical solution would be to reduce mobile monitoring output by 20% for all road segments to approach residential exposure. Alternatively, the mixed model predictions could be combined with a dispersion model. Either by using mixed model predictions as line source in a dispersion model or by integrating both models in data fusion steps.

Strengths and Limitations

One of the strengths in this study is the fact that we were able to use external long-term measurements in the same time period as the mobile monitoring to validate our model predictions.[37] Next, we were able to compare our model predictions with model predictions used by official national environmental agencies. Predictions in these models are made with dispersion models, meaning they are constructed very different than our empirical models. Differences between models can therefore not be interpreted as one being better than the other but rather that both models offer different features contributing to exposure estimates. The biggest limitation of the measurement setup used in this work is the amount of time, energy, and significant initial investment it takes to collect such enormous amounts of data. In the study by Apte et al.,[16] they estimated that it would take around 400 mobile monitoring vehicles to create a data-only map (>20 drives) for all streets in the largest 25 US cities, though the number of vehicles could be reduced if data are combined with LUR models. Within a mixed modeling framework this could easily be implemented, though it would need a huge effort in order to sample street segments in a large area (bigger than one or a few cities). A few drives are needed to develop a LUR model, while adding more and more drives increases the accuracy of data-only mapping. Hence, when more and more data are collected, actual measurements could explain more and more local variations. This makes the mixed model approach a very scalable solution to other cities as well. As the mixed model balances the input that is most accurate (data-only or LUR model estimates), there is no minimum number of drive-days to create a stable concentration map. This also means that the mixed model is able to predict concentration levels on street segments without measurements as they could be based on the LUR model output (with the limitations associated with LUR models in regard to smoothing of concentrations). To keep the hyperlocal variation in air pollution maps, measurements on every street in question will always be needed. This could, for example, be achieved by putting measurement devices on municipal utility vehicles. Hasenfratz et al.,[10] for example, collected over 50 million measurements of UFP over a 2 year period using mobile sensor nodes installed on top of public transport vehicles in the city of Zurich, Switzerland. While this effort did not cover the entire city, it contained enough data to develop a LUR model in a short amount of time. Coverage could be further increased when sensors or monitors are installed on utility vehicles that cover large parts of the city (e.g., municipality vehicles and delivery trucks). A mixed model approach will therefore always be at least as good as a LUR model as it takes the LUR model as the baseline and adds additional information based on the measurements.

22 in total

1. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

Authors: Scott Weichenthal; Keith Van Ryswyk; Alon Goldstein; Scott Bagg; Maryam Shekkarizfard; Marianne Hatzopoulou
Journal: Environ Res Date: 2015-12-22 Impact factor: 6.498

2. Characterizing the spatial distribution of ambient ultrafine particles in Toronto, Canada: A land use regression model.

Authors: Scott Weichenthal; Keith Van Ryswyk; Alon Goldstein; Maryam Shekarrizfard; Marianne Hatzopoulou
Journal: Environ Pollut Date: 2015-04-29 Impact factor: 8.071

3. Modelling nationwide spatial variation of ultrafine particles based on mobile monitoring.

Authors: Jules Kerckhoffs; Gerard Hoek; Ulrike Gehring; Roel Vermeulen
Journal: Environ Int Date: 2021-04-15 Impact factor: 9.621

4. Mapping Air Pollution with Google Street View Cars: Efficient Approaches with Mobile Monitoring and Land Use Regression.

Authors: Kyle P Messier; Sarah E Chambliss; Shahzad Gani; Ramon Alvarez; Michael Brauer; Jonathan J Choi; Steven P Hamburg; Jules Kerckhoffs; Brian LaFranchi; Melissa M Lunden; Julian D Marshall; Christopher J Portier; Ananya Roy; Adam A Szpiro; Roel C H Vermeulen; Joshua S Apte
Journal: Environ Sci Technol Date: 2018-10-24 Impact factor: 9.028

5. Performance of Prediction Algorithms for Modeling Outdoor Air Pollution Spatial Surfaces.

Authors: Jules Kerckhoffs; Gerard Hoek; Lützen Portengen; Bert Brunekreef; Roel C H Vermeulen
Journal: Environ Sci Technol Date: 2019-01-18 Impact factor: 9.028

6. Near roadway air pollution across a spatially extensive road and cycling network.

Authors: William Farrell; Scott Weichenthal; Mark Goldberg; Marie-France Valois; Maryam Shekarrizfard; Marianne Hatzopoulou
Journal: Environ Pollut Date: 2016-03-08 Impact factor: 8.071

7. Land Use Regression Models of On-Road Particulate Air Pollution (Particle Number, Black Carbon, PM2.5, Particle Size) Using Mobile Monitoring.

Authors: Steve Hankey; Julian D Marshall
Journal: Environ Sci Technol Date: 2015-07-20 Impact factor: 9.028

8. High-Resolution Air Pollution Mapping with Google Street View Cars: Exploiting Big Data.

Authors: Joshua S Apte; Kyle P Messier; Shahzad Gani; Michael Brauer; Thomas W Kirchstetter; Melissa M Lunden; Julian D Marshall; Christopher J Portier; Roel C H Vermeulen; Steven P Hamburg
Journal: Environ Sci Technol Date: 2017-06-05 Impact factor: 9.028

9. Robustness of intra urban land-use regression models for ultrafine particles and black carbon based on mobile monitoring.

Authors: Jules Kerckhoffs; Gerard Hoek; Jelle Vlaanderen; Erik van Nunen; Kyle Messier; Bert Brunekreef; John Gulliver; Roel Vermeulen
Journal: Environ Res Date: 2017-09-01 Impact factor: 6.498

10. Modeling the intraurban variability of ambient traffic pollution in Toronto, Canada.

Authors: M Jerrett; M A Arain; P Kanaroglou; B Beckerman; D Crouse; N L Gilbert; J R Brook; N Finkelstein; M M Finkelstein
Journal: J Toxicol Environ Health A Date: 2007-02-01

1 in total

1. A Knowledge Transfer Approach to Map Long-Term Concentrations of Hyperlocal Air Pollution from Short-Term Mobile Measurements.

Authors: Zhendong Yuan; Jules Kerckhoffs; Gerard Hoek; Roel Vermeulen
Journal: Environ Sci Technol Date: 2022-09-19 Impact factor: 11.357

1 in total