
Within-City Variation in Ambient Carbon Monoxide Concentrations: Leveraging Low-Cost Monitors in a Spatiotemporal Modeling Framework.

Jianzhao Bi¹, Christopher Zuidema¹, David Clausen², Kipruto Kirwa¹, Michael T Young¹, Amanda J Gassett¹, Edmund Y W Seto¹, Paul D Sampson³, Timothy V Larson⁴, Adam A Szpiro², Lianne Sheppard¹,², Joel D Kaufman¹,⁵,⁶.

Abstract

BACKGROUND: Based on human and animal experimental studies, exposure to ambient carbon monoxide (CO) may be associated with cardiovascular disease outcomes, but epidemiological evidence of this link is limited. The number and distribution of ground-level regulatory agency monitors are insufficient to characterize fine-scale variations in CO concentrations.
OBJECTIVES: To develop a daily, high-resolution ambient CO exposure prediction model at the city scale.
METHODS: We developed a CO prediction model in Baltimore, Maryland, based on a spatiotemporal statistical algorithm with regulatory agency monitoring data and measurements from calibrated low-cost gas monitors. We also evaluated the contribution of three novel parameters to model performance: high-resolution meteorological data, satellite remote sensing data, and copollutant (PM2.5, NO2, and NOx) concentrations.
RESULTS: The CO model had spatial cross-validation (CV) R2 and root-mean-square error (RMSE) of 0.70 and 0.02 parts per million (ppm), respectively; the model had temporal CV R2 and RMSE of 0.61 and 0.04 ppm, respectively. The predictions revealed spatially resolved CO hot spots associated with population, traffic, and other nonroad emission sources (e.g., railroads and airport), as well as sharp concentration decreases within short distances from primary roads.
DISCUSSION: The three novel parameters did not substantially improve model performance, suggesting that, on its own, our spatiotemporal modeling framework based on geographic features was reliable and robust. As low-cost air monitors become increasingly available, this approach to CO concentration modeling can be generalized to resource-restricted environments to facilitate comprehensive epidemiological research. https://doi.org/10.1289/EHP10889.

Year: 2022 PMID: 36169978 PMCID: PMC9518741 DOI: 10.1289/EHP10889

Source DB: PubMed Journal: Environ Health Perspect ISSN: 0091-6765 Impact factor: 11.035

Introduction

Carbon monoxide (CO) is one of the six principal air pollutants regulated by the National Ambient Air Quality Standards (NAAQS). CO also serves as a precursor to ozone (O3), another principal air pollutant.[1,2] Anthropogenic CO is mainly generated from incomplete combustion of carbon fuels from on-road and off-road mobile sources, which were responsible for more than 60% of total anthropogenic emissions in the United States in 2017.[3] Other major sources of CO include wildfires, which have recently played a significant role in CO emissions in the United States[4]; prescribed vegetation burning; residential biomass combustion; and industrial processes.[3] Oxidation of anthropogenic and biogenic hydrocarbons is a secondary source of CO, producing concentrations that peak in summer. The major sink of tropospheric CO is reaction with the hydroxyl (OH) radical. Tropospheric CO has a photochemical lifetime of weeks to months across seasons and locations,[5-7] and it is prone to horizontal and vertical transportation that smooths out its ground-level concentration. Ground-level CO concentrations have traditionally been measured at regulatory air quality stations, e.g., the Air Quality System (AQS) maintained by the U.S. Environmental Protection Agency (U.S. EPA). Low-cost air monitors[8-10] and satellite remote sensing instruments[11-15] are promising novel platforms for larger-scale, higher-resolution spatiotemporal CO measurement. Short-term exposure to CO has been associated with cardiovascular morbidity, with compelling evidence from human-controlled experiments[16] and additional evidence from observational studies relying on temporal exposure variation derived from regulatory monitors.[17-21] However, evidence regarding associations between long- or short-term exposure to ambient CO and central nervous system effects,[22,23] birth outcomes and developmental effects,[24-26] and respiratory morbidity[27,28] and mortality[29] has been more limited. 
Although CO concentrations in most U.S. cities are below the NAAQS [9 parts per million (ppm) for 8 h and 35 ppm for 1 h], there is no known safe threshold for CO exposure.[22] The lack of spatiotemporally high-resolution exposure data is a major impediment to extensive epidemiological analysis of the effects of CO. In comparison with fine particulate matter [particulate matter (PM) with aerodynamic diameter ≤2.5 μm (PM2.5)] and nitrogen dioxide (NO2), spatiotemporal exposure prediction of ambient CO is limited.[22] There were approximately 250 air quality stations in the U.S. EPA AQS that measured CO concentrations as of late 2020, in comparison with almost 1,000 for PM2.5 (https://www.epa.gov/outdoor-air-quality-data). The available AQS stations in major U.S. cities are insufficient to characterize within-city variation in ambient CO concentrations.[22] Exposure assessment for epidemiological studies that rely on these sparse central stations would suffer from nontrivial Berkson-type errors.[30] Low-cost air monitors, which can be flexibly and widely deployed at a substantially lower cost than regulatory stations, are a promising supplement for CO exposure assessment. If rigorously calibrated, low-cost CO monitors can perform reliably and accurately.[31,32] Tropospheric CO column density data retrieved from modern satellite remote sensing instruments can serve as another promising input for CO exposure assessment.[14,33] However, whether such supplementary measurement data can improve the performance of high-resolution CO concentration prediction remains unexplored. In this study, we aimed to develop a daily, high-resolution ambient CO prediction model at the city level. We tested the model in Baltimore, Maryland, for the period between April 2017 and May 2019. The prediction model was based on CO measurements from both regulatory agency stations and spatially dense, calibrated low-cost monitors within and around Baltimore. 
We also evaluated the contribution of high-resolution meteorological data, satellite remote sensing data, and copollutant [PM2.5, NO2, and nitrogen oxides (NOx)] concentrations to model performance. To our knowledge, this work is the first attempt to generate spatiotemporally high-resolution CO concentration data to characterize variation within a metropolitan area in the United States.

Data and Methods

Data were collected as part of the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) study that aims to investigate the relationship between air pollution exposure and the risk of cardiovascular diseases.[34,35] For this project, data collection was limited to a 2-y period in the greater Baltimore area, one of the six MESA Air study areas.

Ambient CO Measurements

Daily CO measurements were obtained from the U.S. EPA AQS regulatory network (https://www.epa.gov/aqs) between 1 April 2017 and 31 May 2019. We defined our study domain to be the region covering all CO monitors in the greater Baltimore area; we further defined a smaller prediction domain covering the MESA Air participant residences for CO concentration prediction (Figure 1). Within our study domain, there were three AQS stations (IDs 240053001, 240270006, and 240330030) with 2,240 daily maximum 1-h CO measurements. The AQS CO monitors (Teledyne API M300EU; Teledyne Instruments Inc.) operated on the principle of nondispersive infrared (NDIR) detection with the gas filter correlation (GFC) methodology according to the Federal Reference Methods.[22] These monitors had a limit of detection (LOD) of . Although two of the three AQS stations were outside the prediction domain (Figure 1), they played a fundamental role in estimating long-term temporal trends of CO concentrations due to their temporal continuity.

Figure 1.

Study domain with locations of regulatory AQS stations and LCMs for ambient CO measurement. The dashed line shows a subdomain () within which daily level ambient CO concentration predictions were made (at both and resolutions). The solid line shows the boundary of Baltimore City (). LCMs continuously co-located with AQS stations for calibration purposes are not shown because their measurements were not included in CO modeling. © Stamen Design, under a Creative Commons Attribution (CC BY 3.0) license. Note: AQS, Air Quality System; CO, carbon monoxide; LCM, low-cost gas monitor.

We conducted a supplementary monitoring campaign at selected MESA Air participant residences using low-cost gas monitors (LCMs) with Alphasense B4 series 4-electrode CO sensors (Alphasense Ltd.) designed and constructed in a custom research application configuration fabricated at the University of Washington. The LCMs were also equipped with air temperature and relative humidity sensors (HumidIcon HIH6130-021-001; Honeywell International Inc.) as well as supporting hardware (e.g., a thermostatically controlled heater, fan, memory card, modem, and microcontroller). Additional information about the LCM device configuration is provided elsewhere.[36] We recruited MESA Air participants who were willing to participate in the monitoring campaign and who were qualitatively chosen to represent different areas of the city, with some closer to and some farther from major roads, in higher and lower population densities, and at higher and lower elevations.[36] The LCMs recorded and reported data every 5 min. For days when at least 75% of LCM measurements were available, we aggregated the measurements to the daily level. There were initially 29 LCM locations. For monitor calibration, one LCM was continuously co-located with each of the two AQS CO stations (IDs 240053001 and 240270006), and all other LCMs were periodically co-located with these stations. 
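The 75% completeness rule for daily aggregation described above can be sketched as follows. This is a minimal illustration, not the study's processing code: the function name, the input layout, and the use of a daily mean as the aggregate are assumptions, since the text does not state which daily statistic was computed for the LCMs.

```python
from collections import defaultdict
from datetime import datetime, timedelta

READINGS_PER_DAY = 24 * 60 // 5   # 288 five-minute slots in a day
MIN_COMPLETENESS = 0.75           # from the text: at least 75% of readings present

def daily_means(readings):
    """Aggregate 5-min readings to daily values (hypothetical helper).

    readings: iterable of (datetime, ppm) pairs; ppm is None when missing.
    Returns {date: mean ppm} only for days that meet the completeness rule.
    The daily mean is an assumed aggregate, chosen for illustration.
    """
    by_day = defaultdict(list)
    for timestamp, value in readings:
        if value is not None:
            by_day[timestamp.date()].append(value)
    needed = READINGS_PER_DAY * MIN_COMPLETENESS  # 216 readings
    return {day: sum(vals) / len(vals)
            for day, vals in by_day.items() if len(vals) >= needed}
```

Under this sketch, a day with fewer than 216 valid 5-min readings is dropped rather than averaged over a partial record.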
We have previously described the LCM calibration.[36] In brief, the calibration was based on a stepwise multiple linear regression model with readings from the sensor’s working and auxiliary electrodes, temperature, relative humidity, and interactions between working electrode readings and temperature and between working electrode readings and relative humidity. The AQS measurements were used on the dependent side of the calibration model. We excluded from CO concentration modeling the LCM measurements at the two co-location stations because the data were used for calibration only, as well as another LCM location in Washington, D.C., because it was distant from our study domain. The remaining 26 LCM locations provided 3,308 daily measurements for concentration modeling (Figure 1; Figure S1). Of the 26 LCM locations, 21 were participant residences monitored during two short-term seasonal sessions: summer 2017 (22 July to 1 September) and winter 2018 (24 January to 28 February). We selected these monitoring periods to reflect CO concentration variations in both warm and cold seasons. The other five locations were monitored for longer periods of up to 2 y (Figure S1). The selection of these long-term monitoring locations was also to represent different areas of the city. The CO measurements were approximately normally distributed and were modeled on the natural scale.
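To make the structure of the calibration model concrete, the sketch below builds the regressors named above (working and auxiliary electrode readings, temperature, relative humidity, and the two working-electrode interactions) and fits them by ordinary least squares against reference concentrations. It is an illustration under assumptions: the stepwise term selection used in the study is not reproduced, and all function and argument names are hypothetical.

```python
def design_row(we, ae, temp, rh):
    """Intercept, main effects, and the two interactions named in the text."""
    return [1.0, we, ae, temp, rh, we * temp, we * rh]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coefs, row):
    """Calibrated concentration for one design row."""
    return sum(c * v for c, v in zip(coefs, row))
```

In use, `rows` would hold one `design_row(...)` per co-located LCM record and `y` the matching AQS concentrations; the fitted coefficients are then applied to field LCM readings with `predict`.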

Geographic Covariates

The initial set of geographic covariates consisted of proximity and buffer parameters.[37,38] The proximity parameters included distance to land-use features, e.g., major roads, intersections, truck routes, railroads, rail yards, coastlines, airports, and ports. The buffer variables included groups of land-use features within defined buffer sizes, e.g., major road length, truck route length, land-use category, long-term vegetation index, population density, and air pollution emission sources including local incinerators. Geographic covariates were derived using PostGIS (version 2.4.4; http://postgis.net). The geographic covariates were preprocessed according to criteria developed for the spatiotemporal models for other air pollutants (, , , and ) in MESA Air.[39,40] These criteria aimed to exclude less-informative covariates (e.g., those with minimal variability), including where: a) of monitoring locations had the same value, b) of values were more than five standard deviations (SDs) away from the mean, c) the SD of the distribution of values at participant residences was more than five times the SD of the distribution of values at monitoring locations, and d) the maximum value of a percentage covariate was among all monitoring locations. After applying these criteria, 180 geographic covariates remained for CO concentration modeling (Table S1).

Meteorological Data

We obtained meteorological variables from the High-Resolution Rapid Refresh (HRRR) data set (https://rapidrefresh.noaa.gov/hrrr/). HRRR is a real-time, 3-km resolution, hourly updated atmospheric model developed and maintained by the National Oceanic and Atmospheric Administration (NOAA).[41] We selected surface-level meteorological data and averaged them from hourly to daily levels. The surface-level variables are listed in Table S2. We then adopted a forward selection strategy to determine the final set of variables to be included in the model. The selection process started with no variables, then tested the addition of each variable based on cross-validation (CV) performance (details of the CV are provided in the “Model evaluation” section), added the variable whose inclusion gave the largest improvement in CV performance, and repeated the process until no further improvement in CV performance was observed. Two HRRR variables were selected: the U-wind speed component at above ground (in meters per second) and the downward long-wave radiation flux at the surface (in watts per square meter). We used ordinary kriging interpolation to calculate the values of the meteorological variables at our modeling and prediction locations based on the R package “automap” (version 1.0-14).
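The forward selection procedure described above can be sketched as a greedy loop. This is a generic illustration, not the study's code: `cv_score` stands in for whatever CV performance metric is used (higher is better), and the function and argument names are hypothetical.

```python
def forward_select(candidates, cv_score):
    """Greedy forward selection driven by a cross-validation score.

    candidates: iterable of variable names.
    cv_score: callable taking a tuple of selected names and returning a
              CV performance score, where higher is better (assumed).
    Returns (selected names in order of addition, final score).
    """
    selected = []
    remaining = list(candidates)
    best = cv_score(tuple(selected))  # score of the model with no variables
    while remaining:
        # Score every one-variable extension of the current model.
        scored = [(cv_score(tuple(selected + [name])), name) for name in remaining]
        top_score, top_name = max(scored)
        if top_score <= best:  # stop when no candidate improves CV
            break
        selected.append(top_name)
        remaining.remove(top_name)
        best = top_score
    return selected, best
```

Each pass refits the model once per remaining candidate, so the cost grows with the number of candidates times the number of variables finally kept.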

Satellite-Derived CO Column Density Data

The TROPOspheric Monitoring Instrument (TROPOMI) spectrometer onboard the Copernicus Sentinel-5 Precursor (S5P) satellite provides retrievals of CO total column densities (in moles per square meter) over land with a spatial resolution of at nadir.[42] Based on the first overtone 2-0 absorption band of CO between 2,305 nm and , the retrievals are sensitive to the tropospheric boundary layer under clear sky conditions.[14] TROPOMI CO retrievals have been available since mid-2018, which did not fully overlap with our study period. Thus, instead of using the daily level data, we calculated annual mean CO total column densities and used them as additional (temporally invariant) geographic covariates. We generated the annual mean CO data in 2019 and 2020 at a spatial resolution of based on the Level-3 CO data from the Google Earth Engine (GEE; https://developers.google.com/earth-engine/), which were calculated from the S5P Level-2 offline data. The annual mean data were interpolated by ordinary kriging at our modeling and prediction locations.

Copollutant (PM2.5, NO2, and NOx) Prediction Data

We generated daily high-resolution PM2.5, NO2, and NOx annual mean prediction data in Baltimore from 2015 to 2017 based on the same spatiotemporal modeling framework used for our CO prediction (described in the “Spatiotemporal modeling framework” section). We treated these annual mean predictions as additional geographic covariates. The predictions were based on regulatory and measurements from the AQS stations as well as supplementary measurements from the MESA Air monitoring campaigns in Baltimore. Details of the modeling and prediction of PM2.5, NO2, and NOx are provided elsewhere.[39]

CO Concentration Prediction Model

Spatiotemporal modeling framework.

We used a spatiotemporal (ST) modeling framework developed previously to accommodate air pollution exposure prediction for MESA Air.[43-45] The framework has a hierarchical structure that can integrate unbalanced measurement data, e.g., long-term measurements at a few continuous monitoring locations and a large number of measurements at a few time points from spatially dense short-term monitors. The long-term measurements mainly support the derivation of smoothed temporal trends (referred to as time basis functions) of the pollution. The spatially dense short-term measurements help fit the spatial coefficients that allow the linear combinations of the smoothed temporal trends to vary spatially. The framework can be expressed as:
$$C(s,t) = \mu(s,t) + \nu(s,t) \quad (1)$$
where $C(s,t)$ denotes the daily CO concentration at location $s$ and time $t$, $\mu(s,t)$ denotes the spatiotemporal mean surface, and $\nu(s,t)$ denotes the spatiotemporal residual variation. The spatiotemporal mean surface can be broken down as:
$$\mu(s,t) = \sum_{l=1}^{L} \gamma_l M_l(s,t) + \beta_0(s) + \sum_{i=1}^{m} \beta_i(s) f_i(t) \quad (2)$$
where $M_l(s,t)$ denotes the spatiotemporal covariates (i.e., meteorological parameters in this study) with coefficients $\gamma_l$, $\beta_0(s)$ denotes the long-term mean (intercept) at location $s$, and $f_i(t)$ denotes smoothed temporal trends (i.e., time basis functions) derived by singular value decomposition (SVD).[44,46] $L$ is the number of spatiotemporal covariates. $\beta_i(s)$ denotes spatially varying coefficients for the temporal trend based on land-use regression (LUR):
$$\beta_i(s) \sim N\left(X_i(s)\,\alpha_i,\ \Sigma_{\beta_i}(\phi_i, \sigma_i^2, \tau_i^2)\right), \quad i = 0, 1, \dots, m \quad (3)$$
where $X_i(s)$ denotes reduced-dimension summaries of geographic covariates at location $s$, and $\alpha_i$ denotes LUR coefficients to be estimated. $\Sigma_{\beta_i}$, the covariance structure for $\beta_i(s)$, is a spatial smoothing model with covariance functions parameterized by a range $\phi_i$, partial sill $\sigma_i^2$, and nugget $\tau_i^2$. $m$ is the number of SVD-derived smoothed temporal trends. Finally, $\nu(s,t)$, the spatiotemporal residual, is assumed to have a mean of zero and a spatial correlation structure that is independent in time:
$$\nu(s,t) \sim N\left(0,\ \Sigma_{\nu}(\phi_\nu, \sigma_\nu^2, \tau_\nu^2)\right) \quad (4)$$
Previous modeling research using low-cost air monitors has underlined the importance of reducing the negative influence of residual post-calibration uncertainty of low-cost measurements (mostly due to random measurement errors) on exposure prediction models.[47,48] Our previous modeling study using PurpleAir low-cost PM monitors showed that the negative influence can be mitigated well by setting different nuggets in the $\nu$ field for AQS and LCM measurements to account for their different uncertainty levels.[49] Hence, we adopted the same strategy for the AQS and LCM measurements in this study. We also allowed the $\nu$ field to capture additional deviations from the smoothed temporal trends by using a random-effect term. This random-effect term was critical for this analysis because daily level LCM measurements were expected to have larger random errors than longer-term measurements.[39,50]
To improve our ST model’s ability to capture pollution variability at different temporal scales, we temporarily removed the smoothed mean temporal trend of CO concentrations during the model fitting stage and then added the trend back during the model prediction and validation stages. The mean temporal trend was calculated by averaging all CO measurements at each time point and was then smoothed by locally estimated scatterplot smoothing (LOESS) regression with a smoothing span of 0.5. This approach allowed the model to better account for the long-term (e.g., seasonal) regional temporal pattern, and the model relied on the flexible spatiotemporal structure to resolve spatially varying deviations at shorter temporal scales (e.g., monthly, weekly, and daily) from the long-term temporal pattern. This modeling approach is air pollutant agnostic and thus can be applied to pollutants with various temporal patterns (e.g., and ). 
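The trend removal and restoration step can be illustrated with a small sketch. The smoother below is a simplified tricube-weighted local linear regression written for illustration only; it is a stand-in for LOESS and is not equivalent to the implementation used in the study. The data layout and all function names are assumptions.

```python
def local_linear_smooth(x, y, span=0.5):
    """Tricube-weighted local linear fit evaluated at each x (simplified LOESS stand-in)."""
    n = len(x)
    k = min(n, max(2, int(round(span * n))))  # neighbours used per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def remove_trend(obs, span=0.5):
    """obs: {day_index: {monitor_id: ppm}}. Returns (trend by day, detrended obs)."""
    days = sorted(obs)
    # Average all available monitors at each time point, then smooth.
    daily_mean = [sum(obs[d].values()) / len(obs[d]) for d in days]
    trend = dict(zip(days, local_linear_smooth(days, daily_mean, span)))
    resid = {d: {m: v - trend[d] for m, v in obs[d].items()} for d in days}
    return trend, resid

def add_trend(values, trend):
    """Add the smoothed trend back to detrended predictions."""
    return {d: {m: v + trend[d] for m, v in values[d].items()} for d in values}
```

The model would be fitted to the output of `remove_trend`, and `add_trend` applied to its predictions before validation.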
Our ST modeling framework can provide CO concentration predictions at any given location; hence, there is not a restriction on spatial resolution. In this study, we reported CO predictions at a resolution for spatial mapping and used predictions at a resolution for auxiliary analyses: a) comparison between CO concentrations at grid cells and MESA Air participant residences and b) CO concentration changes within different distances from primary roads. Figure 1 shows the prediction domain where the and grid cells are located. The ST modeling and prediction were performed using the R package “SpatioTemporal” (version 1.1.7).
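As a reading aid for the mean surface of the framework above, the sketch below evaluates it at one location and one day: the location-specific long-term mean, plus a location-specific linear combination of the time basis functions, plus any spatiotemporal covariate terms. The function name and argument layout are assumptions, and the numbers in the usage note are made up.

```python
def mean_surface(beta, basis, gamma=(), covariates=()):
    """Evaluate the spatiotemporal mean at one location and one day.

    beta: [intercept, coefficient of basis 1, ..., coefficient of basis m]
          for the location (hypothetical layout).
    basis: values of the m time basis functions on that day.
    gamma, covariates: optional coefficients and values of spatiotemporal
          covariates (empty for a model without them).
    """
    trend_part = beta[0] + sum(b * f for b, f in zip(beta[1:], basis))
    covariate_part = sum(g * m for g, m in zip(gamma, covariates))
    return trend_part + covariate_part
```

For example, with made-up inputs `mean_surface([0.25, 0.05, -0.02], [1.0, 0.5])` combines an intercept of 0.25 with two basis functions.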

Dimension reduction of geographic covariates.

Reduced-dimension summaries of geographic covariates [i.e., the LUR covariate terms in Equation 3] have typically been derived by partial least squares (PLS)[39] or principal component analysis (PCA)[35] based on all available covariates. However, in our case, neither PLS nor PCA provided a satisfactory set of reduced-dimension summaries that could result in acceptable CO model performance [i.e., generated CO predictions deviating from ground measurements at AQS locations with CV coefficient of determination (R2) values ]. A possible reason is that many geographic covariates could not explain the spatial distribution of CO concentrations within our study domain well. Therefore, we preselected geographic covariates prior to dimension reduction. The covariate preselection was based on the least absolute shrinkage and selection operator (LASSO) regression with long-term CO concentration averages at monitoring locations over the two seasonal periods (treated as a single timeframe) in summer 2017 and winter 2018 (described in the “Ambient CO Measurements” section) as the dependent variable and all 180 geographic covariates as the independent variables.[37] The two seasonal periods were used to ensure that the long-term CO concentration averages were calculated when all monitoring locations had available measurements. The LASSO regression shrank the coefficient estimates of the independent variables toward zero and set the coefficients of less important variables exactly to zero. We preselected the covariates with nonzero coefficients. A key parameter in LASSO is the shrinkage penalty (λ), which was determined by CV in this study. The LASSO regression returned 20 geographic covariates with nonzero coefficients (Table S3). We then calculated reduced-dimension summaries of the preselected 20 geographic covariates using PCA. 
The first two or three principal components (PCs) of the covariates have usually been used for ST modeling.[39,43] However, we observed that the seventh PC (PC7) and 12th PC (PC12) also had substantial contributions to model performance (Figure S2). Figure S3 shows the geographic covariates with the largest contributions to both PC7 and PC12. We opted to use PC7 and PC12 as additional PCs in our ST model. We did not consider including other PCs beyond those selected to avoid introducing too many LUR coefficients and associated overfitting.
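The LASSO preselection step (keep only the covariates whose coefficients remain nonzero) can be sketched with a basic coordinate descent solver. This is an illustration under assumptions, not the study's implementation: the penalty is passed in directly rather than chosen by CV, the inputs are assumed to be standardized covariates and a centered response, and all names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    X: list of rows with standardized covariates; y: centered response.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual for the current coefficients
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def preselect(X, y, lam):
    """Indices of covariates with nonzero LASSO coefficients."""
    return [j for j, b in enumerate(lasso_cd(X, y, lam)) if b != 0.0]
```

The indices returned by `preselect` would then be the columns passed on to PCA.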

Model evaluation.

We examined different values for key ST model parameters: a) the number of smoothed temporal trends (one, two, or three), b) the number of leading PCs used in addition to PC7 and PC12 per temporal trend (one, two, or three), and c) the covariance structure of the spatially varying coefficient fields (spatial smoothing or no spatial smoothing). The optimal values of these parameters were determined by leave-one-monitor-out (LOMO) spatial and temporal CVs. The LOMO spatial CV treated one monitor as the test set and all other monitors as the training set to build a model and repeated the process for each monitor. The spatial CV reflects the model’s ability to capture long-term spatial variability of CO. The validation concentrations for the spatial CV were CO concentration averages over the two seasonal periods (described in the “Ambient CO Measurements” section) during which all 29 monitors had available measurements. The temporal CV summarized the median CV performance metrics of individual monitors at the daily level, reflecting the model’s ability to capture daily level temporal variability of CO. We assessed the CV performance using two metrics, root-mean-square error (RMSE) and MSE-based coefficient of determination (R2).[38,39,50] The MSE-based R2 measures fit to the 1:1 line and was used to report the prediction accuracy of a model because it reflects both systematic bias and variation in the predictions. The optimal values of the model parameters were determined to be three smoothed temporal trends, two PCs (PC1 and PC2) per temporal trend in addition to PC7 and PC12, and spatially smoothed coefficient fields based on the exponential variogram model. We evaluated the added value of a) spatiotemporal HRRR meteorological data, b) annual mean TROPOMI CO column densities, and c) annual mean PM2.5, NO2, and NOx predictions to our CO prediction by building and contrasting models with and without these variables. 
The HRRR variables were used as spatiotemporal covariates [the spatiotemporal covariate terms in Equation 2], and the other two were directly incorporated as additional LUR variables [the LUR covariate terms in Equation 3], in addition to those obtained through dimension reduction. The baseline model, used as the reference, did not contain HRRR, TROPOMI, or copollutant variables. We adopted the aforementioned spatial and temporal CVs to evaluate the contribution of these variables to model performance. We conducted auxiliary analyses to further evaluate our CO concentration predictions. The first analysis compared the domain-wide CO concentration distribution (at prediction locations as a representative sample; number of ) to that at the locations of MESA Air participant residences within our study domain (number of ; Figure S4 shows the jittered locations). This analysis aimed to assess whether the participant residences of a real epidemiological cohort had different CO concentration levels than the domain average. The second analysis examined CO concentration changes within short distances from primary (A1) roads (defined as interstate highways, U.S. numbered highways, business routes, and some state routes). This analysis aimed to validate our model’s ability to reflect a pattern observed in a previous study: a sharp decrease in CO measurements within hundreds of meters from highways.[51]
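The evaluation metrics and the LOMO loop described above can be sketched as follows. This is a minimal illustration under assumptions: the MSE-based R2 is written here as one minus the ratio of the mean squared error to the variance of the observations, and `fit` and `predict` are placeholders for any model, not the study's ST model.

```python
from math import sqrt

def rmse(obs, pred):
    """Root-mean-square error between observations and predictions."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mse_r2(obs, pred):
    """MSE-based R2 (assumed form): 1 - MSE / Var(obs).

    Unlike a squared correlation, this measures agreement with the 1:1
    line, so a constant bias lowers it and it can be negative.
    """
    n = len(obs)
    mean_obs = sum(obs) / n
    var_obs = sum((o - mean_obs) ** 2 for o in obs) / n
    mse = sum((o - p) ** 2 for o, p in zip(obs, pred)) / n
    return 1.0 - mse / var_obs

def lomo_cv(data, fit, predict):
    """Leave-one-monitor-out CV.

    data: {monitor_id: (features, observed long-term mean in ppm)}.
    fit: callable building a model from a training dict (placeholder).
    predict: callable(model, features) returning a prediction (placeholder).
    """
    obs, pred = [], []
    for held_out in data:
        train = {m: v for m, v in data.items() if m != held_out}
        model = fit(train)
        obs.append(data[held_out][1])
        pred.append(predict(model, data[held_out][0]))
    return mse_r2(obs, pred), rmse(obs, pred)
```

With made-up values, predictions that track the observations perfectly but sit 0.1 ppm too high would have a squared correlation of 1 yet a negative MSE-based R2, which is why the latter reflects systematic bias.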

Results

CO Concentration Measurements

The mean daily CO concentration measurements over the modeling period (April 2017 to May 2019) were 0.22 ppm and 0.30 ppm at the AQS and long-term LCM locations, respectively; the short-term LCM locations had a mean daily concentration of 0.28 ppm during the seasonal LCM deployment periods (Table 1). As presented in our previous calibration work,[36] the LCM calibration model had a CV R2 of 0.76 and RMSE of for CO. Underestimation of peak CO concentrations by LCMs was of less concern in this study because the CO concentration levels within our study domain were consistently much lower than the monitor’s maximum detection limit. Table S4 shows the fitted coefficients of the calibration model. Figure S5 compares the calibrated LCM measurements and AQS observations at the co-location stations. Figure S6 shows the temporal variations of CO measurements from both the AQS stations and (calibrated) LCMs over the modeling period. The AQS stations and LCMs captured a similar temporal variation of CO concentrations, with concentrations higher in winter. The LCM measurements were generally higher than the AQS measurements, possibly because the LCMs were closer to pollution sources.

Table 1

Summary of daily level ambient carbon monoxide measurements (in parts per million) from AQS stations and LCMs from April 2017 to May 2019. LCMs were deployed in a supplementary monitoring campaign at selected Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) participant residences.

Monitor type		Number of monitoring locations	Number of measurements	Start date	End date	Mean (SD) in ppm	Minimum, maximum in ppm
AQS		3	2,240	1 April 2017	31 May 2019	0.22 (0.10)	0.01, 0.81
LCM	Long-term	5	2,294	6 April 2017	13 May 2019	0.30 (0.09)	0.07, 1.11
LCM	Short-term	21	1,014	22 July 2017; 24 January 2018	1 September 2017; 28 February 2018	0.28 (0.07)	0.10, 0.70

Note: AQS, Air Quality System; LCM, low-cost gas monitor; ppm, parts per million; SD, standard deviation.

Spatiotemporal Model Performance

Table 2 shows the LOMO spatial and median monitor-specific temporal CV performance of four models: a) with only geographic covariates (the baseline model), b) with geographic covariates and HRRR meteorological variables, c) with geographic covariates and TROPOMI CO retrievals, and d) with geographic covariates and concentrations of three copollutants (PM2.5, NO2, and NOx). Figure S7 shows the spatial CV scatterplots of the four models. Figure S8 summarizes the monitor-specific temporal CV R2 and RMSE of the models. The baseline model had spatial CV R2 and RMSE of 0.70 and 0.02 ppm, respectively; the model had median temporal CV R2 and RMSE of 0.61 and 0.04 ppm, respectively. Incorporating the meteorological and remote sensing variables improved temporal CV performance only marginally, with R2 increased by only 0.02–0.03. The incorporation of copollutant concentrations decreased both the spatial and temporal CV performance. Therefore, we considered the baseline model without meteorological, remote sensing, and copollutant inputs as our final model, on which the following CO prediction results are based.

Table 2

LOMO spatial and median monitor-specific temporal CV performance of CO models built with different covariates: the models with a) only geographic covariates (baseline), b) geographic covariates and HRRR meteorological variables (meteorology), c) geographic covariates and TROPOspheric Monitoring Instrument (TROPOMI) CO retrievals (satellite), and d) geographic covariates and copollutant (PM2.5, NO2, and NOx) predictions (copollution). CV performance was evaluated based on coefficient of determination (R2) and RMSE (in parts per million).

Model type	Number of time bases	Principal components	Spatial CV R2	Spatial CV RMSE (ppm)	Temporal CV R2	Temporal CV RMSE (ppm)
Baseline	3	First principal component (PC1), PC2, PC7, and PC12	0.70	0.02	0.61	0.04
Meteorology			0.70	0.02	0.63	0.04
Satellite			0.64	0.03	0.64	0.04
Copollution			0.39	0.03	0.57	0.04

Note: CO, carbon monoxide; CV, cross-validation; HRRR, High-Resolution Rapid Refresh; LOMO, leave-one-monitor-out; PC, principal component; RMSE, root-mean-square error.

Figures S9 and S10 show the three SVD-derived smoothed temporal trends (time basis functions) and the fitted temporal variations (time-series) of eight long-term monitors (three AQS monitors and five LCMs), respectively. The comparison between the time basis functions and fitted time series indicates that the first time basis captured the major pattern of domain-wide short-term temporal variations of CO, and the second and third bases reflected some longer-term temporal variations.

CO Concentration Predictions

Figure 2 shows the annual mean CO predictions in 2018 at a spatial resolution. Urban areas had elevated CO concentrations with hot spots in downtown areas (associated with population and traffic), along railroads, and around the largest airport (the Baltimore/Washington International Thurgood Marshall Airport). Geographic covariates associated with elevated CO concentrations included population density, developed high intensity areas (with a high proportion of impervious surfaces), distance to railroads, and distance to large airports (Figure S11). On the other hand, our CO predictions also captured areas of expected lower concentrations, such as parks and greenspace, in downtown areas. Background CO concentrations in our study domain (i.e., in regions with values of the evergreen forest land parameter its 95th percentile) were below ; the hot spots had concentrations greater than .

Figure 2.

Top: annual mean CO concentration predictions in parts per million generated by the baseline model in 2018 at a spatial resolution with major roads, railroads, and the location of the Baltimore/Washington International Thurgood Marshall Airport. Bottom: CO concentration prediction details around the city center. © Stamen Design, under a Creative Commons Attribution (CC BY 3.0) license. Note: CO, carbon monoxide; ppm, parts per million.

As a comparison, Figure S12 shows the mean CO distributions retrieved from TROPOMI in 2019 and 2020. Satellite-retrieved distributions had distinct spatial patterns from our CO predictions; the retrievals in 2019 had a Pearson correlation coefficient of 0.43 with our predictions in 2018. The satellite instrument retrieved CO concentration levels over the entire atmospheric column that were less correlated with ground-level concentrations. Hence, integrating TROPOMI CO data had a minimal contribution to the model improvement for the temporal validation and decreased performance for the spatial validation (Table 2). Additionally, Figure S13 shows the mean concentration distributions of PM2.5, NO2, and NOx in 2017. These copollutants had distinct spatial patterns in comparison with those of CO, which is possibly why their predictions had a negative contribution to CO model improvement. Figure 3A summarizes domain-average, daily level CO concentrations in different seasons calculated from the CO predictions at the prediction locations; Figure 3B shows the domain-average, daily level CO concentration time series in 2018. CO had the highest concentrations in winter (December–January–February). Spring (March–April–May) and summer (June–July–August) showed consistently lower concentrations, generally below . The concentrations in autumn (September–October–November) were relatively low but with larger variations than those in spring and summer. There were more frequent high-level concentrations () beginning in November and continuing throughout winter.

Figure 3.

(A) Summary of domain-average, daily level CO concentration predictions from the baseline model in different seasons (DJF: December–January–February, MAM: March–April–May, JJA: June–July–August, and SON: September–October–November) of 2018 at the prediction locations (boxplots based on the five-number summary: the minimum, maximum, sample median, and first and third quartiles). (B) Domain-average, daily level CO prediction time series in 2018 (black curve: smoothed conditional means). Note: CO, carbon monoxide; ppm, parts per million.

Auxiliary Analyses with CO Predictions

Figure 4 compares the histograms of annual mean CO concentration predictions at the prediction locations with those at MESA Air participant residences within our study domain in 2018. In comparison with domain-wide CO concentrations, the residential locations had a slightly higher mean concentration ( vs. ) and smaller concentration range [an interquartile range (IQR) of vs. ]. The concentration differences between the two types of locations were more clearly shown in the histograms than by the summary statistics. A Mann-Whitney U-test indicates the differences were statistically significant at an alpha level of 0.05 (). In 2018, mean daily CO predictions at the prediction locations were in winter, spring, summer, and autumn, respectively. In comparison, mean daily predictions at MESA Air participant residences were in the four seasons, respectively, which were slightly higher than those at the prediction locations. The two types of locations had similar SDs of CO predictions: approximately in the four seasons, respectively.
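
The comparison above (IQR of each group plus a Mann-Whitney U-test) can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the two samples are synthetic placeholders, and the p-value uses the large-sample normal approximation with a tie correction and no continuity correction, which may differ from the exact procedure the authors used.

```python
import math
from statistics import quantiles


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic for x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, p


def iqr(values):
    """Interquartile range (inclusive quartile convention, an assumption)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Synthetic placeholder samples (NOT study data).
domain = [0.20 + 0.002 * i for i in range(100)]      # wider spread
residences = [0.30 + 0.001 * i for i in range(60)]   # higher, narrower

u_stat, p_value = mann_whitney_u(residences, domain)
print(f"IQR domain={iqr(domain):.3f}, residences={iqr(residences):.3f}")
print(f"U={u_stat:.1f}, significant at alpha=0.05: {p_value < 0.05}")
```

In practice a maintained implementation such as `scipy.stats.mannwhitneyu` would usually be preferable; the hand-written version is shown only to keep the example self-contained.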

Figure 4.

Top: distribution of annual mean CO concentration predictions in parts per million from the baseline model at the prediction locations with predictions at the AQS monitoring locations (blue points) in 2018. Bottom: distribution of annual mean CO concentration predictions at the MESA Air participant residences with predictions at the low-cost monitoring locations (red points) in 2018. Note: AQS, Air Quality System; CO, carbon monoxide; IQR, interquartile range; MESA Air, Multi-Ethnic Study of Atherosclerosis and Air Pollution; N, number of predictions; ppm, parts per million; SD, standard deviation.

Figure S14A illustrates the CO concentration decrease within short distances from primary roads (based on the annual mean concentration predictions at the prediction locations in 2018). Within of primary roads, CO concentrations were elevated, with the majority of them above . In comparison with locations or more from primary roads, the elevated concentrations within accounted for of the SD of CO concentrations. The observed concentration decrease from primary roads is considered to be conservative due to the heterogeneity in downwind directions and the spatial variability of concentration decreases. As an example, Figure S15, a bivariate polar plot, indicates that the CO predictions were heterogeneously associated with wind speeds and directions.

Additionally, Figure S14B shows the distances from CO monitoring locations to primary roads. The monitoring locations had a median distance to primary roads of . More LCMs than AQS stations were closer to primary roads, suggesting that LCMs may be more useful for assessing fine-scale distribution of CO concentrations associated with traffic.
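
The near-road comparison above amounts to averaging annual mean predictions within bands of distance to the nearest primary road. A minimal sketch follows, with loud caveats: the bin edges, the exponential decay used to generate the data, and every number in it are illustrative assumptions, not values from the study, and `mean_by_distance_bin` is a hypothetical helper.

```python
import math
from statistics import mean


def mean_by_distance_bin(points, bin_edges):
    """Average concentration within consecutive distance bins.

    points: iterable of (distance_m, concentration) pairs.
    bin_edges: ascending edges in meters; bin k covers
               [bin_edges[k], bin_edges[k + 1]).
    Returns a list of (lower, upper, n, mean or None if the bin is empty).
    """
    points = list(points)
    results = []
    for lower, upper in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [c for d, c in points if lower <= d < upper]
        results.append((lower, upper, len(in_bin), mean(in_bin) if in_bin else None))
    return results


# Synthetic placeholder data (NOT study data): a background level plus a
# near-road excess that decays exponentially with distance.
BACKGROUND, EXCESS, DECAY_M = 0.25, 0.10, 150.0
points = [
    (float(d), BACKGROUND + EXCESS * math.exp(-d / DECAY_M))
    for d in range(0, 1000, 10)
]

bins = mean_by_distance_bin(points, [0, 100, 300, 1000])
for lower, upper, n, avg in bins:
    print(f"{lower:>4}-{upper:<4} m  n={n:<3} mean={avg:.3f}")
```

Because such a binning pools all wind directions, it shares the limitation noted in the text: the averaged gradient is flatter than the gradient along any single downwind direction.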

Discussion

We built a daily, high-resolution ambient CO prediction model based on regulatory measurements at agency air monitoring stations and LCM measurements at residences in Baltimore for the period between April 2017 and May 2019. We also examined the added value of incorporating novel multiplatform data (high-resolution meteorological data, satellite remote sensing retrievals, and copollutant predictions) into the prediction model. We showed that densely deployed LCMs enabled reasonable characterization of CO concentration distribution, which would be impossible when relying solely on spatially sparse agency measurements that missed important aspects of the distribution. To the best of our knowledge, this is the first high-resolution CO model at the city level in the United States, and it demonstrated good spatial and temporal predictive performance that can facilitate epidemiological research. Our model predictions also demonstrated a tangible improvement in terms of spatial resolution when compared to satellite products. The CO concentrations observed were low when compared to current regulatory standards, but our model demonstrated substantial CO concentration variations that can be leveraged in future assessments of health impacts.

Our CO predictions showed elevated concentrations in downtown areas (associated with high population and traffic), near railroads, and around the largest airport in Baltimore (Figure 2).
The hot spots were consistent with the major emission sources in the greater Baltimore area (including the city of Baltimore and Baltimore County), where in 2017 emissions from on-road mobile sources (i.e., on-road diesel/nondiesel light/heavy-duty vehicles) were responsible for 53.8% of total CO emissions, and the nonroad emissions, including locomotives, commercial marine vessels, and other nonroad equipment, were responsible for 36.6% of total emissions.[3] Satellite-retrieved hot spots of CO column densities were inconsistent with our predicted ground-level hot spots (Figure S12), likely because the retrievals were affected by CO concentrations above the ground. Also, the satellite retrievals were too spatially coarse to reflect within-city CO concentration variations.

Previous studies reported sharp decreases in CO concentrations within hundreds of meters from highways in downwind directions.[51,52] A similar but weaker concentration gradient was observed in this study (Figure S14). The observed gradient is likely an underestimate, however, because it was calculated from temporally averaged concentrations across all wind directions rather than along temporally varying downwind directions, which smooths the gradient. The observed concentration gradient near primary roads corroborates the reliability of our model in reflecting a realistic spatial distribution of CO concentrations.

The seasonal variation in our CO predictions is similar to previously reported remote sensing and chemical transport modeling data,[53] where CO concentrations were higher in cold seasons and lower in warm seasons.
The factors driving the observed seasonal variation are threefold: a) the major sink of CO, the reaction with the OH radical, is stronger in warm seasons when OH concentrations are higher[5,54]; b) the warm seasons tend to have more intense mixing processes that dilute the CO concentrations at the ground surface[53]; and c) there are increased CO emissions in cold seasons in the northeastern United States, e.g., from residential wood combustion for heating (residential wood combustion was responsible for of total CO emissions in the greater Baltimore area in 2017).[3]

Ambient CO exposure levels at residences are of particular interest because our primary goal of producing high-resolution exposure predictions is to support epidemiological studies. We observed that, in comparison with domain averages, the mean CO concentrations at MESA Air participant residences were higher and had smaller variations (Figure 4). The higher concentrations at residences were associated with their proximity to CO emission sources because the majority of the participants lived in urban areas close to traffic and with high population density (Figure S4). Because CO is a relatively inert gas, its indoor decay rate is negligible.[55] Without considering indoor sources, the indoor-outdoor concentration ratio of CO approximates 1 due to its high infiltration rate.[55] Hence, the elevated outdoor CO concentrations at residences would be associated with corresponding elevations in infiltrated concentrations indoors. On the other hand, nonfatal exposure to CO of indoor origin (e.g., home heating) is a nonnegligible microenvironmental source of total personal CO exposure.[22] This exposure modeling study, however, aimed to support epidemiological findings and policy implications of ambient CO, and thus did not focus on indoor-generated concentrations and personal exposures.
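
The claim that the indoor-outdoor ratio of CO approaches 1 can be read off a standard single-compartment, steady-state mass balance. The notation below is generic and assumed for illustration; it does not come from this study. Here P is the penetration factor, a the air exchange rate, k the indoor decay (loss) rate, S the indoor source strength, and V the indoor volume.

```latex
% Steady-state, well-mixed single-zone mass balance (generic notation, assumed):
C_{\mathrm{in}} = \frac{P\,a\,C_{\mathrm{out}} + S/V}{a + k}

% With no indoor sources (S = 0), the infiltration ratio is
\frac{C_{\mathrm{in}}}{C_{\mathrm{out}}} = \frac{P\,a}{a + k}

% For a nearly inert gas (k \approx 0) that penetrates freely (P \approx 1):
\frac{C_{\mathrm{in}}}{C_{\mathrm{out}}} \approx 1
```

The same expression shows why indoor sources matter for personal exposure: any S > 0 adds a term (S/V)/(a + k) on top of the infiltrated ambient concentration.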
We also found that the concentrations measured at the regulatory AQS stations did not represent the CO concentrations at residences well. The average of CO measurements at the AQS stations was in 2018, lower than the mean concentration of at MESA Air participant residences (i.e., the concentrations at residences were higher). This concentration difference highlights the importance of generating reliable, spatially resolved exposure estimates at places where individuals spend most of their time.

Although the LCMs had been well-calibrated with reduced systematic biases, the measurements were still subject to random residual errors.[47] Our model accounted for the random residual errors in two ways: a) fitting different nugget values for AQS and LCM in the random residual field [ in Equation 4] and b) allowing the random residual field to capture large-scale deviations from the smoothed temporal trends. The former allowed the model to distinguish the higher random uncertainties in the calibrated LCM measurements from the minimal uncertainties in the AQS measurements. The latter allowed the model to adjust for additional random noise in the SVD-derived smoothed temporal trends. With the robust daily level prediction framework developed in this study, we expect to further explore in future research how sensitive the CO prediction model is to residual uncertainty in low-cost measurements that cannot be addressed by calibration, as we did in our prior work for other air pollutants, e.g., .[49]

In addition, it is critical for air pollution prediction to be based on monitoring locations that are representative of locations of epidemiological interest (e.g., residential locations of a target cohort). In this study, we deployed LCMs at a subset of residential locations of MESA Air participants, our target cohort.
In our prior research into , we developed a quantitative measure based on geographic features to select optimal monitoring locations representative of any target prediction locations.[49] We plan to examine the effectiveness of this quantitative measure for CO monitor deployment in future MESA Air monitoring campaigns.

The integration of high-resolution meteorological data, satellite-retrieved CO column densities, and copollutant (PM2.5, NO2, and NOx) concentrations did not meaningfully improve our model’s spatial or temporal predictive performance (Table 2). The lack of contribution of the meteorological data indicates that the SVD-derived smoothed temporal trends fitted the temporal variation of CO concentrations at monitoring locations within our city-scale study domain well. Follow-up research is necessary to examine whether the high-resolution meteorological data can help recover weather-related CO variations that are not adequately captured by the SVD temporal trends in a larger study domain (e.g., at the regional or national scale). A possible reason for the lack of contribution of both satellite CO retrievals and copollutant concentrations to model improvement is that these variables had only moderate correlations with ground-level CO concentrations and showed different spatial patterns (Figures S12 and S13). Satellite retrievals reflect CO concentrations in the atmospheric column. Nonnegligible CO concentrations in the free troposphere result from in situ photochemical production and convective transport from the planetary boundary layer.[53] Because of the long tropospheric lifetime of CO (weeks to months) and large-scale spatial heterogeneity of these processes,[5-7] column CO might poorly reflect surface-level CO.
Additionally, temporal misalignment of satellite CO retrievals and ground-level CO measurements may result in a further reduced correlation between the two: Satellite retrievals are limited to the time of overpass, whereas ground measurements are temporally continuous. This result reflects an advantage of our ST modeling framework: Because it is parameterized to allow spatial smoothing, it may be more reliable in estimating spatial variations of air pollution based on geographic features.[39] This may not be the case for other models such as those based on nonparametric machine learning algorithms, which do not incorporate an internal spatial smoothing structure.[56-58] It is still worth examining the contribution of satellite retrievals to those models. Finally, care should be taken when using copollutant predictions as geographic covariates or spatiotemporal variables, because the predictions are subject to spatially variable uncertainties; i.e., the uncertainties may be lower at monitoring locations and higher at locations without ground-level measurements. These spatially variable uncertainties may negatively influence the quality of exposure estimates.

A major limitation of this study was the use of LCM measurements in the CV process. Even though the LCM measurements were rigorously calibrated, their residual uncertainty could still bias the CV results. An ideal validation data set should only include accurate ground-level measurements (e.g., AQS). However, the AQS stations were too spatially sparse. Validation using only AQS data would be spatially less informative and may not be representative of ambient exposures at residences, potentially leading to substantial Berkson-type errors. The use of LCM measurements in CV overcame these issues and hence was a reasonable trade-off. Additionally, there were eight long-term monitors (three AQS monitors and five LCMs) across our study domain.
More long-term monitors would allow a better estimation of the temporal variations of CO. However, the potential improvement would be limited because the temporal variations were generally consistent across the study domain (Figure S6). Furthermore, although we selected the LCM deployment locations from MESA Air participant residences, the selected locations were relatively sparse in the Baltimore city center, and there were no monitors in South Baltimore. As a result, the model’s predictive performance might be spatially heterogeneous and lower in areas with fewer monitors. Finally, this daily level CO modeling work did not take advantage of raw LCM measurements at a higher temporal resolution. We recently demonstrated that our ST modeling framework can be applied effectively to urban black carbon prediction at the hourly scale.[59] It is important for future studies to explore the use of the ST framework to predict CO concentrations at finer temporal scales (e.g., hourly), which will be beneficial for microscale exposure assessment[60] and air pollution research that requires a higher temporal resolution (e.g., wildfire smoke).

In summary, we successfully developed a daily level spatiotemporal prediction model for ambient CO concentrations based on both temporally continuous regulatory and spatially dense low-cost measurements and characterized within-city variation in CO concentrations. The derived high-resolution ambient CO exposure estimates can facilitate more comprehensive epidemiological research to fill in the knowledge gaps regarding adverse health effects of CO at concentrations below the NAAQS thresholds (as is the case in most U.S. cities) and in resource-restricted environments with rapidly increasing deployment of low-cost air monitors.
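
The cross-validation statistics discussed in the limitations above (CV R2 and RMSE) can be illustrated with a short sketch. Two assumptions are made here and are not confirmed by the source: the R2 shown is the MSE-based definition (1 minus MSE over the variance of the observations), which is only one common convention, and the spatial folds hold out whole monitoring locations at a time. All function names and the example values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error between paired observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mse_r2(obs, pred):
    """MSE-based R2: 1 - MSE / variance of the observations.

    Unlike a squared correlation, this can be negative when predictions
    are worse than simply predicting the observed mean.
    """
    mean_obs = sum(obs) / len(obs)
    var_obs = sum((o - mean_obs) ** 2 for o in obs) / len(obs)
    mse = sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)
    return 1 - mse / var_obs


def location_folds(location_ids, n_folds):
    """Assign every record of a location to the same fold.

    Holding out whole locations (assumed design) means each location's
    predictions come from a model that never saw that location.
    """
    unique = sorted(set(location_ids))
    fold_of = {loc: i % n_folds for i, loc in enumerate(unique)}
    return [fold_of[loc] for loc in location_ids]


# Synthetic placeholder values (NOT study data).
observed = [0.21, 0.25, 0.30, 0.34, 0.40]
predicted = [0.22, 0.24, 0.31, 0.33, 0.38]
sites = ["a", "a", "b", "c", "b"]

print("folds:", location_folds(sites, n_folds=2))
print(f"RMSE={rmse(observed, predicted):.4f}  R2={mse_r2(observed, predicted):.3f}")
```

With measurement error in the held-out LCM observations, both statistics describe agreement with noisy targets rather than with true concentrations, which is the trade-off the limitations paragraph describes.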

35 in total

1. Effect of carbon monoxide on maximal treadmill exercise. A study in normal persons.

Authors: W S Aronow; J Cassidy
Journal: Ann Intern Med Date: 1975-10 Impact factor: 25.391

2. Revised analyses of the National Morbidity, Mortality, and Air Pollution Study: mortality among residents of 90 cities.

Authors: Francesca Dominici; Aidan McDermott; Michael Daniels; Scott L Zeger; Jonathan M Samet
Journal: J Toxicol Environ Health A Date: 2005 Jul 9-23

3. Satellite-Based NO2 and Model Validation in a National Prediction Model Based on Universal Kriging and Land-Use Regression.

Authors: Michael T Young; Matthew J Bechle; Paul D Sampson; Adam A Szpiro; Julian D Marshall; Lianne Sheppard; Joel D Kaufman
Journal: Environ Sci Technol Date: 2016-03-21 Impact factor: 9.028

4. Air pollution and hospital admissions for cardiovascular disease in Taipei, Taiwan.

Authors: Chih-Ching Chang; Shang-Shyue Tsai; Shu-Chen Ho; Chun-Yuh Yang
Journal: Environ Res Date: 2005-05 Impact factor: 6.498

5. Insights from Application of a Hierarchical Spatio-Temporal Model to an Intensive Urban Black Carbon Monitoring Dataset.

Authors: Travis Hee Wai; Joshua S Apte; Maria H Harris; Thomas W Kirchstetter; Christopher J Portier; Chelsea V Preble; Ananya Roy; Adam A Szpiro
Journal: Atmos Environ (1994) Date: 2022-03-23 Impact factor: 5.755

6. Assessing PM2.5 Exposures with High Spatiotemporal Resolution across the Continental United States.

Authors: Qian Di; Itai Kloog; Petros Koutrakis; Alexei Lyapustin; Yujie Wang; Joel Schwartz
Journal: Environ Sci Technol Date: 2016-04-22 Impact factor: 9.028

7. Publicly available low-cost sensor measurements for PM_2.5 exposure modeling: Guidance for monitor deployment and data selection.

Authors: Jianzhao Bi; Nancy Carmona; Magali N Blanco; Amanda J Gassett; Edmund Seto; Adam A Szpiro; Timothy V Larson; Paul D Sampson; Joel D Kaufman; Lianne Sheppard
Journal: Environ Int Date: 2021-09-30 Impact factor: 9.621

8. Asthma symptoms in Hispanic children and daily ambient exposures to toxic and criteria air pollutants.

Authors: Ralph J Delfino; Henry Gong; William S Linn; Edo D Pellizzari; Ye Hu
Journal: Environ Health Perspect Date: 2003-04 Impact factor: 9.031

9. Birth outcomes and prenatal exposure to ozone, carbon monoxide, and particulate matter: results from the Children's Health Study.

Authors: Muhammad T Salam; Joshua Millstein; Yu-Fen Li; Frederick W Lurmann; Helene G Margolis; Frank D Gilliland
Journal: Environ Health Perspect Date: 2005-11 Impact factor: 9.031

10. The effects of air pollution on hospitalizations for cardiovascular disease in elderly people in Australian and New Zealand cities.

Authors: Adrian G Barnett; Gail M Williams; Joel Schwartz; Trudi L Best; Anne H Neller; Anna L Petroeschevsky; Rod W Simpson
Journal: Environ Health Perspect Date: 2006-07 Impact factor: 9.031