Predicting polycyclic aromatic hydrocarbons using a mass fraction approach in a geostatistical framework across North Carolina.

Jeanette M Reyes¹, Heidi F Hubbard², Matthew A Stiegel³, Joachim D Pleil⁴,⁵, Marc L Serre⁶.

Abstract

Currently in the United States there are no regulatory standards for ambient concentrations of polycyclic aromatic hydrocarbons (PAHs), a class of organic compounds with known carcinogenic species. As such, monitoring data are not routinely collected resulting in limited exposure mapping and epidemiologic studies. This work develops the log-mass fraction (LMF) Bayesian maximum entropy (BME) geostatistical prediction method used to predict the concentration of nine particle-bound PAHs across the US state of North Carolina. The LMF method develops a relationship between a relatively small number of collocated PAH and fine Particulate Matter (PM2.5) samples collected in 2005 and applies that relationship to a larger number of locations where PM2.5 is routinely monitored to more broadly estimate PAH concentrations across the state. Cross validation and mapping results indicate that by incorporating both PAH and PM2.5 data, the LMF BME method reduces mean squared error by 28.4% and produces more realistic spatial gradients compared to the traditional kriging approach based solely on observed PAH data. The LMF BME method efficiently creates PAH predictions in a PAH data sparse and PM2.5 data rich setting, opening the door for more expansive epidemiologic exposure assessments of ambient PAH.

Keywords: Ambient exposures; Bayesian maximum entropy; Geostatistics; Mass fraction; PAHs

Year: 2018 PMID: 29317739 PMCID: PMC6013350 DOI: 10.1038/s41370-017-0009-6

Source DB: PubMed Journal: J Expo Sci Environ Epidemiol ISSN: 1559-0631 Impact factor: 5.563

1. Introduction

Polycyclic Aromatic Hydrocarbons (PAHs) are a class of organic compounds containing 2 or more fused aromatic rings created through incomplete fuel combustion from a variety of sources including biofuel burning, wildfires, coal production, etc.[1,2] Several species of PAHs and their metabolites have been designated by the United States Environmental Protection Agency (US EPA) as being probable human carcinogens.[3-6] Currently the EPA only has PAH regulatory standards for drinking water and the National Institute for Occupational Safety and Health (NIOSH) has established occupational exposure limits to coal tar pitch volatiles.[7] International organizations and other countries have established ambient concentration guidelines for one of the more toxic PAHs, benzo(a)pyrene.[8] However, currently in the US there are no regulatory standards for ambient concentrations of PAHs. Compared to regulated ambient air pollutants, there are few epidemiologic studies that have utilized observed data or explored ambient exposures to different PAHs, which can be costly to measure.[9,10] From a geostatistical perspective, limited ambient observed data have resulted in few studies creating maps of PAH concentrations.[11-14] Others have used Chemical Transport Models (CTMs) to predict PAH concentrations.[8,15,16] However, these studies are also limited in number. As a result, there is a gap in the literature exploring ambient PAH exposures and their associations with various health endpoints. Short-term health effects include eye and skin irritation, nausea and vomiting while long-term health effects include increased risk of skin, lung and bladder cancer as well as cardiopulmonary mortality.[7] While many of these health effects are associated with either occupational exposures or drinking water exposures, the relationship between ambient concentrations of PAH and their associated health effects has not been well explored.
Both inside and outside the US there is a lack of consistent PAH monitoring outside of monitoring campaigns conducted for specific studies. In contrast to the data sparse environment of PAH observed data, Particulate Matter ≤ 2.5 micrometers in diameter (PM2.5) exists in a data rich environment with a vast, consistent, historical monitoring network across the US.[17,18] Currently there are 16 EPA designated priority PAHs, 9 of which are particle-bound.[13] Thus, a portion of PM2.5 can be particle-bound PAH. Currently, the US state of North Carolina has no maps displaying PAH concentrations using observed data. This study explores the relationship between PM2.5 and PAH and is an extension of previous work done by Allshouse et al.[13] that developed the log-Mass Fraction (LMF) Bayesian Maximum Entropy[19,20] (BME) geostatistical method and applied this method to model the distribution of PAH near the World Trade Center after September 11th. The observational PAH data used in that work came from the analysis of 243 PM2.5 filters from four sites spanning approximately 200 days split among the sites near and around Ground Zero set up following September 11th. In this work we analyze the PAH content of PM2.5 filters collected across the US state of North Carolina and implement the LMF BME method to predict PAH concentration at unmonitored locations, creating the first maps of PAH across North Carolina for 2005 using observed data. Furthermore, we compare the LMF BME method with a simple Linear Regression (LR) BME method and more traditional geostatistical methods for the first time. Methods are evaluated through cross validation. Predictive maps are used to visualize the probability of exceeding PAH cutoff concentrations. Lastly, a comparison is performed between the LMF BME and other methods to learn how the relationship between PAH concentrations near wildfires may change for different prediction methods.
These results provide a method by which a data sparse environment can be exploited efficiently in conjunction with a data rich secondary environment (e.g. PM2.5 data): the relationship between the two can be applied to estimate concentrations of data sparse air pollutants elsewhere in a given domain. This method, which is cost-effective in terms of analyzing observed data, can be applied to other air pollution parameters that have not been previously mapped. This methodology opens the door for more expansive epidemiologic studies exploring the association between ambient concentrations of PAHs and various health endpoints.

2. Materials and Methods

2.1 Observed PM2.5 and PAH data

Daily PM2.5 filters in North Carolina during 2004-2005 were collected as part of the monitoring effort needed to provide the data reported in the EPA's Air Quality System (AQS) database.[17] Of the PM2.5 filters collected during this time period, we selected 84 filters collected in 2005 and analyzed them for the following 9 species of PAHs: benz(a)anthracene, chrysene, benzo(b)fluoranthrene, benzo(k)fluoranthrene, benzo(e)pyrene, benzo(a)pyrene, indeno(1,2,3-cd)pyrene, benzo(g,h,i)perylene, dibenzo(a,h)anthracene and the summation of the 9 PAH species called Total PAH. PM2.5 has units of μg/m3 and PAH has units of ng/m3.

2.2 The Linear Regression and log-Mass Fraction method

There are approximately 8,000 space/time locations for the state of North Carolina in 2005 where daily PM2.5 is observed and recorded in AQS. PAH was estimated at these locations using surrounding PAH and PM2.5 information. There were two different PAH estimation methods: 1) a LR method consisting of a regression created from paired PM2.5 and PAH in an estimation neighborhood, in which PAH is then predicted at locations where PM2.5 is known, and 2) a LMF method that assumes the ratio of PAH/PM2.5 is constant within an optimized estimation neighborhood, in which PAH is then predicted by applying the ratio at locations where PM2.5 is known. Let $p_o$ be the space/time locations where PAH was directly measured from a PM2.5 filter, where the location of the ith PAH measurement is denoted as $p_{o,i} = (s_{o,i}, t_{o,i}) \in p_o$, where $s$ is the spatial coordinate and $t$ is the time coordinate. Let $p_e$ be the space/time locations where PAH is estimated from PM2.5, in which the location of the jth individual estimate is denoted as $p_{e,j} \in p_e$. The LR method is a simple linear regression of PAH with respect to PM2.5. This linear regression can be expressed at the locations $p_{o,i}$ (where both PAH and PM2.5 are measured) as:
$$\log \mathrm{PAH}(p_{o,i}) = \beta_0 + \beta_1 \log \mathrm{PM}_{2.5}(p_{o,i}) \qquad (1)$$
The subsequent sections describe how the parameters $\beta_0$ and $\beta_1$ are estimated. The relationship in Equ. 1 can then be used to estimate PAH at the locations $p_{e,j}$ (where only PM2.5 is measured) with the distribution
$$\log \mathrm{PAH}(p_{e,j}) \sim \mathcal{N}\left(\beta_0 + \beta_1 \log \mathrm{PM}_{2.5}(p_{e,j}),\ \sigma^2_{LR}(p_{e,j})\right) \qquad (2)$$
where $\sigma^2_{LR}(p_{e,j})$ is the linear regression prediction variance. The LMF method was defined by Allshouse et al.[13] as expressing the relation between PAH and PM2.5 at the locations $p_{o,i}$ as
$$\mathrm{LMF}(p_{o,i}) = \log\left(\frac{\mathrm{PAH}(p_{o,i})}{\mathrm{PM}_{2.5}(p_{o,i})}\right) \qquad (3)$$
which can be rewritten as
$$\log \mathrm{PAH}(p_{o,i}) = \mathrm{LMF}(p_{o,i}) + \log \mathrm{PM}_{2.5}(p_{o,i}) \qquad (4)$$
This approach is attractive because it uses only one parameter, namely LMF, to estimate PAH based on PM2.5, making the LMF more parsimonious and more localized around the estimation location of interest. At locations $p_{e,j}$ where only PM2.5 is measured, the mean $\mu_{LMF}(p_{e,j})$ and variance $\sigma^2_{LMF}(p_{e,j})$ of LMF can be estimated as
$$\mu_{LMF}(p_{e,j}) = \frac{1}{N_{LMF}(p_{e,j})} \sum_{i=1}^{N_{LMF}(p_{e,j})} \mathrm{LMF}(p_{o,i}) \qquad (5)$$
with $\sigma^2_{LMF}(p_{e,j})$ (Equ. 6) the variance of the same LMF values about $\mu_{LMF}(p_{e,j})$. Here $N_{LMF}(p_{e,j})$ is the number of LMF values closest to the space/time location $p_{e,j}$, and the sum runs over those values.
The optimization of $N_{LMF}(p_{e,j})$ is described in subsequent sections. The relationship in Equ. 3 can then be used to estimate PAH at locations $p_{e,j}$ (where only PM2.5 is measured) as
$$\log \mathrm{PAH}(p_{e,j}) \sim \mathcal{N}\left(\mu_{LMF}(p_{e,j}) + \log \mathrm{PM}_{2.5}(p_{e,j}),\ \sigma^2_{LMF}(p_{e,j})\right) \qquad (7)$$
Equ. 7 becomes a component in the BME estimation methodology described in section 2.4. In the limiting case, the LMF and LR methods are equivalent when $\beta_0 = \mu_{LMF}$ and $\beta_1 = 1$.
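As an illustration, the LMF estimate at a PM2.5-only location can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' MATLAB/BMElib implementation: the function names and the input numbers are hypothetical, the log-mass fraction is taken as log(PAH/PM2.5), and the sample variance (n - 1 denominator) is assumed for the LMF variance.

```python
import math
from statistics import mean, variance

def lmf(pah, pm25):
    """Log-mass fraction of one collocated pair: log(PAH / PM2.5)."""
    return math.log(pah / pm25)

def lmf_estimate(neighbor_pairs, pm25_target):
    """Return (mean, variance) of log PAH at a location with PM2.5 only.

    neighbor_pairs: (PAH, PM2.5) pairs from the closest collocated filters.
    The mean is the average LMF of the neighbors plus log PM2.5 at the target.
    """
    values = [lmf(pah, pm) for pah, pm in neighbor_pairs]
    mu = mean(values)
    # Assumption: sample variance; it is undefined for a single pair, so 0.0 is used.
    var = variance(values) if len(values) > 1 else 0.0
    return mu + math.log(pm25_target), var

# Hypothetical numbers for illustration only (PAH in ng/m3, PM2.5 in ug/m3).
pairs = [(0.20, 10.0), (0.30, 12.0), (0.25, 11.0)]
log_pah_mean, log_pah_var = lmf_estimate(pairs, pm25_target=15.0)
print(math.exp(log_pah_mean), log_pah_var)
```

Because only the average LMF has to be calibrated, a single neighboring pair is already enough to produce an estimate, which is the parsimony argument made above.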

2.3 Neighborhood optimization

The parameters $N_{LR}(p_e)$ and $N_{LMF}(p_e)$, the numbers of paired PAH/PM2.5 values used by the LR and LMF methods at an estimation location $p_e$, are optimized using the following methodology. For each of the 84 space/time PAH measurements, the measured PAH is excluded and re-estimated based on the collocated PM2.5 using either the LR method (Equ. 2) or the LMF method (Equ. 7) calibrated based on the paired PAH/PM2.5 values located in a local neighborhood of the excluded PAH value. This local neighborhood consists of the n closest pairs (with n ranging from 1 to 84, the number of observed data), where space/time proximity is defined based on the space/time distance d = r + STM × t, such that r (km) is the spatial distance, t (day) is the time difference and STM (km/day) is the space/time metric. A given choice of the parameters n and STM creates 84 errors between measured and re-estimated PAH values, from which a Mean Squared Error (MSE) is calculated (Figure S1). This MSE is calculated for 75,600 different combinations of n and STM for each PAH and method (i.e. LR and LMF). For each PAH, the parameters $N_{LR}(p_e)$ and $N_{LMF}(p_e)$ are selected by choosing the n and STM that produced the lowest MSE. Due to the number of parameters of the LMF and LR methods (i.e. one parameter for LMF and two for LR), $N_{LR}(p_e) \geq 2$ while $N_{LMF}(p_e) \geq 1$. See Christakos and Serre (1999) for a more detailed explanation of the STM.[21] The values found for $N_{LR}(p_e)$ and $N_{LMF}(p_e)$ for each PAH are then applied to all estimation locations to estimate the corresponding PAH using Equ. 2 and Equ. 7. The optimized n and STM were only used for the optimization of parameters $N_{LR}(p_e)$ and $N_{LMF}(p_e)$. These PAH estimates become input data in the BME estimation framework described next.
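The search described above can be sketched as a leave-one-out grid search. The following Python sketch covers the LMF case only and rests on several assumptions that are not taken from the paper: records are dictionaries with hypothetical keys `x`, `y` (km), `day`, `pah` and `pm25`, the spatial distance is Euclidean, and the error is measured in log space. Only the distance d = r + STM × t and the idea of minimizing the MSE over n and STM come from the text.

```python
import math

def st_distance(a, b, stm):
    """Space/time distance d = r + STM * t between two records."""
    r = math.hypot(a["x"] - b["x"], a["y"] - b["y"])  # spatial distance (km)
    return r + stm * abs(a["day"] - b["day"])         # stm in km/day

def loo_mse(records, n, stm):
    """Leave-one-out MSE of the LMF estimate built from the n closest pairs."""
    sq_errors = []
    for i, target in enumerate(records):
        # Exclude the target, then rank the remaining pairs by proximity.
        others = records[:i] + records[i + 1:]
        others.sort(key=lambda rec: st_distance(target, rec, stm))
        lmf_values = [math.log(rec["pah"] / rec["pm25"]) for rec in others[:n]]
        mu = sum(lmf_values) / len(lmf_values)
        predicted = mu + math.log(target["pm25"])
        # Assumption: errors are taken between log-transformed PAH values.
        sq_errors.append((predicted - math.log(target["pah"])) ** 2)
    return sum(sq_errors) / len(sq_errors)

def optimize_neighborhood(records, n_values, stm_values):
    """Return (mse, n, stm) for the combination with the lowest MSE."""
    return min((loo_mse(records, n, stm), n, stm)
               for n in n_values for stm in stm_values)
```

A call such as `optimize_neighborhood(records, range(1, 6), [0.5, 0.8, 1.0])` would return the best combination on a small grid; the study reports 75,600 combinations per PAH and method.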

2.4 Bayesian Maximum Entropy estimation methodology

BME provides a mathematically rigorous geostatistical space/time framework for the estimation of PAH at locations where neither PAH nor PM2.5 is monitored.[19,20] BME can incorporate information from multiple data sources and is implemented using the BMElib suite of functions in MATLAB™.[22,23] The foundation of BME has been detailed in other works,[21,24,25] and can be summarized as performing the following steps: 1) gathering the general knowledge base (G-KB) and site-specific knowledge base (S-KB) characterizing the Space/Time Random Field (S/TRF) $X(p)$ representing a process at the space/time location $p$, 2) using the Maximum Entropy principle of information theory to process the G-KB in the form of a prior Probability Distribution Function (PDF) $f_G$, through a mean trend and an isotropic covariance model, 3) integrating S-KB in the form of a PDF $f_S$ with and without measurement error using an epistemic Bayesian conditionalization rule (i.e. in this work, a priori information is updated on observed data) to create a posterior PDF $f_K$ and 4) creating space/time predictions based on the analysis. All the code and data used for the analysis presented in this work are available from https://github.com/reyesjmUNC/ReyesEtAlJESEE_PAH. In this study, we use an S/TRF to describe the variability of PAH across North Carolina in 2005. In this work the data without measurement error are the observed PAH data, and $f_S$ is obtained at the locations where PAH is estimated from PM2.5 observations through either the LR (Equ. 2) or the LMF (Equ. 7) method. We can then calculate $x_k$, the predicted daily PAH at the unmonitored location $p_k$. More information about the prediction methodology can be found in the Supplementary Information.

2.5 Leave-One-Out Cross Validation accuracy analysis

To assess the prediction accuracy of the LMF and LR methods, a Leave-One-Out Cross Validation (LOOCV) accuracy analysis is performed. For each monitoring station where observed PAH data exist, all observed data from a given station are removed one at a time and a BME prediction is conducted (without recalculating $f_S$) to obtain the BME predictions at that station using all the remaining observed and estimated data. The difference between each mean predicted PAH x̃ and observed PAH value x̂ is the prediction error, e = x̃ – x̂. The prediction accuracy is quantified based on prediction error statistics, which consist of the Mean Error (ME, ng/m3), Variance of Errors (VE, (ng/m3)2), Root Mean Squared Error (RMSE, ng/m3), Mean Squared Error (MSE, (ng/m3)2) and the square of the Pearson correlation coefficient (r2, unitless) calculated between observed and mean predicted values. LMF BME and LR BME predictions are then compared to kriging (i.e. predictions created only using observed PAH data) and cokriging (i.e. predictions created using both PAH and PM2.5 observed data).
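The error statistics listed above follow directly from the paired predicted and observed values. A minimal Python sketch is given below; the function name is hypothetical, and the use of the sample variance (n - 1 denominator) for VE is an assumption, since the text does not state which estimator was used.

```python
import math
from statistics import mean, variance

def loocv_statistics(predicted, observed):
    """Return ME, VE, MSE, RMSE and r2 for paired predictions and observations."""
    errors = [p - o for p, o in zip(predicted, observed)]  # e = predicted - observed
    me = mean(errors)
    ve = variance(errors)  # assumption: sample variance of the errors
    mse = mean([e * e for e in errors])
    rmse = math.sqrt(mse)
    # Squared Pearson correlation between predicted and observed values.
    mp, mo = mean(predicted), mean(observed)
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    ss_p = sum((p - mp) ** 2 for p in predicted)
    ss_o = sum((o - mo) ** 2 for o in observed)
    r2 = cov ** 2 / (ss_p * ss_o)
    return {"ME": me, "VE": ve, "MSE": mse, "RMSE": rmse, "r2": r2}

# Hypothetical values for illustration only.
stats = loocv_statistics([1.0, 2.0, 4.0], [1.0, 3.0, 3.0])
print(stats)
```

A negative ME indicates that predictions are, on average, below the observed values.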

2.6 Fire comparisons

Wildfires contribute a sizable percentage of PAH emissions in the US.[2] The mean difference in PAH concentrations near known wildfire locations was estimated. PAH was estimated on a fine grid across North Carolina on days with observed PAH data. PAH was estimated using 4 different prediction methods: 1) kriging, 2) cokriging, 3) LR BME and 4) LMF BME. Fire data were obtained from the Federal Wildfire Fire Occurrence Website.[26] All fires greater than or equal to one acre in North Carolina, Virginia, Tennessee and South Carolina were collected in 2005 on days for which PAH observed data were measured where the start and control date of the fires were known (n = 213). A two-tailed two-sample t-test (assuming unequal variances) is calculated on the PAH predictions on a fine grid at a 5% significance level. The significance test is performed on all fine grid predictions within 100 km of known fire locations and all fine grid predictions outside of 100 km.
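The near-versus-far comparison can be illustrated with a short Python sketch of the unequal-variance (Welch) two-sample statistic. The function name and the sample values are hypothetical. As a further assumption, the confidence interval uses a normal quantile as a large-sample stand-in for the Student t quantile, which is reasonable only when both groups contain many grid predictions.

```python
import math
from statistics import NormalDist, mean, variance

def welch_mean_difference(near, far, alpha=0.05):
    """Mean difference (near - far) with an unequal-variance standard error.

    Returns (difference, (ci_low, ci_high), t_statistic, degrees_of_freedom).
    """
    vn = variance(near) / len(near)  # squared standard error of the near mean
    vf = variance(far) / len(far)    # squared standard error of the far mean
    diff = mean(near) - mean(far)
    se = math.sqrt(vn + vf)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (vn + vf) ** 2 / (vn ** 2 / (len(near) - 1) + vf ** 2 / (len(far) - 1))
    # Assumption: normal quantile used in place of the t quantile.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, (diff - z * se, diff + z * se), diff / se, dof

# Hypothetical predictions (ng/m3) within and beyond 100 km of a fire.
near = [0.31, 0.28, 0.35, 0.30, 0.33]
far = [0.22, 0.25, 0.21, 0.24, 0.23, 0.20]
diff, ci, t_stat, dof = welch_mean_difference(near, far)
print(diff, ci)
```

An interval that excludes zero corresponds to a statistically significant difference at the chosen level, and a positive interval indicates higher predicted PAH near fires.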

3. Results and Discussion

3.1 Neighborhood optimization

A log-transformation was applied to PM2.5 and PAH due to the skewness of observed values. For each PAH and BME estimation method (i.e. LR and LMF), the optimal values of the parameters n and STM defining the estimation neighborhood were selected such that they minimized the MSE. For every PAH, the number n of closest observed data that optimized the estimation neighborhood was smaller for the LMF method than for the LR method (Table 1). Generally, we expect that the calibration of the LMF method requires fewer paired PAH/PM2.5 values because it is more parsimonious (i.e. has fewer parameters) than the LR method. Indeed, we find that the parameter n ranges from 2-5 for the LMF method whereas n ranges from 7-14 for the LR method. Benzo(g,h,i)perylene, indeno(1,2,3-c,d)pyrene and benzo(e)pyrene require n = 2 from the LMF method, the smallest number of paired PAH/PM2.5 values across all PAHs. Seven out of 9 PAHs in the LR method require n = 14.

Table 1

Neighborhood optimization for each PAH estimate. Optimized n closest observed data locations (as determined by the space/time metric) corresponding to the minimized mean squared error validation statistic calculated through the linear regression and log-mass fraction methods across the 9 PAHs, with Total PAH being the summation. Bolded numbers indicate the lowest MSE for each PAH across the neighborhood optimization methods.

	Linear Regression			log-Mass Fraction
PAH	n	S/T Metric (km/day)	MSE (ng/m³)²	n	S/T Metric (km/day)	MSE (ng/m³)²
benz(a)anthracene	14	0.891	1.128	5	0.839	0.908
chrysene	7	0.600	0.979	5	0.839	0.799
benzo(b)fluoranthrene	7	0.863	1.358	5	0.899	1.180
benzo(k)fluoranthrene	14	0.895	1.375	5	0.842	1.046
benzo(e)pyrene	14	0.895	1.006	2	0.868	0.726
benzo(a)pyrene	14	0.895	1.332	5	0.899	1.417
indeno(1,2,3-c,d)pyrene	14	0.891	0.892	2	0.868	0.702
benzo(g,h,i)perylene	14	0.895	0.757	2	0.777	0.742
dibenzo(a,h)anthracene	14	0.772	1.532	3	0.820	1.115
Total PAH	14	0.895	0.890	3	0.820	0.675

Across each PAH and Total PAH, the minimized MSE was consistently lower for the LMF method than the LR method with the exception of benzo(a)pyrene. With these optimized neighborhoods, estimates were created by each method and each PAH was predicted across North Carolina using BME. The PAH estimation neighborhood for the LMF method is smaller than that of the LR method. Of the previously mentioned studies, very few use observed PAH data, and of those that do, most observed data come from short-lived monitoring campaigns. The results presented in this work utilize long-term, established PM2.5 regulatory monitoring sites, where PM2.5 data are comparatively plentiful. Previously, data fusion methods have blended together multiple air pollutants that have different spatial supports.[27,28] By developing a relationship between a few PAH observations and several PM2.5 observations, the door is opened to applying this relationship to a network with a large amount of publicly available data. Data sparse environments (e.g. PAH) can benefit from data rich secondary environments (e.g. PM2.5). However, for this relationship to be fully exploited, it must be constructed in such a manner that best utilizes the limited data set. That is, the relationship between PAH and PM2.5 must be parsimonious. The LMF method has only one parameter to be estimated, namely, LMF (Equ. 6). The minimum number of observed data needed to construct a PAH estimate is low, with the LMF neighborhood size $N_{LMF} \geq 1$. The PAH estimates created from the LMF method required fewer observed data than those from the LR method. We hypothesize that the additional parameter of the LR method makes it less parsimonious, requiring more paired PAH and PM2.5 data to optimize the estimation neighborhood. Some of these paired data then lie outside the relevant airshed of the estimation location.

3.2 PAH prediction maps

This work created the first maps of predicted PAH in space/time across the US state of North Carolina for 2005 using observed data. Each of the 9 PAHs was predicted on a fine grid across North Carolina every day observed PAH data were collected (41 days) in 2005 for the 4 prediction methods: kriging, cokriging, LR BME and LMF BME. Estimation method parameters can be found in Supplementary Information (Supplementary Tables S1 and S2). Mean prediction maps of benzo(b)fluoranthrene for the 4 methods are displayed across North Carolina on April 16, 2005 with observed data and PAH estimates pictured (Fig. 1). The kriging map consistently predicts the highest PAH concentrations across the 4 methods at unmonitored locations with the least realistic gradient. Kriging has difficulty distinguishing between multiple PAH fronts and plumes. The minimal gradation is influenced by the sparse data. Predictions made far from observed data therefore had a large associated variance. The sparse data were only able to pick up the coarsest of PAH gradients. The cokriging map is visually similar to the kriging map. The cross-covariance relationship between PAH and PM2.5 contributed little to the cokriging predictions (see Table S2). The prediction map becomes visibly different for the LR BME method. The gradient for the LR BME method falls more in line with a geographical pattern across the state. There is an increase in concentration in Eastern North Carolina compared with the kriging and cokriging maps. The LMF BME prediction method produces the lowest PAH concentrations across large sections of the state. Across all 4 methods, the relatively highest concentrations were found in Western North Carolina and concentrations become increasingly more refined across methods. The LMF map is the only map to show two different PAH concentration fronts: one in the western part of the state and another separate front in Eastern North Carolina.

Figure 1

Map of benzo(b)fluoranthrene (ng/m3). Maps of mean benzo(b)fluoranthrene concentration for North Carolina on April 16, 2005 across the 4 prediction methods: (a) kriging, (b) cokriging, (c) Linear Regression BME, (d) log-Mass Fraction BME. Square markers indicate observed data, circle markers indicate PAH estimates, X's mark known fires for that day with a 100 km buffer.

Few other studies have created maps of ambient PAH concentrations across a given area using geostatistical methods from observed data. These limited studies are due in part to the lack of observed data, much like the mapping scenario presented in this work and previous works.[13] One previous study fit a temporal trend comparing a few long-running PAH stations from the Great Lakes region of the US and a few stations across Europe. However, only a temporal trend was fit through a regression and a spatial interpolation was not conducted.[29] One of the few studies that created maps over a large area displayed benzo(a)pyrene across Europe for 1990, 2001 and 2005 using a transport model.[8] Another study creating maps of PAH across Europe utilized kriging to estimate benzo(a)pyrene for 2012 using two different chemical transport models as data.[15] A study in Portugal used observed PAH data extracted from lichen and created maps using kriging.[11,12] Land use regression models have also been used to estimate PAH.[30,31] Few studies have investigated PAH bound to PM.[32] The closest study to the LR BME method presented in this work used a monitoring campaign along with personal monitors to analyze PAH from PM2.5 in which predictions were made at unmonitored locations using kriging in Kaohsiung city, Taiwan.[14] A regression model with a variety of explanatory variables was then applied to PM2.5 data to predict PAH. With only a handful of observed PAH data taken throughout the year, the LR BME and LMF BME methods can create estimates with a corresponding uncertainty that is incorporated into the BME framework. Incorporating the PAH estimates added to the set of available data ultimately used for prediction, allowing for increased spatial variation. Of the BME methods, LMF BME was superior in terms of visually distinguishing spatial variations of mean predicted PAH concentrations across the state.

3.3 Cross-validation

A LOOCV analysis was performed across 2005 using the 4 prediction methods. Summary statistics were calculated showing performance for Total PAH (Table 2). Cross-validation statistics for all 9 PAHs can be found in Supplementary Information (Supplementary Table S3). For Total PAH, the magnitude of ME decreases from kriging to LMF BME. ME is negative across each prediction method, meaning that overall the methods under-predict observed Total PAH concentrations. ME is highest in magnitude for kriging and closest to zero for LMF BME. There is a 58.8% reduction in ME from LR BME to LMF BME. There is less variation in error from kriging to LMF BME as seen through a 26.7% reduction in VE from kriging to LMF BME. There is a consistent reduction in MSE across the 4 prediction methods. There is a 28.4% reduction in MSE from kriging to LMF BME. The squared correlation coefficient is similar for kriging, cokriging and LR BME and is highest for LMF BME. There is a 10.3% increase in r2 from LR BME to LMF BME. The performance statistics from kriging are similar to cokriging. This echoes the results seen in the prediction maps. Traditional incorporation of PM2.5 as a co-pollutant through cokriging adds little to the predictive capacity for PAH. Incorporating PM2.5 with the BME methods showed more substantial improvements in the cross-validation statistics, with the best performance obtained through the LMF BME method.

Table 2

Cross validation statistics. Leave-One-Out Cross Validation statistics for Total PAH (summation of the 9 PAHs) comparing observed and predicted concentrations across the 4 prediction methods for North Carolina in 2005. ME is Mean Error, VE is Variance of Error, RMSE is Root Mean Squared Error, MSE is Mean Squared Error and r2 is the Pearson correlation coefficient squared.

Statistic	Kriging	Cokriging	Linear Regression BME	log-Mass Fraction BME
ME (ng/m³)	-0.145	-0.137	-0.102	-0.042
VE (ng/m³)²	0.806	0.782	0.764	0.591
RMSE (ng/m³)	0.904	0.890	0.875	0.765
MSE (ng/m³)²	0.818	0.792	0.766	0.586
r² (unitless)	0.747	0.752	0.744	0.821
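The percentage changes quoted in the text can be reproduced from the Table 2 values. The short Python sketch below uses only numbers from Table 2; the dictionary layout and the helper name `percent_change` are illustrative choices.

```python
# Values copied from Table 2 (Total PAH).
table2 = {
    "ME":   {"Kriging": -0.145, "LR BME": -0.102, "LMF BME": -0.042},
    "VE":   {"Kriging": 0.806,  "LR BME": 0.764,  "LMF BME": 0.591},
    "RMSE": {"Kriging": 0.904,  "LR BME": 0.875,  "LMF BME": 0.765},
    "MSE":  {"Kriging": 0.818,  "LR BME": 0.766,  "LMF BME": 0.586},
    "r2":   {"Kriging": 0.747,  "LR BME": 0.744,  "LMF BME": 0.821},
}

def percent_change(old, new):
    """Percent change in magnitude from old to new (negative means a reduction)."""
    return 100.0 * (abs(new) - abs(old)) / abs(old)

me_change = percent_change(table2["ME"]["LR BME"], table2["ME"]["LMF BME"])
ve_change = percent_change(table2["VE"]["Kriging"], table2["VE"]["LMF BME"])
mse_change = percent_change(table2["MSE"]["Kriging"], table2["MSE"]["LMF BME"])
r2_change = percent_change(table2["r2"]["LR BME"], table2["r2"]["LMF BME"])

print(round(me_change, 1))   # -58.8 (ME, LR BME to LMF BME)
print(round(ve_change, 1))   # -26.7 (VE, kriging to LMF BME)
print(round(mse_change, 1))  # -28.4 (MSE, kriging to LMF BME)
print(round(r2_change, 1))   # 10.3 (r2, LR BME to LMF BME)
```

The RMSE row is consistent with the MSE row, since each RMSE equals the square root of the corresponding MSE to three decimals.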

The LMF BME method consistently outperformed the other comparison methods as seen visually through maps and through the LOOCV statistics. Of the 4 prediction methods, kriging performed the worst. Kriging predictions were driven exclusively by the observed data. Cokriging performed similarly to kriging. Cokriging is an intuitive choice for collocated, ambient, environmental parameters in a geostatistical setting. In the literature, to the best of our knowledge, cokriging has not been used to predict ambient PAH concentrations, making it an ideal candidate method to explore. In this work the cokriging cross-covariance is able to capture the relationship between PAH and PM2.5. However, as seen through predictive maps and through cross validation, the cokriging incorporation of PM2.5 contributes little in terms of predictive capacity. Linear regression is another intuitive choice with collocated data. The LR BME method shows a marked improvement visually and through estimation accuracy. The LR method is able to estimate PAH at PM2.5 space/time locations using an optimized neighborhood customized for each PAH. However, LR performed consistently worse than the LMF method. The LR method uses 2 parameters (i.e. β0 and β1) while the LMF method uses only one. We hypothesize that this difference in the number of parameters influences cross validation performance.

3.4 Probability of exceedance

In a geostatistical framework, predictions come in the form of a PDF with a corresponding mean and variance. With this PDF, the probability of exceeding a given value can be calculated. An annual benzo(a)pyrene concentration of 0.25 ng/m3 has been suggested in the United Kingdom.[8] With this standard in mind, the probability of exceeding this cutoff was calculated for annual benzo(a)pyrene concentrations on a fine grid in North Carolina in 2005 for each prediction method by taking the mean and variance of daily benzo(a)pyrene predictions (Fig. 2). Overall, PAH concentrations decrease across methods; thus the probability of exceeding the 0.25 ng/m3 cutoff in turn decreases from kriging to LMF BME. Across methods, the region of the state with the relatively highest probability of exceedance is maintained as Western North Carolina as well as the border with the US state of South Carolina. Across all prediction methods, the probability of exceedance remained relatively low with the maximum probability of exceedance remaining below 0.50. The area covered at the higher probabilities of exceedance increases across the 4 prediction methods (Table 3). The cokriging method had the lowest maximum probability of exceedance (i.e. 0.16) across the annual prediction locations with 54,432 km2 having a probability of exceedance ≥ 0.15. We see the BME methods were better able to differentiate areas of high and low probabilities of exceedance. The LMF BME was best able to distinguish the maximum probability of exceedance. Neither the kriging nor the cokriging map contains any area with a probability of exceedance ≥ 0.30. The LMF BME method has 2.5 times the area with ≥ 0.30 probability of exceedance compared to the LR BME method (i.e. 6,480 km2 and 2,592 km2, respectively). Through having more realistic ambient predictive gradients, the LMF BME method becomes an effective tool to identify areas of exceedance of different PAH concentrations.
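Given a predictive mean and variance at a grid location, the probability of exceedance is the upper-tail probability of the predictive PDF at the cutoff. The Python sketch below assumes a Gaussian predictive PDF, which is an assumption of this illustration rather than a statement about the BME posterior used in the study; the function name and the input numbers are hypothetical.

```python
from statistics import NormalDist

def probability_of_exceedance(mean, variance, cutoff=0.25):
    """P(X > cutoff) for a Gaussian predictive PDF with the given mean and variance.

    The variance must be strictly positive.
    """
    return 1.0 - NormalDist(mu=mean, sigma=variance ** 0.5).cdf(cutoff)

# Hypothetical annual prediction: mean 0.15 ng/m3, variance 0.01 (ng/m3)^2.
p = probability_of_exceedance(0.15, 0.01)
print(round(p, 3))  # 0.159, the upper tail one standard deviation above the mean
```

Applying the function at every grid location and summing the cell areas where the result meets a threshold would give area totals of the kind reported in Table 3.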

Figure 2

Probability of exceedance. Probability of annual benzo(a)pyrene exceeding 0.25 ng/m3 across North Carolina in 2005 as predicted by (a) kriging, (b) cokriging, (c) Linear Regression BME and (d) log-Mass Fraction BME.

Table 3

Area (in km2) covered corresponding to increasing probabilities of exceeding the average annual benzo(a)pyrene standard of 0.25 ng/m3 among the prediction locations in and around North Carolina in 2005.

Probability of Exceedance	≥0.10	≥0.15	≥0.20	≥0.25	≥0.30
Kriging	162,000	119,232	5,184	0	0
Cokriging	143,856	54,432	0	0	0
Linear Regression BME	164,592	139,968	36,288	10,368	2,592
log-Mass Fraction BME	156,816	116,640	28,512	15,552	6,480

3.5 Association with wildfires

The mean difference in PAH predictions (for the 9 PAHs and Total PAH) as calculated through the 4 prediction methods was found through a two-sample t-test comparing areas near (≤ 100 km) and far (> 100 km) from known wildfire locations predicted across all days with observed data in 2005 (Table 4). For the LMF method, all 9 PAHs and Total PAH showed a statistically significant difference between predictions near versus far from fires. For the LR method 6 PAHs and Total PAH showed a significant difference. For both kriging and cokriging 4 PAHs and Total PAH showed a significant difference greater than zero. Of those PAHs that showed a significant difference greater than zero, the LMF method had the largest differences across 8 PAHs (i.e. benzo(g,h,i)perylene being the exception) and Total PAH. Known fire locations for April 16, 2005 are marked along with a 100 km radial buffer surrounding each location (Fig. 1). Across prediction methods, PAH concentrations are higher within/near these buffers. Indeed, benzo(b)fluoranthrene (depicted in Fig. 1) was one of the 4 PAHs (along with Total PAH) that showed a significant, positive difference across all prediction methods.

Table 4

Mean difference in PAH near versus far from fires. 95% confidence intervals comparing the mean difference in predicted PAH near (within 100 km) versus far (> 100 km) from fires for each of the 9 PAHs and Total PAH across the 4 prediction methods. Units are in ng/m3.

PAH	Kriging	Cokriging	Linear Regression BME	log-Mass Fraction BME
benz(a)anthracene	(-4.94E-03,-2.17E-03)*	(-2.61E-03,-1.07E-05)*	(-1.26E-03,1.07E-03)	(1.57E-03,4.38E-03)*,#
chrysene	(-6.67E-03,-3.53E-03)*	(-3.74E-03,-8.28E-04)*	(-9.40E-04,1.67E-03)	(2.07E-03,5.39E-03)*,#
benzo(b)fluoranthrene	(3.98E-03,1.11E-02)*,#	(3.78E-03,1.10E-02)*,#	(1.09E-02,2.23E-02)*,#	(2.36E-02,3.02E-02)*,#
benzo(k)fluoranthrene	(3.14E-03,6.47E-03)*,#	(2.32E-03,5.01E-03)*,#	(5.27E-03,7.94E-03)*,#	(7.80E-03,1.08E-02)*,#
benzo(e)pyrene	(-2.92E-03,2.23E-03)	(-3.17E-03,1.71E-03)	(5.22E-03,9.80E-03)*,#	(1.83E-02,2.49E-02)*,#
benzo(a)pyrene	(-3.83E-03,1.84E-03)	(-6.24E-03,-8.42E-04)*	(2.23E-03,1.37E-02)*,#	(5.14E-03,1.02E-02)*,#
indeno(1,2,3-c,d)pyrene	(1.87E-02,3.05E-02)*,#	(1.73E-02,2.87E-02)*,#	(2.04E-02,3.24E-02)*,#	(4.79E-02,6.11E-02)*,#
benzo(g,h,i)perylene	(3.04E-02,4.27E-02)*,#	(2.54E-02,3.66E-02)*,#	(2.72E-02,4.06E-02)*,#	(3.13E-02,4.06E-02)*,#
dibenzo(a,h)anthracene	(-2.07E-02,-1.36E-02)*	(-1.71E-02,-1.07E-02)*	(-4.80E-03,2.16E-03)	(1.90E-03,8.77E-03)*,#
Total PAH	(2.28E-02,6.75E-02)*,#	(2.13E-02,6.57E-02)*,#	(6.29E-02,1.03E-01)*,#	(1.72E-01,2.30E-01)*,#

* mean difference is statistically significant (p-value ≤ 0.05)

# mean difference > 0

This work investigates ambient concentrations of a set of particle-bound PAHs. Ambient concentrations alone cannot distinguish sources. However, there are PAH ratios associated with certain sources. The diagnostic ratio of indeno(1,2,3-c,d)pyrene/(indeno(1,2,3-c,d)pyrene + benzo(g,h,i)perylene)=0.62 is associated with wood burning.[8] This ratio was calculated for March 5, 2005 data across all 4 prediction methods (Fig. 3). This day was chosen as one of the highest fire activity days for 2005, and thus, most likely to show an impact from fires. The ratio for kriging and cokriging remained under 0.62 across all prediction locations of the day. There is little variation of this ratio across the day for kriging and cokriging. This ratio increases and becomes closer in magnitude to 0.62 for the BME methods. There is more variation of this ratio for the LR BME method. We hypothesize that the LR BME method has better differentiation between PAH sources. LMF BME has the largest variation of the PAH diagnostic ratio, with the largest number of predictions near the 0.62 value. Both kriging and cokriging have ≤ 3.5% of prediction ratios for the day around 0.62 (i.e. 0.62 ± 0.05), LR BME has 5.6% of prediction ratios around 0.62 and LMF BME has 12% of prediction ratios around 0.62.

Figure 3

PAH ratios. Ratio of indeno(1,2,3-c,d)pyrene/(indeno(1,2,3-c,d)pyrene+benzo(g,h,i)perylene) on March 5, 2005 in North Carolina across the 4 prediction methods: (a) kriging, (b) cokriging, (c) Linear Regression BME, (d) log-Mass Fraction BME. Square markers indicate the ratio of observed data, circle markers indicate the ratio of PAH estimates, X's mark known fires for that day with a 100 km buffer.

Gathering information about wildfire smoke has become increasingly important as the number of large wildfires has increased in recent years.[33] Studies of the chronic health effects of wildfire smoke on firefighters and the general population are currently lacking or sparse in the literature.[34-36] Compared with the other prediction methods, the LMF BME method was better able to detect significantly higher PAH concentrations near known fire locations. Of the 4 prediction methods, LMF BME was the only one that showed statistically significant, positive differences around areas with fires for all 9 PAHs and Total PAH. Although fires differ in acreage burned and the same buffer size was used for all fires, the statistical significance implies an association. Depending on the acreage burned, the type of vegetation burned and the duration of the fire, the smoke produced may be long lasting and may undergo long-range transport. Smoke may have lingering effects past the control date of a fire. When the control date of fires is extended by one day, the kriging and cokriging methods have more PAHs with a statistically significant increase in concentrations near versus far from fires (Supplementary Tables S4-S7). Diagnostic ratios should not be used in isolation. However, when used alongside known fire locations, they can strengthen the association between PAH concentrations and their known sources.
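The near-versus-far comparison discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the text does not say how the 95% confidence intervals were computed, so the sketch assumes a normal-approximation interval with an unequal-variance standard error, and it assumes great-circle (haversine) distance for the 100 km buffer. All coordinates, concentrations and function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def split_near_far(points, fires, buffer_km=100.0):
    """Split predicted concentrations by whether any fire lies within buffer_km."""
    near, far = [], []
    for lat, lon, conc in points:
        is_near = any(
            haversine_km(lat, lon, f_lat, f_lon) <= buffer_km for f_lat, f_lon in fires
        )
        (near if is_near else far).append(conc)
    return near, far


def mean_difference_ci(near, far, level=0.95):
    """Normal-approximation CI for mean(near) - mean(far) (an assumed method)."""
    diff = mean(near) - mean(far)
    se = math.sqrt(variance(near) / len(near) + variance(far) / len(far))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


# Hypothetical fire location and (lat, lon, predicted PAH in ng/m3) points.
fires = [(35.0, -79.0)]
points = [
    (35.1, -79.1, 0.30), (35.3, -78.8, 0.34), (34.8, -79.2, 0.32),
    (36.2, -81.5, 0.20), (35.9, -82.0, 0.22), (36.0, -76.0, 0.18),
]

near, far = split_near_far(points, fires)
low, high = mean_difference_ci(near, far)
print(f"95% CI for near - far mean difference: ({low:.3f}, {high:.3f}) ng/m3")
```

Under these assumptions, an interval that excludes zero corresponds to the "*" marker in the tables, and an interval lying entirely above zero also corresponds to the "#" marker.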

3.6 Overall contributions

The LMF BME method allows for straightforward predictions of PAHs to be used for exposure assessments. There is a plethora of studies exploring the association between ambient PM2.5 and various health endpoints.[37-39] However, far fewer studies explore ambient PAH exposures and their associated health effects. Occupational inhalation exposures and associated health outcomes, including lung cancer, have been more thoroughly investigated than ambient exposures.[7] Few studies have investigated chronic ambient concentrations of PAHs. Many of the epidemiologic studies that have been conducted investigate respiratory outcomes such as lung cancer and pulmonary function.[7,15,40] However, these studies are small. The lack of studies of long-term ambient exposure to PAHs may be related to inadequate exposure data. Analyzing PM filters for specific PAHs can be very costly, making it difficult to obtain the larger amounts of observed data needed for exposure assessment.[9] The LMF BME method provides an efficient and cost-effective way to make use of minimal observed PAH data, and it can readily be used to fill this clear gap in the literature. Tied with corresponding health data, ambient predictions calculated through the LMF BME method could be used to assign exposure. Health metrics can then be calculated from the exposures. This opens the door to investigating possible health endpoints as well as assigning risk.

This work created the first maps of ambient PAH concentration across the US state of North Carolina using observed data through the LMF BME geostatistical method. This method developed a relationship between paired PAH and PM2.5 data in a manner that is parsimonious and cost-effective and that can be utilized in a data sparse environment. The LMF BME method outperforms more traditionally used geostatistical methods and has the ability to elucidate a significant association between PAH predictions and known fire locations.
The LMF BME method has the potential to be used to assign exposure in epidemiologic analyses to fill in the significant knowledge gap currently existing in the literature between ambient PAH exposures and potential health outcomes.

Supplementary Figure S1. Exhaustive validation search of optimal estimation neighborhood for (a) the Linear Regression method and (b) the log-Mass Fraction method for Total PAH displaying the MSE. The green “X” marks the lowest MSE ((ng/m3)^2).

Supplementary Table S1. Covariance model parameters for observed PAH data.

Supplementary Table S2. Cokriging covariance model parameters for observed PAH and PM2.5 data. C is in (ng/m3)^2, C is in (μg/m3)^2 and C is in (ng/m3) * (μg/m3).

Supplementary Table S3. Cross validation statistics for all 9 PAHs and Total PAH.

Supplementary Table S4. Mean difference in PAH near versus far from fires for kriging. 95% confidence intervals comparing the mean difference in predicted PAH near (within 100 km) versus far (> 100 km) from fires for each of the 9 PAHs and Total PAH for the kriging method where the control date of fires was extended 0-3 days. Units are in ng/m3. *mean difference is statistically significant (p-value ≤ 0.05), #mean difference > 0.

Supplementary Table S5. Mean difference in PAH near versus far from fires for cokriging. 95% confidence intervals comparing the mean difference in predicted PAH near (within 100 km) versus far (> 100 km) from fires for each of the 9 PAHs and Total PAH for the cokriging method where the control date of fires was extended 0-3 days. Units are in ng/m3. *mean difference is statistically significant (p-value ≤ 0.05), #mean difference > 0.

Supplementary Table S6. Mean difference in PAH near versus far from fires for linear regression BME. 95% confidence intervals comparing the mean difference in predicted PAH near (within 100 km) versus far (> 100 km) from fires for each of the 9 PAHs and Total PAH for the linear regression BME method where the control date of fires was extended 0-3 days. Units are in ng/m3. *mean difference is statistically significant (p-value ≤ 0.05), #mean difference > 0.

Supplementary Table S7. Mean difference in PAH near versus far from fires for log-mass fraction BME. 95% confidence intervals comparing the mean difference in predicted PAH near (within 100 km) versus far (> 100 km) from fires for each of the 9 PAHs and Total PAH for the log-mass fraction BME method where the control date of fires was extended 0-3 days. Units are in ng/m3. *mean difference is statistically significant (p-value ≤ 0.05), #mean difference > 0.
