Allison P Patton, Wig Zamore, Elena N Naumova, Jonathan I Levy, Doug Brugge, John L Durant.
Abstract
Land use regression (LUR) models have been used to assess air pollutant exposure, but limited evidence exists on whether location-specific LUR models are applicable to other locations (transferability) or general models are applicable to smaller areas (generalizability). We tested transferability and generalizability of spatial-temporal LUR models of hourly particle number concentration (PNC) for Boston-area (MA, U.S.A.) urban neighborhoods near Interstate 93. Four neighborhood-specific regression models and one Boston-area model were developed from mobile monitoring measurements (34-46 days/neighborhood over one year each). Transferability was tested by applying each neighborhood-specific model to the other neighborhoods; generalizability was tested by applying the Boston-area model to each neighborhood. Both the transferability and generalizability of models were tested with and without neighborhood-specific calibration. Important PNC predictors (adjusted-R² = 0.24-0.43) included wind speed and direction, temperature, highway traffic volume, and distance from the highway edge. Direct model transferability was poor (R² < 0.17). Locally-calibrated transferred models (R² = 0.19-0.40) and the Boston-area model (adjusted-R² = 0.26, range: 0.13-0.30) performed similarly to neighborhood-specific models; however, some coefficients of locally calibrated transferred models were uninterpretable. Our results show that transferability of neighborhood-specific LUR models of hourly PNC was limited, but that a general model performed acceptably in multiple areas when calibrated with local data.
Year: 2015 PMID: 25867675 PMCID: PMC4440409 DOI: 10.1021/es5061676
Source DB: PubMed Journal: Environ Sci Technol ISSN: 0013-936X Impact factor: 9.028
Figure 1. Annual average PNC predicted by the land use regression models. The left panel shows PNC predictions from the Boston-area model. Annual average PNC predictions from the (a) Somerville, (b) Malden, (c) Chinatown, and (d) Dorchester neighborhood-specific models are shown in the right panels. The Boston Globe stationary monitoring station is marked with a black star.
Summary of Alternative Land Use Regression Models and Their Evaluation Criteria
| model | model building | predictions | observed ln(PNC) | predicted ln(PNC) | test | question addressed by the model |
|---|---|---|---|---|---|---|
| neighborhood | | | | | Leave-one-day-out cross-validation | How similar are neighborhood-specific PNC models for exposure assessment? |
| direct transfer | | | | | RMSE and correlation | Were site-specific models better than directly transferred models? |
| calibrated transfer | | | | | RMSE and correlation | Were site-specific models better than calibrated transferred models? |
| Boston-area (pooled data) | | | | Z | Leave-one-day-out cross-validation | Could a general model including all of the neighborhoods perform well overall? |
| Boston-area (applied to individual neighborhoods) | | | | | RMSE and correlation | Were the models generalizable? Were site-specific models better than the BA model? |
| Boston-area with neighborhood-specific calibration | | | | | RMSE and correlation | Was the locally calibrated BA model better than the BA model with calibration from all neighborhoods? |
Model building is the procedure of using known ln(PNC) values, Y, to estimate the regression parameters, β, for a given set of explanatory variables, X. Subscripts refer to the neighborhood used to develop a particular model (k), a new neighborhood where a model is applied (j), and pooled data or variables (p). Predictions of ln(PNC) made with the original calibration (Z) or with a new calibration for testing (W) carry two subscripts: the neighborhood where the model is applied, followed by the neighborhood where the model was developed. For example, the calibrated transfer model-building step predicts ln(PNC) in a new neighborhood (Y_j) from β fit using data from the new neighborhood and X, the explanatory variables selected from model building in neighborhood k but with values from neighborhood j. The predictions for the calibrated transfer model, W_jk, use the same β and X from the model-building step.
Predicted ln(PNC) was compared to observed ln(PNC) to test the model performance.
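The difference between direct and calibrated transfer described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a single explanatory variable (wind speed), simulated data, and hypothetical helper names (`fit_line`, `rmse`, `simulate`), whereas the models in the paper use many predictors.

```python
import random
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def rmse(obs, pred):
    """Root-mean-square error between observations and predictions."""
    return (sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)) ** 0.5

def simulate(n, b0, b1, seed):
    """Hypothetical neighborhood: ln(PNC) linear in wind speed plus noise."""
    rng = random.Random(seed)
    x = [rng.uniform(0.5, 8.0) for _ in range(n)]
    y = [b0 + b1 * xi + rng.gauss(0.0, 0.6) for xi in x]
    return x, y

# Neighborhood k (model developed here) and neighborhood j (model applied here).
x_k, y_k = simulate(500, 10.7, -0.18, seed=1)
x_j, y_j = simulate(500, 10.2, -0.07, seed=2)

# Model building in k: estimate beta_k from (X_k, Y_k).
b0_k, b1_k = fit_line(x_k, y_k)

# Direct transfer: keep beta_k unchanged and plug in X values from j.
z_jk = [b0_k + b1_k * xi for xi in x_j]

# Calibrated transfer: keep the variable selected in k, refit beta on j's data.
b0_jk, b1_jk = fit_line(x_j, y_j)
w_jk = [b0_jk + b1_jk * xi for xi in x_j]

rmse_direct = rmse(y_j, z_jk)
rmse_calibrated = rmse(y_j, w_jk)
print(f"direct transfer RMSE:     {rmse_direct:.2f}")
print(f"calibrated transfer RMSE: {rmse_calibrated:.2f}")
```

Because the calibrated coefficients are the least-squares fit to neighborhood j's own data, the calibrated RMSE in this sketch can never exceed the direct-transfer RMSE on the same data.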
Multivariate Neighborhood-Specific and Boston-Area Land Use Regression Models for ln(PNC)
| | Somerville | | Dorchester | | Chinatown | | Malden | | Boston-area | |
|---|---|---|---|---|---|---|---|---|---|---|
| model adjusted R2 | 0.42 | | 0.35 | | 0.23 | | 0.31 | | 0.26 | |
| variable | coeff | SE | coeff | SE | coeff | SE | coeff | SE | coeff | SE |
| (intercept) | 10.677 | 0.011 | 9.844 | 0.014 | 10.209 | 0.014 | 7.012 | 0.029 | 10.618 | 0.004 |
| spatial variables | ||||||||||
| within highway corridor | 0.244 | 0.006 | 0.292 | 0.014 | NA | NA | NA | NA | 0.219 | 0.005 |
| on a major road | 0.208 | 0.005 | 0.132 | 0.003 | NA | NA | 0.102 | 0.006 | 0.108 | 0.003 |
| upwind of I-93 | –0.192 | 0.005 | –0.449 | 0.004 | NA | NA | NA | NA | –0.247 | 0.002 |
| upwind of nearest major road | NA | NA | NA | NA | NA | NA | –0.012 | 0.005 | –0.047 | 0.002 |
| distance upwind of I-93, km | –0.213 | 0.006 | –0.204 | 0.004 | NA | NA | NA | NA | –0.247 | 0.001 |
| distance downwind of I-93, km | –0.464 | 0.007 | –0.626 | 0.005 | –0.373 | 0.014 | NA | NA | –0.314 | 0.001 |
| distance from nearest major road, km | –0.230 | 0.014 | NA | NA | NA | NA | NA | NA | –0.362 | 0.009 |
| distance from Dorchester Ave, km | NA | NA | –0.642 | 0.006 | NA | NA | NA | NA | NA | NA |
| distance from major intersection, km | NA | NA | NA | NA | –0.964 | 0.020 | –0.267 | 0.008 | NA | NA |
| meteorology | ||||||||||
| temperature, °C | –0.037 | 0.000 | –0.012 | 0.000 | –0.008 | 0.000 | –0.007 | 0.000 | –0.0192 | 0.0001 |
| humidity, % | NA | NA | NA | NA | 0.002 | 0.000 | NA | NA | NA | NA |
| wind speed (U), m/s | –0.182 | 0.002 | –0.179 | 0.001 | –0.071 | 0.001 | –0.113 | 0.002 | –0.100 | 0.001 |
| cosine of wind direction relative to I-93 | –0.029 | 0.003 | NA | NA | NA | NA | NA | NA | NA | NA |
| square of cosine of wind direction relative to southeast | 0.820 | 0.007 | NA | NA | NA | NA | 0.804 | 0.008 | NA | NA |
| East | NA | NA | –0.228 | 0.007 | NA | NA | NA | NA | NA | NA |
| N-ENE | NA | NA | 0.438 | 0.007 | NA | NA | NA | NA | NA | NA |
| West | NA | NA | 0.336 | 0.007 | NA | NA | NA | NA | NA | NA |
| sine of wind direction | NA | NA | NA | NA | 0.347 | 0.004 | NA | NA | NA | NA |
| wind direction ±15° from airport and downtown Boston | NA | NA | NA | NA | NA | NA | NA | NA | 0.400 | 0.003 |
| traffic and day of the week | ||||||||||
| low traffic (<7000 vph) | –0.103 | 0.006 | NA | NA | NA | NA | NA | NA | –0.204 | 0.003 |
| congestion (<64 km/h) | 0.181 | 0.005 | NA | NA | NA | NA | NA | NA | 0.022 | 0.003 |
| volume on I-93, 1000 vph | NA | NA | 0.138 | 0.001 | 0.012 | 0.001 | 0.177 | 0.002 | NA | NA |
| Monday | 0.398 | 0.010 | NA | NA | 0.496 | 0.008 | 1.823 | 0.015 | 0.297 | 0.003 |
| Tuesday | 0.569 | 0.008 | NA | NA | 0.373 | 0.006 | 1.645 | 0.014 | 0.521 | 0.003 |
| Wednesday | 0.530 | 0.008 | NA | NA | 0.379 | 0.006 | 1.457 | 0.013 | 0.501 | 0.003 |
| Thursday | 0.579 | 0.008 | NA | NA | 0.773 | 0.006 | 1.109 | 0.013 | 0.359 | 0.003 |
| Friday | 0.239 | 0.011 | NA | NA | 0.793 | 0.006 | 1.107 | 0.015 | 0.559 | 0.004 |
| Saturday | 0.504 | 0.008 | NA | NA | 0.018 | 0.006 | 0.459 | 0.016 | 0.043 | 0.004 |
| Weekday | NA | NA | 0.080 | 0.003 | NA | NA | NA | NA | NA | NA |
Variables in the model are statistically significant (p ≤ 0.001). Temporal variables are input on an hourly basis. NA = not applicable for this model.
Coeff is the coefficient estimate. The full model is the intercept plus the sum of products of the coefficients and their variable values.
SE is the standard error in the coefficient estimate.
These variables are categorical variables. All other variables are linear variables.
The wind categories for Dorchester are defined as Variable or Calm (reference), N-ENE (337.5°–67.5°), East (67.5°–180°), and West (180°–337.5°).
Major intersections are defined as either intersections with average vehicle delay of 20 or more seconds (Chinatown) or intersections adjacent to transit stations (Malden).
The reference for day of week is Sunday when all days are included individually or weekend days when only weekday vs weekend is included.
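As the notes above state, a prediction is the intercept plus the sum of each coefficient multiplied by its variable value. The sketch below applies that rule using the Chinatown column of the table. The coefficients are copied from the table; the input values, the dictionary keys, and the function name `predict_ln_pnc` are hypothetical and chosen only for illustration, and the exact variable definitions (for example, the reference direction for the wind-direction sine) are not given in this excerpt.

```python
import math

# Coefficients from the Chinatown column of the table above.
CHINATOWN = {
    "intercept": 10.209,
    "dist_downwind_i93_km": -0.373,
    "dist_major_intersection_km": -0.964,
    "temperature_c": -0.008,
    "humidity_pct": 0.002,
    "wind_speed_m_s": -0.071,
    "sin_wind_direction": 0.347,
    "i93_volume_1000_vph": 0.012,
}

# Day of week is categorical; Sunday is the reference level (coefficient 0).
CHINATOWN_DAY = {
    "Sunday": 0.0, "Monday": 0.496, "Tuesday": 0.373, "Wednesday": 0.379,
    "Thursday": 0.773, "Friday": 0.793, "Saturday": 0.018,
}

def predict_ln_pnc(coeffs, day_effects, values, day):
    """Intercept + day-of-week effect + sum of coefficient * variable value."""
    total = coeffs["intercept"] + day_effects[day]
    for name, value in values.items():
        total += coeffs[name] * value
    return total

# Hypothetical conditions for one hour at one location (illustrative only).
hour = {
    "dist_downwind_i93_km": 0.2,
    "dist_major_intersection_km": 0.1,
    "temperature_c": 15.0,
    "humidity_pct": 60.0,
    "wind_speed_m_s": 3.0,
    "sin_wind_direction": 0.5,
    "i93_volume_1000_vph": 8.0,
}

ln_pnc = predict_ln_pnc(CHINATOWN, CHINATOWN_DAY, hour, "Tuesday")
pnc = math.exp(ln_pnc)  # back-transform to particles/cm3
print(f"ln(PNC) = {ln_pnc:.4f}, PNC = {pnc:.0f} particles/cm3")
```

Because the model is fit to ln(PNC), each coefficient acts multiplicatively on PNC itself after back-transformation.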
Leave-One-Day-out Cross-Validation of Neighborhood-Specific and Boston-Area Land Use Regression Models of ln(PNC)
| model | nCV (ntotal) | adj-R2 | RMSE | prediction RMSE |
|---|---|---|---|---|
| Somerville | 39 (43) | 0.42 ± 0.01 | 0.64 ± 0.007 | 0.67 ± 0.21 |
| Dorchester | 31 (35) | 0.35 ± 0.01 | 0.63 ± 0.007 | 0.63 ± 0.21 |
| Chinatown | 45 (46) | 0.23 ± 0.01 | 0.69 ± 0.006 | 0.75 ± 0.26 |
| Malden | 33 (34) | 0.32 ± 0.01 | 0.76 ± 0.009 | 0.86 ± 0.30 |
| Boston-area | 153 (158) | 0.26 ± 0.003 | 0.74 ± 0.002 | 0.73 ± 0.27 |
Monitoring was conducted on ntotal days and cross-validation was possible for nCV days. Leave-one-day-out cross-validation (LOO) was performed by removing 1 day of measurements at a time, so there are nCV cross-validation models, each of which was built on ∼10 000 one-second PNC observations.
Each leave-one-day-out cross-validation result is reported as mean ± standard deviation. The LOO adjusted R2 and RMSE are for the model developed on the training data set with 1 day removed. Prediction RMSE was calculated as the error in hourly predictions for each point in each testing data set that consisted of the day that was removed.
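The leave-one-day-out procedure described above might look like the following sketch. It is illustrative only and rests on several assumptions: the data are simulated, the model has a single predictor, the prediction RMSE is computed per observation rather than per hourly prediction, and all names (`fit_line`, `r2_and_rmse`, `days`) are hypothetical.

```python
import random
from statistics import mean, stdev

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def r2_and_rmse(x, y, b0, b1):
    """R2 and RMSE of the line (b0, b1) evaluated on the data (x, y)."""
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    my = mean(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot, (ss_res / len(y)) ** 0.5

# Simulated monitoring campaign: 10 days, 50 observations per day,
# with a day-specific offset so that days differ from one another.
rng = random.Random(0)
days = {}
for day in range(10):
    day_shift = rng.gauss(0.0, 0.2)
    x = [rng.uniform(0.5, 8.0) for _ in range(50)]
    y = [10.5 - 0.15 * xi + day_shift + rng.gauss(0.0, 0.5) for xi in x]
    days[day] = (x, y)

adj_r2s, train_rmses, pred_rmses = [], [], []
for held_out in days:
    # Training set: every day except the held-out one.
    x_train = [xi for d, (xs, _) in days.items() if d != held_out for xi in xs]
    y_train = [yi for d, (_, ys) in days.items() if d != held_out for yi in ys]
    b0, b1 = fit_line(x_train, y_train)

    r2, train_rmse = r2_and_rmse(x_train, y_train, b0, b1)
    n, p = len(y_train), 1  # p = number of predictors
    adj_r2s.append(1 - (1 - r2) * (n - 1) / (n - p - 1))
    train_rmses.append(train_rmse)

    # Prediction RMSE: error on the day that was removed.
    x_test, y_test = days[held_out]
    pred_rmses.append(r2_and_rmse(x_test, y_test, b0, b1)[1])

print(f"adj-R2          {mean(adj_r2s):.2f} ± {stdev(adj_r2s):.3f}")
print(f"RMSE            {mean(train_rmses):.2f} ± {stdev(train_rmses):.3f}")
print(f"prediction RMSE {mean(pred_rmses):.2f} ± {stdev(pred_rmses):.2f}")
```

Holding out whole days rather than random observations keeps measurements from the same day out of both the training and testing sets, which is why the prediction RMSE varies more across folds than the training RMSE.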
Evaluation of Performance of Neighborhood-Specific and Boston-Area Land Use Regression Models of ln(PNC) when Directly Transferred to Somerville, Dorchester, Chinatown, and Malden
| area applied | |||||
|---|---|---|---|---|---|
| model | statistic | Somerville | Dorchester | Chinatown | Malden |
| Somerville | SLR | 0.15 | 0.19 | 0.15 | |
| R2 | 0.04 | 0.09 | 0.12 | ||
| RMSE | 0.64 | 0.89 | 0.83 | 0.88 | |
| Dorchester | SLR | 0.12 | –0.03 | ||
| R2 | 0.12 | 0.06 | <0.01 | ||
| RMSE | 0.82 | 0.63 | 0.81 | 1.10 | |
| Chinatown | SLR | 0.12 | 0.05 | 0.01 | |
| R2 | 0.07 | 0.01 | <0.01 | ||
| RMSE | 0.83 | 0.82 | 0.69 | 1.01 | |
| Malden | SLR | 0.09 | |||
| R2 | 0.09 | 0.01 | 0.10 | ||
| RMSE | 1.30 | 1.17 | 1.16 | 0.76 | |
| Boston-area | SLR | 0.18 | 0.12 | ||
| R2 | 0.13 | 0.16 | |||
| RMSE | 0.71 | 0.70 | 0.74 | 0.84 | |
Each row represents a neighborhood-specific or Boston-area (BA) model.
Each column represents the neighborhood where the measurements were predicted.
The reported statistics are SLR = simple linear regression between predictions and measurements formatted as (slope)x + (intercept), R2 = R2 from SLR, RMSE = root-mean-square error between measurements and predictions. Values are bold for R2 > 0.2 and slope > 0.2.
Cells on the diagonal represent the performance of models when applied to the neighborhood where they were developed.
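The three statistics reported in this table can be computed as in the sketch below. It assumes that the simple linear regression takes measurements as x and predictions as y, which matches the "(slope)x + (intercept)" format but is not stated explicitly in this excerpt. The function name `evaluate` and the five data points are hypothetical.

```python
from statistics import mean

def evaluate(measured, predicted):
    """Return (slope, intercept, R2, RMSE) for predictions vs. measurements.

    Assumption: predictions (y) are regressed on measurements (x).
    """
    mx, my = mean(measured), mean(predicted)
    sxx = sum((m - mx) ** 2 for m in measured)
    syy = sum((p - my) ** 2 for p in predicted)
    sxy = sum((m - mx) * (p - my) for m, p in zip(measured, predicted))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    rmse = (sum((m - p) ** 2 for m, p in zip(measured, predicted))
            / len(measured)) ** 0.5
    return slope, intercept, r2, rmse

# Hypothetical ln(PNC) values, for illustration only.
measured = [10.0, 10.5, 11.0, 9.5, 10.8]
predicted = [10.2, 10.4, 10.6, 10.1, 10.3]

slope, intercept, r2, rmse = evaluate(measured, predicted)
print(f"SLR: ({slope:.2f})x + ({intercept:.2f}), R2 = {r2:.2f}, RMSE = {rmse:.2f}")
```

R2 and slope describe how well the predictions track the measurements, while RMSE also penalizes a constant offset, so a transferred model can have a reasonable R2 and still a large RMSE.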
Evaluation of Performance of Neighborhood-Specific and Boston-Area Land Use Regression Models of ln(PNC) when Locally Calibrated in Somerville, Dorchester, Chinatown, and Malden
| area applied | |||||
|---|---|---|---|---|---|
| model | statistic | Somerville | Dorchester | Chinatown | Malden |
| Somerville | R2 | ||||
| RMSE | 0.64 | 0.65 | 0.70 | 0.77 | |
| Dorchester | R2 | 0.19 | |||
| RMSE | 0.66 | 0.63 | 0.71 | 0.81 | |
| Chinatown | R2 | ||||
| RMSE | 0.70 | 0.68 | 0.69 | 0.79 | |
| Malden | R2 | ||||
| RMSE | 0.65 | 0.67 | 0.70 | 0.76 | |
| Boston-area | R2 | ||||
| RMSE | 0.65 | 0.67 | 0.71 | 0.78 | |
Each row represents a neighborhood-specific or Boston-area (BA) model.
Each column represents the neighborhood where the measurements were predicted.
The reported statistics are R2 = R2 and slope from the simple linear regression between predictions and measurements, RMSE = root-mean-square error between measurements and predictions. The slope equals the R2 because both variables are in the same units, and the measurement contains both systematic and random components whereas the locally calibrated prediction contains only the systematic component. The covariance between the two therefore equals the variance of the predictions, so the slope of predictions regressed on measurements and the squared correlation coefficient (R2) both reduce to the ratio of prediction variance to measurement variance. Values are bold for R2 > 0.2.
Cells on the diagonal represent the performance of models when applied to the neighborhood where they were developed.
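The equality of slope and R2 noted for this table can be written out as a short derivation. This is a sketch that assumes the local calibration is an ordinary least-squares fit with an intercept. The symbols are introduced here for illustration and are not the paper's notation: y is the measured ln(PNC), ŷ the locally calibrated prediction, and e = y − ŷ the residual.

```latex
\begin{align}
\operatorname{Cov}(\hat{y}, e) &= 0
  && \text{(property of a least-squares fit with intercept)} \\
\operatorname{Cov}(\hat{y}, y) &= \operatorname{Cov}(\hat{y}, \hat{y} + e)
  = \operatorname{Var}(\hat{y}) \\
\text{slope of } \hat{y} \text{ on } y
  &= \frac{\operatorname{Cov}(\hat{y}, y)}{\operatorname{Var}(y)}
  = \frac{\operatorname{Var}(\hat{y})}{\operatorname{Var}(y)} \\
R^2 = \operatorname{Corr}(\hat{y}, y)^2
  &= \frac{\operatorname{Cov}(\hat{y}, y)^2}{\operatorname{Var}(\hat{y})\,\operatorname{Var}(y)}
  = \frac{\operatorname{Var}(\hat{y})}{\operatorname{Var}(y)}
\end{align}
```

The last two lines give the same ratio, which is why a single number serves as both slope and R2 for locally calibrated models. The identity does not hold for directly transferred models, whose coefficients were not fit to the data being predicted.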
Figure 2. Predicted (PRED) vs measured (MEAS) ln(PNC) (particles/cm3) for all four areas using the Boston-area model calibrated using data from all neighborhoods (transfer PRED, top) and recalibrated with only neighborhood-specific data (Recal PRED, bottom). Shaded contours enclose 25, 50, 75, and 96% of the data.