Tianyu Zhang, Guannan Geng, Yang Liu, Howard H Chang.
Abstract
Bayesian additive regression tree (BART) is a recent statistical method that combines ensemble learning and nonparametric regression. BART is constructed under a probabilistic framework that also allows for model-based prediction uncertainty quantification. We evaluated the application of BART in predicting daily concentrations of four fine particulate matter (PM2.5) components (elemental carbon, organic carbon, nitrate, and sulfate) in California during the period 2005 to 2014. We demonstrate in this paper how BART can be tuned to optimize prediction performance and how to evaluate variable importance. Our BART models included, as predictors, a large suite of land-use variables, meteorological conditions, satellite-derived aerosol optical depth parameters, and simulations from a chemical transport model. In cross-validation experiments, BART demonstrated good out-of-sample prediction performance at monitoring locations (R² from 0.62 to 0.73). More importantly, prediction intervals associated with concentration estimates from BART showed good coverage probability at locations with and without monitoring data. In our case study, major PM2.5 components could be estimated with good accuracy, especially when collocated PM2.5 total mass observations were available. In conclusion, BART is an attractive approach for modeling ambient air pollution levels, especially for its ability to provide uncertainty in estimates that may be useful for subsequent health impact and health effect analyses.
Keywords: Bayesian model; Community Multiscale Air Quality (CMAQ); aerosol optical depth; machine learning; particulate matter; regression trees
Year: 2020; PMID: 34322279; PMCID: PMC8315111; DOI: 10.3390/atmos11111233
Source DB: PubMed; Journal: Atmosphere (Basel); ISSN: 2073-4433; Impact factor: 2.686
Tenfold ordinary, spatial, and spatial cluster cross-validation (CV) results using Bayesian additive regression trees (BARTs) for predicting fine particulate matter (PM2.5) components elemental carbon (EC), organic carbon (OC), sulfate (SO4), and nitrate (NO3), with and without using PM2.5 total mass as a predictor. All models include meteorology, land-use variables, Community Multiscale Air Quality (CMAQ) simulations, and fractional aerosol optical depth (AOD) with variable selection implemented.
| CV type | Component | R² (without PM2.5) | RMSE (without PM2.5) | Cvg95 (without PM2.5) | R² (with PM2.5) | RMSE (with PM2.5) | Cvg95 (with PM2.5) |
|---|---|---|---|---|---|---|---|
| Ordinary | EC | 0.67 | 0.42 | 0.95 | 0.78 | 0.35 | 0.95 |
| Ordinary | OC | 0.62 | 1.84 | 0.96 | 0.84 | 1.18 | 0.95 |
| Ordinary | SO4 | 0.73 | 0.56 | 0.95 | 0.80 | 0.49 | 0.96 |
| Ordinary | NO3 | 0.65 | 1.53 | 0.95 | 0.80 | 1.17 | 0.95 |
| Spatial | EC | 0.54 | 0.50 | 0.93 | 0.63 | 0.45 | 0.93 |
| Spatial | OC | 0.44 | 2.26 | 0.93 | 0.74 | 1.54 | 0.92 |
| Spatial | SO4 | 0.70 | 0.59 | 0.95 | 0.77 | 0.52 | 0.95 |
| Spatial | NO3 | 0.59 | 1.66 | 0.95 | 0.71 | 1.40 | 0.93 |
| Spatial cluster | EC | 0.51 | 0.52 | 0.94 | 0.64 | 0.44 | 0.93 |
| Spatial cluster | OC | 0.27 | 2.66 | 0.91 | 0.69 | 1.66 | 0.91 |
| Spatial cluster | SO4 | 0.61 | 0.67 | 0.93 | 0.70 | 0.58 | 0.93 |
| Spatial cluster | NO3 | 0.50 | 1.83 | 0.93 | 0.72 | 1.38 | 0.93 |
RMSE: root-mean-square error; Cvg95: empirical coverage probability of the 95% prediction intervals.
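The two summary metrics defined above can be computed from held-out observations as in the following minimal sketch. The function names and the toy numbers are hypothetical and are not taken from the study; the sketch assumes each held-out observation comes with a point prediction and the lower and upper bounds of its 95% prediction interval.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def coverage(observed, lower, upper):
    """Fraction of observations that fall inside their prediction interval."""
    if not (len(observed) == len(lower) == len(upper)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return hits / len(observed)


# Hypothetical held-out values, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 6.0]
lower = [0.5, 1.5, 2.5, 5.0]
upper = [1.5, 2.5, 3.5, 7.0]

print(rmse(obs, pred))              # 1.0
print(coverage(obs, lower, upper))  # 0.75
```

A Cvg95 value close to 0.95 indicates that the nominal 95% intervals are well calibrated on held-out data.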
Posterior mean and 95% posterior interval of BART variance parameters from models with and without PM2.5 as a predictor. The terminal-node variance parameter describes the variability in terminal nodes across trees, and σ² describes the residual variability not explained by the ensemble of trees.
| Component | Terminal-node variance (without PM2.5) | σ² (without PM2.5) | Terminal-node variance (with PM2.5) | σ² (with PM2.5) |
|---|---|---|---|---|
| EC | 0.19 (0.15, 0.23) | 0.14 (0.13, 0.14) | 0.19 (0.17, 0.23) | 0.09 (0.09, 0.09) |
| OC | 8.88 (8.07, 10.5) | 2.21 (2.17, 2.25) | 5.01 (4.50, 5.54) | 0.91 (0.90, 0.93) |
| SO4 | 0.51 (0.43, 0.62) | 0.23 (0.22, 0.24) | 0.35 (0.39, 0.46) | 0.17 (0.16, 0.17) |
| NO3 | 7.07 (6.26, 8.21) | 1.08 (1.06, 1.11) | 7.32 (6.64, 8.75) | 0.71 (0.70, 0.72) |
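To show where the two variance parameters sit in the model, the standard sum-of-trees formulation of BART can be written as below. This is a sketch of the usual textbook form, not necessarily the exact specification used in the paper; the symbol $\sigma_\mu^2$ for the terminal-node variance, the number of trees $m$, and the indexing are assumed notation.

```latex
\begin{align}
y_i &= \sum_{j=1}^{m} g(x_i; T_j, M_j) + \varepsilon_i,
  \qquad \varepsilon_i \sim N(0, \sigma^2), \\
\mu_{jk} &\sim N(0, \sigma_\mu^2),
  \qquad M_j = \{\mu_{j1}, \ldots, \mu_{j b_j}\},
\end{align}
```

Here $T_j$ is the structure of the $j$-th tree, $M_j$ collects the values at its $b_j$ terminal nodes, and $g(x_i; T_j, M_j)$ returns the terminal-node value assigned to predictor vector $x_i$. Under this reading, a smaller $\sigma^2$ in the models with PM2.5 means less residual variability is left unexplained by the trees.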
Figure 1. R² of spatial cross-validation (CV) results for predicting PM2.5 components elemental carbon (EC), organic carbon (OC), sulfate (SO4), and nitrate (NO3), comparing the inclusion of only Multi-Angle Imaging Spectroradiometer (MISR) fractional AOD, only CMAQ simulations, or both AOD and CMAQ. All models contain meteorology and land-use variables.
Figure 2. BART variable importance (proportion in trees) of individual AOD fractional components for predicting PM2.5 components elemental carbon (EC), organic carbon (OC), sulfate (SO4), and nitrate (NO3), under different predictor sets (with MISR fractional AOD, with AOD and CMAQ simulations, or with AOD and PM2.5 total mass). All models include meteorology and land-use predictors.
Figure 3. Estimated 2010 annual average of elemental carbon (EC), organic carbon (OC), sulfate, and nitrate in California. The prediction standard errors are for the annual averages. Concentration is given in μg/m³.
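The "proportion in trees" importance measure named in the Figure 2 caption can be illustrated with a short sketch. It assumes a common convention: for each posterior draw, count how many splitting rules use each predictor, convert the counts to shares, and average the shares over draws. The source does not spell out the exact definition, so this convention, the function name, and the predictor names are all assumptions.

```python
def inclusion_proportions(split_counts):
    """Average, over posterior draws, of each predictor's share of splitting rules.

    split_counts: list of dicts, one per posterior draw, mapping a predictor
    name to the number of splitting rules that use it in that draw.
    """
    names = sorted({name for draw in split_counts for name in draw})
    totals = dict.fromkeys(names, 0.0)
    used_draws = 0
    for draw in split_counts:
        n_splits = sum(draw.values())
        if n_splits == 0:
            continue  # a draw with no splits carries no information
        used_draws += 1
        for name, count in draw.items():
            totals[name] += count / n_splits
    if used_draws == 0:
        raise ValueError("no draw contains any splitting rule")
    return {name: totals[name] / used_draws for name in names}


# Hypothetical split counts for two posterior draws, for illustration only.
draws = [{"aod": 2, "wind": 2}, {"aod": 3, "wind": 1}]
props = inclusion_proportions(draws)
print(props)  # {'aod': 0.625, 'wind': 0.375}
```

Because the shares sum to one within each draw, the averaged proportions also sum to one, which makes them comparable across predictor sets of different sizes only with care.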