Lianfa Li (1), Mariam Girguis (2), Frederick Lurmann (3), Nathan Pavlovic (3), Crystal McClure (3), Meredith Franklin (2), Jun Wu (4), Luke D Oman (5), Carrie Breton (2), Frank Gilliland (2), Rima Habre (6).
1. Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA; State Key Laboratory of Resources and Environmental Information System, Institute of Geographical Sciences and Natural Resources, Chinese Academy of Sciences, Beijing, China. Electronic address: lianfali@usc.edu.
2. Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA.
3. Sonoma Technology, Inc., Petaluma, CA, USA.
4. Program in Public Health, Susan and Henry Samueli College of Health Sciences, University of California, Irvine, CA, USA.
5. Goddard Space Flight Center, National Aeronautics and Space Administration, Greenbelt, MD, USA.
6. Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA. Electronic address: habre@usc.edu.
Abstract
INTRODUCTION: Estimating PM2.5 concentrations and their prediction uncertainties at a high spatiotemporal resolution is important for air pollution health effect studies. This is particularly challenging for California, which has high variability in natural (e.g., wildfires, dust) and anthropogenic emissions, meteorology, topography (e.g., desert surfaces, mountains, snow cover) and land use.
METHODS: Using ensemble-based deep learning with big data fused from multiple sources, we developed a PM2.5 prediction model with uncertainty estimates at a high spatial (1 km × 1 km) and temporal (weekly) resolution for a 10-year time span (2008-2017). We leveraged autoencoder-based full residual deep networks to model complex nonlinear interrelationships among PM2.5 emission, transport and dispersion factors and other influential features. These included remote sensing data (MAIAC aerosol optical depth (AOD), normalized difference vegetation index, impervious surface), MERRA-2 GMI Replay Simulation (M2GMI) output, wildfire smoke plume dispersion, meteorology, land cover, traffic, elevation, and spatiotemporal trends (geo-coordinates, temporal basis functions, time index). MAIAC AOD, one of the primary predictors of interest, has substantial missing data in California related to bright surfaces, cloud cover and other known interferences; missing MAIAC AOD observations were imputed and adjusted for relative humidity and vertical distribution. Wildfire smoke contribution to PM2.5 was also calculated through HYSPLIT dispersion modeling of smoke emissions derived from MODIS fire radiative power using the Fire Energetics and Emissions Research version 1.0 model.
RESULTS: The ensemble deep learning model for PM2.5 achieved an overall mean training RMSE of 1.54 μg/m³ (R²: 0.94) and test RMSE of 2.29 μg/m³ (R²: 0.87). The top predictors included M2GMI carbon monoxide mixing ratio in the bottom layer, temporal basis functions, spatial location, air temperature, MAIAC AOD, and PM2.5 sea salt mass concentration. In an independent test using three long-term AQS sites and one short-term non-AQS site, our model achieved a high correlation (>0.8) and a low RMSE (<3 μg/m³). Statewide predictions indicated that our model can capture the spatial distribution and temporal peaks in wildfire-related PM2.5. The coefficient of variation indicated the highest uncertainty over deciduous and mixed forests and open water land covers.
CONCLUSION: Our method can be generalized to other regions, including those having a mix of major urban areas, deserts, intensive smoke events, snow cover and complex terrain, where PM2.5 has previously been challenging to predict. Prediction uncertainty estimates can also inform further model development and measurement error evaluations in exposure and health studies.
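The abstract reports ensemble predictions, a coefficient of variation as the uncertainty measure, and RMSE and R² as accuracy metrics, but does not state how these were computed. The following is a minimal sketch of one common way to derive such quantities, assuming the ensemble is available as an array of per-member predictions. The function names (`ensemble_summary`, `rmse`, `r_squared`), the array layout, and the use of the sample standard deviation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def ensemble_summary(member_preds):
    """Summarize an ensemble of predictions per grid cell.

    member_preds: array-like of shape (n_members, n_cells), one row per
    ensemble member (hypothetical layout, assumed for illustration).
    Returns the ensemble mean, the sample standard deviation, and the
    coefficient of variation (std / mean) for each cell.
    """
    preds = np.asarray(member_preds, dtype=float)
    mean = preds.mean(axis=0)
    std = preds.std(axis=0, ddof=1)  # sample standard deviation across members
    # Coefficient of variation; left as NaN where the mean is not positive.
    cv = np.full_like(mean, np.nan)
    np.divide(std, mean, out=cv, where=mean > 0)
    return mean, std, cv


def rmse(observed, predicted):
    """Root mean squared error between observations and predictions."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (one common definition)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    # Three hypothetical ensemble members predicting two grid cells (μg/m³).
    members = [[10.0, 20.0], [12.0, 20.0], [14.0, 20.0]]
    mean, std, cv = ensemble_summary(members)
    print("mean:", mean, "std:", std, "cv:", cv)
    print("rmse:", rmse([11.0, 21.0], mean))
```

Under this definition, a cell where ensemble members disagree strongly relative to the predicted concentration has a high coefficient of variation, which is how a map of this quantity could flag land covers with less reliable predictions.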