Gongbo Chen1, Shanshan Li1, Luke D Knibbs2, N A S Hamm3, Wei Cao4, Tiantian Li5, Jianping Guo6, Hongyan Ren4, Michael J Abramson1, Yuming Guo7.

1. Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia.
2. Department of Epidemiology and Biostatistics, School of Public Health, The University of Queensland, Brisbane, Australia.
3. Geospatial Research Group and School of Geographical Sciences, Faculty of Science and Engineering, University of Nottingham, Ningbo, China.
4. Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China.
5. National Institute of Environmental Health Sciences, Chinese Center for Disease Control and Prevention, Beijing, China.
6. State Key Laboratory of Severe Weather, Chinese Academy of Meteorological Sciences, Beijing, China.
7. Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia. Electronic address: yuming.guo@monash.edu.
Abstract
BACKGROUND: Machine learning algorithms can offer very high predictive ability. However, no study has used machine learning to estimate historical concentrations of PM2.5 (particulate matter with aerodynamic diameter ≤ 2.5 μm) at a daily time scale in China at a national level.

OBJECTIVES: To estimate daily concentrations of PM2.5 across China during 2005-2016.

METHODS: Daily ground-level PM2.5 data were obtained from 1479 stations across China during 2014-2016. Data on aerosol optical depth (AOD), meteorological conditions and other predictors were downloaded. A random forests model (a non-parametric machine learning algorithm) and two traditional regression models were developed to estimate ground-level PM2.5 concentrations. The best-fit model was then used to estimate the daily concentrations of PM2.5 across China at a resolution of 0.1° (≈10 km) during 2005-2016.

RESULTS: The daily random forests model showed much higher predictive accuracy than the two traditional regression models, explaining the majority of spatial variability in daily PM2.5 [10-fold cross-validation (CV) R2 = 83%, root mean squared prediction error (RMSE) = 28.1 μg/m3]. At the monthly and annual time scales, the explained variability of average PM2.5 increased up to 86% (RMSE = 10.7 μg/m3 and 6.9 μg/m3, respectively).

CONCLUSIONS: Taking advantage of a novel application of a modeling framework and the most recent ground-level PM2.5 observations, the machine learning method showed higher predictive ability than methods used in previous studies.

CAPSULE: A random forests approach can be used to estimate historical exposure to PM2.5 in China with high accuracy.
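The two accuracy measures reported in the abstract, 10-fold CV R2 and RMSE, can be illustrated with a minimal sketch. It is not the authors' pipeline: a one-variable least-squares line stands in for the random forests model, the data are synthetic, and the helper names (`k_fold_indices`, `fit_line`, `cross_validate`) are hypothetical. R2 is assumed here to be 1 - SS_res/SS_tot computed on out-of-fold predictions; the paper may define it differently.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for the real model)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_validate(xs, ys, k=10, seed=0):
    """Return (R2, RMSE) computed from out-of-fold predictions."""
    preds = [0.0] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        # Fit only on the other k-1 folds, then predict the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a + b * xs[i]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(ys))


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic predictor and response, for illustration only (not study data).
    aod = [rng.uniform(0.1, 1.5) for _ in range(500)]
    pm25 = [20 + 60 * x + rng.gauss(0, 10) for x in aod]
    r2, rmse = cross_validate(aod, pm25)
    print(f"CV R2 = {r2:.2f}, RMSE = {rmse:.1f}")
```

Because every observation is predicted by a model that never saw it during fitting, the resulting R2 and RMSE describe out-of-sample accuracy rather than goodness of fit on the training data.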