Hongbin Dai1, Guangqiu Huang1, Jingjing Wang2, Huibin Zeng1, Fangyu Zhou3.
Abstract
Fine particulate matter (PM2.5) has a continuing impact on the environment, climate change and human health. To improve the accuracy of PM2.5 estimation and obtain a continuous spatial distribution of PM2.5 concentration, this paper proposes a LUR-GBM model based on land-use regression (LUR), the Kriging method and LightGBM (light gradient boosting machine). Firstly, PM2.5 concentration data from monitoring stations in the Chinese study region were combined with data on land use type, meteorology, topography, vegetation index, population density, traffic and pollution sources to establish a PM2.5 mass concentration estimation method based on the LUR-GBM model and to model the spatial distribution of PM2.5 in China. Secondly, the performance of the LUR-GBM model was evaluated by ten-fold cross-validation based on samples, stations and time. Finally, the results of the proposed model were compared with those of the back propagation neural network (BPNN), deep neural network (DNN), random forest (RF), XGBoost and LightGBM models. The results show that the prediction accuracy of the LUR-GBM model is better than that of the other models, with an R2 of 0.964 (spring), 0.91 (summer), 0.967 (autumn) and 0.98 (winter) for the individual seasons and 0.976 for the 2016–2021 average. The LUR-GBM model therefore has good applicability in simulating the spatial distribution of PM2.5 concentrations in China. The spatial distribution of PM2.5 concentrations in China is clearly high in the east and low in the west and is strongly influenced by topographical factors. Seasonally, mean concentrations are low in summer and high in winter. The results of this study can provide a scientific basis for the prevention and control of regional PM2.5 pollution in China and can also provide new ideas for acquiring data on the spatial distribution of PM2.5 concentrations within cities.
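
The three cross-validation schemes named in the abstract (by samples, by stations and by time) differ only in what is held out together. The following is a minimal sketch of that idea, not the authors' implementation: the helper `kfold_by_key`, the toy `records` and the field names `station` and `day` are all hypothetical.

```python
import random
from collections import defaultdict


def kfold_by_key(records, key, k=10, seed=0):
    """Split record indices into k folds so that all records sharing
    the same key value end up in the same fold."""
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[key(idx, rec)].append(idx)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)  # reproducible random assignment
    folds = [[] for _ in range(k)]
    for i, g in enumerate(keys):
        folds[i % k].extend(groups[g])
    return folds


# Hypothetical toy data: 20 stations observed on 30 days each.
records = [{"station": s, "day": d} for s in range(20) for d in range(30)]

# Sample-based: every record is its own group.
sample_folds = kfold_by_key(records, key=lambda i, r: i)
# Station-based: a whole station is held out at once.
station_folds = kfold_by_key(records, key=lambda i, r: r["station"])
# Time-based: a whole day is held out at once.
time_folds = kfold_by_key(records, key=lambda i, r: r["day"])
```

Holding out whole stations or whole days tests how well a model generalises to unseen locations or dates, which is typically a harder test than holding out random samples.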
Keywords: LightGBM; PM2.5; land-use regression; remote sensing retrieval; spatial and temporal characteristics
Year: 2022 PMID: 35627828 PMCID: PMC9141263 DOI: 10.3390/ijerph19106292
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
Figure 1. Distribution of PM2.5 ground monitoring stations in China.
Classification and description of independent variables.
| Variable Type | Variable Name | Unit | Variable Description |
|---|---|---|---|
| Land type | cro | % | Cropland |
| | for | % | Forest |
| | gra | % | Grass |
| | wat | % | Water |
| | ind | % | Industrial and residential |
| | sem | % | Seminatural |
| Terrain and landforms | altitude | m | Altitude |
| Population | pop | people | Population |
| Road traffic | hig | m | Highway |
| | maj | m | Major road |
| | hm | m | Sum of highway and major road |
| | min | m | Minor road |
| Meteorological elements | GST | °C | 0 cm surface temperature |
| | SSD | h | Sunshine hours |
| | PRS | hPa | Pressure |
| | TEM | °C | Temperature |
| | RHU | % | Relative humidity |
| | PRE | mm | Precipitation |
| | WIN | m/s | Wind speed |
Figure 2. LUR-GBM model structure diagram.
Results of bivariate correlation analysis between PM2.5 concentration and impact factors.
| Independent Variable | Pearson Correlation | Sig. | Independent Variable | Pearson Correlation | Sig. |
|---|---|---|---|---|---|
| cro | 0.343 | 0.003 | pop | 0.310 | 0.021 |
| wat | −0.059 | 0.002 | altitude | −0.559 | 0.000 |
| for | −0.379 | 0.000 | GST | 0.178 | 0.000 |
| gra | −0.299 | 0.000 | SSD | 0.018 | 0.000 |
| ind | 0.322 | 0.000 | PRS | 0.302 | 0.000 |
| sem | −0.134 | 0.000 | TEM | 0.523 | 0.000 |
| hig | −0.084 | 0.000 | RHU | −0.215 | 0.001 |
| maj | 0.187 | 0.002 | PRE | −0.346 | 0.004 |
| hm | 0.177 | 0.000 | WIN | 0.415 | 0.000 |
| min | 0.125 | 0.002 | | | |
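
For readers unfamiliar with the statistic reported in the table above, a Pearson correlation coefficient can be computed as in the following minimal sketch. The function name `pearson_r` and the toy altitude and PM2.5 values are hypothetical and are not taken from the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical toy values: PM2.5 falling as altitude rises.
altitude = [50.0, 400.0, 1200.0, 2500.0, 3600.0]
pm25 = [62.0, 48.0, 35.0, 22.0, 15.0]
r = pearson_r(altitude, pm25)  # negative, like the sign reported for altitude
```

A negative coefficient indicates that PM2.5 tends to fall as the variable increases, and a positive one that it tends to rise.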
Comparison of results of various models.
| Model | R2 (samples) | RMSE (samples) | MAE (samples) | R2 (sites) | RMSE (sites) | MAE (sites) | R2 (time) | RMSE (time) | MAE (time) |
|---|---|---|---|---|---|---|---|---|---|
| BPNN | 0.76 | 11.27 | 8.35 | 0.65 | 11.26 | 9.34 | 0.56 | 13.28 | 9.69 |
| DNN | 0.84 | 10.33 | 8.05 | 0.78 | 11.09 | 8.86 | 0.77 | 10.43 | 7.67 |
| RF | 0.86 | 9.19 | 6.08 | 0.81 | 11.03 | 7.46 | 0.79 | 11.27 | 8.03 |
| XGBoost | 0.88 | 7.34 | 4.79 | 0.83 | 10.54 | 6.78 | 0.81 | 9.86 | 6.93 |
| LightGBM | 0.91 | 6.56 | 4.56 | 0.85 | 8.32 | 5.76 | 0.83 | 7.86 | 5.49 |
| LUR-GBM | 0.98 | 6.43 | 4.17 | 0.91 | 7.46 | 5.01 | 0.89 | 7.07 | 4.95 |
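
The three scores in the table above can be computed from paired observed and predicted concentrations as in the following minimal sketch. It assumes R2 is the coefficient of determination (the record does not state which definition the authors used). The function name `regression_metrics` and the toy values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2 (coefficient of determination), RMSE and MAE."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(o - p) for o, p in zip(observed, predicted)) / n,
    }


# Hypothetical toy concentrations, not taken from the study.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 38.0]
metrics = regression_metrics(observed, predicted)
```

RMSE and MAE are in the same unit as the concentrations, so lower is better, while R2 closer to 1 indicates a better fit.
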
Figure 3. Six-model scatter point density map.
Figure 4. Scatter density plots of annual mean concentrations for the six models.
Figure 5. Scatter density map based on LUR-GBM model for all seasons in 2021.
Figure 6. PM2.5 concentration simulation 2016–2021 scatter density map.
Figure 7. Spatial distribution of quarterly inversions based on the LUR-GBM model for 2021.
Figure 8. Simulated annual average distribution of PM2.5 concentrations based on the LUR-GBM model.
Figure 9. Scatter plot of model results for the 10 cities in the heavily polluted areas based on the LUR-GBM model.