Mengjie Wang1,2,3, Yanjun Wang1,2,3, Fei Teng1,2,3, Shaochun Li1,2,3, Yunhao Lin1,2,3, Hengfan Cai1,2,3.
Abstract
Rapid economic and social development has caused serious atmospheric environmental problems. The temporal and spatial distribution of PM2.5 concentrations has become an important research topic in monitoring sustainable social development. Based on NPP-VIIRS nighttime light images, meteorological data, and SRTM DEM data, this article builds a PM2.5 concentration estimation model for the Chang-Zhu-Tan urban agglomeration. First, the partial least squares method is used to analyze the correlation between PM2.5 concentration and nighttime light radiance, meteorological elements (temperature, relative humidity, and wind speed), and topographic elements (elevation, slope, and topographic undulation). Second, we construct seasonal and annual PM2.5 concentration estimation models, including multiple linear regression, random forest, support vector regression, and Gaussian process regression, with different factor sets. Finally, the accuracy of the resulting PM2.5 concentration estimation models for the Chang-Zhu-Tan urban agglomeration is analyzed, and the spatial distribution of the PM2.5 concentration is inverted. The results show that meteorological elements correlate most strongly with PM2.5 concentration and topographic elements most weakly. In terms of seasonal estimation, both the multiple linear regression and the machine learning estimation models perform worst in spring; the multiple linear regression models perform best in winter, and the machine learning models perform best at the annual scale. The study also found significant differences in the temporal and spatial distribution of PM2.5 concentrations.
The methods in this article overcome, to a certain extent, the high cost and spatial resolution limitations of traditional large-scale PM2.5 concentration monitoring, and can provide a reference for the study of PM2.5 concentration estimation and prediction based on satellite remote sensing technology.
Keywords: PM2.5 concentration estimation; machine learning; multisource data; partial least squares
Year: 2022 PMID: 35409987 PMCID: PMC8998965 DOI: 10.3390/ijerph19074306
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Location of the Chang-Zhu-Tan urban agglomeration.
Figure 2. Datasets used in this study. (a) SRTM DEM data and spatial distribution of the monitoring stations; (b) NPP-VIIRS nighttime light (NTL) image.
Important parameters of the various PM2.5 concentration estimation models based on machine learning.
| Model Parameters | Spring | Summer | Autumn | Winter | Annual |
|---|---|---|---|---|---|
| Model II smallest leaf | 12 | 4 | 12 | 12 | 12 |
| Model III kernel function | Linear | Linear | Linear | Linear | Quadratic |
| Model IV kernel function | Exponential | Exponential | Matern 5/2 | Exponential | Matern 5/2 |
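The parameter table above can be read as a configuration for the four model families named in the abstract. The following is a minimal sketch using scikit-learn, not the authors' implementation. It assumes that Model I is multiple linear regression, Model II is a random forest, Model III is support vector regression, and Model IV is Gaussian process regression; that "smallest leaf" corresponds to `min_samples_leaf`; that a "Quadratic" kernel can be approximated by a degree-2 polynomial kernel; and that an "Exponential" kernel corresponds to a Matern kernel with `nu=0.5`. The `PARAMS` dictionary, the `build_models` helper, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Values transcribed from the parameter table; the key names are illustrative.
# gpr_nu=0.5 stands for the exponential kernel, gpr_nu=2.5 for Matern 5/2.
PARAMS = {
    "spring": {"min_leaf": 12, "svr": "linear", "gpr_nu": 0.5},
    "summer": {"min_leaf": 4, "svr": "linear", "gpr_nu": 0.5},
    "autumn": {"min_leaf": 12, "svr": "linear", "gpr_nu": 2.5},
    "winter": {"min_leaf": 12, "svr": "linear", "gpr_nu": 0.5},
    "annual": {"min_leaf": 12, "svr": "quadratic", "gpr_nu": 2.5},
}


def build_models(period):
    """Return one unfitted estimator per model for the given period."""
    p = PARAMS[period]
    if p["svr"] == "quadratic":
        # Assumption: a degree-2 polynomial kernel approximates "Quadratic".
        svr = SVR(kernel="poly", degree=2, coef0=1.0)
    else:
        svr = SVR(kernel="linear")
    # A white-noise term is added so the GP does not interpolate noisy targets.
    gpr = GaussianProcessRegressor(
        kernel=Matern(nu=p["gpr_nu"]) + WhiteKernel(),
        normalize_y=True,
        random_state=0,
    )
    return {
        "Model I": LinearRegression(),
        "Model II": RandomForestRegressor(
            min_samples_leaf=p["min_leaf"], random_state=0
        ),
        "Model III": make_pipeline(StandardScaler(), svr),
        "Model IV": make_pipeline(StandardScaler(), gpr),
    }


# Synthetic stand-in for seven predictors and a PM2.5 target (not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 7))
y = 40 + 5 * X[:, 1] - 3 * X[:, 3] + rng.normal(scale=1.0, size=80)

models = build_models("annual")
predictions = {name: m.fit(X, y).predict(X) for name, m in models.items()}
```

Scaling the inputs before the kernel-based models is a design choice of this sketch, since the predictors in such a study have very different units.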
VIP scores of different factors for the PM2.5 concentration estimation.
| Factor | Spring | Summer | Autumn | Winter | Annual |
|---|---|---|---|---|---|
| | 1.138 | 0.464 | 0.366 | 0.302 | 0.249 |
| | 1.381 | 1.507 | 1.658 | 1.465 | 1.748 |
| | 1.157 | 1.449 | 0.530 | 0.662 | 0.178 |
| | 0.508 | 0.985 | 0.979 | 0.986 | 0.719 |
| | 0.943 | 0.526 | 0.742 | 1.384 | 1.907 |
| | 0.414 | 0.257 | 0.322 | 0.442 | 0.164 |
| | 0.723 | 0.249 | 0.175 | 0.283 | 0.091 |
R2 values of the PM2.5 concentration estimation model in the Chang-Zhu-Tan urban agglomeration.
| Model | Factor Set | Spring | Summer | Autumn | Winter | Annual |
|---|---|---|---|---|---|---|
| Model I | Factor set A | 0.36 | 0.81 | 0.76 | 0.89 | 0.82 |
| Model I | Factor set B | 0.31 | 0.79 | 0.75 | 0.88 | 0.82 |
| Model I | Factor set C | 0.25 | 0.78 | 0.75 | 0.85 | 0.82 |
| Model II | Factor set A | 0.17 | 0.65 | 0.72 | 0.79 | 0.90 |
| Model II | Factor set B | 0.16 | 0.66 | 0.72 | 0.80 | 0.92 |
| Model II | Factor set C | 0.07 | 0.71 | 0.67 | 0.80 | 0.91 |
| Model III | Factor set A | 0.23 | 0.69 | 0.69 | 0.77 | 0.88 |
| Model III | Factor set B | 0.20 | 0.55 | 0.66 | 0.75 | 0.90 |
| Model III | Factor set C | 0.13 | 0.67 | 0.69 | 0.73 | 0.90 |
| Model IV | Factor set A | 0.08 | 0.64 | 0.54 | 0.73 | 0.89 |
| Model IV | Factor set B | 0.07 | 0.63 | 0.64 | 0.72 | 0.90 |
| Model IV | Factor set C | 0.06 | 0.67 | 0.63 | 0.72 | 0.92 |
Root mean square errors of the PM2.5 concentration estimation model in the Chang-Zhu-Tan urban agglomeration.
| Model | Factor Set | Spring | Summer | Autumn | Winter | Annual |
|---|---|---|---|---|---|---|
| Model I | Factor set A | 4.48 | 3.74 | 6.06 | 7.75 | 11.80 |
| Model I | Factor set B | 4.64 | 3.88 | 6.11 | 8.11 | 11.85 |
| Model I | Factor set C | 4.85 | 3.94 | 6.15 | 8.91 | 11.90 |
| Model II | Factor set A | 5.14 | 5.12 | 6.79 | 11.06 | 8.65 |
| Model II | Factor set B | 5.19 | 5.49 | 6.72 | 10.40 | 7.73 |
| Model II | Factor set C | 5.50 | 4.64 | 7.10 | 10.69 | 8.25 |
| Model III | Factor set A | 4.94 | 4.76 | 7.19 | 11.58 | 9.85 |
| Model III | Factor set B | 5.05 | 6.30 | 7.30 | 11.77 | 8.73 |
| Model III | Factor set C | 5.34 | 4.98 | 6.90 | 12.45 | 8.75 |
| Model IV | Factor set A | 5.40 | 5.14 | 8.71 | 12.68 | 9.22 |
| Model IV | Factor set B | 5.44 | 5.69 | 7.57 | 12.30 | 8.67 |
| Model IV | Factor set C | 5.54 | 4.92 | 7.54 | 12.67 | 8.14 |
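The two accuracy measures reported in these tables, R2 and root mean square error, can be computed from paired measured and estimated concentrations. The sketch below assumes the usual coefficient-of-determination definition of R2 (one minus the ratio of residual to total sum of squares); the record does not state which definition the authors used. The function names and the four sample values are hypothetical.

```python
import math


def rmse(actual, estimated):
    """Root mean square error between measured and estimated values."""
    squared = [(a - e) ** 2 for a, e in zip(actual, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up station values for illustration only.
measured = [30.0, 40.0, 50.0, 60.0]
estimated = [32.0, 38.0, 53.0, 57.0]

print(round(rmse(measured, estimated), 3))       # 2.55
print(round(r_squared(measured, estimated), 3))  # 0.948
```

Because RMSE keeps the units of the target, it grows with the level and spread of the concentrations being estimated, which is worth keeping in mind when comparing seasons with different typical PM2.5 levels.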
Figure 3. Scatter plots of estimated and actual PM2.5 concentrations.
Figure 4. Inversion of seasonal PM2.5 concentration in 2018 in the Chang-Zhu-Tan urban agglomeration. AC means average PM2.5 concentration.
Figure 5. Scatter plots of estimated and actual PM2.5 concentrations in the four seasons.
Figure 6. Comparison of the estimated and measured PM2.5 concentrations, given the sample set sequence. The blue line represents the Level 1 standard, and the orange line represents the Level 2 standard. The Level 1 standard refers to a 24-h average PM2.5 concentration lower than 35 µg·m−3. The Level 2 standard refers to a 24-h average PM2.5 concentration lower than 75 µg·m−3.
Figure 7. Error distribution of the estimated PM2.5 concentration, given the sample set sequence. RE refers to the real error of each station in the four seasons. MRE refers to the mean real error of 47 stations in the four seasons.
Figure 8. Spatial distribution of low and high-error stations.