Lu Bai1, Jianzhou Wang2, Xuejiao Ma3, Haiyan Lu4.
Abstract
Air pollution is defined as a phenomenon that harms the ecological system and the normal conditions of human existence and development when certain substances in the atmosphere exceed a certain concentration. In the face of increasingly serious environmental pollution problems, scholars have conducted a large body of related research, in which the forecasting of air pollution has been of paramount importance. As a precaution, the air pollution forecast is the basis for taking effective pollution control measures, and accurate forecasting of air pollution has become an important task. Extensive research indicates that the methods of air pollution forecasting can be broadly divided into three classical categories: statistical forecasting methods, artificial intelligence methods, and numerical forecasting methods. More recently, some hybrid models have been proposed, which can improve the forecast accuracy. To provide a clear perspective on air pollution forecasting, this study reviews the theory and application of those forecasting models. In addition, based on a comparison of different forecasting methods, the advantages and disadvantages of some forecasting methods are also provided. This study aims to provide an overview of air pollution forecasting methods for easy access and reference by researchers, which will be helpful in further studies.
Keywords: air pollution forecast; artificial intelligence methods; forecasting models; hybrid models; numerical forecast methods; statistical methods
Year: 2018 PMID: 29673227 PMCID: PMC5923822 DOI: 10.3390/ijerph15040780
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. The construction of this paper.
List of assessment methods.
| Types | Main Equations | Meaning of Variables |
|---|---|---|
| Market value method | | |
| Opportunity cost method | | |
| Engineering cost method | | |
Figure 2. The flowchart of the assessment methods.
Figure 3. The current status of air pollution research.
Nomenclature of methods.
| Abbreviation | Explanation | Abbreviation | Explanation |
|---|---|---|---|
| ADMS | Atmospheric Dispersion Modelling System | GM | Gray model |
| AI | Artificial intelligence | GCA | Gray correlation analysis |
| ANN | Artificial neural network | GRNN | General regression neural networks |
| ANF | Adaptive neuro-fuzzy | HF | Hybrid forecast |
| ARIMA | Autoregressive integrated moving average | HS | Hybrid system |
| ANFIS | Adaptive neural network fuzzy inference system | ICEEMD | Improved complementary ensemble empirical mode decomposition |
| BPNN | Back-propagation neural networks | KF | Kalman filter |
| CAMx | Comprehensive Air Quality Model with Extensions | MLP | Multi-layer Perceptron |
| CALPUFF | California Puff model | MLR | Multiple linear regression |
| CALMET | California Meteorological Model | MM5 | Mesoscale Model 5 |
| CS | Cuckoo search | PCR | Principal component regression |
| CMAQ | Community Multi-scale Air Quality | PCA | Principal component analysis |
| CEEMD | Complete ensemble empirical mode decomposition | PP | Projection pursuit model |
| CERC | Cambridge Environmental Research Consultants | RM | Rolling mechanism |
| DEA | Data Envelopment Analysis | SVM | Support vector machine |
| EMD | Empirical mode decomposition | SVR | Support vector regression |
| EEMD | Ensemble empirical mode decomposition | SWT | Stationary wavelet transform |
| FCM | Fuzzy c–Means algorithm | SSA | Singular spectrum analysis |
| FTS | Fuzzy time series | WOA | Whale optimization algorithm |
| FFNN | Feed-forward neural networks | WRF | Weather Research and Forecasting Model |
| FFMLP | Feed-forward multi-layer perceptron | WRF-Chem | Weather Research and Forecasting Model coupled with Chemistry |
| GA | Genetic algorithm | | |
The definitions and formulas of indexes involved in this paper.
| Metric | Definition | Equation |
|---|---|---|
| MAE | The mean absolute error of | |
| MSE | The mean squared error of | |
| RMSE | The square root of average of the error squares | |
| NMSE | The normalized average of the squares of the errors | |
| MAPE | The average of | |
| IA | The index of agreement of forecasting results | |
| R | The correlation coefficient | |
| AE | The absolute error of forecasting results | |
| FB | The fractional bias of N forecasting results | |
| IOA | The index of agreement | |
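
The equation column of the table above did not survive extraction. As a rough guide, the sketch below computes several of the listed metrics using their common textbook definitions. These definitions are an assumption and may differ in detail from the paper's own equations; the function name `forecast_metrics` is hypothetical. NMSE and FB are left out because their normalization and sign conventions vary between studies.

```python
import math


def forecast_metrics(actual, forecast):
    """Return common forecast-error metrics (textbook definitions, assumed)."""
    n = len(actual)
    errors = [f - a for a, f in zip(actual, forecast)]

    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    mse = sum(e * e for e in errors) / n       # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error
    # Mean absolute percentage error; requires non-zero actual values.
    mape = 100.0 * sum(abs(e / a) for a, e in zip(actual, errors)) / n

    mean_a = sum(actual) / n
    mean_f = sum(forecast) / n

    # Pearson correlation coefficient R between actual and forecast values.
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, forecast))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    r = cov / math.sqrt(var_a * var_f)

    # Index of agreement in Willmott's form (assumed to match IA/IOA here).
    denom = sum((abs(f - mean_a) + abs(a - mean_a)) ** 2
                for a, f in zip(actual, forecast))
    ioa = 1.0 - sum(e * e for e in errors) / denom

    return {"MAE": mae, "MSE": mse, "RMSE": rmse,
            "MAPE": mape, "R": r, "IOA": ioa}


if __name__ == "__main__":
    observed = [10.0, 20.0, 30.0, 40.0]    # made-up values for illustration
    predicted = [12.0, 18.0, 33.0, 39.0]
    for name, value in forecast_metrics(observed, predicted).items():
        print(f"{name}: {value:.4f}")
```

Lower MAE, MSE, RMSE and MAPE indicate a better forecast, while R and IOA approach 1 for a perfect forecast, which matches the ideal values reported for the ANN-MLP model later in this record.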
Transformation of the nonlinear regression and linear regression.
| Types | Nonlinear Function | Transformation | Linear Function |
|---|---|---|---|
| Hyperbolic function | | | |
| Power function | | | |
| Exponential function | | | |
| Logarithmic function | | | |
| S curve type | | | |
| Parabolic type | | | |
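
The functions and transformations in the table above were lost in extraction. To illustrate the general idea, the sketch below linearizes two of the listed types using their usual textbook forms, which are assumed here rather than taken from the paper: a power function y = a·x^b becomes ln y = ln a + b·ln x, and an exponential function y = a·e^(bx) becomes ln y = ln a + b·x. After the transformation, ordinary least squares recovers the parameters. All function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_power(xs, ys):
    """Fit y = a * x**b through ln y = ln a + b * ln x (needs x > 0, y > 0)."""
    intercept, slope = fit_line([math.log(x) for x in xs],
                                [math.log(y) for y in ys])
    return math.exp(intercept), slope


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) through ln y = ln a + b * x (needs y > 0)."""
    intercept, slope = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), slope


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    a, b = fit_power(xs, [2.0 * x ** 1.5 for x in xs])
    print(f"power fit: a={a:.3f}, b={b:.3f}")
    a, b = fit_exponential(xs, [3.0 * math.exp(0.5 * x) for x in xs])
    print(f"exponential fit: a={a:.3f}, b={b:.3f}")
```

Note that fitting in the transformed space minimizes the error of ln y rather than of y, so the result can differ from a direct nonlinear fit when the data are noisy.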
Forecast accuracy of possible SARIMA model.
| Study Areas | SARIMA | MAPE | MAE | MSE | RMSE |
|---|---|---|---|---|---|
| Pasir Gudang | (0,1,1)(0,1,1)₁₂ | 11.08 | 5.39 | 37.76 | 6.14 |
| | (0,1,1)(1,1,0)₁₂ | 11.08 | 5.77 | 44.50 | 6.67 |
| Johor Bahru | (1,1,0)(1,1,0)₁₂ | 15.28 | 7.06 | 76.05 | 8.72 |
| | (1,1,0)(0,1,1)₁₂ | 9.99 | 4.12 | 21.90 | 4.68 |
| | (0,1,1)(1,1,0)₁₂ | 19.13 | 8.87 | 120.69 | 10.99 |
| | (0,1,1)(0,1,1)₁₂ | 9.77 | 4.22 | 23.82 | 4.88 |
| Muar | (1,1,0)(0,1,1)₁₂ | 12.20 | 5.42 | 49.13 | 7.10 |
| | (0,1,1)(2,1,0)₁₂ | 11.32 | 5.10 | 38.62 | 6.21 |
| | (0,1,1)(0,1,1)₁₂ | 10.44 | 4.84 | 33.49 | 5.79 |
Division of values.
| Grade | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Range of | (0 ≤ | (0.2 ≤ | (0.4 ≤ | (0.6 ≤ | (0.8 ≤ |
PP regression forecast result.
| Actual Type | Forecast Type | Absolute Error | Relative Error |
|---|---|---|---|
| 2 | 2.399 | 0.399 | 19.9% |
| 3 | 5.632 | 2.632 | 87.7% |
| 4 | 4.298 | 0.298 | 7.5% |
| 5 | 5.439 | 0.439 | 8.8% |
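
The two error columns in the table above follow directly from the first two columns. The short sketch below recomputes them, assuming the usual definitions (absolute error = |forecast − actual|, relative error = absolute error / actual); the function name `error_summary` is hypothetical.

```python
def error_summary(actual, forecast):
    """Return (absolute error, relative error in percent) for one forecast."""
    absolute = abs(forecast - actual)
    relative_pct = 100.0 * absolute / abs(actual)
    return absolute, relative_pct


if __name__ == "__main__":
    # (actual type, forecast type) pairs copied from the table above.
    rows = [(2, 2.399), (3, 5.632), (4, 4.298), (5, 5.439)]
    for actual, forecast in rows:
        absolute, relative_pct = error_summary(actual, forecast)
        print(f"actual={actual} forecast={forecast:.3f} "
              f"abs={absolute:.3f} rel={relative_pct:.2f}%")
```

The recomputed values agree with the tabulated ones to within rounding of the last digit. The grade-3 sample stands out: its forecast of 5.632 is off by more than two grades.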
Forecast accuracy of ANN of pollutants.
| Pollutants | Station 1 MAE | Station 1 RMSE | Station 2 MAE | Station 2 RMSE | Station 3 MAE | Station 3 RMSE | Station 4 MAE | Station 4 RMSE |
|---|---|---|---|---|---|---|---|---|
| SO2 | 0.0674 | 0.0910 | 0.0524 | 0.0929 | 0.0386 | 0.0636 | 0.0512 | 0.0870 |
| PM10 | 0.0428 | 0.0631 | 0.0476 | 0.0615 | 0.0485 | 0.0740 | 0.0494 | 0.0872 |
Comparisons result with different forecasting methods.
| Study Areas | Methods | MAE | MSE | RMSE |
|---|---|---|---|---|
| Pasir Gudang | SARIMA | 5.39 | 37.76 | 6.14 |
| | FTS | 5.88 | 53.43 | 7.31 |
| | ANN | 3.87 | 32.09 | 5.66 |
| Johor Bahru | SARIMA | 4.12 | 21.90 | 4.68 |
| | FTS | 5.21 | 33.82 | 5.82 |
| | ANN | 2.70 | 12.79 | 3.58 |
| Muar | SARIMA | 4.84 | 33.49 | 5.79 |
| | FTS | 3.49 | 18.44 | 4.29 |
| | ANN | 3.29 | 18.05 | 4.25 |
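
One simple way to read the comparison above is to pick, for each study area, the method with the lowest RMSE. The sketch below does this with the RMSE values copied from the table; the selection rule and the function name `best_method` are illustrative assumptions, not a procedure stated in the source.

```python
# RMSE values copied from the comparison table above.
rmse_results = {
    "Pasir Gudang": {"SARIMA": 6.14, "FTS": 7.31, "ANN": 5.66},
    "Johor Bahru": {"SARIMA": 4.68, "FTS": 5.82, "ANN": 3.58},
    "Muar": {"SARIMA": 5.79, "FTS": 4.29, "ANN": 4.25},
}


def best_method(rmse_by_method):
    """Return the method name with the smallest RMSE."""
    return min(rmse_by_method, key=rmse_by_method.get)


best = {area: best_method(scores) for area, scores in rmse_results.items()}

if __name__ == "__main__":
    for area, method in best.items():
        print(f"{area}: {method} (RMSE {rmse_results[area][method]})")
```

By this criterion ANN is selected in all three areas, although in Muar its margin over FTS is small (4.25 versus 4.29).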
Comparison of the forecasting performances using different models.
| Model | Air Pollutants | MAPE | RMSE |
|---|---|---|---|
| W-BPNN | PM10 | 15.277 | 15.391 |
| | SO2 | 15.886 | 8.269 |
| | NO2 | 16.544 | 2.621 |
| BPNN | PM10 | 31.266 | 23.624 |
| | SO2 | 22.119 | 12.716 |
| | NO2 | 35.030 | 5.406 |
Short summary of commonly used wavelet.
| Wavelet | Main Equations | Description |
|---|---|---|
| Haar wavelet | | The Haar function is the earliest wavelet used in wavelet analysis and also the simplest. The function itself is a step function |
| Mexican Hat wavelet | | The Mexican Hat wavelet is the second derivative of the Gaussian function (up to sign) |
| Morlet wavelet | | The Morlet wavelet is not orthogonal and has no compact support, so it satisfies only the conditions of a continuous wavelet and cannot be used for discrete or orthogonal wavelet transforms |
| Daubechies wavelet | | Assuming, |
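
Wavelet-based models such as W-BPNN decompose a pollutant series into smoother and more detailed sub-series before forecasting. As a minimal sketch of that decomposition step, the code below applies one level of the orthonormal Haar transform, the simplest wavelet in the table above, and then reconstructs the original series. The choice of Haar, the single decomposition level and the function names are assumptions made for illustration; the source does not specify them here.

```python
import math


def haar_dwt(signal):
    """One-level orthonormal Haar transform of an even-length sequence.

    Returns (approximation, detail) coefficient lists of half the length.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale
              for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt and return the reconstructed sequence."""
    scale = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / scale)
        out.append((a - d) / scale)
    return out


if __name__ == "__main__":
    series = [42.0, 40.0, 55.0, 61.0, 58.0, 30.0, 33.0, 35.0]  # made-up values
    approx, detail = haar_dwt(series)
    print("approximation:", [round(v, 3) for v in approx])
    print("detail:", [round(v, 3) for v in detail])
    print("reconstructed:", [round(v, 3) for v in haar_idwt(approx, detail)])
```

The approximation coefficients carry the slowly varying trend and the detail coefficients carry the pairwise fluctuations. In a hybrid scheme each sub-series would typically be forecast separately and the forecasts recombined through the inverse transform.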
Forecast accuracy of SVM of SO2 and PM10.
| Pollutants | Station 1 MAE | Station 1 RMSE | Station 2 MAE | Station 2 RMSE | Station 3 MAE | Station 3 RMSE | Station 4 MAE | Station 4 RMSE |
|---|---|---|---|---|---|---|---|---|
| SO2 | 0.0477 | 0.0840 | 0.0491 | 0.0866 | 0.0266 | 0.0498 | 0.0358 | 0.0602 |
| PM10 | 0.0393 | 0.0606 | 0.0341 | 0.0518 | 0.0468 | 0.0739 | 0.0420 | 0.0756 |
Forecast accuracy in testing period of FTS.
| Study Areas | MAE | MSE | RMSE |
|---|---|---|---|
| Pasir Gudang | 5.88 | 53.43 | 7.31 |
| Johor Bahru | 5.21 | 33.82 | 5.82 |
| Muar | 3.49 | 18.44 | 4.29 |
Figure 4. The flow chart of fuzzy identification.
Figure 5. The process diagram of WRF–CALMET–CALPUFF modeling system.
Figure 6. Structures of numerical forecast methods.
Comparison of MM5 and WRF model.
| Item | MM5 Model | WRF Model |
|---|---|---|
| Vertical coordinate | Terrain-following height coordinates | Terrain-following mass coordinates |
| Conservation | Not necessarily conservative | Conserves mass, momentum and scalar quantities |
| Time integration | Leapfrog integration scheme | Third-order Runge-Kutta integration scheme |
| Horizontal advection | Second-order centered difference scheme | Fifth-order upwind difference scheme |
| Damping filter | Fourth-order smoothing | Not required |
| Typical time step | 3 times the grid spacing | 6 times the grid spacing |
Recent studies on air pollution forecasting using three dimensional model.
| Method | Pollutant | Country | Inputs | Ref. |
|---|---|---|---|---|
| WRF-Chem | PM10 | Poland | Meteorological data, emission data | [ |
| Models-3/CMAQ | O3 | United States | Meteorological information, emission rates from sources | [ |
| CMAQ-MOS | PM10, NO2 | China | Wind field (U, V), temperature field (Ts), relative humidity (RH) | [ |
| CMAQ-ANNs | PM10, SO2 | China | Wind field (U, V), temperature field (Ts), relative humidity (RH), concentrations of PM2.5, PM10, SO2, NO2, O3 | [ |
| WRF-ADMS | Perfluoromethylcyclohexane | Tunisia | Initial and boundary conditions, topography, land use and soil data, exit diameter, release point height, flow rate, temperature, hourly averaged meteorological data | [ |
| Coupled WRF-SFIRE with WRF-Chem | Fire smoke | United States | Fuel categories, FINN emission factors, | [ |
| CALPUFF-WRF | SO2 | Sultan | Land use categories, terrain elevations, surface and upper air meteorological observations or meteorological fields | [ |
| WRF-Chem | O3 | United States | No detailed description | [ |
| WRF/Chem-MADRID | O3, PM2.5 | United States | No detailed description | [ |
| CALPUFF | Total suspended particulate (TSP) | Israel | Temperature, relative humidity, barometric pressure, 10 min average wind speed and direction, cloud cover, topographic data | [ |
| AERMOD | Total suspended particulate (TSP) | Israel | Meteorological data (temperature, relative humidity, barometric pressure, 10 min average wind speed and direction) from two sites, cloud cover, topographic data | [ |
Statistical performance measures for ANN–MLP model.
| Statistical Measures | Ideal Value | Training Value | Validation Value |
|---|---|---|---|
| R | 1 | 0.89 | 0.91 |
| IOA | 1 | 0.99 | 0.98 |
| NMSE | 0 | 0.016 | 0.017 |
| FB | 0 | 0.001 | −0.021 |
Results for forecast of the average concentration of PM10 for the next day.
| Stations | Clustering Algorithms | Time Window | Number of Clusters | MAE | MSE |
|---|---|---|---|---|---|
| CRUZ ROJA (CR) | | 1 | 8 | 0.0207 | 0.00085 |
| | FCM | 1 | 7 | 0.0208 | 0.00083 |
| Nativitas (NA) | | 1 | 2 | 0.0230 | 0.00087 |
| | FCM | 2 | 5 | 0.2031 | 0.00095 |
| DIF (DF) | | 3 | 8 | 0.0280 | 0.00134 |
| | FCM | 1 | 3 | 0.0257 | 0.00113 |
Performance indicators for the developed forecast models.
| Stations | Metric | FFMLP | GA-MLP | MLPnomet | MLR |
|---|---|---|---|---|---|
| Station 1 | MAE | 14.03 | 15.36 | 18.91 | 17.46 |
| | RMSE | 20.28 | 22.39 | 27.87 | 26.68 |
| | R | 0.78 | 0.73 | 0.53 | 0.59 |
| | IA | 0.87 | 0.83 | 0.65 | 0.72 |
| Station 2 | MAE | 14.18 | 14.48 | 16.99 | 17.37 |
| | RMSE | 19.36 | 19.26 | 22.47 | 23.90 |
| | R | 0.70 | 0.65 | 0.48 | 0.53 |
| | IA | 0.80 | 0.79 | 0.63 | 0.65 |
| Station 3 | MAE | 19.08 | 20.55 | 27.49 | 24.53 |
| | RMSE | 26.06 | 28.70 | 38.11 | 35.14 |
| | R | 0.80 | 0.73 | 0.43 | 0.55 |
| | IA | 0.88 | 0.83 | 0.56 | 0.64 |
| Station 4 | MAE | 7.68 | 7.54 | 10.25 | 11.94 |
| | RMSE | 12.35 | 12.16 | 16.62 | 17.06 |
| | R | 0.82 | 0.83 | 0.54 | 0.55 |
| | IA | 0.89 | 0.90 | 0.65 | 0.65 |
The performance of forecast model.
| Stations | CS-BPANN AE | CS-BPANN MAPE | EEMD-BPANN AE | EEMD-BPANN MAPE | CS-EEMD-BPANN AE | CS-EEMD-BPANN MAPE |
|---|---|---|---|---|---|---|
| Station 1 | 1.71 | 11.27% | - | - | 1.583 | 9.37% |
| Station 2 | 15.45 | 18.53% | 13.82 | 17.56% | 13.86 | 15.78% |
| Station 3 | 28.56 | 41.04% | 28.16 | 40.59% | 27.64 | 36.98% |
GCA is first used to identify the major factors influencing PM, and the gray relational order between PM and the potential factors is examined. The forecasting result is improved by 24%, 16%, 16% and 13% for the different strategies. The developed model can be used at sites with different characteristics. The proposed CS-EEMD-BPANN method is more stable than BPANN and EEMD-BPANN.
Forecast results of ICEEMD-SVM-WOA model in three study areas.
| Study Areas | PM2.5 MAE | PM2.5 MAPE | PM10 MAE | PM10 MAPE | SO2 MAE | SO2 MAPE | NO2 MAE | NO2 MAPE | CO MAE | CO MAPE | O3 MAE | O3 MAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Taiyuan | 3.197 | 9.204 | 5.517 | 6.689 | 1.497 | 7.831 | 1.765 | 5.614 | 0.024 | 2.820 | 3.392 | 4.225 |
| Harbin | 1.781 | 2.260 | 3.203 | 7.457 | 0.533 | 9.351 | 2.420 | 7.236 | 0.023 | 2.921 | 3.430 | 7.900 |
| Chongqing | 2.900 | 8.795 | 5.263 | 10.311 | 1.160 | 13.219 | 2.882 | 8.265 | 0.049 | 5.005 | 4.350 | 11.514 |
A short summary of recent research on the application of hybrid systems (HS) in the field of air pollution.
| Author | Main Contribution |
|---|---|
| Chen et al. [ | Combined numerical forecasting (WRF) with statistical analysis (temporal synoptic index) to forecast high-PM10 concentrations in Beijing. This hybrid forecast system forecasts high-PM pollution events more accurately than current forecast methods. It combines the strengths of various methods while avoiding the disadvantages found when statistical forecast methods are used alone. |
| Zhou et al. [ | Established a hybrid EEMD-GRNN model to forecast the concentration of pollutants in Xi’an, which was shown to be superior to other conventional models. |
| Qin et al. [ | Proposed the CS-EEMD-BPANN model for forecasting PM concentrations in Beijing, Shanghai, Guangzhou and Lanzhou. The forecasting result is improved, and the method is more stable than BPNN and EEMD-BPANN. |
| Qin et al. [ | Used the Apriori algorithm to mine the spatial and temporal associations of intercity |
| Wang et al. [ | Used HANN, HSVM and Taylor expansion forecasting models in Taiyuan. The innovation of this approach is that it makes full and valid use of the useful residual information when the input variables are incomplete. |
| Feng et al. [ | 1. Used a trajectory-based geographic parameter as an extra input to the ANN model; |
| Xu et al. [ | Proposed the ICEEMD-SVM-WOA model and the FE model. This system not only forecasts the concentration of air pollutants but also evaluates the effectiveness of the new forecast system with a fuzzy evaluation method. |
| Wongsathan et al. [ | Proposed a fundamental hybrid forecast model that can improve the performance of the forecast models; exogenous variables and modifications of the hybrid algorithm may also be considered. |
The description of three geographic models.
| Model | Description |
|---|---|
| Single-site neighborhood model | The main idea of this model is to use the air pollution index of one or more neighboring regions as the input variables of the forecast area. |
| Two-site neighborhood model | This model considers two neighboring districts. The rationale for this model is that using more predictor variables should achieve higher accuracy. |
| Distance-based model | In this model, the weighted average of air pollutant levels is calculated according to the distance between each neighboring district and the forecast district. The model is based on the idea that the effect of a neighboring district's air pollutant levels is inversely proportional to the distance between the two districts. |
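
The distance-based model can be sketched as an inverse-distance weighted average. The weighting formula below (weight = 1 / distance) is an assumption that follows the "inversely proportional" description in the table; the exact weights used in the cited work are not given here, and the function name and sample values are hypothetical.

```python
def distance_weighted_level(neighbors):
    """Inverse-distance weighted average of neighboring pollutant levels.

    neighbors: list of (pollutant_level, distance_km) pairs with distance > 0.
    """
    if not neighbors:
        raise ValueError("at least one neighboring district is required")
    weights = [1.0 / distance for _, distance in neighbors]
    weighted_sum = sum(w * level for w, (level, _) in zip(weights, neighbors))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # Made-up example: two neighboring districts at 10 km and 30 km.
    neighbors = [(80.0, 10.0), (120.0, 30.0)]
    print(f"weighted input level: {distance_weighted_level(neighbors):.1f}")
```

The nearer district dominates the result, so the weighted value lies closer to 80 than to the plain mean of 100. In the geographic models summarized above, such a value would serve as an extra input variable for the forecast district.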
Different models of air pollution forecast.
| Method Types | Authors | Models | Main Conclusions |
|---|---|---|---|
| Statistical methods | Silibello et al. [ | Kalman filter (KF) and Hybrid forecast (HF) | Used two adjustment techniques, HF and KF, to improve the accuracy of the forecasts supplied by an air quality forecast system |
| | Huebnerova et al. [ | Generalized linear models with log-link and gamma distribution | A comparative analysis of the two models shows that forecasts based on predicted meteorological variables perform well |
| Artificial intelligence methods | Catalano et al. [ | ANN and ARIMAX | Forecasted extreme concentrations by integrating the two models into an ensemble |
| | Feng et al. [ | SVM-GABPNN | Proposed a hybrid model in which SVM is used to classify the data and GA is used to optimize the BPNN model |
| | Bai et al. [ | W-BPNN | Used the wavelet transform for feature extraction and characterization of air pollutants |
| | Siwek et al. [ | Wavelet transformation, the multilayer perceptron, radial basis function, Elman network, SVM and linear ARX model | Decomposed the data into wavelet coefficients, used different neural networks for the individual predictions, and then combined the predictors in an ensemble. This approach does not require exhaustive information about air pollutants, and it allows for nonlinear relationships between very different predictor variables. |
| Hybrid methods | Feng et al. [ | Hybrid ANN | Used a trajectory-based geographic parameter as an extra input to the ANN model, and used the wavelet transformation to decompose the original series into a few sub-series with lower variability |
| | Fu et al. [ | RM-GM-FFNN | Enhanced the FFNN model with RM and GM to assess the possible correlation between different input variables and improve forecast accuracy |
| | Song et al. [ | ANF, Distribution functions, | Proposed an interval prediction method and ANF to address the uncertainty of PMs according to the pollutant emission distribution |
| Three dimensional models | Luo et al. [ | Models-3/CMAQ | Provided a method for analyzing changes in pollutant concentrations when practical pollution data are lacking |
| | Grell et al. [ | Fully coupled online chemistry with the WRF model | Explored the forecasting accuracy of the meteorological and chemical modules when they are run separately and when they are coupled. The results indicate that the ability to predict a slight increase |
| Other methods | Kurt et al. [ | Neural networks based on geographic forecasting models | The models that considered the geographic factor performed better than the models that did not |
| | Pan et al. [ | GM | Selected 30 indexes in 5 categories, identified the main influencing factors using grey relational analysis, and then used the GM (1, 1) model to forecast the concentration of pollutants |
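
The GM (1, 1) model mentioned in the last row can be sketched in a few lines. The code below follows the standard textbook formulation (accumulate the series, estimate the development coefficient `a` and grey input `b` by least squares, then difference the fitted accumulated series); it is an illustrative assumption and not necessarily the exact procedure used by the cited authors. The function name and sample data are hypothetical.

```python
import math


def gm11_forecast(series, steps=1):
    """Forecast the next `steps` values of a positive series with GM(1,1)."""
    n = len(series)
    if n < 4:
        raise ValueError("GM(1,1) is usually applied to at least 4 points")

    # 1. Accumulated generating operation (running sum of the series).
    accumulated = []
    total = 0.0
    for value in series:
        total += value
        accumulated.append(total)

    # 2. Background values: means of consecutive accumulated values.
    z = [0.5 * (accumulated[k] + accumulated[k - 1]) for k in range(1, n)]
    y = list(series[1:])

    # 3. Least squares for x0(k) = -a * z(k) + b.
    m = len(z)
    mean_z = sum(z) / m
    mean_y = sum(y) / m
    szz = sum((zi - mean_z) ** 2 for zi in z)
    szy = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
    slope = szy / szz
    a = -slope
    b = mean_y - slope * mean_z
    if a == 0.0:
        raise ValueError("degenerate fit: development coefficient is zero")

    # 4. Time response of the accumulated series (index 0 = first observation).
    def accumulated_hat(k):
        return (series[0] - b / a) * math.exp(-a * k) + b / a

    # 5. Difference the accumulated fit to return to the original scale.
    return [accumulated_hat(k) - accumulated_hat(k - 1)
            for k in range(n, n + steps)]


if __name__ == "__main__":
    # Made-up, roughly exponential concentration series for illustration.
    history = [100.0 * 1.05 ** k for k in range(6)]
    print([round(v, 2) for v in gm11_forecast(history, steps=2)])
```

GM (1, 1) suits short, smooth, roughly exponential series, which is why it is often paired with grey relational analysis when only a few annual observations are available.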