Pritthijit Nath, Pratik Saha, Asif Iqbal Middya, Sarbani Roy.
Abstract
Tackling air pollution has become of utmost importance over the last few decades. Various statistical and deep learning methods have been proposed to date, but these have seldom been used to forecast long-term pollution trends. Long-term pollution forecasts are highly important for government bodies around the globe, as they help in framing efficient environmental policies. This paper presents a comparative study of various statistical and deep learning methods for forecasting long-term pollution trends for the two most important categories of particulate matter (PM), PM2.5 and PM10. The study is based on Kolkata, a major city in eastern India. Historical pollution data collected from government-run monitoring stations in Kolkata are used to analyse the underlying patterns with the help of various time-series analysis techniques, and a forecast for the next two years is then produced using the different statistical and deep learning methods. The findings reflect that, on the limited data available, statistical methods such as auto-regressive (AR), seasonal auto-regressive integrated moving average (SARIMA) and Holt–Winters models outperform deep learning methods such as stacked, bi-directional, auto-encoder and convolution long short-term memory (LSTM) networks.
Keywords: Air pollution; Deep learning; Long-term forecast; Statistical models; Time-series analysis
Year: 2021 PMID: 33840911 PMCID: PMC8019307 DOI: 10.1007/s00521-021-05901-2
Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.606
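The paper's best-performing family is the classical statistical one, starting with a plain auto-regressive (AR) model. As a minimal sketch (not the authors' exact pipeline; the synthetic monthly series below stands in for the Kolkata data), an AR(12) model can be fit by ordinary least squares and rolled forward iteratively for a two-year forecast:

```python
import numpy as np

rng = np.random.default_rng(0)
months = np.arange(120)  # ten years of monthly observations
series = 80 + 60 * np.cos(2 * np.pi * months / 12) + rng.normal(0, 5, 120)

p = 12  # one full seasonal cycle of lags
# Lagged design matrix: row t holds the p values preceding series[p + t],
# with column k corresponding to lag k + 1 (most recent lag first).
X = np.column_stack([series[p - k - 1 : len(series) - k - 1] for k in range(p)])
X = np.column_stack([np.ones(len(X)), X])  # intercept term
y = series[p:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Iterative multi-step forecast: feed each prediction back in as a lag.
history = list(series)
forecast = []
for _ in range(24):  # two years ahead, as in the paper
    lags = history[-p:][::-1]  # most recent value first, matching the columns
    pred = coef[0] + np.dot(coef[1:], lags)
    forecast.append(pred)
    history.append(pred)
```

SARIMA and Holt–Winters extend this idea with differencing/seasonal terms and exponential smoothing, respectively; in practice one would use a library implementation (e.g. statsmodels) rather than hand-rolled least squares.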
Summary of forecasting models proposed by researchers in recent decades
| Author | Year | Method | Description |
|---|---|---|---|
| Mahajan et al. | 2017 | Neural network auto-regression (NNAR) | Hourly forecasts of PM2.5 were produced and compared with ARIMA and Holt–Winters models |
| Xiang | 2019 | Multiple kernel learning (MKL) framework | MKL was proposed to forecast near-future PM2.5 values and was compared with a single-kernel support vector regression (SVR) model |
| Xie | 2017 | Deep neural network | The proposed model combined manifold learning with a deep belief network (DBN) to learn features of the input candidates for local PM2.5 forecasts |
| Luo et al. | 2018 | Adaptive iterative forecast (AIF) model | The proposed AIF model could predict PM2.5 for the next few hours (via linear programming, normalisation and time-series analysis) based on the trend of historical data |
| Feng et al. | 2015 | Hybrid artificial neural network (ANN) | A hybrid model combining air-mass trajectory analysis and wavelet transformation was proposed to improve forecast accuracy |
| Haiming and Xiaoxiao | 2013 | RBF neural network | Along with PM2.5, other influencing factors were chosen to predict its concentration, and results were compared with the classic BP network model |
| Yan et al. | 2018 | Encoder–decoder model | Three prediction models (BP, stacked GRU and encoder–decoder) were constructed to predict the PM2.5 concentration for every hour of the next day |
| Maria et al. | 2015 | Multilayer perceptron neural network and clustering algorithm | In addition to a multilayer neural network, a clustering algorithm was used to find relationships between PM10 and meteorological variables, increasing forecasting accuracy |
| Al-Kassabeh et al. | 2013 | Nonparametric artificial neural network (ANN) | Other meteorological parameters were also considered for PM10 prediction, and an ANN-based auto-regressive with external input (ANNARX) model was proposed to provide high-calibre modelling |
| Lam and Mok | 2007 | Three-layer feed-forward network (TLFN) ANN | Along with six input parameters for each seasonal model, the inputs with the highest absolute correlation coefficients were selected to form the model input pattern fed into the ANN for 24-hour predictions |
Descriptive statistics for PM2.5, PM10, temperature and relative humidity
| Month | PM2.5 (μg/m³) | | | | PM10 (μg/m³) | | | | Temperature (°C) | | | | Relative humidity (%) | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | μ | σ | min | max | μ | σ | min | max | μ | σ | min | max | μ | σ | min | max |
| Jan | 163.35 | 69.28 | 46.38 | 508.0 | 194.18 | 75.04 | 75.32 | 451.42 | 18.34 | 1.93 | 11.61 | 23.28 | 70.69 | 7.61 | 49.77 | 95.51 |
| Feb | 111.49 | 44.40 | 18.33 | 281.0 | 159.88 | 66.16 | 27.58 | 303.09 | 23.05 | 2.82 | 17.28 | 30.17 | 65.22 | 9.69 | 45.00 | 95.15 |
| Mar | 67.18 | 27.42 | 2.71 | 159.0 | 82.02 | 32.60 | 29.15 | 193.57 | 27.44 | 2.25 | 20.40 | 31.22 | 65.01 | 10.13 | 43.10 | 88.90 |
| Apr | 38.09 | 13.47 | 3.04 | 74.0 | 56.62 | 20.27 | 20.18 | 137.81 | 30.02 | 2.05 | 25.62 | 34.89 | 69.33 | 7.78 | 44.20 | 81.70 |
| May | 37.08 | 14.53 | 0.72 | 114.0 | 55.88 | 19.97 | 2.29 | 120.07 | 30.34 | 1.61 | 25.35 | 33.28 | 73.01 | 6.06 | 59.10 | 94.20 |
| Jun | 33.58 | 19.25 | 0.30 | 172.0 | 53.25 | 43.81 | 0.59 | 298.22 | 30.08 | 1.59 | 25.32 | 34.18 | 78.82 | 6.20 | 61.89 | 95.89 |
| Jul | 29.61 | 15.44 | 2.00 | 112.0 | 42.00 | 39.63 | 8.97 | 288.13 | 28.97 | 1.18 | 26.25 | 31.51 | 85.07 | 5.89 | 71.80 | 97.57 |
| Aug | 28.75 | 14.07 | 0.04 | 72.0 | 37.50 | 16.13 | 6.74 | 85.66 | 28.74 | 1.29 | 25.07 | 31.61 | 85.19 | 9.04 | 14.72 | 98.88 |
| Sep | 30.77 | 18.10 | 3.29 | 113.0 | 44.30 | 26.95 | 5.22 | 111.23 | 28.79 | 1.42 | 25.17 | 31.94 | 84.08 | 5.68 | 70.80 | 97.27 |
| Oct | 62.83 | 38.37 | 8.91 | 257.0 | 92.93 | 51.59 | 13.01 | 204.75 | 27.37 | 1.89 | 22.57 | 31.83 | 79.52 | 8.19 | 63.30 | 97.77 |
| Nov | 120.50 | 65.64 | 12.38 | 308.0 | 165.74 | 68.57 | 17.34 | 354.31 | 23.73 | 1.94 | 19.00 | 28.61 | 72.46 | 8.65 | 54.60 | 97.57 |
| Dec | 152.33 | 67.50 | 26.00 | 402.0 | 193.37 | 66.46 | 74.53 | 365.14 | 19.47 | 2.26 | 14.67 | 24.06 | 72.59 | 7.09 | 49.84 | 90.50 |
μ, σ, min and max represent the mean, standard deviation, minimum and maximum, respectively
Fig. 1 Taxonomy of methods used in this study for time-series analysis and in statistical and deep learning-based modelling
Fig. 2 General overview of the proposed approach
Fig. 3 Model architecture diagrams using a LSTM auto-encoder, b bi-directional LSTM, c convolution LSTM and d stacked LSTM
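All of the LSTM variants in Fig. 3 consume fixed-length windows of the series rather than the raw 1-D sequence. A minimal sketch of that supervised framing (the window length of 12 is an illustrative assumption, not a value stated here) reshapes a univariate series into the (samples, timesteps, features) tensor that Keras-style LSTM layers expect:

```python
import numpy as np

def make_windows(series, window=12):
    """Slice a 1-D series into overlapping input windows and next-step targets."""
    X = np.stack([series[i : i + window] for i in range(len(series) - window)])
    y = series[window:]  # each target is the value right after its window
    # Recurrent layers expect (samples, timesteps, features); here features = 1.
    return X[..., np.newaxis], y

series = np.arange(100, dtype=float)
X, y = make_windows(series)
print(X.shape, y.shape)  # (88, 12, 1) (88,)
```

The same windows feed every architecture; only the layer stack on top (stacked, bi-directional, auto-encoder or convolutional LSTM) changes.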
Fig. 4 Pearson correlation heatmaps a before imputation and b after imputation of PM10
Fig. 5 Daily time-series plots for a PM2.5 and b PM10
Fig. 6 HP filter, simple moving average and monthly plots for a–c PM2.5 and d–f PM10
Fig. 7 a–b Trend and c–d seasonal plots for monthly PM2.5 and PM10
Fig. 8 Flow diagram demonstrating the calculation of the Pearson correlation coefficient. PM2.5 and PM10 data are shown in blue and green, respectively. The trends (blue and green dotted lines for PM2.5 and PM10, respectively) are opposite in nature. The deviations of PM2.5 and PM10 from their respective means (i.e. 75.65 and 106.08) are shown in violet and red, respectively (Color figure online)
Fig. 9 Autocorrelation plots for monthly PM2.5 and PM10. The blue arrows mark the lag with the highest positive correlation (i.e. lag 12) in the set of positive lags after the first set of negative-correlation lags
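The calculation walked through in Fig. 8 (mean-centred co-deviations over the product of standard deviations) and the lag-12 autocorrelation highlighted in Fig. 9 can both be sketched directly. The two synthetic series below are stand-ins for the Kolkata data, sharing an annual cycle:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(72)  # six years of monthly values
pm25 = 80 + 50 * np.cos(2 * np.pi * t / 12) + rng.normal(0, 5, 72)
pm10 = 110 + 60 * np.cos(2 * np.pi * t / 12) + rng.normal(0, 5, 72)

# Pearson correlation as in Fig. 8: mean-centre both series, then divide
# the mean co-deviation by the product of the standard deviations.
d25, d10 = pm25 - pm25.mean(), pm10 - pm10.mean()
r = (d25 * d10).mean() / (d25.std() * d10.std())
assert np.isclose(r, np.corrcoef(pm25, pm10)[0, 1])  # matches the library value

def autocorr(x, lag):
    """Autocorrelation of x at a given lag (the quantity plotted in Fig. 9)."""
    d = x - x.mean()
    return (d[:-lag] * d[lag:]).sum() / (d * d).sum()
```

Because both series share the seasonal cycle, `r` comes out strongly positive, and `autocorr(pm25, 12)` is large and positive, mirroring the lag-12 peak the blue arrows mark in Fig. 9.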
Performance metrics of statistical models for PM2.5 and PM10
| Pollutant | Model | RMSE | MAE |
|---|---|---|---|
| PM2.5 | AR | 15.68 | 13.08 |
| | SARIMA | 12.19 | 10.12 |
| | Holt–Winters | | |
| | Prophet | 31.87 | 24.27 |
| PM10 | AR | 21.98 | 19.48 |
| | SARIMA | 20.53 | 16.07 |
| | Holt–Winters | | |
| | Prophet | 39.57 | 35.58 |
Bold values indicate the best-performing models with respect to the metrics mentioned
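All the model comparisons in this study are scored by root-mean-square error (RMSE) and mean absolute error (MAE) on held-out data. A minimal sketch of the two metrics (the sample values are illustrative, not from the paper):

```python
import numpy as np

def rmse(actual, predicted):
    """Root-mean-square error: penalises large misses quadratically."""
    return np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2))

def mae(actual, predicted):
    """Mean absolute error: average miss in the pollutant's own units."""
    return np.mean(np.abs(np.asarray(actual) - np.asarray(predicted)))

actual = np.array([30.0, 60.0, 120.0, 160.0])
predicted = np.array([35.0, 55.0, 110.0, 170.0])
print(rmse(actual, predicted), mae(actual, predicted))  # ≈ 7.91, 7.5
```

RMSE ≥ MAE always holds; a wide gap between the two signals that a few large errors dominate, which matters for pollution peaks.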
Performance metrics of deep learning models
| Pollutant | Model | RMSE | MAE | Train time (s) |
|---|---|---|---|---|
| PM2.5 | Stacked LSTM | 22.32 | 16.62 | 33.06 |
| | LSTM auto-encoder | 18.88 | 15.88 | 9.50 |
| | Bi-directional LSTM | 19.27 | 16.57 | 11.01 |
| | Convolution LSTM | | | |
| PM10 | Stacked LSTM | | | 19.81 |
| | LSTM auto-encoder | 33.35 | 26.58 | 5.19 |
| | Bi-directional LSTM | 29.92 | 23.78 | 13.36 |
| | Convolution LSTM | 29.92 | 22.73 | |
Bold values indicate the best-performing models with respect to the metrics mentioned
Fig. 10 Actual vs predicted scatter plots of statistical models for a–d PM2.5 and e–h PM10
Fig. 11 Actual vs predicted scatter plots of deep learning models for a–d PM2.5 and e–h PM10
Fig. 12 PM2.5 forecast plots for statistical and deep learning models with the shaded portion representing the forecast region
Fig. 13 PM10 forecast plots for statistical and deep learning models with the shaded portion representing the forecast region