Gul Muhammad Khan, Sohail Yousaf, Saba Gul.
Abstract
Particulate matter is one of the key contributors to air pollution and climate change. Long-term exposure to air pollutants has serious health implications for both humans and plants, with a detrimental impact on the economy. Among the pollutants used to determine air quality, particulate matter has been linked to serious health effects, including pulmonary complications, cardiovascular diseases, growth retardation and, ultimately, death. In agriculture, crop yield is also reduced by the deposition of particulate matter on plant stomata, which is alarming and raises food-security concerns. The deleterious impact of air pollutants on human health, agriculture and the economy highlights the importance of quantifying and forecasting particulate matter. Several deterministic and deep learning models have been employed in recent years to forecast particulate matter concentrations, and deep learning models in particular have shown promising results in modeling and forecasting time series data. We explored recurrent neural networks, specifically a long short-term memory (LSTM) model, which shows potential to effectively predict particulate matter ([Formula: see text]) from multi-step, multivariate data for two of the most polluted regions of Asia: Beijing, China, and Punjab, Pakistan. The LSTM model is tuned with Bayesian optimization to select appropriate hyper-parameters and weight-initialization strategies for each dataset. The model predicted [Formula: see text] for the next hour with a root-mean-square error (RMSE) of 0.1913 (91.5% accuracy); the error generally grows with the forecast horizon, reaching an RMSE of 0.7290 for the 24-hour-ahead prediction. For the Punjab dataset, which is recorded once a day, the RMSE of the next-day forecast is 0.2192.
These multi-step short-term forecasts could play a pivotal role in establishing an early warning system based on the calculated air quality index (AQI), and could help the government enact policies to curb pollution.
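The abstract frames the task as multi-step forecasting from multivariate inputs. One common way to prepare such data for an LSTM is to slide a fixed window over the series, pairing the past `n_lags` rows of all features with the next `n_ahead` values of the target pollutant. The sketch below shows this direct multi-output framing under stated assumptions: the function and parameter names are hypothetical, and this excerpt gives neither the window lengths nor whether the authors predicted all steps at once or recursively.

```python
from typing import List, Sequence, Tuple

def make_windows(
    series: Sequence[Sequence[float]],
    target_col: int,
    n_lags: int,
    n_ahead: int,
) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Frame a multivariate series as (past n_lags rows -> next n_ahead targets) pairs.

    series: rows are time steps, columns are features (pollutant plus meteorology).
    target_col: index of the pollutant column to forecast (hypothetical choice).
    """
    X: List[List[List[float]]] = []
    y: List[List[float]] = []
    # each sample needs n_lags input rows followed by n_ahead target rows
    for start in range(len(series) - n_lags - n_ahead + 1):
        X.append([list(row) for row in series[start:start + n_lags]])
        y.append([series[t][target_col]
                  for t in range(start + n_lags, start + n_lags + n_ahead)])
    return X, y

# Toy example: 6 hourly rows with 2 features; column 0 plays the pollutant.
data = [[float(t), 10.0 * t] for t in range(6)]
X, y = make_windows(data, target_col=0, n_lags=3, n_ahead=2)
# two windows fit in six rows; X[0] holds rows 0-2 and y[0] == [3.0, 4.0]
```

Each `X` sample then has shape (`n_lags`, number of features), the 3-D layout LSTM layers typically expect once stacked into an array.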
Keywords: AQI; Air pollution; Early warning system; Forecasting; LSTM; Particulate matter
Year: 2022 PMID: 35445884 PMCID: PMC9022063 DOI: 10.1007/s10661-022-10029-4
Source DB: PubMed Journal: Environ Monit Assess ISSN: 0167-6369 Impact factor: 3.307
Fig. 1 Proposed network architecture
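Fig. 1 is not reproduced in this excerpt. For reference, a standard LSTM cell, the recurrent unit named in the abstract, updates its gates and states at each time step $t$ as below, where $x_t$ is the input vector, $h_{t-1}$ the previous hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid and $\odot$ element-wise multiplication. Whether the architecture in Fig. 1 uses exactly this variant (for example, without peephole connections and with the default $\tanh$) is an assumption, not something stated here.

$$
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right)\\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right)\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right)\\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
$$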
Dataset specifications of the modified UCI dataset
| Attribute | Details |
|---|---|
| City/Region | Beijing, China |
| Time Span | 2010–2017 |
| Meteorological data | Dew Point, Temperature, Pressure, Combined wind direction, Cumulated wind speed, Cumulated hours of snow, Cumulated hours of rain |
| Pollutants Data | |
| Number of Recording Stations | 35 |
Fig. 2 Air Quality Index (AQI) scale, US EPA
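Air Quality Index breakpoints set by the US Environmental Protection Agency (EPA)
| O3 (ppm) | PM10 (µg/m³) | PM2.5 (µg/m³) | CO (ppm) | SO2 (ppm) | NO2 (ppm) | AQI | Category |
|---|---|---|---|---|---|---|---|
| 0.000–0.059 | 0–54 | 0–12 | 0–4.4 | 0.000–0.034 | - | 0–50 | Good |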
| 0.060–0.075 | 55–154 | 12.1–35.4 | 4.5–9.4 | 0.035–0.144 | - | 51–100 | Moderate |
| 0.076–0.095 | 155–254 | 35.5–55.4 | 9.5–12.4 | 0.145–0.224 | - | 101–150 | Unhealthy for sensitive groups |
| 0.096–0.115 | 255–354 | 55.5–150.4 | 12.5–15.4 | 0.225–0.304 | - | 151–200 | Unhealthy |
| 0.116–0.374 | 355–424 | 150.5–250.4 | 15.5–30.4 | 0.305–0.604 | 0.65–1.24 | 201–300 | Very Unhealthy |
| - | 425–504 | 250.5–350.4 | 30.5–40.4 | 0.605–0.804 | 1.25–1.64 | 301–400 | Hazardous |
| - | 505–604 | 350.5–500.4 | 40.5–50.4 | 0.805–1.004 | 1.65–2.04 | 401–500 | Hazardous |
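The table lists concentration breakpoints but not how an AQI value is obtained from them. The EPA computes each pollutant's sub-index by linear interpolation within the band that contains the concentration $C$: $I = \frac{I_{hi} - I_{lo}}{BP_{hi} - BP_{lo}}(C - BP_{lo}) + I_{lo}$, and the reported AQI is the highest sub-index across pollutants. The sketch below applies this to PM2.5 only, using the 0–12, 12.1–35.4, …, 350.5–500.4 µg/m³ column, which matches the EPA PM2.5 breakpoints. The function name is hypothetical, and truncation to one decimal follows EPA reporting convention rather than anything stated in this paper.

```python
import math
from typing import Optional

# (C_lo, C_hi, I_lo, I_hi): PM2.5 breakpoints in µg/m³ and their AQI ranges
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]

def pm25_sub_index(conc: float) -> Optional[int]:
    """AQI sub-index for a PM2.5 concentration by linear interpolation within its band."""
    # truncate to one decimal (EPA convention for PM2.5); the epsilon absorbs float error
    c = math.floor(conc * 10 + 1e-9) / 10
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            index = (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
            return math.floor(index + 0.5)  # round half up to an integer AQI
    return None  # negative or above the tabulated range

print(pm25_sub_index(100.0))  # → 174, in the 151–200 "Unhealthy" band
```

Truncating before the lookup also closes the gaps between bands (e.g. 12.05 falls back into the 0–12 band), which is why the breakpoints can be listed with one-decimal edges.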
Fig. 3 t-SNE plot of the modified UCI Beijing air quality dataset
Dataset specifications of the Punjab dataset
| Attribute | Details |
|---|---|
| City/Region | Punjab, Pakistan |
| Time Span | 2017–2020 |
| Meteorological data | Wind direction, temperature, barometric pressure, humidity, and visibility |
| Pollutants Data | |
| Number of Recording Stations | 6 |
Fig. 4 t-SNE plot of the Punjab dataset
Fig. 5 Hyper-parameter tuning using Bayesian optimization
Selection of optimizer and activation function using Bayesian optimization on the modified Beijing air quality dataset
| Optimizer | Activation | Loss |
|---|---|---|
| ***Adagrad*** | ***linear*** | ***0.0416348*** |
| Adagrad | linear | 0.0416676 |
| Adagrad | tanh | 0.0419074 |
| Adadelta | relu | 0.0419169 |
| Adagrad | linear | 0.0419371 |
| Adagrad | softsign | 0.0420134 |
| Adagrad | tanh | 0.04209329 |
Bold italics highlight the best parameter in each table.
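The activation candidates in this table are standard element-wise functions. Their usual definitions (as used, for example, in Keras) are given below; which layer of the network the tuned activation applies to is not stated in this excerpt.

$$
\begin{aligned}
\mathrm{linear}(x) &= x\\
\tanh(x) &= \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}\\
\mathrm{relu}(x) &= \max(0, x)\\
\mathrm{softsign}(x) &= \frac{x}{1 + |x|}
\end{aligned}
$$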
Optimal learning-rate selection using Bayesian optimization on the modified Beijing air quality dataset
| Learning rate | Loss |
|---|---|
| ***0.013885*** | ***0.0413445*** |
| 0.00665 | 0.0415720 |
| 0.01271 | 0.0416139 |
| 0.009799 | 0.0416274 |
| 0.005592 | 0.0417381 |
| 0.011722 | 0.0418148 |
| 0.003004 | 0.0418748 |
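The tables report the best trials found by Bayesian optimization but not the search loop itself. A minimal sketch of the idea for a single continuous hyper-parameter, assuming a Gaussian-process surrogate with an RBF kernel and an expected-improvement (EI) acquisition function over log10 of the learning rate; the paper's library, kernel and budget are not given here, and `toy_loss` is a hypothetical stand-in for training and validating the LSTM. Categorical choices such as optimizer and activation would need a different encoding and are omitted.

```python
import math
import random
from typing import Callable, List, Tuple

def rbf(a: float, b: float, length: float = 0.5) -> float:
    """Squared-exponential kernel with unit prior variance (length scale is an assumption)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(K: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L @ L^T == K (K must be symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L

def forward(L: List[List[float]], b: List[float]) -> List[float]:
    """Solve L x = b for lower-triangular L."""
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def backward(L: List[List[float]], b: List[float]) -> List[float]:
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def bayes_opt(objective: Callable[[float], float], lo: float, hi: float,
              n_init: int = 3, n_iter: int = 12, seed: int = 0) -> Tuple[float, float]:
    """Minimize a 1-D objective with a GP surrogate and expected improvement."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]
    ys = [objective(x) for x in xs]
    grid = [lo + (hi - lo) * i / 200 for i in range(201)]  # candidate points for EI
    for _ in range(n_iter):
        # standardize observations so a zero-mean, unit-variance GP prior is reasonable
        mean = sum(ys) / len(ys)
        sd = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys)) or 1.0
        z = [(y - mean) / sd for y in ys]
        K = [[rbf(a, b) + (1e-6 if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]  # small jitter keeps K positive definite
        L = cholesky(K)
        alpha = backward(L, forward(L, z))  # alpha = K^{-1} z
        best = min(z)

        def expected_improvement(x: float) -> float:
            k = [rbf(x, a) for a in xs]
            mu = sum(ki * ai for ki, ai in zip(k, alpha))  # posterior mean
            v = forward(L, k)
            s = math.sqrt(max(1.0 - sum(vi * vi for vi in v), 1e-12))  # posterior std
            gap = best - mu
            u = gap / s
            cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
            pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
            return gap * cdf + s * pdf

        x_next = max(grid, key=expected_improvement)
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]

# Hypothetical stand-in for "validation loss as a function of log10(learning rate)".
def toy_loss(log_lr: float) -> float:
    return 0.04 + 0.01 * (log_lr + 2.0) ** 2

best_log_lr, best_loss = bayes_opt(toy_loss, lo=-4.0, hi=-1.0)
print(10 ** best_log_lr, best_loss)
```

Searching in log space is a common design choice for learning rates, since plausible values span several orders of magnitude.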
Fig. 6 Actual vs. predicted values of the employed architecture on hourly data of the modified UCI Beijing air quality dataset
Fig. 7 Actual vs. predicted values of the employed architecture on 24-hour data of the modified UCI Beijing air quality dataset
Test RMSE for multi-step prediction on the modified Beijing air quality dataset
| Forecast horizon | Test RMSE | R² |
|---|---|---|
| 3 future hours | 0.2598 | 0.932526 |
| 6 future hours | 0.3475 | 0.879228 |
| 9 future hours | 0.4096 | 0.832258 |
| 12 future hours | 0.5578 | 0.688855 |
| 15 future hours | 0.5195 | 0.730129 |
| 18 future hours | 0.5020 | 0.748015 |
| 21 future hours | 0.5408 | 0.707541 |
| 24 future hours | 0.7290 | 0.468538 |
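The third column in this table agrees, to within about 1e-4 for every row, with 1 − RMSE². That is what the coefficient of determination R² = 1 − SS_res/SS_tot reduces to when the test target is standardized to zero mean and unit variance, which suggests the column is R² on standardized data; this is an inference, not a statement in the paper. The sketch defines both metrics, confirms the identity on a toy standardized series, and checks it against the reported numbers.

```python
import math
from typing import Sequence

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy standardized target (mean 0, population variance 1): here R^2 equals 1 - RMSE^2.
y_true = [-1.0, 1.0, -1.0, 1.0]
y_pred = [-0.5, 0.5, -1.0, 1.0]
print(rmse(y_true, y_pred), r2(y_true, y_pred))  # r2 is 0.875

# (horizon in hours, test RMSE, third-column value) as reported in the table above
REPORTED = [
    (3, 0.2598, 0.932526), (6, 0.3475, 0.879228), (9, 0.4096, 0.832258),
    (12, 0.5578, 0.688855), (15, 0.5195, 0.730129), (18, 0.5020, 0.748015),
    (21, 0.5408, 0.707541), (24, 0.7290, 0.468538),
]
for hours, err, reported in REPORTED:
    print(hours, round(1.0 - err ** 2, 4), reported)
```

Because RMSE here is on a standardized scale, it is not directly comparable with RMSE values reported in raw concentration units.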
Comparison of test RMSE with Deep Air for various forecast horizons on the modified Beijing air quality dataset
| Model | Forecast horizon | Test RMSE | R² |
|---|---|---|---|
| Our Model | 1 future hour/single step | 0.1913 | 0.963402 |
| Deep Air (Reddy et al.) | 1 future hour/single step | 12.78 | 0.96 |
| Our Model | 5 future hours | 0.3229 | 0.895758 |
| Deep Air (Reddy et al.) | 5 future hours | 44.15 | 0.689 |
| Our Model | 10 future hours | 0.4086 | 0.833078 |
| Deep Air (Reddy et al.) | 10 future hours | 74.8 | 0.588 |
Selection of optimizer and activation function using Bayesian optimization on the Punjab dataset
| Optimizer | Activation | Loss |
|---|---|---|
| ***Nadam*** | ***relu*** | ***0.7275196*** |
| Nadam | relu | 0.7335685 |
| Nadam | relu | 0.7343104 |
| Nadam | relu | 0.7576756 |
| Nadam | relu | 0.7613520 |
| Nadam | relu | 0.7674339 |
| Nadam | relu | 0.7830696 |
Optimal learning-rate selection using Bayesian optimization on the Punjab dataset
| Learning rate | Loss |
|---|---|
| ***0.0032243*** | ***0.6428966*** |
| 0.0035615 | 0.6451763 |
| 0.0145944 | 0.6503993 |
| 0.0046863 | 0.6523746 |
| 0.0013331 | 0.6587814 |
| 0.0013885 | 0.6672297 |
| 0.0129494 | 0.6792039 |
Fig. 8 Actual vs. predicted values of the employed architecture on 24-hour data of the Punjab dataset