Ahmed Samy Moursi, Nawal El-Fishawy, Soufiene Djahel, Marwa Ahmed Shouman.
Abstract
Air pollution is a major issue resulting from the excessive use of conventional energy sources in developing countries and worldwide. Particulate Matter less than 2.5 µm in diameter (PM2.5) is the most dangerous air pollutant, invading the human respiratory system and causing lung and heart diseases. Therefore, innovative air pollution forecasting methods and systems are required to reduce this risk. To that end, this paper proposes an Internet of Things (IoT) enabled system for monitoring and predicting PM2.5 concentration on both edge devices and the cloud. The system employs a hybrid prediction architecture in which several Machine Learning (ML) algorithms are hosted by Nonlinear AutoRegression with eXogenous input (NARX). It uses the past 24 h of PM2.5, cumulated wind speed and cumulated rain hours to predict the next hour of PM2.5. The system was tested on a PC to evaluate cloud prediction and on a Raspberry Pi to evaluate edge-device prediction. Such a system is essential for responding quickly to air pollution in remote areas with low bandwidth or no internet connection. The performance of the system was assessed using Root Mean Square Error (RMSE), Normalized Root Mean Square Error (NRMSE), coefficient of determination (R²), Index of Agreement (IA), and duration in seconds. The results showed that NARX/LSTM achieved the highest R² and IA and the lowest RMSE and NRMSE, outperforming previously proposed deep learning hybrid algorithms. In contrast, NARX/XGBRF achieved the best balance between accuracy and speed on the Raspberry Pi.
Keywords: Air pollution forecast; Edge computing; IoT; Machine learning; NARX architecture; PM2.5
Year: 2021 PMID: 34777973 PMCID: PMC8320723 DOI: 10.1007/s40747-021-00476-w
Source DB: PubMed Journal: Complex Intell Systems ISSN: 2199-4536
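The input layout described in the abstract (the past 24 h of PM2.5, cumulated wind speed and cumulated rain hours, used to predict the next hour of PM2.5) can be illustrated with a small sketch. This is only one plausible reading of that description, not the authors' implementation: the function name `make_narx_samples`, the constant `LAGS` and the ordering of the features are assumptions made for illustration, and the regressor that would consume the rows (LSTM, Random Forests, and so on) is not shown.

```python
from typing import List, Sequence, Tuple

LAGS = 24  # past 24 hourly steps, as stated in the abstract


def make_narx_samples(
    pm25: Sequence[float],
    wind: Sequence[float],
    rain: Sequence[float],
    lags: int = LAGS,
) -> Tuple[List[List[float]], List[float]]:
    """Build (features, target) pairs for one-step-ahead prediction.

    Each feature row concatenates the previous `lags` values of PM2.5
    (the autoregressive part) with the previous `lags` values of
    cumulated wind speed and cumulated rain hours (the exogenous part).
    The target is the PM2.5 value of the following hour.
    Hypothetical helper: the layout is an assumption, not the paper's code.
    """
    if not (len(pm25) == len(wind) == len(rain)):
        raise ValueError("all series must have the same length")
    features: List[List[float]] = []
    targets: List[float] = []
    for t in range(lags, len(pm25)):
        row = (
            list(pm25[t - lags:t])
            + list(wind[t - lags:t])
            + list(rain[t - lags:t])
        )
        features.append(row)
        targets.append(pm25[t])
    return features, targets


if __name__ == "__main__":
    n = 30  # toy series of 30 hourly readings
    demo_pm25 = [float(i) for i in range(n)]
    demo_wind = [0.5 * i for i in range(n)]
    demo_rain = [0.0] * n
    X, y = make_narx_samples(demo_pm25, demo_wind, demo_rain)
    print(len(X), len(X[0]), y[0])  # 6 samples, 72 features each, first target 24.0
```

With three input series and 24 lags, every sample has 72 features. Rows with missing PM2.5 readings would need to be dropped or imputed first, which this sketch does not attempt.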
Related work summary
| Reference | Evaluation metrics | Pros | Cons |
|---|---|---|---|
| [ | MAE, RMSE, IA | Feasibility and practicality of their proposal for forecasting PM2.5 were verified experimentally | Predictions did not follow the real trend accurately and were somewhat shifted and disordered |
| [ | MAE, RMSE | CNN-GRU and CNN-LSTM worked better for PM10 and PM2.5, respectively | Hybrid models weakly predicted future highest and lowest levels of PM2.5 |
| [ | RMSE, R² | Of the multiple algorithms compared, Extra Trees gave the best performance | The study compared only a limited number of machine learning algorithms, and for most of them the predicted values were slightly shifted from the actual values |
| [ | RMSE, MAE | CNN could extract air quality features, shortening training time, whereas LSTM could perform prediction using long-term historical input data | Metrics that express closeness to real values, such as R² or IA, could have been used alongside error metrics to confirm their models' performance |
| [ | NMSE, FB and FA2 | PM2.5 concentration was predicted using meteorological parameters and PM10 and CO without a history of PM2.5 itself | More machine learning models could have been used to test their methodology further |
| [ | SMAPE | The ensemble of the three models (AccuAir) proved to be better than the individual components tested | They did not use LSTM in their Seq2Seq model, although it was proven to be very efficient in time series prediction |
| [ | RMSE, MAE and MAPE | Their model was compared to Multilayer Perceptron (MLP) and LSTM models and proved to be more stable and accurate | Their system predicts only the daily average and cannot be deployed to predict the hourly or real-time concentration of PM2.5 |
| [ | A comparison of prediction values vs. real value using different sample sets | Their proposed system uses many sensors to ensure accuracy and minimize monitoring cost. The system is scalable and suitable for big data analysis | The study did not use any clear evaluation metric; instead, they presented a comparison of prediction values vs. actual value using different sample sets |
| [ | Calculating AQI and comparing two setups with and without measurements flattened and calibration and accumulation algorithms employed | They developed a system that saves bandwidth and energy consumption | Further processing by the edge can save even more bandwidth and energy consumption. However, no prediction exists on the edge devices or the cloud side |
| [ | There is no evaluation metric of their system, only a proof of concept | It tackles security issues of that kind of IoT system. Their IoT solution is scalable, reliable, secure and has HA (high availability) | The system is used primarily for monitoring rather than conducting prediction of future pollution levels. It relies on central management and central prediction rather than performing prediction on edge devices |
| [ | RMSE, MAE and F1 | It comprised both prediction and classification to make an alarm system. LSTM was compared to SVR as a baseline, and LSTM was proven to be a better algorithm | Their research did not include a comparison to other works and used only one base model |
Fig. 1 Proposed IoT System architecture
Fig. 2 Proposed system data flow architecture
Fig. 3 NARX model
Fig. 4 Simple RNN with one layer and no gated memory cells
Fig. 5 LSTM RNN elemental network structure
Fig. 6 Proposed architecture diagram
Dataset statistics
| Statistic | Cumulated wind speed | Cumulated hours of rain | PM2.5 |
|---|---|---|---|
| Count | 43,824 | 43,824 | 41,757 |
| Mean | 23.88914 | 0.194916 | 98.61321 |
| Standard deviation | 50.01006 | 1.415851 | 92.04928 |
| Minimum | 0.45 | 0 | 0 |
| Percentile (25%) | 1.79 | 0 | 29 |
| Percentile (50%) | 5.37 | 0 | 72 |
| Percentile (75%) | 21.91 | 0 | 137 |
| Maximum | 585.6 | 36 | 994 |
| Empty count | 0 | 0 | 2067 |
| Loss percentage | 0.00% | 0.00% | 4.95% |
| Coverage percentage | 100.00% | 100.00% | 95.28% |
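The PM2.5 percentages in the table can be rechecked from the counts alone. The short sketch below is plain arithmetic on the table's figures. It suggests, as an inference rather than something the source states, that the 95.28% coverage is taken relative to all 43,824 rows, while the 4.95% loss matches the empty count divided by the 41,757 valid readings. Relative to all rows, the empty share would be about 4.72%. The variable names are illustrative.

```python
TOTAL_ROWS = 43_824   # count reported for the wind speed and rain columns
PM25_VALID = 41_757   # non-empty PM2.5 readings
PM25_EMPTY = 2_067    # empty PM2.5 readings

# The valid and empty counts add up to the total number of rows.
assert PM25_VALID + PM25_EMPTY == TOTAL_ROWS

coverage = 100 * PM25_VALID / TOTAL_ROWS        # share of rows with a reading
empty_vs_total = 100 * PM25_EMPTY / TOTAL_ROWS  # empty rows over all rows
empty_vs_valid = 100 * PM25_EMPTY / PM25_VALID  # empty rows over valid rows

print(f"coverage:       {coverage:.2f}%")        # 95.28%
print(f"empty / total:  {empty_vs_total:.2f}%")  # 4.72%
print(f"empty / valid:  {empty_vs_valid:.2f}%")  # 4.95%
```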
Fig. 7 Real vs. Prediction for the proposed NARX hybrid and LSTM run on PC
Fig. 8 Real vs. Prediction for the proposed NARX hybrid and Random Forests run on PC
Fig. 9 Real vs. Prediction for the proposed NARX hybrid and Extra Trees run on PC
Fig. 10 Real vs. Prediction for the proposed NARX hybrid and Gradient Boost run on PC
Fig. 11 Real vs. Prediction for the proposed NARX hybrid and Extreme Gradient Boost run on PC
Fig. 12 Real vs. Prediction for the proposed NARX hybrid and Random Forests in XGBoost run on PC
Fig. 13 Real vs. Prediction for the proposed NARX hybrid and LSTM run on Raspberry Pi 4
Fig. 14 Real vs. Prediction for the proposed NARX hybrid and Random Forests run on Raspberry Pi 4
Fig. 15 Real vs. Prediction for the proposed NARX hybrid and Extra Trees run on Raspberry Pi 4
Fig. 16 Real vs. Prediction for the proposed NARX hybrid and Gradient Boost run on Raspberry Pi 4
Fig. 17 Real vs. Prediction for the proposed NARX hybrid and Extreme Gradient Boost run on Raspberry Pi 4
Fig. 18 Real vs. Prediction for the proposed NARX hybrid and Random Forests in XGBoost run on Raspberry Pi 4
Average Prediction evaluation results for K-Fold = 10
Fig. 19 Comparison of the proposed NARX hybrid with other ML algorithms in terms of Root Mean Square Error on PC and Raspberry Pi 4
Fig. 20 Comparison of the proposed NARX hybrid with other ML algorithms in terms of Normalized Root Mean Square Error on PC and Raspberry Pi 4
Fig. 21 Comparison of the proposed NARX hybrid with other ML algorithms in terms of Coefficient of Determination on PC and Raspberry Pi 4
Fig. 22 Comparison of the proposed NARX hybrid with other ML algorithms in terms of Index of Agreement on PC and Raspberry Pi 4
Fig. 23 Comparison of the proposed NARX hybrid with other ML algorithms in terms of training duration on PC and Raspberry Pi 4
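The four accuracy metrics compared in Figs. 19 to 22 can be written down compactly. The sketch below uses the textbook definitions of RMSE, the coefficient of determination and Willmott's Index of Agreement. For NRMSE it assumes normalization by the range of the observed values, which is only one of several conventions in use, and the source excerpt does not say which one the authors applied. Function names are illustrative.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root Mean Square Error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nrmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """RMSE divided by the observed range (assumed normalization)."""
    return rmse(obs, pred) / (max(obs) - min(obs))


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def index_of_agreement(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Willmott's Index of Agreement, 1 for a perfect match."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((o - p) ** 2 for o, p in zip(obs, pred))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred)
    )
    return 1 - numerator / denominator


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 3.0, 4.0, 5.0]  # every prediction is off by +1
    print(rmse(observed, predicted))
    print(nrmse(observed, predicted))
    print(r2(observed, predicted))
    print(index_of_agreement(observed, predicted))
```

For the toy series, a constant offset of 1 gives an RMSE of 1, an R² of about 0.2 and an IA of about 0.84, which illustrates why R² and IA are reported alongside the error metrics: they respond differently to the same shift between real and predicted values.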