Ditsuhi Iskandaryan1, Francisco Ramos1, Sergio Trilles1.
Abstract
Nitrogen dioxide is one of the pollutants with the most significant health effects. Advance information on its concentration in the air can help to monitor and control its consequences more effectively, and makes it easier to apply preventive and mitigating measures. Machine learning methods, combined with the geospatial dimension, can perform predictive analyses with higher accuracy and can therefore serve as a supporting tool for effective management. In this work, one of the most advanced machine learning algorithms, Bidirectional convolutional LSTM, is used to predict the concentration of nitrogen dioxide. The model was validated to perform more accurate spatiotemporal analysis by integrating temporal and geospatial factors. The analysis was carried out under two scenarios developed on the basis of the selected features, using data from the city of Madrid for the periods January-June 2019 and January-June 2020. The model's performance was evaluated using the Root Mean Square Error and the Mean Absolute Error, both of which show the superiority of the proposed model over the reference models. In addition, the importance of a feature selection technique for improving accuracy was underlined. In terms of execution time, the complexity of the Bidirectional convolutional LSTM architecture meant that convergence and generalisation took longer, so the reference models were faster.
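The two evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration of the standard RMSE and MAE definitions, not the authors' code; the function names and the sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean Absolute Error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical NO2 concentrations, used only to exercise the functions.
observed = [30.0, 42.0, 18.0, 25.0]
predicted = [28.0, 45.0, 20.0, 21.0]
print(rmse(observed, predicted), mae(observed, predicted))
```

RMSE squares each error before averaging, so it penalises large deviations more heavily than MAE and is never smaller than MAE on the same data.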
Year: 2022 PMID: 35648766 PMCID: PMC9159618 DOI: 10.1371/journal.pone.0269295
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Implemented algorithms and evaluation metrics extracted from the publications focused on the prediction of NO2 (*).
| Work | ML Algorithm | Evaluation Metric |
|---|---|---|
| [ | BRT, SVM, XGBoost, RF, GAM, Cubist | RMSE, ME, NRMSE, NME, POD, POF, R2 |
| [ | LSTM | RMSE, NSE, PBIAS, R |
| [ | LSTM | MSE |
| [ | MLR, MLPNN, ELM, OSMLR, OSELM | |
| [ | LSTM | RMSE, MAE |
| [ | ELM | RMSE, MAE, IA, R2 |
| [ | ANN | RMSE, R, NMB, NMSD, Rs, SD, SD′ |
| [ | SVM, M5P model trees, ANN | RMSE, NRMSE, PTA |
| [ | Cluster-based bagging | RMSE, R2, RMSEIQR |
| [ | MLP with hierarchical clustering, SOM and k-means clustering | RMSE, MAE, NRMSE, MBE, IA, R |
| [ | GAM, Bagging, RF, GBM, ANN, KRLS, SVR, Linear stepwise regression algorithms, Regularization or shrinkage algorithms | RMSE, R2, MSE-R2 |
| [ | Ensemble model with DRR | RMSE |
| [ | AIS-RNN (RNN, LSTM, GRU) | RMSE, MAE, MAPE |
| [ | SVM | RMSE, MAE, CWIA, RE |
| [ | RF partition model | MAPE, MADE, BIC, R2 |
| [ | SVM | RMSE, MAE, WIA |
| [ | LSTM | RMSE |
* ML Algorithms: BRT–Boosted Regression Trees, SVM–Support Vector Machine, XGBoost–EXtreme Gradient Boosting, RF–Random Forest, GAM–Generalized Additive Model, LSTM–Long Short Term Memory, ANN–Artificial Neural Network, GBM–Gradient Boosting Machines, KRLS–Kernel-based Regularized Least Squares, AIS–Adaptive Input Selection, RNN–Recurrent Neural Network, GRU–Gated Recurrent Unit, MLR–Multiple Linear Regression, MLPNN–Multi-layer Perceptron Neural Networks, ELM–Extreme Learning Machine, OSMLR–Online Sequential Multiple Linear Regression, OSELM–Online Sequential Extreme Learning Machine, SOM–Self-organizing Map, DRR–Discounted Ridge Regression; Evaluation Metrics: RMSE–Root Mean Squared Error, ME–Mean Error, NRMSE–Normalized Root Mean Squared Error, NME–Normalized Mean Error, POD–Probability of Detection, POF–Probability of False Alarm, R2–Coefficient of Determination, NSE–Nash–Sutcliffe Efficiency Index, PBIAS–Percentage Bias, R–Pearson Correlation Coefficient, MSE–Mean Squared Error, MAE–Mean Absolute Error, IA–Index of Agreement, NMB–Normalised Mean Bias, Rs–Rank Correlation by Spearman, SD–Standard Deviation, PTA–Prediction Trend Accuracy, MBE–Mean Bias Error, MAPE–Mean Absolute Percentage Error, CWIA–Complementary Willmott’s Index of Agreement, RE–Relative Error, MADE–Mean Absolute Deviation Error, BIC–Bayesian Information Criterion, WIA–Willmott’s Index of Agreement.
Fig 1. Air quality stations, meteorological stations, traffic measurement points and grid cell segments in the defined area of the city of Madrid (Map data © OpenStreetMap contributors, Microsoft, Esri Community Maps contributors, Map layer by Esri [50]).
Summary statistics of the periods January-June 2019 and January-June 2020 for each data type.
| Descriptors | Statistic | January-June 2019 | January-June 2020 |
|---|---|---|---|
| Nitrogen_dioxide | Mean (SD) | 36.69 (30.85) | 26.03 (25.35) |
| | Median [Min, Max] | 27.0 [0.0, 328] | 17.0 [0.0, 326] |
| Ultrav._rad. | Mean (SD) | 15.83 (30.27) | - |
| | Median [Min, Max] | 1.0 [0.0, 199] | - |
| Wind_speed | Mean (SD) | 1.41 (1.11) | 1.31 (1.05) |
| | Median [Min, Max] | 1.14 [0.0, 8.75] | 1.05 [0.0, 8.97] |
| Wind_direction | Mean (SD) | 167.80 (105.72) | 140.82 (98.35) |
| | Median [Min, Max] | 182.0 [0.0, 359] | 135.0 [0.0, 359] |
| Temperature | Mean (SD) | 13.38 (8.09) | 13.63 (7.6) |
| | Median [Min, Max] | 12.5 [-55.0, 47.3] | 12.6 [-55.0, 44.6] |
| Humidity | Mean (SD) | 48.73 (21.60) | 60.76 (22.77) |
| | Median [Min, Max] | 47.0 [-25, 100] | 62.0 [-25, 100] |
| Pressure | Mean (SD) | 943.3 (34.91) | 940.62 (63.28) |
| | Median [Min, Max] | 945.0 [0.0, 962.0] | 945.0 [0.0, 1073.0] |
| Solar_irradiance | Mean (SD) | 220.73 (301.06) | 191.95 (279.83) |
| | Median [Min, Max] | 11.0 [0.0, 1103.0] | 9.0 [0.0, 1113.0] |
| Precipitation | Mean (SD) | 0.03 (0.41) | 0.03 (0.27) |
| | Median [Min, Max] | 0.0 [0.0, 30.4] | 0.0 [0.0, 13.5] |
| Intensity | Count_non_zero | 885863 (59.98%) | 892197 (60.09%) |
| | Mean (SD) | 245.69 (402.73) | 161.45 (313.33) |
| | Median [Min, Max] | 63.0 [0.0, 6348.0] | 34.19 [0.0, 6588.0] |
| Occupation | Count_non_zero | 845031 (57.21%) | 822652 (55.41%) |
| | Mean (SD) | 3.96 (6.36) | 2.57 (4.9) |
| | Median [Min, Max] | 0.95 [0.0, 100.0] | 0.42 [0.0, 99.0] |
| Load | Count_non_zero | 881500 (59.68%) | 884950 (59.60%) |
| | Mean (SD) | 11.65 (14.91) | 7.85 (11.75) |
| | Median [Min, Max] | 4.0 [0.0, 100.0] | 2.2 [0.0, 100.0] |
| Average_speed | Count_non_zero | 233415 (15.8%) | 223052 (15.0%) |
| | Mean (SD) | 4.39 (13.28) | 4.04 (12.96) |
| | Median [Min, Max] | 0.0 [0.0, 96.5] | 0.0 [-127.0, 127.0] |
Fig 2. Heatmap showing spatial correlations of the 24 air quality monitoring stations.
Fig 3. a) Autocorrelation and b) partial autocorrelation plots with 80 lags from the NO2 dataset.
Fig 4. The concentration of NO2 by day of the week for the period January-June 2019.
Fig 5. a) The architecture of a ConvLSTM cell [57] and b) Bidirectional ConvLSTM cell [12].
Fig 6. The detailed workflow of the analysis.
Fig 7. The feature importance scores based on mutual information.
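To illustrate how a mutual-information score ranks a feature against the target, here is a minimal sketch using a simple equal-width histogram estimate. The estimator, the bin count and the toy data are assumptions made for illustration; the source does not state which estimator produced the scores in Fig 7.

```python
import math
from collections import Counter


def bin_index(value, lo, hi, bins):
    """Map a value to an equal-width bin index in [0, bins - 1]."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def mutual_information(x, y, bins=10):
    """Histogram estimate of mutual information (in nats) between x and y."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    x_lo, x_hi, y_lo, y_hi = min(x), max(x), min(y), max(y)
    xb = [bin_index(v, x_lo, x_hi, bins) for v in x]
    yb = [bin_index(v, y_lo, y_hi, bins) for v in y]
    joint, px, py = Counter(zip(xb, yb)), Counter(xb), Counter(yb)
    score = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        score += p_ij * math.log(p_ij / ((px[i] / n) * (py[j] / n)))
    return score


# Toy data: one feature fully determined by the target, one scrambled.
target = list(range(100))
dependent = [2 * v for v in target]
scrambled = [(37 * v) % 100 for v in target]
print(mutual_information(target, dependent), mutual_information(target, scrambled))
```

A feature that carries more information about the target receives a higher score, which is the basis for keeping or dropping features in a selection step.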
The dimensions of each set.
| Set | Dimension (x × y × z1/z2) * |
|---|---|
| Training Set | 4344 × 340 × 16/13 |
| Validation Set | 2184 × 340 × 16/13 |
| Testing Set | 2184 × 340 × 16/13 |
* x: number of samples; y: number of grid cells (340 = 20 × 17); z1: number of all features (NO2, wind speed, temperature, humidity, barometric pressure, solar irradiance, intensity, occupancy time, north, east, south, west, southwest, northeast, southeast, northwest); z2: number of selected features (NO2, wind speed, barometric pressure, intensity, occupancy time, north, east, south, west, southwest, northeast, southeast, northwest). Note that the features include the wind directions produced by one-hot encoding.
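The feature counts of 16 and 13 follow from adding the eight one-hot wind-direction columns to the scalar features listed in the footnote. The sketch below shows that count and one way a wind direction in degrees could be turned into such a column vector. The 45-degree sectors centred on each compass point are an assumption; the source does not say how degrees were binned.

```python
# Column order of the eight direction features as listed in the footnote.
DIRECTIONS = ["north", "east", "south", "west",
              "southwest", "northeast", "southeast", "northwest"]

# Assumed clockwise 45-degree sectors, starting at north (0 degrees).
SECTORS = ["north", "northeast", "east", "southeast",
           "south", "southwest", "west", "northwest"]

ALL_SCALAR = ["NO2", "wind speed", "temperature", "humidity",
              "barometric pressure", "solar irradiance",
              "intensity", "occupancy time"]
SELECTED_SCALAR = ["NO2", "wind speed", "barometric pressure",
                   "intensity", "occupancy time"]


def sector(degrees):
    """Name of the compass sector that contains the given direction."""
    return SECTORS[int(((degrees % 360) + 22.5) // 45) % 8]


def one_hot_direction(degrees):
    """Eight-element 0/1 vector with a single 1 at the matching direction."""
    name = sector(degrees)
    return [1 if d == name else 0 for d in DIRECTIONS]


all_features = ALL_SCALAR + DIRECTIONS            # z1
selected_features = SELECTED_SCALAR + DIRECTIONS  # z2
print(len(all_features), len(selected_features))
print(one_hot_direction(182.0))  # 182.0 is the 2019 median wind direction
```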
Parameter optimisation with GridSearchCV.
| Parameters | Options |
|---|---|
| Number of Filters | 8, |
| Kernel Size | |
| Optimiser | RMSprop, |
| Merge Mode | ‘ |
| Number of Layers | 2, |
Prediction errors (RMSE, MAE) and runtime of the models for prediction of the next 6 hours using all features.
| Models | RMSE ( | MAE ( | Time |
|---|---|---|---|
| LSTM-FC | 38.89 | 32.17 | 4m15s |
| ConvLSTM | 32.95 | 32.04 | 33m15s |
| BiConvLSTM | 19.14 | 13.06 | 36m57s |
Prediction errors (RMSE, MAE) and runtime of the models for prediction of the next 6 hours using the selected features.
| Models | RMSE ( | MAE ( | Time |
|---|---|---|---|
| LSTM-FC | 15.68 | 13.54 | 3m58s |
| ConvLSTM | 15.11 | 11.9 | 27m53s |
| BiConvLSTM | 12.65 | 9.72 | 34m33s |
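The effect of feature selection and the runtime trade-off can be read off the two result tables with a little arithmetic. The sketch below copies the reported RMSE values and the selected-feature runtimes; the helper names are hypothetical and the calculation is only an illustration, not part of the original analysis.

```python
# RMSE values copied from the two result tables.
RMSE_ALL = {"LSTM-FC": 38.89, "ConvLSTM": 32.95, "BiConvLSTM": 19.14}
RMSE_SELECTED = {"LSTM-FC": 15.68, "ConvLSTM": 15.11, "BiConvLSTM": 12.65}

# Runtimes on the selected features, as reported.
TIME_SELECTED = {"LSTM-FC": "3m58s", "ConvLSTM": "27m53s", "BiConvLSTM": "34m33s"}


def percent_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


def to_seconds(text):
    """Convert a runtime written like '34m33s' into seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


for model in RMSE_ALL:
    drop = percent_reduction(RMSE_ALL[model], RMSE_SELECTED[model])
    runtime = to_seconds(TIME_SELECTED[model])
    print(f"{model}: RMSE reduced by {drop:.1f}%, runtime {runtime} s")
```

Every model improves when trained on the selected features, while BiConvLSTM keeps the lowest error and the longest runtime in both scenarios.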