Xue-Bo Jin1,2, Wen-Tao Gong1,2, Jian-Lei Kong1,2, Yu-Ting Bai1,2, Ting-Li Su1,2.
Abstract
Compared with mechanism-based modeling methods, data-driven modeling based on big data has become a popular research field in recent years because of its applicability. However, in practical applications, more data does not always yield a better forecasting model. Because big time-series data are noisy, conflicting, redundant, and inconsistent, adding data may instead reduce forecasting accuracy. This paper proposes a deep network that selects and interprets its input data to improve performance. First, a data self-screening layer (DSSL) with a maximal information distance coefficient (MIDC) is designed to select input data with high correlation and low redundancy; then, a variational Bayesian gated recurrent unit (VBGRU) is used to improve the noise resistance and robustness of the model. A verification experiment on 24 h PM2.5 concentration forecasting, conducted with Beijing's air quality and meteorological data, shows that the proposed model is more accurate than the other models compared.
Keywords: data self-screening layer; gated recurrent unit; maximal information distance coefficient; time-series data forecast; variational inference
Year: 2022 PMID: 35327846 PMCID: PMC8947458 DOI: 10.3390/e24030335
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
List of abbreviations.
| Full Name | Abbreviation |
|---|---|
| Data self-screening layer | DSSL |
| Variational Bayesian gated recurrent unit | VBGRU |
| Maximal information distance coefficient | MIDC |
| Maximal information coefficient | MIC |
| Distance entropy | DE |
| Gaussian process regression | GPR |
| Kullback–Leibler | KL |
| Long short-term memory network | LSTM |
| Gated recurrent unit | GRU |
| Convolutional long short-term memory network | ConvLSTM |
| Convolutional neural network-long short-term memory network | CNN-LSTM |
| Temporal convolutional network | TCN |
| Root mean square error | RMSE |
| Mean square error | MSE |
| Mean absolute error | MAE |
Figure 1. Deep Bayesian prediction network model framework with data self-screening layer.
Figure 2. DSSL calculation flow chart.
Figure 3. MIDC with different parameters.
Figure 4. VBGRU structure.
Figure 5. Schematic diagram of the weights of the variational Bayesian network.
Figure 6. Distribution of PM2.5 in Beijing.
Figure 7. PM2.5 changes in Beijing Wanshou West Palace and Yongdingmen Observatory.
Figure 8. MIC results between different air quality variables in the Guanyuan area.
Forecast results for PM2.5 in the Guanyuan area combined with air quality factors.
| Input data | MIC | MIDC | RMSE | MSE | MAE | Training time |
|---|---|---|---|---|---|---|
| PM2.5, AQI | 0.76 | 0.26 | 28.87 | 833.48 | 20.29 | 48.44 s |
| PM2.5, CO | 0.57 | 0.91 | 28.66 | 821.18 | 20.02 | 44.81 s |
Figure 9. Part of the prediction results of each model.
Figure 10. MAE, RMSE, and MSE of different models.
Evaluation results of different models on the same data set.
| Model | RMSE | MSE | MAE | Training time |
|---|---|---|---|---|
| CNN-LSTM | 29.76 | 886.45 | 20.51 | 69.11 s |
| LSTM | 30.66 | 942.24 | 20.60 | 36.73 s |
| GRU | 30.13 | 911.26 | 20.28 | 39.97 s |
| ConvLSTM | 31.45 | 990.17 | 21.61 | 78.26 s |
| TCN | 35.05 | 1233.42 | 24.43 | 119.54 s |
| Our proposed VBGRU | 28.59 | 817.12 | 19.78 | 44.81 s |
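
For readers who want to reproduce the error columns on their own forecasts, the three metrics listed in the abbreviation table (RMSE, MSE, MAE) can be computed as in the minimal sketch below. It uses the standard textbook definitions; the function name `forecast_errors` and the sample values are hypothetical and are not taken from the paper or its data.

```python
import math


def forecast_errors(y_true, y_pred):
    """Return (rmse, mse, mae) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Residual = forecast minus observation, one per time step
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / len(residuals)   # mean square error
    mae = sum(abs(r) for r in residuals) / len(residuals)  # mean absolute error
    rmse = math.sqrt(mse)                                  # root mean square error
    return rmse, mse, mae


# Hypothetical PM2.5 values, invented purely for illustration
observed = [35.0, 50.0, 80.0, 120.0]
forecast = [40.0, 45.0, 90.0, 100.0]
rmse, mse, mae = forecast_errors(observed, forecast)
print(f"RMSE={rmse:.2f}  MSE={mse:.2f}  MAE={mae:.2f}")
```

Because RMSE is the square root of MSE, the two columns rank models identically; MAE weights large errors less heavily, so it can rank models differently when a few forecasts miss by a wide margin.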
Figure 11. Violin plot of 10 cross-validation results for different models.