Xiaomei Sun, Haiou Zhang, Jian Wang, Chendi Shi, Dongwen Hua, Juan Li.
Abstract
Reliable and accurate streamflow forecasting plays a vital role in the optimal management of water resources. To improve the stability and accuracy of streamflow forecasting, a hybrid decomposition-ensemble model named VMD-LSTM-GBRT, which is sensitive to sampling, noise and long historical changes of streamflow, was established. The variational mode decomposition (VMD) algorithm was first applied to extract features, which were then learned by several long short-term memory (LSTM) networks. Simultaneously, an ensemble tree, a gradient boosting tree for regression (GBRT), was trained to model the relationships between the extracted features and the original streamflow. The outputs of these LSTMs were finally reconstructed by the GBRT model to obtain the forecast streamflow. A historical daily streamflow series (from 1/1/1997 to 31/12/2014) for Yangxian station, Han River, China, was investigated with the proposed model. VMD-LSTM-GBRT was compared with peer models in three respects: (1) the feature extraction algorithm, for which ensemble empirical mode decomposition (EEMD) was substituted; (2) the feature learning technique, for which deep neural networks (DNNs) and support vector machines for regression (SVRs) were substituted; and (3) the ensemble strategy, for which simple summation was substituted. The results indicate that the VMD-LSTM-GBRT model outperforms all peer models in terms of the root mean square error (RMSE = 36.3692), determination coefficient (R2 = 0.9890), mean absolute error (MAE = 9.5246) and peak percentage threshold statistics (PPTS(5) = 0.0391%). The proposed approach, which draws on the memory of long historical changes through deep feature representations, showed good stability and high prediction precision.
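The decompose-learn-ensemble workflow described in the abstract can be sketched as follows. This is a minimal illustration with stand-in components only: a moving-average split in place of VMD, a one-step persistence forecast in place of the per-mode LSTMs, and plain summation (the paper's "SUM" baseline) in place of the trained GBRT reconstruction. None of these stand-ins is the paper's implementation.

```python
def decompose(series, window=3):
    """Stand-in for VMD: split the series into a smooth trend mode
    (moving average) and a residual mode. Real VMD yields K band-limited
    intrinsic mode functions (IMFs)."""
    trend = [
        sum(series[max(0, i - window + 1): i + 1])
        / len(series[max(0, i - window + 1): i + 1])
        for i in range(len(series))
    ]
    residual = [series[i] - trend[i] for i in range(len(series))]
    return [trend, residual]

def predict_mode(mode):
    """Stand-in for a trained per-mode LSTM: a one-step persistence forecast."""
    return mode[-1]

def ensemble(mode_predictions):
    """Stand-in for the trained GBRT reconstruction: simple summation,
    i.e. the 'SUM' baseline strategy the paper compares against."""
    return sum(mode_predictions)

def forecast(series):
    """Full pipeline: decompose, predict each mode, recombine."""
    modes = decompose(series)
    return ensemble([predict_mode(m) for m in modes])
```

In the paper's actual model, each IMF gets its own LSTM and the GBRT learns a data-driven recombination of the per-mode predictions instead of summing them.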
Year: 2022 PMID: 35017569 PMCID: PMC8752851 DOI: 10.1038/s41598-021-03725-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1 (a) Chain-like structure of the RNN. Because of the connections between hidden units, information can be passed from one time step to the next. (b) A graphical representation of the LSTM recurrent network with the memory cell block.
Figure 2 Application of the proposed model VMD-LSTM-GBRT.
Figure 3 Location of the Yangxian hydrological station.
Figure 4 Daily streamflow of the Yangxian station from 1/1/1997 to 31/12/2014.
Formulas for error analysis criteria.

| Error analysis criteria | Definition |
|---|---|
| Root mean square error | $\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2}$ |
| Mean absolute error | $\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left\lvert y_i-\hat{y}_i\right\rvert$ |
| Determination coefficient | $R^2=1-\frac{\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i-\bar{y}\right)^2}$ |
| Peak percentage threshold statistics (%) | $\mathrm{PPTS}(\gamma)=\frac{100}{G}\sum_{i=1}^{G}\left\lvert\frac{y_i-\hat{y}_i}{y_i}\right\rvert$ |

N is the number of samples, $y_i$ is the original series, $\bar{y}$ is the average of the original series and $\hat{y}_i$ is the predicted series. For PPTS($\gamma$), the observations are sorted in descending order and the sum runs over the G samples within the top $\gamma\%$ of peak values.
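The four criteria can be sketched as plain-Python helpers. `ppts` implements the peak statistic as the mean absolute percentage error over the largest `gamma`% of observed values, which is an assumed reading of the criterion, not a formula taken verbatim from the paper.

```python
import math

def rmse(y, yhat):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def r2(y, yhat):
    """Determination coefficient: 1 - SS_res / SS_tot."""
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def ppts(y, yhat, gamma=5):
    """Peak percentage threshold statistic (assumed definition):
    mean absolute percentage error over the top gamma% of observed values."""
    pairs = sorted(zip(y, yhat), key=lambda p: p[0], reverse=True)
    g = max(1, round(len(pairs) * gamma / 100))
    return 100.0 / g * sum(abs((a - b) / a) for a, b in pairs[:g])
```

Lower RMSE, MAE and PPTS and higher R2 indicate better forecasts, which is how the comparison tables below rank the models.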
Figure 5 VMD decomposition results: (a) the decomposition sequence waveform and (b) the frequency spectrum representation.
Figure 6 Schematic diagram of the center frequency aliasing of the last IMF: (a) the last two sequence waveforms and (b) the frequency spectrum representations. The area surrounded by the red rectangular border indicates the aliasing.
Figure 7 PACFs of the subseries of daily streamflow during the period 1997/01/01 to 2014/12/31 for the Yangxian hydrological station.
Figure 8 PPTS(5) of different LSTM structures for predicting the streamflow of IMF1 during the training and development periods: (a) line chart plots of PPTS(5) for different hidden layers and (b) boxplots of the optimal structure of each hidden layer.
Figure 9 Forecasting results of the sub-series during the testing period.
Results of evaluation criteria with different hidden layers and hidden units for sub-sequences.
| Sequence | Hidden layers | Hidden units | Training PPTS(5) (%) | Training R2 | Development PPTS(5) (%) | Development R2 |
|---|---|---|---|---|---|---|
| IMF1 | 1 | 15 | 0.0025 | 1.0000 | 0.0024 | 0.9999 |
| IMF2 | 2 | 22 | 0.0396 | 0.9999 | 0.0227 | 0.9999 |
| IMF3 | 1 | 16 | 0.2684 | 0.9999 | 0.2163 | 0.9997 |
| IMF4 | 3 | 20 | 0.7646 | 0.9997 | 0.7156 | 0.9966 |
| IMF5 | 4 | 20 | 0.8728 | 0.9993 | 0.2523 | 0.9983 |
| IMF6 | 2 | 21 | 0.5031 | 0.9986 | 1.9367 | 0.9960 |
| IMF7 | 2 | 23 | 1.2368 | 0.9973 | 0.4885 | 0.9943 |
| IMF8 | 5 | 20 | 1.8371 | 0.9952 | 4.4562 | 0.9832 |
| IMF9 | 1 | 25 | 1.5380 | 0.9972 | 2.1116 | 0.9897 |
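The per-IMF structure selection implied by the table above can be sketched as a grid search over hidden-layer and hidden-unit counts, scored on the development set. The `evaluate` callback is a hypothetical stand-in for training an LSTM with the given structure and returning its development-set PPTS(5).

```python
import itertools

def select_structure(layer_range, unit_range, evaluate):
    """Grid-search (hidden_layers, hidden_units) pairs; `evaluate` trains
    a model with that structure and returns its development-set PPTS(5).
    The pair with the lowest score wins."""
    candidates = itertools.product(layer_range, unit_range)
    return min(candidates, key=lambda cfg: evaluate(*cfg))
```

Each subseries gets its own search, which is why the table reports a different (layers, units) pair for each IMF.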
Figure 10 Comparison of prediction results on the test dataset using the different ensemble techniques, GBRT and summation: (a) plots of the prediction results and records, (b) scatter plots for the test set (15/3/2013–31/12/2014), and (c) plots of the prediction results for the period 07/09/2014–28/09/2014.
Figure 11 Comparison of prediction results on the test dataset using the different decomposition algorithms VMD and EEMD: (a) forecasting results, (b) scatter plots for the testing data (15/3/2013–31/12/2014), and (c) plots of the prediction results for the period 07/09/2014–28/09/2014.
Figure 12 Comparison of forecasting results on the test dataset using the different feature learning models LSTM, DNN and SVR: (a) forecasting results, (b) scatter plots for the testing data (15/3/2013–31/12/2014), and (c) plots of the forecasting results during the period 07/09/2014–28/09/2014.
Comparison of the forecasting performances using different models.
| Model | RMSE | R2 | MAE | PPTS(5) (%) |
|---|---|---|---|---|
| VMD-LSTM-GBRT | 36.3692 | 0.9890 | 9.5246 | 0.0391 |
| VMD-LSTM-SUM | 67.8297 | 0.9619 | 27.8412 | 0.1500 |
| EEMD-LSTM-GBRT | 87.4506 | 0.9366 | 22.0321 | 0.0883 |
| VMD-DNN-GBRT | 44.9735 | 0.9832 | 12.1853 | 0.0451 |
| VMD-SVR-GBRT | 47.0555 | 0.9816 | 12.8919 | 0.0472 |
| Linear regression | 224.5310 | 0.5820 | 69.0201 | 0.2740 |
| Multilayer perceptron | 225.1806 | 0.5796 | 64.2869 | 0.1858 |
Figure 13 Forecasting results on the test dataset using VMD-LSTM-SUM: (a) forecasting results, (b) scatter plots for the testing data (15/3/2013–31/12/2014), and (c) plots of the forecasting results during the period 07/09/2014–28/09/2014.
Forecasting performance of the VMD-LSTM-SUM model at the Huaxian hydrological station.
| Model | RMSE | R2 | MAE | PPTS(5) (%) |
|---|---|---|---|---|
| VMD-LSTM-SUM | 57.5052 | 0.9754 | 22.5159 | 0.0725 |
| Linear regression | 148.0834 | 0.8363 | 43.1553 | 0.0951 |
| Multilayer perceptron | 141.5619 | 0.8505 | 40.7676 | 0.0931 |