Shahrokh Shahi, Flavio H. Fenton, Elizabeth M. Cherry.
Abstract
In recent years, machine-learning techniques, particularly deep learning, have outperformed traditional time-series forecasting approaches in many contexts, including univariate and multivariate prediction. This study investigates the capability of (i) gated recurrent neural networks, including long short-term memory (LSTM) and gated recurrent unit (GRU) networks, (ii) reservoir computing (RC) techniques, such as echo state networks (ESNs) and hybrid physics-informed ESNs, and (iii) the nonlinear vector autoregression (NVAR) approach, recently introduced as the next-generation RC, for the prediction of chaotic time series, and compares their performance in terms of accuracy, efficiency, and robustness. We apply the methods to time series obtained from two widely used chaotic benchmarks, the Mackey–Glass and Lorenz-63 models, from two other chaotic datasets representing a bursting neuron and the dynamics of the El Niño Southern Oscillation, and from one experimental dataset representing a cardiac voltage time series with complex dynamics. We find that even though gated RNN techniques have been successful in forecasting time series generally, they can fall short in predicting chaotic time series for the methods, datasets, and ranges of hyperparameter values considered here. In contrast, for the chaotic datasets studied, the reservoir computing and NVAR techniques are more computationally efficient and offer more promise for long-term prediction of chaotic time series.
Keywords: Chaotic time series; Deep learning; Echo state networks; Nonlinear vector autoregression; Recurrent neural networks; Reservoir computing
Year: 2022 PMID: 35755176 PMCID: PMC9230140 DOI: 10.1016/j.mlwa.2022.100300
Source DB: PubMed Journal: Mach Learn Appl ISSN: 2666-8270
Fig. 1. Architectures of memory cells in gated recurrent neural networks. (a) Long short-term memory. (b) Gated recurrent unit.
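As a concrete reference for the LSTM memory cell in Fig. 1(a), a single cell update can be sketched in a few lines of NumPy. The function name and the stacked-gate weight layout below are illustrative choices, not taken from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h, c, W, U, b):
    """One LSTM memory-cell step. The input (i), forget (f), and output (o)
    gates and the candidate update (g) share stacked weights:
    W is (4n, d), U is (4n, n), b is (4n,) for hidden size n, input size d."""
    n = h.size
    z = W @ x + U @ h + b
    i, f, o = (sigmoid(z[k * n:(k + 1) * n]) for k in range(3))
    g = np.tanh(z[3 * n:])
    c_new = f * c + i * g            # new cell (memory) state
    h_new = o * np.tanh(c_new)       # new hidden state
    return h_new, c_new
```

The GRU cell of panel (b) differs only in merging the cell and hidden states and using two gates (update and reset) instead of three.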
Fig. 2. Components of reservoir computing approaches, including (a) the baseline ESN, (b) CESN, and (c) HESN. In these architectures, the input and output signals can be either univariate or multivariate time series.
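The baseline ESN of Fig. 2(a) can be sketched minimally as follows. The hyperparameters (input weight scale σ, spectral radius ρ, leaking rate α, connection probability, and ridge regularization λ) mirror those in the grid-search table below, while the function name, default values, and washout length are illustrative assumptions:

```python
import numpy as np

def esn_fit_predict(u_train, y_train, u_test, n_res=100, sigma=0.5, rho=0.9,
                    alpha=0.3, p_conn=0.1, lam=1e-5, washout=50, seed=0):
    """Minimal baseline ESN: drive the reservoir with the training input,
    fit a ridge-regression readout, then forecast in closed loop."""
    rng = np.random.default_rng(seed)
    d = u_train.shape[1]
    W_in = sigma * rng.uniform(-1, 1, (n_res, d))            # scaled input weights
    W = rng.uniform(-1, 1, (n_res, n_res))
    W *= rng.random((n_res, n_res)) < p_conn                 # sparse reservoir graph
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))          # set spectral radius
    def step(x, u):                                          # leaky-integrator update
        return (1 - alpha) * x + alpha * np.tanh(W_in @ u + W @ x)
    x = np.zeros(n_res)
    X = np.empty((len(u_train), n_res))
    for t, u in enumerate(u_train):
        x = step(x, u)
        X[t] = x
    # ridge readout, discarding the washout transient:
    Xw, Yw = X[washout:], y_train[washout:]
    W_out = Yw.T @ Xw @ np.linalg.inv(Xw.T @ Xw + lam * np.eye(n_res))
    preds = np.empty((len(u_test), d))
    u = y_train[-1]
    for t in range(len(u_test)):                             # closed-loop forecasting
        x = step(x, u)
        u = W_out @ x
        preds[t] = u
    return preds
```

The CESN of panel (b) partitions this single reservoir into sparsely interconnected clusters, and the HESN of panel (c) additionally feeds the output of an imperfect knowledge-based model into the reservoir.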
Fig. 3. The Mackey–Glass time series. (a) Generated time series including unused pre-training data (gray), training data (blue), and testing (prediction) data (black). (b) Zoomed-in section corresponding to the shaded region in panel (a). (c) Mackey–Glass time series (solid) and the imperfect knowledge-based model (dashed). (d) Zoomed-in section corresponding to the shaded region in panel (c) demonstrating the difference between the generated time series and the imperfect knowledge-based model.
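The Mackey–Glass series is generated from a scalar delay differential equation. A minimal forward-Euler sketch with the standard chaotic parameter choices (β = 0.2, γ = 0.1, n = 10, τ = 17) is shown below; the step size and constant initial history are illustrative and not necessarily those used for the figure:

```python
import numpy as np

def mackey_glass(n_steps, dt=0.1, beta=0.2, gamma=0.1, tau=17.0, n=10, x0=1.2):
    """Euler integration of dx/dt = beta*x(t-tau)/(1 + x(t-tau)^n) - gamma*x(t)."""
    delay = int(round(tau / dt))            # delay expressed in time steps
    x = np.full(n_steps + delay, x0)        # constant history as initial condition
    for t in range(delay, n_steps + delay - 1):
        x_tau = x[t - delay]
        x[t + 1] = x[t] + dt * (beta * x_tau / (1 + x_tau**n) - gamma * x[t])
    return x[delay:]
```

With τ = 17 the series is chaotic; smaller delays yield periodic behavior.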
Fig. 4. The Lorenz system time series. (a) The unused pre-training data are shown in gray, and the training data are in purple, green, and blue, indicating the x, y, and z variables, respectively. The testing data are in black. (b) Zoomed-in section corresponding to the shaded region in panel (a). (c) Lorenz system time series (solid) and the imperfect knowledge-based model (dashed). (d) Zoomed-in section corresponding to the shaded region in panel (c) demonstrating the difference between the generated time series and the imperfect knowledge-based model.
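The Lorenz-63 benchmark uses the canonical parameters σ = 10, ρ = 28, β = 8/3; a minimal forward-Euler sketch is shown below, with the step size and initial condition as illustrative assumptions:

```python
import numpy as np

def lorenz63(n_steps, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0,
             init=(1.0, 1.0, 1.0)):
    """Forward-Euler integration of the Lorenz-63 system (x, y, z)."""
    traj = np.empty((n_steps, 3))
    x, y, z = init
    for t in range(n_steps):
        dx = sigma * (y - x)          # convection rate
        dy = x * (rho - z) - y        # horizontal temperature variation
        dz = x * y - beta * z         # vertical temperature variation
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        traj[t] = (x, y, z)
    return traj
```

A higher-order integrator (e.g., RK4) would be preferred for producing reference data, but the Euler sketch suffices to show the structure of the system.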
Fig. 5. The bursting Morris–Lecar time series. (a) The unused pre-training data are shown in gray, and the training data are in purple, green, and blue, indicating the V, w, and u variables, respectively. The testing data are in black. (b) Zoomed-in section corresponding to the shaded region in panel (a). (c) The bursting Morris–Lecar time series (solid) and the imperfect knowledge-based model (dashed). (d) Zoomed-in section corresponding to the shaded region in panel (c) demonstrating the difference between the generated time series and the imperfect knowledge-based model.
Fig. 6. The ENSO time series. (a) The unused pre-training data are shown in gray, and the training data are in purple, green, and blue, indicating the u variable and the two temperature (T) variables, respectively. The testing data are in black. (b) Zoomed-in section corresponding to the shaded region in panel (a). (c) The ENSO time series (solid) and the imperfect knowledge-based model (dashed). (d) Zoomed-in section corresponding to the shaded region in panel (c) demonstrating the difference between the generated time series and the imperfect knowledge-based model.
Fig. 7. Experimental cardiac voltage time series featuring irregular action potentials. (a) Voltage time series including unused pre-training data (gray), training data (blue), and testing data (black). (b) Zoomed-in section corresponding to the shaded region in panel (a). (c) Experimental cardiac action potential time series (solid) and the imperfect knowledge-based model (dashed). (d) Zoomed-in section corresponding to the shaded region in panel (c), where the difference between the generated time series and the imperfect knowledge-based model can be observed.
Hyperparameter values used for the grid-search optimization for each prediction method. For the gated RNNs, the initial learning rate, maximum number of epochs, and regularization factor are the effective hyperparameters for training the networks with the Adam optimizer, while the number of layers determines the architecture of each network for a given number of hidden units. The dropout probability controls the dropping rate of the dropout layers used as a regularization technique to prevent overfitting. In the RC techniques, i.e., ESN, CESN, and HESN, the input weight scale σ is the scalar multiplied by the jth input time series (1 ≤ j ≤ d) to adjust the magnitude of the input signal. Therefore, for univariate time series, e.g., the MG dataset, d = 1 and the input weight scale is a single scalar, while for multivariate time series, e.g., the Lorenz dataset, σ consists of three scalar values that need to be tuned. Similarly, the HESN approach requires an additional set of weight scales for the knowledge-based model input(s). The spectral radius ρ scales the reservoir weight matrix in ESNs. The amount of excitation discarded by the leaky-integrator neurons is specified by the leaking rate α. The sparsity of the reservoir graph is controlled by the connection probability pr, defined as the probability of a connection between any two hidden units in the reservoir. Furthermore, the CESN approach requires an additional hyperparameter, the inter-cluster connection probability, indicating the sparsity of the connections between each pair of sub-reservoirs. The regularization parameter λ is the ridge-regression regularization factor used to obtain the readout weights. In the NVAR approach, the hyperparameter skip (s) controls the number of steps skipped between every two entries of the delay-embedding vectors.
| Methods | Parameters | Values |
|---|---|---|
| LSTM | Number of layers | {1, 2, 4} |
| | Initial learning rate | {0.001, 0.002, 0.005, 0.010, 0.050, 0.100} |
| | Dropout probability | {0.00, 0.05, 0.10, 0.20, 0.50} |
| | Max number of epochs | {5, 10, 15, 20, 30, 50, 100} |
| | Regularization factor | {10−6, 10−5, 10−4, 10−3} |
| GRU | Number of layers | {1, 2, 4} |
| | Initial learning rate | {0.001, 0.002, 0.005, 0.010, 0.050, 0.100} |
| | Dropout probability | {0.00, 0.05, 0.10, 0.20, 0.50} |
| | Max number of epochs | {5, 10, 15, 20, 30, 50, 100} |
| | Regularization factor | {10−6, 10−5, 10−4, 10−3} |
| ESN | Input weight scale (σ) | {0.02, 0.05, 0.10, 0.20, 0.50, 0.80} |
| | Spectral radius (ρ) | {0.80, 0.85, 0.90, 0.99, 1.05, 1.15, 1.25, 1.55} |
| | Leaking rate (α) | {0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00} |
| | Regularization (λ) | {10−7, 10−6, 10−5, 10−4, 10−3, 10−2, 10−1} |
| | Connection probability (pr) | {0.01, 0.02, 0.05, 0.10, 0.15, 0.20} |
| CESN | Number of clusters | {2, 3, 4, 5} |
| | Input weight scale (σ) | {0.02, 0.05, 0.10, 0.20, 0.50, 0.80} |
| | Spectral radius (ρ) | {0.80, 0.85, 0.90, 0.99, 1.05, 1.15, 1.25, 1.55} |
| | Leaking rate (α) | {0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00} |
| | Regularization (λ) | {10−7, 10−6, 10−5, 10−4, 10−3, 10−2, 10−1} |
| | Intra-cluster connection probability | {0.60, 0.70, 0.80, 0.85, 0.90, 0.95, 0.98} |
| | Inter-cluster connection probability | {0.01, 0.02, 0.05, 0.10, 0.15, 0.20} |
| HESN | Input weight scale (σ) | {0.02, 0.05, 0.10, 0.20, 0.50, 0.80} |
| | Knowledge-based input weight scale | {0.02, 0.05, 0.10, 0.20, 0.50, 0.80} |
| | Spectral radius (ρ) | {0.80, 0.85, 0.90, 0.99, 1.05, 1.15, 1.25, 1.55} |
| | Leaking rate (α) | {0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00} |
| | Regularization (λ) | {10−7, 10−6, 10−5, 10−4, 10−3, 10−2, 10−1} |
| | Connection probability (pr) | {0.01, 0.02, 0.05, 0.10, 0.15, 0.20} |
| NVAR | Skip (s) | {2, 3, 4, 5, 6, 7, 8, 10, 15, 20, 25, 30} |
| | Regularization (λ) | {10−7, 10−6, 10−5, 10−4, 10−3, 10−2, 10−1} |
Fig. 8. Components of reservoir computing approaches for modeling cardiac action potential time series, including (a) the baseline ESN, (b) CESN, and (c) HESN.
Fig. 9. Mackey–Glass dataset forecasting results obtained by the six methods using a fixed network size of 100 neurons for the gated RNNs and ESN models and a computationally equivalent delay size for the NVAR approach. The reference test data are shown in black and the predictions in color. Absolute errors of the predictions are presented in the bottom subplot, with color corresponding to each prediction method.
Fig. 10. Comparison of RMSE (top) and computational time (bottom) for each method and network size tested for the Mackey–Glass dataset.
Fig. 11. Lorenz system dataset forecasting results obtained by the six methods using a fixed network size of 100 neurons for the gated RNNs and ESN models and an equivalent delay size for the NVAR approach. The reference test data are shown in black and the predictions in color. Absolute errors of the predictions are presented in the bottom subplot, with color corresponding to each prediction method. Note that the reported error is the mean absolute error over the three input time series.
Fig. 12. Comparison of RMSE (top) and computational time (bottom) for each method and network size tested for the Lorenz system dataset.
Fig. 13. Morris–Lecar dataset forecasting results obtained by the six methods using a fixed network size of 100 neurons for the gated RNNs and ESN models and an equivalent delay size for the NVAR approach. The reference test data are shown in black and the predictions in color. Absolute errors of the predictions are presented in the bottom subplot, with color corresponding to each prediction method. Note that the reported error is the mean absolute error over the three input time series.
Fig. 14. Comparison of RMSE (top) and computational time (bottom) for each method and network size tested for the Morris–Lecar dataset.
Fig. 15. ENSO dataset forecasting results obtained by the six methods using a fixed network size of 100 neurons for the gated RNNs and ESN models and an equivalent delay size for the NVAR approach. The reference test data are shown in black and the predictions in color. Absolute errors of the predictions are presented in the bottom subplot, with color corresponding to each prediction method. Note that the reported error is the mean absolute error over the three input time series.
Fig. 16. Comparison of RMSE (top) and computational time (bottom) for each method and network size tested for the ENSO dataset.
Fig. 17. Experimental dataset forecasting results obtained by the six methods using a fixed network size of 100 neurons for the gated RNNs and ESN models and an equivalent delay size for the NVAR approach. The reference test data are shown in black and the predictions in color. Absolute errors of the predictions are presented in the bottom subplot, with color corresponding to each prediction method.
Fig. 18. Experimental dataset APD forecasting results obtained by the six methods using a fixed network size of 100 neurons for the gated RNNs and ESN models and an equivalent delay size for the NVAR approach. The reference APD values are shown in black and the predicted values in color.
Fig. 19. Comparison of RMSE (top) and computational time (bottom) for each method and network size tested for the experimental dataset.
Fig. 20. Experimental dataset action potential forecasting results obtained by the NVAR method using larger values for delays and skipped steps, showing the reference test data (black) and the prediction results (red).
Fig. 21. Experimental dataset APD forecasting results obtained by the NVAR method using larger values for delays and skipped steps. The reference APD values are shown in black and the predicted values in red.