Hongjing Bi, Lilei Lu, Yizhen Meng.
Abstract
Multivariate time series long-term forecasting has long been a subject of research in fields such as economics, finance, and traffic. In recent years, attention-based recurrent neural networks (RNNs) have attracted interest for their ability to reduce error accumulation. However, existing attention-based RNNs fail to eliminate the negative influence of irrelevant factors on prediction, and they ignore the conflict between exogenous factors and the target factor. To tackle these problems, we propose a novel Hierarchical Attention Network (HANet) for multivariate time series long-term forecasting. First, HANet designs a factor-aware attention network (FAN) and uses it as the first component of the encoder. FAN weakens the negative impact of irrelevant exogenous factors on predictions by assigning them small weights. HANet then proposes a multi-modal fusion network (MFN) as the second component of the encoder. MFN employs a specially designed multi-modal fusion gate to adaptively select how much of the current time step's representation comes from the target factor and how much from the exogenous factors. Experiments on two real-world datasets reveal that HANet not only outperforms state-of-the-art methods but also provides interpretability for its predictions.
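The abstract's description of FAN — downweighting irrelevant exogenous series via attention — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the projection parameters `W` and `v` and their shapes are assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def factor_attention(X, W, v):
    """Score each exogenous series and reweight it.

    X : (n, T) array, one row per exogenous series over a window of length T.
    W : (T, d) projection matrix, v : (d,) scoring vector (hypothetical shapes).
    Returns the reweighted series and the attention weights.
    """
    scores = X @ W @ v              # (n,) one scalar relevance score per factor
    alpha = softmax(scores)         # weights sum to 1; irrelevant factors get small alpha
    return alpha[:, None] * X, alpha
```

Because the weights form a distribution over factors, inspecting `alpha` is also what gives the model the kind of interpretability shown in Fig. 4.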
Keywords: Deep neural network; Hierarchical attention; Long-term forecasting; Multi-modal fusion; Multivariate time series
Year: 2022 PMID: 35730045 PMCID: PMC9204070 DOI: 10.1007/s10489-022-03825-5
Source DB: PubMed Journal: Appl Intell (Dordr) ISSN: 0924-669X Impact factor: 5.019
Fig. 1 A graphical illustration of the Hierarchical Attention Network (HANet). The encoder of HANet consists of three components, i.e., the factor-aware attention network (FAN), the multi-modal fusion network (MFN), and an LSTM. Here, x^k is the k-th exogenous series within a window of size T. y_t is the target factor at time step t. h_t is the high-level semantic representation of x_t, where x_t = {x_t^1, …, x_t^n} ∈ ℝ^n is a vector of n exogenous factors at time t. MFN is the multi-modal fusion network, which generates a hidden representation z_t by fusing the target factor y_t and the hidden state h_t. d_i is the hidden state of decoder unit i, c_i is the context vector generated by temporal attention. ŷ is the predicted value
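The multi-modal fusion gate in the caption — fusing the target factor y_t with the exogenous hidden state h_t into z_t — can be sketched as a per-dimension gated mixture. This is a hedged sketch under assumed parameter names (`Wy`, `Wg`, `b`) and shapes; the paper's exact gate equations may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fusion_gate(y_t, h_t, Wy, Wg, b):
    """Gated fusion of scalar target y_t and exogenous hidden state h_t (d,).

    Wy : (d,) embeds the scalar target into the hidden space.
    Wg : (d, 2d), b : (d,) parameterize the gate (assumed shapes).
    The gate g decides, per dimension, how much of z_t comes from the
    target factor versus the exogenous representation.
    """
    y_proj = np.tanh(Wy * y_t)                            # (d,) target embedding
    g = sigmoid(Wg @ np.concatenate([y_proj, h_t]) + b)   # (d,) gate in (0, 1)
    return g * y_proj + (1.0 - g) * h_t                   # convex per-dim mixture
```

Because each output dimension is a convex combination, z_t always lies between the target embedding and the exogenous state, which is one way to resolve the "conflict" between the two sources mentioned in the abstract.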
Fig. 2 Temporal attention
Performance of different methods on the Chlorophyll dataset at four predictive horizons (each MAE/RMSE column pair corresponds to one horizon, from shortest to longest)
| Methods | MAE | RMSE | MAE | RMSE | MAE | RMSE | MAE | RMSE |
|---|---|---|---|---|---|---|---|---|
| Seq2Seq | 0.4823 | 0.7764 | 0.6439 | 0.9819 | 0.7745 | 1.1408 | 0.9086 | 1.2646 |
| DA-RNN | 0.4853 | 0.6659 | 0.6297 | 0.8388 | 0.6183 | 0.7855 | 0.8285 | 1.1767 |
| GED | 0.4267 | 0.6011 | 0.4977 | 0.7035 | 0.7663 | 0.8845 | 1.1904 | |
| HCA-GRU | 0.4478 | 0.6482 | 0.5082 | 0.7083 | 0.5560 | 0.7805 | 0.7038 | 1.1523 |
| DSTP-RNN | 0.4311 | 0.6121 | 0.5874 | 0.7425 | 0.5673 | 0.8113 | 0.7221 | 1.1452 |
| STANet | 0.4107 | 0.5823 | 0.5077 | 0.6988 | 0.5782 | 0.7694 | 0.7577 | 1.1232 |
| MsANet | 0.4415 | 0.6333 | 0.4986 | 0.7056 | 0.5604 | 0.7739 | 0.7346 | 1.1656 |
| HANet | 0.5565 | |||||||
Performance of different methods on the PM2.5 dataset at four predictive horizons (each MAE/RMSE column pair corresponds to one horizon, from shortest to longest)
| Methods | MAE | RMSE | MAE | RMSE | MAE | RMSE | MAE | RMSE |
|---|---|---|---|---|---|---|---|---|
| Seq2Seq | 14.3048 | 24.7627 | 29.0523 | 48.6893 | 45.4879 | 66.6200 | 58.7161 | 81.0068 |
| DA-RNN | 12.0539 | 21.3099 | 26.3209 | 40.0097 | 37.0001 | 51.2154 | 88.6673 | |
| GED | 13.6901 | 22.6873 | 25.4548 | 40.3872 | 39.8709 | 62.1063 | 56.3646 | 78.0019 |
| HCA-GRU | 11.1151 | 18.9156 | 25.1317 | 39.2720 | 37.9419 | 55.9917 | 50.0695 | |
| DSTP-RNN | 11.9881 | 20.4029 | 26.7392 | 42.0621 | 38.4374 | 56.7373 | 50.3837 | 71.3543 |
| STANet | 12.0001 | 19.6192 | 27.2872 | 49.8296 | 38.6140 | 58.6099 | 53.2927 | 74.0792 |
| MsANet | 13.0729 | 20.2910 | 25.1451 | 39.2832 | 37.7276 | 56.5410 | 50.3315 | 73.7386 |
| HANet | 54.8148 | 71.795 | ||||||
Fig. 3 Performance comparisons among different methods and different datasets when the horizon is 24
Ablation experiments on the Chlorophyll dataset at four predictive horizons (each MAE/RMSE column pair corresponds to one horizon, from shortest to longest)
| Methods | MAE | RMSE | MAE | RMSE | MAE | RMSE | MAE | RMSE |
|---|---|---|---|---|---|---|---|---|
| Seq2Seq | 0.4823 | 0.7764 | 0.6439 | 0.9819 | 0.7745 | 1.1408 | 0.9086 | 1.2646 |
| GED | 0.4267 | 0.6011 | 0.4977 | 0.7035 | 0.7663 | 0.8845 | 1.1904 | |
| HA-LSTM | 0.4112 | 0.6065 | 0.4961 | 0.7440 | 0.5684 | 0.7622 | 0.7070 | 1.1314 |
| HANet | 0.5565 | |||||||
Ablation experiments on the PM2.5 dataset at four predictive horizons (each MAE/RMSE column pair corresponds to one horizon, from shortest to longest)
| Methods | MAE | RMSE | MAE | RMSE | MAE | RMSE | MAE | RMSE |
|---|---|---|---|---|---|---|---|---|
| Seq2Seq | 14.3048 | 24.7627 | 29.0523 | 48.6893 | 45.4879 | 66.6200 | 58.7161 | 81.0068 |
| GED | 13.6901 | 22.6873 | 25.4548 | 40.3872 | 39.8709 | 62.1063 | 56.3646 | 78.0019 |
| HA-LSTM | 12.4656 | 21.8553 | 25.2919 | 40.7350 | 37.0144 | 55.4472 | 51.5809 | 72.2052 |
| HANet | 54.8148 | |||||||
Fig. 4 Weight distribution of factor-aware attention
Paired two-tailed t-tests of each baseline against HANet (significance level α = 0.05)
| Methods | p value (PM2.5) | t-statistic (PM2.5) | avg. RMSE (PM2.5) | p value (Chlorophyll) | t-statistic (Chlorophyll) | avg. RMSE (Chlorophyll) |
|---|---|---|---|---|---|---|
| Seq2Seq | 0.0000 | −4.5627 | 55.2697 | 0.0002 | −5.8847 | 1.0409 |
| DA-RNN | 0.0053 | −2.7922 | 51.1163 | 0.0005 | −3.0309 | 0.8667 |
| GED | 0.0004 | −3.5140 | 50.7957 | 0.0000 | −4.6342 | 0.8153 |
| HCA-GRU | 0.0498 | −1.9619 | 46.1631 | 0.0128 | −2.4871 | 0.8223 |
| DSTP-RNN | 0.0432 | −2.0220 | 47.6367 | 0.1268 | −1.5271 | 0.8277 |
| STANet | 0.0030 | −3.1048 | 50.5345 | 0.0010 | −3.2949 | 0.7934 |
| MsANet | 0.0024 | −3.0429 | 47.4635 | 0.0272 | −2.5780 | 0.8196 |
| HA-LSTM | 0.0053 | −2.7877 | 47.5607 | 0.1064 | −1.6150 | 0.8410 |
| HANet | – | – | 45.9319 | – | – | 0.7813 |
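The t-statistics above come from a paired two-tailed t-test on matched error samples. As a minimal illustration (not the authors' evaluation script), the statistic can be computed with numpy alone; converting it to a p value requires the t distribution's CDF, e.g. `scipy.stats.t.sf(abs(t), df) * 2`, which is not reimplemented here.

```python
import numpy as np

def paired_t(a, b):
    """Paired t-statistic for matched samples a and b (e.g. per-run RMSEs).

    With a = HANet's errors and b = a baseline's errors, a negative t
    indicates a's mean is lower. Returns (t, degrees of freedom).
    """
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = d.size
    se = d.std(ddof=1) / np.sqrt(n)   # standard error of the mean difference
    return d.mean() / se, n - 1
```

For example, `paired_t([1, 2, 3, 4], [2, 3, 5, 4])` gives t = −√6 ≈ −2.449 with 3 degrees of freedom.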