| Literature DB >> 35161585 |
Lei Huang 1,2, Feng Mao 1,2, Kai Zhang 1,3, Zhiheng Li 1.
Abstract
Multivariate time series forecasting has long been a research hotspot because of its wide range of application scenarios. However, the dynamics and multiple patterns of spatiotemporal dependencies make this problem challenging. Most existing methods suffer from two major shortcomings: (1) they ignore the local context semantics when modeling temporal dependencies; (2) they lack the ability to capture spatial dependencies with multiple patterns. To tackle these issues, we propose a novel Transformer-based model for multivariate time series forecasting, called the spatial-temporal convolutional Transformer network (STCTN). STCTN mainly consists of two novel attention mechanisms that respectively model temporal and spatial dependencies. A local-range convolutional attention mechanism is proposed to simultaneously attend to both global and local-context temporal dependencies at the sequence level, which addresses the first shortcoming. A group-range convolutional attention mechanism is designed to model multiple spatial dependency patterns at the graph level and to reduce computation and memory complexity, which addresses the second shortcoming. A continuous positional encoding is further proposed to link the historical observations and the predicted future values, which also improves forecasting performance. Extensive experiments on six real-world datasets show that the proposed STCTN outperforms state-of-the-art methods and is more robust to nonsmooth time series data.
Keywords: attention mechanism; convolutional Transformer; multivariate time series forecasting; spatiotemporal
Year: 2022 PMID: 35161585 PMCID: PMC8838990 DOI: 10.3390/s22030841
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
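The abstract only sketches the two attention mechanisms at a high level. As a rough illustration of the local-range idea (convolving the inputs so that attention scores compare local temporal contexts rather than single time steps), the NumPy sketch below computes queries and keys with a causal 1D convolution before standard scaled dot-product attention. The kernel size, weight shapes, and function names are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of convolution-based self-attention for local temporal
# context (illustrative only; not the exact STCTN formulation).
import numpy as np

def causal_conv1d(x, w):
    """Causal 1D convolution over time: x is (T, d), w is (kernel, d, d_out)."""
    T, d = x.shape
    k = w.shape[0]
    padded = np.vstack([np.zeros((k - 1, d)), x])   # left-pad so step t only sees steps t-k+1..t
    out = np.zeros((T, w.shape[2]))
    for t in range(T):
        window = padded[t:t + k]                    # (k, d): the k most recent steps
        out[t] = np.einsum("kd,kde->e", window, w)  # mix the local window into one vector
    return out

def local_context_attention(x, w_q, w_k, w_v):
    """Scaled dot-product attention whose queries/keys summarize local windows of x."""
    q = causal_conv1d(x, w_q)                       # local-context queries
    k = causal_conv1d(x, w_k)                       # local-context keys
    v = x @ w_v                                     # pointwise values
    scores = q @ k.T / np.sqrt(q.shape[-1])         # (T, T) similarity of local contexts
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)        # softmax over time steps
    return attn @ v

rng = np.random.default_rng(0)
T, d, kernel = 12, 8, 3                             # 12 input steps, toy model width
x = rng.normal(size=(T, d))
out = local_context_attention(
    x,
    w_q=rng.normal(size=(kernel, d, d)) * 0.1,
    w_k=rng.normal(size=(kernel, d, d)) * 0.1,
    w_v=rng.normal(size=(d, d)) * 0.1,
)
print(out.shape)  # (12, 8)
```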
Figure 1. The framework of STCTN. “⊕” denotes elementwise addition. For simplicity, we hide all the residual connections, layer normalizations, and fully connected feed-forward networks, which are similar to those in the standard Transformer [35].
Figure 2. The Local-Range Convolutional Attention. Q, K, and V represent the query matrix, key matrix, and value matrix, respectively.
Figure 3. The Group-Range Convolutional Attention.
Dataset description and statistics.
| Graph Topology | Dataset | #Timesteps | #Nodes | Sample Rate | Start Time | Input Steps | Predict Steps |
|---|---|---|---|---|---|---|---|
| With a priori graph topology | PEMS03 | 26,208 | 358 | 5 min | 9 January 2018 | 12 | 12 |
| | PEMS04 | 16,992 | 307 | 5 min | 1 January 2018 | 12 | 12 |
| | PEMS07 | 28,224 | 883 | 5 min | 5 January 2017 | 12 | 12 |
| | PEMS08 | 17,856 | 170 | 5 min | 7 January 2017 | 12 | 12 |
| Without a priori graph topology | Electricity | 26,304 | 321 | 1 h | 1 January 2012 | 24 | 12 |
| | Traffic | 17,554 | 862 | 1 h | 1 January 2015 | 24 | 12 |
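For reference, the Input Steps and Predict Steps columns above correspond to a standard sliding-window construction over each multivariate series. A minimal sketch, assuming the PEMS-style setting of 12 input steps and 12 predicted steps; the function name and demo shapes are illustrative, not taken from the paper:

```python
# Minimal sliding-window sample construction (illustrative only).
import numpy as np

def make_windows(series, input_steps=12, predict_steps=12):
    """series: (timesteps, nodes) array; returns (X, Y) sample pairs."""
    xs, ys = [], []
    for start in range(len(series) - input_steps - predict_steps + 1):
        xs.append(series[start:start + input_steps])                           # history window
        ys.append(series[start + input_steps:start + input_steps + predict_steps])  # future window
    return np.stack(xs), np.stack(ys)

demo = np.random.default_rng(0).normal(size=(100, 5))   # 100 timesteps, 5 nodes
X, Y = make_windows(demo)
print(X.shape, Y.shape)  # (77, 12, 5) (77, 12, 5)
```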
Performance comparison of different approaches on datasets with a priori graph topology.
| Dataset | Metric | VAR | FC-LSTM | DCRNN | STGCN | ASTGCN | Graph WaveNet | MTGNN | Transformer | STCTN |
|---|---|---|---|---|---|---|---|---|---|---|
| PEMS03 | MAE | 23.65 | 21.16 | 18.18 | 17.49 | 17.69 | 19.85 | 17.79 | 20.01 | |
| | MAPE (%) | 24.51 | 23.33 | 18.91 | 17.15 | 19.40 | 19.31 | 18.84 | 23.12 | |
| | RMSE | 38.26 | 35.11 | 30.31 | 30.12 | 29.66 | 32.94 | 28.75 | 30.01 | |
| PEMS04 | MAE | 23.75 | 27.14 | 24.70 | 22.70 | 22.93 | 25.45 | 23.31 | 24.06 | |
| | MAPE (%) | 18.09 | 18.20 | 17.12 | 16.56 | 17.29 | 17.89 | 17.25 | 15.21 | |
| | RMSE | 36.66 | 41.59 | 38.12 | 35.55 | 35.22 | 39.70 | 36.07 | 37.66 | |
| PEMS07 | MAE | 75.63 | 29.98 | 25.30 | 25.38 | 28.05 | 26.85 | 25.28 | 28.07 | |
| | MAPE (%) | 32.22 | 13.20 | 11.66 | 11.08 | 13.92 | 12.12 | 12.48 | 14.13 | |
| | RMSE | 115.24 | 45.94 | 38.58 | 38.78 | 42.57 | 42.78 | 38.91 | 41.42 | |
| PEMS08 | MAE | 23.46 | 22.20 | 17.86 | 18.02 | 18.61 | 19.13 | 17.96 | 18.93 | |
| | MAPE (%) | 15.42 | 14.20 | 11.45 | 11.40 | 13.08 | 12.68 | 12.03 | 13.69 | |
| | RMSE | 36.33 | 34.06 | 27.83 | 27.83 | 28.16 | 31.05 | 27.76 | 28.11 | |
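The MAE, MAPE, and RMSE values in these tables follow the standard definitions; a small reference implementation is given below. The zero-guard `eps` (and any masking of zero targets) is an assumption, since papers differ on that detail.

```python
# Standard error metrics used in the results tables.
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred, eps=1e-8):
    # Percentage error; eps guards against division by zero.
    return 100.0 * np.mean(np.abs((y_true - y_pred) / (np.abs(y_true) + eps)))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([100.0, 80.0, 60.0])
y_pred = np.array([110.0, 72.0, 63.0])
print(mae(y_true, y_pred), mape(y_true, y_pred), rmse(y_true, y_pred))
# 7.0  ~8.33  ~7.59
```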
Performance comparison of different approaches on datasets without a priori graph topology.
| Dataset | Model | Horizon 3 MAE | Horizon 3 MAPE (%) | Horizon 3 RMSE | Horizon 6 MAE | Horizon 6 MAPE (%) | Horizon 6 RMSE | Horizon 12 MAE | Horizon 12 MAPE (%) | Horizon 12 RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| Traffic | VAR | 1.63 | 73.34 | 3.17 | 1.84 | 77.51 | 3.51 | 1.95 | 78.36 | 3.69 |
| | FC-LSTM | 1.68 | 46.52 | 4.01 | 1.71 | 47.81 | 4.05 | 1.75 | 52.25 | 4.02 |
| | Graph WaveNet | 1.77 | 60.49 | 3.90 | 1.99 | 69.08 | 4.56 | 1.82 | 60.56 | 4.05 |
| | N-BEATS | 1.24 | 38.24 | 3.00 | 1.39 | 49.42 | 3.23 | 1.46 | 44.32 | 3.40 |
| | MTGNN | 1.29 | 47.28 | 3.01 | 1.34 | 50.82 | 3.15 | 1.43 | 45.46 | 3.23 |
| | Transformer | 1.62 | 47.53 | 3.79 | 1.69 | 50.56 | 3.85 | 1.73 | 52.03 | 3.97 |
| | Informer | 1.38 | 43.84 | 3.38 | 1.59 | 45.89 | 3.42 | 1.65 | 47.25 | 3.59 |
| | STCTN | | | | | | | | | |
| Electricity | VAR | 5.96 | 19.72 | 8.65 | 8.58 | 28.39 | 12.02 | 8.97 | 33.78 | 13.22 |
| | FC-LSTM | 7.03 | 19.34 | 9.91 | 6.99 | 19.45 | 9.86 | 7.16 | 24.18 | 10.08 |
| | Graph WaveNet | 4.71 | 14.67 | 7.22 | 6.32 | 21.48 | 9.17 | 5.04 | 16.50 | 8.08 |
| | N-BEATS | 3.41 | 10.12 | 5.66 | 3.73 | 11.43 | 6.38 | 3.90 | 12.55 | 7.18 |
| | MTGNN | 3.20 | 10.19 | 5.23 | 3.55 | 11.20 | 6.11 | 3.81 | 12.69 | 6.55 |
| | Transformer | 4.85 | 15.32 | 8.43 | 5.32 | 17.69 | 9.25 | 5.98 | 19.37 | 9.91 |
| | Informer | 4.05 | 13.97 | 7.82 | 4.45 | 14.09 | 8.22 | 5.07 | 16.26 | 8.56 |
| | STCTN | | | | | | | | | |
Figure 4. The forecasting results of 12 steps on different datasets. (a) PEMS08; (b) Electricity.
Figure 5. The long-term forecasting results (12 h, 144 steps). (a) STCTN (ours); (b) N-BEATS; (c) Graph WaveNet; (d) MTGNN.
Figure 6. The long-term forecasting results (one day, 288 steps). (a) STCTN (ours); (b) N-BEATS; (c) Graph WaveNet; (d) MTGNN.
Ablation study on the PEMS08 dataset.
| Metric | w/o CPE | w/o LCA | w/o GCA | STCTN |
|---|---|---|---|---|
| MAE | 17.61 | 18.77 | 18.08 | 17.15 |
| MAPE (%) | 11.47 | 12.16 | 12.84 | 10.93 |
| RMSE | 27.57 | 28.90 | 28.44 | 26.63 |
Figure 7. Errors in each prediction step of STCTN and the three variants. (a) MAE; (b) MAPE; (c) RMSE.
Figure 8. Parameter study. (a) Stacked layers; (b) group size; (c) model channels.