Mahtab Mohtasham Khani1, Sahand Vahidnia2, Alireza Abbasi2.
Abstract
The spread of COVID-19 has had a devastating impact on the world economy, international trade relations, and globalization. As this pandemic advances and new potential pandemics appear on the horizon, a precise analysis of recent trade fluctuations becomes necessary for international decision-making and for managing similar crises. The COVID-19 pandemic created a new pattern of world trade and changed how businesses work and trade with each other, which means that any potential pandemic or other unprecedented event can change the rules of the market. This research develops a novel model that estimates stock market values from a COVID-19 dataset using long short-term memory (LSTM) networks. The goal of this study is to establish a model that can predict the near future from a variable set of features. The nature of the features differs completely from one pandemic to another; therefore, the prediction results of a specific model for one pandemic cannot be applied to other pandemics. Hence, recognizing and extracting the features which affect the pandemic is pivotal. In this study, we develop a framework that provides a better understanding of the features and of the feature selection process. Although the global impacts of COVID-19 are complicated, we try to show how additional features such as COVID-19 cases can help forecasting in a real-world scenario, rather than relying solely on the history of tickers, which is the conventional basis for prediction. This study is based on a preliminary analysis of features such as COVID-19 cases and other market tickers, with the aim of improving the performance of forecasting models against fluctuations in the market. Our predictors are based on market value data and daily COVID-19 time-series data (i.e., the number of new cases). We selected the gold price as the base for our forecasting task; it can be replaced by any other market.
We applied convolutional neural network (CNN) LSTM, vector sequence output LSTM, bidirectional LSTM, and encoder-decoder LSTM models to the dataset. The vector sequence output LSTM achieved MSEs of 6.0e-4, 8.0e-4, and 2.0e-3 on the validation set for predictions 1 day, 2 days, and 30 days in advance, respectively, outperforming other methods proposed in the literature.
Keywords: COVID-19; Deep learning; Economy; Time series
Year: 2021 PMID: 34151290 PMCID: PMC8196294 DOI: 10.1007/s42979-021-00724-3
Source DB: PubMed Journal: SN Comput Sci ISSN: 2661-8907
Fig. 1 LSTM internal architecture, which consists of: h_{t-1} for the output from the last LSTM unit, c_{t-1} for the memory of the last LSTM unit, and c̃_t for the candidates of the cell state at time t. σ: represents non-linearity as sigmoid layers, tanh: represents a tanh layer. Vector operations: × expresses scaling the information, and + expresses adding information
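The cell described in this caption can be illustrated with a minimal sketch of one forward step of a textbook LSTM cell. This is not the authors' implementation: the function names, the `params` layout, and the gate labels (`f`, `i`, `g`, `o`) are assumptions made for illustration, and the sketch uses plain Python lists rather than a deep learning library.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, v, b):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of a textbook LSTM cell (illustrative only).

    params maps each gate name ("f", "i", "g", "o") to a (W, b) pair that
    acts on the concatenation [h_prev, x_t]. This layout is an assumption.
    """
    v = h_prev + x_t  # list concatenation of previous output and input

    def gate(name, activation):
        W, b = params[name]
        return [activation(z) for z in affine(W, v, b)]

    f = gate("f", sigmoid)    # forget gate: how much old memory to keep
    i = gate("i", sigmoid)    # input gate: how much new candidate to admit
    g = gate("g", math.tanh)  # candidates for the cell state at time t
    o = gate("o", sigmoid)    # output gate

    # Scaling (element-wise product) and adding information, as in the figure.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t

# Toy check with all-zero weights: every sigmoid gate is 0.5, candidates are 0.
n_in, n_hid = 3, 2
zero = ([[0.0] * (n_hid + n_in) for _ in range(n_hid)], [0.0] * n_hid)
params = {name: zero for name in "figo"}
h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, 1.0], params)
```

With zero weights the new memory is half of the old memory and the output is half of its tanh, which makes the scaling and adding operations easy to verify by hand.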
Fig. 2 LSTM network architecture. X is input, Y is output, with stacked LSTM cells
Table of results
| # | Steps | Variable | Method | History | Model | RMSE | MSE | MAE | MSLE | R2 | ACC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 01 | 1 | Multi | SE | 5 | L100-L100-d100 | 0.0251 | 0.00063 | 0.01774 | 0.00018 | 0.85759 | 0.97803 |
| 02 | 1 | Multi | SE | 22 | L100-L100-d100 | 0.0290 | 0.00084 | 0.02168 | 0.00024 | 0.76198 | 0.97438 |
| 03 | 1 | Multi | SE | 30 | L100-L100-d100 | 0.0246 | 0.00060 | 0.02072 | 0.00017 | 0.79249 | 0.97592 |
| 04 | 1 | Multi | SE | 9 | L100-L100-drop-d100-d1 | 0.0781 | 0.00067 | 0.02066 | 0.00020 | 0.82069 | 0.94616 |
| 05 | 1 | Multi | SE | 30 | L300-L300-d100-d1 | 0.0900 | 0.00053 | 0.01870 | 0.00016 | 0.80063 | 0.91020 |
| 06 | 1 | COVID less | SE | 30 | L300-L300-d100-d1 | 0.0020 | 0.00197 | 0.03611 | 0.00059 | 0.50048 | 0.93532 |
| 07 | 2 | Multi | SE | 5 | L200-L200-drop-d2 | 0.0439 | 0.00192 | 0.03294 | 0.00057 | 0.54020 | 0.95891 |
| 08 | 2 | Multi | SE | 15 | L200-L200-drop-d2 | 0.0426 | 0.00174 | 0.03738 | 0.00052 | 0.58592 | 0.95340 |
| 09 | 2 | Multi | SE | 22 | L200-L200-drop-d2 | 0.0311 | 0.00096 | 0.02539 | 0.00028 | 0.70735 | 0.97065 |
| 10 | 2 | Multi | SE | 22 | L200-L200-drop-d2 | 0.0283 | 0.00080 | 0.02334 | 0.00024 | 0.75998 | 0.97118 |
| 11 | 2 | Uni | SE | 22 | L200-L200-drop-d2 | 0.0983 | 0.00479 | 0.09456 | 0.00274 | 0.51678 | 0.87320 |
| 12 | 2 | COVID less | SE | 22 | L200-L200-drop-d2 | 0.0287 | 0.00082 | 0.02410 | 0.00024 | 0.75448 | 0.97081 |
| 13 | 2 | Multi | SE | 30 | L200-L200-drop-d2 | 0.0414 | 0.00171 | 0.03342 | 0.00048 | 0.36443 | 0.96104 |
| 14 | 2 | Multi | SE | 30 | L200-L300-drop-d2 | 0.0423 | 0.00178 | 0.02961 | 0.00050 | 0.34163 | 0.96603 |
| 15 | 2 | Multi | E-D | 5 | L200- | 0.0525 | 0.00275 | 0.04250 | 0.00082 | 0.32775 | 0.94633 |
| 16 | 2 | Multi | E-D | 15 | L200- | 0.0303 | 0.00091 | 0.02290 | 0.00027 | 0.78431 | 0.97180 |
| 17 | 2 | COVID less | E-D | 15 | L200- | 0.0317 | 0.00100 | 0.02456 | 0.00030 | 0.76589 | 0.96948 |
| 18 | 2 | Uni | E-D | 15 | L200- | 0.0745 | 0.00554 | 0.05769 | 0.00165 | -0.30800 | 0.83155 |
| 19 | 2 | Multi | E-D | 15 | L200- | 0.0386 | 0.00148 | 0.03276 | 0.00045 | 0.64927 | 0.95866 |
| 20 | 2 | Multi | E-D | 22 | L200- | 0.0424 | 0.00180 | 0.03723 | 0.00052 | 0.10212 | 0.96339 |
| 21 | 2 | Multi | E-D | 30 | L100- | 0.0500 | 0.00250 | 0.03244 | 0.00070 | 0.42893 | 0.95437 |
| 22 | 2 | Multi | E-D | 30 | L200- | 0.0348 | 0.00121 | 0.02520 | 0.00033 | 0.55437 | 0.97100 |
| 23 | 14 | Multi | E-D | 14 | L200- | 0.0938 | 0.00280 | 0.04374 | 0.00088 | -0.10390 | 0.94558 |
| 24 | 14 | Multi | E-D | 15 | L200- | 0.0523 | 0.00270 | 0.04312 | 0.00084 | -0.09150 | 0.94591 |
| 25 | 14 | Uni | E-D | 14 | L200- | 0.0513 | 0.00263 | 0.04029 | 0.00080 | 0.07561 | 0.95002 |
| 26 | |||||||||||
| 27 | 1 | Multi | CNN | 9 | C64-C128-Max128- | 0.0768 | 0.00278 | 0.03868 | 0.00080 | 0.37183 | 0.93054 |
| 28 | 2 | Uni | CNN | 9 | C64-C128-Max128-L200-td32-td1 | 0.0532 | 0.00283 | 0.03821 | 0.00083 | 0.32742 | 0.95390 |
| 29 | 2 | Multi | CNN | 9 | C64-C128-Max128-L200-td32-td1 | 0.0663 | 0.00430 | 0.04930 | 0.00130 | -0.03590 | 0.94067 |
| 30 | |||||||||||
| 31 | 14 | Multi | BD | 14 | BDL28, td1 | 0.0337 | 0.25308 | 0.02388 | 0.00034 | 0.57346 | 0.96969 |
| 32 | 14 | Multi | BD | 14 | BDL100,td1 | 0.0436 | 0.25430 | 0.03513 | 0.00057 | 0.33846 | 0.95614 |
| 33 | 14 | Multi | BD | 14 | BDL100,td28,td1 | 0.0459 | 0.25460 | 0.03723 | 0.00064 | 0.26752 | 0.95343 |
The column “Steps” indicates the number of steps (days) ahead that are predicted, which are 1, 2, and 14 days in this table.
The column “Variable” shows the variables used in the feature space (i.e., Uni indicates the dataset that includes only Gold, Multi includes the whole dataset, and COVID less includes all the financial variables without COVID-19 data).
The column “History” shows the number of days that predictions are based on.
L denotes LSTM layers, d denotes Dense layers, drop denotes Dropout layers, C denotes Convolution layers, BDL denotes bidirectional LSTM layers, pool denotes Max-pooling layers, td denotes time distributed dense, E-D denotes encoder–decoder model, BD denotes bidirectional model, CNN denotes CNN–LSTM model, SE denotes vector sequence output prediction model
Based on [27]
Based on [8]
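The error columns of the table above follow standard definitions, which can be sketched as below. This is a minimal illustration, not the authors' evaluation code: the function name `regression_metrics` and the toy values are assumptions, and the ACC column is omitted because its definition is not given in this section.

```python
import math

def regression_metrics(y_true, y_pred):
    """Standard regression metrics; inputs are equal-length lists of floats.

    MSLE uses log(1 + y), so values must be greater than -1.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    msle = sum((math.log1p(p) - math.log1p(t)) ** 2
               for t, p in zip(y_true, y_pred)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = mse * n
    r2 = 1.0 - ss_res / ss_tot  # negative when worse than predicting the mean
    return {"RMSE": math.sqrt(mse), "MSE": mse, "MAE": mae,
            "MSLE": msle, "R2": r2}

# Toy values for illustration only (not taken from the study).
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.25, 0.35, 0.65, 0.75]
metrics = regression_metrics(y_true, y_pred)
worse_than_mean = regression_metrics(y_true, [1.0, 1.0, 1.0, 1.0])
```

Two properties are worth keeping in mind when reading the table. RMSE is by definition the square root of MSE (for row 01, the square root of 0.00063 is about 0.0251), and R2 becomes negative whenever a model does worse than always predicting the mean of the validation targets.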
Fig. 3 Encoder–decoder LSTM network architecture. X is input, Y is output, and h is hidden state of the LSTM cells
Fig. 4 Technology symbols and COVID-19 time series in 300 days until 03 Jul 2020
Correlation analysis of all sectors vs. gold
| Variables | Gold (Close) | Gold (Avg.) |
|---|---|---|
| Consumer-staples (Close) | 0.817 | 0.817 |
| Consumer-staple (Avg.) | 0.817 | 0.818 |
| Volume energy | 0.438 | 0.440 |
| Financial (Close) | 0.468 | 0.469 |
| Financial (Avg.) | 0.466 | 0.467 |
| Healthcare (Close) | 0.721 | 0.722 |
| Healthcare (Avg.) | 0.722 | 0.722 |
| Materials (Close) | 0.600 | 0.600 |
| Materials (Avg.) | 0.598 | 0.599 |
| Real-estate (Close) | 0.492 | 0.493 |
| Real-estate (Avg.) | 0.495 | 0.496 |
| Tec-symbols (Close) | 0.777 | 0.778 |
| Tec-symbols (Avg.) | 0.775 | 0.777 |
| Telecom (Close) | 0.675 | 0.675 |
| Telecom (Avg.) | 0.675 | 0.675 |
| Utilities (Close) | 0.710 | 0.711 |
| Utilities (Avg.) | 0.720 | 0.711 |
| Gold (Close) | 1.0 | 0.999 |
| Gold (Avg.) | 0.999 | 1.0 |
| Volume-food | 0.4665 | -0.682 |
| Covid-all-total-cases (norm) | 0.877 | 0.885 |
| Covid-all-new-cases (norm) | 0.881 | 0.886 |
| Covid-US-total-cases (norm) | 0.889 | 0.898 |
| Covid-US-new-cases (norm) | 0.858 | 0.860 |
| Covid-ww-total-cases (norm) | 0.873 | 0.881 |
| Covid-ww-new-cases (norm) | 0.879 | 0.884 |
| Covid-all-total-cases | 0.877 | 0.885 |
| Covid-all-new-cases | 0.881 | 0.886 |
| Covid-US-new-cases | 0.858 | 0.860 |
| Covid-ww-total-cases | 0.873 | 0.881 |
| Covid-ww-new-cases | 0.879 | 0.884 |
The p values for the corresponding correlations in this table, which already have been filtered, are below
In the COVID-19 case variables, ww denotes world-wide cases excluding the United States, US denotes United States cases, all denotes the aggregated cases of the United States and the rest of the world, and norm indicates normalized data. Close indicates the mean closing price of the sector symbols, and Avg. indicates the mean daily-average price of the sector symbols
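A correlation screen of this kind can be sketched as follows, assuming the Pearson coefficient is the measure used (this section does not name it). The helper names, column names, and toy series below are hypothetical and only illustrate the computation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def correlations_with_target(columns, target):
    """Correlate every column with the target column (hypothetical helper)."""
    return {name: pearson(values, columns[target])
            for name, values in columns.items()}

# Toy series for illustration only (not data from the study).
columns = {
    "gold_close": [1.0, 2.0, 3.0, 4.0, 5.0],
    "series_up": [2.0, 4.0, 6.0, 8.0, 10.0],
    "series_down": [5.0, 4.0, 3.0, 2.0, 1.0],
}
# Min-max normalized copy of one series, to show the effect of rescaling.
columns["series_up_norm"] = [(v - 2.0) / 8.0 for v in columns["series_up"]]
result = correlations_with_target(columns, "gold_close")
```

Under this assumption, rescaling a series by a positive linear transform such as min-max normalization leaves the coefficient unchanged, which is consistent with the normalized and raw COVID-19 rows in the table reporting the same values.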
Fig. 5 Single-step time series prediction. The model in the left panel (a) was trained on the market data without the COVID-19 time series. The model in the right panel (b) was trained on all features, including the COVID-19 time series data. Red markings are ground-truth validation points and green markings are the predictions. The model makes predictions based on 30 days of historical data. The architecture is: LSTM(300)-LSTM(300)-Dense(100)-Dense(1)
MSE for predictions 30 days in advance with some of the best architectures
| Type | History | Network | MSE |
|---|---|---|---|
| E-D | 60 | L500-L100-d100 | 0.0217 |
| BD | 45 | BDL300-L100-d100 | 0.0024 |
| CNN | 45 | C100-C200-pool-L100-d100 | 0.0061 |
History column shows the number of days that predictions are based on.
L denotes LSTM layers, d denotes Dense layers, C denotes Convolution layers, BDL denotes bidirectional LSTM layers, pool denotes Max-pooling layers, E-D denotes encoder–decoder model, BD denotes bidirectional model, CNN denotes CNN–LSTM model, SE denotes vector sequence output prediction model
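The shorthand in the Network column can be expanded mechanically using the legend above. The sketch below is a hypothetical helper (`parse_architecture` and `LAYER_CODES` are names invented here) that covers only the codes listed in this legend. It reads the trailing number as the layer size, an interpretation suggested by captions such as LSTM(300)-LSTM(300)-Dense(100)-Dense(1).

```python
import re

# Layer codes taken from the legend; the mapping itself is illustrative.
LAYER_CODES = {
    "BDL": "Bidirectional LSTM",
    "L": "LSTM",
    "d": "Dense",
    "C": "Convolution",
    "pool": "Max-pooling",
}

# Longer codes come first so that "BDL300" is not read as an "L" token.
TOKEN = re.compile(r"(BDL|pool|L|d|C)(\d*)")

def parse_architecture(spec):
    """Turn e.g. 'L500-L100-d100' into a list of (layer name, size) pairs."""
    layers = []
    for token in spec.split("-"):
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unknown layer token: {token!r}")
        code, size = match.groups()
        layers.append((LAYER_CODES[code], int(size) if size else None))
    return layers

encoder_decoder = parse_architecture("L500-L100-d100")
bidirectional = parse_architecture("BDL300-L100-d100")
cnn_lstm = parse_architecture("C100-C200-pool-L100-d100")
```

Tokens without a number, such as pool, are returned with a size of `None`, and any code outside the legend raises an error rather than being guessed.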
Fig. 6 Multi-step time series prediction. The model in the left panel (a) was trained on the market data without the COVID-19 time series. The model in the right panel (b) was trained on all features, including the COVID-19 time series data. Red markings are ground-truth validation points and green markings are the predictions. The model makes predictions based on 22 days of historical data. The architecture is: LSTM(200)-LSTM(200)-Dropout-Dense(2)
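A multi-step setup like the one in this caption (22 days of history, 2 days predicted) requires slicing the series into input windows and future targets. The following is a minimal sketch of that windowing step under stated assumptions: the function name `make_windows`, the row layout, and the toy series are hypothetical, and the authors' actual preprocessing may differ.

```python
def make_windows(rows, target_index, history, steps):
    """Split a multivariate daily series into (window, future targets) pairs.

    rows is a list of per-day feature lists. Each sample uses `history`
    consecutive days of all features to predict the target column over the
    following `steps` days.
    """
    inputs, targets = [], []
    for start in range(len(rows) - history - steps + 1):
        end = start + history
        inputs.append(rows[start:end])
        targets.append([row[target_index] for row in rows[end:end + steps]])
    return inputs, targets

# Toy series: 30 days, 2 features per day (illustrative values only).
rows = [[float(day), float(day) * 10.0] for day in range(30)]
X, Y = make_windows(rows, target_index=0, history=22, steps=2)
```

Each element of `X` has shape (history, features) and each element of `Y` has length `steps`, which matches a network whose final Dense layer has 2 outputs. Setting `steps=1` gives the single-step case.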