
Forecasting stock price using integrated artificial neural network and metaheuristic algorithms compared to time series models.

Milad Shahvaroughi Farahani1, Seyed Hossein Razavi Hajiagha2.   

Abstract

Today, the stock market plays an important role and can serve as a measure of a country's economic position. People can earn considerable returns by investing their money in the stock exchange, but doing so is not easy because many factors must be considered, and many methods have therefore been proposed to predict share price movements. The main goal of this article is to predict stock price indices using an artificial neural network (ANN) trained with new metaheuristic algorithms such as social spider optimization (SSO) and the bat algorithm (BA). Technical indicators are used as input variables, and genetic algorithms (GA) are applied as a heuristic method for feature selection, i.e., choosing the most relevant indicators. Loss functions such as the mean absolute error (MAE) serve as error evaluation criteria. In addition, time series forecasting models such as ARMA and ARIMA are used to predict stock prices, and the results of the ANN-metaheuristic hybrids and the time series models are compared with each other. The statistical population of the research comprises five of the most important international indices: S&P500, DAX, FTSE100, Nasdaq, and DJI.
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021.


Keywords:  Artificial neural network; Bat algorithm; Genetic algorithm; Social spider optimization algorithm

Year:  2021        PMID: 33935586      PMCID: PMC8070984          DOI: 10.1007/s00500-021-05775-5

Source DB:  PubMed          Journal:  Soft comput        ISSN: 1432-7643            Impact factor:   3.643


Introduction

People are always looking for ways to invest their capital, and the stock market is one of the main places to invest money. However, stock markets involve various risks, so investors need to forecast stock prices, which depend on several psychological, economic, and other factors. Several methods have therefore been developed to predict stock prices. These forecasting methods aim to propose approaches for predicting index values or stock prices (Lah et al. 2019), and they require different considerations depending on the quality and quantity of the data. Technical analysis, fundamental analysis, and statistical methods are used for stock price prediction. One of the main hypotheses that should be considered and tested is the efficient market hypothesis (EMH) (Malkiel 1989, 2003). The EMH holds that information has a strong impact on stock prices and that prices adjust themselves according to this information (Greco et al. 2019). An efficient market ensures that investors have access to the same information (Naseer and Bin Tariq 2015). It rests on the assumption that no system can beat the market, because if such a system became widely known, everybody would use it, which would negate its potential profitability. Time series analysis is a principal method used for the prediction of share prices; it deals with analyzing a series of data gathered over time. Time series are common in fields such as economics, finance, and healthcare (Bisgaard and Kulahci 2011). This approach tries to forecast the future by assuming that previously observed patterns can serve as the foundation for extrapolating future behavior (Shin 2017). Heuristic algorithms are another set of methods used for prediction. They are often employed as an alternative to exact optimization methods and typically aim to find a good feasible solution without any guarantee of optimality (Kaveh and Ghazaan 2018).
Heuristic algorithms are applicable to decision problems with complex structures whose characteristics take a long time to identify. Metaheuristic algorithms extend this idea: they are general-purpose strategies that guide heuristic search and can be applied to a large number of problems, since they do not depend on the characteristics of a specific model and are compatible with many models and solution representations (Osman and Kelly 1996; Talbi 2009). In cases where the set of solutions is too large to be sampled completely, metaheuristics examine a subset of these solutions. Because metaheuristics are usually developed from a limited set of assumptions, they can be applied to a wide variety of problems (Blum and Roli 2003). In contrast to exact methods, there is no guarantee that a metaheuristic will find the global optimum of an optimization problem (Blum and Roli 2003; Khosravanian et al. 2018). Metaheuristic algorithms are applied to solve difficult and complicated problems in affordable time; they usually find acceptable rather than optimal solutions to such problems (Talbi 2009). Gogna and Tayal (2013), Abdel-Basset et al. (2018), and Wong and Ming (2019) are examples of studies that reviewed the applications of metaheuristic algorithms in different fields. Another method is the ANN, which is inspired by the functioning of the human brain. ANNs belong to the field of artificial intelligence (AI) and are applicable in many contexts, such as pattern recognition, classification, and regression. Because most financial data are nonlinear and asymmetric, ANNs can capture the underlying relationships well. This paper aims to predict stock prices with an ANN. The developed ANN is trained using metaheuristic algorithms, namely social spider optimization (SSO) and the bat algorithm (BA), with a group of technical indicators as input variables.
A genetic algorithm (GA) is also used for feature selection, i.e., choosing the most relevant indicators, and different loss functions serve as error assessment criteria. To evaluate the performance of the proposed hybrid algorithms, the obtained results are compared with those of ARIMA as a time series model for stock price prediction. This evaluation and comparison are carried out on five of the most important international indices: S&P500, DAX, FTSE100, NASDAQ, and DJI. The paper is structured as follows: Section 2 reviews the available literature. Section 3 describes the ANN structure and the proposed algorithms. In Sect. 4, ARIMA is used for time series forecasting. Sections 5 and 6 present the experimental process and the results. Finally, Sect. 7 concludes the paper. Additional results are given in Appendices A and B.
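As a rough, self-contained sketch of the hybrid idea described above (not the paper's actual SSO or BA implementations), the following Python snippet trains a tiny one-hidden-neuron network with a population-based search that minimizes MAE. The network size, step rule, and toy data are illustrative assumptions.

```python
import math
import random

def mae(y_true, y_pred):
    """Mean absolute error, one of the loss functions named in the paper."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def forward(weights, x):
    """Tiny one-hidden-neuron network: tanh hidden unit, linear output."""
    w1, b1, w2, b2 = weights
    return w2 * math.tanh(w1 * x + b1) + b2

def metaheuristic_train(xs, ys, pop_size=30, iters=150, seed=0):
    """Population-based random search standing in for SSO/BA:
    candidates are weight vectors, fitness is MAE on the training data."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(pop_size)]
    best = min(pop, key=lambda w: mae(ys, [forward(w, x) for x in xs]))
    for _ in range(iters):
        for i, w in enumerate(pop):
            # move each candidate toward the best-so-far with a random step
            cand = [wi + 0.5 * (bi - wi) + rng.gauss(0, 0.1)
                    for wi, bi in zip(w, best)]
            if (mae(ys, [forward(cand, x) for x in xs])
                    < mae(ys, [forward(w, x) for x in xs])):
                pop[i] = cand
        best = min(pop, key=lambda w: mae(ys, [forward(w, x) for x in xs]))
    return best

# toy data: a noisy-free linear trend standing in for a price series
xs = [i / 10 for i in range(20)]
ys = [0.5 * x + 0.1 for x in xs]
weights = metaheuristic_train(xs, ys)
err = mae(ys, [forward(weights, x) for x in xs])
```

The real SSO and BA update rules differ (spider vibrations, bat frequencies and loudness), but the skeleton is the same: encode the network weights as a candidate solution and let the population minimize a loss such as MAE.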

Literature review

The stock market is a place where investors can buy or sell parts of a company's assets in the form of shares (Preethi and Santhi 2012). The market can be seen as a pulse of a country's economic activity and as a place of high potential benefit where investors can grow their capital and wealth. The stock market is characterized by nonlinearity, discontinuity, and volatile multifaceted elements, because it is affected by many factors such as general economic conditions, political actions, and brokers' expectations (Hadavandi et al. 2010). Considering the amount of fluctuation in this market, rapid decision making is required; it is therefore very important that transactions are completed in the shortest possible time (Barakat et al. 2016). Obtaining maximum profit is the ultimate goal of investors, and as a result many researchers have explored market forecasting capabilities in a variety of ways (Prasanna and Ezhilmaran 2013). According to previous studies, the ANN appears to be a sound and reasonably validated method for stock price prediction (Idris et al. 2015). The three most popular ANNs for stock prediction are the recurrent neural network (RNN) (Saad et al. 1998), the radial basis function (RBF) network (Han et al. 2001), and the multilayer perceptron (MLP). There are many methods for training an ANN, and some are better than others at capturing linear and nonlinear relationships; an ANN uses thresholds in its units to capture both linear and nonlinear characteristics. The number of layers strongly affects predictability: with too many layers, the structure becomes overly complicated and the network may fail to find the fittest solution, while with too few layers it may be unable to capture nonlinear relationships or find a global solution. Researchers have therefore tried to discover methods that combine high speed with high accuracy and low error, and for this reason metaheuristic algorithms are used.
These methods are used to optimize the network and to find the best numbers of input and hidden-layer nodes. ANN models work better than traditional statistical models in forecasting stock prices, stock returns, exchange rates, inflation, and imports (Yim and Mitchell 2002). Gupta and Wang (2010) used feed-forward neural networks to forecast and trade the future index prices of the Standard and Poor's 500 (S&P 500). Their research studied the effect of training the network with the most recent data together with gradually subsampled past index data, and also examined the effect of past NASDAQ 100 data on the prediction of future S&P 500 values. A daily buy/sell trading strategy based on the predicted prices was used to calculate the directional efficiency and the rate of return over different periods, and they obtained significantly higher returns than earlier work. Numerous exchange-traded funds (ETFs) attempted to replicate the performance of the S&P 500 by holding the same stocks in the same proportions as the index, thereby yielding the same percentage returns as the S&P 500. Zhu and Wang (2010) proposed an intelligent trading system using support vector regression optimized by genetic algorithms (SVR-GA) and a multilayer perceptron optimized with GA (MLP-GA). Experimental results showed that both approaches outperformed conventional trading systems without prediction, as well as a recent fuzzy trading system, in terms of final equity and maximum drawdown on the Hong Kong Hang Seng stock index. He et al. (2013) studied the principles and theories of financial markets and applied basic technical analysis methodologies to the stock market with the help of feature selection algorithms. They used data of the Shanghai Stock Exchange Composite Index (SSECI) from 24 March 1997 to 23 August 2006 to compute twelve technical indicators for later research.
The twelve chosen technical indicators were calculated, and the results were taken as the input of the feature selection algorithms. Three kinds of feature selection algorithms were studied: principal component analysis (PCA), the genetic algorithm (GA), and sequential forward selection (SFS). According to the results and analysis, PCA was the most reliable but could be time-consuming for inputs of very large dimension; the genetic algorithm performed better in such situations because it takes advantage of randomness; and SFS could generate locally optimal solutions, but with a risk of the "nesting problem". Dong et al. (2013) first reproduced the one-step-ahead prediction system of Phua et al. for stock price prediction. They then made some modifications and successfully outperformed the original system in terms of MSE, hit rate, and absolute error. They also explored a more difficult multi-step prediction problem: they first reproduced a multi-step prediction system using a simple recursive algorithm and then proposed an error-constraint algorithm to obtain better weights and biases as well as smaller accumulated errors; by observation, the results outperformed the simple recursive algorithm. Zheng et al. (2013) explored the application to stock prediction of a wavelet neural network (WNN), whose hidden layer comprises neurons with adjustable wavelets as activation functions. They discussed some of the basic rationales behind technical analysis and, based on these, carefully selected the inputs of the prediction system. The system was tested on the Istanbul Stock Exchange National 100 Index and compared with traditional neural networks; the results showed that the WNN could achieve very good prediction accuracy. Fang et al. (2014) improved stock market prediction based on genetic algorithms (GA) and wavelet neural networks (WNN) and reported significantly better accuracy than existing approaches, including the hierarchical GA (HGA) WNN. Specifically, they added information such as trading volume as inputs, used the Morlet wavelet function instead of the Morlet-Gaussian wavelet function in their prediction model, and employed fewer hidden nodes in the WNN than other research work. The prediction system was tested using Shenzhen Composite Index data. Lim et al. (2016) used delayed neural network models to predict public housing prices in Singapore. The delayed neural networks estimated the trend of the resale price index (RPI) of Singapore housing from the Singapore Housing Development Board (HDB) using nine independent economic and demographic variables, and the results showed that the model produced a good fit and good predictions. Göçken et al. (2016) predicted the Turkish stock price index using technical indicators and hybrid ANNs based on GA and harmony search (HS). The results showed that the hybrid metaheuristic algorithms had lower error than a plain ANN; comparing the hybrid ANN-HS and ANN-GA models, they found that ANN-HS had lower error than ANN-GA. To deal with features of similar contribution, the feature-weighted SVM (FWSVM) and feature-weighted k-nearest neighbor (FWKNN) were proposed to forecast stock market indices by assigning different weights to different features (Chen and Hao 2017). The model was tested on two stock markets, and the comparisons showed that FWSVM and FWKNN perform better than the non-weighted models. Ghasemiyeh et al. (2017) optimized artificial neural networks with metaheuristic algorithms.
In their research, cuckoo search, improved cuckoo search, the enhanced cuckoo search genetic algorithm, the genetic algorithm, and particle swarm optimization (PSO) were examined. Testing these hybrid algorithms with 28 input variables, the results showed that PSO outperformed the other algorithms in their study. Goli et al. (2018) used various metaheuristic algorithms for feature selection and demand forecasting in the dairy industry: two well-known algorithms, GA and PSO, together with two more recent ones, invasive weed optimization (IWO) and the cultural algorithm (CA). According to the results, PSO showed the best performance in feature selection, while IWO significantly improved the prediction error. Sin and Wang (2017) explored the relationship between the features of Bitcoin and the next-day change in its price using an artificial-neural-network ensemble approach called the Genetic Algorithm-based Selective Neural Network Ensemble, constructed with a multilayer perceptron as the base model for each network in the ensemble. To assess its practicality and effectiveness in real-world application, the ensemble was used to predict the next-day direction of the Bitcoin price from approximately 200 features of the cryptocurrency over a span of two years. Over a span of 50 days, a trading strategy based on the ensemble was compared through back-testing against a "previous-day trend following" strategy. The ensemble strategy generated almost 85% returns, outperforming both the "previous-day trend following" strategy, which produced approximately 38% returns, and a strategy that followed the single best MLP model in the ensemble, which generated approximately 53% returns. Chong et al. (2017) applied three feature-extraction methods, PCA, the restricted Boltzmann machine (RBM), and an autoencoder, to a deep learning network, with loss functions such as root-mean-squared error (RMSE), mean absolute error (MAE), mutual information (MI), and normalized mean squared error (NMSE), to predict future market trends in South Korea. Sezer et al. (2017) employed GA in a stock trading system based on a deep neural network (DNN) to anticipate buy-sell-hold decisions; GA was used for feature selection and generated the buy-sell points in the system. Later, Dixon (2018) used a long short-term memory (LSTM) network to forecast short-term price movements. Zhang et al. (2018) designed a system for predicting the stock price trend that could forecast both the price movement and its increase or decrease interval within predetermined periods. They trained a random forest model on historical data from the Chinese market to categorize multiple clips of stocks into four major groups with respect to their close prices. The results indicated improved prediction of market volatility together with merits such as precision and return per trade. Baek and Kim (2018) proposed a framework called ModAugNet, consisting of two LSTM-based modules, one for overfitting prevention and one for prediction. The framework was tested on two Korean stock datasets, and the obtained results showed improvements across different error measures. Ahmed et al. (2019) used ant colony optimization (ACO) to forecast stock prices on the Nigerian stock exchange. They compared ACO with three other techniques, the Price Momentum Oscillator, Stochastic, and Moving Average, and concluded that ACO has higher accuracy and lower error than the other methods. Ghanbari and Arian (2019) used support vector regression (SVR) and the butterfly optimization algorithm (BOA) in stock market forecasting.
They presented a new BOA-SVR model based on BOA and compared it with 11 metaheuristic algorithms on NASDAQ data. The results indicated that the model improves the results by optimizing the SVR parameters; moreover, it achieved higher prediction accuracy and lower time consumption than the other models. Chandana (2019) used a novel approach based on least squares support vector regression (LSSVR) and machine learning, designing an expert system for stock price prediction intended to strengthen forecasts by improving accuracy. The system was successful because it required fewer computations and the calculations were simpler. Rajesh et al. (2019) used ensemble learning techniques for stock trend prediction, concentrating on the stock price change percentage. They predicted the S&P500 and its future trend with ensemble learning, considering two methods: ensemble learning and heat maps. The evidence suggests that the support vector machine (SVM), random forest, and k-neighbors classifiers gave more promising results than other methods; the accuracy of the forecast model was above 51%, illustrating a 23% increase in prediction accuracy. Pal and Kar (2019) used a hybrid approach to forecast stock price time series using data discretization based on fuzzistics, where a cumulative probability distribution approach (CPDA) was used to obtain the intervals for the linguistic values. First-order fuzzy rules were generated and the rule sets reduced by rough set theory; forecasts of the time series data were then computed by defuzzification using the reduced rule base and its historical evidence. The proposed approach was applied to the closing prices of three stock index time series (BSE, NYSE, and TAIEX) as experimental datasets, and the results showed that the method is more effective than its counterparts.
Liu and Wang (2019), in order to address the profit bias in model evaluation, proposed a new metric, the mean profit rate (MPR). The effectiveness of the metric was measured by the correlation between the metric value and the profit of the model. Experiments on five daily stock index datasets from four countries showed that MPR outperformed classification metrics in correlating with profit; in view of these findings, they suggested that MPR is a more effective metric than classification metrics for stock trend prediction. Lv et al. (2019) assessed different types of machine learning algorithms with respect to trading cost, comparing traditional algorithms with advanced DNN models on data from different index component stocks over the period 2010-2017. The traditional machine learning algorithms were random forest, naïve Bayes, logistic regression, classification and regression trees (CART), SVM, and extreme gradient boosting, while the DNN architectures included the deep belief network (DBN), multilayer perceptron (MLP), RNN, stacked autoencoders (SAE), LSTM, and the gated recurrent unit (GRU). Their results indicated that the relative superiority of the algorithms depends on transaction cost: without considering transaction cost, traditional machine learning algorithms performed better on many directional assessment indices, whereas DNN models performed better once transaction cost was taken into account. Zaman (2019) found that the efficiency of Bangladesh's largest stock markets is weak: conducting parametric and nonparametric tests of the DSE and CSE from 2013 to 2017, he showed that the two stock exchanges are not even weak-form efficient. Zhou et al. (2020) investigated the power of SVM in predicting the direction of stock price changes. Using five different data sources, including technical indices, stock posts, transaction data records, news, and the Baidu index, they concluded that the ideal data sources differ for forecasting active and inactive stocks.
Finally, they found that more active stocks can be predicted with higher accuracy over different periods of time. Sahoo and Mohanty (2020) proposed a combination of ANN and the gray wolf optimization (GWO) technique and compared the hybrid ANN-GWO with a plain ANN on a Bombay Stock Exchange dataset covering 2004 to 2018. The performance of ANN-GWO and ANN was evaluated using different error measures, and the comparisons showed that the hybrid method outperforms the ANN model. Kumar et al. (2020) reviewed and organized the published papers on stock market prediction using computational intelligence, categorizing them by dataset, input variables, pre-processing methods, feature selection techniques, forecasting methods, and the performance metrics used to evaluate the models. From the papers reviewed above, it can be inferred that research on stock market prediction is still growing, and hybrid methods appear to be the prevailing approach in different studies. Given the acceptance of ANN-based methods, the focus here is on enhancing the performance of ANNs through metaheuristics. Limitations of the previous methods are provided in Table 1 (Obthong et al. 2020).
Table 1

Limitations of the previous methods

No. Method | Purpose | Limitations

1. ARIMA (autoregressive integrated moving average model) | Forecasting and clustering
• Doesn't work well for nonlinear time series
• Requires more data
• Takes a long time to process a large dataset

2. BPNN (back-propagation neural network) | Forecasting
• Sensitive to noise
• Actual performance depends on initial values
• Slow convergence speed
• Easily converges to a local minimum

3. CART (classification and regression trees) | Classification and forecasting
• Unstable even when the training data change only slightly

4. GP (Gaussian process) | Classification and forecasting
• Generates "black box" models that are difficult to interpret
• Can be computationally expensive

5. GRNN (generalized regression neural network) | Classification and forecasting
• Requires more memory space to store the model
• Can be computationally expensive because of its huge size

6. Hierarchical clustering | Clustering
• Requires every time series to have the same length because of the Euclidean distance
• Useful only for small datasets because of its quadratic computational complexity

7. HMM (hidden Markov model) | Clustering, classification and forecasting
• Requires parameters to be set and relies on user assumptions that may be false, with the result that clusters may be inaccurate
• Takes a long time to process a large dataset

8. K-means | Clustering
• The number of clusters must be specified in advance
• Sensitive to noise
• Only spherical shapes can be determined as clusters
• Unable to handle long time series effectively because of poor scalability

9. KNN (k-nearest neighbors) | Classification and forecasting
• The number of nearest neighbors must first be determined
• Can be computationally expensive
• Memory limitations
• Sensitive to the local structure of the data

10. LR (logistic regression) | Classification and forecasting
• Sensitive to outliers
• Strong assumptions

11. LSTM (long short-term memory) | Classification and forecasting
• Lacks a mechanism to index memory while writing and reading data; the number of memory cells is linked to the size of the recurrent weight matrices

12. MLP (multilayer perceptron) | Classification and forecasting
• Convergence is quite slow
• Local minima can affect the training process
• Hard to scale

13. PSO (particle swarm optimization) | Forecasting
• Lacks a solid mathematical foundation for analyzing the future development of relevant theories

14. RBF (radial basis function neural network) | Classification and forecasting
• Classification is slower than with an MLP

15. RF (random forest) | Classification and forecasting
• Requires more computational power and resources because it creates many trees
• Requires more time to train than decision trees

16. RNN (recurrent neural network) | Classification and forecasting
• Difficult to train

17. SOM (self-organizing maps) | Clustering and classification
• Does not work well for time series of unequal length because of the difficulty of determining the scale of the weight vectors
• Sensitive to outliers

18. SVM (support vector machine) | Classification and forecasting
• Sensitive to outliers
• Sensitive to parameter selection

19. SVR (support vector regression) | Forecasting
• Sensitive to user-defined free parameters

20. ANN (artificial neural network) | Classification and forecasting
• Overfitting
• Sensitive to parameter selection; ANNs give only predicted target values for unknown data, without any variance information to assess the prediction

Hybrid metaheuristic ANN for stock price prediction

Technical indicators

An ANN consists of three layers, of which the input layer is the first. Here, some important technical indices are used as input variables of the network. Indicators are mathematical functions, based on specific formulas, for analyzing stock prices or market indices using graphical tools; investors and managers can use them to analyze the stock market. Choosing the best and most relevant technical indicators is a controversial issue, so GA is used for feature selection to deal with this challenge. The considered technical indicators are listed in Table 2.
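As a hedged, self-contained sketch of the GA-based feature-selection step just described (the bit-mask encoding, toy fitness function, and GA parameters below are illustrative assumptions, not the paper's configuration), the idea can be written as:

```python
import random

def ga_select(n_features, fitness, pop_size=20, gens=40, seed=1):
    """Toy GA for feature selection: each individual is a bit mask over
    the indicator set, and fitness scores a mask (higher is better)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]              # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)      # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_features)           # bit-flip mutation
            child[i] ^= 1
            children.append(child)
        pop = parents + children                    # elitism: keep parents
    return max(pop, key=fitness)

# hypothetical fitness: reward masks that match a made-up "relevant" subset;
# in the paper the fitness would instead reflect forecasting error
relevant = [1, 0, 1, 0, 1, 0, 1, 0]
best = ga_select(8, lambda m: sum(1 for a, b in zip(m, relevant) if a == b))
```

In the actual pipeline, the fitness of a mask would be the (negated) forecasting error of an ANN trained only on the indicators the mask switches on.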
Table 2

Important technical indicators

Row | Feature | Definition and formula

1. Open: the first price
2. High: the highest price
3. Low: the lowest price
4. Close: the last price
5. Volume: number of traded shares
6. SMA-5 (simple moving average, 5 days): (Close_1 + Close_2 + ... + Close_5) / 5
7. SMA-20 (simple moving average, 20 days): (Close_1 + Close_2 + ... + Close_20) / 20
8. EMA-5 (exponential moving average, 5 days): EMA_today = Close_today * k + EMA(5)_yesterday * (1 - k), where k = 2 / (5 + 1) and EMA(5)_0 = SMA(5)
9. ADL (accumulation/distribution line): ADL = ADL_yesterday + Volume * CLV, where CLV = [(Close - Low) - (High - Close)] / (High - Low)
10. CMF (Chaikin money flow): CMF = (CLV * Volume) / Total(Volume, 21), with CLV as in row 9
11. MFI (money flow index): MFI = 100 - 100 / (1 + Money Flow Ratio), where Money Flow Ratio = (14-period positive money flow) / (14-period negative money flow), Raw Money Flow = TP * Volume, and TP = (High + Low + Close) / 3
12. TP (typical price): (High + Low + Close) / 3
13. RSI (relative strength index): RSI = 100 - 100 / (1 + RS), where RS = SMA(U) / SMA(D)
14. ROC (rate of change): (Close_today - Close_{N days ago}) / Close_{N days ago}
15. Upper Band (upper Bollinger band): SMA(20) + 2 * dev(20)
16. Lower Band (lower Bollinger band): SMA(20) - 2 * dev(20)
17. MP (mean price): (High + Low) / 2
18ATRAverage True Range

Current ART = [(Prior ATR × 13) + Current TR]/14

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\text{ATR}} = \left( \frac{1}{n} \right)\mathop \sum \limits_{{\left( {i = 1} \right)}}^{\left( n \right)} TRi$$\end{document}ATR=1ni=1nTRi

TRi = A particular true range

n = the number of time period

19CCICommodity Channel Index

CCI = (TP − 20 Period SMA of TP)/(0.015 × Mean Deviation)

(TP) = (High + Low + Close)/3

Constant = 0.015

20DXDirectional Movement Index

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$+ {\text{DI}} = \left( {\frac{Smoothed + DM}{{ATR}}} \right) \times 100$$\end{document}+DI=Smoothed+DMATR×100

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$- {\text{DI}} = \left( {\frac{Smoothed - DM}{{ATR}}} \right) \times 100$$\end{document}-DI=Smoothed-DMATR×100

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\text{DX}} = \left( {\frac{{\left| { + DI - DI} \right|}}{{\left| { + DI + - DI} \right|}}} \right) \times 100$$\end{document}DX=+DI-DI+DI+-DI×100

+ DM (Directional Movement) = Current High-PH

PH = Previous High

− DM = Previous Low-Current Low

Smoothed ± DM = \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathop \sum \limits_{t = 1}^{14} DM - \left( {\frac{{\mathop \sum \nolimits_{t = 1}^{14} DM}}{14}} \right) + CDM$$\end{document}t=114DM-t=114DM14+CDM

CDM = Current DM

ATR = Average True Range

Important technical indicators
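A few of the indicators above can be sketched in a handful of lines of Python (a minimal illustration following the formulas in the table, not the authors' implementation; variable names and the SMA seed for the EMA are assumptions):

```python
# Minimal sketches of three indicators from the table above.
# `closes` is assumed to be a chronological list of daily closing prices.

def sma(closes, n):
    """Simple moving average of the last n closes."""
    return sum(closes[-n:]) / n

def ema(closes, n):
    """Exponential moving average: EMA_t = Close_t*k + EMA_{t-1}*(1 - k),
    with k = 2/(n + 1) and EMA_0 seeded with SMA(n), as in the table."""
    k = 2 / (n + 1)
    value = sma(closes[:n], n)          # seed with the first n-day SMA
    for close in closes[n:]:
        value = close * k + value * (1 - k)
    return value

def rsi(closes, n=14):
    """RSI = 100 - 100/(1 + RS), where RS = average gain / average loss
    over the last n price changes."""
    diffs = [b - a for a, b in zip(closes, closes[1:])][-n:]
    gains = sum(d for d in diffs if d > 0) / n
    losses = -sum(d for d in diffs if d < 0) / n
    if losses == 0:
        return 100.0                    # no down moves: maximum RSI
    return 100 - 100 / (1 + gains / losses)
```

The remaining indicators (ADL, CCI, DX, etc.) follow the same pattern: a direct transcription of each formula over a rolling window of prices.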

Artificial neural network (ANN)

Today, ANNs are used in a wide range of problems. Some well-known applications include function approximation, classification and clustering, information storage and retrieval, optimization, etc. (Versace et al. 2004). ANNs can be used for a variety of topics, including time series forecasting. Because stock price data are not normally distributed and exhibit characteristics such as skewness, kurtosis, fat tails and nonlinearity, an ANN can be used to capture these qualities. As mentioned earlier, a typical ANN includes 3 layers: (1) input, (2) hidden and (3) output. The number of neurons in each layer is important because changing it makes the network react differently. Thus, GA is applied for choosing the important variables. The GA is used for feature selection for several reasons: (1) conceptual simplicity; (2) searching a wide area of solutions instead of examining a single point; (3) support for multi-objective optimization; (4) GA is a stochastic process and robust to local minima/maxima; and (5) GA is easily parallelized (Oreski et al. 2012). By doing this, the calculation speed is increased and the network is prevented from falling into a local minimum or maximum trap. A neural network is based on learning, which means that it iteratively reduces its error through trial and error. The network has three phases: (1) training, (2) validation and (3) testing. This study includes two main parts. The first includes calculating technical indicators and selecting the optimal indicators using GA. The second includes forecasting the closing price using different hybrid ANN models and comparing their prediction errors.
Two metaheuristic algorithms, namely SSO and BA, are used since they have had successful results in various fields, such as prediction of stock prices and interest rates; moreover, they have useful properties, including their approximate and usually non-deterministic nature, and they are flexible rather than problem-specific (Beheshti and Shamsuddin 2013). Stock price data, from 2013 to 2018, are split into two sections, training and testing, and are then analyzed with artificial intelligence algorithms to forecast the next day's closing stock price. Like Göçken et al. (2016), 70% of the observations are used for the training period and the remaining 30% for the testing and validation period. Models are compared based on 8 criteria of prediction error. Different algorithms can be used for training an ANN, e.g. gradient descent backpropagation (Mozer et al. 1995), Levenberg–Marquardt (LM) backpropagation (Hao and Wilamowski 2011), BFGS quasi-Newton backpropagation (Fahad et al. 2018), Bayesian regularization backpropagation (Burden and Winkler 2008), etc. In this study, the number of hidden-layer neurons of the plain neural network is determined by trial and error and is not fixed. Due to a feature of the MATLAB software, the number of hidden layers is fixed to 1; this can be considered a limitation. To this aim, 1–32 neurons are examined in the hidden layer, and the number of neurons yielding the highest accuracy is chosen for the ANN model. For training the ANN, error backpropagation is used, with the LM algorithm as the minimization algorithm in learning the model (Haddad and Haghighat Monfared 2012). The number of training epochs is one thousand and, to improve the results, we increased it to 2000; the initial training rate is set to 0.01 and is decreased to 0.001 to improve the accuracy of the results. The ANN has two threshold functions for capturing the linear and nonlinear characteristics of the model.
The activation function of the hidden layer is the tangent sigmoid function, a mathematically shifted version of the sigmoid function that combines the features of both the tanh and sigmoid functions, while the threshold function of the output layer is the pure linear (purelin) function. We used the tanh function for several reasons: (1) the range of our normalized data is [−1, 1]; (2) tanh activation almost always works better than the sigmoid function; (3) it is capable of learning and performing more complex, nonlinear tasks. With tanh, the mean of the hidden-layer outputs comes out at 0 or very close to it, which helps center the data and makes learning for the next layer much easier. The architecture of the proposed neural network is represented in Fig. 1.
Fig. 1

The structure of the desired artificial neural network (Ghasemiyeh 2017)

Here, the input variables are the 20 technical indicators. These variables are normalized using Eq. (1) to be used as input variables, where x_i is the ith observation. The goal of normalization is to bring the values of the dataset to a common scale without distorting differences in their ranges; it generally speeds up learning and leads to faster convergence. Figure 2 represents the research methodology.
Fig. 2

Research methodology

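The two steps described above, min-max normalization into [−1, 1] followed by a forward pass through a tanh-hidden/linear-output network, can be sketched as follows (a simplified illustration; the weights here are random placeholders, not the trained network, and the layer sizes are toy values):

```python
import math
import random

def normalize(xs, lo=-1.0, hi=1.0):
    """Min-max normalize observations into [lo, hi] (Eq. (1)-style scaling)."""
    x_min, x_max = min(xs), max(xs)
    return [lo + (hi - lo) * (x - x_min) / (x_max - x_min) for x in xs]

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tangent sigmoid (tanh) activation in the hidden
    layer, pure linear (purelin) activation in the output layer."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Tiny example: 3 inputs, 2 hidden neurons, 1 output (placeholder weights).
random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
b_hidden = [0.0, 0.0]
w_out = [random.uniform(-1, 1) for _ in range(2)]
x = normalize([101.2, 99.8, 100.5])   # raw prices scaled into [-1, 1]
y = forward(x, w_hidden, b_hidden, w_out, 0.0)
```

In the paper itself the weights are found by the LM/metaheuristic training described above; this sketch only shows why centering the inputs around 0 pairs naturally with the tanh hidden layer.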

GA-ANN forecasting model

To select the input variables, GA is used. GA is a stochastic search algorithm inspired by natural evolution (Kuo and Han 2011; Saber et al. 2013). Generally, GA seeks an approximately optimal solution by coding and decoding a population of solutions and reproducing through crossover and mutation, its main operators. In this study, inputs are coded using binary variables. The chromosomes are defined to contain 26 bits. Of these, 21 bits are associated with the existence (bit value 1) or nonexistence (bit value 0) of the input variables (technical indicators); 5 additional bits encode the number of neurons (2^5 = 32 possibilities) in the hidden layer. The population size of the GA is 20 (Davallou and Azizi 2017; Kai and Wenhua 1997). The primary population is formed stochastically. The technical indicators and the number of hidden-layer neurons are entered into the GA, which uses the ANN as its fitness function and returns the MSE as output. The fittest choice is the one with the lowest MSE. To increase training speed, the epochs are set to 100. As mentioned, 70% of the data are employed for training and 30% for testing and validation. Table 3 illustrates the parameters of the genetic algorithm.
Table 3

Parameters of GA

Output error: SSE
Output activation function: Logistic
Input activation function: Logistic
Mutation rate: 0.1
Crossover rate: 0.9
Number of generations: 50
Population size: 50
Figure 3 illustrates the proposed GA-ANN algorithm.
Fig. 3

Considered GA flow chart for training ANN (Liu and Wang 2019)

According to Göçken et al. (2016), roulette wheel selection is used for parents' selection and the crossover rate is set to 80%. One-point crossover is used, along with binary mutation at a rate of 20%. Selecting the best chromosomes among parents and children, the new generation continues by repeating the algorithm until a termination condition is satisfied. The two termination conditions used are (1) the best individual remaining unchanged for 100 generations, and (2) reaching the maximum generation condition, i.e. reproduction of 2000 generations. Parameters such as the mutation rate, crossover rate and population size have been set based on Göçken et al. (2016). However, since different problems have different properties, such as scalability/non-scalability and dimension dependence/independence, there are some common beliefs about the range of parameters in the literature. For example, it is recommended that the population size be between 20 and 50 and the crossover and mutation rates between 80–95% and 0.5–1%, respectively (Hassanat et al. 2019). The GA pseudo-code (i.e. the steps and how to obtain the parameters) is illustrated in Table 4.
Table 4

GA-ANN pseudo-code

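The 26-bit chromosome described above can be illustrated with a short decoding routine (a sketch under the stated assumptions; names are hypothetical, and the 21 indicator bits and 5 neuron bits follow the encoding in the text):

```python
import random

N_INDICATORS = 21   # one bit per technical indicator (1 = include it)
N_NEURON_BITS = 5   # 2**5 = 32 possible hidden-neuron counts

def decode(chromosome):
    """Split a 26-bit chromosome into the selected-indicator indices
    and the number of hidden neurons (1..32)."""
    assert len(chromosome) == N_INDICATORS + N_NEURON_BITS
    mask = chromosome[:N_INDICATORS]
    neuron_bits = chromosome[N_INDICATORS:]
    # Read the 5 bits as a binary number, shifted so 0 maps to 1 neuron.
    n_neurons = int("".join(map(str, neuron_bits)), 2) + 1
    selected = [i for i, bit in enumerate(mask) if bit == 1]
    return selected, n_neurons

# Example: a random individual from the initial population.
random.seed(1)
chrom = [random.randint(0, 1) for _ in range(26)]
selected, n_neurons = decode(chrom)
```

In the GA loop, each decoded pair (indicator subset, neuron count) would be handed to the ANN, and the resulting MSE used as that chromosome's fitness.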

Bat algorithm (BA)

Inspired by the echolocation behavior of microbats, the bat algorithm (BA) was proposed as a heuristic optimization algorithm (Iglesiasa et al. 2020). Mirjalili et al. (2014) showed the superiority of BA over some other algorithms, such as GA and PSO. The echolocation of microbats is interesting, and several parameters are used to simulate its behavior, such as speed, location, frequency and loudness (Gàlvez and Iglesias 2016): every virtual bat flies stochastically with a velocity v_i at location (solution) x_i, with a frequency f_i, varying wavelength λ and loudness A0. When searching for and approaching its prey, a bat changes its frequency and loudness and adjusts its pulse emission rate r (Yang 2010). Exploration is strengthened by a local random walk, and selection of the best continues until the termination criteria are reached (Nawi et al. 2014); a frequency-tuning technique is used to control the dynamic behavior of the swarm of bats, and tuning the algorithm parameters can be applied to balance exploration and exploitation (Yang 2010). The loudness may change in different ways; it can be supposed that it decreases from a large positive value A0 to a fixed minimum value Amin. Initially, BA starts with a random population of bats; then, to update the location of each bat, the following formulas are used at each step:

f_i = f_min + (f_max − f_min)·β
v_i(t) = v_i(t − 1) + (x_i(t − 1) − x*)·f_i
x_i(t) = x_i(t − 1) + v_i(t)

where x* is the position of the best bat, β is a random value in [0, 1], and f_max and f_min are the maximum and minimum frequencies, here assumed to be 1 and 0, respectively. The initial frequency of each bat is drawn from the range [f_min, f_max]. The frequency is applied to manage the velocity and the bats' movement scope (Nawi et al. 2014). Afterwards, in the local search, each bat uses a random walk to create a new candidate solution. To accomplish this, each bat produces a random number β. If β is greater than the pulse emission rate, the new solution is generated by Eq.
(4); otherwise it is generated by Eqs. (5–8) (Tsai et al. 2014; Chou and Nguyen 2018). In the random walk, ε is a random value in [−1, 1] and the step is scaled by the mean loudness of all bats at iteration t. Here, to improve the generated solution in the case that β is not greater than the pulse emission rate, a modification method is presented. The main objective of this modification is to increase the diversity of the bat population using mutation and crossover, which help to enhance the search efficiency. Thus, for each bat i, three distinct bats are selected randomly. Then, using the mutation and crossover operators, two improved solutions are produced, where n is the dimension of the problem and the crossover coefficients are randomly generated numbers in the [0, 1] interval. The better of the two solutions replaces the current one. If the random number is smaller than the loudness A_i and the new fitness is better than that of the best known solution, the newly generated solution is accepted. On acceptance, the loudness and the pulse emission rate are updated as A_i(t + 1) = α·A_i(t) and r_i(t + 1) = r_i(0)·[1 − exp(−γ·t)], where α and γ are constant values, r_i(0) is the initial pulse emission rate and t indicates the iteration number. In this study, the explained BA is used to modify the weight matrix of the ANN. In BAT-ANN, at first, the primary population of bats is used to form the initial weight matrix. This matrix is then passed to the ANN to start the training phase (Hafezi et al. 2015). Then, BA specifies the best solution based on the neural network results. A local search is then performed to discover new solutions. The replacement of a newly accepted solution with the best known solution is repeated until the termination criteria are satisfied (Yang 2010). Finally, the optimal values of the weight matrix are determined. Figure 2 shows the flowchart of BAT-ANN. It should be noted that the calculation method is adapted from Yang (2010), Golmaryami et al. (2015), and Jantan et al. (2017). Table 5 summarizes the notation used for the parameters of BA.
Table 5

Bat algorithm parameters

Fitness function: Mean square error (MSE)
n: Population size (typically between 10 and 40)
N-gen: Number of generations
Amax: Maximum loudness
Amin: Minimum loudness
r: Pulse rate (constant or decreasing)
Qmin: Minimum frequency (0)
Qmax: Maximum frequency (1.5)
N-iter: Number of iterations (1000)
D: Number of dimensions (20)
LB: Lower bound (−100)
UB: Upper bound (+100)
Q: Frequency
V: Velocity
As in the GA-ANN algorithm, parameters such as the pulse rate and velocity have been set based on different research, such as Golmaryami et al. (2015) and Hafezi et al. (2015). The process steps of the bat algorithm are shown in Table 6.
Table 6

BA pseudo-code

Initialize the bat population Xi (i = 1, 2,  … , n) and Vi
Define pulse frequency Fi
Initialize pulse rate ri and the loudness Ai
While (t < Max number of iterations)
Generate new solutions by adjusting frequency,
Updating velocities and positions [Eqs. (5) to (7)]
If (rand > ri)
Select a solution among the best solutions randomly
Generate a local solution around the selected best solution
End if
Generate a new solution by flying randomly
If (rand < Ai and f(Xi) < f(Gbest))
Accept the new solutions
Increase ri and reduce Ai
End if
Rank the bats and find the current Gbest
End while

The above pseudocode interpretation and more details are briefly as follows

1. Bat is initialized then passes its first population to ANN as weight's values

2. Load data

3. ANN starts training and computes the accuracy of the model

4. Bat finds the initial best solution by means of the ANN's results

5. While I < Max number of iterations

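The core update rules of BA (frequency, velocity and position, plus the loudness and pulse-rate schedule) can be sketched in a few lines (a minimal one-dimensional illustration with arbitrarily chosen α and γ values, not the full BAT-ANN training loop):

```python
import math
import random

def bat_step(x, v, x_best, f_min=0.0, f_max=1.0):
    """One BA move: f_i = f_min + (f_max - f_min)*beta,
    v_i <- v_i + (x_i - x_best)*f_i, then x_i <- x_i + v_i."""
    beta = random.random()                  # random value in [0, 1]
    f = f_min + (f_max - f_min) * beta
    v = v + (x - x_best) * f
    return x + v, v

def update_loudness_and_rate(A, r0, t, alpha=0.9, gamma=0.9):
    """On acceptance of a new solution:
    A_i <- alpha * A_i and r_i <- r0 * (1 - exp(-gamma * t))."""
    return alpha * A, r0 * (1 - math.exp(-gamma * t))

# Single-step usage: a bat at x = 5.0 is pulled toward the best-known
# position at the origin, then its loudness/pulse rate are updated.
random.seed(0)
x_new, v_new = bat_step(5.0, 0.0, x_best=0.0)
A, r = update_loudness_and_rate(1.0, 0.5, t=1)
```

In BAT-ANN, each bat position would be a flattened ANN weight matrix and the fitness of a position would be the network's MSE, as described in the text.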

Social spider algorithm (SSA)

The social spider optimization (SSO) algorithm belongs to the family of metaheuristic, evolutionary and swarm intelligence algorithms and models the cooperative lifestyle of social spiders, male and female (Mirjalili et al. 2015). Depending on their gender, spiders perform different tasks, such as mating, preying, web design and social interaction (Luque-Chang et al. 2018). A problem may have several candidate solutions located in a search space; in this algorithm, the communal web can be considered the search space, and the position of each spider plays the role of a solution (Evangeline and Abirami 2019). The web and its vibrations are very important for spiders, since information about trapped prey and details about mating are transferred along the thin strings of the web through the spiders' vibrations (Reddy et al. 2019). Two things matter in a vibration: weight and distance. Spiders change their weights according to a fitness value and, accordingly, perform different operations such as mating. Like the genetic algorithm, which is based on the superiority of better individuals, an offspring with a better weight replaces a weaker one; otherwise, the population does not change. At the end of the iterations, the spider with the best fitness is taken as the optimal choice (Yeh 2012). In training an ANN with the SSO algorithm, the best spider plays the role of the optimal solution. Here, to compute the fitness value of a spider, minimization of the MSE is considered as the objective function. Like other metaheuristic algorithms, SSO has several steps and parameters.

Initialization

Like other swarm intelligence and evolutionary algorithms, the SSO algorithm begins by assigning initial values to the population and the spider locations. The population includes two kinds of spiders: female and male. The number of female spiders is selected randomly and usually lies in the range of 65–90% of the whole population; it is obtained by Eq. (12), and the number of male spiders is then determined by Eq. (13). In the SSO algorithm, the position of each spider is important. Therefore, lower and upper bounds are imposed on the search space, and the initial positions are generated randomly between them.

Fitness assignation

It should be noted that the weight of each spider is very important and affects the improvement of the solutions, the optimization of the network and, ultimately, the achievement of the main goal. In the presented model, a weight is assigned to the ith spider (irrespective of gender) that indicates its quality in the population S. The weight is calculated for each spider as w_i = (J(s_i) − worst_S)/(best_S − worst_S), where J(s_i) is the fitness value of spider s_i, and best_S and worst_S are, as given by Eq. (17), the best and worst fitness values in the whole population S.

Vibration modeling

The communal web is vital for spiders because of what it makes possible, for example, communication between spiders and sensing of their distance to each other. The magnitude of a vibration carries meaning: a stronger vibration means a closer member, and vice versa. To exchange information between members i and j of the colony, the vibration is mathematically defined as Vib_ij = w_j · e^(−d_ij²), where d_ij is the Euclidean distance between members i and j within the colony. Spiders use these vibrations to perceive distance and transfer information from member to member. Three types of vibrations occur: Vibc_i, Vibb_i and Vibf_i. The individual i receives the vibration Vibc_i as a result of the information sent by the member c that is nearer to i and has a higher weight than i (w_c > w_i). The individual i receives the vibration Vibb_i as a result of the information transferred by the member b that has the best weight, i.e. the best fitness value, of the population S as a whole. Finally, the vibration Vibf_i carries the information transferred to member i from the closest female individual.

Female cooperative operator

The movement between spiders, i.e. attraction or repulsion, is based on several random criteria, irrespective of gender. A random number is generated uniformly in the range [0, 1]. When it is smaller than a predetermined threshold PF, an attraction movement is produced; otherwise, a repulsion movement is produced, as shown in Eq. (22), where the movement coefficients and rand are random numbers in [0, 1], t is the iteration number, and the individuals s_c and s_b symbolize the nearest spider with a higher weight than s_i and the best spider in the communal web, respectively.

Male cooperative operator

According to their weights, there are two groups of male spiders. Some have weights above the median weight of the male population (dominant, D) and the others have weights below it (non-dominant, ND). The position of a male spider is then updated as given in Eq. (23).

Mating operator

Mating has a specific range and takes place between dominant (D) males and the females within that range; the mating radius is given by Eq. (24). A spider's weight is directly related to its offspring: a heavier spider is more likely to contribute to the offspring, and vice versa. Table 7 illustrates the parameters of the SSO algorithm.
Table 7

SSO Algorithm Parameters

DimDimension (30)
Bound100
Max-iteration10,000
Pop-size25
Alpha0.99
Beta0.7
Gamma0.9
Fitness functionMean square error (MSE)
The calculation method used in this paper is adapted from (Luque-Chang et al. 2018; Saravanan et al. 2019; Gülmez and Kulluk 2019). The steps of the social spider algorithm to obtain the parameters are as follows:

1. Consider N as the total size of the colony population; define the number of male and female spiders in the entire population S.
2. Initialize the male and female members randomly and calculate the radius of mating.
3. Calculate the weight of every spider in S.
4. Move the female spiders according to the female cooperative operator.
5. Move the male spiders according to the male cooperative operator.
6. Perform the mating operation.
7. If the stop criterion is met, the process ends; otherwise, go back to step 3.
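The weight and vibration computations at the core of the steps above can be sketched as follows (a minimal illustration following the equations in this section, for a minimization objective such as MSE; the toy values are assumptions):

```python
import math

def weights(fitnesses):
    """w_i = (worst - J(s_i)) / (worst - best) for a minimization problem,
    so the spider with the lowest error gets weight 1 and the worst gets 0."""
    best, worst = min(fitnesses), max(fitnesses)
    if best == worst:
        return [1.0] * len(fitnesses)   # degenerate case: all equal
    return [(worst - f) / (worst - best) for f in fitnesses]

def vibration(w_j, pos_i, pos_j):
    """Vib_ij = w_j * exp(-d_ij^2), with d_ij the Euclidean distance
    between the positions of spiders i and j on the communal web."""
    d2 = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))
    return w_j * math.exp(-d2)

# Toy colony: two spiders' MSE values and positions in a 2-D search space.
w = weights([2.0, 4.0])                    # spider 0 is fitter
v_near = vibration(w[0], (0.0, 0.0), (0.5, 0.0))
v_far = vibration(w[0], (0.0, 0.0), (3.0, 0.0))
```

As the text notes, a closer member produces a stronger perceived vibration, which is exactly the `v_near > v_far` behavior of the exponential decay above.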

Time series forecasting (ARMA and ARIMA)

A sequence of events or observations recorded over time intervals is a time series (Hamilton 1994). Events can be examined at different frequencies, such as yearly, monthly, weekly, daily, hourly, or even by minutes or seconds. When past data of a single series are used to predict its future values, this is univariate time series forecasting, and when more than one series is involved, it is called multivariate time series forecasting (Granger et al. 1974; Reinsel 2003). In the autoregressive integrated moving average (ARIMA) model, the subsequent values of the variable are supposed to be a linear function of past observations and random errors, called white noise; the same equation can then be used for the prediction of future values (Zhang 2003). ARIMA can be used to model time series that are not stationary and do not show a stable pattern. An ARIMA model is specified by three components: (p, d, q) (Sowell 1992). First of all, the time series should be made stationary, because the phrase 'AutoRegressive' in ARIMA conveys that the model uses the series' own lags as predictors in a linear regression; we must check whether these predictors are correlated with each other, since such correlation can affect the model. Many methods exist to make a time series stationary; differencing, that is, subtracting the previous value from the current value, is the most common one (Clements and Hendry 2000). Due to the complexity of a series, sometimes it needs to be differenced more than once. Thus, the value of d is the minimum number of differencing operations required to make the series stationary. If the time series is already stationary, then d = 0 and no differencing is needed. When particular lagged values of Y are used as predictor variables, the model is an autoregressive model AR(p); the lags are generated because the outcome of one time period affects the following periods, and the value of p indicates the order.
For instance, a first-order autoregressive process is written AR(1): the output variable at time t depends on the previous time period (t − 1). The same holds for second- or third-order AR processes, which depend on data two or three periods back. An AR model is thus one in which Y_t is related only to its own lags (Tseng et al. 2001; Akaike 1998):

Y_t = α + β_1·Y_(t−1) + β_2·Y_(t−2) + ⋯ + β_p·Y_(t−p) + ε_t

where Y_(t−1), …, Y_(t−p) are the past series values (lags), β_1, …, β_p are the lag coefficients estimated by the model, α is the intercept term, also estimated by the model, and ε_t is white noise. Similarly, in a pure moving average model MA(q), Y_t depends only on lagged forecast errors (Said and Dickey 1984):

Y_t = α + ε_t + φ_1·ε_(t−1) + ⋯ + φ_q·ε_(t−q)

where the errors ε_(t−1), …, ε_(t−q) are the forecast errors of the autoregressive models at the corresponding lags, as given by Eqs. (27)–(28). Through the combination of AR and MA with at least one differencing, an ARIMA model is produced (Pai and Lin 2005), so the equation becomes:

Y_t = α + β_1·Y_(t−1) + ⋯ + β_p·Y_(t−p) + ε_t + φ_1·ε_(t−1) + ⋯ + φ_q·ε_(t−q)

applied to the differenced series. The following diagram shows the flowchart of the ARIMA model (Fig. 4).
Fig. 4

ARIMA flowchart (Ma et al. 2018)

Additional explanations and more details are as follows:

Step 1 Check stationarity: if a time series has a trend or seasonality component, it must be made stationary before we can use ARIMA to forecast.
Step 2 Difference: if the time series is not stationary, it needs to be stationarized through differencing. Take the first difference, then check for stationarity. Take as many differences as needed, and make sure to check seasonal differencing as well.
Step 3 Filter out a validation sample: this will be used to validate how accurate the model is. Use a train/test/validation split to achieve this.
Step 4 Select AR and MA terms: use the ACF and PACF to decide whether to include AR term(s), MA term(s), or both.
Step 5 Build the model: build the model and set the number of periods to forecast to N (depending on your needs).
Step 6 Validate the model: compare the predicted values to the actuals in the validation sample.
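Steps 1–2 and the AR part of steps 4–5 can be illustrated with a tiny pure-Python sketch: first-difference a series, then fit an AR(1) model to the differenced values by ordinary least squares, i.e. an ARIMA(1,1,0)-style forecast (a toy illustration with made-up prices, not a substitute for a full ARIMA library such as statsmodels):

```python
def difference(series, d=1):
    """Apply d rounds of first differencing: y_t - y_{t-1}."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def fit_ar1(series):
    """OLS fit of y_t = alpha + beta * y_{t-1} + e_t."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
    alpha = my - beta * mx
    return alpha, beta

# One-step ARIMA(1,1,0)-style forecast on a toy trending price series.
prices = [100.0, 101.0, 103.0, 104.0, 106.0, 107.0, 109.0, 110.0]
diffs = difference(prices)             # trend removed, roughly stationary
alpha, beta = fit_ar1(diffs)
next_diff = alpha + beta * diffs[-1]   # predicted next change
forecast = prices[-1] + next_diff      # undo the differencing
```

The toy series alternates changes of +1 and +2, so the fitted AR(1) on the differences captures that alternation and forecasts the next price accordingly; a real analysis would use ACF/PACF to pick p and q, as in Step 4.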

Experimental results and findings

The main goal of this study is to forecast stock prices by hybridizing an ANN with GA for feature selection and two metaheuristic algorithms, BA and SSO, for improving the network. The five major indices DAX, S&P500, FTSE100, DJI and NDAQ are studied in this research. The time interval of the research is about 2 years, from 4 July 2018 to 4 July 2020. Some important technical indicators, such as RSI and MACD, are employed as input variables and are then reduced to an optimal subset. Thus, 20 technical indicators are selected to predict the stock price, of which 19 variables are inputs and 1 variable is the output or target variable that determines the next day's price. The first step, as declared in Sect. 3, is data normalization. Data are normalized into [−1, 1] to become ready as input variables. Table 8 gives a general description of the indices, the timeframe and the number of data used in this study.
Table 8

Statistical description of data

Symbol | Time interval | Number of data (before normalizing) | Number of data (after normalizing) | Number of input variables | Number of output layers | Target indicator
DJI | 2018–2020 | 504 | 435 | 20 | 1 | Closing price
DAX | 2018–2020 | 504 | 436 | 20 | 1 | Closing price
FTSE100 | 2018–2020 | 504 | 443 | 20 | 1 | Closing price
NDAQ | 2018–2020 | 504 | 437 | 20 | 1 | Closing price
S&P500 | 2018–2020 | 504 | 438 | 20 | 1 | Closing price
As mentioned before, the ANN includes three layers. The features of the ANN used in this study are defined in Sect. 3.2. In summary, the input layer has 20 neurons, the output layer has 1, and the number of hidden-layer neurons varies based on trial and error. The hidden layer uses tangent sigmoid as its activation function and the output layer uses a simple linear one. The data set is divided into two sections: (1) training the network (70%) and (2) validation and testing (30%). The LM algorithm is used for training, and mean square error (MSE) is adopted as the loss function. The related information about architecture, training and testing for each index is presented in Table 9.
Table 9

Training, validation and testing (T.V.T) error and network architecture

Indices | Architecture | Weights | Fitness | Train error | Validation error | Test error | AIC | Correlation | R-squared
DJI | [20-50-1] | 1101 | 0.0224 | 43.1729 | 48.8100 | 44.5219 | 1629.23 | 0.9991 | 0.9982
DAX | [20-37-1] | 815 | 24.4495 | 0.0455 | 0.0446 | 0.0409 | −988.45 | 0.9993 | 0.9986
FTSE100 | [20-50-1] | 1101 | 0.05615 | 12.3556 | 15.9880 | 17.8072 | 1232.51 | 0.9979 | 0.9957
S&P500 | [20-33-1] | 727 | 0.2096 | 5.0277 | 5.0658 | 4.7692 | 237.52 | 0.9989 | 0.9978
NDAQ | [20-18-1] | 397 | 2.8119 | 0.3131 | 0.4033 | 0.3556 | −1257.5 | 0.9986 | 0.9972
More information about training, validation and testing for the DJI index, for instance, is represented in Table 10 and Fig. 5. The other indices are presented in "Appendix A".
Table 10

The DJI index details (T.V.T)

| DJI | Training | Validation |
|---|---|---|
| Absolute error | 1.4611 | 9.9177 |
| Network error | 4.49E−08 | 0 |
| Error improvement | 3.57E−22 | |
| Iteration | 371 | |
| Training speed (iter/sec) | 7.1072 | |
| Architecture | [20-10-1] | |
| TA | LM | |
| TSR | NEI | |

TA training algorithm; TSR training stop reason, NEI no error improvement

Fig. 5

Actual V.S output (testing) for DJI


GA-ANN algorithm

The GA is used for choosing the best and fittest input variables and hidden layers in the ANN. Sect. 3.3 determines and describes the related parameters, including population size, number of generations, and mutation and crossover rates. Using GA, the training, validation and testing errors along with the network architecture are determined according to Table 11.
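A minimal sketch of GA-based feature selection in this spirit: chromosomes are binary masks over the 20 indicators and fitness is validation error. A fast linear model stands in for the ANN, the data are synthetic, and the population size, generation count and rates are illustrative, not the paper's settings:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.normal(size=(435, 20))
y = X[:, :8] @ rng.normal(size=8) + rng.normal(scale=0.1, size=435)  # only 8 inputs matter
Xtr, Xva, ytr, yva = X[:304], X[304:], y[:304], y[304:]

def fitness(mask):
    """Validation MSE of the stand-in model on the selected indicators."""
    if not mask.any():
        return np.inf
    m = LinearRegression().fit(Xtr[:, mask], ytr)
    return mean_squared_error(yva, m.predict(Xva[:, mask]))

pop = rng.integers(0, 2, size=(30, 20)).astype(bool)      # population of indicator masks
for _ in range(40):                                       # generations
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[:10]]                # keep the 10 fittest
    children = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = int(rng.integers(1, 20))
        child = np.concatenate([a[:cut], b[cut:]])        # one-point crossover
        children.append(child ^ (rng.random(20) < 0.05))  # bit-flip mutation
    pop = np.array(children)

best = min(pop, key=fitness)
print(int(best.sum()), "indicators kept")
```

As in the paper, the mask that survives typically keeps far fewer than 20 inputs while holding validation error down.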
Table 11

T.V.T error and network architecture after using GA

| Indices | Architecture | Weights | Fitness | Train error | Validation error | Test error | AIC | Correlation | R-squared | Best fitness |
|---|---|---|---|---|---|---|---|---|---|---|
| DJI | [10-16-1] | 193 | 0.0123 | 39.76 | 73.36 | 81.21 | − 211.23 | 0.9994 | 0.9989 | 89.81 |
| DAX | [11-10-1] | 131 | 22.18 | 0.0482 | 0.0608 | 0.0450 | − 2339.3 | 0.9994 | 0.9987 | 59.65 |
| FTSE100 | [8-13-1] | 131 | 0.0897 | 9.1412 | 12.79 | 11.38 | − 798.78 | 0.9990 | 0.9980 | 74.33 |
| S&P500 | [12-30-1] | 421 | 6.178 | 0.0721 | 0.1361 | 0.1618 | − 1648.3 | 0.9999 | 0.9999 | 69.58 |
| NDAQ | [10-21-1] | 253 | 0.1649 | 5.1365 | 6.4543 | 6.0617 | − 704.09 | 0.9994 | 0.9989 | 88.87 |
Accordingly, using GA the number of input variables can be decreased to as few as 8, while R-squared increases. The best fitness corresponds to the best technical indicators the network could recognize. For each index, a different number of input variables is selected, owing to the difference in the importance and role of each technical indicator in the final price or target output (index). Details about the selected technical indicators are represented in the appendix (Table 24).

Bat-ANN algorithm

In this section, the parameters are optimized and the network is improved using the bat algorithm. The obtained results are illustrated in Table 12.
Table 12

Bat-ANN optimum parameters and error

| Indices | Alpha | Fmin | Gamma | Gd | Pop | M | Fitness function (MSE) |
|---|---|---|---|---|---|---|---|
| DJI | 0.99 | 8.35E−03 | 0.910 | 10 | 30 | 1000 | 1.0E−55 |
| DAX | 0.99 | 6.43E−04 | 0.910 | 11 | 30 | 1000 | 1.0E−63 |
| FTSE100 | 0.99 | 5.51E−05 | 0.910 | 8 | 30 | 1000 | 1.0E−31 |
| NDAQ | 0.99 | 5.31E−03 | 0.910 | 12 | 30 | 1000 | 1.0E−33 |
| S&P500 | 0.99 | 1.33E−04 | 0.910 | 10 | 30 | 1000 | 1.0E−22 |
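For reference, the core bat-algorithm updates (frequency-driven velocities, loudness decaying by `alpha`, pulse rate growing via `gamma`, as in Table 12) can be sketched as below. A sphere function stands in for the network's MSE, and the constants and iteration count are illustrative:

```python
import numpy as np

def bat_algorithm(obj, dim, pop=30, iters=200, fmin=0.0, fmax=2.0,
                  alpha=0.99, gamma=0.9, seed=0):
    """Minimal bat algorithm (after Yang 2010) for continuous minimization."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, (pop, dim))
    v = np.zeros((pop, dim))
    loud = np.ones(pop)                                  # loudness A_i
    rate = np.zeros(pop)                                 # pulse rate r_i
    fit = np.array([obj(b) for b in x])
    best = x[fit.argmin()].copy()
    for t in range(1, iters + 1):
        freq = fmin + (fmax - fmin) * rng.random(pop)    # random frequency per bat
        v += (x - best) * freq[:, None]
        cand = x + v
        walk = rng.random(pop) > rate                    # local random walk near the best
        cand[walk] = best + 0.01 * loud.mean() * rng.normal(size=(int(walk.sum()), dim))
        f_new = np.array([obj(b) for b in cand])
        accept = (f_new <= fit) & (rng.random(pop) < loud)
        x[accept], fit[accept] = cand[accept], f_new[accept]
        loud[accept] *= alpha                            # loudness decreases on acceptance
        rate[accept] = 0.5 * (1.0 - np.exp(-gamma * t))  # pulse rate increases
        best = x[fit.argmin()].copy()
    return best, fit.min()

best_w, err = bat_algorithm(lambda w: float(np.sum(w ** 2)), dim=10)
print(err)
```

In the paper's setting, `obj` would be the training MSE of the GA-reduced ANN as a function of its weights.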

SSO (social spider optimization) algorithm

In this part, the global best fitness and global best solution are checked after 1000 iterations, and the error is improved using SSO. First, the parameters are set to predetermined values; the network then optimizes them toward minimum error. Table 13 indicates the optimum error and parameters.
Table 13

SSO-ANN optimum parameters and error

| Indices | Alpha | Beta | Gamma | Epoch | Input layer | Hidden layer | Output layer | Global best fitness | Global best solution (average) |
|---|---|---|---|---|---|---|---|---|---|
| DJI | 0.7665 | 0.6439 | 0.7512 | 250 | 10 | 10 | 1 | 1.0E−64 | 1.0E−44 |
| DAX | 0.7521 | 0.5441 | 0.7591 | 357 | 11 | 22 | 1 | 1.0E−50 | 1.0E−20 |
| FTSE100 | 0.6314 | 0.5512 | 0.6371 | 550 | 8 | 13 | 1 | 1.0E−73 | 1.0E−51 |
| NDAQ | 0.5365 | 0.6891 | 0.8111 | 953 | 12 | 17 | 1 | 2.0E−68 | 2.0E−35 |
| S&P500 | 0.871 | 0.752 | 0.667 | 368 | 10 | 12 | 1 | 1.0E−30 | 1.0E−16 |
Alpha, beta and gamma are random numbers in [0, 1]. The classical SSO requires the random selection of parameters [(22) and (23)] to control the movement of the spiders, which can affect the mentioned balance, leading the algorithm to premature convergence. The other details, including the ANN structure (i.e., the number of neurons in the input, hidden and output layers), the estimation error and the average optimum solutions, are also given. According to this table, it can easily be seen that the error is considerably lower than those of the ANN and GA-ANN networks.
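A loose sketch of the SSO idea: spiders are attracted toward the best solution by a vibration that decays with squared distance, with alpha and beta as random factors in [0, 1]. This simplified version omits the male/female subpopulations and mating operator of the full algorithm, and again uses a sphere function as a stand-in objective:

```python
import numpy as np

def sso_sketch(obj, dim, pop=20, iters=300, seed=0):
    """Greatly simplified social-spider-style search (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, (pop, dim))
    fit = np.array([obj(s) for s in x])
    for _ in range(iters):
        best_i = fit.argmin()
        # spider weight: 1 for the current best solution, 0 for the worst
        w = (fit.max() - fit) / (fit.max() - fit.min() + 1e-12)
        for i in range(pop):
            d2 = float(np.sum((x[best_i] - x[i]) ** 2))
            vib = w[best_i] * np.exp(-d2)             # vibration felt from the best spider
            alpha, beta = rng.random(), rng.random()  # random factors in [0, 1]
            step = alpha * vib * (x[best_i] - x[i]) + beta * rng.normal(0.0, 0.1, dim)
            cand = x[i] + step
            f = obj(cand)
            if f < fit[i]:                            # greedy replacement
                x[i], fit[i] = cand, f
    return x[fit.argmin()], fit.min()

sol, err = sso_sketch(lambda w: float(np.sum(w ** 2)), dim=10)
print(err)
```

As with the bat algorithm, the paper applies this kind of search to the ANN's weights, with the network MSE as `obj`.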

Time series forecasting (ARIMA)

Financial time series are usually not stationary; they have characteristics such as skewness and kurtosis with fat tails. Before anything else, it is necessary to check the stationarity of the series. In this research, the Augmented Dickey-Fuller (ADF) test is used for this purpose. First, the stationarity of each index is checked separately. The correlogram of the DJI is shown in Fig. 6, and Table 14 shows the unit root test without differencing for the DJI.
Fig. 6

Correlogram of closing price (DJI)

From Table 14, the t-statistic, i.e. − 2.001110, is bigger than the critical values at the various significance levels (1%, 5% and 10%). Thus, the series has a unit root and is not stationary. This problem is solved by differencing.
Table 14

Unit root test without differencing (DJI)

H0: CLOSE has a unit root
Exogenous: Constant
Lag Length (LLgth): 0 (Automatic—based on SIC (ABSIC), maxlag = 17)
After differencing, the series is stationary. More details are represented in Table 15 and Fig. 7.
Table 15

ADF test after differencing

H0: D(CLOSE) has a unit root
Exogenous: Constant
LLgth: 0 (ABSIC, maxlag = 17)
Fig. 7

Correlogram of closing price after differencing (DJI)

Now the series can be forecasted using ARIMA. Using EViews 10, the order of the ARIMA model is estimated. Table 16 shows the best model estimation, and the models used for criterion selection are summarized in Table 17. Also, Fig. 8 illustrates the Akaike information criteria, while the ARIMA forecasting summary is given in Table 18.
Table 16

ARIMA forecasting

Dependent variable: D(CLOSE)
Method: ARMA maximum likelihood (BFGS)
Date: 04/10/20 Time: 11:41
Sample: 2 435
Number of observations: 434
Failure to improve objective (non-zero gradients) after 188 iterations
Table 17

The models used to select criteria

Model Selection Criteria Table
Dependent variable: D(CLOSE)
Date: 04/10/20 Time: 11:41
Sample: 1 435
Number of observations: 434
Fig. 8

Akaike information criteria (top 20 models)

Table 18

ARIMA forecasting summary

Automatic ARIMA Forecasting
Selected dependent variable: D(CLOSE)
Date: 04/10/20 Time: 11:41
Sample: 1 435
Number of observations: 434
As can be seen, the best selected ARIMA model is ARIMA(4, 1, 3) with an AIC value of − 2.695. The same process is carried out over all the indices, and the results are represented in "Appendix B".

Comparing results

In this part, some similar studies are reviewed and the obtained results are compared with them, as illustrated in Table 19.
Table 19

Comparative study

| Author and date | Proposed approach | Data type | MSE | MAE | R² |
|---|---|---|---|---|---|
| Gogna and Tayal (2013) | GA-ANN | Train | 0.0074 | 0.0584 | 0.9866 |
| | | Test | 0.0079 | 0.0585 | 0.9895 |
| | PSO-ANN | Train | 0.0013 | 0.0253 | 0.9972 |
| | | Test | 0.0014 | 0.0260 | 0.9969 |
| | ICS-ANN | Train | 0.0076 | 0.0720 | 0.9966 |
| | | Test | 0.0068 | 0.0694 | 0.9995 |
| Sedighi et al. (2019) | ARIMA-SVM | Final outcome | 1.0042 | 0.0142 | – |
| | SVM-RF | Final outcome | 0.000295 | 0.0245 | – |
| | ANFIS-SVM | Final outcome | 3.5849 | 0.0117 | – |
| | FA-MSVR | Final outcome | 0.0014 | 0.0130 | – |
| Safa and Panahian (2018) | HS-ANN | Final outcome | 0.02776 | 0.05177 | 0.9641 |
| Emamverdi et al. (2016) | ANN | Final outcome | 0.00030 | 0.0174 | – |
| | ARIMA | Final outcome | 0.00042 | 0.0162 | – |
| Zheng et al. (2013) | Wavelet neural networks | Final outcome | 0.00510 | 6.742E−04 | 0.9877 |
| Dong et al. (2013) | One-step and multi-step ahead predictions | Final outcome | 0.0043 | 0.1043 | 0.9012 |
| Wang et al. (2016) | Delayed neural network (DNN) | Final outcome | 1.60E−03 | 1.00E−07 | 0.9955 |
| Sin and Wang (2017) | Ensembles of neural networks | Final outcome | 2.05E−05 | 2.045E−09 | 0.9963 |
| Current research | ANN | Train | 12.1827 | – | 0.9975 |
| | | Test | 13.499 | – | – |
| | GA-ANN | Train | 10.8316 | – | 0.9988 |
| | | Test | 19.7717 | – | – |
| | BA | Final outcome | 1.0E−40 | – | 0.9993 |
| | SSO | Final outcome | 1.0E−52 | – | 0.999 |
| | ARIMA | Final outcome | 0.0712846 | – | 0.6028 |
It can be seen that the lowest loss-function values and the highest R-squared are obtained using the social spider optimization (SSO) and bat algorithm (BA); these algorithms performed well.
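The loss functions compared in Table 19 can be computed as follows; the two short vectors are illustrative, not results from the paper:

```python
import numpy as np

def report(y_true, y_pred):
    """MSE, MAE and R-squared as used in the comparative table."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)                                        # mean square error
    mae = np.mean(np.abs(err))                                     # mean absolute error
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mse, mae, r2

mse, mae, r2 = report([3.0, 2.0, 4.0, 5.0], [2.8, 2.1, 4.2, 4.9])
print(round(mse, 4), round(mae, 4), round(r2, 4))  # 0.025 0.15 0.98
```

Note that MSE is scale-dependent, which is why raw index-level MSEs (e.g. 12.18 for the plain ANN) are not directly comparable with studies that report errors on normalized data.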

Conclusions

Today, the speed of decision making has increased, and the stock market experiences many fluctuations and volatilities. Different factors toughen the severity of these fluctuations, among them major economic, political and social changes. On the other hand, with the coronavirus outbreak in late 2019, great fluctuations are expected in stock markets. Thus, using improved and well-equipped methodologies to confront these fluctuations is a necessity. One of the main tools that can help investors is artificial intelligence (AI). AI has many applications, such as pattern recognition, regression and classification. In the current study, the application of a plain ANN in forecasting stock prices is compared with a hybrid metaheuristic-based ANN. To forecast stock prices, a data set is employed to train and test an ANN. Then, a hybrid ANN is developed: in the proposed hybrid, genetic algorithm is used for feature selection, and the bat algorithm and social spider optimization are used separately for ANN parameter optimization. In this paper, five main and important indices, including the DJI and DAX, are forecasted using an ANN, with 20 main technical indicators as input variables. Many methods can be used to optimize the network; one family is evolutionary algorithms, and we used GA as an evolutionary algorithm for feature selection. Using GA, the number of input variables was reduced significantly; thus, the speed of calculations, the accuracy of the network and the coefficient of determination increased. Also, two new metaheuristic algorithms, the social spider algorithm and the bat algorithm, were used to improve the results. The main advantages of using metaheuristic algorithms are: speeding up calculations, reducing model complexity, increasing network accuracy, ease of use, high robustness and intelligence.
On the other hand, they have some limitations: in GA, there is no guarantee that the best and most related technical indicators have been selected, and although we have tried to overcome the local-optima trap, it remains possible. Compared with previous methods, SSO and BA had the lowest errors, respectively, and could predict stock prices better. Although the error of the social spider algorithm is lower, this does not mean that this algorithm is better: because of differences in computation time, computational complexity, required parameters, etc., we cannot say with certainty which one is superior. But if we consider error as the measure of superiority, the social spider algorithm performed better. We also used a time series model, ARIMA, for the prediction of stock prices. Because of the nonlinearity and asymmetry of stock price data, the ANN could predict stock prices better than the time series model (ARIMA). Experiments show that hybrid models explain the data better with lower error. Therefore, the main recommendation is that different new metaheuristic algorithms should be used to train the network.
Table 20

T.V.T details (DAX)

| DAX | Training | Validation |
|---|---|---|
| Absolute error | 0.0006 | 0.0036 |
| Network error | 6.34E−09 | 0 |
| Error improvement | 9.71E−14 | |
| Iteration | 401 | |
| Training speed (iter/sec) | 6.937 | |
| Architecture | [20-10-1] | |
| TA | LM | |
| TSR | NEI | |
Table 21

T.V.T details (FTSE100)

| FTSE100 | Training | Validation |
|---|---|---|
| Absolute error | 0.3548 | 3.5899 |
| Network error | 9.19E−08 | 0 |
| Error improvement | 5.31E−11 | |
| Iteration | 501 | |
| Training speed (iter/sec) | 7.07 | |
| Architecture | [20-10-1] | |
| TA | LM | |
| TSR | All iterations done | |
Table 22

T.V.T details (NDAQ)

| NDAQ | Training | Validation |
|---|---|---|
| Absolute error | 0.1354 | 0.261 |
| Network error | 1.50E−05 | 0 |
| Error improvement | 1.69E−21 | |
| Iteration | 36 | |
| Training speed (iter/sec) | 6.428 | |
| Architecture | [20-10-1] | |
| TA | LM | |
| TSR | No error improvement | |
Table 23

T.V.T details (S&P500)

| S&P500 | Training | Validation |
|---|---|---|
| Absolute error | 0.3538 | 3.09 |
| Network error | 1.50E−07 | 0 |
| Error improvement | 6.69E−20 | |
| Iteration | 327 | |
| Training speed (iter/sec) | 7.047 | |
| Architecture | [20-10-1] | |
| TA | LM | |
| TSR | No error improvement | |
Table 24

Selection of most important technical indicators using GA

| Symbol | Technical indicators | Selected using GA | Selected (removed) |
|---|---|---|---|
| DJI | Open, High, Low, Close, Vol, SMA(5), SMA(10), EMA(5), ADL, CMF, MFI, RSI, Upper Band, Lower Band, MP, ROC, TP, DX, CCI, ATR | Open, High, Low, RS, Upper Band, SMA(5), SMA(10), ROC, Vol, TP | 10 (10) |
| DAX | Open, High, Low, Close, Vol, SMA(5), SMA(10), EMA(5), ADL, CMF, MFI, RSI, Upper Band, Lower Band, MP, ROC, TP, DX, CCI, ATR | Low, MP, SMA(5), EMA(5), TP, ROC, SMA(10), %R, ADL, RSI, MFI | 11 (9) |
| FTSE100 | Open, High, Low, Close, Vol, SMA(5), SMA(10), EMA(5), ADL, CMF, MFI, RSI, Upper Band, Lower Band, MP, ROC, TP, DX, CCI, ATR | High, ROC, %R, EMA(5), SMA(5), SMA(10), RS, RSI | 8 (12) |
| NDAQ | Open, High, Low, Close, Vol, SMA(5), SMA(10), EMA(5), ADL, CMF, MFI, RSI, Upper Band, Lower Band, MP, ROC, TP, DX, CCI, ATR | Low, SMA(5), EMA(5), SMA(10), RS, RSI, ROC, TP, MP, ADL, Vol, CCI | 12 (8) |
| S&P500 | Open, High, Low, Close, Vol, SMA(5), SMA(10), EMA(5), ADL, CMF, MFI, RSI, Upper Band, Lower Band, MP, ROC, TP, DX, CCI, ATR | Open, Upper Band, Lower Band, RS, SMA(5), SMA(10), EMA(5), TP, RSI, High | 10 (10) |
Table 25

Correlogram of closing price (DAX)

Table 26

Unit root test without differencing (DAX)

H0: CLOSE has a unit root
Exogenous: Constant
LLgth: 0 (ABSIC, maxlag = 17)
Table 27

Correlogram of closing price after differencing (DAX)

Table 28

ADF test after differencing

H0: D(CLOSE) has a unit root
Exogenous: Constant
LLgth: 0 (ABSIC, maxlag = 17)
Table 29

ARIMA forecasting

Dependent variable: D(CLOSE)
Method: BFGS
Date: 04/10/20 Time: 12:11
Sample: 2 436
Number of observations: 435
Convergence achieved after 260 iterations
Table 30

ARIMA forecasting summary

Automatic ARIMA Forecasting
Selected dependent variable: D(CLOSE)
Date: 04/10/20 Time: 12:11
Sample: 1 436
Number of observations: 435
Forecast length: 0
Table 31

ARIMA forecasting (FTSE100)

Dependent variable: CLOSE
Method: BFGS
Date: 04/10/20 Time: 12:29
Sample: 1 443
Number of observations: 443
Convergence achieved after 195 iterations
Table 32

ARIMA forecasting summary

Automatic ARIMA Forecasting
Selected dependent variable: CLOSE
Date: 04/10/20 Time: 12:29
Sample: 1 443
Number of observations: 443
Forecast length: 0
Table 33

ARIMA forecasting (NDAQ)

Dependent variable: D(CLOSE)
Method: ARMA maximum likelihood (BFGS)
Date: 04/10/20 Time: 12:38
Sample: 2 437
Number of observations: 436
Convergence achieved after 33 iterations
Table 34

ARIMA forecasting summary

Automatic ARIMA Forecasting
Selected dependent variable: D(CLOSE)
Date: 04/10/20 Time: 12:38
Sample: 1 437
Number of observations: 436
Forecast length: 0
Table 35

ARIMA forecasting (S&P500)

Dependent variable: D(CLOSE)
Method: LSqr
Date: 04/10/20 Time: 12:54
Sample (adjusted): 2 438
Number of observations: 437 after adjustments
Table 36

ARIMA forecasting summary

Automatic ARIMA Forecasting
Selected dependent variable: D(CLOSE)
Date: 04/10/20 Time: 12:54
Sample: 1 438
Number of observations: 437
Forecast length: 0