
A Bayesian-based classification framework for financial time series trend prediction.

Arsalan Dezhkam, Mohammad Taghi Manzuri, Ahmad Aghapour, Afshin Karimi, Ali Rabiee, Shervin Manzuri Shalmani.

Abstract

Financial time series have been studied extensively over the past decades; however, the advent of machine learning and deep neural networks has opened new horizons for applying supercomputing techniques to extract more insight from the underlying patterns of price data. This paper presents a tri-state labeling approach that classifies the underlying patterns in price data into up, down, and no-action classes. The introduction of a no-action state in our novel approach alleviates the burden of denoising the dataset as a preprocessing task. The performance of the labeling algorithm is evaluated using machine learning and deep learning models. The framework is augmented with Bayesian optimization to select the best tuning values of the hyperparameters. The price trend prediction module generates the required trading signals. The results show an average annualized Sharpe ratio of about 2.823 as the trading performance metric, indicating that the framework produces excellent cumulative returns.


Keywords:  Classification; Deep learning; Feature engineering; Financial time series; Machine learning; Trend prediction

Year:  2022        PMID: 36196451      PMCID: PMC9521884          DOI: 10.1007/s11227-022-04834-4

Source DB:  PubMed          Journal:  J Supercomput        ISSN: 0920-8542            Impact factor:   2.557


Introduction

Although the outbreak of the COVID-19 pandemic triggered a sharp decline in stock prices across financial markets, a closer look at recent figures for market indices and stock prices shows trends that are accelerating strongly. Financial market data take the form of time series and have been studied by researchers for decades, with the main objective of gaining more insight into the underlying market trends. The more insight extracted from market behavior, the better the asset pricing that is likely to be achieved; this is precisely the most significant aspect of the portfolio formation part of the investment process. However, according to the efficient market hypothesis (EMH), it is impossible to forecast prices for future time intervals, because information propagates across markets so rapidly that prices are updated almost immediately. On the other hand, many studies have shown that financial markets are in fact a mixture of efficient and non-efficient markets; therefore, stock prices are, to some extent, predictable [31]. Prediction is the process of finding the next plausible outcome based on past experiences and observations. Hence, feeding the most relevant observations to the prediction model is crucial for improving the accuracy of the desired output; this process is called feature engineering. When it comes to financial time series, substantial considerations apply. For image, text, and speech observations, the input signal carries almost all the information required to model the prediction process. Asset pricing, by contrast, is a complex problem influenced by multiple endogenous and exogenous factors, including but not limited to systematic risk, market behavior, interdependence between markets, macroeconomic variables, firm-specific information, investors’ sentiment, and news. 
The authors of [4] proposed a comprehensive taxonomy of the input features prevalent among financial market researchers. Based on their literature review, they showed that technical indicators have the higher prediction power, while informative signals from social media can boost model performance. Hence, the feature selection and engineering process is a stepping stone toward building an overarching portfolio formation and optimization model. Researchers have tackled feature extraction and engineering using techniques from the time–frequency domain, statistical methodologies, traditional machine learning approaches, and, recently, deep learning frameworks. For example, the autoregressive integrated moving average (ARIMA) model has been used extensively in financial time series analysis. ARIMA is a linear nonstationary model based on the autoregressive moving average (ARMA) model, with an added difference operator that converts nonstationary series to stationary ones and takes volatility clustering into account [19, 47]. Both ARMA and ARIMA belong to the univariate class of statistical analysis approaches, since the only input variable is the time series itself. Other statistical methods in the same category include the generalized autoregressive conditional heteroscedastic (GARCH) model and the smooth transition autoregressive (STAR) model. The researchers in [47] also mention a second class of multivariate statistical methodologies, including linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), linear regression (LR), and support vector machines (SVM). Among time–frequency techniques, the discrete wavelet transform (DWT) has been broadly exploited for feature extraction from financial time series. The authors of [45] applied wavelet decomposition to crude oil time series, turning the series into different forecasting horizons. 
Applying the DWT to average monthly crude oil prices, they framed their procedure to partition the whole signal into low- and high-frequency parts: the coarse scales follow the main trends, while at the finer scales seasonal fluctuations, singular events, and noise appear. Given the power of the wavelet transform for extracting features from various types of data, researchers have shown particular interest in applying the DWT as a preprocessing stage on financial time series, combining it with other frameworks such as quantile regression, neural networks, and other applicable methods [1, 10, 12, 20, 23, 29, 42, 43]. Dimensionality reduction for extracting abstract, high-level features to feed the subsequent modules of prediction models has been studied in many works, such as [39, 46]. In [46], Zhang et al. applied principal component analysis (PCA) to perform dimensionality reduction and extract abstract, high-level features for the next module of their framework, an LSTM predicting the next trading day's close price. They took the first four principal components by cumulative contribution rate of the Shanghai Composite Index as the training sample data fed into the LSTM. The authors of [7] designed a framework to extract features from 24 randomly selected stocks in the SSE 50 index (Shanghai Stock Exchange) using a hybrid method based on XGBoost and IFA; the generated features are then used in a mean–variance model for portfolio formation. Around the second decade of the twenty-first century, the winter of neural network applications across the science and technology realm turned to spring. Accordingly, deep learning models found their way into financial market analysis to provide better solutions for complex problems such as asset pricing, stock price prediction, contagion between financial markets, and spillovers. 
A noteworthy application of deep learning models is feature extraction and engineering, thanks to their multi-layer cascades of non-linear units, which enable them to capture non-linear dependencies and underlying trends in the data. Although most of the early works on financial market analysis are based on long short-term memory (LSTM) networks, there has also been a rise in applying other architectures such as RL units, Q-learning, ensemble learning, transformer networks, and, recently, generative adversarial networks (GAN) [5, 13, 22, 23]. The authors of [25] investigated the contribution of additional information from the US stock market to South Korean stock prediction. They exploited a multimodal deep learning framework to capture cross-modal correlation at different levels and showed that deep multimodal networks can leverage the complementarity of stock data to provide more robust predictions. The authors of [27] proposed a two-phase solution for the structural break problem in stock markets using deep reinforcement learning and a continuous wavelet CNN. In the first phase, to estimate the occurrence probability of a structural break, they combined time-domain and frequency-domain features extracted by an LSTM and a continuous wavelet CNN, respectively; in the second phase, the pairs trading strategy is optimized using deep Q-learning. The authors of [40] applied a preprocessing stage based on a genetic algorithm (GA) to the training and test datasets and then trained a back-propagation neural network to predict the closing price of the Shanghai and Shenzhen 300 index for the next 100 trading days. To enable neural networks to receive input vectors sequentially, recurrent neural networks (RNN) emerged from traditional feedforward networks. However, these models have severe problems with long input sequences, since they can only handle a few steps back. 
Hence, a developed variant of the RNN named LSTM was introduced to tackle these problems by adding three gates to the original RNN architecture: (1) a forget gate controlling what information needs to be discarded from the LSTM memory; (2) an input gate indicating whether new information will be added to the memory; and (3) an output gate controlling the output state. To introduce a threshold-based portfolio, the authors of [24] built three architectures, S-RNN, LSTM, and GRU, to forecast one-month-ahead stock returns and then used the last business day's OHLCV of each month to build portfolios. Tian et al. [36] proposed a hybrid deep learning model based on multilayer bidirectional LSTM networks to solve the stock price prediction problem. They first analyzed the attributes of 10 different stocks using the Pearson correlation coefficient and then applied the LSTM model to forecast the attributes retained after the analysis. There have been various attempts to tailor the structure of deep learning networks to the characteristics of the observations across different contexts. For example, the authors of [9] proposed a model that places another attention mechanism over document-level attention; this so-called attention-over-attention reader model was exploited to solve the cloze-style reading comprehension task. The authors of [14] crafted a CNN-bLSTM deep learning model to improve the performance of conversational speech recognition. Although increasing the number of layers in multi-layer deep models enhances the learning ability of the network, it exposes the model to problems such as exploding and vanishing gradients. To tackle these problems, researchers have proposed a handful of techniques such as dropout, batch normalization, and residual connections [2, 15, 16, 18, 35]. There have also been attempts to extend sentiment analysis techniques and apply the results to price prediction models to enhance task performance. 
In a recent study [38], the authors augmented a Bidirectional Encoder Representations from Transformers (BERT) model with a CNN structure to capture important local information in financial texts. Inspired by word vectorization techniques in natural language processing, the authors of [30] introduced stock vectors and proposed two LSTM architectures for dimension reduction and price prediction, one with an embedded layer and the other based on an automatic encoder. Their experiments on the Shanghai A-shares composite index showed that the deep LSTM with the embedded layer performs 0.3% better in terms of accuracy. In recent decades, much research has treated price prediction as a regression task. However, researchers have shown that trend prediction, framed as a classification task, can dramatically improve machine learning and deep learning model predictions [32, 37, 44]. The labeling of financial time series has a significant impact on prediction model performance, though the problem has not been widely studied in the literature. Recently, Wu et al. [41] proposed a price data labeling method that extracts the continuous trend features of financial time series data and groups them into two classes, upward and downward. However, the market sometimes exhibits unpredictable fluctuations, during which investors risk losing their money. Thus, the labeling algorithm should have an extra state that flags these unpredictable and risky situations to keep investors from committing their money during such periods. Hence, proposing a price data labeling algorithm that helps produce a more informative input feature vector for the market trend prediction task is of great significance. Accordingly, in this study, we introduce a novel tri-state labeling algorithm that significantly improves the quality of predictions by introducing three states: upward and downward trends, plus a no-action state for risky situations. 
The rest of the paper is organized as follows: In Sect. 2, we cover the methodology of our research work. It contains nine sub-sections, starting from Sect. 2.1, which is the complete description of our proposed tri-state labeling algorithm. Since our experimentation is conducted on financial time series, in Sect. 2.2 we thoroughly cover the reasons for using embargoed purging cross-validation instead of the traditional K-fold CV. We then include the required background on the machine learning and deep networks we use as our predictive machines: support vector machines in Sect. 2.3, followed by XGBoost in Sect. 2.4. The basics of LSTM and GRU are discussed in Sects. 2.5 and 2.6. Section 2.7 refreshes the reader on our approach to the performance evaluation of the classification task. Hyperparameter tuning with Bayesian optimization is discussed in Sect. 2.8, and Sect. 2.9 walks through all the steps taken to design and evaluate our trading system. All information about our experimentation, the associated results, and a discussion of the findings can be found in Sect. 3. Finally, Sect. 4 concludes with the research findings and possibilities for future work.

Methodology

Proposed labeling algorithm

The first part of our proposed framework is a labeling algorithm that extracts continuous upward and downward trends from the daily close price time series. Our input is the close price series denoted by C = {c_1, c_2, …, c_T}, where c_t is the close price at time t. The algorithm compares price changes against a threshold value α, which is a hyperparameter. According to Eq. (1), if the relative price change satisfies the threshold condition, the trend is labeled as upward; otherwise, the direction of the change is downward. Once the overall direction of the price changes is found, in the second phase the algorithm deals with directional changes as follows. Suppose the algorithm reads c_t, the price at time t, and the direction at time t has been labeled as upward. The algorithm must decide the exact time of the directional change while keeping the upward trend as long as the coming prices remain at higher levels. Therefore, the following three cases are distinguished. Case 1: the coming price is at a higher level than the price at time t, so we keep labeling the trend with +1, upward. Case 2: over a window of size w, the prices show no fluctuations much larger or smaller than the last updated price of the upward trend; w is the window size for such flat periods and is also a hyperparameter that needs to be determined. This means the upward trend has ended, and we can label all the coming time instances with 0, a “no-action” trend. Case 3: the upward trend ends and we should change the direction to downward; while we label all past instances with +1, the state space is prepared to follow the downward direction. On the other hand, if the state space shows the no-action label (denoted by 0), the algorithm handles the corresponding three cases as follows. Case 1: the trend shows the start of an upward direction. 
Case 2: the current price at time t is lower than the price at the last no-action state by a factor of α. Hence, the algorithm detects a change from the no-action state to falling prices, labels the past time instances with 0, and prepares the state space for following the coming downward trend. Case 3: the price series is still fluctuating, so we continue to update the no-action trend with label 0. The third and final state of our proposed tri-state labeling algorithm is −1, the “downward” state, for which three cases exist: (1) remaining in the current state, (2) changing to an upward direction, or (3) putting the system into the no-action state. Case 1: the current time instance continues a previous downward trend, so we only update the current downward trend. Case 2: relative to the latest price of the downward trend, the prices are still fluctuating, which means the bearish trend has ended; we are ready to enter a no-action trend once the labeling of the past trend with −1 is done. Case 3: relative to the last time instance labeled with the downward direction, the direction has changed to upward. We must therefore set the state space for the start of a new upward direction, while the past instances are labeled with −1 as a downward trend.
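To make the state machine above concrete, the following is a simplified Python sketch of a tri-state labeler. It is not the authors' exact algorithm (the original case conditions involve comparisons against the last pivot price and the window w); here, `alpha` and `window` are stand-ins for the threshold and window-size hyperparameters, and a move beyond the ±alpha band around the current pivot switches the trend state, while a prolonged stay inside the band yields the no-action label.

```python
def tri_state_labels(prices, alpha=0.03, window=5):
    """Assign +1 (up), -1 (down), or 0 (no-action) to each close price.

    Simplified sketch: a relative move of more than `alpha` from the
    current pivot price starts an up/down trend; if the price stays
    inside the +/- alpha band for `window` steps, the state falls
    back to no-action.
    """
    n = len(prices)
    labels = [0] * n
    pivot = prices[0]   # reference price of the current segment
    state = 0           # current trend state: +1, 0, or -1
    flat = 0            # consecutive steps inside the band
    for t in range(1, n):
        change = (prices[t] - pivot) / pivot
        if change > alpha:
            state, pivot, flat = 1, prices[t], 0
        elif change < -alpha:
            state, pivot, flat = -1, prices[t], 0
        else:
            flat += 1
            if flat >= window:   # prolonged sideways move -> no-action
                state, flat = 0, 0
        labels[t] = state
    return labels
```

For example, a steadily rising series yields +1 labels after the first breakout, a falling series yields −1, and a flat series stays at 0 throughout.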

Combinatorial purged K-fold cross-validation

For a machine learning algorithm to learn the general structure of the data properly and to prevent it from fitting the data too closely, we usually split the observations into training and test sets, and the cross-validation (CV) technique is used to prevent overfitting. Among popular CV methods, K-fold CV is the most widely used by machine learning researchers. However, for two reasons, this cross-validation method produces undesired results when applied to financial time series. First, financial time series do not possess the properties of an independent and identically distributed (IID) process: financial observations are serially correlated, meaning that the features at time t are highly correlated with the features at neighboring times. Consequently, predicting from overlapping data points results in a label at time t that is derived from overlapping features at nearby times. A second reason for CV's failure in finance is multiple testing and selection bias. The solution to the second problem is to purge all overlapping labeled samples from the training and test sets. For the serial correlation between financial features, the solution is to embargo those samples in the series that immediately follow a sample in the test set. This purging and embargoing cross-validation technique is known as purged K-fold CV [28]. As shown in Fig. 1a [28], within one partition of the K-fold cross-validation, two overlapping regions need to be purged to prevent data leakage between the training and test sets. As shown in Fig. 1b [28], the embargo is imposed on training samples directly after the test set to further prevent leakage between training and test observations.
Fig. 1

a Purging overlap in the training set; b Embargo of post-test train observations [28]

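The purging and embargo idea can be sketched in a few lines of Python. This is an illustrative index generator, not the reference implementation from [28]: `purge` drops training samples immediately before the test block (whose labels would overlap the test features), and `embargo` drops samples immediately after it.

```python
def purged_kfold_indices(n_samples, n_splits=5, purge=2, embargo=2):
    """Generate (train, test) index splits for serially correlated data.

    Sketch of the purging/embargo idea: samples within `purge` steps
    before the test block and within `embargo` steps after it are
    dropped from the training set to limit label/feature leakage.
    """
    fold = n_samples // n_splits
    for k in range(n_splits):
        t0 = k * fold
        t1 = (k + 1) * fold if k < n_splits - 1 else n_samples
        test = list(range(t0, t1))
        train = [i for i in range(n_samples)
                 if i < t0 - purge or i >= t1 + embargo]
        yield train, test
```

With 20 samples, 4 splits, purge 1 and embargo 2, the first fold's test block is indices 0–4 and its training set only starts at index 7, leaving an embargoed gap after the test observations.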

Support vector machines (SVM)

SVMs are a set of widely used supervised learning algorithms for classification, regression, and outlier detection that find an optimal separating hyperplane by margin maximization. The basic idea behind the SVM is to apply a non-linear transformation that maps the input vector into a high-dimensional feature space. Suppose the input features are x_i, i = 1, …, N, where N is the total number of data patterns, and the corresponding targets are y_i ∈ {−1, +1}. The SVM computes a decision function parameterized by a weight vector w and a bias b, and the objective is to maximize the margin of the separating hyperplane; class labels are then assigned by the sign of the decision function. The parameters w and b are estimated by solving a constrained minimization problem in which C is the penalty parameter and the ξ_i are slack variables. This optimization problem is solved by the Lagrangian (dual) method, in which a kernel function K(x_i, x_j) replaces the inner product of the mapped feature vectors. The solution of the dual problem determines the parameters w and b of the optimal hyperplane.
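As a minimal illustration of margin maximization with slack penalty C, the following numpy sketch trains a linear SVM by sub-gradient descent on the primal hinge-loss objective. It is a teaching sketch only; the kernelized dual formulation described above would instead be solved with a QP solver (e.g. SMO, as in libsvm-based libraries).

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimal linear SVM: sub-gradient descent on the primal objective
    0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in range(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                      # sample violates the margin
                w -= lr * (w - C * y[i] * X[i])
                b += lr * C * y[i]
            else:
                w -= lr * w                     # only the regularizer acts
    return w, b

def svm_predict(X, w, b):
    """Assign class labels by the sign of the decision function."""
    return np.sign(X @ w + b)
```

On a small linearly separable toy set, the learned hyperplane classifies all training points correctly.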

Extreme gradient boosting (XGBoost)

In the context of machine learning, a weak learner is a classification model that performs only marginally better than random guessing. The authors of [33] developed boosting in a successful attempt to answer the question “Can a set of weak learners create a single strong learner?” posed by [21]. The main idea behind most boosting algorithms is to iteratively apply a weak learner to the training data, assigning more weight to misclassified observations so that a new decision stump is found for them. Finally, all learned models are aggregated to form a strong learner able to classify the training samples correctly. Accordingly, a decision tree ensemble model with additive functions is used to predict the target. In Eq. (16), x_i is the m-dimensional input feature vector and y_i is the one-dimensional target, together forming the sample space of cardinality n. The space of classification and regression trees (CART) with T leaves per tree is denoted by F, where each tree corresponds to an independent tree structure q with leaf weights w. To classify an observation, the decision rules in the trees are applied and the predicted target is computed by summing the weights w of the corresponding leaves. Equation (17) shows the objective function used to learn the set of functions in the model. As can be seen from Eq. (17), the model is trained in an additive manner rather than with traditional optimization methods in Euclidean space. Hence, while adaptive boosting techniques reweight misclassified samples, in gradient boosting the base learners are generated sequentially so that the current model always improves on the previous one by reducing a loss function; the objective is therefore optimized by greedily adding the tree that most improves the model. 
XGBoost [6] is a highly enhanced version of gradient boosting that mainly aims at increasing computation speed and efficiency, since the plain gradient boosting algorithm analyzes the dataset sequentially and is therefore slow. Furthermore, XGBoost supports parallelization by creating decision trees in a parallel way, much like a random forest. It also exploits distributed computing methods to evaluate large and complex models, uses out-of-core computation to analyze large and varied datasets, and applies cache optimization to achieve better resource utilization. As an easy-to-use and interpretable prediction model, XGBoost has been widely adopted and has outperformed most modern, state-of-the-art deep learning methods for the classification and clustering of tabular data [34].
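The additive training idea behind Eq. (16)–(17) can be demonstrated with a toy gradient-boosting loop over decision stumps. This sketch uses squared loss, so each stump simply fits the current residuals (the negative gradient); real XGBoost additionally uses second-order gradient information, regularized leaf weights, and parallel split finding, none of which are shown here.

```python
import numpy as np

def fit_stump(X, residual):
    """Find the (feature, threshold) stump minimizing squared error
    against the current residuals."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            left = X[:, j] <= thr
            if left.all() or (~left).all():
                continue
            lv, rv = residual[left].mean(), residual[~left].mean()
            err = ((residual - np.where(left, lv, rv)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, thr, lv, rv)
    return best[1:]

def gradient_boost(X, y, n_rounds=20, lr=0.3):
    """Toy gradient boosting: each stump fits the negative gradient of
    squared loss (the residual) and joins the additive ensemble."""
    pred = np.full(len(y), y.mean())
    stumps = []
    for _ in range(n_rounds):
        j, thr, lv, rv = fit_stump(X, y - pred)
        pred += lr * np.where(X[:, j] <= thr, lv, rv)
        stumps.append((j, thr, lv, rv))
    return pred, stumps
```

On a simple step-shaped target, the ensemble's predictions converge to the true labels as rounds accumulate, which is the additive improvement the text describes.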

Long short-term memory (LSTM)

Learning in the human brain is an accumulative process that avoids restarting thinking and learning from scratch every second. This is in contrast to traditional neural networks, which cannot use previous events to make inferences about later ones. To overcome this shortcoming, recurrent neural networks (RNNs) have been widely adopted for applications involving time series and sequential data, such as price prediction, speech recognition, and image recognition. As shown in Fig. 2, the goal of the RNN is a model that takes the current observations as a vector x_t, feeds them into the neural network together with the knowledge from the previous stage carried by the hidden vector h_{t−1}, and predicts the next target y_t.
Fig. 2

RNN architecture

Vanishing gradients, exploding gradients, long-term dependencies, and unidirectionality are the major drawbacks of the RNN model. The long short-term memory (LSTM) model is one way to address these problems. The LSTM introduces a memory unit called the cell into the network. Researchers have adopted various architectures for the LSTM model, but the original design incorporates a forget gate, an input gate, and an output gate with peephole connections. The equations corresponding to the connections and gates in the LSTM architecture, Fig. 3, are expressed in Eq. (19), where f_t, i_t, and o_t are the forget gate, input gate, and output gate at time t, respectively. The W's and b's in Eq. (19) represent the corresponding weight matrices and bias terms, σ indicates the sigmoid activation function, tanh is the hyperbolic tangent function, and ⊙ is the element-wise multiplication operator. The forget gate decides what information from the past cell state is propagated to the updating of the cell state at time t: if f_t = 1, the received information is kept, while a value of 0 for f_t means the information is discarded. The peephole weights for the respective forget, input, and output gates allow the LSTM cell to inspect its current internal states, enabling it to learn stable and precise timing in an unsupervised manner [11]. Since the LSTM is a special variant of the RNN model, the same weight-update process and hyperparameter optimization methods used for RNNs can be exploited within LSTM networks [3].
Fig. 3

LSTM architecture

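The gate equations can be written out as a single numpy forward step. This sketch omits the peephole connections mentioned above for brevity, and the packed weight layout (one matrix W holding all four gate blocks) is an implementation choice, not the paper's notation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM forward step (peephole connections omitted).

    W maps the concatenated [h_prev, x] to the four gate
    pre-activations; shapes: W is (4*H, H+D), b is (4*H,).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0:H])          # forget gate: what to keep from c_prev
    i = sigmoid(z[H:2*H])        # input gate: what new info to write
    o = sigmoid(z[2*H:3*H])      # output gate: what to expose as h
    g = np.tanh(z[3*H:4*H])      # candidate cell update
    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # new hidden state
    return h, c
```

Note that the hidden state is bounded: since o ∈ (0, 1) and tanh(c) ∈ (−1, 1), every component of h stays inside (−1, 1).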

Gated recurrent unit (GRU)

Although the LSTM cell has an outstanding learning capacity compared with the traditional RNN, its computational complexity is higher because of the model's extra parameters. To reduce the number of parameters, the authors of [8] introduced the GRU cell, which integrates the forget and input gates into a single update gate. Having only two gates, reset and update, the GRU removes one gating signal and its associated parameters compared with the LSTM cell. Since the GRU has fewer parameters to learn, it performs fewer tensor operations, which in turn slightly reduces computation time. Equation (20) gives the mathematical expression of the GRU cell, and the corresponding architecture is visualized in Fig. 4.
Fig. 4

GRU cell. Forget and input gates in LSTM are now integrated into the update gate in the GRU model

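For comparison with the LSTM, a numpy sketch of one GRU forward step follows. The parameter names Wz, Wr, Wh are illustrative; the key point is that the update gate z blends the previous hidden state with the candidate state, so no separate cell state is needed.

```python
import numpy as np

def gru_step(x, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU forward step: update gate z interpolates between the
    previous hidden state and the candidate state; reset gate r gates
    h_prev inside the candidate. Fewer parameters than an LSTM cell."""
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    hx = np.concatenate([h_prev, x])
    z = sig(Wz @ hx + bz)                                # update gate
    r = sig(Wr @ hx + br)                                # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x]) + bh)
    return (1 - z) * h_prev + z * h_tilde                # new hidden state
```

Because the output is a convex combination of h_prev and a tanh candidate, a hidden state started inside (−1, 1) remains inside (−1, 1).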

Classification task metrics

The AUC, or area under the receiver operating characteristic curve, is the most versatile and common evaluation metric for judging the quality of a binary classification model. It is simply the probability that a randomly chosen positive data point is ranked higher than a randomly chosen negative data point; a higher AUC therefore means a more sensitive, better-performing model. For multi-class classification problems, it is common to use the accuracy score and to examine the overall confusion matrix. Accuracy is the most straightforward way to evaluate the overall performance of the classification task. Since financial time series are almost equally weighted across all classes, all individual dataset elements have approximately the same weight and contribute equally to the accuracy value; therefore, the higher the accuracy, the higher the probability that a model prediction is correct. Assigning up, down, and no-action labels to market trends is a multi-class classification task in which upward trends are positive, downward trends are negative, and non-essential fluctuations are labeled 0. The performance of the labeling task can thus be assessed by computing the confusion matrix of the corresponding three classes. For each class, we count the observations correctly assigned to the class (true positives, TP), those correctly not assigned to the class (true negatives, TN), those incorrectly assigned to the class (false positives, FP), and those incorrectly not assigned to the class (false negatives, FN). Hence, the performance of the models for each class can be evaluated using the precision, recall, and F1-score metrics, while the overall performance of the classification task is assessed by accuracy. 
Precision is defined as the ratio between the TP area and the area of the ellipse in Fig. 5. Recall is the ratio between the TP area and the area of the left rectangle. Accuracy is the sum of the TP and TN areas divided by the whole area (the square). In the context of stock market prediction, we would like to capture as many up-trends as possible to maximize profit; at the same time, we want to be very sure about each prediction, meaning we need to maximize both precision and recall. However, decreasing the FP area comes at the cost of increasing the FN area, because higher precision typically means fewer positive calls and hence lower recall. A trade-off between recall and precision is obtained by taking their harmonic mean, known as the F1-score.
Fig. 5

Confusion matrix. Source: [28]

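The per-class metrics for the tri-state labels reduce to a few counting formulas, sketched below in plain Python (a one-vs-rest pass over the confusion counts; the function name is ours, not the paper's).

```python
def class_report(y_true, y_pred, classes=(-1, 0, 1)):
    """Per-class precision/recall/F1 plus overall accuracy for the
    tri-state labels, computed one-vs-rest from the confusion counts."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[c] = {"precision": prec, "recall": rec, "f1": f1}
    report["accuracy"] = sum(t == p for t, p in pairs) / len(pairs)
    return report
```

For instance, with true labels [1, 1, 0, −1] and predictions [1, 0, 0, −1], the up class has precision 1.0 but recall 0.5 (one up-trend was missed), and overall accuracy is 0.75.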

Hyperparameter optimization

For a learning machine to produce the desired output, it is common practice to optimize the parameters that control the learning process; these controlling measures are called hyperparameters. Hyperparameter optimization is a widespread technique that maximizes or minimizes an objective function with a performance or loss metric to find the tuple of parameter values that yields an optimal model. Many techniques exist for this task, including grid search, random search, and gradient-based optimization. For problems with expensive-to-evaluate functions, Bayesian optimization is the first choice, with the advantage that no functional form of the problem has to be assumed in advance. If f denotes the objective, Bayesian hyperparameter optimization yields the result of Eq. (22), where the candidate solution x belongs to the hyperparameter search space X, and the optimization direction is minimization when the goal is to minimize a loss function, or maximization toward higher values of a performance metric such as accuracy. Figure 6 illustrates the whole workflow of the prediction machine and trading simulation. The machine's input is a vector of various stock indices, which is processed by the proposed labeling algorithm to indicate the ups and downs of the market. The labeled stock price vector is fed into the four independent classification models discussed earlier. The models are trained separately and are used to predict the next close price trend from unseen test data. These models are optimized by Bayesian optimization in a probabilistic search space. To run Bayesian optimization, we assume that for our classifier model function f, the performance of the model for a specific combination of hyperparameters is known as prior information. 
Then, we form the posterior probability function and exploit it to improve the performance metric by estimating a better new combination of hyperparameters. This procedure continues until no further improvement in the performance metric is achieved; the best tuning values for maximum performance are then reported in the last stage, where they are saved for model preparation.
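The prior/posterior loop above can be sketched for a one-dimensional hyperparameter: a Gaussian-process surrogate with an RBF kernel supplies the posterior mean and variance, and expected improvement picks the next candidate. This is a minimal teaching sketch with assumed defaults (zero-mean GP, unit length scale, grid search over candidates); libraries such as scikit-optimize, Hyperopt, or Optuna implement production-grade versions of this loop.

```python
import math
import numpy as np

def rbf(a, b, ls=1.0):
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / ls) ** 2)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def bayes_opt(f, bounds=(0.0, 10.0), n_init=4, n_iter=15, seed=0):
    """1-D Bayesian minimization sketch: GP surrogate (RBF kernel,
    zero mean) + expected-improvement acquisition over a fixed grid
    of candidate hyperparameter values."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(*bounds, n_init)              # prior evaluations
    y = np.array([f(x) for x in X])
    grid = np.linspace(*bounds, 200)
    for _ in range(n_iter):
        K = rbf(X, X) + 1e-6 * np.eye(len(X))     # jitter for stability
        Ks = rbf(grid, X)
        mu = Ks @ np.linalg.solve(K, y)           # posterior mean
        var = 1.0 - np.einsum('ij,ij->i', Ks @ np.linalg.inv(K), Ks)
        sd = np.sqrt(np.clip(var, 1e-12, None))   # posterior std dev
        best = y.min()
        z = (best - mu) / sd
        ei = (best - mu) * np.vectorize(norm_cdf)(z) + sd * np.vectorize(norm_pdf)(z)
        x_next = grid[np.argmax(ei)]              # most promising candidate
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))               # evaluate and update posterior
    return X[np.argmin(y)], y.min()
```

Run on a simple quadratic loss surface, the loop concentrates its evaluations near the minimum while still exploring uncertain regions.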
Fig. 6

Trend prediction and trading framework pipeline


Trading system

Trading strategies in financial markets are vital for making profits while avoiding emotions and behavioral-finance biases. Traders should therefore decide when to sell an asset or security so as to be less likely to succumb to the disposition effect, which causes them to hold on to stocks that have lost value and to sell those that rise in value. Hence, traders need to watch carefully for trade signals, buy an asset, and then sell it at an opportune moment. Trade signals can be composed of complex indicators, including but not limited to technical signals, fundamental analysis, sentiment measures, macroeconomic indicators, and even inputs from other trading-signal systems. However, it is advisable to provide traders with a simple trading module using only a handful of inputs. The advantage of the architecture proposed in this paper is its ability to extract upward and downward trends from the market based on its behavior and to turn the buy or sell signal on. To evaluate the model's performance using the predicted labels for the test datasets, we conducted experiments on the markets studied in this research. An initial capital is assumed to be available at the start of the trading period. As soon as the system receives a buy signal, the whole capital is used to purchase as many shares as possible of the target asset. The system puts the remaining capital aside in the balance and waits for the next sell signal. When the system detects a downward market, it issues the sell signal; all shares are then sold, and the total proceeds of the trade are added to the available balance, forming the new capital. This process continues until the end of the study period. A sample run of this trading mechanism is shown in Fig. 7.
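The all-in/all-out mechanism described above can be sketched as follows. The signal encoding (+1 buy, −1 sell, 0 wait) follows the paper; the function name, price series, and initial capital are illustrative.

```python
def simulate(prices, signals, initial_capital=10_000.0):
    """All-in/all-out trading loop: +1 buys with the whole balance,
    -1 liquidates the position, 0 waits."""
    cash, shares = initial_capital, 0
    for price, sig in zip(prices, signals):
        if sig == 1 and shares == 0:
            shares = int(cash // price)       # as many whole shares as possible
            cash -= shares * price            # remainder stays in the balance
        elif sig == -1 and shares > 0:
            cash += shares * price            # proceeds form the new capital
            shares = 0
    return cash + shares * prices[-1]         # final equity at the last close

final_equity = simulate([10.0, 12.0, 11.0, 15.0, 14.0], [1, 0, 0, -1, 0])
```

In the sample run, the full balance buys 1,000 shares at 10, and the position is liquidated at 15, yielding a final equity of 15,000.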
Fig. 7

A snapshot of the trading system using predicted labels. Buy and sell positions are taken upon receiving the appropriate signal. The first ‘1’ indicates a buy position and the next ‘− 1’ is the sell position. ‘0’ labels are indicating the volatile market

The primary goal of every trading system is to maximize returns while considering risk and the time the capital spends invested in the market. To compare the performance of the system using the label series predicted by the ML/DL methods in this study, we use the rate of return per day (RoR) and the risk-return ratio (RRR) as metrics. The RoR indicates the daily profit obtained in the market; for capital $C_t$ at time $t$ and initial capital $C_0$, it is defined as

$$\mathrm{RoR}_t = \frac{C_t - C_0}{C_0} \times 100,$$

and the daily RoR is computed by taking into account all the days in between. To calculate the RRR, we need to compute the maximum drawdown. The maximum drawdown (MDD) is the maximum loss from a peak to a trough in a specific period before a new peak is attained, and indicates the downside risk over the period. Having defined RoR and MDD, it is straightforward to calculate the risk-return ratio:

$$\mathrm{RRR} = \frac{\mathrm{RoR}}{\mathrm{MDD}}.$$

The third metric is the Sharpe ratio, which measures the system's ability to make profitable trades while minimizing risk. The Sharpe ratio is defined as the ratio of the excess expected return to the standard deviation of the return,

$$\mathrm{SR} = \frac{\mathbb{E}[R] - R_f}{\sigma_R},$$

where $R_f$ is the risk-free rate, $\mathbb{E}[R]$ is the mean of the one-period simple returns of the asset over the study period, and $\sigma_R$ is the corresponding standard deviation.
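The RoR, MDD, RRR, and Sharpe-ratio definitions above can be computed from a daily equity curve as sketched below; the 252-trading-day annualization factor is a common convention and an assumption here, not stated in the paper.

```python
import numpy as np

def trading_metrics(equity, risk_free=0.0):
    """RoR (%), maximum drawdown (%), risk-return ratio, and annualized
    Sharpe ratio for a daily equity curve (list of portfolio values)."""
    equity = np.asarray(equity, dtype=float)
    ror = (equity[-1] - equity[0]) / equity[0] * 100
    peaks = np.maximum.accumulate(equity)          # running peak before each day
    mdd = np.max((peaks - equity) / peaks) * 100   # worst peak-to-trough loss
    rrr = ror / mdd if mdd > 0 else np.inf
    r = np.diff(equity) / equity[:-1]              # one-period simple returns
    sharpe = (r.mean() - risk_free) / r.std(ddof=1) * np.sqrt(252)
    return ror, mdd, rrr, sharpe
```

For example, an equity curve of 100 → 110 → 105 → 120 yields a 20% RoR and an MDD of about 4.5% (the 110-to-105 dip).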

Results and discussion

To test the proposed framework, we consider some stocks from the S&P 500, since this market is widely used in the computational-finance literature. We selected stocks based on their systematic risk (Beta) relative to the market's risk. Advanced Micro Devices, Inc. (AMD), Apple Inc. (AAPL), The Clorox Company (CLX), Macy's Inc. (M), Seagate Technology Holdings plc (STX), and Walmart Inc. (WMT) are the stocks selected for experimentation with our proposed framework. As can be seen from Table 1, Beta values range from 0.17 to 2.09; a Beta smaller than 1 indicates less risk than the market, while an asset with a Beta larger than 1 is considered riskier than the market. Daily close prices for the selected assets are obtained from Yahoo Finance. Trained models are back-tested on out-of-sample data covering roughly the last two years, from January 16, 2020, to November 23, 2021. For computational simplicity, we assume a risk-free rate of zero and zero transaction costs throughout our simulations.
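Beta, the selection criterion above, is the covariance of the stock's returns with the market's returns divided by the market's variance. A minimal sketch on hypothetical return series:

```python
import numpy as np

def beta(stock_returns, market_returns):
    # Systematic risk: cov(stock, market) / var(market) over daily returns.
    s, m = np.asarray(stock_returns), np.asarray(market_returns)
    return np.cov(s, m, ddof=1)[0, 1] / np.var(m, ddof=1)
```

By construction, a stock that moves exactly with the market has Beta 1, and one that moves twice as much has Beta 2.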
Table 1

Descriptive statistics for datasets; the statistics are computed for the log return series

| Stock | Min | Max | Mean | Std | Skewness | Kurtosis | Jarque–Bera test | Beta | Observations |
| CLX | −0.1777 | 0.1246 | 0.0005 | 0.015 | −0.3623 | 11.1595 | (40,634.0, 0.0) | 0.17 | 7809 |
| WMT | −0.1074 | 0.1107 | 0.0004 | 0.016 | 0.1196 | 5.1631 | (8677.52, 0.0) | 0.52 | 7808 |
| M | −0.2244 | 0.1921 | 0.0002 | 0.0274 | −0.0893 | 7.4364 | (17,299.2, 0.0) | 2.09 | 7515 |
| AAPL | −0.7312 | 0.2869 | 0.0008 | 0.028 | −2.2186 | 65.8689 | (1,416,089.97, 0.0) | 1.2 | 7808 |
| STX | −0.2807 | 0.2458 | 0.0006 | 0.0292 | −0.6957 | 11.4471 | (26,417.66, 0.0) | 1.06 | 4779 |
| AMD | −0.4769 | 0.4206 | 0.0005 | 0.039 | −0.3329 | 10.4935 | (35,916.69, 0.0) | 1.93 | 7808 |

The values in the parentheses in the Jarque–Bera column indicate Chi-Square value and p value, respectively

Descriptive statistics for the datasets are also listed in Table 1. The large Chi-Square values and zero p values of the Jarque–Bera test confirm that the null hypothesis that the series are normally distributed is rejected for all of them. The skewness values for most indices are negative, indicating that these markets are downward most of the time and experience negative returns. The series are skewed with high excess kurtosis, indicating the presence of high peaks and heavy tails. As a statistical measure, kurtosis shows the degree to which outliers are present in the underlying distribution and plays a crucial role in determining the risk associated with an asset in financial markets. High kurtosis values imply a high probability of extreme returns; hence, these assets are risky and their returns contain many outliers. To train the models, we first need to construct the required input features. As described in Sect. 2, we pass the adjusted daily close prices into the labeling algorithm to form the input features for the classification models. The result of applying our labeling algorithm to the stock price time series is shown in Fig. 8. As shown in Fig. 8a, c, the labeling algorithm follows the upward and downward trends in the AMD stock price while also annotating a wait signal, the '0' label, whenever the market fluctuates only slightly and it is therefore hard to determine the exact direction of the trend. The same scenario is plotted in Fig. 8b, d for CLX stock.
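The Jarque–Bera test and the moment statistics reported in Table 1 can be reproduced as follows; the heavy-tailed synthetic series here is a stand-in, not the actual stock data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
returns = rng.standard_t(df=3, size=5000) * 0.02    # heavy-tailed toy return series

jb_stat, p_value = stats.jarque_bera(returns)
skew = stats.skew(returns)
excess_kurt = stats.kurtosis(returns)               # 0 for a normal distribution
# Heavy tails drive the JB statistic up and the p value toward 0,
# rejecting the normality null hypothesis, as for the series in Table 1.
```

As in the table, a large Chi-Square statistic with p ≈ 0 rejects normality, and a large positive excess kurtosis signals frequent extreme returns.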
Fig. 8

Automatic labeling of AMD and CLX time series using the proposed algorithm. a, b Are price time series for AMD and CLX, respectively, while c, d show their continuous trend labeling. The vertical axis in (a, b) is in US dollars. In labeling diagrams, up-trends are shown by 1, and the down-trend is − 1, while 0 stands for no deterministic trend due to high volatility between two up and down situations. For both time series, the threshold value is set to 0.05

The Combinatorial Embargoed Purging cross-validation process applied to the dataset is shown in Fig. 9. The blue and red bars indicate the 8 training and 2 validation folds. As discussed in Sect. 2.2, when a validation set is placed before the training set, the extracted overlapping region is doubled. The labels generated by the proposed tri-state labeling algorithm are presented along the Target Label row.
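A minimal sketch of one embargoed-purged split follows; the fold counts (8 training, 2 validation) come from the text, while the embargo width and function name are illustrative assumptions.

```python
import numpy as np

def embargoed_split(n_samples, n_folds=10, n_val=2, embargo=5):
    """Drop ('purge') an embargo margin of training samples adjacent to the
    validation block so overlapping label windows cannot leak information."""
    folds = np.array_split(np.arange(n_samples), n_folds)
    val_idx = np.concatenate(folds[-n_val:])              # e.g. last 2 folds validate
    lo, hi = val_idx.min(), val_idx.max()
    train_idx = np.array([i for i in range(n_samples)
                          if not (lo - embargo <= i <= hi + embargo)])
    return train_idx, val_idx
```

With 100 samples, the last two folds (indices 80-99) validate, and the five samples immediately before them are purged from training.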
Fig. 9

Combinatorial embargoed purging K-fold CV. The blue and red bars indicate the training and validation sets, respectively. Target label is the labels extracted from price data using proposed framework. Green bars are + 1, Reds show − 1 and fluctuating periods are marked with yellow bars. Price data as input feature vector have been cross-validated with 8 training and 2 validation folds. Resource: research simulations

The hyperparameters selected for tuning the models, along with the ranges of their possible values, are given in Table 2. The recurrent neural networks used in this research, LSTM and GRU, have three neurons in the output layer because the problem under study is a multi-class classification with three classes. The dropout value, the number of units in the hidden layer, and the learning rate are the three hyperparameters optimized for the maximum accuracy score of the classification task. The configuration for the LSTM and GRU simulations consists of 20 trials of running the networks for 100 epochs; this configuration is repeated 25 times, and each time the parameter values of the best trial are collected. The objective function is set in the maximization direction to optimize the performance metrics, F1-score and accuracy. Extreme gradient boosting, XGBoost, has a long list of hyperparameters that could be considered for optimization. We have selected some of its most significant parameters, such as the number of boosting stages, n_estimators, the maximum depth of the individual estimators, max_depth, and the subsample ratio of columns when constructing each estimator, colsample_bytree. The L1 and L2 regularization terms on the weights appear in the list of hyperparameters as reg_alpha and reg_lambda, respectively. The support vector machine classification task is conducted using Python's scikit-learn package.
The major hyperparameters examined for SVC are the regularization parameter, C, the kernel type used in the algorithm, kernel, the degree of the polynomial kernel function, degree, and the kernel coefficient, gamma. The results show that the model with the radial basis function (RBF) kernel, a regularization parameter close to 1, and degree 3 produces the best results; we therefore keep these values for the subsequent trading-strategy experiments. To find the best hyperparameters, we apply Bayesian optimization, searching for the values that maximize the overall classification accuracy. Tables 3, 4, 5, 6, 7, 8, 9 and 10 present the results of the hyperparameter optimization procedure along with the corresponding performance metric values.
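The best SVC configuration reported above can be instantiated in scikit-learn as below. The RBF kernel and C close to 1 follow the paper; the StandardScaler, gamma="scale", and the toy data are illustrative additions.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# RBF kernel with C ~ 1, per the reported best configuration; scaling is a
# common practice for SVMs and an assumption here, not stated in the paper.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

X_toy = [[0.0], [1.0], [10.0], [11.0]]   # toy features, not the stock dataset
y_toy = [0, 0, 1, 1]
clf.fit(X_toy, y_toy)
```

Note that degree is ignored by the RBF kernel, so only C and gamma matter for the final model.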
Table 2

Selected hyperparameters for optimization

| Name | Description | Range |
| **LSTM and GRU** | | |
| units | Specifies the number of units in each dense layer | 24, 32, 48, 64 |
| Dropout | Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs | 0.2, 0.25, 0.3, 0.35, 0.4 |
| learning rate (l.r.) | How quickly the network updates its parameters | [0.001, 0.1] |
| **XGBoost** | | |
| objective | Learning objective function | multi:softprob, multi:softmax |
| n_estimators | The number of trees in the ensemble, equivalent to the number of boosting rounds. Must be an integer greater than 0. Default is 100 | 50–1000 |
| max_depth | The maximum depth per tree. A deeper tree might increase performance, but also complexity and the chance of overfitting. Must be an integer greater than 0. Default is 6 | 3–18 |
| sampling_method | Used only by the gpu_hist tree method | 'uniform', 'gradient_based' |
| colsample_bytree | The fraction of columns randomly sampled for each tree. Might mitigate overfitting. Must be between 0 and 1. Default is 1 | 0.1–0.99 |
| learning_rate | The step size at each iteration while the model optimizes toward its objective | 0.01–0.2 |
| reg_alpha | L1 regularization on the weights (Lasso regression). With a large number of features, it might improve speed. Default is 0 | 0.00001–0.01 |
| reg_lambda | L2 regularization on the weights (Ridge regression). Might help reduce overfitting. Default is 1 | 0.00001–0.01 |
| gamma | A pseudo-regularization parameter (Lagrangian multiplier) that depends on the other parameters. The higher gamma, the stronger the regularization. Default is 0 | 0.1–0.99 |
| **SVM** | | |
| kernel | Specifies the kernel type used in the algorithm. If none is given, 'rbf' is used | poly, rbf, sigmoid |
| C | Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty | [0.01, 1] |
| gamma | Kernel coefficient for 'rbf', 'poly', and 'sigmoid' | 1, 2, 3, 4, 5, 6, 7, 8, 9 |
| degree | Degree of the polynomial kernel function ('poly'). Ignored by all other kernels | 2, 3, 4 |
Table 3

Hyperparameter optimization of recurrent neural networks for AMD stock

LSTM | GRU
Accuracy | F1-Score | Dropout | Units | l.r. | Accuracy | F1-Score | Dropout | Units | l.r.
0.912090.908320.2240.00130.885710.878820.2480.00102
0.90330.897780.4640.001650.912090.904910.35320.00923
0.90330.904120.3480.004210.883520.873580.4640.00503
0.887910.885880.3240.001460.912090.904080.2640.00274
0.90330.897480.35640.001680.905490.901760.2640.00813
0.894510.890120.4640.033550.907690.899190.25480.00127
0.89890.890130.25640.001080.90330.890120.4480.00242
0.870330.87090.3320.002520.894510.876880.3240.00343
0.881320.881020.25480.021730.894510.884330.35480.00101
0.894510.877680.4640.023680.828570.750890.35480.01142
0.90110.895750.3320.009790.89670.899960.25320.00372
0.905490.899880.2480.001170.828570.750890.35480.00211
0.909890.902690.2480.025470.905490.899810.35480.00935
0.892310.881450.25320.01820.894510.87850.3240.00259
0.885710.875470.2320.02460.828570.750890.2640.00112
0.90110.899990.4480.001310.914290.909190.2320.00339
0.887910.872280.3320.007760.876920.854660.35640.00402
0.89670.880180.35480.001940.890110.878520.2240.00425
0.868130.848780.3240.020530.90330.898820.2240.0072
0.890110.880680.2640.001060.90330.897930.25240.00611
0.887910.882890.4240.001660.916480.912180.2240.00183
0.907690.899610.3480.001320.892310.879610.35640.00233
0.907690.903290.4480.023140.914290.910660.35320.00394
0.892310.886070.3320.031950.892310.87950.25640.01113
0.868130.870350.3240.030190.894510.886950.25320.00204

Best tuning values and the respective performance metrics are shown in bold

Table 4

Hyperparameter optimization of recurrent neural networks for CLX stock

LSTM | GRU
Accuracy | F1-Score | Dropout | Units | l.r. | Accuracy | F1-Score | Dropout | Units | l.r.
0.854030.834380.25320.003050.910680.906670.2640.00107
0.912850.900240.2240.001890.901960.88780.4320.00264
0.904140.891210.25640.002690.875820.85680.4320.00169
0.901960.894770.25240.020770.90850.899660.2320.00167
0.891070.876560.25320.00290.910680.902650.2320.00216
0.840960.795210.4320.035330.904140.891040.2320.00171
0.895420.885910.25640.00210.90850.89750.25640.00103
0.902850.901760.25240.003350.886710.876220.25480.00232
0.880170.869760.2480.021790.891070.888240.25240.00234
0.90850.900660.2480.01290.814810.731670.3480.0058
0.901960.891810.2640.002360.930280.92340.2240.00191
0.899780.885490.2320.027230.906320.896530.35320.00499
0.904140.893920.4320.00260.814810.731670.4640.00142
0.884530.875490.3320.016270.893250.876440.35320.00179
0.89760.886620.2240.011530.884530.870570.3320.00222
0.901960.895270.25240.004060.910680.900660.25480.00128
0.858390.839640.25320.002450.904140.897260.2320.00153
0.915030.905150.25320.00270.904140.891010.25480.00117
0.906320.896130.3640.003580.845320.79250.2320.00182
0.856210.826030.2480.002230.901960.892810.2480.00202
0.873640.863680.3640.004370.90850.899560.2320.00118
0.875820.86110.2640.033910.921570.916210.2320.00148
0.910680.903290.4240.004070.899780.886280.4640.00109
0.895420.890530.3240.010180.893250.887210.25320.00436
0.858390.832850.2240.003660.915030.906940.25240.00354

Best tuning values and the respective performance metrics are shown in bold

Table 5

Hyperparameter optimization of recurrent neural networks for M stock

LSTM | GRU
Accuracy | F1-Score | Dropout | Units | l.r. | Accuracy | F1-Score | Dropout | Units | l.r.
0.868220.847630.3240.004670.860470.861830.35240.00199
0.839150.765760.2320.017530.875970.858210.3640.01056
0.84690.786490.35480.005970.839150.765760.35240.03173
0.84690.783380.4320.013010.84690.837810.25480.01737
0.872090.861860.4320.01620.813950.80180.25240.01998
0.84690.843410.3240.008530.850780.848060.25320.01251
0.850780.80740.35480.010660.850780.850660.2640.00283
0.782950.806980.35480.029370.875970.871950.2640.00781
0.858530.854150.4320.005210.864340.857960.25320.01924
0.872090.843430.35640.016070.833330.837910.2480.00374
0.854650.840370.35480.010520.860470.859820.35240.00381
0.835270.785570.2640.011320.870160.864870.35240.00274
0.864340.845910.4640.013850.839150.765760.35640.01673
0.839150.765760.2320.008110.84690.783380.35320.01054
0.850780.848770.3240.013770.854650.857220.2640.00661
0.827520.792960.35240.005330.852710.855490.25320.00704
0.839150.765760.4640.004760.874030.868770.2240.0026
0.850780.84620.4480.007040.839150.765760.25480.01857
0.839150.765760.2240.017120.833330.846910.3320.0135
0.839150.765760.35240.019630.858530.842220.2480.00768
0.852710.800620.3240.020790.848840.846990.2320.00239
0.875820.86110.2640.033910.914030.87020.2240.00636
0.910680.903290.2240.001930.84690.819670.4240.01267
0.895420.890530.3240.010180.839150.765760.25320.00389
0.858390.832850.2240.003660.864340.826190.3480.0185

Best tuning values and the respective performance metrics are shown in bold

Table 6

Hyperparameter optimization for SVM classification

SVC
| Accuracy | F1-Score | C | Kernel | Degree | gamma | Time (s) |
| 0.88791 | 0.86462 | 0.97 | rbf | 4 | 6.35122 | 34.08338 |
| 0.88791 | 0.86267 | 0.74 | rbf | 4 | 4.61141 | 34.47854 |
| 0.83516 | 0.76597 | 0.6 | rbf | 3 | 4.1091 | 34.66601 |
| 0.89011 | 0.86669 | 0.92 | rbf | 2 | 8.8002 | 33.99457 |
| 0.88791 | 0.86374 | 0.69 | rbf | 2 | 5.24261 | 33.50706 |
| 0.89011 | 0.86669 | 0.7 | rbf | 4 | 5.92944 | 33.40672 |
| 0.89011 | 0.86665 | 0.97 | rbf | 4 | 2.56692 | 35.21639 |
| 0.88791 | 0.86271 | 0.56 | rbf | 4 | 4.81323 | 34.29156 |
| 0.89451 | 0.87525 | 1 | rbf | 3 | 8.67797 | 32.86624 |
| 0.89011 | 0.86665 | 0.99 | rbf | 4 | 8.42303 | 33.2351 |
| 0.85714 | 0.80872 | 0.85 | rbf | 4 | 4.69465 | 34.28548 |
| 0.88571 | 0.86038 | 0.94 | rbf | 3 | 2.38 | 34.4737 |
| 0.84615 | 0.78804 | 1 | rbf | 4 | 7.12775 | 34.51734 |
| 0.89231 | 0.87146 | 0.95 | rbf | 4 | 8.49369 | 33.89479 |
| 0.88571 | 0.85976 | 0.92 | rbf | 2 | 5.8358 | 34.21499 |
| 0.89011 | 0.86665 | 0.88 | rbf | 2 | 6.29374 | 32.85993 |
| 0.89451 | 0.8742 | 0.96 | rbf | 3 | 6.2801 | 33.20176 |
| 0.81538 | 0.8109 | 0.9 | rbf | 4 | 3.55488 | 32.62426 |
| 0.88352 | 0.85675 | 0.69 | rbf | 3 | 3.60119 | 34.24339 |
| 0.88571 | 0.8619 | 0.92 | rbf | 3 | 6.67293 | 33.9439 |
| 0.86374 | 0.82297 | 0.89 | rbf | 4 | 6.7315 | 33.09472 |
| 0.86593 | 0.82634 | 0.88 | rbf | 3 | 7.93163 | 34.81558 |
| 0.84396 | 0.78431 | 0.92 | rbf | 3 | 8.93898 | 33.51085 |
| 0.88791 | 0.86271 | 1 | rbf | 4 | 8.9172 | 34.22566 |
| 0.89011 | 0.86669 | 0.59 | rbf | 4 | 3.71744 | 34.04539 |

The values are reported for AMD stock

Best tuning values and the respective performance metrics are shown in bold

Table 7

Hyperparameter optimization for XGBoost

XGBoost
Accuracy | F1-Score | max_depth | gamma | reg_alpha | reg_lambda | colsample_bytree | min_child_weight | sampling_method | n_estimators | objective | Time (s)
0.86370.8215181.1051410.30180.767810gradient_based696multi:softmax63.4289
0.86370.8289113.969400.64910.50952gradient_based781multi:softprob55.0986
0.87470.8458153.4775700.370.98257uniform355multi:softmax50.6152
0.88790.8697172.8832420.81750.896210uniform454multi: softmax51.8069
0.87030.8411121.4598870.96790.829910gradient_based318multi:softprob56.6415
0.87250.844271.2186410.01710.97977uniform967multi:softmax50.2201
0.87250.8439188.1591430.27040.70736gradient_based486multi:softmax65.4552
0.87030.8409143.0756420.19310.895410gradient_based502multi:softprob57.5294
0.88130.8576152.7026560.72770.77417uniform423multi:softprob55.8899
0.87910.8537111.0769400.5310.97129uniform667multi:softmax60.1367
0.88790.8683151.2257470.54040.65898gradient_based273multi:softmax59.3627
0.87470.8463181.2111430.2410.543610uniform450multi:softmax65.7168
0.88350.8603112.2105620.70190.9867gradient_based682multi:softprob42.5147
0.88570.8644141.0571400.4980.654710gradient_based683multi:softmax65.4253
0.87910.8536187.089400.98220.71513uniform990multi:softprob83.399
0.85930.818141.0921400.26230.62180uniform611multi:softmax66.2685
0.89450.876871.1738400.72860.59334uniform640multi:softmax53.2042
0.87910.852692.2144410.06740.676610gradient_based501multi:softmax58.5607
0.86150.8276141.3054580.26250.58552gradient_based613multi:softprob49.1764
0.87910.853741.9456590.70580.9764uniform351multi:softprob50.5104
0.87690.851163.8586540.45080.86983uniform478multi:softmax74.8684
0.87910.852971.7158400.90970.66424gradient_based786multi:softprob45.9424
0.87690.85144.0713410.00220.92736uniform574multi:softmax43.6041
0.87690.85185.3256410.68990.80381uniform948multi:softmax47.8001
0.87250.8429147.1177420.66820.80058gradient_based334multi:softmax63.6851

The values are reported for AMD stock

Best tuning values and the respective performance metrics are shown in bold

Table 8

Hyperparameter optimization for XGBoost

XGBoost
Accuracy | F1-Score | max_depth | gamma | reg_alpha | reg_lambda | colsample_bytree | min_child_weight | sampling_method | n_estimators | objective | Time (s)
0.8170.774165.931410.37270.70070gradient_based372multi:softmax24.0615
0.79740.707572.70591130.08450.95759uniform198multi:softprob28.5653
0.83660.812181.3742410.42620.98423gradient_based191multi:softprob37.0794
0.81050.7604186.8847400.67930.68177uniform305multi:softprob30.7957
0.82790.796291.0936400.51420.71013uniform299multi:softprob30.1246
0.83660.8141182.1214440.47730.8155uniform359multi:softmax33.8445
0.80610.733455.6337500.85660.74663uniform295multi:softmax22.1428
0.82140.7825104.2535420.03390.71932uniform328multi:softmax31.4881
0.88440.8699162.4258400.06210.95488uniform210multi:softprob27.5668
0.9020.890188.9265400.69580.94094gradient_based183multi:softmax32.7001
0.81480.7685104.2891500.04520.66344uniform344multi:softmax24.2424
0.83010.7982124.4569410.96360.98746uniform283multi:softprob31.7568
0.79740.7075137.0061410.40650.94214gradient_based298multi:softprob32.9238
0.82790.7947128.8421590.35820.56126gradient_based304multi:softmax30.7019
0.83660.8148112.5667400.37960.9762uniform266multi:softprob30.2273
0.81260.760164.0282440.65990.48231uniform399multi:softmax23.8804
0.80610.753971.6378400.44690.38060gradient_based386multi:softprob23.8486
0.82790.789541.8386530.74570.93781gradient_based249multi:softmax25.9997
0.86220.8535188.976400.00840.873610uniform265multi:softmax30.6596
0.89120.875133.3235400.16190.97766uniform323multi:softprob25.5677
0.80610.732851.6994460.79160.98310uniform181multi:softmax23.8973
0.83010.794946.3871410.99040.89661uniform327multi:softprob23.898
0.81920.7805122.8584400.99660.98898uniform400multi:softprob33.197
0.82790.793274.7116940.75710.95163gradient_based240multi:softmax28.2843
0.79740.7307147.9455420.58910.989810uniform274multi:softprob32.7641

The values are reported for CLX stock

Best tuning values and the respective performance metrics are shown in bold

Table 9

Hyperparameter optimization for XGBoost

XGBoost
Accuracy | F1-Score | max_depth | gamma | reg_alpha | reg_lambda | colsample_bytree | min_child_weight | sampling_method | n_estimators | objective | Time (s)
0.85740.8011142.8253430.15710.823113uniform520multi:softmax22.5147
0.84380.772264.4761410.7070.98310gradient_based698multi:softprob22.8963
0.84380.772281.0467440.44690.807813gradient_based667multi:softmax24.2549
0.85740.801182.2016730.22340.875820gradient_based644multi:softmax23.5862
0.84380.7722131.0282700.28680.987520uniform589multi:softprob21.5718
0.85550.797565.8813680.45180.4178uniform694multi:softmax20.2554
0.86520.8155141.0441830.37270.98210uniform819multi:softmax23.6984
0.84380.7722144.6023730.92160.263516gradient_based257multi:softprob18.0163
0.85740.801192.6645740.03950.39114gradient_based568multi:softprob20.9973
0.86330.8111133.233870.65410.88512gradient_based288multi:softmax28.0208
0.85740.8011122.0782730.32460.25291gradient_based626multi:softmax19.2962
0.84380.772272.0008790.37250.938920uniform352multi:softprob23.9215
0.84380.7722183.8969400.7640.98110uniform369multi:softmax21.9027
0.84380.7722183.1102400.70710.89598gradient_based288multi:softmax29.7171
0.84380.7722154.6847870.03330.84320gradient_based749multi:softprob21.9142
0.84380.7722121.0602620.26240.76755gradient_based751multi:softmax25.447
0.85940.8045131.3293410.62980.75495gradient_based523multi:softprob23.9069
0.85740.801177.2568620.57740.85553uniform702multi:softmax25.7756
0.85740.8011152.2939600.78190.52650uniform615multi:softmax18.968
0.84380.7722144.0672610.43290.916716gradient_based896multi:softprob22.0706
0.84380.772271.0534790.41750.7620gradient_based527multi:softmax22.7151
0.86330.811161.3747670.99370.60499gradient_based700multi:softprob21.7321
0.85940.8045174.4469870.39730.66220uniform258multi:softmax22.6639
0.86520.8142111.0097400.32810.808311gradient_based343multi:softmax27.3033
0.84380.7722161.1164440.88670.973520gradient_based944multi:softprob22.6539

The values are reported for M stock

Best tuning values and the respective performance metrics are shown in bold

Table 10

Hyperparameter optimization for SVM classification

SVC
| Accuracy | F1-Score | C | Kernel | Degree | gamma | Time (s) |
| 0.54773 | 0.42636 | 0.96 | rbf | 2 | 8.96451 | 68.55751 |
| 0.58864 | 0.51771 | 0.6 | rbf | 3 | 3.23122 | 61.5789 |
| 0.58182 | 0.50193 | 1 | rbf | 4 | 8.94659 | 63.0378 |
| 0.61591 | 0.56178 | 0.75 | rbf | 4 | 5.24323 | 60.19616 |
| 0.69318 | 0.66328 | 0.44 | rbf | 4 | 8.99035 | 63.81414 |
| 0.70909 | 0.68182 | 0.98 | rbf | 2 | 8.99531 | 62.34934 |
| 0.70455 | 0.67627 | 0.4 | rbf | 4 | 8.71733 | 62.89979 |
| 0.87115 | 0.85123 | 0.99 | rbf | 3 | 7.43513 | 64.88739 |
| 0.71136 | 0.68461 | 0.97 | rbf | 4 | 8.86893 | 62.36296 |
| 0.68636 | 0.65382 | 0.44 | rbf | 2 | 8.9409 | 62.44242 |
| 0.52045 | 0.35631 | 0.69 | rbf | 3 | 8.8603 | 63.15386 |
| 0.53409 | 0.39643 | 0.8 | rbf | 3 | 8.83122 | 60.07399 |
| 0.70227 | 0.67503 | 0.93 | rbf | 2 | 8.81325 | 60.41432 |
| 0.70909 | 0.68228 | 0.84 | rbf | 3 | 8.87425 | 62.6169 |
| 0.71136 | 0.68461 | 0.96 | rbf | 3 | 8.6247 | 64.30415 |
| 0.88691 | 0.86695 | 1 | rbf | 3 | 7.332 | 60.75189 |
| 0.70455 | 0.67738 | 0.68 | rbf | 2 | 8.98146 | 63.31783 |
| 0.70909 | 0.68228 | 1 | rbf | 4 | 7.34932 | 60.10106 |
| 0.69318 | 0.66289 | 0.82 | rbf | 4 | 3.2027 | 62.91097 |
| 0.68864 | 0.66082 | 0.68 | rbf | 4 | 8.98826 | 63.06187 |
| 0.70227 | 0.67419 | 0.71 | rbf | 3 | 8.27532 | 62.57946 |
| 0.70455 | 0.67712 | 1 | rbf | 3 | 8.57879 | 60.85747 |
| 0.59318 | 0.51773 | 0.98 | rbf | 3 | 8.96061 | 63.29737 |
| 0.625 | 0.60501 | 0.7 | rbf | 4 | 7.19191 | 60.17934 |
| 0.69091 | 0.66149 | 0.71 | rbf | 4 | 8.82945 | 61.37867 |

The values are reported for CLX stock

Best tuning values and the respective performance metrics are shown in bold

Having found the best hyperparameter values for our problem and dataset, we perform the full experimentation: model training followed by a trading simulation using the labels predicted by the model. Table 11 reports the classification metrics used in our experimentation for CLX stock. It can be seen that for all models, the overall performance in terms of accuracy is satisfactory.
Table 11

Classification metrics report for CLX stock

| Class label | BB-XGBoost | | | | BB-SVM | | | |
| | Precision | Recall | F1-score | Support | Precision | Recall | F1-score | Support |
| −1 | 0.870 | 0.455 | 0.597 | 44 | 0.941 | 0.364 | 0.525 | 44 |
| 0 | 0.914 | 0.992 | 0.951 | 374 | 0.894 | 0.997 | 0.943 | 374 |
| 1 | 0.767 | 0.561 | 0.648 | 41 | 0.720 | 0.439 | 0.545 | 41 |
| Accuracy | | | 0.902 | 459 | | | 0.887 | 459 |
| Macro avg | 0.850 | 0.669 | 0.732 | 459 | 0.852 | 0.600 | 0.671 | 459 |
| Weighted avg | 0.896 | 0.902 | 0.890 | 459 | 0.883 | 0.887 | 0.867 | 459 |

The threshold and window size are 0.05 and 11, respectively

From Table 12, we can see that the XGBoost algorithm, in most cases, outperforms the other three algorithms in terms of Sharpe ratio (SR) and maximum drawdown (MDD). The SR measures the excess return earned per unit of excess risk taken by the trading algorithm: values on the order of 1 or 2 indicate a good or very good investment, respectively, while values of 3 or higher are considered excellent. Our tri-state labeling algorithm produces SR values greater than 2 for each of the assets under management while maintaining a very low MDD for each of them; as can be seen from Table 12, the maximum drawdown in most cases does not exceed 10 percent. The average performance of our proposed framework is shown in Table 13. First, the Proposed Framework column shows that XGBoost outperforms the other algorithms in our experimentation in terms of annualized Sharpe ratio. The Comparison Frameworks column in Table 13 compares our proposed framework with more sophisticated ones. Based on convolutional neural networks (CNNs), Hoseinzade and Haratizadeh [17] introduced the 2D-CNNpred and 3D-CNNpred frameworks to automatically extract features for market trend prediction. 2D-CNNpred predicts future market trends from a market's own historical performance, while 3D-CNNpred also incorporates the historical information of other markets. The authors exploited 82 variables in their input feature vector. Their annualized SR values for 2D-CNNpred and 3D-CNNpred are reported in Table 13; their highest score, 2.257, belongs to 2D-CNNpred and is about 25% below our best SR value of 2.823.
Kim and Khushi [22] introduced a Deterministic Policy Gradient with 2D Relative-attentional Gated Transformer (DPGRGT) model that combines historical OHLCV data with deep reinforcement learning to maximize the portfolio-optimization reward. Their model achieved an annualized SR of 0.642; our model's SR is about 340% higher. In another sophisticated attempt at market trend prediction, Picasso et al. [32] exploited both technical and sentiment analysis, casting the problem as a classification task. They combined 10 technical indicators computed from historical stock data with textual financial news about the stock under study. Their highest annualized SR was reported for the case in which they applied the dictionary of Loughran and McDonald [26] (L&Mc) to the textual data for feature extraction. As can be seen from Table 13, L&Mc (News) and L&Mc (News and Price) score 1.235 and 0.756, so our model's performance is 128.6% and 273.4% higher than theirs, respectively. This comparison with recent sophisticated studies, in terms of both the prediction models and the feature-engineering processes used, indicates that our proposed labeling algorithm successfully extracts more effective buy and sell opportunities, resulting in higher annual trading performance.
Table 12

Performance comparison of LSTM, GRU, XGBoost, and SVM

Stock  Algorithm   SR      MDD (%)   Time (s)
AMD    LSTM         2.66    7.22     211.84
       GRU          2.44    2.32     241.81
       XGBoost      3.55    6.98       4.155
       SVM          2.56    7.22       2.565
WMT    LSTM         1.28    7.38     213.89
       GRU         -1.98    8.97     236.885
       XGBoost      3.10    4.03       3.668
       SVM          1.37    7.71       3.498
AAPL   LSTM         2.61   10.37     215.11
       GRU          2.52   10.38     238.48
       XGBoost      2.67    4.24       4.657
       SVM          2.57    6.25       2.179
STX    LSTM         1.94    2.16     127.58
       GRU          1.96   17.27     145.10
       XGBoost      2.55    2.16       1.204
       SVM          2.16    2.16       0.873
M      LSTM         0.73    2.98     327.65
       GRU          2.39    1.79     325.11
       XGBoost      2.52    1.79       3.897
       SVM          2.34    2.61       1.925
CLX    LSTM         0.81    2.76     218.7
       GRU          2.72    2.76     237.30
       XGBoost      2.55    1.83       5.528
       SVM          1.04    2.56       1.302

Best tuning values and the respective performance metrics are shown in bold

Table 13

Average performance comparison

Proposed framework        Comparison frameworks
Algorithm   SR            Algorithm                    SR
LSTM        1.672         2D-CNNpred [17]              2.257
GRU         1.675         3D-CNNpred [17]              2.243
XGBoost     2.823         DPGRGT [22]                  0.642
SVM         2.007         L&Mc (News) [32]             1.235
                          L&Mc (News and Price) [32]   0.756

Sharpe ratio for all studies is reported at an annualized rate

In Fig. 10, we plot each stock's RoR to illustrate how our trading system's return evolves, i.e., how much profit the system generates over the course of an investment. The comparison charts show that our labeling algorithm achieves a high positive return even for a stock such as Macy's Inc. (M), which carries high systematic risk and a high likelihood of extreme negative returns. As Fig. 10 shows, although the trained GRU model fails to produce a positive return on our labeled test set, XGBoost delivers a surprisingly high return, greater than 150% by the end of the test period. In Fig. 11, we depict the peak-to-trough decline during our investment test period. Drawdowns are a measure of downside volatility and are essential for monitoring trading performance. Since volatile markets and large drawdowns are problematic for most investors, they usually avoid drawdowns greater than 20 percent. In most of our cases, the drawdown does not exceed 10 percent. This is a significant improvement delivered by our proposed labeling algorithm: a trading strategy that keeps the investor out of deep drawdowns allows the account to start compounding from a higher level.
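The peak-to-trough decline tracked in Fig. 11 can be computed from an equity curve in a few lines. This is an illustrative sketch with made-up account values, not the paper's back-testing code:

```python
def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)                  # highest point seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst

curve = [100, 112, 105, 120, 96, 130]  # hypothetical account values
print(round(max_drawdown(curve) * 100, 1))  # -> 20.0 (the 120 -> 96 drop)
```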
Fig. 10

RoR diagrams showing the return produced by each learning model. The horizontal axis represents the date and the vertical axis the percentage RoR. The charts cover January 2020 to November 2021

Fig. 11

Drawdown comparison between the classification algorithms used within the proposed framework. XGBoost shows smaller drawdowns (DDs) during the back-testing period


Conclusion

In this paper, we designed a multi-class classification framework to tackle the price trend prediction problem. The framework was implemented using two machine learning models, SVM and XGBoost, and two recurrent neural networks, LSTM and GRU. The reason for exploiting several classification models in our trend prediction module is to show that, regardless of the classifier used, the tri-state labeling algorithm extracts more profitable buy and sell opportunities from price data. This study contributes to the market trend prediction problem in four ways. First, our tri-state labeling algorithm helps filter out low-confidence states of the market: when the classification machine is not confident enough about whether the trend is upward or downward, it moves the system to the idle state (denoted by 0). This is a safe position taken by the machine so as not to threaten the investor's capital in a highly volatile market. Second, in the model training part of the work, combinatorial purged K-fold cross-validation is applied to the training and test dataset splitting task to prevent data leakage and look-ahead bias. This type of cross-validation reduces prediction bias caused by data leakage between the training and validation chunks of the input vector. This is in marked contrast to most previous works, which apply common K-fold cross-validation, a technique ill-suited to dealing with data leakage and look-ahead bias. Third, we trained the final model with the best hyperparameter values found by a Bayesian hyperparameter optimization process. Finally, we successfully back-tested our framework on selected stocks from the S&P 500 and showed through extensive experiments that our training regime is applicable to different models, resulting in high performance on the trading task.
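The idle-state idea can be sketched with a simple forward-return rule. This is our own illustrative reconstruction, assuming a label is assigned from the return over a look-ahead window (the paper reports, for example, a threshold of 0.05 and a window size of 11 for CLX); the paper's actual labeling algorithm may differ in its details.

```python
def tri_state_labels(prices, window=11, threshold=0.05):
    """Label each bar by its forward return over `window` bars:
    +1 (up) if the return exceeds +threshold, -1 (down) if it falls
    below -threshold, and 0 (no-action/idle) otherwise. Bars without
    a full look-ahead window default to the idle state."""
    labels = []
    for i in range(len(prices)):
        if i + window >= len(prices):
            labels.append(0)  # not enough future data: stay idle
            continue
        fwd = (prices[i + window] - prices[i]) / prices[i]
        labels.append(1 if fwd > threshold else -1 if fwd < -threshold else 0)
    return labels

# Small demo with a short window so the labels are easy to verify by hand:
print(tri_state_labels([100, 100, 110, 104, 90], window=2, threshold=0.05))
# -> [1, 0, -1, 0, 0]
```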
The proposed study has some limitations that can serve as anchor points for further extensions and improvements. While the study provides valuable insights into trend following based on price changes, the labeling mechanism can be further enhanced to be more robust against higher levels of volatility. We also believe there is room for improvement in labeling the trends using an adaptive threshold; such adaptive parameter optimization may guide the system through more volatile periods and capture trends better. Finally, activity at each price level produces changes in volume, which in turn affect the price; in future work, we therefore need to study the mutual effect of price and volume to extract trends better.
