
Attention enhanced long short-term memory network with multi-source heterogeneous information fusion: An application to BGI Genomics.

Qun Zhang1,2, Lijun Yang3, Feng Zhou4.   

Abstract

The recent availability of enormous amounts of both data and computing power has created new opportunities for predictive modeling. This paper compiles an analytical framework based on multiple sources of data including daily trading data, online news, derivative technical indicators, and time-frequency features decomposed from closing prices. We also provide a real-life demonstration of how to combine and capitalize on all available information to predict the stock price of BGI Genomics. Moreover, we apply a long short-term memory (LSTM) network equipped with an attention mechanism to identify long-term temporal dependencies and adaptively highlight key features. We further examine the learning capabilities of the network for specific tasks, including forecasting the next day's price direction and closing price and developing trading strategies, comparing its statistical accuracy and trading performance with those of methods based on logistic regression, support vector machine, gradient boosting decision trees, and the original LSTM model. The experimental results for BGI Genomics demonstrate that the attention enhanced LSTM model remarkably improves prediction performance through multi-source heterogeneous information fusion, highlighting the significance of online news and time-frequency features, as well as exemplifying and validating our proposed framework.
© 2020 Elsevier Inc. All rights reserved.


Keywords:  Attention mechanism; Heterogeneous information fusion; Long short-term memory network; Machine learning; Stock price prediction

Year:  2020        PMID: 33106709      PMCID: PMC7577284          DOI: 10.1016/j.ins.2020.10.023

Source DB:  PubMed          Journal:  Inf Sci (N Y)        ISSN: 0020-0255            Impact factor:   6.795


Introduction

As has been widely reported in the media, the outbreak of coronavirus disease 2019 (COVID-19) has caused global mass death and panic and has become a worldwide public health emergency. Stock markets have also suffered a shock, especially stocks in the biotechnology and pharmaceutical industries, not only because these stocks are usually news dependent [11] but also because these industries have received growing attention from investors in recent years due to their potential relevance in diagnosing, mitigating, treating, and preventing diseases. Meanwhile, with recent technological advancements that foster vibrant creation, sharing, and collaboration among web users, the speed of information dissemination has greatly improved. In the investment field, although the enormous amounts of information being generated show great promise to reduce trading costs attributable to information asymmetry and financial market uncertainty, the increased quantities of data, stored in structured, semi-structured, and unstructured formats and generated from multiple sources, require further interpretation. How to develop a real-time stock prediction framework that capitalizes on all available information from multiple data sources remains an ongoing research topic.

The idea of the granular models introduced by Pedrycz [30] illustrates that generalizations of numerical models are formed as a result of an optimal allocation of information granularity. Specifically, information fusion for stock price prediction is a multidisciplinary research field involving the integration of information from multiple sources for data mining (subsuming statistics and machine learning), signal processing, text mining, knowledge discovery, and expert systems modeling [17], [32]. However, using multiple data sources instead of a single source is a considerable challenge, because solving this problem requires not only improving the efficiency of information fusion, but also dealing with high levels of uncertainty, complexity [24], nonlinearity [33], and the dynamism of the market itself. Leveraging a unique dataset collected from multiple sources and performing in-depth analyses on it, we provide a real-life demonstration of how these issues can be addressed.

BGI Genomics (stock code: 300676.SZ), a part of BGI Group, one of the world's leading life science and genomics organizations, was officially listed on the Shenzhen Stock Exchange on July 14, 2017, becoming the exchange's 2,001st listed company. Its strengths are prenatal screening, hereditary cancer screening, detection of rare diseases, and aiding precision medicine research. It is one of the world's leading providers of commercial sequencing services and genomics tests for medical institutions, research institutions, enterprises, and other public and private partners. The company's potential, in light of the current coronavirus pandemic and uncertain commercial environment, makes it an interesting candidate for an in-depth exploration of efficient approaches to data analysis.

At present, data are becoming one of the most valuable resources. In general, data coming from more than one source can deliver more information or knowledge than data from a single source. Historically, numerical and textual data have been the two main types of data utilized in the financial field. Analysis of univariate or multivariate time series, whose values can be numbers, texts, or other types of data, can provide insights into the underlying data generating process.
On the one hand, most common econometric models for forecasting treat each new signal as a noisy linear combination of the last few signals and independent noise terms; these include the autoregressive model, the moving average model, the autoregressive moving average model, the autoregressive integrated moving average model, and the stochastic volatility model [13]. Although these models have advantages in theoretically describing the underlying data generating process based on statistical logic, they contain some strong assumptions about the noise terms (such as that variables are independent and identically distributed, or follow a t-distribution) or the explanatory variables (such as stationarity or exogeneity) that are not fully satisfied in the real world. On the other hand, many machine learning models have been successfully developed over recent decades, without imposing restrictive assumptions, to learn from and forecast financial time series, such as the support vector machine (SVM) [34], gradient boosted decision trees (GBDTs) [47], neural networks (NNs) [1], [45], and the ensemble model formed by cascading logistic regression (LR) onto GBDT [47]. However, one key limitation of most existing machine learning models is their lack of an explicitly declared mechanism to handle nonlinearity and non-stationarity in time series [48], which may lead to inaccurate predictions. From a signal processing perspective, time–frequency analysis methods [12], [16] can be employed to extract and utilize inherent instantaneous amplitude and frequency/phase information in combination with other relevant morphological features. Additionally, many researchers have verified that the feature selection process is a key factor for precise predictions, especially when data types are mixed and the resultant features are fed together into a classifier or regressor [28].

Meanwhile, it is challenging but also rewarding to interpret textual data and extract discriminative features from it effectively. For example, firm-specific news articles can spread information and enrich the knowledge of investors, and can consciously or unconsciously affect investors' trading activities, which might lead to overreaction or underreaction of the stock price to the information [26]. The examination of monthly stock returns following public news also suggests that bad public news is always followed by a negative drift, but less drift is observed for stocks following good news [6]. Experiments conducted for predicting the stock prices of Amazon and eBay in the framework of a multivariate Bayesian structural time series model embedded with online text mining reveal that incorporating information from financial news and Twitter feeds into sentiment predictors consistently boosts their forecasting power [19]. Therefore, for available textual data, more advanced intelligent techniques such as computational linguistics and natural language processing (NLP) techniques can be applied to structure input text, derive informative features, identify text sentiment, evaluate the output, or explore stock trading strategies in an automated framework [23]. Building on the existing formal approaches, another rewarding direction for research is to take advantage of deep neural networks to capture temporal dependence structures in data across both short- and long-term periods, possibly equipped with an attention mechanism that has the ability to focus on the most relevant parts of their inputs.
Artificial neural networks (ANNs) are currently revolutionizing many technological areas and aid in addressing the difficult aspects of theoretically solvable but computationally hard problems [19], [40]. ANNs are viable candidates for capturing nonlinear relationships in input data without assumptions or the need for prior knowledge of the statistical distributions of the data [2]. Their capabilities in stock market analysis and prediction in emerging markets have been found to be more attractive than those of Fama and French's model [5]. The recurrent neural network (RNN) is a special kind of feed-forward neural network that learns sequential patterns through internal loops by receiving input sequences [39]. The long short-term memory (LSTM) network, an RNN composed of long short-term memory blocks that is also capable of learning long-term dependencies, was first proposed by Hochreiter and Schmidhuber [15]. Many LSTM networks have been successfully implemented for sequential data modeling; for instance, an LSTM network employed to predict returns in the Chinese stock market demonstrates better performance than a random prediction method [7]. A trading strategy based on volume-weighted average prices of daily S&P 500 data from 1992 to 2015 and derived using an LSTM network also outperforms memory-free classification methods, that is, logistic regression, random forest (RF), and a deep neural network [10]. Further, a dual-stage attention-based RNN for stock price prediction has been proposed by Qin et al. [35], in which the attention mechanism is incorporated into an encoder-decoder framework.

In summary, the investigation into multi-source heterogeneous information fusion and its applications in stock price prediction is still at a developmental stage. A promising direction for research is to combine the attention mechanism with the LSTM model to extract key features from various data sources, and to investigate their joint impact on the performance of stock price prediction models and trading strategies. For this purpose, we use the example of BGI Genomics.

Overall, this paper contributes to the growing literature in several significant ways. (1) We compile a set of features based on multiple sources of data incorporating daily trading data, online news, technical indicators derived from trading data, and time–frequency features decomposed from closing prices, so as to provide a best-performing feature subset for information fusion and prediction. (2) In order to effectively weaken the influence of the non-stationarity of the data on forecasting performance, we address the problem of decomposing the original non-stationary price time series into a group of time–frequency features. (3) We adapt an attention enhanced LSTM network and verify its forecasting performance using BGI Genomics as a real-life demonstration, mainly by comparing the model's results with those of alternative models and comprehensively analyzing which model demonstrates superior performance and generalization ability. (4) A framework integrating various data preprocessing techniques is proposed for forecasting the next day's price direction and the next day's closing price, and for analyzing the benefits of a long/short trading strategy. This is exemplified and validated through in-depth analyses of BGI Genomics.

The remainder of this paper is structured as follows. Section 2 details the intrinsic time-scale decomposition method, the long short-term memory network, and the attention mechanism.
Section 3 proposes the study’s framework and illustrates how it works. Section 4 presents the experimental results and analysis for BGI Genomics. Finally, Section 5 summarizes our findings and concludes the paper.

ITD, LSTM, and attention mechanism

Intrinsic time-scale decomposition (ITD)

As an adaptive non-stationary signal decomposition technique, intrinsic time-scale decomposition (ITD) has been successfully applied in the field of signal processing [36], [44], [46]. Through the ITD process, the original non-stationary signal can be adaptively decomposed into several proper rotation components (PRCs), whose frequencies range from high to low. In this subsection, we briefly review the ITD process. For a more detailed description of the method, please refer to Frei and Osorio [12].

For a given signal $X_t$, let $\mathcal{L}$ and $\mathcal{H}$ be the baseline extracting operator and the PRC extracting operator, respectively. In the first step of ITD, $X_t$ is decomposed into two components through

$$X_t = \mathcal{L}X_t + (1 - \mathcal{L})X_t = L_t + H_t,$$

where $L_t = \mathcal{L}X_t$ is a baseline and $H_t = \mathcal{H}X_t = (1 - \mathcal{L})X_t$ is a PRC. Then, the process is repeated by using the baseline signal as a new input signal until the resulting baseline has only two extreme values or is a constant. In the end, the input signal can be decomposed into a series of PRCs with a decreasing instantaneous frequency. If this process takes k steps, $X_t$ is broken down into

$$X_t = \sum_{i=1}^{k} H_t^{(i)} + L_t^{(k)},$$

and the baselines and PRCs satisfy $L_t^{(i)} = \mathcal{L}L_t^{(i-1)}$, $H_t^{(i)} = \mathcal{H}L_t^{(i-1)}$, and $L_t^{(0)} = X_t$.

Let $\{\tau_k, k = 1, 2, \dots\}$ be the extrema points of $X_t$, where $\tau_0 = 0$ is its initial point. If there are multiple consecutive data points with the same extreme value, we take $\tau_k$ to be the rightmost point of these extreme values. Furthermore, we define $X_k = X_{\tau_k}$ and $L_k = L_{\tau_k}$. Then, the baseline is constructed by a piecewise linear formula in the interval between successive extrema, that is,

$$L_t = L_k + \frac{L_{k+1} - L_k}{X_{k+1} - X_k}\,(X_t - X_k), \quad t \in (\tau_k, \tau_{k+1}],$$

where the knots are defined as

$$L_{k+1} = \alpha\left[X_k + \frac{\tau_{k+1} - \tau_k}{\tau_{k+2} - \tau_k}\,(X_{k+2} - X_k)\right] + (1 - \alpha)X_{k+1},$$

where $\alpha \in (0, 1)$ is a tunable parameter, and in general $\alpha = 1/2$.

After using the ITD method to decompose the input signal into a set of PRCs and a residual, the next step is to consider the instantaneous amplitude $A(t)$, phase $\theta(t)$, and frequency $f(t)$ of each PRC $x(t)$. Instead of the Hilbert-Huang transformation (HHT) [16], Frei and Osorio [12] propose a wave-based method to calculate the instantaneous phase and instantaneous amplitude of PRCs, ensuring a monotonic increase in phase angle. The instantaneous phase (IP) can be calculated as follows:

$$\theta(t) = \begin{cases} \arcsin\left(x(t)/A_1\right), & t_1 \le t < t_2, \\ \pi - \arcsin\left(x(t)/A_1\right), & t_2 \le t < t_3, \\ \pi + \arcsin\left(x(t)/A_2\right), & t_3 \le t < t_4, \\ 2\pi - \arcsin\left(x(t)/A_2\right), & t_4 \le t < t_5, \end{cases} \quad (6)$$

where $t_1$ and $t_5$ are the corresponding times of two successive zero up-crossing points, $t_3$ is the time of the zero down-crossing point, $t_2$ is the time of the maximum point, and $t_4$ is the time of the minimum point. $A_1$ is the value of $x(t)$ at $t_2$ (i.e., the maximum on the positive half-wave) and $A_2$ is the value of $x(t)$ at $t_4$ (i.e., the minimum on the negative half-wave). Then, the instantaneous amplitude (IA) is defined as follows:

$$A(t) = \begin{cases} A_1, & t_1 \le t < t_3, \\ A_2, & t_3 \le t < t_5. \end{cases}$$

Obviously, $A(t)$ is a piecewise function, which is determined by the extreme values of the PRCs. According to the IP formula in Eq. (6), the instantaneous frequency (IF) can be calculated by

$$f(t) = \frac{1}{2\pi}\frac{d\theta(t)}{dt}.$$

Overall, the ITD method is adaptive and suitable for processing stock trading data, which are usually nonlinear and non-stationary time series. The panels in the first column of Fig. 1 illustrate the decomposition results produced by the ITD method, including the PRCs and the residual of the closing price of BGI Genomics from July 14, 2017 to July 21, 2020. The second and the last columns of Fig. 1 present the IAs and IFs of the PRCs, respectively.
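The baseline/PRC extraction above can be sketched in a few lines of Python. The following is our illustrative transcription, not the authors' released code: it uses the common default α = 1/2, a simple sign-change extremum detector (plateaus of equal extreme values are ignored for brevity), and stops sifting once the baseline has too few extrema.

```python
import numpy as np

def itd_step(x, alpha=0.5):
    """One ITD step: split signal x into a baseline L_t and a proper rotation H_t."""
    d = np.diff(x)
    ext = np.where(d[:-1] * d[1:] < 0)[0] + 1       # interior extrema tau_1, tau_2, ...
    tau = np.concatenate(([0], ext, [len(x) - 1]))  # include endpoints as boundary knots
    if len(tau) < 4:                                # too few extrema: x is a pure trend
        return x.astype(float), np.zeros(len(x))
    L = np.empty(len(tau))
    L[0], L[-1] = x[0], x[-1]
    for k in range(len(tau) - 2):                   # knot values from triples of extrema
        t0, t1, t2 = tau[k], tau[k + 1], tau[k + 2]
        L[k + 1] = (alpha * (x[t0] + (t1 - t0) / (t2 - t0) * (x[t2] - x[t0]))
                    + (1 - alpha) * x[t1])
    baseline = np.empty(len(x))
    for k in range(len(tau) - 1):                   # piecewise-linear baseline between knots
        t0, t1 = tau[k], tau[k + 1]
        seg = slice(t0, t1 + 1)
        denom = x[t1] - x[t0]
        scale = 0.0 if denom == 0 else (L[k + 1] - L[k]) / denom
        baseline[seg] = L[k] + scale * (x[seg] - x[t0])
    return baseline, x - baseline                   # (baseline L_t, PRC H_t)

def itd(x, max_prc=5):
    """Repeat the step on successive baselines, returning the PRCs and the residual."""
    prcs, base = [], np.asarray(x, dtype=float)
    for _ in range(max_prc):
        base, prc = itd_step(base)
        if not prc.any():                           # baseline degenerated to a trend
            break
        prcs.append(prc)
    return prcs, base
```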
Fig. 1

The results decomposed from the closing price of BGI Genomics by the ITD method from July 14, 2017 to July 21, 2020. Panels in the first column: the top panel is the original signal (i.e., the closing price time series), and the remaining panels are the PRCs and the residual. Panels in the second column: the top panel is the original signal, and the remaining five panels are the corresponding IAs of the PRCs. Panels in the last column: the top panel is the original signal, and the remaining five panels are the corresponding IFs of the PRCs.


Long short-term memory (LSTM)

An RNN composed of several long short-term memory (LSTM) blocks is commonly called an LSTM network. In each memory block, there exists a memory cell that stores the state. Strictly speaking, the main difference between the blocks of an LSTM and a traditional RNN is that the former can use newly introduced gates to decide whether to keep the existing memory or forget unnecessary information, so as to ensure that the gradient of the long-term dependencies does not vanish, whereas the latter overwrites its content at each time step. The structure of LSTM as described in previous studies [7], [10], [15] is illustrated in Fig. 2.
Fig. 2

Graphical illustration of the structure of LSTM.

Given the current input $x_t$, the state $h_{t-1}$ that the previous step generated, and the memory state of the cell $c_{t-1}$ (peephole), the LSTM cell can be synoptically expressed as

$$(h_t, c_t) = \mathrm{LSTM}(x_t, h_{t-1}, c_{t-1}).$$

Specifically, the detailed formulae of the LSTM function for the decisions whether to forget the stored memory, to take the inputs, and to output the state generated are given as follows:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f),$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i),$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o),$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c), \quad (10)$$
$$h_t = o_t \odot \tanh(c_t), \quad (11)$$
$$\hat{y}_t = \sigma(W_y h_t + b_y), \quad (12)$$

where $f_t$, $i_t$, and $o_t$ are the forget gate, input gate, and output gate at time t, respectively. The forget gate is parameterized by the weight matrices $W_f$ and $U_f$ and the bias $b_f$; the input gate by the weight matrices $W_i$ and $U_i$ and the bias $b_i$; and the output gate by the weight matrices $W_o$ and $U_o$ and the bias $b_o$. $c_t$ and $c_{t-1}$ are the current and prior states of the cell, respectively, with the corresponding weight matrices $W_c$ and $U_c$ and bias $b_c$. $\odot$ represents an element-wise multiplication operator, $\tanh$ is a hyperbolic tangent function, and $\sigma$ is a sigmoid activation function. According to Eq. (10), the forget gate $f_t$ controls the amount of information from the past cell state $c_{t-1}$ retained when updating the cell state at time t, and the input gate $i_t$ determines how much new information is stored in the cell state $c_t$. The hidden state $h_t$ in Eq. (11) is determined by passing the cell state $c_t$ through the hyperbolic tangent function and filtering it with the output gate $o_t$. Finally, the prediction $\hat{y}_t$ in Eq. (12) is determined by $h_t$, where $W_y$ is the weight matrix and $b_y$ is the bias.

Overall, the LSTM network can handle not only the large dimensionality of the system, but also a very general functional form of the states, while allowing for lags of unknown and potentially long duration in the time series, which makes it very suitable for capturing long-term dependencies. It serves the purpose of finding hidden states in the time series, summarizing them in a small number of state processes, and applying the most appropriate transformation to the non-stationary time series in a data-driven way.
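As a concrete illustration of Eqs. (10)-(12), the following numpy sketch (ours, with an assumed parameter dictionary p holding the W, U, and b arrays for each gate) performs one forward step of the cell:

```python
import numpy as np

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM forward step following the gate equations above; p is a dict
    holding the weight matrices W_*, U_* and biases b_* for each gate."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])   # forget gate
    i = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])   # input gate
    o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])   # output gate
    c_cand = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])
    c = f * c_prev + i * c_cand        # Eq. (10): keep old memory, write new memory
    h = o * np.tanh(c)                 # Eq. (11): filtered hidden state
    return h, c
```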

Attention mechanism

Attention has been proven to be a powerful mechanism for embedding categorical inference in a deep neural network. Its main concept is to choose "where to look" by assigning a weight or importance to each lower position when computing an upper level representation [18], [27]. An overview of the architecture of attention enhanced LSTM is provided in Fig. 3.
Fig. 3

Graphical illustration of the architecture of attention enhanced LSTM.

Let $(h_1, h_2, \dots, h_N)$ be the hidden states obtained from the LSTM layer, where N is the number of data points. All these states are fed into a subsequent Attention layer, and the output of the Attention layer can be synoptically regarded as

$$s = \mathrm{Attention}(h_1, h_2, \dots, h_N).$$

Specifically, the Attention function is formed by a weighted sum of all the hidden vectors, calculated as

$$s = \sum_{i=1}^{N} \alpha_i h_i,$$

where the alignment vector $\alpha = (\alpha_1, \dots, \alpha_N)$ is defined as

$$\alpha = \mathrm{softmax}(e),$$

and

$$e_i = v^{\top}\tanh(W h_i + b),$$

where $\alpha_i$ denotes the attention weights satisfying the constraint $\sum_{i=1}^{N} \alpha_i = 1$; $\mathrm{softmax}$ is a softmax function; $v$ is a trained parameter vector and $v^{\top}$ is its transpose; $W$ represents the learnable matrix and $b$ is the bias.
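A direct numpy transcription of these formulas (our sketch; H is an N×d array of hidden states, and v, W, b stand in for the trained attention parameters):

```python
import numpy as np

def attention(H, v, W, b):
    """Additive attention: s = sum_i alpha_i h_i with alpha = softmax(e) and
    e_i = v^T tanh(W h_i + b). H has shape (N, d)."""
    e = np.tanh(H @ W.T + b) @ v        # alignment scores e_i, shape (N,)
    a = np.exp(e - e.max())             # numerically stable softmax
    alpha = a / a.sum()                 # attention weights, sum to 1
    return alpha @ H                    # context vector s, shape (d,)
```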

Our proposed framework

Overview

We illustrate the architecture of our proposed framework in Fig. 4, which can be synoptically divided into four stages. In the first stage, we collect data related to BGI Genomics from multiple sources, consisting of daily trading data, online news, derivative technical indicators, and time–frequency features decomposed from the closing price. Because the collected data include both numerical data (e.g., trading data) and textual data (e.g., online news), they pose a challenge for our prediction purposes. The second stage is feature engineering. This stage mainly involves the implementation of data cleaning, feature encoding, dimension reduction, and normalization. In the third stage, the proposed prediction model (i.e., attention enhanced LSTM, denoted LSTM-Attention for simplicity) is trained on the training dataset for various prediction tasks, including forecasting the next day's price direction and the next day's closing price. In the last stage, the hyper-parameters that appear in the prediction model are selected according to their performance on the validation dataset. In addition, prediction performance is evaluated on the testing dataset at this stage.
Fig. 4

Graphical illustration of the architecture of the proposed framework.


Multi-source heterogeneous data collection

This study considers a dataset related to BGI Genomics for the period from July 14, 2017 to July 21, 2020, which encompasses the outbreak of COVID-19 that created high market volatility and had complex implications for the biotechnology and pharmaceutical industries, and thus represents a challenge to our model. The dataset contains two main types of data: daily trading data (numerical data) and daily online news data (textual data). First, the daily trading data are downloaded from the Wind Financial Terminal, including opening prices, lagged opening prices, high prices, low prices, closing prices, previous day's closing prices, returns, trading volumes, daily average prices, total market capitalization, and the number of shares outstanding. These data are used as proxies to capture value anomalies and trading information. Second, the daily online news is acquired from the Wind Financial Terminal, Baidu News, and Sina Finance for the same period, totaling 1,556, 552, and 2,554 stock-related news articles for BGI Genomics from each source, respectively. The average daily number of articles is 6.13.

Because financial markets are usually chaotic, the inherent nonlinearity and non-stationarity in trading data pose challenges to the trend-based prediction of closing prices. To reduce the impact of non-stationarity, we use the ITD method introduced in Section 2.1 to decompose the original non-stationary closing price data into several quasi-stationary components. In the process of constructing the time–frequency features from the closing price series, in order to avoid using future information, we use the ITD method to process each subsequence of closing prices following a sliding method with given parameters for the minimum length of time series h and the number of PRCs L. The process of the improved ITD method is summarized in Algorithm 1. Then, the resulting PRCs, residual, IAs, and IFs together form the time–frequency features, which can be regarded as a new data source to improve forecasting performance.

Further, because constructing new features from existing data is also a reliable method to improve prediction performance in the field of machine learning [28], several technical indicators based on trading data and previously studied by financial experts are generated as another novel data source. As illustrated in Fig. 4, two categories of sophisticated quantitative indicators presented by Kakushadze [21] and Kingma and Ba [25] are adopted in this study. We refer to these as the Alpha 101 and Alpha 191 indicators, because they contain 101 and 191 indicators, respectively.
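The essential point of the sliding construction is that the features of day t are computed from prices up to day t only. A minimal sketch of this idea (ours, not the paper's Algorithm 1 itself; it reuses the itd function sketched in Section 2.1 and omits the IA/IF extraction for brevity):

```python
import numpy as np

def rolling_time_frequency_features(close, h, L):
    """For each day t >= h, decompose only the prefix close[:t+1] and keep the
    day-t values of the PRCs and residual, so no future prices are used."""
    rows = []
    for t in range(h, len(close)):
        prcs, resid = itd(close[: t + 1], max_prc=L)   # decompose the prefix only
        row = [prc[-1] for prc in prcs]                # today's PRC values
        row += [0.0] * (L - len(prcs))                 # pad if fewer PRCs emerged
        row.append(resid[-1])                          # today's residual value
        rows.append(row)
    return np.asarray(rows)
```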

Feature engineering

Data cleaning

In this subsection, we first clean the news data and technical indicators; the other data are already structured and complete. The online news articles are sorted by date as they are obtained from the three platforms for the same period, as described in Section 3.2. Alpha 101 and Alpha 191 indicators whose proportion of missing values during the sample period exceeds a preset threshold are excluded. Additionally, in order to ensure data consistency, we discard data (including trading data, online news, and Alpha 101 and Alpha 191 indicators) from the first h days, because the time–frequency features of these h days are unavailable.

Feature encoding

One major challenge in handling the online news is how to convert articles' content into numerical vectors that can be processed by a prediction model. This challenge belongs to the Chinese text feature encoding problem in the field of NLP. Therefore, we apply NLP techniques to solve it in the following three steps. First, the online news articles sorted by date are passed to TextRank4ZH, an abstract extraction toolkit for Chinese text, to extract the most significant sentences from the daily online news based on the built-in PageRank algorithm [29]. We limit each abstract to a maximum of ten sentences. Second, both the SnowNLP and Senta tools are employed for sentiment analysis on each sentence of these abstracts. SnowNLP is a class library written in Python, inspired by the TextBlob library, that can handle Chinese text content, including tasks such as Chinese word segmentation, sentence segmentation, part-of-speech tagging, sentiment analysis, text categorization, conversion to pinyin, simplification of traditional characters, and text similarity analysis. The Senta (also called SKEP) model is trained to learn a unified sentiment representation for multiple sentiment analysis tasks by embedding sentiment information at the word, polarity, and aspect levels into a pre-trained sentiment representation [41]. Third, we collect the results of the sentiment analysis on the abstracts of daily news articles obtained from SnowNLP and Senta by date, and compute their daily mean values and daily standard deviations. The resulting four numerical features generated from the daily online news articles not only quantify the sentiment of daily news to a certain extent, but can also be fed as inputs into the prediction model.
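For instance, the two SnowNLP-based features for one day could be computed as follows (our sketch; SnowNLP's sentiments score lies in [0, 1], and the Senta-based pair would be produced analogously with that toolkit):

```python
import numpy as np
from snownlp import SnowNLP

def snownlp_daily_features(abstract_sentences):
    """Daily mean and standard deviation of sentence-level sentiment scores."""
    scores = [SnowNLP(s).sentiments for s in abstract_sentences]  # each in [0, 1]
    return float(np.mean(scores)), float(np.std(scores))          # snow_avg, snow_std
```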

Dimension reduction

In the fields of statistics, machine learning, and information theory, dimension reduction is the process of reducing the number of random variables under consideration to obtain a set of principal variables [38]. As suggested by Pestov [31] and Rico-Sulayes [37], the advantages of dimension reduction include: (1) saving storage space and computation time; (2) eliminating multicollinearity to improve the interpretability of machine learning model parameters; (3) making the data much easier to visualize when scaled down to very low dimensions (such as two or three dimensions); and (4) avoiding the curse of dimensionality. Feature projection (also called feature extraction), a well-known approach to dimension reduction, transforms the data from a high-dimensional space to a space of fewer dimensions. The data transformation may be nonlinear [8] or linear. In this paper, we adopt principal component analysis (PCA) [20], [43], a commonly used method of linear transformation. The method can reduce the dimensions of both the Alpha 101 and Alpha 191 indicators and extract information from them, as their dimensions are too high for direct classification or regression.
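A minimal scikit-learn sketch of this step (ours; the reduced dimension k and the array names alpha_train/alpha_test are illustrative). Fitting the projection on the training portion only and reusing it unchanged avoids leaking test-set information:

```python
from sklearn.decomposition import PCA

k = 10                                            # illustrative reduced dimension
pca = PCA(n_components=k)
alpha_train_k = pca.fit_transform(alpha_train)    # fit on the training indicators
alpha_test_k = pca.transform(alpha_test)          # same projection on later data
```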

Normalization

To ensure that prediction performance is not impacted by differences in the scales on which features' values are measured, data normalization techniques are commonly used to transform values of different scales to a notionally common scale. There are different types of normalization in statistics, such as min–max feature scaling, the studentized residual, and the standard score [49]. In this paper, we use the standard score for normalization; thus, the value of each feature is scored by subtracting the sample (or estimated) mean and dividing by the sample standard deviation (or another estimate of the standard deviation).
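In scikit-learn terms (our sketch), the standard score corresponds to StandardScaler, again fitted on the training split only:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()                  # standard score: (x - mean) / std
X_train_z = scaler.fit_transform(X_train)  # statistics estimated on training data
X_test_z = scaler.transform(X_test)        # the same statistics reused on test data
```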

Feature importance

High dimensionality of features is likely to cause redundancy, which may negatively affect prediction performance. Unlike feature projection methods that convert a high-dimensional feature space into a low-dimensional space, the computation of feature importance is a method that selects features according to their significance, from high to low, to achieve feature dimensionality reduction. In addition, the calculation and visualization of feature importance also help data mining analysts understand the contribution of features. Ensemble decision-tree-based techniques, such as GBDT and RF, are common methods for computing feature importance by counting how often each candidate feature is used in the trees: a feature that appears more frequently in these trees is considered more important, and vice versa. After completing the procedures above, we obtain representative features of the trading data, online news, time–frequency data, and Alpha 101 and Alpha 191 technical indicators for each trading day, except for the first h days (h being the minimum length of time series processed in the ITD method). Table 9 in the Appendix shows common descriptive statistics such as the mean, standard deviation, minimum, and maximum of the features from the trading data, news data, and time–frequency data. The summary statistics of the features from the Alpha 101 and Alpha 191 technical indicators are listed in Table 10 and Table 11 in the Appendix, respectively.
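A sketch of such a ranking with scikit-learn (ours; feature_names and the scaled arrays from the sketches above are assumptions; note that scikit-learn's feature_importances_ is impurity-based rather than a raw split count, which serves the same screening purpose):

```python
from sklearn.ensemble import GradientBoostingClassifier

gbdt = GradientBoostingClassifier(random_state=0).fit(X_train_z, y_train)
ranking = sorted(zip(feature_names, gbdt.feature_importances_),
                 key=lambda kv: kv[1], reverse=True)   # most important first
```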
Table 9

Summary statistics of features derived from trading data, news data, and time–frequency data.

Source | Feature | Number | Mean | Std. | Min | 25% | 50% | 75% | Max
Trading data | open | 674 | 100.0351 | 49.3612 | 47.0000 | 61.2025 | 74.5500 | 144.0000 | 250.0000
 | lag_open | 674 | 100.0259 | 49.3478 | 47.0000 | 61.2025 | 74.5500 | 144.0000 | 250.0000
 | high | 674 | 102.5368 | 51.1283 | 49.4400 | 62.5150 | 76.2950 | 148.0825 | 261.9900
 | low | 674 | 97.9735 | 48.0490 | 46.5200 | 60.4925 | 73.0150 | 139.0100 | 238.0000
 | close | 674 | 100.1665 | 49.5963 | 48.1100 | 61.3075 | 74.5900 | 144.2725 | 257.0200
 | prev_close | 674 | 100.1845 | 49.6217 | 48.1100 | 61.3075 | 74.5900 | 144.2725 | 257.0200
 | r | 674 | 0.0480 | 3.3926 | −10.0039 | −1.7664 | 0.1190 | 1.6739 | 10.0065
 | vol | 674 | 5.3187×10^6 | 4.4238×10^6 | 5.6910×10^5 | 2.5024×10^6 | 3.7325×10^6 | 6.8031×10^6 | 3.5532×10^7
 | avg | 674 | 100.2719 | 49.5554 | 48.3543 | 61.3935 | 74.5721 | 143.6681 | 250.2272
 | cap | 674 | 4.0077×10^10 | 1.9843×10^10 | 1.9249×10^10 | 2.4529×10^10 | 2.9843×10^10 | 5.7723×10^10 | 1.0283×10^11
 | share | 674 | 5.2027×10^8 | 5.1530×10^8 | 6.2851×10^7 | 1.9436×10^8 | 3.3042×10^8 | 6.5623×10^8 | 4.5992×10^9
News data | snow_avg | 674 | 0.6466 | 0.1838 | 0.0598 | 0.5000 | 0.6320 | 0.8056 | 0.9569
 | snow_std | 674 | 0.2208 | 0.1420 | 0.0000 | 0.1345 | 0.2638 | 0.3349 | 0.4745
 | senta_avg | 674 | 0.5002 | 0.1802 | 0.0626 | 0.3828 | 0.5000 | 0.6167 | 0.9430
 | senta_std | 674 | 0.2089 | 0.1287 | 0.0000 | 0.1239 | 0.2519 | 0.3071 | 0.4129
Time–frequency data | a1 | 674 | 0.9106 | 2.1894 | 0 | 0 | 0 | 0.7688 | 18.1000
 | a2 | 674 | 2.3299 | 3.8932 | 0 | 0.2591 | 0.9933 | 2.7375 | 32.2280
 | a3 | 674 | 3.9086 | 5.1597 | 0 | 0.5178 | 1.9992 | 4.8761 | 36.3820
 | a4 | 674 | 8.6015 | 11.2630 | 0.0004 | 0.8611 | 3.9426 | 10.2983 | 48.1040
 | a5 | 674 | 14.8793 | 20.6166 | 0.0002 | 1.2688 | 4.8367 | 17.2045 | 68.9720
 | p1 | 674 | 1.0837 | 1.7107 | 0 | 0 | 0 | 1.5708 | 4.7124
 | p2 | 674 | 2.7827 | 1.7889 | 0 | 1.5708 | 1.5708 | 4.7124 | 4.7124
 | p3 | 674 | 3.0321 | 1.5856 | 0 | 1.5708 | 1.5708 | 4.7124 | 4.7124
 | p4 | 674 | 3.0484 | 1.5692 | 1.5708 | 1.5708 | 1.5708 | 4.7124 | 4.7124
 | p5 | 674 | 3.0717 | 1.5704 | 1.5708 | 1.5708 | 1.5708 | 4.7124 | 4.7124
 | c1 | 674 | 0.2182 | 2.3614 | −18.1000 | 0 | 0 | 0 | 16.9500
 | c2 | 674 | 0.4006 | 4.5202 | −12.5750 | −1.0150 | 0 | 0.9438 | 32.2280
 | c3 | 674 | 0.8625 | 6.4169 | −33.1050 | −1.4625 | 0.0531 | 2.4146 | 36.3820
 | c4 | 674 | −0.2290 | 14.1739 | −48.1040 | −4.3403 | 0.1215 | 3.4171 | 41.5710
 | c5 | 674 | −9.4688 | 23.6003 | −68.9720 | −15.6863 | 0.0389 | 3.4262 | 29.2630
 | c6 | 674 | 108.3830 | 52.0642 | 57.2340 | 63.5660 | 73.2605 | 164.9075 | 212.4000
Table 10

Summary statistics of Alpha 101 technical indicators.

Feature | Count | Mean | Std | Min | 25% | 50% | 75% | Max
alpha101_0016740.49030.28010.07610.23320.57260.83170.8317
alpha101_002674−0.10320.4681−0.9301−0.5027−0.12980.24070.9520
alpha101_003674−0.20310.4235−0.9159−0.5350−0.25680.08660.9113
alpha101_004674−5.00002.9679−9.0000−8.0000−5.0000−2.0000−1.0000
alpha101_005674−0.23830.2331−0.9986−0.3308−0.1449−0.0725−0.0014
alpha101_006674−0.19280.4217−0.9581−0.5280−0.24560.10470.8588
alpha101_008674−0.50550.2822−1.0000−0.7475−0.5077−0.2665−0.0014
alpha101_0096740.09844.2992−22.0600−1.2250−0.06501.517525.7000
alpha101_0106740.17034.2970−22.0600−1.225001.517525.7000
alpha101_0116740.48420.32820.00210.21120.44050.68471.5865
alpha101_012674−0.60604.2574−25.7000−1.9875−0.36000.780022.7800
alpha101_013674−0.50190.2918−1.0000−0.7569−0.5111−0.2431−0.0014
alpha101_014674−0.09840.2518−0.8778−0.2466−0.06550.02700.8040
alpha101_015674−1.50240.5887−2.9433−1.9125−1.5041−1.0975−0.1438
alpha101_016674−0.50450.2892−1.0000−0.7555−0.5069−0.2611−0.0014
alpha101_018674−0.49290.2834−1.0000−0.7315−0.4965−0.2462−0.0014
alpha101_020674−0.16810.2042−0.9835−0.2261−0.0929−0.02560.0000
alpha101_0216740.12460.9929−1.0000−1.00001.00001.00001.0000
alpha101_022674−0.00710.3986−1.5397−0.14960.00090.14141.3134
alpha101_023674−1.02504.7441−39.4100−0.76250036.9500
alpha101_024674−5.248110.2483−38.9500−10.4775−4.3400066.2100
alpha101_0256740.50180.28330.00280.26100.51560.74471.0000
alpha101_0276741.000001.00001.00001.00001.00001.0000
alpha101_0296743.33821.44711.00562.14263.29764.49375.9750
alpha101_0306740.12860.07940.01040.07130.12170.17680.5018
alpha101_0336740.50660.28730.00410.25550.50210.75691.0000
alpha101_0346740.51030.28730.00140.26280.51860.75791.0000
alpha101_038674−0.23800.2174−0.9023−0.3456−0.1654−0.0689−0.0009
alpha101_040674−0.18740.2621−0.8739−0.3468−0.1395−0.02800.5930
alpha101_041674−0.04970.6592−3.4969−0.2506−0.02190.17044.0343
alpha101_0426742.13483.88760.00140.57221.15781.964643.0000
alpha101_044674−0.42800.4949−0.9984−0.8268−0.6227−0.10330.9412
alpha101_0466740.13011.3917−6.8900−1.00001.00001.000019.6000
alpha101_0476740.65661.1763−0.9893−0.04730.38441.032211.4341
alpha101_0496740.07883.1977−22.0600−0.48001.00001.000022.7800
alpha101_050674−0.74950.2111−0.9986−0.9376−0.7982−0.6030−0.0236
alpha101_0516740.09433.1974−22.0600−0.44251.00001.000022.7800
alpha101_053674275.959321514.6395−238698.0000−2.47020.00492.5857238700.3418
alpha101_054674−0.42620.2606−1.0000−0.6431−0.4401−0.19880
alpha101_055674−0.25720.4937−0.9841−0.6738−0.36050.11380.9852
alpha101_056674−0.24050.2196−0.9285−0.3718−0.1661−0.0683−0.0005
alpha101_0576740.49116.7141−62.9900−1.06480.31371.808740.5214
alpha101_060674−0.00140.0017−0.0052−0.0026−0.0016−0.00010.0024
alpha101_062674−0.48520.5002−1.0000−1.0000000
alpha101_064674−0.44070.4968−1.0000−1.0000000
alpha101_065674−0.43920.4967−1.0000−1.0000000
alpha101_0686740000000
alpha101_0726744.433125.22510.00570.45470.97361.8481493.0000
alpha101_073674−9.13205.0477−17.0000−13.0000−9.0000−5.0000−1.0000
alpha101_074674−0.49700.5004−1.0000−1.0000000
alpha101_0776740.32260.22730.00140.13410.28480.47690.9862
alpha101_081674−0.52080.4999−1.0000−1.0000−1.000000
alpha101_0836720.867311.0008−70.2675−1.67090.22673.533571.8665
alpha101_0866740000000
alpha101_0886740.50760.28680.00410.25830.51790.75551.0000
alpha101_0926744.13652.52701.00001.00004.00007.00007.0000
alpha101_096674−7.32421.5951−13.0000−8.0000−7.0000−7.0000−1.0000
alpha101_098674−0.00700.4435−0.9738−0.34190.01860.33780.9159
alpha101_099674−0.50000.5004−1.0000−1.0000−0.500000
alpha101_101674−0.01540.5445−0.9999−0.481000.46351.0000
Table 11

Summary statistics of Alpha 191 technical indicators.

Feature | Count | Mean | Std | Min | 25% | 50% | 75% | Max
alpha191_001674−0.14610.5081−0.9890−0.5766−0.18900.25140.9464
alpha191_0036740.669612.2021−68.5500−4.62750.00504.755060.5400
alpha191_004674−0.15430.9888−1.0000−1.0000−1.00001.00001.0000
alpha191_006674−0.49070.2500−0.7510−0.7510−0.2510−0.2510−0.2510
alpha191_0076740.48420.32820.00210.21120.44050.68471.5865
alpha191_0086740.52000.27980.00140.28330.52500.75831.0000
alpha191_009674100.259349.307551.559561.271874.6724146.0503234.5470
alpha191_012674−0.24620.2343−0.9875−0.3390−0.1685−0.0681−0.0001
alpha191_013674−0.04970.6592−3.4969−0.2506−0.02190.17044.0343
alpha191_0146740.000110.0993−62.7200−3.9275−0.12503.272546.5200
alpha191_015674−0.00080.0161−0.1000−0.006000.00550.1000
alpha191_016674−0.74950.2111−0.9986−0.9376−0.7982−0.6030−0.0236
alpha191_0186741.00310.07970.75600.95740.99811.04161.3036
alpha191_019674−0.00010.0735−0.2440−0.0426−0.00190.04000.2329
alpha191_0206740.38978.7536−25.8970−4.9171−0.51644.459437.8756
alpha191_021674−0.06902.4703−11.1087−1.0000−0.27201.16639.1747
alpha191_022674−98.951148.1224−211.2363−144−74.6083−60.4823−54.9735
alpha191_02367451.86507.571230.115046.835752.217555.754279.5787
alpha191_0246740.05446.2956−22.5832−2.7434−0.31852.265428.7783
alpha191_0276743.964262.3864−153.0914−33.9152−6.904742.5038192.7562
alpha191_02867446.005635.7541−17.606612.820741.804777.5327119.6743
alpha191_0296741.9228*1059.4215*105-3.6381*106-1.4198*105-1.4679*1041.9920*1051.0173*107
alpha191_0316740.10356.7481−18.8284−3.9601−0.56963.681923.4269
alpha191_032674−2.00000.5887−2.9433−2.0000−2.0000−1.0000−0.1438
alpha191_0346741.00340.06700.81020.96451.00571.04121.2320
alpha191_0366740.49680.28900.00140.24650.49510.74621.0000
alpha191_037674−216.82445068.2919−23652.8933−170135.96581789.336527750.8772
alpha191_038674−1.00004.7441−39.4100−0.76250036.9500
alpha191_041674−0.48080.1678−1.0000−0.4162−0.4162−0.4162−0.4162
alpha191_042674−0.18740.2621−0.8739−0.3468−0.1395−0.02800.5930
alpha191_0436744.4103*1061.7571*107-7.6977*107-3.8508*1061.5229*1061.0290*1071.1420*108
alpha191_04767421.199415.0507−18.22379.781722.460533.268752.3685
alpha191_048674−0.12430.0834−0.5074−0.1697−0.1145−0.0588−0.0061
alpha191_0496740.49860.21870.05360.32540.51380.67830.9289
alpha191_0506740.00280.4374−0.8578−0.3566−0.02770.34920.8929
alpha191_0516740.50140.21870.07110.32170.48620.67460.9464
alpha191_05367452.373913.419816.666741.666750.000058.333391.6667
alpha191_054674−0.48490.2803−1.0000−0.7224−0.4816−0.2436−0.0057
alpha191_055674−68.8866455.1920−1718.0416−263.6958−69.9731118.93302016.6866
alpha191_05767446.130124.12795.540024.738042.947268.747192.6738
alpha191_05867452.433210.429825.000045.000055.000060.000085.0000
alpha191_0596742.747122.2861−59.3400−9.53251.270010.1375111.3100
alpha191_062674−0.42800.4949−0.9984−0.8268−0.6227−0.10330.9412
alpha191_06367449.283819.25736.251434.988247.413763.127292.2590
alpha191_0656741.00170.04370.85170.97971.00111.02311.2011
alpha191_0666740.02194.3617−16.7405−2.2564−0.11342.067417.4130
alpha191_06767449.959611.387425.315241.574648.564358.042578.7689
alpha191_068674-4.9601*10-73.0429*10-60-4.3910*10-7-6.3494*10-81.5582*10-71.7288*10-5
alpha191_069674−0.32010.2244−0.8354−0.4851−0.3351−0.15170.2583
alpha191_0706741.6154*1081.6840*1081.3939*1075.7772*1079.9917*1071.9075*1081.1817*109
alpha191_07267453.224111.373426.789044.657552.287462.082479.2427
alpha191_0766740.48410.15350.27150.41820.46100.51401.6130
alpha191_07967449.642914.505414.635239.067048.758859.186185.2442
alpha191_08067419.596699.4696−83.3537−30.2964−3.280531.72051200.5022
alpha191_0816745.2474*1063.3050*1061.0482*1062.9795*1064.3135*1066.0649*1061.6267*107
alpha191_08267453.046810.379224.201545.416552.300560.685276.0180
alpha191_083674−0.50450.2892−1.0000−0.7555−0.5069−0.2611−0.0014
alpha191_0846741.5024*1073.2959*107-5.6547*107-4.6367*1069.1038*1062.4192*1071.5472*108
alpha191_0866740.13011.3917−6.8900−1.00001.00001.000019.6000
alpha191_0886741.615217.4095−34.5442−9.4947−0.673911.900158.6935
alpha191_089674−0.09442.8494−10.7743−2.00000.05141.432010.8588
alpha191_090674−0.49810.2895−0.9986−0.7458−0.4986−0.2486−0.0014
alpha191_09367418.912913.84222.71007.875015.260026.400067.8400
alpha191_0956742.1494*1081.8631*1082.8660*1078.2223*1071.3558*1083.0081*1089.0899*108
alpha191_09667446.308527.26985.179623.227839.227467.0904112.4911
alpha191_0976741.8590*1061.6410*1062.0596*1057.8548*1051.3031*1062.3967*1061.0228*107
alpha191_098674−5.248110.2483−38.9500−10.0000−4.3400066.2100
alpha191_099674−0.50190.2918−1.0000−0.7569−0.5111−0.2431−0.0014
alpha191_1006742.1606*1061.7282*1063.0963*1059.3818*1051.6580*1062.7716*1069.9847*106
alpha191_101674−0.49260.5003−1.0000−1.0000000
alpha191_10267449.194210.518831.769441.569046.982754.448891.3050
alpha191_10367444.228535.2438010.000040.000080.000095.0000
alpha191_104674−0.00710.3986−1.5397−0.14960.00090.14141.3134
alpha191_105674−0.20310.4235−0.9159−0.5350−0.25680.08660.9113
alpha191_1066740.256919.8327−62.4900−9.0125−0.44509.067595.0600
alpha191_107674−0.16810.2042−0.9835−0.2261−0.0929−0.02560.0000
alpha191_1096740.99710.13120.69410.90520.97901.06821.6409
alpha191_110674123.028863.220034.900179.7922104.5444148.5177378.9331
alpha191_1116741.0692*1041.5747*106-8.9833*106-5.8243*1052.0852*1046.5441*1058.4296*106
alpha191_112674−2.000038.9669−80.7871−30.7496−5.830031.415581.8079
alpha191_1146720.867311.0008−70.2675−2.00000.22673.533571.8665
alpha191_116674−0.12991.2192−4.2179−0.9192−0.28210.54633.6647
alpha191_118674126.172849.349340.814188.3101119.7050155.0957325.9664
alpha191_1206742.13483.88760.00140.57221.15781.964643.0000
alpha191_1226740.00000.0016−0.0036−0.00090.00000.00100.0053
alpha191_123674−0.50000.5004−1.0000−1.0000−0.500000
alpha191_126674100.225649.570448.283361.327574.5467143.1742250.3000
alpha191_128674−20.3023113.4759−748.4376−58.636815.990948.116394.4449
alpha191_129674−16.000013.7424−81.3600−20.0000−11.0000−6.2450−1.0000
alpha191_1326745.0590*1083.8299*1081.1369*1082.1724*1083.6020*1087.0322*1081.8497*109
alpha191_133674−4.495562.7920−95.0000−65.0000−20.000060.000095.0000
alpha191_1346743.2286*1051.3246*106-2.9744*106-2.1131*105-3.7505*1044.4652*1051.2551*107
alpha191_1356741.05790.24460.76560.94291.00301.10512.6757
alpha191_136674−0.09840.2518−0.8778−0.2466−0.06550.02700.8040
alpha191_137674382.10741046.79951.237623.583378.6708316.106412999.0600
alpha191_139674−0.19280.4217−0.9581−0.5280−0.24560.10470.8588
alpha191_144674000.00000000
alpha191_148674−0.43770.4965−1.0000−1.0000000
alpha191_1506745.2009*1085.1582*1086.2851*1071.9389*1083.2979*1086.5574*1084.6288*109
alpha191_1516741.209016.0095−27.5168−6.4558−0.72986.705364.2788
alpha191_1556744.4702*1034.1572*105-1.8198*106-1.6728*105-1.4131*1041.2096*1052.6387*106
alpha191_156674−0.71810.1761−1.0000−0.8610−0.7318−0.6022−0.1650
alpha191_1576743.50161.42541.00492.26323.49104.71845.9930
alpha191_1586742.43637.9814−21.6317−1.00000.92464.140452.4837
alpha191_1606743.08902.13460.66241.55202.21504.21549.5648
alpha191_1616744.84653.57340.99252.16423.57586.746919.3108
alpha191_162674−1.00000−1.0000−1.0000−1.0000−1.0000−1.0000
alpha191_1636740.50180.28330.00280.26100.51560.74471.0000
alpha191_1646740.00300.011900.00000.00000.00080.1509
alpha191_16767415.709215.03871.35005.90259.940019.270097.5100
alpha191_168674−1.00000.5404−7.2381−1.0000−0.8881−0.6906−0.3760
alpha191_170674−0.25230.3540−0.9930−0.4873−0.2550−0.03612.2602
alpha191_17267438.208421.21745.047919.898234.055954.726788.8475
alpha191_173674104.818451.725447.833765.506978.1889144.6624271.6095
alpha191_1746743.42002.58090.79631.49702.43754.380611.6159
alpha191_1756744.83853.70970.90502.13173.40086.771723.2633
alpha191_1766740.25720.4937−0.9852−0.11380.36050.67380.9841
alpha191_17767439.732934.796205.000030.000075.000095.0000
alpha191_1786743.0366*1043.4830*105-2.5349*106-5.1757*1042.6616*1036.1132*1042.3445*106
alpha191_180672-2.4691*1062.9358*106-1.4949*107-3.5245*106-1.8683*106−40.750060.0000
alpha191_1826740.34470.12570.05000.25000.35000.40000.7000
alpha191_1856740.49740.27670.00280.26100.49860.73480.9821
alpha191_18767451.354243.303011.110021.525034.275060.9200231.9500
alpha191_188674−2.000036.4273−100.0000−27.7794−7.364217.2499204.3086
alpha191_1896743.42043.39090.37891.22202.24084.259223.1947
In dealing with a forecasting task, we suppose that $F_t$ denotes all representative features that serve as input candidates for alternative prediction models at date t. In fact, we can use only the information in $F_t$ at date t to make predictions, or use the information over the l days until that date, which can be represented as $X_t = (F_{t-l+1}, \dots, F_t)$. This means that the features' time window size l can be 1 or larger than 1. Generally speaking, the richer the information, the better the prediction performance; however, if too much historical information is added during training, it becomes counterproductive due to the curse of dimensionality. We discuss the sensitivity of prediction performance to the features' time window size in the experiments conducted in Section 4.1.1.
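Constructing these windows is a simple stacking operation; a sketch (ours), with F an array of per-day feature vectors and y the labels defined in Section 3.4:

```python
import numpy as np

def make_windows(F, y, l):
    """Stack the features of the l days ending at day t into one sample
    X_t = (F_{t-l+1}, ..., F_t) with label y_t."""
    X = np.stack([F[t - l + 1 : t + 1] for t in range(l - 1, len(F))])
    return X, y[l - 1 :]                 # shapes: (n, l, d) and (n,)
```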

Attention enhanced LSTM (LSTM-Attention) prediction model

Once the representative features are obtained, the corresponding labels need to be constructed as input to our model for supervised learning. The labels depend on the specific prediction task. For instance, they are assigned to be the next day's closing price for the task of predicting the next day's closing price; in the task of predicting the price direction, the label is equal to 1 when the next day's closing price is greater than or equal to the price on the current day, and 0 otherwise. The resultant dataset with features and labels is further divided into the training, validation, and testing datasets in a specific ratio. Training a deep neural network is a complex task due to the potential for high dimensionality and nonlinearity. Building on the existing studies discussed in Section 1, we propose an attention enhanced LSTM model (abbreviated as LSTM-Attention) for the prediction of stock closing prices, which is adaptable for multi-source heterogeneous information fusion. The schematic of the LSTM-Attention model is depicted in Fig. 5.
Fig. 5

Schematic of the LSTM-Attention prediction model.

As shown in Fig. 5, heterogeneous features are fed as inputs, then passed through the LSTM layer, the Attention layer, the fully connected layers (Dense), and the activation function layers (ReLU and Sigmoid). The detailed process is summarized in Algorithm 2. During the learning phase of the model, the losses on the training dataset are fed back layer by layer with the help of the back-propagation method [14], which updates the undetermined weights of each layer using a gradient descent based method, such as stochastic gradient descent (SGD) [4], AdaGrad [9], or Adam [22]. In addition, there are many hyper-parameters in the LSTM-Attention model, including the number of neurons in the LSTM and Dense layers, the number of iterations, and the learning rate. The values of these hyper-parameters are selected according to their performance on the validation dataset; further details are described in Section 3.5.1. The LSTM-Attention model can be used to handle both classification and regression tasks, but needs to be adjusted slightly for these two different tasks. Specifically, for the classification task, the loss is generally selected from probabilistic losses, which include binary cross entropy, categorical cross entropy, and KL (Kullback–Leibler) divergence. For the regression task, the true labels must be transformed to values between 0 and 1, because the outputs of the Sigmoid layer range from 0 to 1, and the loss can be selected from regression losses, which cover mean square error, mean absolute error, mean absolute percentage error, cosine similarity, and the like. In this paper, the LSTM-Attention model is used to predict the next day's stock price direction and the next day's closing price of BGI Genomics, which are essentially classification and regression tasks, respectively.
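The layer stack in Fig. 5 can be prototyped in a few lines of Keras. The following is our illustrative sketch, not the released implementation: the sizes n_lstm and n_dense stand in for the hyper-parameters selected in Section 3.5.1, and the attention scoring uses a slightly simplified one-layer form of the alignment function from Section 2.3.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lstm_attention(window, n_features, n_lstm=64, n_dense=32,
                         task="classification"):
    """LSTM layer -> additive attention -> Dense/ReLU -> Dense/Sigmoid."""
    inp = layers.Input(shape=(window, n_features))
    h = layers.LSTM(n_lstm, return_sequences=True)(inp)   # hidden states h_1..h_l
    e = layers.Dense(1, activation="tanh")(h)             # alignment scores e_i
    alpha = layers.Softmax(axis=1)(e)                     # attention weights
    s = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, alpha])
    z = layers.Dense(n_dense, activation="relu")(s)       # fully connected + ReLU
    out = layers.Dense(1, activation="sigmoid")(z)        # both tasks end in a sigmoid
    model = models.Model(inp, out)
    model.compile(optimizer="adam",
                  loss="binary_crossentropy" if task == "classification" else "mse")
    return model
```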

Performance tuning and model evaluation

Hyper-parameters selection

Hyperopt is a Python library for serial and parallel optimization over awkward search spaces for hyper-parameters, which may include real-valued, discrete, and conditional dimensions [3]. Currently, three algorithms are implemented in Hyperopt: random search, the tree of Parzen estimators (TPE), and adaptive TPE. Hyperas is a convenience wrapper around Hyperopt for fast prototyping with Keras models, which enables us to use the functions of Hyperopt without having to learn its syntax. Hence, we adopt Hyperas as the approach for selecting hyper-parameters in our experiments.
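In plain Hyperopt terms, the search Hyperas wraps looks roughly like the following sketch (ours; the search space, epoch count, and data variables X_train, y_train, X_val, y_val are illustrative, and build_lstm_attention is the sketch above):

```python
from hyperopt import fmin, tpe, hp
from tensorflow.keras.optimizers import Adam

space = {
    "n_lstm":  hp.choice("n_lstm", [32, 64, 128]),
    "n_dense": hp.choice("n_dense", [16, 32, 64]),
    "lr":      hp.loguniform("lr", -9, -4),              # learning rate
}

def objective(params):
    model = build_lstm_attention(window, n_features,
                                 n_lstm=params["n_lstm"], n_dense=params["n_dense"])
    model.compile(optimizer=Adam(learning_rate=params["lr"]),
                  loss="binary_crossentropy")
    model.fit(X_train, y_train, epochs=30, batch_size=32, verbose=0)
    return model.evaluate(X_val, y_val, verbose=0)       # validation loss, minimized

best = fmin(objective, space, algo=tpe.suggest, max_evals=50)
```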

Model evaluation

Metrics of classification performance. The correctness of a classification can be evaluated by counting the correctly recognized class examples (true positives, TP), the correctly recognized examples that do not belong to the class (true negatives, TN), and the examples that either were incorrectly assigned to the class (false positives, FP) or were incorrectly not recognized as class examples (false negatives, FN). The metrics Accuracy, Precision, Recall, and F-measure are widely used to evaluate the performance of a classification task such as the prediction of stock price direction. These metrics are defined in Table 1, where Accuracy is the proportion of correctly classified samples in the total data, Precision is the number of correct positive predictions divided by the number of all positive class values returned by the classifier in the test data, and Recall is the number of correct positive predictions divided by the number of all relevant samples, which is also called Sensitivity or the True Positive Rate. F-measure is the harmonic mean of Precision and Recall, reaching its maximum value of 1 when both Precision and Recall equal 1.
Table 1

Metrics of classification performance.

Metric | Expression
Accuracy | (TP + TN) / (TP + FP + TN + FN)
Precision | TP / (TP + FP)
Recall | TP / (TP + FN)
F-measure | (2 × Precision × Recall) / (Precision + Recall)
Metrics of regression performance. Four commonly used statistical evaluation metrics are employed to assess the regression performance of the models of interest: the mean absolute error (MAE), the mean squared error (MSE), the root mean square error (RMSE), and the mean absolute percentage error (MAPE). These metrics are defined in Table 2.
Table 2

Metrics of regression performance, where $p_t$ and $\hat{p}_t$ are the actual and predicted values at time t, respectively, and N represents the number of data sample points.

Metric | Expression
MAE | $\frac{1}{N}\sum_{t=1}^{N}\left|p_t - \hat{p}_t\right|$
MSE | $\frac{1}{N}\sum_{t=1}^{N}\left(p_t - \hat{p}_t\right)^2$
RMSE | $\sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(p_t - \hat{p}_t\right)^2}$
MAPE | $\frac{1}{N}\sum_{t=1}^{N}\left|\frac{p_t - \hat{p}_t}{p_t}\right| \times 100$
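Equivalently, in numpy (our sketch):

```python
import numpy as np

def regression_metrics(p, p_hat):
    """MAE, MSE, RMSE, and MAPE as defined in Table 2."""
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    e = p - p_hat
    mae = np.mean(np.abs(e))
    mse = np.mean(e ** 2)
    rmse = np.sqrt(mse)
    mape = np.mean(np.abs(e / p)) * 100
    return mae, mse, rmse, mape
```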
Metrics of trading strategy performance. Because profits are not proportional to the performance of the classification or regression, we use a simple trading strategy named the long/short strategy, as suggested by Zhou et al. [47], to examine the profitability of our proposed framework. Fig. 6 shows the rules of the long/short strategy. Based on the predicted value from the LSTM-Attention model, this strategy introduces a buy-threshold and a sell-threshold to decide whether to change the position from −1 to 1, or from 1 to −1, or just to keep the current position. Following Zhou et al. [47], in our experiments without considering transaction costs, the values of the buy-threshold and sell-threshold are both set to 0.50.
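The rule can be stated in a few lines (our sketch; y_pred is the model's predicted value series, and positions are +1 for long and −1 for short):

```python
def long_short_positions(y_pred, buy_threshold=0.50, sell_threshold=0.50):
    """Go long when the prediction exceeds the buy-threshold, go short when it
    falls below the sell-threshold, otherwise keep the previous position."""
    position, positions = 0, []
    for p in y_pred:
        if p > buy_threshold:
            position = 1                 # switch to (or stay in) a long position
        elif p < sell_threshold:
            position = -1                # switch to (or stay in) a short position
        positions.append(position)
    return positions
```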
Fig. 6

Graphical illustration of the long/short trading strategy.

Five metrics are chosen to evaluate the trading strategy performance: the Sharpe ratio (SR), the average annual return (PnL), the maximum drawdown (MD), the PnL/MD ratio, and the total number of entered trades (No. of trades). Note that the SR measures the risk-adjusted return, the PnL indicates the average annual return, the MD indicates the largest accumulated loss due to a sequence of drops over the period of investment, and PnL/MD is computed as the PnL divided by the MD. The definitions of SR, MD, PnL, and PnL/MD are listed in Table 3.
Table 3

Metrics of trading strategy performance, where $r_i$ denotes the return of year i, $R_t$ is the accumulated return until date t over a period, and $\mu(R_t)$ and $\sigma(R_t)$ are the corresponding mean and standard deviation of the return $R_t$.

Metric | Expression
SR | $\mu(R_t) / \sigma(R_t)$
MD | $\max_{\tau \in (0, N)}\left(\max_{t \in (0, \tau)}\left(R_t - R_\tau\right)\right)$
PnL | $\left(\prod_{i=1}^{N}(1 + r_i)\right)^{1/N} - 1$
PnL/MD | $\mathrm{PnL} / \mathrm{MD}$

Proposed framework for BGI Genomics processing

In order to promote the understanding and use of our proposed framework, together with the overview in Section 3.1 and the elaborations of how it works in Sections 3.2 to 3.5, we summarize the framework's four main stages and their sequential steps below.

Stage 1. Multi-source heterogeneous data collection
Input: The minimum length of time series h and the number of PRCs L that appear in the ITD method.
1: Collect the daily trading data, including the opening price (open), lagged opening price (lag_open), high price (high), low price (low), closing price (close), previous day's closing price (prev_close), return (r), trading volume (vol), daily average price (avg), total market capitalization (cap), and number of shares outstanding (share), of BGI Genomics for the period from July 14, 2017 to July 21, 2020 from the Wind platform; and crawl the news data for the same period from the Wind platform, Baidu News, and Sina Finance web portals. Assume that the number of trading days during this period is N.
2: Generate the time–frequency data for close using the improved ITD method summarized in Algorithm 1 with the given h and L. The resulting PRCs, residual, IAs, and IFs form the time–frequency features. In this process, the time–frequency features of the first h days are discarded because the length of the time series processed by the ITD method is limited to at least h.
3: Compute the Alpha 101 and Alpha 191 technical indicators.
Output: The trading data, the daily online news data, the time–frequency features, and the Alpha 101 and Alpha 191 technical indicators.

Stage 2. Feature engineering
Input: The dimension k of the space reduced by the PCA method, and the features' time window size l.
1: Exclude the Alpha 101 and Alpha 191 technical indicators whose proportion of missing values exceeds the preset threshold. Then, PCA is used to reduce the remaining Alpha 101 and Alpha 191 technical indicators to k features each.
2: Integrate the news data by date. Then, the TextRank4ZH toolkit is leveraged to extract abstracts from the integrated news, where each abstract has a maximum of ten sentences. Both the SnowNLP and Senta techniques are further applied to compute the sentiments of each sentence of the abstracts. Their statistical mean values and standard deviations by date are calculated, i.e., snow_avg, snow_std, senta_avg, and senta_std.
3: Construct the labels according to the different prediction tasks, where y_t denotes the label of date t. Specifically, in the next day's closing price prediction task, y_t is the next day's closing price; in the next day's closing price direction prediction task, y_t = 1 if the next day's closing price is greater than or equal to the current day's, and y_t = 0 otherwise.
4: Integrate all the features by date. The data for the first h days and the last day are discarded, because the time–frequency features of the first h days and the label of the last day are unavailable. These features are then scored using the standard scoring method, and the final set of features F_t is derived. The features over the l days until date t are X_t = (F_{t−l+1}, …, F_t).
5: Divide the samples into the training, validation, and testing datasets in a given ratio.
Output: The training, validation, and testing datasets.

Stage 3. Training the LSTM-Attention model
Input: The training dataset; the hyper-parameters that appear in the LSTM-Attention model, including the output size of the LSTM layer, the output size of the first Dense layer, the number of iterations, and the learning rate.
1: Select the loss function according to the specific prediction task. In this paper, the mean square error is chosen as the loss function for the regression task, and the binary cross entropy for the classification task.
2: Train the LSTM-Attention model on the training dataset using the general forward and backward feedback method. The forward process is summarized in Algorithm 2, and the back propagation process is described in Section 3.4.
Output: A trained LSTM-Attention model that can make predictions for new samples.

Stage 4. Performance tuning and model evaluation
Input: The validation dataset and the testing dataset.
1: Use the Python toolkit "Hyperas" on the validation dataset for hyper-parameter selection.
2: Set different prediction tasks and use the corresponding metrics listed in Table 1, Table 2, and Table 3 to evaluate the prediction performance on the testing dataset.
Output: The prediction performance in the different prediction tasks.

Empirical results and evaluation

In this section, using chosen values of the minimum length of time series h and the number of PRCs L, we conduct experiments to evaluate the feasibility of our framework for fusing the heterogeneous information related to BGI Genomics. In addition, we discuss the sensitivity of the prediction performance to the features' time window size and to the dimension reduction by the PCA method. Furthermore, we compare the results of our LSTM-Attention prediction model with those of the LR, SVM, GBDT, and original LSTM models, in terms of their performance in predicting the next day's price direction and the next day's closing price, and in developing a trading strategy. Here the original LSTM model is similar to the LSTM-Attention model depicted in Fig. 5 but without the Attention layer. To ensure a fair comparison, the method of determining the hyper-parameters of these models is the same as that for the LSTM-Attention model, as described in Section 3.5.1. All the experiments are performed in Python 2.7.3 on a Dell Precision 5820 tower with an Intel(R) Xeon(R) W-2102 processor (2.90 GHz), 64 GB of memory, and the Ubuntu 18.04.3 operating system. Specifically, the experiments for the LR, SVM, and GBDT models are carried out using the open-source machine learning library Scikit-learn,19 and the original LSTM and our proposed LSTM-Attention models are implemented using the deep learning platform Keras. Note that the L1/L2 regularization term, a general method to avoid overfitting, is a standard configuration option of most models in the Scikit-learn library and the Keras platform, which users can configure flexibly. The source code of this study has been publicly released.20
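Because the regularization configuration is only mentioned in passing, the snippet below illustrates how L1/L2 penalties are typically attached in the two libraries named above; the parameter values are placeholders, not the settings used in the experiments.

from sklearn.linear_model import LogisticRegression
from tensorflow.keras import layers, regularizers

# Scikit-learn: LogisticRegression applies an L2 penalty by default;
# C is the inverse regularization strength.
lr_clf = LogisticRegression(penalty="l2", C=1.0)

# Keras: per-layer L1/L2 penalties are attached via kernel_regularizer.
dense = layers.Dense(16, activation="relu",
                     kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4))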

Preliminary analysis

As shown in Fig. 4, before applying the proposed LSTM-Attention prediction model, we conduct a preliminary analysis to verify the feasibility of fusing heterogeneous information from multi-source input data, to establish the necessity of dimension reduction for filtering noise, to assess the importance of individual features, and to optimize their combination.

The feasibility of multi-source data fusion

After the implementation of data cleaning, feature encoding, and normalization as described in Section 3.3, we first investigate the predictive quality of a single source (i.e., trading data), and then study the added value of features derived from the news data, the time–frequency data, the Alpha 101 technical indicators, the Alpha 191 technical indicators, and their combinations. Based on these single- and double-source datasets, a series of experiments is conducted. Table 4 presents and compares the price direction prediction performance of the LR, SVM, and GBDT models through the metrics defined in Table 1 (i.e., Accuracy, Precision, Recall, and F-measure) using the testing dataset and different sources of input data. For each model in Table 4, the first row reports the results obtained from the trading data alone, and the subsequent rows report the results obtained from the trading data in combination with the news data, the time–frequency data, the Alpha 101 technical indicators, and the Alpha 191 technical indicators, respectively. To investigate the influence of the features' time window size l, we conduct experiments using l = 1 and present the corresponding results in Panel A; Panels B and C show the results using l = 5 and l = 10, respectively.
Table 4

Accuracy, Precision, Recall, and F-measure metrics for the testing dataset derived using the LR, SVM, and GBDT models and different sources of input data, with features' time window sizes l = 1, 5, and 10 in the three panels.

Model  Data sources              Accuracy  Precision  Recall  F-measure

Panel A: features' time window size l = 1
LR     Trading                   0.5072    0.5111     0.6571  0.5750
LR     Trading + News            0.5652    0.5556     0.7143  0.6250
LR     Trading + Time–frequency  0.5797    0.6000     0.7105  0.6506
LR     Trading + Alpha 101       0.5362    0.3778     0.8095  0.5152
LR     Trading + Alpha 191       0.4638    0.2000     0.9000  0.3273
SVM    Trading                   0.4058    0.1111     0.8333  0.1961
SVM    Trading + News            0.4782    0.4222     0.6552  0.5135
SVM    Trading + Time–frequency  0.5217    0.4889     0.6875  0.5714
SVM    Trading + Alpha 101       0.5797    0.7333     0.6600  0.6947
SVM    Trading + Alpha 191       0.4782    0.3111     0.7368  0.4375
GBDT   Trading                   0.4202    0.2667     0.6316  0.3750
GBDT   Trading + News            0.4638    0.4667     0.6176  0.5316
GBDT   Trading + Time–frequency  0.5942    0.7111     0.6809  0.6957
GBDT   Trading + Alpha 101       0.6087    0.5556     0.7813  0.6494
GBDT   Trading + Alpha 191       0.5507    0.5553     0.7059  0.6076

Panel B: features' time window size l = 5
LR     Trading                   0.5882    0.5454     0.7500  0.6316
LR     Trading + News            0.6176    0.5527     0.8214  0.6389
LR     Trading + Time–frequency  0.5882    0.7273     0.6667  0.6957
LR     Trading + Alpha 101       0.6029    0.5000     0.8148  0.6197
LR     Trading + Alpha 191       0.5147    0.4773     0.6774  0.5600
SVM    Trading                   0.4264    0.1364     0.8571  0.2353
SVM    Trading + News            0.5588    0.4772     0.7500  0.5833
SVM    Trading + Time–frequency  0.6176    0.8864     0.6500  0.7500
SVM    Trading + Alpha 101       0.5294    0.7045     0.6200  0.6596
SVM    Trading + Alpha 191       0.3823    0.0909     0.6667  0.1600
GBDT   Trading                   0.5714    0.5538     0.6545  0.6000
GBDT   Trading + News            0.4706    0.4773     0.6176  0.5385
GBDT   Trading + Time–frequency  0.5147    0.5909     0.6341  0.6118
GBDT   Trading + Alpha 101       0.6176    0.9318     0.6406  0.7593
GBDT   Trading + Alpha 191       0.4706    0.2500     0.7857  0.3793

Panel C: features' time window size l = 10
LR     Trading                   0.5672    0.4884     0.7500  0.5915
LR     Trading + News            0.5522    0.3953     0.8095  0.5313
LR     Trading + Time–frequency  0.4179    0.2326     0.6250  0.3390
LR     Trading + Alpha 101       0.5970    0.4651     0.8333  0.5970
LR     Trading + Alpha 191       0.5075    0.3721     0.7273  0.4923
SVM    Trading                   0.4029    0.0930     0.8000  0.1667
SVM    Trading + News            0.4030    0.0930     0.8000  0.1667
SVM    Trading + Time–frequency  0.5224    0.5116     0.6667  0.5789
SVM    Trading + Alpha 101       0.4627    0.3488     0.6522  0.4545
SVM    Trading + Alpha 191       0.5821    0.6977     0.6667  0.6818
GBDT   Trading                   0.3881    0.2558     0.5500  0.3492
GBDT   Trading + News            0.3731    0.2558     0.5238  0.3438
GBDT   Trading + Time–frequency  0.3881    0.2093     0.5625  0.3051
GBDT   Trading + Alpha 101       0.4030    0.4186     0.5455  0.4737
GBDT   Trading + Alpha 191       0.5672    0.6512     0.6667  0.6588
Observing the results displayed in the three panels of Table 4, we find the following. (1) Significant differences exist in the metric values for the LR, SVM, and GBDT models. Comparing the results across the three panels, the GBDT model reaches the highest value of F-measure (i.e., 0.7593) using l = 5 and the combination of trading data and Alpha 101 technical indicators. The results in Panel B also show that all three models achieve the highest value of Accuracy (i.e., 0.6176 for all three models) using the features' time window size l = 5. (2) Data source, model, and features' time window size jointly influence the performance of price direction prediction. Using the Accuracy and F-measure metrics for the three models in Panel A, we can see the value added by news data, because the values of these two metrics for the combination of news data and trading data are larger than the equivalent values using only trading data. These two metrics also identify the value added by time–frequency data in many cases, such as the results derived from all three models in Panel A, the SVM model in Panel B, and the GBDT model in Panel C. Further, for the GBDT model in Panel C, the results on all metrics suggest the importance and added value of the Alpha 191 technical indicators. (3) The results derived using combinations of features from different data sources do not always outperform the results derived from the trading data alone. Indeed, adding too many features may interfere with the identification of the factors relevant for prediction and lead to worse performance. Given the high dimensionality of the input features, such as the two groups of technical indicators, it is necessary to perform feature selection or dimension reduction before invoking a classifier or a regressor.
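As a rough illustration of the protocol behind Table 4, the sketch below trains each baseline classifier on trading features alone and on trading features concatenated with one additional source, then scores the held-out split with the Table 1 metrics. All arrays here are synthetic placeholders, not the BGI Genomics data.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(clf, X_tr, y_tr, X_te, y_te):
    # Fit one classifier and return the four Table 1 metrics on the test split.
    p = clf.fit(X_tr, y_tr).predict(X_te)
    return (accuracy_score(y_te, p), precision_score(y_te, p),
            recall_score(y_te, p), f1_score(y_te, p))

rng = np.random.default_rng(0)
trading = rng.normal(size=(569, 11))      # stand-in for the 11 trading features
news = rng.normal(size=(569, 4))          # stand-in for the sentiment features
y = rng.integers(0, 2, 569)               # stand-in direction labels
tr, te = slice(0, 500), slice(500, 569)   # chronological train/test split

sources = {"trading": trading, "trading+news": np.hstack([trading, news])}
for name, X in sources.items():
    for clf in (LogisticRegression(max_iter=1000), SVC(), GradientBoostingClassifier()):
        print(name, type(clf).__name__, evaluate(clf, X[tr], y[tr], X[te], y[te]))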

The necessity of dimension reduction

Although the individual performance of each group of technical indicators in combination with the trading data has been tested in Table 4, here we further investigate the results obtained when a dimension reduction method is employed. PCA is employed to extract representative features from the two groups of technical indicators by choosing enough eigenvectors to explain a given percentage of the variance in the original data [42]. The results obtained using the LR, SVM, and GBDT models and different groups of input data related to technical indicators, with the PCA method applied, are presented in Table 5. The PCA parameter in the table is the number of principal components, k. Based on the results reported in Table 4, the features' time window size l is experimentally set to 5 here.
Table 5

Accuracy, Precision, Recall, and F-measure metrics for the testing dataset derived from the LR, SVM, and GBDT models using different combinations of features and the PCA method.

Model  Data sources         k (PCA)  Accuracy  Precision  Recall  F-measure
LR     Trading + Alpha 101  2        0.5797    0.5556     0.7353  0.6329
LR     Trading + Alpha 101  3        0.5797    0.6667     0.6818  0.6742
LR     Trading + Alpha 101  4        0.6087    0.5333     0.8000  0.6400
LR     Trading + Alpha 101  5        0.5942    0.6000     0.7297  0.6585
LR     Trading + Alpha 191  2        0.5797    0.5556     0.7353  0.6329
LR     Trading + Alpha 191  3        0.5652    0.5556     0.7143  0.6250
LR     Trading + Alpha 191  4        0.5507    0.4889     0.7333  0.5867
LR     Trading + Alpha 191  5        0.5797    0.5556     0.7353  0.6329
SVM    Trading + Alpha 101  2        0.4928    0.5333     0.6316  0.5783
SVM    Trading + Alpha 101  3        0.4493    0.3111     0.6667  0.4242
SVM    Trading + Alpha 101  4        0.5362    0.6667     0.6383  0.6522
SVM    Trading + Alpha 101  5        0.5942    0.6222     0.7179  0.6667
SVM    Trading + Alpha 191  2        0.3913    0.0889     0.8000  0.1600
SVM    Trading + Alpha 191  3        0.4783    0.3556     0.6956  0.4706
SVM    Trading + Alpha 191  4        0.5942    0.5778     0.7429  0.6500
SVM    Trading + Alpha 191  5        0.5942    0.7111     0.6809  0.6957
GBDT   Trading + Alpha 101  2        0.4638    0.3333     0.6818  0.4478
GBDT   Trading + Alpha 101  3        0.4058    0.3111     0.5833  0.4058
GBDT   Trading + Alpha 101  4        0.6087    0.7333     0.6875  0.7077
GBDT   Trading + Alpha 101  5        0.5797    0.6667     0.6818  0.6742
GBDT   Trading + Alpha 191  2        0.4928    0.5556     0.6250  0.5882
GBDT   Trading + Alpha 191  3        0.4928    0.5556     0.6250  0.5882
GBDT   Trading + Alpha 191  4        0.4927    0.5556     0.6250  0.5882
GBDT   Trading + Alpha 191  5        0.5507    0.7111     0.6400  0.6737
As shown in Table 5, out of the different combinations, the highest observed value of Accuracy (i.e., 0.6087) is achieved by both the LR and GBDT models with the combination of trading data, Alpha 101 technical indicators, and PCA(4). The highest value of F-measure (i.e., 0.7077) is obtained by the GBDT model with the same combination. However, the optimal F-measure scores of the individual models (i.e., 0.6742 for LR, 0.6957 for SVM, and 0.7077 for GBDT) are achieved with different combinations of technical indicators and numbers of principal components. Overall, these findings indicate that models dealing with multi-source heterogeneous information fusion must be designed to filter noise, so that the classification task for real-time price direction prediction interacts well with the chosen dimension reduction method. Among all the experiments, those conducted with PCA(5) exhibit relatively better performance in terms of the Accuracy and F-measure metrics.
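A minimal sketch of the dimension reduction step evaluated in Table 5 follows: PCA with k principal components fitted to the technical indicators before classification, with k = 5 mirroring the relatively well-performing setting noted above. The indicator matrix here is synthetic.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X_alpha = rng.normal(size=(500, 101))   # stand-in for the Alpha 101 indicator matrix
y = rng.integers(0, 2, 500)             # stand-in direction labels

clf = make_pipeline(StandardScaler(), PCA(n_components=5), GradientBoostingClassifier())
clf.fit(X_alpha, y)
print(clf.named_steps["pca"].explained_variance_ratio_)  # variance retained per component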

The importance of individual features

Furthermore, after reducing the dimensionality of the feature space, we can rank the features based on a measure of the importance or contribution of each feature used in the GBDT model.21 Fig. 7 presents the importance score of each feature, calculated as the weight of the number of times the feature is used to split the data across all trees of the GBDT model; a larger score indicates a relatively more important feature. The features' time window size l is experimentally set to 1 and the number of principal components k produced by the PCA method is 5 here. Two features stand out for forecasting the next day's stock price direction: the second principal component derived from the Alpha 101 technical indicators and the standard deviation of the sentiment scores derived from the news data by the Senta technique. In addition, some features from the other categories of input data also contribute to the prediction, with scores of around 0.05.
Fig. 7

The relative importance of features based on how many times each feature is used to split data in the GBDT model.

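For readers who wish to reproduce a Fig. 7-style ranking, the sketch below counts how often each feature is chosen as a split variable across all trees of a fitted scikit-learn GBDT and normalizes the counts to weights. This manual count mirrors the split-frequency score described above; note that scikit-learn's built-in feature_importances_ is impurity-based, so it is not used here. Data is synthetic.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X, y = rng.normal(size=(400, 10)), rng.integers(0, 2, 400)

gbdt = GradientBoostingClassifier().fit(X, y)
counts = np.zeros(X.shape[1])
for stage in gbdt.estimators_:               # each boosting stage holds one regression tree
    tree = stage[0].tree_
    used = tree.feature[tree.feature >= 0]   # negative values mark leaf nodes
    np.add.at(counts, used, 1)               # tally split occurrences per feature
scores = counts / counts.sum()               # relative importance, as in Fig. 7
print(scores)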

Performance of models forecasting the next day’s price direction

In this subsection, following the general scheme of the algorithm provided in Section 3.6, we examine the performance of the alternative models in forecasting the next day's price direction, including the LR, SVM, GBDT, LSTM, and LSTM-Attention models. After performing the individual experiment ten times for each model, we compute the average estimates and standard deviations of the Accuracy, Precision, Recall, and F-measure metrics, reported in Table 6. The features' time window size l is experimentally set to 5 and the number of principal components k produced by the PCA method is 5 here.
Table 6

Average estimates and standard deviations of the Accuracy, Precision, Recall, and F-measure metrics from ten repetitions of experiments using the testing dataset and the LR, SVM, GBDT, LSTM, and LSTM-Attention models for prediction of price direction.

Model           Accuracy       Precision      Recall         F-measure
LR              0.5015±0.0108  0.5341±0.0432  0.6370±0.0022  0.5801±0.0248
SVM             0.5147±0.1133  0.5909±0.3691  0.6178±0.0458  0.5511±0.2344
GBDT            0.5279±0.0633  0.5068±0.1360  0.6989±0.0848  0.5728±0.0880
LSTM            0.5853±0.0710  0.7500±0.2216  0.6585±0.0272  0.6847±0.1145
LSTM-Attention  0.6353±0.0267  0.8864±0.0988  0.6642±0.0164  0.7568±0.0329
From the results in Table 6, we can conclude that the highest values of Accuracy and F-measure are achieved by the LSTM-Attention model, with the original LSTM model the runner-up. This finding indicates that an LSTM classifier enhanced with an attention mechanism has great potential for forecasting stock price direction from signals derived from multi-source heterogeneous information.
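The repetition protocol behind Table 6 can be sketched as follows: train and evaluate ten times with different random seeds, then report the mean ± standard deviation of each metric. The run_once stub below is a placeholder for one full train/predict cycle of any of the five models.

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def run_once(seed):
    # Placeholder for one full train/predict cycle (e.g., of LSTM-Attention).
    rng = np.random.default_rng(seed)
    y_true = rng.integers(0, 2, 68)   # 68 testing days
    y_pred = rng.integers(0, 2, 68)
    return accuracy_score(y_true, y_pred), f1_score(y_true, y_pred)

results = np.array([run_once(s) for s in range(10)])
for name, col in zip(["Accuracy", "F-measure"], results.T):
    print(f"{name}: {col.mean():.4f} ± {col.std():.4f}")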

Performance of models forecasting the next day’s closing price

We additionally evaluate the models' performance in forecasting the next day's closing price, using the metrics defined in Table 2, that is, MAE, MSE, RMSE, and MAPE. Table 7 presents the values of these metrics for the differences between the real closing prices and the average estimates from ten repetitions of experiments using the testing dataset and the SVM, GBDT, LSTM, and LSTM-Attention models. The LR model is not considered here because it performs classification rather than regression. The features' time window size l is experimentally set to 5 and the number of principal components k produced by the PCA method is 5 here.
Table 7

MAE, MSE, RMSE, and MAPE for the differences between the real closing prices and average estimates from ten repetitions of experiments using the testing dataset and the SVM, GBDT, LSTM, and LSTM-Attention models.

Model           MAE      MSE       RMSE     MAPE
SVM             10.6655  159.0070  12.6098  0.0882
GBDT            11.3953  200.9968  14.1773  0.0892
LSTM            7.2207   83.8763   9.1584   0.0578
LSTM-Attention  6.6329   73.1443   8.5524   0.0518
Comparing the results of these models in Table 7, we see that the LSTM-Attention model achieves the most accurate regression results overall, with the lowest values of MAE, MSE, RMSE, and MAPE, indicating that the attention mechanism captures valuable information that may be missed by the original LSTM, SVM, and GBDT models. To validate these regression results, since point estimates may be biased, a natural step is to produce confidence intervals and further examine the forecasting accuracy. Fig. 8 shows the average estimates from ten repetitions of experiments for the SVM, GBDT, LSTM, and LSTM-Attention models, together with 95.44% confidence intervals (i.e., ±2 standard deviations), compared with the real closing prices during the testing period. The red line in each figure is the real closing price during the testing period and the black line represents the average estimate over the ten repetitions; the 95.44% confidence intervals are shown by the shaded areas. The regression comparisons in these figures and Table 7 show that the LSTM-Attention model can be considered the most accurate model for the regression task, because its average predicted prices (the point estimates) and confidence intervals (the interval estimates) are the most similar to the real closing prices.
Fig. 8

Average estimates of ten repetitions of experiments for the SVM, GBDT, LSTM, and LSTM-Attention models and the corresponding 95.44% confidence intervals (±2 standard deviations), compared with the real closing price during the testing period.

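The interval estimates in Fig. 8 can be reproduced along the following lines: the 95.44% band corresponds to the mean prediction ± 2 standard deviations across the ten repetitions. The prediction matrix here is synthetic.

import numpy as np
import matplotlib.pyplot as plt

preds = np.random.default_rng(0).normal(100.0, 3.0, size=(10, 68))  # ten runs x 68 test days
mean, std = preds.mean(axis=0), preds.std(axis=0)

days = np.arange(preds.shape[1])
plt.plot(days, mean, color="black", label="average estimate")
plt.fill_between(days, mean - 2 * std, mean + 2 * std, alpha=0.3, label="95.44% interval")
plt.legend()
plt.show()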

Profitability of different models with long/short strategy

Table 8 shows the SR, PnL, MD, PnL/MD, TradeCount, and Trading Days of the long/short strategy using the alternative models, together with those of the "buy-and-hold strategy" (denoted Benchmark) and the "ex post trading strategy" (denoted Ex post). Although finding the best trading rules based on model features is not the main purpose of this paper, such an exercise must account for additional considerations such as transaction costs. We therefore present results in Panel B calculated with transaction costs included, taken as 0.3% per trade for simplicity.
Table 8

SR, PnL, MD, PnL/MD, TradeCount, and Trading Days of the long/short strategy using the LR, SVM, GBDT, LSTM, and LSTM-Attention models, together with those of the "buy-and-hold strategy" (denoted Benchmark) and the "ex post trading strategy" (denoted Ex post), for the testing dataset.

Model           SR     PnL     MD    PnL/MD  TradeCount  Trading Days

Panel A: without transaction costs
Ex post         12.62  328.49  0.00  Inf     18          68
Benchmark       4.05   12.24   0.18  68.00   1           68
LR              5.52   6.66    0.03  222.00  14          68
SVM             5.69   16.69   0.14  119.21  4           68
GBDT            4.81   2.33    0.06  38.83   14          68
LSTM            6.42   18.78   0.06  313.00  10          68
LSTM-Attention  6.61   26.96   0.10  269.60  6           68

Panel B: with transaction costs
Ex post         11.68  217.66  0.00  Inf     18          68
Benchmark       4.01   11.93   0.18  66.28   1           68
LR              4.70   4.47    0.05  89.40   14          68
SVM             5.13   8.94    0.09  99.33   7           68
GBDT            3.65   1.41    0.07  20.14   14          68
LSTM            6.36   11.72   0.06  195.33  12          68
LSTM-Attention  5.88   16.05   0.11  145.91  5           68
Comparing the results in the two panels, we find that the values of SR, PnL, and PnL/MD in Panel B are lower than the corresponding values in Panel A, suggesting that transaction costs weaken trading performance. Following the inclusion of transaction costs, the SR of the GBDT model decreases the most and that of the LSTM model the least. Without transaction costs, in Panel A, the original LSTM model records the highest PnL/MD and the LR model records the lowest MD (excluding the Ex post figures). Additionally, according to the SR and PnL metrics, the LSTM-Attention model in Panel A yields the best long/short strategy results among the prediction models. Taking transaction costs into account, the highest values of SR and PnL/MD are achieved by the original LSTM model, and the highest PnL by the LSTM-Attention model. Overall, the results highlight the potential of the original and attention enhanced LSTM models for developing higher quality trading rules and a more profitable trading system. Furthermore, Fig. 9 shows the evolution of the trading performance as a time series during the testing period, based on the predicted values of the LR, SVM, GBDT, LSTM, and LSTM-Attention models. The performance metrics are those presented in Table 8, and results from the buy-and-hold strategy (i.e., Benchmark) are included for comparison. The normalized compounded profits of the long/short strategy without transaction costs (red line) and with transaction costs (blue line), and of the buy-and-hold strategy (black line), are presented in the top panel of each part of the figure. The other panels show the buy and sell signals of the long/short strategy, without transaction costs in the middle panels and with transaction costs in the bottom panels.
Fig. 9

Time evolution of the trading performance and buy and sell signals using the LR, SVM, GBDT, LSTM, and LSTM-Attention prediction models. Top panels of each part: Time evolution of the trading performance in the testing period based on the predictions of individual models, for the strategy without transaction costs (red line), the strategy with transaction costs (blue line), and for comparison, the buy-and-hold strategy (benchmark strategy, black line). Middle panels: Buy and sell signals of the prediction models’ trading strategies without transaction costs. Bottom panels: Buy and sell signals of the prediction models’ trading strategies with transaction costs.

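A hedged sketch of the long/short evaluation underlying Table 8: go long when the model predicts an up day and short otherwise, charging a proportional cost on each position change. SR, PnL, and MD follow their usual definitions here; the exact formulas used in the paper may differ in detail, and the return and signal series are synthetic.

import numpy as np

def long_short_metrics(returns, signals, cost=0.0):
    pos = np.where(signals == 1, 1.0, -1.0)            # +1 long, -1 short
    trades = np.abs(np.diff(pos, prepend=pos[0])) / 2  # number of position changes
    strat = pos * returns - cost * trades              # daily strategy returns net of costs
    equity = np.cumprod(1 + strat)
    sr = np.sqrt(252) * strat.mean() / strat.std()     # annualized Sharpe ratio (SR)
    pnl = equity[-1] - 1                               # cumulative profit and loss (PnL)
    md = np.max(1 - equity / np.maximum.accumulate(equity))  # maximum drawdown (MD)
    return sr, pnl, md, int(trades.sum())

rng = np.random.default_rng(0)
r = rng.normal(0.001, 0.02, 68)                        # synthetic daily returns, 68 test days
sig = (r + rng.normal(0, 0.02, 68) > 0).astype(int)    # noisy stand-in "predictions"
print(long_short_metrics(r, sig, cost=0.003))          # 0.3% per trade, as in Panel B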

Conclusion

The topic of forecasting stock prices using deep learning interests many researchers and investors because improved prediction accuracy can be expected to bring substantial profits. The recent availability of enormous amounts of both data and computing power has created new opportunities for prediction. This paper combines the attention mechanism with the LSTM model to extract features from multiple sources of data and investigates their joint impact on the performance of stock price prediction and trading strategies in the case of BGI Genomics. Examining different combinations of features based on multiple sources of data, incorporating daily trading data, online news, technical indicators derived from trading data, and time–frequency features decomposed by the ITD method from closing prices, we identify the best-performing subset of features for information fusion and prediction. We also develop a framework integrating various data preprocessing techniques and examine its learning capabilities for specific tasks, including forecasting the next day's price direction and the next day's closing price, and analyzing the benefits for a long/short trading strategy. In terms of statistical accuracy and trading performance, compared with the LR, SVM, GBDT, and original LSTM models, the experimental results and in-depth analyses for BGI Genomics show that the attention enhanced LSTM model achieves remarkable improvements in prediction performance through multi-source heterogeneous information fusion, and that models including news-related and time–frequency features often achieve the best performance metrics, indicating the relevance of online news and time–frequency features for prediction and validating our proposed framework. Owing to limitations on paper length, this study conducts experiments only on data for BGI Genomics. Because numerical and textual data are the two main types processed in this paper, our approach can be applied to any company for which these two data types are available. In practical applications, our framework also provides various options for analyzing other stocks or assets, adopting other information fusion methods, developing other trading rules, or using other types of data from different sources.

CRediT authorship contribution statement

Qun Zhang: Resources, Formal analysis, Visualization, Writing - original draft, Writing - review & editing, Project administration. Lijun Yang: Methodology, Funding acquisition. Feng Zhou: Conceptualization, Data curation, Software, Investigation, Validation, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Algorithm 1: Improved ITD method for constructing time–frequency features from the closing price series.
Input: The size of the daily closing price time series N, the minimum length of time series h (h<N), the number of PRCs L.
1: Initialize the PRCs c_1, c_2, …, c_L = {}, the residual c_{L+1} = {}, the IFs p_1, p_2, …, p_L = {}, and the IAs a_1, a_2, …, a_L = {}.
2: for t = h, …, N do
3:   Decompose the subsequence sub_close ≡ {close_i}_{i=1}^{t} into L PRCs and compute the corresponding IA and IF for each PRC by the ITD method. Here, PRC_j, sub_IA_j, and sub_IF_j represent the j-th PRC, IA, and IF, respectively, j = 1, 2, …, L.
4:   Compute the residual of the subsequence sub_close, i.e., Res = sub_close − Σ_{j=1}^{L} PRC_j.
5:   Obtain the time–frequency features at time t, that is, c_{1,t} = PRC_{1,t}, …, c_{L,t} = PRC_{L,t}, c_{L+1,t} = Res_t; a_{1,t} = sub_IA_{1,t}, …, a_{L,t} = sub_IA_{L,t}; and p_{1,t} = sub_IF_{1,t}, …, p_{L,t} = sub_IF_{L,t}.
6: end for
Output: The time–frequency features c_1, c_2, …, c_{L+1}; a_1, a_2, …, a_L; p_1, p_2, …, p_L.
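The rolling construction in Algorithm 1 can be sketched as below, assuming a decomposition routine itd_decompose(series, L) -> (prcs, ias, ifs) is available; that routine and its signature are assumptions for illustration, not part of the paper's released code.

import numpy as np

def rolling_itd_features(close, h, L, itd_decompose):
    # Build one row of (PRCs, residual, IAs, IFs) per day t = h, ..., N,
    # following steps 2-6 of Algorithm 1.
    N = len(close)
    feats = []
    for t in range(h, N + 1):
        sub_close = close[:t]
        prcs, ias, ifs = itd_decompose(sub_close, L)   # assumed: L arrays of length t each
        res = sub_close - np.sum(prcs, axis=0)         # residual after removing the L PRCs
        row = [p[-1] for p in prcs] + [res[-1]]        # c_{1,t}, ..., c_{L,t}, c_{L+1,t}
        row += [a[-1] for a in ias] + [f[-1] for f in ifs]  # a_{j,t} and p_{j,t}
        feats.append(row)
    return np.asarray(feats)                           # shape: (N - h + 1, 3L + 1)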
Algorithm 2: Forward process of the LSTM-Attention prediction model.
Input: The training samples {X̂_t, Y_t}_{t=l}^{N}, where X̂_t ≡ [X_{t−l+1}, …, X_t] and Y_t represent the features and the label of the t-th sample, respectively, and l is the features' time window size; o_1 is the number of neurons in the LSTM layer; and o_2 is the number of neurons in the first Dense layer.
1: Compute the outputs Y_i^{LSTM} of size o_1 of the LSTM layer with the input X̂_t by Eq. (9): (Y_i^{LSTM}, H_i, C_i) = LSTM(X_{t−l+i}, H_{i−1}, C_{i−1}), i = 1, 2, …, l.
2: Calculate the output Y^{Attention} of size o_1 of the Attention layer with the inputs {H_i}_{i=1}^{l} by Eq. (13): Y^{Attention} = Attention(H_1, H_2, …, H_l).
3: Compute the output Y^{Dense1} of size o_2 of the first nonlinear Dense layer with the input Y^{Attention}: Y^{Dense1} = ReLU(W^{Dense1} Y^{Attention} + b^{Dense1}), where W^{Dense1} of size o_1 × o_2 contains the undetermined weights, b^{Dense1} denotes the bias, and ReLU(·) denotes the ReLU nonlinear activation function.
4: Similar to step 3, obtain the output Y^{Sigmoid} of the second nonlinear Dense layer with the input Y^{Dense1}, i.e., Y^{Sigmoid} = σ(W^{Dense2} Y^{Dense1} + b^{Dense2}), where σ(·) denotes the sigmoid function.
5: Calculate the loss Loss(Y^{Sigmoid}, Y_t) between the predicted result Y^{Sigmoid} and the true label Y_t for the t-th sample, where the loss function is selected according to the specific prediction task.
Output: The prediction of the LSTM-Attention model for the t-th sample, Y^{Sigmoid}.
References

1.  Nonlinear dimensionality reduction by locally linear embedding.

Authors:  S T Roweis; L K Saul
Journal:  Science       Date:  2000-12-22       Impact factor: 47.728

2.  Long short-term memory.

Authors:  S Hochreiter; J Schmidhuber
Journal:  Neural Comput       Date:  1997-11-15       Impact factor: 2.026

Review 3.  Deep learning in neural networks: an overview.

Authors:  Jürgen Schmidhuber
Journal:  Neural Netw       Date:  2014-10-13

4.  A strategy combining intrinsic time-scale decomposition and a feedforward neural network for automatic seizure detection.

Authors:  Lijun Yang; Sijia Ding; Hao-Min Zhou; Xiaohui Yang
Journal:  Physiol Meas       Date:  2019-09-30       Impact factor: 2.833

Review 5.  Computational modelling of visual attention.

Authors:  L Itti; C Koch
Journal:  Nat Rev Neurosci       Date:  2001-03       Impact factor: 34.870

