
A study of the impact of COVID-19 on the Chinese stock market based on a new textual multiple ARMA model.

Weijun Xu¹, Zhineng Fu¹, Hongyi Li², Jinglong Huang¹, Weidong Xu³, Yiyang Luo⁴.

Abstract

Coronavirus disease 2019 (COVID-19) has caused violent fluctuations in stock markets and led to heated discussion in stock forums. The rise and fall of any specific stock is influenced by many other stocks and by the emotions expressed in forum discussions. Considering the transmission effect of emotions, we propose a new Textual Multiple Autoregressive Moving Average (TM-ARMA) model to study the impact of COVID-19 on the Chinese stock market. The TM-ARMA model contains a new cross-textual term and a new cross-autoregressive (AR) term that measure the cross impacts of textual emotions and price fluctuations, respectively, and the adjacency matrix that measures the relationships among stocks is updated dynamically. We compute the textual sentiment scores by an emotion dictionary-based method, and estimate the parameter matrices by a maximum likelihood method. Our dataset includes the textual posts from the Eastmoney Stock Forum and the price data for the constituent stocks of the FTSE China A50 Index. We apply a sliding-window online forecast approach to simulate real trading. The results show that TM-ARMA performs very well even after the outbreak of COVID-19.


Keywords: COVID‐19; FTSE China A50 Index; multiple ARMA; stock forum; textual sentiment

Year: 2022 PMID: 35603041 PMCID: PMC9111149 DOI: 10.1002/sam.11582

Source DB: PubMed Journal: Stat Anal Data Min ISSN: 1932-1864 Impact factor: 1.247

INTRODUCTION

At the end of 2019, coronavirus disease 2019 (COVID-19) broke out suddenly and caused serious damage to the economy of China and of the whole world. Financial markets fluctuated violently in the following months and caused huge losses to investors. Four circuit breakers were triggered in the U.S. stock market, and the Shanghai Composite Index fell sharply, fluctuating violently between 3127.17 and 2646.8. The spread of panic was the main reason for the stock market crash. The rise and fall of any specific stock is also influenced by the fluctuations of many other stocks. Web textual information has great influence on the emotions of investors [3]. Discussions in stock forums affect the emotions of the investors who take part in them or browse the posts, and these investors spread the sentiment to others. Investors and stocks thus form a complex system, which leads to herd behavior.

Several studies have examined the impact of COVID-19 on stock markets. Al-Awadhi et al. [2], using panel data, found that the numbers of confirmed cases and deaths from COVID-19 have a significant negative impact on Chinese stocks. Zhang et al. [25] showed that COVID-19 increased stock market risk all over the world. Akhtaruzzaman et al. [1] showed that the conditional correlations between the stock returns of G7 countries and China increased significantly, as a result of financial contagion. However, this literature rarely examines textual sentiment or the propagation effect among investors and stocks. The Chinese stock market was one of the earliest stock markets hit by COVID-19, so it is meaningful to study the impact of COVID-19 on it. To better measure the impact of investors' emotions, we take textual information into account.
Our textual information is collected from the Eastmoney Stock Forum (https://guba.eastmoney.com), a leading stock forum in China. There is a specific stock forum for every stock listed in the Chinese stock market, but some forums are inactive and contain relatively little textual information. Fortunately, the forums for most large capital stocks are active, because many investors pay attention to them. Considering the average number of posts per stock forum, we choose the constituent stocks of the FTSE China A50 Index as the research object. Our dataset therefore consists of the posts in the Eastmoney Stock Forum and the price data for the constituent stocks of the FTSE China A50 Index from January 1, 2019 to December 31, 2020. We compute the daily textual sentiment score for every stock by an emotion dictionary-based method. To examine how the Chinese stock market changed after the outbreak of COVID-19, we compare the excess returns of the prediction models in the two stages.

The Autoregressive Moving Average (ARMA) model is commonly used for price prediction. Considering the cross impact among stocks and textual information, we propose a new Textual Multiple ARMA (TM-ARMA) model. TM-ARMA constructs a new multiple cross-autoregressive term and a new multiple cross-textual term to measure the cross impact of price fluctuations and textual emotions, respectively. The adjacency matrix that measures the relationships among stocks is very important for multiple ARMA models. Unlike the Vector Autoregression (VAR), Vector ARMA (VARMA) and Vector ARMA with eXogenous regressors (VARMAX) models [17], which treat the adjacency matrix as an all-one matrix, TM-ARMA updates the adjacency matrix every day with the correlation coefficient matrix of the return series over the last 20 trading days, which reflects the relationships among stocks more accurately.
To estimate the parameter matrices, we construct a log-likelihood objective function for the prediction errors, and apply matrix-form gradient descent methods based on the matrix partial derivatives provided by Petersen and Pedersen [11].

ARMA and its extended models are widely used for financial prediction [6, 16]. Many researchers value the interactions between assets and have improved multivariate ARMA models. Thornton [21] proposed a mixed-frequency multiple ARMA model by employing mixed stock flow data, and it performs well in simulation. Lennon and Yuan [13] proposed a multivariable ARMA model combining digitized Gaussian and Monte Carlo Expectation Maximization method (EM-ARMA), which was shown to be effective in simulation.

Textual information from stock forums has been used in sentiment research in recent years. Yao et al. [23] constructed an investor attention index from the posts of the Eastmoney Stock Forum and the Hexun Stock Forum. Yang et al. [22] used the textual information from the Eastmoney Stock Forum to build a panic sentiment index, and studied the impact of textual sentiment on crash risk. Emotion dictionaries are widely used for the computation of sentiment scores. Li et al. [14] mapped the words from the news into a sentiment space of four different sentiment dictionaries, and built sentiment scores for stocks in Hong Kong. Groß-Klußmann et al. [9] proposed an unsupervised learning method based on the positive and negative word lists from emotion dictionaries to build sentiment scores of microblog messages. Siering [20] applied a dictionary-based method that computes the textual sentiment from the numbers of positive and negative words.

Maximum likelihood estimation and gradient descent methods are widely applied to estimate parameters [5, 12, 15], and are also commonly used in extended ARMA models. Gong et al. [8] modified the conventional two-stage maximum likelihood estimation method with a non-Gaussian QML estimator for the ARMA-GJR-GARCH process. Lennon and Yuan [13] estimated the parameters of their proposed multivariable ARMA model by a log-likelihood function and partial derivatives.

Grid search is a common way of searching over the possible hyperparameter combinations, especially when the number of combinations is not very large [7, 18, 19, 24]. The performance of models is fully evaluated by grid search. It is also widely used in some extended ARMA models [10].

To make our empirical study consistent with real-world trading, we propose a sliding-window online approach. Before 9:30, the model forecasts the price change rates of all 50 stocks for the new day, selects the stocks that are predicted to rise, and puts them into a portfolio. At 9:30, the strategy suggests buying all the stocks in the portfolio with equal funds and holding them until 9:30 of the next trading day. Transaction fees and stamp duties are taken into account.

The rest of this paper is arranged as follows. Section 2 discusses our TM-ARMA model and its solutions for the parameter matrices. Section 3 describes the empirical study. Section 4 analyses the empirical results. Section 5 is the conclusion.

PROPOSED MODEL

TM‐ARMA model

We denote $y_t$ as the daily returns for the $N$ stocks at day $t$, and $\hat{y}_t$ is the predicted vector of $y_t$, where , and $T$ is the number of trading days. Considering the transmission effect of trading sentiment in the stock market and of textual investor sentiment in stock forums, we construct a new cross-AR term and a new cross-textual term, respectively. We propose a new TM-ARMA model as shown in Equation (1), where is the Hadamard product operator, $p$, $q$ and $r$ are hyperparameters, are parameter vectors. is a dynamic adjacency matrix. The larger the , the greater the cross impact of stock on stock . $x_t$ is an exogenous vector that reflects the textual sentiment of each stock at day $t$. On the right-hand side of Equation (1), the first part is the intercept vector. The second part is a multiple autoregressive term of order $p$, which measures the cross trading sentiment among stocks in prices. The third part consists of the moving average terms of order $q$. The fourth part is an exogenous cross-textual term of order $r$, which measures the cross textual sentiment in stock forums. The adjacency matrices and play important roles in the cross-AR term and the cross-textual term. The elements in and measure the cross impact between two stocks. We denote $Y$ as the series of daily returns, $X$ as the series of textual sentiment, and $E$ as the series of prediction errors. denotes the daily returns from to , where . Similarly, and denote the textual sentiment and prediction error from to , respectively. It is well known that the strength of the interaction between stocks changes dynamically in response to market sentiment in the past few weeks. To better adapt to these dynamic changes, we set as the correlation coefficient matrix of the return series in the past $W$ trading days, and as the correlation coefficient matrix of the stocks' sentiment index series in the past $W$ trading days, as shown in Equation (2), where is the function that calculates the correlation coefficient matrix, and $W$ is the length of the training window. and are updated daily.
To better understand the model, we write Equation (1) as Equation (3), where are unit vectors. In , , are parameter matrices. , , , , and are the $i$th columns in , , , , and , respectively. Equation (3) also makes clear how TM-ARMA improves on VARMA and VARMAX. Firstly, the adjacency matrix is an all-one constant matrix in VARMA and VARMAX, but in TM-ARMA is dynamically updated every day. In VARMA and VARMAX, the relationships among stocks are constant and the historical data of all stocks are of equal importance when forecasting the return of one specific stock. However, the short-term relationships among stocks change over time with market sentiment. Secondly, in VARMA and VARMAX, the third term on the right-hand side is a cross-MA term that is similar in form to the cross-AR term, and it combines the prediction errors of all the stocks when forecasting. Our experiments show that the cross-MA term introduces large, unstable noise and makes the predicted values unstable. To avoid this noise, TM-ARMA applies a single MA term that considers only the errors of the stock being predicted, as in Equation (3).
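
As a minimal illustration of the daily adjacency update described above, the following sketch computes the correlation coefficient matrix of the returns over a trailing window. It is not the authors' code: the function name `rolling_correlation_adjacency`, the (N, T) array layout, and the handling of zero-variance (for example, suspended) stocks are assumptions made for this example.

```python
import numpy as np


def rolling_correlation_adjacency(returns: np.ndarray, t: int, window: int = 20) -> np.ndarray:
    """Correlation matrix of the `window` days of returns that end just before day t.

    returns: array of shape (N, T), one row per stock (assumed layout).
    """
    if t < window:
        raise ValueError("need at least `window` past days before day t")
    block = returns[:, t - window:t]  # shape (N, window), strictly past data
    # A stock with constant returns in the window has zero variance and yields NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        adjacency = np.corrcoef(block)  # rows are treated as variables -> (N, N)
    # Assumption: treat such stocks as unrelated to all others (entries set to 0).
    return np.nan_to_num(adjacency, nan=0.0)


rng = np.random.default_rng(0)
Y = rng.normal(size=(4, 30))          # synthetic returns for 4 stocks, 30 days
A = rolling_correlation_adjacency(Y, t=25, window=20)
```

Because the matrix is recomputed from the trailing window each day, the same call with `t + 1` gives the next day's matrix; the sentiment series could be passed in place of `Y` in the same way.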

Textual sentiment

As the posts in stock forums do not carry sentiment polarity labels, supervised learning methods cannot be applied. In this paper, the textual sentiment series in TM-ARMA is computed from emotion dictionaries. We have a list of positive words and a list of negative words, which share no words and are described in detail in Section 3.1.1. We assume that there are posts in the stock forum of stock in day (from 9:30 in day to 9:30 of the next trading day), and we denote the hth post as . Then the sentiment score of stock in day is shown in Equation (4), where is the function that computes the sentiment score of a post. In a specific post , we assume that there are words in the positive list and words in the negative list. Following Siering [20], we compute the sentiment score of the post as shown in Equation (5).
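
The dictionary-based scoring described above can be sketched as follows. Equations (4) and (5) are not reproduced in this text, so the sketch assumes a common polarity form, (positive count − negative count) / (positive count + negative count), for a single post, and assumes that the daily score of a stock is the mean over its posts. Both choices, and the function names, are illustrative assumptions rather than the paper's exact definitions.

```python
from typing import Iterable, List, Set


def post_sentiment(tokens: Iterable[str], positive: Set[str], negative: Set[str]) -> float:
    """Polarity of one segmented post; 0.0 when it has no dictionary words (assumed form)."""
    tokens = list(tokens)
    n_pos = sum(tok in positive for tok in tokens)
    n_neg = sum(tok in negative for tok in tokens)
    total = n_pos + n_neg
    return 0.0 if total == 0 else (n_pos - n_neg) / total


def daily_sentiment(posts: List[List[str]], positive: Set[str], negative: Set[str]) -> float:
    """Mean post polarity for one stock and one day (aggregation by mean is an assumption)."""
    if not posts:
        return 0.0
    return sum(post_sentiment(p, positive, negative) for p in posts) / len(posts)


# Toy word lists; the paper merges two Chinese emotion dictionaries instead.
POSITIVE = {"up", "good"}
NEGATIVE = {"down"}
score = daily_sentiment([["up"], ["down"], ["up", "today"]], POSITIVE, NEGATIVE)
```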

Likelihood function

We denote the noises of stocks as a random vector $\varepsilon$, and $\varepsilon_t$ is the prediction error in day $t$, namely $\varepsilon_t = y_t - \hat{y}_t$. Following earlier studies, we assume that $\varepsilon$ is a vector of white noise and $\varepsilon \sim N(0, \Sigma)$, where $\Sigma$ is the covariance matrix of $\varepsilon$. The probability density function for $\varepsilon$ is shown in Equation (7), where $\Omega$ is the set of parameter matrices, and $\Sigma$ is a symmetric non-singular non-negative definite matrix with . To simulate real-world trading, we train the model every morning on the historical data of the past $W$ days, where $W$ is the length of the training window. In a specific day $t$, the joint probability density function for the previous $W$ days is , and its log form is shown in Equation (8), where is the natural logarithm function. By dropping the constant, we have the log-likelihood function in day $t$ as shown in Equation (9). For ease of reading, we summarize the important symbols of this paper in Table 1.

TABLE 1

List of symbols and the corresponding descriptions

Variable	Definition	Value or size
N	Number of stocks	50
T	Number of trading days	323
S	Number of days arranged for initialization	25
W	The length of training window	20
p,q,r	Hyperparameters	∈{1, 2, 3, 4, 5}
λ	Learning rate	0.001
$y_t$	Returns in day t	N×1
$x_t$	Textual sentiment in day t	N×1
$\hat{y}_t$	Predicted value of $y_t$	N×1
$\varepsilon$	Random vector for noise	N×1
$\varepsilon_t$	Prediction error in day t	N×1
ζ	Parameter vector for intercept	N×1
Θ	Parameter matrix for the cross‐AR term	N×p
Ψ	Parameter matrix for MA term	N×q
Φ	Parameter matrix for the cross‐textual term	N×r
Y	Series of daily returns	N×T
X	Series of textual sentiment	N×T
E	Series of prediction error	N×T
A	Adjacency matrix	N×N
Σ	Covariance matrix of ε	N×N
Ω	The set of parameter matrices	(ζ, Θ, Ψ, Φ, Σ)
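
The window log-likelihood described in the Likelihood function subsection can be evaluated numerically as in the sketch below. Since Equations (7) to (9) are not reproduced here, the sketch assumes the standard zero-mean Gaussian form with the additive constant dropped, namely −(W/2) ln|Σ| − ½ Σ_t ε_tᵀ Σ⁻¹ ε_t over the W days of the window. The function name and array layout are illustrative assumptions.

```python
import numpy as np


def window_log_likelihood(errors: np.ndarray, sigma: np.ndarray) -> float:
    """Gaussian log-likelihood (constant dropped) of W error vectors.

    errors: shape (N, W), one column of prediction errors per day in the window.
    sigma:  shape (N, N), covariance matrix, assumed positive definite.
    """
    _, w = errors.shape
    sign, logdet = np.linalg.slogdet(sigma)   # stable log-determinant
    if sign <= 0:
        raise ValueError("sigma must be positive definite")
    solved = np.linalg.solve(sigma, errors)   # columns hold Sigma^{-1} eps_t
    quad = float(np.sum(errors * solved))     # sum over t of eps_t^T Sigma^{-1} eps_t
    return -0.5 * w * logdet - 0.5 * quad


errors = np.array([[1.0, 0.0],
                   [0.0, 2.0]])               # N = 2 stocks, W = 2 days
value = window_log_likelihood(errors, np.eye(2))
```

Using `slogdet` and `solve` avoids forming an explicit inverse, which matters when N = 50 and the estimated covariance is close to singular.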


Estimation for the parameter matrices

We apply a maximum likelihood method to estimate the parameter matrices, and derive the partial derivative of the objective function with respect to each parameter matrix while the other parameters are held fixed. Because the parameters are in matrix form, we apply matrix-form gradient descent methods based on partial derivatives of matrices. The estimation is carried out every day before the prediction.

Update with other parameters being fixed

We derive the partial derivative for in day as Equation (10), where , and . Then we update by Equation (11), where $\lambda$ is the learning rate. In the empirical analysis, Equation (11) is iterated a fixed number of times to estimate . The partial derivative for in day is derived as Equation (12). Then we update by Equation (13). We denote as the partial derivative for , and it is derived as Equation (14). The elements in are shown in Equation (15), where is the trace of a matrix, , , and . is a matrix in which only the element at () is nonzero, as shown in Equation (16), where . Then we update by Equation (17). We denote as the partial derivative for . Similarly to Equations (14) and (17), the elements in are derived as Equation (18), where is a matrix shown in Equation (19). Then we update by Equation (20). We denote as the partial derivative for . Similarly to Equations (14) and (15), the elements in are derived as Equation (21), where is a matrix shown in Equation (22). Then we update by Equation (23).

Sliding‐window approach

Initialization

As shown in Equation (3), and are needed in forecasting of TM‐ARMA in day . So we should initialize and before the forecasting. We arrange the first days for initialization, and forecast the daily returns from day + 1 to day , and . Firstly, we build ()‐dimensional linear regression models with the input of and , then we set the prediction error as . Secondly, we build ()‐dimensional linear regression models with the input of , and . Thirdly, we update by the new prediction error, set the covariance matrix of as , set the intercept term as , and set the coefficients of , and as the elements of and , respectively.

Forecast and parameters updating approach

To better simulate real trading, we propose a sliding-window online forecast approach, which is shown in Figure 1 and Algorithm 1. In TM-ARMA, the adjacency matrix and parameter matrices are updated every day, as shown in Algorithm 1, where is the training frequency. Starting from day $S$ + 1, the model forecasts the returns for the next day and slides forward day by day. Every morning, before forecasting, the model trains the parameter matrices on the data of the past $W$ days.

FIGURE 1

Sliding‐window online forecast approach
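
The sliding-window procedure can be sketched as a walk-forward loop in which each forecast uses only data strictly before the forecast day. Algorithm 1 is not reproduced in this text, so the loop below is a generic sketch: `walk_forward` and the stand-in `mean_forecaster` (which simply averages the window) are hypothetical names, and the stand-in takes the place of the daily TM-ARMA training and prediction step.

```python
from typing import Callable

import numpy as np


def walk_forward(returns: np.ndarray, start: int, window: int,
                 fit_predict: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Forecast every day from `start` onward using the preceding `window` days.

    returns: shape (N, T). Days before `start` are left as NaN (initialization).
    """
    if start < window:
        raise ValueError("start must leave room for a full window")
    n, total = returns.shape
    preds = np.full((n, total), np.nan)
    for t in range(start, total):
        history = returns[:, t - window:t]    # strictly past data, no look-ahead
        preds[:, t] = fit_predict(history)    # retrain and forecast for day t
    return preds


def mean_forecaster(history: np.ndarray) -> np.ndarray:
    """Placeholder model: predict each stock's window mean (not TM-ARMA)."""
    return history.mean(axis=1)


demo = np.arange(12, dtype=float).reshape(2, 6)   # 2 stocks, 6 days
preds = walk_forward(demo, start=3, window=2, fit_predict=mean_forecaster)
```

With the settings reported in Table 1, the call would use `start=25` and `window=20`, and `fit_predict` would wrap the parameter update and one-step forecast.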

EMPIRICAL IMPLEMENTATION AND RESULTS

Dataset and preprocessing

Our dataset is based on the constituent stocks of the FTSE China A50 Index from January 1, 2019 to December 31, 2020. There are 487 trading days in our dataset, including the first $S$ days arranged for initialization. The first official announcement about COVID-19 was made on December 31, 2019 by the Wuhan Municipal Health Commission. We therefore divide our dataset into two stages, namely the Pre-COVID-19 Stage from January 1, 2019 to December 30, 2019 (243 trading days), and the Post-COVID-19 Stage from December 31, 2019 to December 31, 2020 (244 trading days).

Textual data

In the Eastmoney Stock Forum, there is a specific stock forum for every stock listed in the Chinese stock market. The forums of unpopular stocks contain very few posts, which cannot support the construction of sentiment scores. Many constituent stocks of the China Securities Index (CSI) 500 and CSI 800 are not very popular and have few posts in their forums, so we do not take CSI 500 and CSI 800 into account. Because of the attention from investors, the forums of large capital stocks and hot stocks are active and generate many posts every day. However, hot stocks change over time. When a stock is no longer hot, investors pay less attention to it and the posts in its forum become very few. As a result, we cannot build textual sentiment for hot stocks and have to choose large capital stocks. Another reason for choosing large capital stocks is that they have fewer suspensions, and the missing data caused by suspensions harms model training and forecasting.

To make the textual sentiment scores effective, the textual data should be as abundant as possible. We count the total number of posts in the Eastmoney Stock Forum for the constituent stocks of the major stock indices in China, and calculate the average daily number of posts per stock, as shown in Table 2. Table 2 shows that the constituent stocks of the FTSE China A50 Index have at least as many average daily posts as those of any other index: they tie with the SSE 50 in the Pre-COVID-19 Stage and lead in the Post-COVID-19 Stage. So FTSE China A50 is the best choice. We also consider that the FTSE China A50 Index is internationally compiled, with its futures contracts traded in Singapore, while the SSE 50 is not representative because it does not include any stock from the Shenzhen Stock Exchange. As a result, we choose the constituent stocks of the FTSE China A50 Index as the research object.

TABLE 2

Number of posts for constituent stocks of Chinese major stock indices

	Total number of posts		Average daily posts per stock
Stock index	Pre‐COVID‐19	Post‐COVID‐19	Pre‐COVID‐19	Post‐COVID‐19
FTSE China A50	1,054,631	1,514,325	87	124
SSE 50	1,055,387	1,382,526	87	113
CSI 100	1,911,830	2,514,643	79	103
CSI 300	5,247,094	6,923,634	72	95

The constituent stocks of the FTSE China A50 Index are adjusted quarterly, but the stocks should be fixed in the empirical experiment. Considering that the constituent stocks of the FTSE China A50 Index changed only slightly from January 1, 2019 to December 31, 2020, we choose the constituent stocks as of December 31, 2020 as our research object. We obtain 1,686,259 posts in the stock forums for the constituent stocks of the FTSE China A50 Index from January 1, 2019 to December 31, 2020. To make the sentiment scores better reflect the real sentiment of investors, we remove some invalid posts. For example, the institutional accounts among the top 20 users often send irrelevant routine news, which has nothing to do with the sentiment of a specific stock and introduces noise. After deleting these posts, we obtain 1,560,365 valid posts.

Word segmentation is necessary for text mining of Chinese textual information. Our word segmentation is implemented with Pkuseg (https://github.com/lancopku/PKUSeg-python), a powerful word segmentation tool developed by the Language Computing and Machine Learning Group of Peking University. To improve the segmentation accuracy, we apply four external lexicons for the word segmentation, namely the Sogou Financial Accounting Lexicon (https://pinyin.sogou.com/dict/detail/index/20659), the Baidu Stop Words Lexicon (https://github.com/goto456/stopwords/blob/master/baidu_stopwords.txt) and two emotion dictionaries (which are introduced below). Colloquial expressions and financial terminology are the two characteristics of the forum posts.
Firstly, considering that the language of the forum is colloquial and non-standard, we introduce the BosonNLP emotion dictionary (https://bosonnlp.com/dev/resource), a powerful social media polarity emotion dictionary constructed from millions of emotion-labeled items from microblogs, news, and forums. The BosonNLP emotion dictionary includes many internet terms and informal abbreviations, and has a high coverage of non-standard texts. Secondly, considering that the posts contain much financial terminology, we apply the Chinese Financial Sentiment Dictionary (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3446388), a professional Chinese finance emotion dictionary that covers most of the common terms in the securities market. We merge the two dictionaries to obtain the list of positive words and the list of negative words, which are applied in the computation of the textual sentiment scores.

Price data

The trading time of Chinese stocks is from 9:30 to 15:00, but textual data is generated 24 hours a day, as shown in Figure 2. Announcements of new policies and breaking news often occur between 15:00 and 9:30 of the next day, which leads to heated discussion in the stock forums. The textual data from 15:00 to 9:30 of the next day is therefore very important for forecasting, and TM-ARMA predicts the daily return rate at 9:30 on every trading day.

FIGURE 2

Data used for forecasting every day

Following earlier research [4], we set $y_t$ as the daily log return times 100, as shown in Equation (24), where is the price at 9:30 in day . It should be noted that when computing the profits of the strategies, we use the original daily returns (), which is consistent with real trading. Our price data is obtained from the Tushare database, and has been adjusted for stock splits, rights offerings, and dividends. When some stocks are suspended on a particular day, we set the daily returns and the corresponding predicted values of the suspended stocks to 0. This solves the problem of missing data without influencing the training and forecasting of normal stocks, and it is consistent with real trading. In fact, the Chinese stock market holds a call auction from 9:15 to 9:25, and continuous bidding starts at 9:30. Our model sells the stocks of the previous day during the call auction (between 9:15 and 9:25) and buys today's stocks at 9:30. Therefore, the profitability of the model mainly depends on the price gap between 9:30 on the current day and 9:25 on the next trading day. To make the time series model work well, the in our model is still set as the price at 9:30 every day. But when evaluating the model's performance, we calculate the profit based on the price at 9:25 on the next trading day and the price at 9:30 on the current trading day, which is consistent with the real trading scenario.
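
The return definition above (daily log return times 100) can be sketched as follows. Equation (24) is not reproduced in this text, so the sketch assumes that the return for a day is 100 times the log of the next day's 9:30 price over the current day's 9:30 price; the function name and the (N, T+1) price layout are assumptions.

```python
import numpy as np


def scaled_log_returns(open_prices) -> np.ndarray:
    """100 * log(P[t+1] / P[t]) along the last axis (assumed indexing).

    open_prices: 9:30 prices, shape (N, T+1) or (T+1,), already adjusted
    for splits and dividends. Returns an array with one fewer column.
    """
    prices = np.asarray(open_prices, dtype=float)
    return 100.0 * np.diff(np.log(prices), axis=-1)


# A suspended stock keeps the same price, so its return is 0 by construction.
returns = scaled_log_returns([100.0, 110.0, 110.0])
```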

Empirical approach

We conduct all the experiments in Python. AR, ARMA, VAR, EM-ARMA [13] and TM-AR are chosen as the baseline models to compare with TM-ARMA, where the TM-AR model is composed of the cross-AR term and the cross-textual term, as shown in Equation (25). ARMA and its extended models require stationary series. Our empirical results show that second-order differencing makes the dataset stationary and lets the models converge well. VARMA and VARMAX have cross-MA terms. We tried hard to apply VARMA and VARMAX to our dataset, but they fail to converge even after differencing, whereas the models with single MA terms (such as ARMA and TM-ARMA) converge well. As a result, VARMA and VARMAX are not included in the baseline models.

Our research shows that an overly long training window and overly large hyperparameters do not help the models perform better, but lead to heavy computation. According to the conclusions of Markov process, information decays over time and very old information has little effect on price prediction. Most traders agree that stock prices are strongly influenced by sentiment over the past 5 days and are correlated with volatility over the past 20 days, as the numbers of trading days in a week and in a month are approximately 5 and 20, respectively. We therefore search each of the hyperparameters $p$, $q$ and $r$ over {1, 2, 3, 4, 5}, and set the length of the training window to $W$ = 20. Considering that , we set $S$ = 25. The learning rate is set to 0.001, as in much earlier research. Our experiments show that the models converge well with a training frequency of 50, so we use this value. For comparison purposes, all models share the same initial settings, including , , , and the grid search scope for hyperparameters.

We set a rule-based investment strategy for these models. The following steps are performed at 9:30 on every trading day.
Firstly, each model is trained on the historical data, and forecasts the returns of the new day (from 9:30 to 9:30 of the next trading day, as shown in Equation (24)). Secondly, the strategy builds a new portfolio from the stocks with positive predicted returns. Thirdly, the strategy allocates all the funds equally among the stocks in the new portfolio and holds them until 9:30 of the next trading day. We set the original net asset value to "1" for all models' strategies, and calculate the net asset values daily. Considering that commissions and stamp duties are about 0.01% and 0.1% of the turnover, respectively, we set the transaction cost rate to 0.01% for buy operations and 0.11% for sell operations. To save transaction costs in the third step, the strategy sells the stocks that are in the old portfolio but not in the new one, buys the stocks that are in the new portfolio but not in the old one, and buys or sells only part of the holdings of the stocks that are in both portfolios.
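
One day of the rule-based strategy described above can be sketched as below. This is a simplified illustration rather than the authors' implementation: it trades only the difference between the old and new weights (which is how overlapping holdings save costs), charges 0.01% on purchases and 0.11% on sales, and ignores the drift of weights during the holding day. The names `target_weights` and `step_nav` are hypothetical.

```python
import numpy as np

BUY_COST = 0.0001    # commission on purchases (0.01% of turnover)
SELL_COST = 0.0011   # commission plus stamp duty on sales (0.11% of turnover)


def target_weights(predicted: np.ndarray) -> np.ndarray:
    """Equal weights over stocks with a positive predicted return; all cash if none."""
    predicted = np.asarray(predicted, dtype=float)
    picks = predicted > 0
    count = int(picks.sum())
    return picks / count if count else np.zeros(len(predicted))


def step_nav(nav: float, old_w: np.ndarray, new_w: np.ndarray, realized: np.ndarray) -> float:
    """Update net asset value for one day, paying costs only on the traded difference."""
    delta = new_w - old_w
    bought = delta[delta > 0].sum()          # fraction of NAV purchased
    sold = -delta[delta < 0].sum()           # fraction of NAV sold
    cost = BUY_COST * bought + SELL_COST * sold
    return nav * (1.0 - cost) * (1.0 + float(new_w @ realized))


weights = target_weights(np.array([0.5, -0.2, 0.1]))
nav = step_nav(1.0, np.zeros(3), weights, np.array([0.02, -0.01, 0.0]))
```

Here `realized` holds simple (not log) returns, in line with the text's note that profits are computed from the original daily returns.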

Performance evaluation

Models are trained to reduce the prediction error, but real trading requires the strategies to be profitable. We therefore evaluate the models on both prediction performance and investment performance. Root Mean Square Error (RMSE), shown in Equation (26), is the most common evaluation indicator for prediction error, where represent the real and predicted returns in day , respectively. Annualized Return Rate (ARR), Maximum Drawdown Ratio (MDR) and Calmar Ratio (CR) are common indicators for investment performance [26]. The drawdown for day is the difference between the highest previous net asset value and the net asset value in day . The MDR equals the biggest drawdown from day $S$ + 1 to day $T$ divided by the corresponding highest previous net asset value. CR is the quotient of ARR and MDR. RMSE is a negative indicator that measures the forecast error. ARR is a positive indicator that measures the profitability of the model. MDR is a negative indicator that measures the severity of losses. CR is a positive indicator that measures the stability of profits.
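
The four indicators can be computed as in the sketch below. The MDR follows the verbal definition above (the biggest drawdown divided by the corresponding previous peak). The annualization formula is not given in this text, so geometric annualization with 252 trading days per year is an assumption, as are the function names.

```python
import numpy as np


def rmse(actual, predicted) -> float:
    """Root mean square error over all supplied entries."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def max_drawdown_ratio(nav) -> float:
    """Biggest drop from a running peak, divided by that peak."""
    nav = np.asarray(nav, dtype=float)
    peak = np.maximum.accumulate(nav)        # highest net asset value so far
    drawdown = peak - nav
    worst = int(np.argmax(drawdown))
    return float(drawdown[worst] / peak[worst])


def annualized_return(nav, periods_per_year: int = 252) -> float:
    """Geometric annualization (assumed convention; 252 trading days per year)."""
    nav = np.asarray(nav, dtype=float)
    n_periods = len(nav) - 1
    return float((nav[-1] / nav[0]) ** (periods_per_year / n_periods) - 1.0)


def calmar_ratio(nav, periods_per_year: int = 252) -> float:
    """ARR divided by MDR; infinite when there is no drawdown."""
    mdr = max_drawdown_ratio(nav)
    return float("inf") if mdr == 0 else annualized_return(nav, periods_per_year) / mdr


curve = [1.0, 1.2, 0.9, 1.1]                 # toy net asset value curve
mdr = max_drawdown_ratio(curve)
```

For the toy curve the peak is 1.2 and the trough is 0.9, so the drawdown of 0.3 is divided by 1.2.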

RESULTS AND ANALYSIS

To compare the general performance of different hyperparameter combinations, we calculate the average performance indicators of all hyperparameter combinations for all models, as shown in Table 3, where “↑” and “↓” mean positive indicators and negative indicators, respectively. The results in Table 3 show that EM‐ARMA works much better than AR, ARMA and VAR, and TM‐ARMA performs the best.

TABLE 3

Average performance indicators of all hyperparameter combinations

Model	RMSE↓	ARR↑	MDR↓	CR↑
AR	26.625	0.115	0.245	0.483
VAR	24.926	0.126	0.189	0.702
ARMA	23.851	0.118	0.184	0.685
EM‐ARMA	21.428	0.146	0.162	0.925
TM‐AR	22.245	0.164	0.149	1.164
TM‐ARMA	19.354	0.173	0.142	1.235

To compare the prediction performance of all models, we report the performance for the hyperparameter combination with the best RMSE in Table 4. The prediction performances of EM-ARMA and TM-AR are clearly better than those of the traditional models, and TM-ARMA has the smallest prediction error. It is worth mentioning that the investment performances of TM-ARMA, TM-AR and EM-ARMA are also very good, which means that they balance the reduction of prediction error with the increase of investment returns.

TABLE 4

Performance indicators of the hyperparameter combination with the best RMSE

Model	RMSE↓	ARR↑	MDR↓	CR↑
AR	25.925	0.124	0.251	0.494
VAR	21.672	0.107	0.192	0.557
ARMA	22.349	0.115	0.184	0.625
EM‐ARMA	21.135	0.143	0.160	0.894
TM‐AR	21.529	0.158	0.152	1.039
TM‐ARMA	18.872	0.165	0.148	1.115

To compare the investment performance of all models, we show the performance for the hyperparameter combination with the best ARR in Table 5. TM-ARMA outperforms all baseline models. Comparing Tables 4 and 5, we find that the RMSE of EM-ARMA is smaller than that of TM-AR, but TM-AR has better investment performance, and TM-ARMA performs best in both aspects. To compare the investment performance intuitively, we draw the net asset value curves in Figure 3 for the models with the same hyperparameter combinations as in Table 5. The net asset value is sampled every 20 trading days when plotting the curves in Figure 3.

TABLE 5

Performance indicators of hyperparameter combination with the best ARR

Model	RMSE↓	ARR↑	MDR↓	CR↑
AR	26.435	0.176	0.204	0.863
VAR	24.742	0.192	0.174	1.103
ARMA	23.674	0.182	0.173	1.055
EM‐ARMA	21.537	0.262	0.134	1.955
TM‐AR	22.105	0.273	0.137	1.993
TM‐ARMA	19.093	0.309	0.134	2.297

FIGURE 3

Net asset value curves of hyperparameter combinations with the best ARR

From Figure 3, we can see that the traditional models perform poorly, whereas TM-ARMA, TM-AR and EM-ARMA show better recovery capability under the shock of COVID-19. To verify the impact of COVID-19 on the profitability of all models, we compare the ARR and excess ARR of all models in the two stages in Table 6. The excess ARR equals the difference between the ARR of the model and the ARR of the FTSE China A50 Index, and it measures the model's ability to achieve a higher return than the benchmark. Table 6 shows that TM-ARMA performs well in both the Pre-COVID stage and the Post-COVID stage.

TABLE 6

Performance indicators in the two stages with hyperparameter combination of the best ARR

	ARR		Excess ARR
Models	Pre‐COVID	Post‐COVID	Pre‐COVID	Post‐COVID
AR	0.297	0.235	0.050	0.002
VAR	0.197	0.256	−0.050	0.023
ARMA	0.197	0.244	−0.051	0.011
EM‐ARMA	0.397	0.354	0.150	0.121
TM‐AR	0.412	0.370	0.165	0.137
TM‐ARMA	0.442	0.420	0.195	0.187


CONCLUSIONS

This paper proposes a new TM-ARMA model that considers the cross impacts of textual emotions and price fluctuations, with the adjacency matrix updated dynamically. The textual sentiment scores are computed by an emotion dictionary-based method, and the parameter matrices are estimated by the maximum likelihood method. We conduct the empirical study on the textual data and daily price data for the constituent stocks of the FTSE China A50 Index from January 1, 2019 to December 31, 2020. The results show that TM-ARMA outperforms all the baseline models, and that its excess ARR in the Post-COVID-19 Stage (0.187) remains close to that in the Pre-COVID-19 Stage (0.195).

9 in total

1. Robust Multicategory Support Vector Machines using Difference Convex Algorithm.

Authors: Chong Zhang; Minh Pham; Sheng Fu; Yufeng Liu
Journal: Math Program Date: 2017-11-29 Impact factor: 3.995

2. Robust Multicategory Support Matrix Machines.

Authors: Chengde Qian; Quoc Tran-Dinh; Sheng Fu; Changliang Zou; Yufeng Liu
Journal: Math Program Date: 2019-03-28 Impact factor: 3.995

3. Tensor Graphical Model: Non-Convex Optimization and Statistical Inference.

Authors: Xiang Lyu; Will Wei Sun; Zhaoran Wang; Han Liu; Jian Yang; Guang Cheng
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2019-03-26 Impact factor: 6.226

4. Adaptively weighted large-margin angle-based classifiers.

Authors: Sheng Fu; Sanguo Zhang; Yufeng Liu
Journal: J Multivar Anal Date: 2018-03-15 Impact factor: 1.473

5. A study of the impact of COVID-19 on the Chinese stock market based on a new textual multiple ARMA model.

Authors: Weijun Xu; Zhineng Fu; Hongyi Li; Jinglong Huang; Weidong Xu; Yiyang Luo
Journal: Stat Anal Data Min Date: 2022-04-04 Impact factor: 1.247

6. Stock markets' reaction to COVID-19: Cases or fatalities?

Authors: Badar Nadeem Ashraf
Journal: Res Int Bus Finance Date: 2020-05-23

7. Death and contagious infectious diseases: Impact of the COVID-19 virus on stock market returns.

Authors: Abdullah M Al-Awadhi; Khaled Al-Saifi; Ahmad Al-Awadhi; Salah Alhamadi
Journal: J Behav Exp Finance Date: 2020-04-08

8. Financial contagion during COVID-19 crisis.

Authors: Md Akhtaruzzaman; Sabri Boubaker; Ahmet Sensoy
Journal: Financ Res Lett Date: 2020-05-23
