Zheng Tan, Ziqin Yan, Guangwei Zhu.
Abstract
In recent years, a variety of research fields, including finance, have placed great emphasis on machine learning techniques because they can model more complicated problems. In contrast to the traditional linear regression scheme usually used to describe the relationship between the stock forward return and company characteristics, finance has seen the rapid development of tree-based algorithms and neural network paradigms for describing complex stock dynamics. These nonlinear methods have proved effective in predicting stock prices and selecting stocks that can outperform the general market. This article implements the random forest (RF) model and evaluates its robustness in the context of a stock selection strategy. The model is trained on stocks in the Chinese stock market, and two types of feature spaces, a fundamental/technical feature space and a pure momentum feature space, are adopted to forecast the price trend in the long run and the short run, respectively. Both feature paradigms led to remarkable excess returns over the past five years of the out-of-sample period, with Sharpe ratios of 2.75 and 5 for the portfolio net value of the multi-factor space strategy and the momentum space strategy, respectively. Although the excess return of the multi-factor strategy has weakened in recent years, our findings point to a less efficient market that is far from equilibrium.
Keywords: Computer science; Economics; Excess return; Finance; Machine learning; Random forests; Stock selection
Year: 2019 PMID: 31463404 PMCID: PMC6709379 DOI: 10.1016/j.heliyon.2019.e02310
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Fundamental features. All factors are acquired from the Wind database. Some factors, such as EP, BP, ROE, etc., are strongly correlated with the average stock returns in mature stock markets, as indicated in the listed references therein, while others are believed to have significant explanatory power in practical stock investment.
| Factors | Name | Description |
|---|---|---|
| EP | The ratio of earnings to price | Determine whether shares are correctly valued in relation to one another |
| BP | The ratio of book to price | Used to compare a company's current market value to its book value |
| SP | The ratio of sales to price | Used to determine the value of a stock relative to its past business performance |
| Net profits yoy | The growth rate of net profits year on year | Used to estimate the company's business prospect |
| Business income yoy | The growth rate of business income year on year | Used to estimate the company's growth and development capabilities |
| ROA | The return on assets | Reflects by percentage how profitable a company's assets are in generating revenue |
| ROE | The return on equity | Used to measure how well a company uses investments to generate earnings growth |
| Market cap | Market capitalization calculated as price times shares outstanding | Reflects how much money is raised and the size of listed companies |
Technical features.
| Factors | Description | Formula |
|---|---|---|
| turnover_20, | Refers to the moving average of the turnover over a certain period | |
| close_0/close_9, close_0/close_19, close_0/close_39, close_0/close_59, close_0/close_119 | Refers to the momentum with different time lags and can be used to help identify the trend of the price process | |
| close_19/close_0, | Refers to the reversal of momentum | |
| adjusted_close_0/close_59, | Refers to the momentum with different time lags, excluding the most recent month | |
| vol10/vol20, vol10/vol40, | Refers to a rate of acceleration of a stock’s volume and can be used to help identify trend lines of volume | |
| volatility_10, volatility_20, | Refers to the volatility over the past | |
| std(volume_10), | Refers to the standard deviation of trading volume time series over the past | |
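
The Formula column above is empty, so the following is only a minimal sketch of how such technical features might be computed. It assumes that `close_0/close_k` is the latest close divided by the close `k` trading days earlier, that `volN` is the mean volume over the last `N` days, and that volatility is the standard deviation of daily simple returns; these definitions, the function names, and the toy price series are illustrative assumptions, not taken from the paper.

```python
from statistics import mean, pstdev

def momentum(close, lag):
    """close_0/close_lag: latest close divided by the close `lag` days earlier (assumed definition)."""
    return close[-1] / close[-1 - lag]

def reversal(close, lag):
    """close_lag/close_0: the inverse of the momentum ratio (assumed definition)."""
    return close[-1 - lag] / close[-1]

def volume_ratio(volume, short, long):
    """vol{short}/vol{long}: mean volume over the last `short` days over the last `long` days."""
    return mean(volume[-short:]) / mean(volume[-long:])

def volatility(close, window):
    """Population standard deviation of the last `window` daily simple returns."""
    recent = close[-(window + 1):]
    returns = [b / a - 1.0 for a, b in zip(recent, recent[1:])]
    return pstdev(returns)

# Toy series (hypothetical): prices grow 1% per day, volume grows linearly.
close = [100.0 * 1.01 ** i for i in range(120)]
volume = [1000.0 + 10.0 * i for i in range(120)]

features = {
    "close_0/close_9": momentum(close, 9),
    "close_19/close_0": reversal(close, 19),
    "vol10/vol20": volume_ratio(volume, 10, 20),
    "volatility_10": volatility(close, 10),
}
```

With a constant daily growth rate the momentum ratio depends only on the lag and the volatility is essentially zero, which makes the toy series a convenient sanity check for the definitions.
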
Fig. 1. The dependence of the strategy performance on the number of trees, which is set to be in {20, 40, 60, 80, 100, 120}. (a) Portfolio net asset value (NV) dependence on the number of trees; (b) Hedged NV dependence on the number of trees; (c) Estimators of the NV portfolio and their dependence on the number of trees; (d) OOB score dependence on the number of trees.
Fig. 2. The dependence of the strategy performance on the number of sample classes, which is set to be in {5, 10, 15}. (a) Portfolio NV dependence on the number of sample classes; (b) Hedged NV dependence on the number of sample classes; (c) Estimators of the NV portfolio and their dependence on the number of sample classes; (d) OOB score dependence on the number of sample classes.
Fig. 3. Dependence of strategy performance on the training period, which is set to be in {75, 125, 175, 252}. (a) Portfolio NV dependence on the training period; (b) Hedged NV dependence on the training period; (c) Estimators of portfolio NV and their dependence on the training period; (d) OOB score dependence on the training period.
Fig. 4. Dependence of strategy performance on the rolling period, which is set to be in {20, 40, 60, 80}. (a) Portfolio NV dependence on the rolling period; (b) Hedged NV dependence on the rolling period; (c) Estimators of portfolio NV and their dependence on the rolling period; (d) OOB score dependence on the rolling period.
Fig. 5. Weight distribution of the 40 factors used in the RF model.
Fig. 6. Comparison of the momentum space and multi-factor space strategy performances. (a) Portfolio NV comparison; (b) Hedged NV comparison.
Model parameters and performance estimators for the momentum space strategy and multi-factor space strategy.
| | Momentum space strategy | Multi-factor space strategy |
|---|---|---|
| training period (D) | 125 | 125 |
| rolling period (D) | 20 | 20 |
| sample class no. | 5 | 5 |
| tree no. | 60 | 60 |
| features | Momentum series with different lags | Proprietary multi-factors |
| annual return | 1.8556 | 1.0121 |
| maximal drawdown | 0.569 | 0.4651 |
| Sharpe | 5.0068 | 2.7509 |
| mean oob | 0.2752 | 0.6814 |
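
To make the parameters in this table concrete, here is a hedged sketch of a rolling random forest fit using scikit-learn's `RandomForestClassifier`. It assumes that "sample class no." means bucketing stocks into equal-sized classes by cross-sectional forward-return rank, that "rolling period" is the number of days between refits, and that "mean oob" is the average out-of-bag accuracy across refits. The data are synthetic, and the helper names (`quantile_classes`, `fit_window`) are hypothetical; this is not the authors' code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Values from the table: 125-day training window, refit every 20 days, 5 classes, 60 trees.
TRAIN_DAYS, ROLL_DAYS, N_CLASSES, N_TREES = 125, 20, 5, 60

def quantile_classes(forward_returns, n_classes):
    """Rank one day's returns and split the ranks into equal-sized classes (0 = worst)."""
    ranks = np.argsort(np.argsort(forward_returns))
    return ranks * n_classes // len(forward_returns)

def fit_window(X, y):
    """Fit a forest with out-of-bag scoring enabled."""
    model = RandomForestClassifier(n_estimators=N_TREES, oob_score=True, random_state=0)
    return model.fit(X, y)

# Synthetic panel (assumption): feature 0 carries signal, the others are noise.
rng = np.random.default_rng(0)
n_days, n_stocks, n_features = 165, 30, 4
X_all = rng.normal(size=(n_days, n_stocks, n_features))
fwd = 0.02 * X_all[:, :, 0] + rng.normal(scale=0.01, size=(n_days, n_stocks))

oob_scores = []
for start in range(0, n_days - TRAIN_DAYS, ROLL_DAYS):
    window = slice(start, start + TRAIN_DAYS)
    X = X_all[window].reshape(-1, n_features)
    # In live use, only labels whose forward returns are already realized may enter the window.
    y = np.concatenate([quantile_classes(r, N_CLASSES) for r in fwd[window]])
    model = fit_window(X, y)
    oob_scores.append(model.oob_score_)
    # Select the stocks with the highest predicted probability of the top class.
    today = X_all[start + TRAIN_DAYS]
    top_prob = model.predict_proba(today)[:, -1]
    picks = np.argsort(top_prob)[-(n_stocks // N_CLASSES):]

mean_oob = float(np.mean(oob_scores))
```

With five balanced classes, random guessing would give an out-of-bag accuracy of about 0.2, which is a useful baseline when reading the "mean oob" row.
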
Daily return characteristics of the portfolios, after transaction costs, for DNN, LR, and RF compared to the general market.
| | DNN | LR | RF | General Market |
|---|---|---|---|---|
| daily mean return | 0.002 | 0.0012 | 0.0018 | 0.0006 |
| t-statistics | 2.8365 | 1.8559 | 2.6435 | 1.219 |
| standard deviation | 0.0228 | 0.0216 | 0.0225 | 0.0186 |
| Skewness | -0.7898 | -0.9861 | -0.7518 | -1.0383 |
| Kurtosis | 6.2592 | 7.5038 | 5.9751 | 7.0337 |
| 5-percent VaR | -0.0414 | -0.0333 | -0.0368 | -0.033 |
| Maximum drawdown | 0.5016 | 0.4534 | 0.4651 | 0.5435 |
| Calmar | 2.4618 | 0.9656 | 2.176 | 0.3171 |
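
The estimators in this table can be sketched in a few lines of Python. The definitions below are common conventions and are assumptions here: the record does not state whether kurtosis is raw or excess, how the VaR quantile is taken, or how returns are annualized. The Calmar definition (annual return divided by maximum drawdown) is at least consistent with the reported RF figures, since 1.0121 / 0.4651 ≈ 2.176.

```python
import math
from statistics import mean, stdev

def t_statistic(returns):
    """t-statistic of the mean daily return against zero."""
    return mean(returns) / (stdev(returns) / math.sqrt(len(returns)))

def _central_moment(returns, k):
    m = mean(returns)
    return sum((r - m) ** k for r in returns) / len(returns)

def skewness(returns):
    return _central_moment(returns, 3) / _central_moment(returns, 2) ** 1.5

def kurtosis(returns):
    """Raw (non-excess) kurtosis; a normal distribution gives 3."""
    return _central_moment(returns, 4) / _central_moment(returns, 2) ** 2

def value_at_risk(returns, level=0.05):
    """Empirical quantile: the return at the `level` position of the sorted sample."""
    ordered = sorted(returns)
    return ordered[int(level * len(ordered))]

def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded net value."""
    nav = peak = 1.0
    worst = 0.0
    for r in returns:
        nav *= 1.0 + r
        peak = max(peak, nav)
        worst = max(worst, 1.0 - nav / peak)
    return worst

def calmar(returns, periods_per_year=252):
    """Annualized compounded return divided by maximum drawdown (undefined if no drawdown)."""
    total = math.prod(1.0 + r for r in returns)
    annual = total ** (periods_per_year / len(returns)) - 1.0
    return annual / max_drawdown(returns)

# Toy daily returns (hypothetical values, not from the paper).
daily = [0.01, -0.02, 0.03, -0.01, 0.02]
summary = {
    "mean": mean(daily),
    "t": t_statistic(daily),
    "VaR_5pct": value_at_risk(daily),
    "max_drawdown": max_drawdown(daily),
    "calmar": calmar(daily),
}
```
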