Şirin Özlem, Omer Faruk Tan.
Abstract
This study predicts the cash holdings policy of Turkish firms from 20 selected features using machine learning methods. 211 firms listed on Borsa Istanbul are analyzed over the period 2006–2019. Multiple linear regression (MLR), k-nearest neighbors (KNN), support vector regression (SVR), decision trees (DT), the extreme gradient boosting algorithm (XGBoost), and multi-layer neural networks (MLNN) are used for prediction. Results reveal that MLR, KNN, and SVR yield high root mean square error (RMSE) and low R2 values, whereas more complex algorithms, such as DT and especially XGBoost, achieve higher accuracy, with an R2 of 0.73. Advanced machine learning algorithms can therefore predict cash holdings to a considerable degree.
Keywords: Cash holdings; MLNN; Machine learning; Turkey; XGBoost
Year: 2022 PMID: 35601748 PMCID: PMC9113774 DOI: 10.1186/s40854-022-00351-8
Source DB: PubMed Journal: Financ Innov ISSN: 2199-4730
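As a rough illustration of the evaluation setup described in the abstract, the sketch below fits several of the listed regressors and scores them with RMSE and R2 on a held-out test set. The file name, the 80/20 split, and the hyperparameter values shown are assumptions, not details taken from the paper.

```python
# Minimal sketch, assuming a flat CSV of firm-year observations with a CASH column;
# the file name, split, and hyperparameters below are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

df = pd.read_csv("cash_holdings.csv")                  # hypothetical data file
X, y = df.drop(columns=["CASH"]), df["CASH"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "MLR": LinearRegression(),
    "KNN": KNeighborsRegressor(n_neighbors=27),
    "SVR": SVR(),
    "DT": DecisionTreeRegressor(max_depth=5),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    print(f"{name}: RMSE = {rmse:.4f}, R2 = {r2_score(y_test, pred):.4f}")
```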
Definitions of variables and determinant factors of cash holdings
| Explanatory variables | Definitions | Studies | Source |
|---|---|---|---|
| CASH | The ratio of cash and cash equivalents to the total assets | | Thomson Reuters |
| DIV | The ratio of total dividend payments to the total assets | Bigelli and Sánchez-Vidal | As Above |
| SG | Annual change in sales growth (%) | Bigelli and Sánchez-Vidal | As Above |
| SIZE | Natural logarithm of total assets in current USD | Bigelli and Sánchez-Vidal | As Above |
| CAPEX | The ratio of capital expenditure to the total assets | Boubakri et al. | As Above |
| CF | The ratio of the sum of pre-tax income plus depreciation to the total assets | Boubakri et al. | As Above |
| IE | The ratio of interest expense to the total assets | Schauten et al. | As Above |
| NWC | The ratio of non-cash working capital to the total assets | Bigelli and Sánchez-Vidal | As Above |
| TANG | The ratio of net fixed assets to the total assets | Bhuiyan and Hooks | As Above |
| STD | The ratio of short-term debt to the total assets | Benkraiem et al. | As Above |
| ROA | The ratio of net income to the total assets | Batuman et al. | As Above |
| ROE | The ratio of net income to the total equity | Manoel et al. | As Above |
| AR | The ratio of accounts receivable to the total assets | Mohammadi et al. | As Above |
| AP | The ratio of accounts payable to the total assets | Chen et al. | As Above |
| CR | The ratio of current assets to current liabilities | Manoel et al. | As Above |
| EPS | Earnings per share | Sarfriz et al. | As Above |
| ROIC | The ratio of net operating profit after tax to the total assets | Sarfriz et al. | As Above |
| NET MARGIN | The ratio of net income to the net sales | Angelovska and Valentinčič | As Above |
| PRETAX MARGIN | The ratio of profit before tax to the net sales | Mihai et al. | As Above |
| AGE | The foundation year of the firm | Bigelli and Sánchez-Vidal | Google Search |
| WUI_TURKEY | Annual average of quarterly data of the World Uncertainty Index | | |
Fig. 1 The representation of a multi-layer neural network (Dixon et al. 2017). The input layer consists of explanatory variables called features, and information is forwarded from this layer to the hidden layers. The arcs connecting the layers carry parameters called weights and biases. The goal of the network is to find the parameter settings that minimize the error between the estimated and the actual target values
Fig. 2 Plots showing the regression assumptions for the model. Errors are approximately normally distributed (2.1). The mean of the errors is approximately zero (2.2). Homoscedasticity (errors have approximately equal variances) holds (2.3). Outliers are negligible (2.4)
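A minimal sketch of how the four diagnostics in Fig. 2 can be reproduced; the panel layout and plotting choices are assumptions, not taken from the paper.

```python
# Sketch of the Fig. 2 residual diagnostics; layout and styling are assumptions.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

def residual_diagnostics(y_true, y_pred):
    """Plot the four checks: normality, zero mean, homoscedasticity, outliers."""
    residuals = np.asarray(y_true) - np.asarray(y_pred)
    fig, ax = plt.subplots(2, 2, figsize=(10, 8))
    ax[0, 0].hist(residuals, bins=30)                    # 2.1 approximate normality
    ax[0, 0].set_title("Histogram of residuals")
    ax[0, 1].scatter(range(len(residuals)), residuals, s=5)
    ax[0, 1].axhline(residuals.mean(), color="red")      # 2.2 mean close to zero
    ax[0, 1].set_title("Residuals and their mean")
    ax[1, 0].scatter(y_pred, residuals, s=5)             # 2.3 homoscedasticity
    ax[1, 0].set_title("Residuals vs fitted values")
    stats.probplot(residuals, plot=ax[1, 1])             # 2.4 outliers via Q-Q plot
    plt.tight_layout()
    plt.show()
```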
Variance inflation factor
| Variables | VIF |
|---|---|
| STD | 2.34 |
| CF | 2.30 |
| IE | 2.28 |
| NWC | 2.26 |
| ROA | 1.63 |
| PRETAXMARGIN | 1.60 |
| ROIC | 1.58 |
| NETMARGIN | 1.58 |
| ROE | 1.51 |
| PPE | 1.38 |
| SIZE | 1.33 |
| CR | 1.26 |
| AR | 1.26 |
| DIV | 1.23 |
| AGE | 1.17 |
| EPS | 1.13 |
| WUI_TURKEY | 1.09 |
| CAPEX | 1.08 |
| SG | 1.03 |
| AP | 1.02 |
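VIF values like those above can be reproduced along the following lines with statsmodels; `X` is assumed to be a DataFrame holding only the explanatory variables.

```python
# Sketch of a VIF computation; assumes X is a numeric DataFrame of the regressors.
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

def vif_table(X: pd.DataFrame) -> pd.DataFrame:
    Xc = add_constant(X)  # include an intercept so the VIFs are not artificially inflated
    vifs = [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])]
    return (pd.DataFrame({"Variables": X.columns, "VIF": vifs})
              .sort_values("VIF", ascending=False))
```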
Performance metrics with MLR
| MLR | Model with 19 features | Model with 15 features |
|---|---|---|
| RMSE | 0.1109 | 0.1036 |
| R2 | 0.0406 | 0.1626 |
Correlation between features and CASH
| Variable | Correlation coefficient |
|---|---|
| CASH | 1.0000 |
| CR | 0.3556 |
| TANG | 0.2851 |
| CF | 0.2363 |
| DIV | 0.2356 |
| EPS | 0.2223 |
| STD | 0.1717 |
| IE | 0.1679 |
| SIZE | 0.1660 |
| ROIC | 0.1233 |
| AR | 0.1112 |
| PRETAXMARGIN | 0.0861 |
| ROA | 0.0783 |
| NETMARGIN | 0.0578 |
| AP | 0.0559 |
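A correlation ranking like the one above can be obtained with a short pandas helper; `df` is assumed to hold only the numeric features together with CASH.

```python
# Sketch: Pearson correlation of every feature with CASH, sorted in descending order.
import pandas as pd

def cash_correlations(df: pd.DataFrame) -> pd.Series:
    return df.corr()["CASH"].sort_values(ascending=False)
```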
Performance metrics for MLR with various predictors
| Model no | Model predictors | R2 | RMSE |
|---|---|---|---|
| 1 | CR, TANG, CF, DIV, EPS, STD | 0.1611 | 0.1037 |
| 2 | NWC, CR, SIZE, CF, Age | 0.161 | 0.1036 |
| 3 | Age, WUI, CAPEX, DIV, IE | 0.0755 | 0.1109 |
| 4 | CF, SIZE, NWC, SG, STD | 0.1022 | 0.1093 |
Fig. 3 RMSE with different k settings. With all 19 features, the best hyperparameter k is 27, whereas with the reduced set of 15 features, the best k is 57
Performance metrics with KNN
| KNN | Model with 19 features | Model with 15 features |
|---|---|---|
| k | 27 | 57 |
| RMSE | 0.1064 | 0.1071 |
| R2 | 0.1071 | 0.1228 |
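The k sweep behind Fig. 3 and the table above can be sketched as follows; the candidate range of k and the fixed train/test split are assumptions.

```python
# Sketch of a k sweep for KNN; the search range and split are assumptions.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

def best_k(X_train, y_train, X_test, y_test, k_values=range(1, 101)):
    rmse_by_k = {}
    for k in k_values:
        model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
        pred = model.predict(X_test)
        rmse_by_k[k] = np.sqrt(mean_squared_error(y_test, pred))
    return min(rmse_by_k, key=rmse_by_k.get), rmse_by_k  # best k and the full RMSE curve
```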
Performance metrics with SVR
| SVR | Model with 19 features | Model with 15 features |
|---|---|---|
| RMSE | 0.0796 | 0.0887 |
| R2 | 0.5152 | 0.3984 |
Performance metrics with decision trees
| Decision trees | Model with 19 features | Model with 15 features | Model with 10 features | Model with 5 features |
|---|---|---|---|---|
| Max depth | 5 | 5 | 5 | 5 |
| RMSE | 0.0906 | 0.0915 | 0.0903 | 0.0899 |
| R2 | 0.3723 | 0.3756 | 0.3812 | 0.3822 |
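A depth-limited tree as reported above can be fitted as follows; max_depth = 5 matches the table, while the random seed and the split are assumptions.

```python
# Sketch of a depth-limited regression tree; only max_depth = 5 comes from the table.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

def fit_tree(X_train, y_train, X_test, y_test, max_depth=5):
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    tree.fit(X_train, y_train)
    pred = tree.predict(X_test)
    return np.sqrt(mean_squared_error(y_test, pred)), r2_score(y_test, pred)
```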
Performance metrics with RF
| Random forest | Model with 19 features | Model with 15 features | Model with 10 features | Model with 5 features |
|---|---|---|---|---|
| n_estimators | 600 | 1000 | 1000 | 1000 |
| RMSE | 0.0722 | 0.0718 | 0.0713 | 0.0710 |
| R2 | 0.6016 | 0.6054 | 0.6111 | 0.6147 |
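The random-forest rows above vary only n_estimators; a sketch under the same caveats:

```python
# Sketch of a random-forest fit; n_estimators values come from the table,
# the remaining settings are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

def fit_forest(X_train, y_train, X_test, y_test, n_estimators=1000):
    rf = RandomForestRegressor(n_estimators=n_estimators, random_state=0, n_jobs=-1)
    rf.fit(X_train, y_train)
    pred = rf.predict(X_test)
    return np.sqrt(mean_squared_error(y_test, pred)), r2_score(y_test, pred)
```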
XGBoost best parameter setting
| Hyperparameter | Candidate values |
|---|---|
| Colsample by tree | 0.5, 0.6, 0.7, 0.8, 0.9 |
| n estimators | 500, 600 |
| Gamma | 0 |
| Max depth | 3, 5 |
| Reg lambda | 1 |
| Eta | 0.01, 0.05 |
The optimal hyperparameter setting, which provides the maximum R2, is obtained with the following values: colsample by tree = 1, n estimators = 700, gamma = 1, max tree depth = 4, reg lambda = 1.5, and eta = 0.1
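A grid search over these hyperparameters might be set up as below; the candidate lists merge the values in the table with the optimal values quoted above, and the five-fold cross-validation scheme is an assumption.

```python
# Sketch of an XGBoost hyperparameter grid search scored on R2; the CV scheme is an
# assumption, and the grids merge the table's values with the reported optimum.
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

param_grid = {
    "colsample_bytree": [0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    "n_estimators": [500, 600, 700],
    "gamma": [0, 1],
    "max_depth": [3, 4, 5],
    "reg_lambda": [1, 1.5],
    "learning_rate": [0.01, 0.05, 0.1],  # "eta" in XGBoost's native parameter naming
}
search = GridSearchCV(XGBRegressor(objective="reg:squarederror"),
                      param_grid, scoring="r2", cv=5)
# search.fit(X_train, y_train); search.best_params_ then holds the selected setting
```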
Performance metrics with XGBoost
| XGBoost | Model with 19 features | Model with 15 features | Model with 10 features | Model with 5 features |
|---|---|---|---|---|
| RMSE | 0.0599 | 0.1000 | 0.0991 | 0.0990 |
| R2 | 0.7258 | 0.2340 | 0.2488 | 0.2495 |
Fig. 4 Feature importance bar charts for several machine learning algorithms
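Bar charts like those in Fig. 4 can be produced for any fitted tree-based model that exposes feature_importances_ (DT, RF, XGBoost); the plotting details below are assumptions.

```python
# Sketch of a feature-importance bar chart; works for any fitted model exposing
# feature_importances_ (decision tree, random forest, XGBoost).
import pandas as pd
import matplotlib.pyplot as plt

def plot_importance(model, feature_names, title="Feature importance"):
    imp = pd.Series(model.feature_importances_, index=feature_names).sort_values()
    imp.plot(kind="barh", title=title)
    plt.xlabel("Importance")
    plt.tight_layout()
    plt.show()
```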
Performance metrics with MLNN
| MLNN | Model with 19 features | Model with 15 features | Model with 10 features | Model with 5 features |
|---|---|---|---|---|
| Hidden layers | 3 | 5 | 5 | 5 |
| RMSE | 0.1016 | 0.1086 | 0.0991 | 0.1136 |
| R2 | 0.2105 | 0.0974 | 0.060 | 0.0121 |
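A multi-layer neural network regressor along these lines matches the setup above; the number of hidden layers follows the table, but the layer width and training settings are assumptions.

```python
# Sketch of an MLNN regressor; only the hidden-layer count follows the table.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error, r2_score

def fit_mlnn(X_train, y_train, X_test, y_test, n_hidden_layers=3, width=32):
    mlnn = MLPRegressor(hidden_layer_sizes=(width,) * n_hidden_layers,
                        max_iter=2000, random_state=0)
    mlnn.fit(X_train, y_train)
    pred = mlnn.predict(X_test)
    return np.sqrt(mean_squared_error(y_test, pred)), r2_score(y_test, pred)
```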
Performance metrics comparison for the algorithms
| MLR | KNN | SVR | Decision trees | Random forest | XGBoost | |
|---|---|---|---|---|---|---|
| RMSE | 0.1036 | 0.1064 | 0.0796 | 0.0899 | 0.071 | 0.0599 |
| % Improvement in RMSE | 0 | − 2.7027 | 23.1660 | 13.2239 | 31.4672 | 42.1815 |
| R2 | 0.1626 | 0.1228 | 0.5152 | 0.3822 | 0.6147 | 0.7258 |
| % Improvement in R2 | 32.4104 | 0 | 319.5440 | 211.2378 | 400.5700 | 491.0423 |
The algorithms applied in the study are evaluated based on their RMSE and R2 values. The minimum RMSE is obtained with the XGBoost algorithm; it is 42.18% lower than that of the MLR algorithm, which serves as the benchmark. The maximum R2 is also obtained with XGBoost; it is 491.04% higher than that of the KNN algorithm, which has the lowest R2 of all the algorithms
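The improvement percentages in the comparison table follow directly from the RMSE and R2 rows: RMSE improvements are measured against MLR and R2 improvements against KNN, as the short check below shows.

```python
# Reproducing the improvement percentages: RMSE relative to MLR, R2 relative to KNN.
rmse = {"MLR": 0.1036, "KNN": 0.1064, "SVR": 0.0796,
        "DT": 0.0899, "RF": 0.0710, "XGBoost": 0.0599}
r2 = {"MLR": 0.1626, "KNN": 0.1228, "SVR": 0.5152,
      "DT": 0.3822, "RF": 0.6147, "XGBoost": 0.7258}

rmse_improvement = {m: 100 * (rmse["MLR"] - v) / rmse["MLR"] for m, v in rmse.items()}
r2_improvement = {m: 100 * (v - r2["KNN"]) / r2["KNN"] for m, v in r2.items()}
print(round(rmse_improvement["XGBoost"], 2), round(r2_improvement["XGBoost"], 2))  # 42.18 491.04
```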