Yeung-Ja James Goo, Der-Jang Chi, Zong-De Shen.
Abstract
This study aims to establish rigorous and reliable going concern doubt (GCD) prediction models. It first uses the least absolute shrinkage and selection operator (LASSO) to select variables and then applies data mining techniques, namely neural network (NN), classification and regression tree (CART), and support vector machine (SVM), to build the prediction models. The sample comprises 48 GCD listed companies and 144 non-GCD (NGCD) listed companies from 2002 to 2013 in the TEJ database. Fivefold cross validation is used to assess prediction accuracy. According to the empirical results, the prediction accuracy of the LASSO-NN model is 88.96% (Type I error rate 12.22%; Type II error rate 7.50%), that of the LASSO-CART model is 88.75% (Type I error rate 13.61%; Type II error rate 14.17%), and that of the LASSO-SVM model is 89.79% (Type I error rate 10.00%; Type II error rate 15.83%).
Keywords: Classification and regression tree (CART); Data mining; Going concern prediction; Least absolute shrinkage and selection operator (LASSO); Neural network (NN); Support vector machine (SVM)
Year: 2016 PMID: 27186503 PMCID: PMC4846611 DOI: 10.1186/s40064-016-2186-5
Source DB: PubMed Journal: Springerplus ISSN: 2193-1801
Fig. 1 Neural network model
Samples
| Year | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GCD samples | 20 | 2 | 4 | 4 | 4 | 1 | 4 | 2 | 2 | 1 | 2 | 2 | 48 |
| NGCD samples | 60 | 6 | 12 | 12 | 12 | 3 | 12 | 6 | 6 | 3 | 6 | 6 | 144 |
Research variables
| No. | Variable description/Definition or formula | Sources |
|---|---|---|
| X1 | Total assets: Natural logarithm of total assets | Zhou et al. ( |
| X2 | Net sales: Natural logarithm of net sales | Tang and Firth ( |
| X3 | Current ratio: Current assets/Current liabilities | Lin ( |
| X4 | Debt ratio: Total liabilities/Total assets | Lin ( |
| X5 | Current assets: Natural logarithm of current assets | Korol ( |
| X6 | Undistributed surplus: Natural logarithm of undistributed surplus | Chen and Lee ( |
| X7 | Long term liabilities: Natural logarithm of long term liabilities | Korol ( |
| X8 | Inventory: Natural logarithm of inventory | Salehi and Fard ( |
| X9 | Total equity: Natural logarithm of total equity | Korol ( |
| X10 | Total liabilities: Natural logarithm of total liabilities | Chen and Lee ( |
| X11 | Net profit before tax: Income before tax | Chen et al. ( |
| X12 | Operating cash flow: Cash flow from operating activities | Jiang and Habib ( |
| X13 | Accounts receivable turnover: Net sales/Average accounts receivable | Sun and Li ( |
| X14 | Inventory turnover: Cost of goods sold/Average inventory | Zhou et al. ( |
| X15 | Stockholding ratio of directors and supervisors: Number of stocks held by directors and supervisors/Total number of common stock outstanding | Chen and Lee ( |
| X16 | Big CPA firm (Big 4 in Taiwan): 1 if the company is audited by a Big 4 firm, 0 otherwise | Jiang and Habib ( |
| X17 | Change of CPA firm: 1 if the CPA firm changed, 0 otherwise | Anandarajan and Anandarajan ( |
| X18 | Current liabilities: Natural logarithm of current liabilities | Salehi and Fard ( |
| X19 | Operating income: Natural logarithm of operating income | Salehi and Fard ( |
| X20 | Total assets turnover: Net Sales/Average total assets | Sun and Li ( |
| X21 | Earnings before interest and tax (EBIT) | Salehi and Fard ( |
| X22 | Return on assets (ROA): [Net income + interest expense × (1 − tax rate)]/Average total assets | Martens et al. ( |
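The ratio definitions in the table above can be sketched as small functions. This is a minimal illustration only: the `Financials` container, its field names, and the sample figures are hypothetical and do not come from the TEJ data.

```python
import math
from dataclasses import dataclass


@dataclass
class Financials:
    """Hypothetical statement figures for one firm-year (same currency unit)."""
    current_assets: float
    current_liabilities: float
    total_liabilities: float
    total_assets: float
    avg_total_assets: float
    net_sales: float
    net_income: float
    interest_expense: float
    tax_rate: float


def log_total_assets(f: Financials) -> float:
    """X1: natural logarithm of total assets."""
    return math.log(f.total_assets)


def current_ratio(f: Financials) -> float:
    """X3: current assets / current liabilities."""
    return f.current_assets / f.current_liabilities


def debt_ratio(f: Financials) -> float:
    """X4: total liabilities / total assets."""
    return f.total_liabilities / f.total_assets


def total_assets_turnover(f: Financials) -> float:
    """X20: net sales / average total assets."""
    return f.net_sales / f.avg_total_assets


def return_on_assets(f: Financials) -> float:
    """X22: [net income + interest expense * (1 - tax rate)] / average total assets."""
    return (f.net_income + f.interest_expense * (1 - f.tax_rate)) / f.avg_total_assets


# Made-up figures, used only to exercise the formulas.
firm = Financials(
    current_assets=400.0, current_liabilities=250.0,
    total_liabilities=600.0, total_assets=1000.0, avg_total_assets=950.0,
    net_sales=800.0, net_income=50.0, interest_expense=20.0, tax_rate=0.17,
)
print(current_ratio(firm), debt_ratio(firm))
print(round(total_assets_turnover(firm), 4), round(return_on_assets(firm), 4))
```

Note that the descriptive statistics report the debt ratio on a percentage scale, so the fraction returned by `debt_ratio` would need to be multiplied by 100 to be comparable.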
Fig. 2 Research process
LASSO variables’ screening process
| Steps | Work-G1 (AIC) | Work-G2ᵃ (AIC) | Work-G3 (AIC) | Work-G4ᵇ (AIC) | Work-G5 (AIC) |
|---|---|---|---|---|---|
| 1 | X4 (−77.5676) | X4 (−94.7118) | X4 (−66.0500) | X4 (−83.1760) | X4 (−71.2937) |
| 2 | X22 (−108.2326) | X6 (−93.3790) | X6 (−80.9976) | X22 (−83.9267) | X22 (−115.3547) |
| 3 | X11 (−116.1226) | X22 (−94.0645) | X22 (−79.4015) | X6 (−87.0297) | X11 (−125.4222) |
| 4 | X6 (−127.3604) | X19 (−93.0137) | X19 (−129.3612) | X20 (−85.2646) | X20 (−123.5628) |
| 5 | X20 (−146.4499) | X20 (−100.9320) | X13 (−134.4688) | X15 (−94.1284) | X6 (−124.3376) |
| 6 | X7 (−152.5126) | X15 (−101.0658) | X14 (−132.8479) | X11 (−95.2185) | X14 (−133.9785) |
| 7 | X5 (−152.5561) | X17 (−100.642) | X20 (−134.1510) | X14 (−107.4634) | X16 (−134.0137) |
| 8 | | X14 (−104.7244) | X17 (−136.4395) | X1 (−120.0362) | |
| 9 | | X11 (−102.8433) | X16 (−142.2861) | X9 (−120.4143) | |
| 10 | | X13 (−107.1809) | | | |
| 11 | | X5 (−107.8717) | | | |
| 12 | | X12 (−116.8996) | | | |
| 13 | | X16 (−124.2823) | | | |
ᵃ The X9 effect entered with an AIC value of −104.7244 and was removed at step 13, where the AIC value decreased from −107.8717 to −115.5186
ᵇ The X21 effect entered at step 5 with an AIC value of −93.7699 and was removed at step 9, where the AIC value decreased from −107.4634 to −112.5140
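One way to read the screening table is to look for the variables that LASSO retains in every work group. The sketch below takes that reading as an assumption; the source does not state the final selection rule here. It also assumes that the entries for steps 8 to 13 belong to Work-G2, Work-G3 and Work-G4, which is inferred from the continuity of the AIC values.

```python
# Variables listed for each work group in the screening table.
# Column assignment for steps 8-13 is an assumption (see the lead-in).
selected = {
    "Work-G1": ["X4", "X22", "X11", "X6", "X20", "X7", "X5"],
    "Work-G2": ["X4", "X6", "X22", "X19", "X20", "X15", "X17",
                "X14", "X11", "X13", "X5", "X12", "X16"],
    "Work-G3": ["X4", "X6", "X22", "X19", "X13", "X14", "X20", "X17", "X16"],
    "Work-G4": ["X4", "X22", "X6", "X20", "X15", "X11", "X14", "X1", "X9"],
    "Work-G5": ["X4", "X22", "X11", "X20", "X6", "X14", "X16"],
}

# Keep only the variables that appear in all five work groups.
common = set.intersection(*(set(names) for names in selected.values()))

# Sort by variable number so that X4 comes before X20.
ordered = sorted(common, key=lambda name: int(name[1:]))
print(ordered)  # ['X4', 'X6', 'X20', 'X22']
```

The four variables returned are the same four that appear as input variables in the descriptive statistics and correlation tables.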
Fig. 3 LASSO variables screening process Work-Group 1
Fig. 4 LASSO variables screening process Work-Group 2
Fig. 5 LASSO variables screening process Work-Group 3
Fig. 6 LASSO variables screening process Work-Group 4
Fig. 7 LASSO variables screening process Work-Group 5
Descriptive statistics of input variables
| No. | Variable | N | Mean | SD | Min | Max |
|---|---|---|---|---|---|---|
| X4 | Debt ratio | 192 | 51.0965 | 21.6263 | 4.8700 | 101.9700 |
| X6 | Undistributed surplus | 192 | −346,749.52 | 2,210,187.98 | −22,801,544.00 | 5,561,297.00 |
| X20 | Total assets turnover | 192 | 0.8593 | 0.6895 | 0.0300 | 4.8400 |
| X22 | Return on assets (ROA) | 192 | −0.0756 | 0.2762 | −2.0997 | 0.3695 |
Correlation of input variables
| No. | Input variable | X4 | X6 | X20 | X22 |
|---|---|---|---|---|---|
| X4 | Debt ratio | 1 | – | – | – |
| X6 | Undistributed surplus | −0.3137 | 1 | – | – |
| | p value | <0.0001 | | | |
| X20 | Total assets turnover | 0.0048 | 0.2430 | 1 | – |
| | p value | 0.9478 | 0.0007 | | |
| X22 | Return on assets (ROA) | −0.2752 | 0.2146 | 0.1941 | 1 |
| | p value | 0.0001 | 0.0028 | 0.0070 | |
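The figure printed under each coefficient appears to be a two-sided p value for a Pearson correlation with N = 192; the table does not say so, so treat that as an assumption. Under it, the significance can be approximated from the coefficient alone. The sketch below replaces the t distribution with a standard normal, which is reasonable at 190 degrees of freedom but gives slightly smaller p values than an exact test.

```python
import math


def pearson_p_value_approx(r: float, n: int) -> float:
    """Approximate two-sided p value for H0: correlation = 0.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and treats t as standard
    normal, an approximation that is adequate for large n.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for the normal CDF Phi.
    return math.erfc(abs(t) / math.sqrt(2.0))


N = 192
for pair, r in [("X4-X6", -0.3137), ("X4-X20", 0.0048), ("X6-X20", 0.2430)]:
    print(pair, round(pearson_p_value_approx(r, N), 4))
```

The near-zero correlation between X4 and X20 yields a p value close to 0.95, while the other two pairs fall well below 0.01, in line with the table.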
LASSO–NN model—the fivefold cross validation results
| Subset | Actual group | Training: predicted NGCD | Training: predicted GCD | Training: hit ratio (%) | Training: Type I error (%) | Training: Type II error (%) | Testing: predicted NGCD | Testing: predicted GCD | Testing: hit ratio (%) | Testing: Type I error (%) | Testing: Type II error (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NGCD | 71 | 1 | 98.96 | 1.39 | 0.00 | 70 | 2 | 94.79 | 2.78 | 12.50 |
| | GCD | 0 | 24 | | | | 3 | 21 | | | |
| 2 | NGCD | 70 | 2 | 90.62 | 2.78 | 29.17 | 60 | 12 | 85.42 | 16.67 | 8.33 |
| | GCD | 7 | 17 | | | | 2 | 22 | | | |
| 3 | NGCD | 69 | 3 | 92.71 | 4.17 | 4.17 | 64 | 8 | 90.62 | 11.11 | 4.17 |
| | GCD | 1 | 23 | | | | 1 | 23 | | | |
| 4 | NGCD | 60 | 12 | 87.50 | 16.67 | 12.50 | 59 | 13 | 85.42 | 18.06 | 4.17 |
| | GCD | 3 | 21 | | | | 1 | 23 | | | |
| 5 | NGCD | 70 | 2 | 96.88 | 2.78 | 4.17 | 63 | 9 | 88.54 | 12.50 | 8.33 |
| | GCD | 1 | 23 | | | | 2 | 22 | | | |
| Avg. | | | | 93.33 | 5.56 | 10.00 | | | 88.96 | 12.22 | 7.50 |
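The three rates reported for each subset follow from the two rows of predicted counts. The sketch below assumes that the first row of each subset holds the actual NGCD firms and the second row the actual GCD firms, with predicted NGCD in the first column. That reading is inferred from the arithmetic (for example, 1/72 = 1.39%) rather than stated in the table. Under it, the Type I error is the share of NGCD firms classified as GCD and the Type II error is the share of GCD firms classified as NGCD.

```python
def fold_metrics(ngcd_row, gcd_row):
    """Return (hit ratio, Type I error, Type II error) in percent.

    ngcd_row: (predicted NGCD, predicted GCD) counts for actual NGCD firms
    gcd_row:  (predicted NGCD, predicted GCD) counts for actual GCD firms
    """
    ngcd_as_ngcd, ngcd_as_gcd = ngcd_row
    gcd_as_ngcd, gcd_as_gcd = gcd_row
    total = sum(ngcd_row) + sum(gcd_row)

    hit_ratio = 100.0 * (ngcd_as_ngcd + gcd_as_gcd) / total
    type_1 = 100.0 * ngcd_as_gcd / sum(ngcd_row)   # NGCD flagged as GCD
    type_2 = 100.0 * gcd_as_ngcd / sum(gcd_row)    # GCD missed as NGCD
    return round(hit_ratio, 2), round(type_1, 2), round(type_2, 2)


# LASSO-NN, subset 1, testing set.
print(fold_metrics((70, 2), (3, 21)))  # (94.79, 2.78, 12.5)
```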
LASSO–CART model—the fivefold cross validation results
| Subset | Actual group | Training: predicted NGCD | Training: predicted GCD | Training: hit ratio (%) | Training: Type I error (%) | Training: Type II error (%) | Testing: predicted NGCD | Testing: predicted GCD | Testing: hit ratio (%) | Testing: Type I error (%) | Testing: Type II error (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NGCD | 66 | 6 | 93.75 | 8.33 | 0.00 | 68 | 4 | 93.75 | 5.56 | 8.33 |
| | GCD | 0 | 24 | | | | 2 | 22 | | | |
| 2 | NGCD | 70 | 2 | 93.75 | 2.78 | 16.67 | 57 | 15 | 86.46 | 20.83 | 16.67 |
| | GCD | 4 | 20 | | | | 4 | 20 | | | |
| 3 | NGCD | 67 | 5 | 92.71 | 6.94 | 8.33 | 65 | 7 | 90.62 | 9.72 | 20.83 |
| | GCD | 2 | 22 | | | | 5 | 19 | | | |
| 4 | NGCD | 69 | 3 | 92.71 | 4.17 | 16.67 | 60 | 12 | 83.33 | 16.67 | 12.50 |
| | GCD | 4 | 20 | | | | 3 | 21 | | | |
| 5 | NGCD | 72 | 0 | 94.79 | 0.00 | 20.83 | 61 | 11 | 89.58 | 15.28 | 12.50 |
| | GCD | 5 | 19 | | | | 3 | 21 | | | |
| Avg. | | | | 93.54 | 4.44 | 12.50 | | | 88.75 | 13.61 | 14.17 |
LASSO–SVM model—the fivefold cross validation results
| Subset | Actual group | Training: predicted NGCD | Training: predicted GCD | Training: hit ratio (%) | Training: Type I error (%) | Training: Type II error (%) | Testing: predicted NGCD | Testing: predicted GCD | Testing: hit ratio (%) | Testing: Type I error (%) | Testing: Type II error (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NGCD | 71 | 1 | 96.88 | 1.39 | 8.33 | 66 | 6 | 91.67 | 8.33 | 8.33 |
| | GCD | 2 | 22 | | | | 2 | 22 | | | |
| 2 | NGCD | 70 | 2 | 90.62 | 2.78 | 29.17 | 66 | 6 | 89.58 | 8.33 | 16.67 |
| | GCD | 7 | 17 | | | | 4 | 20 | | | |
| 3 | NGCD | 71 | 1 | 92.71 | 1.39 | 25.00 | 66 | 6 | 88.54 | 8.33 | 20.83 |
| | GCD | 6 | 18 | | | | 5 | 19 | | | |
| 4 | NGCD | 68 | 4 | 87.50 | 5.56 | 33.33 | 62 | 10 | 86.46 | 13.89 | 12.50 |
| | GCD | 8 | 16 | | | | 3 | 21 | | | |
| 5 | NGCD | 72 | 0 | 96.88 | 0.00 | 12.50 | 70 | 2 | 92.71 | 2.78 | 20.83 |
| | GCD | 3 | 21 | | | | 5 | 19 | | | |
| Avg. | | | | 92.92 | 2.22 | 21.67 | | | 89.79 | 10.00 | 15.83 |
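The reported testing accuracies are plain averages of the five subset hit ratios. The sketch below recomputes them from the testing-set columns of the three cross validation tables; the dictionary layout and variable names are illustrative only.

```python
from statistics import fmean

# Testing-set hit ratios (%) for subsets 1 to 5, copied from the tables above.
testing_hit_ratio = {
    "LASSO-NN":   [94.79, 85.42, 90.62, 85.42, 88.54],
    "LASSO-CART": [93.75, 86.46, 90.62, 83.33, 89.58],
    "LASSO-SVM":  [91.67, 89.58, 88.54, 86.46, 92.71],
}

# Average over the five subsets, rounded to two decimals as in the tables.
averages = {model: round(fmean(values), 2)
            for model, values in testing_hit_ratio.items()}

best_model = max(averages, key=averages.get)
print(averages)     # {'LASSO-NN': 88.96, 'LASSO-CART': 88.75, 'LASSO-SVM': 89.79}
print(best_model)   # LASSO-SVM
```

The gap between the three averages is about one percentage point, so ranking on the hit ratio alone says little without the Type I and Type II error rates alongside it.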
Fig. 8 Weight of each node of the NN model
Fig. 9 Importance of variables
Statistical tests
| Statistical test method | Statistic | NN–CART | NN–SVM |
|---|---|---|---|
| Wilcoxon test | Z | −1.9335 | −2.0280 |
| | One-sided Pr < Z | 0.0266 | 0.0213 |
| | Two-sided Pr > \|Z\| | 0.0532* | 0.0426** |
| Kruskal–Wallis test | Chi-square | 4.1654 | 4.5570 |
| | DF | 1 | 1 |
| | Pr > Chi-square | 0.0413** | 0.0328** |
* Significant at P < 0.1; ** significant at P < 0.05; *** significant at P < 0.01
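The probabilities in the table can be related to the test statistics with the standard normal and chi-square distributions. The sketch below assumes that the Wilcoxon probabilities come from a normal approximation applied to the reported Z and that the Kruskal–Wallis probability is the upper tail of a chi-square distribution with one degree of freedom. The function names are illustrative and not taken from any statistics package.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def wilcoxon_p_values(z: float):
    """Return (one-sided, two-sided) probabilities for a reported Z."""
    one_sided = normal_cdf(-abs(z))
    return one_sided, 2.0 * one_sided


def chi_square_upper_tail_df1(x: float) -> float:
    """P(chi-square with 1 degree of freedom > x)."""
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the upper tail equals 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


for label, z, chi2 in [("NN-CART", -1.9335, 4.1654), ("NN-SVM", -2.0280, 4.5570)]:
    one_sided, two_sided = wilcoxon_p_values(z)
    print(label, round(one_sided, 4), round(two_sided, 4),
          round(chi_square_upper_tail_df1(chi2), 4))
```

Under these assumptions, Z = −1.9335 gives a one-sided probability of about 0.0266 and a two-sided probability of about 0.0532, and Z = −2.0280 gives about 0.0213 and 0.0426.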