A Gaspar-Cunha, G Recio, L Costa, C Estébanez.
Abstract
Bankruptcy prediction is a broad area of finance and accounting. Its importance lies in its relevance to creditors and investors, who need to evaluate the likelihood that a company will go bankrupt. As companies become more complex, they develop sophisticated schemes to hide their real situation, which makes it harder to estimate the credit risk associated with counterparties or to predict bankruptcy. Evolutionary algorithms have proven to be an excellent tool for complex problems in finance and economics that involve a large number of irrelevant features. This paper provides a methodology for feature selection in the classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimises the number of features and maximises a classifier quality measure (e.g., accuracy). The proposed methodology is self-adaptive: it applies the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different data sets. The results showed the utility of self-adapting the classifier parameters.
Year: 2014 PMID: 24707201 PMCID: PMC3953468 DOI: 10.1155/2014/314728
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
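The two objectives named in the abstract (fewer features, higher classifier quality) lead to a set of trade-off solutions rather than a single best one. The following is a minimal sketch of how nondominated solutions could be filtered from a list of candidates. The function names and the candidate values are hypothetical and chosen only for illustration; they are not results from the paper, and this is not the RPSGA algorithm itself.

```python
from typing import List, Tuple

# A candidate is (number_of_features, accuracy).
# The values used below are made up for illustration only.
Candidate = Tuple[int, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a dominates b.

    The number of features is minimised and the accuracy is maximised:
    a must be no worse on both objectives and strictly better on one.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def nondominated(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(set(front))


# Hypothetical (NF, Acc) pairs.
candidates = [(3, 0.70), (4, 0.75), (5, 0.74), (5, 0.78), (7, 0.77)]
front = nondominated(candidates)
print(front)
```

In this sketch, a candidate with more features survives only if it also achieves higher accuracy than every smaller candidate, which is the shape one would expect from the Pareto fronts listed in the tables.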
Figure 1: Confusion matrix.
Figure 2: ROC curve.
Algorithm 1: Reduced Pareto set genetic algorithm (RPSGA).
Set of features considered for the industrial French companies.
| Feature | Designation |
|---|---|
| F1 | Number of employees |
| F2 | Capital employed/fixed assets |
| F3 | Financial debt/capital employed |
| F4 | Depreciation of tangible assets |
| F5 | Working capital/current assets |
| F6 | Current ratio |
| F7 | Liquidity ratio |
| F8 | Stock turnover days |
| F9 | Collection period |
| F10 | Credit Period |
| F11 | Turnover per employee (thousands euros) |
| F12 | Interest/turnover |
| F13 | Debt period days |
| F14 | Financial debt/equity |
| F15 | Financial debt/cashflow |
| F16 | Cashflow/turnover |
| F17 | Working capital/turnover (days) |
| F18 | Net current assets/turnover (days) |
| F19 | Working capital needs/turnover |
| F20 | Export |
| F21 | Value added per employee |
| F22 | Total assets/turnover |
| F23 | Operating profit margin |
| F24 | Net profit margin |
| F25 | Added value margin |
| F26 | Part of employees |
| F27 | Return on capital employed |
| F28 | Return on total assets |
| F29 | EBIT margin |
| F30 | EBITDA margin |
Set of features considered for the German Credit.
| Feature | Designation |
|---|---|
| F1 | Status of existing checking account |
| F2 | Duration in months |
| F3 | Credit history |
| F4 | Purpose |
| F5 | Credit amount |
| F6 | Savings account/bonds |
| F7 | Present employment since |
| F8 | Instalment rate in percentage of disposable income |
| F9 | Personal status and sex |
| F10 | Other debtors/guarantors |
| F11 | Present residence since |
| F12 | Property |
| F13 | Age in years |
| F14 | Other instalment plans |
| F15 | Housing |
| F16 | Number of existing credits at this bank |
| F17 | Job |
| F18 | Number of people being liable to provide maintenance for |
| F19 | Telephone |
| F20 | Foreign worker |
Set of computational experiments (N1 is the maximum number of features in the initial population; X means experiment not done).
| Exp | SVM type | Objectives | N1 |
|---|---|---|---|
| 1 | C-SVC14 | NF + Acc | 30-20-14 |
| 2 | C-SVC01 | NF + Acc | 30-20-14 |
| 3 | C-SVC02 | NF + | 30-20-14 |
| 4 | C-SVC03 | NF + | 30-20-14 |
| 5 | C-SVC04 | NF + | 30-20-14 |
| 6 | C-SVC05 | NF + | 30-20-14 |
| 7 | C-SVC06 | NF + | 30-20-14 |
| 8 | C-SVC07 | NF + | 30-20-14 |
| 9 | C-SVC08 | NF + | 5-5-5 |
| 10 | C-SVC09 | NF + | 15-15-10 |
| 11 | C-SVC10 | NF + | 25- |
| 12 | | NF + Acc | 30-20-14 |
| 13 | | NF + | 30-20-14 |
| 14 | | NF + | 30-20-14 |
Figure 3: Run-a of Experiment 2 (initial population and nondominated solutions of the final population).
Optimal solutions for Run-a of Experiment 2 (Diane 2005 database).
| NF | Features | Acc | TM | TF | | |
|---|---|---|---|---|---|---|
| 3 | 1, 14, 24 | 76.3% | H | 50.4% | 0.501 | 10.4 |
| 4 | 1, 14, 16, 24 | 78.6% | H | 50.6% | 0.403 | 39.6 |
| 5 | 1, 8, 14, 21, 24 | 79.5% | H | 52.0% | 0.280 | 211.0 |
| 6 | 1, 8, 14, 15, 16, 24 | 79.8% | H | 52.2% | 0.297 | 173.9 |
| 7 | 1, 8, 12, 14, 15, 16, 24 | 80.3% | H | 54.4% | 0.543 | 112.8 |
| 9 | 1, 8, 12, 14, 21, 23, 24, 25, 27 | 81.2% | H | 52.2% | 0.134 | 86.9 |
| 10 | 1, 8, 12, 14, 15, 16, 21, 23, 24, 25 | 81.4% | H | 53.6% | 0.343 | 114.6 |
| 11 | 1, 8, 12, 14, 15, 16, 21, 23, 24, 25, 27 | 81.7% | H | 52.2% | 0.164 | 59.7 |
| 14 | 1, 5, 7, 8, 9, 12, 14, 15, 16, 18, 21, 23, 24, 25 | 81.9% | H | 52.1% | 0.539 | 23.9 |
| 15 | 1, 2, 5, 7, 8, 9, 12, 14, 15, 16, 18, 21, 23, 24, 25 | 82.5% | H | 52.3% | 0.384 | 26.8 |
| 16 | 1, 2, 5, 6, 7, 8, 9, 12, 14, 15, 16, 18, 21, 23, 24, 27 | 82.8% | H | 52.2% | 0.405 | 24.5 |
| 17 | 1, 2, 5, 6, 7, 8, 9, 12, 14, 15, 16, 18, 21, 23, 24, 25, 27 | 83.5% | H | 52.1% | 0.354 | 24.8 |
Figure 4: All runs of Experiment 2 (nondominated solutions of the final population).
Optimal solutions for Experiment 2 (Diane 2005 database).
| NF | Run | Features | Acc | TM | TF | | |
|---|---|---|---|---|---|---|---|
| 2 | f | 7, 21 | 75.6% | H | 73.9% | 0.066 | 373.1 |
| 3 | f | 7, 21, 23 | 80.7% | H | 74.5% | 0.127 | 668.5 |
| 4 | f | 7, 21, 22, 29 | 80.8% | H | 74.3% | 0.0102 | 855.8 |
| 5 | f | 7, 13, 21, 23, 29 | 81.2% | H | 74.4% | 0.0104 | 844.4 |
| 6 | e | 6, 12, 13, 19, 21, 29 | 81.8% | H | 75.3% | 0.373 | 41.5 |
| 8 | f | 7, 8, 13, 18, 21, 23, 27, 28 | 83.0% | H | 75.5% | 0.195 | 754.0 |
| 9 | f | 7, 8, 13, 18, 21, 22, 23, 27, 28 | 84.2% | H | 75.1% | 0.179 | 901.7 |
| 10 | f | 7, 8, 10, 13, 18, 21, 22, 23, 27, 28 | 85.8% | H | 75.3% | 0.156 | 866.4 |
Figure 5: Optimal Pareto fronts for Diane 2005 data (10 runs).
Figure 6: Optimal Pareto fronts for Diane 2005 data (10 runs).
Figure 7: Optimal Pareto fronts for Diane 2005 data (10 runs).
Figure 8: Optimal Pareto fronts for Diane 2006 data (10 runs).
Figure 9: Optimal Pareto fronts for German Credit data (10 runs).
Figure 10: Optimal Pareto fronts for Australian Credit data (10 runs).
Results summary for Experiments 1, 2, and 12 and for all databases.
| Data set | Experiment | Condition | Run | NF | Acc | Features | TM | TF | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Diane05 | 1 | Best | e | 11 | 81.94% | 7, 8, 10, 12, 14, 18, 21, 22, 24, 27, 30 | H | 70.00% | 0.10 | 10.00 |
| | | NF ≤ 10 | d | 10 | 81.67% | 3, 8, 10, 14, 15, 19, 21, 22, 24, 30 | H | 70.00% | 0.10 | 10.00 |
| | | NF ≤ 5 | e | 5 | 80.00% | 7, 8, 18, 21, 30 | H | 70.00% | 0.10 | 10.00 |
| | 2 | Best | f | 10 | 85.81% | 7, 8, 10, 13, 18, 21, 22, 23, 27, 28 | H | 75.32% | 0.16 | 866.44 |
| | | NF ≤ 10 | f | 10 | 85.81% | 7, 8, 10, 13, 18, 21, 22, 23, 27, 28 | H | 75.32% | 0.16 | 866.44 |
| | | NF ≤ 5 | f | 5 | 81.17% | 7, 13, 21, 23, 29 | H | 74.38% | 0.01 | 844.37 |
| | 12 | Best | b | 11 | 85.57% | 7, 10, 12, 13, 15, 16, 18, 22, 23, 24, 27 | H | 75.16% | 0.77 | 0.46 |
| | | NF ≤ 10 | e | 9 | 84.56% | 5, 6, 10, 12, 13, 19, 21, 22, 29 | H | 75.21% | 0.59 | 0.47 |
| | | NF ≤ 5 | e | 5 | 80.21% | 1, 10, 12, 13, 29 | H | 75.94% | 1.90 | 0.49 |
| Diane06 | 1 | Best | e | 12 | 94.17% | 1, 8, 9, 10, 11, 12, 14, 18, 20, 21, 24, 26 | H | 70.00% | 0.10 | 10.00 |
| | | NF ≤ 10 | e | 10 | 93.06% | 1, 8, 9, 10, 11, 14, 20, 21, 24, 26 | H | 70.00% | 0.10 | 10.00 |
| | | NF ≤ 5 | e | 5 | 91.94% | 1, 9, 11, 14, 24 | H | 70.00% | 0.10 | 10.00 |
| | 2 | Best | b | 11 | 95.99% | 1, 10, 11, 12, 15, 19, 21, 24, 25, 27, 29 | H | 73.04% | 0.13 | 697.55 |
| | | NF ≤ 10 | b | 10 | 95.73% | 1, 10, 11, 12, 15, 19, 21, 24, 25, 27 | H | 72.63% | 0.21 | 705.63 |
| | | NF ≤ 5 | a | 5 | 92.38% | 11, 14, 19, 24, 28 | H | 74.86% | 0.17 | 281.47 |
| | 12 | Best | e | 17 | 95.95% | 1, 2, 8, 10, 11, 12, 13, 14, 19, 20, 21, 22, 23, 24, 25, 29, 30 | H | 75.29% | 2.45 | 0.18 |
| | | NF ≤ 10 | a | 9 | 95.43% | 1, 10, 11, 12, 14, 15, 19, 21, 24 | H | 72.70% | 3.45 | 0.27 |
| | | NF ≤ 5 | a | 5 | 93.92% | | H | 75.33% | 2.08 | 0.34 |
| German | 1 | Best | d | 14 | 80.00% | 1, 2, 3, 5, 6, 10, 12, 14, 15, 16, 17, 18, 19, 20 | H | 70.00% | 0.10 | 10.00 |
| | | NF ≤ 10 | e | 10 | 78.67% | 1, 2, 3, 5, 6, 7, 10, 12, 18, 19 | H | 70.00% | 0.10 | 10.00 |
| | | NF ≤ 5 | a | 5 | 75.33% | 1, 3, 5, 8, 19 | H | 70.00% | 0.10 | 10.00 |
| | 2 | Best | h | 16 | 81.25% | 1, 2, 3, 5, 6, 738310, 11, 12, 14, 15, 17, 18, 19, 20 | H | 68.04% | 0.01 | 534.75 |
| | | NF ≤ 10 | a | 10 | 78.99% | 1, 2, 3, 4, 6, 8, 11, 12, 15, 19 | H | 58.63% | 0.05 | 62.26 |
| | | NF ≤ 5 | c | 5 | 77.16% | 1, 3, 5, 7, 12 | H | 58.45% | 0.17 | 72.52 |
| | 12 | Best | h | 7 | 80.29% | 1, 2, 3, 5, 8, 11, 14 | H | 58.40% | 0.14 | 0.45 |
| | | NF ≤ 10 | h | 7 | 80.29% | 1, 2, 3, 5, 8, 11, 14 | H | 58.40% | 0.14 | 0.45 |
| | | NF ≤ 5 | b | 5 | 77.43% | | H | 77.48% | 1.41 | 0.45 |
| Australian | 1 | Best | h | 7 | 91.35% | 1, 6, 8, 10, 12, 13, 14 | H | 70.00% | 0.10 | 10.00 |
| | | NF ≤ 5 | h | 5 | 89.42% | 6, 8, 9, 10, 14 | H | 70.00% | 0.10 | 10.00 |
| | 2 | Best | g | 9 | 93.00% | 2, 4, 5, 6, 8, 10, 11, 13, 14 | H | 70.94% | 0.02 | 180.23 |
| | | NF ≤ 5 | e | 5 | 92.16% | | H | 70.48% | 0.58 | 321.06 |
| | 12 | Best | j | 10 | 92.86% | 2, 3, 4, 5, 6, 8, 10, 11, 12, 14 | H | 77.61% | 2.06 | 0.34 |
| | | NF ≤ 5 | b | 5 | 92.00% | 3, 4, 5, 8, 10 | H | 71.02% | 2.65 | 0.34 |
Figure 11: ROC curves for Diane 05 data (best of 10 runs).
Figure 12: ROC curves for Diane 05 data (best of 10 runs).
Results summary for Experiments 8, 9, 10, 11, and 14 and for all databases.
| Dataset | Experiment | ROC area | Condition | Run | NF | | | Features | TM | TF | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Diane05 | 8 | 0.872 | | c | 12 | 77.1% | 9.4% | 8, 10, 12, 13, 15, 16, 18, 22, 23, 24, 25, 27 | H | 77.6% | 0.62 | 108.73 |
| | | | NF ≤ 5 | i | 4 | 64.6% | 6.4% | 15, 18, 27, 28 | H | 75.2% | 0.03 | 336.98 |
| | 9 | 0.859 | | b | 10 | 72.0% | 8.8% | 7, 8, 10, 12, 13, 19, 22, 23, 25, 29 | H | 76.3% | 0.26 | 701.29 |
| | | | NF ≤ 5 | c | 5 | 63.0% | 7.2% | 13, 14, 18, 27, 28 | H | 76.0% | 0.05 | 29.51 |
| | 10 | 0.876 | | d | 7 | 72.0% | 8.8% | 8, 10, 11, 16, 22, 24, 27 | H | 76.2% | 0.41 | 19.29 |
| | | | NF ≤ 5 | c | 5 | 64.5% | 7.3% | | H | 78.5% | 0.22 | 107.89 |
| | 11 | 0.877 | | a | 18 | 77.0% | 8.6% | 5, 6, 7, 8, 10, 11, 12, 13, 16, 18, 19, 21, 22, 23, 24, 26, 28, 30 | H | 77.6% | 0.35 | 25.35 |
| | | | NF ≤ 5 | h | 4 | 60.4% | 5.8% | 3, 10, 14, 28 | H | 75.7% | 0.21 | 6.48 |
| | 14 | 0.867 | | f | 13 | 76.0% | 10.1% | 5, 7, 12, 13, 16, 19, 21, 22, 23, 24, 25, 27, 28 | H | 76.0% | 0.94 | 0.49 |
| | | | NF ≤ 5 | h | 6 | 50.3% | 3.7% | 6, 14, 21, 24, 28, 29 | H | 76.1% | 0.01 | 0.03 |
| Diane06 | 8 | 0.981 | | e | 9 | 95.3% | 8.0% | 4, 11, 12, 13, 19, 21, 22, 25, 29 | H | 76.1% | 1.03 | 157.00 |
| | | | NF ≤ 5 | b | 3 | 68.1% | 0.0% | 1, 15, 28 | H | 78.1% | 6.27 | 60.84 |
| | 9 | 0.985 | | b | 7 | 96.8% | 8.0% | 5, 7, 10, 11, 21, 24, 25 | H | 75.5% | 1.15 | 371.95 |
| | | | NF ≤ 5 | b | 5 | 92.9% | 2.1% | | H | 75.4% | 1.37 | 240.07 |
| | 10 | 0.982 | | i | 18 | 96.5% | 8.3% | 1, 3, 4, 6, 8, 11, 12, 13, 15, 16, 17, 19, 21, 22, 23, 24, 25, 30 | H | 77.1% | 3.50 | 18.67 |
| | | | NF ≤ 5 | c | 7 | 94.2% | 8.0% | 1, 4, 11, 21, 25, 27, 29 | H | 75.5% | 0.32 | 57.04 |
| | 11 | 0.982 | | d | 18 | 95.0% | 9.4% | 1, 3, 5, 8, 10, 11, 12, 13, 15, 16, 17, 19, 21, 22, 26, 27, 28, 30 | H | 75.6% | 2.39 | 17.30 |
| | | | NF ≤ 5 | e | 10 | 94.2% | 7.3% | 1, 3, 4, 7, 11, 14, 16, 19, 22, 28 | H | 75.5% | 7.43 | 880.32 |
| | 14 | 0.981 | | h | 11 | 96.2% | 7.3% | 1, 4, 7, 11, 13, 14, 16, 18, 20, 22, 28 | H | 75.6% | 5.77 | 0.34 |
| | | | NF ≤ 5 | a | 8 | 93.4% | 3.6% | 1, 11, 14, 19, 22, 24, 25, 30 | H | 75.8% | 6.01 | 0.49 |
| German | 8 | 0.765 | | h | 11 | 57.5% | 9.4% | 1, 2, 3, 5, 8, 11, 12, 14, 16, 19, 20 | H | 76.8% | 0.02 | 525.81 |
| | | | NF ≤ 5 | h | 5 | 41.5% | 4.9% | | H | 58.2% | 0.08 | 669.49 |
| | 9 | 0.757 | | c | 7 | 50.8% | 9.4% | 1, 2, 3, 5, 7, 12, 13 | H | 59.6% | 0.27 | 34.09 |
| | | | NF ≤ 5 | h | 4 | 41.8% | 6.5% | 1, 3, 5, 8 | H | 75.3% | 0.15 | 136.06 |
| | 10 | 0.749 | | e | 10 | 52.2% | 9.6% | 1, 2, 3, 5, 10, 14, 15, 16, 19, 20 | H | 72.1% | 0.03 | 773.25 |
| | | | NF ≤ 5 | g | 5 | 50.5% | 8.2% | 1, 2, 3, 5, 12 | H | 71.2% | 0.17 | 669.11 |
| | 14 | 0.766 | | c | 4 | 32.6% | 7.5% | 1, 2, 5, 7 | H | 59.1% | 0.50 | 0.47 |
| | | | NF ≤ 5 | c | 4 | 32.6% | 7.5% | 1, 2, 5, 7 | H | 59.1% | 0.50 | 0.47 |
| Australian | 8 | 0.964 | | b | 6 | 97.1% | 9.6% | 3, 4, 5, 8, 12, 14 | H | 76.7% | 0.53 | 473.81 |
| | | | NF ≤ 5 | d | 5 | 82.7% | 3.5% | 4, 8, 9, 13 | H | 71.9% | 9.48 | 75.61 |
| | 9 | 0.957 | | g | 6 | 95.3% | 9.6% | 2, 3, 4, 5, 8, 10 | H | 71.0% | 1.12 | 8.05 |
| | | | NF ≤ 5 | a | 5 | 90.9% | 8.0% | 4, 5, 8, 11, 13 | H | 77.6% | 8.50 | 106.32 |
| | 10 | 0.960 | | j | 5 | 92.9% | 8.8% | 6, 8, 9, 13, 14 | H | 71.5% | 1.13 | 839.90 |
| | | | NF ≤ 5 | j | 5 | 92.9% | 8.8% | 6, 8, 9, 13, 14 | H | 71.5% | 1.13 | 839.90 |
| | 14 | 0.967 | | e | 5 | 93.2% | 9.5% | | H | 70.5% | 1.30 | 0.29 |
| | | | NF ≤ 5 | e | 5 | 93.2% | 9.5% | | H | 70.5% | 1.30 | 0.29 |