Hong Wang, Qingsong Xu, Lifeng Zhou.
Abstract
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large, imbalanced data, we consider the feasibility of ensemble learning with regularized logistic regression as the base classifier for credit scoring. The data are first balanced and diversified by clustering and bagging algorithms. We then apply a Lasso-logistic regression learning ensemble to evaluate credit risk. We show that the proposed algorithm outperforms popular credit scoring models such as decision trees, Lasso-logistic regression, and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
Year: 2015 PMID: 25706988 PMCID: PMC4338292 DOI: 10.1371/journal.pone.0117844
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
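For reference, the Lasso-logistic regression (LLR) base classifier named in the abstract is, in its standard form, logistic regression fitted with an $\ell_1$ penalty on the coefficients. The scaling shown here (averaged log-likelihood over $n$ observations with labels $y_i \in \{0,1\}$, $p$ predictors, unpenalized intercept $\beta_0$, tuning parameter $\lambda \ge 0$) is the common textbook convention and is an assumption; the paper's exact parameterization is not preserved in this record.

$$
\min_{\beta_0,\,\boldsymbol{\beta}}\; -\frac{1}{n}\sum_{i=1}^{n}\Big[\,y_i\big(\beta_0+\mathbf{x}_i^{\top}\boldsymbol{\beta}\big)-\log\big(1+e^{\beta_0+\mathbf{x}_i^{\top}\boldsymbol{\beta}}\big)\Big]+\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert,
\qquad
\Pr(y_i=1\mid\mathbf{x}_i)=\frac{1}{1+e^{-(\beta_0+\mathbf{x}_i^{\top}\boldsymbol{\beta})}}
$$

Larger values of $\lambda$ shrink more coefficients exactly to zero, which is what lets an LLR model double as a variable selector.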
Algorithm 1. Bagging.
| Step |
|---|
| 1: Let |
| 2: |
| 3: for( |
| 4: Create a bootstrapped training set |
| 5: Learn a specific base learner |
| 6: } |
| 7: The final learning algorithm |
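The steps of Algorithm 1 are truncated in this record. As a hedged sketch of generic bagging (bootstrap resampling followed by an unweighted majority vote), and not a reconstruction of the paper's notation, the loop can be written as below. The names `bagging_fit`, `bagging_predict`, and the toy base learner `fit_threshold` are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]        # (feature, label) pair in this toy sketch
Model = Callable[[float], int]     # a trained base learner maps x -> predicted label


def bagging_fit(data: Sequence[Example],
                fit_base: Callable[[Sequence[Example]], Model],
                n_learners: int,
                seed: int = 0) -> List[Model]:
    """Train n_learners base models, each on its own bootstrap sample of `data`."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_learners):
        # Bootstrap: draw len(data) examples uniformly *with* replacement.
        sample = rng.choices(data, k=len(data))
        models.append(fit_base(sample))
    return models


def bagging_predict(models: List[Model], x: float) -> int:
    """Combine the base learners by unweighted majority vote."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


def fit_threshold(sample: Sequence[Example]) -> Model:
    """Hypothetical toy base learner: cut halfway between the two class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:         # degenerate bootstrap sample: predict the only class seen
        only = 1 if pos else 0
        return lambda x: only
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x > cut)


data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.8, 1), (0.9, 1), (1.0, 1)]
models = bagging_fit(data, fit_threshold, n_learners=25)
print(bagging_predict(models, 0.05), bagging_predict(models, 0.95))
```

The threshold rule only keeps the example small; in the paper's setting the base learner would be a regularized logistic regression instead.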
Fig 1. Classification when the majority examples come from different populations.
Algorithm 2. A Lasso-Logistic Regression Ensemble (LLRE) Algorithm.
| Step |
|---|
| 1: INPUT: |
| 2: |
| 3: OUTPUT: |
| 4: Ensemble of Classifiers |
| 5: |
| 6: Cluster the majority examples into |
| 7: The ensemble size |
| 8: |
| 9: Select the |
| 10: Generate a bootstrap sample of the minority data |
| 11: Train a LLR model |
| 12: Evaluate |
| 13: Record the performance |
| 14: Calculate |
| 15: |
| 16: |
| 17: |
| 18: In prediction, a sample ( |
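Algorithm 2 is only partially preserved here. Based on its surviving steps (cluster the majority examples, generate a bootstrap sample of the minority data, train an LLR model) and on the abstract, a minimal NumPy sketch might look like the following. Several choices are assumptions not confirmed by the record: plain k-means with Euclidean distance, one ensemble member per cluster, L1-penalized logistic regression fitted by proximal gradient descent (ISTA) with a fixed `lam` and step size, and prediction by averaging member probabilities. The paper's evaluation and recording steps (12 to 14) are not reproduced, and all function names are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center: shape (n, k).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):            # keep the old center if a cluster empties
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def fit_lasso_logistic(X, y, lam=0.01, lr=0.1, n_iter=500):
    """L1-penalized logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * ||beta||_1; the intercept b0 is not penalized.
    """
    beta, b0 = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta + b0)))
        resid = prob - y                        # gradient of the log-loss w.r.t. the linear score
        beta = beta - lr * (X.T @ resid) / len(y)
        b0 -= lr * resid.mean()
        # Proximal step for the L1 penalty: soft-threshold every coefficient.
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)
    return beta, b0


def fit_llre(X, y, k=5, lam=0.01, seed=0):
    """Hypothetical LLRE sketch: one LLR member per majority-class cluster,
    each trained on that cluster plus a bootstrap sample of the minority class."""
    rng = np.random.default_rng(seed)
    X_maj, X_min = X[y == 0], X[y == 1]         # assumes label 1 marks the minority class
    labels = kmeans(X_maj, k, seed=seed)
    members = []
    for c in range(k):
        cluster = X_maj[labels == c]
        boot = X_min[rng.integers(0, len(X_min), size=len(X_min))]  # sample with replacement
        Xc = np.vstack([cluster, boot])
        yc = np.concatenate([np.zeros(len(cluster)), np.ones(len(boot))])
        members.append(fit_lasso_logistic(Xc, yc, lam=lam))
    return members


def predict_llre(members, X):
    """Average the members' predicted probabilities of the minority class."""
    probs = [1.0 / (1.0 + np.exp(-(X @ beta + b0))) for beta, b0 in members]
    return np.mean(probs, axis=0)


# Toy imbalanced data: 400 majority (label 0) vs 80 minority (label 1) examples.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(400, 2)), rng.normal(2.0, 1.0, size=(80, 2))])
y = np.concatenate([np.zeros(400), np.ones(80)])
members = fit_llre(X, y, k=5)
scores = predict_llre(members, X)
```

Here `k=5` is chosen so that each majority cluster holds roughly as many examples as the minority class (400 / 5 = 80), which keeps every member's training set near balance; that sizing rule is an illustrative assumption, not a setting taken from the paper.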
Performance on two variable sets in terms of AUC.
| Classifier | Original variables | Generated variables |
|---|---|---|
| LLRE | 0.6796 | 0.8597 |
| RF | 0.8488 | 0.8587 |
| LLR | 0.4898 | 0.8567 |
| CART | 0.7702 | 0.7632 |
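AUC, the metric in this table, equals the probability that a randomly chosen positive (minority) example receives a higher score than a randomly chosen negative one, with ties counting one half. An AUC of 0.5 corresponds to random ranking, so LLR's 0.4898 on the original variables shows no ranking ability on that variable set. A minimal pure-Python sketch of the pairwise definition follows; `auc` is a hypothetical helper, it is O(n_pos × n_neg) and meant for illustration only, and the paper's own AUC implementation is not stated.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75: three of the four pairs are ordered correctly
```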
Fig 2. AUC change with LLRE using different k.
Fig 3. Top 20 important variables in terms of LLR occurrence.
Fig 4. Top 20 important variables in terms of mean AUC decrease.
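The record names the two importance measures only through these figure captions. One plausible reading, which is an assumption rather than the paper's confirmed definition: "LLR occurrence" counts how many ensemble members give a variable a nonzero Lasso coefficient, and "mean AUC decrease" is permutation importance, the average drop in AUC when one variable's column is shuffled. The sketch implements both readings; `occurrence_counts`, `mean_auc_decrease`, and the scoring callable `score_fn` are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def occurrence_counts(coef_vectors: Sequence[Sequence[float]]) -> List[int]:
    """For each variable j, count the ensemble members with a nonzero coefficient on j."""
    n_vars = len(coef_vectors[0])
    return [sum(1 for coefs in coef_vectors if coefs[j] != 0.0) for j in range(n_vars)]


def auc(scores, labels):
    """Pairwise AUC (assumes both classes are present); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    total = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


def mean_auc_decrease(score_fn: Callable[[List[List[float]]], List[float]],
                      X: List[List[float]], y: Sequence[int], j: int,
                      n_repeats: int = 10, seed: int = 0) -> float:
    """Average AUC drop after randomly permuting column j of X (permutation importance)."""
    rng = random.Random(seed)
    base = auc(score_fn(X), y)
    drops = []
    for _ in range(n_repeats):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        drops.append(base - auc(score_fn(X_perm), y))
    return sum(drops) / n_repeats


coefs = [[0.8, 0.0, -0.2], [0.5, 0.0, 0.0], [0.9, 0.1, 0.0]]
print(occurrence_counts(coefs))  # [3, 1, 1]


def score_fn(rows):
    """Toy model that scores each row by variable 0 only."""
    return [row[0] for row in rows]


X = [[0.1, 5.0], [0.2, 1.0], [0.8, 3.0], [0.9, 2.0]]
y = [0, 0, 1, 1]
print(mean_auc_decrease(score_fn, X, y, j=1))  # 0.0: variable 1 is never used
```

Ranking variables by either measure and keeping the top 20 would then give lists of the kind shown in Figs 3 and 4, under the interpretations assumed above.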
Performance comparison in terms of AUC.
| Run No | LLRE | RF | LLR | CART |
|---|---|---|---|---|
| 1 | 0.8598 | 0.857 | 0.8571 | 0.7632 |
| 2 | 0.8553 | 0.8538 | 0.8526 | 0.7676 |
| 3 | 0.8662 | 0.8609 | 0.8651 | 0.7786 |
| 4 | 0.8602 | 0.8576 | 0.8577 | 0.7778 |
| 5 | 0.858 | 0.8564 | 0.8559 | 0.7746 |
| 6 | 0.8662 | 0.8628 | 0.8638 | 0.7689 |
| 7 | 0.8544 | 0.8536 | 0.8526 | 0.77 |
| 8 | 0.8619 | 0.8617 | 0.8589 | 0.7749 |
| 9 | 0.8657 | 0.8606 | 0.8636 | 0.7832 |
| 10 | 0.8575 | 0.8569 | 0.8561 | 0.7665 |
| 11 | 0.8622 | 0.8578 | 0.8604 | 0.7762 |
| 12 | 0.8565 | 0.8551 | 0.8542 | 0.7748 |
| 13 | 0.8576 | 0.8519 | 0.8573 | 0.7763 |
| 14 | 0.8573 | 0.8537 | 0.8547 | 0.7761 |
| 15 | 0.8638 | 0.8648 | 0.8606 | 0.7699 |
| 16 | 0.8567 | 0.8535 | 0.8547 | 0.7728 |
| 17 | 0.8586 | 0.8579 | 0.8558 | 0.7783 |
| 18 | 0.8696 | 0.8631 | 0.8666 | 0.7792 |
| 19 | 0.8529 | 0.8523 | 0.8506 | 0.77 |
| 20 | 0.8651 | 0.8607 | 0.8609 | 0.7732 |
| 21 | 0.8537 | 0.8498 | 0.8506 | 0.7695 |
| 22 | 0.8652 | 0.8625 | 0.8635 | 0.7783 |
| 23 | 0.8603 | 0.8606 | 0.8578 | 0.7738 |
| 24 | 0.8607 | 0.856 | 0.8584 | 0.7692 |
| 25 | 0.8668 | 0.8642 | 0.8648 | 0.7763 |
| 26 | 0.859 | 0.8562 | 0.857 | 0.7698 |
| 27 | 0.8574 | 0.8553 | 0.8548 | 0.7807 |
| 28 | 0.8576 | 0.8543 | 0.8564 | 0.7685 |
| 29 | 0.8633 | 0.8599 | 0.8609 | 0.7685 |
| 30 | 0.8636 | 0.8617 | 0.8623 | 0.7775 |
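The 30 runs can be summarized by per-classifier means and by how often LLRE scores strictly higher than each competitor in the same run. The sketch below transcribes the AUC values from the table above; the paired win count is a simple descriptive summary chosen here for illustration, not a significance test reported in the source.

```python
from statistics import mean, stdev

# AUC for runs 1-30, transcribed from the table above.
auc_runs = {
    "LLRE": [0.8598, 0.8553, 0.8662, 0.8602, 0.858, 0.8662, 0.8544, 0.8619, 0.8657, 0.8575,
             0.8622, 0.8565, 0.8576, 0.8573, 0.8638, 0.8567, 0.8586, 0.8696, 0.8529, 0.8651,
             0.8537, 0.8652, 0.8603, 0.8607, 0.8668, 0.859, 0.8574, 0.8576, 0.8633, 0.8636],
    "RF": [0.857, 0.8538, 0.8609, 0.8576, 0.8564, 0.8628, 0.8536, 0.8617, 0.8606, 0.8569,
           0.8578, 0.8551, 0.8519, 0.8537, 0.8648, 0.8535, 0.8579, 0.8631, 0.8523, 0.8607,
           0.8498, 0.8625, 0.8606, 0.856, 0.8642, 0.8562, 0.8553, 0.8543, 0.8599, 0.8617],
    "LLR": [0.8571, 0.8526, 0.8651, 0.8577, 0.8559, 0.8638, 0.8526, 0.8589, 0.8636, 0.8561,
            0.8604, 0.8542, 0.8573, 0.8547, 0.8606, 0.8547, 0.8558, 0.8666, 0.8506, 0.8609,
            0.8506, 0.8635, 0.8578, 0.8584, 0.8648, 0.857, 0.8548, 0.8564, 0.8609, 0.8623],
    "CART": [0.7632, 0.7676, 0.7786, 0.7778, 0.7746, 0.7689, 0.77, 0.7749, 0.7832, 0.7665,
             0.7762, 0.7748, 0.7763, 0.7761, 0.7699, 0.7728, 0.7783, 0.7792, 0.77, 0.7732,
             0.7695, 0.7783, 0.7738, 0.7692, 0.7763, 0.7698, 0.7807, 0.7685, 0.7685, 0.7775],
}

for name, vals in auc_runs.items():
    print(f"{name:5s} mean={mean(vals):.4f} sd={stdev(vals):.4f}")

# Paired comparison: in how many runs does LLRE score strictly higher?
wins = {
    other: sum(a > b for a, b in zip(auc_runs["LLRE"], auc_runs[other]))
    for other in ("RF", "LLR", "CART")
}
print(wins)  # {'RF': 28, 'LLR': 30, 'CART': 30}
```

LLRE leads RF in 28 of 30 runs (RF is higher in runs 15 and 23), but the margins over RF and LLR are small, so a formal paired test would be needed before calling them significant.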
Performance comparison in terms of F-measure.
| Run No | LLRE | RF | LLR | CART |
|---|---|---|---|---|
| 1 | 0.3856 | 0.2683 | 0.2599 | 0 |
| 2 | 0.3872 | 0.2727 | 0.2466 | 0 |
| 3 | 0.3777 | 0.2712 | 0.2626 | 0 |
| 4 | 0.3799 | 0.2694 | 0.2738 | 0 |
| 5 | 0.4004 | 0.2873 | 0.2793 | 0 |
| 6 | 0.3848 | 0.2699 | 0.2548 | 0 |
| 7 | 0.3756 | 0.2671 | 0.2555 | 0 |
| 8 | 0.394 | 0.2834 | 0.2575 | 0 |
| 9 | 0.4088 | 0.2933 | 0.2796 | 0 |
| 10 | 0.3729 | 0.2734 | 0.265 | 0 |
| 11 | 0.3759 | 0.2669 | 0.2647 | 0 |
| 12 | 0.3846 | 0.2725 | 0.2567 | 0 |
| 13 | 0.3939 | 0.2869 | 0.2759 | 0 |
| 14 | 0.3717 | 0.2815 | 0.2615 | 0 |
| 15 | 0.3806 | 0.2689 | 0.2672 | 0 |
| 16 | 0.3855 | 0.2877 | 0.2785 | 0 |
| 17 | 0.3963 | 0.2756 | 0.2677 | 0 |
| 18 | 0.372 | 0.2679 | 0.2466 | 0 |
| 19 | 0.3751 | 0.2579 | 0.2642 | 0 |
| 20 | 0.3807 | 0.2817 | 0.2664 | 0 |
| 21 | 0.3737 | 0.2804 | 0.2559 | 0 |
| 22 | 0.3836 | 0.2648 | 0.2604 | 0 |
| 23 | 0.3937 | 0.2699 | 0.2627 | 0 |
| 24 | 0.3876 | 0.2811 | 0.278 | 0 |
| 25 | 0.3951 | 0.2958 | 0.2823 | 0 |
| 26 | 0.3784 | 0.2744 | 0.2548 | 0 |
| 27 | 0.3895 | 0.2779 | 0.2648 | 0 |
| 28 | 0.3815 | 0.2621 | 0.2413 | 0 |
| 29 | 0.3716 | 0.2723 | 0.2517 | 0 |
| 30 | 0.3826 | 0.2705 | 0.2649 | 0 |
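The F-measure is the harmonic mean of precision and recall on the positive (minority) class. When a classifier predicts no positives at its decision threshold, precision is 0/0 and the F-measure is conventionally reported as 0. That convention would be consistent with the CART column above, although the record does not state why CART scores 0 in every run. A minimal sketch with a hypothetical `f_measure` helper:

```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 on the positive class (label 1); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # covers "no positives predicted", where precision would be 0/0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [0, 0, 0, 0, 1, 1]
print(f_measure(y_true, [0, 0, 0, 0, 0, 0]))  # 0.0: always predicting the majority class
print(f_measure(y_true, [0, 0, 1, 0, 1, 0]))  # 0.5: precision 0.5, recall 0.5
```

On heavily imbalanced data the all-majority classifier in the first call still has 4/6 accuracy, which is why F-measure, rather than accuracy, is the informative metric here.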