Hongxiang Li1, Ao Feng1, Bin Lin1, Houcheng Su1, Zixi Liu1, Xuliang Duan1, Haibo Pu1, Yifei Wang2.
Abstract
Credit scoring is a critical task for banks and other financial institutions, which use credit scores to identify users who are likely to default. In this paper, we propose a credit score prediction method based on feature transformation and an ensemble model, which is essentially a cascade approach. The feature transformation process, consisting of boosting trees (BT) and auto-encoders (AE), replaces manual feature engineering and addresses the data imbalance problem. For the classification process, this paper designs a heterogeneous ensemble model that weights a factorization machine (FM) and a deep neural network (DNN), which can efficiently extract low-order and high-order feature interactions. Comprehensive experiments were conducted on two standard datasets, and the results demonstrate that the proposed approach outperforms existing credit scoring models in accuracy. ©2021 Li et al.
Keywords: AutoEncoder; Boosting tree; Credit scoring; Deep neural network; Factorization machine; Feature transformation
Year: 2021 PMID: 34151000 PMCID: PMC8189024 DOI: 10.7717/peerj-cs.579
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
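
The abstract describes a classifier that weights a factorization machine (FM) and a DNN. As a rough illustration only, the sketch below computes the standard FM score (bias, linear term, and second-order interactions through the usual linear-time identity) and blends the resulting probability with a second probability by a weighted average. The blend rule, the weight `alpha`, the random parameters, and the placeholder `p_dnn` are all assumptions made for this example; the record does not state how the authors combine the two models.

```python
import math
import random


def fm_score(x, w0, w, V):
    """FM logit for one sample x; V[i] is the latent vector of feature i.

    Uses the identity
      sum_{i<j} <V_i, V_j> x_i x_j
        = 0.5 * sum_f [(sum_i V_if x_i)^2 - sum_i (V_if x_i)^2].
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_score_naive(x, w0, w, V):
    """Same quantity with an explicit loop over all feature pairs."""
    total = w0 + sum(wi * xi for wi, xi in zip(w, x))
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def blend(p_fm, p_dnn, alpha=0.5):
    """Weighted average of two probabilities (alpha is a hypothetical weight)."""
    return alpha * p_fm + (1.0 - alpha) * p_dnn


# Toy parameters drawn at random; a trained model would learn these.
rng = random.Random(0)
n_features, k = 6, 3
x = [rng.uniform(-1, 1) for _ in range(n_features)]
w0 = 0.1
w = [rng.uniform(-1, 1) for _ in range(n_features)]
V = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n_features)]

p_fm = sigmoid(fm_score(x, w0, w, V))
p_dnn = 0.30  # placeholder standing in for a DNN output, not a real model
p = blend(p_fm, p_dnn, alpha=0.6)
```

The explicit pair loop in `fm_score_naive` is included only to check the identity; the factored form is what keeps the second-order term linear in the number of features.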
Summary of related work.
| References | Methods and materials |
|---|---|
| | Support vector machine, classification and regression trees |
| | Flexible neural tree |
| | Self-organizing map, feedforward neural network |
| | Hybrid bagging algorithm, feature selection |
| | Denoising autoencoder |
| | Heterogeneous integration model, bagging, stacking |
| | Information gain, GA Wrapper |
| | Media information, machine learning |
| | Variance ranking technique, ranked order similarity |
| | Ensemble models, feature engineering |
| | Clustering analysis |
| | Ensemble model |
| | Local distribution-based adaptive minority oversampling |
| | Bootstrap-lasso |
| | Hybrid PCA-GWO |
Figure 1. Structural diagram of the proposed model.
Figure 2. Example of boosting tree feature transformation.
Figure 3. Auto-encoder based feature transformation.
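
The figure itself is not reproduced in this record, so the following is only a sketch of one common form of boosting tree feature transformation: each sample is routed through every tree, and the leaf it reaches is one-hot encoded, with the per-tree blocks concatenated into a new sparse feature vector. Whether the paper uses exactly this encoding is an assumption. The toy trees, the dictionary layout, and the helper names `leaf_index`, `count_leaves`, and `transform` are hypothetical.

```python
def leaf_index(tree, x):
    """Walk one tree and return the id of the leaf that sample x reaches."""
    node = tree
    while "leaf" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


def count_leaves(tree):
    """Number of leaves in a tree (leaf ids are assumed to be 0..n-1)."""
    if "leaf" in tree:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])


def transform(trees, x):
    """Concatenate one one-hot block per tree, marking the reached leaf."""
    encoded = []
    for tree in trees:
        block = [0] * count_leaves(tree)
        block[leaf_index(tree, x)] = 1
        encoded.extend(block)
    return encoded


# Two hand-written toy trees (hypothetical; a boosted model would learn them).
tree_1 = {
    "feature": 0, "threshold": 0.5,
    "left": {"leaf": 0},
    "right": {"feature": 1, "threshold": 2.0,
              "left": {"leaf": 1}, "right": {"leaf": 2}},
}
tree_2 = {
    "feature": 1, "threshold": 1.0,
    "left": {"leaf": 0},
    "right": {"leaf": 1},
}

sample = [0.9, 1.5]
encoded = transform([tree_1, tree_2], sample)
print(encoded)  # [0, 1, 0, 0, 1]
```

Here the sample reaches leaf 1 of the first tree and leaf 1 of the second, so the two raw features become a five-dimensional binary vector that a downstream model can consume.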
Experiment data.
| Dataset | Number of samples | Imbalance rate | Number of features |
|---|---|---|---|
| Dataset A | 150,000 (139,974/10,026) | 1:13.96 | 59 |
| Dataset B | 30,000 (23,364/3,636) | 1:6.42 | 25 |
Figure 4. The method for constructing unbalanced dataset.
Experiment comparison.
| Model | Dataset A AUC | Dataset A Logloss | Dataset B AUC | Dataset B Logloss |
|---|---|---|---|---|
| SVM | 0.62823 | 0.22808 | 0.73049 | 0.45938 |
| GBDT | 0.83224 | 0.18776 | 0.77713 | 0.43253 |
| LR | 0.79268 | 0.22551 | 0.72056 | 0.46841 |
| XGB | 0.86443 | 0.18337 | 0.78052 | 0.52310 |
| GNB | 0.79449 | 0.49821 | 0.73850 | 1.01296 |
| RF | 0.83786 | 0.19450 | 0.75198 | 0.48226 |
| DNN | 0.83012 | 0.18844 | 0.76903 | 0.44012 |
| FM | 0.79245 | 0.20463 | 0.74665 | 0.56924 |
| XGB+LR | 0.84422 | 0.19262 | 0.75486 | 0.43509 |
| SMOTE+XGB | 0.88312 | 0.19465 | 0.79413 | 0.40965 |
| RUS+XGB | 0.85471 | 0.26135 | 0.77458 | 0.44085 |
| DeepFM | 0.82884 | 0.18840 | 0.77562 | 0.43401 |
| FNN | 0.82847 | 0.18856 | 0.77271 | 0.43887 |
| DCN | 0.82749 | 0.18928 | 0.77463 | 0.43493 |
| AutoInt | 0.82798 | 0.18853 | 0.77514 | 0.43417 |
| FwFM | 0.82867 | 0.18861 | 0.77515 | 0.43893 |
| FiBiNET | 0.82629 | 0.18964 | 0.77604 | 0.4351 |
| ONN | 0.82802 | 0.19042 | 0.75828 | 0.45015 |
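
The comparison above reports AUC (higher is better) and Logloss (lower is better). A minimal sketch of how these two metrics are conventionally computed for binary labels is given below; it is not the authors' evaluation code, and the labels and probabilities are made-up toy values. The function names `auc` and `log_loss` and the clipping constant `eps` are choices made for this example.

```python
import math


def auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. The pairwise loop is O(P * N), which is fine
    for a toy example but slow for large datasets.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


# Toy data, not taken from either dataset in the table.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print(auc(y_true, y_prob))  # 0.75
print(log_loss(y_true, y_prob))
```

AUC depends only on the ranking of the scores, while Logloss also penalizes poorly calibrated probabilities, which is why a model can rank well on one metric and poorly on the other.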
Figure 5. (A-B) Feature importance comparison.