Michał Gostkowski, Krzysztof Gajowniczek.
Abstract
Due to various regulations (e.g., the Basel III Accord), banks need to keep a specified amount of capital to reduce the impact of their insolvency. This capital can be calculated using, e.g., the Internal Rating Approach, which enables institutions to develop their own statistical models. In this regard, one of the most important parameters is the loss given default, whose correct estimation may lead to a healthier and less risky allocation of capital. Unfortunately, since the loss given default distribution is bimodal, applying modeling methods that aim at predicting the mean value (e.g., ordinary least squares or regression trees) is not enough. Bimodality means that a distribution has two modes and a large proportion of observations lying far from the middle of the distribution; therefore, more advanced methods are required. To this end, in order to model the entire loss given default distribution, we present in this article the weighted quantile Regression Forest algorithm, which is an ensemble technique. We evaluate our methodology on a dataset collected by one of the biggest Polish banks. Through our research, we show that weighted quantile Regression Forests outperform "single" state-of-the-art models in terms of accuracy and stability.
Keywords: bimodal distribution; loss given default; machine learning; weighted quantile regression forests
Year: 2020 PMID: 33286317 PMCID: PMC7517045 DOI: 10.3390/e22050545
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
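The abstract argues that a mean prediction says little about a bimodal target and that the whole distribution should be modeled through quantiles. As a minimal sketch of that idea, the snippet below computes a weighted empirical quantile, the generic building block behind quantile regression forests in general. It is not the authors' weighted quantile Regression Forest: the function name `weighted_quantile`, the toy responses, and the weights are all hypothetical, and in a real forest the weights would come from the trees.

```python
def weighted_quantile(values, weights, tau):
    """Smallest value whose cumulative weight share reaches tau (0 < tau <= 1)."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    target = tau * total
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        # Small tolerance guards against floating-point accumulation error.
        if cumulative >= target - 1e-12:
            return value
    return pairs[-1][0]


# Hypothetical bimodal responses: most mass near 0 and near 1.
responses = [0.0, 0.0, 0.05, 0.9, 1.0, 1.0]
uniform_weights = [1.0] * len(responses)

mean_prediction = sum(responses) / len(responses)
lower = weighted_quantile(responses, uniform_weights, 0.25)
upper = weighted_quantile(responses, uniform_weights, 0.75)
print(f"mean={mean_prediction:.3f}, q25={lower}, q75={upper}")
```

In this toy sample the mean falls between the two modes, where almost no observations lie, while the lower and upper quantiles land on the modes themselves.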
Figure 1. The normal distribution example.
Figure 2. The example of bimodal distribution.
Figure 3. Histogram of the loss given default variable.
Loss given default in the dataset and basic characteristics.
| Characteristics | Min | P25 | P50 | Mean | P75 | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
| Loss given default | 0.0000 | 0.0000 | 0.3413 | 0.4570 | 1.0000 | 1.0000 | 0.1699 | −1.8048 |
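The strongly negative kurtosis in the table is what one would expect from a sample concentrated at 0 and 1, assuming the reported figure is excess kurtosis (a symmetric two-point distribution has excess kurtosis of exactly −2). The sketch below illustrates this with the standard moment formulas. The helper `synthetic_lgd` and its mixture proportions are hypothetical and are not drawn from the bank's dataset.

```python
import random
import statistics


def skewness(xs):
    """Population skewness: mean of the cubed z-scores."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)


def excess_kurtosis(xs):
    """Population kurtosis minus 3, so a normal distribution scores 0."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - m) / sd) ** 4 for x in xs) / len(xs) - 3.0


def synthetic_lgd(n, seed=0):
    """Hypothetical mixture: 40% zeros, 35% ones, 25% uniform on [0, 1)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        u = rng.random()
        if u < 0.40:
            sample.append(0.0)
        elif u < 0.75:
            sample.append(1.0)
        else:
            sample.append(rng.random())
    return sample


sample = synthetic_lgd(10_000)
print(f"skewness={skewness(sample):.4f}")
print(f"excess kurtosis={excess_kurtosis(sample):.4f}")
```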
Figure 4. The cumulative distribution functions for the investigated parametric distributions.
Goodness-of-fit statistics for the investigated parametric distributions.
| Measure | Beta | Gamma | Log-Norm | Exponential | Cauchy | Weibull |
|---|---|---|---|---|---|---|
| Kolmogorov-Smirnov statistic | 0.25 | 0.26 | 0.27 | 0.30 | 0.34 | 0.27 |
| Cramer-von Mises statistic | 16.50 | 19.54 | 20.84 | 23.82 | 28.05 | 20.15 |
| Anderson-Darling statistic | 110.57 | 128.27 | 135.54 | 165.02 | 170.07 | 132.79 |
| Akaike’s Information Criterion | −504.92 | 493.61 | 539.62 | 521.19 | 2049.24 | 501.45 |
| Bayesian Information Criterion | −494.82 | 503.71 | 549.72 | 526.24 | 2059.34 | 511.55 |
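For readers unfamiliar with the first row of the table, the sketch below shows how a one-sample Kolmogorov-Smirnov statistic can be computed against a candidate continuous distribution: it is the largest vertical gap between the empirical and the candidate cumulative distribution functions. The function name `ks_statistic`, the toy sample, and the exponential candidate with its moment-based rate are illustrative assumptions; this is not the fitting procedure used in the article.

```python
import math


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x,
        # so both sides of the jump are compared with the candidate.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical bimodal sample with mass near 0 and near 1.
sample = [0.0, 0.0, 0.02, 0.1, 0.35, 0.6, 0.95, 1.0, 1.0, 1.0]
mean = sum(sample) / len(sample)


def exponential_cdf(x):
    """Exponential CDF with the rate set to 1 / sample mean (an assumption)."""
    return 1.0 - math.exp(-x / mean)


print(f"KS statistic vs. exponential: {ks_statistic(sample, exponential_cdf):.3f}")
```

A larger value means a poorer fit, which is how the columns of the table can be compared with one another.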
Missing variables data ratio.
| Group | 0% Missing | 0–2% Missing | 2–5% Missing | 5–10% Missing | 10–16% Missing | 16–20% Missing | 20–30% Missing | 30–90% Missing |
|---|---|---|---|---|---|---|---|---|
| Number of variables | 142 | 57 | 27 | 5 | 16 | 17 | 14 | 14 |
Figure 5. An average measure on test sample (10 folds).
Figure 6. An average probability-probability (P-P) plot on test sample (10 folds).
Goodness-of-fit statistics on test sample (10-fold).
| Measure | OLS | QR | qRF | wqRF |
|---|---|---|---|---|
| | 0.645 | 0.169 | 0.153 | 0.151 |
| | 0.256 | 0.022 | 0.017 | 0.017 |