Huijuan Lu¹, Yige Xu¹, Minchao Ye², Ke Yan¹, Zhigang Gao³, Qun Jin⁴.
Abstract
BACKGROUND: Cost-sensitive algorithms are an effective strategy for solving imbalanced classification problems. However, misclassification costs are usually set empirically from user expertise, which makes the performance of cost-sensitive classification unstable. An efficient and accurate method for calculating the optimal cost weights is therefore needed.
Keywords: Cost-sensitive; Misclassification cost; Parameter fitting; Weighted classification accuracy
Year: 2019 PMID: 31874599 PMCID: PMC6929277 DOI: 10.1186/s12859-019-3255-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
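The abstract rests on the idea that misclassification costs change which prediction a classifier should make. As a hedged illustration, here is the standard minimum-expected-cost decision rule for a binary problem; it is a textbook technique, not the method proposed in this paper. The function names and the cost values in the usage lines are hypothetical, and correct predictions are assumed to cost nothing.

```python
def decision_threshold(cost_fn: float, cost_fp: float) -> float:
    """Posterior P(positive) at or above which predicting positive is the cheaper choice."""
    if cost_fn <= 0 or cost_fp <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def cost_sensitive_predict(p_pos: float, cost_fn: float, cost_fp: float) -> int:
    """Return 1 (positive) or 0 (negative), whichever has the lower expected cost.

    Assumption: correct predictions cost 0.
    """
    if not 0.0 <= p_pos <= 1.0:
        raise ValueError("p_pos must be a probability")
    expected_cost_if_negative = p_pos * cost_fn          # risk of a false negative
    expected_cost_if_positive = (1.0 - p_pos) * cost_fp  # risk of a false positive
    return 1 if expected_cost_if_positive <= expected_cost_if_negative else 0


if __name__ == "__main__":
    # Hypothetical costs: a missed positive is 4x as costly as a false alarm.
    print(decision_threshold(4.0, 1.0))           # 0.2
    print(cost_sensitive_predict(0.3, 4.0, 1.0))  # 1
    print(cost_sensitive_predict(0.3, 1.0, 1.0))  # 0
```

Raising the false-negative cost lowers the threshold, so more samples are labelled positive; this is why poorly chosen costs can destabilize a cost-sensitive classifier.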
The confusion matrix for binary classification
| Actual class | Predicted positive | Predicted negative |
|---|---|---|
| Positive samples | True positive (TP) | False negative (FN) |
| Negative samples | False positive (FP) | True negative (TN) |
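A minimal sketch of how the four confusion-matrix cells are counted from true and predicted labels, assuming labels are integers with `1` marking the positive class. The function name `confusion_counts` is hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Dict[str, int]:
    """Count TP, FN, FP and TN for a binary problem, laid out as in the table above."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            # Row "Positive samples": correct -> TP, missed -> FN
            counts["TP" if predicted == positive else "FN"] += 1
        else:
            # Row "Negative samples": false alarm -> FP, correct -> TN
            counts["FP" if predicted == positive else "TN"] += 1
    return counts


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 1, 0, 1, 0, 0]
    print(confusion_counts(y_true, y_pred))
    # {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 3}
```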
Cost matrix
| Actual (rows) / Predicted (columns) | Positive | Negative |
|---|---|---|
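The cells of the cost matrix did not survive extraction. As a hedged sketch, assume it uses the same actual-by-predicted layout as the confusion matrix, with zero cost on the diagonal and separate costs for false negatives and false positives; the total misclassification cost is then the cell-wise product of counts and costs, summed. All cost values below are hypothetical, as is the function name.

```python
from typing import List

# Both matrices use the confusion-matrix layout:
# rows = actual class (positive, negative), columns = predicted class (positive, negative).
Matrix = List[List[float]]


def total_misclassification_cost(confusion: Matrix, cost: Matrix) -> float:
    """Sum of count * cost over the four cells of a binary confusion matrix."""
    for m in (confusion, cost):
        if len(m) != 2 or any(len(row) != 2 for row in m):
            raise ValueError("expected 2x2 matrices")
    return float(sum(confusion[i][j] * cost[i][j] for i in range(2) for j in range(2)))


if __name__ == "__main__":
    confusion = [[2, 1],   # TP, FN
                 [1, 3]]   # FP, TN
    # Hypothetical costs: zero for correct predictions, C_FN = 5, C_FP = 1.
    cost = [[0.0, 5.0],
            [1.0, 0.0]]
    print(total_misclassification_cost(confusion, cost))  # 6.0
```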
Specifications of datasets
| Dataset | Number of samples | Feature dimension | Number of classes |
|---|---|---|---|
| Leukemia | 34 | 7130 | 2 |
| Colon | 62 | 2000 | 2 |
| Prostate | 136 | 12600 | 2 |
| Lung | 181 | 12533 | 2 |
| Ovarian | 253 | 15154 | 2 |
Grid Searching Strategy
1: procedure GRIDSEARCHING(…)
2: …
3: …
4: if …
5: …
6: if …
7: …
8: …
9: end if
10: end if
11: return …
12: end procedure
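The body of the grid-searching procedure was lost in extraction, so the sketch below is a generic grid search over a single cost weight, not a reconstruction of the authors' algorithm. It assumes a user-supplied `score(w)` that should be maximized (for example, a cross-validated WCA of a classifier trained with cost weight `w`); `weight_grid`, `grid_search_weight`, the grid bounds and the toy score are all hypothetical.

```python
from typing import Callable, List, Tuple


def weight_grid(start: float, stop: float, step: float) -> List[float]:
    """Inclusive, evenly spaced grid; built from an integer index to avoid float drift."""
    if step <= 0 or stop < start:
        raise ValueError("need step > 0 and stop >= start")
    n = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n + 1)]


def grid_search_weight(score: Callable[[float], float],
                       candidates: List[float]) -> Tuple[float, float]:
    """Return (best_weight, best_score), maximising score over the candidates."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    best_w, best_s = candidates[0], score(candidates[0])
    for w in candidates[1:]:
        s = score(w)
        if s > best_s:  # strict '>' keeps the first candidate on ties
            best_w, best_s = w, s
    return best_w, best_s


if __name__ == "__main__":
    # Hypothetical score peaking at w = 1.65; in practice this would be an
    # evaluation of a classifier trained with cost weight w.
    toy_score = lambda w: -(w - 1.65) ** 2
    print(grid_search_weight(toy_score, weight_grid(0.1, 10.0, 0.01)))
```

An exhaustive grid costs one model evaluation per candidate, which is why a cheaper way to compute the optimal weight is attractive when each evaluation means retraining a classifier.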
Datasets, cost weights and WCAs obtained with the two proposed approaches
| Dataset | p | Influence factor | Optimal cost weight | Cost weight | Cost weight | Cost weight | Optimal WCA | WCA | WCA | WCA | WCA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Ovarian | 1.68 | 0.1 | 1.65 | 1.63 | 1.53 | 1.58 | 0.9055 | 0.9695 | 0.1966 | 0.2084 | 0.1017 |
| Prostate | 2.5 | 0.9 | 1.04 | 1.05 | 1.05 | 1 | 0.9372 | 0.9815 | 0.9509 | 0.9869 | 0.8985 |
| Lung1 | 5 | 0.1 | 4.26 | 4.03 | 4.1 | 3.94 | 0.9078 | 0.9778 | 0.9786 | 0.9779 | 0.875 |
| Lung2 | 8 | 0.9 | 0.92 | 0.9 | 0.66 | 0.61 | 0.9009 | 0.9564 | 0.9762 | 0.9675 | 0.9 |
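One simple way to summarize a cost-weight comparison is the mean absolute deviation of each non-optimal weight column from the optimal weight. The sketch below applies this to the values in the table above. It assumes that the fourth column holds the optimal weight, since those values match the optimal weights listed in the optimal-weights table for the same class distributions. Which approach produced each of the remaining three weight columns is not recoverable from this extraction, so they are referred to only by position. This summary statistic is an illustrative choice, not one reported by the authors.

```python
from statistics import mean

# Values copied from the table above: (optimal cost weight, [the three other
# cost-weight columns, left to right]). Column labels were lost in extraction.
ROWS = {
    "Ovarian":  (1.65, [1.63, 1.53, 1.58]),
    "Prostate": (1.04, [1.05, 1.05, 1.00]),
    "Lung1":    (4.26, [4.03, 4.10, 3.94]),
    "Lung2":    (0.92, [0.90, 0.66, 0.61]),
}


def mean_abs_deviation_per_column(rows):
    """Mean |w - w_optimal| over datasets, for each non-optimal weight column."""
    n_cols = len(next(iter(rows.values()))[1])
    return [mean(abs(ws[c] - w_opt) for w_opt, ws in rows.values())
            for c in range(n_cols)]


if __name__ == "__main__":
    for c, dev in enumerate(mean_abs_deviation_per_column(ROWS), start=1):
        print(f"column {c}: mean |w - w_opt| = {dev:.4f}")
```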
Fig. 1 Values of the function w compared with the optimal weights
Fig. 2 Values of the function w compared with the optimal weights
Fig. 3 Values of the function w compared with the optimal weights
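The figures above compare a fitted weight function w with the optimal weights, in line with the paper's "parameter fitting" keyword. The functional form of w is not given in this section. As a hedged illustration of parameter fitting only, the sketch below fits a hypothetical linear model w(p) = a·p + b to (class distribution p, optimal weight) pairs by ordinary least squares. The linear form, the function name `fit_line`, and the sample points are all assumptions.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


if __name__ == "__main__":
    # Hypothetical (p, optimal weight) pairs lying exactly on w = 0.8 * p + 0.2.
    ps = [2.0, 4.0, 6.0, 8.0]
    ws = [0.8 * p + 0.2 for p in ps]
    print(fit_line(ps, ws))
```

A nonlinear form would need a different model (or an iterative solver), but the workflow is the same: choose a parametric w, fit it to known optimal weights, then compare its predictions against those optima.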
Fig. 4 Cost weight comparison on the Ovarian, Prostate, Lung1 and Lung2 datasets (p = 1.68, 2.5, 5, 8)
Fig. 5 Overall cost weight comparison
Fig. 6 WCA comparison on the Ovarian, Prostate, Lung1 and Lung2 datasets (p = 1.68, 2.5, 5, 8)
Fig. 7 WCA comparison in three dimensions
Optimal weights for different data sets
| Data set | Sample class distribution (p) | Influence factor 1 | Influence factor 2 | Optimal weight | WCA |
|---|---|---|---|---|---|
| Colon | 1 | 0.2 | 0.8 | 1.03 | 0.6167 |
| Leukemia | 1.33 | 0.9 | 0.1 | 0.9 | 0.9179 |
| Ovarian1 | 1.68 | 0.9 | 0.1 | 1.65 | 0.9055 |
| Prostate1 | 2 | 0.9 | 0.1 | 1.06 | 0.939 |
| Prostate2 | 2.5 | 0.9 | 0.1 | 1.04 | 0.9372 |
| Lung1 | 3 | 0.9 | 0.1 | 0.93 | 0.92 |
| Ovarian2 | 4 | 0.1 | 0.9 | 3.45 | 0.9094 |
| Lung2 | 5 | 0.1 | 0.9 | 4.26 | 0.9078 |
| Ovarian3 | 6.5 | 0.9 | 0.1 | 0.8 | 0.9075 |
| Lung3 | 8 | 0.9 | 0.1 | 0.92 | 0.9009 |
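In every row of the table above, the two influence factors sum to 1. The WCA formula itself is not included in this excerpt. A natural form consistent with that pattern, offered here strictly as an assumption, is a convex combination of per-class recalls: WCA = α·sensitivity + β·specificity with α + β = 1. The sketch below implements that assumed form. Which influence factor multiplies which recall is also an assumption, and the function name is hypothetical.

```python
def weighted_classification_accuracy(tp: int, fn: int, fp: int, tn: int,
                                     alpha: float, beta: float) -> float:
    """ASSUMED form: alpha * sensitivity + beta * specificity, with alpha + beta = 1.

    Which influence factor multiplies which class recall is also an assumption.
    """
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha and beta should sum to 1")
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    sensitivity = tp / (tp + fn)  # recall on positive samples
    specificity = tn / (tn + fp)  # recall on negative samples
    return alpha * sensitivity + beta * specificity


if __name__ == "__main__":
    # Hypothetical counts: sensitivity 0.8, specificity 0.9.
    print(weighted_classification_accuracy(8, 2, 5, 45, 0.9, 0.1))
    print(weighted_classification_accuracy(8, 2, 5, 45, 0.5, 0.5))
```

With α = β = 0.5 this assumed form reduces to balanced accuracy. Skewing the factors toward one class rewards a classifier, and hence a cost weight, that favors that class.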