Yanqiu Liu, Huijuan Lu, Ke Yan, Haixia Xia, Chunlin An.
Abstract
Embedding cost-sensitive factors into classifiers increases classification stability and reduces classification costs on large-scale, redundant, and imbalanced datasets such as gene expression data. In this study, we extend our previous work, Dissimilar ELM (D-ELM), by introducing misclassification costs into the classifier. We call the resulting algorithm cost-sensitive D-ELM (CS-D-ELM). We further embed a rejection cost into CS-D-ELM to increase its classification stability. Experimental results show that the rejection cost embedded CS-D-ELM effectively reduces both the average and the overall cost of classification while its accuracy remains competitive. The proposed method can be extended to classification problems on other redundant and imbalanced data.
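The trade-off between misclassification costs and a rejection cost can be illustrated with a generic minimum-expected-cost decision rule. The sketch below is not the CS-D-ELM algorithm; it only assumes that a classifier supplies class probabilities. The function name `min_cost_decision`, the cost matrix, the probabilities, and the rejection cost are hypothetical values chosen for illustration.

```python
def min_cost_decision(probs, cost_matrix, reject_cost):
    """Return the label with the lowest expected cost, or -1 (reject)
    when even the cheapest label costs more than rejecting.

    probs[k]          : estimated probability that the true class is k
    cost_matrix[k][j] : cost of predicting j when the true class is k
    reject_cost       : fixed cost of refusing to classify (assumed constant)
    """
    n_classes = len(cost_matrix)
    # Expected cost of predicting each label j under the given probabilities.
    expected = [
        sum(probs[k] * cost_matrix[k][j] for k in range(n_classes))
        for j in range(n_classes)
    ]
    best = min(range(n_classes), key=expected.__getitem__)
    return -1 if expected[best] > reject_cost else best


# Hypothetical two-class costs: missing class 1 costs 5, a false alarm costs 1.
cost_matrix = [[0.0, 1.0],
               [5.0, 0.0]]
samples = [[0.95, 0.05], [0.60, 0.40], [0.10, 0.90]]
decisions = [min_cost_decision(p, cost_matrix, reject_cost=0.5) for p in samples]
```

With these assumed numbers the second sample is rejected: its cheapest label has an expected cost of 0.6, which exceeds the rejection cost of 0.5, while the other two samples are assigned a class.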
Year: 2016 PMID: 27642292 PMCID: PMC5011754 DOI: 10.1155/2016/8056253
Source DB: PubMed Journal: Comput Intell Neurosci
Datasets.
| Dataset | Sample number | Feature number | Class name | Class sample number |
|---|---|---|---|---|
| Diabetes | 97 | 8 | Relapse | 46 |
| | | | Nonrelapse | 51 |
| Heart | 270 | 13 | Negative | 150 |
| | | | Positive | 120 |
| Colon | 62 | 52 | Negative | 19 |
| | | | Positive | 43 |
| Mushroom | 263 | 43 | Negative | 111 |
| | | | Positive | 152 |
| Protein | 334 | 73 | Negative | 215 |
| | | | Positive | 119 |
| Leukemia | 72 | 7129 | ALL | 24 |
| | | | MLL | 20 |
| | | | AML | 28 |
Figure 1. Average misclassification costs for the Diabetes dataset.
Figure 2. Average misclassification costs for the Heart dataset.
Figure 3. Average misclassification costs for the Leukemia dataset.
Figure 4. Comparison of average classification costs for the Diabetes dataset.
Figure 5. Comparison of average classification costs for the Heart dataset.
Figure 6. Comparison of average classification costs for the Leukemia dataset.
Figure 7. Relationship between the rejection threshold and the average misclassification costs.
Figure 8. Comparison of embedding different cost-sensitive factors.
Running time comparison between ELM, D-ELM, CS-D-ELM, and rejection cost embedded CS-D-ELM (average running time in seconds).
| Dataset | ELM | D-ELM | CS-D-ELM | Rejection cost embedded CS-D-ELM |
|---|---|---|---|---|
| Diabetes | 0.4312 | 1.4536 | 1.5330 | 1.5470 |
| Heart | 0.6670 | 1.8579 | 1.9353 | 1.9455 |
| Colon | 0.5183 | 1.6743 | 1.7458 | 1.7561 |
| Mushroom | 0.7214 | 1.8654 | 1.9583 | 1.9836 |
| Protein | 0.7551 | 2.0836 | 2.1349 | 2.2655 |
| Leukemia | 1.2319 | 3.9593 | 4.0346 | 4.1692 |
NC value and PC value of D-ELM, CS-D-ELM, and rejection cost embedded CS-D-ELM.
| Dataset | NC value: D-ELM | NC value: CS-D-ELM | NC value: Rejection CS-D-ELM | PC value: D-ELM | PC value: CS-D-ELM | PC value: Rejection CS-D-ELM |
|---|---|---|---|---|---|---|
| Leukemia | 0.3815 | 0.4714 | 0.5874 | 0.9561 | 0.8626 | 0.8215 |
| Colon | 0.4132 | 0.5127 | 0.6322 | 0.9722 | 0.9152 | 0.8464 |
| Mushroom | 0.4325 | 0.5423 | 0.6929 | 1.0000 | 0.9605 | 0.9313 |
| Protein | 0.3237 | 0.4621 | 0.5433 | 0.9433 | 0.8956 | 0.7751 |
Experimental results of G-means value and average misclassification costs.
| Dataset | G-means: D-ELM | G-means: CS-D-ELM | G-means: Rejection CS-D-ELM | Average cost: D-ELM | Average cost: CS-D-ELM | Average cost: Rejection CS-D-ELM |
|---|---|---|---|---|---|---|
| Leukemia | 0.6416 | 0.6508 | 0.6714 | 0.4271 | 0.3522 | 0.2543 |
| Colon | 0.6852 | 0.7203 | 0.7313 | 0.3814 | 0.2232 | 0.2123 |
| Mushroom | 0.7125 | 0.7877 | 0.7922 | 0.1102 | 0.0755 | 0.0411 |
| Protein | 0.5333 | 0.7122 | 0.7328 | 0.4843 | 0.3812 | 0.2043 |
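For readers unfamiliar with the two metrics in this table, the sketch below computes them from a confusion matrix. It assumes the common textbook definitions (G-mean as the geometric mean of per-class recalls, average cost as total misclassification cost divided by the number of samples); this record does not state the exact formulas used in the paper, so they may differ. The function names, the confusion matrix, and the cost matrix are hypothetical.

```python
import math


def g_mean(confusion):
    """Geometric mean of per-class recalls.

    confusion[i][j] counts samples of true class i predicted as class j.
    Assumes every class has at least one sample.
    """
    recalls = [row[i] / sum(row) for i, row in enumerate(confusion)]
    return math.prod(recalls) ** (1.0 / len(recalls))


def average_cost(confusion, cost_matrix):
    """Total misclassification cost divided by the number of samples.

    cost_matrix[i][j] is the cost of predicting j when the true class is i.
    """
    total = sum(sum(row) for row in confusion)
    cost = sum(
        confusion[i][j] * cost_matrix[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    return cost / total


# Hypothetical results on 100 samples (not taken from the paper).
confusion = [[40, 10],
             [5, 45]]
cost_matrix = [[0.0, 1.0],
               [5.0, 0.0]]
gm = g_mean(confusion)                      # sqrt(0.8 * 0.9)
avg = average_cost(confusion, cost_matrix)  # (10*1 + 5*5) / 100
```

Under these definitions a classifier can raise its G-mean and lower its average cost simultaneously, since the first rewards balanced recall and the second penalizes the expensive error type more heavily.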