Feng Hu, Xiao Liu, Jin Dai, Hong Yu.
Abstract
Classification of imbalanced data is receiving increasing attention. Many significant methods have been proposed and applied in many fields, but more efficient methods are still needed. Although the hypergraph is an efficient tool for knowledge discovery, it may not be powerful enough to deal with data in the boundary region. In this paper, the neighborhood hypergraph is presented, which combines rough set theory with the hypergraph. A novel classification algorithm for imbalanced data based on the neighborhood hypergraph is then developed; it consists of three steps: initialization of hyperedges, classification of the training data set, and substitution of hyperedges. In 10-fold cross-validation experiments on 18 data sets, the proposed algorithm achieves higher average accuracy than the compared methods.
Year: 2014 PMID: 25180211 PMCID: PMC4144305 DOI: 10.1155/2014/876875
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1: An example of a hypergraph.
Figure 2: An example of a neighborhood hypergraph.
Figure 3: An example of upper approximation, lower approximation, and boundary region.
Figure 4: The flow chart of the algorithm.
Figure 5: Attribution inheritance.
Figure 6: An example of situation 1.
Figure 7: Hyperedges whose confidence degree is under λ.
Algorithm 1: Neighbor hypergraph (N-HyperGraph).
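The upper approximation, lower approximation, and boundary region named in the Figure 3 caption can be illustrated with a small sketch. It assumes the usual neighborhood rough set definitions (a sample belongs to the lower approximation of a class if its whole neighborhood lies in that class, and to the upper approximation if its neighborhood touches the class). The Euclidean metric, the radius `delta`, the function names, and the toy data are all assumptions made for illustration; this is not the paper's N-HyperGraph algorithm.

```python
import math

def neighborhood(i, points, delta):
    """Indices of all samples within Euclidean distance delta of sample i."""
    return {j for j, q in enumerate(points) if math.dist(points[i], q) <= delta}

def approximations(points, labels, target, delta):
    """Lower/upper approximation and boundary region of class `target`."""
    lower, upper = set(), set()
    for i in range(len(points)):
        classes = {labels[j] for j in neighborhood(i, points, delta)}
        if target in classes:          # neighborhood touches the class
            upper.add(i)
            if classes == {target}:    # neighborhood lies entirely in the class
                lower.add(i)
    return lower, upper, upper - lower

# Hypothetical one-dimensional toy data: samples 2 and 3 sit close together
# but carry different labels, so both fall into the boundary region.
points = [(0.0,), (0.1,), (0.5,), (0.6,), (1.0,), (1.1,)]
labels = ["min", "min", "min", "maj", "maj", "maj"]

lower, upper, boundary = approximations(points, labels, "min", delta=0.15)
print("lower:", sorted(lower))
print("upper:", sorted(upper))
print("boundary:", sorted(boundary))
```

Samples in the boundary region are the ones a plain hypergraph is said to handle poorly, which is the motivation the abstract gives for adding the neighborhood model.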
Data description.
| Dataset | Size | Attribute | Class label (minority : majority) | Class distribution |
|---|---|---|---|---|
| Bupa | 345 | 6C | 01 : 02 | 145/200 |
| Colic | 368 | 7C 15N | No : yes | 136/232 |
| Reprocessed | 294 | 13C | 01 : 00 | 106/188 |
| Machine | 209 | 7C | Others : 2 | 74/135 |
| Labor | 57 | 8C 8N | Bad : good | 20/37 |
| Tic | 958 | 9N | Negative : positive | 332/626 |
| Iris | 150 | 4C | Iris-virginica : others | 50/100 |
| Seed | 210 | 7C | 02 : others | 70/140 |
| Vc | 310 | 6C | Normal : Abnormal | 100/210 |
| Glass | 214 | 9C | 01, 02 : others | 68/146 |
| Haberman | 306 | 3C | 02 : 01 | 81/225 |
| Transfusion | 748 | 4C | 01 : 00 | 178/570 |
| Abalone (7 : 15) | 494 | 7C 1N | 15 : 07 | 103/391 |
| Balance-scale | 625 | 4C | B : others | 49/576 |
| Abalone (9 : 18) | 731 | 7C 1N | 18 : 9 | 42/689 |
| Yeast (POX : CYT) | 483 | 8C | POX : CYT | 20/463 |
| Car | 1728 | 6N | Good : others | 69/1659 |
| Yeast (ME2 : others) | 1484 | 8C | ME2 : others | 51/1433 |
C: continuous, N: nominal.
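To make the degree of imbalance in the table easier to compare, the sketch below computes the majority-to-minority ratio and the minority share for a few rows. The counts are copied from the "Class distribution" column; the imbalance ratio itself is a common summary statistic rather than something the source reports, and the function name is an assumption.

```python
# (minority, majority) counts copied from the "Class distribution" column.
CLASS_DISTRIBUTION = {
    "Bupa": (145, 200),
    "Iris": (50, 100),
    "Haberman": (81, 225),
    "Balance-scale": (49, 576),
    "Car": (69, 1659),
    "Yeast (ME2 : others)": (51, 1433),
}

def imbalance_ratio(minority, majority):
    """Number of majority samples per minority sample."""
    return majority / minority

for name, (minority, majority) in CLASS_DISTRIBUTION.items():
    share = minority / (minority + majority)
    ratio = imbalance_ratio(minority, majority)
    print(f"{name:22s} IR = {ratio:6.2f}  minority share = {share:.1%}")
```

The rows of the table appear to be ordered from mildly imbalanced (Bupa, roughly 1.4 : 1) to strongly imbalanced (Yeast ME2, roughly 28 : 1).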
Confusion matrix.
| | Predicted positive | Predicted negative |
|---|---|---|
| Positive | TP | FN |
| Negative | FP | TN |
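The indicators reported in the result tables (precision, recall, F-value, G-means) can all be derived from the four cells of this confusion matrix. The sketch below assumes the usual definitions, with the minority class as positive, F-value taken as the balanced F1 score, and G-means as the geometric mean of the true positive and true negative rates; the source record does not spell the formulas out. The counts in the example are hypothetical: they were chosen to be consistent with the values reported for SMOTE + C4.5 on the Machine data set (precision 0.8250, F-value 0.8571, G-means 0.8941) and are not given in the source.

```python
import math

def imbalance_metrics(tp, fn, fp, tn):
    """Precision, recall, F-value and G-mean for the positive (minority) class.

    Ratios with an empty denominator are returned as 0.0, an assumed
    convention for classifiers that never predict the positive class.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    if precision + recall:
        f_value = 2 * precision * recall / (precision + recall)
    else:
        f_value = 0.0
    g_mean = math.sqrt(recall * specificity)
    return {
        "precision": precision,
        "recall": recall,
        "f_value": f_value,
        "g_mean": g_mean,
    }

# Hypothetical counts for a 74 : 135 data set (not reported in the source).
metrics = imbalance_metrics(tp=66, fn=8, fp=14, tn=121)
print({name: round(value, 4) for name, value in metrics.items()})
```

Because G-means multiplies the two class-wise rates, a classifier that ignores the minority class scores 0 regardless of its overall accuracy, which is why it is a common indicator for imbalanced data.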
Precision.
| Dataset | SVM | CS-EN-HN | SMOTE + C4.5 | SMOTE-RSB∗ + C4.5 | NRSBoundary-SMOTE + C4.5 | N-HyperGraph |
|---|---|---|---|---|---|---|
| Bupa | | 0.6827 | 0.5663 | 0.6581 | 0.5614 | 0.5497 |
| Colic | 0.6857 | 0.5887 | 0.7391 | 0.8103 | 0.7687 | |
| Reprocessed | 0.0000 | 0.6344 | 0.7030 | | 0.6979 | 0.5558 |
| Machine | 0.0000 | 0.7542 | 0.8250 | 0.8873 | | 0.9025 |
| Labor | 0.9411 | | 0.6667 | 0.6667 | 0.8421 | 0.7733 |
| Tic | 0.9908 | 0.6534 | 0.6882 | 0.8044 | 0.8100 | |
| Iris | | 0.5061 | 0.9057 | 0.8703 | 0.8888 | 0.8310 |
| Seed | 0.9295 | | 0.9577 | 0.9577 | 0.9577 | 0.8614 |
| Vc | 0.0000 | 0.6133 | 0.6696 | | 0.6695 | 0.4842 |
| Glass | 0.8775 | | 0.6933 | 0.8214 | 0.7428 | 0.8652 |
| Haberman | 0.4999 | 0.2334 | 0.4516 | 0.4590 | 0.4948 | |
| Transfusion | 0.4186 | 0.3139 | 0.4722 | 0.5299 | 0.5000 | |
| Abalone (7 : 15) | 0.7951 | 0.7552 | 0.8056 | 0.8155 | 0.8100 | |
| Balance-scale | 0.0000 | 0.1605 | 0.0000 | 0.0000 | 0.0000 | |
| Abalone (9 : 18) | 0.0000 | 0.3910 | 0.4167 | 0.4347 | 0.5000 | |
| Yeast (POX : CYT) | | 0.6371 | 0.6000 | 0.9268 | 0.7736 | 0.7000 |
| Car | 0.5000 | 0.5100 | 0.6849 | 0.6849 | | 0.6731 |
| Yeast (ME2 : others) | 0.0000 | 0.1272 | 0.3214 | | 0.4200 | 0.3265 |
| Average | 0.5233 | 0.5853 | 0.6204 | 0.6816 | 0.6682 | |
w = 0.001 to 0.6.
AUC.
| Dataset | SVM | CS-EN-HN | SMOTE + C4.5 | SMOTE-RSB∗ + C4.5 | NRSBoundary-SMOTE + C4.5 | N-HyperGraph |
|---|---|---|---|---|---|---|
| Bupa | 0.5181 | | 0.6468 | 0.6652 | 0.6401 | 0.6427 |
| Colic | 0.5645 | 0.8265 | 0.7960 | 0.7855 | 0.8102 | |
| Reprocessed | 0.5000 | 0.6750 | | 0.7565 | 0.7817 | 0.7662 |
| Machine | 0.5000 | 0.9201 | 0.9199 | 0.9359 | 0.9430 | |
| Labor | 0.8864 | 0.7500 | 0.7500 | 0.7655 | 0.8243 | |
| Tic | 0.8252 | | 0.8638 | 0.8941 | 0.8848 | 0.9959 |
| Iris | | 0.7500 | 0.9408 | 0.9197 | 0.9468 | 0.9450 |
| Seed | 0.9535 | 0.9780 | 0.9730 | | | 0.9535 |
| Vc | 0.5000 | | 0.8107 | 0.8380 | 0.8277 | 0.7381 |
| Glass | 0.7956 | | 0.8442 | 0.8640 | 0.8637 | 0.9400 |
| Haberman | 0.5079 | 0.5866 | 0.6255 | 0.6174 | 0.6636 | |
| Transfusion | 0.5286 | 0.8558 | 0.6813 | 0.7007 | 0.7048 | |
| Abalone (7 : 15) | 0.7986 | | 0.8901 | 0.9046 | 0.8627 | 0.9910 |
| Balance-scale | 0.5000 | 0.6256 | 0.5000 | 0.5000 | 0.5000 | |
| Abalone (9 : 18) | 0.6611 | 0.9333 | 0.7294 | 0.6514 | 0.7065 | |
| Yeast (POX : CYT) | 0.5000 | 0.8999 | 0.6881 | 0.8685 | 0.8762 | |
| Car | 0.5069 | 0.9792 | 0.9775 | 0.9775 | | 0.9888 |
| Yeast (ME2 : others) | 0.5000 | 0.6790 | 0.7894 | 0.7412 | 0.6916 | |
| Average | 0.6398 | 0.8509 | 0.7898 | 0.7980 | 0.8054 | |
w = 0.001 to 0.6.
Figure 8: The average value of each indicator.
Recall.
| Dataset | SVM | CS-EN-HN | SMOTE + C4.5 | SMOTE-RSB∗ + C4.5 | NRSBoundary-SMOTE + C4.5 | N-HyperGraph |
|---|---|---|---|---|---|---|
| Bupa | 0.0413 | 0.7856 | 0.6483 | 0.5310 | 0.6621 | |
| Colic | 0.1764 | 0.6531 | 0.7500 | 0.6912 | 0.7574 | |
| Reprocessed | 0.0000 | 0.5999 | 0.6698 | 0.6509 | 0.6320 | |
| Machine | 0.0000 | 0.8404 | 0.8919 | 0.8514 | 0.8919 | |
| Labor | 0.8000 | 0.5000 | 0.5000 | 0.7000 | 0.8000 | |
| Tic | 0.6536 | | 0.7711 | 0.7680 | 0.7319 | |
| Iris | 0.9800 | 0.5000 | 0.9600 | 0.9400 | 0.9600 | |
| Seed | 0.9428 | 0.9428 | 0.9714 | 0.9714 | 0.9714 | |
| Vc | 0.0000 | 0.9500 | 0.7700 | 0.6400 | 0.7900 | |
| Glass | 0.6323 | | 0.7647 | 0.6764 | 0.7647 | 0.9000 |
| Haberman | 0.0246 | 0.7732 | 0.5185 | 0.3457 | 0.5926 | |
| Transfusion | 0.1011 | 0.9117 | 0.4775 | 0.3989 | 0.5000 | |
| Abalone (7 : 15) | 0.6407 | | 0.8447 | 0.8155 | 0.7864 | |
| Balance-scale | 0.0000 | 0.5500 | 0.0000 | 0.0000 | 0.0000 | |
| Abalone (9 : 18) | 0.0000 | 0.8666 | 0.3571 | 0.4347 | 0.3809 | |
| Yeast (POX : CYT) | 0.1372 | 0.8000 | 0.4500 | 0.4751 | 0.8039 | |
| Car | 0.0145 | | 0.7246 | 0.7246 | 0.6956 | |
| Yeast (ME2 : others) | 0.0000 | 0.8700 | 0.3529 | 0.3137 | 0.4118 | |
| Average | 0.2858 | 0.8068 | 0.6346 | 0.6071 | 0.6740 | |
w = 0.001 to 0.6.
F-value.
| Dataset | SVM | CS-EN-HN | SMOTE + C4.5 | SMOTE-RSB∗ + C4.5 | NRSBoundary-SMOTE + C4.5 | N-HyperGraph |
|---|---|---|---|---|---|---|
| Bupa | 0.0789 | | 0.6045 | 0.5878 | 0.6076 | 0.6422 |
| Colic | 0.2807 | 0.5688 | 0.7445 | 0.7460 | 0.7630 | |
| Reprocessed | 0.0000 | 0.5374 | 0.6698 | 0.6831 | 0.6633 | |
| Machine | 0.0000 | 0.8404 | 0.8571 | 0.8690 | 0.8980 | |
| Labor | | 0.6666 | 0.5714 | 0.6829 | 0.8205 | 0.8171 |
| Tic | 0.7876 | 0.7900 | 0.7273 | 0.7858 | 0.7689 | |
| Iris | | 0.4835 | 0.9320 | 0.9038 | 0.9230 | 0.9045 |
| Seed | 0.9361 | 0.9545 | | | | 0.9209 |
| Vc | 0.0000 | | 0.7163 | 0.6919 | 0.7248 | 0.6501 |
| Glass | 0.7350 | | 0.7273 | 0.7419 | 0.7536 | 0.8808 |
| Haberman | 0.0470 | 0.3475 | 0.4828 | 0.3944 | 0.5393 | |
| Transfusion | 0.1628 | 0.4582 | 0.4749 | 0.4551 | 0.5000 | |
| Abalone (7 : 15) | 0.7096 | 0.8595 | 0.8246 | 0.8155 | 0.7980 | |
| Balance-scale | 0.0000 | 0.2326 | 0.0000 | 0.0000 | 0.0000 | |
| Abalone (9 : 18) | 0.0000 | 0.5071 | 0.3846 | 0.3076 | 0.4324 | |
| Yeast (POX : CYT) | 0.2413 | 0.6493 | 0.5143 | | 0.7885 | 0.8066 |
| Car | 0.0282 | 0.6728 | 0.7042 | 0.7042 | 0.6906 | |
| Yeast (ME2 : others) | 0.0000 | 0.1924 | 0.3364 | 0.3765 | 0.4158 | |
| Average | 0.3235 | 0.6191 | 0.6252 | 0.6409 | 0.6695 | |
w = 0.001 to 0.6.
G-means.
| Dataset | SVM | CS-EN-HN | SMOTE + C4.5 | SMOTE-RSB∗ + C4.5 | NRSBoundary-SMOTE + C4.5 | N-HyperGraph |
|---|---|---|---|---|---|---|
| Bupa | 0.2026 | | 0.6441 | 0.6518 | 0.6433 | 0.5592 |
| Colic | 0.4008 | 0.5828 | 0.7740 | 0.7910 | 0.8100 | |
| Reprocessed | 0.0000 | 0.5973 | | 0.7466 | 0.7311 | 0.7247 |
| Machine | 0.0000 | 0.8371 | 0.8941 | 0.8949 | 0.9196 | |
| Labor | 0.8000 | 0.7071 | 0.6576 | 0.7534 | | 0.8232 |
| Tic | 0.8015 | 0.8466 | 0.7926 | 0.8318 | 0.8156 | |
| Iris | 0.4427 | 0.5112 | | 0.9349 | 0.9499 | 0.9427 |
| Seed | 0.6473 | 0.9623 | | | | 0.9511 |
| Vc | 0.0000 | 0.7729 | 0.7941 | 0.7589 | | 0.6831 |
| Glass | 0.7141 | | 0.8026 | 0.7938 | 0.8187 | 0.8897 |
| Haberman | 0.2957 | 0.4884 | 0.6332 | 0.5431 | 0.6808 | |
| Transfusion | 0.1628 | 0.4582 | 0.6308 | 0.5956 | 0.6496 | |
| Abalone (7 : 15) | 0.6626 | 0.9565 | 0.8940 | 0.8808 | 0.8649 | |
| Balance-scale | 0.0000 | 0.6016 | 0.0000 | 0.0000 | 0.0000 | |
| Abalone (9 : 18) | 0.0000 | 0.8521 | 0.5884 | 0.4833 | 0.6100 | |
| Yeast (POX : CYT) | 0.3704 | 0.7088 | 0.6665 | 0.8552 | 0.8630 | |
| Car | 0.1195 | 0.9790 | 0.8453 | 0.8453 | 0.8285 | |
| Yeast (ME2 : others) | 0.0000 | 0.5086 | 0.5861 | 0.5566 | 0.6352 | |
| Average | 0.3118 | 0.7113 | 0.7158 | 0.7162 | 0.7475 | |
w = 0.001 to 0.6.