Itziar Irigoien, Basilio Sierra, Concepción Arenas.
Abstract
In one-class classification (OCC), one class, the target class, has to be distinguished from all other possible objects, which are considered nontargets. This situation arises in many biomedical problems, for example in diagnosis, image-based tumor recognition, or the analysis of electrocardiogram data. In this paper, an approach to OCC based on a typicality test is experimentally compared, on biomedical data sets, with reference state-of-the-art OCC techniques (Gaussian, mixture of Gaussians, naive Parzen, Parzen, and support vector data description). We evaluate the procedures on twelve experimental data sets whose variables are not necessarily continuous. As there are few benchmark data sets for one-class classification, all data sets considered in the evaluation have multiple classes. Each class in turn is taken as the target class, and the units in the other classes are treated as new units to be classified. The comparison shows that the typicality approach performs well and is applicable to high-dimensional data. It can also be used with any kind of data (continuous, discrete, or nominal), whereas applying the state-of-the-art approaches is not straightforward when nominal variables are present.
Year: 2014 PMID: 24778600 PMCID: PMC3980920 DOI: 10.1155/2014/730712
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
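The evaluation protocol described in the abstract (each class in turn is the target, the remaining classes supply the new units) can be sketched as follows, with performance summarized by AUC as in the results table. This is a minimal illustration under stated assumptions, not the authors' code: a plain Gaussian scorer based on the Mahalanobis distance stands in for the typicality test, the data are synthetic, the 80/20 split of the target class is hypothetical, and all function names are invented for the example.

```python
import numpy as np

def auc_from_scores(target_scores, nontarget_scores):
    """AUC as the probability that a target unit outscores a nontarget unit (ties count 1/2)."""
    t = np.asarray(target_scores, dtype=float)[:, None]
    n = np.asarray(nontarget_scores, dtype=float)[None, :]
    return float(np.mean((t > n) + 0.5 * (t == n)))

def gaussian_occ_scores(train, test, ridge=1e-6):
    """Negative squared Mahalanobis distance to the target mean (higher = more target-like)."""
    mu = train.mean(axis=0)
    # A small ridge keeps the covariance invertible (assumption, not from the paper).
    cov = np.cov(train, rowvar=False) + ridge * np.eye(train.shape[1])
    diff = test - mu
    return -np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def evaluate_each_class_as_target(X, y, seed=0, train_fraction=0.8):
    """Treat every class in turn as the target and return one AUC per class."""
    rng = np.random.default_rng(seed)
    results = {}
    for label in np.unique(y).tolist():
        target, others = X[y == label], X[y != label]
        idx = rng.permutation(len(target))
        n_train = int(train_fraction * len(target))
        train, held_out = target[idx[:n_train]], target[idx[n_train:]]
        results[label] = auc_from_scores(
            gaussian_occ_scores(train, held_out),  # unseen target units
            gaussian_occ_scores(train, others),    # units from all other classes
        )
    return results

# Synthetic two-class data, used only to exercise the protocol.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(4, 1, (60, 3))])
y = np.array([0] * 100 + [1] * 60)
aucs = evaluate_each_class_as_target(X, y)
print(aucs)
```

Only target units are used to fit the model; the other classes appear solely at test time, which is what distinguishes this set-up from ordinary two-class evaluation.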
Description of ten UCI data sets used in the experiments.
| Data sets | Classes | Instances | Features |
|---|---|---|---|
| Breast Wisconsin original | 2 | 241/458 | 9 |
| Breast Wisconsin prognostic | 2 | 47/151 | 33 |
| Colon | 2 | 40/22 | 1908 |
| E. coli | 2 | 52/284 | 8 |
| Hepatitis | 2 | 123/32 | 19 |
| Leukemia | 2 | 47/25 | 3571 |
| Liver disorders | 2 | 145/200 | 6 |
| METAS | 2 | 46/99 | 4919 |
| SPECT heart | 2 | 95/254 | 44 |
| Thyroid | 3 | 93/191/3488 | 21 |
Average AUC values, with standard deviations in brackets, on the UCI data sets. The last column also indicates the distance used by the typicality method: c, correlation; E, Euclidean; E-st, Euclidean after standardization; M, Mahalanobis.
| Data sets | Target class | Gaussian | Mixture Gaussians | Naive Parzen | Parzen | Support vector DD | Typicality (distance) |
|---|---|---|---|---|---|---|---|
| Breast Wisconsin original | Benign | 98.5 (0.1) | 98.3 (0.2) | 98.7 (0.1) | 99.2 (0.1) | 99.0 (0.1) | 99.4 (0.2), E |
| Breast Wisconsin original | Malignant | 82.3 (0.2) | 69.1 (3.2) | 96.5 (0.4) | 72.3 (0.5) | 66.1 (0.8) | 97.6 (0.5), E |
| Breast Wisconsin prognostic | Returning | 63.0 (1.4) | 59.1 (1.6) | 59.0 (1.9) | 59.4 (1.9) | 59.6 (1.4) | 58.5 (5.4), M |
| Breast Wisconsin prognostic | Nonreturning | 50.8 (0.8) | 52.6 (1.6) | 53.8 (2.2) | 52.2 (1.7) | 51.7 (1.7) | 55.6 (2.9), M |
| Colon | 1 | 61.1 (3.8) | NaN | 73.4 (3.1) | 63.6 (22.4) | 63.6 (22.4) | 75.4 (6.3), c |
| Colon | 2 | 70.4 (1.1) | NaN | 70.0 (1.5) | 36.4 (22.4) | 36.4 (22.4) | 78.3 (5.8), c |
| E. coli | Periplasm | 92.9 (0.3) | 92.0 (0.4) | 93.0 (0.8) | 92.2 (0.4) | 89.4 (0.8) | 95.4 (1.3), E |
| Hepatitis | Normal | 82.1 (1.0) | 78.3 (1.0) | 80.1 (0.7) | 79.0 (1.0) | 78.7 (1.1) | 80.8 (2.2), M |
| Leukemia | 1 | 92.1 (1.8) | NaN | 90.2 (4.4) | NaN | 58.9 (30.2) | 91.2 (3.4), c |
| Leukemia | 2 | 94.7 (2.7) | NaN | 96.7 (0.4) | NaN | 41.1 (30.2) | 90.6 (3.9), c |
| Liver disorders | Class 1 | 58.5 (0.4) | 59.3 (0.7) | 61.4 (0.7) | 58.7 (0.4) | 59.0 (0.9) | 58.1 (2.5), M |
| Liver disorders | Class 2 | 50.9 (0.5) | 49.4 (0.6) | 48.4 (0.8) | 46.9 (0.8) | 49.6 (1.0) | 58.0 (3.7), M |
| METAS | 1 | 69.1 (1.5) | NaN | 65.3 (0.8) | 64.8 (21.5) | 64.8 (21.5) | 67.3 (2.3), c |
| METAS | 2 | 36.4 (1.4) | NaN | 40.7 (1.2) | 35.2 (21.5) | 35.2 (21.5) | 64.5 (4.7), c |
| SPECT heart | Class 0 | 93.4 (0.9) | 95.1 (0.8) | 90.7 (1.5) | 95.7 (1.0) | 89.7 (3.2) | 86.1 (3.8), M |
| SPECT heart | Class 1 | 28.4 (0.5) | 27.9 (1.3) | 26.0 (0.7) | 44.5 (0.5) | 57.1 (11.1) | 69.8 (2.5), M |
| Thyroid | Normal | 84.3 (0.0) | 84.7 (4.4) | 96.1 (0.0) | 90.6 (0.0) | 56.0 (0.0) | 98.1 (1.2), c |
| Thyroid | Hyperthyroid | 70.3 (0.0) | 68.1 (0.9) | 75.1 (0.0) | 70.6 (0.0) | 45.7 (0.0) | 65.9 (2.5), c |
| Thyroid | Subnormal | 69.6 (0.0) | 81.5 (1.0) | 84.4 (0.0) | 87.4 (0.0) | 50.3 (0.0) | 88.0 (2.8), c |
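The four distances named in the caption of the AUC table (correlation, Euclidean, Euclidean after standardization, Mahalanobis) can be computed between all pairs of units as in the sketch below. The exact definitions used by the typicality method are not given in this record, so the formulas are common textbook conventions and should be read as assumptions; in particular, the correlation distance is taken here as 1 - r between the feature profiles of two units. Function names are invented for the example.

```python
import numpy as np

def euclidean(X):
    """Pairwise Euclidean distances between the rows (units) of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def euclidean_standardized(X):
    """Euclidean distances after scaling every feature to zero mean and unit variance."""
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return euclidean((X - X.mean(axis=0)) / sd)

def mahalanobis(X):
    """Pairwise Mahalanobis distances using the sample covariance of X."""
    # The pseudo-inverse is an assumption that keeps the sketch usable
    # when the covariance matrix is singular.
    inv_cov = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
    diff = X[:, None, :] - X[None, :, :]
    squared = np.einsum("abi,ij,abj->ab", diff, inv_cov, diff)
    return np.sqrt(np.maximum(squared, 0.0))

def correlation(X):
    """Correlation distance 1 - r between the feature profiles of two units (assumed form)."""
    return 1.0 - np.corrcoef(X)

# Synthetic data: 6 units, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
distances = {
    "c": correlation(X),
    "E": euclidean(X),
    "E-st": euclidean_standardized(X),
    "M": mahalanobis(X),
}
```

The Mahalanobis distance needs an estimate of the covariance matrix, which becomes singular when there are more features than units. This may be one reason a correlation distance is reported for the high-dimensional sets such as Colon and Leukemia, although the record does not say so.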
Figure 1. Boxplots of the average AUC values of the OCC methods over the experimental data sets.
True and false positives and negatives obtained with the typicality approach on the Statlog (heart) data set, for a fixed false alarm rate of 0.1. Each cell gives the number of units of the tested class that received the indicated classification, over the number of units tested from that class.
| Target class | Classified as | Tested class: Absence | Tested class: Presence |
|---|---|---|---|
| Absence | Target | 135/150 | 41/120 |
| Absence | Nontarget | 15/150 | 79/120 |
| Presence | Target | 68/150 | 110/120 |
| Presence | Nontarget | 82/150 | 10/120 |
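In the Statlog (heart) table, 15 of the 150 Absence units are rejected when Absence is the target, which is consistent with reading the false alarm rate as the fraction of target units classified as nontarget. Under that reading, a decision threshold can be fixed from the target scores alone, as in the sketch below. How the paper actually sets its threshold is not stated in this record, so the quantile rule, the "higher score means more target-like" convention, and the function names are all assumptions, and the scores are synthetic.

```python
import numpy as np

def threshold_at_false_alarm_rate(target_scores, far=0.1):
    """Score below which roughly a fraction `far` of the target units falls."""
    return float(np.quantile(np.asarray(target_scores, dtype=float), far))

def classification_counts(scores, threshold):
    """Count units accepted as target (score >= threshold) and rejected as nontarget."""
    scores = np.asarray(scores, dtype=float)
    accepted = int(np.sum(scores >= threshold))
    return {"target": accepted, "nontarget": int(len(scores) - accepted)}

# Synthetic scores: 150 target units and 120 units from another class.
target_scores = np.arange(1, 151, dtype=float)
other_scores = np.linspace(-50.0, 60.0, 120)

threshold = threshold_at_false_alarm_rate(target_scores, far=0.1)
target_counts = classification_counts(target_scores, threshold)
other_counts = classification_counts(other_scores, threshold)
print(threshold, target_counts, other_counts)
```

Because the threshold depends only on target units, the error on the other classes is not controlled; it is whatever the separation between the score distributions allows.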
True and false positives and negatives obtained with the typicality approach on the Liver cancer data set, for a fixed false alarm rate of 0.1. Each cell gives the number of units of the tested class that received the indicated classification, over the number of units tested from that class.
| Target class | Classified as | Tested class: N | Tested class: NT | Tested class: T |
|---|---|---|---|---|
| N | Target | 29/30 | 56/76 | 16/107 |
| N | Nontarget | 1/30 | 20/76 | 91/107 |
| NT | Target | 28/30 | 72/76 | 48/107 |
| NT | Nontarget | 2/30 | 4/76 | 59/107 |
| T | Target | 30/30 | 75/76 | 102/105 |
| T | Nontarget | 0/30 | 1/76 | 3/105 |
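The counts in the Liver cancer table can be turned into rates to make the three target classes easier to compare. The sketch below copies the "Target" rows of the table as given (including the denominators 107 and 105 for class T) and derives, for each target class, the observed false alarm rate and the fraction of each other class wrongly accepted as target. The variable and function names are invented for the example.

```python
# Units classified as "Target", keyed by target class and then by tested class,
# copied from the Liver cancer table.
accepted_as_target = {
    "N": {"N": "29/30", "NT": "56/76", "T": "16/107"},
    "NT": {"N": "28/30", "NT": "72/76", "T": "48/107"},
    "T": {"N": "30/30", "NT": "75/76", "T": "102/105"},
}

def rate(cell):
    """Convert a 'count/total' cell into a proportion."""
    count, total = cell.split("/")
    return int(count) / int(total)

summary = {}
for target, row in accepted_as_target.items():
    summary[target] = {
        # Fraction of genuine target units rejected as nontarget.
        "false_alarm_rate": 1.0 - rate(row[target]),
        # Fraction of each other class accepted as target.
        "accepted_nontargets": {
            tested: rate(cell) for tested, cell in row.items() if tested != target
        },
    }

for target, stats in summary.items():
    print(target, round(stats["false_alarm_rate"], 3),
          {k: round(v, 3) for k, v in stats["accepted_nontargets"].items()})
```

Read this way, class N as target rejects most T units (91 of 107), whereas class T as target accepts almost every N and NT unit, so the three classes are far from equally separable.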