Nisrine Jrad, Edith Grall-Maës, Pierre Beauseroy.
Abstract
Supervised learning of microarray data has received much attention in recent years. Multiclass cancer diagnosis, based on selected gene profiles, is used as an adjunct to clinical diagnosis. However, supervised diagnosis may hinder patient care, add expense, or confound a result. To avoid such misleading outcomes, a multiclass cancer diagnosis with class-selective rejection is proposed. It rejects some patients from one, some, or all classes in order to ensure higher reliability while reducing time and expense. Moreover, this classifier takes into account asymmetric penalties that depend on each class and on each wrong or partially correct decision. It is based on nu-1-SVM coupled with its regularization path and minimizes a general loss function defined in the class-selective rejection scheme. State-of-the-art multiclass algorithms can be considered a particular case of the proposed algorithm in which the number of decisions is given by the classes and the loss function is defined by the Bayesian risk. Two experiments are carried out, one in the Bayesian framework and one in the class-selective rejection framework. Five gene-selected datasets are used to assess the performance of the proposed method. Results are discussed and accuracies are compared with those of the Naive Bayes, Nearest Neighbor, Linear Perceptron, Multilayer Perceptron, and Support Vector Machines classifiers.
Year: 2009 PMID: 19584932 PMCID: PMC2703706 DOI: 10.1155/2009/608701
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
Figure 1: Training data mapped into the feature space on a portion 𝒮 of a hypersphere.
Algorithm 1: Multiclass SVM minimizing an asymmetric loss function.
Multiclass gene expression datasets.
| Dataset | Leukemia72 | Ovarian | NCI | Lung cancer | Lymphoma |
|---|---|---|---|---|---|
| No. of genes | 6817 | 7129 | 9703 | 918 | 4026 |
| No. of samples | 72 | 39 | 60 | 73 | 96 |
| No. of classes | 3 | 3 | 9 | 7 | 9 |
Loss function cost matrix in the Bayesian framework.
| Predicted decision \ Patient class | 1 | 2 | ⋯ | N |
|---|---|---|---|---|
| 1 | 0 | 1 | ⋯ | 1 |
| 2 | 1 | 0 | ⋯ | 1 |
| ⋮ | ⋮ | ⋮ | ⋱ | ⋮ |
| N | 1 | 1 | ⋯ | 0 |
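The Bayesian cost matrix above is the usual 0-1 loss: a correct decision costs 0 and any wrong one costs 1. The following minimal sketch builds that matrix and picks the decision with the lowest expected cost. It assumes class posteriors are available, which is an illustration only: the paper's classifier works from nu-1-SVM decision functions, and the function names and posterior values here are hypothetical.

```python
def zero_one_cost(n_classes):
    """Cost matrix with 0 on the diagonal and 1 elsewhere (rows: decisions)."""
    return [[0 if i == j else 1 for j in range(n_classes)]
            for i in range(n_classes)]


def expected_costs(cost, posteriors):
    """Expected cost of each decision (row) given class posteriors (columns)."""
    return [sum(c * p for c, p in zip(row, posteriors)) for row in cost]


cost = zero_one_cost(3)
posteriors = [0.2, 0.5, 0.3]  # hypothetical values, not taken from the paper
risks = expected_costs(cost, posteriors)

# Under 0-1 loss, the minimum-risk decision is the most probable class.
best = min(range(len(risks)), key=risks.__getitem__)
print(best)
```

With this loss, minimizing the expected cost reduces to choosing the class with the largest posterior, which is why the standard multiclass setting appears as a special case of the general loss.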
Prediction errors of the proposed classifier, together with the mean and median prediction errors of the 5 classifiers reported in [1], with 50 informative selected genes.
| Dataset | Statistic | | | | | | |
|---|---|---|---|---|---|---|---|
| Leukemia | Proposed algorithm | 4 | 3 | 5 | 5 | 3 | 2 |
| | Mean | 3.4 | 2.4 | 2.8 | 2.8 | 3.2 | 3.0 |
| | Median | 3 | 2 | 3 | 3 | 3 | 3 |
| Ovarian | Proposed algorithm | 0 | 0 | 0 | 0 | 0 | 0 |
| | Mean | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| | Median | 0 | 0 | 0 | 0 | 0 | 0 |
| NCI | Proposed algorithm | 31 | 26 | 27 | 27 | 27 | 33 |
| | Mean | 36.0 | 32.0 | 27.4 | 26.0 | 27.0 | 35.4 |
| | Median | 35 | 29 | 27 | 27 | 27 | 35 |
| Lung cancer | Proposed algorithm | 14 | 16 | 16 | 16 | 16 | 15 |
| | Mean | 17.6 | 17.0 | 17.6 | 17.6 | 18.0 | 18.0 |
| | Median | 17 | 17 | 18 | 18 | 18 | 18 |
| Lymphoma | Proposed algorithm | 18 | 16 | 9 | 10 | 9 | 15 |
| | Mean | 23.8 | 19.8 | 14.0 | 14.0 | 12.8 | 22.0 |
| | Median | 23 | 19 | 12 | 12 | 13 | 20 |
Prediction errors of the proposed classifier, together with the mean and median prediction errors of the 5 classifiers reported in [1], with 100 informative selected genes.
| Dataset | Statistic | | | | | | |
|---|---|---|---|---|---|---|---|
| Leukemia | Proposed algorithm | 5 | 2 | 3 | 3 | 4 | 6 |
| | Mean | 3.4 | 3.0 | 3.0 | 3.0 | 3.2 | 3.0 |
| | Median | 3 | 3 | 4 | 3 | 3 | 3 |
| Ovarian | Proposed algorithm | 0 | 0 | 0 | 0 | 0 | 0 |
| | Mean | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| | Median | 0 | 0 | 0 | 0 | 0 | 0 |
| NCI | Proposed algorithm | 33 | 21 | 26 | 25 | 26 | 36 |
| | Mean | 33.0 | 22.6 | 23.8 | 25.2 | 25.2 | 31.6 |
| | Median | 33 | 22 | 25 | 26 | 26 | 31 |
| Lung cancer | Proposed algorithm | 11 | 10 | 11 | 11 | 11 | 13 |
| | Mean | 12.2 | 12.2 | 11.4 | 12.2 | 12.2 | 15.8 |
| | Median | 12 | 12 | 11 | 11 | 11 | 14 |
| Lymphoma | Proposed algorithm | 16 | 16 | 11 | 10 | 11 | 17 |
| | Mean | 21.8 | 19.2 | 13.0 | 13.8 | 14.4 | 18.2 |
| | Median | 17 | 16 | 12 | 12 | 12 | 18 |
Confusion matrix for the 50 W* lung cancer dataset. The total number of misclassified samples is 16.
| Predicted decision \ Patient class | Normal | SCLC | LCLC | SCC | AC2 | AC3 | AC1 |
|---|---|---|---|---|---|---|---|
| Normal | 6 | 0 | 0 | 0 | 0 | 0 | 0 |
| SCLC | 0 | 4 | 0 | 0 | 0 | 1 | 0 |
| LCLC | 0 | 0 | 3 | 0 | 0 | 4 | 1 |
| SCC | 0 | 0 | 0 | 16 | 0 | 3 | 0 |
| AC2 | 0 | 0 | 0 | 0 | 4 | 0 | 0 |
| AC3 | 0 | 1 | 1 | 0 | 1 | 4 | 0 |
| AC1 | 0 | 0 | 1 | 0 | 2 | 1 | 20 |
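The misclassification total quoted in the caption can be checked directly from the matrix: every off-diagonal entry is an error. A minimal sketch, with the values copied from the 50 W* table above (variable names are illustrative):

```python
labels = ["Normal", "SCLC", "LCLC", "SCC", "AC2", "AC3", "AC1"]

# Rows: predicted decision, columns: patient class (50 W* lung cancer table).
confusion = [
    [6, 0, 0, 0, 0, 0, 0],
    [0, 4, 0, 0, 0, 1, 0],
    [0, 0, 3, 0, 0, 4, 1],
    [0, 0, 0, 16, 0, 3, 0],
    [0, 0, 0, 0, 4, 0, 0],
    [0, 1, 1, 0, 1, 4, 0],
    [0, 0, 1, 0, 2, 1, 20],
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(labels)))  # diagonal entries
misclassified = total - correct

print(total, correct, misclassified)  # 73 57 16
```

The total of 73 matches the number of lung cancer samples listed in the dataset table.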
Confusion matrix for the 50 H lung cancer dataset. The total number of misclassified samples is 15.
| Predicted decision \ Patient class | Normal | SCLC | LCLC | SCC | AC2 | AC3 | AC1 |
|---|---|---|---|---|---|---|---|
| Normal | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| SCLC | 0 | 4 | 0 | 0 | 0 | 0 | 0 |
| LCLC | 0 | 0 | 1 | 1 | 0 | 2 | 2 |
| SCC | 0 | 0 | 2 | 14 | 0 | 1 | 0 |
| AC2 | 0 | 0 | 0 | 0 | 7 | 0 | 0 |
| AC3 | 0 | 0 | 2 | 1 | 0 | 8 | 0 |
| AC1 | 1 | 1 | 0 | 0 | 0 | 2 | 19 |
Asymmetric cost matrix of the loss function.
| Predicted decision \ Patient class | Normal | SCLC | LCLC | SCC | AC2 | AC3 | AC1 |
|---|---|---|---|---|---|---|---|
| Normal | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| SCLC | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| LCLC | 1 | 1 | 0 | 0.9 | 0.9 | 1 | 1 |
| SCC | 1 | 1 | 0.9 | 0 | 0.9 | 1 | 0.9 |
| AC2 | 1 | 1 | 0.9 | 0.9 | 0 | 0.9 | 0.9 |
| AC3 | 1 | 1 | 0.9 | 0.9 | 0.9 | 0 | 0.9 |
| AC1 | 1 | 1 | 0.9 | 0.9 | 0.9 | 0.9 | 0 |
| { | 1 | 1 | 0.6 | 0.6 | 0.9 | 0.2 | 0.9 |
| All tumors | 1 | 0.2 | 0.6 | 0.6 | 0.2 | 0.2 | 0.5 |
| All classes | 0.6 | 0.2 | 0.6 | 0.6 | 0.2 | 0.6 | 0.6 |
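To illustrate how such a cost matrix turns ambiguity into a partial or total rejection, the sketch below picks the decision with the lowest expected cost for a given vector of class posteriors. This is only an illustration under stated assumptions: the paper's method relies on nu-1-SVM decision functions rather than explicit posteriors, the posterior vectors are hypothetical, and the `"{"` key reproduces a row label that is truncated in the source.

```python
classes = ["Normal", "SCLC", "LCLC", "SCC", "AC2", "AC3", "AC1"]

# Rows of the asymmetric cost matrix above: cost of each decision per true class.
cost = {
    "Normal":      [0, 1, 1, 1, 1, 1, 1],
    "SCLC":        [1, 0, 1, 1, 1, 1, 1],
    "LCLC":        [1, 1, 0, 0.9, 0.9, 1, 1],
    "SCC":         [1, 1, 0.9, 0, 0.9, 1, 0.9],
    "AC2":         [1, 1, 0.9, 0.9, 0, 0.9, 0.9],
    "AC3":         [1, 1, 0.9, 0.9, 0.9, 0, 0.9],
    "AC1":         [1, 1, 0.9, 0.9, 0.9, 0.9, 0],
    "{":           [1, 1, 0.6, 0.6, 0.9, 0.2, 0.9],  # label truncated in source
    "All tumors":  [1, 0.2, 0.6, 0.6, 0.2, 0.2, 0.5],
    "All classes": [0.6, 0.2, 0.6, 0.6, 0.2, 0.6, 0.6],
}


def decide(posteriors):
    """Return the decision whose expected cost is lowest (hypothetical helper)."""
    def risk(decision):
        return sum(c * p for c, p in zip(cost[decision], posteriors))
    return min(cost, key=risk)


# Hypothetical posteriors, ordered as in `classes`.
confident = [0, 0, 0, 0, 0, 0.95, 0.05]   # almost certainly AC3
ambiguous = [0, 0.4, 0, 0, 0.3, 0.3, 0]   # split between three tumor types

print(decide(confident))  # AC3
print(decide(ambiguous))  # All tumors
```

When one class dominates, committing to it is cheapest; when the mass is spread over several tumor types, the low penalties in the "All tumors" row make a partial rejection the cheaper choice.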
Confusion matrix for the 50 W* lung cancer problem with class-selective rejection, using the asymmetric cost matrix defined in Table 7. The total number of misclassified samples is 10; the total number of partially and totally rejected samples is 8.
| Predicted decision \ Patient class | Normal | SCLC | LCLC | SCC | AC2 | AC3 | AC1 |
|---|---|---|---|---|---|---|---|
| Normal | 6 | 0 | 0 | 0 | 0 | 0 | 0 |
| SCLC | 0 | 3 | 0 | 0 | 0 | 0 | 0 |
| LCLC | 0 | 0 | 3 | 0 | 0 | 4 | 0 |
| SCC | 0 | 0 | 0 | 16 | 0 | 2 | 0 |
| AC2 | 0 | 0 | 0 | 0 | 4 | 0 | 0 |
| AC3 | 0 | 0 | 0 | 0 | 1 | 3 | 0 |
| AC1 | 0 | 0 | 1 | 0 | 1 | 1 | 20 |
| { | 0 | 0 | 1 | 0 | 0 | 2 | 0 |
| All tumors | 0 | 2 | 0 | 0 | 1 | 1 | 1 |
| All classes | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
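With rejection, each sample falls into one of three groups: correctly classified, misclassified into a single wrong class, or rejected into a subset of classes. The sketch below tallies the three groups from the table above; the `"{"` key is the row label as it appears, truncated, in the source, and the variable names are illustrative.

```python
classes = ["Normal", "SCLC", "LCLC", "SCC", "AC2", "AC3", "AC1"]

# Rows: decision, columns: patient class (50 W* table with rejection).
rows = {
    "Normal":      [6, 0, 0, 0, 0, 0, 0],
    "SCLC":        [0, 3, 0, 0, 0, 0, 0],
    "LCLC":        [0, 0, 3, 0, 0, 4, 0],
    "SCC":         [0, 0, 0, 16, 0, 2, 0],
    "AC2":         [0, 0, 0, 0, 4, 0, 0],
    "AC3":         [0, 0, 0, 0, 1, 3, 0],
    "AC1":         [0, 0, 1, 0, 1, 1, 20],
    "{":           [0, 0, 1, 0, 0, 2, 0],  # label truncated in source
    "All tumors":  [0, 2, 0, 0, 1, 1, 1],
    "All classes": [0, 0, 0, 0, 0, 0, 0],
}

# Single-class decisions: diagonal entries are correct, the rest are errors.
correct = sum(rows[c][j] for j, c in enumerate(classes))
single_class_total = sum(sum(rows[c]) for c in classes)
misclassified = single_class_total - correct

# Any decision that is not a single class counts as a rejection.
rejected = sum(sum(v) for k, v in rows.items() if k not in classes)

print(correct, misclassified, rejected)  # 55 10 8
```

The three groups add up to the 73 lung cancer samples, and the error count drops from 16 to 10 at the price of 8 rejected samples.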
Confusion matrix of the cascade classifier (50 W* with rejection, followed by the 50 H classifier). The total number of misclassified samples is 13.
| Predicted decision \ Patient class | Normal | SCLC | LCLC | SCC | AC2 | AC3 | AC1 |
|---|---|---|---|---|---|---|---|
| Normal | 6 | 0 | 0 | 0 | 0 | 0 | 0 |
| SCLC | 0 | 4 | 0 | 0 | 0 | 0 | 0 |
| LCLC | 0 | 0 | 3 | 0 | 0 | 4 | 1 |
| SCC | 0 | 0 | 0 | 16 | 0 | 2 | 0 |
| AC2 | 0 | 0 | 0 | 0 | 5 | 0 | 0 |
| AC3 | 0 | 1 | 1 | 0 | 1 | 6 | 0 |
| AC1 | 0 | 0 | 1 | 0 | 1 | 1 | 20 |