Onur Dagliyan, Fadime Uney-Yuksektepe, I Halil Kavakli, Metin Turkay.
Abstract
BACKGROUND: An important use of microarray data is the classification of tumor types according to genes that are up- or down-regulated in specific cancers. A number of algorithms have been proposed for this task, and they usually require parameter optimization, tuned to the type of data, to produce accurate results. It is also critical to find an optimal set of markers among the up- or down-regulated genes that can be used clinically to build assays for diagnosing specific cancer types or following their progression. In this paper, we employ a mixed-integer-programming-based classification algorithm, the hyper-box enclosure (HBE) method, to classify several cancer types with a minimal set of predictor genes. This optimization-based method is a user-friendly and efficient classifier that may allow clinicians to diagnose certain cancer types and follow their progression. METHODOLOGY/PRINCIPAL
Year: 2011 PMID: 21326602 PMCID: PMC3033885 DOI: 10.1371/journal.pone.0014579
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Cancer data sets used in this study.
| Data set | Samples | Genes | Classes | Reference |
| Leukemia | 72 | 7129 | 2 | Golub |
| Prostate cancer | 102 | 12600 | 2 | Singh |
| Prostate outcome | 21 | 12600 | 2 | Singh |
| DLBCL | 77 | 7129 | 2 | Shipp |
| Lymphoma | 47 | 4026 | 2 | Alizadeh |
| SRBCT | 83 | 2308 | 4 | Khan |
Classification results of leukemia (AML-ALL) data set.
| Classifier | Test Set | 10-CV | LOOCV |
| HBE | | 97.14±0.903 | |
| BayesNet | 94.12 | 95.71 | 95.83 |
| LibSVM | 58.82 | 86.57±10.44 | 91.67 |
| SMO | 97.06 | 93.14±0.571 | 94.44 |
| Logistic Regression | 91.18 | 96.86±1.67 | |
| RBF Network | 97.06 | | 97.22 |
| IBk | 97.06 | 96.00±1.40 | 95.83 |
| J48 | 94.12 | 89.14±1.94 | 90.28 |
| Random Forest | 94.12 | 93.14±1.07 | 90.2 |
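The LOOCV column can be sanity-checked with simple arithmetic. The sketch below assumes that leave-one-out cross-validation was run over all 72 leukemia samples (the sample count is taken from the data set table; that every sample was held out exactly once is an assumption, not something the table states). Under that assumption, each percentage corresponds to a whole number of correctly classified samples. The function name `loocv_percentage` is illustrative only.

```python
def loocv_percentage(correct: int, total: int) -> float:
    """Percentage of held-out samples classified correctly, to two decimals."""
    return round(100.0 * correct / total, 2)


N_LEUKEMIA = 72  # sample count from the data set table (assumed LOOCV population)

# Candidate counts of correct predictions and the percentages they produce.
for correct in (65, 66, 68, 69, 70):
    print(f"{correct}/{N_LEUKEMIA} correct -> {loocv_percentage(correct, N_LEUKEMIA)}%")
```

Read this way, 95.83 would be 69 of 72 samples correct, 94.44 would be 68, 91.67 would be 66, and 97.22 would be 70.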
Selected genes which overlap with genes selected by other groups (Leukemia data set).
| Gene | Reference |
| Myeloperoxidase | |
| CD33 | |
| TCF3 | |
| Adipsin | |
Classification results of prostate cancer data set.
| Classifier | 10-CV | LOOCV |
| HBE | 94.80±0.4 | |
| BayesNet | 94.80±1.17 | 95.10 |
| LibSVM | 94.60±1.36 | 95.10 |
| SMO | | 95.10 |
| Logistic Regression | 90.00±1.10 | 92.16 |
| RBF Network | 93.20±0.75 | 93.14 |
| IBk | 93.40±1.74 | 93.14 |
| J48 | 88.00±1.095 | 90.20 |
| Random Forest | 92.60±0.49 | 94.12 |
Classification results of prostate cancer outcome data set.
| Classifier | LOOCV |
| HBE | |
| BayesNet | |
| LibSVM | 61.90 |
| SMO | 57.14 |
| Logistic Regression | 47.62 |
| RBF Network | |
| IBk | 80.95 |
| J48 | 85.71 |
| Random Forest | 90.48 |
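For readers unfamiliar with the LOOCV protocol used in this table, the following is a minimal sketch of the procedure: each sample is held out once, a classifier is fitted on the remaining samples, and the held-out sample is predicted. The 1-nearest-neighbour rule here is a stand-in chosen for brevity (it is not HBE and is not claimed to match any classifier configuration in the table), and the toy data and function names are invented for illustration.

```python
import math


def nearest_neighbor_predict(train, query):
    """Return the label of the training sample closest to `query` (Euclidean)."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(samples, predict):
    """Hold out each sample once, fit on the rest, return percent correct."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # every sample except the i-th
        if predict(train, features) == label:
            correct += 1
    return 100.0 * correct / len(samples)


# Toy two-feature data, invented for illustration only.
toy = [
    ((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((1.0, 0.0), "A"),
    ((10.0, 10.0), "B"), ((10.0, 11.0), "B"), ((0.0, 2.5), "B"),
]
print(leave_one_out_accuracy(toy, nearest_neighbor_predict))
```

With only 21 samples in the prostate outcome data set, each misclassified sample moves the LOOCV figure by roughly 4.76 percentage points, which is why the values in this table are so coarsely spaced.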
Classification results of DLBCL.
| Classifier | 10-CV | LOOCV |
| HBE | | |
| BayesNet | 89.0±0.94 | 89.61 |
| LibSVM | 83.5±0.94 | 84.42 |
| SMO | 88.25±1.0 | 89.61 |
| Logistic Regression | 87.75±1.46 | 89.61 |
| RBF Network | 90.5±1.5 | 93.51 |
| IBk | 87.5±0.79 | 88.31 |
| J48 | 88.25±1 | 89.61 |
| Random Forest | 89.75±2.15 | 89.61 |
Classification results of Lymphoma.
| Classifier | 10-CV | LOOCV |
| HBE | | |
| BayesNet | 95.20±0.98 | 93.62 |
| LibSVM | 94.4±0.8 | 93.62 |
| SMO | 96.00 | 95.75 |
| Logistic Regression | 92.4±2.65 | 91.49 |
| RBF Network | 95.20±0.98 | 95.75 |
| IBk | 95.2±1.6 | 95.75 |
| J48 | 81.6±2.33 | 87.23 |
| Random Forest | 92.00±1.79 | 89.36 |
Classification results of SRBCT.
| Classifier | Test Set | 10-CV | LOOCV |
| HBE | | | |
| BayesNet | 85 | 94.5±1.5 | 95.18 |
| LibSVM | 90 | 84.75±0.94 | 84.34 |
| SMO | 95 | 89.5±3.22 | 93.98 |
| Logistic Regression | 80 | 91.5±1.66 | 91.57 |
| RBF Network | 90 | 93.25±1.0 | 93.98 |
| IBk | 90 | 92.25±0.5 | 92.77 |
| J48 | 90 | 88.75±0.79 | 91.57 |
| Random Forest | 95 | 89.75±1.66 | 92.77 |
Selected genes which overlap with genes selected by other groups (SRBCT data set).
| Gene | Reference |
| FCGRT | |
| Transmembrane protein | |
| Fibroblast growth factor receptor | |
| ESTs | |
| Recoverin | |
Figure 1. The flowchart of the algorithm.
Figure 2. The illustrative two-dimensional classification problem.
a) The two-dimensional, four-class illustrative example. Each color represents one class. b) Determination of the boundaries of each class over all samples. c) Determination of problematic samples. d) Identification of representative samples (seeds) from each class using pure IP. e) Construction of hyper-boxes for problematic samples using MILP. f) Construction of hyper-boxes for non-problematic samples.
Figure 3. The final solution after the intersection elimination.
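Panels b and c of Figure 2 can be illustrated with a short sketch. It rests on one plausible reading of the caption, which is an assumption: that a class boundary is the axis-aligned box spanned by the minimum and maximum of each feature over that class, and that a sample is "problematic" when it falls inside the box of another class. The seed selection (pure IP), the MILP hyper-box construction, and the intersection elimination of Figure 3 are not reproduced here. All function names and the toy data are hypothetical.

```python
def class_bounds(samples):
    """Per-class axis-aligned box: (mins, maxs) over every feature."""
    bounds = {}
    for features, label in samples:
        if label not in bounds:
            bounds[label] = (list(features), list(features))
        else:
            lo, hi = bounds[label]
            for j, value in enumerate(features):
                lo[j] = min(lo[j], value)
                hi[j] = max(hi[j], value)
    return bounds


def inside(features, box):
    """True if every feature lies within the box limits (inclusive)."""
    lo, hi = box
    return all(l <= v <= h for v, l, h in zip(features, lo, hi))


def problematic_samples(samples):
    """Samples that fall inside the bounding box of at least one other class."""
    bounds = class_bounds(samples)
    return [
        (features, label)
        for features, label in samples
        if any(inside(features, box)
               for other, box in bounds.items() if other != label)
    ]


# Toy two-dimensional data, invented for illustration only.
toy = [
    ((1.0, 1.0), "red"), ((3.0, 3.0), "red"),
    ((2.0, 2.0), "blue"), ((6.0, 6.0), "blue"),
    ((8.0, 1.0), "green"), ((9.0, 2.0), "green"),
]
print(problematic_samples(toy))
```

In this toy case the red and blue boxes overlap, so one sample from each is flagged, while the green class is cleanly separable and could be enclosed by a single box.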