Alireza Osareh, Bita Shadgar.
Abstract
Gene microarray analysis and classification have proven effective for diagnosing diseases and cancers. However, basic classification techniques have intrinsic drawbacks that limit the accuracy of gene classification and cancer diagnosis. Classifier ensembles, meanwhile, have received increasing attention in various applications. Here, we address the gene classification problem using the RotBoost ensemble methodology. This method combines the Rotation Forest and AdaBoost techniques and thereby preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results reveal that combining the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers that outperform not only the classifiers produced by conventional machine learning but also those generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.
Year: 2013 PMID: 24024194 PMCID: PMC3759279 DOI: 10.1155/2013/478410
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411

The RotBoost pseudocode.
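As a rough illustration of how Rotation Forest and AdaBoost can be combined, the sketch below builds a block-wise rotation of the feature space and then boosts a weak learner in the rotated space, repeating this for several rotations and combining the boosted models by majority vote. It is not the authors' listing. Several simplifications are assumptions made for brevity: labels are binary and coded as -1/+1, the weak learner is a decision stump, AdaBoost reweights rather than resamples, the rotation uses PCA (computed via SVD) on a 75% bootstrap sample, and all function names are hypothetical. An ICA-based variant would replace the PCA step and is not shown.

```python
import numpy as np


def rotation_matrix(X, n_subsets, rng, sample_frac=0.75):
    """Block-wise PCA rotation: one PCA per random, disjoint feature subset."""
    n, m = X.shape
    R = np.zeros((m, m))
    for cols in np.array_split(rng.permutation(m), n_subsets):
        rows = rng.choice(n, size=max(2, int(sample_frac * n)), replace=True)
        block = X[np.ix_(rows, cols)]
        # Rows of vt are the principal axes of the centred bootstrap sample.
        _, _, vt = np.linalg.svd(block - block.mean(axis=0), full_matrices=True)
        R[np.ix_(cols, cols)] = vt.T
    return R


def fit_stump(X, y, w):
    """Exhaustive search for the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (error, feature, threshold, sign)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thr, -sign, sign)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, X):
    _, j, thr, sign = stump
    return np.where(X[:, j] <= thr, -sign, sign)


def adaboost(X, y, n_rounds):
    """Binary AdaBoost with reweighting; returns (alpha, stump) pairs."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * stump_predict(stump, X))
        w /= w.sum()
        model.append((alpha, stump))
    return model


def adaboost_score(model, X):
    return sum(alpha * stump_predict(stump, X) for alpha, stump in model)


def fit_rotboost(X, y, n_rotations=5, n_rounds=10, n_subsets=2, seed=0):
    """One rotation plus one boosted model per outer iteration."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_rotations):
        R = rotation_matrix(X, n_subsets, rng)
        members.append((R, adaboost(X @ R, y, n_rounds)))
    return members


def predict_rotboost(members, X):
    """Majority vote over the boosted models (ties go to +1)."""
    votes = sum(np.sign(adaboost_score(model, X @ R)) for R, model in members)
    return np.where(votes >= 0, 1, -1)
```

The outer loop supplies diversity through different rotations, while the inner boosting loop drives down the error of each member, which mirrors the accuracy/diversity trade-off the abstract describes.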
Summary of benchmark gene microarray datasets.
| Dataset | # Total genes | # Instances | # Classes |
|---|---|---|---|
| Colon tumor | 2000 | 62 | 2 |
| Central nervous system | 7129 | 60 | 2 |
| Leukaemia | 6817 | 72 | 2 |
| Breast cancer | 24481 | 97 | 2 |
| Ovarian cancer | 15154 | 253 | 2 |
| MLL | 12582 | 72 | 3 |
| SRBCT | 2308 | 83 | 4 |
| Lung cancer | 12533 | 181 | 5 |
Number of selected genes for each gene selection algorithm.
| Dataset | Initial gene numbers | FCBF | ReliefF | CFS | mRMR | GSNR |
|---|---|---|---|---|---|---|
| Colon | 2000 | 14 | 25 | 31 | 100 | 100 |
| CNS | 7129 | 28 | 28 | 40 | 356 | 356 |
| Leukaemia | 7129 | 51 | 104 | 50 | 356 | 356 |
| Breast | 24481 | 90 | 131 | 130 | 1224 | 1224 |
| Lung | 12553 | 100 | 432 | 299 | 628 | 628 |
| Ovarian | 15154 | 30 | 120 | 103 | 587 | 587 |
| MLL | 12582 | 97 | 295 | 327 | 629 | 629 |
| SRBCT | 2308 | 82 | 97 | 82 | 115 | 115 |
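FCBF, the fast correlation-based filter, scores each gene by its symmetrical uncertainty (SU) with the class label, SU(X, Y) = 2 · IG(X; Y) / (H(X) + H(Y)). The following is a minimal sketch of that score only, assuming expression values have already been discretised; the function names are hypothetical, and the redundancy-removal step of FCBF is not shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU in [0, 1]: 0 for independent variables, 1 for fully dependent ones."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    info_gain = hx + hy - entropy(list(zip(x, y)))  # mutual information
    return 2.0 * info_gain / (hx + hy)


labels = [0, 0, 1, 1]
print(symmetrical_uncertainty([0, 0, 1, 1], labels))  # gene mirrors the label: 1.0
print(symmetrical_uncertainty([0, 1, 0, 1], labels))  # gene independent of label: 0.0
```

Genes with a high SU against the class are candidates for selection, which is how a filter of this kind can shrink thousands of genes to a few dozen.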
Figure 1: Classification accuracy of decision tree classifier on selected genes of 8 datasets based on different feature selection algorithms.
Classification results obtained by RotBoost ensemble learning on 8 typical gene datasets using the PCA and ICA transformation methods.
| Dataset | ICA-based RotBoost | PCA-based RotBoost |
|---|---|---|
| Colon | 96.10 ± 0.59 | 95.48 ± 0.61• |
| CNS | 95.00 ± 0.28 | 94.80 ± 0.59• |
| Leukaemia | 98.77 ± 0.03 | 98.75 ± 0.31 |
| Breast | 97.88 ± 0.45 | 94.39 ± 0.49• |
| Lung | 99.54 ± 0.11 | 98.11 ± 0.17• |
| Ovarian | 99.40 ± 0.26 | 99.82 ± 0.08o |
| MLL | 99.31 ± 0.55 | 98.86 ± 0.23• |
| SRBCT | 99.59 ± 0.16 | 99.50 ± 0.31 |
| Win/tie/loss | | 5/2/1 |
• indicates that ICA-based RotBoost is significantly better, and o indicates that it is significantly worse, at the significance level of 0.05.
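The win/tie/loss summary follows directly from the markers in the table: a cell ending in • counts as a win for ICA-based RotBoost, a cell ending in o as a loss, and an unmarked cell as a tie. A small sketch of that tally, using the PCA-based column as input (the function name is hypothetical):

```python
def win_tie_loss(cells, win_mark="•", loss_mark="o"):
    """Count wins, ties and losses from the significance markers on each cell."""
    wins = sum(cell.endswith(win_mark) for cell in cells)
    losses = sum(cell.endswith(loss_mark) for cell in cells)
    return wins, len(cells) - wins - losses, losses


# PCA-based RotBoost column, in table order (Colon ... SRBCT).
pca_column = [
    "95.48 ± 0.61•", "94.80 ± 0.59•", "98.75 ± 0.31", "94.39 ± 0.49•",
    "98.11 ± 0.17•", "99.82 ± 0.08o", "98.86 ± 0.23•", "99.50 ± 0.31",
]
print("/".join(map(str, win_tie_loss(pca_column))))  # prints 5/2/1
```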
Mean classification accuracy of each classification method on 8 different gene datasets.
| Dataset | ICA-based RotBoost | Single Tree | Rotation Forest | AdaBoost | Bagging | SVMs |
|---|---|---|---|---|---|---|
| Colon | 96.10 ± 0.59 | 93.80 ± 0.82• | 95.21 ± 0.43• | 94.97 ± 0.63• | 94.92 ± 0.50• | 96.13 ± 0.12 |
| CNS | 95.00 ± 0.28 | 89.92 ± 0.61• | 92.37 ± 0.83• | 95.09 ± 0.64 | 93.50 ± 0.79• | 93.34 ± 0.10• |
| Leukaemia | 98.77 ± 0.03 | 96.60 ± 0.46• | 97.97 ± 0.38• | 98.22 ± 0.55• | 97.47 ± 0.51• | 95.64 ± 0.49• |
| Breast | 97.88 ± 0.45 | 88.50 ± 0.72• | 98.60 ± 0.63o | 98.89 ± 0.47o | 92.74 ± 0.45• | 96.84 ± 0.02• |
| Lung | 99.54 ± 0.11 | 94.36 ± 0.42• | 97.56 ± 0.23• | 96.30 ± 0.39• | 97.08 ± 0.37• | 95.56 ± 0.55• |
| Ovarian | 99.40 ± 0.26 | 99.37 ± 0.12 | 99.77 ± 0.07o | 99.57 ± 0.11 | 99.76 ± 0.08o | 98.66 ± 0.35• |
| MLL | 99.31 ± 0.55 | 96.03 ± 0.59• | 97.61 ± 0.31• | 97.63 ± 0.45• | 97.11 ± 0.55• | 96.80 ± 0.31• |
| SRBCT | 99.59 ± 0.16 | 93.96 ± 0.59• | 97.44 ± 0.41• | 98.16 ± 0.39• | 96.46 ± 0.58• | 97.23 ± 0.44• |
| Win/tie/loss | | 7/1/0 | 6/0/2 | 5/2/1 | 7/0/1 | 7/1/0 |
• indicates that ICA-based RotBoost is significantly better, and o indicates that it is significantly worse, at the significance level of 0.05.
Kappa error diagram for Lung dataset (the centroids of ensembles).
| Ensemble method | Kappa | Error |
|---|---|---|
| AdaBoost | 0.22 | 0.30 |
| Bagging | 0.24 | 0.25 |
| Rotation Forest | 0.29 | 0.23 |
| RotBoost (PCA) | 0.58 | 0.09 |
| RotBoost (ICA) | 0.59 | 0.07 |
Figure 2: Kappa error diagrams for the Lung dataset using different ensemble algorithms.
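Each point in a kappa error diagram describes one pair of base classifiers: the x-coordinate is the kappa agreement between their predictions and the y-coordinate is their average error. The sketch below computes both quantities for a single pair, using the usual definition κ = (Θ1 − Θ2) / (1 − Θ2), where Θ1 is the observed agreement and Θ2 the agreement expected by chance. The function names and the toy predictions are illustrative assumptions; a centroid like those tabulated above would be the mean over all pairs in an ensemble.

```python
from collections import Counter


def pairwise_kappa(pred_a, pred_b):
    """Cohen's kappa between two classifiers' predictions on the same samples."""
    m = len(pred_a)
    observed = sum(a == b for a, b in zip(pred_a, pred_b)) / m
    count_a, count_b = Counter(pred_a), Counter(pred_b)
    chance = sum(count_a[c] * count_b[c] for c in count_a) / (m * m)
    if chance == 1.0:
        return 1.0  # both classifiers always output the same single class
    return (observed - chance) / (1.0 - chance)


def mean_pair_error(pred_a, pred_b, truth):
    """Average misclassification rate of the two classifiers."""
    m = len(truth)
    err_a = sum(a != t for a, t in zip(pred_a, truth)) / m
    err_b = sum(b != t for b, t in zip(pred_b, truth)) / m
    return (err_a + err_b) / 2


truth = [0, 0, 1, 1]
pred_a = [0, 0, 1, 1]
pred_b = [0, 1, 1, 1]
print(pairwise_kappa(pred_a, pred_b))          # 0.5
print(mean_pair_error(pred_a, pred_b, truth))  # 0.125
```

Lower kappa means the pair disagrees more often (higher diversity), so ensembles whose points sit towards the bottom left of the diagram are both diverse and accurate.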