Hala Alshamlan¹, Ghada Badr², Yousef Alohali¹.
Abstract
An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we present the first attempt at applying the ABC algorithm to the analysis of microarray gene expression profiles. In addition, we propose an innovative feature selection algorithm, mRMR-ABC, which combines the minimum redundancy maximum relevance (mRMR) filter with an ABC algorithm to select informative genes from microarray profiles. The new approach uses a support vector machine (SVM) classifier to measure the classification accuracy of the selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm through extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare mRMR-ABC with previously known techniques, two of which we reimplemented with the same parameters for the sake of a fair comparison: mRMR combined with a genetic algorithm (mRMR-GA) and mRMR combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results show that mRMR-ABC achieves accurate classification using a small number of predictive genes on both binary and multiclass datasets, compared with previously suggested methods. This indicates that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.
Year: 2015 PMID: 25961028 PMCID: PMC4414228 DOI: 10.1155/2015/604910
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
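The abstract describes a wrapper design: a candidate gene subset is scored by the classification accuracy a classifier (here, an SVM) achieves using only those genes. A minimal sketch of such a fitness function follows. It rests on assumptions that are not from the paper: leave-one-out cross-validation as the evaluation protocol, and a nearest-centroid classifier swapped in for the SVM so the example runs on the Python standard library alone. The names `loocv_accuracy` and `nearest_centroid_predict` are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict x's label as the class whose mean expression profile is closest
    (squared Euclidean distance). Stand-in for the SVM used in the paper."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y, gene_indices):
    """Leave-one-out accuracy using only the columns (genes) in gene_indices."""
    sub = [[row[g] for g in gene_indices] for row in X]
    correct = 0
    for i in range(len(sub)):
        train_X, train_y = sub[:i] + sub[i + 1:], y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, sub[i]) == y[i]:
            correct += 1
    return correct / len(sub)


# Toy data (hypothetical): 4 samples x 3 genes, two classes.
X = [[0.1, 5.0, 3.0], [0.2, 5.1, 0.0], [0.9, 5.0, 3.0], [1.0, 4.9, 0.0]]
y = ["A", "A", "B", "B"]
print(loocv_accuracy(X, y, [0]))  # gene 0 separates the classes: 1.0
print(loocv_accuracy(X, y, [2]))  # gene 2 is uninformative here: 0.0
```

Replacing `nearest_centroid_predict` with an SVM (for example scikit-learn's `SVC`) would bring the sketch closer to the setup the abstract describes.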
Figure 1. The representation of the solution space (food sources) for the ABC algorithm when applied to a microarray dataset. SN is the number of food sources, each of which is a solution containing gene indices from a microarray gene expression profile, and D is the number of informative genes to be optimized in each solution. Each cell holds a different gene index.
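Figure 1 describes each solution as a row of D gene indices, with SN rows in total. As a minimal sketch (not the authors' code), the food matrix for plain ABC can be initialized by sampling D distinct indices per row from all genes in the profile. `init_food_matrix` and the values SN = 40 and D = 5 are hypothetical choices for illustration; 2000 is the colon dataset's gene count from the dataset statistics table.

```python
import random


def init_food_matrix(sn, d, candidate_genes, rng):
    """Return an SN x D food matrix: each row is one food source (candidate
    solution) holding D distinct gene indices drawn from candidate_genes."""
    if d > len(candidate_genes):
        raise ValueError("D cannot exceed the number of candidate genes")
    return [rng.sample(candidate_genes, d) for _ in range(sn)]


rng = random.Random(42)
# Plain ABC: any of the 2000 colon genes may appear in a food source.
# SN = 40 and D = 5 are hypothetical values chosen for illustration.
foods = init_food_matrix(sn=40, d=5, candidate_genes=list(range(2000)), rng=rng)
print(len(foods), len(foods[0]))  # 40 5
```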
Figure 2. The mRMR dataset, which contains the gene numbers selected by the mRMR filter approach, ordered by their relevance.
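The ordering in Figure 2 comes from the mRMR filter. One common form of mRMR, the MID (mutual information difference) criterion of Peng et al., greedily adds the gene g that maximizes I(g; c) minus the mean of I(g; s) over the already-selected genes s, where c is the class label. The sketch below assumes that expression values have already been discretized (for example into three levels) and that MID is the variant used; the text here specifies neither. Function names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """I(A; B) in bits for two equal-length sequences of discrete values."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # Sum over joint outcomes of p(x,y) * log2(p(x,y) / (p(x) p(y))).
    return sum((c / n) * log2(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mrmr_rank(columns, labels, k):
    """Greedy mRMR with the MID criterion: relevance I(g; class) minus the
    mean redundancy I(g; s) over already-selected genes s."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected, remaining = [], set(range(len(columns)))

    def score(g):
        if not selected:
            return relevance[g]
        redundancy = sum(mutual_information(columns[g], columns[s]) for s in selected)
        return relevance[g] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy discretized data (hypothetical): 3 genes x 8 samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
g_a = [0, 0, 0, 0, 1, 1, 1, 0]   # strongly relevant
g_c = [0, 0, 0, 0, 0, 0, 0, 1]   # weakly relevant, complements g_a
columns = [g_c, g_a, list(g_a)]  # gene 2 duplicates gene 1
print(mrmr_rank(columns, labels, 3))  # [1, 0, 2]
```

In the toy data, genes 1 and 2 are identical and equally relevant. Once gene 1 is chosen, gene 2's redundancy outweighs its relevance, so the less relevant but complementary gene 0 is picked second.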
Comprehensive comparison of state-of-the-art machine learning cancer classification methods in terms of classification accuracy and number of selected genes on the four benchmark microarray datasets (colon, leukemia, lung, and prostate). The number in parentheses denotes the number of selected genes. The best classification performance of each gene selection approach on each microarray dataset is indicated in bold.
| Cancer classification methods | Colon | Leukemia | Lung | Prostate |
|---|---|---|---|---|
| ANN [ | 93.43 (40) | 97.33 (10) | ||
| NB [ | 88.79 (8) | 100 (8) | 98.04 (8) | |
| KNN [ | 77.42 (12) | 100 (12) | 97.06 (12) | |
| KNN [ | 100 (9) | |||
| KNN [ | 100 (9) | |||
| RF [ | 84.4 (14) | 91.3 (2) | 93.9 (18) | |
| SVM [ | 94 (20) | |||
| SVM [ | 88.41 (25) | 99.63 (5) | 90.26 (4) | |
| SVM [ | 88.18 (95) | 96.88 (88) | 99.90 (29) | 93.41 (85) |
| SVM [ | 91.68 (78) | 98.35 (37) | | |
| SVM [ | | | | |
| SVM [ | 91.67 (4) | | | |
| SVM [ | 100 (6) | 98.66 (2) | | |
| SVM [ | 98.57 (7) | | | |
| SVM [ | 95 (5) | | | |
| SVM [ | 99.41 (10) | 100 (25) | | |
Figure 3. The representation of food sources in the proposed mRMR-ABC algorithm. Each row of the food matrix is a candidate solution containing D gene indices to be optimized, and the number of rows equals the food number, SN.
Figure 4. The phases of the mRMR-ABC algorithm.
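In the canonical ABC of Karaboga, the phases are the employed-bee phase (each food source is perturbed once), the onlooker-bee phase (sources are revisited with probability proportional to fitness), and the scout-bee phase (sources that fail to improve for more than `limit` trials are abandoned). The sketch below shows how such a loop could drive gene selection over an ordered candidate pool such as the mRMR-ranked list. It is not the authors' implementation: storing positions in the pool, rounding and clipping the update, rejecting duplicate genes, and abandoning every exhausted source in one cycle (canonical ABC sends at most one scout per cycle) are choices made here. `abc_select` and `toy_fitness` are hypothetical; `toy_fitness` stands in for the SVM accuracy.

```python
import random


def abc_select(pool, d, fitness, sn=40, max_cycle=100, limit=5, seed=0):
    """Sketch of a canonical ABC loop for gene selection.

    pool:    ordered candidate genes (e.g. an mRMR-ranked list); foods store positions in it.
    d:       number of genes per food source (D).
    fitness: callable(list_of_genes) -> non-negative score to maximize.
    """
    if sn < 2 or d > len(pool):
        raise ValueError("need sn >= 2 and d <= len(pool)")
    rng = random.Random(seed)
    n = len(pool)

    def genes(food):
        return [pool[p] for p in food]

    def random_food():
        return rng.sample(range(n), d)

    foods = [random_food() for _ in range(sn)]
    fits = [fitness(genes(f)) for f in foods]
    trials = [0] * sn
    best_fit, best_genes = float("-inf"), []

    def record(i):
        nonlocal best_fit, best_genes
        if fits[i] > best_fit:
            best_fit, best_genes = fits[i], genes(foods[i])

    def try_neighbour(i):
        # ABC move v_ij = x_ij + phi * (x_ij - x_kj), phi ~ U(-1, 1), k != i.
        # Rounding and clipping to a valid position is an assumption of this sketch.
        k = rng.choice([m for m in range(sn) if m != i])
        j = rng.randrange(d)
        phi = rng.uniform(-1.0, 1.0)
        pos = round(foods[i][j] + phi * (foods[i][j] - foods[k][j]))
        pos = min(max(pos, 0), n - 1)
        if pos in foods[i]:  # unchanged or duplicate gene: count as a failed trial
            trials[i] += 1
            return
        candidate = list(foods[i])
        candidate[j] = pos
        cand_fit = fitness(genes(candidate))
        if cand_fit > fits[i]:  # greedy selection keeps the better source
            foods[i], fits[i], trials[i] = candidate, cand_fit, 0
            record(i)
        else:
            trials[i] += 1

    for i in range(sn):
        record(i)
    for _ in range(max_cycle):
        for i in range(sn):  # employed bee phase: one trial per food source
            try_neighbour(i)
        total = sum(fits)
        for _ in range(sn):  # onlooker bee phase: fitness-proportional choice
            i = rng.choices(range(sn), weights=fits)[0] if total > 0 else rng.randrange(sn)
            try_neighbour(i)
        for i in range(sn):  # scout bee phase: abandon exhausted food sources
            if trials[i] > limit:
                foods[i] = random_food()
                fits[i] = fitness(genes(foods[i]))
                trials[i] = 0
                record(i)
    return best_genes, best_fit


INFORMATIVE = {3, 7, 11}  # hypothetical "truly informative" genes for the toy fitness


def toy_fitness(selected):
    return len(INFORMATIVE & set(selected)) / len(selected)


best_genes, best_fit = abc_select(list(range(20)), d=3, fitness=toy_fitness,
                                  sn=10, max_cycle=30, limit=5, seed=1)
print(best_genes, best_fit)
```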
Statistics of microarray cancer datasets.
| Microarray datasets | Number of classes | Number of samples | Number of genes | Description |
|---|---|---|---|---|
| Colon [ | 2 | 62 | 2000 | 40 cancer samples and 22 normal samples |
| Leukemia1 [ | 2 | 72 | 7129 | 25 AML samples and 47 ALL samples |
| Lung [ | 2 | 96 | 7129 | 86 cancer samples and 10 normal samples |
| SRBCT [ | 4 | 83 | 2308 | 29 EWS samples, 18 NB samples, 11 BL samples, and 25 RMS samples |
| Lymphoma [ | 3 | 62 | 4026 | 42 DLBCL samples, 9 FL samples, and 11 B-CLL samples |
| Leukemia2 [ | 3 | 72 | 7129 | 28 AML samples, 24 ALL samples, and 20 MLL samples |
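The description column lists per-class sample counts. A short check, using only numbers transcribed from the table, confirms that they add up to the sample counts and computes each dataset's majority-class baseline (the accuracy of always predicting the most frequent class). The baseline is a reference point added here, not a figure reported in the paper.

```python
# Class counts and sample counts transcribed from the dataset statistics table.
DATASETS = {
    "Colon": {"cancer": 40, "normal": 22},
    "Leukemia1": {"AML": 25, "ALL": 47},
    "Lung": {"cancer": 86, "normal": 10},
    "SRBCT": {"EWS": 29, "NB": 18, "BL": 11, "RMS": 25},
    "Lymphoma": {"DLBCL": 42, "FL": 9, "B-CLL": 11},
    "Leukemia2": {"AML": 28, "ALL": 24, "MLL": 20},
}
SAMPLES = {"Colon": 62, "Leukemia1": 72, "Lung": 96,
           "SRBCT": 83, "Lymphoma": 62, "Leukemia2": 72}


def majority_baseline(counts):
    """Accuracy (%) of always predicting the most frequent class."""
    return 100.0 * max(counts.values()) / sum(counts.values())


for name, counts in DATASETS.items():
    # The per-class counts should add up to the reported number of samples.
    assert sum(counts.values()) == SAMPLES[name], name
    print(f"{name:10s} majority-class baseline: {majority_baseline(counts):.2f}%")
```

On Lung, for instance, 86 of 96 samples are cancer, so always predicting "cancer" already scores about 89.6%; on SRBCT the same baseline is only about 34.9%.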
mRMR-ABC control parameters.
| Parameter | Value |
|---|---|
| Colony size | 80 |
| Max cycle | 100 |
| Number of runs | 30 |
| Limit | 5 |
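In the canonical ABC of Karaboga, the colony is split equally between employed and onlooker bees, and the number of food sources SN equals the number of employed bees. Assuming this paper follows that convention (the table does not say so), a colony size of 80 implies SN = 40. A minimal configuration sketch, with the hypothetical class name `ABCConfig`:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ABCConfig:
    colony_size: int = 80  # employed + onlooker bees
    max_cycle: int = 100   # iterations per run
    runs: int = 30         # independent runs of the stochastic search
    limit: int = 5         # failed trials before a food source is abandoned

    @property
    def food_number(self) -> int:
        # Assumption: canonical ABC convention SN = colony_size / 2
        # (one employed bee per food source, equally many onlookers).
        return self.colony_size // 2


cfg = ABCConfig()
print(cfg.food_number)  # 40
```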
The classification accuracy performance of the mRMR method with an SVM classifier for all microarray datasets.
| Number of genes | Colon | Leukemia1 | Lung | SRBCT | Lymphoma | Leukemia2 |
|---|---|---|---|---|---|---|
| 50 | 91.94% | 91.66% | 89.56% | 62.65% | 93.93% | 77.77% |
| 100 | 93.55% | 97.22% | 95.83% | 91.44% | 98.48% | 86.11% |
| 150 | 95.16% | 100% | 98.95% | 96.39% | 100% | 95.83% |
| 200 | 96.77% | 100% | 100% | 97.59% | 100% | 98.61% |
| 250 | 98.38% | 100% | 100% | 100% | 100% | 100% |
| 300 | 98.38% | 100% | 100% | 100% | 100% | 100% |
| 350 | 100% | 100% | 100% | 100% | 100% | 100% |
| 400 | 100% | 100% | 100% | 100% | 100% | 100% |
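The table shows accuracy increasing with the number of mRMR-ranked genes until it saturates at 100%. The snippet below reads off, per dataset, the smallest tested gene count that reaches 100%, using values transcribed from the table. This is one plausible way to size the candidate gene set handed to the ABC stage; the text here does not state the rule actually used, and `smallest_perfect_subset` is a hypothetical helper.

```python
# mRMR + SVM accuracies (%) transcribed from the table above.
GENE_COUNTS = [50, 100, 150, 200, 250, 300, 350, 400]
ACCURACY = {
    "Colon":     [91.94, 93.55, 95.16, 96.77, 98.38, 98.38, 100, 100],
    "Leukemia1": [91.66, 97.22, 100, 100, 100, 100, 100, 100],
    "Lung":      [89.56, 95.83, 98.95, 100, 100, 100, 100, 100],
    "SRBCT":     [62.65, 91.44, 96.39, 97.59, 100, 100, 100, 100],
    "Lymphoma":  [93.93, 98.48, 100, 100, 100, 100, 100, 100],
    "Leukemia2": [77.77, 86.11, 95.83, 98.61, 100, 100, 100, 100],
}


def smallest_perfect_subset(accs, counts=GENE_COUNTS):
    """Smallest tested gene count with 100% accuracy, or None if never reached."""
    return next((n for n, a in zip(counts, accs) if a >= 100), None)


for name, accs in ACCURACY.items():
    print(name, smallest_perfect_subset(accs))
```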
Figure 8. The classification accuracy performance of the mRMR method with an SVM classifier for all microarray datasets.
Comparison between mRMR-ABC and ABC algorithms classification performance when applied with the SVM classifier for colon dataset.
| Number of genes | mRMR-ABC best | mRMR-ABC mean | mRMR-ABC worst | ABC best | ABC mean | ABC worst |
|---|---|---|---|---|---|---|
| 3 | 88.71% | 87.50% | 85.48% | 87.10% | 85.91% | 83.87% |
| 4 | 90.23% | 88.27% | 87.10% | 87.10% | 86.71% | 85.48% |
| 5 | 91.94% | 89.50% | 87.10% | 90.32% | 87.98% | 85.48% |
| 6 | 91.94% | 90.12% | 87.10% | 90.32% | 88.44% | 85.48% |
| 7 | 93.55% | 91.64% | 88.81% | 91.94% | 90.20% | 88.81% |
| 8 | 93.55% | 91.80% | 88.81% | 91.94% | 90.61% | 88.81% |
| 9 | 93.55% | 92.11% | 90.16% | 91.94% | 90.95% | 88.81% |
| 10 | 93.55% | 92.74% | 90.16% | 93.55% | 91.31% | 88.81% |
| 15 | 96.77% | 93.60% | 91.93% | 93.55% | 91.38% | 90.32% |
| 20 | 96.77% | 94.17% | 91.93% | 95.61% | 92.44% | 90.32% |
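The comparison tables report the best, mean, and worst accuracy, presumably across the 30 independent runs listed in the control parameters. Below is a minimal sketch of that summary, assuming accuracy is the percentage of correctly classified samples; the per-run correct counts in the usage lines are hypothetical. With 62 colon samples, each sample moves accuracy by about 1.61 percentage points, so 58 and 60 correct samples give the 93.55% and 96.77% levels that appear in the table above.

```python
from statistics import fmean


def accuracy_pct(correct, n_samples):
    """Percentage of correctly classified samples."""
    return 100.0 * correct / n_samples


def summarize_runs(accuracies):
    """Best, mean, and worst accuracy over independent runs of a stochastic selector."""
    if not accuracies:
        raise ValueError("need at least one run")
    return max(accuracies), fmean(accuracies), min(accuracies)


# Hypothetical correct counts from three runs on the 62-sample colon dataset.
runs = [accuracy_pct(c, 62) for c in (58, 60, 57)]
best, mean, worst = summarize_runs(runs)
print(f"best {best:.2f}%, mean {mean:.2f}%, worst {worst:.2f}%")
```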
Comparison between mRMR-ABC and ABC algorithms classification performance when applied with the SVM classifier for leukemia1 dataset.
| Number of genes | mRMR-ABC best | mRMR-ABC mean | mRMR-ABC worst | ABC best | ABC mean | ABC worst |
|---|---|---|---|---|---|---|
| 2 | 91.66% | 89.63% | 81.94% | 87.5% | 86.45% | 81.94% |
| 3 | 93.05% | 90.37% | 83.33% | 88.88% | 89.82% | 83.33% |
| 4 | 94.44% | 91.29% | 86.11% | 88.8% | 91.15% | 83.33% |
| 5 | 95.83% | 92.82% | 88.88% | 91.66% | 91.89% | 87.5% |
| 6 | 95.83% | 92.82% | 90.32% | 91.99% | 92.04% | 87.5% |
| 7 | 97.22% | 93.10% | 90.32% | 93.05% | 92.23% | 87.5% |
| 10 | 98.61% | 94.44% | 91.66% | 93.05% | 92.38% | 88.88% |
| 13 | 98.61% | 94.93% | 91.66% | 93.05% | 92.44% | 88.88% |
| 14 | 100% | 95.83% | 93.05% | 93.05% | 92.51% | 88.88% |
Comparison between mRMR-ABC and ABC algorithms classification performance when applied with the SVM classifier for lung dataset.
| Number of genes | mRMR-ABC best | mRMR-ABC mean | mRMR-ABC worst | ABC best | ABC mean | ABC worst |
|---|---|---|---|---|---|---|
| 2 | 96.87% | 95.83% | 93.75% | 88.54% | 87.5% | 84.37% |
| 3 | 97.91% | 96.31% | 93.75% | 89.58% | 88.54% | 84.37% |
| 4 | 98.95% | 97.91% | 96.87% | 91.66% | 89.58% | 87.5% |
| 5 | 98.95% | 97.98% | 96.87% | 92.70% | 90.03% | 88.54% |
| 6 | 98.95% | 98.27% | 96.87% | 94.79% | 91.66% | 88.54% |
| 7 | 98.95% | 98.53% | 96.87% | 95.83% | 92.18% | 89.58% |
| 8 | 100% | 98.95% | 96.87% | 97.91% | 93.75% | 91.66% |
Comparison between mRMR-ABC and ABC algorithms classification performance when applied with the SVM classifier for SRBCT dataset.
| Number of genes | mRMR-ABC best | mRMR-ABC mean | mRMR-ABC worst | ABC best | ABC mean | ABC worst |
|---|---|---|---|---|---|---|
| 2 | 75.90% | 71.08% | 68.67% | 72.28% | 69.87% | 67.46% |
| 3 | 85.54% | 79.51% | 71.08% | 73.34% | 71.08% | 68.67% |
| 4 | 87.95% | 84.33% | 77.10% | 84.33% | 81.92% | 77.10% |
| 5 | 91.56% | 86.74% | 84.33% | 87.95% | 84.33% | 77.10% |
| 6 | 95.36% | 91.56% | 87.99% | 92.77% | 87.99% | 84.33% |
| 8 | 97.59% | 94.05% | 89.15% | 93.97% | 89.15% | 84.33% |
| 10 | 100% | 96.30% | 92.77% | 95.36% | 91.56% | 89.15% |
Comparison between mRMR-ABC and ABC algorithms classification performance when applied with the SVM classifier for lymphoma dataset.
| Number of genes | mRMR-ABC best | mRMR-ABC mean | mRMR-ABC worst | ABC best | ABC mean | ABC worst |
|---|---|---|---|---|---|---|
| 2 | 86.36% | 86.36% | 86.36% | 86.36% | 86.36% | 86.36% |
| 3 | 93.93% | 90.90% | 86.36% | 89.39% | 87.87% | 86.36% |
| 4 | 96.96% | 92.42% | 89.39% | 93.93% | 89.39% | 86.36% |
| 5 | 100% | 96.96% | 93.93% | 96.96% | 92.42% | 90.90% |
Comparison between mRMR-ABC and ABC algorithms classification performance when applied with the SVM classifier for Leukemia2 dataset.
| Number of genes | mRMR-ABC best | mRMR-ABC mean | mRMR-ABC worst | ABC best | ABC mean | ABC worst |
|---|---|---|---|---|---|---|
| 2 | 84.72% | 84.72% | 84.72% | 84.72% | 84.72% | 84.72% |
| 3 | 87.5% | 86.11% | 84.72% | 86.11% | 85.23% | 84.72% |
| 4 | 90.27% | 87.5% | 84.72% | 87.5% | 86.11% | 84.72% |
| 5 | 90.27% | 88.88% | 86.11% | 87.5% | 86.45% | 84.72% |
| 6 | 94.44% | 90.27% | 87.5% | 90.27% | 88.88% | 86.11% |
| 7 | 93.05% | 89.49% | 88.88% | 90.27% | 89.22% | 86.11% |
| 8 | 94.44% | 91.66% | 87.5% | 91.66% | 90.27% | 88.88% |
| 9 | 94.44% | 92.38% | 87.5% | 93.05% | 91.46% | 88.88% |
| 10 | 95.83% | 91.66% | 88.88% | 93.05% | 91.98% | 88.88% |
| 15 | 98.61% | 94.44% | 91.66% | 94.44% | 92.78% | 90.27% |
| 18 | 98.61% | 95.67% | 91.66% | 95.83% | 92.99% | 90.27% |
| 20 | 100% | 96.12% | 95.83% | 97.22% | 93.15% | 91.66% |
Classification accuracy of the gene selection algorithms under comparison, combined with an SVM classifier, on the six microarray datasets. Numbers in parentheses denote the numbers of selected genes.
| Algorithms | Colon | Leukemia1 | Lung | SRBCT | Lymphoma | Leukemia2 |
|---|---|---|---|---|---|---|
| mRMR-ABC | 96.77 (15) | 100 (14) | 100 (8) | 100 (10) | 100 (5) | 100 (20) |
| ABC | 95.61 (20) | 93.05 (14) | 97.91 (8) | 95.36 (10) | 96.96 (5) | 97.22 (20) |
| mRMR-GA | 95.61 (83) | 93.05 (51) | 95.83 (62) | 92.77 (74) | 93.93 (43) | 94.44 (57) |
| mRMR-PSO | 93.55 (78) | 95.83 (53) | 94.79 (65) | 93.97 (68) | 96.96 (82) | 95.83 (61) |
| PSO [ | 85.48 (20) | 94.44 (23) | ||||
| PSO [ | 87.01 (2000) | 93.06 (7129) | ||||
| mRMR-PSO [ | 90.32 (10) | 100 (18) | ||||
| GADP [ | 100 (6) | |||||
| mRMR-GA [ | 100 (15) | 95 (5) | ||||
| ESVM [ | 95.75 (7) | 98.75 (6) | ||||
| MLHD-GA [ | 97.1 (10) | 100 (11) | 100 (6) | 100 (9) | ||
| CFS-IBPSO [ | 100 (6) | 98.57 (41) | ||||
| GA [ | 93.55 (12) | |||||
| mAnt [ | 91.5 (8) | 100 (7) | | | | |
The best predictive genes, which give the highest classification accuracy on each microarray dataset using the mRMR-ABC algorithm.
| Datasets | Predictive genes | Accuracy |
|---|---|---|
| Colon | Gene115, Gene161, Gene57, Gene70, Gene12, Gene132, Gene84, Gene62, Gene26, Gene155, Gene39, Gene14, Gene1924, Gene148, and Gene21 | 96.77% |
| Leukemia1 | M31994_at, U07563_cds1_at, Y07604_at, J03925_at, X03484_at, U43522_at, U12622_at, L77864_at, HG3707-HT3922_f_at, D49950_at, HG4011-HT4804_s_at, Y07755_at, M81830_at, and U03090_at | 100% |
| Lung | U77827_at, D49728_at, HG3976-HT4246_at, X77588_s_at, M21535_at, L29433_at, U60115_at, and M14764_at | 100% |
| SRBCT | Gene795, Gene575, Gene423, Gene2025, Gene1090, Gene1611, Gene1389, Gene338, Gene1, and Gene715 | 100% |
| Lymphoma | Gene1219X, Gene656X, Gene2075X, Gene3344X, and Gene345X | 100% |
| Leukemia2 | Y09615_at, D87683_at, U31973_s_at, U68031_at, V00571_rna1_at, L39009_at, U37529_at, U35407_at, X93511_s_at, L15533_rna1_at, X00695_s_at, H46990_at, U47686_s_at, L27624_s_at, S76473_s_at, X16281_at, M37981_at, M89957_at, L05597_at, and X07696_at | 100% |