Abstract
In this paper, we propose a new hybrid method, named Co-ABC, that combines the Correlation-based Feature Selection (CFS) method with the Artificial Bee Colony (ABC) algorithm to select a small number of relevant genes for accurate classification of gene expression profiles. Co-ABC consists of three cooperating stages. The first stage filters noisy and redundant genes in high-dimensional domains by applying the CFS filter method. In the second stage, the ABC algorithm is used to select the informative and meaningful genes. In the third stage, a Support Vector Machine (SVM) classifier is applied using the genes preselected in the second stage. The overall performance of the proposed Co-ABC algorithm was evaluated on six gene expression profiles from binary and multi-class cancer datasets. In addition, to demonstrate the efficiency of Co-ABC, we compare it with previously known related methods. Two of these methods were re-implemented with the same parameters for the sake of a fair comparison: Co-GA, which combines CFS with a genetic algorithm (GA), and Co-PSO, which combines CFS with a particle swarm optimization (PSO) algorithm. The experimental results show that the proposed Co-ABC algorithm achieves accurate classification performance using a small number of predictive genes. This shows that Co-ABC is an efficient approach for biomarker gene discovery from cancer gene expression profiles.
Keywords: ABC; Artificial bee colony; CFS; Cancer classification; Correlation-based feature selection; Gene expression profile; Gene selection method
Year: 2018 PMID: 30108438 PMCID: PMC6088113 DOI: 10.1016/j.sjbs.2017.12.012
Source DB: PubMed Journal: Saudi J Biol Sci ISSN: 1319-562X Impact factor: 4.219
Fig. 1 The main phases and steps of the proposed Co-ABC algorithm.
Fig. 2 The Co-ABC dataset, which contains the m highly correlated genes selected by the CFS filter approach.
Statistics of the cancer microarray datasets.
| Microarray datasets | No of classes | No of samples | No of genes | Description |
|---|---|---|---|---|
| Colon | 2 | 62 | 2000 | 40 cancer samples and 22 normal samples |
| Leukemia1 | 2 | 72 | 7129 | 25 AML samples and 47 ALL samples |
| Lung | 2 | 96 | 7129 | 86 cancer samples and 10 normal samples |
| SRBCT | 4 | 83 | 2308 | 29 EWS cancer samples, 18 NB cancer samples, 11 BL cancer samples, and 25 RMS cancer samples |
| Lymphoma | 3 | 62 | 4026 | 42 DLBCL cancer samples, 9 FL cancer samples, and 11 B-CLL cancer samples |
| Leukemia2 | 3 | 72 | 7129 | 28 AML samples, 24 ALL samples, and 20 MLL samples |
Control parameters of the Co-ABC algorithm.
| Control parameter | Value |
|---|---|
| | 80 |
| | 100 |
| | 30 |
| | 5 |
Classification performance of CFS with an SVM classifier.
| Microarray datasets | Number of genes | Classification accuracy |
|---|---|---|
| Colon | 25 | 91.94% |
| Leukemia1 | 80 | 100% |
| Lung | 71 | 100% |
| SRBCT | 110 | 100% |
| Lymphoma | 184 | 100% |
| Leukemia2 | 103 | 100% |
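The filter stage behind the table above can be illustrated with the standard CFS merit score, which rewards genes that correlate with the class and penalizes genes that correlate with each other. The sketch below is a minimal illustration, assuming Pearson correlation and numeric class labels; the correlation measure actually used by the authors is not stated in this record, and the function names `pearson` and `cfs_merit` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cfs_merit(genes, labels):
    """CFS merit of a gene subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    genes  : list of expression vectors, one per gene (each of length n_samples)
    labels : numeric class labels, one per sample
    r_cf   : mean absolute gene-class correlation
    r_ff   : mean absolute gene-gene correlation
    """
    k = len(genes)
    r_cf = sum(abs(pearson(g, labels)) for g in genes) / k
    pairs = list(combinations(genes, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Toy data (not from the paper): one informative gene and one irrelevant gene.
labels = [0, 0, 1, 1]
informative = [0.0, 0.0, 1.0, 1.0]
irrelevant = [0.0, 1.0, 0.0, 1.0]
```

On this toy data, a redundant copy of the informative gene leaves the merit unchanged, while adding the irrelevant gene lowers it, which is the behavior a filter needs in order to discard redundant and noisy genes.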
Classification accuracy of the Co-ABC algorithm compared with the mRMR-ABC and ABC-SVM algorithms on the Colon cancer dataset.
| Number of genes | Co-ABC best | Co-ABC mean | Co-ABC worst | mRMR-ABC best | mRMR-ABC mean | mRMR-ABC worst | ABC-SVM best | ABC-SVM mean | ABC-SVM worst |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 90.32% | 90.16% | 88.71% | 88.71% | 87.50% | 85.48% | 87.10% | 85.91% | 83.87% |
| 4 | 91.94% | 91.34% | 90.32% | 90.23% | 88.27% | 87.10% | 87.10% | 86.71% | 85.48% |
| 5 | 91.94% | 91.94% | 91.94% | 91.94% | 89.50% | 87.10% | 90.32% | 87.98% | 85.48% |
| 6 | 93.55% | 92.42% | 91.94% | 91.94% | 90.12% | 87.10% | 90.32% | 88.44% | 85.48% |
| 7 | 95.16% | 93.55% | 91.94% | 93.55% | 91.64% | 88.81% | 91.94% | 90.20% | 88.81% |
| 8 | 95.16% | 94.25% | 93.55% | 93.55% | 91.80% | 88.81% | 91.94% | 90.61% | 88.81% |
| 9 | 96.77% | 94.62% | 93.55% | 93.55% | 92.11% | 90.16% | 91.94% | 90.95% | 88.81% |
| 10 | 96.77% | 94.68% | 93.55% | 93.55% | 92.74% | 90.16% | 93.55% | 91.31% | 88.81% |
| 15 | 95.16% | 94.95% | 93.55% | 96.77% | 93.60% | 91.93% | 93.55% | 91.38% | 90.32% |
| 20 | 95.16% | 93.44% | 91.94% | 96.77% | 94.17% | 91.93% | 95.61% | 92.44% | 90.32% |
Classification accuracy of the Co-ABC algorithm compared with the mRMR-ABC and ABC-SVM algorithms on the Leukemia1 cancer dataset.
| Number of genes | Co-ABC best | Co-ABC mean | Co-ABC worst | mRMR-ABC best | mRMR-ABC mean | mRMR-ABC worst | ABC-SVM best | ABC-SVM mean | ABC-SVM worst |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 97.22% | 97.22% | 97.22% | 91.66% | 89.63% | 81.94% | 87.5% | 86.45% | 81.94% |
| 3 | 100% | 99.58% | 98.61% | 93.05% | 90.37% | 83.33% | 88.88% | 89.82% | 83.33% |
| 4 | 100% | 100% | 100% | 94.44% | 91.29% | 86.11% | 88.8% | 91.15% | 83.33% |
| 14 | 100% | 100% | 100% | 100% | 95.83% | 93.05% | 93.05% | 92.51% | 88.88% |
Classification accuracy of the Co-ABC algorithm compared with the mRMR-ABC and ABC-SVM algorithms on the Lung cancer dataset.
| Number of genes | Co-ABC best | Co-ABC mean | Co-ABC worst | mRMR-ABC best | mRMR-ABC mean | mRMR-ABC worst | ABC-SVM best | ABC-SVM mean | ABC-SVM worst |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 97.91% | 97.91% | 97.91% | 96.87% | 95.83% | 93.75% | 88.54% | 87.5% | 84.37% |
| 3 | 100% | 100% | 100% | 97.91% | 96.31% | 93.75% | 89.58% | 88.54% | 84.37% |
| 4 | 100% | 100% | 100% | 98.95% | 97.91% | 96.87% | 91.66% | 89.58% | 87.5% |
| 8 | 100% | 100% | 100% | 100% | 98.95% | 96.87% | 97.91% | 93.75% | 91.66% |
Classification accuracy of the Co-ABC algorithm compared with the mRMR-ABC and ABC-SVM algorithms on the SRBCT cancer dataset.
| Number of genes | Co-ABC best | Co-ABC mean | Co-ABC worst | mRMR-ABC best | mRMR-ABC mean | mRMR-ABC worst | ABC-SVM best | ABC-SVM mean | ABC-SVM worst |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 77.11% | 77.03% | 75.90% | 75.90% | 71.08% | 68.67% | 72.28% | 69.87% | 67.46% |
| 3 | 89.16% | 86.51% | 83.13% | 85.54% | 79.51% | 71.08% | 73.34% | 71.08% | 68.67% |
| 4 | 100% | 95.82% | 92.77% | 87.95% | 84.33% | 77.10% | 84.33% | 81.92% | 77.10% |
| 5 | 100% | 98.43% | 96.38% | 91.56% | 86.74% | 84.33% | 87.95% | 84.33% | 77.10% |
| 10 | 100% | 98.43% | 96.38% | 100% | 96.30% | 92.77% | 95.36% | 91.56% | 89.15% |
Classification accuracy of the Co-ABC algorithm compared with the mRMR-ABC and ABC-SVM algorithms on the Lymphoma cancer dataset.
| Number of genes | Co-ABC best | Co-ABC mean | Co-ABC worst | mRMR-ABC best | mRMR-ABC mean | mRMR-ABC worst | ABC-SVM best | ABC-SVM mean | ABC-SVM worst |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 100% | 99.1% | 98.48% | 86.36% | 86.36% | 86.36% | 86.36% | 86.36% | 86.36% |
| 3 | 100% | 100% | 100% | 93.93% | 90.90% | 86.36% | 89.39% | 87.87% | 86.36% |
| 5 | 100% | 100% | 100% | 100% | 96.96% | 93.93% | 96.96% | 92.42% | 90.90% |
Classification accuracy of the Co-ABC algorithm compared with the mRMR-ABC and ABC-SVM algorithms on the Leukemia2 cancer dataset.
| Number of genes | Co-ABC best | Co-ABC mean | Co-ABC worst | mRMR-ABC best | mRMR-ABC mean | mRMR-ABC worst | ABC-SVM best | ABC-SVM mean | ABC-SVM worst |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 87.5% | 87.5% | 87.5% | 84.72% | 84.72% | 84.72% | 84.72% | 84.72% | 84.72% |
| 3 | 97.22% | 96.32% | 95.83% | 87.5% | 86.11% | 84.72% | 86.11% | 85.23% | 84.72% |
| 4 | 98.61% | 96.81% | 95.83% | 90.27% | 87.5% | 84.72% | 87.5% | 86.11% | 84.72% |
| 5 | 98.61% | 98.19% | 97.22% | 90.27% | 88.88% | 86.11% | 87.5% | 86.45% | 84.72% |
| 6 | 100% | 99.21% | 98.61% | 94.44% | 90.27% | 87.5% | 90.27% | 88.88% | 86.11% |
| 20 | 100% | 99.21% | 98.61% | 100% | 96.12% | 95.83% | 97.22% | 93.15% | 91.66% |
Classification performance of the related algorithms under comparison on six cancer gene expression profiles. Numbers in parentheses are the numbers of informative genes used in the classification task.
| Algorithms | Colon | Leukemia1 | Lung | SRBCT | Lymphoma | Leukemia2 |
|---|---|---|---|---|---|---|
| Co-ABC | 96.77(9) | 100(3) | 100(2) | 100(4) | 100(2) | 100(6) |
| CFS-GA | 90.32(8) | 100(24) | 100(20) | 100(38) | 100(17) | 100(36) |
| CFS-PSO | 91.94(7) | 100(15) | 100(5) | 100(35) | ||
| mRMR-ABC | 96.77(15) | 100(14) | 100(8) | 100(10) | 100(5) | 100(20) |
| ABC-SVM | 95.61(20) | 93.05(14) | 97.91(8) | 95.36(10) | 96.96(5) | 97.22(20) |
| PSO | 85.48(20) | 94.44(23) | ||||
| PSO | 87.01(2000) | 93.06(7129) | ||||
| mRMR-PSO | 90.32(10) | 100(18) | ||||
| GADP | 100(6) | |||||
| mRMR-GA | 100(15) | 95(5) | ||||
| ESVM | 95.75(7) | 98.75(6) | ||||
| MLHD-GA | 97.1(10) | 100(11) | 100(6) | 100(9) | ||
| CFS-IBPSO | 100(6) | 98.57(41) | ||||
| GA | 93.55(12) | |||||
| mAnt | 91.5(8) | 100(7) |
Average runtime (in s) for the Co-ABC algorithm and other classification algorithms.
| Algorithms | Preprocessing time | Average classification time | Total |
|---|---|---|---|
| Co-ABC with SVM | 24.37 s | 40.22 s | 64.59 s |
| mRMR-ABC with SVM | 25.17 s | 72.13 s | 97.3 s |
| ABC with SVM | 0.0 s | 134.74 s | 134.74 s |
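The totals in the runtime table are the sum of the preprocessing and classification times, and the relative speed-up over plain ABC with SVM follows directly from them. The short check below uses only the figures in the table; the speed-up ratios are derived here and are not reported in the source.

```python
# Timings in seconds, copied from the runtime table: (preprocessing, classification).
timings = {
    "Co-ABC with SVM": (24.37, 40.22),
    "mRMR-ABC with SVM": (25.17, 72.13),
    "ABC with SVM": (0.0, 134.74),
}

# Total runtime per algorithm, rounded to the table's two decimal places.
totals = {name: round(pre + clf, 2) for name, (pre, clf) in timings.items()}

# Speed-up relative to plain ABC with SVM (a derived figure, not from the source).
baseline = totals["ABC with SVM"]
speedup = {name: round(baseline / total, 2) for name, total in totals.items()}

for name in timings:
    print(f"{name}: total {totals[name]:.2f} s, speed-up x{speedup[name]:.2f}")
```

By this arithmetic, Co-ABC runs in roughly half the total time of ABC with SVM, even after paying for the CFS preprocessing step.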
Average memory space (in GB) for the Co-ABC algorithm and other classification algorithms.
| Algorithms | Memory space |
|---|---|
| Co-ABC with SVM | 1.56 GB |
| mRMR-ABC with SVM | 1.58 GB |
| ABC with SVM | 1.89 GB |
The highly correlated and informative genes that achieve the highest classification accuracy for the six cancer gene expression profiles using the Co-ABC algorithm.
| Datasets | Predictive genes | Accuracy |
|---|---|---|
| Colon | Gene625, Gene1562, Gene576, Gene1328, Gene1917, Gene1772, Gene682, Gene1200, Gene1671 | 96.77% |
| Leukemia1 | S50223_at, U05259_rna1_at, M23197_at | 100% |
| Lung | X64559_at, U19247_rna1_s_at | 100% |
| SRBCT | Gene123, Gene742, Gene1954, Gene1003 | 100% |
| Lymphoma | Gene2403X, Gene3519X | 100% |
| Leukemia2 | L47738_at, X00274_at, X58072_at, X95735_at, D63880_at, U48251_at | 100% |
Precision and sensitivity of the Co-ABC algorithm using the selected informative genes.
| Datasets | Precision | Sensitivity |
|---|---|---|
| Colon | 97.04% | 96.77% |
| Leukemia1 | 100% | 100% |
| Lung | 100% | 100% |
| SRBCT | 100% | 100% |
| Lymphoma | 100% | 100% |
| Leukemia2 | 100% | 100% |
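The record does not say how precision and sensitivity were averaged across classes. As a minimal sketch, assuming class-weighted averaging, the function below computes both from a confusion matrix. The Colon matrix used in the example is hypothetical: it is one assignment of the two misclassified samples (60 of 62 correct) that happens to reproduce the reported 97.04% and 96.77%, not a matrix given by the authors.

```python
def weighted_precision_recall(confusion):
    """Class-weighted precision and recall (sensitivity).

    confusion[i][j] = number of samples of true class i predicted as class j.
    Each class is weighted by its share of the true samples.
    """
    total = sum(sum(row) for row in confusion)
    precision = recall = 0.0
    for c in range(len(confusion)):
        support = sum(confusion[c])                   # true samples of class c
        predicted = sum(row[c] for row in confusion)  # samples predicted as class c
        tp = confusion[c][c]
        weight = support / total
        precision += weight * (tp / predicted if predicted else 0.0)
        recall += weight * (tp / support if support else 0.0)
    return precision, recall


# Hypothetical Colon confusion matrix (rows: true cancer, true normal):
# two cancer samples misclassified as normal, all normal samples correct.
colon = [[38, 2],
         [0, 22]]
p, r = weighted_precision_recall(colon)
print(f"precision={p:.2%} sensitivity={r:.2%}")  # precision=97.04% sensitivity=96.77%
```

Under this weighting, sensitivity equals overall accuracy, which is consistent with the Colon row reporting 96.77% for both.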
| 1: Using the CFS filter method, select from the initial microarray dataset the subset of most relevant genes that achieves high classification accuracy with the SVM classifier. |
| 2: Set the ABC parameters, including the maximum number of cycles, the bee colony size, and the limit on trials. |
| 3: Initialize the ABC food sources randomly. |
| 4: Evaluate the quality of the food sources using the fitness function, which is the SVM classification accuracy. |
| 5: |
| 6: |
| 7: Generate new employed bees (new candidate solutions). |
| 8: Evaluate the quality of the new solutions using the fitness function. |
| 9: Apply the greedy selection approach. |
| 10: Determine the probability values from the fitness values. |
| 11: Generate new onlooker bees (new candidate solutions) using the probabilities of the food sources. |
| 12: Evaluate the quality of the new solutions using the fitness function. |
| 13: Apply the greedy selection process. |
| 14: Identify abandoned solutions and replace them with new solutions produced randomly by the scout bees. |
| 15: Identify and save the best solution found so far. |
| 16: |
| 17: |
| 18: Return the best solution (the predictive biomarker genes). |
| 19: Train the SVM classifier using the selected biomarker genes. |
| 20: Classify the gene expression profile using the SVM classifier. |
| 21: Calculate the classification accuracy. |
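Steps 2 to 18 of the listing above can be sketched roughly as follows. This is an illustrative outline, not the authors' implementation: the function name `abc_select`, the fixed subset size, the one-gene swap used to generate neighbours, the fixed number of cycles, and the default parameter values are all assumptions. The fitness is any caller-supplied function; in the paper it is SVM classification accuracy, whereas the example uses a toy stand-in so that the block needs only the standard library.

```python
import random


def abc_select(n_genes, subset_size, fitness, colony_size=20,
               max_cycles=50, limit=5, seed=0):
    """Artificial Bee Colony search for a gene subset (requires subset_size < n_genes).

    fitness(subset) -> float, higher is better (SVM accuracy in the paper).
    Default parameter values are illustrative, not the paper's settings.
    """
    rng = random.Random(seed)
    n_sources = colony_size // 2          # one food source per employed bee

    def random_source():
        return tuple(sorted(rng.sample(range(n_genes), subset_size)))

    def neighbour(source):
        # Candidate solution: swap one selected gene for an unselected one.
        subset = list(source)
        outside = [g for g in range(n_genes) if g not in source]
        subset[rng.randrange(subset_size)] = rng.choice(outside)
        return tuple(sorted(subset))

    sources = [random_source() for _ in range(n_sources)]
    scores = [fitness(s) for s in sources]
    trials = [0] * n_sources
    best = max(zip(scores, sources))      # (score, subset) of the best so far

    def try_improve(i):
        nonlocal best
        candidate = neighbour(sources[i])
        score = fitness(candidate)
        if score > scores[i]:             # greedy selection
            sources[i], scores[i], trials[i] = candidate, score, 0
            best = max(best, (score, candidate))
        else:
            trials[i] += 1

    for _ in range(max_cycles):
        for i in range(n_sources):        # employed bee phase
            try_improve(i)
        total = sum(scores)
        weights = [s / total for s in scores] if total > 0 else None
        for _ in range(n_sources):        # onlooker phase, fitness-proportional
            try_improve(rng.choices(range(n_sources), weights=weights)[0])
        for i in range(n_sources):        # scout phase: replace exhausted sources
            if trials[i] > limit:
                sources[i] = random_source()
                scores[i] = fitness(sources[i])
                trials[i] = 0
                best = max(best, (scores[i], sources[i]))
    return best[1], best[0]


# Toy stand-in for SVM accuracy: fraction of three "informative" genes recovered.
informative = {3, 17, 42}


def toy_fitness(subset):
    return len(informative & set(subset)) / len(informative)


genes, score = abc_select(n_genes=50, subset_size=3, fitness=toy_fitness)
```

The best solution is tracked as soon as it is found rather than at the end of each cycle, so a good food source cannot be lost if the scout phase replaces it. To approximate the pipeline in the listing, `toy_fitness` would be replaced by a function that trains and evaluates an SVM on the candidate gene subset drawn from the CFS-filtered genes.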