Johra Muhammad Moosa, Rameen Shakur, Mohammad Kaykobad, Mohammad Sohel Rahman.
Abstract
BACKGROUND: Development of biologically relevant models from gene expression data, notably microarray data, has become a topic of great interest in bioinformatics, clinical genetics, and oncology. Only a small number of the genes explored show expression that correlates significantly with a given phenotype. Gene selection enables researchers to obtain substantial insight into the genetic nature of a disease and the mechanisms responsible for it. Besides improving the performance of cancer classification, it can also cut down the time and cost of medical diagnoses.
Keywords: Artificial bee colony algorithm; Cancer classification; Evolutionary algorithm; Gene selection; Microarray
Year: 2016 PMID: 27510562 PMCID: PMC4980787 DOI: 10.1186/s12920-016-0204-7
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 The flowchart of the modified Artificial Bee Colony (mABC) algorithm
Attributes of the datasets used for experimental evaluation
| Name of the dataset | Sample size | Number of genes | Number of classes | Reference |
|---|---|---|---|---|
| 9_Tumors | 60 | 5,726 | 9 | [ |
| 11_Tumors | 174 | 12,533 | 11 | [ |
| Brain_Tumor1 | 90 | 5,920 | 5 | [ |
| Brain_Tumor2 | 50 | 10,367 | 4 | [ |
| DLBCL | 77 | 5,469 | 2 | [ |
| Leukemia1 | 72 | 5,327 | 3 | [ |
| Leukemia2 | 72 | 11,225 | 3 | [ |
| Lung_Cancer | 203 | 12,600 | 5 | [ |
| Prostate_Tumor | 102 | 10,509 | 2 | [ |
| SRBCT | 83 | 2,308 | 4 | [ |
All the proposed parameter values after tuning
| Parameter | First | Second | Third |
|---|---|---|---|
| | 0.7 | 0.4 | 0.7 |
| | 0.8 | 0.8 | 0.8 |
| | 1.4 | 1.4 | 1.4 |
| | 0.85 | 0.85 | 0.85 |
| | 0.065 | 0.03 | 0.03 |
| | 12 | 16 | 16 |
| | 9 | 15 | 15 |
| | N/A | 10 | N/A |
| | 10 | N/A | N/A |
| | 5 | N/A | N/A |
| | 0.5 | N/A | N/A |
| | 5 | 5 | 5 |
| | 0 | 0 | 0 |
| | 0.6 | 0.5 | 0.5 |
| | 20 | 20 | 20 |
| | 35 | 100 | 100 |
| | 0.035 | 0.02 | 0.035 |
| | 25 | 40 | 40 |
| | 0.5 | 0.7 | 0.7 |
| | SA | HC | SAHCR |
| | SAHCR | SAHCR | SAHCR |
| Selection method | Tournament selection | Tournament selection | Stochastic universal sampling |
| | Linear | Linear | Linear |
| | Equation | Equation | Equation |
| | True | True | True |
| | Kruskal-Wallis | Kruskal-Wallis | Kruskal-Wallis |
Optimized parameter values after tuning
| Parameter | Optimized value |
|---|---|
| | 0.7 |
| | 0.8 |
| | 1.4 |
| | 0.85 |
| | 0.065 |
| | 12 |
| | 9 |
| | 14 |
| | 5 |
| | 0.5 |
| | 5 |
| | 0 |
| | 0.6 |
| | 20 |
| | 35 |
| | 0.035 |
| | 25 |
| | 0.5 |
| | SA |
| | SAHCR |
| Selection method | Tournament selection |
| | Linear |
| | Equation |
| | True |
| | Kruskal-Wallis |
Experimental results of mABC using default, optimized (first), second and third parameter settings for different datasets
| Dataset name | Evaluation criteria | Default | | | | PS1 | | | | PS2 | | | | PS3 | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | Best | Avg. | S.D. | Worst | Best | Avg. | S.D. | Worst | Best | Avg. | S.D. | Worst | Best | Avg. | S.D. | Worst |
| 9_Tumors | Accuracy | 90.0 | 84.60 | 0.02 | 80.0 | | 98.65 | | 96.67 | | 99.11 | | | | | | |
| | # Genes | 41 | 50.03 | 52.21 | 37 | 30 | 34.73 | 5.64 | 35 | | | 4 | 26 | | 25.5 | | |
| 11_Tumors | Accuracy | 94.25 | 93.08 | 0.01 | 91.38 | | 99.50 | | 98.85 | | 99.45 | | 98.85 | | | | |
| | # Genes | 33 | 37.15 | 5.08 | 23 | 42 | 47.27 | 7.79 | 47 | | | 4.15 | | 32 | 33.86 | | 34 |
| | Accuracy | 100 | 95.85 | 0.02 | 93.33 | | | | | | | | | | | | |
| | # Genes | 13 | 16.53 | 5.78 | 16 | 12 | 16.87 | 2.85 | 20 | 10 | 11.2 | | | | | 1.17 | |
| | Accuracy | 100 | 98.09 | 0.01 | 96 | | | | | | | | | | | | |
| | # Genes | 11 | 17.6 | 5.82 | 16 | 7 | 10.52 | 1.72 | 15 | | 7.47 | 1.19 | | | | | |
| | Accuracy | 100 | 100 | 0 | 100 | | | | | | | | | | | | |
| | # Genes | 4 | 7.67 | 1.77 | 11 | | 4.05 | 0.78 | 5 | | 3.6 | 0.51 | | | | | |
| | Accuracy | 100 | 100 | 0 | 100 | | | | | | | | | | | | |
| | # Genes | 5 | 8.39 | 1.94 | 12 | 4 | 5.67 | 0.73 | 7 | 4 | 4.4 | 0.51 | 5 | | | | |
| | Accuracy | 100 | 100 | 0 | 100 | | | | | | | | | | | | |
| | # Genes | 6 | 10.64 | 2.57 | 15 | 4 | 6.29 | 0.98 | 8 | | | 0.74 | | | 4.06 | | |
| | Accuracy | 99.01 | 98.17 | 0.01 | 97.04 | | | | | | | | | | | | |
| | # Genes | 27 | 24.59 | 6.94 | 16 | 14 | 23.31 | 5.14 | 32 | | | | | | 12.44 | 1.5 | |
| | Accuracy | 100 | 96.95 | 0.01 | 95.10 | | | | | | | | | | | | |
| | # Genes | 11 | 15.33 | 6.35 | 19 | | 10.73 | 3.15 | 16 | | 6.59 | 0.91 | | | | | |
| | Accuracy | 100 | 100 | 0 | 100 | | | | | | **100** | | | | | | |
| | # Genes | 6 | 7.47 | 0.94 | 9 | 5 | 5.59 | 0.51 | 6 | | 4.27 | 0.46 | 5 | | | | |
Best results (maximum accuracy and minimum selected gene size) are highlighted using boldface font
Fig. 2 Distribution of classification accuracy using the first (optimized) parameter setting for the datasets: a 9_Tumors; b 11_Tumors
Fig. 3 Distribution of the number of times the selected gene size falls in a specific range using the first (optimized) parameter setting: a 9_Tumors; b 11_Tumors; c Brain_Tumor1; d Brain_Tumor2; e Leukemia1; f Leukemia2; g DLBCL; h Lung_Cancer; i Prostate_Tumor; j SRBCT
Comparative experimental results of the best subsets produced by mABC and other evolutionary methods for the dataset 9_Tumors
| Evaluation criteria | GA | | | | ACO | | | | ABC | | | | mABC [This work] | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Best | Avg. | S.D. | Worst | Best | Avg. | S.D. | Worst | Best | Avg. | S.D. | Worst | Best | Avg. | S.D. | Worst |
| Accuracy | 85 | 82.56 | | 81.67 | 85 | 83.09 | | 81.67 | 91.67 | 87.67 | 0.02 | 85 | | | | |
| # Genes | 275 | 276.8 | 12.07 | 301 | 266 | 271.59 | 11.03 | 279 | 57 | 75.67 | 14.18 | 97 | | | | |
Best results (maximum accuracy and minimum selected gene size) are highlighted using boldface font
Comparative experimental results of the best subsets produced by mABC and other methods for different datasets
| Dataset name | Evaluation criteria | EPSO | | | IBPSO | TS-BPSO | BPSO-CGA | Random forest | | mABC | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | [ | | | [ | [ | [ | [ | | [This work] | | |
| | | Best | Avg. | S.D. | Best | Best | Best | Avg. | S.D. | Best | Avg. | S.D. |
| 9_Tumors | Accuracy | 76.67 | 75.00 | 1.11 | 78.33 | 81.63 | - | 88.0 | 0.037 | | | |
| | # Genes | 251 | 247.10 | 9.65 | 1280 | 2941 | - | 130 | - | | | |
| 11_Tumors | Accuracy | 96.55 | 95.40 | 0.61 | 93.10 | 97.35 | - | - | - | | | |
| | # Genes | 243 | 237.70 | 9.66 | 2948 | 3206 | - | - | - | | | |
| | Accuracy | 93.33 | 92.11 | 0.82 | 94.44 | 95.89 | 91.4 | 93.6 | 0.007 | | | |
| | # Genes | | | | 754 | 2913 | 456 | 86 | - | 12 | 16.87 | 2.85 |
| | Accuracy | 94 | 92.4 | 1.27 | 94.00 | 92.65 | - | - | - | | | |
| | # Genes | 4 | 6.0 | 1.83 | 1197 | 5086 | - | - | - | | | |
| | Accuracy | | | | | | - | 94.6 | 0.021 | | | |
| | # Genes | | 4.7 | 0.82 | 1042 | 2671 | - | 21 | - | 3 | 4.05 | 0.78 |
| | Accuracy | | | | | | | 98.1 | 0.006 | | | |
| | # Genes | | | | 1034 | 2577 | 300 | 21 | - | 4 | 5.67 | 0.73 |
| | Accuracy | | | | | | - | - | - | | | |
| | # Genes | | 6.8 | 2.2 | 1292 | 5609 | - | - | - | | | |
| | Accuracy | 96.06 | 95.67 | 0.31 | 96.55 | 99.52 | - | - | - | | | |
| | # Genes | | | | 1897 | 6958 | - | - | - | 14 | 23.31 | 5.14 |
| | Accuracy | 99.02 | 97.84 | 0.62 | 92.16 | 95.45 | 93.7 | - | - | | | |
| | # Genes | | | | 1294 | 5320 | 795 | - | - | | 10.73 | 3.15 |
| | Accuracy | | 99.64 | 0.58 | | | | 99.7 | 0.003 | | | |
| | # Genes | 7 | 14.90 | 13.03 | 431 | 1084 | 880 | 42 | - | 5 | 5.59 | 0.51 |
Best results (maximum accuracy and minimum selected gene size) are highlighted using boldface font
Performance outcome for different values of parameter selection method
| | | | | |
|---|---|---|---|---|
| | Avg. | S.D. | Avg. | S.D. |
| Fitness proportionate selection | 84.2 | | 41.43 | 47.47 |
| Tournament selection | 84.74 | | 53.33 | 53.65 |
| Stochastic universal sampling | | | | |
Best results (maximum accuracy and minimum selected gene size) are highlighted using boldface font
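The three selection schemes compared in the table above are standard evolutionary-computation operators. The following is a minimal sketch of their textbook forms, not the paper's implementation: the function names, the tournament size of 2, and the example fitness values are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def fitness_proportionate(fitness: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Roulette wheel: draw k indices with probability proportional to fitness."""
    return rng.choices(range(len(fitness)), weights=fitness, k=k)


def tournament(fitness: Sequence[float], k: int, rng: random.Random, size: int = 2) -> List[int]:
    """Pick k winners; each is the fittest of `size` distinct random contenders."""
    winners = []
    for _ in range(k):
        contenders = rng.sample(range(len(fitness)), size)
        winners.append(max(contenders, key=lambda i: fitness[i]))
    return winners


def stochastic_universal_sampling(fitness: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Place k equally spaced pointers on the cumulative fitness line."""
    step = sum(fitness) / k
    start = rng.random() * step  # single random offset in [0, step)
    picks, i, cumulative = [], 0, fitness[0]
    for j in range(k):
        pointer = start + j * step
        # Advance to the individual whose fitness segment contains the pointer.
        while cumulative < pointer and i < len(fitness) - 1:
            i += 1
            cumulative += fitness[i]
        picks.append(i)
    return picks


if __name__ == "__main__":
    rng = random.Random(0)
    fitness = [0.84, 0.85, 0.92, 0.80]  # hypothetical fitness values
    print(fitness_proportionate(fitness, 3, rng))
    print(tournament(fitness, 3, rng))
    print(stochastic_universal_sampling(fitness, 3, rng))
```

Stochastic universal sampling uses one random draw for all k picks, so each individual is chosen a number of times close to its expected share; fitness proportionate selection draws independently and therefore has higher variance in how often each individual is picked.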