Mohd Saberi Mohamad, Sigeru Omatu, Safaai Deris, Michifumi Yoshioka, Afnizanfaizal Abdullah, Zuwairie Ibrahim.
Abstract
BACKGROUND: Gene expression data could be of substantial help in developing efficient cancer diagnosis and classification platforms. Recently, many researchers have analyzed gene expression data with diverse computational intelligence methods to select a small subset of informative genes for cancer classification. Many of these methods have difficulty selecting small subsets because the number of samples is small compared with the huge number of genes (high dimensionality), and because the data contain irrelevant and noisy genes.
Year: 2013 PMID: 23617960 PMCID: PMC3847130 DOI: 10.1186/1748-7188-8-15
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Figure 1. The areas of unsatisfied () ≥ 0, (() = 0) and (() = 1) in EPSO and BPSO.
Description of the gene expression data sets
| Data set | Number of samples | Number of genes | Number of classes |
| 11_Tumors | 174 | 12,533 | 11 |
| 9_Tumors | 60 | 5,726 | 9 |
| Brain_Tumor1 | 90 | 5,920 | 5 |
| Brain_Tumor2 | 50 | 10,367 | 4 |
| Leukemia1 | 72 | 5,327 | 3 |
| Leukemia2 | 72 | 11,225 | 3 |
| Lung_Cancer | 203 | 12,600 | 5 |
| SRBCT | 83 | 2,308 | 4 |
| Prostate_Tumor | 102 | 10,509 | 2 |
| DLBCL | 77 | 5,469 | 2 |
Note: SRBCT = small round blue cell tumor.
DLBCL = diffuse large B-cell lymphomas.
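The high-dimension problem named in the abstract can be made concrete from the data set table alone. The following is a minimal sketch, not part of the original work: it copies the sample and gene counts from the table and computes how many genes there are per sample. The names `DATASETS` and `genes_per_sample` are hypothetical and chosen only for this illustration.

```python
# (samples, genes, classes) copied from the data set description table.
DATASETS = {
    "11_Tumors": (174, 12533, 11),
    "9_Tumors": (60, 5726, 9),
    "Brain_Tumor1": (90, 5920, 5),
    "Brain_Tumor2": (50, 10367, 4),
    "Leukemia1": (72, 5327, 3),
    "Leukemia2": (72, 11225, 3),
    "Lung_Cancer": (203, 12600, 5),
    "SRBCT": (83, 2308, 4),
    "Prostate_Tumor": (102, 10509, 2),
    "DLBCL": (77, 5469, 2),
}


def genes_per_sample(name):
    """Return the number of genes divided by the number of samples."""
    samples, genes, _classes = DATASETS[name]
    return genes / samples


if __name__ == "__main__":
    # List the data sets from the most to the least gene-heavy per sample.
    for name in sorted(DATASETS, key=genes_per_sample, reverse=True):
        print(f"{name:15s} {genes_per_sample(name):7.1f} genes per sample")
```

Every data set has far more genes than samples, which is the setting in which selecting a small informative subset becomes difficult.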
Parameter settings for EPSO and BPSO
| The number of particles | 100 |
| The number of iterations (generations) | 500 |
| 0.8 | |
| 0.2 | |
| 2 | |
| 2 | |
| Cost | 1 |
| Gamma | 1/k |
Note: k = the number of genes (features) in a subset during a training phase in SVM.
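The SVM settings in the table tie the gamma value to the size of the gene subset being evaluated. The sketch below shows that dependency in plain Python. It is only an illustration under stated assumptions: the table does not name the kernel, so the RBF form used here is an assumption, and `svm_settings` and `rbf_kernel` are hypothetical helper names.

```python
import math


def svm_settings(k):
    """Return cost and gamma as in the parameter table; k = genes in the subset."""
    if k < 1:
        raise ValueError("a gene subset must contain at least one gene")
    return {"C": 1.0, "gamma": 1.0 / k}


def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2).

    Assumption: the kernel type is not stated in the table; RBF is used
    here only because gamma is a parameter of that kernel.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    subset_size = 4  # hypothetical subset of four selected genes
    params = svm_settings(subset_size)
    sample_a = [0.1, 0.5, 0.3, 0.9]  # made-up expression values
    sample_b = [0.2, 0.4, 0.3, 0.7]
    print(params, rbf_kernel(sample_a, sample_b, params["gamma"]))
```

Because gamma shrinks as the subset grows, the kernel width adapts to the number of selected genes instead of being fixed for all candidate subsets.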
Experimental results for each run using EPSO on the 11_Tumors, 9_Tumors, Brain_Tumor1, Brain_Tumor2 and Leukemia1 data sets
| Run# | 11_Tumors #Acc (%) | 11_Tumors #Selected Genes | 9_Tumors #Acc (%) | 9_Tumors #Selected Genes | Brain_Tumor1 #Acc (%) | Brain_Tumor1 #Selected Genes | Brain_Tumor2 #Acc (%) | Brain_Tumor2 #Selected Genes | Leukemia1 #Acc (%) | Leukemia1 #Selected Genes |
| 1 | | | | | | | | | 100 | 3 |
| 2 | 95.98 | 245 | 76.67 | 255 | 93.33 | 11 | 94 | 5 | 100 | 4 |
| 3 | 95.98 | 250 | 75 | 231 | 92.22 | 6 | 94 | 8 | 100 | 3 |
| 4 | 95.40 | 232 | 75 | 237 | 92.22 | 7 | 92 | 4 | 100 | 3 |
| 5 | 95.40 | 241 | 75 | 242 | 92.22 | 8 | 92 | 7 | 100 | 3 |
| 6 | 95.40 | 244 | 75 | 253 | 92.22 | 9 | 92 | 7 | 100 | 4 |
| 7 | 94.83 | 218 | 75 | 255 | 92.22 | 11 | 92 | 7 | 100 | 4 |
| 8 | 94.83 | 229 | 75 | 261 | 91.11 | 3 | 92 | 7 | 100 | 3 |
| 9 | 94.83 | 232 | 73.33 | 238 | 91.11 | 5 | 92 | 8 | 100 | 3 |
| 10 | 94.83 | 243 | 73.33 | 248 | 91.11 | 7 | 90 | 3 | | |
| Average ± S.D. | 95.40 ± 0.61 | 237.70 ± 9.66 | 75 ± 1.11 | 247.10 ± 9.65 | 92.11 ± 0.82 | 7.5 ± 2.51 | 92.40 ± 1.26 | 6.00 ± 1.83 | 100.00 ± 0 | 3.20 ± 0.63 |
Note: Results of the best subsets are written in a bold style. A near-optimal subset that produces the highest classification accuracy with the smallest number of genes is selected as the best subset. #Acc and S.D. denote the classification accuracy and the standard deviation, respectively, whereas #Selected Genes and Run# represent the number of selected genes and a run number, respectively.
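The selection rule in the note (highest classification accuracy first, then the smallest number of genes) can be sketched as a two-key comparison. This is a minimal illustration, not the authors' code: `RUNS` holds only the Brain_Tumor2 values for runs 2 to 10 as they are listed in the table, and `best_subset` is a hypothetical helper name.

```python
# (run number, accuracy in %, number of selected genes) for Brain_Tumor2,
# runs 2 to 10 only, copied from the table.
RUNS = [
    (2, 94, 5),
    (3, 94, 8),
    (4, 92, 4),
    (5, 92, 7),
    (6, 92, 7),
    (7, 92, 7),
    (8, 92, 7),
    (9, 92, 8),
    (10, 90, 3),
]


def best_subset(runs):
    """Pick the run with the highest accuracy; break ties by fewest genes."""
    # Negating the gene count lets a single max() prefer smaller subsets.
    return max(runs, key=lambda run: (run[1], -run[2]))


if __name__ == "__main__":
    run, accuracy, genes = best_subset(RUNS)
    print(f"run {run}: {accuracy}% accuracy with {genes} genes")
```

Among the listed runs, accuracy takes priority: run 10 uses the fewest genes but is not chosen, because its accuracy is lower than that of runs 2 and 3.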
Experimental results for each run using EPSO on the Leukemia2, Lung_Cancer, SRBCT, Prostate_Tumor and DLBCL data sets
| Run# | Leukemia2 #Acc (%) | Leukemia2 #Selected Genes | Lung_Cancer #Acc (%) | Lung_Cancer #Selected Genes | SRBCT #Acc (%) | SRBCT #Selected Genes | Prostate_Tumor #Acc (%) | Prostate_Tumor #Selected Genes | DLBCL #Acc (%) | DLBCL #Selected Genes |
| 1 | | | | | 100 | 27 | | | | |
| 2 | 100 | 4 | 96.06 | 10 | 100 | 11 | 98.04 | 4 | 100 | 4 |
| 3 | 100 | 5 | 96.06 | 12 | 100 | 12 | 98.04 | 6 | 100 | 4 |
| 4 | 100 | 6 | 95.57 | 6 | 98.80 | 8 | 98.04 | 8 | 100 | 5 |
| 5 | 100 | 7 | 95.57 | 7 | 98.80 | 9 | 98.04 | 8 | 100 | 5 |
| 6 | 100 | 7 | 95.57 | 7 | 100 | 48 | 98.04 | 11 | 100 | 5 |
| 7 | 100 | 7 | 95.57 | 8 | 98.80 | 7 | 98.04 | 8 | 100 | 5 |
| 8 | 100 | 8 | 95.57 | 9 | 100 | 12 | 97.06 | 4 | 100 | 5 |
| 9 | 100 | 9 | 95.57 | 11 | 100 | 8 | 97.06 | 6 | 100 | 5 |
| 10 | 100 | 11 | 95.07 | 6 | | | 97.06 | 6 | 100 | 6 |
| Average ± S.D. | 100.00 ± 0 | 6.80 ± 2.20 | | | 99.64 ± 0.58 | 14.90 ± 13.03 | 97.84 ± 0.62 | 6.60 ± 2.17 | 100.00 ± 0 | 4.70 ± 0.82 |
Note: Results of the best subsets shown are written in a bold style. A near-optimal subset that produces the highest classification accuracy with the smallest number of genes is selected as the best subset. #Acc and S.D. denote the classification accuracy and the standard deviation, respectively, whereas #Selected Genes and Run# represent the number of selected genes and a run number, respectively.
Figure 2. The relation between the average of fitness values (10 runs on average) and the number of generations for EPSO and BPSO.
Comparative experimental results of the best subsets produced by EPSO and BPSO
| 11_Tumors | #Acc (%) | 96.55 | 95.40 | 0.73 | |||
| #Genes | 243 | 237.70 | 14.48 | ||||
| #Time | 212.44 | 252.05 | 32.87 | ||||
| 9_Tumors | #Acc (%) | 76.67 | 75.00 | 2.25 | |||
| #Genes | 251 | 247.10 | 9.65 | ||||
| #Time | 17.97 | 17.74 | 0.10 | ||||
| Brain_Tumor1 | #Acc (%) | 92.11 | 0.82 | ||||
| #Genes | 217 | 215 | 7.24 | ||||
| #Time | 24.62 | 24.76 | 0.15 | ||||
| Brain_Tumor2 | #Acc (%) | 1.27 | 92 | 91.40 | |||
| #Genes | 208 | 225.20 | 13.13 | ||||
| #Time | 6 | 6.08 | 0.09 | ||||
| Leukemia1 | #Acc (%) | 98.75 | 0.44 | ||||
| #Genes | 213.00 | 198.20 | 6.48 | ||||
| #Time | 6.97 | 6.99 | 0.05 | ||||
| Leukemia2 | #Acc (%) | ||||||
| #Genes | 195 | 200.80 | 4.24 | ||||
| #Time | 10.05 | 10.16 | 0.15 | ||||
| Lung_Cancer | #Acc (%) | 96.06 | 95.67 | 0.31 | |||
| #Genes | 212 | 212.90 | 9.26 | ||||
| #Time | 145.98 | 145.93 | 1.00 | ||||
| SRBCT | #Acc (%) | 99.64 | 0.58 | ||||
| #Genes | 13.03 | 191 | 195.30 | ||||
| #Time | 22.75 | 22.63 | 0.22 | ||||
| Prostate_Tumor | #Acc (%) | 97.84 | 0.62 | 98.04 | |||
| #Genes | 195 | 199.40 | 3.41 | ||||
| #Time | 22.57 | 22.27 | 0.16 | ||||
| DLBCL | #Acc (%) | ||||||
| #Genes | 193 | 200.90 | 3.78 | ||||
| #Time | 7.72 | 7.75 | 0.10 | ||||
Note: The best result of each evaluation is written in a bold style. The best method for each data set is shown in the shaded cells. The best method is the one with the highest number of best results over all evaluations. #Acc and S.D. denote the classification accuracy and the standard deviation, respectively, whereas #Genes and #Ave represent the number of selected genes and an average, respectively. #Time stands for running time in hours.
A comparison between our method (EPSO) and previous PSO-based methods
| 11_Tumors | Average #Acc (%) | 95.40 | - | - | - | - | - |
| Best #Acc (%) | 96.55 | 93.10 | - | - | - | ||
| Average #Genes | 237.70 | - | - | - | - | - | |
| Best #Genes | 2948 | - | - | 3206 | - | ||
| 9_Tumors | Average #Acc (%) | 75 | - | - | - | - | - |
| Best #Acc (%) | 76.67 | 78.33 | - | - | - | ||
| Average #Genes | 247.10 | - | - | - | - | - | |
| Best #Genes | 1280 | - | - | 2941 | - | ||
| Brain_Tumor1 | Average #Acc (%) | 92.11 | - | - | - | - | |
| Best #Acc (%) | 93.33 | 94.44 | - | - | 91.4 | ||
| Average #Genes | 7.5 | - | - | - | - | - | |
| Best #Genes | 754 | - | - | 2913 | 456 | ||
| Brain_Tumor2 | Average #Acc (%) | 92.4 | - | - | - | - | - |
| Best #Acc (%) | - | - | 92.65 | - | |||
| Average #Genes | 6.0 | - | - | - | - | - | |
| Best #Genes | 1197 | - | - | 5086 | - | ||
| Leukemia1 | Average #Acc (%) | - | 98.61 | 95.10 | - | - | |
| Best #Acc (%) | - | - | |||||
| Average #Genes | - | 7 | 21 | - | - | ||
| Best #Genes | 1034 | - | - | 2577 | 300 | ||
| Leukemia2 | Average #Acc (%) | 100 | - | - | - | - | - |
| Best #Acc (%) | - | - | - | ||||
| Average #Genes | 6.8 | - | - | - | - | - | |
| Best #Genes | 1292 | - | - | 5609 | - | ||
| Lung_Cancer | Average #Acc (%) | 95.67 | - | - | - | - | - |
| Best #Acc (%) | 96.06 | 96.55 | - | - | - | ||
| Average #Genes | 8.3 | - | - | - | - | - | |
| Best #Genes | 1897 | - | - | 6958 | - | ||
| SRBCT | Average #Acc (%) | 99.64 | - | - | - | - | - |
| Best #Acc (%) | - | - | |||||
| Average #Genes | 14.90 | - | - | - | - | - | |
| Best #Genes | 431 | - | - | 1084 | 880 | ||
| Prostate_Tumor | Average #Acc (%) | 97.84 | - | - | - | - | - |
| Best #Acc (%) | 92.16 | - | - | 95.45 | 93.7 | ||
| Average #Genes | 6.6 | - | - | - | - | - | |
| Best #Genes | 1294 | - | - | 5320 | 795 | ||
| DLBCL | Average #Acc (%) | 100 | - | - | - | - | - |
| Best #Acc (%) | - | - | - | ||||
| Average #Genes | 4.7 | - | - | - | - | - | |
| Best #Genes | 1042 | - | - | 2671 | - |
Note: The best result of each evaluation is written in a bold style. No best result is marked for an evaluation whose results were not reported in the previous works, because it could not be compared. The best method for each data set is shown in the shaded cells. The best method is the one with the highest number of best results over all evaluations. #Acc and S.D. denote the classification accuracy and the standard deviation, respectively, whereas #Genes and #Ave represent the number of selected genes and an average, respectively. #Time stands for running time in hours. '-' means that a result is not reported in the previous related work.
IBPSO = An improved binary PSO. PSOTS = A hybrid of PSO and tabu search.
PSOGA = A hybrid of PSO and GA. TS-BPSO = A combination of tabu search and BPSO.
BPSO-CGA = A hybrid of BPSO and a combat genetic algorithm.