Shuaiqun Wang1, Wei Kong1, Weiming Zeng1, Xiaomin Hong1.
Abstract
Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features from a large number of gene expression data. Recently, many researchers have devoted themselves to feature selection using diverse computational intelligence methods. However, in the process of selecting informative genes, many computational methods have difficulty selecting small subsets for cancer classification because of the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm, HICATS, which combines the imperialist competition algorithm (ICA), which performs global search, with tabu search (TS), which conducts fine-tuned search. To verify the performance of HICATS, we tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The proposed method proved superior to other related works, including the conventional version of the binary optimization algorithm, in terms of classification accuracy and the number of selected genes.
Year: 2016 PMID: 27579323 PMCID: PMC4989135 DOI: 10.1155/2016/9721713
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. The flowchart of ICA scheme.
Figure 2. The framework of the proposed algorithm HICATS.
Figure 3. An illustrated example with generated subset and individual representation.
Figure 4. A colony is assimilated by an imperialist.
Figure 5. Producing nearby solutions in TS.
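Figures 3 and 5 refer to an individual representation of a gene subset and to nearby solutions in TS. As a rough illustration only, the sketch below assumes that an individual is a 0/1 mask over genes and that a nearby solution differs from the current one by a single flipped bit, with recently changed positions held in a tabu set. The function names `selected_genes` and `neighbors`, and the single-flip move itself, are assumptions made for this example and are not taken from the paper.

```python
import random

def selected_genes(mask):
    """Return the indices of genes whose bit is 1 in the binary mask."""
    return [i for i, bit in enumerate(mask) if bit == 1]

def neighbors(mask, tabu, n_neighbors, rng):
    """Build nearby solutions by flipping one non-tabu bit each (assumed move)."""
    candidates = [i for i in range(len(mask)) if i not in tabu]
    flips = rng.sample(candidates, min(n_neighbors, len(candidates)))
    result = []
    for i in flips:
        new_mask = list(mask)          # copy so the current solution is untouched
        new_mask[i] = 1 - new_mask[i]  # flip: select or deselect gene i
        result.append((i, new_mask))
    return result

rng = random.Random(0)                 # fixed seed for repeatable runs
mask = [1, 0, 0, 1, 0, 1, 0, 0]        # toy individual over 8 hypothetical genes
print(selected_genes(mask))
for flipped, candidate in neighbors(mask, tabu={0, 3}, n_neighbors=3, rng=rng):
    print(flipped, candidate)
```

In a full method each candidate mask would be scored by a classifier on the selected genes; that step is omitted here because the record does not describe it.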
Cancer-related human gene microarray datasets used in this study.
| Dataset name | Description |
|---|---|
| 9_Tumors | Oligonucleotide microarray gene expression profiles for the chemosensitivity profiles of 232 chemical compounds |
| 11_Tumors | Transcript profiles of 11 common human tumors for carcinomas of the prostate, breast, colorectum, lung, liver, gastroesophagus, pancreas, ovary, kidney, and bladder/ureter |
| Brain_Tumor 1 | DNA microarray gene expression profiles derived from 99 patient samples. The medulloblastomas included primitive neuroectodermal tumors, atypical teratoid/rhabdoid tumors, malignant gliomas, and the medulloblastomas activated by the sonic hedgehog pathway |
| Brain_Tumor 2 | Transcript profiles of four malignant gliomas, including classic glioblastoma, nonclassic glioblastoma, classic oligodendroglioma, and nonclassic oligodendroglioma |
| Leukemia 1 | DNA microarray gene expression profiles of acute myelogenous leukemia (AML) and acute lymphoblastic leukemia (ALL) of B-cell and T-cell |
| Leukemia 2 | Gene expression profiles of a chromosomal translocation to distinguish mixed-lineage leukemia, ALL, and AML |
| Lung_Cancer | Oligonucleotide microarray transcript profiles of 203 specimens, including lung adenocarcinomas, squamous cell lung carcinomas, pulmonary carcinomas, small-cell lung carcinomas, and normal lung tissue |
| SRBCT | cDNA microarray gene expression profiles of small, round blue cell tumors, which include neuroblastoma, rhabdomyosarcoma, non-Hodgkin's lymphoma, and the Ewing family of tumors |
| Prostate_Tumor | cDNA microarray gene expression profiles of prostate tumors. Based on MUC1 and AZGP1 gene expression, the prostate cancer can be distinguished as a subtype associated with an elevated risk of recurrence or with a decreased risk of recurrence |
| DLBCL | DNA microarray gene expression profiles of DLBCL, in which the DLBCL can be identified as cured versus fatal or refractory disease |
Description of gene expression datasets.
| Dataset number | Dataset name | Number of samples | Number of genes | Number of classes |
|---|---|---|---|---|
| 1 | 9_Tumors | 60 | 5726 | 9 |
| 2 | 11_Tumors | 174 | 12533 | 11 |
| 3 | Brain_Tumor 1 | 90 | 5920 | 5 |
| 4 | Brain_Tumor 2 | 50 | 10367 | 4 |
| 5 | Leukemia 1 | 72 | 5327 | 3 |
| 6 | Leukemia 2 | 72 | 11225 | 3 |
| 7 | Lung_Cancer | 203 | 12600 | 5 |
| 8 | SRBCT | 83 | 2308 | 4 |
| 9 | Prostate_Tumor | 102 | 10509 | 2 |
| 10 | DLBCL | 77 | 5469 | 2 |
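The abstract stresses that the number of genes is huge compared to the number of samples. The short sketch below makes that concrete using the sample and gene counts from the table above; the dictionary layout and the helper name `genes_per_sample` are choices made for this illustration only.

```python
# (samples, genes, classes) copied from the dataset description table
datasets = {
    "9_Tumors": (60, 5726, 9),
    "11_Tumors": (174, 12533, 11),
    "Brain_Tumor 1": (90, 5920, 5),
    "Brain_Tumor 2": (50, 10367, 4),
    "Leukemia 1": (72, 5327, 3),
    "Leukemia 2": (72, 11225, 3),
    "Lung_Cancer": (203, 12600, 5),
    "SRBCT": (83, 2308, 4),
    "Prostate_Tumor": (102, 10509, 2),
    "DLBCL": (77, 5469, 2),
}

def genes_per_sample(samples, genes):
    """Ratio of feature count to sample count for one dataset."""
    return genes / samples

for name, (samples, genes, _classes) in datasets.items():
    print(f"{name}: {genes_per_sample(samples, genes):.1f} genes per sample")

gene_counts = [genes for _, genes, _ in datasets.values()]
print("dimension range:", min(gene_counts), "to", max(gene_counts))
```

Every dataset has far more genes than samples, which is the setting in which selecting a small gene subset matters most.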
Parameter settings for HICATS.
| Parameters | Values |
|---|---|
| The number of countries | 15 |
| The number of imperialists | 4 |
| The number of colonies | 11 |
| The number of iterations (generations) | 50 |
| | 0.8 |
| | 0.2 |
The computational results obtained by our proposed algorithm HICATS for 10 independent runs on 11_Tumors, 9_Tumors, and SRBCT datasets.
| Runs | 11_Tumors Acc. (%) | 11_Tumors selected genes | 9_Tumors Acc. (%) | 9_Tumors selected genes | SRBCT Acc. (%) | SRBCT selected genes |
|---|---|---|---|---|---|---|
| 1 | | | 75.00 | 245 | 100 | 10 |
| 2 | 96.55 | 302 | 76.67 | 262 | 100 | 14 |
| 3 | 94.83 | 330 | 75.00 | 233 | 100 | 15 |
| 4 | 95.40 | 268 | 75.00 | 249 | 100 | 13 |
| 5 | 96.55 | 290 | 76.67 | 257 | 100 | |
| 6 | 96.55 | 356 | 81.67 | 242 | 100 | 12 |
| 7 | 94.83 | 323 | | | 100 | 16 |
| 8 | 94.83 | 349 | 76.67 | 238 | 100 | 9 |
| 9 | 95.98 | 275 | 81.67 | 247 | 100 | 9 |
| 10 | 95.40 | 295 | 81.67 | 253 | 100 | 10 |
| Ave. ± SD | 95.86 ± 0.97 | 307.5 ± 30.46 | 78.33 ± 3.33 | 248.5 ± 9.38 | 100 ± 0 | 11.70 ± 2.67 |
The computational results obtained by our proposed algorithm HICATS for 10 independent runs on Leukemia 1, Leukemia 2, and DLBCL datasets.
| Runs | Leukemia 1 Acc. (%) | Leukemia 1 selected genes | Leukemia 2 Acc. (%) | Leukemia 2 selected genes | DLBCL Acc. (%) | DLBCL selected genes |
|---|---|---|---|---|---|---|
| 1 | 100 | 3 | 100 | 8 | 100 | 4 |
| 2 | 100 | 3 | 100 | 10 | 100 | |
| 3 | 100 | 3 | 100 | 6 | 100 | 5 |
| 4 | 100 | 3 | 100 | 6 | 100 | 3 |
| 5 | 100 | 3 | 100 | 7 | 100 | 4 |
| 6 | 100 | 3 | 100 | 8 | 100 | 3 |
| 7 | 100 | 3 | 100 | | 100 | 4 |
| 8 | 100 | 3 | 100 | 7 | 100 | 5 |
| 9 | 100 | 3 | 100 | 5 | 100 | 6 |
| 10 | 100 | 3 | 100 | 6 | 100 | 4 |
| Ave. ± SD | 100 ± 0 | 3 ± 0 | 100 ± 0 | 6.80 ± 1.55 | 100 ± 0 | 4.10 ± 0.99 |
The computational results obtained by our proposed algorithm HICATS for 10 independent runs on Prostate_Tumor, Lung_Cancer, Brain_Tumor 1, and Brain_Tumor 2 datasets.
| Runs | Prostate_Tumor Acc. (%) | Prostate_Tumor selected genes | Lung_Cancer Acc. (%) | Lung_Cancer selected genes | Brain_Tumor 1 Acc. (%) | Brain_Tumor 1 selected genes | Brain_Tumor 2 Acc. (%) | Brain_Tumor 2 selected genes |
|---|---|---|---|---|---|---|---|---|
| 1 | 98.04 | 8 | 95.57 | 6 | | | 94 | 5 |
| 2 | 97.06 | 7 | 96.06 | 6 | 93.33 | 12 | 90 | 6 |
| 3 | | | 96.06 | 9 | 94.44 | 9 | 94 | 7 |
| 4 | 98.04 | 7 | 95.57 | 8 | 91.11 | 10 | 92 | 5 |
| 5 | 97.06 | 6 | 96.06 | 7 | 93.33 | 8 | 92 | 3 |
| 6 | 98.04 | 7 | 97.04 | 11 | 92.22 | 14 | 94 | 8 |
| 7 | 97.06 | 10 | 96.06 | 8 | 91.11 | 7 | 92 | 4 |
| 8 | 98.04 | 8 | 96.06 | 7 | 93.33 | 9 | | |
| 9 | 98.04 | 9 | 96.06 | 9 | 94.44 | 6 | 90 | 9 |
| 10 | 98.04 | 5 | | | 93.33 | 8 | 94 | 8 |
| Ave. ± SD | 97.75 ± 0.47 | 7.2 ± 1.62 | 96.16 ± 0.50 | 7.8 ± 1.55 | 93.10 ± 1.26 | 8.9 ± 2.55 | 92.60 ± 1.60 | 5.8 ± 2.14 |
Classification accuracies and selected genes obtained by HICATS and ICA for gene expression data.
| Datasets | HICATS Acc. (%) | HICATS selected genes | ICA Acc. (%) | ICA selected genes |
|---|---|---|---|---|
| 9_Tumors | | | 76.67 | 282 |
| 11_Tumors | | | 95.98 | 293 |
| Brain_Tumor 1 | | | 91.11 | 8 |
| Brain_Tumor 2 | | | 92 | 5 |
| Leukemia 1 | | | 97.50 | 7 |
| Leukemia 2 | | | 97.32 | 8 |
| Lung_Cancer | | | 95.57 | 12 |
| SRBCT | | | 100 | 10 |
| Prostate_Tumor | | | 97.06 | 6 |
| DLBCL | | | 97.50 | 5 |
Classification accuracies of gene expression data obtained via different classification methods.
| Datasets | KNN (Non-SVM) | NN (Non-SVM) | PNN (Non-SVM) | OVR (MC-SVM) | OVO (MC-SVM) | DAG (MC-SVM) | WW (MC-SVM) | CS (MC-SVM) | HICATS (SVM, OVR) |
|---|---|---|---|---|---|---|---|---|---|
| 9_Tumors | 78.33 | 19.38 | 34.00 | 65.10 | 58.57 | 60.24 | 62.24 | 65.33 | |
| 11_Tumors | 93.10 | 54.14 | 77.21 | 94.68 | 90.36 | 90.36 | 94.68 | 95.30 | |
| Brain_Tumor 1 | 94.44 | 84.72 | 79.61 | 91.67 | 90.56 | 90.56 | 90.56 | 90.56 | |
| Brain_Tumor 2 | 94.00 | 60.33 | 62.83 | 77.00 | 77.83 | 77.83 | 73.33 | 72.83 | |
| Leukemia 1 | 100 | 76.61 | 85.00 | 97.50 | 91.32 | 96.07 | 97.50 | 97.50 | |
| Leukemia 2 | 100 | 91.03 | 83.21 | 97.32 | 95.89 | 95.89 | 95.89 | 95.89 | |
| Lung_Cancer | 96.55 | 87.80 | 85.66 | 96.05 | 95.59 | 95.59 | 95.55 | 96.55 | |
| SRBCT | 100 | 91.03 | 79.50 | 100 | 100 | 100 | 100 | 100 | |
| Prostate_Tumor | 92.16 | 79.18 | 79.18 | 92.00 | 92.00 | 92.00 | 92.00 | 92.00 | |
| DLBCL | 100 | 89.64 | 80.89 | 97.50 | 97.50 | 97.50 | 97.50 | 97.50 | |
(1) Non-SVM: traditional classification method. (2) MC-SVM: multiclass support vector machines. (3) KNN: K-nearest neighbors. (4) NN: backpropagation neural networks. (5) PNN: probabilistic neural networks. (6) OVR: one-versus-the-rest. (7) OVO: one-versus-one. (8) DAG: DAGSVM. (9) WW: method by Weston and Watkins. (10) CS: method by Crammer and Singer. (11) HICATS: improved binary imperialist competition algorithm.
Number of genes selected from each dataset by HICATS and IBPSO.
| Datasets | HICATS genes selected | HICATS fraction of genes selected | IBPSO genes selected | IBPSO fraction of genes selected |
|---|---|---|---|---|
| 9_Tumors | | 0.045 | 2941 | 0.51 |
| 11_Tumors | | 0.022 | 3206 | 0.26 |
| Brain_Tumor 1 | | 0.001 | 754 | 0.13 |
| Brain_Tumor 2 | | 0.0003 | 1197 | 0.12 |
| Leukemia 1 | | 0.0006 | 1034 | 0.19 |
| Leukemia 2 | | 0.0004 | 1292 | 0.12 |
| Lung_Cancer | | 0.0005 | 1897 | 0.15 |
| SRBCT | | 0.004 | 431 | 0.19 |
| Prostate_Tumor | | 0.0005 | 1294 | 0.12 |
| DLBCL | | 0.0005 | 1042 | 0.19 |
| Average | | 0.00097 | 1117.6 | 0.15 |
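The IBPSO proportion column above appears to hold fractions of each dataset's total gene count, not values out of 100. The check below recomputes that column from the IBPSO gene counts in this table and the total gene counts in the dataset description table; the variable names are chosen for this illustration only.

```python
# name: (total genes, IBPSO genes selected, reported IBPSO proportion)
ibpso = {
    "9_Tumors": (5726, 2941, 0.51),
    "11_Tumors": (12533, 3206, 0.26),
    "Brain_Tumor 1": (5920, 754, 0.13),
    "Brain_Tumor 2": (10367, 1197, 0.12),
    "Leukemia 1": (5327, 1034, 0.19),
    "Leukemia 2": (11225, 1292, 0.12),
    "Lung_Cancer": (12600, 1897, 0.15),
    "SRBCT": (2308, 431, 0.19),
    "Prostate_Tumor": (10509, 1294, 0.12),
    "DLBCL": (5469, 1042, 0.19),
}

def fraction_selected(total_genes, selected):
    """Share of the dataset's genes kept by the selector, as a fraction."""
    return selected / total_genes

for name, (total, selected, reported) in ibpso.items():
    computed = round(fraction_selected(total, selected), 2)
    status = "matches" if computed == reported else "differs from"
    print(f"{name}: computed {computed:.2f} {status} reported {reported:.2f}")
```

The same ratio could be applied to the HICATS column once its gene counts are available.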
Figure 6. The convergence graphs of the best and average classification accuracy obtained by the HICATS algorithm on the 9_Tumors and 11_Tumors datasets.
Figure 7. The convergence graphs of the best and average classification accuracy obtained by the HICATS algorithm on the SRBCT and DLBCL datasets.