Chellamuthu Gunavathi, Kandasamy Premalatha.
Abstract
Feature selection is a central research topic in bioinformatics for cancer classification, where it is used to pick the informative genes out of the thousands measured on a microarray. In this work the genes are first ranked by T-statistics, signal-to-noise ratio (SNR), and F-test values. A swarm intelligence (SI) technique then searches the top-m ranked genes for an informative subset, and the selected genes are used for classification. This paper proposes shuffled frog leaping with Lévy flight (SFLLF) for feature selection; the Lévy flight is added to avoid premature convergence of the shuffled frog leaping (SFL) algorithm. Four SI techniques, particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF, are used to identify informative genes, and the k-nearest neighbour (k-NN) technique classifies the samples. The approach is applied to 10 benchmark datasets and compared across the SI techniques. The experimental results show that the k-NN classifier with SFLLF feature selection outperforms the same classifier with PSO, CS, and SFL.
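The ranking step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the Golub-style SNR definition |mean1 - mean2| / (sd1 + sd2), two classes labelled 0 and 1, and an expression matrix stored as a list of samples. The function names `snr_score` and `top_m_genes` are hypothetical.

```python
from statistics import mean, stdev


def snr_score(class1, class2):
    """Signal-to-noise ratio of one gene (assumed Golub-style definition)."""
    denom = stdev(class1) + stdev(class2)
    if denom == 0:
        return 0.0  # constant gene in both classes: treat as uninformative
    return abs(mean(class1) - mean(class2)) / denom


def top_m_genes(expr, labels, m):
    """Return the indices of the m highest-scoring genes.

    expr   : list of samples, each a list of expression values (one per gene)
    labels : list of 0/1 class labels, one per sample
    """
    scored = []
    for g in range(len(expr[0])):
        c1 = [row[g] for row, y in zip(expr, labels) if y == 0]
        c2 = [row[g] for row, y in zip(expr, labels) if y == 1]
        scored.append((snr_score(c1, c2), g))
    # Highest score first; ties broken by gene index for reproducibility.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scored[:m]]
```

A T-statistic or F-test ranking would replace only `snr_score`; the top-m selection stays the same.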
Year: 2014 PMID: 25157377 PMCID: PMC4137534 DOI: 10.1155/2014/693831
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Pseudocode 1: Pseudocode for PSO.
Pseudocode 2: Pseudocode for CS.
Pseudocode 3: Pseudocode for SFL.
Pseudocode 4: Pseudocode for SFLLF.
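The pseudocode bodies are not reproduced in this record. As a rough illustration of the Lévy flight ingredient only, the sketch below draws heavy-tailed steps with Mantegna's algorithm, a common way to approximate Lévy-stable steps. The exponent `beta = 1.5`, the `scale` factor, and the helper `levy_perturb` are assumptions made for illustration; how SFLLF actually applies the step to a frog's position is not shown here.

```python
import math
import random


def mantegna_sigma(beta):
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step u / |v|^(1/beta)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_perturb(position, scale=0.01, beta=1.5, rng=random):
    """Hypothetical helper: add an independent Lévy step to each coordinate."""
    return [x + scale * levy_step(beta, rng) for x in position]
```

Most draws are small, but occasional large jumps occur, which is the property usually relied on to escape local optima.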
Figure 1: Schematic representation of the proposed method.
Figure 2: Candidate solution representation.
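The figures themselves are not available in this record. One plausible way to score a candidate solution is sketched below, assuming a candidate is a list of gene indices and its fitness is the k-NN accuracy on held-out samples using Euclidean distance. The representation, the train/test split, and the names `knn_predict` and `subset_accuracy` are assumptions, not taken from the paper.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training samples closest to query."""
    dists = sorted((math.dist(x, query), i) for i, x in enumerate(train_x))
    votes = Counter(train_y[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]


def subset_accuracy(train_x, train_y, test_x, test_y, genes, k=5):
    """Accuracy (%) of k-NN when only the selected genes are used."""
    def project(row):
        return [row[g] for g in genes]

    train_proj = [project(r) for r in train_x]
    correct = sum(
        knn_predict(train_proj, train_y, project(q), k) == y
        for q, y in zip(test_x, test_y)
    )
    return 100.0 * correct / len(test_y)
```

An SI technique would call `subset_accuracy` as its fitness function and search over `genes`.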
Microarray gene expression datasets.
| Dataset name | Number of genes | Class 1 | Class 2 | Total samples |
|---|---|---|---|---|
| CNS | 7129 | Survivors (21) | Failures (39) | 60 |
| DLBCL Harvard | 7129 | DLBCL (58) | FL (19) | 77 |
| DLBCL Outcome | 7129 | Cured (32) | Fatal (26) | 58 |
| Lung Cancer Michigan | 7129 | Tumor (86) | Normal (10) | 96 |
| Ovarian Cancer | 15154 | Normal (91) | Cancer (162) | 253 |
| Prostate Outcome | 12600 | Nonrelapse (13) | Relapse (8) | 21 |
| AML-ALL | 7129 | ALL (47) | AML (25) | 72 |
| Colon Tumor | 2000 | Tumor (40) | Healthy (22) | 62 |
| Lung Harvard2 | 12533 | ADCA (150) | Mesothelioma (31) | 181 |
| Prostate | 12600 | Normal (59) | Tumor (77) | 136 |
Parameters and their values.
| Parameter | Value |
|---|---|
| Particle/egg/frog size | 10, 50, 100 |
| Number of memeplexes | 10 |
| Number of frogs in each memeplex | 5 |
| Population size | 50 |
| Maximum number of generations | 200 |
| Shuffling iteration | 20 |
|  | 0.9 |
|  | 2.1 |
|  | 2.1 |
|  | 1 |
|  | 1.5 |
| Distance measure in k-NN | Euclidean |
|  | 5 |
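The table's 10 memeplexes of 5 frogs each account for the population of 50. A minimal sketch of the partitioning step from the standard SFL formulation follows: frogs are sorted by fitness and dealt round-robin into memeplexes. Whether the paper uses exactly this scheme is an assumption, and `partition_into_memeplexes` is a hypothetical name.

```python
def partition_into_memeplexes(fitness, n_memeplexes):
    """Split frog indices into memeplexes, best frogs dealt out first.

    fitness      : list of fitness values, one per frog (higher is better)
    n_memeplexes : number of memeplexes to form
    """
    # Rank frogs from best to worst.
    order = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    # Frog at rank r goes to memeplex r mod n_memeplexes.
    return [order[k::n_memeplexes] for k in range(n_memeplexes)]
```

With 50 frogs and `n_memeplexes=10`, each memeplex receives 5 frogs spread across the fitness ranking, so no memeplex holds only weak candidates.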
Figure 3: Classification accuracy using particle swarm optimization.
Figure 4: Classification accuracy using cuckoo search.
Figure 5: Classification accuracy using shuffled frog leaping.
Figure 6: Classification accuracy using shuffled frog leaping with Lévy flight.
Comparison of classification accuracies (%) obtained from different SI techniques.
| Dataset name | PSO | CS | SFL | SFLLF |
|---|---|---|---|---|
| CNS | 100~# | 87.5~# | 93.75~# | 100~# |
| DLBCL Harvard | 100~# | 100~ | 96~# | 100~# |
| DLBCL Outcome | 95.45# | 77.27+~# | 81.81+~# | 95.45# |
| Lung Cancer Michigan | 100+~# | 100~# | 100+# | 100+~# |
| Ovarian Cancer | 100~# | 100# | 100# | 100~# |
| Prostate Outcome | 100# | 85.71+~ | 85.71~# | 100# |
| AML-ALL | 100~# | 100# | 100# | 100~# |
| Colon Tumor | 95~ | 95~ | 95~ | 100~ |
| Lung Harvard2 | 100~# | 100~# | 100~# | 100~# |
| Prostate | 97.56# | 92.68# | 92.68# | 97.56# |
Markers after each accuracy give the ranking statistic: + T-statistics; ~ SNR; # F-test.
Comparison of classification accuracy with other methods for CNS.
| Reference (year) | Methodology | Maximum classification accuracy in percentage |
|---|---|---|
| Alonso-González et al. (2012) [ | Combination of attribute selection and classification algorithm | 75.49 |
| Liu et al. (2010) [ | EGS (ensemble gene selection) method | 98.33 |
| This work | PSO | 100 |
| This work | Cuckoo search | 87.5 |
| This work | SFL | 93.75 |
| This work | SFLLF | 100 |
Comparison of classification accuracy with other methods for DLBCL Harvard.
| Reference (year) | Methodology | Maximum classification accuracy in percentage |
|---|---|---|
| Huang et al. (2012) [ | iSELF (improved semisupervised local Fisher) discriminant analysis | 94.67 |
| Alonso-González et al. (2012) [ | Combination of attribute selection and classification algorithm | 100 |
| Dagliyan et al. (2011) [ | HBE (hyperbox enclosure) method | 96.1 |
| Chuang et al. (2011) [ | Correlation-based feature selection (CFS) and Taguchi genetic algorithm (TGA) | 100 |
| Chopra et al. (2010) [ | Based on gene doublets | 98.1 |
| Martinez et al. (2010) [ | Swarm intelligence feature selection algorithm | 100 |
| This work | PSO | 100 |
| This work | Cuckoo search | 100 |
| This work | SFL | 96 |
| This work | SFLLF | 100 |
Comparison of classification accuracy with other methods for DLBCL Outcome.
| Reference (year) | Methodology | Maximum classification accuracy in percentage |
|---|---|---|
| Alonso-González et al. (2012) [ | Combination of attribute selection and classification algorithm | 67.84 |
| Wang and Simon (2011) [ | Univariate class discrimination with single gene | 74 |
| This work | PSO | 95.45 |
| This work | Cuckoo search | 77.27 |
| This work | SFL | 81.81 |
| This work | SFLLF | 95.45 |
Comparison of classification accuracy with other methods for Lung Cancer Michigan.
| Reference (year) | Methodology | Maximum classification accuracy in percentage |
|---|---|---|
| Alonso-González et al. (2012) [ | Combination of attribute selection and classification algorithm | 100 |
| Liu et al. (2010) [ | EGS (ensemble gene selection) method | 89.58 |
| This work | PSO | 100 |
| This work | Cuckoo search | 100 |
| This work | SFL | 100 |
| This work | SFLLF | 100 |
Comparison of classification accuracy with other methods for Ovarian Cancer.
| Reference (year) | Methodology | Maximum classification accuracy in percentage |
|---|---|---|
| Alonso-González et al. (2012) [ | Combination of attribute selection and classification algorithm | 100 |
| This work | PSO | 100 |
| This work | Cuckoo search | 100 |
| This work | SFL | 100 |
| This work | SFLLF | 100 |
Comparison of classification accuracy with other methods for Prostate Outcome.
| Reference (year) | Methodology | Maximum classification accuracy in percentage |
|---|---|---|
| Dagliyan et al. (2011) [ | HBE (hyperbox enclosure) method | 95.24 |
| This work | PSO | 100 |
| This work | Cuckoo search | 85.71 |
| This work | SFL | 85.71 |
| This work | SFLLF | 100 |
Comparison of classification accuracy with other methods for AML-ALL.
| Reference (year) | Methodology | Maximum classification accuracy in percentage |
|---|---|---|
| Alonso-González et al. (2012) [ | Combination of attribute selection and classification algorithm | 100 |
| Maji (2012) [ | Mutual information | 100 |
| Chandra and Gupta (2011) [ | Effective range based gene selection | 98.61 |
| Chuang et al. (2011) [ | Correlation-based feature selection (CFS) and Taguchi genetic algorithm (TGA) | 100 |
| Dagliyan et al. (2011) [ | HBE (hyperbox enclosure) method | 100 |
| Martinez et al. (2010) [ | Swarm intelligence feature selection algorithm | 100 |
| Liu et al. (2010) [ | EGS (ensemble gene selection) method | 100 |
| Chopra et al. (2010) [ | Based on gene doublets | 100 |
| Wang and Gotoh (2009) [ | Rough sets | 100 |
| Vanichayobon et al. (2007) [ | Gene selection step and clustering cancer data by using self-organizing map | 100 |
| Jirapech-Umpai and Sturat (2005) [ | Evolutionary algorithm | 98.24 |
| This work | PSO | 100 |
| This work | Cuckoo search | 100 |
| This work | SFL | 100 |
| This work | SFLLF | 100 |
Comparison of classification accuracy with other methods for Colon Tumor.
| Reference (year) | Methodology | Maximum classification accuracy in percentage |
|---|---|---|
| Alonso-González et al. (2012) [ | Combination of attribute selection and classification algorithm | 88.41 |
| Maji (2012) [ | Mutual information | 100 |
| Chandra and Gupta (2011) [ | Effective range based gene selection | 83.87 |
| Li et al. (2011) [ | Margin influence analysis with SVM | 100 |
| Chopra et al. (2010) [ | Based on gene doublets | 91.1 |
| This work | PSO | 95 |
| This work | Cuckoo search | 95 |
| This work | SFL | 95 |
| This work | SFLLF | 100 |
Comparison of classification accuracy with other methods for Lung Harvard2.
| Reference (year) | Methodology | Maximum classification accuracy in percentage |
|---|---|---|
| Alonso-González et al. (2012) [ | Combination of attribute selection and classification algorithm | 99.63 |
| Chandra and Gupta (2011) [ | Effective range based gene selection | 100 |
| Wang and Simon (2011) [ | Univariate class discrimination with single gene | 99 |
| Chopra et al. (2010) [ | Based on gene doublets | 100 |
| Wang and Gotoh (2009) [ | Rough sets | 97.32 |
| Vanichayobon et al. (2007) [ | Gene selection step and clustering cancer data by using self-organizing map | 100 |
| This work | PSO | 100 |
| This work | Cuckoo search | 100 |
| This work | SFL | 100 |
| This work | SFLLF | 100 |
Comparison of classification accuracy with other methods for Prostate.
| Reference (year) | Methodology | Maximum classification accuracy in percentage |
|---|---|---|
| Wang and Gotoh (2009) [ | Rough sets | 91.18 |
| This work | PSO | 97.56 |
| This work | Cuckoo search | 92.68 |
| This work | SFL | 92.68 |
| This work | SFLLF | 97.56 |