Hualong Yu, Guochang Gu, Haibo Liu, Jing Shen, Jing Zhao.
Abstract
Microarray data are often extremely asymmetric in dimensionality: thousands or even tens of thousands of genes, but only a few hundred samples or fewer. Such extreme asymmetry between the number of genes and the number of samples can lead to inaccurate disease diagnosis in the clinic. Selecting a small set of marker genes has been shown to improve classification accuracy. In this paper, a simple modified ant colony optimization (ACO) algorithm is proposed to select tumor-related marker genes, and a support vector machine (SVM) is used as the classifier to evaluate the performance of the extracted gene subset. Experimental results on several benchmark tumor microarray datasets show that the proposed approach achieves better recognition with fewer marker genes than many other methods. These results indicate that the modified ACO is a useful tool for selecting marker genes and mining high-dimensional data. Copyright 2009 Beijing Genomics Institute. Published by Elsevier Ltd. All rights reserved.
Year: 2009 PMID: 20172493 PMCID: PMC5054414 DOI: 10.1016/S1672-0229(08)60050-9
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1. The feature selection procedure of the modified ACO algorithm. A 1 indicates that the corresponding gene is selected; a 0 indicates that it is not.
Figure 2. Flow chart of the marker gene selection algorithm based on modified ACO and SVM.
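The binary encoding described in Figure 1 can be illustrated with a minimal sketch in which each ant picks pathway 0 or pathway 1 for every gene. The proportional choice rule, the function name `construct_solution`, and the toy gene count are assumptions made for illustration; they are not taken from the paper.

```python
import random

def construct_solution(tau0, tau1, rng):
    """Build one ant's binary gene mask (1 = gene selected, 0 = not selected).

    tau0[i] and tau1[i] are the pheromone levels on pathway 0 and pathway 1
    of gene i. Choosing pathway 1 with probability tau1 / (tau0 + tau1) is
    an assumed rule, not necessarily the one used by the authors.
    """
    mask = []
    for t0, t1 in zip(tau0, tau1):
        p_select = t1 / (t0 + t1)
        mask.append(1 if rng.random() < p_select else 0)
    return mask

rng = random.Random(0)      # fixed seed so the sketch is repeatable
n_genes = 10                # toy size; real datasets have thousands of genes
tau0 = [1.0] * n_genes      # initial pheromone of pathway 0 (parameter table)
tau1 = [1.0] * n_genes      # initial pheromone of pathway 1 (parameter table)

mask = construct_solution(tau0, tau1, rng)
selected = [i for i, bit in enumerate(mask) if bit == 1]
```

With equal initial pheromone on both pathways, each gene starts with an even chance of being selected under this assumed rule; lowering the pathway 1 pheromone biases ants toward smaller subsets.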
Parameters used for experiments
| Common parameters for ACO | Value |
|---|---|
| population size | 50 |
| the number of iterations | 50 |
| the weight factor of updating pheromone | 5 |
| evaporation of pheromone trails | 0.2 |
| the weight factor of the number of marker genes | 0.005 |
| the initial pheromone of pathway 0 | 1.0 |
| the initial pheromone of pathway 1 | 1.0 |
| Common parameters for MMACO | |
| the lower boundary of pheromone | 0.3 |
| the upper boundary of pheromone | 1.5 |
| Common parameters for SVM | |
| the parameter of RBF kernel function | 5 |
| the penalty factor | 500 |
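To show how the tabulated parameters might interact, the following sketch applies evaporation, a deposit on the pathways used by the best ant, and the MMACO pheromone bounds. The exact fitness and update formulas are not given in this excerpt, so the forms of `fitness` and `update_pheromone`, and the `deposit` argument, are hypothetical; only the numeric constants come from the table.

```python
RHO = 0.2                     # evaporation of pheromone trails
GENE_WEIGHT = 0.005           # weight factor of the number of marker genes
TAU_MIN, TAU_MAX = 0.3, 1.5   # MMACO lower and upper pheromone boundaries

def fitness(accuracy, n_selected):
    # Assumed form: reward accuracy and penalize larger gene subsets.
    return accuracy - GENE_WEIGHT * n_selected

def update_pheromone(tau0, tau1, best_mask, deposit, bounded=True):
    """Evaporate all trails, then reinforce the pathways of the best ant.

    `deposit` is a hypothetical amount; how the authors derive it from the
    pheromone weight factor (5) and the fitness is not stated here.
    With bounded=True the trails are clamped to [TAU_MIN, TAU_MAX].
    """
    for i, bit in enumerate(best_mask):
        tau0[i] *= (1.0 - RHO)
        tau1[i] *= (1.0 - RHO)
        if bit == 1:
            tau1[i] += deposit
        else:
            tau0[i] += deposit
        if bounded:
            tau0[i] = min(TAU_MAX, max(TAU_MIN, tau0[i]))
            tau1[i] = min(TAU_MAX, max(TAU_MIN, tau1[i]))

tau0 = [1.0, 1.0]
tau1 = [1.0, 1.0]
update_pheromone(tau0, tau1, best_mask=[1, 0], deposit=0.5)
```

The clamping step is what distinguishes the bounded variant in this sketch: no pathway can drop below 0.3 or rise above 1.5, so neither choice for a gene is ever ruled out entirely.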
Figure 3. Fitness variation curves for GA (A), ACO (B) and MMACO (C).
Figure 4. Fitness variation curves for GA (A), ACO (B) and MMACO (C) with different initial pheromone values for pathways 0 and 1 in ACO and MMACO (1.0 for pathway 0 and 0.5 for pathway 1) and different initial probabilities for the binary characters in GA (the probability of 0 is twice that of 1).
Detailed description of top 10 marker genes extracted by GA, ACO and MMACO
| Rank | Gene ID | Accession No. | Times | Description |
|---|---|---|---|---|
| 1 | 1423 | J02854 | 71 | Myosin regulatory light chain 2, smooth muscle isoform (human); contains element TAR1 repetitive element |
| 2 | 1772 | H08393 | 63 | Collagen |
| 3 | 765 | M76378 | 55 | Human cysteine-rich protein (CRP) gene, exons 5 and 6 |
| 4 | 515 | T56604 | 50 | Tubulin |
| 5 | 625 | X12671 | 49 | Human gene for heterogeneous nuclear ribonucleoprotein (hnRNP) core protein A1 |
| 6 | 1067 | T70062 | 45 | Human nuclear factor NF45 mRNA, complete cds |
| 7 | 1406 | U26312 | 44 | Human heterochromatin protein HP1Hs- |
| 8 | 992 | X12466 | 41 | Human mRNA for snRNP E protein |
| 9 | 241 | M36981 | 41 | Human putative NDP kinase (nm23-H2S) mRNA, complete cds |
| 10 | 780 | H40095 | 39 | Macrophage migration inhibitory factor (human) |
Other benchmark tumor microarray datasets
| Dataset | Genes | Samples | Classes | Reference |
|---|---|---|---|---|
| Leukemia | 7,129 | 72 | 2 | Golub |
| DLBCL | 4,026 | 47 | 2 | Alizadeh |
| NCI60 | 5,726 | 60 | 9 | Staunton |
| Brain | 5,920 | 90 | 5 | Pomeroy |
Related works on five datasets
Values are LOOCV predictive accuracy (number of selected marker genes).

| Method | Colon | Leukemia | DLBCL | NCI60 | Brain |
|---|---|---|---|---|---|
| ACO/SVM | 91.5%±1.5% (7.5) | 100% (8.6) | 100% (7.2) | 82.4%±1.9% (8.8) | 90.7%±1.9% (7.9) |
| MMACO/SVM | 95.0%±0.3% (10.8) | 100% (6.3) | 100% (5.7) | 84.2%±1.8% (12.6) | 91.0%±1.4% (8.1) |
| SNR (top-ranked 100)/SVM | 87.1% (100) | 97.2% (100) | 95.7% (100) | 71.7% (100) | 84.4% (100) |
| GA/SVM | 90.2%±0.5% (28.4) | 100% (17.6) | 100% (15.4) | 80.7%±2.2% (23.6) | 88.9%±1.6% (25.1) |
| SVM | 90.3% (2,000) | 94.1% (500) | – | – | – |
| Bagboost | 83.9% (200) | 95.9% (200) | 98.4% (200) | – | 76.1% (200) |
| SWKC | 88.4% (15.0) | 98.2% (14.2) | 99.3% (14.1) | 75.2% (32.5) | 81.9% (41.5) |
| OVR-SVM | – | – | – | 65.2% (5,726) | 91.7% (5,920) |
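The accuracies in the table above are leave-one-out cross-validation (LOOCV) estimates on a selected gene subset. The sketch below shows how such an estimate is computed in general. To stay dependency-free it uses a 1-nearest-neighbor classifier as a stand-in for the SVM used in the paper, and the toy expression values, `loocv_accuracy`, and `nearest_neighbor` are all illustrative assumptions.

```python
def loocv_accuracy(samples, labels, selected, classify):
    """Leave-one-out accuracy using only the gene indices in `selected`."""
    correct = 0
    for i in range(len(samples)):
        # Hold out sample i; train on every other sample.
        train_x = [[s[g] for g in selected]
                   for j, s in enumerate(samples) if j != i]
        train_y = [y for j, y in enumerate(labels) if j != i]
        test_x = [samples[i][g] for g in selected]
        if classify(train_x, train_y, test_x) == labels[i]:
            correct += 1
    return correct / len(samples)

def nearest_neighbor(train_x, train_y, x):
    # Stand-in classifier (NOT the paper's SVM): label of the closest sample.
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_x]
    return train_y[dists.index(min(dists))]

# Toy data: gene 0 separates the two classes, gene 1 does not.
samples = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [1.0, 0.5]]
labels = [0, 0, 1, 1]

acc_informative = loocv_accuracy(samples, labels, [0], nearest_neighbor)
acc_noise = loocv_accuracy(samples, labels, [1], nearest_neighbor)
```

Any classifier with the same `classify(train_x, train_y, x)` shape, including an RBF-kernel SVM wrapper, could be passed in place of `nearest_neighbor`.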