Russul Alanni¹, Jingyu Hou², Hasseeb Azzawi², Yong Xiang².
Abstract
BACKGROUND: Microarray datasets are an important medical diagnostic tool because they represent the state of a cell at the molecular level. The microarray datasets available for classifying cancer types generally have a small number of samples compared with the large number of genes involved. This is known as the curse of dimensionality, and it is a challenging problem. Gene selection is a promising approach to this problem and plays an important role in developing efficient cancer classifiers, because only a small number of genes are relevant to the classification problem. Gene selection addresses several problems in microarray datasets, such as removing irrelevant and noisy genes and selecting the most relevant genes to improve classification results.
Keywords: Gene expression programming; Gene selection; Microarray cancer dataset; Support vector machine
Year: 2019 PMID: 30646919 PMCID: PMC6334429 DOI: 10.1186/s12920-018-0447-6
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 Flowchart of GEP modelling
Fig. 2 Example of GSP mutation
Fig. 3 Recombination of 3 elements in gene 1 (from position 0 to 2)
Fig. 4 Example of GSP recombination
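As a point of reference for Figs. 2 to 4, the sketch below shows a standard gene expression programming (GEP) point mutation and a segment exchange between two parents. It is an illustrative assumption, not the GSP operators from the paper, whose exact rules are given only in the figures. Each chromosome is a list of genes, and each gene is a list of symbols whose first `h` positions (the head) may hold functions or terminals, while the remaining positions (the tail) hold terminals only. The function set `+ - / Q` mirrors the GSP parameter table, the terminal names `g12`, `g87`, `g301` and `g455` are hypothetical gene identifiers, and `recombine_segment` exchanges positions 0 to 2 of one gene, as in the Fig. 3 caption, with zero-based gene indexing assumed.

```python
import random

FUNCTIONS = ["+", "-", "/", "Q"]  # function set from the GSP parameter table ("/" stands for ÷)
TERMINALS = ["g12", "g87", "g301", "g455"]  # hypothetical gene identifiers


def point_mutation(chromosome, head_len, rate, rng):
    """Standard GEP point mutation (assumed, not the GSP operator).

    Each symbol mutates with probability `rate`. Head positions may become a
    function or a terminal; tail positions may only become a terminal, which
    keeps every gene decodable.
    """
    mutated = []
    for gene in chromosome:
        new_gene = list(gene)
        for i in range(len(new_gene)):
            if rng.random() < rate:
                pool = FUNCTIONS + TERMINALS if i < head_len else TERMINALS
                new_gene[i] = rng.choice(pool)
        mutated.append(new_gene)
    return mutated


def recombine_segment(parent_a, parent_b, gene_idx, start, end):
    """Swap symbols start..end (inclusive) of one gene between two parents.

    The parents are copied, so the inputs are left unchanged.
    """
    child_a = [list(g) for g in parent_a]
    child_b = [list(g) for g in parent_b]
    seg_a = child_a[gene_idx][start:end + 1]
    child_a[gene_idx][start:end + 1] = child_b[gene_idx][start:end + 1]
    child_b[gene_idx][start:end + 1] = seg_a
    return child_a, child_b


# Two one-gene chromosomes with head length h = 3 and tail length h + 1 = 4
parent_a = [["+", "g12", "Q", "g87", "g301", "g455", "g12"]]
parent_b = [["-", "Q", "g301", "g455", "g87", "g12", "g87"]]

child_a, child_b = recombine_segment(parent_a, parent_b, gene_idx=0, start=0, end=2)
print(child_a[0])  # ['-', 'Q', 'g301', 'g87', 'g301', 'g455', 'g12']

# Mutation rate 0.044 as in the GSP parameter table
mutant = point_mutation(parent_a, head_len=3, rate=0.044, rng=random.Random(42))
```

Because both parents share the same head length, exchanging any aligned segment keeps head symbols in the head and tail symbols in the tail, so both children remain valid genes.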
Description of the experimental datasets
| No. | Dataset | Samples | Attributes | Classes | Reference |
|---|---|---|---|---|---|
| 1 | 11_Tumors | 174 | 12533 | 11 | [ |
| 2 | 9_Tumors | 60 | 5726 | 9 | [ |
| 3 | Brain_Tumor1 | 90 | 5920 | 5 | [ |
| 4 | Brain_Tumor2 | 50 | 10367 | 4 | [ |
| 5 | Leukemia 1 | 72 | 5327 | 3 | [ |
| 6 | Leukemia 2 | 72 | 11225 | 3 | [ |
| 7 | Lung_Cancer | 203 | 12600 | 5 | [ |
| 8 | SRBCT | 82 | 2308 | 4 | [ |
| 9 | Prostate_Tumor | 102 | 10509 | 2 | [ |
| 10 | DLBCL | 77 | 5469 | 2 | [ |
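The abstract's curse of dimensionality is visible directly in this table: every dataset has far more attributes (genes) than samples. The short Python sketch below computes the genes-per-sample ratio from the table's own numbers; the dictionary is transcribed from the table and nothing else is assumed.

```python
# (samples, attributes) per dataset, transcribed from the table above
DATASETS = {
    "11_Tumors": (174, 12533),
    "9_Tumors": (60, 5726),
    "Brain_Tumor1": (90, 5920),
    "Brain_Tumor2": (50, 10367),
    "Leukemia 1": (72, 5327),
    "Leukemia 2": (72, 11225),
    "Lung_Cancer": (203, 12600),
    "SRBCT": (82, 2308),
    "Prostate_Tumor": (102, 10509),
    "DLBCL": (77, 5469),
}

# Number of genes per sample: the larger it is, the more severe the dimensionality problem
ratios = {name: genes / samples for name, (samples, genes) in DATASETS.items()}

for name, r in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {r:7.1f} genes per sample")
```

The smallest ratio is SRBCT at roughly 28 genes per sample and the largest is Brain_Tumor2 at roughly 207, so every dataset here has many more candidate genes than training examples.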
Parameters used in GSP
| Parameter | Setting |
|---|---|
| Function set | +, -, ÷, Q |
| Terminal set | Informative genes selected from the microarray dataset by systematic selection |
| Number of chromosomes | 200 |
| Maximum number of generations | 2000 |
| Genetic operators: mutation | 0.044 |
| Genetic operators: recombination | 0.3 |
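In GEP, each gene is a fixed-length string with a head of length h, which may hold functions or terminals, and a tail of length t = h(n - 1) + 1, where n is the largest function arity. The tail holds only terminals, so every gene decodes to a valid expression tree. With a function set whose largest arity is 2, t = h + 1 and each gene has 2h + 1 symbols. The sketch below decodes a gene written in Karva notation (read level by level, breadth-first) and lists the terminals its expressed part actually uses, which in a gene selection setting would be the selected genes. Two assumptions are labeled in the code: `Q` is taken to be a unary function (the square root, following the usual GEP convention), and the terminal names are hypothetical gene identifiers. It is a sketch of standard GEP decoding, not the GSP implementation.

```python
# Assumption: Q is unary (square root in GEP convention); "/" stands for ÷
ARITY = {"+": 2, "-": 2, "/": 2, "Q": 1}


def tail_length(head_len, max_arity=2):
    """GEP tail length t = h * (n - 1) + 1."""
    return head_len * (max_arity - 1) + 1


def decode_karva(gene):
    """Decode a Karva-notation gene breadth-first.

    Returns (nodes, children): the expressed prefix of the gene and, for each
    node index, the list of its child indices. Symbols after the expressed
    prefix (the non-coding region) are ignored.
    """
    nodes = [gene[0]]
    children = [[]]
    next_pos = 1
    i = 0
    while i < len(nodes):
        for _ in range(ARITY.get(nodes[i], 0)):  # terminals have arity 0
            children[i].append(len(nodes))
            nodes.append(gene[next_pos])
            children.append([])
            next_pos += 1
        i += 1
    return nodes, children


def used_terminals(gene):
    """Terminals (gene identifiers) that appear in the expressed part of the gene."""
    nodes, _ = decode_karva(gene)
    return sorted({s for s in nodes if s not in ARITY})


h = 3
gene = ["+", "Q", "g87", "g12", "g301", "g455", "g12"]  # hypothetical gene of length 2h + 1
assert len(gene) == h + tail_length(h)  # sanity check on the head/tail layout

print(decode_karva(gene)[0])  # ['+', 'Q', 'g87', 'g12'], i.e. sqrt(g12) + g87
print(used_terminals(gene))   # ['g12', 'g87']
```

The tail-length rule guarantees that `decode_karva` never reads past the end of the gene, even when every head position holds a binary function.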
Results for different settings of g and h. Bold font indicates the best results
| g | h | | | | | | |
|---|---|---|---|---|---|---|---|
| 1 | 10 | 87.587 | 3.423 | 5.567 | 2.16 | 151.1202 | 0.00594 |
| | 15 | 94.787 | 2.757 | 10.067 | 1.977 | 154.1243 | 0.00334 |
| | 20 | 96.317 | 2.147 | 11 | 1.6 | 157.7917 | 0.00277 |
| | average | 92.897 | 2.776 | 8.878 | 1.912 | 154.3454 | 0.004016 |
| 2 | 10 | 97.453 | 1.033 | 11.633 | 1.637 | 266.7896 | 0.00162 |
| | 15 | | | | 0.973 | 275.1234 | 0.00146 |
| | 20 | | | 13.633 | 0.987 | 280.1246 | 0.00149 |
| | average | 98.847 | 0.467 | 12.844 | 1.199 | 274.0125 | 0.001522 |
| 3 | 10 | 98.397 | 0.853 | 13.133 | 0.9737 | 381.0373 | 0.00445 |
| | 15 | 99.21 | 0.19 | 13.3 | 0.973 | 382.3714 | 0.00143 |
| | 20 | 99.21 | 0.177 | 13.3 | 0.973 | 388.7084 | 0.00133 |
| | average | 98.939 | 0.407 | 13.244 | 0.973 | 384.039 | 0.002404 |
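The "average" rows appear to be the arithmetic means of the three h settings within each g. The sketch below checks this for g = 1 and g = 3, the two blocks whose rows are complete; the values are transcribed from the table, and the columns are left unnamed because their header cells were not recovered. A relative tolerance of 0.1% is assumed to absorb the rounding in the published figures.

```python
import math

# Rows for h = 10, 15, 20 and the reported "average" row, transcribed from the table
BLOCKS = {
    1: {
        "rows": [
            [87.587, 3.423, 5.567, 2.16, 151.1202, 0.00594],
            [94.787, 2.757, 10.067, 1.977, 154.1243, 0.00334],
            [96.317, 2.147, 11, 1.6, 157.7917, 0.00277],
        ],
        "average": [92.897, 2.776, 8.878, 1.912, 154.3454, 0.004016],
    },
    3: {
        "rows": [
            [98.397, 0.853, 13.133, 0.9737, 381.0373, 0.00445],
            [99.21, 0.19, 13.3, 0.973, 382.3714, 0.00143],
            [99.21, 0.177, 13.3, 0.973, 388.7084, 0.00133],
        ],
        "average": [98.939, 0.407, 13.244, 0.973, 384.039, 0.002404],
    },
}


def column_means(rows):
    """Arithmetic mean of each column."""
    return [sum(col) / len(col) for col in zip(*rows)]


matches = {}
for g, block in BLOCKS.items():
    means = column_means(block["rows"])
    # Compare each computed mean with the published value, allowing for rounding
    matches[g] = all(
        math.isclose(m, r, rel_tol=1e-3) for m, r in zip(means, block["average"])
    )
    print(f"g = {g}: average row matches column means: {matches[g]}")
```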
Fig. 5 The evaluation values: (a) the average accuracy (ACavg); (b) the average number of attributes (Navg); (c) the average CPU time (Tavg)
Parameter settings of the competing algorithms
| GA parameter | Value | GEP parameter | Value | PSO parameter | Value |
|---|---|---|---|---|---|
| # chromosomes | 200 | # chromosomes | 200 | # particles | 200 |
| # generations | 2000 | # generations | 2000 | # iterations | 2000 |
| Crossover rate | 0.8 | Crossover rate | 0.8 | Inertia weight (w) | 0.8 |
| Mutation rate | 0.1 | Mutation rate | 0.1 | Acceleration coefficients c1 and c2 | 2 |
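The PSO settings above (inertia weight w = 0.8, acceleration coefficients c1 = c2 = 2) enter the standard particle swarm update v ← w·v + c1·r1·(pbest - x) + c2·r2·(gbest - x), followed by x ← x + v, with r1 and r2 drawn uniformly from [0, 1). The exact PSO variant used as a competitor is not described here, so the sketch below is an assumption that shows only the standard continuous update for a single particle.

```python
import random


def pso_step(x, v, pbest, gbest, w=0.8, c1=2.0, c2=2.0, rng=random):
    """One standard (continuous) PSO update for a single particle.

    v_new = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)
    x_new = x + v_new
    Fresh uniform r1, r2 are drawn for every dimension.
    """
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        vi_new = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        new_v.append(vi_new)
        new_x.append(xi + vi_new)
    return new_x, new_v


# A particle sitting on both its personal best and the global best keeps only its inertia
x, v = pso_step([0.5, 0.5], [1.0, -1.0], [0.5, 0.5], [0.5, 0.5], rng=random.Random(0))
print(v)  # [0.8, -0.8]
```

With x equal to both pbest and gbest, the two attraction terms vanish and the velocity shrinks by the inertia weight, from 1.0 to 0.8, which is the damping role w plays in the update.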
Comparison of GSP with three gene selection algorithms on ten selected datasets. Bold font indicates the best results
| Statistic | PSO | GA | GEP | GSP |
|---|---|---|---|---|
| AC avg. | 96.84 | 93.14 | 96.63 | |
| AC std. | 3.74 | 4.78 | 3.42 | |
| SN avg. | 96.49 | 92.28 | 96.74 | |
| SN std. | 4.05 | 6.9 | 3.65 | |
| SP avg. | 95.98 | 93 | 95.23 | |
| SP std. | 5.31 | 6.03 | 5.71 | |
| AUC avg. | 0.96 | 0.9 | 0.96 | |
| AUC std. | 0.05 | 0.08 | 0.05 | |
| T avg. | 121 | 119 | 235 | 126 |
| T std. | 30 | 27 | 28 | 38 |
| N max. | 18.7 | 549.9 | 15.7 | |
| N min. | 14.3 | 462.6 | 11.5 | |
| N avg. | 16.14 | 473.5 | 13.8 | |
| N std. | 10.32 | 619.44 | 5.16 | |
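The statistics in this table are accuracy (AC), sensitivity (SN), specificity (SP) and the area under the ROC curve (AUC), each reported as an average and a standard deviation. For a two-class problem they follow from the confusion matrix: SN = TP/(TP + FN), SP = TN/(TN + FP), AC = (TP + TN)/total, and AUC is the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counting one half. How the paper extends these to multi-class datasets is not stated here, so the sketch below covers only the binary case and is not the authors' evaluation code.

```python
def binary_metrics(y_true, y_pred, scores):
    """AC, SN, SP (in %) and AUC for labels in {0, 1}.

    Assumes both classes are present in y_true; otherwise SN, SP or AUC
    would divide by zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    ac = 100.0 * (tp + tn) / len(y_true)
    sn = 100.0 * tp / (tp + fn)
    sp = 100.0 * tn / (tn + fp)

    # AUC by pairwise comparison (equivalent to the Mann-Whitney U statistic)
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return ac, sn, sp, auc


# y_pred corresponds to thresholding the scores at 0.5
ac, sn, sp, auc = binary_metrics(
    y_true=[1, 1, 0, 0], y_pred=[1, 0, 0, 0], scores=[0.9, 0.4, 0.3, 0.1]
)
print(ac, sn, sp, auc)  # 75.0 50.0 100.0 1.0
```

The example has an AUC of 1.0 but an accuracy of 75%: AUC depends only on how the scores rank positives above negatives, whereas AC, SN and SP depend on the decision threshold that produced `y_pred`.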
Comparison of the gene selection algorithms on ten selected datasets. Bold font indicates the best results
| Dataset | Statistic | IBPSO | IG-GA | IG-ISSO | EPSO | mABC | SVM | GSP |
|---|---|---|---|---|---|---|---|---|
| 11_Tumors | AC avg. | 95.06 | 92.53 | 95.92 | 95.4 | 99.5 | | |
| | AC std. | 0.3 | ____ | 1.31 | 0.61 | 0 | | |
| | N avg. | 240.9 | 479 | 19.8 | 237.7 | 47.27 | | |
| | N std. | 9.55 | ____ | 2.57 | 9.66 | 7.79 | | |
| 9_Tumors | AC avg. | 75.5 | 85 | 91.67 | 75 | 98.65 | | |
| | AC std. | 1.58 | ____ | 2.48 | 1.11 | 0.01 | | |
| | N avg. | 240 | 52 | 15.7 | 247.1 | 34.73 | | |
| | N std. | 7.95 | ____ | 2.2136 | 9.65 | 5.54 | | |
| Brain_Tumor1 | AC avg. | 92.56 | 93.33 | 98 | 92.11 | | 90 | 99.8 |
| | AC std. | 0.54 | ____ | 0.88 | 0.82 | | -------- | 0.31 |
| | N avg. | 11.2 | 244 | 10.1 | 7.5 | | | 9.2 |
| | N std. | 7.15 | ____ | 1.73 | 2.51 | | -------- | 1.5 |
| Brain_Tumor2 | AC avg. | 91 | 88 | 99.8 | 92.4 | | 80 | 99.9 |
| | AC std. | 0.05 | ____ | 0.63 | 1.27 | | -------- | 0.1 |
| | N avg. | 6.4 | 489 | 10.4 | 6 | | 10367 | 9.8 |
| | N std. | 1.9 | ____ | 1.08 | 1.83 | | -------- | 0.4 |
| Lung_Cancer | AC avg. | 95.86 | 95.57 | 99.41 | 95.67 | 100 | | |
| | AC std. | 0.53 | ____ | 0.45 | 8.3 | 0 | -------- | |
| | N avg. | 14.9 | 2101 | 10.4 | 8.5 | 23.31 | | |
| | N std. | 10.57 | ____ | 1.08 | 2.11 | 5.14 | -------- | |
| Leukemia1 | AC avg. | 100 | 100 | 100 | 100 | 100 | | |
| | AC std. | 0 | ____ | 0 | 0 | 0 | -------- | |
| | N avg. | 3.5 | 82 | 4.6 | 3.2 | 5.67 | | |
| | N std. | 0.71 | ____ | 0.52 | 0.63 | 0.73 | -------- | |
| Leukemia2 | AC avg. | 100 | 98.61 | 100 | 100 | 100 | | |
| | AC std. | 0 | ____ | 0 | 0 | 0 | -------- | |
| | N avg. | 6.7 | 782 | 4.2 | 6.8 | 6.29 | | |
| | N std. | 1.5 | ____ | 0.42 | 2.2 | 0.98 | -------- | |
| SRBCT | AC avg. | 100 | 100 | 100 | 99.64 | | 98.41 | 100 |
| | AC std. | 0 | ____ | 0 | 0.58 | | -------- | 0 |
| | N avg. | 17.5 | 56 | 4.3 | 14.9 | | | 4 |
| | N std. | 8.32 | ____ | 0.48 | 13.03 | | -------- | 0.67 |
| Prostate | AC avg. | 97.94 | 96 | 98.82 | 97 | | 93.4 | 99.87 |
| | AC std. | 0.31 | ____ | 0.41 | 0.62 | | -------- | 0.52 |
| | N avg. | 13.6 | 343 | 8.4 | 6.6 | | 10509 | 8.2 |
| | N std. | 7.68 | ____ | 1.78 | 2.17 | | -------- | 0.79 |
| DLBCL | AC avg. | 100 | 100 | 100 | 100 | 100 | | |
| | AC std. | 0 | ____ | 0 | 0 | 0 | -------- | |
| | N avg. | 6 | 107 | 3.9 | 4.7 | 4.05 | | |
| | N std. | 1.25 | ____ | 0.32 | 0.82 | 0.78 | -------- | |
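For the datasets where GSP's gene counts are available (Brain_Tumor1, Brain_Tumor2, SRBCT and Prostate), the scale of the reduction can be read off by dividing GSP's N avg by the number of attributes in the dataset description table. The sketch below does that arithmetic; the numbers are transcribed from the two tables and nothing else is assumed.

```python
# dataset: (total attributes from the dataset table, GSP N avg from the comparison table)
COUNTS = {
    "Brain_Tumor1": (5920, 9.2),
    "Brain_Tumor2": (10367, 9.8),
    "SRBCT": (2308, 4),
    "Prostate": (10509, 8.2),
}

# Percentage of the available genes that GSP keeps on average
retained = {name: 100.0 * n_sel / n_all for name, (n_all, n_sel) in COUNTS.items()}

for name, pct in retained.items():
    print(f"{name:13s} {pct:.3f}% of genes kept")
```

In each of these four datasets, GSP keeps fewer than 0.2% of the available genes on average.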