Russul Alanni, Jingyu Hou, Hasseeb Azzawi, Yong Xiang.
Abstract
BACKGROUND: Microarray datasets consist of complex and high-dimensional samples and genes, and generally the number of samples is much smaller than the number of genes. Due to this data imbalance, gene selection is a demanding task for microarray expression data analysis.
Keywords: Evolutionary algorithms; Gene expression programming; Gene selection; Microarray
Year: 2019 PMID: 31775613 PMCID: PMC6880643 DOI: 10.1186/s12859-019-3161-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Parameters used in DGS
| Parameter | Setting |
|---|---|
| Terminal set | Start with all the attributes in microarray dataset. |
| Function set | +, −, ÷, Q where Q is the square root |
| Maximum number of iterations | 200 |
| Mutation | 0.044 |
| Recombination | 0.3 |
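The settings in the table above can be written down as a small configuration, together with a function set that matches the listed operators. This is only a sketch: the dictionary keys, the arity encoding, and in particular the "protected" handling of division by zero and of negative square roots are assumptions for illustration, not details given in the source.

```python
import math

# Settings copied from the parameter table; the key names are hypothetical.
DGS_PARAMS = {
    "max_iterations": 200,
    "mutation": 0.044,
    "recombination": 0.3,
}


def protected_div(a, b):
    # Assumption: return 1.0 when the divisor is zero (a common convention
    # in genetic/gene expression programming; not stated in the source).
    return a / b if b != 0 else 1.0


def protected_sqrt(a):
    # Assumption: use the absolute value so negative inputs do not fail.
    return math.sqrt(abs(a))


# Function set from the table: +, -, division and Q (square root).
# Each entry maps a symbol to (arity, implementation).
FUNCTION_SET = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "/": (2, protected_div),
    "Q": (1, protected_sqrt),
}

# Example: evaluate Q(g1 + g2) / g3 for three made-up expression values.
g1, g2, g3 = 4.0, 5.0, 2.0
add = FUNCTION_SET["+"][1]
div = FUNCTION_SET["/"][1]
sqrt = FUNCTION_SET["Q"][1]
value = div(sqrt(add(g1, g2)), g3)
```

The terminal set is left out on purpose, since it starts as the full attribute list of whichever microarray dataset is loaded.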
Results for different settings of the number of genes (N) and the number of chromosomes (CH)
| genes(N) | CH | AC avg. | I avg. | S avg. | TM avg. |
|---|---|---|---|---|---|
| 1 | 100 | 77.92 | 200 | 7.37 | 189.00 |
| | 200 | 85.45 | 192.50 | 10.07 | 247.28 |
| | 300 | 86.18 | 152.40 | 4.00 | 285.01 |
| | average | 83.18 | 181.63 | 7.15 | 240.43 |
| 2 | 100 | 82.29 | 191.30 | 4.00 | 183.52 |
| | 200 | 87.49 | 145.90 | 3.90 | 218.85 |
| | 300 | 87.54 | 144.03 | 3.90 | 279.74 |
| | average | 85.77 | 160.41 | 3.93 | 227.37 |
| 3 | 100 | 87.20 | 144.00 | 3.90 | 204.72 |
| | 200 | 87.54 | 135.00 | 3.90 | 288.05 |
| | 300 | 87.54 | 135.00 | 3.90 | 362.05 |
| | average | 87.43 | 138.00 | 3.90 | 284.94 |
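The "average" rows appear to be plain column means over the three CH settings. The sketch below reproduces the N = 2 row from the table values; the variable names are hypothetical and the column labels are copied from the table as-is.

```python
# Rows for N = 2, keyed by CH; values are (AC avg., I avg., S avg., TM avg.).
rows_n2 = {
    100: (82.29, 191.30, 4.00, 183.52),
    200: (87.49, 145.90, 3.90, 218.85),
    300: (87.54, 144.03, 3.90, 279.74),
}

# Transpose to columns, then take the mean of each column.
columns = list(zip(*rows_n2.values()))
averages = [round(sum(col) / len(col), 2) for col in columns]

for label, avg in zip(("AC", "I", "S", "TM"), averages):
    print(f"{label} average: {avg}")
```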
Comparison of DGS performance with different feature selection models in terms of AC, SN, SP, PPV, NPV, AUC, S and TM, with 95% CI for each test
| | CSF | CSFS | WS | SVM | GEP | DGS |
|---|---|---|---|---|---|---|
| AC avg. | 0.8436 | 0.8370 | 0.8395 | 0.8544 | 0.8577 | 0.8749 |
| CI 95% | ±0.1921 | ±0.1279 | ±0.1180 | ±0.0986 | ±0.0922 | ±0.1287 |
| SN avg. | 0.8995 | 0.8907 | 0.8932 | 0.9307 | 0.9278 | 0.9522 |
| CI 95% | ±0.2520 | ±0.1893 | ±0.1753 | ±0.1362 | ±0.1575 | ±0.1187 |
| SP avg. | 0.7707 | 0.7669 | 0.7694 | 0.7548 | 0.7662 | 0.7739 |
| CI 95% | ±0.5809 | ±0.3157 | ±0.3417 | ±0.1682 | ±0.1001 | ±0.2569 |
| PPV avg. | 0.8373 | 0.8332 | 0.8351 | 0.8321 | 0.8382 | 0.8462 |
| CI 95% | ±0.2956 | ±0.1652 | ±0.1744 | ±0.0910 | ±0.0637 | ±0.1362 |
| NPV avg. | 0.8550 | 0.8434 | 0.8468 | 0.8931 | 0.8907 | 0.9253 |
| CI 95% | ±0.3803 | ±0.2855 | ±0.2557 | ±0.2475 | ±0.2749 | ±0.2401 |
| AUC avg. | 0.8293 | 0.8104 | 0.8414 | 0.8499 | 0.8423 | 0.8687 |
| CI 95% | ±0.0223 | ±0.0213 | ±0.0211 | ±0.0218 | ±0.0216 | ±0.0210 |
| S avg. | 6.5 | 6.9 | 6.7 | 6.3 | 6.2 | 3.9 |
| CI 95% | ±0.8430 | ±0.978 | ±1.0013 | ±1.3016 | ±0.9917 | ±0.3338 |
| TM avg. | 600.12 | 600.02 | 600.01 | 600.21 | 620.51 | 218.85 |
| CI 95% | ±0.1821 | ±0.0189 | ±0.0134 | ±0.3700 | ±24.6415 | ±34.6227 |
Fig. 1 Comparison of DGS performance with different feature selection models in terms of AC, SN, SP, PPV, NPV and AUC
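For readers unfamiliar with the abbreviations, the sketch below shows how such metrics are conventionally computed from a binary confusion matrix. It assumes that AC, SN, SP, PPV and NPV stand for accuracy, sensitivity, specificity, positive predictive value and negative predictive value; the function name and the example counts are made up for illustration and are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return AC, SN, SP, PPV and NPV from binary confusion-matrix counts."""

    def ratio(num, den):
        # Guard against empty denominators (e.g. no positive predictions).
        return num / den if den else float("nan")

    return {
        "AC": ratio(tp + tn, tp + fp + tn + fn),  # accuracy
        "SN": ratio(tp, tp + fn),                 # sensitivity (recall)
        "SP": ratio(tn, tn + fp),                 # specificity
        "PPV": ratio(tp, tp + fp),                # positive predictive value
        "NPV": ratio(tn, tn + fn),                # negative predictive value
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=30, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

AUC is omitted because it needs ranked prediction scores rather than the four counts alone.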
Validation results of DGS on the independent dataset GSE8894
| Metric | Value | Metric | Value |
|---|---|---|---|
| AC avg. | 0.8768 | PPV avg. | 0.8714 |
| CI 95% | ±0.1932 | CI 95% | ±0.5191 |
| SN avg. | 0.8841 | NPV avg. | 0.8824 |
| CI 95% | ±0.2360 | CI 95% | ±0.3148 |
| SP avg. | 0.8696 | AUC avg. | 0.8686 |
| CI 95% | ±0.4721 | CI 95% | ±0.0210 |
Fig. 2 The evaluation results for the selected genes. a The gene expression level of the selected genes shown as a heatmap. b The prediction results using the selected genes
The selected genes of each run
| Run number | S | Probe ID | Gene symbol |
|---|---|---|---|
| 1 | 4 | 204891_s_at | LCK |
| | | 208893_s_at | DUSP6 |
| | | 202454_s_at | ERBB3 |
| | | 202885_s_at | MMD |
| 2 | 4 | 204891_s_at | LCK |
| | | 208893_s_at | DUSP6 |
| | | 202454_s_at | ERBB3 |
| | | 202885_s_at | MMD |
| 3 | 4 | 204891_s_at | LCK |
| | | 208893_s_at | DUSP6 |
| | | 202454_s_at | ERBB3 |
| | | 202885_s_at | MMD |
| 4 | 4 | 204891_s_at | LCK |
| | | 208893_s_at | DUSP6 |
| | | 202454_s_at | ERBB3 |
| | | 202885_s_at | MMD |
| 5 | 4 | 204891_s_at | LCK |
| | | 208893_s_at | DUSP6 |
| | | 202454_s_at | ERBB3 |
| | | 202885_s_at | MMD |
| 6 | 3 | 204891_s_at | LCK |
| | | 208893_s_at | DUSP6 |
| | | 202454_s_at | ERBB3 |
| 7 | 4 | 204891_s_at | LCK |
| | | 208893_s_at | DUSP6 |
| | | 202454_s_at | ERBB3 |
| | | 202885_s_at | MMD |
| 8 | 3 | 208893_s_at | DUSP6 |
| | | 202454_s_at | ERBB3 |
| | | 202885_s_at | MMD |
| 9 | 4 | 204891_s_at | LCK |
| | | 208893_s_at | DUSP6 |
| | | 202454_s_at | ERBB3 |
| | | 202885_s_at | MMD |
| 10 | 5 | 204891_s_at | LCK |
| | | 208893_s_at | DUSP6 |
| | | 202454_s_at | ERBB3 |
| | | 202885_s_at | MMD |
| | | 205027_s_at | MAP3K8 |
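The per-run selections above can be summarised by counting how often each gene appears across the ten runs. The sketch below does that and also recomputes the mean number of selected genes. The majority-vote rule used to derive a stable gene list is a hypothetical illustration, not necessarily the rule the authors used to arrive at their final gene set.

```python
from collections import Counter

# Gene symbols selected in each of the ten runs, copied from the table.
core = ["LCK", "DUSP6", "ERBB3", "MMD"]
runs = {
    1: core, 2: core, 3: core, 4: core, 5: core,
    6: ["LCK", "DUSP6", "ERBB3"],
    7: core,
    8: ["DUSP6", "ERBB3", "MMD"],
    9: core,
    10: core + ["MAP3K8"],
}

# How many runs selected each gene.
counts = Counter(gene for genes in runs.values() for gene in genes)

# Mean number of selected genes per run (the S column).
mean_size = sum(len(genes) for genes in runs.values()) / len(runs)

# Hypothetical rule: keep genes selected in more than half of the runs.
stable = [gene for gene, n in counts.items() if n > len(runs) / 2]

print(dict(counts))
print(f"mean S = {mean_size:.1f}")
print(stable)
```

Under these counts, LCK and MMD each appear in 9 runs, DUSP6 and ERBB3 in all 10, and MAP3K8 in only 1, so the majority rule keeps the four recurring genes.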
The final selected genes from the gene selection method DGS
| Gene symbol | Gene Name | Chr. | NCBI UniGene number | Specification |
|---|---|---|---|---|
| LCK | lymphocyte-specific protein tyrosine kinase | 1 | 3932 | The encoded protein is a key signaling molecule in the selection and maturation of developing T-cells |
| DUSP6 | dual-specificity phosphatase 6 | 12 | 1848 | This gene inactivates ERK2, resulting in tumor suppression and apoptosis. The protein encoded by this gene is a member of the dual-specificity protein phosphatase subfamily. |
| ERBB3 | v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 3 | 12 | 2065 | Also known as HER3 (human epidermal growth factor receptor 3). This gene encodes a member of the epidermal growth factor receptor (EGFR) family of receptor tyrosine kinases, which are often aberrantly expressed and/or activated in human cancers. |
| MMD | monocyte-to-macrophage differentiation associated protein | 17 | 23531 | This protein is expressed in mature macrophages, but its function is still unknown. |
Note: The NCBI UniGene number and more information about the genes can be found on the NCBI website https://www.ncbi.nlm.nih.gov/geo/
Description of the experimental datasets
| No. | Dataset | Samples(X) | Number of Genes(Y) | Classes | Reference |
|---|---|---|---|---|---|
| 1 | 11_Tumors | 174 | 12,533 | 11 | [ |
| 2 | 9_Tumors | 60 | 5726 | 9 | [ |
| 3 | Brain_Tumor1 | 90 | 5920 | 5 | [ |
| 4 | Brain_Tumor2 | 50 | 10,367 | 4 | [ |
| 5 | Leukemia 1 | 72 | 5327 | 3 | [ |
| 6 | Leukemia 2 | 72 | 11,225 | 3 | [ |
| 7 | Lung_Cancer | 203 | 12,600 | 5 | [ |
| 8 | SRBCT | 82 | 2308 | 4 | [ |
| 9 | Prostate_Tumor | 102 | 10,509 | 2 | [ |
| 10 | DLBCL | 77 | 5469 | 2 | [ |
Comparison of the gene selection algorithms on ten selected datasets
| 11_Tumors | IBPSO | IG-GA | IG-ISSO | EPSO | mABC | IG-GEP | DGS |
|---|---|---|---|---|---|---|---|
| AC avg. | 95.06 | 92.53 | 95.92 | 95.4 | 99.5 | 93.88 | 99.88 |
| AC std. | 0.3 | - | 1.31 | 0.61 | 0 | 3 | 0.01 |
| S avg. | 240.9 | 479 | 19.8 | 237.7 | 47.27 | 18.6 | 17.9 |
| S std. | 9.55 | - | 2.57 | 9.66 | 7.79 | 3 | 1.2 |
| 9_Tumors | IBPSO | IG-GA | IG-ISSO | EPSO | mABC | IG-GEP | DGS |
| AC avg. | 75.5 | 85 | 91.67 | 75 | 98.65 | 89.83 | 98.89 |
| AC std. | 1.58 | - | 2.48 | 1.11 | 0.01 | 1.01 | 0.02 |
| S avg. | 240 | 52 | 15.7 | 247.1 | 34.73 | 20.3 | 13.7 |
| S std. | 7.95 | - | 2.2136 | 9.65 | 5.54 | 2.1 | 1.02 |
| Brain_Tumor1 | IBPSO | IG-GA | IG-ISSO | EPSO | mABC | IG-GEP | DGS |
| AC avg. | 92.56 | 93.33 | 98 | 92.11 | 100 | 96.11 | 99.82 |
| AC std. | 0.54 | - | 0.88 | 0.82 | 0 | 1.41 | 0.31 |
| S avg. | 11.2 | 244 | 10.1 | 7.5 | 16.87 | 19 | 9.2 |
| S std. | 7.15 | - | 1.73 | 2.51 | 2.85 | 1.05 | 1.5 |
| Brain_Tumor2 | IBPSO | IG-GA | IG-ISSO | EPSO | mABC | IG-GEP | DGS |
| AC avg. | 91 | 88 | 99.8 | 92.4 | 100 | 99.8 | 99.9 |
| AC std. | 0.05 | - | 0.63 | 1.27 | 0 | 1.01 | 0.1 |
| S avg. | 6.4 | 489 | 10.4 | 6 | 10.52 | 14.6 | 9.8 |
| S std. | 1.9 | - | 1.08 | 1.83 | 1.72 | 0.7 | 0.4 |
| Lung_Cancer | IBPSO | IG-GA | IG-ISSO | EPSO | mABC | IG-GEP | DGS |
| AC avg. | 95.86 | 95.57 | 99.41 | 95.67 | 100 | 98.48 | 100.00 |
| AC std. | 0.53 | - | 0.45 | 8.3 | 0 | 0.61 | 0.00 |
| S avg. | 14.9 | 2101 | 10.4 | 8.5 | 23.31 | 14.5 | 8.30 |
| S std. | 10.57 | - | 1.08 | 2.11 | 5.14 | 0.61 | 0.82 |
| Leukemia1 | IBPSO | IG-GA | IG-ISSO | EPSO | mABC | IG-GEP | DGS |
| AC avg. | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| AC std. | 0 | - | 0 | 0 | 0 | 0 | 0 |
| S avg. | 3.5 | 82 | 4.6 | 3.2 | 5.67 | 7.7 | 2.9 |
| S std. | 0.71 | - | 0.52 | 0.63 | 0.73 | 0.67 | 0.63 |
| Leukemia2 | IBPSO | IG-GA | IG-ISSO | EPSO | mABC | IG-GEP | DGS |
| AC avg. | 100 | 98.61 | 100 | 100 | 100 | 100 | 100 |
| AC std. | 0 | - | 0 | 0 | 0 | 0 | 0 |
| S avg. | 6.7 | 782 | 4.2 | 6.8 | 6.29 | 7.5 | 4.1 |
| S std. | 1.5 | - | 0.42 | 2.2 | 0.98 | 1.58 | 0.73 |
| SRBCT | IBPSO | IG-GA | IG-ISSO | EPSO | mABC | IG-GEP | DGS |
| AC avg. | 100 | 100 | 100 | 99.64 | 100 | - | 100 |
| AC std. | 0 | - | 0 | 0.58 | 0 | - | 0 |
| S avg. | 17.5 | 56 | 4.3 | 14.9 | 5.59 | - | 4 |
| S std. | 8.32 | - | 0.48 | 13.03 | 0.51 | - | 0.67 |
| Prostate | IBPSO | IG-GA | IG-ISSO | EPSO | mABC | IG-GEP | DGS |
| AC avg. | 97.94 | 96 | 98.82 | 97 | 100 | 98.33 | 99.87 |
| AC std. | 0.31 | - | 0.41 | 0.62 | 0 | 0.4 | 0.52 |
| S avg. | 13.6 | 343 | 8.4 | 6.6 | 10.73 | 18.1 | 8.2 |
| S std. | 7.68 | - | 1.78 | 2.17 | 3.15 | 0.9 | 0.79 |
| DLBCL | IBPSO | IG-GA | IG-ISSO | EPSO | mABC | IG-GEP | DGS |
| AC avg. | 100 | 100 | 100 | 100 | 100 | - | 100 |
| AC std. | 0 | - | 0 | 0 | 0 | - | 0 |
| S avg. | 6 | 107 | 3.9 | 4.7 | 4.05 | - | 3.5 |
| S std. | 1.25 | - | 0.32 | 0.82 | 0.78 | - | 0.5 |
The differences between DGS, GA and GEP
| | DGS | GA | GEP |
|---|---|---|---|
| Number of chromosomes in each generation | Same number | Same number | Same number |
| Chromosome length | Flexible length | Fixed length | Flexible length |
| Generation size | Changeable size | Fixed size | Fixed size |
| Genetic operation | Systematic selection | Random selection | Random selection |
| Terminal set | Different set in each generation | Same set in each generation | Same set in each generation |
Fig. 3 DGS flowchart
The results of example 2
| Generation | T | h | Generation | T | h |
|---|---|---|---|---|---|
| 1 | 2200 | 7 | 11 | 650 | 3 |
| 2 | 2000 | 6 | 12 | 402 | 3 |
| 3 | 1852 | 6 | 13 | 254 | 3 |
| 4 | 1723 | 5 | 14 | 102 | 3 |
| 5 | 1583 | 5 | 15 | 79 | 3 |
| 6 | 1296 | 4 | 16 | 53 | 3 |
| 7 | 1101 | 3 | 17 | 31 | 3 |
| 8 | 972 | 3 | 18 | 19 | 3 |
| 9 | 801 | 3 | 19 | 3 | |
| 10 | 734 | 3 | 20 | 3 |
Fig. 4 Example of mutation operation for DGS
Fig. 5 DGS recombination example