Ying Xiong1,2,3, Qing-Hua Ling4, Fei Han1,2, Qing-Hua Liu5.
Abstract
BACKGROUND: The main goal of gene selection for microarray data is to find compact, predictive gene subsets that improve classification accuracy. Although many methods are available, selecting the optimal gene subset for accurate classification remains very challenging for the diagnosis and treatment of cancer.
Keywords: Binary particle swarm optimization; Extreme learning machine; Gene selection; LASSO
Year: 2019 PMID: 31888444 PMCID: PMC6936154 DOI: 10.1186/s12859-019-3228-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 5 The heatmap of expression levels for the top ten most frequently selected genes on the five datasets. (a) Colon (b) Brain cancer (c) Leukemia (d) Lymphoma (e) LUNG
Fig. 6 The comparison of the 5-fold CV accuracy on the test data versus the iteration number between the original BPSO and the improved BPSO on the five datasets. (a) Colon (b) Brain cancer (c) Leukemia (d) Lymphoma (e) LUNG
Fig. 7 The contribution value of the selected genes versus the iteration number of IBPSO on the five datasets. (a) Colon (b) Brain cancer (c) Leukemia (d) Lymphoma (e) LUNG
Fig. 8 The number of clusters versus the 5-fold CV accuracy on the test data obtained by ELM
The classification accuracy obtained by ELM with different gene subsets selected by the KL-IBPSO-ELM method on the five microarray datasets
| Data | Selected gene subsets | 5-fold CV Accuracy Mean(%) ± std | Test Accuracy Mean(%) ± std |
|---|---|---|---|
| Colon | 493, 1902, 1060, 1346, 1982, 554, 1060 | 93.07 ± 0.009 | 91.40 ± 0.011 |
|  | 377, 1100, 959, 475, 1637, 164, 1764, 304, 60, 897 | 95.81 ± 0.004 | 93.23 ± 0.012 |
|  | 14, 341, 20, 1886, 164, 1271, 304, 1136, 165, 1549, 830, 1897, 1227, 1042 | 97.42 ± 0.011 | 94.28 ± 0.022 |
|  | 493, 251, 1346, 377, 554, 1902 | 95.07 ± 0.009 | 92.16 ± 0.019 |
| Brain cancer | 5202, 3341, 1243, 1135, 5051, 30, 4413, 4935 | 91.63 ± 0.011 | 89.00 ± 0.012 |
|  | 18, 3341, 1582, 2942, 1198, 6331, 4917, 724 | 92.67 ± 0.008 | 90.78 ± 0.011 |
|  | 6429, 4917, 6774, 1975, 587, 2122, 5051, 6700, 6828 | 92.36 ± 0.006 | 91.73 ± 0.011 |
|  | 6429, 4309, 2304, 3555, 1975, 3035, 3341, 1648, 161, 724 | 91.15 ± 0.007 | 90.26 ± 0.012 |
| Leukemia | 818, 894, 3135, 3359, 4653, 4991, 5094, 5406, 2356, 445 | 100 ± 0.000 | 100 ± 0.000 |
|  | 3090, 1694, 1635, 3276, 1410, 1523, 1992, 2659, | 100 ± 0.000 | 100 ± 0.000 |
|  | 1268, 3276, 1523, 1973, 1855, 2356, 445, 3150, 818 | 100 ± 0.000 | 100 ± 0.000 |
|  | 3276, 1523, 1973, 1882, 356, 818, 445, 3135, 2895, 3082 | 100 ± 0.000 | 100 ± 0.000 |
| Lymphoma | 4862, 3589, 3775, 3356, 343, 962, 3227, 2666, 2810, 2734 | 90.09 ± 0.006 | 89.54 ± 0.011 |
|  | 4862, 3589, 3227, 704, 2810, 4998 | 92.06 ± 0.008 | 90.63 ± 0.004 |
|  | 4514, 3589, 5709, 6172, 2666, 2810, 3525 | 91.63 ± 0.006 | 90.24 ± 0.010 |
|  | 3589, 3775, 5709, 6565, 5329, 418, 5818, 4998 | 92.23 ± 0.005 | 91.63 ± 0.010 |
| LUNG | 235, 295, 1411, 1784, 1921, 1974, 2264, 2672, 3187 | 91.64 ± 0.007 | 89.64 ± 0.013 |
|  | 1268, 1523, 1822, 2356, 445, 2556, 1318, 1411, 295, 2005, 1712 | 94.64 ± 0.007 | 90.64 ± 0.016 |
|  | 2479, 924, 2969, 1973, 1822, 580, 2279, 2128, 1432, 2005, 414 | 95.81 ± 0.004 | 90.35 ± 0.011 |
|  | 1268, 3276, 2969, 441, 295, 2904, 445, 2895, 2128, 261, 1028, 2005 | 91.35 ± 0.012 | 88.08 ± 0.021 |
The top ten frequently selected genes with the proposed method on the Brain cancer data
| Gene no. | Gene name | Description |
|---|---|---|
| 18 | AB000895 | Dachsous 1 (Drosophila) #※ |
| 4917 | U65676 | Hermansky-Pudlak syndrome 1 |
| 4309 | U33849_at | Proprotein convertase subtilisin/kexin type 7 |
| 4413 | U39817 | Bloom syndrome ※ |
| 4657 | U51095 | caudal type homeo box transcription factor 1 |
| 4843 | U61262 | neogenin homolog 1 (chicken) #※ |
| 5931 | X58987 | dopamine receptor D1 #※ |
| 3041 | M64394 | Kell blood group #❖ |
| 6480 | X87159 | Sodium channel, nonvoltage-gated 1, beta (Liddle syndrome) # |
| 6429 | X83703 | ankyrin repeat domain 1 (cardiac muscle) |
※ also selected in [16]; # also selected in [17]; ❖ also selected in [30]
The top ten frequently selected genes with the proposed method on the Colon data
| Gene no. | Gene name | Description |
|---|---|---|
| 493 | R87126 | Myosin heavy chain, nonmuscle (Gallus gallus) ✱ |
| 14 | H20709 | Myosin light chain alkali, smooth muscle isoform (human) #※❖✺✱ |
| 377 | Z50573 | H.sapiens mRNA for GCAP-II/uroguanylin precursor. #✺ |
| 251 | U37012 | Human cleavage and polyadenylation specificity factor mRNA, complete cds. ※ |
| 554 | H24401 | MAP KINASE PHOSPHATASE-1 ( |
| 175 | T94579 | Human chitotriosidase precursor mRNA, complete cds #※✺ |
| 1346 | T62947 | 60S RIBOSOMAL PROTEIN L24 ( |
| 1771 | J05032 | Human aspartyl-tRNA synthetase alpha-2 subunit mRNA, complete cds ✺ |
| 765 | M76378 | Human cysteine-rich protein (CPR) gene, exons 5 and 6 #✺ |
| 1902 | U01038 | Human pLK mRNA, complete cds |
※ also selected in [16]; # also selected in [17]; ❖ also selected in [31]; ✺ also selected in [32]; ✱ also selected in [30]
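Ranking genes by how often they are selected can be sketched as a simple count over gene subsets. The example below is only an illustration: it counts over the four Colon subsets listed in the accuracy table, whereas the runs behind the top-ten tables are not given in this record, so the output is not expected to reproduce them. The function names `selection_frequency` and `top_k_genes`, and the tie-breaking by gene index, are assumptions made for the sketch.

```python
from collections import Counter

# Colon gene subsets copied from the accuracy table (illustrative input only).
COLON_SUBSETS = [
    [493, 1902, 1060, 1346, 1982, 554, 1060],
    [377, 1100, 959, 475, 1637, 164, 1764, 304, 60, 897],
    [14, 341, 20, 1886, 164, 1271, 304, 1136, 165, 1549, 830, 1897, 1227, 1042],
    [493, 251, 1346, 377, 554, 1902],
]


def selection_frequency(subsets):
    """Count in how many subsets each gene index appears.

    A gene repeated inside one subset is counted once for that subset.
    """
    counts = Counter()
    for subset in subsets:
        counts.update(set(subset))
    return counts


def top_k_genes(subsets, k=10):
    """Return the k most frequently selected genes as (gene, count) pairs.

    Ties are broken by ascending gene index (an assumption of this sketch).
    """
    counts = selection_frequency(subsets)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    for gene, count in top_k_genes(COLON_SUBSETS, k=10):
        print(f"gene {gene}: selected in {count} subsets")
```

With many independent runs of the selection method as input, the same count would yield a frequency ranking of the kind reported in the top-ten tables.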
The top ten frequently selected genes with the proposed method on the LUNG data
| Gene no. | Gene name | Description |
|---|---|---|
| 580 | 39333_at | collagen, type IV, alpha 1 # |
| 235 | 41770 | Cluster Incl AA420624:nc61c12.r1 Homo sapiens cDNA |
| 295 | 36681 | apolipoprotein D |
| 1411 | 37954 | annexin A8 |
| 1784 | 35874 | lymphoid-restricted membrane protein |
| 1921 | 32748 | Cluster Incl AI557852:P6test.G05.r Homo sapiens cDNA |
| 1974 | 33656 | ribosomal protein L37 |
| 2264 | 32104 | calcium/calmodulin-dependent protein kinase (CaM kinase) II gamma |
| 2672 | 37970 | mitogen-activated protein kinase 8 interacting protein 3 |
| 3178 | 38799_at | Cluster Incl AF068706:Homo sapiens gamma2-adaptin (G2 AD) mRNA, complete cds /cds = (763,3018) /gb = AF068706 /gi = 3193225 /ug = Hs.8991 ※ |
※also selected in [16]; # also selected in [30]
The top ten frequently selected genes with the proposed method on the Lymphoma data
| Gene no. | Gene name | Description |
|---|---|---|
| 806 | D86969 | PHD finger protein 16 ※# |
| 772 | D85423 | CDC5 cell division cycle 5-like (S. pombe) |
| 1703 | L06499 | ribosomal protein L37a |
| 2320 | M13207 | colony stimulating factor 2 (granulocyte-macrophage) |
| 3419 | S48983 | serum amyloid A4, constitutive |
| 3507 | S75213 | phosphodiesterase 4A, cAMP-specific (phosphodiesterase E2 dunce homolog, Drosophila) |
| 3755 | U07358 | mitogen-activated protein kinase kinase kinase 12 |
| 4998 | U69108 | TNF receptor-associated factor 5 |
| 5230 | U81600 | paired related homeobox 2 |
| 6651 | X97630 | MAP/microtubule affinity-regulating kinase 2 |
※also selected in [16]; # also selected in [30]
The top ten frequently selected genes with the proposed method on the Leukemia data
| Gene no. | Gene name | Description |
|---|---|---|
| 4847 | X95735 | Zyxin ※#⊙✱ |
| 894 | HG3162-HT3339 | Transcription Factor IIA |
| 2354 | M92287 | CCND3 Cyclin D3#⊙✱✺ |
| 4535 | X74262 | RETINOBLASTOMA BINDING PROTEIN P48 ✱ |
| 4991 | Y09615 | GB DEF = Mitochondrial transcription termination factor |
| 2642 | U05259 | MB-1 gene ※#⊙✱✺ |
| 818 | HG1879-HT1919 | Ras-Like Protein Tc10 |
| 6283 | Y00081 | IL6 Interleukin 6 (B cell stimulatory factor 2) |
| 6855 | M31523 | TCF3 Transcription factor 3 (E2A immunoglobulin enhancer binding factors E12/E47) ⊙✱✺ |
| 1882 | M27891 | CST3 Cystatin C (amyloid angiopathy and cerebral hemorrhage) ※#⊙✺ |
※also selected in [16]; # also selected in [17]; ⊙also selected in [33]; ✱also selected in [34]; ✺also selected in [30]
Fig. 1 Relation between change rate and bit velocity
The 5-fold CV classification accuracies of ELM (Mean% ± std, with the number of selected genes in parentheses) based on the five gene selection methods on the five datasets
| Method | Colon | Brain cancer | Leukemia | Lymphoma | LUNG |
|---|---|---|---|---|---|
| BPSO-ELM | 93.34 ± 0.020(9) | 85.45 ± 0.023(7) | 98.50 ± 0.003(5) | 83.50 ± 0.027(8) | 94.80 ± 0.006(11) |
| KMeans-BPSO-ELM | 93.50 ± 0.020(9) | 87.23 ± 0.023(8) | 99.17 ± 0.010(4) | 85.14 ± 0.029(6) | 95.64 ± 0.006(12) |
| KMeans-GCSI-MBPSO-ELM | 97.61 ± 0.014(6) | 88.63 ± 0.022(6) | 100 ± 0.00(3) | 86.97 ± 0.024(8) | 97.10 ± 0.006(11) |
| SC-IPSO-ELM | 99.05 ± 0.011(13) | 91.88 ± 0.019(7) | 100 ± 0.00(3) | 93.79 ± 0.020(7) | 98.67 ± 0.019(11) |
| The proposed method | 97.42 ± 0.011(14) | 92.67 ± 0.008(8) | 100 ± 0.000(8) | 92.23 ± 0.005(8) | 95.81 ± 0.004(11) |
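How a 5-fold CV accuracy is summarised as a mean and a standard deviation can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used for the table: the record does not say how folds were drawn, whether they were stratified, or over how many runs the std was taken. The names `kfold_indices` and `cross_validate`, the hypothetical `evaluate` callback (which would train a classifier such as ELM on the training indices and return its accuracy on the held-out fold), and the use of the sample standard deviation are all assumptions.

```python
import random
import statistics


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives fold sizes that differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, evaluate, k=5, seed=0):
    """Run k-fold CV and return (mean accuracy, sample std) over the folds.

    `evaluate(train_idx, test_idx)` is a hypothetical callback that fits a
    classifier on train_idx and returns its accuracy on test_idx.
    """
    folds = kfold_indices(n_samples, k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        # Training set: every fold except the held-out one.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        accuracies.append(evaluate(train_idx, test_idx))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


if __name__ == "__main__":
    # Placeholder evaluator: reports the held-out fraction instead of a real accuracy.
    demo = lambda train_idx, test_idx: len(test_idx) / (len(train_idx) + len(test_idx))
    mean_acc, std_acc = cross_validate(62, demo)
    print(f"{mean_acc:.4f} ± {std_acc:.4f}")
```

Repeating the whole procedure with different seeds and averaging the results is a common way to obtain a spread estimate across runs; whether the table's std values were computed that way is not stated here.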
Fig. 2 New curve of mapping function of probability
Fig. 3 New mapping function curve in three-dimension
Fig. 4 The frame of the proposed hybrid gene selection method