Qi Chen, Zhaopeng Meng, Ran Su.
Abstract
Gene selection for microarray data classification aims to find a small set of genes that are the most informative and discriminative. A well-performing gene selection algorithm should pick a gene set that achieves high classification performance while keeping the set as small as possible. Many existing gene selection algorithms suffer from either low performance or an overly large selected gene set. In this study, we propose a wrapper gene selection approach, named WERFE, built within a recursive feature elimination (RFE) framework to make classification more efficient. WERFE employs an ensemble strategy: it takes advantage of a variety of gene selection methods and assembles the top genes selected by each method into the final gene subset. By integrating multiple gene selection algorithms, WERFE determines the optimal gene subset by prioritizing the genes that each method ranks as more important, so a more discriminative and compact gene subset can be selected. Experimental results show that the proposed method can achieve state-of-the-art performance.
Keywords: RFE; WERFE; ensemble; gene selection; wrapper
Year: 2020 PMID: 32548100 PMCID: PMC7270206 DOI: 10.3389/fbioe.2020.00496
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
The details of the five data sets.
| Data set | Genes | Samples |
| --- | --- | --- |
| RatinvitroH | 31,042 | 116 |
| Nki70 | 70 | 144 |
| ZQ_188D | 188 | 9,024 |
| Prostate | 100 | 50 |
| Regicor | 22 | 300 |
Figure 1. The entire process of the gene ranking algorithm.
Voting and prediction results on the RatinvitroH data set using WERFE.
| 19 | 20 | 0 | – | – | – | – | – | – |
| 18 | 19, 20 | 2 | 75.79 | 74.58 | 56.19 | 60.45 | 100 | 0 |
| 17 | 18–20 | 17 | 77.30 | 81.10 | 47.26 | 57.80 | 95.42 | 3.33 |
| 16 | 17–20 | 685 | 77.15 | 81.46 | 48.10 | 76.67 | 90.69 | 60.48 |
| 15 | 16–20 | 1,092 | 77.43 | 85.82 | 53.10 | 75.00 | 82.27 | 69.76 |
| 14 | 15–20 | 6,142 | 75.70 | 80.17 | 43.10 | 65.53 | 69.57 | 65.48 |
| 0 | 1–20 | 31,042 | 76.84 | 81.74 | 66.62 | 60.23 | 49.52 | 50.71 |
GN, gene number.
Acc.RF, accuracy (Acc) with random forest (RF) as the classifier; the other abbreviations in the first row follow the same convention.
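Reading the first two columns of this table as a vote threshold t and the vote counts of the retained genes (t + 1 through 20), GN is the number of genes that received more than t votes. The sketch below illustrates that reading; the helper names `genes_above_threshold` and `subset_size_table` are hypothetical, and the vote counts are made up for illustration, not taken from the paper.

```python
import numpy as np


def genes_above_threshold(votes, t):
    """Indices of genes whose vote count is strictly greater than t."""
    return np.flatnonzero(np.asarray(votes) > t)


def subset_size_table(votes, thresholds):
    """Map each threshold t to the number of genes with more than t votes (GN)."""
    return {t: int(genes_above_threshold(votes, t).size) for t in thresholds}


if __name__ == "__main__":
    # Hypothetical vote counts for eight genes (at most 20 votes each).
    votes = np.array([20, 19, 18, 17, 17, 3, 1, 0])
    print(subset_size_table(votes, [19, 18, 17, 16, 0]))
    # → {19: 1, 18: 2, 17: 3, 16: 5, 0: 7}
```

Raising t trades subset size for agreement between selectors, which is the trade-off the GN column traces as the threshold falls from 19 to 0.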
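Comparison with RFRFE. In each row, the first four numeric columns give the gene number (GN) and three performance values for WERFE; the last four give the same quantities for RFRFE.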
| RatinvitroH | 17 | 77.30 | 81.10 | 47.26 | 11 | 72.27 | 68.71 | 34.95 |
| Nki70 | 5 | 82.27 | 49.75 | 86.13 | 43 | 80.15 | 35.36 | 83.92 |
| ZQ_188D | 1 | 93.81 | 98.43 | 100.00 | 41 | 95.80 | 17.29 | 99.98 |
| Prostate | 4 | 98.00 | 95.00 | 100.00 | 3 | 95.31 | 90.00 | 100.00 |
| Regicor | 4 | 76.54 | 65.34 | 62.71 | 5 | 77.76 | 68.95 | 64.70 |
GN, gene number.
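Comparison with SVMRFE. In each row, the first four numeric columns give the gene number (GN) and three performance values for WERFE; the last four give the same quantities for SVMRFE.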
| RatinvitroH | 17 | 77.30 | 81.10 | 47.26 | 51 | 70.30 | 80.86 | 53.79 |
| Nki70 | 5 | 82.27 | 49.75 | 86.13 | 25 | 77.10 | 57.42 | 88.17 |
| ZQ_188D | 1 | 93.81 | 98.43 | 100.00 | 1 | 93.81 | 0 | 100.00 |
| Prostate | 4 | 98.00 | 95.00 | 100.00 | 42 | 98.00 | 96.67 | 100.00 |
| Regicor | 4 | 76.54 | 65.34 | 62.71 | 3 | 65.33 | 62.21 | 72.24 |
GN, gene number.
Figure 2. ROC curve on the RatinvitroH data set.
Figure 3. ROC curve on the Nki70 data set.
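Performance of lightGBM with and without WERFE. The first four numeric columns give the gene number (GN) and performance with WERFE; the last four give the same without WERFE, where GN equals the full gene count of each data set.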
| RatinvitroH | 17 | 77.30 | 81.10 | 47.26 | 31042 | 59.13 | 73.90 | 36.93 |
| Nki70 | 5 | 82.27 | 49.75 | 86.13 | 70 | 63.60 | 31.25 | 80.00 |
| ZQ_188D | 1 | 93.81 | 98.43 | 100.00 | 188 | 96.80 | 61.50 | 98.90 |
| Prostate | 4 | 98.00 | 95.00 | 100.00 | 100 | 89.80 | 88.00 | 91.70 |
| Regicor | 4 | 76.54 | 65.34 | 62.71 | 22 | 59.90 | 64.00 | 55.70 |
GN, gene number.
Comparison with other gene selection algorithms on RatinvitroH.
| WERFE | 17 | 77.30 | 81.10 | 47.26 | 685 | 76.67 | 90.69 | 60.48 |
| FSNM | 60 | 77.50 | 83.65 | 43.52 | 100 | 74.85 | 83.95 | 60.02 |
| Fisher | 20 | 73.39 | 69.60 | 34.02 | 10 | 59.85 | 93.02 | 14.83 |
| ReliefF | 40 | 73.21 | 74.60 | 40.45 | 80 | 62.20 | 97.46 | 8.17 |
GN, gene number.
Comparison with other gene selection algorithms on Nki70.
| WERFE | 5 | 82.27 | 49.75 | 86.13 | 5 | 72.33 | 33.00 | 92.17 |
| FSNM | 63 | 80.85 | 22.93 | 88.06 | 28 | 81.33 | 61.79 | 90.86 |
| Fisher | 35 | 81.46 | 35.33 | 92.94 | 35 | 74.24 | 46.12 | 89.14 |
| ReliefF | 21 | 80.31 | 39.36 | 82.11 | 35 | 75.76 | 50.62 | 87.86 |
GN, gene number.
Gene ranking of Wrapper Embedded Recursive Feature Elimination (WERFE)
| 1: |
| 2: Randomly divide the data set into ten equal parts; |
| 3: Keep one part as test data and use the remaining nine parts as training data; |
| 4: |
| 5: Train a model based on training data of |
| 6: Calculate the prediction accuracy of the model using the test data; |
| 7: Obtain the weight of each gene produced by the SVM; |
| 8: Remove |
| 9: |
| 10: Obtain the gene subset |
| 11: |
| 12: Train a model based on training data of |
| 13: Calculate the prediction accuracy of the model using the test data; |
| 14: Obtain the importance of each gene produced by the RF; |
| 15: Remove |
| 16: |
| 17: Obtain the gene subset |
| 18: Count the votes for all the genes contained in both |
| 19: |
| 20: Rank genes based on votes and obtain |
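The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes scikit-learn, uses a shuffled, stratified k-fold split in place of the random ten-part division, takes squared `LinearSVC` coefficients as the SVM gene weights and `RandomForestClassifier.feature_importances_` as the RF importances, removes an assumed fraction `drop_frac` of the remaining genes per iteration, and keeps, in each fold, the subset with the highest test accuracy. Each gene then gets one vote per fold from each selector, so ten folds yield at most 20 votes, which is consistent with the 1–20 vote range in the RatinvitroH voting table. The names `rfe_best_subset` and `werfe_style_votes` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC


def rfe_best_subset(X_tr, y_tr, X_te, y_te, make_model, weights_of, drop_frac=0.1):
    """Hypothetical helper: recursively drop the lowest-weighted genes and
    return the gene subset whose model scored best on the held-out fold."""
    genes = np.arange(X_tr.shape[1])
    best_acc, best_genes = -1.0, genes
    while True:
        model = make_model().fit(X_tr[:, genes], y_tr)
        acc = model.score(X_te[:, genes], y_te)  # test-fold accuracy
        if acc >= best_acc:  # ties favour the smaller (later) subset
            best_acc, best_genes = acc, genes
        if len(genes) == 1:
            return best_genes
        w = weights_of(model)  # one weight per remaining gene
        n_drop = max(1, int(drop_frac * len(genes)))  # assumed removal rule
        keep = np.sort(np.argsort(w)[n_drop:])  # discard the n_drop smallest weights
        genes = genes[keep]


def werfe_style_votes(X, y, n_folds=10, seed=0):
    """Hypothetical driver: run SVM-RFE and RF-RFE in every fold, then vote."""
    svm = (lambda: LinearSVC(dual=False, max_iter=5000),
           lambda m: (m.coef_ ** 2).sum(axis=0))  # squared SVM weights
    rf = (lambda: RandomForestClassifier(n_estimators=50, random_state=seed),
          lambda m: m.feature_importances_)  # RF importances
    votes = np.zeros(X.shape[1], dtype=int)
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for tr, te in folds.split(X, y):
        for make_model, weights_of in (svm, rf):
            subset = rfe_best_subset(X[tr], y[tr], X[te], y[te], make_model, weights_of)
            votes[subset] += 1  # one vote per selected subset containing the gene
    ranking = np.argsort(-votes, kind="stable")  # most-voted genes first
    return votes, ranking


if __name__ == "__main__":
    # Synthetic stand-in data; not one of the paper's data sets.
    X, y = make_classification(n_samples=100, n_features=20, n_informative=4,
                               shuffle=False, random_state=0)
    votes, ranking = werfe_style_votes(X, y, n_folds=5)
    print("top genes:", ranking[:5], "votes:", votes[ranking[:5]])
```

Keeping the per-fold best subset, rather than a fixed size, lets each selector decide how many genes to vote for; a vote threshold applied afterwards then controls the size of the final subset.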