MinE-RFE: determine the optimal subset from RFE by minimizing the subset-accuracy-defined energy.

Ran Su, Xinyi Liu, Leyi Wei.

Abstract

Recursive feature elimination (RFE), one of the most popular feature selection algorithms, has been extensively applied in bioinformatics. During training, a group of candidate subsets is generated by iteratively eliminating the least important features from the original feature set. However, how to determine the optimal subset among them remains unclear. In most current studies, either overall accuracy or subset size (SS) is used to select the most predictive features. Whether to use one criterion or both, and how each affects prediction performance, are still open questions. In this study, we proposed MinE-RFE, a novel RFE-based feature selection approach that takes the effect of both factors into account. The subset decision problem was mapped into subset-accuracy space, where it became an energy-minimization problem. We also provided a mathematical description of the relationship between overall accuracy and SS using Gaussian Mixture Models together with spline fitting. In addition, we comprehensively reviewed a variety of state-of-the-art bioinformatics applications that use RFE. We compared their approaches to choosing the final subset from the candidate subsets with MinE-RFE on diverse bioinformatics data sets. We also compared MinE-RFE with several widely used feature selection algorithms. The comparative results demonstrate that the proposed approach exhibits the best performance among all the approaches compared. To facilitate the use of MinE-RFE, we further established a user-friendly web server implementing the proposed approach, accessible at http://qgking.wicp.net/MinE/. We expect this web server will be a useful tool for the research community.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
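
The abstract describes two steps: RFE produces candidate subsets by repeatedly dropping the least important feature, and one subset is then chosen by minimizing an energy defined over subset size and accuracy. The sketch below illustrates only that general idea. The abstract does not give the energy function, so the weighted sum used here, the `alpha` parameter, the function names and the toy feature weights are all assumptions made for illustration, and the Gaussian Mixture Model and spline-fitting steps are not reproduced.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[Tuple[str, ...], float]  # (feature subset, accuracy)


def rfe_candidates(
    features: Sequence[str],
    importance: Callable[[Sequence[str]], List[float]],
    accuracy: Callable[[Sequence[str]], float],
) -> List[Candidate]:
    """Build candidate subsets by dropping the least important feature each round."""
    current = list(features)
    candidates: List[Candidate] = []
    while current:
        candidates.append((tuple(current), accuracy(current)))
        if len(current) == 1:
            break
        scores = importance(current)
        worst = min(range(len(current)), key=lambda i: scores[i])
        current.pop(worst)
    return candidates


def select_min_energy(
    candidates: Sequence[Candidate], n_total: int, alpha: float = 0.5
) -> Candidate:
    """Pick the candidate with the lowest energy.

    ASSUMPTION: the energy is a hypothetical weighted sum of the error rate
    and the relative subset size; it is not the definition used by MinE-RFE.
    """
    def energy(item: Candidate) -> float:
        subset, acc = item
        return alpha * (1.0 - acc) + (1.0 - alpha) * len(subset) / n_total

    return min(candidates, key=energy)


# Toy data (hypothetical): each feature adds a fixed amount to the accuracy.
weights = {"f1": 0.25, "f2": 0.15, "f3": 0.02, "f4": 0.01}


def toy_importance(subset: Sequence[str]) -> List[float]:
    return [weights[f] for f in subset]


def toy_accuracy(subset: Sequence[str]) -> float:
    return 0.5 + sum(weights[f] for f in subset)


candidates = rfe_candidates(list(weights), toy_importance, toy_accuracy)
best = select_min_energy(candidates, n_total=len(weights), alpha=0.8)
print(best[0])
```

In a real setting the importance and accuracy callables would wrap a trained classifier and a cross-validation estimate; the trade-off weight is the part most likely to differ from the published method.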

Keywords:  Gaussian Mixture Model; bioinformatics; recursive feature elimination; subset-accuracy space optimization

Year:  2020        PMID: 30860571     DOI: 10.1093/bib/bbz021

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  6 in total

1.  iDNA-MT: Identification DNA Modification Sites in Multiple Species by Using Multi-Task Learning Based a Neural Network Tool.

Authors:  Xiao Yang; Xiucai Ye; Xuehong Li; Lesong Wei
Journal:  Front Genet       Date:  2021-03-31       Impact factor: 4.599

2.  A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD.

Authors:  Zhiyu Tao; Yanjuan Li; Zhixia Teng; Yuming Zhao
Journal:  Comput Math Methods Med       Date:  2020-10-19       Impact factor: 2.238

3.  Application of Multilayer Network Models in Bioinformatics. (Review)

Authors:  Yuanyuan Lv; Shan Huang; Tianjiao Zhang; Bo Gao
Journal:  Front Genet       Date:  2021-03-31       Impact factor: 4.599

4.  SNAREs-SAP: SNARE Proteins Identification With PSSM Profiles.

Authors:  Zixiao Zhang; Yue Gong; Bo Gao; Hongfei Li; Wentao Gao; Yuming Zhao; Benzhi Dong
Journal:  Front Genet       Date:  2021-12-20       Impact factor: 4.599

5.  4mCPred-MTL: Accurate Identification of DNA 4mC Sites in Multiple Species Using Multi-Task Deep Learning Based on Multi-Head Attention Mechanism.

Authors:  Rao Zeng; Song Cheng; Minghong Liao
Journal:  Front Cell Dev Biol       Date:  2021-05-10

6.  WERFE: A Gene Selection Algorithm Based on Recursive Feature Elimination and Ensemble Strategy.

Authors:  Qi Chen; Zhaopeng Meng; Ran Su
Journal:  Front Bioeng Biotechnol       Date:  2020-05-28
