A filter feature selection method based on the Maximal Information Coefficient and Gram-Schmidt Orthogonalization for biomedical data mining.

Hongqiang Lyu, Mingxi Wan, Jiuqiang Han, Ruiling Liu, Cheng Wang.

Abstract

Filter feature selection techniques are widely used to mine biomedical data. Recent work has revealed a risk in the classical filter method minimal-Redundancy-Maximal-Relevance (mRMR): a specific part of the redundancy, called irrelevant redundancy, may be involved in its minimal-redundancy component. A few attempts have therefore been made to eliminate the irrelevant redundancy by attaching additional procedures to mRMR, such as Kernel Canonical Correlation Analysis based mRMR (KCCAmRMR). In the present study, a novel filter feature selection method based on the Maximal Information Coefficient (MIC) and Gram-Schmidt Orthogonalization (GSO), named Orthogonal MIC Feature Selection (OMICFS), is proposed to solve this problem. Unlike other improved approaches under the max-relevance and min-redundancy criterion, the proposed method uses the MIC to quantify the relevance between the feature variables and the target variable, and uses the GSO to compute the orthogonalized variable of a candidate feature with respect to the previously selected features. Max-relevance and min-redundancy are then optimized indirectly by maximizing the MIC relevance between the GSO-orthogonalized variable and the target. This orthogonalization strategy allows OMICFS to exclude the irrelevant redundancy without any additional procedures. To verify its performance, OMICFS was compared with other filter feature selection methods in terms of both classification accuracy and computational efficiency in classification experiments on two types of biomedical datasets. The results showed that OMICFS outperformed the other methods in most cases. In addition, the differences between these methods were analyzed, and the application of OMICFS to the mining of high-dimensional biomedical data was discussed. The Matlab code for the proposed method is available at https://github.com/lhqxinghun/bioinformatics/tree/master/OMICFS/.
Copyright © 2017 Elsevier Ltd. All rights reserved.
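
The selection loop described in the abstract (orthogonalize each candidate against the features already selected, then keep the candidate whose orthogonalized variable is most relevant to the target) can be illustrated with a minimal sketch. This is not the authors' Matlab implementation. It is a pure-Python illustration under stated assumptions: the absolute Pearson correlation is swapped in as a stand-in for the MIC (so only linear dependence is captured), features are mean-centered before orthogonalization, and all function names (`relevance`, `orthogonalize`, `greedy_orthogonal_selection`) are hypothetical.

```python
import math


def center(v):
    """Subtract the mean from a vector given as a list of floats."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def relevance(x, y):
    """ASSUMPTION: stand-in for MIC. Returns |Pearson correlation| of x and y."""
    xc, yc = center(x), center(y)
    denom = math.sqrt(dot(xc, xc) * dot(yc, yc))
    return abs(dot(xc, yc)) / denom if denom > 1e-12 else 0.0


def orthogonalize(x, basis):
    """Gram-Schmidt step: remove from x its projection onto each basis vector."""
    r = list(x)
    for b in basis:
        bb = dot(b, b)
        if bb > 1e-12:
            coef = dot(r, b) / bb
            r = [ri - coef * bi for ri, bi in zip(r, b)]
    return r


def greedy_orthogonal_selection(features, target, k):
    """Greedy selection over feature columns; returns the chosen column indices.

    features: list of columns, each a list of floats of equal length.
    """
    cols = [center(f) for f in features]
    selected, basis = [], []
    for _ in range(min(k, len(cols))):
        best, best_score, best_res = None, -1.0, None
        for j, col in enumerate(cols):
            if j in selected:
                continue
            # Score the part of the candidate not explained by selected features.
            res = orthogonalize(col, basis)
            score = relevance(res, target)
            if score > best_score:
                best, best_score, best_res = j, score, res
        selected.append(best)
        basis.append(best_res)  # residuals are mutually orthogonal by construction
    return selected


f0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
f1 = list(f0)                               # exact copy of f0 (fully redundant)
f2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
target = [2 * a + b for a, b in zip(f0, f2)]
print(greedy_orthogonal_selection([f0, f1, f2], target, 2))
```

In this toy example the copy `f1` has the same marginal relevance as `f0`, but once `f0` is selected its orthogonalized residual is zero, so it scores 0 and `f2` is chosen instead. No separate redundancy term is computed: redundancy is removed by the orthogonalization itself, which is the idea the abstract describes.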

Keywords:  Biomedical data mining; Filter feature selection; Gram-Schmidt Orthogonalization (GSO); Maximal Information Coefficient (MIC)

Year: 2017; PMID: 28850898; DOI: 10.1016/j.compbiomed.2017.08.021

Source DB: PubMed; Journal: Comput Biol Med; ISSN: 0010-4825; Impact factor: 4.589


  4 in total

1.  Sparse support vector machines with L0 approximation for ultra-high dimensional omics data.

Authors:  Zhenqiu Liu; David Elashoff; Steven Piantadosi
Journal:  Artif Intell Med       Date:  2019-04-30       Impact factor: 5.326

2.  A Neighborhood Rough Sets-Based Attribute Reduction Method Using Lebesgue and Entropy Measures.

Authors:  Lin Sun; Lanying Wang; Jiucheng Xu; Shiguang Zhang
Journal:  Entropy (Basel)       Date:  2019-02-01       Impact factor: 2.524

3.  A Feature Selection Algorithm Integrating Maximum Classification Information and Minimum Interaction Feature Dependency Information.

Authors:  Li Zhang
Journal:  Comput Intell Neurosci       Date:  2021-12-28

4.  HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis.

Authors:  Yajie Meng; Min Jin
Journal:  Front Cell Dev Biol       Date:  2021-06-30