An Efficient, Parallelized Algorithm for Optimal Conditional Entropy-Based Feature Selection.

Gustavo Estrela, Marco Dimas Gubitoso, Carlos Eduardo Ferreira, Junior Barrera, Marcelo S Reis.

Abstract

In Machine Learning, feature selection is an important step in classifier design. It consists of finding a subset of features that is optimum for a given cost function. One way to solve feature selection is to organize all possible feature subsets into a Boolean lattice and to exploit the fact that the costs of chains in that lattice describe U-shaped curves. Minimizing such a cost function is known as the U-curve problem. Recently, a study proposed U-Curve Search (UCS), an optimal algorithm for that problem, which was successfully used for feature selection. However, despite the algorithm's optimality, the time UCS required in computational assays was exponential in the number of features. Here, we report that this scalability issue arises because the U-curve problem is NP-hard. We then introduce the Parallel U-Curve Search (PUCS), a new algorithm for the U-curve problem. In PUCS, we present a novel way to partition the search space into smaller Boolean lattices, thus rendering the algorithm highly parallelizable. We also provide computational assays with both synthetic data and Machine Learning datasets, in which the performance of PUCS was assessed against UCS and other gold-standard feature selection algorithms.
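The idea of splitting the Boolean lattice into smaller lattices that can be searched independently can be illustrated with a minimal Python sketch. It is not the PUCS algorithm itself, whose partitioning scheme and per-lattice search are not detailed in the abstract. The sketch assumes one simple partition: a few features are fixed, and each in/out assignment of them defines a smaller Boolean lattice over the remaining features. The names `toy_cost`, `search_part`, `partitioned_search`, and the `TARGET` subset are hypothetical, and plain exhaustive search stands in for whatever search is actually run inside each part.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical optimum subset, used only to build a toy cost function.
TARGET = frozenset({1, 3, 4})


def toy_cost(subset):
    """Toy cost whose chains are U-shaped (an assumption for illustration).

    It is the max of a term that never increases as features are added
    (missing target features) and a term that never decreases (extra
    features), so for X <= Y <= Z we get cost(Y) <= max(cost(X), cost(Z)).
    """
    missing = len(TARGET - subset)
    extra = len(subset - TARGET)
    return max(missing, extra)


def search_part(fixed_in, free):
    """Exhaustively search one small lattice: subsets that contain
    exactly `fixed_in` among the fixed features, plus any free features."""
    best = None
    for mask in product((0, 1), repeat=len(free)):
        subset = fixed_in | frozenset(f for f, bit in zip(free, mask) if bit)
        candidate = (toy_cost(subset), sorted(subset))
        if best is None or candidate < best:
            best = candidate
    return best


def partitioned_search(n_features, n_fixed):
    """Split the lattice over n_features into 2**n_fixed smaller lattices
    and search them concurrently; return the best (cost, subset) found."""
    fixed = list(range(n_fixed))
    free = list(range(n_fixed, n_features))
    parts = [
        frozenset(f for f, bit in zip(fixed, mask) if bit)
        for mask in product((0, 1), repeat=n_fixed)
    ]
    # Threads keep the sketch self-contained; CPU-bound Python code would
    # need a process pool (or native code) to see a real speedup.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda part: search_part(part, free), parts))
    return min(results)


best_cost, best_subset = partitioned_search(n_features=6, n_fixed=2)
print(best_cost, best_subset)  # 0 [1, 3, 4]
```

Because the parts are disjoint and together cover every feature subset, the best result over all parts is the global minimum, and the parts share no state, which is what makes this kind of partition easy to parallelize.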

Keywords:  Boolean lattice; Support-Vector Machine; U-curve problem; classifier design; feature selection; information theory; machine learning; mean conditional entropy; supervised learning

Year:  2020        PMID: 33286261      PMCID: PMC7516975          DOI: 10.3390/e22040492

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


