
Simple strategies for semi-supervised feature selection.

Konstantinos Sechidis, Gavin Brown.

Abstract

What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this: how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and some unlabelled, we could assume the unlabelled data are all positives, or assume them all negatives. These minimalist, seemingly naive, approaches have not previously been studied in depth. However, with theoretical and empirical studies, we show they provide powerful results for feature selection, via hypothesis testing and feature ranking. Combining them with some "soft" prior knowledge of the domain, we derive two novel algorithms (Semi-JMI, Semi-IAMB) that outperform significantly more complex competing methods, showing particularly good performance when the labels are missing-not-at-random. We conclude that simple approaches to this problem can work surprisingly well, and in many situations we can provably recover the exact feature selection dynamics, as if we had labelled the entire dataset.
© The Author(s) 2017.
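The "assume all unlabelled are negative" strategy from the abstract can be sketched in a few lines: replace every missing label with the negative class, then rank features by mutual information with this surrogate label. This is a minimal illustrative sketch, assuming discrete features; the function names and the plug-in MI estimator are this note's own, not the paper's implementation of Semi-JMI or Semi-IAMB.

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of mutual information (in nats) between two discrete arrays."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                px = np.mean(x == xv)
                py = np.mean(y == yv)
                mi += pxy * np.log(pxy / (px * py))
    return mi

def rank_features_assume_negative(X, y_partial):
    """Rank columns of X by MI with a surrogate label in which every
    unlabelled example (encoded -1) is treated as a negative (0)."""
    y_surrogate = np.where(y_partial == -1, 0, y_partial)
    scores = [mutual_information(X[:, j], y_surrogate) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1], scores

# Demo: feature 0 carries the class, feature 1 is pure noise; half of the
# labels are hidden (-1 = unlabelled), as in a positive-unlabelled setting.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
X = np.column_stack([y_true, rng.integers(0, 2, size=500)])
y_partial = np.where(rng.random(500) < 0.5, -1, y_true)
order, scores = rank_features_assume_negative(X, y_partial)
```

Even though the surrogate labels are wrong for the hidden positives, the informative feature still scores far above the noise feature, which is the intuition behind the paper's claim that these naive strategies preserve the feature ranking.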

Keywords:  Feature selection; Positive unlabelled; Semi-supervised

Year:  2017        PMID: 31983804      PMCID: PMC6954040          DOI: 10.1007/s10994-017-5648-2

Source DB:  PubMed          Journal:  Mach Learn        ISSN: 0885-6125            Impact factor:   2.940


  7 in total

1.  Supervised, Unsupervised, and Semi-Supervised Feature Selection: A Review on Gene Selection.

Authors:  Jun Chin Ang; Andri Mirzal; Habibollah Haron; Haza Nuzly Abdull Hamed
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2015-09-14       Impact factor: 3.710

2.  Towards Making Unlabeled Data Never Hurt.

Authors:  Yu-Feng Li; Zhi-Hua Zhou
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2015-01       Impact factor: 6.226

3.  Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy.

Authors:  Hanchuan Peng; Fuhui Long; Chris Ding
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2005-08       Impact factor: 6.226

4.  Semi-supervised learning of class balance under class-prior change by distribution matching.

Authors:  Marthinus Christoffel du Plessis; Masashi Sugiyama
Journal:  Neural Netw       Date:  2013-11-18

5.  Semisupervised Feature Selection Based on Relevance and Redundancy Criteria.

Authors:  Jin Xu; Bo Tang; Haibo He; Hong Man
Journal:  IEEE Trans Neural Netw Learn Syst       Date:  2016-05-20       Impact factor: 10.451

6.  Contrastive Pessimistic Likelihood Estimation for Semi-Supervised Classification.

Authors:  Marco Loog
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2016-03       Impact factor: 6.226

7.  MINT: Mutual Information Based Transductive Feature Selection for Genetic Trait Prediction.

Authors:  Dan He; Irina Rish; David Haws; Laxmi Parida
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2016 May-Jun       Impact factor: 3.710

