
Simple strategies for semi-supervised feature selection.

Konstantinos Sechidis, Gavin Brown.

Abstract

What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this: how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and some unlabelled, we could assume the unlabelled data are all positives, or assume them all negatives. These minimalist, seemingly naive, approaches have not previously been studied in depth. However, with theoretical and empirical studies, we show they provide powerful results for feature selection, via hypothesis testing and feature ranking. Combining them with some "soft" prior knowledge of the domain, we derive two novel algorithms (Semi-JMI, Semi-IAMB) that outperform significantly more complex competing methods, showing particularly good performance when the labels are missing-not-at-random. We conclude that simple approaches to this problem can work surprisingly well, and in many situations we can provably recover the exact feature selection dynamics, as if we had labelled the entire dataset.
© The Author(s) 2017.
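The "assume all unlabelled are negative" strategy from the abstract can be sketched in a few lines: replace every missing label with the negative class, then rank features by mutual information with this surrogate label. This is a minimal illustrative sketch, assuming discrete features; the function names and the plug-in MI estimator are this note's own, not the paper's implementation of Semi-JMI or Semi-IAMB.

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of mutual information (in nats) between two discrete arrays."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                px = np.mean(x == xv)
                py = np.mean(y == yv)
                mi += pxy * np.log(pxy / (px * py))
    return mi

def rank_features_assume_negative(X, y_partial):
    """Rank columns of X by MI with a surrogate label in which every
    unlabelled example (encoded -1) is treated as a negative (0)."""
    y_surrogate = np.where(y_partial == -1, 0, y_partial)
    scores = [mutual_information(X[:, j], y_surrogate) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1], scores

# Demo: feature 0 carries the class, feature 1 is pure noise; half of the
# labels are hidden (-1 = unlabelled), as in a positive-unlabelled setting.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
X = np.column_stack([y_true, rng.integers(0, 2, size=500)])
y_partial = np.where(rng.random(500) < 0.5, -1, y_true)
order, scores = rank_features_assume_negative(X, y_partial)
```

Even though the surrogate labels are wrong for the hidden positives, the informative feature still scores far above the noise feature, which is the intuition behind the paper's claim that these naive strategies preserve the feature ranking.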

Keywords:  Feature selection; Positive unlabelled; Semi-supervised

Year:  2017        PMID: 31983804      PMCID: PMC6954040          DOI: 10.1007/s10994-017-5648-2

Source DB:  PubMed          Journal:  Mach Learn        ISSN: 0885-6125            Impact factor:   2.940


  7 in total

1.  Supervised, Unsupervised, and Semi-Supervised Feature Selection: A Review on Gene Selection.

Authors:  Jun Chin Ang; Andri Mirzal; Habibollah Haron; Haza Nuzly Abdull Hamed
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2015-09-14       Impact factor: 3.710

2.  Towards Making Unlabeled Data Never Hurt.

Authors:  Yu-Feng Li; Zhi-Hua Zhou
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2015-01       Impact factor: 6.226

3.  Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy.

Authors:  Hanchuan Peng; Fuhui Long; Chris Ding
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2005-08       Impact factor: 6.226

4.  Semi-supervised learning of class balance under class-prior change by distribution matching.

Authors:  Marthinus Christoffel du Plessis; Masashi Sugiyama
Journal:  Neural Netw       Date:  2013-11-18

5.  Semisupervised Feature Selection Based on Relevance and Redundancy Criteria.

Authors:  Jin Xu; Bo Tang; Haibo He; Hong Man
Journal:  IEEE Trans Neural Netw Learn Syst       Date:  2016-05-20       Impact factor: 10.451

6.  Contrastive Pessimistic Likelihood Estimation for Semi-Supervised Classification.

Authors:  Marco Loog
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2016-03       Impact factor: 6.226

7.  MINT: Mutual Information Based Transductive Feature Selection for Genetic Trait Prediction.

Authors:  Dan He; Irina Rish; David Haws; Laxmi Parida
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2016 May-Jun       Impact factor: 3.710

