Deep unsupervised feature selection by discarding nuisance and correlated features.

Uri Shaham, Ofir Lindenbaum, Jonathan Svirsky, Yuval Kluger.

Abstract

Modern datasets often contain large subsets of correlated features and nuisance features, which are unrelated or only loosely related to the main underlying structures of the data. Nuisance features can be identified using the Laplacian score criterion, which evaluates the importance of a given feature via its consistency with the leading eigenvectors of the graph Laplacian. We demonstrate that in the presence of large numbers of nuisance features, the Laplacian must be computed on the subset of selected features rather than on the complete feature set. To do this, we propose a fully differentiable approach for unsupervised feature selection, utilizing the Laplacian score criterion to avoid the selection of nuisance features. To cope with correlated features, we employ an autoencoder architecture trained to reconstruct the data from the subset of selected features. We build on the recently proposed concrete layer, which allows controlling the number of selected features via architectural design and thereby simplifies the optimization process. Experimenting on several real-world datasets, we demonstrate that our proposed approach outperforms similar approaches designed to avoid only correlated or only nuisance features, but not both. Several state-of-the-art clustering results are reported. Our code is publicly available at https://github.com/jsvir/lscae.
Copyright © 2022 Elsevier Ltd. All rights reserved.
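
For readers unfamiliar with the Laplacian score criterion mentioned in the abstract, the following is a minimal sketch of the classic score computed on a dense Gaussian-kernel graph. It is not taken from the authors' repository and may differ from the variant used in the paper; the function name `laplacian_scores`, the `sigma` parameter, and the choice of a fully connected graph are assumptions made for illustration. In this classic formulation, a lower score indicates a feature that varies smoothly over the graph.

```python
import math


def laplacian_scores(X, sigma=1.0):
    """Classic Laplacian score for each feature (lower = smoother on the graph).

    X is a list of n samples, each a list of d feature values.
    Hypothetical helper for illustration; the graph here is built from ALL
    features, which is the setting the abstract argues can fail when
    nuisance features dominate.
    """
    n, d = len(X), len(X[0])

    # Gaussian affinity W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            W[i][j] = W[j][i] = math.exp(-dist2 / (2.0 * sigma ** 2))

    deg = [sum(row) for row in W]  # diagonal of the degree matrix D
    total = sum(deg)

    scores = []
    for k in range(d):
        f = [x[k] for x in X]
        # Remove the degree-weighted mean so constant features are not favored.
        mean = sum(fi * di for fi, di in zip(f, deg)) / total
        g = [fi - mean for fi in f]
        # g^T L g with L = D - W equals 0.5 * sum_ij W_ij (g_i - g_j)^2.
        num = 0.5 * sum(
            W[i][j] * (g[i] - g[j]) ** 2 for i in range(n) for j in range(n)
        )
        den = sum(gi * gi * di for gi, di in zip(g, deg))  # g^T D g
        scores.append(num / den if den > 0 else float("inf"))
    return scores
```

On a toy dataset in which one feature separates two clusters and another merely oscillates within them, the cluster-separating feature should receive the lower score. The cost is quadratic in the number of samples, so this form is suitable only for small examples.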

Keywords:  Concrete layer; Laplacian score; Unsupervised feature selection

Year:  2022        PMID: 35500458      PMCID: PMC9526895          DOI: 10.1016/j.neunet.2022.04.002

Source DB:  PubMed          Journal:  Neural Netw        ISSN: 0893-6080
