Novel unsupervised feature filtering of biological data.

Roy Varshavsky, Assaf Gottlieb, Michal Linial, David Horn.

Abstract

MOTIVATION: Many methods have been developed for selecting small informative feature subsets in large noisy data. However, unsupervised methods are scarce; examples include ranking features by the variance of the data collected for each feature, or by the projection of the feature on the first principal component. We propose a novel unsupervised criterion, based on SVD-entropy, that selects a feature according to its contribution to the entropy (CE), calculated on a leave-one-out basis. This can be implemented in four ways: simple ranking according to CE values (SR); forward selection by accumulating features according to which set produces the highest entropy (FS1); forward selection by accumulating features through the choice of the best CE out of the remaining ones (FS2); and backward elimination (BE) of features with the lowest CE.
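The leave-one-out CE criterion and the simple ranking (SR) variant can be illustrated with a minimal sketch. The abstract does not state the entropy formula, so the code assumes a commonly used definition of SVD-entropy: the Shannon entropy of the normalized squared singular values, divided by the logarithm of their count. It also assumes features are stored as rows and samples as columns. The function names `svd_entropy`, `contribution_to_entropy` and `simple_ranking` are hypothetical and are not taken from the paper.

```python
import numpy as np

def svd_entropy(A):
    """Normalized Shannon entropy of the squared singular values of A.

    Assumed definition (not given in the abstract):
    V_j = s_j^2 / sum_k s_k^2,  E = -(1 / log N) * sum_j V_j log V_j,
    where N is the number of singular values.
    """
    s = np.linalg.svd(np.asarray(A, dtype=float), compute_uv=False)
    total = np.sum(s ** 2)
    if len(s) < 2 or total == 0.0:
        return 0.0  # entropy is not meaningful for a single or all-zero spectrum
    v = s ** 2 / total
    v = v[v > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(v * np.log(v)) / np.log(len(s)))

def contribution_to_entropy(A):
    """CE_i = E(A) - E(A with feature row i left out), for every feature i."""
    A = np.asarray(A, dtype=float)
    full = svd_entropy(A)
    return np.array([full - svd_entropy(np.delete(A, i, axis=0))
                     for i in range(A.shape[0])])

def simple_ranking(A, k):
    """SR: indices of the k features with the highest CE, best first."""
    ce = contribution_to_entropy(A)
    return np.argsort(ce)[::-1][:k]
```

For a matrix with many more features than samples, removing one row leaves the number of singular values unchanged, so the normalization is the same for every leave-one-out computation. The forward selection and backward elimination variants would re-evaluate the entropy or CE as the candidate set changes; they are omitted here to keep the sketch short.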
RESULTS: We apply our methods to different benchmarks. In each case we evaluate the success of clustering the data in the selected feature spaces by measuring Jaccard scores with respect to known classifications. We demonstrate that feature filtering according to CE outperforms the variance method and gene-shaving. There are cases where the analysis, based on a small set of selected features, outperforms the best score reported when all information was used. Our method calls for an optimal size of the relevant feature set. This turns out to be just a few percent of the number of genes in the two leukemia datasets that we have analyzed. Moreover, the most favored selected genes turn out to have significant GO enrichment in relevant cellular processes.
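The evaluation step can be sketched as follows. The abstract does not say which form of the Jaccard score is used, so this example assumes the pair-counting form often applied to clusterings: the number of sample pairs grouped together in both the clustering and the known classification, divided by the number of pairs grouped together in at least one of them. The function name `pair_jaccard` is hypothetical.

```python
from itertools import combinations

def pair_jaccard(labels_pred, labels_true):
    """Pair-counting Jaccard score between a clustering and a known classification.

    J = n11 / (n11 + n10 + n01), where
      n11 = pairs placed together in both partitions,
      n10 = pairs together only in the clustering,
      n01 = pairs together only in the known classification.
    """
    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_pred = labels_pred[i] == labels_pred[j]
        same_true = labels_true[i] == labels_true[j]
        if same_pred and same_true:
            n11 += 1
        elif same_pred:
            n10 += 1
        elif same_true:
            n01 += 1
    denom = n11 + n10 + n01
    # If no pair is co-clustered in either partition, both consist of
    # singletons and are identical, so the score is taken to be 1 (a convention).
    return n11 / denom if denom else 1.0
```

The score depends only on which samples are grouped together, so renaming cluster labels does not change it: `pair_jaccard([0, 0, 1, 1], [1, 1, 0, 0])` compares two identical partitions.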

Year: 2006    PMID: 16873514    DOI: 10.1093/bioinformatics/btl214

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937

