A framework for feature selection in clustering.

Daniela M Witten, Robert Tibshirani.   

Abstract

We consider the problem of clustering observations using a potentially large set of features. One might expect that the true underlying clusters present in the data differ only with respect to a small fraction of the features, and will be missed if one clusters the observations using the full set of features. We propose a novel framework for sparse clustering, in which one clusters the observations using an adaptively chosen subset of the features. The method uses a lasso-type penalty to select the features. We use this framework to develop simple methods for sparse K-means and sparse hierarchical clustering. A single criterion governs both the selection of the features and the resulting clusters. These approaches are demonstrated on simulated data and on genomic data sets.
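The abstract does not spell out the optimization, so what follows is only a minimal sketch of one way a lasso-type penalty can select features for K-means. It assumes the formulation in which nonnegative feature weights w_j maximize the weighted between-cluster sum of squares, sum_j w_j * a_j, subject to an L2 bound (||w||_2 <= 1) and an L1 bound (||w||_1 <= s). For fixed cluster labels, that problem is solved by soft-thresholding followed by L2 normalization. All function names, the toy data, and the tuning value s are hypothetical choices made for illustration.

```python
import math

def bcss_per_feature(X, labels):
    """Between-cluster sum of squares per feature (total SS minus within-cluster SS)."""
    n, p = len(X), len(X[0])
    clusters = {}
    for row, lab in zip(X, labels):
        clusters.setdefault(lab, []).append(row)
    out = []
    for j in range(p):
        mean = sum(row[j] for row in X) / n
        total = sum((row[j] - mean) ** 2 for row in X)
        within = 0.0
        for rows in clusters.values():
            cmean = sum(r[j] for r in rows) / len(rows)
            within += sum((r[j] - cmean) ** 2 for r in rows)
        out.append(total - within)
    return out

def soft_threshold(a, delta):
    """Shrink nonnegative values toward zero by delta, clipping at zero."""
    return [max(v - delta, 0.0) for v in a]

def l2_normalize(v):
    """Scale v to unit Euclidean length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def update_weights(a, s, iters=60):
    """Maximize sum_j w_j * a_j subject to ||w||_2 <= 1, ||w||_1 <= s, w_j >= 0."""
    if s < 1:
        raise ValueError("the L1 bound s must be at least 1")
    a = [max(v, 0.0) for v in a]      # guard against tiny negative round-off
    w = l2_normalize(a)
    if sum(w) <= s:                   # L1 constraint inactive: no thresholding
        return w
    lo, hi = 0.0, max(a)              # bisection on the threshold delta
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(l2_normalize(soft_threshold(a, mid))) > s:
            lo = mid
        else:
            hi = mid
    return l2_normalize(soft_threshold(a, hi))

# Toy data: features 0 and 1 separate the two clusters, feature 2 is noise.
X = [[0.0, 0.0, 1.0], [0.1, 0.2, 0.0], [5.0, 5.0, 1.0], [5.1, 4.9, 0.0]]
labels = [0, 0, 1, 1]
a = bcss_per_feature(X, labels)
w = update_weights(a, s=1.2)
print("per-feature BCSS:", a)
print("feature weights: ", w)
```

In a full procedure this weight update would alternate with a K-means step run on the reweighted features until the weights stabilize. Smaller values of s drive more weights to exactly zero, which is how the feature selection arises.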

Year: 2010; PMID: 20811510; PMCID: PMC2930825; DOI: 10.1198/jasa.2010.tm09415

Source DB: PubMed; Journal: J Am Stat Assoc; ISSN: 0162-1459; Impact factor: 5.033


  104 in total

1.  Sparse Biclustering of Transposable Data.

Authors:  Kean Ming Tan; Daniela M Witten
Journal:  J Comput Graph Stat       Date:  2014       Impact factor: 2.302

2.  Biclustering with heterogeneous variance.

Authors:  Guanhua Chen; Patrick F Sullivan; Michael R Kosorok
Journal:  Proc Natl Acad Sci U S A       Date:  2013-07-08       Impact factor: 11.205

3.  Biclustering via sparse clustering.

Authors:  Erika S Helgeson; Qian Liu; Guanhua Chen; Michael R Kosorok; Eric Bair
Journal:  Biometrics       Date:  2019-10-14       Impact factor: 2.571

4.  Interactions between the tumor and the blood systemic response of breast cancer patients.

Authors:  Vanessa Dumeaux; Bjørn Fjukstad; Hans E Fjosne; Jan-Ole Frantzen; Marit Muri Holmen; Enno Rodegerdts; Ellen Schlichting; Anne-Lise Børresen-Dale; Lars Ailo Bongo; Eiliv Lund; Michael Hallett
Journal:  PLoS Comput Biol       Date:  2017-09-28       Impact factor: 4.475

5.  Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery.

Authors:  Zhiguang Huo; George Tseng
Journal:  Ann Appl Stat       Date:  2017-07-20       Impact factor: 2.083

6.  On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling.

Authors:  J Fernando Vera; Rodrigo Macías
Journal:  Psychometrika       Date:  2021-05-19       Impact factor: 2.500

7.  Clustering of mRNA-Seq data based on alternative splicing patterns.

Authors:  Marla Johnson; Elizabeth Purdom
Journal:  Biostatistics       Date:  2017-04-01       Impact factor: 5.899

8.  clustvarsel: A Package Implementing Variable Selection for Gaussian Model-Based Clustering in R.

Authors:  Luca Scrucca; Adrian E Raftery
Journal:  J Stat Softw       Date:  2018-04-17       Impact factor: 6.440

9.  Translational Meta-analytical Methods to Localize the Regulatory Patterns of Neurological Disorders in the Human Brain.

Authors:  Vanessa Sochat; Maude David; Dennis P Wall
Journal:  AMIA Annu Symp Proc       Date:  2015-11-05

10.  Sparse cluster analysis of large-scale discrete variables with application to single nucleotide polymorphism data.

Authors:  Baolin Wu
Journal:  J Appl Stat       Date:  2012-11-21       Impact factor: 1.404
