Bagging to improve the accuracy of a clustering procedure.

Sandrine Dudoit, Jane Fridlyand.

Abstract

MOTIVATION: Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical question associated with tumor classification is the identification of new tumor classes using gene expression profiles. Essential aspects of this clustering problem include identifying accurate partitions of the tumor samples into clusters and assessing the confidence of cluster assignments for individual samples.
RESULTS: Two new resampling methods, inspired by bagging in prediction, are proposed to improve and assess the accuracy of a given clustering procedure. In these ensemble methods, a partitioning clustering procedure is applied to bootstrap learning sets, and the resulting multiple partitions are combined either by voting or by creating a new dissimilarity matrix. As in prediction, the motivation behind bagging is to reduce variability in the partitioning results via averaging. The performance of the new and existing methods was compared using simulated data and gene expression data from two recently published cancer microarray studies. The bagged clustering procedures were in general at least as accurate as, and often substantially more accurate than, a single application of the partitioning clustering procedure. A valuable by-product of bagged clustering is the set of cluster votes, which can be used to assess the confidence of cluster assignments for individual observations.
SUPPLEMENTARY INFORMATION: For supplementary information on datasets, analyses, and software, consult http://www.stat.berkeley.edu/~sandrine and http://www.bioconductor.org.
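The abstract describes two ways of combining partitions obtained on bootstrap learning sets: voting, and building a new dissimilarity matrix. The following is a minimal, hedged sketch of both ideas, not the authors' software. Assumptions: a plain Lloyd's k-means stands in for "a given partitioning clustering procedure" (the abstract does not name one); before voting, each bootstrap partition's labels are aligned to a reference partition of the full data by brute-force search over label permutations; and the dissimilarity between two observations is taken to be the fraction of bootstrap samples containing both in which they fall in different clusters. All function names (`kmeans`, `align`, `bagged_vote`, `bagged_dissimilarity`) are hypothetical.

```python
import itertools
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=20):
    """Lloyd's k-means; a hypothetical stand-in for any partitioning procedure."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (ties go to the lower index).
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def align(labels, ref, k):
    """Relabel `labels` to maximise agreement with `ref` (brute force over k! permutations)."""
    best = max(itertools.permutations(range(k)),
               key=lambda perm: sum(perm[lab] == r for lab, r in zip(labels, ref)))
    return [best[lab] for lab in labels]


def bagged_vote(points, k, n_boot=50, seed=0):
    """Combine bootstrap partitions by voting; returns (assignment, cluster votes)."""
    rng = random.Random(seed)
    n = len(points)
    ref = kmeans(points, k, rng)  # partition of the full learning set
    votes = [Counter() for _ in range(n)]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap learning set
        boot = kmeans([points[i] for i in idx], k, rng)
        aligned = align(boot, [ref[i] for i in idx], k)
        # One vote per observation per bootstrap sample (duplicates share a label).
        for i, lab in dict(zip(idx, aligned)).items():
            votes[i][lab] += 1
    assignment, confidence = [], []
    for i in range(n):
        if votes[i]:
            lab, count = votes[i].most_common(1)[0]
            assignment.append(lab)
            confidence.append(count / sum(votes[i].values()))
        else:  # never drawn: keep the reference label, confidence undefined
            assignment.append(ref[i])
            confidence.append(float("nan"))
    return assignment, confidence


def bagged_dissimilarity(points, k, n_boot=50, seed=0):
    """D[i][j] = share of bootstrap samples containing both i and j that split them."""
    rng = random.Random(seed)
    n = len(points)
    both = [[0] * n for _ in range(n)]
    together = [[0] * n for _ in range(n)]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        lab = dict(zip(idx, kmeans([points[i] for i in idx], k, rng)))
        for i, j in itertools.combinations(sorted(lab), 2):
            both[i][j] += 1
            together[i][j] += lab[i] == lab[j]
    dissim = [[0.0] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        # Assumption: pairs never sampled together get the maximal dissimilarity 1.
        d = 1 - together[i][j] / both[i][j] if both[i][j] else 1.0
        dissim[i][j] = dissim[j][i] = d
    return dissim
```

In the voting variant, the returned cluster vote for an observation is the share of its bootstrap votes that went to its final cluster, which plays the role of the per-observation confidence measure the abstract mentions. The co-membership dissimilarity needs no label alignment because it only asks whether two observations share a cluster; in a full procedure the matrix would then be passed to a partitioning method that accepts dissimilarities, a step this sketch omits. The brute-force alignment costs k! comparisons per bootstrap sample, so it is only practical for small k.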

Year:  2003        PMID: 12801869     DOI: 10.1093/bioinformatics/btg038

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  42 in total

1.  wCLUTO: a Web-enabled clustering toolkit.

Authors:  Matthew D Rasmussen; Mukund S Deshpande; George Karypis; James Johnson; John A Crow; Ernest F Retzel
Journal:  Plant Physiol       Date:  2003-10       Impact factor: 8.340

2.  Identification and clustering of event patterns from in vivo multiphoton optical recordings of neuronal ensembles.

Authors:  Ilker Ozden; H Megan Lee; Megan R Sullivan; Samuel S-H Wang
Journal:  J Neurophysiol       Date:  2008-05-21       Impact factor: 2.714

3.  Link-Prediction Enhanced Consensus Clustering for Complex Networks.

Authors:  Matthew Burgess; Eytan Adar; Michael Cafarella
Journal:  PLoS One       Date:  2016-05-20       Impact factor: 3.240

4.  Peeling off the hidden genetic heterogeneities of cancers based on disease-relevant functional modules.

Authors:  Jian-Zhen Xu; Zheng Guo; Min Zhang; Xia Li; Yong-Jin Li; Shao-Qi Rao
Journal:  Mol Med       Date:  2006 Jan-Mar       Impact factor: 6.354

5.  Ensemble Clustering using Semidefinite Programming with Applications.

Authors:  Vikas Singh; Lopamudra Mukherjee; Jiming Peng; Jinhui Xu
Journal:  Mach Learn       Date:  2010-05       Impact factor: 2.940

6.  Knowledge-guided gene ranking by coordinative component analysis.

Authors:  Chen Wang; Jianhua Xuan; Huai Li; Yue Wang; Ming Zhan; Eric P Hoffman; Robert Clarke
Journal:  BMC Bioinformatics       Date:  2010-03-30       Impact factor: 3.169

7.  Merged consensus clustering to assess and improve class discovery with microarray data.

Authors:  T Ian Simpson; J Douglas Armstrong; Andrew P Jarman
Journal:  BMC Bioinformatics       Date:  2010-12-03       Impact factor: 3.169

8.  Neural networks of colored sequence synesthesia.

Authors:  Steffie N Tomson; Manjari Narayan; Genevera I Allen; David M Eagleman
Journal:  J Neurosci       Date:  2013-08-28       Impact factor: 6.167

9.  Machine learning integration for predicting the effect of single amino acid substitutions on protein stability.

Authors:  Ayşegül Ozen; Mehmet Gönen; Ethem Alpaydın; Türkan Haliloğlu
Journal:  BMC Struct Biol       Date:  2009-10-19

10.  MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering.

Authors:  Eun-Youn Kim; Seon-Young Kim; Daniel Ashlock; Dougu Nam
Journal:  BMC Bioinformatics       Date:  2009-08-22       Impact factor: 3.169
