
Scoring clustering solutions by their biological relevance.

I Gat-Viks, R Sharan, R Shamir.

Abstract

MOTIVATION: A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works have addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed-upon guideline for choosing among them.
RESULTS: We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data.
AVAILABILITY: The software is available from the authors upon request.
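
The projection-and-scoring idea in RESULTS can be illustrated with a minimal Python sketch. It is not the authors' software, and several choices are assumptions made only for illustration: the projection is a Fisher-style discriminant direction with a small ridge term, the non-parametric test is a Kruskal-Wallis H statistic (ties are not rank-averaged), and confidence is estimated by permuting cluster labels and refitting the projection each time. The function names `discriminant_direction`, `kruskal_h` and `score_clustering`, and the parameters `ridge`, `n_perm` and `seed`, are hypothetical.

```python
import numpy as np


def discriminant_direction(X, labels, ridge=1e-6):
    """Unit vector w maximizing between-group over within-group scatter of X @ w."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sb = np.zeros((d, d))  # between-group scatter
    Sw = np.zeros((d, d))  # within-group scatter
    for g in np.unique(labels):
        Xg = X[labels == g]
        diff = (Xg.mean(axis=0) - mean_all)[:, None]
        Sb += len(Xg) * (diff @ diff.T)
        centered = Xg - Xg.mean(axis=0)
        Sw += centered.T @ centered
    Sw += ridge * np.eye(d)  # assumption: ridge keeps Sw positive definite
    # Whiten with the Cholesky factor so the ratio becomes an ordinary
    # symmetric eigenproblem: Sw = L L^T, M = L^-1 Sb L^-T.
    L = np.linalg.cholesky(Sw)
    L_inv = np.linalg.inv(L)
    M = L_inv @ Sb @ L_inv.T
    _, eigvecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    w = L_inv.T @ eigvecs[:, -1]         # map top eigenvector back
    return w / np.linalg.norm(w)


def kruskal_h(values, labels):
    """Kruskal-Wallis H statistic (assumes no tied values)."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    n = len(values)
    ranks = np.empty(n)
    ranks[np.argsort(values)] = np.arange(1, n + 1)
    total = 0.0
    for g in np.unique(labels):
        r = ranks[labels == g]
        total += r.sum() ** 2 / len(r)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def score_clustering(X, labels, n_perm=200, seed=0):
    """Return (H on the projected data, permutation p-value)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)

    def stat(lab):
        # The projection is refit for every labeling, so the null
        # distribution accounts for the direction being tuned to the labels.
        return kruskal_h(X @ discriminant_direction(X, lab), lab)

    observed = stat(labels)
    null = np.array([stat(rng.permutation(labels)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    labels = np.repeat([0, 1, 2], 20)     # a candidate clustering
    X = rng.normal(size=(60, 4))          # synthetic attribute vectors
    X[:, 0] += 3 * labels                 # attribute 0 tracks the clusters
    print(score_clustering(X, labels, n_perm=100))
    print(score_clustering(X, rng.permutation(labels), n_perm=100))
```

In this sketch a clustering that agrees with the attribute vectors should receive a larger H and a smaller p-value than a shuffled clustering of the same data, which is the kind of comparison between solutions that the abstract describes.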

Year:  2003        PMID: 14668221     DOI: 10.1093/bioinformatics/btg330

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

