
Comparing the performance of biomedical clustering methods.

Christian Wiwie, Jan Baumbach, Richard Röttger.

Abstract

Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art.
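The "more than 4 million" figure can be sanity-checked from the counts given in the abstract. The short sketch below computes only an upper bound. It assumes every method was run with the full 1,000 parameter sets on every data set and that all 13 indices were computed for each run; the abstract does not give the actual per-method breakdown. The function and constant names are illustrative and not part of ClustEval.

```python
# Counts taken from the abstract.
N_METHODS = 13         # clustering methods assessed
N_DATASETS = 24        # data sets
N_INDICES = 13         # cluster validity indices
MAX_PARAM_SETS = 1000  # "up to 1,000" parameter sets per method/data set pair


def max_index_evaluations(methods, datasets, param_sets, indices):
    """Upper bound on validity-index computations for a full sweep (assumed design)."""
    return methods * datasets * param_sets * indices


upper_bound = max_index_evaluations(N_METHODS, N_DATASETS, MAX_PARAM_SETS, N_INDICES)
print(upper_bound)  # 4056000
```

An upper bound of 4,056,000 is consistent with the reported total of more than 4 million, which suggests that most method/data set pairs were run with close to the maximum number of parameter sets.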


Year:  2015        PMID: 26389570     DOI: 10.1038/nmeth.3583

Source DB:  PubMed          Journal:  Nat Methods        ISSN: 1548-7091            Impact factor:   28.547


