
Guiding biomedical clustering with ClustEval.

Christian Wiwie, Jan Baumbach, Richard Röttger.

Abstract

Clustering is a popular technique for discovering groups of similar objects in large datasets. It is now applied in all areas of the life sciences, from biomedicine to physics. However, designing a high-quality cluster analysis is a tedious and complicated task that involves many choices along the way. Because a cluster analysis is often the first step of a downstream analysis, the clustering must be reliable, reproducible, and of the highest quality. To address these challenges, we recently developed ClustEval, an integrated and extensible platform for the automated and standardized design and execution of complex cluster analyses. It allows researchers to design and carry out cluster analyses involving a large number of clustering methods applied to many large datasets. ClustEval helps to shed light on all major aspects of cluster analysis, from choosing the right similarity function to using validity indices and data preprocessing protocols. Only this high degree of automation allows the researcher to easily run a clustering task with many different tools, parameters, and settings in order to obtain the best possible outcome. In this paper, we guide the user step by step through three fundamentally important and widely applicable use cases: (i) identification of the best clustering method for a new, user-given protein sequence similarity dataset; (ii) evaluation of the performance of a new, user-given clustering method (densityCut) against the state of the art; and (iii) prediction of the best method for a new protein sequence similarity dataset. This protocol guides the user through the most important features of ClustEval and takes ∼4 h to complete.
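The idea of running one clustering task under several parameter settings and ranking the results with a validity index can be illustrated with a minimal Python sketch. This is not ClustEval's interface: the threshold-based clustering, the pairwise F1 score, the toy similarity matrix, and every name below (`threshold_clustering`, `pairwise_f1`, `best_threshold`, `SIM`, `GOLD`) are assumptions chosen only for illustration.

```python
from itertools import combinations


def threshold_clustering(sim, threshold):
    """Link items whose similarity is >= threshold and return the
    connected components as one cluster label per item (hypothetical method)."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def pairwise_f1(pred, gold):
    """External validity index: F1 score over all item pairs that are
    co-clustered in the prediction versus the gold standard."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_pred = pred[i] == pred[j]
        same_gold = gold[i] == gold[j]
        if same_pred and same_gold:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(sim, gold, thresholds):
    """Evaluate every parameter value and return (best score, parameter)."""
    scored = [(pairwise_f1(threshold_clustering(sim, t), gold), t)
              for t in thresholds]
    return max(scored)


# Toy symmetric similarity matrix for five items (made-up values).
SIM = [
    [1.0, 0.9, 0.8, 0.2, 0.1],
    [0.9, 1.0, 0.7, 0.3, 0.2],
    [0.8, 0.7, 1.0, 0.4, 0.1],
    [0.2, 0.3, 0.4, 1.0, 0.9],
    [0.1, 0.2, 0.1, 0.9, 1.0],
]
GOLD = [0, 0, 0, 1, 1]  # assumed gold-standard grouping

score, threshold = best_threshold(SIM, GOLD, [0.1, 0.5, 0.95])
print(f"best threshold: {threshold}, pairwise F1: {score:.2f}")
```

On this toy input the sweep favours the middle threshold: the lowest one merges all items into a single cluster, and the highest one leaves every item as a singleton. A real analysis would sweep several methods and validity indices rather than one of each.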


Year: 2018; PMID: 29844526; DOI: 10.1038/nprot.2018.038

Source DB: PubMed; Journal: Nat Protoc; ISSN: 1750-2799; Impact factor: 13.491


  3 in total

1.  Causal Network Inference for Neural Ensemble Activity.

Authors:  Rong Chen
Journal:  Neuroinformatics       Date:  2021-01-04

2.  Distance-based clustering challenges for unbiased benchmarking studies.

Authors:  Michael C Thrun
Journal:  Sci Rep       Date:  2021-09-23       Impact factor: 4.379

Review 3.  Computational analyses of mechanism of action (MoA): data, methods and integration.

Authors:  Maria-Anna Trapotsi; Layla Hosseini-Gerami; Andreas Bender
Journal:  RSC Chem Biol       Date:  2021-12-22
