Improving the quality of protein similarity network clustering algorithms using the network edge weight distribution.

Leonard Apeltsin, John H Morris, Patricia C Babbitt, Thomas E Ferrin.

Abstract

MOTIVATION: Clustering protein sequence data into functionally specific families is a difficult but important problem in biological research. One useful approach is to represent the sequence dataset as a protein similarity network and then cluster the network using graph analysis techniques. Although many such network clustering algorithms have been developed over the past few years, comparing them is often difficult because their performance depends on the specifics of network construction. We investigate an important aspect of network construction used in analyzing protein superfamilies and present a heuristic approach for improving the performance of several algorithms.
RESULTS: We analyzed how the performance of network clustering algorithms relates to thresholding the network prior to clustering. Our results, over four different datasets, show that for each input dataset there exists an optimal threshold range over which an algorithm generates its most accurate clustering output. Our results further show that the optimal threshold range correlates with the shape of the edge weight distribution of the input similarity network. We used this correlation to develop an automated threshold selection heuristic for filtering a similarity network prior to clustering. This heuristic allows researchers to process their protein datasets with runtime-efficient network clustering algorithms without sacrificing the accuracy of the final clustering.
AVAILABILITY: Python code implementing the automated threshold selection heuristic, together with the datasets used in our analysis, is available at http://www.rbvi.ucsf.edu/Research/cytoscape/threshold_scripts.zip.
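
The abstract does not spell out the selection heuristic itself, so the following is only a minimal Python sketch of the two ingredients it names: inspecting the edge weight distribution of a similarity network, and thresholding the network before clustering. Everything in it is an assumption made for illustration: the toy network, the weights (imagined as similarity scores where higher means more similar), the function names, and the cutoffs. Connected components stand in for the clustering algorithms evaluated in the paper; this is not the authors' code.

```python
from collections import defaultdict

# Hypothetical toy network: (protein_a, protein_b, similarity_weight).
NODES = ["A", "B", "C", "D", "E", "F"]
EDGES = [
    ("A", "B", 90), ("B", "C", 85), ("A", "C", 80),
    ("D", "E", 95), ("E", "F", 88),
    ("C", "D", 12), ("A", "F", 8),   # weak links between the two groups
]

def threshold_edges(edges, cutoff):
    """Keep only edges whose weight is at least `cutoff`."""
    return [(u, v, w) for u, v, w in edges if w >= cutoff]

def weight_histogram(edges, n_bins=5):
    """Count edge weights in equal-width bins spanning [min, max]."""
    weights = [w for _, _, w in edges]
    lo, hi = min(weights), max(weights)
    width = (hi - lo) / n_bins or 1.0   # avoid zero width if all weights are equal
    counts = [0] * n_bins
    for w in weights:
        idx = min(int((w - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    return counts

def connected_components(nodes, edges):
    """Stand-in clustering: group nodes joined by any surviving edge."""
    adj = defaultdict(set)
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

if __name__ == "__main__":
    print("edge weight histogram:", weight_histogram(EDGES))
    for cutoff in (0, 50, 89):
        kept = threshold_edges(EDGES, cutoff)
        print(cutoff, len(kept), len(connected_components(NODES, kept)))
```

On this toy input the histogram is bimodal (two weak edges, five strong ones), and the number of clusters grows from 1 to 2 to 4 as the cutoff rises from 0 to 50 to 89. Only the middle cutoff, which falls in the gap between the two modes, recovers the two intended groups, which illustrates in miniature why a threshold might be chosen from the shape of the weight distribution.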

Year:  2010        PMID: 21118823      PMCID: PMC3031030          DOI: 10.1093/bioinformatics/btq655

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  15 in total

1.  Resolving the evolutionary relationships of molluscs with phylogenomic tools.

Authors:  Stephen A Smith; Nerida G Wilson; Freya E Goetz; Caitlin Feehery; Sónia C S Andrade; Greg W Rouse; Gonzalo Giribet; Casey W Dunn
Journal:  Nature       Date:  2011-10-26       Impact factor: 49.962

2.  AGeNNT: annotation of enzyme families by means of refined neighborhood networks.

Authors:  Florian Kandlinger; Maximilian G Plach; Rainer Merkl
Journal:  BMC Bioinformatics       Date:  2017-05-25       Impact factor: 3.169

3.  Structural and mechanistic characterization of L-histidinol phosphate phosphatase from the polymerase and histidinol phosphatase family of proteins.

Authors:  Swapnil V Ghodge; Alexander A Fedorov; Elena V Fedorov; Brandan Hillerich; Ronald Seidel; Steven C Almo; Frank M Raushel
Journal:  Biochemistry       Date:  2013-01-30       Impact factor: 3.162

4.  Clustering evolving proteins into homologous families.

Authors:  Cheong Xin Chan; Maisarah Mahbob; Mark A Ragan
Journal:  BMC Bioinformatics       Date:  2013-04-08       Impact factor: 3.169

5.  clusterMaker: a multi-algorithm clustering plugin for Cytoscape.

Authors:  John H Morris; Leonard Apeltsin; Aaron M Newman; Jan Baumbach; Tobias Wittkop; Gang Su; Gary D Bader; Thomas E Ferrin
Journal:  BMC Bioinformatics       Date:  2011-11-09       Impact factor: 3.307

6.  Assessing the functional coherence of modules found in multiple-evidence networks from Arabidopsis.

Authors:  Artem Lysenko; Michael Defoin-Platel; Keywan Hassani-Pak; Jan Taubert; Charlie Hodgman; Christopher J Rawlings; Mansoor Saqi
Journal:  BMC Bioinformatics       Date:  2011-05-25       Impact factor: 3.169

7.  Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity.

Authors:  Janelle B Leuthaeuser; Stacy T Knutson; Kiran Kumar; Patricia C Babbitt; Jacquelyn S Fetrow
Journal:  Protein Sci       Date:  2015-08-18       Impact factor: 6.725

8.  Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm.

Authors:  Theodore R Gibbons; Stephen M Mount; Endymion D Cooper; Charles F Delwiche
Journal:  BMC Bioinformatics       Date:  2015-07-10       Impact factor: 3.169

9.  A new computational approach redefines the subtelomeric vir superfamily of Plasmodium vivax.

Authors:  Francisco Javier Lopez; Maria Bernabeu; Carmen Fernandez-Becerra; Hernando A del Portillo
Journal:  BMC Genomics       Date:  2013-01-16       Impact factor: 3.969

10.  A network approach to analyzing highly recombinant malaria parasite genes.

Authors:  Daniel B Larremore; Aaron Clauset; Caroline O Buckee
Journal:  PLoS Comput Biol       Date:  2013-10-10       Impact factor: 4.475

