Improving the quality of protein similarity network clustering algorithms using the network edge weight distribution.

Leonard Apeltsin, John H Morris, Patricia C Babbitt, Thomas E Ferrin.

Abstract

MOTIVATION: Clustering protein sequence data into functionally specific families is a difficult but important problem in biological research. One useful approach represents the sequence dataset as a protein similarity network and then clusters the network using advanced graph analysis techniques. Although many such network clustering algorithms have been developed over the past few years, comparing them is often difficult because performance is affected by the specifics of network construction. We investigate an important aspect of network construction used in analyzing protein superfamilies and present a heuristic approach for improving the performance of several algorithms.
RESULTS: We analyzed how the performance of network clustering algorithms relates to thresholding the network prior to clustering. Our results, over four different datasets, show that for each input dataset there exists an optimal threshold range over which an algorithm generates its most accurate clustering output. They further show that the optimal threshold range correlates with the shape of the edge weight distribution of the input similarity network. We used this correlation to develop an automated threshold selection heuristic that filters a similarity network prior to clustering. This heuristic allows researchers to process their protein datasets with runtime-efficient network clustering algorithms without sacrificing the clustering accuracy of the final results.
AVAILABILITY: Python code implementing the automated threshold selection heuristic, together with the datasets used in our analysis, is available at http://www.rbvi.ucsf.edu/Research/cytoscape/threshold_scripts.zip.
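The thresholding step described under RESULTS can be pictured with a small sketch. This is not the authors' heuristic, which is only available in the linked scripts. It is a minimal illustration, on made-up data, of three ideas the abstract refers to: dropping edges below a similarity threshold, binning the edge weights to inspect their distribution, and clustering the filtered network. All function names, the toy edge list, the integer weights, and the use of connected components as a stand-in clustering algorithm are assumptions introduced for illustration.

```python
from collections import Counter


def threshold_edges(edges, threshold):
    """Keep only edges whose similarity weight is at least `threshold`."""
    return [(u, v, w) for u, v, w in edges if w >= threshold]


def weight_histogram(edges, bin_width):
    """Count edges per weight bin; each key is the lower edge of a bin."""
    counts = Counter()
    for _, _, w in edges:
        counts[int(w // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


def connected_components(nodes, edges):
    """Group nodes into connected components using union-find.

    Connected components are a stand-in here for a real network
    clustering algorithm; the choice is an assumption of this sketch.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v, _ in edges:
        parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical toy network: two tight families joined by weak edges.
# Weights are invented similarity scores (higher means more similar).
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [
    ("A", "B", 90), ("B", "C", 85), ("A", "C", 80),
    ("D", "E", 95), ("E", "F", 70),
    ("C", "D", 12), ("B", "F", 8),
]

histogram = weight_histogram(edges, bin_width=20)
unfiltered = connected_components(nodes, edges)
filtered = connected_components(nodes, threshold_edges(edges, 50))
```

In this toy case the unfiltered network collapses into a single component, while a threshold of 50 removes the two weak edges and separates the two families. The gap between the low and high bins of `histogram` is the kind of feature of the edge weight distribution that a threshold selection rule could exploit, although how the published heuristic actually uses the distribution is not stated in the abstract.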

Entities: Gene

Substances: Proteins

Year: 2010 PMID: 21118823 PMCID: PMC3031030 DOI: 10.1093/bioinformatics/btq655

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
