Clustering protein sequences--structure prediction by transitive homology.

E Bolten, A Schliep, S Schneckener, D Schomburg, R Schrader.

Abstract

MOTIVATION: It is widely believed that for two proteins A and B, a sequence identity above some threshold implies structural similarity due to a common evolutionary ancestor. Since this is only a sufficient, but not a necessary, condition for structural similarity, the question remains what other criteria can be used to identify remote homologues. Transitivity refers to the concept of deducing a structural similarity between proteins A and C from the existence of a third protein B, such that A and B as well as B and C are homologues, as ascertained when the sequence identity between A and B as well as that between B and C is above the aforementioned threshold. It is not fully understood whether transitivity always holds and whether it can be extended ad infinitum.
RESULTS: We developed a graph-based clustering approach, where transitivity plays a crucial role. We determined all pair-wise similarities for the sequences in the SwissProt database using the Smith-Waterman local alignment algorithm. This data was transformed into a directed graph, where protein sequences constitute vertices. A directed edge was drawn from vertex A to vertex B if the sequences A and B showed similarity, scaled with respect to the self-similarity of A, above a fixed threshold. Transitivity was important in the clustering process, as intermediate sequences were used, limited though by the requirement of having directed paths in both directions between proteins linked over such sequences. The length dependency (implied by the self-similarity) of the scaling of the alignment scores appears to be an effective criterion to avoid clustering errors due to multi-domain proteins. To deal with the resulting large graphs we have developed an efficient library. Methods include the novel graph-based clustering algorithm capable of handling multi-domain proteins and cluster comparison algorithms. Structural Classification of Proteins (SCOP) was used as an evaluation data set for our method, yielding a 24% improvement over pair-wise comparisons in terms of detecting remote homologues.
AVAILABILITY: The software is available to academic users on request from the authors.
CONTACT: e.bolten@science-factory.com; schliep@zpr.uni-koeln.de; s.schneckener@science-factory.com; d.schomburg@uni-koeln.de; schrader@zpr.uni-koeln.de.
SUPPLEMENTARY INFORMATION: http://www.zaik.uni-koeln.de/~schliep/ProtClust.html.
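
ILLUSTRATION: The graph construction described under RESULTS can be sketched in a few lines of Python. This is a minimal sketch and not the authors' software. It assumes that "directed paths in both directions" can be read as strongly connected components, which may differ from the published algorithm. The function names (build_graph, strongly_connected_components, cluster), the threshold of 0.5, and all scores are hypothetical values chosen for illustration.

```python
def build_graph(self_score, pair_score, threshold):
    """Draw a directed edge a -> b when score(a, b) / score(a, a) >= threshold.

    self_score: {name: alignment score of the sequence against itself}
    pair_score: {(a, b): local alignment score}, each unordered pair listed once
    All names and numbers used with this function are illustrative assumptions.
    """
    graph = {name: set() for name in self_score}
    for (a, b), score in pair_score.items():
        if score / self_score[a] >= threshold:
            graph[a].add(b)
        if score / self_score[b] >= threshold:
            graph[b].add(a)
    return graph


def strongly_connected_components(graph):
    """Kosaraju's algorithm, written iteratively to avoid recursion limits."""
    visited, order = set(), []
    for start in graph:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: node is finished.
                order.append(node)
                stack.pop()

    # Reverse every edge, then sweep vertices in decreasing finish time.
    reverse = {name: set() for name in graph}
    for source, targets in graph.items():
        for target in targets:
            reverse[target].add(source)

    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = set(), [start]
        while todo:
            node = todo.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(component)
    return components


def cluster(self_score, pair_score, threshold=0.5):
    """Cluster sequences that reach each other along directed edges."""
    graph = build_graph(self_score, pair_score, threshold)
    return strongly_connected_components(graph)


# Hypothetical scores: A, A2, A3 share one domain, B has another,
# and M is a long multi-domain protein containing both.
self_score = {"A": 100, "A2": 100, "A3": 100, "B": 100, "M": 300}
pair_score = {
    ("A", "A2"): 80, ("A2", "A3"): 80, ("A", "A3"): 30,
    ("A", "M"): 90, ("B", "M"): 90, ("A", "B"): 10,
}
clusters = cluster(self_score, pair_score)
print(sorted(sorted(c) for c in clusters))
```

In this toy input, A and A3 fall below the threshold when compared directly but are joined through the intermediate sequence A2. The multi-domain protein M receives edges from A and B but returns none, because its larger self-similarity lowers the scaled score, so A and B are not merged through it.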

Substances: Proteins

Year: 2001 PMID: 11673238 DOI: 10.1093/bioinformatics/17.10.935

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited by (15 in total)

1. Large-scale protein annotation through gene ontology.

Authors: Hanqing Xie; Alon Wasserman; Zurit Levine; Amit Novik; Vladimir Grebinskiy; Avi Shoshan; Liat Mintz
Journal: Genome Res Date: 2002-05 Impact factor: 9.043

2. Data clustering in life sciences. (Review)

Authors: Ying Zhao; George Karypis
Journal: Mol Biotechnol Date: 2005-09 Impact factor: 2.695

3. Transitive homology-guided structural studies lead to discovery of Cro proteins with 40% sequence identity but different folds.

Authors: Christian G Roessler; Branwen M Hall; William J Anderson; Wendy M Ingram; Sue A Roberts; William R Montfort; Matthew H J Cordes
Journal: Proc Natl Acad Sci U S A Date: 2008-01-28 Impact factor: 11.205

4. A common evolutionary origin for tailed-bacteriophage functional modules and bacterial machineries. (Review)

Authors: David Veesler; Christian Cambillau
Journal: Microbiol Mol Biol Rev Date: 2011-09 Impact factor: 11.056

5. The role of public goods in planetary evolution.

Authors: James O McInerney; Douglas H Erwin
Journal: Philos Trans A Math Phys Eng Sci Date: 2017-12-28 Impact factor: 4.226

6. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

Authors: Kevin Horan; Josh Lauricha; Julia Bailey-Serres; Natasha Raikhel; Thomas Girke
Journal: Plant Physiol Date: 2005-05 Impact factor: 8.340

7. Protein subcellular localization prediction of eukaryotes using a knowledge-based approach.

Authors: Hsin-Nan Lin; Ching-Tai Chen; Ting-Yi Sung; Shinn-Ying Ho; Wen-Lian Hsu
Journal: BMC Bioinformatics Date: 2009-12-03 Impact factor: 3.169

8. Genome-wide comparative gene family classification.

Authors: Christian Frech; Nansheng Chen
Journal: PLoS One Date: 2010-10-15 Impact factor: 3.240

9. Family classification without domain chaining.

Authors: Jacob M Joseph; Dannie Durand
Journal: Bioinformatics Date: 2009-06-15 Impact factor: 6.937

10. The bacterial Ras/Rap1 site-specific endopeptidase RRSP cleaves Ras through an atypical mechanism to disrupt Ras-ERK signaling.

Authors: Marco Biancucci; George Minasov; Avik Banerjee; Alfa Herrera; Patrick J Woida; Matthew B Kieffer; Lakshman Bindu; Maria Abreu-Blanco; Wayne F Anderson; Vadim Gaponenko; Andrew G Stephen; Matthew Holderfield; Karla J F Satchell
Journal: Sci Signal Date: 2018-10-02 Impact factor: 9.517