Graph-based clustering for finding distant relationships in a large set of protein sequences.

Hideya Kawaji, Yoichi Takenaka, Hideo Matsuda.

Abstract

MOTIVATION: Clustering of protein sequences is widely used for the functional characterization of proteins. However, it is still not easy to cluster distantly related proteins, which share only regional similarity among their sequences. It is therefore necessary to develop an algorithm for clustering such distantly related proteins.
RESULTS: We have developed a time- and space-efficient clustering algorithm. It uses a graph representation in which vertices denote proteins and edges denote sequence similarities above a certain cutoff score. It repeatedly partitions the graph by removing edges that have small weights, which correspond to low sequence similarities. To find appropriate partitions, we introduce a score that combines the normalized cut with locally minimal cut capacities. We applied our method to all 40,703 human proteins in SWISS-PROT and TrEMBL. The resulting clusters show 76% recall (20,529 of the 26,917 proteins classified by InterPro). The method also finds relationships not found by other clustering methods.
AVAILABILITY: The complete result of our algorithm for all the human proteins in SWISS-PROT and TrEMBL, and other supplementary information, are available at http://motif.ics.es.osaka-u.ac.jp/Ncut-KL/
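
The graph representation and the normalized cut mentioned in RESULTS can be illustrated with a minimal sketch. It is not the authors' implementation: it shows only the standard normalized cut of one bipartition, not the combined score with locally minimal cut capacities that the abstract describes. The protein names, similarity scores, cutoff value, and the function names `build_graph` and `normalized_cut` are all hypothetical.

```python
from collections import defaultdict


def build_graph(scored_pairs, cutoff):
    """Build an undirected weighted graph, keeping only pairs whose
    similarity score reaches the cutoff (hypothetical helper)."""
    graph = defaultdict(dict)
    for a, b, score in scored_pairs:
        if score >= cutoff:
            graph[a][b] = score
            graph[b][a] = score
    return graph


def normalized_cut(graph, part_a):
    """Standard normalized cut of the bipartition (part_a, remaining vertices):
    cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)."""
    part_a = set(part_a)
    part_b = set(graph) - part_a
    # Total weight of edges crossing from A to B.
    cut = sum(w for u in part_a for v, w in graph[u].items() if v in part_b)
    # Total weight of all edges incident to each side.
    assoc_a = sum(w for u in part_a for w in graph[u].values())
    assoc_b = sum(w for u in part_b for w in graph[u].values())
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")
    return cut / assoc_a + cut / assoc_b


# Hypothetical pairwise similarity scores between four proteins.
pairs = [
    ("P1", "P2", 90.0),
    ("P2", "P3", 20.0),
    ("P3", "P4", 80.0),
    ("P1", "P4", 5.0),  # below the cutoff, so no edge is created
]
graph = build_graph(pairs, cutoff=10.0)

# Cutting the weak P2-P3 edge scores lower (better) than isolating P1.
balanced = normalized_cut(graph, {"P1", "P2"})
lopsided = normalized_cut(graph, {"P1"})
print(balanced < lopsided)
```

In this toy graph the partition that removes the low-weight edge between P2 and P3 has the smaller normalized cut, which matches the intuition of separating groups joined only by low sequence similarity.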

Year:  2004        PMID: 14734316     DOI: 10.1093/bioinformatics/btg397

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  4 in total

1.  Ranking and compacting binding segments of protein families using aligned pattern clusters.

Authors:  En-Shiun Lee; Andrew Kc Wong
Journal:  Proteome Sci       Date:  2013-11-07       Impact factor: 2.480

2.  Automatic classification of protein structures relying on similarities between alignments.

Authors:  Guillaume Santini; Henry Soldano; Joël Pothier
Journal:  BMC Bioinformatics       Date:  2012-09-14       Impact factor: 3.169

3.  Objective sequence-based subfamily classifications of mouse homeodomains reflect their in vitro DNA-binding preferences.

Authors:  Miguel A Santos; Andrei L Turinsky; Serene Ong; Jennifer Tsai; Michael F Berger; Gwenael Badis; Shaheynoor Talukder; Andrew R Gehrke; Martha L Bulyk; Timothy R Hughes; Shoshana J Wodak
Journal:  Nucleic Acids Res       Date:  2010-08-12       Impact factor: 16.971

4.  Large scale hierarchical clustering of protein sequences.

Authors:  Antje Krause; Jens Stoye; Martin Vingron
Journal:  BMC Bioinformatics       Date:  2005-01-22       Impact factor: 3.169
