

Graph-based clustering for finding distant relationships in a large set of protein sequences.

Hideya Kawaji, Yoichi Takenaka, Hideo Matsuda.

Abstract

MOTIVATION: Clustering of protein sequences is widely used for the functional characterization of proteins. However, clustering distantly related proteins, whose sequences share only regional similarity, remains difficult. An algorithm for clustering such distantly related proteins is therefore needed.
RESULTS: We have developed a time- and space-efficient clustering algorithm. It represents the data as a graph in which vertices denote proteins and edges denote sequence similarities above a certain cutoff score. It repeatedly partitions the graph by removing edges with small weights, which correspond to low sequence similarities. To find appropriate partitions, we introduce a score that combines the normalized cut with locally minimal cut capacities. We applied our method to all 40,703 human proteins in SWISS-PROT and TrEMBL. The resulting clusters show a recall of 76% (20,529 of the 26,917 proteins classified by InterPro). The method also finds relationships not found by other clustering methods.

AVAILABILITY: The complete results of our algorithm for all human proteins in SWISS-PROT and TrEMBL, along with other supplementary information, are available at http://motif.ics.es.osaka-u.ac.jp/Ncut-KL/
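The RESULTS paragraph describes a graph whose vertices are proteins and whose edges carry similarity scores above a cutoff, partitioned by repeatedly removing low-weight edges. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. All function names, the toy scores, and the `ncut_threshold` parameter are hypothetical. In place of the paper's combined score (normalized cut plus locally minimal cut capacities), it accepts a split using the normalized cut alone, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V).

```python
from collections import defaultdict

# Hypothetical sketch: plain normalized cut stands in for the paper's combined score.


def build_graph(scores, cutoff):
    """Vertices are proteins; an edge joins two proteins whose similarity
    score is above `cutoff`, weighted by that score."""
    graph = defaultdict(dict)
    for (a, b), score in scores.items():
        if a != b and score > cutoff:
            graph[a][b] = score
            graph[b][a] = score
    return graph


def components(graph, nodes):
    """Connected components of the subgraph induced by `nodes` (iterative DFS)."""
    nodes = set(nodes)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in graph[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def normalized_cut(graph, part_a, part_b):
    """Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), with V = A | B."""
    nodes = part_a | part_b
    cut = sum(w for u in part_a for v, w in graph[u].items() if v in part_b)
    assoc_a = sum(w for u in part_a for v, w in graph[u].items() if v in nodes)
    assoc_b = sum(w for u in part_b for v, w in graph[u].items() if v in nodes)
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")
    return cut / assoc_a + cut / assoc_b


def split_by_weak_edges(graph, nodes):
    """Delete edges of a connected vertex set in increasing weight order
    until it falls apart; return the resulting components."""
    sub = {u: {v: w for v, w in graph[u].items() if v in nodes} for u in nodes}
    edges = sorted({(w, min(u, v), max(u, v)) for u in sub for v, w in sub[u].items()})
    for _, u, v in edges:
        del sub[u][v]
        del sub[v][u]
        parts = components(sub, nodes)
        if len(parts) > 1:
            return parts
    return [set(nodes)]


def cluster(graph, ncut_threshold):
    """Recursively bipartition each component; keep a split only if its
    normalized cut is below the hypothetical `ncut_threshold`."""
    clusters = []
    pending = components(graph, list(graph))
    while pending:
        comp = pending.pop()
        parts = split_by_weak_edges(graph, comp)
        if len(parts) == 2 and normalized_cut(graph, *parts) < ncut_threshold:
            pending.extend(parts)
        else:
            clusters.append(comp)
    return clusters


# Toy similarity scores (made up for illustration): two tight triangles joined by one weak edge.
scores = {
    ("A", "B"): 50, ("B", "C"): 45, ("A", "C"): 40,
    ("D", "E"): 55, ("E", "F"): 50, ("D", "F"): 42,
    ("C", "D"): 12,  # weak bridge, still above the cutoff
    ("A", "F"): 3,   # below the cutoff, so no edge is created
}
graph = build_graph(scores, cutoff=10)
clusters = cluster(graph, ncut_threshold=0.5)
print(sorted(sorted(c) for c in clusters))  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

Removing the weak C-D bridge first splits the toy graph with Ncut = 12/282 + 12/306 (about 0.08), so the split is kept; splitting either triangle further gives Ncut above 1, so those splits are rejected. Recomputing connected components after every edge deletion is slow, so this sketch only conveys the idea and makes no attempt at the time and space efficiency the abstract claims for the actual algorithm.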

Year: 2004; PMID: 14734316; DOI: 10.1093/bioinformatics/btg397

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937

Cited by

4 in total

1. Ranking and compacting binding segments of protein families using aligned pattern clusters.

Authors: En-Shiun Lee; Andrew Kc Wong
Journal: Proteome Sci; Date: 2013-11-07; Impact factor: 2.480

2. Automatic classification of protein structures relying on similarities between alignments.

Authors: Guillaume Santini; Henry Soldano; Joël Pothier
Journal: BMC Bioinformatics; Date: 2012-09-14; Impact factor: 3.169

3. Objective sequence-based subfamily classifications of mouse homeodomains reflect their in vitro DNA-binding preferences.

Authors: Miguel A Santos; Andrei L Turinsky; Serene Ong; Jennifer Tsai; Michael F Berger; Gwenael Badis; Shaheynoor Talukder; Andrew R Gehrke; Martha L Bulyk; Timothy R Hughes; Shoshana J Wodak
Journal: Nucleic Acids Res; Date: 2010-08-12; Impact factor: 16.971

4. Large scale hierarchical clustering of protein sequences.

Authors: Antje Krause; Jens Stoye; Martin Vingron
Journal: BMC Bioinformatics; Date: 2005-01-22; Impact factor: 3.169