A graph-based clustering method for a large set of sequences using a graph partitioning algorithm.

H Kawaji, Y Yamaguchi, H Matsuda, A Hashimoto.

Abstract

A graph-based clustering method is proposed for grouping protein sequences into families; it automatically refines the clusters produced by the conventional single linkage clustering method. Our approach formulates sequence clustering as a graph partitioning problem on a weighted linkage graph, in which vertices correspond to sequences and edges connect pairs of sequences whose similarity exceeds a given threshold, with each edge weighted by that similarity. The effectiveness of the method is shown by comparison with InterPro families for all mouse proteins in SWISS-PROT. The resulting clusters match InterPro families much better than those of the single linkage clustering method: 77% of the proteins in InterPro families are classified into appropriate clusters.
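
The weighted linkage graph and the single linkage baseline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy similarity scores, and the thresholds are all assumptions made for illustration. Single linkage clustering is modeled here as the connected components of the thresholded graph. The graph partitioning step that refines those clusters is not specified in the abstract and is not reproduced.

```python
from collections import defaultdict

# Hypothetical pairwise similarity scores between five sequences (toy data).
similarities = {
    ("A", "B"): 0.90,
    ("B", "C"): 0.80,
    ("C", "D"): 0.35,  # weak link that chains two groups together
    ("D", "E"): 0.85,
    ("A", "E"): 0.10,
}


def build_linkage_graph(similarities, threshold):
    """Build a weighted linkage graph as an adjacency map {u: {v: weight}}.

    Every sequence becomes a vertex; an edge is kept only when the
    similarity is higher than the threshold, and it carries that
    similarity as its weight.
    """
    graph = defaultdict(dict)
    for (u, v), score in similarities.items():
        graph[u]  # touching the key registers the vertex even if isolated
        graph[v]
        if score > threshold:
            graph[u][v] = score
            graph[v][u] = score
    return dict(graph)


def single_linkage_clusters(graph):
    """Return the connected components of the graph as a list of sets."""
    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(component)
    return clusters


loose = single_linkage_clusters(build_linkage_graph(similarities, 0.3))
strict = single_linkage_clusters(build_linkage_graph(similarities, 0.5))
```

With the lower threshold, the single weak C-D edge merges all five toy sequences into one cluster. With the higher threshold they fall into two. This sensitivity to individual weak links is a known weakness of single linkage clustering, and a partitioning step that uses the edge weights could in principle address it.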

Year:  2001        PMID: 11791228

Source DB:  PubMed          Journal:  Genome Inform        ISSN: 0919-9454


  2 in total

1.  Visualizing sequence similarity of protein families.

Authors:  Vamsi Veeramachaneni; Wojciech Makałowski
Journal:  Genome Res       Date:  2004-05-12       Impact factor: 9.043

2.  Medical record linkage in health information systems by approximate string matching and clustering.

Authors:  Erik A Sauleau; Jean-Philippe Paumier; Antoine Buemi
Journal:  BMC Med Inform Decis Mak       Date:  2005-10-11       Impact factor: 2.796
