
Clustering of database sequences for fast homology search using upper bounds on alignment score.

Masumi Itoh, Tatsuya Akutsu, Minoru Kanehisa.

Abstract

Homology data are among the most important sources of information for predicting the functions of unknown proteins, so fast and accurate search methods are needed. In this paper, we propose a new approach to fast and accurate homology search that uses pre-computed all-against-all similarity scores in a target database. We previously developed a method for deriving an upper bound on the Smith-Waterman score (SW-score) between a query and a homolog candidate sequence from the SW-score between the candidate and a sequence similar to the query. Using this upper bound, we first cluster the sequences in the target database so that the upper bounds of the SW-scores for all members of each cluster are less than a given value, and we select a representative sequence for each cluster. The query sequence is then searched against the representative sequences, and an upper bound on the SW-score is estimated for each cluster. SW-alignments are computed for all the sequences in a cluster only if its upper bound is higher than a given threshold. We performed computational experiments on the KEGG/GENES database, using KEGG/SSDB, to test the efficiency of the proposed method. The results suggest that our method is efficient for redundant databases that include multiple closely related species.
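
The filtering step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the upper-bound formula, so the bound is passed in as a caller-supplied function (`cluster_bound`, a hypothetical name), and the `Cluster` type, the `filtered_search` function, and the scoring parameters of the toy Smith-Waterman routine are likewise assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Cluster:
    """Hypothetical container: one representative plus all member ids."""
    representative: str
    members: List[str]  # includes the representative


def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> int:
    """Local alignment score with a linear gap penalty (toy scoring scheme)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def filtered_search(query: str,
                    sequences: Dict[str, str],
                    clusters: List[Cluster],
                    cluster_bound: Callable[[int, Cluster], int],
                    threshold: int) -> Dict[str, int]:
    """Align the query to each representative, then align the remaining
    members only for clusters whose upper bound exceeds the threshold."""
    scores: Dict[str, int] = {}
    for cluster in clusters:
        rep_score = smith_waterman(query, sequences[cluster.representative])
        # cluster_bound stands in for the paper's upper bound (assumption).
        if cluster_bound(rep_score, cluster) <= threshold:
            continue  # no member can exceed the threshold; skip the cluster
        for seq_id in cluster.members:
            if seq_id == cluster.representative:
                scores[seq_id] = rep_score  # already computed
            else:
                scores[seq_id] = smith_waterman(query, sequences[seq_id])
    return scores


# Toy data; the "+ 4" slack is a placeholder bound, not the published one.
sequences = {"a": "ACGTACGT", "b": "ACGTACGA",
             "c": "TTTTTTTT", "d": "TTTTTTTA"}
clusters = [Cluster("a", ["a", "b"]), Cluster("c", ["c", "d"])]
hits = filtered_search("ACGTACGT", sequences, clusters,
                       cluster_bound=lambda rep_score, _cluster: rep_score + 4,
                       threshold=10)
print(hits)
```

In this toy run the second cluster is skipped after a single alignment against its representative, which is where the saving comes from when a database holds many closely related sequences. The result is only exact if `cluster_bound` is a true upper bound on the scores of all cluster members.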


Year:  2004        PMID: 15712113

Source DB:  PubMed          Journal:  Genome Inform        ISSN: 0919-9454


