
The irredundant class method for remote homology detection of protein sequences.

Matteo Comin, Davide Verzotto.

Abstract

The automatic classification of protein sequences into families is of great help for the functional prediction and annotation of new proteins. In this article, we present a method called Irredundant Class that addresses the remote homology detection problem. The best-performing methods for this problem are string kernels, which compute a similarity function between pairs of proteins based on their subsequence composition. We provide evidence that almost all string kernels are based on patterns that are not independent, and therefore the associated similarity scores are obtained from a set of redundant features, overestimating the similarity between the proteins. To specifically address this issue, we introduce the class of irredundant common patterns. Loosely speaking, the set of irredundant common patterns is the smallest class of independent patterns that can describe all common patterns in a pair of sequences. We present a classification method based on the statistics of these patterns, named Irredundant Class. Results on benchmark data show that Irredundant Class outperforms most of the string kernels previously proposed, and that it achieves results as good as the current state-of-the-art method, Local Alignment, while using the same pairwise information only once.
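
To make the notion of a similarity score based on subsequence composition concrete, here is a minimal sketch of a k-mer spectrum kernel, one common kind of string kernel. It is not the Irredundant Class method, and the function names and toy sequences are assumptions chosen only for illustration. The example also hints at the redundancy the abstract describes: a single shared segment is counted again by each of its overlapping sub-patterns.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int) -> int:
    """Inner product of the k-mer count vectors of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * cb[kmer] for kmer, n in ca.items())


# Hypothetical toy sequences that share exactly one segment, "ACDEF".
a = "MKACDEFW"
b = "GGACDEFH"

# The same shared segment contributes to the score at every pattern length.
scores = {k: spectrum_kernel(a, b, k) for k in (1, 2, 3)}
print(scores)       # {1: 5, 2: 4, 3: 3}
print(sum(scores.values()))  # 12 feature matches from one common segment
```

Summing over several values of k, as some kernels effectively do, credits the single common segment twelve times in this toy case, which is the kind of overestimation that a set of independent patterns is meant to avoid.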


Year:  2011        PMID: 21548811     DOI: 10.1089/cmb.2010.0171

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  6 in total

1.  Assembly-free genome comparison based on next-generation sequencing reads and variable length patterns.

Authors:  Matteo Comin; Michele Schimd
Journal:  BMC Bioinformatics       Date:  2014-09-10       Impact factor: 3.169

2.  Parallel continuous flow: a parallel suffix tree construction tool for whole genomes.

Authors:  Matteo Comin; Montse Farreras
Journal:  J Comput Biol       Date:  2014-03-05       Impact factor: 1.479

3.  Clustering of reads with alignment-free measures and quality values.

Authors:  Matteo Comin; Andrea Leoni; Michele Schimd
Journal:  Algorithms Mol Biol       Date:  2015-01-28       Impact factor: 1.405

4.  Estimating evolutionary distances between genomic sequences from spaced-word matches.

Authors:  Burkhard Morgenstern; Bingyao Zhu; Sebastian Horwege; Chris-André Leimeister
Journal:  Algorithms Mol Biol       Date:  2015-02-11       Impact factor: 1.405

5.  Fast and accurate phylogeny reconstruction using filtered spaced-word matches.

Authors:  Chris-André Leimeister; Salma Sohrabi-Jahromi; Burkhard Morgenstern
Journal:  Bioinformatics       Date:  2017-04-01       Impact factor: 6.937

6.  Alignment-free phylogeny of whole genomes using underlying subwords.

Authors:  Matteo Comin; Davide Verzotto
Journal:  Algorithms Mol Biol       Date:  2012-12-06       Impact factor: 1.405
