On subset seeds for protein alignment.

Mikhail Roytberg, Anna Gambin, Laurent Noé, Slawomir Lasota, Eugenia Furletova, Ewa Szczurek, Gregory Kucherov.

Abstract

We apply the concept of subset seeds to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard Blastp seeding method, as well as with the family of vector seeds. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in Blastp and vector seeds, our seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several main databases of protein alignments. Here again, the results show a comparable or better performance of our seeds versus Blastp.
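To make the idea concrete, here is a minimal Python sketch of how a subset seed might be tested against a pair of protein words. It assumes each seed letter can be modeled as a partition of the 20 amino acids, with two residues matching at a position when they fall in the same group. The seed letters `#`, `@` and `_`, the groupings, and the helper names `make_letter`, `seed_key` and `seed_hit` are all hypothetical illustrations, not the alphabets or code from the paper.

```python
AMINO = "ACDEFGHIKLMNPQRSTVWY"


def make_letter(groups):
    """Build one seed letter: a map from each residue to its group index."""
    table = {}
    for index, group in enumerate(groups):
        for residue in group:
            table[residue] = index
    if set(table) != set(AMINO):
        raise ValueError("groups must cover the 20 amino acids exactly")
    return table


# Hypothetical seed alphabet (illustrative groupings, not taken from the paper).
SEED_ALPHABET = {
    "#": make_letter(list(AMINO)),  # identity: every residue is its own group
    "@": make_letter(["ILMV", "FWY", "KRH", "DENQ", "ST", "AG", "C", "P"]),
    "_": make_letter([AMINO]),      # wildcard: all residues in one group
}


def seed_key(seed, word):
    """Project a word onto the seed: one group index per position."""
    if len(seed) != len(word):
        raise ValueError("seed and word must have the same length")
    return tuple(SEED_ALPHABET[letter][residue]
                 for letter, residue in zip(seed, word))


def seed_hit(seed, word_a, word_b):
    """Two words form a hit when their projections onto the seed are equal."""
    return seed_key(seed, word_a) == seed_key(seed, word_b)


if __name__ == "__main__":
    print(seed_hit("#@_#", "ALKW", "AVCW"))
    print(seed_hit("#@_#", "ALKW", "AFKW"))
```

Under this assumption a hit reduces to equality of projected keys, so words can be indexed with an ordinary hash table. That is one way to read the abstract's remark that subset seeds are less costly to implement than a cumulative score threshold, which requires enumerating neighbouring words.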


Year:  2009        PMID: 19644175     DOI: 10.1109/TCBB.2009.4

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


  6 in total

1.  Adaptive seeds tame genomic sequence comparison.

Authors:  Szymon M Kiełbasa; Raymond Wan; Kengo Sato; Paul Horton; Martin C Frith
Journal:  Genome Res       Date:  2011-01-05       Impact factor: 9.043

2.  SANSparallel: interactive homology search against Uniprot.

Authors:  Panu Somervuo; Liisa Holm
Journal:  Nucleic Acids Res       Date:  2015-04-08       Impact factor: 16.971

3.  A bioinformatician's guide to the forefront of suffix array construction algorithms.

Authors:  Anish Man Singh Shrestha; Martin C Frith; Paul Horton
Journal:  Brief Bioinform       Date:  2014-01-10       Impact factor: 11.622

4.  Best hits of 11110110111: model-free selection and parameter-free sensitivity calculation of spaced seeds.

Authors:  Laurent Noé
Journal:  Algorithms Mol Biol       Date:  2017-02-14       Impact factor: 1.405

5.  Minimally-overlapping words for sequence similarity search.

Authors:  Martin C Frith; Laurent Noé; Gregory Kucherov
Journal:  Bioinformatics       Date:  2020-12-21       Impact factor: 6.937

6.  PLAST: parallel local alignment search tool for database comparison.

Authors:  Van Hoa Nguyen; Dominique Lavenier
Journal:  BMC Bioinformatics       Date:  2009-10-12       Impact factor: 3.169

