Hybrid alignment: high-performance with universal statistics.

Yi-Kuo Yu, Ralf Bundschuh, Terence Hwa.

Abstract

The score statistics of a recently introduced 'hybrid alignment' algorithm are studied in detail numerically. An extensive survey across the 2216 models of protein domains contained in the Pfam v5.4 database (Bateman et al., Nucleic Acids Res., 28, 263-266, 2000) verifies the theoretical predictions: for the position-specific scoring functions used in the Pfam models, the score statistics of hybrid alignment obey the Gumbel distribution, with the key Gumbel parameter lambda taking on the asymptotic value 1 universally for all models. Thus, the use of hybrid alignment eliminates the time-consuming computer simulations normally needed to assign p-values to alignment scores, freeing users to experiment with different scoring parameters and functions. The performance of the hybrid algorithm in detecting sequence homology is also studied. For protein sequences from the SCOP database (Murzin et al., J. Mol. Biol., 247, 536-540, 1995) using uniform scoring functions, the performance is found to be comparable to the best of the existing methods. Preliminary results using the PfamA database suggest that the hybrid algorithm achieves performance similar to that of existing methods for position-specific scoring systems as well. Hybrid alignment is thereby established as a high-performance alignment algorithm with well-characterized, universal statistics.
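
As an illustration of why a universal lambda matters, the following minimal sketch converts an alignment score into a p-value using the standard Gumbel (extreme-value) tail, with lambda fixed at the value 1 reported in the abstract. The function name, the prefactor `k`, and the `search_space` size are assumptions introduced here for illustration; the abstract does not specify them, and they would have to be determined for the scoring system and sequences at hand.

```python
import math


def gumbel_p_value(score, k, search_space, lam=1.0):
    """Return P(S >= score) under a Gumbel tail (illustrative sketch).

    lam defaults to 1.0, the asymptotic value reported in the abstract.
    k and search_space are hypothetical inputs, not given in the abstract.
    """
    # Expected number of chance alignments scoring at least `score`.
    expected_hits = k * search_space * math.exp(-lam * score)
    # 1 - exp(-x), computed via expm1 so tiny x does not round to zero.
    return -math.expm1(-expected_hits)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the call.
    print(gumbel_p_value(score=25.0, k=0.1, search_space=1e6))
```

Because lambda is known in advance, no simulation is needed to fit it; only the prefactor and the search-space size remain to be supplied.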

Year:  2002        PMID: 12075022     DOI: 10.1093/bioinformatics/18.6.864

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  7 in total

1.  Finding functional sequence elements by multiple local alignment.

Authors:  Martin C Frith; Ulla Hansen; John L Spouge; Zhiping Weng
Journal:  Nucleic Acids Res       Date:  2004-01-02       Impact factor: 16.971

2.  A cell-based simulation software for multi-cellular systems.

Authors:  Stefan Hoehme; Dirk Drasdo
Journal:  Bioinformatics       Date:  2010-08-13       Impact factor: 6.937

3.  HMM-ModE--improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences.

Authors:  Prashant K Srivastava; Dhwani K Desai; Soumyadeep Nandi; Andrew M Lynn
Journal:  BMC Bioinformatics       Date:  2007-03-27       Impact factor: 3.169

4.  Optimizing amino acid substitution matrices with a local alignment kernel.

Authors:  Hiroto Saigo; Jean-Philippe Vert; Tatsuya Akutsu
Journal:  BMC Bioinformatics       Date:  2006-05-05       Impact factor: 3.169

5.  The effectiveness of position- and composition-specific gap costs for protein similarity searches.

Authors:  Aleksandar Stojmirović; E Michael Gertz; Stephen F Altschul; Yi-Kuo Yu
Journal:  Bioinformatics       Date:  2008-07-01       Impact factor: 6.937

6.  The identification of complete domains within protein sequences using accurate E-values for semi-global alignment.

Authors:  Maricel G Kann; Sergey L Sheetlin; Yonil Park; Stephen H Bryant; John L Spouge
Journal:  Nucleic Acids Res       Date:  2007-06-27       Impact factor: 16.971

7.  A probabilistic model of local sequence alignment that simplifies statistical significance estimation.

Authors:  Sean R Eddy
Journal:  PLoS Comput Biol       Date:  2008-05-30       Impact factor: 4.475
