
Statistical significance of probabilistic sequence alignment and related local hidden Markov models.

Y K Yu, T Hwa.

Abstract

The score statistics of probabilistic gapped local alignment of random sequences are investigated both analytically and numerically. The full probabilistic algorithm (e.g., the "local" version of the maximum-likelihood or hidden Markov model method) is found to have anomalous statistics. A modified "semi-probabilistic" alignment, a hybrid of Smith-Waterman and probabilistic alignment, is then proposed and studied in detail. The score statistics of the hybrid algorithm are predicted to follow the universal Gumbel form, with the key Gumbel parameter λ taking a fixed asymptotic value for a wide variety of scoring systems and parameters. A simple recipe is also given for computing the "relative entropy" and, from it, the finite-size correction to λ. These predictions agree well with direct numerical simulations for sequences of length 100 to 1,000, using various PAM substitution scores and affine gap functions. The sensitivity of the hybrid method in detecting sequence homology is also studied using correlated sequences generated from toy mutation models. It is found to be comparable to that of Smith-Waterman alignment and significantly better than that of the Viterbi version of probabilistic alignment.
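The abstract refers to the Gumbel parameter λ and to a "relative entropy", but the recipes for the hybrid (semi-probabilistic) algorithm belong to the paper itself and are not reproduced here. As a point of reference only, the sketch below shows the classical Karlin-Altschul computation for *ungapped* local alignment, a different and simpler setting: λ is the positive root of Σ p_a p_b e^{λ s_ab} = 1, the relative entropy is H = Σ q_ab λ s_ab with target frequencies q_ab = p_a p_b e^{λ s_ab}, and the Gumbel tail is P(S ≥ x) ≈ 1 − exp(−K m n e^{−λx}). The function names, the dictionary-based scoring matrix, and the pre-factor K supplied by the caller (the value 0.1 below is arbitrary) are all assumptions for illustration.

```python
import math
from typing import Dict, Iterator, Tuple

Freqs = Dict[str, float]
Matrix = Dict[str, Dict[str, float]]


def _pairs(p: Freqs, score: Matrix) -> Iterator[Tuple[float, float]]:
    """Yield (background probability p_a * p_b, score s_ab) for every letter pair."""
    for a, pa in p.items():
        for b, pb in p.items():
            yield pa * pb, score[a][b]


def karlin_altschul_lambda(p: Freqs, score: Matrix, tol: float = 1e-12) -> float:
    """Positive root of sum_ab p_a p_b exp(lam * s_ab) = 1 (ungapped case only)."""
    pairs = list(_pairs(p, score))
    if sum(w * s for w, s in pairs) >= 0:
        raise ValueError("expected score must be negative for local alignment")
    if max(s for w, s in pairs if w > 0) <= 0:
        raise ValueError("at least one pair must have a positive score")

    def f(lam: float) -> float:
        return sum(w * math.exp(lam * s) for w, s in pairs) - 1.0

    # f(0) = 0, f'(0) = expected score < 0 and f is convex,
    # so f < 0 on (0, lambda) and f > 0 beyond the root.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0:
        hi *= 2.0
    while hi - lo > tol:  # plain bisection on the sign of f
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def relative_entropy(p: Freqs, score: Matrix, lam: float) -> float:
    """H = sum_ab q_ab * lam * s_ab with q_ab = p_a p_b exp(lam s_ab), in nats."""
    return sum(w * math.exp(lam * s) * lam * s for w, s in _pairs(p, score))


def gumbel_pvalue(x: float, m: int, n: int, lam: float, k: float) -> float:
    """Gumbel tail P(S >= x) ~ 1 - exp(-K m n exp(-lam x)) for lengths m and n."""
    return -math.expm1(-k * m * n * math.exp(-lam * x))


# Usage: uniform DNA with a +1 match / -1 mismatch score.
dna = "ACGT"
p = {a: 0.25 for a in dna}
score = {a: {b: (1.0 if a == b else -1.0) for b in dna} for a in dna}
lam = karlin_altschul_lambda(p, score)   # ln 3 ≈ 1.0986 for this scoring
H = relative_entropy(p, score, lam)      # ln(3) / 2 ≈ 0.549 nats
pval = gumbel_pvalue(20, 1000, 1000, lam, k=0.1)  # k=0.1 is an arbitrary placeholder
```

Bisection is used because the defining equation is convex in λ with a single positive root, so a sign test is enough and no derivative is needed. For gapped scoring, λ and K generally have no closed form and are estimated from simulations, which is the setting the paper addresses.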


Year:  2001        PMID: 11535176     DOI: 10.1089/10665270152530845

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


