
Score statistics of global sequence alignment from the energy distribution of a modified directed polymer and directed percolation problem.

Mihaela E Sardiu, Gelio Alves, Yi-Kuo Yu.

Abstract

Sequence alignment is one of the most important bioinformatics tools for modern molecular biology. The statistical characterization of gapped alignment scores has been a long-standing problem in sequence alignment research. Using a variant of the directed path in random media model, we investigate the score statistics of global sequence alignment taking into account, in particular, the compositional bias of the sequences compared. Such statistics are used to distinguish accidental similarity due to compositional similarity from biologically significant similarity. To accommodate the compositional bias, we introduce an extra parameter p indicating the probability for positive matching scores to occur. When p is small, a high scoring alignment obviously cannot come from compositional similarity. When p is large, the highest scoring point within a global alignment tends to be close to the end of both sequences, in which case we say the system percolates. By applying finite-size scaling theory on percolating probability functions of various sizes (sequence lengths), the critical p at infinite size, p_c, is obtained. For alignment of length t, the fact that the score fluctuation grows as χ t^(1/3) is confirmed upon investigating the scaling form of the alignment score. Using the Kolmogorov-Smirnov statistics test, we show that the random variable χ, if properly scaled, follows the Tracy-Widom distributions: Gaussian orthogonal ensemble for p slightly larger than p_c and Gaussian unitary ensemble for larger p. Although these results deepen our understanding of the distribution of alignment scores, the use of these results in practical applications remains somewhat heuristic and needs to be further developed. Nevertheless, the possibility of characterizing score statistics for modest system size (sequence lengths), via proper reparametrization of alignment scores, is illustrated.
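
The percolation picture described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model or code; it is a minimal toy under stated assumptions: each lattice cell independently receives a match score of +1 with probability p and a mismatch score otherwise, gaps carry a fixed linear penalty, and a sample is counted as "percolating" when the highest-scoring cell lies within the last 10% of both sequences. The function names, the score values, and the `frac` threshold are all hypothetical choices made for illustration.

```python
import random

def global_score_lattice(n, p, mismatch=-1.0, gap=-1.0, rng=None):
    """Global-alignment DP on an n x n lattice of random scores.

    Assumption (toy model): every cell independently scores +1 with
    probability p and `mismatch` otherwise. Returns the end-to-end score,
    the highest score seen anywhere, and the position of that maximum.
    """
    rng = rng or random.Random()
    prev = [gap * j for j in range(n + 1)]      # row 0: leading gaps only
    best, best_pos = 0.0, (0, 0)                # origin has score 0
    for i in range(1, n + 1):
        cur = [gap * i] + [0.0] * n             # column 0: leading gaps only
        for j in range(1, n + 1):
            s = 1.0 if rng.random() < p else mismatch
            cur[j] = max(prev[j - 1] + s,       # diagonal step (match/mismatch)
                         prev[j] + gap,         # gap in one sequence
                         cur[j - 1] + gap)      # gap in the other sequence
            if cur[j] > best:
                best, best_pos = cur[j], (i, j)
        prev = cur
    return prev[n], best, best_pos

def percolation_fraction(n, p, trials, frac=0.9, seed=0):
    """Fraction of samples whose highest-scoring cell is near the far corner."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        _, _, (i, j) = global_score_lattice(n, p, rng=rng)
        if i >= frac * n and j >= frac * n:
            hits += 1
    return hits / trials
```

Evaluating `percolation_fraction(n, p, trials)` over a grid of p values for several lattice sizes n gives the kind of size-dependent percolating probability curves to which a finite-size scaling analysis could be applied. Real sequences produce correlated cell scores, so this independent-cell toy should not be expected to reproduce the paper's critical value.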


Year:  2005        PMID: 16485984     DOI: 10.1103/PhysRevE.72.061917

Source DB:  PubMed          Journal:  Phys Rev E Stat Nonlin Soft Matter Phys        ISSN: 1539-3755


  3 in total

1.  Objective method for estimating asymptotic parameters, with an application to sequence alignment.

Authors:  Sergey Sheetlin; Yonil Park; John L Spouge
Journal:  Phys Rev E Stat Nonlin Soft Matter Phys       Date:  2011-09-13

2.  A new extended gumbel distribution: Properties and application.

Authors:  Aisha Fayomi; Sadaf Khan; Muhammad Hussain Tahir; Ali Algarni; Farrukh Jamal; Reman Abu-Shanab
Journal:  PLoS One       Date:  2022-05-27       Impact factor: 3.752

3.  Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling.

Authors:  Stefan Wolfsheimer; Inke Herms; Sven Rahmann; Alexander K Hartmann
Journal:  BMC Bioinformatics       Date:  2011-02-03       Impact factor: 3.169
