Mihaela E Sardiu, Gelio Alves, Yi-Kuo Yu.
Abstract
Sequence alignment is one of the most important bioinformatics tools in modern molecular biology. The statistical characterization of gapped alignment scores has been a long-standing problem in sequence alignment research. Using a variant of the directed path in random media model, we investigate the score statistics of global sequence alignment, taking into account in particular the compositional bias of the sequences compared. Such statistics are used to distinguish accidental similarity due to compositional similarity from biologically significant similarity. To accommodate the compositional bias, we introduce an extra parameter p indicating the probability for positive matching scores to occur. When p is small, a high-scoring alignment obviously cannot come from compositional similarity. When p is large, the highest-scoring point within a global alignment tends to be close to the end of both sequences, in which case we say the system percolates. By applying finite-size scaling theory to percolating probability functions of various sizes (sequence lengths), we obtain the critical value $p_c$ of p at infinite size. For alignments of length t, the fact that the score fluctuation grows as $\chi\, t^{1/3}$ is confirmed by investigating the scaling form of the alignment score. Using the Kolmogorov-Smirnov test, we show that the random variable $\chi$, if properly scaled, follows the Tracy-Widom distributions: Gaussian orthogonal ensemble for p slightly larger than $p_c$ and Gaussian unitary ensemble for larger p. Although these results deepen our understanding of the distribution of alignment scores, their use in practical applications remains somewhat heuristic and needs to be further developed. Nevertheless, the possibility of characterizing score statistics for modest system sizes (sequence lengths), via proper reparametrization of alignment scores, is illustrated.
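The percolation criterion described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under assumptions that are not taken from the paper: each lattice cell receives a match score of +1 with probability p and a mismatch score of -1 otherwise, gaps cost 1 per position, and a sample is counted as percolating when the highest-scoring point lies within the last 10% of both sequences. The function names (`score_table`, `peak_position`, `percolates`, `percolation_probability`) and the `window` parameter are hypothetical.

```python
import random


def score_table(n, p, rng, match=1.0, mismatch=-1.0, gap=1.0):
    """Global-alignment scores on an n x n lattice of random pair scores.

    H[i][j] is the best score of any alignment path from (0, 0) to (i, j).
    The score values and linear gap cost are illustrative assumptions.
    """
    H = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        H[k][0] = -gap * k  # leading gaps in one sequence
        H[0][k] = -gap * k  # leading gaps in the other
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            # Positive score with probability p, negative otherwise.
            s = match if rng.random() < p else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap)
    return H


def peak_position(H):
    """Return ((i, j), score) of the highest-scoring lattice point."""
    best, best_score = (0, 0), H[0][0]
    for i, row in enumerate(H):
        for j, value in enumerate(row):
            if value > best_score:
                best, best_score = (i, j), value
    return best, best_score


def percolates(H, window=0.1):
    """True if the peak lies within the last `window` fraction of both sequences."""
    n = len(H) - 1
    (i, j), _ = peak_position(H)
    return min(i, j) >= (1.0 - window) * n


def percolation_probability(n, p, trials=200, seed=0):
    """Fraction of random samples of size n that percolate at a given p."""
    rng = random.Random(seed)
    hits = sum(percolates(score_table(n, p, rng)) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (16, 32, 64):
        row = [percolation_probability(n, p, trials=50) for p in (0.2, 0.4, 0.6, 0.8)]
        print(n, row)
```

Under these assumptions, curves of the estimated percolation probability against p for several sizes n are the kind of data to which a finite-size scaling analysis would be applied; the threshold they suggest depends on the chosen scores and gap cost and should not be read as the paper's critical value.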
Year: 2005 PMID: 16485984 DOI: 10.1103/PhysRevE.72.061917
Source DB: PubMed Journal: Phys Rev E Stat Nonlin Soft Matter Phys ISSN: 1539-3755