| Literature DB >> 15662195 |
G Nuel1.
Abstract
Statistics on Markov chains are widely used for the study of patterns in biological sequences. Statistics on these models can be done through several approaches. Central limit theorem (CLT) producing Gaussian approximations are one of the most popular ones. Unfortunately, in order to find a pattern of interest, these methods have to deal with tail distribution events where CLT is especially bad. In this paper, we propose a new approach based on the large deviations theory to assess pattern statistics. We first recall theoretical results for empiric mean (level 1) as well as empiric distribution (level 2) large deviations on Markov chains. Then, we present the applications of these results focusing on numerical issues. LD-SPatt is the name of GPL software implementing these algorithms. We compare this approach to several existing ones in terms of complexity and reliability and show that the large deviations are more reliable than the Gaussian approximations in absolute values as well as in terms of ranking and are at least as reliable as compound Poisson approximations. We then finally discuss some further possible improvements and applications of this new method.Mesh:
Year: 2004 PMID: 15662195 DOI: 10.1089/cmb.2004.11.1023
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479