
LD-SPatt: large deviations statistics for patterns on Markov chains.

G Nuel.

Abstract

Statistics on Markov chains are widely used to study patterns in biological sequences. Several approaches are available for computing statistics on these models. Gaussian approximations derived from the central limit theorem (CLT) are among the most popular. Unfortunately, finding a pattern of interest means dealing with events in the tail of the distribution, where the CLT approximation is especially poor. In this paper, we propose a new approach based on large deviations theory to assess pattern statistics. We first recall theoretical results on large deviations for the empirical mean (level 1) and the empirical distribution (level 2) on Markov chains. We then present applications of these results, focusing on numerical issues. LD-SPatt is GPL software implementing these algorithms. We compare this approach with several existing ones in terms of complexity and reliability, and show that the large deviations approximations are more reliable than Gaussian approximations, both in absolute values and in terms of ranking, and are at least as reliable as compound Poisson approximations. Finally, we discuss possible further improvements and applications of this new method.
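
To make the level 1 idea concrete, the following is a minimal sketch and not the LD-SPatt implementation. It assumes the standard large deviations result for additive functionals of a finite Markov chain: the rate function is the Legendre transform of the log of the largest eigenvalue of a tilted transition matrix. For simplicity it counts visits to a single state rather than occurrences of a word pattern. The transition matrix `P`, the indicator `f`, and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def log_spectral_radius(P, f, theta):
    """Lambda(theta): log of the largest eigenvalue modulus of the tilted
    matrix P_theta[i, j] = P[i, j] * exp(theta * f[j])."""
    tilted = P * np.exp(theta * f)[None, :]
    return float(np.log(np.max(np.abs(np.linalg.eigvals(tilted)))))

def rate_function(P, f, a, theta_max=50.0, iters=200):
    """I(a) = sup over theta >= 0 of [theta * a - Lambda(theta)].

    The objective is concave in theta, so a ternary search is enough.
    Restricting to theta >= 0 is valid when a is at or above the
    stationary mean of f (upper-tail events)."""
    def objective(t):
        return t * a - log_spectral_radius(P, f, t)

    lo, hi = 0.0, theta_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return max(objective(0.5 * (lo + hi)), 0.0)

# Hypothetical two-state chain; f marks visits to state 1.
P = np.array([[0.9, 0.1],
              [0.6, 0.4]])
f = np.array([0.0, 1.0])

n = 1000   # sequence length (assumption for the example)
a = 0.25   # observed frequency, above the stationary mean of 1/7
# Leading-order estimate of P(frequency >= a), ignoring sub-exponential terms.
tail_estimate = np.exp(-n * rate_function(P, f, a))
print(tail_estimate)
```

The estimate `exp(-n * I(a))` captures only the exponential decay rate of the tail probability. It is a first-order approximation, which is the regime where a Gaussian approximation tends to be least accurate.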


Year:  2004        PMID: 15662195     DOI: 10.1089/cmb.2004.11.1023

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  6 in total

1.  The power of detecting enriched patterns: an HMM approach.

Authors:  Zhiyuan Zhai; Shih-Yen Ku; Yihui Luan; Gesine Reinert; Michael S Waterman; Fengzhu Sun
Journal:  J Comput Biol       Date:  2010-04       Impact factor: 1.479

2.  DNA motifs that sculpt the bacterial chromosome. (Review)

Authors:  Fabrice Touzain; Marie-Agnès Petit; Sophie Schbath; Meriem El Karoui
Journal:  Nat Rev Microbiol       Date:  2011-01       Impact factor: 60.633

3.  Effective p-value computations using Finite Markov Chain Imbedding (FMCI): application to local score and to pattern statistics.

Authors:  Grégory Nuel
Journal:  Algorithms Mol Biol       Date:  2006-04-07       Impact factor: 1.405

4.  Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data.

Authors:  Leslie Regad; Juliette Martin; Gregory Nuel; Anne-Claude Camproux
Journal:  Algorithms Mol Biol       Date:  2010-01-26       Impact factor: 1.405

5.  Pattern statistics on Markov chains and sensitivity to parameter estimation.

Authors:  Grégory Nuel
Journal:  Algorithms Mol Biol       Date:  2006-10-17       Impact factor: 1.405

6.  Analysis of pattern overlaps and exact computation of P-values of pattern occurrences numbers: case of Hidden Markov Models.

Authors:  Mireille Régnier; Evgenia Furletova; Victor Yakovlev; Mikhail Roytberg
Journal:  Algorithms Mol Biol       Date:  2014-12-16       Impact factor: 1.405

