
Effective p-value computations using Finite Markov Chain Imbedding (FMCI): application to local score and to pattern statistics.

Grégory Nuel.

Abstract

The technique of Finite Markov Chain Imbedding (FMCI) is a classical approach to complex combinatorial problems related to sequences. To obtain efficient algorithms, it is known that such approaches must first be rewritten using recursive relations. We propose here general recursive algorithms that compute, in a numerically stable manner, the exact Cumulative Distribution Function (CDF) or complementary CDF (CCDF). These algorithms are then applied in two particular cases: the local score of one sequence and pattern statistics. In both cases, asymptotic developments are derived. For the local score, our new approach makes it possible, for the very first time, to compute exact p-values for a practical study (finding hydrophobic segments in a protein database) where only approximations were available before. In this study, the asymptotic approximations appear to be completely unreliable for 99.5% of the considered sequences. Concerning the pattern statistics, the new FMCI algorithms dramatically outperform the previous ones, as they are more reliable, easier to implement, faster, and have lower memory requirements.
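
As a rough illustration of the recursive idea described in the abstract, the sketch below computes the exact p-value P(H_n >= a) of the local score H_n for the simplest setting: an i.i.d. sequence with integer scores. It is not the paper's algorithm, only a minimal instance of Markov chain imbedding with an absorbing state, under the assumption that the local score is the maximum of the Lindley process W_k = max(0, W_{k-1} + X_k). The names `local_score_pvalue`, `score_dist`, `n` and `a` are hypothetical and chosen for this example.

```python
def local_score_pvalue(score_dist, n, a):
    """Exact P(H_n >= a) for an i.i.d. sequence of n integer scores.

    score_dist: dict mapping integer score -> probability (assumed to sum to 1).
    The chain has transient states 0..a-1 (current Lindley value) and one
    absorbing state meaning "a value >= a has already been reached".
    """
    if a <= 0:
        return 1.0  # the local score is always >= 0
    # u[w] = P(W_k = w and threshold a not reached yet)
    u = [0.0] * a
    u[0] = 1.0
    absorbed = 0.0  # accumulates the CCDF directly as a sum of non-negative terms
    for _ in range(n):
        nxt = [0.0] * a
        for w, pw in enumerate(u):
            if pw == 0.0:
                continue
            for s, ps in score_dist.items():
                v = max(0, w + s)  # Lindley recursion
                if v >= a:
                    absorbed += pw * ps  # mass moves to the absorbing state
                else:
                    nxt[v] += pw * ps
        u = nxt  # one vector update per position, no matrix power needed
    return absorbed


if __name__ == "__main__":
    # Toy scoring scheme (an assumption for illustration): +1 or -1 with equal probability.
    print(local_score_pvalue({1: 0.5, -1: 0.5}, n=3, a=2))
```

Accumulating the absorbed mass directly, rather than computing one minus the CDF, avoids subtracting two nearly equal numbers when the p-value is very small, which is one plausible reading of the numerical stability mentioned above. Each position costs one pass over the transient states, so the run time grows linearly with the sequence length.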


Year:  2006        PMID: 16722531      PMCID: PMC1479348          DOI: 10.1186/1748-7188-1-5

Source DB:  PubMed          Journal:  Algorithms Mol Biol        ISSN: 1748-7188            Impact factor:   1.405


  9 in total

1.  The estimation of statistical parameters for local alignment score distributions.

Authors:  S F Altschul; R Bundschuh; R Olsen; T Hwa
Journal:  Nucleic Acids Res       Date:  2001-01-15       Impact factor: 16.971

2.  Exact distribution for the local score of one i.i.d. random sequence.

Authors:  S Mercier; J J Daudin
Journal:  J Comput Biol       Date:  2001       Impact factor: 1.479

3.  Occurrence probability of structured motifs in random sequences.

Authors:  S Robin; J-J Daudin; H Richard; M-F Sagot; S Schbath
Journal:  J Comput Biol       Date:  2002       Impact factor: 1.479

4.  LD-SPatt: large deviations statistics for patterns on Markov chains.

Authors:  G Nuel
Journal:  J Comput Biol       Date:  2004       Impact factor: 1.479

5.  S-SPatt: simple statistics for patterns on Markov chains.

Authors:  Grégory Nuel
Journal:  Bioinformatics       Date:  2005-04-19       Impact factor: 6.937

6.  The nature of the accessible and buried surfaces in proteins.

Authors:  C Chothia
Journal:  J Mol Biol       Date:  1976-07-25       Impact factor: 5.469

7.  Local alignment statistics.

Authors:  S F Altschul; W Gish
Journal:  Methods Enzymol       Date:  1996       Impact factor: 1.600

8.  Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes.

Authors:  S Karlin; S F Altschul
Journal:  Proc Natl Acad Sci U S A       Date:  1990-03       Impact factor: 11.205

9.  A simple method for displaying the hydropathic character of a protein.

Authors:  J Kyte; R F Doolittle
Journal:  J Mol Biol       Date:  1982-05-05       Impact factor: 5.469

  6 in total

1.  Normal and compound poisson approximations for pattern occurrences in NGS reads.

Authors:  Zhiyuan Zhai; Gesine Reinert; Kai Song; Michael S Waterman; Yihui Luan; Fengzhu Sun
Journal:  J Comput Biol       Date:  2012-06       Impact factor: 1.479

2.  The power of detecting enriched patterns: an HMM approach.

Authors:  Zhiyuan Zhai; Shih-Yen Ku; Yihui Luan; Gesine Reinert; Michael S Waterman; Fengzhu Sun
Journal:  J Comput Biol       Date:  2010-04       Impact factor: 1.479

3.  Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data.

Authors:  Leslie Regad; Juliette Martin; Gregory Nuel; Anne-Claude Camproux
Journal:  Algorithms Mol Biol       Date:  2010-01-26       Impact factor: 1.405

4.  Pattern statistics on Markov chains and sensitivity to parameter estimation.

Authors:  Grégory Nuel
Journal:  Algorithms Mol Biol       Date:  2006-10-17       Impact factor: 1.405

5.  Analysis of pattern overlaps and exact computation of P-values of pattern occurrences numbers: case of Hidden Markov Models.

Authors:  Mireille Régnier; Evgenia Furletova; Victor Yakovlev; Mikhail Roytberg
Journal:  Algorithms Mol Biol       Date:  2014-12-16       Impact factor: 1.405

6.  Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments.

Authors:  Morten Muhlig Nielsen; Paula Tataru; Tobias Madsen; Asger Hobolth; Jakob Skou Pedersen
Journal:  Algorithms Mol Biol       Date:  2018-12-08       Impact factor: 1.405

