Compound Poisson approximation of the number of occurrences of a position frequency matrix (PFM) on both strands.

Utz J Pape, Sven Rahmann, Fengzhu Sun, Martin Vingron.

Abstract

Transcription factors play a key role in gene regulation by interacting with specific binding sites or motifs. Detecting enrichment of binding motifs is therefore important for genome annotation, and efficient computation of the statistical significance (the p-value) of that enrichment is crucial. We propose an efficient approximation for computing this significance. By incorporating both strands of the DNA molecule and explicitly modeling the dependencies between overlapping hits, we achieve accurate results for any DNA motif based on its Position Frequency Matrix (PFM) representation. The accuracy of the p-value approximation is shown by comparison with the simulated count distribution. Furthermore, we compare the approach with a binomial approximation, a (compound) Poisson approximation, and a normal approximation. In general, our approach either outperforms these approximations or is equally accurate but significantly faster. An implementation of our approach is available at http://mosta.molgen.mpg.de.
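
As a rough illustration of how a compound Poisson p-value can be evaluated, the sketch below treats the hit count as a sum of clump sizes, where the number of clumps is Poisson distributed, and computes the tail probability with the standard Panjer recursion. It is not the authors' implementation: the function names, the clump rate `lam`, and the clump-size distribution `cluster_pmf` are hypothetical inputs chosen for illustration. Deriving these quantities from a PFM and a background model on both strands is the substance of the paper and is not reproduced here.

```python
import math


def compound_poisson_pmf(lam, cluster_pmf, n_max):
    """Return [P(N = 0), ..., P(N = n_max)] for a compound Poisson count N.

    N is the sum of K i.i.d. clump sizes, with K ~ Poisson(lam).
    cluster_pmf[j - 1] is the (assumed) probability that a clump has size j >= 1.
    """
    p = [0.0] * (n_max + 1)
    # No hits occur exactly when there are no clumps, since every clump has size >= 1.
    p[0] = math.exp(-lam)
    for n in range(1, n_max + 1):
        total = 0.0
        # Panjer recursion: P(N = n) = (lam / n) * sum_j j * q_j * P(N = n - j)
        for j in range(1, min(n, len(cluster_pmf)) + 1):
            total += j * cluster_pmf[j - 1] * p[n - j]
        p[n] = lam / n * total
    return p


def p_value(lam, cluster_pmf, observed):
    """Upper-tail probability P(N >= observed) under the compound Poisson model."""
    if observed <= 0:
        return 1.0
    pmf = compound_poisson_pmf(lam, cluster_pmf, observed - 1)
    return max(0.0, 1.0 - sum(pmf))


if __name__ == "__main__":
    # Illustrative numbers only: 1.5 expected clumps, clumps of size 1 or 2.
    print(p_value(lam=1.5, cluster_pmf=[0.8, 0.2], observed=5))
```

When every clump has size one (`cluster_pmf=[1.0]`), the recursion reduces to the ordinary Poisson distribution, which is a convenient sanity check.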

Year:  2008        PMID: 18631020      PMCID: PMC2607244          DOI: 10.1089/cmb.2007.0084

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  21 in total

1.  Fast probabilistic analysis of sequence function using scoring matrices.

Authors:  T D Wu; C G Nevill-Manning; D L Brutlag
Journal:  Bioinformatics       Date:  2000-03       Impact factor: 6.937

2.  Probabilistic and statistical properties of words: an overview. (Review)

Authors:  G Reinert; S Schbath; M S Waterman
Journal:  J Comput Biol       Date:  2000 Feb-Apr       Impact factor: 1.479

3.  Finding motifs in promoter regions.

Authors:  Libi Hertzberg; Or Zuk; Gad Getz; Eytan Domany
Journal:  J Comput Biol       Date:  2005-04       Impact factor: 1.479

4.  Computing exact P-values for DNA motifs.

Authors:  Jing Zhang; Bo Jiang; Ming Li; John Tromp; Xuegong Zhang; Michael Q Zhang
Journal:  Bioinformatics       Date:  2007-01-18       Impact factor: 6.937

5.  The statistical significance of nucleotide position-weight matrix matches.

Authors:  J M Claverie; S Audic
Journal:  Comput Appl Biosci       Date:  1996-10

6.  Linguistics of nucleotide sequences: morphology and comparison of vocabularies.

Authors:  V Brendel; J S Beckmann; E N Trifonov
Journal:  J Biomol Struct Dyn       Date:  1986-08

7.  Methods for calculating the probabilities of finding patterns in sequences.

Authors:  R Staden
Journal:  Comput Appl Biosci       Date:  1989-04

8.  The distribution of the frequency of occurrence of nucleotide subsequences, based on their overlap capability.

Authors:  J F Gentleman; R C Mullin
Journal:  Biometrics       Date:  1989-03       Impact factor: 2.571

9.  Exact computation of pattern probabilities in random sequences generated by Markov chains.

Authors:  J Kleffe; U Langbecker
Journal:  Comput Appl Biosci       Date:  1990-10

10.  Binding site selection for the plant MADS domain protein AGL15: an in vitro and in vivo study.

Authors:  Weining Tang; Sharyn E Perry
Journal:  J Biol Chem       Date:  2003-05-12       Impact factor: 5.157

  5 in total

1.  Normal and compound Poisson approximations for pattern occurrences in NGS reads.

Authors:  Zhiyuan Zhai; Gesine Reinert; Kai Song; Michael S Waterman; Yihui Luan; Fengzhu Sun
Journal:  J Comput Biol       Date:  2012-06       Impact factor: 1.479

2.  Importance sampling of word patterns in DNA and protein sequences.

Authors:  Hock Peng Chan; Nancy Ruonan Zhang; Louis H Y Chen
Journal:  J Comput Biol       Date:  2010-12       Impact factor: 1.479

3.  motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences.

Authors:  Dennis Kostka; Tara Friedrich; Alisha K Holloway; Katherine S Pollard
Journal:  Stat Interface       Date:  2015       Impact factor: 0.582

4.  Statistical detection of cooperative transcription factors with similarity adjustment.

Authors:  Utz J Pape; Holger Klein; Martin Vingron
Journal:  Bioinformatics       Date:  2009-03-13       Impact factor: 6.937

5.  An improved compound Poisson model for the number of motif hits in DNA sequences.

Authors:  Wolfgang Kopp; Martin Vingron
Journal:  Bioinformatics       Date:  2017-12-15       Impact factor: 6.937
