The power of detecting enriched patterns: an HMM approach.

Zhiyuan Zhai, Shih-Yen Ku, Yihui Luan, Gesine Reinert, Michael S Waterman, Fengzhu Sun.

Abstract

The identification of binding sites of transcription factors (TF) and other regulatory regions, referred to as motifs, located in a set of molecular sequences is of fundamental importance in genomic research. Many computational and experimental approaches have been developed to locate motifs. The set of sequences of interest can be concatenated to form a long sequence of length n. One of the successful approaches for motif discovery is to identify statistically over- or under-represented patterns in this long sequence. A pattern refers to a fixed word W over the alphabet. In the example of interest, W is a word in the set of patterns of the motif. Despite extensive studies on motif discovery, no studies have been carried out on the power of detecting statistically over- or under-represented patterns. Here we address the issue of how the known presence of random instances of a known motif affects the power of detecting patterns, such as patterns within the motif. Let N_W(n) be the number of possibly overlapping occurrences of a pattern W in the sequence that contains instances of a known motif; such a sequence is modeled here by a Hidden Markov Model (HMM). First, efficient computational methods for calculating the mean and variance of N_W(n) are developed. Second, efficient computational methods are developed for calculating the parameters involved in the normal approximation of N_W(n) for frequent patterns and in the compound Poisson approximation of N_W(n) for rare patterns. Third, an easy-to-use web program is developed to calculate the power of detecting patterns, and the program is used to study the power of detection in several interesting biological examples.
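As a rough illustration of the quantities named in the abstract, the following Python sketch counts possibly overlapping occurrences of a word W in a sequence and evaluates a one-sided power under a normal approximation. It is not the authors' method or their web program: the function names `count_overlapping` and `normal_power` are hypothetical, the means and standard deviations are assumed to be supplied by the caller (the paper computes them from the HMM), and the test for over-representation is assumed to be one-sided at level alpha.

```python
from statistics import NormalDist


def count_overlapping(seq: str, word: str) -> int:
    """Count possibly overlapping occurrences of `word` in `seq`."""
    if not word:
        raise ValueError("word must be non-empty")
    k = len(word)
    # Slide a window of length k over every start position.
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def normal_power(mu0: float, sd0: float, mu1: float, sd1: float,
                 alpha: float = 0.05) -> float:
    """One-sided power to detect over-representation (illustrative only).

    Assumes the count is approximately normal with mean mu0 and standard
    deviation sd0 under the null model, and mean mu1 and standard
    deviation sd1 under the alternative (for example, the HMM model).
    """
    # Rejection threshold chosen so the null exceeds it with probability alpha.
    threshold = mu0 + NormalDist().inv_cdf(1 - alpha) * sd0
    # Power is the probability of exceeding that threshold under the alternative.
    return 1 - NormalDist(mu1, sd1).cdf(threshold)


if __name__ == "__main__":
    print(count_overlapping("ACGACGA", "ACGA"))
    print(normal_power(mu0=10.0, sd0=2.0, mu1=16.0, sd1=2.5))
```

The normal approximation is only expected to be reasonable for frequent patterns; for rare patterns the abstract points to a compound Poisson approximation, which this sketch does not cover.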

Year:  2010        PMID: 20426691      PMCID: PMC3203519          DOI: 10.1089/cmb.2009.0218

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479

