

The power of detecting enriched patterns: an HMM approach.

Zhiyuan Zhai, Shih-Yen Ku, Yihui Luan, Gesine Reinert, Michael S Waterman, Fengzhu Sun.

Abstract

The identification of binding sites of transcription factors (TFs) and other regulatory regions, referred to as motifs, in a set of molecular sequences is of fundamental importance in genomic research. Many computational and experimental approaches have been developed to locate motifs. The set of sequences of interest can be concatenated to form a long sequence of length n. One successful approach to motif discovery is to identify statistically over- or under-represented patterns in this long sequence. A pattern refers to a fixed word W over the alphabet; in the example of interest, W is a word in the set of patterns of the motif. Despite extensive studies on motif discovery, no studies have been carried out on the power of detecting statistically over- or under-represented patterns. Here we address how the known presence of random instances of a known motif affects the power of detecting patterns, such as patterns within the motif. Let N_W(n) be the number of possibly overlapping occurrences of a pattern W in a sequence that contains instances of a known motif; such a sequence is modeled here by a hidden Markov model (HMM). First, efficient computational methods for calculating the mean and variance of N_W(n) are developed. Second, efficient computational methods are developed for calculating the parameters involved in the normal approximation of N_W(n) for frequent patterns and in the compound Poisson approximation of N_W(n) for rare patterns. Third, an easy-to-use web program is developed to calculate the power of detecting patterns, and the program is used to study the power of detection in several interesting biological examples.
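To make the quantities in the abstract concrete, the following is a minimal sketch, not the authors' algorithm or their web program. It assumes a first-order HMM with a hidden-state transition matrix `A`, an emission matrix `E` (rows = hidden states, columns = letters), and an initial state distribution `pi0`; all function names and the toy models are hypothetical. The mean E[N_W(n)] is computed exactly by summing, over start positions, the probability that W begins there (a forward recursion restricted to the letters of W). The variance, which the paper computes exactly, is only estimated here by Monte Carlo simulation. Power uses a one-sided normal-approximation z-test, which the abstract ties to frequent patterns; rare patterns would call for the compound Poisson approximation instead.

```python
import random
import statistics
from statistics import NormalDist
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _step(dist: Sequence[float], A: Matrix) -> List[float]:
    """Propagate a (possibly unnormalized) state vector one position: dist @ A."""
    K = len(dist)
    return [sum(dist[j] * A[j][k] for j in range(K)) for k in range(K)]


def word_prob(dist: Sequence[float], A: Matrix, E: Matrix, word: str, idx: dict) -> float:
    """P(X_i..X_{i+m-1} = word) given the hidden-state distribution `dist` at position i."""
    alpha = [dist[k] * E[k][idx[word[0]]] for k in range(len(dist))]
    for ch in word[1:]:
        c = idx[ch]
        alpha = [p * E[k][c] for k, p in enumerate(_step(alpha, A))]
    return sum(alpha)


def expected_count(word, n, pi0, A, E, alphabet) -> float:
    """Exact E[N_W(n)]: sum over start positions i of P(word starts at i)."""
    idx = {a: i for i, a in enumerate(alphabet)}
    dist, total = list(pi0), 0.0
    for _ in range(n - len(word) + 1):
        total += word_prob(dist, A, E, word, idx)
        dist = _step(dist, A)  # hidden-state distribution at the next position
    return total


def simulate_counts(word, n, pi0, A, E, alphabet, reps=2000, seed=0) -> Tuple[float, float]:
    """Monte Carlo mean and variance of overlapping counts (stand-in for exact variance)."""
    rng = random.Random(seed)
    states = range(len(pi0))
    m = len(word)
    counts = []
    for _ in range(reps):
        s = rng.choices(states, weights=pi0)[0]
        letters = []
        for _ in range(n):
            letters.append(rng.choices(alphabet, weights=E[s])[0])
            s = rng.choices(states, weights=A[s])[0]
        text = "".join(letters)
        # Overlapping occurrences: every start position is checked.
        counts.append(sum(text[i:i + m] == word for i in range(n - m + 1)))
    return statistics.mean(counts), statistics.variance(counts)


def normal_power(mu0, sd0, mu1, sd1, alpha=0.05) -> float:
    """One-sided power: cutoff from Normal(mu0, sd0^2), evaluated under Normal(mu1, sd1^2)."""
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * sd0
    return 1 - NormalDist(mu1, sd1).cdf(cutoff)


if __name__ == "__main__":
    # Hypothetical null: i.i.d. uniform DNA, written as a one-state "HMM".
    bg = ([1.0], [[1.0]], [[0.25] * 4])
    # Hypothetical alternative: state 1 is a "motif-rich" state favouring "A";
    # pi0 = (0.8, 0.2) is the stationary distribution of this A.
    alt = ([0.8, 0.2], [[0.95, 0.05], [0.20, 0.80]], [[0.25] * 4, [0.7, 0.1, 0.1, 0.1]])
    w, n = "AA", 200
    mu0 = expected_count(w, n, *bg, "ACGT")
    mu1 = expected_count(w, n, *alt, "ACGT")
    _, var0 = simulate_counts(w, n, *bg, "ACGT", reps=500)
    _, var1 = simulate_counts(w, n, *alt, "ACGT", reps=500)
    print(f"E0={mu0:.2f} E1={mu1:.2f} power={normal_power(mu0, var0 ** 0.5, mu1, var1 ** 0.5):.3f}")
```

Starting the alternative model from its stationary distribution makes every position identically distributed, so the exact mean reduces to (n - |W| + 1) P(W); the loop over positions is kept so that non-stationary starts are also handled.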


Year: 2010 PMID: 20426691 PMCID: PMC3203519 DOI: 10.1089/cmb.2009.0218

Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479
