Probabilistic and statistical properties of words: an overview.

Abstract

In the following, an overview is given on statistical and probabilistic properties of words, as occurring in the analysis of biological sequences. Counts of occurrence, counts of clumps, and renewal counts are distinguished, and exact distributions as well as normal approximations, Poisson process approximations, and compound Poisson approximations are derived. Here, a sequence is modelled as a stationary ergodic Markov chain; a test for determining the appropriate order of the Markov chain is described. The convergence results take the error made by estimating the Markovian transition probabilities into account. The main tools involved are moment generating functions, martingales, Stein's method, and the Chen-Stein method. Similar results are given for occurrences of multiple patterns, and, as an example, the problem of unique recoverability of a sequence from SBH chip data is discussed. Special emphasis lies on disentangling the complicated dependence structure between word occurrences, due to self-overlap as well as due to overlap between words. The results can be used to derive approximate, and conservative, confidence intervals for tests.
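The three kinds of count distinguished in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions: an occurrence count includes overlapping matches, a clump is a maximal run of mutually overlapping occurrences, and a renewal is an occurrence that does not overlap the previously counted renewal. The function names and the example sequence are hypothetical and are not taken from the paper.

```python
def occurrences(seq, word):
    """Start positions of all occurrences of word in seq, overlaps included."""
    k = len(word)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == word]


def count_clumps(positions, k):
    """Number of clumps: a new clump starts whenever an occurrence does not
    overlap the occurrence immediately before it (gap of at least k)."""
    clumps = 0
    prev = None
    for p in positions:
        if prev is None or p - prev >= k:
            clumps += 1
        prev = p  # compare against the previous occurrence, counted or not
    return clumps


def count_renewals(positions, k):
    """Number of renewals: scanning left to right, count an occurrence only
    if it does not overlap the last occurrence that was counted."""
    renewals = 0
    last_counted = None
    for p in positions:
        if last_counted is None or p - last_counted >= k:
            renewals += 1
            last_counted = p  # only counted occurrences block later ones
    return renewals


seq, word = "ATATATAGATA", "ATA"   # illustrative data, self-overlapping word
pos = occurrences(seq, word)
print(pos)                             # [0, 2, 4, 8]
print(len(pos))                        # 4 occurrences
print(count_clumps(pos, len(word)))    # 2 clumps
print(count_renewals(pos, len(word)))  # 3 renewals
```

For a self-overlapping word such as ATA the three counts differ, which is the dependence structure the abstract refers to; for a word with no self-overlap all three coincide.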

Year: 2000 PMID: 10890386 DOI: 10.1089/10665270050081360

Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479

Cited

36 in total

10. A comparative proteomic analysis of the simple amino acid repeat distributions in Plasmodia reveals lineage specific amino acid selection.

Authors: Andrew R Dalby
Journal: PLoS One Date: 2009-07-14 Impact factor: 3.240

1. Local homology recognition and distance measures in linear time using compressed amino acid alphabets.

2. Biased distribution of DNA uptake sequences towards genome maintenance genes.

3. Normal and compound poisson approximations for pattern occurrences in NGS reads.

4. The power of detecting enriched patterns: an HMM approach.

5. Studying the evolution of promoter sequences: a waiting time problem.

6. Importance sampling of word patterns in DNA and protein sequences.

7. Modulefinder: a tool for computational discovery of cis regulatory modules.

8. Nonrandom clusters of palindromes in herpesvirus genomes. (Review)

9. Globally, unrelated protein sequences appear random.

10. A comparative proteomic analysis of the simple amino acid repeat distributions in Plasmodia reveals lineage specific amino acid selection.