
Probabilistic and statistical properties of words: an overview.

G Reinert, S Schbath, M S Waterman.

Abstract

In the following, an overview is given of statistical and probabilistic properties of words as they occur in the analysis of biological sequences. Counts of occurrences, counts of clumps, and renewal counts are distinguished. Exact distributions are derived, as well as normal approximations, Poisson process approximations, and compound Poisson approximations. Here, a sequence is modelled as a stationary ergodic Markov chain; a test for determining the appropriate order of the Markov chain is described. The convergence results take into account the error made by estimating the Markovian transition probabilities. The main tools involved are moment generating functions, martingales, Stein's method, and the Chen-Stein method. Similar results are given for occurrences of multiple patterns. As an example, the problem of unique recoverability of a sequence from SBH (sequencing by hybridization) chip data is discussed. Special emphasis is placed on disentangling the complicated dependence structure between word occurrences that arises from self-overlap as well as from overlap between words. The results can be used to derive approximate, and conservative, confidence intervals for tests.
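The abstract distinguishes three ways of counting a word:
- all occurrences,
- clumps of overlapping occurrences,
- renewal (non-overlapping) counts.

It also stresses that self-overlap drives the dependence between occurrences. The minimal Python sketch below computes these quantities and a plug-in expected count under a first-order Markov model. It is illustrative only and is not the authors' code. The following are assumptions:
- The function names are hypothetical.
- A clump is taken to be a maximal run of occurrences, each overlapping the previous one.
- The renewal count is the greedy left-to-right count of non-overlapping occurrences.
- The expected count uses an order-1 chain. Its transitions are estimated from dinucleotide counts, and empirical letter frequencies stand in for the stationary distribution.

Higher-order chains and the paper's distributional approximations are not covered.

```python
from collections import Counter


def occurrence_positions(seq: str, word: str) -> list[int]:
    """Start positions of all occurrences of word in seq, overlaps included."""
    k = len(word)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == word]


def periods(word: str) -> list[int]:
    """Shifts p (0 < p < len(word)) at which word overlaps itself."""
    k = len(word)
    return [p for p in range(1, k) if word[p:] == word[:k - p]]


def count_summary(seq: str, word: str) -> dict[str, int]:
    """Hypothetical helper: occurrence, clump and renewal counts of word in seq."""
    k = len(word)
    pos = occurrence_positions(seq, word)
    clumps = 0
    renewal = 0
    prev = None    # start of the previous occurrence
    next_free = 0  # first position after the last occurrence counted for renewal
    for i in pos:
        # A new clump starts when this occurrence does not overlap the previous one.
        if prev is None or i - prev >= k:
            clumps += 1
        prev = i
        # Renewal count: accept an occurrence only if it starts after the
        # last accepted occurrence has ended.
        if i >= next_free:
            renewal += 1
            next_free = i + k
    return {"occurrences": len(pos), "clumps": clumps, "renewal": renewal}


def expected_count(seq: str, word: str, alphabet: str = "ACGT") -> float:
    """Hypothetical plug-in expected number of occurrences under an order-1
    Markov chain. Assumption: transitions are estimated from dinucleotide
    counts, and letter frequencies replace the stationary distribution."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    letters = Counter(seq)
    prob = letters[word[0]] / len(seq)
    for a, b in zip(word, word[1:]):
        row_total = sum(pairs[a + c] for c in alphabet)
        prob *= pairs[a + b] / row_total if row_total else 0.0
    return max(0, len(seq) - len(word) + 1) * prob


if __name__ == "__main__":
    print(periods("ATA"), count_summary("ATATATA", "ATA"))
    print(expected_count("ACGTACGT", "ACG"))
```

For example, in `ATATATA` the word `ATA` has period 2. It occurs 3 times, forming a single clump with a renewal count of 2. A word with no periods, such as `ACG`, cannot overlap itself, so all three counts coincide.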

Year:  2000        PMID: 10890386     DOI: 10.1089/10665270050081360

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


36 articles in total; the first 10 are listed.

1.  Local homology recognition and distance measures in linear time using compressed amino acid alphabets.

Authors:  Robert C Edgar
Journal:  Nucleic Acids Res       Date:  2004-01-16       Impact factor: 16.971

2.  Biased distribution of DNA uptake sequences towards genome maintenance genes.

Authors:  Tonje Davidsen; Einar A Rødland; Karin Lagesen; Erling Seeberg; Torbjørn Rognes; Tone Tønjum
Journal:  Nucleic Acids Res       Date:  2004-02-11       Impact factor: 16.971

3.  Normal and compound poisson approximations for pattern occurrences in NGS reads.

Authors:  Zhiyuan Zhai; Gesine Reinert; Kai Song; Michael S Waterman; Yihui Luan; Fengzhu Sun
Journal:  J Comput Biol       Date:  2012-06       Impact factor: 1.479

4.  The power of detecting enriched patterns: an HMM approach.

Authors:  Zhiyuan Zhai; Shih-Yen Ku; Yihui Luan; Gesine Reinert; Michael S Waterman; Fengzhu Sun
Journal:  J Comput Biol       Date:  2010-04       Impact factor: 1.479

5.  Studying the evolution of promoter sequences: a waiting time problem.

Authors:  Sarah Behrens; Martin Vingron
Journal:  J Comput Biol       Date:  2010-12       Impact factor: 1.479

6.  Importance sampling of word patterns in DNA and protein sequences.

Authors:  Hock Peng Chan; Nancy Ruonan Zhang; Louis H Y Chen
Journal:  J Comput Biol       Date:  2010-12       Impact factor: 1.479

7.  Modulefinder: a tool for computational discovery of cis regulatory modules.

Authors:  Anthony A Philippakis; Fangxue Sherry He; Martha L Bulyk
Journal:  Pac Symp Biocomput       Date:  2005

8.  Nonrandom clusters of palindromes in herpesvirus genomes. (Review)

Authors:  Ming-Ying Leung; Kwok Pui Choi; Aihua Xia; Louis H Y Chen
Journal:  J Comput Biol       Date:  2005-04       Impact factor: 1.479

9.  Globally, unrelated protein sequences appear random.

Authors:  Daniel T Lavelle; William R Pearson
Journal:  Bioinformatics       Date:  2009-11-30       Impact factor: 6.937

10.  A comparative proteomic analysis of the simple amino acid repeat distributions in Plasmodia reveals lineage specific amino acid selection.

Authors:  Andrew R Dalby
Journal:  PLoS One       Date:  2009-07-14       Impact factor: 3.240


Beijing Coyote Bioscience Co., Ltd. © 2022-2023.