First and second moment of counts of words in random texts generated by Markov chains.

J Kleffe, M Borodovsky.

Abstract

An exact expression for the variance of random frequency that a given word has in text generated by a Markov chain is presented. The result is applied to periodic Markov chains, which describe the protein-coding DNA sequences better than simple Markov chains. A new solution to the problem of word overlap is proposed. It was found that the expected frequency and overlapping properties determine most of the variance. The expectation and variance of counts for triplets are compared with experimental counts in Escherichia coli coding sequences.
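The quantities named in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's exact variance expression or its periodic-chain model. It assumes a simple first-order, stationary Markov chain over the DNA alphabet, with a hypothetical transition matrix `P` whose values are purely illustrative. The expected count of a word of length k in a text of length n is then (n - k + 1) times the word's probability under the chain. The variance is obtained here by brute-force enumeration of all texts, which is feasible only for very short n. All function names are assumptions introduced for this example.

```python
import itertools

ALPHABET = "ACGT"

# Hypothetical transition matrix (rows = current letter, columns = next letter,
# both in ALPHABET order). The values are illustrative, not estimated from data.
P = [
    [0.40, 0.20, 0.20, 0.20],
    [0.10, 0.50, 0.20, 0.20],
    [0.30, 0.20, 0.30, 0.20],
    [0.25, 0.25, 0.25, 0.25],
]


def stationary(P, iters=500):
    """Approximate the stationary distribution by iterating pi <- pi * P."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[a] * P[a][b] for a in range(m)) for b in range(m)]
    return pi


def sequence_prob(seq, P, pi):
    """Probability of seq under the chain started from the distribution pi."""
    idx = [ALPHABET.index(c) for c in seq]
    p = pi[idx[0]]
    for a, b in zip(idx, idx[1:]):
        p *= P[a][b]
    return p


def expected_count(word, n, P, pi):
    """Expected number of (possibly overlapping) occurrences in a text of length n."""
    return (n - len(word) + 1) * sequence_prob(word, P, pi)


def overlap_shifts(word):
    """Shifts d (0 < d < len(word)) at which the word can overlap a copy of itself."""
    k = len(word)
    return [d for d in range(1, k) if word[d:] == word[:k - d]]


def count_occurrences(text, word):
    """Count occurrences of word in text, allowing overlaps."""
    k = len(word)
    return sum(text[i:i + k] == word for i in range(len(text) - k + 1))


def brute_force_moments(word, n, P, pi):
    """Mean and variance of the count by enumerating all 4**n texts (small n only)."""
    mean = 0.0
    second = 0.0
    for letters in itertools.product(ALPHABET, repeat=n):
        text = "".join(letters)
        p = sequence_prob(text, P, pi)
        c = count_occurrences(text, word)
        mean += p * c
        second += p * c * c
    return mean, second - mean ** 2


pi = stationary(P)
for word in ("AAA", "ATG"):
    mean, var = brute_force_moments(word, 6, P, pi)
    print(word, overlap_shifts(word), expected_count(word, 6, P, pi), mean, var)
```

Comparing a self-overlapping word such as `AAA` with a non-overlapping one such as `ATG` shows how the overlap structure enters the variance while leaving the formula for the expectation unchanged.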

Year:  1992        PMID: 1422876     DOI: 10.1093/bioinformatics/8.5.433

Source DB:  PubMed          Journal:  Comput Appl Biosci        ISSN: 0266-7061


  17 in total

1.  In silico identification of putative regulatory sequence elements in the 5'-untranslated region of genes that are expressed during male gametogenesis.

Authors:  Raymond Jozef Maurinus Hulzink; Han Weerdesteyn; Anton Felix Croes; Tom Gerats; Marinus Maria Antonius van Herpen; Jacques van Helden
Journal:  Plant Physiol       Date:  2003-05       Impact factor: 8.340

2.  Computational approaches to identify promoters and cis-regulatory elements in plant genomes. (Review)

Authors:  Stephane Rombauts; Kobe Florquin; Magali Lescot; Kathleen Marchal; Pierre Rouzé; Yves van de Peer
Journal:  Plant Physiol       Date:  2003-07       Impact factor: 8.340

3.  The power of detecting enriched patterns: an HMM approach.

Authors:  Zhiyuan Zhai; Shih-Yen Ku; Yihui Luan; Gesine Reinert; Michael S Waterman; Fengzhu Sun
Journal:  J Comput Biol       Date:  2010-04       Impact factor: 1.479

4.  Frequent oligonucleotides and peptides of the Haemophilus influenzae genome.

Authors:  S Karlin; J Mrázek; A M Campbell
Journal:  Nucleic Acids Res       Date:  1996-11-01       Impact factor: 16.971

5.  Statistical signals in bioinformatics. (Review)

Authors:  Samuel Karlin
Journal:  Proc Natl Acad Sci U S A       Date:  2005-09-12       Impact factor: 11.205

6.  Over- and underrepresentation of short DNA words in herpesvirus genomes.

Authors:  M Y Leung; G M Marsh; T P Speed
Journal:  J Comput Biol       Date:  1996       Impact factor: 1.479

7.  Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals.

Authors:  J van Helden; M del Olmo; J E Pérez-Ortín
Journal:  Nucleic Acids Res       Date:  2000-02-15       Impact factor: 16.971

8.  Multi-alphabet consensus algorithm for identification of low specificity protein-DNA interactions.

Authors:  A V Ulyanov; G D Stormo
Journal:  Nucleic Acids Res       Date:  1995-04-25       Impact factor: 16.971

9.  Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data.

Authors:  Leslie Regad; Juliette Martin; Gregory Nuel; Anne-Claude Camproux
Journal:  Algorithms Mol Biol       Date:  2010-01-26       Impact factor: 1.405

10.  Atypical regions in large genomic DNA sequences.

Authors:  S Scherer; M S McPeek; T P Speed
Journal:  Proc Natl Acad Sci U S A       Date:  1994-07-19       Impact factor: 11.205
