Exceptional motifs in different Markov chain models for a statistical analysis of DNA sequences.

S Schbath, B Prum, E de Turckheim.

Abstract

Identifying exceptional motifs is often used for extracting information from long DNA sequences. The two difficulties of the method are the choice of the model that defines the expected frequencies of words and the approximation of the variance of the difference T(W) between the number of occurrences of a word W and its estimate. We consider here different Markov chain models, with either stationary or periodic transition probabilities. We estimate the variance of the difference T(W) by the conditional variance of the number of occurrences of W given the oligonucleotide counts that define the model. Two applications show how to use asymptotically standard normal statistics associated with the counts to describe a given sequence in terms of its outlying words. Sequences of Escherichia coli and of Bacillus subtilis are compared with respect to their exceptional tri- and tetranucleotides. For both bacteria, exceptional 3-words are mainly found in the coding frame. E. coli palindrome counts are analyzed in different models, showing that many overabundant words are one-letter mutations of avoided palindromes.
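
To make the idea of an "exceptional" word concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the simplest stationary case: a Markov chain of order m-2 for a word W of length m, where the expected count is estimated from the counts of the two (m-1)-letter sub-words and the shared (m-2)-letter core. The variance expression is the approximation commonly quoted for this model in later tetranucleotide-usage work; the abstract itself does not state it, so treat it as an assumption. The function names `count_words`, `word_stats` and `word_zscore` are hypothetical.

```python
from collections import Counter
from math import sqrt


def count_words(seq, k):
    """Count overlapping occurrences of every k-letter word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_stats(seq, word):
    """Return (observed, expected, variance) for `word` in `seq`.

    Assumption: maximal-order stationary Markov model (order m-2), with
      expected = N(w1..w_{m-1}) * N(w2..w_m) / N(w2..w_{m-1})
      variance ~ expected * (core - left) * (core - right) / core**2
    Returns None if the core sub-word never occurs.
    """
    m = len(word)
    if m < 3:
        raise ValueError("word must have at least 3 letters")
    observed = count_words(seq, m)[word]
    left = count_words(seq, m - 1)[word[:-1]]    # first m-1 letters
    right = count_words(seq, m - 1)[word[1:]]    # last m-1 letters
    core = count_words(seq, m - 2)[word[1:-1]]   # middle m-2 letters
    if core == 0:
        return None
    expected = left * right / core
    variance = expected * (core - left) * (core - right) / core ** 2
    return observed, expected, variance


def word_zscore(seq, word):
    """Approximate z-score of `word`; None when it is undefined."""
    stats = word_stats(seq, word)
    if stats is None or stats[2] <= 0:
        return None
    observed, expected, variance = stats
    return (observed - expected) / sqrt(variance)
```

Under these assumptions, a strongly positive score flags an overabundant word and a strongly negative one an avoided word. Scores on short sequences are unreliable, because the normal approximation is asymptotic.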

Year: 1995    PMID: 8521272    DOI: 10.1089/cmb.1995.2.417

Source DB: PubMed    Journal: J Comput Biol    ISSN: 1066-5277    Impact factor: 1.479


  29 in total

Review 1.  SWORDS: a statistical tool for analysing large DNA sequences.

Authors:  Probal Chaudhuri; Sandip Das
Journal:  J Biosci       Date:  2002-02       Impact factor: 1.826

2.  Evolutionary implications of microbial genome tetranucleotide frequency biases.

Authors:  David T Pride; Richard J Meinersmann; Trudy M Wassenaar; Martin J Blaser
Journal:  Genome Res       Date:  2003-02       Impact factor: 9.043

Review 3.  Computational approaches to identify promoters and cis-regulatory elements in plant genomes.

Authors:  Stephane Rombauts; Kobe Florquin; Magali Lescot; Kathleen Marchal; Pierre Rouzé; Yves van de Peer
Journal:  Plant Physiol       Date:  2003-07       Impact factor: 8.340

4.  Statistical analysis of over-represented words in human promoter sequences.

Authors:  Leonardo Mariño-Ramírez; John L Spouge; Gavin C Kanga; David Landsman
Journal:  Nucleic Acids Res       Date:  2004-02-12       Impact factor: 16.971

5.  Frequent oligonucleotides and peptides of the Haemophilus influenzae genome.

Authors:  S Karlin; J Mrázek; A M Campbell
Journal:  Nucleic Acids Res       Date:  1996-11-01       Impact factor: 16.971

Review 6.  Sequence analysis by iterated maps, a review.

Authors:  Jonas S Almeida
Journal:  Brief Bioinform       Date:  2013-10-25       Impact factor: 11.622

7.  Over- and underrepresentation of short DNA words in herpesvirus genomes.

Authors:  M Y Leung; G M Marsh; T P Speed
Journal:  J Comput Biol       Date:  1996       Impact factor: 1.479

8.  Palindromes in SARS and Other Coronaviruses.

Authors:  David S H Chew; Kwok Pui Choi; Hans Heidner; Ming-Ying Leung
Journal:  INFORMS J Comput       Date:  2004       Impact factor: 2.276

Review 9.  A primer on metagenomics.

Authors:  John C Wooley; Adam Godzik; Iddo Friedberg
Journal:  PLoS Comput Biol       Date:  2010-02-26       Impact factor: 4.475

Review 10.  Integrating sequence, evolution and functional genomics in regulatory genomics.

Authors:  Martin Vingron; Alvis Brazma; Richard Coulson; Jacques van Helden; Thomas Manke; Kimmo Palin; Olivier Sand; Esko Ukkonen
Journal:  Genome Biol       Date:  2009-01-30       Impact factor: 13.583
