
Over- and underrepresentation of short DNA words in herpesvirus genomes.

M Y Leung, G M Marsh, T P Speed.

Abstract

The relative abundance and rarity of DNA words have been recognized in previous biological studies to have implications for the regulation, repair, and evolutionary mechanisms of a genome. In this paper, we review several different measures of abundance and rarity of DNA words, including z-scores, representation ratios, and cross-ratios, that have appeared in the recent literature, and examine the concordance among them using the human cytomegalovirus genome sequence. We then rank all words of length k = 2, ..., 5 of seven herpesvirus genomes according to their abundance, as measured by one of the z-scores based upon a stationary Markov model of order k-2. Using a simple metric on the ranks of 2-words of the seven herpesvirus sequences, we construct an evolutionary tree. Several 3-words are observed to be consistently over- or underrepresented in all seven herpesviruses. Furthermore, clusters of some of the most over- and underrepresented 4- and 5-words in the genomes are identified with functional sites such as the origins of replication and regulatory signals of individual viruses.
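The ranking step described above scores each k-word against a stationary Markov model of order k-2. The sketch below shows one common way to do this, under stated assumptions: expected counts use the maximal-model estimate E(w) = N(w_1…w_{k-1}) N(w_2…w_k) / N(w_2…w_{k-1}), and the variance is an assumed product-form approximation with no correction for self-overlapping words, so the paper's own z-score may differ. The function names, the restriction to the letters ACGT, the use of the sequence length as the "middle" count when k = 2, and `rank_distance` are all hypothetical. In particular, the abstract does not say which metric on 2-word ranks was used; summing absolute rank differences is only an illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"  # assumption: sequences contain only these four letters


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping occurrences of every k-letter word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_z_scores(seq: str, k: int) -> dict:
    """Map each scorable k-word to (observed, expected, z) under an order k-2 model.

    Expected count (maximal model): E(w) = N(prefix) * N(suffix) / N(middle),
    with prefix = w[:-1], suffix = w[1:], middle = w[1:-1].
    The variance below is an assumed approximation, not the paper's formula.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    n_k = kmer_counts(seq, k)
    n_k1 = kmer_counts(seq, k - 1)
    # For k = 2 the middle word is empty; its count is taken as len(seq) (assumption).
    n_mid = kmer_counts(seq, k - 2) if k > 2 else Counter({"": len(seq)})

    scores = {}
    # Enumerate all 4**k words so that absent (count 0) words can also be scored.
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        pre, suf, mid = n_k1[word[:-1]], n_k1[word[1:]], n_mid[word[1:-1]]
        if mid == 0:
            continue  # model undefined: the middle word never occurs
        expected = pre * suf / mid
        variance = expected * (mid - pre) * (mid - suf) / mid ** 2
        if variance <= 0:
            continue  # degenerate case: z-score undefined
        observed = n_k[word]
        scores[word] = (observed, expected, (observed - expected) / sqrt(variance))
    return scores


def rank_words(scores: dict) -> list:
    """Order words from most over- to most underrepresented by z-score."""
    return sorted(scores, key=lambda w: scores[w][2], reverse=True)


def rank_distance(ranks_a: list, ranks_b: list) -> int:
    """Hypothetical metric: sum of absolute rank differences over shared words."""
    pos_a = {w: i for i, w in enumerate(ranks_a)}
    pos_b = {w: i for i, w in enumerate(ranks_b)}
    return sum(abs(pos_a[w] - pos_b[w]) for w in pos_a.keys() & pos_b.keys())
```

For example, `rank_words(markov_z_scores(seq, 3))[:5]` would list the five most overrepresented 3-words of `seq`, and computing `rank_distance` on the 2-word rankings of every pair of genomes would give a distance matrix from which a tree could be built. Real genome sequences may contain ambiguity codes, which this sketch would need to filter out first.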

Year:  1996        PMID: 8891954      PMCID: PMC4076300          DOI: 10.1089/cmb.1996.3.345

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


References (26 in total; 9 listed)

1.  The application of Markov chain analysis to oligonucleotide frequency prediction and physical mapping of Drosophila melanogaster.

Authors:  A J Cuticchia; R Ivarie; J Arnold
Journal:  Nucleic Acids Res       Date:  1992-07-25       Impact factor: 16.971

2.  Linguistics of nucleotide sequences. II: Stationary words in genetic texts and the zonal structure of DNA.

Authors:  P A Pevzner; A A Mironov
Journal:  J Biomol Struct Dyn       Date:  1989-04

3.  Linguistics of nucleotide sequences: morphology and comparison of vocabularies.

Authors:  V Brendel; J S Beckmann; E N Trifonov
Journal:  J Biomol Struct Dyn       Date:  1986-08

4.  Compilation and analysis of eukaryotic POL II promoter sequences.

Authors:  P Bucher; E N Trifonov
Journal:  Nucleic Acids Res       Date:  1986-12-22       Impact factor: 16.971

5.  Pervasive CpG suppression in animal mitochondrial genomes.

Authors:  L R Cardon; C Burge; D A Clayton; S Karlin
Journal:  Proc Natl Acad Sci U S A       Date:  1994-04-26       Impact factor: 11.205

6.  Strong adenine clustering in nucleotide sequences.

Authors:  R Nussinov
Journal:  J Theor Biol       Date:  1980-07-21       Impact factor: 2.691

7.  Molecular evolution of herpesviruses: genomic and protein sequence comparisons.

Authors:  S Karlin; E S Mocarski; G A Schachtel
Journal:  J Virol       Date:  1994-03       Impact factor: 5.103

8.  Exceptional motifs in different Markov chain models for a statistical analysis of DNA sequences.

Authors:  S Schbath; B Prum; E de Turckheim
Journal:  J Comput Biol       Date:  1995       Impact factor: 1.479

9.  Human cytomegalovirus origin of DNA replication (oriLyt) resides within a highly complex repetitive region.

Authors:  M J Masse; S Karlin; G A Schachtel; E S Mocarski
Journal:  Proc Natl Acad Sci U S A       Date:  1992-06-15       Impact factor: 11.205

Cited by (18 in total; 10 listed)

1.  SWORDS: a statistical tool for analysing large DNA sequences. [Review]

Authors:  Probal Chaudhuri; Sandip Das
Journal:  J Biosci       Date:  2002-02       Impact factor: 1.826

2.  Evolutionary implications of microbial genome tetranucleotide frequency biases.

Authors:  David T Pride; Richard J Meinersmann; Trudy M Wassenaar; Martin J Blaser
Journal:  Genome Res       Date:  2003-02       Impact factor: 9.043

3.  Nonrandom clusters of palindromes in herpesvirus genomes. [Review]

Authors:  Ming-Ying Leung; Kwok Pui Choi; Aihua Xia; Louis H Y Chen
Journal:  J Comput Biol       Date:  2005-04       Impact factor: 1.479

4.  Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals.

Authors:  J van Helden; M del Olmo; J E Pérez-Ortín
Journal:  Nucleic Acids Res       Date:  2000-02-15       Impact factor: 16.971

5.  A basic analysis toolkit for biological sequences.

Authors:  Raffaele Giancarlo; Alessandro Siragusa; Enrico Siragusa; Filippo Utro
Journal:  Algorithms Mol Biol       Date:  2007-09-18       Impact factor: 1.405

6.  Mining protein loops using a structural alphabet and statistical exceptionality.

Authors:  Leslie Regad; Juliette Martin; Gregory Nuel; Anne-Claude Camproux
Journal:  BMC Bioinformatics       Date:  2010-02-04       Impact factor: 3.169

7.  Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data.

Authors:  Leslie Regad; Juliette Martin; Gregory Nuel; Anne-Claude Camproux
Journal:  Algorithms Mol Biol       Date:  2010-01-26       Impact factor: 1.405

8.  Conservation and implications of eukaryote transcriptional regulatory regions across multiple species.

Authors:  Lin Wan; Dayong Li; Donglei Zhang; Xue Liu; Wenjiang J Fu; Lihuang Zhu; Minghua Deng; Fengzhu Sun; Minping Qian
Journal:  BMC Genomics       Date:  2008-12-20       Impact factor: 3.969

9.  Discovery of novel transcription factor binding sites by statistical overrepresentation.

Authors:  Saurabh Sinha; Martin Tompa
Journal:  Nucleic Acids Res       Date:  2002-12-15       Impact factor: 16.971

10.  Compound poisson approximation of the number of occurrences of a position frequency matrix (PFM) on both strands.

Authors:  Utz J Pape; Sven Rahmann; Fengzhu Sun; Martin Vingron
Journal:  J Comput Biol       Date:  2008 Jul-Aug       Impact factor: 1.479
