A Markov analysis of DNA sequences.

H Almagor.   

Abstract

We present a model that treats a DNA sequence as a Markov process. Several workers have suggested that some basic biological or chemical features of nucleic acids underlie the frequencies of dinucleotides (doublets) in these chains. Comparing patterns of doublet frequencies in the DNA of different organisms has been shown to be a fruitful approach to some phylogenetic questions (Russel & Subak-Sharpe, 1977). Grantham (1978) formulated mRNA sequence indices, some of which involve certain doublet frequencies, and suggested that these indices may indicate the molecular constraints existing during gene evolution. Nussinov (1981) has shown that a set of dinucleotide preference rules holds consistently for eukaryotes, and suggested a strong correlation between these rules and degenerate codon usage. Gruenbaum, Cedar & Razin (1982) found that methylation in eukaryotic DNA occurs exclusively at C-G sites. Important biological information thus seems to be contained in the doublet frequencies. One of the basic questions to be asked (the "correlation question") is to what extent the 64 trinucleotide (triplet) frequencies measured in a sequence are determined by the 16 doublet frequencies in the same sequence. The DNA is described here as a Markov process, with the nucleotides being outcomes of a sequence generator. Answering the correlation question means finding the order of the Markov process. The difficulty is that natural sequences are of finite length, and statistical noise is quite strong. We show that even for a 16,000-nucleotide-long sequence (like that of the human mitochondrial genome) the finite-length effect cannot be neglected. Using the Markov chain model, the correlation between doublet and triplet frequencies can, however, be determined even for finite sequences, provided the finite length is properly taken into account.
Two natural DNA sequences, the human mitochondrial genome and the SV40 DNA, are analysed as examples of the method.
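
The correlation question can be illustrated with a minimal sketch. It is not the paper's method: it uses the standard first-order Markov estimate N(xyz) ≈ N(xy)·N(yz)/N(y) to predict the 64 triplet counts from doublet counts, and it makes no finite-length correction of the kind the abstract describes. The function names (`kmer_counts`, `expected_triplet_counts`, `chi_square_vs_first_order`) and the toy sequence are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers in a linear (non-circular) sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_triplet_counts(seq):
    """Triplet counts predicted from doublet counts alone.

    Assumes a first-order Markov chain, under which
    N(xyz) is approximately N(xy) * N(yz) / N(y).
    End effects of the finite sequence are ignored here.
    """
    singles = kmer_counts(seq, 1)
    doublets = kmer_counts(seq, 2)
    expected = {}
    for x, y, z in product(ALPHABET, repeat=3):
        n_y = singles[y]
        if n_y:
            expected[x + y + z] = doublets[x + y] * doublets[y + z] / n_y
        else:
            expected[x + y + z] = 0.0
    return expected


def chi_square_vs_first_order(seq):
    """Pearson-style discrepancy between observed and predicted triplets.

    Triplets with a zero prediction are skipped. A large value suggests
    the triplets are not determined by the doublets, but for a finite
    sequence part of the discrepancy is statistical noise.
    """
    observed = kmer_counts(seq, 3)
    expected = expected_triplet_counts(seq)
    return sum(
        (observed[t] - e) ** 2 / e for t, e in expected.items() if e > 0
    )


if __name__ == "__main__":
    toy = "ACGGTACGTTAGCATCGATCGGATCCA" * 20  # hypothetical toy sequence
    print(chi_square_vs_first_order(toy))
```

Judging whether a given discrepancy is significant requires a reference distribution that accounts for sequence length, which is the finite-length issue the abstract raises and which this sketch leaves out.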

Year:  1983        PMID: 6316035     DOI: 10.1016/0022-5193(83)90251-5

Source DB:  PubMed          Journal:  J Theor Biol        ISSN: 0022-5193            Impact factor:   2.691

