Mono- through hexanucleotide composition of the Escherichia coli genome: a Markov chain analysis.

G J Phillips, J Arnold, R Ivarie.   

Abstract

Several statistical methods were tested for accuracy in predicting observed frequencies of di- through hexanucleotides in 74,444 bp of E. coli DNA. A Markov chain was most accurate overall, whereas other methods, including a random model based on mononucleotide frequencies, were very inaccurate. When ranked from highest to lowest abundance, the observed frequencies of oligonucleotides up to six bases in length in E. coli DNA were highly asymmetric. All ordered abundance plots had a wide linear range containing the majority of the oligomers, with sharp deviations at the high and low ends of the curves. In general, values predicted by a Markov chain closely followed the overall shape of the ordered abundance curves. A simple equation was derived by which the frequency of any oligonucleotide longer than four bases in the E. coli genome (or any genome) can be relatively accurately estimated from the nested set of component tri- and tetranucleotides by serial application of a 3rd-order Markov chain. The equation yielded a mean ratio of 1.03 ± 0.94 for the observed-to-expected frequencies of the 4,096 hexanucleotides. Hence, the method is a relatively accurate but not perfect predictor of the length in nucleotides between hexanucleotide sites. Higher accuracy can be achieved using a 4th-order Markov chain and larger data sets. The high asymmetry in oligonucleotide abundance means that in the E. coli genome of 4.2 × 10^6 bp many relatively short sequences of 7-9 bp are very rare or absent.
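The abstract does not reproduce the equation itself. As a rough illustration of the idea, the sketch below uses the usual form of a 3rd-order Markov estimate: the expected frequency of an oligonucleotide is the product of the frequencies of its overlapping tetranucleotides divided by the product of the frequencies of the trinucleotides they share. For a hexanucleotide n1..n6 this gives f(n1n2n3n4) · f(n2n3n4n5) · f(n3n4n5n6) / (f(n2n3n4) · f(n3n4n5)). The function names (`kmer_freqs`, `markov3_expected`, `expected_spacing`) are hypothetical, and it is an assumption that this form matches the equation derived in the paper.

```python
from collections import Counter


def kmer_freqs(seq, k):
    """Relative frequencies of overlapping k-mers counted on one strand."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


def markov3_expected(oligo, f3, f4):
    """Expected frequency of `oligo` (5 bases or longer) under a 3rd-order
    Markov chain, built from tetranucleotide (f4) and trinucleotide (f3)
    frequencies. Assumed form, not taken verbatim from the paper."""
    if len(oligo) < 5:
        raise ValueError("oligo must be at least 5 bases long")
    # Numerator: every overlapping tetranucleotide in the oligo.
    num = 1.0
    for i in range(len(oligo) - 3):
        num *= f4.get(oligo[i:i + 4], 0.0)
    # Denominator: the trinucleotides shared by adjacent tetranucleotides.
    den = 1.0
    for i in range(1, len(oligo) - 3):
        den *= f3.get(oligo[i:i + 3], 0.0)
    return num / den if den > 0 else 0.0


def expected_spacing(freq):
    """Mean distance in nucleotides between occurrences, taken as 1/frequency."""
    return float("inf") if freq == 0 else 1.0 / freq


if __name__ == "__main__":
    # Toy sequence for illustration only; real use would load genomic DNA.
    seq = "ACGT" * 50
    f3, f4, f6 = kmer_freqs(seq, 3), kmer_freqs(seq, 4), kmer_freqs(seq, 6)
    expected = markov3_expected("ACGTAC", f3, f4)
    print("observed:", f6["ACGTAC"], "expected:", expected)
    print("mean spacing (nt):", expected_spacing(expected))
```

Dividing an observed hexanucleotide frequency by the value returned here gives the kind of observed-to-expected ratio the abstract summarizes. A 4th-order variant would follow the same pattern with pentanucleotides in the numerator and tetranucleotides in the denominator.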

Year:  1987        PMID: 3550699      PMCID: PMC340672          DOI: 10.1093/nar/15.6.2611

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


  43 in total

1.  SWORDS: a statistical tool for analysing large DNA sequences. (Review)

Authors:  Probal Chaudhuri; Sandip Das
Journal:  J Biosci       Date:  2002-02       Impact factor: 1.826

2.  The use of simulated annealing in chromosome reconstruction experiments based on binary scoring.

Authors:  A J Cuticchia; J Arnold; W E Timberlake
Journal:  Genetics       Date:  1992-10       Impact factor: 4.562

3.  Statistical analysis of nucleotide sequences.

Authors:  E E Stückle; C Emmrich; U Grob; P J Nielsen
Journal:  Nucleic Acids Res       Date:  1990-11-25       Impact factor: 16.971

4.  Statistical evaluation and biological interpretation of non-random abundance in the E. coli K-12 genome of tetra- and pentanucleotide sequences related to VSP DNA mismatch repair.

Authors:  R Merkl; M Kröger; P Rice; H J Fritz
Journal:  Nucleic Acids Res       Date:  1992-04-11       Impact factor: 16.971

5.  DNA mismatch correction by Very Short Patch repair may have altered the abundance of oligonucleotides in the E. coli genome.

Authors:  A S Bhagwat; M McClelland
Journal:  Nucleic Acids Res       Date:  1992-04-11       Impact factor: 16.971

6.  The application of Markov chain analysis to oligonucleotide frequency prediction and physical mapping of Drosophila melanogaster.

Authors:  A J Cuticchia; R Ivarie; J Arnold
Journal:  Nucleic Acids Res       Date:  1992-07-25       Impact factor: 16.971

7.  Distinct patterns in the dinucleotide nearest neighbors to G/C and A/T oligomers in eukaryotic sequences.

Authors:  R Nussinov
Journal:  J Mol Evol       Date:  1991-09       Impact factor: 2.395

8.  Counterselection of GATC sequences in enterobacteriophages by the components of the methyl-directed mismatch repair system.

Authors:  P Deschavanne; M Radman
Journal:  J Mol Evol       Date:  1991-08       Impact factor: 2.395

9.  Compositional heterogeneity of the Escherichia coli genome: a role for VSP repair?

Authors:  G Gutiérrez; J Casadesús; J L Oliver; A Marín
Journal:  J Mol Evol       Date:  1994-10       Impact factor: 2.395

10.  Concordant evolution of coding and noncoding regions of DNA made possible by the universal rule of TA/CG deficiency-TG/CT excess.

Authors:  T Yomo; S Ohno
Journal:  Proc Natl Acad Sci U S A       Date:  1989-11       Impact factor: 11.205
