Mono- through hexanucleotide composition of the Escherichia coli genome: a Markov chain analysis.

Abstract

Several statistical methods were tested for their accuracy in predicting the observed frequencies of di- through hexanucleotides in 74,444 bp of E. coli DNA. A Markov chain was the most accurate overall, whereas the other methods, including a random model based on mononucleotide frequencies, were very inaccurate. When ranked from highest to lowest abundance, the observed frequencies of oligonucleotides up to six bases long in E. coli DNA were highly asymmetric. Every ordered abundance plot had a wide linear range, containing the majority of the oligomers, and deviated sharply at the high and low ends of the curve. In general, the values predicted by a Markov chain closely followed the overall shape of the ordered abundance curves. A simple equation was derived by which the frequency of any oligonucleotide longer than four bases in the E. coli genome (or any genome) can be estimated with reasonable accuracy from its nested set of component tri- and tetranucleotides by serial application of a third-order Markov chain. The equation yielded a mean observed-to-expected frequency ratio of 1.03 ± 0.94 over the 4,096 hexanucleotides. Hence, the method is a reasonably accurate, but not perfect, predictor of the distance in nucleotides between hexanucleotide sites. Higher accuracy can be achieved with a fourth-order Markov chain and larger data sets. Because oligonucleotide abundance is so asymmetric, many relatively short sequences of 7-9 bp are very rare or absent from the 4.2 × 10⁶ bp E. coli genome.
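The estimation step described in the abstract can be sketched in Python. Under a third-order Markov chain, the expected count of a hexanucleotide w1…w6 is the product of the counts of its three overlapping tetranucleotides divided by the counts of the two trinucleotides they share: E(w1…w6) ≈ N(w1w2w3w4) · N(w2w3w4w5) · N(w3w4w5w6) / [N(w2w3w4) · N(w3w4w5)]. This sketch assumes the standard maximum-likelihood form of such an estimator. The paper's own equation may differ in details, such as using frequencies rather than counts or counting both strands. The helper names (`kmer_counts`, `markov_expected`, `mononucleotide_expected`, `hexamer_ratios`) are hypothetical, and the demo input is a random sequence standing in for real E. coli DNA.

```python
from collections import Counter
from itertools import product
import random
from statistics import mean, stdev


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on a single strand (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_expected(word: str, hi: Counter, lo: Counter, order: int = 3) -> float:
    """Expected count of `word` (longer than `order`) under an order-`order` Markov chain.

    Assumed standard estimator: `hi` holds (order+1)-mer counts, `lo` holds
    order-mer counts. Multiply the counts of the overlapping (order+1)-mers of
    `word`, then divide out the order-mers shared by consecutive ones.
    Returns 0.0 when a shared order-mer was never observed.
    """
    num = 1.0
    for i in range(len(word) - order):
        num *= hi[word[i:i + order + 1]]
    den = 1.0
    for i in range(1, len(word) - order):
        den *= lo[word[i:i + order]]
    return num / den if den else 0.0


def mononucleotide_expected(word: str, mono: Counter, n_positions: int) -> float:
    """Zero-order ("random") model: product of base frequencies times positions."""
    total = sum(mono.values())
    p = 1.0
    for base in word:
        p *= mono[base] / total
    return p * n_positions


def hexamer_ratios(seq: str, order: int = 3) -> list[float]:
    """Observed/expected ratios for every hexamer with a nonzero expectation."""
    observed = kmer_counts(seq, 6)
    hi, lo = kmer_counts(seq, order + 1), kmer_counts(seq, order)
    ratios = []
    for letters in product("ACGT", repeat=6):  # all 4,096 hexanucleotides
        word = "".join(letters)
        expected = markov_expected(word, hi, lo, order)
        if expected > 0:
            ratios.append(observed[word] / expected)
    return ratios


if __name__ == "__main__":
    # Hypothetical demo input: a uniform random sequence, not real E. coli DNA.
    random.seed(0)
    seq = "".join(random.choices("ACGT", k=100_000))
    ratios = hexamer_ratios(seq)
    print(f"{len(ratios)} hexamers, O/E = {mean(ratios):.2f} +/- {stdev(ratios):.2f}")

    hi, lo = kmer_counts(seq, 4), kmer_counts(seq, 3)
    site = "GAATTC"  # example hexanucleotide site
    expected = markov_expected(site, hi, lo)
    # Mean spacing between sites is roughly (number of positions) / (expected count).
    print(f"{site}: expected {expected:.1f} sites, ~{(len(seq) - 5) / expected:.0f} bp apart")
```

Setting `order=4` gives the fourth-order variant, which estimates hexamers from penta- and tetranucleotide counts instead. On real genomic DNA, the spread of the observed-to-expected ratios measures accuracy; the abstract reports 1.03 ± 0.94. For scale, a uniform random model predicts about 4.2 × 10⁶ / 4⁸ ≈ 64 occurrences of each 8-mer per strand. The rarity or absence of many 7-9 bp sequences therefore reflects the asymmetry in oligonucleotide abundance.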

Year: 1987 PMID: 3550699 PMCID: PMC340672 DOI: 10.1093/nar/15.6.2611

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

Cited by: 43 in total (10 shown)

1. SWORDS: a statistical tool for analysing large DNA sequences. (Review)

Authors: Probal Chaudhuri; Sandip Das
Journal: J Biosci Date: 2002-02 Impact factor: 1.826

2. The use of simulated annealing in chromosome reconstruction experiments based on binary scoring.

Authors: A J Cuticchia; J Arnold; W E Timberlake
Journal: Genetics Date: 1992-10 Impact factor: 4.562

3. Statistical analysis of nucleotide sequences.

Authors: E E Stückle; C Emmrich; U Grob; P J Nielsen
Journal: Nucleic Acids Res Date: 1990-11-25 Impact factor: 16.971

4. Statistical evaluation and biological interpretation of non-random abundance in the E. coli K-12 genome of tetra- and pentanucleotide sequences related to VSP DNA mismatch repair.

Authors: R Merkl; M Kröger; P Rice; H J Fritz
Journal: Nucleic Acids Res Date: 1992-04-11 Impact factor: 16.971

5. DNA mismatch correction by Very Short Patch repair may have altered the abundance of oligonucleotides in the E. coli genome.

Authors: A S Bhagwat; M McClelland
Journal: Nucleic Acids Res Date: 1992-04-11 Impact factor: 16.971

6. The application of Markov chain analysis to oligonucleotide frequency prediction and physical mapping of Drosophila melanogaster.

Authors: A J Cuticchia; R Ivarie; J Arnold
Journal: Nucleic Acids Res Date: 1992-07-25 Impact factor: 16.971

7. Distinct patterns in the dinucleotide nearest neighbors to G/C and A/T oligomers in eukaryotic sequences.

Authors: R Nussinov
Journal: J Mol Evol Date: 1991-09 Impact factor: 2.395

8. Counterselection of GATC sequences in enterobacteriophages by the components of the methyl-directed mismatch repair system.

Authors: P Deschavanne; M Radman
Journal: J Mol Evol Date: 1991-08 Impact factor: 2.395

9. Compositional heterogeneity of the Escherichia coli genome: a role for VSP repair?

Authors: G Gutiérrez; J Casadesús; J L Oliver; A Marín
Journal: J Mol Evol Date: 1994-10 Impact factor: 2.395

10. Concordant evolution of coding and noncoding regions of DNA made possible by the universal rule of TA/CG deficiency-TG/CT excess.

Authors: T Yomo; S Ohno
Journal: Proc Natl Acad Sci U S A Date: 1989-11 Impact factor: 11.205