
Genome inhomogeneity is determined mainly by WW and SS dinucleotides.

C G Kozhukhin, P A Pevzner.

Abstract

According to the hypothesis of the modular structure of DNA, genomes consist of modules of various kinds that may differ in their statistical characteristics. Statistical analysis helps to reveal these differences and to predict the modular structure. This raises the question of how much each word of length l (l-tuple) contributes to the inhomogeneity of genetic text. The notion of stationary (i.e. relatively evenly distributed over a genome) versus non-stationary l-tuples has been introduced previously. In this paper, the dinucleotide distributions for all long sequences from GenBank were analyzed, and it was shown that non-stationary dinucleotides are closely associated with polyW and polyS tracts (W denotes the 'weak' nucleotides A or T, while S denotes the 'strong' nucleotides G or C). Thus, genome inhomogeneity is shown to be determined mainly by the AA, TT, GG, CC, AT, TA, GC and CG dinucleotides. It has been demonstrated that neither 'codon usage' nor the 'isochore model' can account for this phenomenon.
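The W/S grouping and the idea of an unevenly distributed dinucleotide can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not give their stationarity test, so the variance-to-mean ratio of per-window counts is used here only as an assumed, crude proxy for unevenness. The function names, the fixed non-overlapping windows, and the restriction to the letters A, C, G and T are all assumptions made for illustration.

```python
from collections import Counter

WEAK = set("AT")  # 'weak' nucleotides (W); every other base is treated as S


def dinucleotide_class(pair):
    """Return 'WW', 'SS', 'WS' or 'SW' for a dinucleotide over A, C, G, T."""
    return "".join("W" if base in WEAK else "S" for base in pair)


def window_counts(seq, window):
    """Count overlapping dinucleotides inside consecutive, non-overlapping windows."""
    seq = seq.upper()
    counts = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        counts.append(Counter(chunk[i:i + 2] for i in range(len(chunk) - 1)))
    return counts


def dispersion(seq, window):
    """Variance-to-mean ratio of each dinucleotide's per-window count.

    Assumed proxy only: larger values mean the dinucleotide is spread
    less evenly across windows.
    """
    counts = window_counts(seq, window)
    pairs = {p for c in counts for p in c}
    result = {}
    for p in pairs:
        values = [c[p] for c in counts]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        result[p] = var / mean  # mean > 0 because p occurs in some window
    return result
```

Under these assumptions, calling `dispersion(sequence, 1000)` on a long sequence and grouping the results with `dinucleotide_class` would show whether the WW and SS dinucleotides score higher than the WS and SW ones. A real analysis would need a proper statistical test and a considered choice of window size.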

Year:  1991        PMID: 2004273     DOI: 10.1093/bioinformatics/7.1.39

Source DB:  PubMed          Journal:  Comput Appl Biosci        ISSN: 0266-7061


  8 in total

1.  Information contents and dinucleotide compositions of plant intron sequences vary with evolutionary origin. (Review)

Authors:  O White; C Soderlund; P Shanmugan; C Fields
Journal:  Plant Mol Biol       Date:  1992-09       Impact factor: 4.076

2.  WORDUP: an efficient algorithm for discovering statistically significant patterns in DNA sequences.

Authors:  G Pesole; N Prunella; S Liuni; M Attimonelli; C Saccone
Journal:  Nucleic Acids Res       Date:  1992-06-11       Impact factor: 16.971

3.  Symmetry observations in long nucleotide sequences.

Authors:  V V Prabhu
Journal:  Nucleic Acids Res       Date:  1993-06-25       Impact factor: 16.971

4.  Classification of COVID-19 and Other Pathogenic Sequences: A Dinucleotide Frequency and Machine Learning Approach.

Authors:  Gciniwe S Dlamini; Stephanie J Muller; Rebone L Meraba; Richard A Young; James Mashiyane; Tapiwa Chiwewe; Darlington S Mapiye
Journal:  IEEE Access       Date:  2020-10-15       Impact factor: 3.367

5.  Comparative DNA sequence features in two long Escherichia coli contigs.

Authors:  L R Cardon; C Burge; G A Schachtel; B E Blaisdell; S Karlin
Journal:  Nucleic Acids Res       Date:  1993-08-11       Impact factor: 16.971

6.  The frequency of two-base tracts in eukaryotic genomes.

Authors:  G Yagil
Journal:  J Mol Evol       Date:  1993-08       Impact factor: 2.395

7.  Over- and under-representation of short oligonucleotides in DNA sequences.

Authors:  C Burge; A M Campbell; S Karlin
Journal:  Proc Natl Acad Sci U S A       Date:  1992-02-15       Impact factor: 11.205

8.  Conservation vs. variation of dinucleotide frequencies across bacterial and archaeal genomes: evolutionary implications.

Authors:  Hang Zhang; Peng Li; Hong-Sheng Zhong; Shang-Hong Zhang
Journal:  Front Microbiol       Date:  2013-09-06       Impact factor: 5.640
