Kohji Okamura, John Wei, Stephen W Scherer.
Abstract
BACKGROUND: Chargaff's rule of DNA base composition, stating that DNA comprises equal amounts of adenine and thymine (%A = %T) and of guanine and cytosine (%C = %G), is well known because it was fundamental to the conception of the Watson-Crick model of DNA structure. His second parity rule, stating that the base proportions of double-stranded DNA are also reflected in single-stranded DNA (%A = %T, %C = %G), is more obscure, likely because its biological basis and significance are still unresolved. Within each strand, the symmetry of single-nucleotide composition extends even further: di-, tri-, and multi-nucleotides are balanced against their respective complementary oligonucleotides.
Year: 2007 PMID: 17562011 PMCID: PMC1913523 DOI: 10.1186/1471-2164-8-160
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Mononucleotide content in contiguous single-stranded DNA scaffolds from each human chromosome *
| Chromosome | Accession number | Length | %A | %T | %C | %G |
| 1 | NT_032977 | 73,835,825 | 29.72 | 29.69 | 20.33 | 20.27 |
| 2 | NT_005403 | 84,213,157 | 30.60 | 30.68 | 19.34 | 19.38 |
| 3 | NT_005612 | 100,530,253 | 30.51 | 30.53 | 19.46 | 19.49 |
| 4 | NT_016354 | 92,123,751 | 31.34 | 31.33 | 18.64 | 18.69 |
| 5 | NT_006576 | 46,378,398 | 30.45 | 30.31 | 19.62 | 19.62 |
| 6 | NT_025741 | 61,645,385 | 30.84 | 30.86 | 19.16 | 19.14 |
| 7 | NT_007933 | 64,426,257 | 30.43 | 30.39 | 19.62 | 19.56 |
| 8 | NT_008046 | 57,155,273 | 30.21 | 30.04 | 19.89 | 19.86 |
| 9 | NT_008470 | 40,394,265 | 28.72 | 28.72 | 21.27 | 21.28 |
| 10 | NT_030059 | 44,617,998 | 29.12 | 29.30 | 20.80 | 20.77 |
| 11 | NT_009237 | 49,571,094 | 29.57 | 29.70 | 20.36 | 20.37 |
| 12 | NT_029419 | 38,648,979 | 30.06 | 30.01 | 19.96 | 19.97 |
| 13 | NT_024524 | 67,740,325 | 30.97 | 30.93 | 19.06 | 19.04 |
| 14 | NT_026437 | 88,290,585 | 29.44 | 29.67 | 20.42 | 20.47 |
| 15 | NT_010194 | 53,619,965 | 29.06 | 28.82 | 21.11 | 21.01 |
| 16 | NT_010498 | 42,003,582 | 28.32 | 28.31 | 21.66 | 21.70 |
| 17 | NT_010783 | 24,793,602 | 28.22 | 28.25 | 21.76 | 21.76 |
| 18 | NT_010966 | 33,548,238 | 30.34 | 30.23 | 19.73 | 19.71 |
| 19 | NT_011109 | 31,383,029 | 26.25 | 26.32 | 23.68 | 23.76 |
| 20 | NT_011362 | 26,144,333 | 27.26 | 27.56 | 22.57 | 22.61 |
| 21 | NT_011512 | 28,617,429 | 30.57 | 30.31 | 19.60 | 19.52 |
| 22 | NT_011520 | 23,276,302 | 26.33 | 26.29 | 23.72 | 23.67 |
| X | NT_011651 | 36,813,576 | 31.07 | 31.36 | 18.74 | 18.82 |
| Y | NT_011875 | 10,002,238 | 30.43 | 30.52 | 19.35 | 19.70 |
| mtDNA | NC_001807 | 16,571 | 30.86 | 24.66 | 31.33 | 13.16 |
* The longest contig was chosen from each human chromosome.
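The percentages in the table above can be reproduced for any single-stranded sequence with a few lines of code. The following is a minimal sketch, not the authors' pipeline: the function name `base_percentages` is hypothetical, and the choice to ignore symbols other than A, C, G, and T (for example N) is an assumption.

```python
from collections import Counter


def base_percentages(seq):
    """Return %A, %T, %C, and %G for one strand.

    Assumption: symbols other than A, C, G, T (e.g. N) are ignored
    and do not contribute to the total.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {base: 100.0 * counts[base] / total for base in "ATCG"}


if __name__ == "__main__":
    # Toy strand; a real analysis would read a contig such as NT_010966.
    pct = base_percentages("AATTCCGGAT")
    for base in "ATCG":
        print(f"%{base} = {pct[base]:.2f}")
```

A strand follows the second parity rule when %A is close to %T and %C is close to %G, as in the nuclear contigs above; the mtDNA row does not.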
Figure 1. Inversions as an explanation for intra-strand parity. A, Duplication followed by inversion. If a double-stranded DNA, shown in gray, undergoes duplication and inversion, then the resulting molecule precisely demonstrates the strand parity (both within and between strands). B, A mathematical explanation of intra-strand parity. The nth inversion is illustrated by a box with crossed bars, and r is the relative length of the inversion within a total fragment of length = 1. Ultimately both A and T converge to the average of their initial frequencies. See Methods for details. Although a linear double-stranded DNA is shown, this could also be circular. C, A small number of inversions can cause DNA to follow the intra-strand parity. A 40-bp double-stranded DNA fragment in the human mtDNA (position 1875–1914 in accession number NC_001807) is shown, along with the outcome of a single artificial inversion, which has homogenized the contents of the two strands.
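Panels A and B can be illustrated with a short sketch. The Methods are not reproduced here, so the recursion below is an assumption rather than the authors' derivation: it supposes that each inverted segment carries the strand's current average composition, so that an inversion of relative length r replaces a fraction r of the strand with its complement. Under that assumption the difference between the A and T frequencies shrinks by a factor of (1 - 2r) per inversion while their sum stays fixed. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def duplicate_and_invert(seq):
    """Panel A: append an inverted copy of the fragment to itself."""
    return seq + revcomp(seq)


def expected_after_inversions(a0, t0, r, n):
    """Panel B under the stated assumption: expected A and T
    frequencies on one strand after n inversions of relative length r."""
    a, t = a0, t0
    for _ in range(n):
        # A fraction r of the strand is swapped for its complement.
        a, t = (1 - r) * a + r * t, (1 - r) * t + r * a
    return a, t


if __name__ == "__main__":
    product = duplicate_and_invert("AAGCA")
    print(product, product.count("A") == product.count("T"))

    # Starting values taken from the mtDNA row of the mononucleotide table.
    a, t = expected_after_inversions(0.3086, 0.2466, r=1 / 16.6, n=8)
    print(f"A = {a:.4f}, T = {t:.4f}")
```

In this model both frequencies approach (A + T) / 2 of the initial values, which matches the caption's statement that A and T converge to the average of their initial frequencies.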
Dinucleotide frequencies in a human genomic contig without repetitive sequences *
| Dinucleotide | Frequency | Difference | Frequency | Dinucleotide |
| AA | 0.10956 | 0.00084 | 0.10872 | TT |
| AC | 0.04992 | 0.00047 | 0.04945 | GT |
| AG | 0.06718 | 0.00016 | 0.06702 | CT |
| AT | 0.08639 | 0.00000 | 0.08639 | AT |
| CA | 0.07012 | 0.00072 | 0.06940 | TG |
| CC | 0.04309 | 0.00027 | 0.04282 | GG |
| CG | 0.00781 | 0.00000 | 0.00781 | CG |
| CT | 0.06702 | 0.00016 | 0.06718 | AG |
| GA | 0.05869 | 0.00008 | 0.05876 | TC |
| GC | 0.03630 | 0.00000 | 0.03630 | GC |
| GG | 0.04282 | 0.00027 | 0.04309 | CC |
| GT | 0.04945 | 0.00047 | 0.04992 | AC |
| TA | 0.07474 | 0.00000 | 0.07474 | TA |
| TC | 0.05876 | 0.00008 | 0.05869 | GA |
| TG | 0.06940 | 0.00072 | 0.07012 | CA |
| TT | 0.10872 | 0.00084 | 0.10956 | AA |
| Total | 1.00000 | | 1.00000 | Total |
* See text and Methods.
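A table of this shape can be built for any sequence by counting overlapping k-mers on one strand and pairing each with its reverse complement. The sketch below is illustrative only: `kmer_frequencies` and `parity_table` are hypothetical names, windows containing symbols other than A, C, G, and T are skipped by assumption, and no repeat masking is performed.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_frequencies(seq, k=2):
    """Relative frequency of every overlapping k-mer on one strand.

    Assumption: windows containing non-ACGT symbols are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid k-mers in sequence")
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: counts[kmer] / total for kmer in all_kmers}


def parity_table(seq, k=2):
    """Rows of (k-mer, frequency, |difference|, frequency, reverse complement)."""
    freqs = kmer_frequencies(seq, k)
    rows = []
    for kmer in sorted(freqs):
        partner = revcomp(kmer)
        diff = abs(freqs[kmer] - freqs[partner])
        rows.append((kmer, freqs[kmer], diff, freqs[partner], partner))
    return rows


if __name__ == "__main__":
    for kmer, f, diff, g, partner in parity_table("ACGTTGCAACGT"):
        print(f"| {kmer} | {f:.5f} | {diff:.5f} | {g:.5f} | {partner} |")
```

Self-complementary dinucleotides (AT, CG, GC, TA) are paired with themselves, which is why their difference is always zero in the table.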
Figure 2. Intra-strand parity visually represented by radar charts. Frequencies of trinucleotides in various DNA sequences are shown here. Each trinucleotide is sorted alphabetically from bottom to top (left side). The corresponding complementary trinucleotides are arranged across to the right. A, Radar chart representing a fully sequenced contig (NT_010966, 33,548,238 bp) of human chromosome 18. This contig is continuous and does not include any annotated gaps or ambiguous nucleotides. The symmetrical chart shows the equal frequencies of specific oligonucleotides and their reverse complementary oligonucleotides. The high frequencies of poly-A and poly-T, which might be, in part, traces of retrotranspositions of poly-A+ mRNA, and the deficiencies of trinucleotides that contain the CpG dinucleotide make the stalk and four grooves, respectively, of the "maple leaf" shape. (The shapes vary slightly based on the genome sequence analyzed, but the general symmetry is maintained.) B, The genomic sequence of the p53 (TP53) locus (U94788, 20,303 bp). The symmetry is roughly retained in sequences as short as 20 kb in length. The protein-coding sequences occupy 5.8% of this locus. This chart also suggests that transcriptional asymmetry is small in magnitude. C, Human mtDNA. The asymmetry illustrates that this DNA does not show intra-strand parity. D, Human mtDNA after inversion in silico. It becomes symmetrical, demonstrating that inversions can change a sequence to create the parity. In this case, each r is approximately 1/16.6. This also demonstrates that only 1/(2r) inversions (eight inversions in this case) are enough to make a sequence conform to parity. E, The difference in frequencies of GGG and CCC ([GGG] - [CCC]) in human mtDNA approaches 0 under random in silico inversions. In this analysis, for simplicity, the size of each inversion was fixed to 100 bp. In human mtDNA, GGG and CCC have the largest difference in frequencies among all trinucleotides (see Fig. 2C).
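The experiment in panel E can be approximated with the sketch below, which applies fixed-size random inversions to one strand and records [GGG] - [CCC] after each step. This is not the authors' code. It makes several assumptions: the sequence is treated as linear although mtDNA is circular, inversion start points are drawn uniformly, trinucleotides are counted with overlap, and the names `random_inversions` and `ggg_minus_ccc` are hypothetical. A synthetic G-rich strand stands in for NC_001807.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq, word):
    """Count occurrences of word in seq, allowing overlaps."""
    last = len(seq) - len(word) + 1
    return sum(1 for i in range(last) if seq.startswith(word, i))


def ggg_minus_ccc(seq):
    """[GGG] - [CCC] as a fraction of all trinucleotide windows."""
    windows = len(seq) - 2
    diff = count_overlapping(seq, "GGG") - count_overlapping(seq, "CCC")
    return diff / windows


def random_inversions(seq, n, size=100, seed=0):
    """Apply n inversions of fixed size at uniformly random positions.

    Returns the final sequence and the trace of [GGG] - [CCC],
    starting with the value before any inversion.
    """
    rng = random.Random(seed)
    trace = [ggg_minus_ccc(seq)]
    for _ in range(n):
        start = rng.randrange(len(seq) - size + 1)
        end = start + size
        # Replace the segment with its reverse complement in place.
        seq = seq[:start] + revcomp(seq[start:end]) + seq[end:]
        trace.append(ggg_minus_ccc(seq))
    return seq, trace


if __name__ == "__main__":
    # Synthetic strand with a strong G-over-C bias (illustrative only).
    rng = random.Random(1)
    strand = "".join(rng.choices("ACGT", weights=[3, 1, 3, 3], k=16000))
    _, trace = random_inversions(strand, n=500, size=100, seed=2)
    for step in (0, 50, 100, 250, 500):
        print(step, f"{trace[step]:+.5f}")
```

Because a reverse complement turns every GGG into a CCC and vice versa, each inversion moves some of the excess from one trinucleotide to the other, and the difference is expected to drift toward zero as inversions accumulate.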