Abstract
All coding DNAs exhibit 3-base periodicity (TBP), which may be defined as the tendency of nucleotides and higher-order n-tuples, e.g. trinucleotides (triplets), to be preferentially spaced by 3, 6, 9, etc. bases, and we have proposed an association between TBP and the clustering of same-phase triplets. Here we investigated whether TBP was affected by intercodon dinucleotide tendencies and whether clustering of same-phase triplets was involved. Under a constant protein sequence, intercodon dinucleotide frequencies depend on the distribution of synonymous codons. Possible effects were therefore revealed by randomly exchanging synonymous codons without altering the protein sequences and then documenting the resulting changes in TBP via the frequency distribution of distances (FDD) of DNA triplets. A tripartite positive correlation was found between intercodon dinucleotide frequencies, clustering of same-phase triplets and TBP. For example, intercodon C|A (where "|" indicates the boundary between codons) was more frequent in native human DNA than in the codon-shuffled sequences; the higher C|A frequency occurred along with more frequent clustering of C|AN triplets (where N jointly represents A, C, G and T) and with intense CAN TBP. The opposite was found for C|G, which was less frequent in native than in shuffled sequences; the lower C|G frequency occurred together with reduced clustering of C|GN triplets and with less intense CGN TBP. We hence propose that intercodon dinucleotides affect TBP via same-phase triplet clustering. A possible biological relevance of our findings is briefly discussed.
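The shuffling control described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes an uppercase, in-frame coding sequence over A, C, G and T and the standard genetic code, and it models "randomly exchanging synonymous codons" as a random permutation of codons among the positions that encode the same amino acid, which keeps the protein sequence (and codon usage) unchanged. The function names `synonymous_shuffle`, `intercodon_counts` and `tendency` are hypothetical; the score follows the (native/shuffled) − 1 ratio of the Figure 1A legend, averaged here over five shuffles.

```python
import random
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, ..., GGG; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codons(seq):
    """Split an in-frame coding sequence into codons (trailing bases are dropped)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def synonymous_shuffle(seq, rng):
    """Permute codons among positions encoding the same amino acid.

    Assumption: this permutation model stands in for 'randomly exchanging
    synonymous codons'; the protein sequence is left unchanged.
    """
    cods = codons(seq)
    positions = defaultdict(list)
    for idx, cod in enumerate(cods):
        positions[CODE[cod]].append(idx)
    shuffled = list(cods)
    for idxs in positions.values():
        pool = [cods[i] for i in idxs]
        rng.shuffle(pool)
        for i, cod in zip(idxs, pool):
            shuffled[i] = cod
    return "".join(shuffled)


def intercodon_counts(seq):
    """Count dinucleotides spanning codon junctions (last base | next first base)."""
    cods = codons(seq)
    return Counter(a[2] + b[0] for a, b in zip(cods, cods[1:]))


def tendency(native_seq, n_shuffles=5, seed=0):
    """Return (native / mean shuffled) - 1 for each intercodon dinucleotide."""
    rng = random.Random(seed)
    native = intercodon_counts(native_seq)
    totals = Counter()
    for _ in range(n_shuffles):
        totals.update(intercodon_counts(synonymous_shuffle(native_seq, rng)))
    return {d: native[d] / (totals[d] / n_shuffles) - 1 for d in totals}
```

For a real coding sequence `seq`, a positive `tendency(seq)["CA"]` would indicate that C|A junctions are more frequent than expected from synonymous shuffling alone, which is the kind of intercodon preference reported above for human DNA.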
Keywords: Period-3 DNA structure; codon junctions; dinucleotide frequency; frequency distribution of distances; human coding DNA; neighboring codon choice; same-phase triplet clustering
Year: 2011 PMID: 21814388 PMCID: PMC3143393 DOI: 10.6026/97320630006327
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1. A) Assessment of intercodon dinucleotide tendencies. Tendencies were estimated by comparing native dinucleotide frequencies at codon junctions (e.g. A|G) with those in synonymous-codon-shuffled sequences. The y-axis shows the ratio of native to synonymous-codon-shuffled frequencies minus one, i.e. (native/shuffled) − 1. Positive values therefore indicate intercodon preference, while negative values indicate intercodon avoidance. Error bars represent 3 standard deviations (n = 5). B) Analysis of 3-base periodicity by FDD of the triplet formulas CAN, TGN, CGN and TAN in native and synonymous-codon-shuffled sequences. The x-axis is logarithmic to aid visualization of differences. Note that FDD does not discriminate between the different triplet phases. The blue line is for the native sequence and the red line for the synonymous-codon-shuffled sequence. In the upper panels TBP is more intense in the native than in the shuffled sequence, while in the lower panels the dominance is inverted and TBP is more intense in the synonymous-codon-shuffled sequence than in the native one. In all cases error bars (on the red line) represent 3 standard deviations. C) Clustering of same-phase triplets. The upper two panels correspond to the triplet formulas CAN and TGN, which displayed more intense 3-base periodicity in native sequences than in shuffled controls (Figure 1B, upper panels); total frequencies (native and shuffled, respectively) were 25,521 and 15,454 ± 121 for C|AN and 27,047 and 17,906 ± 69 for T|GN. The lower two panels correspond to the triplet formulas CGN and TAN, which displayed less intense 3-base periodicity in native sequences than in shuffled controls (Figure 1B, lower panels); total frequencies (native and shuffled, respectively) were 9,385 and 26,985 ± 71 for C|GN and 8,668 and 16,361 ± 63 for T|AN.
Note that values above and below the x-axis are both positive; hence, in all cases, the abundance and length of the vertical lines are proportional to the number of same-phase triplets engaged in clusters, as indicated on the y-axis. The positions of non-overlapping 10,000-bp windows are shown at the bottom of the graphs. D) Determination of same-phase triplet clustering in the three reading frames, shown for CGN. The x-axis gives the size of each cluster (number of triplets per cluster). Clustering is presented for CGN|, CG|N and C|GN, as indicated above each set of columns. Note that on the y-axis values above and below the x-axis are both positive, so that in both cases the column length is proportional to the number of triplets in clusters, i.e. cluster size × cluster frequency. Red columns above the x-axis are for the synonymous-codon-shuffled sequence and blue columns below the x-axis are for the native sequence. E) Analysis of 3-base periodicity of triplet CGN in the coding DNA of the indicated organisms. TBP patterns for native sequences are shown as a blue line and those for synonymous-codon-shuffled sequences as a red line. As in the other panels, the x-axis is logarithmic to aid visualization, and frequency values are shown on the y-axis.
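The FDD and same-phase clustering analyses in panels B to D can be approximated with the sketch below. The exact definitions are not given in this record, so two assumptions are made and should be checked against the full paper: FDD is computed from distances between consecutive occurrences of a triplet formula (N matching any base), and a same-phase cluster is taken to be a maximal run of at least two occurrences in the same reading-frame phase whose start positions differ by at most a hypothetical `max_gap` bases. Phases 0, 1 and 2 correspond to CGN|, CG|N and C|GN, assuming the sequence starts at the first base of a codon. All function names, including the `tbp_share` summary, are hypothetical.

```python
from collections import Counter


def occurrences(seq, formula):
    """0-based start positions of a triplet formula such as 'CGN' ('N' = any base)."""
    return [i for i in range(len(seq) - 2)
            if all(f == "N" or f == b for f, b in zip(formula, seq[i:i + 3]))]


def fdd(seq, formula, max_distance=300):
    """Histogram of distances 1..max_distance between consecutive occurrences.

    Assumption: each occurrence is paired with the next one regardless of
    phase (FDD does not discriminate phases); longer distances are dropped.
    """
    pos = occurrences(seq, formula)
    dists = Counter(b - a for a, b in zip(pos, pos[1:]))
    return [dists[d] for d in range(1, max_distance + 1)]


def tbp_share(hist):
    """Hypothetical TBP summary: share of distances that are multiples of 3.

    Values clearly above about 1/3 hint at 3-base periodicity.
    """
    total = sum(hist)
    hits = sum(c for d, c in enumerate(hist, start=1) if d % 3 == 0)
    return hits / total if total else 0.0


def same_phase_clusters(seq, formula, phase, max_gap=3):
    """Sizes of same-phase clusters (runs of >= 2 occurrences).

    phase = start position modulo 3 in an in-frame sequence:
    0 -> 'CGN|', 1 -> 'CG|N', 2 -> 'C|GN' (illustrated for CGN).
    Hypothetical rule: successive same-phase occurrences join one cluster
    when their start positions differ by at most max_gap bases.
    """
    pos = [p for p in occurrences(seq, formula) if p % 3 == phase]
    sizes, run = [], 1
    for a, b in zip(pos, pos[1:]):
        if b - a <= max_gap:
            run += 1
        else:
            sizes.append(run)
            run = 1
    if pos:
        sizes.append(run)
    return [s for s in sizes if s > 1]
```

Under these assumptions, `sum(same_phase_clusters(seq, "CGN", phase=2))` gives the number of C|GN triplets engaged in clusters (cluster size × cluster frequency), the quantity plotted per phase in panel D.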