Abstract
The genetic code serves as one of the natural links between life's two conceptual frameworks, the informational and operational tracks, bridging the nucleotide sequence of DNA and RNA to the amino acid sequence of protein and thus to its structure and function. On the informational track, DNA and its four building blocks have four basic variables: order, length, GC content, and purine content; the latter two exhibit unique characteristics in prokaryotic genomes, where protein-coding sequences dominate. Bridging the two tracks, tRNAs and their aminoacyl-tRNA synthetases, which interpret each codon (nucleotide triplet), together with ribosomes form a complex machinery that translates genetic information encoded in messenger RNAs into proteins. On the operational track, proteins are constantly selected in the context of cellular and organismal functions. The principle of such functional selection is to minimize the damage caused by seemingly random sequence alteration at the nucleotide level and its function-altering consequences at the protein level; the principle also suggests that, in addition to immediate or short-term elimination and long-term selection, there must be sophisticated mechanisms that protect molecular interactions and cellular processes in cells and organisms from such damage. The two-century study of selection at the species and population levels has led the way to understanding the rules of inheritance and evolution at the molecular level along the informational track, while ribogenomics, epigenomics, and other operationally defined omics (such as the metabolite-centric metabolomics) have been ushering biologists into the new millennium along the operational track.
Year: 2012 PMID: 23084772 PMCID: PMC5054704 DOI: 10.1016/j.gpb.2012.08.002
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1. The four nucleotides and their variables in DNA sequence. A. Life’s informational track has only four “cards” (the four nucleotides A, T, G, and C) to “play” but a highly variable “deck” size. For instance, the human genome has a deck of 3 billion “cards”. Although modified nucleotides such as 5-methylcytosine (5-mC) and 5-hydroxymethylcytosine (5-hmC) do exist in genomes, their functional roles are often operational. B. A “deck of cards” for all life forms has a limited number of basic variables.
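The abstract names order, length, GC content, and purine content as the basic variables of a DNA sequence. As a minimal sketch of how the last three could be computed for a single strand, the function below counts only unambiguous bases; the function name `composition`, the example sequence, and the choice to ignore symbols such as N are illustrative assumptions, not the authors' procedure.

```python
def composition(seq: str) -> dict:
    """Return length, GC content, and purine (R) content of one DNA strand.

    Assumption: only unambiguous bases (A, C, G, T) are counted; anything
    else (for example N) is ignored rather than treated as an error.
    """
    s = seq.upper()
    counts = {base: s.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {
        "length": total,
        "gc": (counts["G"] + counts["C"]) / total,      # G + C fraction
        "purine": (counts["A"] + counts["G"]) / total,  # A + G fraction
    }


# Hypothetical toy sequence, used only to exercise the function.
example = composition("ATGGCGAAATTT")
```

Purine content is strand-specific: for double-stranded DNA taken as a whole it is always 50%, so the value is only informative when computed on one strand, such as the coding strand of a gene.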
Mechanisms associated with compositional dynamics of eubacterial genomes
| Mechanism | In-Track/Op-Track | GC-content/R-content | Mutation/selection | Selected Ref. |
|---|---|---|---|---|
| Global repair and replication (a) | +++/++ | +++/++ | +++/++ | |
| Transcription-coupled DNA repair and transcription (a) | +/++ | ++/+ | ++/+ | |
| Strand-biased nucleotide composition (a) | ++/+ | +/++ | ++/+ | |
| Strand-biased gene distribution (b) | +/++ | +/++ | +/++ | |
| Horizontal gene transfer/transposition (b) | +/++ | +/+ | +/++ | |
| Genome size expansion (b) | +/++ | ++/+ | +/++ | |
| Environmental biotic and abiotic factors | +/+ | +/+ | +/+ | |
Note: Some relevant molecular mechanisms and their impacts on the informational (In-Track) or operational (Op-Track) track, on GC or purine (R) content, and on mutation or selection are scored qualitatively (+, weakly positive; ++, moderately positive; +++, strongly positive); in each column, the two scores separated by a slash correspond to the two items named in the column header. A limited number of examples are provided as references (in parentheses). Mechanisms related to composition alteration and those related to gene alteration are indicated by a and b, respectively.
Figure 2. Dinucleotide and codon contents of prokaryotic genomes. We used 300 genomes, 100 from each of the three dnaE-based groups: the dnaE1 (dnaE1–dnaE1) group (A), the dnaE2 (dnaE1–dnaE1 and dnaE2) group (B), and the dnaE3 (dnaE3–polC) group (C). Dinucleotide contents (left panels) are sorted by increasing GC content (left to right; scale bars). Codons (right panels) are also sorted by GC content (left to right; scale bars). The six-fold degenerate codons are separated into their corresponding two- and four-codon sets. Note that frequencies of dinucleotides are essentially equivalent to those of codons and that GC-rich and GC-poor codons are over-utilized in the dnaE2 and dnaE3 group bacteria.
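The kind of tally behind such panels can be sketched as follows: dinucleotide frequencies from overlapping windows, codon frequencies from in-frame triplets, and genomes ordered by GC content. This is only an illustration under stated assumptions: the helper names (`word_freqs`, `dinucleotide_freqs`, `codon_freqs`, `gc_content`) and the toy sequences are hypothetical, and the caption does not say how the authors extracted coding sequences or performed the sorting.

```python
from collections import Counter

BASES = set("ACGT")


def word_freqs(seq: str, size: int, step: int) -> dict:
    """Relative frequencies of words of length `size`, sampled every `step` bases."""
    s = seq.upper()
    words = (s[i:i + size] for i in range(0, len(s) - size + 1, step))
    # Assumption: words containing ambiguous symbols (e.g. N) are skipped.
    counts = Counter(w for w in words if set(w) <= BASES)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}


def dinucleotide_freqs(seq: str) -> dict:
    """Overlapping dinucleotides along one strand."""
    return word_freqs(seq, size=2, step=1)


def codon_freqs(cds: str) -> dict:
    """In-frame triplets; assumes `cds` starts at codon position 1."""
    return word_freqs(cds, size=3, step=3)


def gc_content(seq: str) -> float:
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


# Hypothetical toy "genomes" standing in for real coding sequences.
toy_genomes = {
    "toy_high_gc": "ATGGCCGGCGCGTAA",
    "toy_low_gc": "ATGAAATTTAAATAA",
}
# Order genomes from GC-poor to GC-rich, as the caption's left-to-right axis suggests.
ordered = sorted(toy_genomes, key=lambda name: gc_content(toy_genomes[name]))
```

With real data, each genome would contribute one column of 16 dinucleotide frequencies and one column of 64 codon frequencies, arranged in the `ordered` sequence.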
Figure 3. The Pendulum Model. Pendulum models for both GC and purine contents are drawn in the same figure. On the horizontal scale, GC content variation is shown; the equilibrium position (dashed yellow massless rod and massive bob) points at 50%, although many genomes have GC contents that deviate significantly from this position. The amplitude of GC variation is rather broad, spanning a 60% difference (from 20% to 80%; horizontal double-arrowed blue line). The dashed blue curve indicates the bob’s trajectory. Other dashed massless rods and massive bobs (red and blue indicate GC-rich and GC-poor, respectively) and their connected arrowed dials link different GC contents to amino acids; the arrowed dials (dashed arrowed lines indicate transient positions) are aligned linearly with the bobs. On the vertical scale, variation of purine content is shown; its amplitude (40–60%; green massive bob and massless rod) is a third of that of GC content variation. The vertical double-arrowed green line indicates the amplitude, and the green dashed curve shows the bob’s trajectory. The equilibrium position is indicated with a yellow dashed massless rod and massive bob pointing at 50%, which is roughly the average purine content for most genomes and genes. In this part of the model, the connected arrowed dials are perpendicular to the pendulums; this arrangement has no particular meaning other than to demonstrate the link between nucleotide and amino acid sequences. The face of the “pendulum clock” has two components: the 64-codon genetic code and the 20-amino-acid set.
The frictionless pivot (dark grey toothed button) is fixed in the genetic code to indicate that information flow is translated into protein sequences through the code, and that the code has both evolved stepwise to fix the coding capacity in the operational track and been selected to minimize damage in the operational track when DNA sequence variation changes the amino acid sequence. Among prokaryotic genomes, GC content variation is the major force dominating compositional dynamics, while purine content variation becomes pronounced only when GC content is relatively low. Lower GC content forces genomes to select more G for protein-coding diversity and more genes on the leading strand to achieve transcription efficiency and transcript stability (see the main text for details).
Figure 4. A detailed illustration of the Pendulum Model. Three pendulums are positioned such that the equilibrium position is shown in color and the other two positions are shown in grey to indicate their transient nature. The bob’s trajectory is indicated with a dashed blue line. When the pendulum moves toward either GC increase (red) or AT increase (blue), the GC or AT quarters expand (grey pendulums), as indicated by three schematic circular representations of the clock face at the three positions. Only the lower half of the clock’s face is filled in here, since half of the codon table is not GC-content sensitive. The concentric circles (from the center) are the first (A and U in blue; G and C in red), second (A and U in blue; G and C in red), and third (only R, or purine, and Y, or pyrimidine, are indicated in the AU quarter; N in the GC quarter is omitted) codon positions. The outermost circle displays the corresponding amino acids in single-letter code. The model demonstrates that alterations in GC content reshuffle the codon composition of proteins through the organization of the genetic code, or codon table. Although the GC-sensitive quarters of the genetic code are directly affected, the other codons do not all stand still, since the six-codon members of the genetic code, Arg (R), Leu (L), and Ser (S), balance both the purine-sensitive and purine-insensitive halves and the GC-sensitive and GC-insensitive quarters [32], [33], [34].
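The partition of the codon table described in the caption can be enumerated in a few lines. The sketch below assumes that a codon's quarter is decided by its first two positions (both G or C for the GC quarter, both A or U for the AU quarter, mixed otherwise), which is how the concentric-circle description reads; the function name `quarter` and the label "mixed" are illustrative choices, not terms from the source.

```python
from itertools import product


def quarter(codon: str) -> str:
    """Classify an RNA codon by the GC status of its first two positions.

    Assumption: 'GC' means both positions are G or C, 'AU' means both are
    A or U, and 'mixed' covers the remaining half of the codon table.
    """
    first_two = codon[:2].upper().replace("T", "U")
    n_gc = sum(base in "GC" for base in first_two)
    if n_gc == 2:
        return "GC"
    if n_gc == 0:
        return "AU"
    return "mixed"


# Enumerate all 64 codons and group them by quarter.
codons = ["".join(p) for p in product("UCAG", repeat=3)]
groups = {"GC": [], "AU": [], "mixed": []}
for codon in codons:
    groups[quarter(codon)].append(codon)

sizes = {name: len(members) for name, members in groups.items()}
```

Under this assumption the GC and AU quarters hold 16 codons each and the mixed group holds the remaining 32, which matches the caption's statement that half of the codon table is not GC-content sensitive.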