Christian J Michel, Viviane Nguefack Ngoune, Olivier Poch, Raymond Ripp, Julie D Thompson.
Abstract
A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than that of R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different from that observed in the set of putative or dubious genes with no experimental evidence.
These results, taken together, represent the first evidence for a significant enrichment of X motifs in the genes of an extant organism. They raise two hypotheses: the X motifs may be evolutionary relics of the primitive codes used for translation, or they may continue to play a functional role in the complex processes of genome decoding and protein synthesis.
Keywords: circular code motifs; gene enrichment; yeast Saccharomyces cerevisiae
Year: 2017 PMID: 29207500 PMCID: PMC5745565 DOI: 10.3390/life7040052
Source DB: PubMed Journal: Life (Basel) ISSN: 2075-1729
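The notion of an X motif used in the abstract can be illustrated with a minimal sketch. It assumes that an X motif is a maximal run of consecutive trinucleotides of X read in one frame, and that "cardinality" means the number of distinct trinucleotides in the run (an interpretation suggested by the quoted range of 4 to 20, not a definition given in this record). The function name `x_motifs` and the parameter `min_cardinality` are hypothetical; the 20 trinucleotides are those listed in the trinucleotide composition table.

```python
# The 20 trinucleotides of the circular code X (taken from the composition table).
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def x_motifs(seq, frame=0, min_cardinality=4):
    """Scan `seq` in the given frame (0, 1 or 2) for runs of X trinucleotides.

    Returns a list of (start, length, cardinality) tuples, where `start` is the
    0-based nucleotide position, `length` is the run length in trinucleotides,
    and `cardinality` is the number of distinct trinucleotides in the run.
    Assumption: runs are maximal and filtered by a minimum cardinality.
    """
    seq = seq.upper()
    motifs = []
    run = []          # trinucleotides of the run currently being extended
    start = frame     # nucleotide position where the current run began

    def close_run():
        # Keep the run only if it has enough distinct trinucleotides.
        if len(set(run)) >= min_cardinality:
            motifs.append((start, len(run), len(set(run))))

    for i in range(frame, len(seq) - 2, 3):
        tri = seq[i:i + 3]
        if tri in X:
            if not run:
                start = i
            run.append(tri)
        else:
            close_run()
            run = []
    close_run()  # a run may extend to the end of the sequence
    return motifs
```

Calling `x_motifs(gene, frame)` for `frame` in 0, 1 and 2 gives the per-frame counts that underlie a comparison between the reading frame and the two shifted frames.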
Figure 1. Example of a gene structure, showing exons, introns and the CoDing Sequence (CDS) between the start and stop trinucleotides.
Figure 2. Occurrence number (Section 2.3) of X motifs (blue) and mean occurrence number (Section 2.3) of R motifs (red) in the genome of S. cerevisiae. The abscissa shows the cardinality in trinucleotides. The ordinate gives the occurrence numbers on a logarithmic scale.
Figure 3. Occurrence number (Section 2.3) of X motifs (blue) and mean occurrence number (Section 2.3) of R motifs (red) in the non-coding regions of S. cerevisiae. The abscissa shows the cardinality in trinucleotides. The ordinate gives the occurrence numbers on a logarithmic scale.
Figure 4. Occurrence number (Section 2.3) of X motifs (blue) and mean occurrence number (Section 2.3) of R motifs (red) in the genes of S. cerevisiae. The abscissa shows the cardinality in trinucleotides. The ordinate gives the occurrence numbers on a logarithmic scale.
Figure 5. Difference (blue, left) and ratio (red, right) of the occurrence numbers of X motifs and R motifs in the genes of S. cerevisiae (deduced from Figure 4). The abscissa shows the cardinality in trinucleotides. The ordinates give the difference (left) and the ratio (right) of the occurrence numbers.
Figure 6. Occurrence number (Section 2.3) of X motifs (blue) and mean occurrence number (Section 2.3) of R motifs (red) in the genes of the 16 chromosomes of S. cerevisiae. The abscissa shows the 16 chromosomes. The ordinate gives the occurrence numbers on a logarithmic scale.
Longest X motifs in the genes of S. cerevisiae. The 1st column gives the chromosome number; the 2nd to 5th columns give the name, start position, end position and nucleotide length, respectively, of the genes containing the longest X motifs; the 6th to 8th columns give the start position, end position and nucleotide length, respectively, of the longest X motifs; and the 9th column gives the sequence of the longest X motifs.
| Chr | Gene Name | Gene Start | Gene End | Gene Length | Motif Start | Motif End | Motif Length | Motif Sequence |
|---|---|---|---|---|---|---|---|---|
| VIII | YHR131C | 365,340 | 367,892 | 2553 | 365,358 | 365,489 | 132 | |
| XVI | YPL190C | 185,317 | 187,725 | 2409 | 187,303 | 187,428 | 126 | |
| XVI | YPL158C | 252,034 | 254,310 | 2277 | 252,241 | 252,363 | 123 | |
| XVI | YPR042C | 650,435 | 653,662 | 3228 | 650,504 | 650,611 | 108 | |
| VII | YGL150C | 221,104 | 225,573 | 4470 | 224,830 | 224,934 | 105 | |
| II | YBR150C | 541,209 | 544,493 | 3285 | 541,446 | 541,550 | 105 | |
| XI | YKR072C | 576,435 | 578,123 | 1689 | 576,471 | 576,572 | 102 | |
| XII | YLR114C | 374,944 | 377,238 | 2295 | 375,259 | 375,360 | 102 | |
Figure 7. Proportion (%, Section 2.4) of the X motifs in frames 0 (reading frame; dark blue full line), 1 (blue dashed line) and 2 (light blue dotted line) of genes in S. cerevisiae. Mean proportion (%, Section 2.4) of the R motifs in frames 0 (reading frame; dark red full line), 1 (red dashed line) and 2 (light red dotted line) of genes in S. cerevisiae. The abscissa shows the cardinality in trinucleotides. The ordinate gives the proportions in percentage.
Number of stop trinucleotides in frames 1 and 2 of the genes in S. cerevisiae.
| Stop trinucleotide | Frame 1 | Frame 2 | Total |
|---|---|---|---|
| TAA | 64,458 | 91,661 | 156,119 |
| TAG | 51,774 | 37,366 | 89,140 |
| TGA | 69,568 | 115,459 | 185,027 |
| Total | 185,800 | 244,486 | 430,286 |
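As an illustration of how such a per-frame tally could be produced, the following sketch counts the three stop trinucleotides in one frame of a single sequence. The function name `stop_counts` is hypothetical, and the sketch assumes that frames are numbered by the offset (0, 1 or 2) of the first nucleotide read; it is not the procedure used to build the table.

```python
from collections import Counter

STOPS = ("TAA", "TAG", "TGA")


def stop_counts(seq, frame):
    """Count TAA, TAG and TGA among the trinucleotides of `seq` read in `frame`.

    `frame` is the offset (0, 1 or 2) of the first nucleotide that is read.
    Incomplete trailing trinucleotides are ignored.
    """
    seq = seq.upper()
    counts = Counter({stop: 0 for stop in STOPS})
    for i in range(frame, len(seq) - 2, 3):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
    return counts
```

Summing `stop_counts(gene, 1)` and `stop_counts(gene, 2)` over all genes would give the two frame columns; the row and column totals in the table are then plain sums.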
Figure 8. Proportion (%, Section 2.4) of the X motifs in frames 0 (reading frame; dark blue full line), 1 (blue dashed line) and 2 (light blue dotted line) of genes in S. cerevisiae. Mean proportion (%, Section 2.4) of the R motifs in frames 0 (reading frame; dark red full line), 1 (red dashed line) and 2 (light red dotted line) of genes in S. cerevisiae. The abscissa shows the length in trinucleotides. The ordinate gives the proportions in percentage.
Figure 9. Proportion of X genes (blue) and non-X genes (brown) according to their nucleotide length in S. cerevisiae. An X gene is a gene containing at least one X motif of a given cardinality (in trinucleotides) in any frame. A non-X gene is a gene with no X motif of that cardinality in any frame. The abscissa shows the gene length in intervals of 100 nucleotides. The ordinate gives the percentage of genes.
Numbers of X genes and non-X genes depending on the status of S. cerevisiae genes according to the SGD database. An X gene is a gene containing at least one X motif of a given cardinality (in trinucleotides) in any frame. A non-X gene is a gene with no X motif of that cardinality in any frame. The total column represents the sum of the X genes and the non-X genes, i.e., the number of S. cerevisiae genes in each category.
| Gene status | X genes | | | | | Non-X genes | Total |
|---|---|---|---|---|---|---|---|
| Verified genes | 5262 | 5082 | 4758 | 4388 | 4013 | 121 | 5383 |
| Uncharacterized genes | 449 | 348 | 266 | 221 | 174 | 97 | 546 |
| Dubious genes | 404 | 247 | 133 | 61 | 32 | 269 | 673 |
| Transposable elements | 60 | 60 | 60 | 59 | 59 | 29 | 89 |
| Total | 6175 | 5737 | 5217 | 4729 | 4278 | 516 | 6691 |
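The ratio of X genes to non-X genes mentioned in the abstract can be worked through from the table above. The sketch below uses the first X-gene column (the one that sums with the non-X column to the total) and applies a Pearson chi-square statistic for a 2x2 table as a stand-in significance measure; this choice of test is an assumption made for illustration, not necessarily the test used by the authors. The function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
    [[a, b], [c, d]], using the closed form n*(ad - bc)^2 / (row and column sums).
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# (X genes, non-X genes) read from the table, first X-gene column.
verified = (5262, 121)
dubious = (404, 269)

ratio_verified = verified[0] / verified[1]   # X genes per non-X gene, verified
ratio_dubious = dubious[0] / dubious[1]      # X genes per non-X gene, dubious

chi2 = chi_square_2x2(*verified, *dubious)

print(f"verified ratio: {ratio_verified:.2f}")
print(f"dubious ratio:  {ratio_dubious:.2f}")
print(f"chi-square (1 degree of freedom): {chi2:.1f}")
```

For one degree of freedom, a statistic above 3.84 corresponds to p < 0.05, so the value printed here can be compared directly against that threshold.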
Trinucleotide compositions in the 5262 S. cerevisiae verified genes and in the X motifs in frame 0 of these genes.
| Trinucleotide | X motifs (number) | X motifs (%) | Verified genes (number) | Verified genes (%) |
|---|---|---|---|---|
| AAC | 9796 | 6.33 | 48,354 | 6.27 |
| AAT | 13,228 | 8.55 | 71,108 | 9.22 |
| ACC | 5245 | 3.39 | 24,307 | 3.15 |
| ATC | 7569 | 4.89 | 33,049 | 4.29 |
| ATT | 12,117 | 7.84 | 58,617 | 7.60 |
| CAG | 4350 | 2.81 | 24,378 | 3.16 |
| CTC | 2499 | 1.62 | 10,475 | 1.36 |
| CTG | 4121 | 2.66 | 20,695 | 2.68 |
| GAA | 15,353 | 9.93 | 90,008 | 11.68 |
| GAC | 9125 | 5.90 | 39,699 | 5.15 |
| GAG | 7935 | 5.13 | 38,265 | 4.96 |
| GAT | 14,132 | 9.14 | 74,274 | 9.64 |
| GCC | 4896 | 3.17 | 23,549 | 3.05 |
| GGC | 3992 | 2.58 | 18,951 | 2.46 |
| GGT | 9004 | 5.82 | 44,365 | 5.76 |
| GTA | 4623 | 2.99 | 23,497 | 3.05 |
| GTC | 5132 | 3.32 | 21,884 | 2.84 |
| GTT | 8538 | 5.52 | 42,051 | 5.46 |
| TAC | 5983 | 3.87 | 28,452 | 3.69 |
| TTC | 6997 | 4.52 | 34,862 | 4.52 |
| Total | 154,635 | 100.00 | 770,840 | 100.00 |
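The 20 trinucleotides listed in the table are the elements of X, so two of the structural properties named in the abstract can be checked directly. The sketch below tests self-complementarity (the reverse complement of every trinucleotide of X is also in X) and shows that X and its two circular shifts are pairwise disjoint and together cover the 60 non-periodic trinucleotides, which is a necessary condition for a maximal trinucleotide circular code. It does not test circularity itself. The helper names are hypothetical.

```python
from itertools import product

X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(tri):
    """Complement each nucleotide, then reverse the order (e.g. AAC -> GTT)."""
    return tri.translate(_COMPLEMENT)[::-1]


def is_self_complementary(code):
    """True if the reverse complement of every word of `code` is in `code`."""
    return all(reverse_complement(t) in code for t in code)


def shifted(code, k):
    """Circularly shift every trinucleotide by k positions (ABC -> BCA for k=1)."""
    return frozenset(t[k:] + t[:k] for t in code)


X1, X2 = shifted(X, 1), shifted(X, 2)

# All trinucleotides except the four periodic ones (AAA, CCC, GGG, TTT).
non_periodic = {
    "".join(p) for p in product("ACGT", repeat=3) if len(set(p)) > 1
}

print("self-complementary:", is_self_complementary(X))
print("shifts disjoint:", not (X & X1 or X & X2 or X1 & X2))
print("covers non-periodic trinucleotides:", (X | X1 | X2) == non_periodic)
```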
Figure 10. A model of the evolution of self-complementary circular codes. Each code label stands for a Self-complementary Circular code with a given longest path length. The maximal self-complementary trinucleotide circular code X (1) belongs to the class of cardinality 20 trinucleotides (red rectangle). An (ordered) non-maximal self-complementary trinucleotide circular code of 10 trinucleotides among the 20 trinucleotides of X, belonging to the class of cardinality 10 trinucleotides (green rectangle), codes the 8 (ordered) early amino acids.