Yan-Ting Jin1,2, Dong-Kai Pu1, Hai-Xia Guo1, Zixin Deng2, Ling-Ling Chen3, Feng-Biao Guo2.
Abstract
If a stop codon appears within a gene, translation terminates earlier than expected. Misfolding of the prematurely terminated protein is harmful to the host; hence, functional genes tend to avoid intragenic stop codons. We therefore hypothesize that the nucleotides that make up stop codons occur at lower frequency at the corresponding codon positions of genes. Here, we test this inference by examining nucleotide frequencies at large scale. Results from 19,911 prokaryote genomes reveal that the nucleotides coinciding with stop codons indeed have the lowest frequency in most genomes. Interestingly, genes ending in each of the three stop codons all tend to follow a T-G-A deficiency pattern, suggesting that the pressure to avoid intragenic termination acts in the same way in all of them and that the major stop codon TGA plays a dominant role in this effect. Finally, a positive correlation between the extent of TGA deficiency and gene length was observed in Escherichia coli (E. coli) genes with experimentally verified start sites. This further supports our hypothesis. The observed T-G-A deficiency pattern should help in understanding the evolution of codon usage strategies in extant organisms.
Keywords: T-G-A deficiency; codon position; premature protein; stop codons; termination
Year: 2022 PMID: 35602045 PMCID: PMC9116502 DOI: 10.3389/fmicb.2022.847325
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Number and percentage of bacterial genomes in which each nucleotide is the least frequent at each codon position.
| Codon position | A: genomes | A: % | T: genomes | T: % | C: genomes | C: % | G: genomes | G: % |
|---|---|---|---|---|---|---|---|---|
| First | 0 | 0 | **13,872** | **69.7** | 6,039 | 30.3 | 0 | 0 |
| Second | 176 | 0.9 | 0 | 0 | 3 | 0.0 | | |
| Third | | | 827 | 4.2 | 6,224 | 31.3 | 2,364 | 11.9 |
For each genome, we calculated the frequency of the four nucleotides at each codon position and identified the least frequent nucleotide at that position.
Taking the first codon position as an example, the genome number and genome percentage are read as follows. A total of 19,911 genomes were analyzed. In 13,872 of them, T is the least frequent of the four nucleotides at the first codon position, and in 6,039 of them, C is the least frequent. No genome has A or G as the least frequent nucleotide at this position. In other words, T is the least frequent nucleotide at the first codon position in more genomes than A, C, or G.
Bold values mark the nucleotide that is least frequent in the largest number of genomes at each codon position.
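The arithmetic behind the first-position example can be checked with a short sketch. The counts and the genome total are the ones stated in the footnote; the function names `to_percentages` and `tally_least` are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter

TOTAL_GENOMES = 19_911  # total number of genomes stated in the footnote

# First codon position: number of genomes in which each base is the least
# frequent one, as stated in the footnote.
first_position_counts = {"A": 0, "T": 13_872, "C": 6_039, "G": 0}


def to_percentages(counts, total):
    """Convert genome counts to percentages of `total`, rounded to one decimal."""
    return {base: round(100 * n / total, 1) for base, n in counts.items()}


def tally_least(least_per_genome):
    """Count how many genomes have each base as their least frequent one.

    `least_per_genome` is a hypothetical list holding one base per genome
    for a single codon position.
    """
    counts = Counter(least_per_genome)
    return {base: counts.get(base, 0) for base in "ATCG"}


# Each genome contributes exactly one least frequent base per position,
# so the four counts add up to the genome total.
assert sum(first_position_counts.values()) == TOTAL_GENOMES

print(to_percentages(first_position_counts, TOTAL_GENOMES))
# {'A': 0.0, 'T': 69.7, 'C': 30.3, 'G': 0.0}
```

The 30.3% for C matches the value in the table, and the remaining genomes (69.7%) are those in which T is the least frequent base.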
Figure 1. Usage of nucleotides in triplet codons. The y axis denotes the frequency of a given nucleotide at a given codon position. In Escherichia coli (E. coli), T, G, and A are the least used nucleotides at the first, second, and third positions of triplet codons, respectively.
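Position-specific frequencies of the kind plotted in Figure 1 could be computed roughly as follows. This is a minimal sketch and not the authors' code: the function names, the toy sequences, the `drop_stop` option (which excludes the terminal stop codon), and the tie-breaking rule (first base in the order A, T, C, G) are all assumptions made for illustration.

```python
from collections import Counter

BASES = "ATCG"


def position_frequencies(genes, drop_stop=True):
    """Relative frequency of each base at codon positions 1, 2, and 3.

    `genes` is an iterable of in-frame coding sequences. With
    `drop_stop=True` (an assumption, not a documented choice of the paper)
    the last codon of every sequence is left out of the counts.
    """
    counts = [Counter() for _ in range(3)]
    for seq in genes:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        if drop_stop:
            usable -= 3
        for i in range(max(usable, 0)):
            if seq[i] in BASES:  # skip ambiguous symbols such as N
                counts[i % 3][seq[i]] += 1
    freqs = []
    for c in counts:
        total = sum(c.values())
        freqs.append({b: (c[b] / total if total else 0.0) for b in BASES})
    return freqs


def least_used(freqs):
    """Least frequent base at each codon position (ties: first in A, T, C, G)."""
    return [min(BASES, key=lambda b: f[b]) for f in freqs]


# Toy input, not real genes.
toy_genes = ["ATGGCCAAATGA", "ATGCCCGGGTAA"]
toy_freqs = position_frequencies(toy_genes)
print(toy_freqs[0])           # base frequencies at the first codon position
print(least_used(toy_freqs))  # one base per codon position
```

Applied to the coding sequences of a whole genome, `least_used` would return the three bases that enter the genome-level tally for that genome.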
Percentage of genes in which a given nucleotide has the lowest frequency at the first, second, and third codon positions, in three groups of genes.
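| Gene group | First position: nucleotide | First position: fraction of genes | Second position: nucleotide | Second position: fraction of genes | Third position: nucleotide | Third position: fraction of genes |
|---|---|---|---|---|---|---|
| TAG group | T | 0.728 | G | 0.730 | A | 0.551 |
| TGA group | T | 0.845 | G | 0.627 | A | 0.661 |
| TAA group | T | 0.620 | G | 0.804 | A | 0.380 |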
To save space and improve readability, only the nucleotide with the highest percentage of genes is shown.
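A gene-level tally like the one summarized in this table might be sketched as below. The code assumes that the three groups are defined by the stop codon that ends each gene, that the stop codon itself is excluded from the counts, and that ties are broken in the order A, T, C, G; none of these details is confirmed by the text, and the function names and toy sequences are hypothetical.

```python
from collections import Counter, defaultdict

BASES = "ATCG"
STOP_CODONS = ("TAA", "TAG", "TGA")


def least_base_per_position(seq):
    """Least frequent base at codon positions 1-3 of one gene.

    The terminal stop codon is excluded (an assumption of this sketch).
    """
    body = seq.upper()[:-3]
    result = []
    for pos in range(3):
        counts = Counter(body[pos::3])  # every third base from this offset
        result.append(min(BASES, key=lambda b: counts[b]))
    return result


def group_fractions(genes):
    """Map each stop codon to three dicts {base: fraction of genes}."""
    tallies = defaultdict(lambda: [Counter() for _ in range(3)])
    sizes = Counter()
    for seq in genes:
        stop = seq.upper()[-3:]
        if stop not in STOP_CODONS or len(seq) % 3:
            continue  # skip sequences that are out of frame or lack a stop
        sizes[stop] += 1
        for pos, base in enumerate(least_base_per_position(seq)):
            tallies[stop][pos][base] += 1
    return {
        stop: [{b: c[b] / sizes[stop] for b in BASES} for c in tallies[stop]]
        for stop in sizes
    }


# Toy input, not real genes.
toy = ["ATGGCCAAATGA", "ATGCCCGGGTGA", "ATGAAACCCTAA"]
fractions = group_fractions(toy)
print(fractions["TGA"][0])  # fraction of TGA-group genes per least-used base, position 1
```

For each group and position, the table reports only the base with the largest fraction, which corresponds to `max(d, key=d.get)` over one of these dictionaries.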
Figure 2. Linear regression analyses of the frequencies of T, G, and A against gene length. The 513 genes are divided into 30 groups based on their lengths.
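The grouping and regression described in the caption could look roughly like this. The caption does not say how the 30 groups were formed, so the sketch assumes that genes are sorted by length and split into groups of nearly equal size, with each group represented by its mean length and mean frequency. The function names and the synthetic data are hypothetical.

```python
def group_means(lengths, freqs, n_groups=30):
    """Sort genes by length, split them into `n_groups` near-equal groups,
    and return the mean length and mean frequency of each group."""
    pairs = sorted(zip(lengths, freqs))
    n = len(pairs)
    xs, ys = [], []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # can happen only when there are fewer genes than groups
        xs.append(sum(p[0] for p in chunk) / len(chunk))
        ys.append(sum(p[1] for p in chunk) / len(chunk))
    return xs, ys


def linear_fit(xs, ys):
    """Ordinary least squares fit; returns slope, intercept, and Pearson r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Synthetic data: 513 gene lengths and a frequency that falls with length.
# These numbers are made up purely to exercise the functions.
lengths = list(range(300, 300 + 513 * 3, 3))
freqs = [0.30 - 0.00001 * length for length in lengths]

xs, ys = group_means(lengths, freqs)
slope, intercept, r = linear_fit(xs, ys)
print(len(xs), slope, r)
```

A falling frequency of a stop-codon nucleotide with gene length, as in this synthetic input, is what a stronger deficiency in longer genes would look like on such a plot.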