Xin Li, Xiaocen Wang, Pengtao Gong, Nan Zhang, Xichen Zhang, Jianhua Li.
Abstract
Giardia duodenalis, a flagellated parasitic protozoan, is the most common cause of parasite-induced diarrheal diseases worldwide. Codon usage bias (CUB) is an important evolutionary characteristic in most species. However, G. duodenalis CUB remains unclear. Thus, this study analyzes codon usage patterns to assess the constraining factors and to obtain useful information on what shapes G. duodenalis CUB. The neutrality analysis indicates that G. duodenalis has a wide GC3 distribution, which correlates significantly with GC12. The ENC-plot suggests that most genes lie close to the expected curve, with only a few points straying from it. This indicates that mutational pressure and natural selection both played an important role in the development of CUB. The Parity Rule 2 (PR2) plot demonstrates that G/C and A/T usage was disproportionate. Interestingly, we identified 26 optimal codons in the G. duodenalis genome, all ending with G or C. In addition, GC content, gene expression, and protein size also influence G. duodenalis CUB formation. This study systematically analyzes the G. duodenalis codon usage pattern and clarifies the mechanisms of G. duodenalis CUB. These results will be useful for identifying new genes, for molecular genetic manipulation, and for studying G. duodenalis evolution.
Keywords: Giardia duodenalis; codon usage bias; evolution; optimal codon; transcriptome
Year: 2021 PMID: 34440343 PMCID: PMC8393687 DOI: 10.3390/genes12081169
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Distribution of GC contents in G. duodenalis genes.
Figure 2. Neutrality plot analysis of GC12 and GC3 of the G. duodenalis transcriptome.
Codon usage in G. duodenalis.
| AA | Codon | N | RSCU | AA | Codon | N | RSCU |
|---|---|---|---|---|---|---|---|
| Phe | **UUU** | 59,550 | 1.07 | Ser | **UCU** | 73,318 | 1.49 |
| | UUC | 52,173 | 0.93 | | UCC | 48,316 | 0.98 |
| Leu | UUA | 31,152 | 0.55 | | UCA | 44,668 | 0.91 |
| | UUG | 46,111 | 0.82 | | UCG | 32,496 | 0.66 |
| | **CUU** | 82,286 | 1.47 | Pro | **CCU** | 41,755 | 1.11 |
| | **CUC** | 68,932 | 1.23 | | CCC | 36,359 | 0.97 |
| | CUA | 44,031 | 0.78 | | **CCA** | 43,775 | 1.17 |
| | **CUG** | 64,305 | 1.15 | | CCG | 28,158 | 0.75 |
| Ile | **AUU** | 64,266 | 1.06 | Thr | **ACU** | 51,120 | 1.01 |
| | **AUC** | 62,851 | 1.04 | | ACC | 48,607 | 0.96 |
| | AUA | 54,849 | 0.90 | | **ACA** | 64,162 | 1.27 |
| Met | AUG | 71,824 | 1.00 | | ACG | 38,778 | 0.77 |
| Val | **GUU** | 53,711 | 1.12 | Ala | **GCU** | 66,007 | 1.04 |
| | **GUC** | 53,666 | 1.12 | | **GCC** | 64,537 | 1.02 |
| | GUA | 33,623 | 0.70 | | **GCA** | 82,061 | 1.29 |
| | **GUG** | 50,286 | 1.05 | | GCG | 41,021 | 0.65 |
| Tyr | **UAU** | 53,440 | 1.01 | Cys | UGU | 32,743 | 0.82 |
| | UAC | 52,143 | 0.99 | | **UGC** | 46,757 | 1.18 |
| His | CAU | 35,269 | 0.90 | Arg | CGU | 26,186 | 0.90 |
| | **CAC** | 43,263 | 1.10 | | **CGC** | 35,590 | 1.22 |
| Gln | CAA | 50,096 | 0.79 | | CGA | 22,041 | 0.76 |
| | **CAG** | 76,057 | 1.21 | | CGG | 24,910 | 0.85 |
| Asn | AAU | 61,610 | 0.97 | Ser | AGU | 38,408 | 0.78 |
| | **AAC** | 66,024 | 1.03 | | **AGC** | 57,389 | 1.17 |
| Lys | AAA | 50,623 | 0.63 | Arg | **AGA** | 34,334 | 1.18 |
| | **AAG** | 109,717 | 1.37 | | **AGG** | 31,972 | 1.10 |
| Asp | **GAU** | 85,457 | 1.00 | Gly | GGU | 32,910 | 0.77 |
| | GAC | 85,476 | 1.00 | | **GGC** | 49,200 | 1.15 |
| Glu | GAA | 72,247 | 0.78 | | **GGA** | 47,874 | 1.12 |
| | **GAG** | 112,061 | 1.22 | | GGG | 40,586 | 0.95 |
N: The number of codons; the frequently used codons of G. duodenalis are displayed in bold.
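As a rough illustration of how the RSCU column relates to the N column, the sketch below applies the standard RSCU definition (observed count divided by the mean count across a codon's synonymous family). The function name `rscu` is hypothetical and the source does not state which software computed its values; the Phe counts are taken from the table above.

```python
def rscu(counts):
    """Relative synonymous codon usage for one amino acid's codon family.

    counts maps each synonymous codon to its observed count.
    RSCU = count / (family total / number of synonymous codons).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("codon family has no observations")
    k = len(counts)
    return {codon: k * n / total for codon, n in counts.items()}


# Phe counts from the codon usage table (UUU and UUC).
phe = {"UUU": 59550, "UUC": 52173}
phe_rscu = {codon: round(value, 2) for codon, value in rscu(phe).items()}
print(phe_rscu)  # {'UUU': 1.07, 'UUC': 0.93}
```

An RSCU above 1 means the codon is used more often than expected under uniform synonymous usage, which is consistent with the codons shown in bold.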
The 15 most frequently used codons in highly expressed G. duodenalis VSP genes.
| Codon | Amino Acid | Fraction | Frequency | Number |
|---|---|---|---|---|
| GAU | Asp | 0.425 | 26.080 | 308 |
Bold, frequently used codons ending with G or C; italics, frequently used codons ending with A or U.
The codon usage table of Giardia intestinalis was obtained from the Codon Usage Database via Optimizer software.
| C | AA | FRA. | FRE. | N | C | AA | FRA. | FRE. | N | C | AA | FRA. | FRE. | N | C | AA | FRA. | FRE. | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| UUU | F | 0.44 | 14.6 | 2723 | UCU | S | 0.23 | 17.7 | 3293 | UAU | Y | 0.44 | 14.3 | 2658 | UGU | C | 0.35 | 12.6 | 2349 |
| UUC | F | 0.56 | 18.9 | 3516 | UCC | S | 0.19 | 14.1 | 2632 | UAC | Y | 0.56 | 18.5 | 3448 | **UGC** | C | 0.65 | 23.8 | 4430 |
| UUA | L | 0.07 | 5.9 | 1095 | UCA | S | 0.13 | 9.9 | 1849 | UAA | * | 0.53 | 0.7 | 137 | UGA | W | 0.07 | 0.6 | 108 |
| UUG | L | 0.11 | 10.0 | 1861 | UCG | S | 0.12 | 9.2 | 1721 | UAG | * | 0.47 | 0.7 | 122 | UGG | W | 0.93 | 7.4 | 1378 |
| CUU | L | 0.24 | 21.0 | 3091 | CCU | P | 0.25 | 11.2 | 2077 | CAU | H | 0.36 | 7.6 | 1421 | CGU | R | 0.16 | 7.9 | 1462 |
| **CUC** | L | 0.28 | 24.6 | 4578 | CCC | P | 0.27 | 11.9 | 2215 | CAC | H | 0.64 | 13.4 | 2493 | CGC | R | 0.28 | 13.7 | 2552 |
| CUA | L | 0.10 | 9.1 | 1702 | CCA | P | 0.24 | 10.5 | 1963 | CAA | Q | 0.33 | 12.0 | 2228 | CGA | R | 0.10 | 4.8 | 891 |
| CUG | L | 0.20 | 17.5 | 3266 | CCG | P | 0.23 | 10.2 | 1898 | **CAG** | Q | 0.67 | 24.5 | 4551 | CGG | R | 0.10 | 4.8 | 896 |
| AUU | I | 0.32 | 17.3 | 3217 | ACU | T | 0.23 | 14.9 | 2776 | AAU | N | 0.40 | 16.5 | 3074 | AGU | S | 0.12 | 8.9 | 1658 |
| **AUC** | I | 0.44 | 23.7 | 4418 | ACC | T | 0.27 | 18.1 | 3377 | **AAC** | N | 0.60 | 24.4 | 4548 | AGC | S | 0.22 | 16.4 | 3061 |
| AUA | I | 0.25 | 13.5 | 2505 | ACA | T | 0.27 | 17.9 | 3331 | AAA | K | 0.23 | 13.8 | 2574 | AGA | R | 0.16 | 7.7 | 1433 |
| AUG | M | 1.00 | 21.3 | 3961 | ACG | T | 0.23 | 15.3 | 2846 | **AAG** | K | 0.77 | 45.3 | 8433 | AGG | R | 0.19 | 9.4 | 1749 |
| GUU | V | 0.26 | 16.2 | 3009 | GCU | A | 0.23 | 19.3 | 3583 | **GAU** | D | 0.42 | 24.6 | 4575 | GGU | G | 0.17 | 11.6 | 2155 |
| GUC | V | 0.36 | 22.7 | 4229 | **GCC** | A | 0.30 | 25.3 | 4707 | **GAC** | D | 0.58 | 33.6 | 6253 | GGC | G | 0.34 | 23.1 | 4291 |
| GUA | V | 0.13 | 8.0 | 1483 | GCA | A | 0.28 | 23.2 | 4318 | GAA | E | 0.31 | 18.9 | 3516 | GGA | G | 0.25 | 16.8 | 3128 |
| GUG | V | 0.25 | 15.6 | 2895 | GCG | A | 0.19 | 15.9 | 2965 | **GAG** | E | 0.69 | 41.5 | 7719 | GGG | G | 0.23 | 15.7 | 2917 |
Bold, 10 most frequently used codons. C, Codon; AA, amino acid; FRA., fraction; FRE., frequency; N, number.
Figure 3. Distribution of ENC and GC3s of G. duodenalis genes. ENC is plotted against GC3s. The solid line (green) shows the expected ENC value if the CUB is caused by GC3s only.
Figure 4. Frequency distribution of the ENC ratio.
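For readers who want to reproduce the expected curve in the ENC plot, a minimal sketch follows. It uses Wright's standard formula for the expected ENC under GC3s-only bias. The ENC ratio definition, (expected − observed) / expected, is a commonly used convention and is an assumption here, since the source does not spell it out; both function names are hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC when codon bias is driven by GC3s alone (Wright's formula).

    gc3s is the G+C fraction at synonymous third positions, in [0, 1].
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def enc_ratio(observed_enc, gc3s):
    """Relative distance of a gene from the expected curve.

    Assumed definition: (expected - observed) / expected.
    """
    expected = expected_enc(gc3s)
    return (expected - observed_enc) / expected


# The curve peaks at GC3s = 0.5 and falls off toward either extreme.
for s in (0.0, 0.5, 1.0):
    print(s, expected_enc(s))
```

Genes that sit well below the curve have a positive ratio under this convention, which is usually read as a sign that factors other than GC3s composition contribute to their bias.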
Figure 5. Correspondence analysis of RSCU for G. duodenalis genes. (A) The first two axes show the distribution of G. duodenalis genes. Green dots represent G + C content ≥ 60%; blue dots represent G + C content ≥ 45% but less than 60%; red dots represent G + C content ≤ 45%. (B) Distribution of codons on the same two axes as panel A. Red dots indicate A- and T-ending codons, and green dots indicate G- and C-ending codons. (C) Red dots represent ribosomal genes, yellow dots represent genes with an Aromo value ≥ 0.15, green dots represent genes with a Gravy value higher than 5, and blue dots represent other genes.
Figure 6. PR2-bias plot [A3/(A3 + T3) against G3/(G3 + C3)]. The red circle represents the average position for each plot.
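The two PR2 coordinates in the caption can be computed per gene from third codon positions, as in the sketch below. This is a simplified illustration: it counts all third positions, whereas published PR2 analyses often restrict the counts to fourfold degenerate codon families. The function name `pr2_bias` and the toy sequence are hypothetical and not taken from the study.

```python
from collections import Counter


def pr2_bias(cds):
    """Return (A3/(A3+T3), G3/(G3+C3)) for a coding sequence.

    Simplification: every third codon position is counted, not only
    fourfold degenerate sites. Trailing bases that do not complete a
    codon are ignored. RNA input (U) is treated as T.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    third = Counter(seq[i + 2] for i in range(0, usable, 3))
    a3, t3, g3, c3 = (third[base] for base in "ATGC")
    at_bias = a3 / (a3 + t3) if (a3 + t3) else float("nan")
    gc_bias = g3 / (g3 + c3) if (g3 + c3) else float("nan")
    return at_bias, gc_bias


# Toy sequence (hypothetical): codons ATG GCC AAG TTT.
at_bias, gc_bias = pr2_bias("ATGGCCAAGTTT")
print(at_bias, gc_bias)
```

A gene with no compositional skew at third positions would land at (0.5, 0.5); departures from that point reflect unequal use of A versus T or G versus C.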
Figure 7. Relationship between ENC and gene expression level for G. duodenalis.
Figure 8. Relationship between ENC and encoded protein length for G. duodenalis.
Translational optimal codons of G. duodenalis.
| AA | Codon | High RSCU (N) | Low RSCU (N) | AA | Codon | High RSCU (N) | Low RSCU (N) |
|---|---|---|---|---|---|---|---|
| Phe | UUU | 0.36 (632) | 1.24 (1846) | Ser | UCU | 0.88 (972) | 1.58 (3140) |
| | UUC * | 1.64 (2845) | 0.76 (1132) | | UCC * | 1.66 (1833) | 0.75 (1484) |
| Leu | UUA | 0.07 (94) | 0.99 (1832) | | UCA | 0.27 (301) | 1.20 (2392) |
| | UUG | 0.27 (377) | 0.96 (1777) | | UCG * | 1.13 (1249) | 0.53 (1044) |
| | CUU | 0.96 (1350) | 1.31 (2421) | | AGU | 0.32 (353) | 0.98 (1943) |
| | CUC * | 2.96 (4153) | 0.76 (1399) | | AGC * | 1.74 (1929) | 0.96 (1915) |
| | CUA | 0.14 (190) | 1.06 (1962) | Pro | CCU | 0.64 (807) | 1.21 (1684) |
| | CUG * | 1.61 (2262) | 0.92 (1690) | | CCC * | 1.57 (1974) | 0.76 (1063) |
| Ile | AUU | 0.51 (801) | 1.16 (2187) | | CCA | 0.38 (480) | 1.43 (1992) |
| | AUC * | 2.16 (3418) | 0.73 (1369) | | CCG * | 1.41 (1778) | 0.59 (824) |
| | AUA | 0.33 (529) | 1.11 (2088) | Thr | ACU | 0.50 (729) | 1.21 (2058) |
| Met | AUG | 1.00 (2296) | 1.00 (2331) | | ACC * | 1.30 (1914) | 0.77 (1306) |
| Val | GUU | 0.60 (1048) | 1.13 (1611) | | ACA | 0.62 (917) | 1.42 (2413) |
| | GUC * | 2.27 (3977) | 0.74 (1066) | | ACG * | 1.58 (2319) | 0.60 (1013) |
| | GUA | 0.17 (297) | 1.11 (1590) | Ala | GCU | 0.52 (1271) | 1.22 (2344) |
| | GUG | 0.96 (1687) | 1.02 (1461) | | GCC * | 1.70 (4180) | 0.77 (1487) |
| Tyr | UAU | 0.40 (638) | 1.21 (1682) | | GCA | 0.63 (1551) | 1.47 (2826) |
| | UAC * | 1.60 (2521) | 0.79 (1095) | | GCG * | 1.15 (2823) | 0.54 (1042) |
| His | CAU | 0.32 (338) | 1.13 (1582) | Cys | UGU | 0.32 (702) | 1.05 (1125) |
| | CAC * | 1.68 (1779) | 0.87 (1221) | | UGC * | 1.68 (3650) | 0.95 (1015) |
| Gln | CAA | 0.21 (335) | 1.07 (2489) | Trp | UGG | 1.00 (823) | 1.00 (749) |
| | CAG * | 1.79 (2878) | 0.93 (2171) | Arg | CGU | 0.57 (547) | 0.87 (1036) |
| Asn | AAU | 0.39 (763) | 1.15 (2189) | | CGC * | 2.58 (2471) | 0.75 (890) |
| | AAC * | 1.61 (3105) | 0.85 (1618) | | CGA | 0.25 (242) | 0.93 (1105) |
| Lys | AAA | 0.18 (548) | 0.92 (2284) | | CGG | 0.91 (868) | 0.89 (1056) |
| | AAG * | 1.82 (5508) | 1.08 (2680) | | AGA | 0.37 (355) | 1.57 (1859) |
| Asp | GAU | 0.45 (1310) | 1.23 (3040) | | AGG * | 1.32 (1264) | 0.99 (1168) |
| | GAC * | 1.55 (4509) | 0.77 (1913) | Gly | GGU | 0.41 (833) | 0.96 (1045) |
| Glu | GAA | 0.23 (721) | 1.00 (2804) | | GGC * | 1.87 (3831) | 0.91 (987) |
| | GAG * | 1.77 (5474) | 1.00 (2795) | | GGA | 0.54 (1106) | 1.30 (1407) |
| | | | | | GGG * | 1.18 (2417) | 0.83 (898) |
Comparison of codon usage frequency between high and low expression genes of G. duodenalis. The optimal codons were determined by a Chi-square contingency test. * marks codons used significantly more often in highly expressed genes (p < 0.01). AA, amino acid; N, number of codons.
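To make the contingency test concrete, here is a minimal sketch of a 2x2 chi-square test applied to the Phe counts from the table. The table layout (codon of interest versus its synonymous alternatives, high versus low expression) and the absence of a continuity correction are assumptions; the source does not describe how its contingency tables were constructed. The helper name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    Assumed layout:
                      codon   other synonymous codons
        high genes      a       b
        low genes       c       d
    Returns (chi2, p_value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table sums to zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Phe: UUC vs UUU in high (2845, 632) and low (1132, 1846) expression genes.
chi2, p_value = chi_square_2x2(2845, 632, 1132, 1846)
print(chi2 > 1000, p_value < 0.01)
```

Under these assumptions UUC is strongly enriched in the highly expressed set, which agrees with its asterisk in the table. A codon would additionally need a higher RSCU in the high set than in the low set to count as optimal, since the test by itself is two-sided.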