Xiuzhang Li, Hui Song, Yu Kuang, Shuihong Chen, Pei Tian, Chunjie Li, Zhibiao Nan.
Abstract
Analysis of codon usage data has both practical and theoretical applications in molecular biology. Differences in codon usage patterns among genes reflect variation in local base composition bias and in the intensity of natural selection. Several recent reports have addressed codon usage in fungi, but little is known about codon usage bias in Epichloë endophytes. The present study assessed codon usage patterns and biases in 4870 sequences from Epichloë festucae, which may help reveal constraining factors such as mutation or selection pressure and improve the use of bioreactors for the cloning, expression, and characterization of particular genes. The GC content of E. festucae (56.41%) is higher than the AT content (43.59%). Neutrality and effective number of codons (ENC) plot analyses showed that both mutational bias and natural selection play roles in shaping codon usage in this species. We found that gene length is strongly correlated with codon usage and may contribute to the codon usage patterns observed in genes. Nucleotide composition and gene expression levels also shape codon usage bias in E. festucae. E. festucae exhibits codon usage bias based on the relative synonymous codon usage (RSCU) values of 61 sense codons, with 25 codons showing an RSCU larger than 1. In addition, we identified 27 optimal codons, all of which end in G or C.
Keywords: Epichloë festucae; codon usage bias; grass endophyte; natural selection; optimal codons
Year: 2016 PMID: 27428961 PMCID: PMC4964511 DOI: 10.3390/ijms17071138
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Base composition, ENC, Gravy, and aromaticity (Aromo) of codons from Epichloë festucae.
| Class | Genes | Codons | GC (%) | GC1 (%) | GC2 (%) | GC3 (%) | GC3s (%) | T3s (%) | C3s (%) | A3s (%) | G3s (%) | Gravy | Aro | ENC | CAI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total | 4870 | 2498255 | 56.41 ± 4.60 | 58.68 ± 5.30 | 46.43 ± 5.80 | 64.11 ± 10.16 | 62.86 ± 10.52 | 23.91 ± 7.41 | 43.85 ± 9.99 | 21.13 ± 7.5 | 33.66 ± 7.37 | −0.41 ± 0.37 | 0.07 ± 0.02 | 51.58 ± 7.14 | 0.22 ± 0.04 |
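The position-specific GC values in the table above can be computed per coding sequence along the following lines. This is a minimal sketch, not the authors' pipeline: the function name `gc_by_position` is hypothetical, and the input is assumed to be an in-frame coding sequence. GC3s, which restricts the third-position count to synonymous codons, is not computed here.

```python
def gc_by_position(cds: str) -> dict:
    """Return GC percentages overall and at codon positions 1, 2 and 3.

    Assumes `cds` is an in-frame coding sequence (DNA or RNA alphabet).
    """
    seq = cds.upper().replace("U", "T")
    seq = seq[: len(seq) - len(seq) % 3]  # drop any trailing partial codon
    if not seq:
        raise ValueError("sequence is shorter than one codon")

    def gc(s: str) -> float:
        return 100.0 * sum(base in "GC" for base in s) / len(s)

    return {
        "GC": gc(seq),
        "GC1": gc(seq[0::3]),  # first codon positions
        "GC2": gc(seq[1::3]),  # second codon positions
        "GC3": gc(seq[2::3]),  # third codon positions
    }


# Toy sequence for illustration only (not taken from the study).
values = gc_by_position("ATGGCCTAA")
```

Averaging `GC1` and `GC2` gives the GC12 value used in a neutrality plot.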
Figure 1. Neutrality plot (GC12 vs. GC3). GC12 is the average GC content at the first and second codon positions (GC1 and GC2), and GC3 is the GC content at the third position. The solid line is the linear regression of GC12 against GC3, R² = 0.0146, p < 0.01.
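The regression behind a neutrality plot can be sketched with ordinary least squares, taking GC3 as the predictor and GC12 as the response. The function name `neutrality_regression` is hypothetical and the input values are made up for illustration; they are not data from the study. A slope near 0 is usually read as codon usage dominated by selection, and a slope near 1 as dominated by mutational pressure.

```python
from statistics import mean


def neutrality_regression(gc3, gc12):
    """Least-squares fit of GC12 (y) on GC3 (x).

    Returns (slope, intercept, r_squared).
    """
    if len(gc3) != len(gc12) or len(gc3) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mx, my = mean(gc3), mean(gc12)
    sxx = sum((x - mx) ** 2 for x in gc3)
    syy = sum((y - my) ** 2 for y in gc12)
    sxy = sum((x - mx) * (y - my) for x, y in zip(gc3, gc12))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up per-gene percentages, for illustration only.
slope, intercept, r2 = neutrality_regression([40, 50, 60, 70], [50, 51, 52, 53])
```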
Figure 2. Frequency distribution of the effective number of codons (ENC) ratio.
Figure 3. ENC plot showing the relationship between ENC and GC3s.
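An ENC plot is conventionally read against the curve expected when codon usage is constrained only by GC3s. The page does not print that formula, so the sketch below assumes Wright's standard approximation, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s as a fraction. It also assumes that the ENC ratio is defined as (expected − observed) / expected, which is one common definition; the function names are hypothetical.

```python
def expected_enc(gc3s: float) -> float:
    """Expected ENC under GC3s constraint alone (Wright's approximation).

    `gc3s` is a fraction between 0 and 1, not a percentage.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction in [0, 1]")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def enc_ratio(observed_enc: float, gc3s: float) -> float:
    """Relative shortfall of observed ENC from the expected curve.

    Assumed definition: (expected - observed) / expected.
    """
    expected = expected_enc(gc3s)
    return (expected - observed_enc) / expected


# Mean values from the base composition table: ENC 51.58, GC3s 62.86%.
ratio = enc_ratio(51.58, 0.6286)
```

Genes falling well below the expected curve (a positive ratio) show more bias than GC3s alone would predict.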
Correlation coefficients between gene positions along the first two major axes and indices of codon usage and synonymous codon usage bias across all genes.
| | Length | GC | GC1 | GC2 | GC3 | GC3S | A3S | T3S | C3S | G3S | GRAVY | AROMO | ENC | CAI | AXIS1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GC | −0.191 ** | ||||||||||||||
| GC1 | −0.041 ** | 0.568 ** | |||||||||||||
| GC2 | −0.069 ** | 0.443 ** | 0.085 ** | ||||||||||||
| GC3 | −0.199 ** | 0.808 ** | 0.201 ** | −0.013 | |||||||||||
| GC3S | −0.200 ** | 0.816 ** | 0.214 ** | −0.004 | 0.998 ** | ||||||||||
| A3S | 0.189 ** | −0.710 ** | −0.181 ** | −0.028 * | −0.854 ** | −0.857 ** | |||||||||
| T3S | 0.160 ** | −0.753 ** | −0.196 ** | −0.092 ** | −0.868 ** | −0.868 ** | 0.508 ** | ||||||||
| C3S | −0.152 ** | 0.635 ** | 0.120 ** | −0.063 ** | 0.835 ** | 0.838 ** | −0.760 ** | −0.671 ** | |||||||
| G3S | −0.117 ** | 0.393 ** | 0.168 ** | −0.194 ** | 0.557 ** | 0.555 ** | −0.383 ** | −0.530 ** | 0.060 ** | ||||||
| GRAVY | −0.065 ** | 0.033 * | −0.180 ** | −0.138 ** | 0.217 ** | 0.207 ** | −0.336 ** | −0.114 ** | 0.198 ** | −0.063 ** | |||||
| AROMO | −0.040 ** | −0.195 ** | −0.412 ** | −0.317 ** | 0.130 ** | 0.106 ** | −0.129 ** | −0.028 | 0.163 ** | 0.004 | 0.420 ** | ||||
| ENC | 0.227 ** | −0.673 ** | −0.211 ** | −0.008 | −0.799 ** | −0.808 ** | 0.786 ** | 0.608 ** | −0.807 ** | −0.252 ** | −0.164 ** | −0.034 * | |||
| CAI | −0.024 | 0.108 ** | −0.015 | −0.193 ** | 0.265 ** | 0.266 ** | −0.455 ** | 0.045 ** | 0.526 ** | −0.176 ** | 0.080 ** | 0.144 ** | −0.402 ** | ||
| AXIS1 | 0.195 ** | −0.813 ** | −0.231 ** | −0.023 | −0.970 ** | −0.972 ** | 0.850 ** | 0.830 ** | −0.884 ** | −0.436 ** | −0.182 ** | −0.087 ** | 0.836 ** | −0.328 ** | |
| AXIS2 | −0.050 ** | 0.119 ** | 0.034 * | 0.042 ** | 0.120 ** | 0.115 ** | 0.177 ** | −0.374 ** | −0.345 ** | 0.690 ** | −0.054 ** | −0.025 | 0.193 ** | −0.613 ** | 0.000 |
** p < 0.01; * p < 0.05.
Correlation coefficients between gene positions along the first four major axes and indices of amino acid usage across all genes.
| | CAI | Gravy | Aromo | Axis1 | Axis2 | Axis3 |
|---|---|---|---|---|---|---|
| Gravy | 0.080 ** | |||||
| Aromo | 0.144 ** | 0.420 ** | ||||
| Axis1 | −0.328 ** | −0.182 ** | −0.087 ** | |||
| Axis2 | −0.613 ** | −0.054 ** | −0.025 * | 0.000 | ||
| Axis3 | −0.159 ** | −0.056 ** | 0.007 | 0.000 | 0.001 | |
| Axis4 | 0.005 | −0.008 | −0.036 ** | −0.003 | −0.001 | 0.002 |
** p < 0.01; * p < 0.05.
Codon usage of Epichloë festucae.
| Amino Acid | Codon | Total Count | RSCU |
|---|---|---|---|
| Phe | UUU | 31,361 | 0.71 |
| Phe | UUC | 53,807 | |
| Leu | UUA | 8539 | 0.22 |
| Leu | UUG | 39,289 | |
| Leu | CUU | 32,567 | 0.86 |
| Leu | CUC | 63,009 | |
| Leu | CUA | 15,908 | 0.42 |
| Leu | CUG | 57,144 | |
| Ile | AUU | 36,953 | 0.97 |
| Ile | AUC | 56,383 | |
| Ile | AUA | 14,980 | 0.42 |
| Met | AUG | 56,564 | |
| Val | GUU | 33,011 | 0.83 |
| Val | GUC | 63,583 | |
| Val | GUA | 15,230 | 0.39 |
| Val | GUG | 41,264 | |
| Tyr | UAU | 21,165 | 0.68 |
| Tyr | UAC | 39,027 | |
| Cys | UGU | 10,231 | 0.59 |
| Cys | UGC | 20,393 | |
| His | CAU | 27,393 | 0.81 |
| His | CAC | 36,289 | |
| Gln | CAA | 44,072 | 0.80 |
| Gln | CAG | 59,569 | |
| Asn | AAU | 32,251 | 0.70 |
| Asn | AAC | 55,829 | |
| Lys | AAA | 37,695 | 0.63 |
| Lys | AAG | 80,172 | |
| Asp | GAU | 60,666 | 0.79 |
| Asp | GAC | 87,756 | |
| Glu | GAA | 60,978 | 0.76 |
| Glu | GAG | 89,320 | |
| Ser | UCU | 33,326 | 0.87 |
| Ser | UCC | 44,291 | |
| Ser | UCA | 30,242 | 0.76 |
| Ser | UCG | 39,274 | |
| Pro | CCU | 35,400 | 0.86 |
| Pro | CCC | 49,710 | |
| Pro | CCA | 35,372 | 0.83 |
| Pro | CCG | 36,167 | 0.93 |
| Thr | ACU | 27,362 | 0.71 |
| Thr | ACC | 46,502 | |
| Thr | ACA | 31,612 | 0.81 |
| Thr | ACG | 40,791 | |
| Ala | GCU | 49,281 | 0.84 |
| Ala | GCC | 88,269 | |
| Ala | GCA | 44,142 | 0.75 |
| Ala | GCG | 45,406 | 0.83 |
| Ter | UGA | 2445 | |
| Ter | UAA | 1271 | 0.74 |
| Ter | UAG | 1529 | 0.87 |
| Trp | UGG | 34,101 | 0.94 |
| Arg | CGU | 20,731 | 0.73 |
| Arg | CGC | 40,880 | |
| Arg | CGA | 33,836 | |
| Arg | CGG | 23,806 | 0.86 |
| Arg | AGA | 24,225 | 0.84 |
| Arg | AGG | 23,489 | 0.89 |
| Gly | GGU | 35,358 | 0.75 |
| Gly | GGC | 79,015 | |
| Gly | GGA | 35,677 | 0.78 |
| Gly | GGG | 28,001 | 0.67 |
| Ser | AGU | 21,367 | 0.57 |
| Ser | AGC | 46,172 | |
Codon indicates synonymous codons; Total Count indicates the number of the synonymous codons; RSCU indicates relative synonymous codon usage. The preferentially used codons are underlined.
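RSCU is the observed count of a codon divided by the mean count of all codons in its synonymous family, so a value above 1 means the codon is used more often than expected under equal usage. The sketch below shows that calculation for pooled counts under the standard genetic code; the function name `rscu` is hypothetical, and the page does not state how the tabulated values were aggregated. Pooled counts therefore need not reproduce the table exactly: the Phe counts above give about 0.74 for UUU, against a tabulated 0.71.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in UCAG order; "*" marks stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(counts):
    """Map each codon to its count divided by the mean count of its family.

    `counts` maps RNA codons (e.g. "UUU") to observed counts; codons missing
    from `counts` are treated as zero. Families with no observations are skipped.
    """
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    result = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # family absent from the data, RSCU undefined
        family_mean = total / len(codons)
        for c in codons:
            result[c] = counts.get(c, 0) / family_mean
    return result


# Pooled Phe counts from the codon usage table above.
phe = rscu({"UUU": 31361, "UUC": 53807})
```

By construction the RSCU values within a family sum to the family size, so the two Phe values sum to 2.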
Optimal codons of genes in E. festucae.
| Amino Acid | Codon | High RSCU | High N | Low RSCU | Low N |
|---|---|---|---|---|---|
| Phe | UUU | 0.41 | 1246 | 0.97 | 4176 |
| Phe | UUC * | 1.59 | 4874 | 1.03 | 4414 |
| Leu | UUA | 0.03 | 86 | 0.54 | 2081 |
| Leu | UUG | 0.55 | 1375 | 1.27 | 4893 |
| Leu | CUU | 0.29 | 744 | 1.22 | 4713 |
| Leu | CUC * | 2.90 | 7320 | 1.10 | 4257 |
| Leu | CUA | 0.14 | 358 | 0.71 | 2725 |
| Leu | CUG * | 2.08 | 5252 | 1.16 | 4456 |
| Ile | AUU | 0.57 | 1339 | 1.20 | 4799 |
| Ile | AUC * | 2.24 | 5277 | 1.13 | 4511 |
| Ile | AUA | 0.19 | 442 | 0.68 | 2704 |
| Met | AUG | 1.00 | 3940 | 1.00 | 5668 |
| Val | GUU | 0.35 | 1025 | 1.18 | 4457 |
| Val | GUC * | 2.39 | 7011 | 1.21 | 4561 |
| Val | GUA | 0.11 | 322 | 0.63 | 2379 |
| Val | GUG * | 1.16 | 3396 | 0.97 | 3649 |
| Tyr | UAU | 0.29 | 651 | 1.03 | 3182 |
| Tyr | UAC * | 1.71 | 3910 | 0.97 | 2979 |
| Ser | AGU | 0.24 | 488 | 0.76 | 3234 |
| Ser | AGC * | 1.61 | 3266 | 1.05 | 4495 |
| His | CAU | 0.42 | 875 | 1.11 | 3946 |
| His | CAC * | 1.58 | 3322 | 0.89 | 3180 |
| Gln | CAA | 0.42 | 1184 | 1.07 | 6735 |
| Gln | CAG * | 1.58 | 4457 | 0.93 | 5867 |
| Asn | AAU | 0.30 | 808 | 0.98 | 4824 |
| Asn | AAC * | 1.70 | 4557 | 1.02 | 4978 |
| Lys | AAA | 0.29 | 1091 | 0.94 | 5763 |
| Lys | AAG * | 1.71 | 6306 | 1.06 | 6539 |
| Asp | GAU | 0.40 | 1941 | 1.03 | 8086 |
| Asp | GAC * | 1.60 | 7765 | 0.97 | 7678 |
| Glu | GAA | 0.39 | 1802 | 1.03 | 8278 |
| Glu | GAG * | 1.61 | 7396 | 0.97 | 7846 |
| Ser | UCU | 0.40 | 818 | 1.22 | 5215 |
| Ser | UCC * | 1.96 | 3989 | 0.89 | 3811 |
| Ser | UCA | 0.31 | 635 | 1.17 | 5018 |
| Ser | UCG * | 1.48 | 2998 | 0.91 | 3878 |
| Pro | CCU | 0.43 | 1030 | 1.10 | 4937 |
| Pro | CCC * | 2.13 | 5100 | 0.81 | 3633 |
| Pro | CCA | 0.32 | 763 | 1.27 | 5704 |
| Pro | CCG * | 1.12 | 2689 | 0.81 | 3651 |
| Thr | ACU | 0.28 | 685 | 1.04 | 4130 |
| Thr | ACC * | 1.80 | 4428 | 0.91 | 3615 |
| Thr | ACA | 0.38 | 941 | 1.20 | 4774 |
| Thr | ACG * | 1.54 | 3796 | 0.86 | 3422 |
| Ala | GCU | 0.38 | 1665 | 1.14 | 6365 |
| Ala | GCC * | 2.29 | 9960 | 1.05 | 5840 |
| Ala | GCA | 0.34 | 1462 | 1.10 | 6141 |
| Ala | GCG * | 0.99 | 4333 | 0.70 | 3903 |
| Cys | UGU | 0.29 | 322 | 0.95 | 1626 |
| Cys | UGC * | 1.71 | 1869 | 1.05 | 1790 |
| Trp | UGG | 1.00 | 2561 | 1.00 | 3365 |
| Gly | GGU | 0.39 | 1322 | 0.96 | 4175 |
| Gly | GGC * | 2.55 | 8641 | 1.35 | 5864 |
| Gly | GGA | 0.37 | 1263 | 1.03 | 4480 |
| Gly | GGG | 0.69 | 2353 | 0.65 | 2813 |
| Arg | AGA | 0.45 | 825 | 1.17 | 3566 |
| Arg | AGG * | 1.29 | 2390 | 0.79 | 2427 |
| Arg | CGU | 0.37 | 680 | 0.82 | 2510 |
| Arg | CGC * | 2.34 | 4329 | 1.02 | 3113 |
| Arg | CGA | 0.50 | 932 | 1.41 | 4306 |
| Arg | CGG * | 1.05 | 1934 | 0.80 | 2440 |
| Ter | UAA | 0.62 | 101 | 0.79 | 169 |
| Ter | UAG | 0.92 | 150 | 0.74 | 158 |
| Ter | UGA | 1.47 | 240 | 1.48 | 317 |
Codon indicates synonymous codons; N indicates codon frequency; RSCU indicates relative synonymous codon usage; High and Low indicate the codon usage of 244 genes (5% of the total number of genes) from the top and bottom of the dataset ordered by ENC ratio value, respectively. The optimal codons are indicated with a (*).
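One simple way to flag optimal codons from such a high/low comparison is to keep the sense codons whose RSCU in the high-bias set exceeds that in the low-bias set by some margin. The sketch below is an illustration under stated assumptions, not the authors' procedure: the page does not give the selection criterion, the 0.08 threshold is an assumed default, and the function name `optimal_codons` is hypothetical. Some analyses additionally require a significant chi-squared test on the codon counts, which is omitted here.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def optimal_codons(high, low, min_delta=0.08):
    """Return sense codons whose high-set RSCU exceeds the low-set RSCU.

    `high` and `low` map codons to RSCU values; `min_delta` is an assumed
    threshold on the difference (high - low), not a value from the source.
    """
    return sorted(
        codon
        for codon, high_value in high.items()
        if codon not in STOP_CODONS
        and codon in low
        and high_value - low[codon] > min_delta
    )


# Phe and Gly rows from the optimal codon table above.
high = {"UUU": 0.41, "UUC": 1.59, "GGU": 0.39, "GGC": 2.55, "GGA": 0.37, "GGG": 0.69}
low = {"UUU": 0.97, "UUC": 1.03, "GGU": 0.96, "GGC": 1.35, "GGA": 1.03, "GGG": 0.65}
selected = optimal_codons(high, low)
```

For these rows the sketch selects UUC and GGC, which matches the starred codons in the table; GGG is skipped because its difference of 0.04 falls below the assumed threshold.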