Liyuan Wang1, Huixian Xing1, Yanchao Yuan1, Xianlin Wang1, Muhammad Saeed2, Jincai Tao1, Wei Feng1, Guihua Zhang3, Xianliang Song1, Xuezhen Sun1.
Abstract
Codon usage bias (CUB) is an important evolutionary feature of a genome that provides valuable information for studying organism evolution, gene function and exogenous gene expression. The CUB and its shaping factors in the nuclear genomes of four sequenced cotton species, G. arboreum (A2), G. raimondii (D5), G. hirsutum (AD1) and G. barbadense (AD2), were analyzed in the present study. The effective number of codons (ENC) analysis showed that CUB was weak in these four species and in the four subgenomes of the two tetraploids. Codon composition analysis revealed that these four species used pyrimidine-rich codons more frequently than purine-rich codons. Correlation analysis indicated that the base content at the third codon position affects the degree of codon preference. PR2-bias plot and ENC-plot analyses revealed that the CUB patterns in these genomes and subgenomes were influenced by the combined effects of translational selection, directional mutation and other factors. The translational selection (P2) analysis results, together with the non-significant correlation between GC12 and GC3, further revealed that translational selection played the dominant role over mutation pressure in shaping codon usage bias. Through relative synonymous codon usage (RSCU) analysis, we detected 25 high-frequency codons, which tended to end with T or A, and 31 low-frequency codons, which tended to end with C or G, in these four species and four subgenomes. Finally, 19 to 26 optimal codons, 19 of them shared, were determined for each species and subgenome; these tended to end with A or T. We concluded that codon usage bias was weak and that translational selection was the main shaping factor in the nuclear genes of these four cotton genomes and four subgenomes.
Year: 2018 PMID: 29584741 PMCID: PMC5870960 DOI: 10.1371/journal.pone.0194372
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The number of CDSs and codons of 4 cotton species and 4 subgenomes used in this study.
| Species or subgenomes | Genome | Number of CDSs | Number of Codons |
|---|---|---|---|
| G. arboreum | A2 | 40134 | 14538888 |
| G. raimondii | D5 | 77267 | 32995450 |
| G. hirsutum | (AD)1 | 66434 | 27096230 |
| G. barbadense | (AD)2 | 77358 | 29259373 |
| G. hirsutum subgenome | (AD)1 | 32032 | 13188672 |
| G. hirsutum subgenome | (AD)1 | 34402 | 13907558 |
| G. barbadense subgenome | (AD)2 | 39568 | 14786710 |
| G. barbadense subgenome | (AD)2 | 37790 | 14472663 |
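
As a quick consistency check on the table above, the two (AD)1 subgenome rows should sum to the (AD)1 genome row, and likewise for (AD)2. A minimal Python sketch, with the counts copied from the table; the variable and function names are illustrative only:

```python
# (CDSs, codons) copied from the table; dictionary keys are illustrative labels.
genomes = {
    "(AD)1": (66434, 27096230),
    "(AD)2": (77358, 29259373),
}
subgenomes = {
    "(AD)1": [(32032, 13188672), (34402, 13907558)],
    "(AD)2": [(39568, 14786710), (37790, 14472663)],
}


def subgenome_totals(parts):
    """Sum the CDS counts and the codon counts over the subgenome rows."""
    return tuple(sum(column) for column in zip(*parts))


# True when the subgenome rows add up exactly to the whole-genome row.
checks = {name: subgenome_totals(subgenomes[name]) == genomes[name] for name in genomes}
print(checks)
```

Both checks hold for the counts listed, so the subgenome rows partition the tetraploid CDS sets without overlap or loss.
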
The composition parameter values of codon usage in 4 cotton species and 4 subgenomes.
| Species and subgenomes | T3s | A3s | G3s | C3s | GC3s | GC | ENC |
|---|---|---|---|---|---|---|---|
| | 0.425 | 0.341 | 0.262 | 0.230 | 0.381 | 0.437 | 54.08 |
| | 0.437 | 0.344 | 0.259 | 0.220 | 0.370 | 0.434 | 53.39 |
| | 0.425 | 0.341 | 0.262 | 0.232 | 0.382 | 0.437 | 54.11 |
| | 0.423 | 0.342 | 0.263 | 0.232 | 0.383 | 0.437 | 54.26 |
| | 0.425 | 0.340 | 0.262 | 0.231 | 0.382 | 0.438 | 54.11 |
| | 0.425 | 0.341 | 0.262 | 0.232 | 0.382 | 0.437 | 54.10 |
| | 0.423 | 0.342 | 0.264 | 0.233 | 0.384 | 0.437 | 54.41 |
| | 0.424 | 0.342 | 0.263 | 0.230 | 0.382 | 0.436 | 54.11 |
Fig 1. The composition parameter values of codon usage in 4 cotton species and 4 subgenomes.
Fig 2. The distribution of T3s, GC3s, GC and ENC of genes in 4 cotton species and 4 subgenomes.
Values of T3s(av), G3s(av), GC(av) and ENC(av) of genes in 4 species and 4 subgenomes and their multiple comparisons.
| Genomes and subgenomes | T3s(av) | G3s(av) | GC(av) | ENC(av) |
|---|---|---|---|---|
| | 0.4147±0.0677b | 0.2670±0.0624b | 0.4413±0.0378c | 52.06±0.0267e |
| | 0.4252±0.0594a | 0.2607±0.0540d | 0.4378±0.0334d | 52.35±0.0160d |
| | 0.4133±0.0632c | 0.2655±0.0592c | 0.4422±0.0358bc | 52.47±0.0190c |
| | 0.4116±0.0651d | 0.2683±0.0580a | 0.4427±0.0368b | 52.64±0.0171ab |
| | 0.4130±0.0627c | 0.2660±0.0584bc | 0.4432±0.0349ab | 52.55±0.0263bc |
| | 0.4136±0.0636bc | 0.2650±0.0599c | 0.4412±0.0366c | 52.40±0.0272cd |
| | 0.4098±0.0663e | 0.2682±0.0579ab | 0.4435±0.0378a | 52.69±0.0241a |
| | 0.4134±0.0638bc | 0.2684±0.0582a | 0.4417±0.0358c | 52.58±0.0242b |
*: “(av)” denotes the average over all genes. Multiple comparisons were performed using Duncan's multiple range test.
Different lowercase letters following the values in the same column indicate significant differences at the 0.05 level.
The correlation coefficients between GC12 and GC3 in 4 cotton species and 4 subgenomes.
| | GC1 | GC2 | GC12 |
|---|---|---|---|
| GC2 | .827/.969 | | |
| GC12 | .984 | .914/.992 | |
| GC3 | -.929/-.283 | -.610/-.401 | -.865/-.344 |
The values before and after the slash represent the correlation coefficients among the 4 cotton species and among the 4 subgenomes, respectively.
* Correlation was significant at the 0.05 level (2-tailed).
** Correlation was significant at the 0.01 level (2-tailed).
Fig 3. The PR2-bias plots of 4 cotton species and 4 subgenomes.
Fig 4. The ENC-plot of 4 cotton species and 4 subgenomes.
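
An ENC-plot compares each observed ENC with the value expected if GC3s alone shaped codon usage. The record does not give the formula, so the sketch below assumes the standard expectation curve ENC = 2 + s + 29 / (s² + (1 − s)²) with s = GC3s, and applies it to the first row of the composition parameter table (GC3s = 0.381, ENC = 54.08). The function name is illustrative.

```python
def expected_enc(gc3s: float) -> float:
    """Expected ENC if codon usage were determined by GC3s alone.

    Assumption: the standard ENC-plot curve, 2 + s + 29 / (s^2 + (1 - s)^2).
    """
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


# First row of the composition parameter table: GC3s = 0.381, observed ENC = 54.08.
gc3s, observed = 0.381, 54.08
expected = expected_enc(gc3s)

# Print the expected value and whether the observation falls below the curve.
print(round(expected, 2), observed < expected)
```

An observed value below the curve is the pattern usually read as evidence that factors other than base composition also contribute to the bias.
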
The optimal codons of 4 cotton species and 4 subgenomes.
| Codon | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | High | Low | High | Low | High | Low | High | Low | High | Low | High | Low | High | Low | High | Low |
| UUU | 1.24 | 1 | 1.24 | 1 | 1.21 | 0.99 | 1.22 | 0.98 | 1.18 | 0.98 | 1.23 | 1 | 1.21 | 0.97 | 1.22 | 1 |
| UUA | 1.04 | 0.88 | 1.1 | 0.88 | 1.07 | 0.87 | 0.99 | 0.9 | 0.98 | 0.86 | 1.16 | 0.89 | 1 | 0.9 | ||
| UUG | 1.7 | 1.41 | 1.67 | 1.4 | 1.69 | 1.41 | 1.75 | 1.3 | 1.71 | 1.41 | 1.66 | 1.41 | 1.7 | 1.23 | 1.79 | 1.38 |
| CUU | 1.46 | 1.27 | 1.54 | 1.32 | 1.49 | 1.29 | 1.49 | 1.29 | ||||||||
| AUU | 1.55 | 1.18 | 1.53 | 1.22 | 1.48 | 1.2 | 1.53 | 1.18 | 1.49 | 1.2 | 1.47 | 1.21 | 1.54 | 1.18 | 1.53 | 1.19 |
| GUU | 1.79 | 1.38 | 1.91 | 1.42 | 1.86 | 1.39 | 1.86 | 1.3 | 1.88 | 1.39 | 1.85 | 1.4 | 1.84 | 1.27 | 1.88 | 1.34 |
| UCU | 1.58 | 1.15 | 1.65 | 1.23 | 1.61 | 1.16 | 1.5 | 1.15 | 1.6 | 1.16 | 1.64 | 1.17 | 1.5 | 1.14 | 1.51 | 1.18 |
| UCA | 1.36 | 1.15 | 1.49 | 1.17 | 1.52 | 1.14 | 1.43 | 1.16 | 1.5 | 1.14 | 1.53 | 1.14 | 1.45 | 1.16 | 1.41 | 1.15 |
| CCU | 1.61 | 1.26 | 1.63 | 1.28 | 1.53 | 1.25 | 1.53 | 1.25 | ||||||||
| CCA | 1.54 | 1.21 | 1.52 | 1.18 | 1.61 | 1.19 | 1.55 | 1.22 | 1.59 | 1.17 | 1.62 | 1.2 | 1.57 | 1.25 | 1.53 | 1.19 |
| ACU | 1.59 | 1.18 | 1.59 | 1.19 | 1.53 | 1.16 | 1.54 | 1.14 | 1.55 | 1.15 | 1.51 | 1.16 | 1.53 | 1.12 | 1.56 | 1.17 |
| ACA | 1.2 | 1.05 | 1.25 | 1.08 | 1.21 | 1.05 | 1.21 | 1.11 | 1.17 | 1.05 | 1.25 | 1.05 | 1.21 | 1.07 | ||
| GCU | 1.9 | 1.38 | 1.96 | 1.42 | 1.91 | 1.4 | 1.87 | 1.34 | 1.9 | 1.39 | 1.91 | 1.39 | 1.87 | 1.29 | 1.88 | 1.38 |
| GCA | 1.16 | 1.07 | 1.15 | 1.06 | 1.17 | 1.08 | ||||||||||
| UAU | 1.31 | 1.03 | 1.31 | 1.06 | 1.3 | 1.02 | 1.25 | 1.02 | 1.27 | 1.02 | 1.32 | 1.03 | 1.23 | 1.02 | 1.27 | 1.03 |
| CAU | 1.44 | 1.14 | 1.37 | 1.15 | 1.38 | 1.14 | 1.41 | 1.12 | 1.38 | 1.13 | 1.38 | 1.16 | 1.41 | 1.1 | 1.42 | 1.16 |
| CAA | 1.36 | 1.16 | 1.32 | 1.16 | 1.37 | 1.16 | 1.33 | 1.18 | 1.34 | 1.16 | 1.4 | 1.17 | 1.31 | 1.18 | 1.35 | 1.18 |
| AAU | 1.29 | 1.04 | 1.27 | 1.06 | 1.25 | 1.03 | 1.26 | 1.06 | 1.22 | 1.02 | 1.28 | 1.03 | 1.25 | 1.07 | 1.26 | 1.04 |
| GAU | 1.52 | 1.27 | 1.53 | 1.28 | 1.52 | 1.26 | 1.51 | 1.25 | 1.51 | 1.25 | 1.52 | 1.26 | 1.49 | 1.25 | 1.52 | 1.25 |
| GAA | 1.19 | 1.09 | 1.19 | 1.09 | 1.21 | 1.09 | ||||||||||
| UGU | 1.23 | 0.98 | 1.21 | 0.99 | 1.19 | 0.96 | 1.22 | 0.99 | 1.18 | 0.96 | 1.2 | 0.97 | 1.21 | 1 | 1.23 | 0.99 |
| AGU | 1.22 | 0.94 | 1.12 | 0.91 | 1.08 | 0.89 | 1.2 | 0.96 | 1.03 | 0.89 | 1.1 | 0.88 | 1.17 | 0.96 | 1.22 | 0.95 |
| AGA | 2.24 | 1.36 | 2.38 | 1.4 | 2.32 | 1.32 | 2.25 | 1.34 | 2.23 | 1.31 | 2.39 | 1.33 | 2.19 | 1.32 | 2.31 | 1.36 |
| AGG | 2.01 | 1.39 | 1.83 | 1.34 | 1.87 | 1.35 | 2.06 | 1.33 | 1.9 | 1.32 | 1.84 | 1.36 | 2 | 1.31 | 2.06 | 1.35 |
| GGU | 1.49 | 1.14 | 1.49 | 1.13 | 1.51 | 1.14 | 1.45 | 1.14 | 1.5 | 1.13 | 1.51 | 1.15 | 1 | 1.13 | 1.43 | 1.14 |
| GGA | 1.25 | 1.12 | 1.28 | 1.19 | 1.26 | 1.13 | 1.21 | 1.13 | 1.24 | 1.13 | 1.28 | 1.13 | ||||
*** Correlation is significant at the 0.005 level.
**** Correlation is significant at the 0.001 level.
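
RSCU is conventionally computed as a codon's observed count divided by the mean count of all synonymous codons for the same amino acid, so a value above 1 marks a codon used more often than uniform usage would predict. A minimal sketch under that assumption; the two codon families, the function name and the toy sequence are illustrative and are not data from the study:

```python
from collections import Counter

# Only two synonymous families, to keep the sketch short (standard genetic code).
FAMILIES = {
    "Phe": ["UUU", "UUC"],
    "Ala": ["GCU", "GCC", "GCA", "GCG"],
}


def rscu(codons):
    """Return RSCU per codon: observed count / mean count within its family."""
    counts = Counter(codons)
    values = {}
    for family in FAMILIES.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent, so RSCU is undefined for this family
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


# Toy codon list (hypothetical): 4 Phe codons and 4 Ala codons.
toy = ["UUU", "UUU", "UUU", "UUC", "GCU", "GCU", "GCA", "GCC"]
print(rscu(toy))
```

In this toy input UUU and GCU come out above 1 and GCG at 0, mirroring how codons are ranked as high or low frequency.
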
The values of WWC, SSC, WWU, SSU and P2 in 4 cotton species and 4 subgenomes.
| Genomes and subgenomes | SSU | WWU | SSC | WWC | P2 |
|---|---|---|---|---|---|
| | 5.30 | 4.94 | 2.40 | 3.28 | 0.5389 |
| | 5.44 | 5.09 | 2.33 | 3.11 | 0.5354 |
| | 5.31 | 4.94 | 2.40 | 3.29 | 0.5395 |
| | 5.32 | 4.94 | 2.40 | 3.29 | 0.5398 |
| | 5.29 | 4.94 | 2.41 | 3.29 | 0.5386 |
| | 5.26 | 4.94 | 2.40 | 3.29 | 0.5381 |
| | 5.25 | 4.93 | 2.43 | 3.30 | 0.5374 |
| | 5.26 | 4.94 | 2.39 | 3.28 | 0.5381 |
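
The P2 column can be recomputed from the other four columns. The record does not state the formula, so the sketch below assumes the commonly used definition P2 = (WWC + SSU) / (WWC + WWU + SSC + SSU), where W is A or U and S is C or G. The function name is illustrative; the rows are copied from the table in table order.

```python
# Each row: (SSU, WWU, SSC, WWC, reported P2), copied from the table above.
rows = [
    (5.30, 4.94, 2.40, 3.28, 0.5389),
    (5.44, 5.09, 2.33, 3.11, 0.5354),
    (5.31, 4.94, 2.40, 3.29, 0.5395),
    (5.32, 4.94, 2.40, 3.29, 0.5398),
    (5.29, 4.94, 2.41, 3.29, 0.5386),
    (5.26, 4.94, 2.40, 3.29, 0.5381),
    (5.25, 4.93, 2.43, 3.30, 0.5374),
    (5.26, 4.94, 2.39, 3.28, 0.5381),
]


def p2(ssu, wwu, ssc, wwc):
    """Assumed definition: P2 = (WWC + SSU) / (WWC + WWU + SSC + SSU)."""
    return (wwc + ssu) / (wwc + wwu + ssc + ssu)


# Compare each recomputed value, rounded to 4 decimals, with the reported one.
matches = [round(p2(*row[:4]), 4) == row[4] for row in rows]
print(all(matches))
```

Under this assumed definition every recomputed value agrees with the reported P2 to four decimals, and all eight values lie above 0.5.
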
Fig 5. Cluster tree based on the RSCU values of 4 cotton species and 4 subgenomes.