Yao Zhang, Zenan Shen, Xiangrui Meng, Liman Zhang, Zhiguo Liu, Mengjun Liu, Fa Zhang, Jin Zhao.
Abstract
BACKGROUND: Codon usage bias (CUB) analysis is an effective method for studying specificity, evolutionary relationships, and mRNA translation, and for discovering new genes among various species. CUB analysis is generally performed within one species or between closely related species, and no such study has been applied to species with distant genetic relationships. Here, seven Rosales species with high economic value were selected for CUB analysis.
Keywords: Codon usage bias; Evolutionary relationships; Natural selection; Rosales
Year: 2022 PMID: 35123393 PMCID: PMC8817548 DOI: 10.1186/s12870-022-03450-x
Source DB: PubMed Journal: BMC Plant Biol ISSN: 1471-2229 Impact factor: 4.215
GC content of CDS across 7 Rosales species
| Species | GC% | GC1% | GC2% | GC3% | GC3s% | The number of genes |
|---|---|---|---|---|---|---|
| | 44.10 | 50.50 | 40.33 | 41.47 | 39.26 | 26,319 |
| | 44.91 | 50.99 | 40.29 | 43.45 | 41.26 | 30,405 |
| | 45.28 | 51.31 | 40.66 | 43.87 | 41.77 | 44,181 |
| | 44.57 | 51.01 | 40.44 | 42.24 | 40.06 | 22,850 |
| | 44.40 | 50.92 | 40.38 | 41.91 | 39.71 | 26,499 |
| | 45.54 | 51.59 | 40.85 | 44.17 | 42.10 | 39,184 |
| | 45.57 | 51.27 | 40.67 | 44.76 | 42.74 | 19,947 |
| Average | 44.91 | 51.08 | 40.52 | 43.12 | 40.99 | |
Note: GC1, GC2 and GC3 represent the GC content of the first, second and third bases of codons; GC3s represents the GC content of the third synonymous position
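
The position-specific GC measures defined in the note can be sketched for a single coding sequence as follows. This is a minimal illustration, not the authors' pipeline: the function name `gc_by_position` is hypothetical, and GC3s is computed under the common convention of excluding Met (ATG), Trp (TGG) and stop codons, which the source does not spell out.

```python
# Assumption: codons excluded from GC3s (no synonymous alternatives, or stops).
NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc_by_position(cds: str) -> dict:
    """Return GC, GC1, GC2, GC3 and GC3s percentages for one in-frame CDS."""
    seq = cds.upper()
    seq = seq[: len(seq) - len(seq) % 3]  # drop any trailing partial codon
    if not seq:
        raise ValueError("sequence has no complete codon")

    def pct(bases: str) -> float:
        # Percentage of G or C among the given bases (NaN if there are none).
        if not bases:
            return float("nan")
        return 100.0 * sum(b in "GC" for b in bases) / len(bases)

    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    third_synonymous = "".join(c[2] for c in codons if c not in NON_SYNONYMOUS)

    return {
        "GC": pct(seq),
        "GC1": pct(seq[0::3]),  # first codon positions
        "GC2": pct(seq[1::3]),  # second codon positions
        "GC3": pct(seq[2::3]),  # third codon positions
        "GC3s": pct(third_synonymous),
    }
```

Species-level values such as those in the table would then be averages of these per-gene percentages over all CDS of a genome.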
Fig. 1 Neutrality plot of 7 Rosales species. The blue solid line represents the regression line. The P-value gives the significance of the correlation; a P-value below 0.05 indicates that GC3 and GC12 are significantly correlated
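
A neutrality plot regresses GC12 (the mean of GC1 and GC2) on GC3 across genes. The sketch below fits that line by ordinary least squares and reports the Pearson correlation; the function name `neutrality_regression` is hypothetical, and the P-value shown in the figure is not computed here because it would need a statistics library. A slope near 0 is usually read as dominance of selection, a slope near 1 as dominance of mutation pressure.

```python
from math import sqrt


def neutrality_regression(gc12, gc3):
    """Least-squares fit of GC12 (y) on GC3 (x).

    Returns (slope, intercept, pearson_r).
    """
    n = len(gc3)
    if n != len(gc12) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")

    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    if sxx == 0 or syy == 0:
        raise ValueError("a series with zero variance has no defined correlation")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)
```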
Fig. 2 ENc plot of 7 Rosales species. The blue solid line represents the expected positions of genes when codon usage is determined only by GC3s composition
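
The expected curve in an ENc plot is conventionally drawn from Wright's (1990) expression. The source does not state the formula, so the form below is an assumption, with $s$ denoting GC3s as a fraction between 0 and 1:

$$
\mathrm{ENc}_{\mathrm{exp}} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}}
$$

For example, $s = 0.5$ gives $\mathrm{ENc}_{\mathrm{exp}} = 2 + 0.5 + 29/0.5 = 60.5$.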
Frequency distribution of (ENCexp-ENCobs)/ENCexp in 7 Rosales species (%)
| Species | −0.2 ~ −0.1 | −0.1 ~ 0 | 0 ~ 0.1 | 0.1 ~ 0.2 | 0.2 ~ 0.3 | 0.3 ~ 0.4 | 0.4 ~ 0.5 |
|---|---|---|---|---|---|---|---|
| | 0.21 | 7.22 | 57.59 | 30.28 | 4.17 | 0.48 | 0.05 |
| | 0.07 | 6.07 | 61.55 | 28.49 | 3.36 | 0.44 | 0.02 |
| | 0.07 | 8.05 | 66.55 | 22.33 | 2.67 | 0.30 | 0.02 |
| | 0.07 | 5.50 | 61.93 | 29.05 | 3.12 | 0.32 | 0.01 |
| | 0.07 | 4.91 | 63.59 | 28.30 | 2.81 | 0.30 | 0.02 |
| | 0.11 | 7.70 | 66.55 | 22.53 | 2.80 | 0.26 | 0.04 |
| | 0.09 | 7.68 | 67.20 | 22.79 | 2.00 | 0.23 | 0.02 |
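
The frequency distribution above can be reproduced in outline by computing (ENCexp-ENCobs)/ENCexp per gene and counting genes in bins of width 0.1. This is a sketch under assumptions: the expected ENc uses Wright's (1990) formula, which the source does not state, and the function names are hypothetical. Bins are keyed by an integer index k that stands for the interval from k × 0.1 up to (k + 1) × 0.1.

```python
from math import floor


def expected_enc(gc3s: float) -> float:
    """Expected ENc when codon usage depends only on GC3s (a fraction in [0, 1])."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def enc_ratio(enc_obs: float, gc3s: float) -> float:
    """(ENCexp - ENCobs) / ENCexp for one gene."""
    enc_exp = expected_enc(gc3s)
    return (enc_exp - enc_obs) / enc_exp


def bin_frequencies(ratios, width: float = 0.1) -> dict:
    """Percentage of genes per bin; key k covers [k * width, (k + 1) * width)."""
    if not ratios:
        return {}
    counts = {}
    for r in ratios:
        k = floor(r / width)
        counts[k] = counts.get(k, 0) + 1
    return {k: 100.0 * c / len(ratios) for k, c in sorted(counts.items())}
```

Values that fall exactly on a bin boundary are subject to floating-point rounding, so a production version would need an explicit boundary rule.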
The top five high-frequency codons of 7 Rosales species
| Species | Codon (RSCU) | | | | |
|---|---|---|---|---|---|
| | AGA(1.86) | GTT(1.58) | TTG(1.54) | TCT(1.51) | GCT(1.55) |
| | AGA(1.86) | GTT(1.51) | TTG(1.52) | AGG(1.59) | GCT(1.56) |
| | AGA(1.75) | GTT(1.55) | TTG(1.53) | AGG(1.58) | |
| | AGA(1.86) | GTT(1.57) | TCT/TTG(1.54) | AGG(1.60) | GCT(1.56) |
| | AGA(1.86) | GTT(1.58) | TTG(1.55) | AGG(1.58) | GCT(1.56) |
| | AGA(1.73) | GTT(1.55) | TTG(1.53) | AGG(1.58) | |
| | AGA(1.82) | GTT(1.54) | TTG(1.55) | AGG(1.52) | |
Fig. 3 RSCU values of NCG and NTA codons in 7 Rosales species
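
Relative synonymous codon usage (RSCU) is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following sketch assumes the standard genetic code and excludes Met, Trp and stop codons, which leaves 59 codons; the function name `rscu` is hypothetical and the code is not taken from the source.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} pooled over a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons containing ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:  # stops, Met and Trp are excluded
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        for c in codons:
            result[c] = counts[c] * len(codons) / total
    return result
```

An RSCU above 1 means a codon is used more often than expected under equal usage, as with AGA(1.86) in the table of high-frequency codons.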
Comparison of high-frequency codon pairs usage among 7 Rosales species
| Codon Pairs | | | | | | | |
|---|---|---|---|---|---|---|---|
| nnAAnn | 9.38 | 11.07 | 12.54 | 11.56 | 12.53 | 12.51 | 12.21 |
| nnACnn | 3.72 | 4.49 | 4.79 | 6.41 | 5.17 | 4.30 | 4.32 |
| nnAGnn | 7.52 | 8.10 | 8.78 | 7.84 | 8.26 | 9.22 | 8.98 |
| nnATnn | 8.39 | 10.29 | 10.46 | 9.85 | 10.53 | 9.73 | 9.43 |
| nnCAnn | 4.45 | 4.57 | 4.92 | 5.42 | 5.09 | 4.98 | 4.59 |
| nnCCnn | 4.31 | 2.09 | 1.15 | 2.46 | 2.25 | 0.92 | 0.70 |
| nnCGnn | 1.90 | 3.23 | 1.29 | 0.32 | 0.15 | 1.33 | 2.38 |
| nnCTnn | 4.33 | 4.45 | 4.51 | 3.61 | 3.70 | 4.49 | 4.33 |
| nnGAnn | 9.16 | 10.04 | 11.64 | 9.65 | 9.10 | 11.01 | 10.63 |
| nnGCnn | 4.97 | 3.72 | 2.97 | 4.60 | 4.37 | 3.38 | 3.97 |
| nnGGnn | 5.38 | 6.59 | 5.33 | 6.12 | 5.49 | 6.07 | 5.40 |
| nnGTnn | 5.58 | 4.66 | 5.21 | 5.02 | 5.42 | 5.60 | 4.83 |
| nnTAnn | 7.63 | 7.17 | 5.89 | 6.01 | 6.09 | 5.38 | 7.29 |
| nnTCnn | 5.02 | 4.41 | 4.24 | 5.18 | 4.86 | 3.84 | 3.86 |
| nnTGnn | 8.62 | 6.71 | 8.06 | 6.84 | 6.94 | 8.16 | 8.07 |
| nnTTnn | 9.65 | 8.40 | 8.24 | 9.12 | 10.04 | 9.08 | 9.02 |
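
Each column of the table sums to roughly 100, so the values read as percentages. One way to obtain figures of this kind is sketched below, reading `nnXYnn` as a six-base codon pair in which X is the third base of the first codon and Y is the first base of the second. That reading, the function name, and the choice to count every adjacent codon pair (rather than only high-frequency pairs) are assumptions, not details confirmed by the source.

```python
from collections import Counter


def junction_dinucleotide_percent(cds_list):
    """Percentage of each nnXYnn class over all adjacent codon pairs."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper()
        n_codons = len(seq) // 3
        for i in range(n_codons - 1):
            # Third base of codon i plus first base of codon i + 1.
            pair = seq[3 * i + 2: 3 * i + 4]
            if set(pair) <= set("ACGT"):  # ignore ambiguous bases
                counts["nn" + pair + "nn"] += 1

    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * c / total for label, c in counts.items()}
```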
Fig. 4 A: GC3 variation plot from 5′ to 3′ of 7 Rosales species. All genes in the species were divided equally into 100 groups, and each dot represents the average GC3 content of the genes in each group; B: The Euclidean distance of the GC3 gradient between each pair of Rosales species. A lower Euclidean distance indicates a closer relationship. The maximum and minimum values are marked in red and blue, respectively
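
The distance in panel B can be illustrated by treating each species' GC3 gradient as a vector of group means and taking the Euclidean distance between every pair of vectors. The sketch assumes the profiles have already been computed and have equal length; the function name and the species labels in the usage example are placeholders.

```python
from itertools import combinations
from math import dist


def pairwise_gradient_distances(profiles):
    """profiles: {species: [mean GC3 per group]}; returns {(a, b): distance}."""
    lengths = {len(p) for p in profiles.values()}
    if len(lengths) > 1:
        raise ValueError("all GC3 profiles must have the same number of groups")
    return {
        (a, b): dist(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Usage with made-up three-group profiles (real profiles would have 100 groups).
example = pairwise_gradient_distances(
    {"species_A": [40.0, 41.0, 42.0], "species_B": [43.0, 45.0, 42.0]}
)
```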
Fig. 5 The bi-clustering heat map of RSCU based on 59 codons from 27 species, using Euclidean distance and complete-linkage clustering. Blue, green and pink represent Chlorophyte, Monocotyledon and Dicotyledon, respectively. The 7 Rosales species are marked with a red box
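
Complete-linkage clustering with Euclidean distance merges, at each step, the two clusters whose most distant members are closest. The pure-Python sketch below clusters along one axis only (for example, species described by their RSCU vectors); a bi-clustering heat map applies the same procedure to codons as well. The function name is hypothetical, and the authors' actual software is not stated in this record.

```python
from math import dist


def complete_linkage(vectors):
    """vectors: {label: [values]}; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in sorted(vectors)]
    merges = []

    def linkage(a, b):
        # Complete linkage: distance between the farthest pair of members.
        return max(dist(vectors[x], vectors[y]) for x in a for y in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```

The merge list, read in order, gives the dendrogram from the tightest pair up to the root.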