Zhanjun Wang1,2, Qianwen Cai1, Yue Wang1, Minhui Li1, Chenchen Wang1, Zhaoxia Wang1, Chunyan Jiao1, Congcong Xu1, Hongyan Wang1, Zhaoliang Zhang2.
Abstract
Theaceae species are dicotyledonous angiosperms with extremely high ornamental and economic value. The chloroplast genome is widely used to study species evolution, chloroplast gene expression and chloroplast transformation. Codon usage bias (CUB) analysis helps clarify evolutionary relationships and can be used to improve gene expression efficiency in genetic transformation research. However, there are relatively few systematic studies of CUB in the chloroplast genomes of Theaceae species. In this study, CUB and nucleotide composition parameters were determined using Perl scripts, CodonW 1.4.2, CU.Win2000, RStudio and SPSS 23.0. Chloroplast genome data of 40 Theaceae species were obtained to analyse the codon usage (CU) characteristics of the coding regions and the sources of variation that influence CUB. To explore the relationship between CUB and gene expression levels in these 40 Theaceae plastomes, the synonymous codon usage order (SCUO) and measure independent of length and composition (MILC) values were determined. Finally, phylogenetic analysis was used to reveal the evolutionary relationships among these Theaceae species. The results showed that, in the chloroplast genomes of these 40 Theaceae species, codon usage was biased toward codons containing A/T bases and codons ending with A/T bases. Moreover, the CUB was largely shared among the Theaceae species according to comparative analysis of relative synonymous codon usage (RSCU) and relative frequency of synonymous codon (RFSC): the species shared 29 identical biased codons (RSCU > 1) and 19 identical high-frequency codons. The CUB of Theaceae species is mainly affected by natural selection. The SCUO value of each of the 40 Theaceae species was 0.23 or 0.24, and the chloroplast gene expression level was moderate according to MILC values. Additionally, we observed a positive correlation between the SCUO and MILC values, which indicated that CUB might affect gene expression. Furthermore, the phylogenetic analysis showed that the evolutionary relationships among these 40 Theaceae species were relatively conserved. This systematic study of CUB and gene expression in Theaceae species provides further evidence for their evolution and phylogeny.
Keywords: Theaceae species; chloroplast genome; cluster analysis; codon usage bias; expression analysis
Year: 2022 PMID: 35360853 PMCID: PMC8961065 DOI: 10.3389/fgene.2022.824610
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
TABLE 1. Chloroplast genome information of 40 Theaceae species.
| Numbering | Species | Accession no | CDS number (before processing) | CDS number (after processing) |
|---|---|---|---|---|
| A | | NC_051559.1/MT317095.1 | 87 | 58 |
| B | | NC_050354.1/MN756594.1 | 87 | 57 |
| C | | NC_035574.1/KY856741.1 | 87 | 56 |
| D | | NC_052752.1/MN640791.1 | 81 | 48 |
| E | | NC_037472.1/MG431968.1 | 88 | 55 |
| F | | NC_024541.1/KF753632.1 | 89 | 58 |
| G | | NC_022459.1/KF156833.1 | 89 | 58 |
| H | | NC_022460.1/KF156834.1 | 87 | 58 |
| I | | NC_053896.1/MW026668.1 | 88 | 58 |
| J | | NC_050388.1/MT663342.1 | 87 | 57 |
| K | | NC_053541.1/MT449927.1 | 80 | 47 |
| L | | NC_024659.1/KJ806274.1 | 87 | 56 |
| M | | NC_038181.1/MG782842.1 | 89 | 58 |
| N | | NC_039626.1/MH394403.1 | 85 | 57 |
| O | | NC_022461.1/KF156835.1 | 89 | 58 |
| P | | NC_036830.1/MF850254.1 | 89 | 57 |
| Q | | NC_053915.1/MN635793.1 | 88 | 53 |
| R | | NC_024660.1/KJ806275.1 | 87 | 56 |
| S | | NC_039645.1/MH382827.1 | 87 | 57 |
| T | | NC_054364.1/MW186718.1 | 86 | 45 |
| U | | NC_024661.1/KJ806276.1 | 87 | 56 |
| V | | NC_022462.1/KF156837.1 | 89 | 58 |
| W | | NC_038198.1/MG797642.1 | 90 | 57 |
| X | | NC_024662.1/KJ806277.1 | 87 | 56 |
| Y | | NC_054365.1/MW186719.1 | 87 | 57 |
| Z | | NC_041672.1/MH253889.1 | 88 | 55 |
| AA | | NC_024663.1/KJ806278.1 | 88 | 56 |
| AB | | NC_050389.1/MT663343.1 | 87 | 57 |
| AC | | NC_041473.1/MH782189.1 | 85 | 57 |
| AD | | Pltd:NC_020019.1/ | 87 | 57 |
| AE | | NC_022264.1/KF156839.1 | 89 | 58 |
| AF | | NC_053622.1/MT665973.1 | 82 | 49 |
| AG | | NC_022463.1/KF156838.1 | 89 | 58 |
| AH | | NC_041509.1/MH782185.1 | 86 | 57 |
| AI | | NC_041471.1/MH782186.1 | 86 | 57 |
| AJ | | NC_041468.1/MH782174.1 | 86 | 57 |
| AK | | NC_041472.1/MH782187.1 | 86 | 57 |
| AL | | NC_041467.1/MH753079.1 | 86 | 57 |
| AM | | NC_041470.1/MH782182.1 | 86 | 57 |
| AN | | NC_041469.1/MH782180.1 | 86 | 57 |
Summary of the SCUO, MILC, GC, GC1, GC2, and GC3 values of the chloroplast genomes of 40 Theaceae species.
| Species | SCUO | MILC | GC1% | GC2% | GC3% | GC% |
|---|---|---|---|---|---|---|
| | 0.23 | 0.55 | 37.48 | 37.93 | 37.23 | 37.55 |
| | 0.24 | 0.56 | 38.16 | 36.80 | 37.72 | 37.56 |
| | 0.24 | 0.55 | 38.45 | 37.43 | 36.80 | 37.56 |
| | 0.24 | 0.56 | 37.79 | 36.42 | 37.20 | 37.14 |
| | 0.24 | 0.56 | 37.47 | 36.59 | 38.22 | 37.42 |
| | 0.24 | 0.55 | 37.31 | 37.41 | 37.93 | 37.55 |
| | 0.24 | 0.55 | 36.89 | 39.04 | 36.76 | 37.57 |
| | 0.24 | 0.55 | 35.84 | 37.26 | 39.60 | 37.57 |
| | 0.23 | 0.55 | 37.16 | 37.77 | 37.71 | 37.55 |
| | 0.24 | 0.55 | 37.55 | 38.25 | 36.99 | 37.60 |
| | 0.24 | 0.55 | 36.09 | 37.37 | 38.32 | 37.26 |
| | 0.24 | 0.56 | 37.30 | 38.76 | 36.65 | 37.57 |
| | 0.23 | 0.55 | 38.69 | 37.21 | 36.75 | 37.55 |
| | 0.24 | 0.55 | 36.73 | 37.56 | 38.41 | 37.56 |
| | 0.24 | 0.55 | 37.22 | 38.54 | 36.88 | 37.55 |
| | 0.23 | 0.55 | 38.37 | 36.44 | 37.75 | 37.52 |
| | 0.23 | 0.55 | 38.23 | 37.20 | 37.29 | 37.57 |
| | 0.24 | 0.55 | 37.58 | 36.66 | 38.50 | 37.58 |
| | 0.24 | 0.55 | 38.02 | 36.41 | 38.19 | 37.54 |
| | 0.24 | 0.55 | 36.69 | 36.38 | 39.08 | 37.38 |
| | 0.24 | 0.56 | 37.34 | 36.39 | 38.97 | 37.57 |
| | 0.24 | 0.55 | 38.49 | 36.31 | 37.89 | 37.57 |
| | 0.23 | 0.55 | 37.79 | 38.29 | 36.51 | 37.53 |
| | 0.24 | 0.56 | 37.92 | 35.95 | 38.86 | 37.58 |
| | 0.24 | 0.56 | 38.56 | 37.06 | 36.34 | 37.32 |
| | 0.24 | 0.56 | 38.73 | 37.79 | 38.27 | 38.27 |
| | 0.23 | 0.55 | 38.92 | 37.20 | 36.53 | 37.55 |
| | 0.24 | 0.55 | 36.78 | 39.20 | 36.81 | 37.60 |
| | 0.24 | 0.55 | 38.16 | 37.65 | 38.77 | 38.19 |
| | 0.24 | 0.55 | 36.57 | 39.88 | 36.23 | 37.56 |
| | 0.24 | 0.55 | 38.54 | 36.72 | 37.45 | 37.57 |
| | 0.24 | 0.55 | 38.33 | 37.88 | 35.61 | 37.27 |
| | 0.24 | 0.55 | 36.35 | 38.22 | 38.08 | 37.55 |
| | 0.23 | 0.55 | 38.20 | 39.08 | 37.56 | 38.28 |
| | 0.23 | 0.55 | 37.65 | 37.53 | 39.65 | 38.28 |
| | 0.23 | 0.56 | 39.20 | 36.95 | 38.56 | 38.24 |
| | 0.23 | 0.55 | 37.56 | 39.39 | 37.92 | 38.29 |
| | 0.23 | 0.55 | 38.18 | 37.37 | 39.17 | 38.24 |
| | 0.23 | 0.55 | 39.61 | 36.16 | 39.01 | 38.26 |
| Stewartia villosa | 0.23 | 0.55 | 39.30 | 37.68 | 37.89 | 38.29 |
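The GC1, GC2, GC3 and overall GC percentages in the table above are position-specific base compositions of the coding sequences. A minimal Python sketch of that calculation follows; the function name `gc_by_position` and the toy sequence are illustrative assumptions, and the authors' own Perl scripts may filter codons differently (for example by excluding stop codons).

```python
def gc_by_position(cds: str) -> dict:
    """Return GC1, GC2, GC3 and overall GC (in percent) for one in-frame CDS.

    Hypothetical helper for illustration; not the authors' script.
    """
    seq = cds.upper().replace("U", "T")
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("CDS length must be a non-zero multiple of 3")
    result = {}
    for pos in range(3):
        bases = seq[pos::3]  # every base at codon position pos + 1
        result[f"GC{pos + 1}"] = 100.0 * sum(b in "GC" for b in bases) / len(bases)
    result["GC"] = 100.0 * sum(b in "GC" for b in seq) / len(seq)
    return result


# Toy sequence (made up, not taken from any Theaceae plastome).
example = gc_by_position("ATGGCTAAATTTGGATAA")
print({name: round(value, 2) for name, value in example.items()})
```

For an in-frame sequence the overall GC value is the mean of GC1, GC2 and GC3, which is consistent with the rows of the table.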
FIGURE 1. RSCU values of the chloroplast genomes of 40 Theaceae species. The gradient from blue to red indicates average RSCU values of the codons from low to high.
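RSCU is conventionally defined as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid, so a value above 1 marks a codon used more often than expected under equal usage. The Python sketch below illustrates that standard definition; the function name `rscu` and the toy input are assumptions made for illustration, and the study itself computed these values with CodonW and Perl scripts.

```python
from collections import Counter
from itertools import product

# Standard codon-to-amino-acid assignments, codons ordered T, C, A, G.
# The plastid code (NCBI translation table 11) uses the same assignments.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} pooled over a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1
    # Group codons into synonymous families.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        mean = total / len(codons)
        for c in codons:
            values[c] = counts[c] / mean
    return values


# Toy input: three GCT codons and one GCA codon (all alanine).
toy = rscu(["GCTGCTGCTGCA"])
print(toy["GCT"], toy["GCA"], toy["GCC"], toy["GCG"])
```

Single-codon amino acids (Met, Trp) always receive an RSCU of 1 under this definition and are usually left out of bias comparisons.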
Screening of high-frequency codons in the chloroplast genomes of the 40 Theaceae species.
| Amino acid | High-frequency codons |
|---|---|
| A(Ala) | GCUh |
| C(Cys) | UGUh |
| D(Asp) | GAUh |
| E(Glu) | GAAh |
| F(Phe) | UUUh , |
| G(Gly) | GGAh |
| H(His) | CAUh |
| I(Ile) | AUUh |
| K(Lys) | AAAh |
| L(Leu) | UUAh |
| N(Asn) | AAUh |
| P(Pro) | CCUh |
| Q(Gln) | CAAh , |
| R(Arg) | AGAh |
| S(Ser) | UCUh |
| T(Thr) | ACUh |
| V(Val) | GUAh , |
| Y(Tyr) | UAUh |
| TER | |
Codons in bold font are high-frequency codons with differences in the chloroplast genomes of 40 Theaceae species.
FIGURE 2. Correspondence analysis (COA) of the chloroplast genomes of 40 Theaceae species. In all the Theaceae species, the dots representing different genes are separated from each other. The numbers (A–AN) of the 40 species are shown in Table 1.
FIGURE 3. ENc-GC3s plot of the chloroplast genomes of 40 Theaceae species. The chloroplast genomes of the 40 Theaceae species are generally scattered in small clusters, and the genes are distributed on the left side of the figure. The numbers (A–AN) of the 40 species are shown in Table 1.
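An ENc-GC3s plot is usually read against Wright's expected curve, which gives the effective number of codons a gene would have if its codon usage were shaped by GC3s composition alone: ENc = 2 + s + 29 / (s² + (1 − s)²), with s the GC3s fraction. The sketch below evaluates that curve in Python; the helper names and the example gene values are hypothetical, and the section does not state how the authors drew their reference curve.

```python
def expected_enc(gc3s: float) -> float:
    """Wright's expected ENc for a GC3s fraction between 0 and 1."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def enc_deviation(observed_enc: float, gc3s: float) -> float:
    """Relative distance of a gene below the expected curve (hypothetical helper)."""
    expected = expected_enc(gc3s)
    return (expected - observed_enc) / expected


# Hypothetical gene with GC3s = 0.25 and an observed ENc of 45.
print(round(expected_enc(0.25), 2), round(enc_deviation(45.0, 0.25), 3))
```

Genes that fall well below the curve are commonly interpreted as being under influences other than base composition, such as selection.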
FIGURE 4. PR2 plot of the chloroplast genomes of 40 Theaceae species. Points are scattered across the four quadrants but lie mainly in the regions where G3/(G3+C3) > 0.5 and A3/(A3+T3) < 0.5. The numbers (A–AN) of the 40 species are shown in Table 1.
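The two PR2 coordinates are third-position ratios, commonly computed from fourfold-degenerate codon families only. The Python sketch below assumes that convention; the name `pr2_point` and the toy sequence are illustrative, and the authors may have selected codons differently.

```python
from collections import Counter

# First two bases of the eight fourfold-degenerate codon families
# (Ala, Pro, Gly, Val, Thr, Arg CGN, Leu CTN, Ser TCN).
FOURFOLD_PREFIXES = {"GC", "CC", "GG", "GT", "AC", "CG", "CT", "TC"}


def pr2_point(cds: str):
    """Return (G3/(G3+C3), A3/(A3+T3)) for one in-frame CDS."""
    seq = cds.upper().replace("U", "T")
    third = Counter(
        seq[i + 2]
        for i in range(0, len(seq) - len(seq) % 3, 3)
        if seq[i:i + 2] in FOURFOLD_PREFIXES
    )
    gc = third["G"] + third["C"]
    at = third["A"] + third["T"]
    if gc == 0 or at == 0:
        raise ValueError("not enough fourfold-degenerate codons to compute PR2")
    return third["G"] / gc, third["A"] / at


# Toy sequence: GCA GCT GCT GGG GGC ATG (ATG is ignored, it is not fourfold).
x, y = pr2_point("GCAGCTGCTGGGGGCATG")
print(x, y)
```

A point at (0.5, 0.5) means A = T and G = C at the third position; the region described in the caption corresponds to G used more often than C and T more often than A.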
FIGURE 5. Neutrality analysis of the chloroplast genomes of 40 Theaceae species. The distribution ranges of GC12 and GC3 are relatively narrow: GC12 lies between 0.32 and 0.57 and GC3 between 0.14 and 0.36, and most of the points fall in small clusters. The numbers (A–AN) of the 40 species are shown in Table 1.
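A neutrality plot is typically summarised by regressing GC12 on GC3: a slope near 1 points to mutation pressure, while a slope near 0 points to selection or other constraints on the first two codon positions. The following Python sketch fits that line by ordinary least squares; the function name and the per-gene values are made up for illustration and are not data from the study.

```python
def neutrality_slope(gc3, gc12):
    """Least-squares slope and intercept of GC12 regressed on GC3."""
    if len(gc3) != len(gc12) or len(gc3) < 2:
        raise ValueError("need two equally long lists with at least two genes")
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    if sxx == 0:
        raise ValueError("GC3 values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical per-gene fractions, not values from the paper.
slope, intercept = neutrality_slope([0.2, 0.3, 0.4], [0.40, 0.42, 0.44])
print(round(slope, 3), round(intercept, 3))
```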
Correlation between the SCUO and MILC values in the different chloroplast genomes of 40 Theaceae species.
| Species | Correlation coefficient | P value |
|---|---|---|
| | 0.463** | 0.000 |
| | 0.461** | 0.000 |
| | 0.467** | 0.000 |
| | 0.398** | 0.002 |
| | 0.448** | 0.001 |
| | 0.444** | 0.000 |
| | 0.440** | 0.001 |
| | 0.419** | 0.001 |
| | 0.455** | 0.000 |
| | 0.456** | 0.000 |
| | 0.483** | 0.000 |
| | 0.478** | 0.000 |
| | 0.426** | 0.001 |
| | 0.450** | 0.000 |
| | 0.458** | 0.000 |
| | 0.459** | 0.000 |
| | 0.449** | 0.000 |
| | 0.418** | 0.001 |
| | 0.467** | 0.000 |
| | 0.459** | 0.001 |
| | 0.418** | 0.001 |
| | 0.467** | 0.000 |
| | 0.480** | 0.001 |
| | 0.490** | 0.000 |
| | 0.440** | 0.000 |
| | 0.432** | 0.000 |
| | 0.478** | 0.005 |
| | 0.398** | 0.001 |
| | 0.480** | 0.000 |
| | 0.466** | 0.001 |
| | 0.450** | 0.000 |
| | 0.443** | 0.000 |
| | 0.451** | 0.000 |
| | 0.435** | 0.000 |
| | 0.416** | 0.001 |
| | 0.440** | 0.000 |
| | 0.480** | 0.000 |
| | 0.461** | 0.001 |
| | 0.482** | 0.001 |
| | 0.383** | 0.000 |
**Significant at the P < 0.01 level (two-tailed).
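Each coefficient in the table above relates per-gene SCUO values to per-gene MILC values within one species. As a rough illustration, the Python sketch below computes a Pearson correlation coefficient; this section does not say whether the SPSS analysis used Pearson or a rank-based statistic, so Pearson is an assumption, and the input values are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Hypothetical per-gene SCUO and MILC values for one species.
scuo = [0.18, 0.22, 0.25, 0.30, 0.21]
milc = [0.52, 0.55, 0.54, 0.60, 0.57]
r = pearson_r(scuo, milc)
print(round(r, 3))
```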
FIGURE 6. Phylogenetic analysis of the chloroplast genomes of 40 Theaceae species. The clustering results and the phylogenetic tree are each divided into two branches. The topology of the phylogenetic tree is, to a certain extent, similar to the clustering results based on the codon RSCU values of the chloroplast genomes.