Mebeaselassie Andargie, Zhu Congyi.
Abstract
Sesamum indicum is an ancient oil crop grown in tropical and subtropical areas of the world. We analyzed 23,538 coding sequences (CDS) of S. indicum to understand the factors shaping codon usage in this important oil crop. We identified eleven highly preferred codons in S. indicum that end in A or T. The slope of the neutrality plot was less than one, while the effective number of codons (ENC) plot showed genes distributed both above and below the standard curve. There is a significant relationship between protein length and relative synonymous codon usage (RSCU) at the primary axis, whereas the correlation between protein length and Nc values is weak. Correspondence analysis conducted on RSCU values differentiated CDS based on their GC content and their characteristic features and showed a discrete distribution. Moreover, by determining codon usage, we found that the majority of the lignan biosynthesis related genes showed a weaker codon usage bias. These results provide insights into codon evolution in sesame.
Keywords: Codon usage bias; Natural selection; Optimal codon; Sesamum indicum
Year: 2021 PMID: 35106386 PMCID: PMC8789531 DOI: 10.1016/j.heliyon.2021.e08687
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Codon usage of Sesamum indicum.
| Amino acid | Codon | N | RSCU | Amino acid | Codon | N | RSCU |
|---|---|---|---|---|---|---|---|
| Phe | UUU | 218,627 | 1.08 | Ser | UCU | 210,054 | |
| | UUC | 187,400 | 0.92 | | UCC | 134,481 | 0.92 |
| Leu | UUA | 108,212 | | | UCA | 185,541 | 1.27 |
| | UUG | 234,522 | | | UCG | 82,163 | |
| | CUU | 215,505 | | Pro | CCU | 174,638 | |
| | CUC | 142,937 | 0.90 | | CCC | 89,309 | |
| | CUA | 90,524 | | | CCA | 165,595 | |
| | CUG | 160,661 | 1.01 | | CCG | 77,842 | |
| Ile | AUU | 232,353 | | Thr | ACU | 164,709 | |
| | AUC | 148,610 | 0.87 | | ACC | 103,609 | 0.86 |
| | AUA | 133,024 | | | ACA | 146,978 | 1.22 |
| Met | AUG | 238,739 | 1.00 | | ACG | 64,844 | |
| Val | GUU | 237,229 | | Ala | GCU | 264,788 | |
| | GUC | 121,122 | | | GCC | 143,209 | 0.82 |
| | GUA | 96,779 | | | GCA | 212,485 | 1.21 |
| | GUG | 190,067 | 1.18 | | GCG | 79,502 | |
| Tyr | UAU | 153,170 | 1.11 | Ter | | 10,538 | |
| | UAC | 121,840 | 0.89 | | UAA | 7,089 | 0.90 |
| Cys | UGU | 90,992 | 1.01 | | | 5,910 | |
| | UGC | 89,732 | 0.99 | Trp | UGG | 126,740 | 1.00 |
| His | CAU | 143,870 | 1.22 | Arg | CGU | 66,194 | |
| | CAC | 91,674 | | | CGC | 50,368 | |
| Gln | CAA | 181,110 | 1.00 | | CGA | 59,332 | |
| | CAG | 179,813 | 1.00 | | CGG | 59,579 | |
| Asn | AAU | 254,311 | 1.19 | | AGA | 164,366 | |
| | AAC | 174,393 | 0.81 | | AGG | 142,731 | |
| Lys | AAA | 270,541 | 0.93 | Gly | GGU | 179,791 | 1.09 |
| | AAG | 313,367 | 1.07 | | GGC | 131,385 | 0.80 |
| Asp | GAU | 352,544 | | | GGA | 204,465 | 1.24 |
| | GAC | 176,211 | | | GGG | 142,337 | 0.87 |
| Glu | GAA | 331,043 | 1.03 | Ser | AGU | 143,087 | 0.98 |
| | GAG | 312,300 | 0.97 | | AGC | 124,387 | 0.85 |
Note: Codons with RSCU >1.30 are shown in bold; codons with RSCU <0.80 are underlined.
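The RSCU column can be reproduced from the codon counts (N). The sketch below uses the usual definition, in which a codon's observed count is divided by the mean count of all synonymous codons for the same amino acid. The function name `rscu` is illustrative only and is not taken from the paper; the counts are the Phe entries of the table.

```python
def rscu(counts):
    """Relative synonymous codon usage for ONE amino acid.

    counts maps each synonymous codon to its observed number of occurrences.
    RSCU = observed count / mean count over the synonymous codons.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations for this amino acid")
    mean = total / len(counts)
    return {codon: n / mean for codon, n in counts.items()}


# Phe counts as listed in the codon usage table.
phe = {"UUU": 218_627, "UUC": 187_400}
phe_rscu = rscu(phe)
for codon, value in phe_rscu.items():
    print(codon, round(value, 2))
# UUU 1.08
# UUC 0.92
```

The two printed values match the RSCU entries for Phe in the table. By construction, the RSCU values of an amino acid sum to its number of synonymous codons.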
Figure 1. Analysis of codon usage in Sesamum indicum. (A) The distribution of GC contents at the three codon positions in S. indicum genes. (B) Distribution of effective number of codons (Nc) and GC3s of S. indicum genes. Individual genes are indicated by dots, while the standard curve represents the expected Nc under random codon usage. (C) Frequency distribution of the effective number of codons (Nc) ratio. (D) Parity Rule 2 (PR2) bias plot analysis. Genes are plotted based on their GC bias [G3/(G3+C3)] and AT bias [A3/(A3+T3)] in the third codon position. (E) Neutrality plot analysis (GC12 vs GC3s) of S. indicum. GC12 is the average value of guanine and cytosine content in the first (GC1) and second (GC2) positions of the codons, whereas GC3 is the guanine and cytosine content at the third codon position.
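The per-gene quantities plotted in panels A, D and E can be derived directly from a coding sequence. The following is a minimal sketch under stated assumptions: the function name `position_composition` and the toy sequence are hypothetical, and GC3 is computed here over all codons, whereas GC3s as used in the paper is restricted to synonymous codons.

```python
from collections import Counter


def position_composition(cds):
    """GC content per codon position plus PR2 biases for one CDS.

    Simplification: every codon is counted, including Met, Trp and stop
    codons, so GC3 here is not identical to GC3s.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("CDS length must be a positive multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]

    def gc_at(pos):
        return sum(codon[pos] in "GC" for codon in codons) / len(codons)

    third = Counter(codon[2] for codon in codons)

    def bias(x, y):
        total = third[x] + third[y]
        return third[x] / total if total else float("nan")

    gc1, gc2, gc3 = gc_at(0), gc_at(1), gc_at(2)
    return {
        "GC1": gc1,
        "GC2": gc2,
        "GC3": gc3,
        "GC12": (gc1 + gc2) / 2,      # y-axis of the neutrality plot
        "G3/(G3+C3)": bias("G", "C"),  # PR2 GC bias
        "A3/(A3+T3)": bias("A", "T"),  # PR2 AT bias
    }


# Toy sequence (hypothetical): ATG GCC AAA TTT GGA
stats = position_composition("ATGGCCAAATTTGGA")
print(stats)
```

In a neutrality plot, GC12 is regressed on GC3 across genes; a slope well below one is usually read as evidence that mutation pressure alone does not explain the codon usage pattern.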
Figure 2. Effects of nucleotide composition on codon usage bias. (A) The relative first 20 factors from correspondence analysis according to their amino acid proportions. The line represents the cumulative total of the inertia explained by the first 20 axes. (B) Correspondence analysis of RSCU values of genes. Red, blue and green dots indicate genes with guanine and cytosine contents of <45%, ≥45% and <60%, and ≥60%, respectively. (C) Correspondence analysis on RSCU values of codons. Blue, yellow, red and green dots indicate other genes, ribosomal genes, genes with a Gravy value >0.3 and genes with an Aromo value ≥0.15, respectively.
Correlation analysis of axis 1 and axis 2 with overall nucleotide composition and effective number of codons.
| | A | T | C | G | A3 | T3 | C3 | G3 | GC% | GC1% | GC2% | GC3% | Nc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Axis 1 | 0.3630 | 0.3373 | 0.1798 | 0.2733 | 0.4106 | − | 0.4038 | 0.2047 | −0.7859 | −0.1717 | −0.3019 | −0.8551 | −0.08383 |
| P | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
| Axis 2 | 0.04344 | 0.1770 | 0.1351 | 0.04594 | 0.08181 | 0.1854 | 0.1329 | −0.01512 | −0.2078 | −0.2643 | 0.07242 | −0.2071 | 0.05478 |
| P | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.0204 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
A, T, C, G = frequency of each individual base; A3, T3, C3 and G3 = frequency of each individual base A, T, C and G at the third position of codons; GC = total guanine-cytosine content of the entire gene; GC1 = GC content in the first position of codons; GC2 = GC content in the second position of codons; GC3 = GC content in the third position of codons; Nc = effective codon number. Significant difference at P < 0.0001. Bold value indicates that there is no significant correlation at P < 0.0001.
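Each coefficient in the table relates a gene's coordinate on a correspondence-analysis axis to one compositional index. The source does not state which correlation coefficient was used; the sketch below assumes Pearson's r purely for illustration, and both the function name `pearson_r` and the five data points are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-gene values, NOT data from the paper.
axis1_coord = [-0.42, -0.10, 0.05, 0.31, 0.55]
gc3_fraction = [0.62, 0.51, 0.47, 0.40, 0.33]

r = pearson_r(axis1_coord, gc3_fraction)
print(f"r = {r:.3f}")  # negative: GC3 falls as the axis 1 coordinate rises
```

With 23,538 genes, even small coefficients can reach P < 0.0001, so the magnitude of r is more informative than the P value alone.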
Correlation analysis of axis 1 and axis 2 with Protein length, Aromo and GRAVY values.
| | Protein length | Aromo | GRAVY |
|---|---|---|---|
| Axis 1 | 0.3239 | −0.03343 | −0.06806 |
| P | 0.000 | 0.000 | 0.000 |
| Axis 2 | 0.05092 | 0.1123 | |
| P | 0.000 | 0.000 |
Aromo = frequency of aromatic amino acids; GRAVY = General average hydrophobicity. Significant difference at P < 0.0001. Bold value indicates that there is no significant correlation at P < 0.0001.
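For readers unfamiliar with the two protein indices, the sketch below shows one common way to compute them from a protein sequence. It assumes GRAVY is the mean Kyte-Doolittle hydropathy of the residues and Aromo is the fraction of Phe, Tyr and Trp residues; the paper does not spell out either formula, and the function names and example peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (assumed to be the scale behind GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = frozenset("FYW")


def gravy(protein):
    """Mean hydropathy over all residues; raises KeyError on unknown residues."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty protein sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


def aromo(protein):
    """Fraction of residues that are aromatic (Phe, Tyr, Trp)."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty protein sequence")
    return sum(aa in AROMATIC for aa in protein) / len(protein)


# Hypothetical toy peptides.
print(round(gravy("MKV"), 3), aromo("MKV"))
print(round(gravy("FWY"), 3), aromo("FWY"))
```

A negative GRAVY indicates a hydrophilic protein on average, which is consistent with the negative mean Gravy reported for the full gene set.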
Figure 3. Plot of protein length versus Nc value variation.
Correlation analysis of Nc with GRAVY and Aromo values.
| | GRAVY | Aromo |
|---|---|---|
| Nc | 0.07161 | 0.1132 |
| P | 0.000 | 0.000 |
GRAVY = General average hydrophobicity; Aromo = frequency of aromatic amino acids; Nc = effective codon number. Significant difference at P < 0.0001.
Figure 4. Codon usage analysis in S. indicum lignan biosynthesis related genes. (A) Comparisons of GC12/GC3 ratio and Nc values among different lignan biosynthesis related genes. The average Nc value is 53.54, while the general GC12/GC3 value is 1.04. (B) Correspondence analysis of the synonymous codon usage of S. indicum lignan biosynthesis related genes. Codons with different ending bases are marked in the figure, where red, green, blue and yellow refer to codons ending with A, T, C and G, respectively.
Means and standard deviations of several index numbers from 23,538 genes in Sesamum indicum.
| Class | Genes | Codons | GC all (%) | GC1 (%) | GC2 (%) | GC3 (%) | GC3s (%) | T3s (%) | C3s (%) | A3s (%) | G3s (%) | Gravy | Aromo | ENC | CAI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total | 23,538 | 9,911,268 | 46.89 ± 4.39 | 51.95 ± 6.22 | 41.86 ± 4.68 | 46.85 ± 4.19 | 44.86 ± 9.37 | 37.47 ± 7.59 | 27.11 ± 7.87 | 30.49 ± 6.68 | 30.21 ± 6.81 | −0.31 ± 0.37 | 0.08 ± 0.03 | 53.54 ± 4.53 | 0.20 ± 0.03 |
GC all = total guanine-cytosine content of the entire gene; GC1, GC2 and GC3 = GC content at the first, second and third codon positions; GC3s = proportion of GC nucleotides at the third (variable) coding position of synonymous codons; T3s, C3s, A3s and G3s = frequency of each individual base T, C, A and G at the third position of codons; Gravy = General average hydropathicity; Aromo = frequency of aromatic amino acids; ENC = effective codon number; CAI = Codon adaptation index.
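To put the mean ENC in context, it can be compared with the value expected when GC3s composition alone constrains codon usage. The sketch below assumes the standard curve is the widely used expression from Wright (1990), Nc = 2 + s + 29 / (s² + (1 − s)²) with s = GC3s; the source does not state the formula explicitly, and the function name `expected_nc` is illustrative.

```python
def expected_nc(gc3s):
    """Expected effective number of codons for a GC3s fraction in [0, 1].

    Assumes Wright's (1990) null curve: Nc = 2 + s + 29 / (s^2 + (1 - s)^2).
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


# Mean values from the summary table (GC3s 44.86 %, ENC 53.54).
mean_gc3s = 0.4486
observed_mean_enc = 53.54

expected = expected_nc(mean_gc3s)
print(f"expected Nc at mean GC3s: {expected:.2f}")
print(f"observed mean ENC:        {observed_mean_enc:.2f}")
```

Under these assumptions the expected value is close to 60, several units above the observed mean of 53.54. This is only a rough comparison, because the curve is nonlinear and the proper test is gene by gene, as in the ENC plot.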