Abstract
Transposable elements (TEs) are mobile genetic entities ubiquitously distributed in nearly all genomes. A high frequency of codons ending in A/T in TEs has been previously observed in some species. In this study, the biases in nucleotide composition and codon usage of TE transposases and host nuclear genes were investigated in the AT-rich genome of Arabidopsis thaliana and the GC-rich genome of Oryza sativa. Codons ending in A/T are used more frequently by TEs than by their host nuclear genes. A remarkable positive correlation between highly expressed nuclear genes and C/G-ending codons was detected in O. sativa (r=0.944 and 0.839, respectively, P<0.0001) but not in A. thaliana, indicating a close association between GC content and gene expression level in monocot species. In both species, TE codon usage biases are similar to those of weakly expressed genes. The expression and activity of TEs may be strictly controlled in plant genomes. Mutation bias and selection pressure have simultaneously acted on TE evolution in A. thaliana and O. sativa. The consistently observed biases of nucleotide composition and codon usage of TEs may also provide a useful clue for accurately detecting TE sequences in different species. Copyright 2009 Beijing Genomics Institute. Published by Elsevier Ltd. All rights reserved.
Year: 2009 PMID: 20172490 PMCID: PMC5054417 DOI: 10.1016/S1672-0229(08)60047-9
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Comparison of the AT content between A. thaliana and O. sativa across different genomic regions and genetic components
| Region | Species | Total number of codons used | %AT at the first position | %AT at the second position | %AT at the third position | Overall %AT |
|---|---|---|---|---|---|---|
| Coding sequence of host genes | A. thaliana | 394,465 | 48.8 | 59.0 | 56.6 | 54.8 |
| Coding sequence of host genes | O. sativa | 45,173,142 | 42.3 | 56.2 | 35.5 | 44.7 |
| Coding sequence of transposases | A. thaliana | 153,997 | 51.2 | 61.3 | 61.3 | 57.9 |
| Coding sequence of transposases | O. sativa | 230,627 | 45.1 | 57.3 | 50.6 | 51.0 |
| Non-coding sequences: intron regions | A. thaliana | / | / | / | / | 67.6 |
| Non-coding sequences: intron regions | O. sativa | / | / | / | / | 63.0 |
| Non-coding sequences: intergenic regions | A. thaliana | / | / | / | / | 64.0 |
| Non-coding sequences: intergenic regions | O. sativa | / | / | / | / | 56.7 |
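The position-specific %AT values in the table above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `at_content_by_position` is hypothetical, the input is assumed to be a single in-frame coding sequence, and whether the published figures include stop codons is not stated in this record, so the sketch simply counts every complete codon it is given.

```python
def at_content_by_position(cds: str) -> dict:
    """Return %AT at codon positions 1-3 and overall for one in-frame CDS.

    Assumption: the sequence starts at codon position 1; a trailing
    partial codon, if any, is ignored.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA letters
    usable = len(seq) - len(seq) % 3     # length covered by complete codons
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")

    counts = [0, 0, 0]                   # A/T counts per codon position
    for i in range(usable):
        if seq[i] in "AT":
            counts[i % 3] += 1

    result = {f"pos{p + 1}": 100.0 * counts[p] / n_codons for p in range(3)}
    result["overall"] = 100.0 * sum(counts) / usable
    return result


if __name__ == "__main__":
    # Toy sequence, not data from the study.
    print(at_content_by_position("ATGAAATAA"))
```

For a whole gene set, the per-position counts would be summed over all sequences before dividing, rather than averaging per-gene percentages; which of the two the study used is not stated here.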
Figure 1. Distribution and correlation of GC3 and ENC in host nuclear genes and TEs. A. A. thaliana nuclear genes; B. A. thaliana TEs; C. O. sativa nuclear genes; D. O. sativa TEs.
Average relative frequency (RSCU) of 59 degenerate codons for highly and weakly expressed host nuclear genes and TEs in A. thaliana and O. sativa
| No. | Amino acid | Codon | A. thaliana genes (high) | A. thaliana genes (weak) | A. thaliana TEs | O. sativa genes (high) | O. sativa genes (weak) | O. sativa TEs |
|---|---|---|---|---|---|---|---|---|
| 1 | K | AAA | 0.30 | 0.48 | 0.47 | 0.02 | 0.47 | 0.40 |
| 2 | K | AAG | ||||||
| 3 | N | AAU | 0.27 | 0.03 | ||||
| 4 | N | AAC | 0.37 | 0.40 | 0.34 | 0.44 | ||
| 5 | I | AUA | 0.10 | 0.30 | 0.26 | 0.02 | 0.28 | 0.23 |
| 6 | I | AUU | 0.32 | 0.02 | ||||
| 7 | I | AUC | 0.24 | 0.29 | 0.24 | 0.33 | ||
| 8 | T | ACA | 0.18 | 0.34 | 0.02 | 0.38 | 0.30 | |
| 9 | T | ACU | 0.32 | 0.02 | 0.36 | 0.30 | ||
| 10 | T | ACC | 0.16 | 0.16 | 0.48 | 0.19 | 0.25 | |
| 11 | T | ACG | 0.12 | 0.11 | 0.12 | 0.07 | 0.15 | |
| 12 | R | AGA | 0.05 | 0.13 | 0.14 | 0.01 | 0.12 | 0.13 |
| 13 | R | AGA | 0.10 | 0.06 | 0.07 | 0.08 | 0.16 | |
| 14 | R | CGU | 0.09 | 0.09 | 0.07 | 0.31 | 0.08 | 0.13 |
| 15 | R | CGC | 0.20 | 0.24 | 0.21 | 0.17 | 0.24 | |
| 16 | R | CGG | 0.11 | 0.14 | 0.01 | 0.15 | 0.17 | |
| 17 | R | CGG | 0.23 | 0.01 | 0.20 | |||
| 18 | S | AGA | 0.24 | 0.08 | 0.10 | 0.12 | 0.15 | |
| 19 | S | AGU | 0.15 | 0.24 | 0.24 | 0.01 | 0.24 | 0.19 |
| 20 | S | UCU | 0.11 | 0.08 | 0.08 | 0.33 | 0.06 | 0.12 |
| 21 | S | UCC | 0.07 | 0.20 | 0.20 | 0.00 | 0.19 | 0.16 |
| 22 | S | UCC | 0.01 | |||||
| 23 | S | UCG | 0.16 | 0.11 | 0.11 | 0.29 | 0.12 | 0.15 |
| 24 | Y | UAU | 0.27 | 0.01 | ||||
| 25 | Y | UAC | 0.37 | 0.39 | 0.36 | 0.49 | ||
| 26 | L | CUA | 0.07 | 0.11 | 0.12 | 0.00 | 0.12 | 0.13 |
| 27 | L | CUA | 0.11 | 0.13 | 0.11 | 0.16 | ||
| 28 | L | CUU | 0.07 | 0.14 | 0.11 | 0.36 | 0.14 | 0.13 |
| 29 | L | CUC | 0.19 | 0.23 | 0.03 | 0.21 | ||
| 30 | L | UUG | 0.28 | 0.22 | 0.01 | 0.22 | ||
| 31 | L | UUG | 0.05 | 0.16 | 0.17 | 0.00 | 0.15 | 0.10 |
| 32 | F | UUU | 0.38 | 0.40 | 0.40 | 0.46 | ||
| 33 | F | UUC | 0.31 | 0.01 | ||||
| 34 | C | UGU | 0.41 | 0.00 | ||||
| 35 | C | UGC | 0.34 | 0.32 | 0.45 | 0.49 | ||
| 36 | Q | CAA | 0.44 | 0.03 | 0.48 | |||
| 37 | Q | CAG | 0.45 | 0.35 | 0.40 | |||
| 38 | H | CAU | 0.37 | 0.05 | ||||
| 39 | H | CAC | 0.30 | 0.35 | 0.28 | 0.48 | ||
| 40 | P | CCA | 0.35 | 0.03 | 0.30 | |||
| 41 | P | CCU | 0.29 | 0.35 | 0.02 | 0.40 | ||
| 42 | P | CCC | 0.17 | 0.09 | 0.11 | 0.27 | 0.11 | 0.18 |
| 43 | P | CCG | 0.21 | 0.12 | 0.14 | 0.08 | 0.20 | |
| 44 | E | GAA | 0.32 | 0.03 | 0.42 | |||
| 45 | E | GAG | 0.47 | 0.42 | 0.43 | |||
| 46 | D | GAU | 0.46 | 0.03 | ||||
| 47 | D | GAC | 0.26 | 0.30 | 0.26 | 0.41 | ||
| 48 | V | GUA | 0.05 | 0.17 | 0.18 | 0.00 | 0.19 | 0.15 |
| 49 | V | GUU | 0.32 | 0.01 | ||||
| 50 | V | GUC | 0.12 | 0.17 | 0.47 | 0.15 | 0.22 | |
| 51 | V | GUG | 0.24 | 0.26 | 0.26 | 0.23 | 0.31 | |
| 52 | A | GCA | 0.17 | 0.35 | 0.34 | 0.01 | 0.36 | 0.27 |
| 53 | A | GCU | 0.01 | |||||
| 54 | A | GCC | 0.28 | 0.11 | 0.13 | 0.48 | 0.15 | 0.24 |
| 55 | A | GCG | 0.14 | 0.10 | 0.12 | 0.09 | 0.18 | |
| 56 | G | GGA | 0.34 | 0.34 | 0.03 | 0.30 | 0.27 | |
| 57 | G | GGU | 0.33 | 0.02 | ||||
| 58 | G | GGC | 0.17 | 0.12 | 0.13 | 0.20 | 0.24 | |
| 59 | G | GGG | 0.09 | 0.18 | 0.16 | 0.27 | 0.17 | 0.20 |
Note: A codon with the highest RSCU value for each amino acid is indicated in boldface.
Codons reported by Lerat et al. (.
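As a rough illustration of how per-codon values like those in the table above can be derived, the following Python sketch computes the relative frequency of each synonymous codon within its amino acid family. It is an assumption-laden sketch, not the authors' method: the names `CODON_TABLE`, `SYNONYMS` and `relative_codon_frequency` are hypothetical, the standard genetic code is assumed, and codons are reported with T rather than U.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous families for the 59 degenerate codons
# (stop codons, Met and Trp are excluded).
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        SYNONYMS.setdefault(aa, []).append(codon)


def relative_codon_frequency(cds: str) -> dict:
    """Frequency of each codon among the synonymous codons of its amino acid."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    freqs = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in codons:
            freqs[c] = counts[c] / total
    return freqs


if __name__ == "__main__":
    # Toy sequence, not data from the study.
    print(relative_codon_frequency("AAAAAGAAGAAG"))
```

Frequencies computed this way sum to 1 within each amino acid. The classical RSCU statistic instead multiplies each frequency by the number of synonymous codons in the family, so that a value of 1 means no bias.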
Major factors of variation in synonymous codon usage in TEs and host nuclear genes
| Subject | Source of variation | Axis 1: total variability | Axis 1: correlation coefficient (r-value) | Axis 2: total variability | Axis 2: correlation coefficient (r-value) |
|---|---|---|---|---|---|
| | A3 | 9.89 | −0.64 | 7.42 | 0.11 |
| | C3 | | 0.85 | | −0.20 |
| | G3 | | – | | 0.37 |
| | T3 | | −0.50 | | −0.26 |
| | GC3 | | 0.81 | | 0.11 |
| | GC | | 0.71 | | – |
| | ENC | | – | | 0.34 |
| | A3 | 35.17 | 0.31 | 7.47 | −0.53 |
| | C3 | | −0.93 | | – |
| | G3 | | −0.15 | | – |
| | T3 | | 0.76 | | 0.51 |
| | GC3 | | −0.83 | | – |
| | GC | | −0.80 | | – |
| | ENC | | −0.67 | | – |
| | A3 | 50.78 | −0.96 | 4.64 | 0.13 |
| | C3 | | 0.94 | | −0.23 |
| | G3 | | 0.84 | | 0.27 |
| | T3 | | −0.98 | | – |
| | GC3 | | 1.00 | | – |
| | GC | | 0.96 | | – |
| | ENC | | −0.91 | | – |
| | A3 | 30.14 | −0.79 | 9.47 | −0.47 |
| | C3 | | 0.94 | | – |
| | G3 | | 0.76 | | – |
| | T3 | | −0.92 | | 0.32 |
| | GC3 | | 0.99 | | – |
| | GC | | 0.95 | | – |
| | ENC | | 0.21 | | −0.29 |
Note: Only significant correlation coefficients are listed (P<0.0001).
Figure 2. Correspondence analysis plots of the major explanatory axes of A. thaliana nuclear genes and TEs. A. A. thaliana nuclear genes; B. A. thaliana TEs; C. 59 synonymous codons of A. thaliana nuclear genes; D. 59 synonymous codons of A. thaliana TEs.
Figure 3. Correspondence analysis plots of the major explanatory axes of O. sativa nuclear genes and TEs. A. O. sativa nuclear genes; B. O. sativa TEs; C. 59 synonymous codons of O. sativa nuclear genes; D. 59 synonymous codons of O. sativa TEs.