Siddhartha Sankar Satapathy, Bhesh Raj Powdel, Alak Kumar Buragohain, Suvendra Kumar Ray
Abstract
The different triplets encoding the same amino acid, termed synonymous codons, are not equally abundant in a genome. Factors such as G + C% and tRNA are known to influence their abundance in a genome. However, the order of the nucleotides in each codon per se might be another factor affecting its abundance. Among the synonymous codons for a given amino acid, those used preferentially in high expression genes (HEGs) are referred to as 'optimal codons' (OCs). In this study, we compared the OCs of 18 amino acids in 221 species of bacteria. We observed an amino acid-specific influence on the selection of OCs. Phylogeny also influences the choice of OCs for some amino acids, such as Glu, Gln, Lys and Leu. Comparison of the abundance values of synonymous codons with the same G + C content also supports the phenomenon of codon bias. It is likely that the order of the nucleotides in the triplet codon is also involved in codon usage bias in organisms.
Keywords: bacteria; codon usage bias; genome composition; high expression genes; optimal codon
Year: 2016; PMID: 27426467; PMCID: PMC5066170; DOI: 10.1093/dnares/dsw027
Source DB: PubMed; Journal: DNA Res; ISSN: 1340-2838; Impact factor: 4.458
Bacteria belonging to different phylogenetic groups and their genomic G + C% considered in the present study
| S. No. | Group | No. of strains | Maximum G + C% | Minimum G + C% | Average G + C% |
|---|---|---|---|---|---|
| 1 | Actinobacteria | 23 | 72.83 | 46.31 | 64.45 |
| 2 | α-Proteobacteria | 35 | 68.79 | 27.51 | 48.11 |
| 3 | Aquificae | 1 | 43.3 | 43.3 | 43.3 |
| 4 | Bacteroidetes | 8 | 66.13 | 22.44 | 40.26 |
| 5 | β-Proteobacteria | 17 | 68.49 | 50.72 | 63.47 |
| 6 | Chlamydiae | 6 | 41.31 | 39.19 | 40.10 |
| 7 | Chlorobi | 1 | 56.53 | 56.53 | 56.53 |
| 8 | Chloroflexi | 1 | 47.03 | 47.03 | 47.03 |
| 9 | Cyanobacteria | 8 | 62.00 | 31.32 | 47.09 |
| 10 | δ-Proteobacteria | 6 | 71.38 | 50.65 | 62.27 |
| 11 | ε-Proteobacteria | 11 | 44.54 | 27.05 | 36.56 |
| 12 | Firmicutes | 38 | 56.98 | 28.21 | 38.44 |
| 13 | γ-Proteobacteria | 47 | 68.67 | 22.48 | 48.19 |
| 14 | Spirochaetes | 5 | 40.24 | 27.77 | 33.81 |
| 15 | Tenericutes | 14 | 40.01 | 23.77 | 28.58 |
Note: The Aquificae, Chlorobi and Chloroflexi groups, each consisting of only one bacterium, were not considered in the analysis shown in Fig. 1b.
Figure 1b. Optimal codons of 18 amino acids in bacteria within different phylogenetic groups. As described for Fig. 1a, the OCs identified in 221 bacteria were considered in 15 groups (Table 1b) according to phylogeny. Of the 15 groups, the three groups having only one bacterium each were not considered for further analysis. The colour code represents the % of bacteria in the phylogenetic group in which the codon was observed more frequently in the HEGs than in the whole genome. For the raw data, please refer to Supplementary Table 3.
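The colour code in Fig. 1b is a per-group percentage: for each codon, the share of bacteria in a group for which that codon was called an OC. A minimal Python sketch of this aggregation follows, assuming each bacterium's OC set has already been determined. The function name `percent_with_oc`, its parameters and the bacterium labels are hypothetical illustrations, not the authors' code.

```python
from collections import Counter, defaultdict


def percent_with_oc(ocs_by_bacterium, group_of, min_members=2):
    """Percentage of bacteria in each group for which each codon was called an OC.

    ocs_by_bacterium: bacterium name -> set of its optimal codons (hypothetical input)
    group_of: bacterium name -> phylogenetic group
    Groups with fewer than min_members bacteria are skipped, mirroring the
    exclusion of single-genome groups from Fig. 1b.
    """
    members = defaultdict(list)
    for bacterium, group in group_of.items():
        members[group].append(bacterium)
    result = {}
    for group, bacteria in members.items():
        if len(bacteria) < min_members:
            continue  # e.g. a group represented by a single genome
        counts = Counter(c for b in bacteria for c in ocs_by_bacterium.get(b, set()))
        result[group] = {codon: 100.0 * n / len(bacteria) for codon, n in counts.items()}
    return result
```

For example, with two Actinobacteria where both call AAG an OC and only one calls GAG an OC, the sketch returns 100.0 for AAG and 50.0 for GAG in that group, while a one-member group is dropped.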
Number of bacteria studied in the HTN and the LTN groups
| G + C% group | HTN | LTN |
|---|---|---|
| VH | 21 | 13 |
| H | 30 | 8 |
| M | 24 | 10 |
| L | 32 | 35 |
| VL | 12 | 36 |
| Total | 119 | 102 |
VH (very high; G + C% ≥ 65), H (high; 55 ≤ G + C% < 65), M (moderate; 45 ≤ G + C% < 55), L (low; 35 ≤ G + C% < 45) and VL (very low; G + C% < 35).
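The five G + C% classes above are half-open intervals with lower bounds at 65, 55, 45 and 35. A minimal Python sketch of the binning follows, assuming G + C% is computed over unambiguous A/C/G/T bases only (the source does not say how ambiguous bases were handled). The function names `gc_percent` and `gc_group` are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """G + C content as a percentage of A/C/G/T bases (other symbols are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_group(gc: float) -> str:
    """Map a genomic G + C% onto the VH/H/M/L/VL classes defined in the legend."""
    if gc >= 65.0:
        return "VH"
    if gc >= 55.0:
        return "H"
    if gc >= 45.0:
        return "M"
    if gc >= 35.0:
        return "L"
    return "VL"
```

Under these bins, the Actinobacteria average of 64.45% falls in H, while the group maximum of 72.83% falls in VH.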
Figure 1a. Optimal codons of 18 amino acids in the HTN and LTN groups of bacteria within different genomic G + C% groups. A codon is considered an OC if it is used more frequently in the set of HEGs than in all the genes of a genome. To avoid borderline cases, we considered as OCs only those codons whose abundance values in the HEGs were at least 5% higher than those in all the genes of the genome. OCs were identified in 221 bacteria (Supplementary Table 1). The bacteria were considered in two groups: (A) HTN and (B) LTN. Within each group, bacteria were further divided into five sub-groups according to their genome G + C%: (i) VH, 65.0 ≤ G + C%; (ii) H, 55.0 ≤ G + C% < 65.0; (iii) M, 45.0 ≤ G + C% < 55.0; (iv) L, 35.0 ≤ G + C% < 45.0; and (v) VL, G + C% < 35.0. The numbers of bacteria in each sub-group are given in Table 1. The colour code represents the % of bacteria in the G + C% group in which the codon was observed more frequently in the HEGs than in the whole genome. For the raw data, please refer to Supplementary Table 2.
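The OC criterion compares a codon's usage among its synonymous codons in the HEGs with its usage across all genes. A minimal Python sketch follows, assuming the HEG set is supplied by the user and that the 5% cut-off means 5 percentage points of within-amino-acid usage. The source does not say whether the threshold is absolute or relative, and all function names here are hypothetical. Met and Trp (single codons) and stop codons are skipped, which leaves the 18 amino acids studied.

```python
from collections import Counter

# Standard genetic code (NCBI table 1), codons written with T instead of U.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_counts(genes):
    """Count in-frame codons over a list of coding sequences."""
    counts = Counter()
    for seq in genes:
        seq = seq.upper()
        for p in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[p:p + 3]
            if codon in CODON_TABLE:  # skips codons containing N or other symbols
                counts[codon] += 1
    return counts


def relative_usage(counts):
    """Frequency of each codon among all codons of its own amino acid."""
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


def optimal_codons(heg_genes, all_genes, min_diff=0.05):
    """Codons whose relative usage in HEGs exceeds the whole-genome value by >= min_diff.

    ASSUMPTION: min_diff is an absolute difference in within-amino-acid usage.
    """
    heg = relative_usage(codon_counts(heg_genes))
    genome = relative_usage(codon_counts(all_genes))
    ocs = set()
    for codon, aa in CODON_TABLE.items():
        if aa in "*MW":  # stops, Met and Trp have no synonymous alternatives
            continue
        if heg.get(codon, 0.0) - genome.get(codon, 0.0) >= min_diff:
            ocs.add(codon)
    return ocs
```

With a toy HEG using AAG for three of four Lys codons, against a genome where AAG and AAA are equally used, the sketch calls AAG (but not AAA) an OC.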
A general pattern indicating amino acid-specific determination of OCs in different bacterial groups
| Degeneracy | Amino acids | Selected OCs in the high expression genes |
|---|---|---|
| Two (NNY) | Phe, Tyr, Asn, His and Asp | C-ending codons in all genomes |
|  | Cys | C-ending codon in G + C high genomes (e.g. Actinobacteria, β-Proteobacteria) and U-ending codon in G + C low genomes (e.g. Firmicutes) |
| Two (NNR) | Gln and Lys | G-ending codons in G + C high genomes (e.g. Actinobacteria, β-Proteobacteria) and A-ending codons in G + C low genomes (e.g. Firmicutes). The exception is Spirochaetes (G + C low), where the G-ending codon is selected |
|  | Glu | G-ending codons in some G + C high and G + C low genomes (e.g. Actinobacteria and Spirochaetes, respectively) and A-ending codons in some G + C high and G + C low genomes (e.g. β-Proteobacteria and Firmicutes, respectively) |
| Three | Ile | C-ending codons in G + C high and U/C-ending codons in G + C low genomes |
| Four | Val, Thr and Ala | Y-ending codons (C-ending codons in G + C high and U-ending in G + C low genomes) |
|  | Pro | R-ending codons (G-ending codons in G + C high and A-ending in G + C low genomes) |
|  | Gly | Y-ending codons (U/C-ending codons in G + C high and U-ending in G + C low genomes) |
| Six | Leu and Ser | Family box codons (G/C-ending codons in G + C high and A/U-ending in G + C low genomes) |
|  | Arg | Y-ending codons (U/C-ending codons in G + C high and U-ending codon in G + C low genomes) |
*All are statistically significant (chi-square test, P < 0.01).
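The degeneracy classes in the table (two-fold NNY/NNR, three-, four- and six-fold) follow from the standard genetic code. A minimal Python sketch that derives them follows, assuming NCBI translation table 1 with T in place of U (so "Y-ending" means T/C-ending). Amino acids are keyed by one-letter codes rather than the table's three-letter names, and the function names are hypothetical.

```python
from collections import defaultdict

# Standard genetic code (NCBI table 1), codons written with T instead of U.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_families():
    """Group sense codons by the amino acid they encode (stop codons excluded)."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    return dict(families)


def degeneracy_class(codons):
    """Label a synonymous family as in the table; two-fold families split by third base."""
    n = len(codons)
    if n == 2:
        third_bases = {c[2] for c in codons}
        return "Two (NNY)" if third_bases <= set("TC") else "Two (NNR)"
    return {1: "One", 3: "Three", 4: "Four", 6: "Six"}[n]
```

Applied to the standard code, this places Phe, Tyr, Asn, His, Asp and Cys in the NNY class, Gln, Lys and Glu in the NNR class, Ile in the three-fold class, and Leu, Ser and Arg in the six-fold class, matching the table's grouping.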
Comparison of the abundance values of G3- and C3-ending synonymous codons within family boxes in bacteria with the same G + C composition
| AA | Group | VH | H | M |
|---|---|---|---|---|
| Val | HTN | G3 > C3 | G3 > C3 | G3 > C3 |
|  | LTN | G3∼C3 | G3 > C3 | G3 > C3 |
| Pro | HTN | G3 > C3 | G3 > C3 | G3 > C3 |
|  | LTN | G3∼C3 | G3∼C3 | G3∼C3 |
| Thr | HTN | C3 > G3 | C3 > G3 | C3 > G3 |
|  | LTN | C3 > G3 | C3 > G3 | C3 > G3 |
| Ala | HTN | C3 > G3 | C3 > G3 | C3∼G3 |
|  | LTN | C3 > G3 | C3 > G3 | C3 > G3 |
| Gly | HTN | C3 > G3 | C3 > G3 | C3 > G3 |
|  | LTN | C3 > G3 | C3 > G3 | C3 > G3 |
| Leu | HTN | G3 > C3 | G3 > C3 | G3 > C3 |
|  | LTN | G3 > C3 | G3 > C3 | G3 > C3 |
| Ser | HTN | G3 > C3 | G3 > C3 | G3∼C3 |
|  | LTN | G3 > C3 | G3 > C3 | G3∼C3 |
| Arg | HTN | C3 > G3 | C3 > G3 | C3 > G3 |
|  | LTN | C3 > G3 | C3 > G3 | C3 > G3 |
*P < 0.05.
**P < 0.01 (chi-square significance test; H0: G and C are equally preferred; HA: G and C are unequally preferred).
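The footnotes describe a chi-square test of H0 that G- and C-ending codons are equally preferred. A minimal Python sketch of such a two-category goodness-of-fit test follows, assuming the inputs are counts of G3 and C3 codons within one family box. The source does not state how counts were pooled (per genome or per group), and the function names and counts used below are hypothetical. With two categories the test has 1 degree of freedom, so the upper-tail p-value is `erfc(sqrt(chi2 / 2))`.

```python
import math


def chi_square_equal(g3, c3):
    """Chi-square goodness-of-fit of G3 vs C3 counts against H0: equal preference (df = 1)."""
    total = g3 + c3
    if total == 0:
        raise ValueError("no G3 or C3 codons counted")
    expected = total / 2
    chi2 = ((g3 - expected) ** 2 + (c3 - expected) ** 2) / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def compare(g3, c3, alpha=0.05):
    """Return a table-style label ('G3 > C3', 'C3 > G3' or 'G3∼C3') and the p-value."""
    chi2, p = chi_square_equal(g3, c3)
    if p >= alpha:
        return "G3∼C3", p
    return ("G3 > C3" if g3 > c3 else "C3 > G3"), p
```

For hypothetical counts of 60 G3 and 40 C3 codons, chi2 is 4.0 and p is about 0.046. That result would earn a single asterisk (P < 0.05) but not the double asterisk (P < 0.01).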