Feng-Biao Guo, Yuan-Nong Ye, Hai-Long Zhao, Dan Lin, Wen Wei.
Abstract
There has been significant progress in understanding the process of protein translation in recent years. One of the best examples is the discovery of usage bias in successive synonymous codons and its role in eukaryotic translation efficiency. Here we observed a similar type of bias in the other two domains of life, bacteria and archaea, although the bias was much weaker than in eukaryotes. Among 136 prokaryotic genomes, 98 deviated significantly from random use of successive synonymous codons, with Z scores larger than three. Furthermore, bias strength differed significantly between prokaryotes grouped by various genomic or biochemical characteristics. Interestingly, the bias strength measured by a general Z score could be fitted well (R = 0.83, P < 10⁻¹⁵) by multiple linear regression on three genomic variables: genome size, G + C content, and tRNA gene number. A different distribution of synonymous codon pairs between protein-coding genes and intergenic sequences suggests that the bias is caused by translation selection. The present results indicate that protein translation is tuned by codon (pair) usage, and that the intensity of the regulation is associated with genome size, tRNA gene number, and G + C content.
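To make "usage bias in successive synonymous codons" concrete, the following minimal sketch counts, for one amino acid, how often each codon is followed by each synonymous codon at the next occurrence of that amino acid in a gene, and compares those counts with what independent codon choice would give. The toy sequence, the function names, and the independence-based expectation are assumptions for illustration only; the abstract does not specify how the Z scores were computed, and no Z score is derived here.

```python
from collections import Counter

# The four alanine codons of the standard genetic code.
ALA_CODONS = {"GCT", "GCC", "GCA", "GCG"}

def successive_pairs(cds, codons):
    """Count (previous, next) codon pairs over successive occurrences
    of one amino acid, reading `cds` in frame from position 0."""
    in_frame = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    hits = [c for c in in_frame if c in codons]
    return Counter(zip(hits, hits[1:]))

def expected_counts(observed):
    """Expected pair counts if the next codon were chosen independently
    of the previous one: row total * column total / grand total."""
    total = sum(observed.values())
    first, second = Counter(), Counter()
    for (prev, nxt), n in observed.items():
        first[prev] += n
        second[nxt] += n
    return {(a, b): first[a] * second[b] / total
            for a in first for b in second}

# Hypothetical toy sequence, not taken from any genome in the study.
toy_cds = "".join("ATG GCT AAA GCT GGT GCT TTT GCC GCC TAA".split())

obs = successive_pairs(toy_cds, ALA_CODONS)
exp = expected_counts(obs)

for pair in sorted(exp):
    print(pair, "observed", obs[pair], "expected", exp[pair])
```

In this toy case the pair (GCT, GCT) is seen twice against an expectation of 1.5, which is the kind of excess codon reuse the abstract describes. Deciding which pairs are isoaccepting would additionally require a codon-to-tRNA mapping for the organism, which is not shown.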
Year: 2012 PMID: 23132389 PMCID: PMC3514858 DOI: 10.1093/dnares/dss027
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Figure 1. Z score histograms for two groups (isoaccepting and non-isoaccepting) of codon pairs in three types of sequences in E. coli K12. (A) Z score histograms for the two groups of codon pairs in gene sequences. The means of the two distributions differ with a P-value of 2.2e-14. (B) Z score histograms for the two groups of codon pairs in sequences generated by random shuffling. The means of the two distributions differ with a P-value of 3.3e-11. (C) Z score histograms for the two groups of codon pairs in intergenic sequences. The means of the two distributions differ with a P-value of 1.0e-3. The difference between the two types of codon pairs in intergenic sequences is much smaller than in gene sequences and also considerably smaller than in shuffled gene sequences. Therefore, the pattern of codon reuse is present in protein-coding sequences, and the conserved pattern appears to be rooted in translation selection.
Figure 2. Histogram of the general Z scores among 136 prokaryotic genomes.
Table 1. The numbers of three kinds of synonymous codon pairs for nine amino acids in E. coli
| | Isoaccepting | | | Non-isoaccepting | | |
|---|---|---|---|---|---|---|
| Amino acid | Favoured | Neutral | Disfavoured | Favoured | Neutral | Disfavoured |
| Ala | 4 | 2 | 2 | 1 | 1 | 6 |
| Arg | 6 | 2 | 4 | 11 | 3 | 10 |
| Gly | 4 | 2 | 0 | 2 | 1 | 7 |
| Ile | 3 | 0 | 2 | 0 | 2 | 2 |
| Leu | 5 | 3 | 2 | 6 | 13 | 7 |
| Pro | 6 | 0 | 2 | 2 | 2 | 4 |
| Ser | 8 | 2 | 0 | 1 | 11 | 14 |
| Thr | 4 | 2 | 0 | 1 | 3 | 6 |
| Val | 4 | 2 | 0 | 2 | 4 | 4 |
| Total | 44 | 15 | 12 | 26 | 40 | 60 |
Codon pairs are grouped into those with and without isoacceptors (sharing a tRNA), by parsimony. Within each group, pairs were classified as favoured (≥3 s.d.), neutral (between −3 and +3 s.d.), or disfavoured (≤−3 s.d.).
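The classification rule in the table note can be written as a small function, and the Totals row can be used to compare the two groups. This is a minimal sketch: the function and variable names are illustrative, and only the ±3 s.d. cut-offs and the totals come from the table.

```python
def classify(z, threshold=3.0):
    """Label a codon pair by its Z score, using the cut-offs in the table note."""
    if z >= threshold:
        return "favoured"
    if z <= -threshold:
        return "disfavoured"
    return "neutral"

# Totals row of the table: (favoured, neutral, disfavoured).
totals = {
    "isoaccepting": (44, 15, 12),
    "non-isoaccepting": (26, 40, 60),
}

# Share of favoured pairs within each group.
favoured_share = {
    group: counts[0] / sum(counts) for group, counts in totals.items()
}

for group, share in favoured_share.items():
    print(f"{group}: {share:.2f} favoured")
```

With these totals, about 62% of isoaccepting pairs are favoured (44 of 71), against about 21% of non-isoaccepting pairs (26 of 126).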
Table 2. Taxonomic distribution of 136 prokaryotic genomes analysed in this study
| | Phylum no. | Class no. | Order no. | Family no. | Genus no. | Species no. |
|---|---|---|---|---|---|---|
| Bacteria | 14 | 19 | 35 | 55 | 76 | 107 |
| Archaea | 5 | 14 | 18 | 27 | 29 | 29 |
Table 3. The mean Z scores and SD at the level of phylum
| Phylum | Mean | SD | Genome no. | VC (SD/mean) |
|---|---|---|---|---|
| Actinobacteria | 7.49 | 3.74 | 8 | 0.50 |
| Chlamydiae | 1.61 | 0.41 | 5 | 0.26 |
| Firmicutes | 4.60 | 1.37 | 21 | 0.30 |
| Proteobacteria | 6.70 | 3.31 | 54 | 0.49 |
| Spirochaetes | 3.03 | 0.31 | 4 | 0.10 |
| Tenericutes | 1.99 | 0.81 | 5 | 0.41 |
| Crenarchaeota | 3.16 | 0.71 | 6 | 0.23 |
| Euryarchaeota | 3.39 | 1.57 | 20 | 0.46 |
| Average | 4.00 | 1.53 | 15.4 | 0.34 |
VC denotes the variance coefficient (SD/mean) in Tables 3, 4, and 5.
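As a quick consistency check on the VC column, the ratio SD/mean can be recomputed from the phylum-level rows. This is a minimal sketch; the values are copied from the table above, and the names are illustrative.

```python
# (phylum, mean Z, SD, reported VC) copied from the phylum-level table.
rows = [
    ("Actinobacteria", 7.49, 3.74, 0.50),
    ("Chlamydiae", 1.61, 0.41, 0.26),
    ("Firmicutes", 4.60, 1.37, 0.30),
    ("Proteobacteria", 6.70, 3.31, 0.49),
    ("Spirochaetes", 3.03, 0.31, 0.10),
    ("Tenericutes", 1.99, 0.81, 0.41),
    ("Crenarchaeota", 3.16, 0.71, 0.23),
    ("Euryarchaeota", 3.39, 1.57, 0.46),
]

def variation_coefficient(mean, sd):
    """VC as defined in the table header: SD divided by mean."""
    return sd / mean

# Absolute gap between the recomputed and the reported VC for each phylum.
gaps = {
    name: abs(variation_coefficient(mean, sd) - vc)
    for name, mean, sd, vc in rows
}
max_gap = max(gaps.values())
print(f"largest gap between recomputed and reported VC: {max_gap:.4f}")
```

The recomputed ratios agree with the reported column to within 0.01; small gaps are expected because the means and SDs in the table are themselves rounded.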
Table 4. The mean Z scores and SD at the level of family
| Family | Mean | SD | Genome no. | VC (SD/mean) |
|---|---|---|---|---|
| Mycobacteriaceae | 5.01 | 1.08 | 3 | 0.22 |
| Chlamydiaceae | 1.44 | 0.19 | 4 | 0.13 |
| Bacillaceae | 4.84 | 2.29 | 4 | 0.47 |
| Lactobacillaceae | 4.84 | 1.14 | 3 | 0.24 |
| Streptococcaceae | 4.92 | 1.08 | 5 | 0.22 |
| Brucellaceae | 5.33 | 0.14 | 3 | 0.03 |
| Burkholderiaceae | 8.78 | 1.76 | 3 | 0.20 |
| Neisseriaceae | 10.72 | 3.85 | 3 | 0.36 |
| Enterobacteriaceae | 7.43 | 4.29 | 7 | 0.58 |
| Pasteurellaceae | 4.32 | 0.68 | 3 | 0.16 |
| Vibrionaceae | 8.02 | 0.66 | 5 | 0.08 |
| Xanthomonadaceae | 9.75 | 1.56 | 4 | 0.16 |
| Spirochaetaceae | 2.98 | 0.37 | 3 | 0.12 |
| Mycoplasmataceae | 1.99 | 0.81 | 5 | 0.41 |
| Average | 5.74 | 1.42 | 3.9 | 0.24 |
Table 5. The mean Z scores and SD at the level of genus
| Genus | Mean | SD | Genome no. | VC (SD/mean) |
|---|---|---|---|---|
| Mycobacterium | 5.01 | 1.08 | 3 | 0.22 |
| Chlamydophila | 1.36 | 0.11 | 3 | 0.08 |
| Bacillus | 4.84 | 2.29 | 4 | 0.47 |
| Lactobacillus | 4.84 | 1.14 | 3 | 0.24 |
| Streptococcus | 5.03 | 1.22 | 4 | 0.24 |
| Brucella | 5.33 | 0.14 | 3 | 0.03 |
| Vibrio | 8.08 | 0.74 | 4 | 0.09 |
| Xanthomonas | 10.51 | 0.43 | 3 | 0.04 |
| Mycoplasma | 2.07 | 0.91 | 4 | 0.44 |
| Average | 5.23 | 0.90 | 3.4 | 0.21 |
Table 6. Comparison of the general Z scores between any two groups based on six classification criteria (a)
| Classifying criteria | Mean | SD | Genome number | P-value |
|---|---|---|---|---|
| Gram type | ||||
| Gram negative | 6.01 | 3.43 | 70 | 0.064 |
| Gram positive | 4.87 | 2.64 | 35 | |
| Growth rate (b) | ||||
| Fast | 6.28 | 3.05 | 54 | 0.028 |
| Slow | 4.92 | 3.27 | 53 | |
| Oxygen metabolism (c) | ||||
| Aerobic | 6.39 | 3.05 | 35 | 0.017 |
| Anaerobic | 4.58 | 2.04 | 16 | |
| G + C content | ||||
| Low GC (<46.2%) | 3.57 | 2.19 | 68 | 1.21e-09 |
| High GC (>46.2%) | 6.62 | 3.11 | 68 | |
| tRNA gene number | ||||
| Less tRNA (<32) | 3.30 | 2.15 | 68 | 1.37e-13 |
| More tRNA (>32) | 6.89 | 2.84 | 68 | |
| Genome size | ||||
| Small size (<2.55 Mb) | 3.25 | 1.97 | 68 | 3.08e-14 |
| Large size (>2.55 Mb) | 6.93 | 2.90 | 68 | |
(a) Because information on the first three factors is not available for some of the genomes, the total genome number is less than 136 for these factors. Detailed information on each prokaryotic genome is shown in Supplementary Table S2.
(b) Original growth rate data were obtained from Vieira-Silva and Rocha [40]. Genomes with a generation time longer than 2 h are taken as slow growing, otherwise as fast growing.
(c) Original data on oxygen metabolism were obtained from NCBI at ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/lproks_0.txt.
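The group comparisons in the table can be explored from the summary statistics alone. The sketch below computes Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom from the listed means, SDs, and genome numbers. The choice of Welch's test is an assumption: this record does not state which test produced the reported values in the last column, and no P-value is computed here.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    from group means, standard deviations, and group sizes."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics copied from the table: mean, SD, genome number.
t_gram, df_gram = welch_t(6.01, 3.43, 70, 4.87, 2.64, 35)  # Gram negative vs positive
t_size, df_size = welch_t(6.93, 2.90, 68, 3.25, 1.97, 68)  # large vs small genomes

print(f"Gram type:   t = {t_gram:.2f}, df = {df_gram:.1f}")
print(f"Genome size: t = {t_size:.2f}, df = {df_size:.1f}")
```

The statistic for genome size is far larger than the one for Gram type, in line with the ordering of the values reported in the table (3.08e-14 versus 0.064). A two-sided P-value would follow from the t distribution with the computed degrees of freedom, for example via a statistics library.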
Figure 3. Scatter plot of general Z scores against three factors (genome size, G + C content, and tRNA gene number) for 136 prokaryotic genomes. In the figure, each point corresponds to one prokaryotic genome. (A) Scatter plot of general Z scores against genome size: linear fitting by the least squares method. (B) Scatter plot of general Z scores against G + C content: linear fitting by the least squares method. (C) Scatter plot of general Z scores against tRNA gene number: linear fitting by the least squares method.
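The linear fitting by least squares mentioned in the caption can be sketched for a single predictor as follows. The per-genome values behind the figure are not part of this record, so the points below are made-up placeholders that only exercise the function; they are not data from the study, and the function name is illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Placeholder points (hypothetical, NOT values from the study).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]

slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The multiple linear regression reported in the abstract extends the same least-squares criterion to three predictors at once; a linear algebra routine would normally be used for that case.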