Adam C Retchless, Jeffrey G Lawrence.
Abstract
BACKGROUND: Statistics measuring codon selection seek to compare genes by their sensitivity to selection for translational efficiency, but existing statistics lack a model for testing the significance of differences between genes. Here, we introduce a new statistic for measuring codon selection, the Adaptive Codon Enrichment (ACE).
Year: 2011 PMID: 21787402 PMCID: PMC3162537 DOI: 10.1186/1471-2164-12-374
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Normalized Synonymous Codon Usage as a function of alternative codon scoring tables.
| Lys | AAG | 0.303 | 0.380 | 1.000 | 0.427 | 0.441 | 1.000 | 0.634 | ||
| Lys | AAA | 1.000 | 0.800 | 1.000 | 1.000 | 1.000 | 0.385 | 0.607 | 1.000 | |
| Pro | CCG | 1.000 | 1.000 | 1.000 | 1.000 | 0.279* | 0.127 | 1.000 | 0.409 | |
| Pro | CCA | 0.358 | 0.511 | 0.439 | 0.962 | 1.000 | 0.2817 | 1.000 | ||
| Pro | CCT | 0.295 | 0.697 | 0.659 | 0.693 | 0.2374 | 0.959 | |||
| Pro | CCC | 0.231 | 0.074 | 0.206 | 0.039 | 0.086 | 0.4627 | 0.151 | 0.134 | |
| Thr | ACG | 0.613 | 0.082 | 0.050 | 0.652 | 0.233* | 0.140 | 0.264 | 0.046 | 0.078 |
| Thr | ACA | 0.290 | 0.094 | 0.121 | 1.000 | 0.606* | 0.238 | 0.104 | 0.232 | |
| Thr | ACT | 0.374 | 1.000 | 1.000 | 0.392 | 1.000 | 1.000 | 0.137 | 1.000 | |
| Thr | ACC | 1.000 | 0.924*6 | 0.346 | 0.386 | 0.026 | 0.026 | 1.000 | 0.448 | |
| Val | GTG | 1.000 | 0.229* | 0.160 | 0.906 | 0.168 | 0.185 | 1.000 | 0.646* | 0.117 |
| Val | GTA | 0.415 | 0.545 | 0.916 | 0.695 | 0.904 | 0.201 | 0.399 | 0.361 | |
| Val | GTT | 0.698 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.181 | 1.000 | 1.000 |
| Val | GTC | 0.587 | 0.139 | 0.166 | 0.904 | 0.157 | 0.174 | 0.572 | 0.798* | 0.253 |
1. The f table was constructed using all of the genes in the specified genome; NSCU values are the frequency of each codon normalized to the largest value within each synonymous codon group.
2. The f table was constructed using the Translation40 genes [8].
3. The normalized δ values were calculated as the f/f ratio, thus correcting f values to the codon composition of the genome as a whole.
4. Bolded underlines indicate that the uncorrected table significantly underestimates selection against this codon and incorrectly denotes it as the preferred codon.
5. Single underlines indicate that the uncorrected table significantly overestimates selection against this codon.
6. Asterisks indicate that the uncorrected table significantly underestimates selection against this codon.
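The normalizations described in notes 1 and 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the example counts, and the reading of the f/f ratio as "reference-set frequency divided by genome-wide frequency" are assumptions made here for illustration, since the subscripts that distinguish the two f tables are not preserved in this text. The sketch also assumes that every codon has a non-zero count in the genome-wide table.

```python
from collections import defaultdict


def normalize_within_groups(values, codon_to_aa):
    """Scale each codon's value by the largest value in its synonymous group."""
    group_max = defaultdict(float)
    for codon, value in values.items():
        aa = codon_to_aa[codon]
        group_max[aa] = max(group_max[aa], value)
    return {
        codon: (value / group_max[codon_to_aa[codon]]
                if group_max[codon_to_aa[codon]] > 0 else 0.0)
        for codon, value in values.items()
    }


def relative_frequencies(counts, codon_to_aa):
    """Frequency of each codon among the codons for the same amino acid."""
    totals = defaultdict(float)
    for codon, count in counts.items():
        totals[codon_to_aa[codon]] += count
    return {codon: count / totals[codon_to_aa[codon]]
            for codon, count in counts.items()}


def corrected_ratio(ref_counts, genome_counts, codon_to_aa):
    """Assumed reading of note 3: reference frequency over genome-wide
    frequency, then normalized to the group maximum."""
    f_ref = relative_frequencies(ref_counts, codon_to_aa)
    f_all = relative_frequencies(genome_counts, codon_to_aa)
    ratios = {codon: f_ref[codon] / f_all[codon] for codon in f_ref}
    return normalize_within_groups(ratios, codon_to_aa)


# Hypothetical counts for the two lysine codons (not data from the paper).
codon_to_aa = {"AAA": "Lys", "AAG": "Lys"}
ref_counts = {"AAA": 30, "AAG": 10}      # e.g. a highly expressed gene set
genome_counts = {"AAA": 60, "AAG": 40}   # all ORFs

nscu = normalize_within_groups(ref_counts, codon_to_aa)
delta = corrected_ratio(ref_counts, genome_counts, codon_to_aa)
```

In this toy case the uncorrected value for AAG is lower than the corrected one, because part of the apparent preference for AAA reflects the genome-wide composition rather than the reference gene set alone.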
Figure 1. Correlation coefficients of five different codon selection statistics with transcript abundance data (see text). The set of genes contributing to f was systematically increased, 20 genes at a time, using the most highly expressed genes; typical f tables use 5000-15000 codons. All ORFs were used to construct f.
Correlation of codon statistics with rates of sequence divergence.
| Pearson correlation of codon statistic with dS | ||||||||
|---|---|---|---|---|---|---|---|---|
| 0.24 | -0.20 | -0.18 | -0.25 | -0.22 | -0.20 | 0.12 | ||
| 0.43 | -0.23 | -0.18 | -0.18 | -0.22 | -0.23 | 0.13 | ||
| 0.5 | -0.29 | -0.30 | -0.26 | -0.22 | -0.29 | 0.15 | ||
| 0.87 | -0.56 | -0.48 | -0.65 | -0.60 | -0.56 | 0.41 | ||
| 0.95 | -0.48 | -0.45 | -0.47 | -0.47 | -0.48 | 0.19 | ||
| 0.98 | -0.51 | -0.49 | -0.61 | -0.57 | -0.51 | 0.26 | ||
| 1.04 | -0.51 | -0.44 | -0.54 | -0.52 | -0.50 | 0.32 | ||
| 1.17 | -0.59 | -0.14 | -0.58 | -0.60 | -0.59 | 0.40 | ||
1. Genome from which codon bias statistics were calculated. All codon statistics were calculated using the Translation40 gene set to construct f and all ORFs to construct f.
2. Average divergence (dS) and correlation were measured among putative orthologs with dS < 1.5 between reference and target genomes.
3. Correlation to log(probability) of belonging to the core genome as classified by Random Forest classifier [11]; RF values were calculated from a forest of 1000 trees as reported by Supek et al. [11].
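The filtering and correlation step in note 2 can be sketched as follows. This is an illustrative outline only: the record layout (one `(codon_statistic, dS)` pair per putative ortholog), the function names, and the example values are hypothetical, and the Pearson coefficient is computed with the textbook formula rather than any particular package.

```python
import math


def pearson(xs, ys):
    """Textbook Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ds_summary(records, max_ds=1.5):
    """Keep orthologs with dS below the cutoff, then return
    (average dS, Pearson r between the codon statistic and dS)."""
    kept = [(stat, ds) for stat, ds in records if ds < max_ds]
    stats = [stat for stat, _ in kept]
    ds_values = [ds for _, ds in kept]
    return sum(ds_values) / len(ds_values), pearson(stats, ds_values)


# Hypothetical (codon statistic, dS) pairs; the last one exceeds the cutoff.
records = [(1.0, 0.9), (2.0, 0.6), (3.0, 0.3), (0.0, 2.0)]
mean_ds, r = ds_summary(records)
```

Applying the dS < 1.5 cutoff before computing the correlation keeps highly diverged pairs, such as the last record here, from influencing either the average or the coefficient.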
ACEχ2 of three genomes calculated for different sets of genes.
| All Genes | 3.70 (5566)a | 10.3 (4144) | 1.97 (564) |
| Genus Core Genes | 6.55 (1675)b | 11.8 (2593)d | n/a |
| Enteric Core Genes | n/ac | 28.7 (499) | 1.94 (499) |
| Translation40 genes | 72.6 (40) | 63.6 (40) | 4.63 (40) |
a. ACE values were calculated using the Translation40 gene set to construct f and all ORFs to construct f. Number of genes is reported in parentheses.
b. Putative orthologs of Pseudomonas aeruginosa, P. mendocina, P. stutzeri, P. entomophila, and P. putida.
c. Not applicable.
d. Putative orthologs of E. coli, E. fergusonii, and E. albertii.
Figure 2. Distribution of ACE. ACE values were calculated using the Translation40 gene set to construct f and all ORFs to construct f. A. All genes from E. coli (4144 ORFs) and B. aphidicola (564 ORFs). B. Putative orthologs (499 ORFs) shared between E. coli and B. aphidicola.
Figure 3. Codon selection as a function of tRNA gene number in Enterobacteriaceae. All codon statistics were calculated using the Translation40 gene set to construct f and the shared set of ORFs to construct f. A. ACEχ2 calculated on all shared genes, for the set of 14 free-living bacteria (1060 genes; open circles, thick line) and a set of 4 endosymbionts along with 11 free-living bacteria (201 genes; filled circles, thin line). B. ACEχ2, like other statistics, calculated on Translation40 gene set.
Correlation coefficients of genome-wide codon selection measures with the number of tRNA genes in each genome.
| Pearson Correlation Coefficient of tRNA Gene Number and Genome-wide Codon Selection Statistica | |||||||
|---|---|---|---|---|---|---|---|
| Family | Number of Genomes | Number of Orthologues | ACEχ2coreb | ACEχ240b | ENCdiff | ΔN'c | S |
| Enterobacteriaceae | 14 | 1060 | 0.628 (p = 0.008)c | 0.338 (0.119) | 0.264 (0.181) | 0.399 (0.079) | 0.257 (0.188) |
| Mycobacteriaceae | 9 | 982 | 0.409 (0.167) | 0.461 (0.106) | 0.512 (0.080) | 0.495 (0.088) | 0.153 (0.347) |
| Bacillaceae | 12 | 541 | 0.556 (0.030) | 0.440 (0.076) | 0.373 (0.116) | 0.346 (0.136) | 0.427 (0.083) |
a. All codon statistics were calculated using the Translation40 gene set to construct f and the shared set of ORFs to construct f.
b. ACEχ2core was calculated from all orthologues; ACEχ2 40 was calculated from Translation40 genes.
c. Probability of calculating a correlation coefficient this large in the absence of a true correlation; one-tailed test using Fisher's z-transformed correlation.
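The one-tailed test in note c can be sketched with the standard Fisher z-transformation. This is a minimal illustration, not the authors' procedure: it assumes the usual large-sample standard error of 1/sqrt(n - 3) for the transformed coefficient, the function name is hypothetical, and the result is not guaranteed to reproduce the tabulated probabilities exactly if a different variance correction was used.

```python
import math


def fisher_z_one_tailed_p(r, n):
    """Upper-tail probability of observing a correlation at least as large
    as r from n paired observations when the true correlation is zero."""
    if n <= 3:
        raise ValueError("Fisher's z approximation needs n > 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # atanh(r) is Fisher's z; under the null it is approximately normal
    # with mean 0 and standard deviation 1 / sqrt(n - 3) (assumed here).
    z = math.atanh(r) * math.sqrt(n - 3)
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Example call using one r and n from the table (14 genomes, r = 0.628).
p_value = fisher_z_one_tailed_p(0.628, 14)
```

Because the test is one-tailed, only positive correlations between tRNA gene number and the codon selection statistic count as evidence against the null hypothesis.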