Yang Huang, Eugene V Koonin, David J Lipman, Teresa M Przytycka.
Abstract
In a wide range of genomes, it has been observed that the usage of synonymous codons is biased toward specific codons and codon patterns. Factors implicated in the selection for codon usage include facilitation of fast and accurate translation. There are two types of translational errors: missense errors and processivity errors. There is considerable evidence in support of the hypothesis that codon usage is optimized to minimize missense errors. In contrast, little is known about the relationship between codon usage and frameshifting errors, an important form of processivity errors, which appear to occur at frequencies comparable to those of missense errors. Based on the recently proposed pause-and-slip model of frameshifting, we developed the Frameshifting Robustness Score (FRS). We used this measure to test whether the pattern of codon usage indicates optimization against frameshifting errors. We found that the FRS values of protein-coding sequences from four analyzed genomes (the bacteria Bacillus subtilis and Escherichia coli, and the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe) were typically higher than expected by chance. Other properties of FRS patterns observed in B. subtilis, S. cerevisiae and S. pombe, such as the tendency of FRS to increase from the 5'- to the 3'-end of protein-coding sequences, were also consistent with the hypothesis of optimization against frameshifting errors in translation. For E. coli, the results of different tests were less consistent, suggesting a much weaker optimization, if any. Collectively, the results fit the concept of selection against mistranslation-induced protein misfolding being one of the factors shaping the evolution of both coding and non-coding sequences.
Year: 2009 PMID: 19745054 PMCID: PMC2777431 DOI: 10.1093/nar/gkp712
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. +1 frameshifting in the pause-and-slip model. (A) Competition between a small number of cognate tRNAs and a relatively large number of near-cognate tRNAs keeps the A-site empty for a relatively long time. (B) A near-cognate tRNA enters the A-site, forming a weak bond with the codon at the A-site. (C) The near-cognate tRNA is translocated to the P-site, where the binding is not stable. (D) The near-cognate tRNA slips one nucleotide to the right, forming a bond with the codon in the +1 shifted frame.
The mean, SD and minimum of +1/−1 FRS in four analyzed genomes E. coli, B. subtilis, S. cerevisiae and S. pombe
| | +1 FRS (Ec) | +1 FRS (Bs) | +1 FRS (Sc) | +1 FRS (Sp) | −1 FRS (Ec) | −1 FRS (Bs) | −1 FRS (Sc) | −1 FRS (Sp) |
|---|---|---|---|---|---|---|---|---|
| Mean | 0.9958 (0.9948) | 0.955 | 0.9936 | 0.99918 | 0.974 (0.974) | 0.958 | 0.9924 | 0.9964 |
| SD | 0.0052 (0.0052) | 0.018 | 0.0050 | 0.00031 | 0.015 (0.015) | 0.019 | 0.0058 | 0.0033 |
| Min | 0.940 (0.940) | 0.82 | 0.928 | 0.9960 | 0.84 (0.84) | 0.84 | 0.924 | 0.966 |
The scores obtained with experimental data on tRNA abundance in E. coli are shown in parentheses.
Ec, E. coli, Bs, B. subtilis, Sc, S. cerevisiae and Sp, S. pombe.
Comparison between FRS of sequences in g and g, and between FRS of sequences in g and g for E. coli, B. subtilis, S. cerevisiae and S. pombe
| +1 FRS (Ec) | +1 FRS (Bs) | +1 FRS (Sc) | +1 FRS (Sp) | −1 FRS (Ec) | −1 FRS (Bs) | −1 FRS (Sc) | −1 FRS (Sp) |
|---|---|---|---|---|---|---|---|
| ? (0.3) | > (3e-13) | > (0) | > (0) | < (0.02) | > (2e-10) | > (5e-12) | > (3e-16) |
| < (1e-4) | > (0) | > (2e-16) | > (6e-17) | ? (0.2) | > (3e-14) | > (2e-16) | > (2e-16) |
The symbols ‘>’/‘<’ indicate the mean FRS of sequences in g (g) is higher/lower than the mean FRS of sequences in g (g). ‘?’ is used for tests resulting in P > 0.1. The probability that the observation is by chance is shown in the parentheses.
Ec, E. coli, Bs, B. subtilis, Sc, S. cerevisiae and Sp, S. pombe.
Pearson partial correlation between FRS and CAI/length of all genes for E. coli, B. subtilis, S. cerevisiae and S. pombe with length/CAI as the control variable
| | +1 FRS versus CAI | −1 FRS versus CAI | +1 FRS versus length | −1 FRS versus length | Number of tested genes |
|---|---|---|---|---|---|
| Ec | −0.077 (7.77e-7) | 0.059 (1.57e-4) | 0.13 (2.56e-16) | 0.035 (0.024) | 4077 |
| Bs | 0.30 (1.51e-92) | 0.26 (4.35e-66) | −0.051 (0.0010) | 0.034 (0.029) | 4104 |
| Sc | 0.37 (6.90e-202) | 0.36 (4.82e-189) | 0.021 (?) | 0.099 (3.12e-14) | 5869 |
| Sp | 0.47 (0) | 0.24 (4.65e-67) | 0.014 (?) | 0.028 (0.045) | 5052 |
The probability for the correlation is shown in parentheses. ‘?’ is used for tests resulting in P > 0.05.
Ec, E. coli, Bs, B. subtilis, Sc, S. cerevisiae and Sp, S. pombe.
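The partial correlations in this table measure how two variables co-vary once a third is held fixed (for example FRS versus CAI with gene length as the control). The following is a minimal sketch of the standard first-order partial correlation formula, not the authors' code; the function names and the toy numbers are hypothetical and serve only as illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical toy values (not data from the paper): per-gene FRS, CAI and length.
frs = [0.990, 0.993, 0.995, 0.991, 0.997, 0.996]
cai = [0.30, 0.42, 0.55, 0.35, 0.70, 0.61]
length = [300, 450, 280, 900, 350, 500]
r_frs_cai_given_length = partial_correlation(frs, cai, length)
```

Swapping the roles of `cai` and `length` gives the corresponding "FRS versus length" value with CAI as the control variable.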
Pearson partial correlation between FRS and protein abundance/gene length for E. coli, S. cerevisiae and S. pombe with length/protein abundance as the control variable
| | +1 FRS versus protein abundance | −1 FRS versus protein abundance | +1 FRS versus length | −1 FRS versus length | Number of tested genes |
|---|---|---|---|---|---|
| Ec | −0.091 (0.0038) | −0.045 (?) | 0.11 (3.69e-4) | −0.036 (?) | 1005 |
| Sc | 0.19 (1.63e-31) | 0.18 (6.64e-29) | −0.022 (?) | 0.051 (1.71e-3) | 3839 |
| Sp | 0.23 (5.15e-19) | 0.15 (1.14e-8) | −0.15 (2.46e-9) | −0.062 (0.017) | 1465 |
The probability for the correlation is shown in parentheses. ‘?’ is used for tests resulting in P > 0.05.
Ec, E. coli, Bs, B. subtilis, Sc, S. cerevisiae and Sp, S. pombe.
Correlation between +1/−1 FRS and CAI in sequence groups of E. coli, B. subtilis, S. cerevisiae and S. pombe
| +1 FRS versus CAI (Ec) | +1 FRS versus CAI (Bs) | +1 FRS versus CAI (Sc) | +1 FRS versus CAI (Sp) | −1 FRS versus CAI (Ec) | −1 FRS versus CAI (Bs) | −1 FRS versus CAI (Sc) | −1 FRS versus CAI (Sp) |
|---|---|---|---|---|---|---|---|
| −0.06 (1e-4) | 0.31 (1e-16) | 0.37 (3e-12) | 0.47 (1e-16) | 0.06 (4e-5) | 0.26 (1e-16) | 0.35 (3e-12) | 0.23 (1e-15) |
| 0.05 (0.4) | 0.14 (0.02) | 0.23 (5e-5) | 0.23 (7e-5) | −0.03 (0.7) | −0.004 (0.9) | 0.04 (0.5) | 0.07 (0.2) |
| −0.07 (0.2) | 0.32 (1e-8) | 0.31 (4e-8) | 0.14 (2e-2) | 0.24 (2e-5) | 0.30 (1e-7) | 0.27 (3e-6) | 0.16 (6e-3) |
| 0.15 (0.01) | 0.1 (0.09) | 0.21 (2e-4) | 0.22 (1e-4) | −0.23 (5e-5) | 0.04 (0.5) | 0.24 (3e-5) | 0.07 (0.2) |
| −0.16 (4e-3) | 0.31 (6e-8) | 0.41 (1e-13) | 0.47 (4e-16) | 0.21 (3e-4) | 0.34 (1e-9) | 0.42 (3e-14) | 0.30 (9e-8) |
| 0.23 (6e-5) | 0.07 (0.2) | 0.17 (4e-3) | 0.24 (4e-5) | −0.20 (5e-4) | −0.08 (0.1) | 0.16 (4e-3) | 0.11 (0.05) |
| −0.17 (3e-3) | 0.38 (1e-11) | 0.44 (1e-15) | 0.50 (1e-16) | 0.35 (4e-10) | 0.42 (5e-14) | 0.44 (1e-15) | 0.37 (3e-11) |
The probability for the correlation is shown in parentheses.
Ec, E. coli, Bs, B. subtilis, Sc, S. cerevisiae and Sp, S. pombe.
Figure 2. The relation between +1 FRS and CAI of gene sequences in four groups of B. subtilis: (A) g, (B) g, (C) g and (D) g.
Figure 3. The number of real gene sequences whose +1/−1 FRS is significantly higher (blue bar) or lower (red bar) than the FRS of random sequences generated by permuting their synonymous codons, in four organisms. (A) +1 FRS was computed using the whole real and random sequences. The P-values for E. coli and B. subtilis in the whole-sequence comparison of +1 FRS are 2.1e-9 and less than 2.2e-16, respectively. (B) −1 FRS was computed using the whole real and random sequences. (C) +1 FRS was computed using the last 200 codons of the real and random sequences. (D) −1 FRS was computed using the last 200 codons of the real and random sequences.
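The random sequences in this comparison are obtained by permuting synonymous codons, which keeps both the encoded protein and the codon composition of each gene unchanged. The sketch below shows one way such a permutation could be implemented. It assumes uppercase DNA whose length is a multiple of three and uses the standard genetic code. The `score` argument is a hypothetical stand-in for the FRS, which is not defined in this text, and the empirical P-value helper is an illustrative assumption rather than the authors' test.

```python
import random
from collections import defaultdict

# Standard genetic code, enumerated in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def permute_synonymous_codons(seq, rng):
    """Shuffle codons among positions that encode the same amino acid."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)  # reorder the synonymous codons in place
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)

def empirical_p_value(seq, score, n_random=1000, seed=0):
    """Share of permuted sequences scoring at least as high as the real one."""
    rng = random.Random(seed)
    real = score(seq)
    hits = sum(
        score(permute_synonymous_codons(seq, rng)) >= real
        for _ in range(n_random)
    )
    return (hits + 1) / (n_random + 1)
```

Because codons are only exchanged between positions coding for the same amino acid, any difference in score between the real and permuted sequences reflects codon order alone.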
Comparison between FRS of the sequence segment at the start and FRS of the sequence segment at the end of gene sequences for E. coli, B. subtilis, S. cerevisiae and S. pombe
| +1 FRS start versus end (Ec) | +1 FRS start versus end (Bs) | +1 FRS start versus end (Sc) | +1 FRS start versus end (Sp) | −1 FRS start versus end (Ec) | −1 FRS start versus end (Bs) | −1 FRS start versus end (Sc) | −1 FRS start versus end (Sp) |
|---|---|---|---|---|---|---|---|
| ? (0.3) | < (2e-3) | ? (0.5) | < (0.09) | > (0.06) | < (3e-12) | < (0.03) | < (0.07) |
| ? (0.7) | < (6e-5) | < (0.03) | < (0.05) | > (0.02) | < (2e-16) | ? (0.7) | ? (0.3) |
| ? ( | < (1e-7) | < (0.07) | ? (0.7) | > (3e-3) | < (3e-12) | ? (0.9) | ? (0.5) |
| ? (0.9) | < (0.02) | < (3e-9) | < (4e-3) | > (4e-3) | < (3e-12) | < (0.02) | ? (0.2) |
The symbols ‘>’/‘<’ indicate that the mean FRS of the sequence segment at the start of the sequences in the group is higher/lower than the mean FRS of the sequence segment at the end of the sequences. ‘?’ is used for tests resulting in P > 0.1.
The probability for the correlation is shown in parentheses.
Ec, E. coli, Bs, B. subtilis, Sc, S. cerevisiae and Sp, S. pombe.
The gradient of +1/−1 FRS of the first 200 codons of gene sequence for E. coli, B. subtilis, S. cerevisiae and S. pombe
| +1 FRS gradient (Ec) | +1 FRS gradient (Bs) | +1 FRS gradient (Sc) | +1 FRS gradient (Sp) | −1 FRS gradient (Ec) | −1 FRS gradient (Bs) | −1 FRS gradient (Sc) | −1 FRS gradient (Sp) |
|---|---|---|---|---|---|---|---|
| NS | NS | NS | NS | NS | NS | NS | NS |
| NS | NS | NS | 7.5e-7 (8.7e-4) | NS | NS | NS | NS |
| NS | NS | 1.9e-5 (1.6e-3) | 5.2e-7 (0.024) | NS | NS | NS | NS |
| 1.4e-5 (1.2e-3) | −3.0e-5 (0.043) | 2.2e-5 (1.9e-7) | 5.2e-7 (6.4e-6) | NS | NS | NS | NS |
‘NS’ indicates gradients that were not significant (P > 0.05).
The probability for the correlation is shown in parentheses.
Ec, E. coli, Bs, B. subtilis, Sc, S. cerevisiae and Sp, S. pombe.