D Gilis, S Massar, N J Cerf, M Rooman.
Abstract
BACKGROUND: The genetic code is known to be efficient in limiting the effect of mistranslation errors. A misread codon often codes for the same amino acid or one with similar biochemical properties, so the structure and function of the coded protein remain relatively unaltered. Previous studies have attempted to address this question quantitatively, by estimating the fraction of randomly generated codes that do better than the genetic code in respect of overall robustness. We extended these results by investigating the role of amino-acid frequencies in the optimality of the genetic code.
Year: 2001 PMID: 11737948 PMCID: PMC60310 DOI: 10.1186/gb-2001-2-11-research0049
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Table 1. The mean frequencies of the individual amino acids (p(a), in %) in the genomes of living organisms
| Amino acid | Archaea | Bacteria | Eukaryotes | All genomes |
| Ala | 7.85 (2.27) | 8.08 (2.61) | 6.48 (0.76) | 7.80 (2.38) |
| Arg | 5.92 (1.15) | 4.99 (1.61) | 5.24 (0.49) | 5.23 (1.43) |
| Asp | 5.47 (1.57) | 5.06 (0.42) | 5.31 (0.35) | 5.19 (0.81) |
| Asn | 3.40 (1.05) | 4.63 (1.97) | 4.76 (0.90) | 4.37 (1.73) |
| Cys | 0.89 (0.32) | 1.00 (0.31) | 1.86 (0.35) | 1.10 (0.44) |
| Glu | 7.79 (1.13) | 6.35 (1.21) | 6.64 (0.28) | 6.72 (1.24) |
| Gln | 1.90 (0.40) | 3.89 (0.95) | 4.28 (0.69) | 3.45 (1.19) |
| Gly | 7.49 (0.75) | 6.70 (1.46) | 5.88 (0.72) | 6.77 (1.32) |
| His | 1.70 (0.29) | 2.07 (0.39) | 2.41 (0.21) | 2.03 (0.41) |
| Ile | 7.59 (2.19) | 7.05 (2.26) | 5.48 (0.92) | 6.95 (2.16) |
| Leu | 9.65 (1.00) | 10.52 (0.66) | 9.35 (0.42) | 10.15 (0.86) |
| Lys | 6.04 (2.75) | 6.43 (2.78) | 6.30 (0.69) | 6.32 (2.53) |
| Met | 2.49 (0.47) | 2.19 (0.37) | 2.33 (0.21) | 2.28 (0.39) |
| Phe | 4.00 (0.74) | 4.57 (0.97) | 4.20 (0.59) | 4.39 (0.89) |
| Pro | 4.43 (0.92) | 3.99 (1.00) | 5.15 (0.75) | 4.26 (1.01) |
| Ser | 5.93 (1.11) | 6.18 (0.77) | 8.50 (0.47) | 6.46 (1.17) |
| Thr | 4.77 (0.89) | 5.15 (0.63) | 5.57 (0.32) | 5.12 (0.69) |
| Trp | 1.03 (0.20) | 1.10 (0.28) | 1.13 (0.12) | 1.09 (0.25) |
| Tyr | 3.68 (0.66) | 3.23 (0.64) | 3.03 (0.26) | 3.30 (0.63) |
| Val | 7.97 (0.85) | 6.87 (1.19) | 6.09 (0.42) | 7.01 (1.18) |
The frequencies p(a) were computed as averages over the frequencies observed in genomes of archaea (Aeropyrum pernix K1, Archaeoglobus fulgidus, Halobacterium sp. NRC-1, Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Pyrococcus abyssi, Pyrococcus horikoshii and Thermoplasma acidophilum), bacteria (Aquifex aeolicus, Bacillus halodurans, Bacillus subtilis, Borrelia burgdorferi, Buchnera aphidicola, Campylobacter jejuni, Chlamydia trachomatis, Deinococcus radiodurans, Escherichia coli K-12, Haemophilus influenzae, Mycobacterium leprae, Mycobacterium tuberculosis, Mycoplasma genitalium, Mycoplasma pneumoniae, Pasteurella multocida, Pseudomonas aeruginosa, Rickettsia prowazekii, Thermotoga maritima, Treponema pallidum, Ureaplasma parvum, Vibrio cholerae and Xylella fastidiosa) and eukaryotes (Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens and Saccharomyces cerevisiae). The last column contains the average frequencies p(a) computed from all these genomes. The standard deviation of each distribution is given in parentheses.
Figure 1. The relative frequency p(a) (in %) of amino acid a (right-hand column of Table 1), as a function of the number of synonyms n(a) that code for it. The linear regression line is indicated; the correlation coefficient is equal to 0.66.
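The correlation quoted in the Figure 1 caption can be roughly re-derived from the right-hand column of Table 1. The sketch below is illustrative only: it assumes that n(a) is the number of codons assigned to each amino acid in the standard genetic code, and the names `P_ALL`, `N_SYN` and `pearson` are hypothetical helpers, not taken from the paper.

```python
from math import sqrt

# Right-hand ("all genomes") column of Table 1: mean frequency p(a) in percent.
P_ALL = {
    "Ala": 7.80, "Arg": 5.23, "Asp": 5.19, "Asn": 4.37, "Cys": 1.10,
    "Glu": 6.72, "Gln": 3.45, "Gly": 6.77, "His": 2.03, "Ile": 6.95,
    "Leu": 10.15, "Lys": 6.32, "Met": 2.28, "Phe": 4.39, "Pro": 4.26,
    "Ser": 6.46, "Thr": 5.12, "Trp": 1.09, "Tyr": 3.30, "Val": 7.01,
}

# Assumption: n(a) is the codon count per amino acid in the standard genetic code.
N_SYN = {
    "Ala": 4, "Arg": 6, "Asp": 2, "Asn": 2, "Cys": 2,
    "Glu": 2, "Gln": 2, "Gly": 4, "His": 2, "Ile": 3,
    "Leu": 6, "Lys": 2, "Met": 1, "Phe": 2, "Pro": 4,
    "Ser": 6, "Thr": 4, "Trp": 1, "Tyr": 2, "Val": 4,
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


amino_acids = sorted(P_ALL)
r = pearson([N_SYN[a] for a in amino_acids], [P_ALL[a] for a in amino_acids])
print(f"correlation between n(a) and p(a): {r:.2f}")
```

Under these assumptions the result should land close to the reported 0.66; small differences are expected because the tabulated frequencies are rounded to two decimals.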
Table 2. Fraction of random codes that are fitter than the genetic code
| Φ | Φ | Φ | |
| 9.8 × 10⁻⁷ | 1.5 × 10⁻⁶ | 6.5 × 10⁻⁷ | |
| 1.7 × 10⁻⁶ | 1.9 × 10⁻⁶ | 1.2 × 10⁻⁶ | |
| 3.4 × 10⁻¹⁵* | 5.1 × 10⁻¹⁶* | 5.0 × 10⁻¹⁷* | |
| 3.8 × 10⁻⁶ | 2.2 × 10⁻⁶ | 2.0 × 10⁻⁹ | |
| 2.3 × 10⁻⁶ | 6.0 × 10⁻⁷ | 2.0 × 10⁻⁹ |
Fraction f of random codes that have a lower value of the fitness function (Φ, Φ or Φ) than the natural code, using each of the five cost functions g, g, g, g and g. Values marked with an asterisk were obtained by extrapolation, as explained in the text. The number of randomly generated codes is 10⁹, and the amino-acid frequencies used are the averages listed in the right-hand column of Table 1.
Table 3. Percentage of random amino-acid frequency assignments yielding lower fractions f than the natural one
| % | |
| 3 | |
| <1 | |
| <1 | |
| <1 |
Percentage of the sets of random amino-acid frequency assignments for which the fraction f of random codes that beat the natural code is lower than the corresponding fraction computed with the natural frequency p(a) values. This percentage is estimated for the four cost functions (g, g, g and g) on the basis of 100 random frequency sets and, for each of them, 10⁶ random codes. For all cost functions except g, we could only give an upper bound (estimated at 1%), because our sample of random codes is too small and we did not find any random frequency set for which f is lower than that obtained with the natural frequencies.
Table 4. Fraction f for different sets of allowed amino-acid interchanges in the alternative codes
| | Φ | Φ | Φ |
| Unrestricted set | 2.3 × 10⁻⁶ | 6.0 × 10⁻⁷ | 2.0 × 10⁻⁹ (97%) |
| Biosynthesis-restricted set | 6.1 × 10⁻⁶ | 1.9 × 10⁻⁶ | 2.9 × 10⁻⁸ (98%) |
| Degeneracy-restricted set | 2.3 × 10⁻⁶ | 2.1 × 10⁻⁶ | 1.3 × 10⁻⁶ (97%) |
Fraction f of random codes that have a lower value of the fitness function (Φ, Φ or Φ) than the natural code, using the cost function g. For the unrestricted set, the f values were estimated from 10⁹ randomly generated codes, where the only constraint is the preservation of the code's block structure (as in Table 2). For the biosynthesis-restricted set, only permutations of amino acids sharing the same metabolic pathway were considered, that is, interchanges of amino acids contained in one of the four sets {F, S, Y, C, W}, {L, P, H, Q, R}, {I, M, T, N, K}, {V, A, D, E, G} (single-letter amino-acid notation) [34]. Because the number of alternative codes is manageable (207,360,000), they were not sampled at random; all of them were tested. The degeneracy-restricted set contains results obtained by shuffling only amino acids with the same degeneracy in the natural code, corresponding to the sets {M, W}, {C, D, E, F, H, K, N, Q, Y}, {I}, {A, G, P, T, V}, {L, R, S}. Here also, all 522,547,200 possible codes were systematically tested. The percentage of optimization of the natural code compared to the optimal alternative ones, as defined in the text, is given in parentheses for Φ. For the two restricted sets, for which all alternative codes were exhaustively generated, the Φ value of the optimal code was computed exactly. For the unrestricted set, the optimal Φ value was taken as the best of the unrestricted and two restricted sets.
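The two exhaustive counts quoted in this caption follow from simple combinatorics: if amino acids may only be permuted within their own set, the number of alternative codes is the product of the factorials of the set sizes. The short sketch below checks that arithmetic; the function name `count_codes` and the variable names are hypothetical and chosen only for illustration.

```python
from math import factorial, prod


def count_codes(groups):
    """Number of codes obtained by permuting amino acids within each group."""
    return prod(factorial(len(group)) for group in groups)


# Sets of interchangeable amino acids, as listed in the caption
# (single-letter notation).
biosynthesis_sets = ["FSYCW", "LPHQR", "IMTNK", "VADEG"]
degeneracy_sets = ["MW", "CDEFHKNQY", "I", "AGPTV", "LRS"]

n_biosynthesis = count_codes(biosynthesis_sets)  # (5!)^4
n_degeneracy = count_codes(degeneracy_sets)      # 2! * 9! * 1! * 5! * 3!

print(n_biosynthesis)  # 207360000
print(n_degeneracy)    # 522547200
```

Both totals are small enough to enumerate exhaustively, which is consistent with the caption's statement that every code in the two restricted sets was tested, whereas the unrestricted set had to be sampled.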
Figure 2. Mutation matrix M.