Paul G Higgs, Weilong Hao, G Brian Golding.
Abstract
Many different selective effects on DNA and proteins influence the frequency of codons and amino acids in coding sequences. Selection is often stronger on highly expressed genes. Hence, by comparing high- and low-expression genes it is possible to distinguish the factors that are selected by evolution. It has been proposed that highly expressed genes should (i) preferentially use codons matching abundant tRNAs (translational efficiency), (ii) preferentially use amino acids with low cost of synthesis, (iii) be under stronger selection to maintain the required amino acid content, and (iv) be selected for translational robustness. These effects act simultaneously and can be contradictory. We develop a model that combines these factors, and use Akaike's Information Criterion for model selection. We consider pairs of paralogues that arose by whole-genome duplication in Saccharomyces cerevisiae. A codon-based model is used that includes asymmetric effects due to selection on highly expressed genes. The largest effect is translational efficiency, which is found to strongly influence synonymous, but not non-synonymous, rates. Minimization of the cost of amino acid synthesis is implicated. However, when a more general measure of selection for amino acid usage is used, the cost minimization effect becomes redundant. Small effects that we attribute to selection for translational robustness can be identified as an improvement in the model fit on top of the effects of translational efficiency and amino acid usage.
Keywords: Amino Acid Usage; Codon Usage; Saccharomyces cerevisiae; Translational efficiency; Translational robustness
Year: 2007 PMID: 19430600 PMCID: PMC2674637
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Codon group properties.
| π | ||||||
|---|---|---|---|---|---|---|
| Phe(UUY) | 10 | 4.455 | 1 | 0 | 1 | 6 |
| Leu(UUR) | 17 | 5.314 | 0.564 | 0.035 | 2 | 6 |
| Leu(CUY) | 1 | 1.682 | 0.179 | −0.025 | 3 | 6 |
| Leu(CUR) | 3 | 2.425 | 0.257 | −0.009 | 4 | 6 |
| Ile(AUY) | 13 | 4.705 | 0.727 | 0.050 | 2 | 6 |
| Ile(AUA) | 2 | 1.771 | 0.273 | −0.050 | 2 | 6 |
| Met(AUG) | 5 | 2.094 | 1 | 0 | 0 | 6 |
| Val(GUY) | 14 | 3.217 | 0.589 | 0.064 | 3 | 6 |
| Val(GUR) | 4 | 2.241 | 0.411 | −0.064 | 3 | 6 |
| Ser(UCY) | 11 | 3.860 | 0.424 | 0.025 | 3 | 6 |
| Ser(UCR) | 4 | 2.840 | 0.312 | −0.018 | 3 | 6 |
| Ser(AGY) | 4 | 2.404 | 0.264 | −0.006 | 1 | 2 |
| Pro(CCY) | 2 | 2.139 | 0.465 | −0.017 | 3 | 5 |
| Pro(CCR) | 10 | 2.464 | 0.535 | 0.017 | 3 | 5 |
| Thr(ACY) | 11 | 3.196 | 0.553 | 0.047 | 3 | 6 |
| Thr(ACR) | 5 | 2.579 | 0.447 | −0.047 | 3 | 6 |
| Ala(GCY) | 11 | 3.305 | 0.592 | 0.056 | 3 | 5 |
| Ala(GCR) | 5 | 2.281 | 0.408 | −0.056 | 3 | 5 |
| Tyr(UAY) | 8 | 3.371 | 1 | 0 | 1 | 1 |
| His(CAY) | 7 | 2.263 | 1 | 0 | 1 | 4 |
| Gln(CAR) | 9 | 3.906 | 1 | 0 | 1 | 5 |
| Asn(AAY) | 10 | 6.139 | 1 | 0 | 1 | 2 |
| Lys(AAR) | 21 | 7.208 | 1 | 0 | 1 | 2 |
| Asp(GAY) | 15 | 5.858 | 1 | 0 | 1 | 4 |
| Glu(GAR) | 16 | 6.467 | 1 | 0 | 1 | 4 |
| Cys(UGY) | 4 | 1.243 | 1 | 0 | 1 | 1 |
| Trp(UGG) | 6 | 1.007 | 1 | 0 | 0 | 0 |
| Arg(CGY) | 6 | 1.183 | 0.261 | 0.007 | 3 | 3 |
| Arg(CGR) | 1 | 0.160 | 0.035 | −0.013 | 4 | 4 |
| Arg(AGR) | 12 | 3.181 | 0.703 | 0.006 | 2 | 2 |
| Gly(GGY) | 16 | 3.341 | 0.662 | 0.066 | 3 | 4 |
| Gly(GGR) | 5 | 1.704 | 0.338 | −0.066 | 3 | 3 |
Amino acid properties.
| Δ | ||||||
|---|---|---|---|---|---|---|
| Phe | 52.0 | 165 | 5.754 | 4.455 | −1.299 | −0.003 |
| Leu | 27.3 | 131 | 11.852 | 9.420 | −2.432 | −0.275 |
| Ile | 32.3 | 131 | 9.514 | 6.476 | −3.038 | −0.186 |
| Met | 34.3 | 149 | 1.993 | 2.094 | 0.101 | 0.120 |
| Val | 23.3 | 117 | 6.099 | 5.458 | −0.641 | −0.018 |
| Ser | 11.7 | 105 | 9.148 | 9.104 | −0.044 | −0.038 |
| Pro | 20.3 | 115 | 3.232 | 4.603 | 1.370 | 0.086 |
| Thr | 18.7 | 119 | 6.099 | 5.775 | −0.324 | −0.003 |
| Ala | 11.7 | 89 | 3.232 | 5.585 | 2.353 | 0.444 |
| Tyr | 50.0 | 181 | 5.754 | 3.371 | −2.382 | −0.115 |
| His | 38.3 | 155 | 3.049 | 2.263 | −0.786 | 0.040 |
| Gln | 16.3 | 146 | 3.049 | 3.906 | 0.856 | 0.041 |
| Asn | 14.7 | 132 | 5.754 | 6.139 | 0.385 | −0.183 |
| Lys | 30.3 | 146 | 5.754 | 7.208 | 1.455 | −0.096 |
| Asp | 12.7 | 133 | 3.049 | 5.858 | 2.808 | 0.014 |
| Glu | 15.3 | 147 | 3.049 | 6.467 | 3.418 | 0.148 |
| Cys | 24.7 | 121 | 3.049 | 1.243 | −1.807 | −0.074 |
| Trp | 74.3 | 204 | 1.056 | 1.007 | −0.050 | 0.017 |
| Arg | 27.3 | 174 | 6.282 | 4.524 | −1.758 | −0.055 |
| Gly | 11.7 | 75 | 3.232 | 5.045 | 1.813 | 0.133 |
Test of individual asymmetry effects.
| | | | | |
|---|---|---|---|---|
| tRNA | 474 | 291 | 61 | 4.0 × 10⁻⁷ |
| ATP | 456 | 256 | 56 | 5.0 × 10⁻³ |
| MW | 474 | 266 | 56 | 4.4 × 10⁻³ |
| AAU | 481 | 283 | 59 | 6.2 × 10⁻⁵ |
| SN | 352 | 165 | 47 | 0.89 |
| CN | 390 | 180 | 46 | 0.94 |
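The column headers of this table did not survive, so the following is only a hedged reading of the numbers. The fourth column equals the third divided by the second, as a percentage, and the last column is numerically consistent with a one-sided sign test against a 50:50 null. The sketch below assumes that interpretation (total pairs, pairs agreeing with the predicted direction); the function name `upper_tail_p` and the choice of test are illustrative assumptions, not the authors' stated method.

```python
from math import comb


def upper_tail_p(successes, trials):
    """Exact P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    favourable = sum(comb(trials, i) for i in range(successes, trials + 1))
    return favourable / 2 ** trials


# (effect, second column, third column) copied from the table above.
# Reading them as "pairs tested" and "pairs agreeing with the prediction"
# is an assumption, because the original headers are missing.
ROWS = [
    ("tRNA", 474, 291),
    ("ATP", 456, 256),
    ("MW", 474, 266),
    ("AAU", 481, 283),
    ("SN", 352, 165),
    ("CN", 390, 180),
]

for effect, total, count in ROWS:
    percent = round(100 * count / total)
    p_value = upper_tail_p(count, total)
    print(f"{effect:5s} {percent:3d}%  one-sided p = {p_value:.2g}")
```

Under this reading, effects with a percentage well above 50 (tRNA, AAU) give very small p-values, while SN and CN, which fall below 50, give p-values close to 1.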
Figure 1. (a) Relationship between relative codon frequency and relative number of tRNAs. (b) Difference in relative codon frequency between high- and low-expression genes as a function of relative number of tRNAs.
Figure 2. (a) Observed average frequency of amino acids versus frequency predicted from GC content. (b) Difference in amino acid frequency between high- and low-expression genes as a function of the difference between the average and predicted frequencies.
ML parameters for the most important models.
| | |
|---|---|
| S1 | α1 = 0.0691; α2 = 0.0282; α3 = 0.0238; α4 = 1; α5 = 0.396; α6 = 0.121; |
| W1 | α1 = 0.0850; α2 = 0.0405; α3 = 0.0450; α4 = 1; α5 = 0.375; α6 = 0.118; |
| W3 | α1 = 0.0558; α2 = 0.0198; α3 = 0.0264; α4 = 1; α5 = 0.274; α6 = 0.0784; |
| A10 | α1 = 0.0561; α2 = 0.0199; α3 = 0.0265; α4 = 1; α5 = 0.274; α6 = 0.0785; ε_tRNA = 6.509 × 10⁻³; ε_AAU = 1.191; ε_SN = 0.0111. |
Model selection criteria for the symmetric models. ΔAIC is measured relative to the best model in each group.
| Model | −ln L | K | ΔAIC | Description |
|---|---|---|---|---|
| S1 | 836619.3 | 39 | 0.0 | Standard Symmetric Model |
| S2 | 837192.4 | 38 | 1144.2 | κ = 1 |
| S3 | 837649.2 | 38 | 2057.8 | α3 = 0 |
| S4 | 848644.1 | 37 | 24045.6 | α2 = 0 and α3 = 0 |
| S5 | 838036.5 | 38 | 2832.4 | α6 = 0 |
| S6 | 841074.9 | 38 | 8909.2 | No distance function |
| S7 | 836991.9 | 39 | 745.2 | Gaussian distance function |
| S8 | 837205.5 | 39 | 1172.4 | Power law distance function |
| W1 | 834446.0 | 47 | 139.8 | Weighted Distance Model |
| W2 | 834384.2 | 48 | 18.2 | 2Γ |
| W3 | 834375.1 | 48 | 0.0 | 3Γ |
| W4 | 834376.9 | 48 | 3.6 | 4Γ |
| W5 | 834780.7 | 47 | 809.2 | 3Γ, κ = 1 |
| W6 | 835088.8 | 47 | 1425.4 | 3Γ, α3 = 0 |
| W7 | 837763.2 | 46 | 6772.2 | 3Γ, α2 = 0 and α3 = 0 |
| W8 | 835454.3 | 47 | 2156.4 | 3Γ, α6 = 0 |
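The ΔAIC values in this table can be reproduced from the two numeric columns before them, assuming those columns are the negative log-likelihood and the number of free parameters K (an inference from the arithmetic, since the headers are missing). With AIC = 2K − 2 ln L, a minimal sketch using a few rows copied from the table looks like this; the helper names are illustrative.

```python
def aic(neg_log_lik, n_params):
    """Akaike's Information Criterion: 2K - 2 ln L, given -ln L and K."""
    return 2.0 * n_params + 2.0 * neg_log_lik


def delta_aic(models):
    """AIC of each model minus the smallest AIC in the same group."""
    scores = {name: aic(nll, k) for name, (nll, k) in models.items()}
    best = min(scores.values())
    return {name: round(score - best, 1) for name, score in scores.items()}


# (assumed -ln L, assumed K) for a subset of rows from the table above.
SYMMETRIC = {"S1": (836619.3, 39), "S2": (837192.4, 38), "S3": (837649.2, 38)}
WEIGHTED = {"W1": (834446.0, 47), "W3": (834375.1, 48), "W4": (834376.9, 48)}

print(delta_aic(SYMMETRIC))
print(delta_aic(WEIGHTED))
```

Each group is scored against its own best model, which is why S1 and W3 both have ΔAIC = 0.0. Note that dropping a parameter lowers the penalty by 2, so S2 has ΔAIC = 1144.2 rather than twice the raw likelihood difference (1146.2).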
Model selection criteria for the asymmetric models. ΔAIC is measured relative to A10. ΔAIC* is measured relative to W3.
| Model | −ln L | K | ΔAIC | ΔAIC* | | | Description |
|---|---|---|---|---|---|---|---|
| A1 | 834224.5 | 49 | 93.0 | −299.2 | 295 | 1.4 × 10⁻⁵ | tRNA |
| A2 | 834224.5 | 50 | 95.0 | −297.2 | 295 | 1.4 × 10⁻⁵ | tRNA, tRNA−NS |
| A3 | 834367.9 | 49 | 379.8 | −12.4 | 276 | 6.7 × 10⁻³ | ATP |
| A4 | 834364.2 | 49 | 372.4 | −19.8 | 276 | 6.7 × 10⁻³ | MW |
| A5 | 834340.8 | 49 | 325.6 | −66.6 | 310 | 1.4 × 10⁻⁸ | AAU |
| A6 | 834375.1 | 49 | 394.2 | 2.0 | 232 | 0.93 | SN |
| A7 | 834367.9 | 49 | 379.8 | −12.4 | 238 | 0.83 | CN |
| A8 | 834175.8 | 55 | 7.6 | −384.6 | 315 | 9.4 × 10⁻¹⁰ | Full Asymmetric Model |
| A9 | 834175.8 | 52 | 1.6 | −390.6 | 315 | 9.4 × 10⁻¹⁰ | tRNA, AAU, SN, CN |
| A10 | 834176.0 | 51 | 0.0 | −392.2 | 315 | 9.4 × 10⁻¹⁰ | tRNA, AAU, SN |
| A11 | 834184.7 | 51 | 17.4 | −374.8 | 318 | 1.7 × 10⁻¹⁰ | tRNA, AAU, CN |
| A12 | 834189.8 | 50 | 25.6 | −366.6 | 324 | 4.2 × 10⁻¹² | tRNA, AAU |
| A13 | 834216.6 | 50 | 79.2 | −313.0 | 299 | 2.7 × 10⁻⁶ | tRNA, SN |
| A14 | 834340.8 | 50 | 327.6 | −64.6 | 309 | 2.4 × 10⁻⁸ | AAU, SN |
| A15 | 834211.4 | 52 | 72.8 | −319.4 | 314 | 1.6 × 10⁻⁹ | tRNA, ATP, MW, SN |