Teresa Krick, Nina Verstraete, Leonardo G Alonso, David A Shub, Diego U Ferreiro, Michael Shub, Ignacio E Sánchez.
Abstract
The 20 protein-coding amino acids are found in proteomes with different relative abundances. The most abundant amino acid, leucine, is nearly an order of magnitude more prevalent than the least abundant amino acid, cysteine. Amino acid metabolic costs differ similarly, constraining their incorporation into proteins. On the other hand, a diverse set of protein sequences is necessary to build functional proteomes. Here, we present a simple model for a cost-diversity trade-off postulating that natural proteomes minimize amino acid metabolic flux while maximizing sequence entropy. The model explains the relative abundances of amino acids across a diverse set of proteomes. We found that the data are remarkably well explained when the cost function accounts for amino acid chemical decay. More than 100 organisms reach comparable solutions to the trade-off by different combinations of proteome cost and sequence diversity. Quantifying the interplay between proteome size and entropy shows that proteomes can get optimally large and diverse.
Keywords: amino acid decay; amino acid metabolism; information theory; maximum entropy; proteomics
Year: 2014 PMID: 25086000 PMCID: PMC4209132 DOI: 10.1093/molbev/msu228
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Pearson’s Correlation Coefficients for Correlation of Amino Acid Relative Abundances with Amino Acid Metabolic Cost and a Model Based on the Genetic Code.
| Model | DS1 | DS1 (no C) | DS2 | DS2 (no C) |
|---|---|---|---|---|
| Cost(ATP) versus abundance | −0.46 | −0.51 | −0.58 | −0.64 |
| Cost(ATP) versus ln(abundance) | −0.52 | −0.64 | −0.62 | −0.75 |
| Cost(ATP/time) versus abundance | −0.72 | −0.68 | −0.80 | −0.76 |
| Cost(ATP/time) versus ln(abundance) | −0.86 | −0.83 | −0.91 | −0.90 |
| Genetic code model versus ln(abundance) | 0.71 | 0.76 | 0.62 | 0.66 |
Note.—The two columns labeled with (no C) are the results of the same calculations excluding the amino acid cysteine.
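The correlations in the table above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the `exclude` parameter, and the value of `m` are hypothetical, and the abundances are synthetic: they are generated from an exponential dependence on cost purely so that the script runs, and they are not proteome data. The five cost values are the decay-corrected costs of A, L, K, C, and W from the amino acid metabolic cost table.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cost_abundance_correlations(cost, abundance, exclude=()):
    """Correlate cost with abundance and with ln(abundance).

    `exclude` drops amino acids (for example cysteine, as in the
    "no C" columns) before the correlation is computed.
    """
    keys = [aa for aa in cost if aa not in exclude]
    c = [cost[aa] for aa in keys]
    a = [abundance[aa] for aa in keys]
    return {
        "abundance": pearson_r(c, a),
        "ln(abundance)": pearson_r(c, [math.log(v) for v in a]),
    }


# Decay-corrected costs, Cost(ATP/time), for five amino acids.
cost = {"A": 12, "L": 55, "K": 242, "C": 741, "W": 892}

# SYNTHETIC abundances (assumption, not measured data): proportional to
# exp(-m * cost) with a hypothetical m, then normalized to sum to 1.
m = 0.003
weights = {aa: math.exp(-m * c) for aa, c in cost.items()}
total = sum(weights.values())
abundance = {aa: w / total for aa, w in weights.items()}

all_aa = cost_abundance_correlations(cost, abundance)
no_cys = cost_abundance_correlations(cost, abundance, exclude=("C",))
print(all_aa)
print(no_cys)
```

With abundances that depend exponentially on cost, the logarithm of the abundance is exactly linear in cost, so the correlation with ln(abundance) is stronger than the correlation with the plain abundance. The table shows the same ordering for the real data sets, although with weaker correlations.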
Correlation of the logarithm of amino acid relative abundances in proteomes with metabolic cost in units of ATP molecules per amino acid molecule (A and D), with metabolic cost in units of ATP molecules per amino acid molecule corrected by amino acid decay (B and E), and with the genetic code model (C and F). (A)–(C) correspond to data set DS1; (D)–(F) correspond to data set DS2. Data points for the amino acid cysteine are shown as empty symbols; the rest of the amino acids are shown as black symbols. The lines are reduced major axis (RMA) regressions to all data points.
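For readers unfamiliar with RMA regression (commonly expanded as reduced major axis regression), the following is a minimal sketch of the standard textbook estimator: the slope is the ratio of the standard deviations of y and x, signed like their covariance, and the line passes through the means. The function name is hypothetical and the check data are synthetic points on a known line, not values from the figure.

```python
import math


def rma_regression(xs, ys):
    """Reduced major axis fit: returns (slope, intercept).

    slope = sign(cov(x, y)) * sd(y) / sd(x)
    intercept = mean(y) - slope * mean(x)
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    intercept = my - slope * mx
    return slope, intercept


# SYNTHETIC check: points placed exactly on y = -2 - 0.003 * x,
# so the fit should recover that slope and intercept.
xs = [12.0, 55.0, 242.0, 741.0, 892.0]
ys = [-2.0 - 0.003 * x for x in xs]
slope, intercept = rma_regression(xs, ys)
print(slope, intercept)
```

Unlike ordinary least squares, this estimator treats x and y symmetrically, which is a common reason to choose it when both variables carry uncertainty.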
Amino Acid Metabolic Cost.
| Amino Acid | Cost(ATP) | Decay(1/time) | Cost(ATP/time) |
|---|---|---|---|
| A | 11.7 | 1 | 12 |
| C | 24.7 | 30 | 741 |
| D | 12.7 | 9 | 114 |
| E | 15.3 | 5 | 77 |
| F | 52 | 4 | 208 |
| G | 11.7 | 1 | 12 |
| H | 38.3 | 14 | 536 |
| I | 32.3 | 2 | 65 |
| K | 30.3 | 8 | 242 |
| L | 27.3 | 2 | 55 |
| M | 34.3 | 13 | 446 |
| N | 14.7 | 10 | 147 |
| P | 20.3 | 3 | 61 |
| Q | 16.3 | 8 | 130 |
| R | 27.3 | 4 | 109 |
| S | 11.7 | 6 | 70 |
| T | 18.7 | 6 | 112 |
| V | 23.3 | 2 | 47 |
| W | 74.3 | 12 | 892 |
| Y | 50 | 7 | 350 |
Note.—Costs in units of ATP molecules per amino acid molecule are from Akashi and Gojobori (2002), costs in units of ATP molecules per amino acid molecule corrected by amino acid decay are from this work. The estimation of amino acid reactivity and decay rates (in relative units) is described in the supplementary text, Supplementary Material online.
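The tabulated values are consistent with the decay-corrected cost being the product of the cost in ATP and the relative decay rate, rounded to the nearest integer. That relationship is inferred here from the numbers in the table; the rounding rule (half up) and all names in the sketch below are assumptions. Decimal arithmetic is used so that 15.3 × 5 = 76.5 rounds to 77 rather than being affected by binary floating point or by round-half-to-even.

```python
from decimal import Decimal, ROUND_HALF_UP

# amino acid: (cost in ATP, relative decay rate, tabulated Cost(ATP/time))
TABLE = {
    "A": ("11.7", 1, 12),  "C": ("24.7", 30, 741),
    "D": ("12.7", 9, 114), "E": ("15.3", 5, 77),
    "F": ("52", 4, 208),   "G": ("11.7", 1, 12),
    "H": ("38.3", 14, 536), "I": ("32.3", 2, 65),
    "K": ("30.3", 8, 242), "L": ("27.3", 2, 55),
    "M": ("34.3", 13, 446), "N": ("14.7", 10, 147),
    "P": ("20.3", 3, 61),  "Q": ("16.3", 8, 130),
    "R": ("27.3", 4, 109), "S": ("11.7", 6, 70),
    "T": ("18.7", 6, 112), "V": ("23.3", 2, 47),
    "W": ("74.3", 12, 892), "Y": ("50", 7, 350),
}


def decay_corrected_cost(cost_atp, decay):
    """Cost(ATP) times decay rate, rounded half up (assumed rule)."""
    product = Decimal(cost_atp) * decay
    return int(product.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


recomputed = {aa: decay_corrected_cost(c, d) for aa, (c, d, _) in TABLE.items()}
mismatches = {aa for aa, (_, _, t) in TABLE.items() if recomputed[aa] != t}
print(mismatches or "all 20 tabulated values reproduced")
```

The decay correction reorders the amino acids considerably: cysteine has a moderate cost in ATP (24.7) but the second-highest decay-corrected cost (741) because of its high relative decay rate.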
Correlation of amino acid relative abundances in proteomes with metabolic cost in units of ATP molecules per amino acid molecule (black line: plain abundances; blue line: logarithm of the abundances), with metabolic cost in units of ATP molecules per amino acid molecule corrected by amino acid decay (red line: plain abundances; green line: logarithm of the abundances), and with the genetic code model (dashed line). (A) corresponds to data set DS1; (B) corresponds to data set DS2. The data are shown as a function of genomic GC content on the x axis.
Trade-off between amino acid metabolic cost and proteome sequence diversity. (A) Genomic GC content dependence of the average metabolic cost per amino acid. (B) Genomic GC content dependence of the proteome entropy. (C) Genomic GC content dependence of the target function f. (D) Trade-off between amino acid metabolic cost (x axis) and proteome sequence diversity measured as entropy (y axis). The contour lines indicate the value of the target function, and the triangles correspond to the trade-off model using the values of m for DS1 and DS2 from figure 1B and E. All panels display the 107 organisms in data set DS1 (white symbols), the 17 organisms in data set DS2 (black symbols), and the genetic code model (red symbols). (D) includes genomic GC contents between 0.15 (lower right corner) and 0.75 (lower left corner). The y axis legend to the right of (B) and (D) illustrates the number of probable peptide chains of length 100 corresponding to each value of the entropy h (Shannon 1948; Shannon and Weaver 1949).
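The quantities named in this caption can be sketched numerically. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes the entropy is measured in nats per residue, that the target function has the form f = h − m × (average cost), and that the number of probable chains of length N follows Shannon's typical-sequence estimate exp(N × h). Under the assumed form of f, the maximizing frequencies are proportional to exp(−m × cost). The value of `m` and all function names are hypothetical; the cost vector is the Cost(ATP/time) column of the cost table, in table order (A to Y).

```python
import math

# Cost(ATP/time) for A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y.
COST = [12, 741, 114, 77, 208, 12, 536, 65, 242, 55,
        446, 147, 61, 130, 109, 70, 112, 47, 892, 350]


def entropy(p):
    """Shannon entropy in nats per residue."""
    return -sum(q * math.log(q) for q in p if q > 0)


def mean_cost(p, cost):
    """Average metabolic cost per amino acid for frequencies p."""
    return sum(q * c for q, c in zip(p, cost))


def target(p, cost, m):
    """Assumed target function: entropy minus m times average cost."""
    return entropy(p) - m * mean_cost(p, cost)


def max_entropy_frequencies(cost, m):
    """Frequencies maximizing the assumed target: p_i ~ exp(-m * c_i)."""
    weights = [math.exp(-m * c) for c in cost]
    z = sum(weights)
    return [w / z for w in weights]


def log10_probable_chains(h, length=100):
    """log10 of exp(length * h), the typical-sequence count estimate."""
    return length * h / math.log(10)


m = 0.005  # hypothetical trade-off parameter, chosen only for illustration
uniform = [1 / 20] * 20
p = max_entropy_frequencies(COST, m)

for name, freqs in (("uniform", uniform), ("trade-off", p)):
    h = entropy(freqs)
    print(name, round(mean_cost(freqs, COST), 1), round(h, 3),
          round(log10_probable_chains(h), 1))
```

In this sketch the trade-off frequencies lower both the average cost and the entropy relative to the uniform distribution, while giving a value of the assumed target function at least as high as the uniform one.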