Abstract
The sequences of proteins encoded by a genome evolve at different rates. A correlate of a protein's evolutionary rate is its expression level: highly expressed proteins tend to evolve slowly. Some explanations of rate variation and the correlation between rate and expression predict that more slowly evolving and more highly expressed proteins have more favorable equilibrium constants for folding. Proteins from thermophiles generally have more stable folds than proteins from mesophiles, and it is known that there are systematic differences in amino acid content between thermophilic and mesophilic proteins. I examined whether there are analogous correlations of amino acid frequencies with evolutionary rate and expression level within genomes. In most of the organisms analyzed, there is a striking tendency for more slowly evolving proteins to be more thermophile-like in their amino acid compositions when adjustments are made for variation in GC content. More highly expressed proteins also tend to be more thermophile-like by the same criteria. These results suggest that part of the evolutionary rate variation among proteins is due to variation in the strength of selection for stability of the folded state. They also suggest that increasing strength of this selective force with expression level plays a role in the correlation between evolutionary rate and expression level.
Year: 2009 PMID: 19910385 PMCID: PMC2822289 DOI: 10.1093/molbev/msp270
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Figure: The relationship between amino acid frequency and evolutionary rate among human proteins. The curve for each amino acid conveys how its frequency varies with protein sequence distance when GC content is taken into account. Each curve was produced by smoothing the data with a Gaussian kernel with an SD of 1/ln(10). The raw data points were the residuals of cubic polynomial fits of amino acid frequencies and the logarithm (base 10) of sequence distance to GC fraction. (A) Amino acids that are overrepresented in thermophiles. (B) Amino acids that are underrepresented in thermophiles. (C) Other amino acids.
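The smoothing step in the caption can be sketched as a Gaussian-weighted running mean (a Nadaraya-Watson estimate). This is an illustration only: the function name, the evaluation grid, and the toy residuals are assumptions, and the residuals from the cubic fits to GC fraction are taken as already computed. An SD of 1/ln(10) on a log10 axis corresponds to one unit on a natural-log axis.

```python
import math

def gaussian_kernel_smooth(x, y, grid, sd=1.0 / math.log(10)):
    """Gaussian-weighted mean of y, evaluated at each point of grid.

    x and y are paired observations; sd is the kernel SD in units of x.
    """
    smoothed = []
    for g in grid:
        # Weight every observation by a Gaussian centered on the grid point.
        weights = [math.exp(-0.5 * ((xi - g) / sd) ** 2) for xi in x]
        total = sum(weights)
        smoothed.append(sum(w * yi for w, yi in zip(weights, y)) / total)
    return smoothed

# Hypothetical toy residuals (not data from the paper):
# x = residual log10 sequence distance, y = residual amino acid frequency.
x = [-1.0, -0.5, 0.0, 0.5, 1.0]
y = [0.02, 0.01, 0.0, -0.01, -0.02]
curve = gaussian_kernel_smooth(x, y, grid=[-0.5, 0.0, 0.5])
```

A falling curve, as in this toy input, would correspond to an amino acid whose GC-adjusted frequency is higher in more slowly evolving proteins.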
Correlation Results for Human Protein Distances, Controlling for GC Content.
| Amino acid | Correlation coefficient | P value | Agreement with prediction |
| A | −0.039 | 0.00027 | Disagrees |
| C | 0.107 | 2.9 × 10^−23 | Agrees |
| D | −0.149 | 1.7 × 10^−43 | |
| E | −0.013 | 0.24 | ns |
| F | 0.003 | 0.75 | |
| G | −0.031 | 0.0039 | |
| H | 0.008 | 0.46 | ns |
| I | −0.111 | 1.6 × 10^−24 | Agrees |
| K | −0.011 | 0.33 | ns |
| L | 0.053 | 9.5 × 10^−07 | |
| M | −0.087 | 9.8 × 10^−16 | |
| N | −0.095 | 2.5 × 10^−18 | |
| P | 0.052 | 1.6 × 10^−06 | |
| Q | 0.057 | 1.2 × 10^−07 | Agrees |
| R | 0.077 | 1.3 × 10^−12 | |
| S | 0.016 | 0.13 | |
| T | 0.033 | 0.0027 | Agrees |
| V | −0.069 | 1.6 × 10^−10 | Agrees |
| W | 0.127 | 5.8 × 10^−32 | Agrees |
| Y | −0.111 | 7.5 × 10^−25 | Agrees |
NOTE.—ns, not significant.
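One common way to compute a correlation "controlling for" a third variable is the first-order partial correlation, sketched below. The record does not say which estimator was used for this table (for example, Pearson versus rank-based), so the Pearson form here is an assumption, as are the function names and the toy data.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical toy data (not from the paper): x and y are related
# only through their shared dependence on z (standing in for GC content).
z = [1, 1, 1, 1, -1, -1, -1, -1]
x = [2, 0, 2, 0, 0, -2, 0, -2]
y = [2, 2, 0, 0, 0, 0, -2, -2]
raw = pearson(x, y)               # positive, driven entirely by z
adjusted = partial_corr(x, y, z)  # close to zero once z is controlled for
```

In this toy case the raw correlation is 0.5 while the adjusted one vanishes, which is the kind of confounding by GC content that the adjustment is meant to remove.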
Correlation Results for Amino Acid Frequencies and Evolutionary Rate.
| Number of genes | Protein evolutionary rate | ||||
| Protein distance | |||||
| 8,502 | 7/8 | 7/8 | 9/9 | 3/9 | |
| 5,369 | 6/7 | 6/7 | 7/8 | 2/11 | |
| 5,532 | 10/11 | 10/10 | 11/11 | 6/11 | |
| 3,367 | 7/8 | 6/7 | 6/7 | 3/5 | |
| 1,720 | 9/9 | 9/9 | 8/8 | 9/10 | |
| 4,922 | 5/10 | 5/10 | 6/8 | 3/9 | |
| 878 | 7/7 | 7/7 | 5/5 | 2/2 | |
NOTE.—For each organism and each rate measure, the number of statistically significant correlations that have the predicted sign is given by the numerator and the total number of significant correlations for which a prediction exists is given by the denominator.
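The counting rule in this note can be sketched as follows, using the human protein-distance values from the first table as input. The function name and data layout are assumptions. The predicted signs for the significant rows are inferred from each row's coefficient sign and its agreement entry; the signs given for E, H, and K are placeholders, since those rows are not significant and do not affect the tally.

```python
def tally_agreement(results, predicted_sign, alpha=0.05):
    """Return (agreeing, total) over significant correlations with a prediction.

    results: {amino_acid: (correlation, p_value)}
    predicted_sign: {amino_acid: +1 or -1}; amino acids without a
    prediction are simply absent from this mapping.
    """
    agreeing = total = 0
    for aa, (r, p) in results.items():
        if aa not in predicted_sign or p >= alpha:
            continue  # no prediction, or not statistically significant
        total += 1
        if (r > 0) == (predicted_sign[aa] > 0):
            agreeing += 1
    return agreeing, total

results = {
    "A": (-0.039, 0.00027), "C": (0.107, 2.9e-23), "D": (-0.149, 1.7e-43),
    "E": (-0.013, 0.24), "H": (0.008, 0.46), "I": (-0.111, 1.6e-24),
    "K": (-0.011, 0.33), "Q": (0.057, 1.2e-07), "T": (0.033, 0.0027),
    "V": (-0.069, 1.6e-10), "W": (0.127, 5.8e-32), "Y": (-0.111, 7.5e-25),
}
predicted_sign = {
    "A": +1, "C": +1, "I": -1, "Q": +1, "T": +1, "V": -1, "W": +1, "Y": -1,
    "E": -1, "H": -1, "K": -1,  # placeholders: direction not recoverable here
}
counts = tally_agreement(results, predicted_sign)  # (7, 8)
```

D is skipped because no prediction exists for it, and E, H, and K are skipped because their P values are at or above 0.05, leaving seven agreeing correlations out of eight.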
Figure: Summary of correlation results for dN across organisms. For each correlation, “+” indicates a statistically significant (P < 0.05) positive correlation, “−” indicates a significant negative correlation, and “ns” indicates that the correlation was not statistically significant. Negative correlations, which are predicted for amino acids more common in thermophiles, are also indicated by orange coloring, and positive correlations are indicated by blue coloring.
Correlation Results for Amino Acid Frequencies and Expression Level.
| Number of genes | Correlation results | P value | Discordant amino acid(s) | |
| 8,143 | 6/7 | 0.063 | A | |
| 7,752 | 9/10 | 0.011 | A | |
| 7,791 | 8/9 | 0.020 | A | |
| 5,450 | 9/10 | 0.011 | A | |
| 3,542 | 8/10 | 0.055 | A, Y | |
| 2,820 | 8/9 | 0.020 | A | |
| 2,712 | 8/9 | 0.020 | Y | |
| 7,053 | 5/6 | 0.109 | A | |
| 6,269 | 9/10 | 0.011 | A |
NOTE.—The number of statistically significant correlations that have the predicted sign is shown, along with the total number of significant correlations for which a prediction exists. Discordant amino acids are those with significant correlations in the opposite of the predicted direction.
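The unlabeled numeric column in the table above is numerically consistent with a one-sided sign test on the agreement counts, that is, the probability of at least k agreements out of n when each direction is equally likely. The record does not state this, so the sketch below is an observation about the numbers rather than a description of the author's procedure; the function name is an assumption.

```python
from math import comb

def sign_test_p(k, n):
    """One-sided P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Agreement counts as listed in the table, paired with the tabulated values.
table_rows = [
    ((6, 7), 0.063),
    ((9, 10), 0.011),
    ((8, 9), 0.020),
    ((8, 10), 0.055),
    ((5, 6), 0.109),
]
computed = [(counts, sign_test_p(*counts), listed) for counts, listed in table_rows]
```

Each computed value agrees with the tabulated one to three decimals; for example, 6 of 7 gives exactly 8/128 = 0.0625, listed as 0.063.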