Abstract
Whether particular amino acids are favored by selection at high temperatures over others has long been an open question in protein evolution. One way to approach this question is to compare homologous sites in proteins from one thermophile and a closely related mesophile; asymmetrical substitution patterns have been taken as evidence for selection favoring certain amino acids over others. However, most pairs of prokaryotic species that differ in optimum temperature also differ in genome-wide GC content, and amino acid content is known to be associated with GC content. Here, I compare homologous sites in nine thermophilic prokaryotes and their mesophilic relatives, all with complete published genome sequences. After adjusting for the effects of differing GC content with logistic regression, 139 of the 190 pairs of amino acids show significant substitutional asymmetry, evidence of widespread adaptive amino acid substitution. The patterns are fairly consistent across the nine pairs of species (after taking the effects of differing GC content into account), suggesting that much of the asymmetry results from adaptation to temperature. Some amino acids in some species pairs deviate from the overall pattern in ways indicating that adaptation to other environmental or physiological differences between the species may also play a role. The property that is best correlated with the patterns of substitutional asymmetry is transfer free energy, a measure of hydrophobicity, with more hydrophobic amino acids favored at higher temperatures. The correlation of asymmetry and hydrophobicity is fairly weak, suggesting that other properties may also be important.
Year: 2010 PMID: 20624731 PMCID: PMC2997543 DOI: 10.1093/gbe/evq017
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Species Pairs Used in This Study
| Species | Topt (°C) | GC (%) | Genome Reference |
| | 33 | 43.8 | |
| | 55 | 39.7 | |
| | 26 | 70.7 | |
| | 50–55 | 67.5 | |
| | 35–40 | 33.1 | |
| | 85 | 31.4 | |
| | 30–37 | 67.0 | |
| | 68 | 69.4 | |
| | 37 | 47.4 | |
| | 55 | 53.0 | |
| | 26 | 47.7 | |
| | 55 | 53.9 | |
| | 25–35 | 43.5 | |
| | 60 | 52.1 | |
| | 37 | 28.7 | |
| | 75 | 37.6 | |
| | 36–40 | 27.6 | |
| | 65–70 | 49.5 | |
NOTE.—GC, GC content of the major chromosome (excluding plasmids and extrachromosomal elements). Topt and GC from the NCBI Genome Project database, except Topt for Sulfurovum and Nitratiruptor (Nakagawa et al. 2007); Desulfitobacterium (Suyama et al. 2001), Geobacillus (Takami et al. 2004), and Synechocystis (growth temperature recommended by the American Type Culture Collection). Topt, optimum growth temperature.
Figure: Example of logistic regression of substitutional asymmetry and difference in GC content. GCtherm − GCmeso, the percent difference in GC content between the thermophile and the mesophile in each species pair. Hmeso → Ythermo, the proportion of sites in each species pair that have histidine in the mesophile and tyrosine in the thermophile, as a proportion of all aligned sites that have histidine in one species and tyrosine in the other. Error bars are 95% confidence intervals of the binomial proportion. The solid line is the logistic regression line, given by solving ln[Y/(1 − Y)] = a + bX for Y, where Y is Hmeso → Ythermo, X is GCtherm − GCmeso, a is the intercept, and b is the slope. The dashed line shows the estimation of the expected asymmetry in a species pair with zero difference in GC content.
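The regression equation in this caption can be solved for Y to give the fitted curve directly. The Python sketch below is an illustration under stated assumptions, not the author's code: the function names and the slope are hypothetical, and the intercept is back-calculated so that the curve passes through 0.632, the histidine/tyrosine value tabulated for zero GC difference.

```python
import math

def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def predicted_asymmetry(a, b, x):
    """Solve ln[Y / (1 - Y)] = a + b*x for Y (the inverse logit)."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

# Assumption: the intercept is back-calculated from the tabulated H/Y value
# (0.632); the slope is a made-up number used only to trace a curve.
a = logit(0.632)
b = 0.05  # hypothetical slope per percentage point of GC difference

# At X = 0 (no GC difference) the slope term vanishes and Y depends on a alone.
y_at_zero = predicted_asymmetry(a, b, 0.0)

# A few points along the hypothetical curve, as (X, Y) tuples.
curve = [(x, predicted_asymmetry(a, b, x)) for x in (-10, -5, 0, 5, 10)]
```

Evaluating the curve at X = 0 corresponds to the dashed line in the figure: the expected asymmetry once the GC difference is removed is the inverse logit of the intercept.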
The Substitutional Asymmetry Predicted for a Mesophile–Thermophile Pair with No Difference in GC Content, Based on the Intercept of the Logistic Regression of Asymmetry Versus Difference in GC Content
| SN | 0.508 | DG | 0.509 | GC | 0.529 | HK | 0.517 | KP | 0.558* |
| SD | 0.521* | DQ | 0.558* | GV | 0.553* | HC | 0.537 | KY | 0.547* |
| ST | 0.542* | DM | 0.565* | GI | 0.546* | HV | 0.554* | CV | 0.579* |
| SG | 0.482* | DH | 0.578* | GF | 0.564* | HI | 0.578* | CI | 0.504 |
| SQ | 0.561* | DE | 0.574* | GL | 0.589* | HF | 0.581* | CF | 0.526 |
| SM | 0.569* | DA | 0.547* | GR | 0.608* | HL | 0.554* | CL | 0.539* |
| SH | 0.562* | DK | 0.561* | GW | 0.555 | HR | 0.559* | CR | 0.446* |
| SE | 0.582* | DC | 0.576* | GP | 0.601* | HW | 0.598* | CW | 0.487 |
| SA | 0.593* | DV | 0.565* | GY | 0.603* | HP | 0.591* | CP | 0.492 |
| SK | 0.603* | DI | 0.538 | QM | 0.536* | HY | 0.632* | CY | 0.520 |
| SC | 0.590* | DF | 0.619* | QH | 0.556* | EA | 0.479* | VI | 0.507* |
| SV | 0.610* | DL | 0.571* | QE | 0.518* | EK | 0.513* | VF | 0.502 |
| SI | 0.609* | DR | 0.622* | QA | 0.511 | EC | 0.554 | VL | 0.511* |
| SF | 0.607* | DW | 0.693* | QK | 0.516* | EV | 0.505 | VR | 0.555* |
| SL | 0.610* | DP | 0.618* | QC | 0.598* | EI | 0.515 | VW | 0.512 |
| SR | 0.624* | DY | 0.653* | QV | 0.538* | EF | 0.542* | VP | 0.540* |
| SW | 0.641* | TG | 0.460* | QI | 0.584* | EL | 0.518* | VY | 0.538* |
| SP | 0.604* | TQ | 0.521* | QF | 0.588* | ER | 0.550* | IF | 0.490 |
| SY | 0.676* | TM | 0.511 | QL | 0.581* | EW | 0.571* | IL | 0.522* |
| ND | 0.500 | TH | 0.554* | QR | 0.579* | EP | 0.566* | IR | 0.536* |
| NT | 0.546* | TE | 0.553* | QW | 0.631* | EY | 0.574* | IW | 0.506 |
| NG | 0.502 | TA | 0.545* | QP | 0.610* | AK | 0.520* | IP | 0.521 |
| NQ | 0.549* | TK | 0.563* | QY | 0.644* | AC | 0.453* | IY | 0.529* |
| NM | 0.576* | TC | 0.523 | MH | 0.500 | AV | 0.522* | FL | 0.512* |
| NH | 0.611* | TV | 0.595* | ME | 0.522 | AI | 0.500 | FR | 0.526 |
| NE | 0.545* | TI | 0.607* | MA | 0.517 | AF | 0.526* | FW | 0.498 |
| NA | 0.552* | TF | 0.580* | MK | 0.540* | AL | 0.536* | FP | 0.517 |
| NK | 0.587* | TL | 0.591* | MC | 0.537 | AR | 0.571* | FY | 0.500 |
| NC | 0.594* | TR | 0.607* | MV | 0.574* | AW | 0.525 | LR | 0.513 |
| NV | 0.629* | TW | 0.604* | MI | 0.583* | AP | 0.605* | LW | 0.515 |
| NI | 0.612* | TP | 0.611* | MF | 0.596* | AY | 0.546* | LP | 0.505 |
| NF | 0.593* | TY | 0.619* | ML | 0.607* | KC | 0.532 | LY | 0.508 |
| NL | 0.626* | GQ | 0.529* | MR | 0.556* | KV | 0.487 | RW | 0.576* |
| NR | 0.650* | GM | 0.514 | MW | 0.569* | KI | 0.503 | RP | 0.503 |
| NW | 0.624* | GH | 0.548* | MP | 0.628* | KF | 0.541* | RY | 0.549* |
| NP | 0.604* | GE | 0.544* | MY | 0.617* | KL | 0.501 | WP | 0.551 |
| NY | 0.685* | GA | 0.561* | HE | 0.493 | KR | 0.599* | WY | 0.495 |
| DT | 0.508 | GK | 0.543* | HA | 0.494 | KW | 0.623* | PY | 0.482 |
NOTE.—The number is the predicted proportion of sites with the first amino acid in the mesophile and the second amino acid in the thermophile; an asterisk indicates that the proportion is significantly different from 0.50 (P < 0.05). Amino acids are ordered from least preferred (serine, S) to most preferred (tyrosine, Y) in thermophiles.
Average Asymmetry and Transfer Free Energy for Each Amino Acid
| Amino Acid | Average Asymmetry | Transfer Free Energy |
| Serine (S) | 0.416 | 0.04 |
| Asparagine (N) | 0.417 | −0.01 |
| Aspartic acid (D) | 0.430 | 0.54 |
| Threonine (T) | 0.450 | 0.44 |
| Glycine (G) | 0.451 | 0.00 |
| Glutamine (Q) | 0.459 | −0.10 |
| Methionine (M) | 0.470 | 1.30 |
| Histidine (H) | 0.485 | 1.10 |
| Glutamic acid (E) | 0.497 | 0.55 |
| Alanine (A) | 0.500 | 0.73 |
| Lysine (K) | 0.504 | 1.50 |
| Cysteine (C) | 0.523 | 0.70 |
| Valine (V) | 0.529 | 1.69 |
| Isoleucine (I) | 0.531 | 2.97 |
| Phenylalanine (F) | 0.542 | 2.65 |
| Leucine (L) | 0.544 | 2.49 |
| Arginine (R) | 0.551 | 0.73 |
| Tryptophan (W) | 0.562 | 3.00 |
| Proline (P) | 0.565 | 2.60 |
| Tyrosine (Y) | 0.575 | 2.97 |
NOTE.—Average asymmetry is the predicted proportion, in a pair of species with equal GC contents, of substitutions from other amino acids in the mesophile to the given amino acid in the thermophile. Transfer free energy is from Simon (1976). Amino acids are ordered from least preferred (serine) to most preferred (tyrosine) in thermophiles.
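The link between the pairwise table and the per-amino-acid averages can be checked by hand. The sketch below assumes that the average asymmetry is an unweighted mean over the 19 pairs involving an amino acid; this is an assumption, since the source does not state the weighting, although it does reproduce the tabulated values for serine and tyrosine. The variable names are hypothetical; the numbers are copied from the pairwise table.

```python
# Each value is the predicted proportion of sites with the FIRST amino acid
# in the mesophile and the SECOND amino acid in the thermophile.
serine_first = {
    "SN": 0.508, "SD": 0.521, "ST": 0.542, "SG": 0.482, "SQ": 0.561,
    "SM": 0.569, "SH": 0.562, "SE": 0.582, "SA": 0.593, "SK": 0.603,
    "SC": 0.590, "SV": 0.610, "SI": 0.609, "SF": 0.607, "SL": 0.610,
    "SR": 0.624, "SW": 0.641, "SP": 0.604, "SY": 0.676,
}
tyrosine_second = {
    "SY": 0.676, "NY": 0.685, "DY": 0.653, "TY": 0.619, "GY": 0.603,
    "QY": 0.644, "MY": 0.617, "HY": 0.632, "EY": 0.574, "AY": 0.546,
    "KY": 0.547, "CY": 0.520, "VY": 0.538, "IY": 0.529, "FY": 0.500,
    "LY": 0.508, "RY": 0.549, "WY": 0.495, "PY": 0.482,
}

def mean(values):
    """Unweighted arithmetic mean (assumed averaging scheme)."""
    values = list(values)
    return sum(values) / len(values)

# Serine is the first letter of every pair, so the share of sites with
# serine in the THERMOPHILE is the complement, 1 - p.
avg_serine = mean(1.0 - p for p in serine_first.values())

# Tyrosine is the second letter of every pair, so p already describes
# substitutions toward tyrosine in the thermophile.
avg_tyrosine = mean(tyrosine_second.values())

print(f"{avg_serine:.3f} {avg_tyrosine:.3f}")  # prints "0.416 0.575"
```

Both results agree with the table: 0.416 for serine, the least preferred amino acid in thermophiles, and 0.575 for tyrosine, the most preferred.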
Figure: Mean of the 19 residuals (differences between the observed number of substitutions and that expected from the logistic regression) for each amino acid in each species pair. Values above 0 indicate that sites with that amino acid in the thermophile and other amino acids in the mesophile are more common than expected from the logistic regression of all species. Error bars are 95% confidence intervals.
Figure: Substitutional asymmetry (proportion of all A ↔ B sites that have A in the mesophile and B in the thermophile) versus the difference in transfer free energy of the amino acids (B − A), where B is the amino acid with greater transfer free energy.
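The axes described in this caption require each pair to be oriented so that B is the more hydrophobic amino acid. The sketch below shows one way to derive such points from the two tables; the function name `orient` is hypothetical, and the handling of ties (equal transfer free energies keep the tabulated order) is an assumption, since the source does not say how ties were treated.

```python
# Transfer free energies copied from the per-amino-acid table (Simon 1976).
TFE = {
    "S": 0.04, "N": -0.01, "D": 0.54, "T": 0.44, "G": 0.00,
    "Q": -0.10, "M": 1.30, "H": 1.10, "E": 0.55, "A": 0.73,
    "K": 1.50, "C": 0.70, "V": 1.69, "I": 2.97, "F": 2.65,
    "L": 2.49, "R": 0.73, "W": 3.00, "P": 2.60, "Y": 2.97,
}

def orient(pair, p):
    """Return (TFE[B] - TFE[A], proportion with A in mesophile, B in thermophile).

    `pair` is a two-letter code as in the pairwise table and `p` is the
    tabulated proportion with the first letter in the mesophile and the
    second in the thermophile. B is the amino acid with greater transfer
    free energy; ties keep the tabulated order (an assumption).
    """
    first, second = pair
    if TFE[second] >= TFE[first]:
        return TFE[second] - TFE[first], p
    # The first letter is more hydrophobic: swap roles and take the complement.
    return TFE[first] - TFE[second], 1.0 - p

# Two tabulated pairs: HY needs no swap, DT does (D has the greater value).
hy_point = orient("HY", 0.632)
dt_point = orient("DT", 0.508)
```

Here `hy_point` is approximately (1.87, 0.632) and `dt_point` is approximately (0.10, 0.492). Applying the same function to every tabulated pair gives one point per pair on the axes the caption describes.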