Mingcong Wang, Maxim V Kapralov, Maria Anisimova.
Abstract
BACKGROUND: One of the key forces shaping proteins is the coevolution of amino acid residues. Knowing which residues coevolve in a particular protein may facilitate our understanding of protein evolution, structure and function, and help to identify substitutions that may lead to desired changes in enzyme kinetics. Rubisco, the most abundant enzyme in the biosphere, plays an essential role in carbon fixation through photosynthesis, thus facilitating life on Earth. This makes Rubisco an important model system for studying the dynamics of protein fitness optimization on the evolutionary landscape. In this study we investigated the selective and coevolutionary forces acting on the large subunit of land plant Rubisco, using Markov models of codon substitution and clustering approaches applied to amino acid substitution histories.
Year: 2011 PMID: 21942934 PMCID: PMC3190394 DOI: 10.1186/1471-2148-11-266
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Known interactions of the inferred coevolving residues
| Interactions | Residue no |
|---|---|
| Intradimer (ID) | 15, 63, 64, 106, 109 |
| Dimer-Dimer (DD) | 34, 105 |
| Small Subunit (SSU) | 76, 163, 166, 223, 226, 227, 229, 230, 260 |
| DD and ID | 210 |
| SSU and DD | 219 |
| SSU and ID | 74, 412 |
Residue numbers follow the spinach Rubisco sequence (8RUC); * positively selected site; ** most often positively selected site; underlined residues coevolve with sites under positive selection; interactions are after Knight et al. (1990).
Figure 1. The amino acid composition of residues inferred as coevolving for different biochemical properties: (A) polarity, (B) Grantham distance, (C) charge and (D) volume, shown by the symbols "plus", "triangle", "rhombus" and "cross", respectively. The amino acids are ordered according to their frequency in all RBCL sequences (shown by "circle").
Figure 2. The amino acid composition of all inferred coevolving sites (marked with "rhombus"), compared to all RBCL sequences (marked with "circle"). The amino acids are ordered according to their frequency in all RBCL sequences.
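The comparison plotted in Figures 1 and 2 (composition at a subset of sites against composition over all sequences, with amino acids ordered by overall frequency) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `composition`, the toy sequences and the zero-based site indices are all assumptions, and alignment gaps are not handled.

```python
from collections import Counter

def composition(sequences, sites=None):
    """Return amino acid frequencies over all positions, or only over `sites`.

    `sites` holds zero-based column indices (an assumption of this sketch).
    """
    counts = Counter()
    for seq in sequences:
        residues = seq if sites is None else (seq[i] for i in sites)
        counts.update(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

# Toy aligned sequences (hypothetical, not real RBCL data).
toy_sequences = ["MSPQ", "MSPE", "MAPQ"]

all_freqs = composition(toy_sequences)                  # whole sequences
site_freqs = composition(toy_sequences, sites=[1, 3])   # hypothetical site subset

# Order amino acids by overall frequency (ties broken alphabetically),
# mirroring the x-axis ordering described in the captions.
order = sorted(all_freqs, key=lambda aa: (-all_freqs[aa], aa))

for aa in order:
    print(aa, round(all_freqs[aa], 3), round(site_freqs.get(aa, 0.0), 3))
```

Amino acids absent from the subset are reported with frequency 0, so both series share the same axis.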
Figure 3. The color-coded representation of the coevolution frequency matrix of amino acid pairs inferred as coevolving with respect to different properties: (A) Grantham distance, (B) polarity, (C) volume, (D) charge. The residues are arranged alphabetically.
Figure 4. The color-coded representation of the coevolution frequency matrix for all inferred coevolving pairs. The residues are arranged alphabetically.
Figure 5. The amino acid composition at the positively selected sites ("rhombus"), as inferred with (A) PAML, (B) FitModel, and (C) both PAML and FitModel. The amino acids are ordered according to their frequency in all RBCL sequences (shown by "circle").
Fourteen of the most often positively selected residues of the Rubisco large subunit
| Residue no.¹ | FitModel² | PAML³ | Location of residues | Residues within 9 Å⁴ | Coevolving residue | Interactions |
|---|---|---|---|---|---|---|
| 449 | 26 | 5 | Helix G | 374, 375, 376, 396, 399, 400, 401, 402, 407, 410, 411, 445, 446, 447, 448, 449, 451, 452, 453, 454 | 128, 147 | ID, SSU, DD |
| 225 | 20 | 7 | Helix 2 | 154, 155, 184, 187, 189, 219, 220, 221, 223, 224, 226, 227 | None | DD, SSU, ID |
| 251 | 20 | 4 | Helix 3 | 209, 210, 213, 217, 247, 248, 249, 250, 252, 253 | 258, 261* | DD, ID, SSU |
| 145 | 16 | 7 | Helix D | | 142**, 240 | DD |
| 142 | 14 | 4 | Helix D | 140, 141, 143, 144, 145, 272, 276, 311, 312, 313, 314, 315, 364, 365 | 255**, 240 | DD |
| 95 | 13 | 4 | | | 23, 86**, 326*, 332 | SSU, ID |
| 439 | 12 | 2 | Helix G | 413, 414, 417, 435, 436, 437, 438 | 33*, 152, 151, 153, 135*, 281*, 310*, 440*, 470* | ID |
| 219 | 11 | 4 | Helix 2 | 180, 181, 182, 184, 185, 186, 215, 216, 217, 218, 220, 221, 222, 223, 224, 225, 227 | 121, 388, 423 | DD, SSU, ID |
| 279 | 11 | 1 | Helix 4 | 143, 144, 152, 249, 253, 254, 274, 285, 286, 277, 278, 280, 281 | 301, 346 | DD |
| 328 | 11 | 3 | Loop 6 | 319, 320, 321, 322, 323, 324, 326, 327, 329, 330, 332, 333, 462, 464 | 228*, 281* | AS, ID |
| 375 | 11 | 9 | Strand 7 | 373, 374, 376, 377, 378, 379, 393, 396, 410, 411, 414, 436, 446, 449, 450, 453 | 395, 419* | AS, ID, SSU |
| 255 | 9 | 6 | Helix 3 | 190, 228, 229, 230, 231, 248, 253, 254 | 101, 86**, 167, 149*, 169*, 256*, 320*, 371*, 398 | SSU |
| 28 | 8 | 9 | N-terminus | 26, 27, 29, 30, 76, 91, 94, 128, 129, 130, 131, 132 | 19, 355*, 93* | SSU, ID |
| 86 | 8 | 9 | Strand C | 33, 34, 35, 36, 37, 39, 41, 81, 84, 85, 87, 88 | 149*, 256*, 167, 169*, 222*, 317*, 320*, 371*, 398, 23, 30* | DD |
1 Residue numbers follow the spinach Rubisco sequence (8RUC)
2 Number of groups with detected positively selected residues in FitModel
3 Number of groups with detected positively selected residues in PAML
4 * positively selected site; ** most often positively selected site; underlined residues are both within 9 Å and coevolve
Figure 6. Proportions of sites in different secondary structures: (A) color-coded bars from dark to light blue correspond, respectively, to the residues coevolving for Grantham distance, volume and charge, all coevolving sites, and the whole RBCL sequence; (B) color-coded bars from dark to light blue represent, respectively, the positively selected sites detected by PAML, by FitModel and by both, and the whole sequence.
Physical distances between residue pairs
| Species | Rubisco state | PDB | 3D-distance: mean | 3D-distance: median | 3D-distance: (label missing) | Coevolving sites distance: mean | Coevolving sites distance: median | Coevolving sites distance: p-value# | Coevolving sites distance: Difference* | Positively selected sites distance: mean | Positively selected sites distance: median | Positively selected sites distance: p-value# | Positively selected sites distance: Difference* | Positively selected site/active site distance: mean | Positively selected site/active site distance: median | Positively selected site/active site distance: p-value# | Positively selected site/active site distance: Difference* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Nicotiana tabacum | Activated | | 31.38 | 29.66 | 14.34 | 28.14 | 26.66 | 6.7E-9 | 3.24 | 33.76 | 31.97 | 0.057 | 2.38 | 28.50 | 26.23 | 0.144 | 2.88 |
| | Unactivated | | 31.44 | 29.86 | 14.24 | 27.5 | 25.6 | 5.6E-10 | 3.94 | 33.01 | 32.40 | 0.146 | 1.57 | 27.97 | 25.28 | 0.099 | 3.47 |
| | Average minimum | | 28.76 | 27.13 | 13.58 | 25.27 | 22.77 | 1.8E-8 | 3.50 | 31.55 | 30.27 | 0.025 | 2.78 | 26.18 | 24.02 | 0.156 | 2.59 |
| Spinacia oleracea | Activated | | 31.48 | 29.77 | 14.37 | 28.21 | 26.56 | 1.1E-8 | 3.27 | 34.15 | 32.08 | 0.038 | 2.62 | 28.74 | 26.38 | 0.156 | 2.74 |
| | Unactivated | | 31.24 | 29.54 | 14.26 | 27.97 | 26.23 | 8.5E-9 | 3.27 | 33.97 | 32.12 | 0.034 | 2.73 | 28.51 | 26.09 | 0.155 | 2.73 |
| | Average minimum | | 31.22 | 29.53 | 14.26 | 27.96 | 26.23 | 9.3E-9 | 3.26 | 33.94 | 32.08 | 0.034 | 2.72 | 28.50 | 26.09 | 0.156 | 2.72 |
* Absolute difference between the mean 3D distance and the mean of the corresponding distance.
# p-values calculated from a one-sample Z-test and, in parentheses, from the Wilcoxon rank-sum test.
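The two footnoted quantities can be illustrated with a textbook one-sample Z-test, which compares the mean distance of a subset of residue pairs with the mean over all pairs. This is a hedged sketch rather than the authors' procedure: the function name `one_sample_z_test`, the two-sided alternative, the use of the all-pairs standard deviation as the known population value, and the toy numbers are all assumptions.

```python
import math
from statistics import fmean

def one_sample_z_test(sample, pop_mean, pop_sd):
    """Two-sided one-sample Z-test of `sample` against a known population.

    Returns (sample mean, z statistic, p-value).
    """
    n = len(sample)
    sample_mean = fmean(sample)
    z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return sample_mean, z, p_value

# Toy distances in angstroms (hypothetical, not taken from the table).
subset_distances = [20.0, 25.0, 30.0, 25.0]
all_pairs_mean, all_pairs_sd = 30.0, 10.0

subset_mean, z, p_value = one_sample_z_test(
    subset_distances, all_pairs_mean, all_pairs_sd
)
difference = abs(all_pairs_mean - subset_mean)  # the "Difference" column
print(subset_mean, z, p_value, difference)
```

With thousands of residue pairs in the subset, even a difference of a few angstroms gives a very small p-value, because the standard error shrinks with the square root of the sample size.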
Figure 7. The 3D distance distribution of the coevolving residue pairs compared to all residue pairs in Rubisco. For each residue pair, the distance shown is the minimum of the pairwise distances in the activated and unactivated states of Rubisco (based on the Spinacia oleracea structure 8RUC). The solid line is the distribution of pairwise distances for coevolving residues. The dashed line is the distribution of pairwise distances for all protein residues.
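The minimum-distance rule in this caption can be sketched as below: each residue pair is assigned the smaller of its distances in the two states. The function name `min_pair_distances`, the toy coordinates and the use of a single representative point per residue are assumptions of this sketch; the caption does not say which atom represents a residue.

```python
import math
from itertools import combinations

def min_pair_distances(coords_state_a, coords_state_b):
    """Map each residue pair to the smaller of its distances in two states.

    Each argument maps a residue number to one (x, y, z) point in angstroms.
    Only residues present in both states are compared.
    """
    shared = sorted(set(coords_state_a) & set(coords_state_b))
    return {
        (i, j): min(
            math.dist(coords_state_a[i], coords_state_a[j]),
            math.dist(coords_state_b[i], coords_state_b[j]),
        )
        for i, j in combinations(shared, 2)
    }

# Toy coordinates (hypothetical, not taken from any PDB entry).
activated = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (0.0, 0.0, 10.0)}
unactivated = {1: (0.0, 0.0, 0.0), 2: (6.0, 8.0, 0.0), 3: (0.0, 0.0, 2.0)}

min_distances = min_pair_distances(activated, unactivated)
for pair, distance in min_distances.items():
    print(pair, round(distance, 2))
```

Restricting the resulting dictionary to the coevolving pairs and to all pairs would give the two distributions compared in the figure.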