Matthew N Davies, Christopher P Toseland, David S Moss, Darren R Flower.
Abstract
BACKGROUND: pKa values are a measure of the protonation of ionizable groups in proteins. Ionizable groups are involved in intra-protein, protein-solvent and protein-ligand interactions as well as solubility, protein folding and catalytic activity. The pKa shift of a group from its intrinsic value is determined by the perturbation of the residue by the environment and can be calculated from three-dimensional structural data.
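To make the link between a pKa value and protonation concrete, the short sketch below applies the standard Henderson-Hasselbalch relation for a single, non-interacting titratable group. The relation is textbook chemistry rather than something stated in this record, and the function name is illustrative only.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Fraction of an isolated titratable group that holds its proton at a given pH.

    Assumes a single site with no coupling to other groups
    (Henderson-Hasselbalch behaviour).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# At pH == pKa the group is exactly half protonated.
print(protonated_fraction(7.0, 7.0))  # 0.5
```

A group whose pKa is shifted upward by its environment therefore stays protonated over a wider pH range than the intrinsic value would suggest.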
Year: 2006 PMID: 16749919 PMCID: PMC1513386 DOI: 10.1186/1471-2091-7-18
Source DB: PubMed Journal: BMC Biochem ISSN: 1471-2091 Impact factor: 4.059
Model pKa values for all protein basic and acidic titratable groups. See reference 5.
| Mean pK | ||||
| N-Termini | 7.5 | - | - | - |
| C-Termini | 3.8 | - | - | - |
| Arg | 12 | 1 | 1 | - |
| Asp | 4 | 143 | 112 | 3.5 |
| Cys | 9.5 | 11 | 4 | 6.6 |
| Glu | 4.4 | 126 | 105 | 4.3 |
| His | 6.3 | 130 | 24 | 6.4 |
| Lys | 10.4 | 57 | 23 | 9.6 |
| Tyr | 10 | 26 | 16 | 9.5 |
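The pKa shift mentioned in the abstract can be sketched as the difference between a site's pKa and the model value for its group type. The snippet assumes that the second column of the table above holds the model pKa (the column headers did not survive extraction); the function name and the example value are hypothetical.

```python
# Model pKa values copied from the second column of the table above
# (assumed to be the model value for each titratable group).
MODEL_PKA = {
    "N-Termini": 7.5,
    "C-Termini": 3.8,
    "Arg": 12.0,
    "Asp": 4.0,
    "Cys": 9.5,
    "Glu": 4.4,
    "His": 6.3,
    "Lys": 10.4,
    "Tyr": 10.0,
}


def pka_shift(group: str, pka: float) -> float:
    """Shift of a site's pKa relative to the model value for its group type."""
    return pka - MODEL_PKA[group]


# A hypothetical Glu site with a pKa of 6.7 is shifted upward by 2.3 units.
print(round(pka_shift("Glu", 6.7), 2))  # 2.3
```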
Overview of the prediction accuracy of the Large Dataset (404 Residues) I. The table shows the RMSD values for each of the residues from the whole dataset and the dataset following removal of non-physical values and of all outliers outside of a range of 3, 5, 7 and 10 pH units. Figures marked in bold indicate significant results (P = 0.05).
| * | * | * | * | * | * | * | * | * | * | * | * | |
| 2.698 | 1.928 | 2.513 | 1.495 | 1.354 | 1.273 | 3.743 | 1.729 | 2.251 | 1.837 | 1.25 | 1.106 | |
| * | * | * | * | * | * | * | * | * | * | * | * | |
| 1.885 | 1.466 | 1.616 | 1.62 | 1.42 | 1.026 | 3.175 | 1.48 | 1.841 | 1.681 | 1.442 | 1.034 | |
| 3.691 | 2.171 | 3.016 | 2.281 | 2.022 | 1.417 | 3.691 | 2.122 | 2.997 | 2.293 | 1.993 | 1.465 | |
| 1.244 | 1.263 | 1.263 | 1.263 | 1.122 | 0.84 | 25.78 | 1.162 | 1.162 | 1.162 | 0.991 | 0.741 | |
| 2.739 | 2.195 | 2.195 | 2.195 | 1.939 | 1.516 | 3.163 | 2.143 | 2.143 | 2.143 | 1.882 | 1.481 | |
| * | * | * | * | * | * | * | * | * | * | * | * | |
| 2.024 | 1.774 | 2.032 | 1.79 | 1.534 | 1.242 | 1.301 | 1.041 | 1.313 | 1.301 | 0.806 | ||
| * | * | * | * | * | * | * | * | * | * | * | * | |
| 1.642 | 1.255 | 1.335 | 1.296 | 1.259 | 0.97 | 1.011 | 0.851 | 0.994 | 0.997 | 0.611 | ||
| 4.503 | 1.736 | 2.598 | 2.104 | 1.718 | 1.865 | 1.586 | 1.819 | 1.631 | 1.551 | |||
| 1.137 | 1.125 | 1.125 | 1.125 | 1.005 | 1.129 | 0.417 | 0.412 | 0.412 | 0.412 | 0.423 | ||
| 5.426 | 2.643 | 2.634 | 2.643 | 1.668 | 1.593 | 2.225 | 1.551 | 1.551 | 1.551 | 1.049 | ||
Outliers
COMPLETE 0 removed
TRUE 52 removed
WITHIN 10 23 removed
WITHIN 7 27 removed
WITHIN 5 89 removed
WITHIN 3 110 removed
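The RMSD figures and the outlier counts above can be read as the result of a filter-then-score procedure. The following is a minimal sketch under one assumption that the caption does not spell out: an outlier is a prediction that differs from the measured pKexp by more than the stated number of pH units. The function name and the sample data are illustrative.

```python
import math


def rmsd_with_cutoff(pairs, cutoff=None):
    """Return (rmsd, n_removed) for (predicted, experimental) pKa pairs.

    If cutoff is given, pairs whose absolute error exceeds it are dropped
    before the RMSD is computed (assumed definition of an outlier).
    """
    pairs = list(pairs)
    kept = [(p, e) for p, e in pairs if cutoff is None or abs(p - e) <= cutoff]
    if not kept:
        raise ValueError("no residues left after outlier removal")
    mean_sq = sum((p - e) ** 2 for p, e in kept) / len(kept)
    return math.sqrt(mean_sq), len(pairs) - len(kept)


# Illustrative data only: the third prediction is 7 units off.
data = [(4.0, 4.5), (3.0, 3.0), (12.0, 5.0)]
print(rmsd_with_cutoff(data))            # whole dataset, nothing removed
print(rmsd_with_cutoff(data, cutoff=5))  # one outlier removed
```

A single large error dominates the RMSD of the whole dataset, which is why the per-residue values in the table fall as the cutoff tightens.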
Overview of the prediction accuracy of the Small Dataset (242 Residues) I. The table shows the RMSD values for each of the residues from the whole dataset and the dataset following removal of non-physical values and of all outliers outside of a range of 3, 5, 7 and 10 pH units. Figures marked in bold indicate significant results (P = 0.05).
| * | * | * | * | * | * | * | * | * | * | * | * | |
| 2.078 | 1.787 | 2.079 | 2.077 | 1.691 | 1.781 | 3.774 | 1.582 | 2.142 | 1.912 | 1.53 | 1.267 | |
| * | * | * | * | * | * | * | * | * | * | |||
| 1.641 | 1.416 | 1.603 | 1.603 | 1.294 | 1.047 | 3.041 | 1.474 | 1.71 | 1.71 | 1.334 | 1.051 | |
| 2.786 | 1.689 | 1.689 | 1.689 | 1.347 | 1.118 | 2.736 | 1.809 | 1.81 | 1.81 | 1.488 | 1.402 | |
| 1.291 | 1.29 | 1.291 | 1.291 | 1.291 | 1.291 | 1.278 | 1.278 | 1.278 | 1.278 | 1.278 | 1.278 | |
| 2.06 | 1.297 | 1.297 | 1.297 | 1.933 | 0.766 | 2.368 | 1.262 | 1.262 | 1.262 | 1.871 | 0.792 | |
| * | * | * | * | * | * | * | * | * | * | * | * | |
| 1.915 | 1.735 | 1.921 | 1.731 | 1.319 | 1.419 | 0.95 | 0.824 | 0.838 | 0.842 | 0.89 | 0.641 | |
| * | * | * | * | * | * | * | * | * | * | |||
| 1.575 | 1.185 | 1.237 | 1.237 | 1.188 | 0.893 | 0.508 | 0.478 | 0.493 | 0.493 | 0.395 | ||
| 1.985 | 1.584 | 1.584 | 1.584 | 1.056 | 1.593 | 0.634 | 0.453 | 0.453 | 0.453 | 0.428 | ||
| 1.152 | 1.152 | 1.152 | 1.152 | 1.152 | 1.152 | 0.412 | 0.412 | 0.412 | 0.412 | 0.412 | 0.412 | |
| 6.027 | 1.373 | 1.373 | 1.373 | 1.456 | 1.419 | 0.687 | 0.582 | 0.582 | 0.582 | 0.61 | 0.631 | |
| * | * | * | * | * | * | |||||||
| 1.827 | 1.826 | 1.806 | 1.809 | 0.745 | ||||||||
| * | * | * | * | * | ||||||||
| 0.987 | 0.773 | 0.959 | 0.959 | 0.781 | 0.632 | |||||||
| 2.235 | 2.11 | 2.11 | 2.11 | 1.724 | 2.172 | |||||||
| 0.394 | 0.394 | 0.394 | 0.394 | 0.394 | ||||||||
| 1.533 | 0.992 | 0.991 | 0.991 | 1.011 | ||||||||
Outliers
COMPLETE 0 removed
TRUE 25 removed
WITHIN 10 11 removed
WITHIN 7 13 removed
WITHIN 5 34 removed
WITHIN 3 54 removed
Overview of the prediction accuracy of the Large Dataset (404 Residues) II. Three tables show the accuracy of the predictions to the measured pKexp within the ranges of <2 to <0.5. This is taken as the number of residues predicted within each range. Figures marked in bold indicate significant results (P = 0.05).
| <2 | 322 | 80 | 320 | 79 | 357 | 88 | 371 | |
| <1.5 | 283 | 70 | 282 | 70 | 328 | 81 | 354 | |
| <1 | 213 | 53 | 221 | 55 | 283 | 70 | 317 | |
| <0.5 | 108 | 27 | 139 | 34 | 168 | 42 | 195 | |
| <2 | 285 | 85 | 284 | 84 | 311 | 92 | 329 | |
| <1.5 | 255 | 76 | 253 | 75 | 290 | 86 | 317 | |
| <1 | 196 | 58 | 200 | 59 | 248 | 74 | 285 | |
| <0.5 | 98 | 29 | 132 | 39 | 151 | 45 | 184 | |
| <2 | 37 | 56 | 36 | 55 | 46 | 42 | 64 | |
| <1.5 | 28 | 42 | 29 | 44 | 38 | 37 | 56 | |
| <1 | 17 | 26 | 21 | 32 | 35 | 32 | 48 | |
| <0.5 | 10 | 15 | 7 | 11 | 17 | 11 | 17 | |
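The count and percentage pairs in these tables can be reproduced with a simple threshold count. The sketch below assumes that "within a range" means the absolute error |pKpred - pKexp| is strictly below the threshold and that percentages are rounded to whole numbers; the function name and sample errors are illustrative.

```python
def within_ranges(errors, thresholds=(2.0, 1.5, 1.0, 0.5)):
    """Map each threshold to (count, percent) of errors with |error| < threshold."""
    n = len(errors)
    result = {}
    for t in thresholds:
        count = sum(1 for err in errors if abs(err) < t)
        result[t] = (count, round(100.0 * count / n))
    return result


# Illustrative signed errors (pKpred - pKexp) for five residues.
errors = [0.2, -0.7, 1.2, 1.8, 3.0]
for threshold, (count, percent) in within_ranges(errors).items():
    print(f"<{threshold}: {count} residues ({percent}%)")
```

As a consistency check against the first row of the table, 322 of 404 residues rounds to 80%.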
Overview of the prediction accuracy of the Small Dataset (242 Residues) II. Three tables show the accuracy of the predictions to the measured pKexp. This is taken as the number of residues predicted within each range. Figures marked in bold indicate significant results (P = 0.05).
| <2 | 195 | 81 | 191 | 79 | 216 | 89 | 225 | 93 | 230 | 95 |
| <1.5 | 174 | 72 | 171 | 71 | 201 | 83 | 209 | 86 | 220 | 91 |
| <1 | 131 | 54 | 135 | 56 | 172 | 71 | 161 | 67 | 195 | 81 |
| <0.5 | 63 | 26 | 87 | 36 | 110 | 45 | 97 | 40 | 125 | 52 |
| <2 | 179 | 86 | 176 | 84 | 193 | 92 | 199 | 95 | 205 | |
| <1.5 | 163 | 78 | 159 | 76 | 181 | 87 | 186 | 89 | 199 | |
| <1 | 126 | 60 | 125 | 60 | 156 | 75 | 152 | 73 | 181 | |
| <0.5 | 60 | 29 | 83 | 40 | 100 | 48 | 94 | 45 | 122 | |
| <2 | 16 | 48 | 15 | 45 | 23 | 70 | 26 | 25 | 76 | |
| <1.5 | 11 | 33 | 12 | 36 | 20 | 61 | 23 | 21 | 64 | |
| <1 | 5 | 15 | 10 | 30 | 16 | 9 | 27 | 14 | 42 | |
| <0.5 | 3 | 9 | 4 | 12 | 16 | 3 | 9 | 3 | 9 | |
Figure 1. Correlation plots for the individual programs. The bold line indicates perfect prediction (pKpred = pKexp). The outer lines indicate +/- 1 unit from the pKexp.
Carboxyl sites of interest. B = Buried, S = Surface. Figures marked in bold indicate predictions >2 units from the pKexp.
| 1A2P [20] | ASP-101 | S | 3.56 | 2.31 | 3.75 | 1.20 | 2.00 | |
| ASP-93 | S | 1.00 | 3.92 | 0.69 | 2.00 | |||
| ASP-54 | S | 0.74 | 0.82 | 1.34 | 3.57 | 2.70 | 2.00 | |
| GLU-73 | B | 4.10 | 2.37 | 3.11 | 2.10 | |||
| 1A91 [21] | ASP-7 | S | 3.99 | 4.17 | 4.04 | 3.87 | 5.60 | |
| ASP-44 | S | 6.00 | 5.55 | 4.69 | 4.19 | 5.60 | ||
| ASP-61 | S | 5.52 | 5.01 | 7.00 | ||||
| GLU-2 | S | 4.14 | 7.45 | 4.53 | 4.45 | 4.50 | 5.50 | |
| GLU-37 | S | 5.15 | 4.27 | 4.66 | 4.32 | 5.50 | ||
| 1BEO [22] | ASP-21 | S | 3.11 | 3.75 | 1.35 | 2.50 | ||
| ASP-30 | S | 3.56 | 3.80 | 4.38 | 4.00 | 2.64 | 2.50 | |
| ASP-72 | S | 3.84 | 3.95 | 3.69 | 4.34 | 3.30 | 2.60 | |
| 1DE3 [23] | GLU-96 | B | 3.53 | 5.70 | 4.10 | 5.10 | ||
| GLU-115 | S | 5.19 | 5.23 | 3.81 | 4.45 | 4.50 | 4.90 | |
| 1KXI [24] | ASP-59 | S | 3.13 | 3.90 | 2.33 | 4.20 | 2.49 | 2.30 |
| 1LZ3 [25] | ASP-18 | S | 3.79 | 3.94 | 4.05 | 3.19 | 2.70 | |
| ASP-48 | S | 3.57 | 3.17 | 3.99 | 2.51 | 2.50 | ||
| ASP-66 | S | 0.23 | 3.07 | 1.19 | 2.00 | |||
| ASP-87 | S | 4.06 | 3.96 | 3.89 | 2.17 | 2.10 | ||
| GLU-7 | S | 3.92 | 3.44 | 4.36 | 3.01 | 2.70 | ||
| GLU-35 | B | 4.92 | 5.23 | 7.80 | 4.78 | 5.40 | 6.10 | |
| 1RNZ [26] | ASP-14 | B | 3.51 | 2.00 | ||||
| GLU-2 | B | 1.44 | 2.66 | 2.80 | ||||
| 1TRS [27] | ASP-26 | B | 6.18 | 7.84 | 8.10 | |||
| GLU-6 | S | 3.91 | 3.92 | 4.54 | 4.44 | 4.50 | 4.90 | |
| GLU-68 | S | 4.27 | 4.24 | 4.59 | 4.55 | 4.57 | 5.10 | |
| 1TRW [27] | ASP-26 | B | 8.23 | 8.63 | 9.90 | |||
| GLU-68 | S | 5.07 | 5.33 | 3.55 | 4.88 | 4.34 | 4.90 | |
| 1XNB [28] | ASP-11 | S | 1.83 | 0.57 | 3.44 | 3.82 | 1.99 | 2.50 |
| ASP-83 | B | 6.35 | 1.36 | 2.00 | ||||
| ASP-101 | B | 2.94 | 1.50 | 2.00 | ||||
| ASP-106 | S | 3.18 | 3.02 | 2.70 | ||||
| GLU-172 | B | 6.62 | 6.42 | 5.94 | 5.22 | 7.32 | 6.70 | |
| 2OVO [29] | ASP-7 | S | 4.05 | 4.01 | 3.72 | 2.51 | 2.50 | |
| ASP-27 | S | 2.08 | 2.32 | 2.77 | 3.77 | 2.39 | 2.50 | |
| 2RN2 [30] | ASP-10 | B | 6.99 | 6.10 | ||||
| ASP-70 | B | 4.11 | 3.55 | 3.15 | 3.50 | 4.10 | 2.60 | |
| ASP-102 | B | 3.00 | 3.40 | 0.13 | 2.00 | |||
| ASP-148 | B | -1.10 | 0.55 | 3.79 | 2.00 |
Accuracy of prediction for the carboxyl sites. Accuracy was assessed as the number of predictions within the <2 to <0.5 ranges of the pKexp. The individual accuracy of the residues is given in the bottom two tables. Figures marked in bold indicate the greatest accuracy.
| <2 | 26 | 66.67 | 20 | 51.28 | 30 | 76.92 | 30 | 76.92 | 34 | |
| <1.5 | 19 | 48.72 | 15 | 38.46 | 25 | 64.10 | 20 | 51.28 | 32 | |
| <1 | 10 | 25.64 | 10 | 25.64 | 17 | 43.59 | 8 | 20.51 | 26 | |
| <0.5 | 6 | 15.38 | 4 | 10.26 | 8 | 20.51 | 3 | 7.69 | 14 | |
| <2 | 17 | 62.96 | 13 | 48.15 | 19 | 70.37 | 19 | 70.37 | 22 | |
| <1.5 | 10 | 37.04 | 8 | 29.63 | 16 | 59.26 | 11 | 40.74 | 20 | |
| <1 | 4 | 14.81 | 4 | 14.81 | 12 | 44.44 | 2 | 7.41 | 16 | |
| <0.5 | 2 | 7.41 | 1 | 3.70 | 5 | 18.52 | 0 | 0.00 | 10 | |
| <2 | 9 | 75.00 | 9 | 75.00 | 11 | 91.67 | 11 | 91.67 | 12 | |
| <1.5 | 9 | 75.00 | 7 | 58.33 | 9 | 75.00 | 9 | 75.00 | 12 | |
| <1 | 6 | 50.00 | 6 | 50.00 | 5 | 41.67 | 6 | 50.00 | 10 | |
| <0.5 | 4 | 33.33 | 3 | 25.00 | 3 | 25.00 | 3 | 25.00 | 4 | |
Overview of the combination methods (242 Residues). The residue RMSD values are given for all of the 25 combinations consisting of AMBER (A), PARSE (P), MCCE (M), UHBD (U) and PROPKA (P). Figures marked in bold indicate an improvement while the asterisk indicates the best score.
| 1.556 | 1.245 | 1.174 | 1.161 | ||
| 1.301 | 1.012 | 0.816 | 0.676 | 1.026 | |
| 1.331 | 0.631 | 0.827 | 1.000 | 0.727 | |
| 1.281 | 0.763 | 0.742 | 0.646 | 0.793 | |
| 1.899 | 1.512 | 1.238 | 1.028 | 1.502 | |
| 1.074 | 0.910 | ||||
| 0.834 | 0.690 | 0.660 | 0.766 | ||
| 0.812 | 0.987 | 0.566 | 1.289 | 0.955 | |
| 0.742 | 0.647 | 0.713 | 0.755 | ||
| 1.212 | 1.016 | 0.974 | 0.956 | ||
| 1.262 | 1.242 | 1.000 | 1.038 | ||
| 1.064 | 0.973 | 0.844 | 0.761 | 0.714 | |
| 0.792 | 0.954 | 0.942 | 0.520 | 0.839 | |
| 0.868 | 0.913 | 0.846 | 0.615 | 0.590 | |
| 1.607 | 1.446 | 1.300 | 1.195 | 1.097 | |
| 0.972 | 1.101 | ||||
| 0.539 | 0.770 | 0.720 | 0.529 | 0.870 | |
| 0.771 | 0.523 | 0.840 | 0.897 | 0.646 | |
| 0.518 | 0.634 | 0.610 | 0.597 | 0.718 | |
| 0.869 | 1.188 | 1.098 | 0.782 | 1.345 | |
| 0.929 | 0.905 | ||||
| 0.798 | 0.706 | 0.588 | 0.592 | 0.690 | |
| 0.754 | 0.771 | 0.690 | 0.668 | 0.650 | |
| 0.683 | 0.689 | 0.525 | 0.540 | 0.605 | |
| 1.256 | 1.112 | 0.959 | 0.959 | 1.115 | |
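One way such a combination table could be produced is sketched below. It rests on two assumptions that the caption does not confirm: that the 25 combinations are the 5 x 5 pairings of the five named methods, and that a pair is combined by taking the arithmetic mean of its two predictions. All function names and numbers are illustrative.

```python
import math
from itertools import product


def rmsd(pred, exp):
    """Root-mean-square deviation between predicted and measured pKa lists."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, exp)) / len(exp))


def combined_rmsd(pred_a, pred_b, exp):
    """RMSD of the averaged prediction of two methods (assumed combination rule)."""
    mean_pred = [(a + b) / 2.0 for a, b in zip(pred_a, pred_b)]
    return rmsd(mean_pred, exp)


# Method names taken from the caption; pairing them with themselves
# and each other gives 25 ordered pairs.
methods = ["AMBER", "PARSE", "MCCE", "UHBD", "PROPKA"]
print(len(list(product(methods, repeat=2))))  # 25

# Illustrative data: the two methods err in opposite directions.
exp = [4.5, 5.0]
pred_a = [4.0, 6.0]
pred_b = [5.0, 4.0]
print(rmsd(pred_a, exp), rmsd(pred_b, exp), combined_rmsd(pred_a, pred_b, exp))
```

Averaging helps only when the errors of the two methods are at least partly uncorrelated, as in the toy data where they cancel exactly.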
Accuracy of the multiple regression. The accuracy is given as the number of predictions within a range of the pKexp. For comparison the UHBD + PROPKA combination is added. Figures marked in bold indicate improvements.
| % | % | |||
| <2 | 234 | 96.69 | 233 | 96 |
| <1.5 | 228 | 226 | 93 | |
| <1 | 205 | 197 | 81 | |
| <0.5 | 140 | 124 | 51 |
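A multiple regression of this kind can be sketched as an ordinary least-squares fit of pKexp on the predictions of several programs. The caption does not state which predictors or fitting procedure were used, so the two-predictor setup, the function names and the data below are assumptions made purely for illustration.

```python
def fit_linear(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    X is a list of predictor tuples, y the measured values.
    Returns [intercept, coef_1, coef_2, ...].
    Assumes the predictors are not collinear.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda idx: abs(A[idx][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def predict(coefs, x):
    """Apply fitted coefficients to one tuple of program predictions."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


# Illustrative data generated from y = 1 + 0.5*a + 0.25*b.
X = [(4.0, 3.0), (5.0, 6.0), (6.0, 4.0), (7.0, 8.0)]
y = [3.75, 5.0, 5.0, 6.5]
coefs = fit_linear(X, y)
print([round(c, 6) for c in coefs])
```

The fitted values from `predict` could then be scored with the same within-range counts used in the table above.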
Figure 2. Comparative performance of the prediction methods. The accuracy ranges (0.5 – 2) apply to the deviation from the measured pKa value. The percentage score represents the number of residues predicted in each range.