Susan Costantini, Angelo M Facchiano, Giovanni Colonna.
Abstract
BACKGROUND: Knowledge of the three-dimensional structure of globular proteins is fundamental for a detailed investigation of their functional properties. Experimental methods are too slow for structure investigation on a large scale, while computational prediction methods offer alternatives that are continuously being improved. The international Critical Assessment of Structure Prediction (CASP), an "a posteriori" evaluation of the quality of theoretical models once the experimental structure becomes available, demonstrates that predictions can be successful as well as unsuccessful, which suggests the need for evaluations able to discard wrong models "a priori".
Year: 2007 PMID: 17346357 PMCID: PMC1828058 DOI: 10.1186/1472-6807-7-9
Source DB: PubMed Journal: BMC Struct Biol ISSN: 1472-6807
Training protein sets.
| Structures | Helix | Beta-strand | Coil |
| 124 | 0.648(± 0.12) | | 0.36(± 0.12) |
| 81 | | 0.37(± 0.13) | 0.63(± 0.13) |
| 75 | 0.38(± 0.08) | 0.21(± 0.04) | 0.41(± 0.07) |
| 132 | 0.29(± 0.08) | 0.27(± 0.07) | 0.44(± 0.09) |
For each of the four sets, we report the number of structures and the mean fractions of residues in helix, in beta-strands and in "coil", relative to the total number of residues. Standard deviations are reported in parentheses.
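The per-set means and standard deviations described in this caption could be computed along the following lines. This is a minimal sketch, assuming a three-state per-residue string in which `H` marks helix, `E` marks beta-strand and any other character counts as coil; that encoding and the names `ss_fractions` and `summarize_set` are illustrative assumptions, not taken from the paper.

```python
from statistics import mean, stdev


def ss_fractions(ss):
    """Fractions of helix, strand and coil residues in one protein.

    Assumption: `ss` has one character per residue, 'H' = helix,
    'E' = beta-strand, anything else = coil.
    """
    n = len(ss)
    if n == 0:
        raise ValueError("empty secondary-structure string")
    helix = ss.count("H") / n
    strand = ss.count("E") / n
    return {"helix": helix, "strand": strand, "coil": 1.0 - helix - strand}


def summarize_set(ss_strings):
    """Mean and sample standard deviation of each fraction over a protein set."""
    fractions = [ss_fractions(s) for s in ss_strings]
    summary = {}
    for state in ("helix", "strand", "coil"):
        values = [f[state] for f in fractions]
        summary[state] = (mean(values), stdev(values))  # stdev needs >= 2 proteins
    return summary
```

Each entry of the returned dictionary is a `(mean, standard deviation)` pair, which matches the "value(± SD)" layout of the table rows.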
Correlation coefficients in the four structural classes.
| MM-type | 0.95 | 0.83 | 0.97 | 0.96 |
| MS-type | 0.61 | 0.49 | 0.83 | 0.79 |
| SM-type | 0.70 | 0.39 | 0.85 | 0.81 |
| SS-type | 0.67 | 0.41 | 0.82 | 0.81 |
| Total | 0.93 | 0.80 | 0.97 | 0.95 |
| polar- | 0.89 | 0.89 | 0.93 | 0.94 |
| non polar- | 0.92 | 0.90 | 0.95 | 0.92 |
| Total | 0.95 | 0.93 | 0.96 | 0.94 |
| total void volumes | 0.87 | 0.9 | 0.92 | 0.91 |
| void numbers | 0.97 | 0.91 | 0.99 | 0.97 |
| molecule number | 0.93 | 0.91 | 0.97 | 0.93 |
| protein-water H-bonds | 0.8 | 0.76 | 0.71 | 0.73 |
| protein-water H-bonds/total accessibility | 0.84 | 0.81 | 0.78 | 0.73 |
The correlation coefficients are evaluated between the molecular weights and each of the following properties: the different types of H-bonds, the total solvent accessibility and its components, the number of water molecules and the related H-bonds between protein residues and water molecules, and the void number and total void volumes (calculated with probe = 0). Values are obtained for each protein belonging to the "mainly-alpha", "mainly-beta", "alpha/beta" and "alpha+beta" classes. The last row reports the correlation coefficient between the total accessibility and the number of H-bonds between protein residues and water molecules.
Figure 1. Parameters plotted against the molecular weights of the proteins belonging to the "mainly-alpha" class. (A) MM-type H-bonds, (B) total accessibility, (C) void number, (D) water molecules. The data were fit by linear least squares (the equations obtained are in Additional file 4). The related correlation coefficients (R) are also reported.
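The linear least-squares fit and correlation coefficient mentioned in the Figure 1 caption can be illustrated as follows. This is a minimal sketch: the function name `linear_fit` is hypothetical, the fitted equations of Additional file 4 are not reproduced, and returning the RMSE of the fit (dividing by the number of points) is an assumption about how the "related RMSE" is obtained.

```python
from math import sqrt


def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept.

    Returns (slope, intercept, r, rmse), where r is the Pearson
    correlation coefficient and rmse is the root-mean-square residual
    of the fit (assumption: divided by n, not n - 2).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    rmse = sqrt(sum(e * e for e in residuals) / n)
    return slope, intercept, r, rmse
```

Here `x` would hold the molecular weights of the proteins in one structural class and `y` one property (for example the number of MM-type H-bonds) of the same proteins.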
Mean ratios between the components and related total values for some properties.
| 0.84(± 0.10) | 0.74(± 0.14) | 0.74(± 0.08) | 0.77(± 0.11) | |
| 0.04(± 0.03) | 0.09(± 0.07) | 0.08(± 0.03) | 0.08(± 0.04) | |
| 0.07(± 0.05) | 0.10(± 0.07) | 0.09(± 0.04) | 0.08(± 0.05) | |
| 0.05(± 0.05) | 0.07(± 0.07) | 0.09(± 0.04) | 0.07(± 0.05) | |
| 0.59(± 0.06) | 0.57(± 0.05) | 0.57(± 0.03) | 0.56(± 0.04) | |
| 0.41(± 0.06) | 0.43(± 0.05) | 0.43(± 0.03) | 0.44(± 0.04) | |
| 0.95(± 0.05) | 0.96(± 0.06) | 0.85(± 0.07) | 0.92(± 0.07) | |
| 0.05(± 0.04) | 0.04(± 0.03) | 0.15(± 0.07) | 0.08(± 0.06) | |
| 0.91(± 0.08) | 0.93(± 0.07) | 0.76(± 0.08) | 0.86(± 0.08) | |
| 0.09(± 0.08) | 0.07(± 0.07) | 0.24(± 0.08) | 0.14(± 0.08) |
(A) H-bonds, (B) accessible surface area, (C) voids. Standard deviations are reported in parentheses.
Target proteins used as testing dataset.
| mainly-alpha | 1SUM_3-223 | 5.1 | 154 | 90.9 | |
| mainly-alpha | 1W33_70-222 | 5.8 | 154 | 86.4 | |
| mainly-beta | 1XQB_9-138 | 3.11 | 152 | 87.6 | |
| mainly-beta | 1TZA_3-121 | 3.4 | 182 | 41.2 | |
| alpha+beta | 1STZ_145-226 | 2.10 | 223 | 50.7 | |
| alpha+beta | 1S12_1-90 | 1.6 | 204 | 20.6 | |
| alpha+beta | 1XQB_159-231 | 2.35 | 229 | 41 | |
| alpha+beta | 1VL4_2-214 | 4.3 | 155 | 93.5 | |
| alpha+beta | 1VL4_221-433 | 3.3 | 147 | 95.3 | |
| alpha+beta | 1RKI_1-98 | 1.8 | 251 | 29.1 | |
| alpha+beta | 1TD6_107-193 | 2.9 | 239 | 32.6 | |
| alpha/beta | 2BLK_2-116 | 3.1 | 200 | 43 | |
| alpha/beta | 1WDJ_2-187 | 1.2 | 141 | 90 |
The columns report the target code, the structural class, the code of the experimental structure deposited in the PDB with the segment analyzed, its score value, the number of full-atom models analyzed in this work, and the percentage of models whose score value was at or above the threshold (i.e. ≥ 5.9 for "mainly-alpha" and "mainly-beta" proteins and ≥ 5.1 for "alpha/beta" and "alpha+beta" proteins).
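The percentage in the last column follows from a simple count against the class-specific thresholds quoted in the caption. A minimal sketch, in which the thresholds come from the caption while the function name `percent_over_threshold` and the example scores are hypothetical:

```python
# Thresholds per structural class, as stated in the table caption.
THRESHOLDS = {
    "mainly-alpha": 5.9,
    "mainly-beta": 5.9,
    "alpha/beta": 5.1,
    "alpha+beta": 5.1,
}


def percent_over_threshold(scores, structural_class):
    """Percentage of model scores at or above the class threshold."""
    if not scores:
        raise ValueError("no scores given")
    threshold = THRESHOLDS[structural_class]
    over = sum(1 for s in scores if s >= threshold)
    return 100.0 * over / len(scores)


# Hypothetical scores for four models of one target (not data from the paper).
example_scores = [5.0, 5.5, 6.0, 3.0]
print(percent_over_threshold(example_scores, "mainly-alpha"))
print(percent_over_threshold(example_scores, "alpha+beta"))
```

The same scores give different percentages for different classes because the cut-off depends on the structural class of the target.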
Figure 2. Score values calculated for all the proteins belonging to the four structural classes. (A) "mainly-alpha", (B) "mainly-beta", (C) "alpha+beta" and (D) "alpha/beta". For each protein, the score is the sum, over four properties (number of MM-type H-bonds, voids and water molecules, and total accessibility), of the ratio between the difference of the calculated and predicted values and the related RMSE.
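The score described in the Figure 2 caption can be sketched as below. Several points are assumptions rather than statements from the source: the predicted value of each property is taken from a linear fit against molecular weight, the absolute value of each difference is used, and the fit parameters in the example are invented for illustration (the real equations are in Additional file 4). The names `globularity_score` and `example_fits` are hypothetical.

```python
def globularity_score(observed, mol_weight, fits):
    """Sum over properties of |calculated - predicted| / RMSE.

    observed:   property name -> value calculated from the model
    mol_weight: molecular weight of the protein
    fits:       property name -> (slope, intercept, rmse) of a linear fit
                of that property against molecular weight (assumed form)
    """
    score = 0.0
    for name, (slope, intercept, rmse) in fits.items():
        predicted = slope * mol_weight + intercept
        # Assumption: deviations in either direction are penalized equally.
        score += abs(observed[name] - predicted) / rmse
    return score


# Invented fit parameters and observations, for illustration only.
example_fits = {
    "mm_hbonds": (0.01, 0.0, 5.0),
    "voids": (0.002, 0.0, 4.0),
    "waters": (0.02, 0.0, 10.0),
    "accessibility": (0.5, 0.0, 100.0),
}
example_observed = {
    "mm_hbonds": 110.0,
    "voids": 16.0,
    "waters": 205.0,
    "accessibility": 5150.0,
}
print(globularity_score(example_observed, 10000.0, example_fits))
```

With this definition a model whose four properties match the values expected for its molecular weight scores close to zero, and larger scores indicate larger deviations from the trends of the training set.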
Details for the analysis of testing datasets.
| 152 | 110/152 | 42/152 | 14/152 | 9/110 | 5/42 | |
| 154 | 117/154 | 37/154 | 21/154 | 18/117 | 3/37 | |
| 152 | 108/152 | 43/152 | 20/152 | 20/108 | 0/43 | |
| 181 | 135/181 | 46/181 | 107/181 | 87/135 | 20/46 | |
| 223 | 177/223 | 46/223 | 110/223 | 96/177 | 14/46 | |
| 203 | 165/203 | 38/203 | 162/203 | 136/165 | 26/38 | |
| 229 | 172/229 | 57/229 | 135/229 | 103/172 | 32/57 | |
| 155 | 123/155 | 32/155 | 10/155 | 6/123 | 4/32 | |
| 147 | 114/147 | 33/147 | 7/147 | 6/114 | 1/33 | |
| 251 | 207/251 | 44/251 | 178/251 | 150/207 | 28/44 | |
| 238 | 180/238 | 58/238 | 161/238 | 125/180 | 36/58 | |
| 200 | 154/200 | 46/200 | 114/200 | 96/154 | 18/46 | |
| 141 | 107/141 | 34/141 | 14/141 | 10/107 | 4/34 | |
For each target, the columns report the total number of models analyzed, the fractions of models from "human" and "server" predictors, and the fractions of models (overall, "human" and "server") with a globularity score below the threshold value.
Correlation coefficients for whole set of models.
| 0.16 | 0.04 | -0.026 | |
| -0.18 | -0.11 | -0.05 | |
| 0.16 | -0.35 | -0.27 | |
| 0.16 | -0.04 | -0.14 | |
| 0.56 | -0.4 | -0.074 | |
| 0.25 | -0.12 | 0.37 | |
| -0.17 | 0.036 | -0.093 | |
| 0.51 | -0.40 | 0.0066 | |
| 0.54 | -0.36 | 0.077 |
We report the correlation coefficients between each quality assessment method (Prosa, Modcheck, Victor/FRST, Anolea and Globularity score), as well as each of our four individual features (MM-type H-bonds, void number, water molecules and total accessibility), and three measures of actual model quality (i.e. RMSD, GDT_TS and MaxSub).