Yoshihide Makino, Nobuya Itoh.
Abstract
BACKGROUND: The use of knowledge-based potential functions is a powerful method for protein structure evaluation. A variety of formulations that evaluate single or multiple structural features of proteins have been developed and studied. The performance of such functions is often evaluated by their ability to discriminate native structures from decoy structures of target proteins. A function that can evaluate coarse-grained structures is advantageous in many respects, such as the relatively easy generation and manipulation of model structures; however, reducing the structural representation is often accompanied by degraded structure discrimination performance.
Year: 2008; PMID: 18957132; PMCID: PMC2600639; DOI: 10.1186/1472-6807-8-46
Source DB: PubMed; Journal: BMC Struct Biol; ISSN: 1472-6807
Figure 1. Schematic representation of the pairwise residue parameters for the pseudo-energy components. (A) DABG component. Distance d (Å) is measured between the two Cα atoms. The α angle (degrees) is formed by the Cα-pseudo-Cβ vector of the ith residue and the Cα-Cα vector. The β angle (degrees) is defined similarly for the jth residue. The γ angle is the dihedral angle (degrees) defined by the four atomic coordinates of the Cα and pseudo-Cβ atoms of the ith and jth residues. (B) HBND component. Distance d (Å) is measured between the pseudo-H atom of the ith residue and the pseudo-O atom of the jth residue. The η angle (degrees) is formed by the pseudo-H-N vector of the ith residue and the pseudo-H-pseudo-O vector. The θ angle (degrees) is formed by the pseudo-O-C vector of the jth residue and the pseudo-O-pseudo-H vector.
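The pair geometry in Figure 1 can be computed from coordinates roughly as in the minimal Python sketch below. It assumes that the Cα, pseudo-Cβ, pseudo-H, N, pseudo-O and C coordinates are already available as 3-tuples, that the β angle mirrors the α angle (the Cα-pseudo-Cβ vector of residue j against the Cα(j)-Cα(i) vector), and that γ follows the usual signed-dihedral convention over pseudo-Cβ(i), Cα(i), Cα(j), pseudo-Cβ(j). The function names `dabg` and `hbnd` are hypothetical; how the pseudo atoms are placed is described in the paper's Methods and is not reproduced here.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    """Vector a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))


def angle_deg(u: Vec, v: Vec) -> float:
    """Angle between vectors u and v, in degrees."""
    c = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding error


def dihedral_deg(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    y = norm(b2) * dot(b1, cross(b2, b3))
    x = dot(cross(b1, b2), cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def dabg(ca_i: Vec, cb_i: Vec, ca_j: Vec, cb_j: Vec):
    """(d, alpha, beta, gamma) for residues i and j (pseudo-Cb placement assumed given)."""
    d = norm(sub(ca_j, ca_i))
    alpha = angle_deg(sub(cb_i, ca_i), sub(ca_j, ca_i))
    beta = angle_deg(sub(cb_j, ca_j), sub(ca_i, ca_j))  # mirrored definition (assumption)
    gamma = dihedral_deg(cb_i, ca_i, ca_j, cb_j)
    return d, alpha, beta, gamma


def hbnd(n_i: Vec, h_i: Vec, o_j: Vec, c_j: Vec):
    """(d, eta, theta) between pseudo-H of residue i and pseudo-O of residue j."""
    d = norm(sub(o_j, h_i))
    eta = angle_deg(sub(n_i, h_i), sub(o_j, h_i))
    theta = angle_deg(sub(c_j, o_j), sub(h_i, o_j))
    return d, eta, theta


if __name__ == "__main__":
    # Two residues 3.8 A apart with pseudo-Cb atoms pointing in opposite directions:
    # d ~ 3.8, alpha = beta = 90, and |gamma| = 180 (trans arrangement).
    print(dabg((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (3.8, 0.0, 0.0), (3.8, -1.0, 0.0)))
```

Clamping the cosine before `acos` guards against values marginally outside [-1, 1] caused by floating-point rounding for nearly collinear vectors.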
Parameters and their values for tuning the function.
| scan | component | parameter | initial value | scanned values | selected value |
| 1 | DIST | b | 12.0 | 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0 | 12.0 |
| | | c | 0.627 | 0.525, 0.550, 0.575, 0.603, 0.631, 0.661, 0.692 | 0.661 |
| 2 | DIST | sequence separation limit | 5 | 2, 3, 4, 5, 6, 7, 8 | 5 |
| 3 | DABG | range of the bin averaging: distance | 1 | 0, 1, 2 | 0 |
| | | range of the bin averaging: α | 0 | 0, 1 | 1 |
| | | range of the bin averaging: β | 0 | 0, 1 | 0 |
| | | range of the bin averaging: γ | 2 | 0, 1, 2 | 1 |
| 4 | DABG | sequence separation limit | 5 | 2, 3, 4, 5, 6, 7, 8 | 5 |
| 5 | DIST, DABG | sequence separation limit of evaluation | 3 | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | 3 |
| 6 | DIST | 0 count penalty | 8.0 | 0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0 | 8.0 |
| 7 | DABG | 0 count penalty | 2.0 | 0.0, 1.0, 2.0, 3.0, 4.0, 5.0 | 2.0 |
| 8 | SURR | radius range | 15.0 | 9.0, 12.0, 15.0, 18.0 | 15.0 |
| 9 | SURR | 0 count penalty | 0.0 | 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0 | 0.0 |
| 10 | HBND | 0 count penalty | 2.0 | 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0 | 2.0 |
| 11 | PPDA | 0 count penalty | 12.0 | 0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0 | 12.0 |
| 12 | OMDA | 0 count penalty | 6.0 | 0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0 | 0.0 |
Short descriptions of the scanned parameters are listed in the order in which they were scanned during tuning. The b and c of scan 1 are the constants used to calculate Nexp(d). The sequence separation limits of scans 2 and 4 are the lower limits of the separation between the ith and jth residues for pairs incorporated into the respective databases. The four parameters of scan 3 are the ranges for averaging over adjacent database bins. The sequence separation limit of evaluation in scan 5 is the lower limit of separation applied during evaluation, not during database construction. The 0 count penalties of scans 6, 7, 9, 10, 11 and 12 are the energy penalty values applied when no count was recorded in a bin of the compiled database. The radius range of scan 8 is the radius of the sphere used for the SURR component calculation. Details of the respective parameters are given in Methods. The table lists the values determined by initial scanning before tuning (initial value), the values scanned during tuning (scanned values), and the values selected because they gave better CRMSD (selected value). When a single scan covers multiple parameters, all combinations of the listed values were scanned.
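The bin averaging, the 0 count penalty and the scan-by-scan tuning described in these notes can be sketched as follows. This is a minimal sketch under stated assumptions: the pseudo-energy takes the common inverse-Boltzmann form -ln(Nobs/Nexp), which the notes do not specify (the exact expressions are in the paper's Methods); adjacent bins are averaged uniformly within the window; and the objective passed to the scan is a hypothetical callable, for example an average CRMSD to be minimized. The names `averaged_count`, `pseudo_energy` and `sequential_scan` are illustrative only.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

Bin = Tuple[int, ...]
Counts = Dict[Bin, float]


def averaged_count(table: Counts, index: Bin, shape: Bin,
                   ranges: Sequence[int]) -> float:
    """Mean count over the bins within +/- ranges[k] of `index` on each axis.

    Bins outside `shape` are skipped; in-range bins absent from `table`
    count as zero. A range of 0 means no averaging on that axis.
    """
    total, n = 0.0, 0
    windows = [range(-r, r + 1) for r in ranges]
    for offset in itertools.product(*windows):
        key = tuple(i + o for i, o in zip(index, offset))
        if all(0 <= k < s for k, s in zip(key, shape)):
            total += table.get(key, 0.0)
            n += 1
    return total / n


def pseudo_energy(observed: Counts, expected: Counts, index: Bin, shape: Bin,
                  ranges: Sequence[int], zero_penalty: float) -> float:
    """-ln(obs/exp) for one bin, or `zero_penalty` if no counts were recorded."""
    obs = averaged_count(observed, index, shape, ranges)
    exp_ = averaged_count(expected, index, shape, ranges)
    if obs == 0.0 or exp_ == 0.0:
        return zero_penalty  # empty bin: fall back to the fixed 0 count penalty
    return -math.log(obs / exp_)


def sequential_scan(params: Dict[str, float],
                    scans: List[Tuple[Tuple[str, ...], List[List[float]]]],
                    objective: Callable[[Dict[str, float]], float]) -> Dict[str, float]:
    """Tune parameters one scan at a time; a lower objective is better.

    Within a scan, every combination of the listed values is tried while all
    other parameters stay fixed; ties keep the first combination found.
    """
    params = dict(params)
    for names, value_lists in scans:
        best_score, best_combo = None, None
        for combo in itertools.product(*value_lists):
            trial = {**params, **dict(zip(names, combo))}
            score = objective(trial)
            if best_score is None or score < best_score:
                best_score, best_combo = score, combo
        params.update(zip(names, best_combo))
    return params


if __name__ == "__main__":
    # Hypothetical 1-D example in which bin 1 has no observed counts.
    obs = {(0,): 4.0, (2,): 2.0}
    expd = {(0,): 2.0, (1,): 2.0, (2,): 2.0}
    print(pseudo_energy(obs, expd, (1,), (3,), (0,), 8.0))  # 8.0, the penalty
```

With an averaging range of 1 on that axis, bin 1 borrows counts from its neighbours (mean observed 2.0 against mean expected 2.0), so the penalty is no longer needed and the energy becomes zero; this is one way bin averaging and the 0 count penalty interact.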
Summary of performance of the DFMAC function on the training and test decoy sets.
| target set | Nall | Nn | Nnn | CRMSD (Å) | Z-score | C.C. | F.E.(%) | RB1 | logPB1 | RB10 | logPB10 | C.C.decoy | F.E.decoy(%) |
| training set | 154 | 115 | 135 | 0.764 | 2.552 | 0.539 | 38.9 | 171.3 | -0.78 | 36.1 | -1.38 | 0.499 | 27.6 |
| test set | 77 | 59 | 68 | 1.174 | 2.630 | 0.559 | 38.7 | 164.0 | -0.75 | 15.8 | -1.41 | 0.518 | 25.1 |
Values are summarized for each decoy set. Nall is the total number of target proteins evaluated; Nn and Nnn are the numbers of targets for which the native structure or a near-native structure (CRMSD < 2 Å), respectively, was correctly identified. CRMSD, Z-score, C.C., F.E., RB1, logPB1, RB10, logPB10, C.C.decoy and F.E.decoy are averages of the respective scores over the target proteins evaluated. Each index is defined in Methods.
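Several of these indices can be sketched as below. The exact definitions are in the paper's Methods, so this Python sketch assumes common conventions: the Z-score is (mean decoy energy - native energy) divided by the population standard deviation of the decoy energies (positive when the native is lower in energy, consistent with the signs in the tables); C.C. is the Pearson correlation between pseudo-energy and CRMSD; and F.E. is the percentage of the lowest-energy 10% of structures that are also among the lowest-CRMSD 10%, with "10%" taken as floor(N/10). RB1, logPB1, RB10 and logPB10 are not sketched.

```python
import math
import statistics
from typing import Sequence


def native_rank(energies: Sequence[float], native: int) -> int:
    """Rank of the native (1 = lowest energy); ties do not push it down."""
    e_nat = energies[native]
    return 1 + sum(1 for e in energies if e < e_nat)


def z_score(energies: Sequence[float], native: int) -> float:
    """(mean decoy energy - native energy) / population std of decoy energies."""
    decoys = [e for k, e in enumerate(energies) if k != native]
    return (statistics.mean(decoys) - energies[native]) / statistics.pstdev(decoys)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between x and y."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fraction_enrichment(energies: Sequence[float], crmsd: Sequence[float]) -> float:
    """% of the lowest-energy 10% that are also in the lowest-CRMSD 10%."""
    n_top = max(1, len(energies) // 10)  # floor(N/10), at least one structure
    order = range(len(energies))
    top_e = set(sorted(order, key=lambda k: energies[k])[:n_top])
    top_r = set(sorted(order, key=lambda k: crmsd[k])[:n_top])
    return 100.0 * len(top_e & top_r) / n_top
```

With N = 30 structures per hg_structal target, floor(N/10) = 3, so F.E. can only be 0, 33.3, 66.7 or 100%, which matches the values in the per-protein table; this agrees with, but does not prove, the assumed definition. C.C.decoy and F.E.decoy would presumably be obtained from the same calls with the native entry removed.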
Performance of the DFMAC function on the test decoy sets grouped by their generation methods.
| protein | N | Rnat | CRMSD (Å) | Z-score | C.C. | F.E.(%) | RB1 | logPB1 | RB10 | logPB10 | C.C.decoy | F.E.decoy(%) |
| 4state_reduced | ||||||||||||
| | 631 | 1 | 0.000 | 4.485 | 0.817 | 68.3 | 61 | -1.01 | 3 | -2.32 | 0.815 | 66.7 |
| | 675 | 1 | 0.000 | 3.166 | 0.822 | 53.7 | 7 | -1.98 | 2 | -2.53 | 0.820 | 53.7 |
| | 678 | 1 | 0.000 | 2.895 | 0.670 | 65.7 | 71 | -0.98 | 3 | -2.35 | 0.665 | 64.2 |
| fisa | ||||||||||||
| 2cro | 501 | 1 | 0.000 | 4.190 | 0.280 | 24.0 | 3 | -2.22 | 2 | -2.40 | 0.253 | 24.0 |
| fisa_casp3 | ||||||||||||
| 1bl0 | 972 | 8 | 5.522 | 2.174 | 0.302 | 20.6 | 15 | -1.81 | 15 | -1.81 | 0.296 | 19.6 |
| | 1401 | 1 | 1.882 | 3.835 | 0.128 | 18.6 | 455 | -0.49 | 3 | -2.67 | 0.111 | 17.9 |
| hg_structal | ||||||||||||
| | 30 | 1 | 0.000 | 1.868 | 0.904 | 66.7 | 10 | -0.46 | 2 | -1.16 | 0.892 | 0.0 |
| | 30 | 1 | 0.000 | 1.364 | 0.896 | 100.0 | 2 | -1.16 | 2 | -1.16 | 0.898 | 50.0 |
| | 30 | 1 | 0.000 | 2.395 | 0.880 | 33.3 | 4 | -0.86 | 2 | -1.16 | 0.845 | 0.0 |
| | 30 | 1 | 0.000 | 1.205 | 0.893 | 33.3 | 6 | -0.68 | 2 | -1.16 | 0.888 | 0.0 |
| | 30 | 1 | 0.000 | 1.661 | 0.812 | 33.3 | 15 | -0.29 | 2 | -1.16 | 0.812 | 0.0 |
| | 30 | 1 | 0.000 | 1.775 | 0.904 | 33.3 | 23 | -0.10 | 2 | -1.16 | 0.893 | 0.0 |
| | 30 | 18 | 1.823 | -0.270 | 0.754 | 33.3 | 3 | -0.99 | 2 | -1.16 | 0.835 | 0.0 |
| | 30 | 1 | 0.000 | 2.429 | 0.762 | 100.0 | 2 | -1.16 | 2 | -1.16 | 0.725 | 50.0 |
| | 30 | 1 | 0.000 | 1.976 | 0.563 | 33.3 | 12 | -0.38 | 4 | -0.86 | 0.460 | 0.0 |
| | 30 | 1 | 0.000 | 3.221 | 0.839 | 33.3 | 21 | -0.14 | 2 | -1.16 | 0.750 | 0.0 |
| ig_structal | ||||||||||||
| | 61 | 1 | 0.000 | 2.304 | 0.605 | 16.7 | 27 | -0.35 | 10 | -0.78 | 0.554 | 0.0 |
| | 61 | 7 | 1.854 | 0.864 | 0.530 | 16.7 | 12 | -0.70 | 4 | -1.18 | 0.528 | 16.7 |
| | 61 | 4 | 1.736 | 1.172 | 0.481 | 16.7 | 13 | -0.66 | 4 | -1.18 | 0.462 | 0.0 |
| | 61 | 59 | 1.702 | -2.283 | 0.349 | 0.0 | 21 | -0.46 | 6 | -1.00 | 0.504 | 0.0 |
| | 61 | 2 | 1.333 | 1.358 | 0.583 | 33.3 | 8 | -0.88 | 4 | -1.18 | 0.565 | 16.7 |
| 1fvd | 61 | 1 | 0.000 | 2.112 | 0.606 | 16.7 | 22 | -0.44 | 2 | -1.48 | 0.574 | 0.0 |
| | 61 | 1 | 0.000 | 2.787 | 0.547 | 16.7 | 10 | -0.78 | 10 | -0.78 | 0.469 | 0.0 |
| | 61 | 1 | 0.000 | 2.103 | 0.644 | 33.3 | 8 | -0.88 | 2 | -1.48 | 0.613 | 16.7 |
| | 61 | 2 | 1.774 | 1.353 | 0.607 | 16.7 | 31 | -0.29 | 8 | -0.88 | 0.591 | 0.0 |
| | 61 | 1 | 0.000 | 2.561 | 0.592 | 33.3 | 8 | -0.88 | 3 | -1.30 | 0.540 | 16.7 |
| | 61 | 1 | 0.000 | 1.380 | 0.333 | 16.7 | 41 | -0.17 | 5 | -1.08 | 0.286 | 0.0 |
| | 61 | 1 | 0.000 | 2.143 | 0.623 | 66.7 | 3 | -1.30 | 2 | -1.48 | 0.585 | 50.0 |
| | 61 | 1 | 0.000 | 2.305 | 0.379 | 16.7 | 51 | -0.07 | 7 | -0.93 | 0.264 | 0.0 |
| | 61 | 1 | 0.000 | 2.716 | 0.543 | 33.3 | 2 | -1.48 | 2 | -1.48 | 0.456 | 16.7 |
| | 61 | 1 | 0.000 | 2.175 | 0.575 | 33.3 | 44 | -0.14 | 3 | -1.30 | 0.529 | 33.3 |
| | 61 | 1 | 0.000 | 2.323 | 0.567 | 33.3 | 4 | -1.18 | 4 | -1.18 | 0.523 | 16.7 |
| | 61 | 1 | 0.000 | 2.766 | 0.208 | 16.7 | 45 | -0.13 | 26 | -0.36 | -0.020 | 0.0 |
| | 61 | 1 | 0.000 | 2.277 | 0.486 | 16.7 | 32 | -0.27 | 13 | -0.66 | 0.422 | 0.0 |
| | 61 | 1 | 0.000 | 2.648 | 0.243 | 33.3 | 29 | -0.32 | 4 | -1.18 | 0.055 | 16.7 |
| | 61 | 1 | 0.000 | 2.941 | 0.614 | 50.0 | 6 | -1.00 | 3 | -1.30 | 0.531 | 33.3 |
| ig_structal_hires | ||||||||||||
| | 20 | 1 | 0.000 | 2.310 | 0.724 | 50.0 | 6 | -0.50 | 2 | -0.98 | 0.633 | 0.0 |
| | 20 | 1 | 0.000 | 2.827 | 0.649 | 50.0 | 8 | -0.38 | 2 | -0.98 | 0.493 | 0.0 |
| | 20 | 1 | 0.000 | 1.518 | 0.636 | 50.0 | 11 | -0.24 | 3 | -0.80 | 0.567 | 0.0 |
| | 20 | 10 | 1.719 | 0.169 | 0.399 | 0.0 | 7 | -0.43 | 4 | -0.68 | 0.452 | 0.0 |
| | 20 | 1 | 0.000 | 2.334 | 0.385 | 50.0 | 15 | -0.10 | 2 | -0.98 | -0.116 | 0.0 |
| | 20 | 1 | 0.000 | 2.532 | 0.725 | 50.0 | 5 | -0.58 | 2 | -0.98 | 0.614 | 0.0 |
| | 20 | 1 | 0.000 | 2.487 | 0.295 | 50.0 | 17 | -0.05 | 8 | -0.38 | -0.228 | 0.0 |
| lattice_ssfit | ||||||||||||
| | 1995 | 1 | 0.000 | 7.349 | -0.049 | 8.5 | 996 | -0.30 | 242 | -0.92 | -0.087 | 8.0 |
| | 1997 | 1 | 0.000 | 13.649 | 0.138 | 17.1 | 1909 | -0.02 | 60 | -1.52 | 0.087 | 16.6 |
| lmds | ||||||||||||
| | 498 | 1 | 0.000 | 2.819 | 0.066 | 20.4 | 336 | -0.17 | 10 | -1.70 | 0.038 | 18.4 |
| 1dtk | 216 | 70 | 7.224 | 0.375 | 0.044 | 4.8 | 42 | -0.71 | 24 | -0.95 | 0.038 | 4.8 |
| | 437 | 1 | 0.000 | 4.275 | 0.064 | 11.6 | 378 | -0.06 | 2 | -2.34 | -0.004 | 9.3 |
| 4pti | 344 | 3 | 9.434 | 2.570 | 0.098 | 23.5 | 220 | -0.19 | 4 | -1.93 | 0.063 | 20.6 |
| semfold | ||||||||||||
| | 11442 | 61 | 12.125 | 2.342 | 0.070 | 13.6 | 6511 | -0.25 | 434 | -1.42 | 0.069 | 13.5 |
| | 11282 | 1 | 0.000 | 7.782 | 0.096 | 19.2 | 2 | -3.75 | 2 | -3.75 | 0.091 | 19.2 |
| moulder | ||||||||||||
| | 301 | 1 | 0.000 | 2.803 | 0.774 | 73.3 | 10 | -1.48 | 2 | -2.18 | 0.768 | 70.0 |
| | 301 | 1 | 0.000 | 2.759 | 0.753 | 53.3 | 38 | -0.90 | 2 | -2.18 | 0.748 | 53.3 |
| | 300 | 1 | 0.000 | 4.713 | 0.828 | 90.0 | 11 | -1.43 | 3 | -2.00 | 0.819 | 89.7 |
| | 301 | 1 | 0.000 | 1.993 | 0.847 | 73.3 | 11 | -1.44 | 3 | -2.00 | 0.845 | 73.3 |
| 2cmd | 301 | 1 | 0.000 | 2.506 | 0.911 | 36.7 | 15 | -1.30 | 6 | -1.70 | 0.911 | 33.3 |
| | 301 | 85 | 3.523 | 0.723 | 0.816 | 66.7 | 30 | -1.00 | 4 | -1.88 | 0.817 | 66.7 |
| | 301 | 1 | 0.000 | 2.106 | 0.842 | 46.7 | 16 | -1.27 | 4 | -1.88 | 0.840 | 46.7 |
| rosetta | ||||||||||||
| | 141 | 1 | 0.000 | 2.608 | 0.624 | 64.3 | 11 | -1.11 | 3 | -1.67 | 0.608 | 64.3 |
| | 141 | 15 | 1.385 | 0.805 | 0.777 | 7.1 | 40 | -0.54 | 25 | -0.75 | 0.776 | 7.1 |
| | 141 | 1 | 0.000 | 2.562 | 0.820 | 78.6 | 14 | -1.00 | 3 | -1.67 | 0.812 | 78.6 |
| | 141 | 2 | 9.242 | 1.895 | 0.544 | 50.0 | 131 | -0.03 | 2 | -1.85 | 0.532 | 50.0 |
| | 141 | 1 | 0.000 | 3.317 | 0.848 | 64.3 | 11 | -1.11 | 2 | -1.85 | 0.851 | 57.1 |
| | 141 | 1 | 0.000 | 4.288 | 0.783 | 28.6 | 30 | -0.67 | 6 | -1.37 | 0.780 | 28.6 |
| 1elw | 141 | 22 | 3.619 | 1.082 | -0.063 | 7.1 | 132 | -0.03 | 12 | -1.07 | -0.070 | 7.1 |
| | 141 | 1 | 0.000 | 3.194 | 0.587 | 35.7 | 22 | -0.80 | 2 | -1.85 | 0.564 | 35.7 |
| | 141 | 1 | 0.000 | 3.018 | 0.540 | 21.4 | 61 | -0.36 | 15 | -0.97 | 0.514 | 21.4 |
| | 141 | 1 | 0.000 | 6.093 | 0.595 | 71.4 | 9 | -1.19 | 2 | -1.85 | 0.629 | 71.4 |
| | 141 | 1 | 0.000 | 2.664 | 0.741 | 71.4 | 8 | -1.24 | 3 | -1.67 | 0.731 | 64.3 |
| | 141 | 1 | 0.000 | 2.232 | 0.821 | 35.7 | 75 | -0.27 | 5 | -1.45 | 0.820 | 28.6 |
| | 141 | 8 | 13.461 | 1.445 | 0.440 | 21.4 | 89 | -0.20 | 14 | -1.00 | 0.427 | 21.4 |
| | 141 | 1 | 0.000 | 4.986 | 0.866 | 92.9 | 5 | -1.45 | 2 | -1.85 | 0.876 | 85.7 |
| | 141 | 5 | 0.842 | 2.002 | 0.763 | 64.3 | 16 | -0.94 | 3 | -1.67 | 0.755 | 57.1 |
| | 141 | 1 | 0.000 | 2.452 | 0.694 | 64.3 | 9 | -1.19 | 3 | -1.67 | 0.680 | 57.1 |
| | 141 | 1 | 0.000 | 3.470 | 0.791 | 71.4 | 2 | -1.85 | 2 | -1.85 | 0.781 | 71.4 |
| | 141 | 1 | 0.000 | 4.974 | 0.426 | 7.1 | 118 | -0.07 | 42 | -0.52 | 0.390 | 0.0 |
| | 141 | 80 | 10.219 | 0.075 | -0.020 | 0.0 | 123 | -0.06 | 71 | -0.30 | -0.022 | 0.0 |
The performance scores for each target protein (by PDB ID) and their average are shown for each decoy generation method. N is the number of structures (the decoy structures plus the single native structure), Rnat is the rank of the native structure among all structures based on the calculated pseudo-energy, and the remaining scores are described in Methods.
Figure 2. Examples of the distribution of total pseudo-energy against CRMSD. The total pseudo-energy (Energy) is plotted against CRMSD for examples chosen by their correlation coefficient (C.C.) in the test results. The native structures are at a CRMSD of 0.0. (A) 2cmd from the moulder decoy set (best C.C., 0.911). (B) 1fvd from the ig_structal decoy set (median C.C., 0.606). (C) 1elw from the rosetta decoy set (worst C.C., -0.063).
Comparison of function performance.
| decoy set | protein | DFIRE-A | DFIRE-B | DOPE | RAPDF | PC2CA | DFMAC |
| 4state_reduced | 1ctf | 1 | 1 | 1 | 1 | 1 | 1 |
| 4state_reduced | 2cro | 1 | 2 | 1 | 1 | 1 | 1 |
| 4state_reduced | 4rxn | 1 | 19 | 1 | 1 | 667 | 1 |
| fisa | 2cro | 1 | 1 | 1 | 14 | 1 | 1 |
| fisa_casp3 | 1bl0 | 1 | 3 | 1 | 1 | 1 | 8 |
| lattice_ssfit | 1dkt-A | 1 | 1 | 1 | 1 | 1 | 1 |
| lattice_ssfit | 1pgb | 1 | 1 | 1 | 1 | 1 | 1 |
| lmds | 1b0n-B | 430 | 261 | 34 | 359 | 1 | 1 |
| lmds | 1dtk | 1 | 5 | 1 | 116 | 2 | 70 |
| lmds | 1shf-A | 1 | 1 | 1 | 1 | 1 | 1 |
| lmds | 4pti | 1 | 1 | 1 | 157 | 1 | 3 |
| average | | 40.0 | 26.9 | 4.0 | 59.4 | 61.6 | 8.1 |
| correct | | 10 | 6 | 10 | 7 | 9 | 8 |
The rank of the native structure assigned by each function is shown for the listed targets. Results for DFIRE-A, DFIRE-B and RAPDF are from [7], results for DOPE are from [15], and results for PC2CA are from [16]. The average rank (average) and the number of correctly identified native structures, i.e., natives ranked first (correct), over the 11 targets are also shown.
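The "average" and "correct" rows are plain summaries of the rank columns, with a native counted as correctly identified when it is ranked first. The short Python sketch below recomputes them from the ranks listed in the table; the dictionary and function names are illustrative.

```python
# Native-structure ranks from the comparison table, in row order:
# 1ctf, 2cro, 4rxn, 2cro (fisa), 1bl0, 1dkt-A, 1pgb, 1b0n-B, 1dtk, 1shf-A, 4pti
RANKS = {
    "DFIRE-A": [1, 1, 1, 1, 1, 1, 1, 430, 1, 1, 1],
    "DFIRE-B": [1, 2, 19, 1, 3, 1, 1, 261, 5, 1, 1],
    "DOPE":    [1, 1, 1, 1, 1, 1, 1, 34, 1, 1, 1],
    "RAPDF":   [1, 1, 1, 14, 1, 1, 1, 359, 116, 1, 157],
    "PC2CA":   [1, 1, 667, 1, 1, 1, 1, 1, 2, 1, 1],
    "DFMAC":   [1, 1, 1, 1, 8, 1, 1, 1, 70, 1, 3],
}


def summarize(ranks):
    """(average rank rounded to 0.1, number of natives ranked first)."""
    return round(sum(ranks) / len(ranks), 1), sum(1 for r in ranks if r == 1)


for name, ranks in RANKS.items():
    average, correct = summarize(ranks)
    print(f"{name:8s} average={average:5.1f} correct={correct}")
```

Running it reproduces both summary rows; for DFMAC, for example, the ranks sum to 89, giving 89/11 ≈ 8.1, with 8 of the 11 natives ranked first. The single large ranks (430, 359, 667) show how strongly one failure can dominate an average rank.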
Comparison of PC2CA and DFMAC functions on the test set.
| decoy set | total | PC2CA correct | PC2CA CRMSD (Å) | PC2CA Z-score | PC2CA C.C. | PC2CA F.E.(%) | DFMAC correct | DFMAC CRMSD (Å) | DFMAC Z-score | DFMAC C.C. | DFMAC F.E.(%) |
| 4state_reduced | 3 | 2 | 0.7 | 1.4 | 0.59 | 53.4 | 3 | 0.0 | 3.5 | 0.77 | 62.6 |
| fisa | 1 | 1 | 0.0 | 7.3 | 0.17 | 22.0 | 1 | 0.0 | 4.2 | 0.28 | 24.0 |
| fisa_casp3 | 2 | 2 | 0.0 | 4.4 | -0.02 | 10.4 | 1 | 3.7 | 3.0 | 0.22 | 19.6 |
| hg_structal | 10 | 5 | 0.8 | 1.3 | 0.70 | 53.3 | 9 | 0.2 | 1.8 | 0.82 | 50.0 |
| ig_structal | 20 | 0 | 2.2 | -0.8 | 0.31 | 18.3 | 15 | 0.4 | 1.9 | 0.51 | 25.8 |
| ig_structal_hires | 7 | 0 | 2.6 | -0.2 | 0.32 | 0.0 | 6 | 0.2 | 2.0 | 0.54 | 42.9 |
| lattice_ssfit | 2 | 2 | 0.0 | 3.9 | 0.02 | 11.1 | 2 | 0.0 | 10.5 | 0.04 | 12.8 |
| lmds | 4 | 3 | 1.6 | 3.7 | 0.10 | 19.5 | 2 | 4.2 | 2.5 | 0.07 | 15.1 |
| semfold | 2 | 1 | 0.2 | 2.7 | 0.05 | 13.0 | 1 | 6.1 | 5.1 | 0.08 | 16.4 |
| Summary | 51 | 16 | 1.5 | 0.9 | 0.35 | 24.1 | 40 | 0.9 | 2.6 | 0.50 | 33.1 |
Only results for targets in our test set are compiled and shown. For each decoy generation method, the number of correctly identified native structures (correct) out of the total number of targets (total) is shown, together with the averages of CRMSD, Z-score, C.C. and F.E. In the "Summary" row, the total and correct counts are summed, and CRMSD, Z-score, C.C. and F.E. are averaged over all protein targets. The PC2CA results were taken from [16], and the score averages were calculated from them.
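Because the "Summary" row averages over the 51 protein targets rather than over the nine decoy sets, it corresponds to a target-weighted mean of the per-set averages. The Python sketch below illustrates this for the DFMAC C.C. column; since the per-set values are themselves rounded to two decimals, the result only approximates the reported 0.50. The names `DFMAC_ROWS` and `summary_row` are illustrative.

```python
# DFMAC per-decoy-set rows from the table: (total, correct, mean C.C.)
DFMAC_ROWS = {
    "4state_reduced": (3, 3, 0.77),
    "fisa": (1, 1, 0.28),
    "fisa_casp3": (2, 1, 0.22),
    "hg_structal": (10, 9, 0.82),
    "ig_structal": (20, 15, 0.51),
    "ig_structal_hires": (7, 6, 0.54),
    "lattice_ssfit": (2, 2, 0.04),
    "lmds": (4, 2, 0.07),
    "semfold": (2, 1, 0.08),
}


def summary_row(rows):
    """Summed counts and the target-weighted mean C.C. over all decoy sets."""
    total = sum(t for t, _, _ in rows.values())
    correct = sum(c for _, c, _ in rows.values())
    # Each decoy-set average is weighted by its number of targets.
    mean_cc = sum(t * cc for t, _, cc in rows.values()) / total
    return total, correct, mean_cc


total, correct, mean_cc = summary_row(DFMAC_ROWS)
print(total, correct, round(mean_cc, 2))  # 51 40 0.5
```

An unweighted mean over the nine decoy sets would instead give about 0.37, because the small, low-correlation sets (lattice_ssfit, lmds, semfold) would count as much as ig_structal with its 20 targets.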
Effects of the omission of each energy calculation component from the DFMAC function.
| omitted component | Rnat | CRMSD (Å) | Z-score | C.C. | F.E.(%) | logPB1 | logPB10 | C.C.decoy | F.E.decoy(%) |
| none | 6.8 | 1.174 | 2.630 | 0.559 | 38.7 | -0.75 | -1.41 | 0.518 | 25.1 |
| DIST | 6.9 | 1.087 | 2.687 | 0.547 | 38.1 | -0.76 | -1.40 | 0.508 | 25.4 |
| DABG | 16.2 | 2.444 | 2.062 | 0.554 | 36.9 | -0.69 | -1.40 | 0.507 | 25.8 |
| HBND | 6.0 | 1.301 | 2.523 | 0.558 | 39.3 | -0.76 | -1.44 | 0.520 | 26.5 |
| PPDA | 6.7 | 1.197 | 2.617 | 0.558 | 39.1 | -0.75 | -1.44 | 0.518 | 25.0 |
| OMDA | 12.0 | 1.036 | 2.582 | 0.555 | 39.5 | -0.76 | -1.42 | 0.517 | 25.4 |
| SURR | 11.6 | 1.519 | 2.737 | 0.487 | 35.2 | -0.68 | -1.37 | 0.442 | 22.0 |
The measures are as described in Table 3. Values are averages over the test set for each omitted component.
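The omission experiment can be pictured as recomputing each structure's total pseudo-energy without one component and then re-ranking the native. The sketch below assumes the total is an unweighted sum of component terms, which is a simplification (any component weighting would be defined in the paper's Methods), and uses made-up component values purely for illustration; `TOY`, `total_energy` and `rank_with_omission` are hypothetical names.

```python
from typing import Dict, List, Optional

# Made-up component energies for a native (index 0) and one decoy.
TOY: List[Dict[str, float]] = [
    {"DIST": -1.0, "DABG": -5.0, "SURR": -2.0},  # native
    {"DIST": -3.0, "DABG": -1.0, "SURR": -2.5},  # decoy
]


def total_energy(terms: Dict[str, float], omit: Optional[str] = None) -> float:
    """Unweighted sum of component pseudo-energies, skipping `omit` if given."""
    return sum(v for k, v in terms.items() if k != omit)


def rank_with_omission(structures: List[Dict[str, float]], native: int,
                       omit: Optional[str] = None) -> int:
    """Rank of the native (1 = lowest total energy) after dropping one component."""
    energies = [total_energy(s, omit) for s in structures]
    return 1 + sum(1 for e in energies if e < energies[native])


print(rank_with_omission(TOY, 0))               # 1: native is lowest (-8.0 vs -6.5)
print(rank_with_omission(TOY, 0, omit="DABG"))  # 2: without DABG, -3.0 vs -5.5
```

In this toy case, dropping the component that carries most of the native's advantage pushes the native down the ranking, which is the kind of effect the Rnat column above summarizes when DABG, OMDA or SURR is omitted.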