Qiwen Dong, Xiaolong Wang, Lei Lin.
Abstract
BACKGROUND: Developing and testing functions that model protein energetics is an important part of current research aimed at understanding protein structure and function. Knowledge-based mean force potentials are derived from statistical analyses of interacting groups in experimentally determined protein structures. Current knowledge-based mean force potentials are developed at the atom or amino acid level, and the evolutionary information contained in profiles has not been investigated. Based on these observations, this work presents a class of novel knowledge-based mean force potentials at the profile level, which use the evolutionary information in profiles to build more powerful statistical potentials.
Year: 2006 PMID: 16803615 PMCID: PMC1534065 DOI: 10.1186/1471-2105-7-324
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparative results of the potentials on our structure models
| Potentials | CP | Success rates | Z-scores | Potentials | CP | Success rates | Z-scores |
| Distance | 0.86 | 400/431 | 2.86 | Dihedral | 0.81 | 256/431 | 1.92 |
| Distance_profile | 0.91 | 422/431 | 3.26 | Dihedral_profile | 0.82 | 270/431 | 2.08 |
| Contact | 0.81 | 221/431 | 1.84 | Surface | 0.85 | 309/431 | 2.33 |
| Contact_profile | 0.83 | 232/431 | 1.96 | Surface_profile | 0.90 | 335/431 | 2.78 |
The distance, contact, dihedral and surface refer to the four kinds of potentials at the residue level. The potentials with _profile suffix indicate the corresponding potentials at the profile level. In the success rates columns, the first number is the number of native structures ranked number one; the second number is the total number of proteins in the decoy set.
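The success-rate and Z-score columns can be illustrated with a minimal sketch. The success criterion (native structure ranked first) follows the footnote above. The Z-score formula used here, the mean decoy energy minus the native energy divided by the standard deviation of the decoy energies, is a common convention and an assumption, since this record does not give the paper's exact definition. The function names and the toy energies are hypothetical, and lower energy is assumed to be better.

```python
from statistics import mean, pstdev


def native_ranked_first(native_energy, decoy_energies):
    """True if the native structure has strictly lower energy than every decoy."""
    return all(native_energy < e for e in decoy_energies)


def z_score(native_energy, decoy_energies):
    """Assumed convention: (mean decoy energy - native energy) / decoy std dev."""
    return (mean(decoy_energies) - native_energy) / pstdev(decoy_energies)


def summarize(decoy_sets):
    """Return (number of successes, number of proteins, average Z-score)."""
    wins = sum(native_ranked_first(n, d) for n, d in decoy_sets)
    avg_z = mean(z_score(n, d) for n, d in decoy_sets)
    return wins, len(decoy_sets), avg_z


# Toy data: (native energy, decoy energies) for two hypothetical proteins.
toy_sets = [
    (-10.0, [-4.0, -6.0, -8.0]),  # native is the lowest: counted as a success
    (-5.0, [-6.0, -2.0, -4.0]),   # one decoy scores lower: not a success
]
wins, total, avg_z = summarize(toy_sets)
print(f"{wins}/{total}", round(avg_z, 2))
```

A table entry such as 422/431 corresponds to `wins`/`total` in this sketch.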
Figure 2. ROC curves of various potentials tested on our structure models. The lower the curve, the better the discrimination between the good and bad models. Subfigures (A), (B), (C) and (D) show the performance of the residue-level and profile-level distance-dependent, contact, Φ/Ψ dihedral angle and accessible surface statistical potentials, respectively. The potentials with the _profile suffix are the corresponding potentials at the profile level.
Comparative fold assessment results of the potentials on Baker's set
| Potentials | CP | Success rates | Z-scores | Potentials | CP | Success rates | Z-scores |
| Distance | 0.77 | 25/41 | 2.58 | Dihedral | 0.75 | 17/41 | 1.41 |
| Distance_profile | 0.81 | 30/41 | 2.74 | Dihedral_profile | 0.77 | 20/41 | 1.58 |
| Contact | 0.74 | 15/41 | 1.36 | Surface | 0.74 | 18/41 | 2.47 |
| Contact_profile | 0.75 | 17/41 | 1.27 | Surface_profile | 0.78 | 22/41 | 2.54 |
See the footnote of Table 1 for the names of the potentials.
The results of PROSTAR decoy set evaluation
| Decoy set | MISFOLD | IFU | PDBERR |
| Number of decoy pairs | 25 | 41 | 3 |
| Distance | 25 | 28 | 3 |
| Distance_profile | 25 | 35 | 3 |
| Contact | 24 | 25 | 1 |
| Contact_profile | 25 | 28 | 3 |
| Dihedral | 25 | 26 | 3 |
| Dihedral_profile | 25 | 29 | 3 |
| Surface | 25 | 21 | 1 |
| Surface_profile | 25 | 24 | 3 |
The table gives the number of decoy pairs in each of the three decoy sets and the number of decoy pairs correctly recognized by each potential. See the footnote of Table 1 for the names of the potentials.
The success rates and the average Z-scores of different potentials on the multiple decoy sets
| Source | 4state | Lattice_ssfit | Lmds | Fisa | Fisa_casp3 | Summary |
| DFIRD-SCM | 6/7 (3.94)a | 8/8 (6.19) | 3/10 (2.56) | 3/4 (4.70) | 3/3 (6.05) | 23/32 (4.68) |
| Distance | 5/7 (2.48) | 6/8 (4.97) | 2/10 (1.78) | 2/4 (3.06) | 1/3 (1.93) | 16/32 (2.84) |
| Distance_profile | 7/7 (3.53) | 8/8 (5.72) | 3/10 (2.45) | 2/4 (3.32) | 2/3 (2.94) | 22/32 (3.59) |
| Contact | 3/7 (1.38) | 4/8 (2.32) | 1/10 (0.83) | 0/4 (0.65) | 0/3 (1.69) | 8/32 (1.37) |
| Contact_profile | 3/7 (1.52) | 5/8 (2.96) | 1/10 (1.15) | 0/4 (0.72) | 0/3 (1.73) | 9/32 (1.61) |
| Dihedral | 7/7 (2.69) | 6/8 (3.51) | 2/10 (1.62) | 1/4 (1.05) | 1/3 (1.72) | 17/32 (2.12) |
| Dihedral_profile | 7/7 (2.72) | 7/8 (3.88) | 3/10 (1.55) | 1/4 (1.22) | 2/3 (2.58) | 20/32 (2.39) |
| Surface | 4/7 (1.80) | 4/8 (3.15) | 3/10 (1.21) | 1/4 (1.28) | 2/3 (2.26) | 14/32 (1.94) |
| Surface_profile | 4/7 (2.07) | 4/8 (3.57) | 5/10 (2.68) | 2/4 (1.89) | 2/3 (2.96) | 17/32 (2.64) |
a The first number is the number of native structures ranked number one; the second number is the total number of proteins in the decoy set. The numbers in parentheses are the average Z-scores. The results of the DFIRD-SCM method are taken directly from Zhang et al., Protein Sci. 2004, 13: 400–411.
Optimization results for the probability threshold
| Probability threshold | Number of profiles | Distance_profile | Contact_profile | Dihedral_profile | Surface_profile |
| 0.04 | 21355 | - | 0.828263 | 0.815943 | 0.899127 |
| 0.05 | 19868 | - | 0.826101 | 0.815291 | 0.900184 |
| 0.06 | 15935 | - | 0.825473 | 0.816991 | 0.900127 |
| 0.08 | 7444 | - | 0.825815 | 0.815693 | 0.898678 |
| 0.10 | 3145 | - | 0.827828 | 0.815039 | 0.900998 |
| 0.12 | 1442 | - | 0.826786 | 0.814627 | 0.899441 |
| 0.14 | 759 | 0.907069 | 0.82626 | 0.816084 | 0.899387 |
| 0.16 | 404 | 0.909359 | 0.826889 | 0.815488 | 0.899437 |
| 0.17 | 303 | 0.906705 | 0.82597 | 0.81585 | 0.899063 |
| 0.18 | 235 | 0.909907 | 0.824466 | 0.816644 | 0.899639 |
| 0.20 | 186 | 0.908468 | 0.828051 | 0.815918 | 0.899407 |
| 0.22 | 138 | 0.906744 | 0.823012 | 0.811962 | 0.896226 |
| 0.24 | 81 | 0.907125 | 0.825444 | 0.81052 | 0.895877 |
| 0.26 | 46 | 0.904767 | 0.823669 | 0.809967 | 0.896212 |
| 0.28 | 28 | 0.892552 | 0.816189 | 0.799421 | 0.887908 |
| 0.30 | 21 | 0.873879 | 0.782432 | 0.777818 | 0.867548 |
| 0.32 | 21 | 0.872684 | 0.781907 | 0.779257 | 0.866134 |
The table gives the average CP scores of the profile-level statistical potentials at different probability thresholds. The discrimination is performed on our structure models. Distance_profile, Contact_profile, Dihedral_profile and Surface_profile refer to the profile-level distance-dependent, contact, Φ/Ψ dihedral angle and accessible surface statistical potentials, respectively. Note that for small probability thresholds (<0.12), the profile-level distance-dependent potential cannot produce output efficiently, because the number of parameters of this potential is proportional to the square of the number of profiles.
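The quadratic growth mentioned in the note can be made concrete with a short calculation. This sketch takes profile counts from the table above and squares them; the function name and the `n_distance_bins` multiplier are hypothetical, since the number of distance bins is not stated in this record.

```python
# Number of distinct profiles at selected probability thresholds (from the table).
profiles_at_threshold = {0.04: 21355, 0.12: 1442, 0.14: 759, 0.16: 404}


def pair_parameter_count(n_profiles, n_distance_bins=1):
    """Parameters needed if every ordered profile pair gets one value per bin.

    n_distance_bins is a placeholder; the real bin count is not given here.
    """
    return n_profiles * n_profiles * n_distance_bins


for threshold, n in profiles_at_threshold.items():
    print(f"threshold {threshold:.2f}: {n} profiles, "
          f"{pair_parameter_count(n):,} profile pairs")
```

At the 0.14 threshold there are 576,081 profile pairs, while at 0.04 there are 456,036,025, several hundred times more before any distance binning is taken into account.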
Figure 3. A scatter plot of energy versus RMSD. A horizontal line highlights the score of the native state. Subfigures (A), (B), (C) and (D) show the correlation for the profile-level distance-dependent, contact, Φ/Ψ dihedral angle and accessible surface statistical potentials, respectively. Each plot includes 1858 structure models. The plots show the structure models of the sequence 1vcc from Baker's dataset.
The results of two threading methods
| Method | W0 | W1 | Ws | Training accuracy | Test accuracy |
| Residue-threading | 4.5 | 0.5 | 0.175 | 78.2% | 75.6% |
| Profile-threading | 5 | 0.4 | 0.348 | 82.5% | 79.4% |
W0, W1 and Ws are the gap-open penalty, the gap-extension penalty and the structure factor, respectively. The training accuracy and test accuracy are the alignment accuracies on the training set and the test set.
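How the three parameters might enter an alignment can be sketched as follows. This is an illustration under stated assumptions, not the paper's threading method: it assumes the per-position score is a sequence term plus Ws times a structure term, and it uses a standard affine-gap (Gotoh-style) global alignment in which a gap of length k costs W0 + (k - 1) * W1. All function names and the toy score matrices are hypothetical.

```python
NEG_INF = float("-inf")


def combined_scores(sequence_scores, structure_scores, ws):
    """Assumed combination: sequence term plus ws times the structure term."""
    return [[s + ws * t for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(sequence_scores, structure_scores)]


def affine_gap_alignment_score(score, gap_open, gap_extend):
    """Best global alignment score for a query-by-template score matrix."""
    n, m = len(score), len(score[0])
    # M: positions i and j aligned; X: query residue i faces a gap;
    # Y: template position j faces a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score[i - 1][j - 1] + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


# Toy example using the residue-threading parameters from the table.
seq_scores = [[2.0, -1.0], [-1.0, 2.0]]
struct_scores = [[0.0, 0.0], [0.0, 0.0]]
scores = combined_scores(seq_scores, struct_scores, ws=0.175)
best = affine_gap_alignment_score(scores, gap_open=4.5, gap_extend=0.5)
print(best)
```

Separating the gap-open and gap-extension penalties lets one long gap cost less than several short ones, which is the usual motivation for the affine scheme.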
Figure 1. The process of calculating frequency profiles and converting them into binary profiles. (a) For a given amino acid sequence, (b) the multiple sequence alignment is obtained by PSI-BLAST. (c) The frequency profile is calculated from the multiple sequence alignment and (d) is transformed into a binary profile using a probability threshold. (e) A substring of amino acid combinations is then obtained by collecting, for each position of the protein sequence, the amino acids with non-zero values in the binary profile.
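Steps (c) to (e) of this caption can be sketched in a few lines. The sketch makes several assumptions: the frequency profile is computed from raw column counts of a toy alignment (the profile derived from PSI-BLAST output may be weighted differently), a residue is kept when its frequency is strictly greater than the threshold, and all function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_profile(alignment):
    """Per-column amino acid frequencies for equal-length aligned sequences."""
    profile = []
    for pos in range(len(alignment[0])):
        # Gaps and non-standard symbols are ignored when counting.
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append({aa: (counts[aa] / total if total else 0.0)
                        for aa in AMINO_ACIDS})
    return profile


def binary_profile(profile, threshold):
    """Set an entry to 1 where the frequency exceeds the threshold, else 0."""
    return [{aa: int(freq > threshold) for aa, freq in column.items()}
            for column in profile]


def combination_strings(binary):
    """For each position, join the amino acids whose binary entry is non-zero."""
    return ["".join(aa for aa in AMINO_ACIDS if column[aa]) for column in binary]


# Toy alignment standing in for the PSI-BLAST multiple sequence alignment.
alignment = ["ACDA", "ACEA", "GCDA", "A-DA"]
freq = frequency_profile(alignment)
combos = combination_strings(binary_profile(freq, threshold=0.2))
print(combos)  # ['AG', 'C', 'DE', 'A']
```

Raising the threshold shrinks each combination string, so fewer distinct combinations occur overall.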