Pascal Benkert, Torsten Schwede, Silvio C. E. Tosatto.
Abstract
BACKGROUND: The selection of the most accurate protein model from a set of alternatives is a crucial step in protein structure prediction, both in template-based and ab initio approaches. Scoring functions have been developed which either return a quality estimate for a single model or derive a score from the information contained in the ensemble of models for a given sequence. Local structural features occurring more frequently in the ensemble have a greater probability of being correct. Within the context of the CASP experiment, these so-called consensus methods have been shown to perform considerably better in selecting good candidate models, but they tend to fail if the best models are far from the dominant structural cluster. In this paper we show that model selection can be improved if both approaches are combined by pre-filtering the models used during the calculation of the structural consensus.
Year: 2009 PMID: 19457232 PMCID: PMC2709111 DOI: 10.1186/1472-6807-9-35
Source DB: PubMed Journal: BMC Struct Biol ISSN: 1472-6807
Table 1. Short description of the terms and their combinations used in this work.
| Term | Description |
|---|---|
| torsion | Extended torsion potential over 3 consecutive residues. Bin sizes: 45 degree for the centre residue, 90 degree for the 2 adjacent residues |
| pair residue | Residue-level, secondary structure specific interaction potential using Cβ atoms as interaction centres. Range 3...25 Å, step size: 1 Å |
| solvation | Potential reflecting the propensity of a certain amino acid for a certain degree of solvent exposure based on the number of Cβ atoms within a sphere of 9 Å around the centre Cβ. |
| pair all-atom | All-atom, secondary structure specific interaction potential using all 167 atom types. Range 3...20 Å, step size: 0.5 Å |
| SSE agreement | Agreement between the predicted secondary structure of the target sequence (using PSIPRED) and the calculated secondary structure of the model (using DSSP). |
| ACC agreement | Agreement between the predicted relative solvent accessibility using ACCpro (buried/exposed) and the relative solvent accessibility derived from DSSP (> 25% accessibility => exposed) |
| QMEAN3 | Linear combination of torsion, pair residue, solvation |
| QMEAN4 | Linear combination of torsion, pair residue, solvation, pair all-atom |
| QMEAN5 | Linear combination of torsion, pair residue, solvation, SSE agreement, ACC agreement |
| QMEAN6 | Linear combination of torsion, pair residue, solvation, pair all-atom, SSE agreement, ACC agreement |
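The way the composite scores are assembled from the individual terms can be illustrated with a minimal sketch. The term lists follow the table above; the weights in `EXAMPLE_WEIGHTS` are hypothetical placeholders (the fitted weights are not given here), and computing an agreement term as the plain fraction of matching positions is an assumption made for illustration.

```python
from typing import Dict, List

# Terms entering each composite score, as listed in the table above.
QMEAN_TERMS: Dict[str, List[str]] = {
    "QMEAN3": ["torsion", "pair_residue", "solvation"],
    "QMEAN4": ["torsion", "pair_residue", "solvation", "pair_all_atom"],
    "QMEAN5": ["torsion", "pair_residue", "solvation",
               "sse_agreement", "acc_agreement"],
    "QMEAN6": ["torsion", "pair_residue", "solvation", "pair_all_atom",
               "sse_agreement", "acc_agreement"],
}

# Hypothetical weights for illustration only; not the published values.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "torsion": 0.3,
    "pair_residue": 0.2,
    "solvation": 0.2,
    "pair_all_atom": 0.1,
    "sse_agreement": 0.1,
    "acc_agreement": 0.1,
}


def agreement(predicted: str, observed: str) -> float:
    """Fraction of positions at which two per-residue state strings match.

    Assumption: a simple match fraction, e.g. between a predicted and a
    calculated secondary structure string of equal length.
    """
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not predicted:
        return 0.0
    return sum(p == o for p, o in zip(predicted, observed)) / len(predicted)


def composite_score(terms: Dict[str, float], version: str,
                    weights: Dict[str, float] = EXAMPLE_WEIGHTS) -> float:
    """Weighted sum of the terms that belong to the given QMEAN version."""
    return sum(weights[name] * terms[name] for name in QMEAN_TERMS[version])


# Example: three of four predicted states agree with the observed ones.
sse = agreement("HHEC", "HHCC")
```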
Table 2. Comparison between QMEAN, various QMEANclust implementations and selfQMEAN on all CASP7 server models.
| Method | Pearson's r | Spearman's ρ | Total GDT_TS |
|---|---|---|---|
| QMEAN3 | 0.645 | 0.551 | 50.17 |
| QMEAN3 * fraction modelled | 0.663 | 0.605 | 51.92 |
| QMEAN4 | 0.647 | 0.540 | 49.57 |
| QMEAN4 * fraction modelled | 0.671 | 0.609 | 52.65 |
| QMEAN5 | 0.729 | 0.630 | 54.87 |
| QMEAN5 * fraction modelled | 0.740 | 0.676 | 55.32 |
| QMEAN6 | 0.741 | 0.638 | 56.36 |
| QMEAN6 * fraction modelled | | | |
| Median | 0.872 | 0.812 | 56.64 |
| Mean | 0.889 | 0.821 | 57.16 |
| Weighted mean | 0.883 | 0.824 | 57.63 |
| Median: Z-score > -1 | 0.877 | 0.815 | 57.05 |
| Mean: Z-score > -1 | 0.876 | 0.817 | 57.30 |
| Weighted mean: Z-score > -1 | 0.882 | 0.823 | 57.60 |
| Median: Z-score > 0 | 0.884 | 0.824 | 57.52 |
| Mean: Z-score > 0 | 0.879 | 0.822 | 57.35 |
| Weighted mean: Z-score > 0 | 0.882 | 0.826 | 57.31 |
| Median: Z-score > 0.5 | 0.885 | 0.828 | 57.33 |
| Mean: Z-score > 0.5 | 0.880 | 0.830 | 56.96 |
| Weighted mean: Z-score > 0.5 | 0.883 | 0.832 | 57.18 |
| Median: 20% TBM, 20% FM | 0.888 | 0.842 | 57.37 |
| Median: 10% TBM, 10% FM | 0.890 | | 57.83 |
| Median: 5% TBM, 5% FM | 0.873 | 0.826 | 56.98 |
| Median: 10% TBM, 20% FM | 0.886 | | 57.23 |
| Median: 20% TBM, 10% FM | | 0.842 | 57.97 |
| Median: Δ < 0.05 Å TBM, Δ < 0.05 Å FM | 0.867 | 0.826 | 57.65 |
| Median: Δ < 0.1 Å TBM, Δ < 0.1 Å FM | | 0.837 | 57.69 |
| Median: Δ < 0.05 Å TBM, Δ < 0.1 Å FM | | | |
| Median: Δ < 0.1 Å TBM, Δ < 0.05 Å FM | 0.868 | 0.822 | 57.23 |
| Linear combination of 5 terms (w/o all-atom) | 0.811 | 0.755 | 55.53 |
| Sum of Z-scores (5 terms) | | | |
| Sum of Z-scores (6 terms) | | 0.753 | 55.60 |
Average correlation coefficient and total maximum GDT_TS score of the selected models for different QMEAN versions, obtained on the test set containing all CASP7 server models. A description of all QMEAN versions is given in Table 1. For the QMEANclust consensus score, several strategies for pre-selecting reference models based on the QMEAN score are investigated. The reference set is defined by a Z-score cut-off, by taking only a percentage of the top-scoring models, or by including only models close to the highest-scoring model. The different cut-offs used for template-based modelling (TBM) and free modelling (FM) targets are indicated. Underlined values are used in Table 3 for comparison to other methods. The selfQMEAN scoring function is based on ensemble-specific statistical potentials.
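The three pre-selection strategies and the subsequent consensus step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `similarity` is a hypothetical pairwise model-similarity matrix (for example, pairwise GDT_TS scaled to 0..1), the "close to the highest-scoring model" criterion is interpreted here as a difference in single-model score, and a model is not compared with itself.

```python
from statistics import mean, median, pstdev
from typing import Callable, List, Sequence


def select_by_zscore(scores: Sequence[float], cutoff: float) -> List[int]:
    """Keep models whose single-model score has a Z-score above `cutoff`."""
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        return list(range(len(scores)))
    return [i for i, s in enumerate(scores) if (s - mu) / sd > cutoff]


def select_top_fraction(scores: Sequence[float], fraction: float) -> List[int]:
    """Keep the given fraction of top-scoring models (at least one)."""
    n_keep = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def select_near_best(scores: Sequence[float], delta: float) -> List[int]:
    """Keep models scoring within `delta` of the best model (assumed meaning)."""
    best = max(scores)
    return [i for i, s in enumerate(scores) if best - s < delta]


def consensus_scores(similarity: Sequence[Sequence[float]],
                     reference: Sequence[int],
                     combine: Callable[[List[float]], float] = median
                     ) -> List[float]:
    """Score every model by its combined similarity to the reference models."""
    result = []
    for i in range(len(similarity)):
        values = [similarity[i][j] for j in reference if j != i]
        # A model that is the only reference has nothing to compare with.
        result.append(combine(values) if values else 0.0)
    return result


# Toy example with four models; all numbers are made up.
scores = [0.9, 0.8, 0.2, 0.1]
similarity = [
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.4, 0.1],
    [0.3, 0.4, 1.0, 0.5],
    [0.2, 0.1, 0.5, 1.0],
]
reference = select_top_fraction(scores, 0.5)
consensus = consensus_scores(similarity, reference)
```

Passing `combine=mean` instead of the default `median` gives the mean-based variant; a weighted mean would additionally need per-reference weights, which are left out of this sketch.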
Table 3. Comparison of the best QMEAN versions with other methods participating in CASP7.
| Method | Pearson's r | Spearman's ρ | Total GDT_TS |
|---|---|---|---|
| QMEAN | 0.752 | 0.684 | 56.70 |
| Circle-QA | 0.718 | 0.643 | 56.03 |
| ProQ | 0.700 | 0.571 | 54.29 |
| ProQlocal | 0.698 | 0.563 | 54.17 |
| Bilab | 0.683 | 0.561 | 54.50 |
| ModFOLD | 0.661 | 0.580 | 54.19 |
| ABIpro | 0.653 | 0.605 | 56.40 |
| selfQMEAN | 0.830 | 0.749 | 56.60 |
| QMEANclust | | | |
| Pcons | 0.801 | 0.714 | 54.36 |
| TASSER-QA | 0.828 | 0.785 | 57.23 |
| Zhang server | - | - | 57.35 |
| Random model selection | - | - | |
| Best model per target | - | - | |
Average correlation coefficient and total maximum GDT_TS score of the optimised QMEAN, QMEANclust and selfQMEAN versions and the top performing methods of CASP7. Only scoring functions with predictions for all 98 targets are shown.
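The evaluation measures named in this note can be sketched in a few lines. The sketch assumes that correlations are computed per target and then averaged, and that the GDT_TS of the top-scoring model of each target is summed; the input layout (`targets` mapping a target id to predicted scores and GDT_TS values) is a hypothetical choice for illustration.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(targets: Dict[str, Tuple[List[float], List[float]]]
             ) -> Tuple[float, float, float]:
    """Return (mean Pearson, mean Spearman, summed GDT_TS of selected models)."""
    r_values, rho_values, total_gdt = [], [], 0.0
    for predicted, gdt_ts in targets.values():
        r_values.append(pearson(predicted, gdt_ts))
        rho_values.append(spearman(predicted, gdt_ts))
        selected = max(range(len(predicted)), key=lambda i: predicted[i])
        total_gdt += gdt_ts[selected]
    n = len(targets)
    return sum(r_values) / n, sum(rho_values) / n, total_gdt


# Toy data for two hypothetical targets.
toy = {
    "T1": ([0.1, 0.5, 0.9], [20.0, 40.0, 60.0]),
    "T2": ([0.9, 0.2, 0.4], [50.0, 30.0, 70.0]),
}
mean_r, mean_rho, total = evaluate(toy)
```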
Figure 1. Analysis of the statistical significance based on a one-sided paired t-test (95% confidence level). Green: the method denoted on the horizontal axis performs significantly better. Red: the method denoted on the horizontal axis performs significantly worse. a) Pearson's correlation coefficient, b) Spearman's rank correlation coefficient, c) GDT_TS values of the models selected by a scoring function.
Figure 2. Comparison of QMEAN, a 3D-Jury-like approach and QMEANclust on 3 selected CASP7 targets. The table shows the GDT_TS difference between the model selected by QMEANclust and the model selected by the 3D-Jury approach. Correlations between predicted score and GDT_TS of the three targets are shown for QMEAN, 3D-Jury and QMEANclust (from left to right). The dashed areas mark the models selected by QMEAN as the basis for QMEANclust. The arrow on the right of each plot denotes the best selected model.
Table 4. Performance comparison of QMEAN to other single model scoring functions based on the MOULDER test set.
| Scoring function | Mean ΔRMSD (Å) | Standard deviation (Å) |
|---|---|---|
| torsion | 4.50 | 4.06 |
| pairwise Cbeta, SSE | 1.48 | 3.00 |
| solvation Cbeta | 1.06 | 1.91 |
| SSE agreement | 0.92 | 1.24 |
| ACC agreement | 0.79 | 1.07 |
| pairwise all-atom, SSE | 0.68 | 0.96 |
| QMEAN5 | 0.42 | 0.59 |
| SIFT | 5.68 | 5.20 |
| Anolea_Z | 1.94 | 2.29 |
| SOLVX | 1.76 | 2.21 |
| Xd | 1.68 | 2.63 |
| FRST | 1.55 | 2.41 |
| MP_SURF | 1.36 | 1.90 |
| MP_PAIR | 1.20 | 1.70 |
| EEF1 | 1.09 | 1.52 |
| GB | 1.06 | 1.36 |
| DOPE_BB | 0.96 | 1.27 |
| PROSA_COMB | 0.89 | 1.52 |
| GA341 | 0.84 | 0.86 |
| MODCHECK | 0.83 | 1.29 |
| MP_COMBI | 0.82 | 1.19 |
| DFIRE | 0.81 | 1.37 |
| DOPE_AA | 0.77 | 1.21 |
| ROSETTA | 0.71 | 1.05 |
| SVM_SCORE | 0.46 | 0.66 |
The table shows the RMSD difference (in Ångström) between the model selected by the scoring function and the best model in the ensemble, averaged over the 20 protein targets of the MOULDER test set. To increase the robustness of the statistics, each calculation is repeated 2000 times on random subsets of 25% of the model ensemble. For comparison, the mean ΔRMSD and standard deviation for QMEANclust (based on consensus scoring of all 300 models) are 1.15 and 1.39 Å, respectively. For a detailed comparison of QMEAN and QMEANclust see Table 5.
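The subsampling procedure described in this note can be sketched for a single target as follows. It assumes that higher scores mean better models and that the "best model" is the lowest-RMSD model within each random subset; both are assumptions of this sketch, and the function names are illustrative.

```python
import random
from statistics import mean, pstdev
from typing import Sequence, Tuple


def delta_rmsd(scores: Sequence[float], rmsds: Sequence[float]) -> float:
    """RMSD of the top-scoring model minus the RMSD of the best model."""
    selected = max(range(len(scores)), key=lambda i: scores[i])
    return rmsds[selected] - min(rmsds)


def subsampled_delta_rmsd(scores: Sequence[float], rmsds: Sequence[float],
                          fraction: float = 0.25, repeats: int = 2000,
                          seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of ΔRMSD over random model subsets."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    k = max(1, int(fraction * n))
    deltas = []
    for _ in range(repeats):
        subset = rng.sample(range(n), k)
        deltas.append(delta_rmsd([scores[i] for i in subset],
                                 [rmsds[i] for i in subset]))
    return mean(deltas), pstdev(deltas)


# Toy check: a scoring function that ranks models perfectly never loses RMSD.
toy_rmsds = [float(i) for i in range(1, 41)]
toy_scores = [-r for r in toy_rmsds]
perfect = subsampled_delta_rmsd(toy_scores, toy_rmsds, repeats=50)
```

Averaging the per-target means over all targets would then give one table entry per scoring function.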
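Table 5. Comparison between QMEAN and QMEANclust in the task of selecting near native models on the MOULDER test set.
| Target | Median RMSD (Å) | Models with RMSD < 5 Å | ΔRMSD QMEAN (Å) | ΔRMSD QMEANclust (Å) |
|---|---|---|---|---|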
| 2cmd | 5.76 | 100 | 2.75 | 0.67 |
| 1bbh | 6.49 | 86 | 0.00 | 0.17 |
| 2mta | 6.66 | 119 | 0.29 | 0.31 |
| 1dxt | 7.19 | 79 | 1.11 | 0.72 |
| 2pna | 7.29 | 57 | 0.14 | 0.14 |
| 1lga | 8.17 | 106 | 0.82 | 1.10 |
| 1mup | 8.18 | 65 | 0.40 | 0.36 |
| 8i1b | 8.34 | 115 | 0.62 | 0.47 |
| 2afn | 8.54 | 42 | 0.12 | 0.58 |
| 2fbj | 8.84 | 59 | 0.29 | 0.26 |
| 1mdc | 9.27 | 105 | 0.07 | 0.18 |
| 1onc | 10.46 | 106 | 0.47 | 0.15 |
| 1c2r | 10.46 | 7 | 0.00 | 1.95 |
| 2sim | 10.98 | 55 | 0.00 | 0.96 |
| 1cid | 11.16 | 0 | 0.11 | 0.63 |
| 1gky | 11.56 | 15 | 0.66 | 1.16 |
| 1cau | 11.92 | 11 | 0.42 | 3.54 |
| 1eaf | 12.64 | 1 | 0.34 | 1.72 |
| 1cew | 14.74 | 21 | 2.77 | 2.24 |
| 4sbv | 17.40 | 1 | 0.00 | 5.74 |
The first two data columns contain the median RMSD of the models in the decoy set and the number of models with RMSD < 5 Å (out of 300 in total). For all 20 target proteins, the RMSD difference (in Ångström) between the selected model and the best model in the ensemble is given.
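As a worked check, the two ΔRMSD columns of this table can be summarised directly. The values below are copied from the table rows in order. The last column averages to about 1.15 Å with a sample standard deviation of about 1.39 Å, which matches the QMEANclust figures quoted in the note to the preceding table; assigning the last column to QMEANclust and the one before it to QMEAN is an inference from that match, since the column headers are not preserved here.

```python
from statistics import mean, stdev

# ΔRMSD values (Å) per target, in table order (2cmd ... 4sbv).
delta_first = [2.75, 0.00, 0.29, 1.11, 0.14, 0.82, 0.40, 0.62, 0.12, 0.29,
               0.07, 0.47, 0.00, 0.00, 0.11, 0.66, 0.42, 0.34, 2.77, 0.00]
delta_second = [0.67, 0.17, 0.31, 0.72, 0.14, 1.10, 0.36, 0.47, 0.58, 0.26,
                0.18, 0.15, 1.95, 0.96, 0.63, 1.16, 3.54, 1.72, 2.24, 5.74]

mean_first = mean(delta_first)
mean_second = mean(delta_second)
sd_second = stdev(delta_second)  # sample standard deviation (n - 1)

# Number of targets on which each column has the strictly smaller ΔRMSD.
first_better = sum(a < b for a, b in zip(delta_first, delta_second))
second_better = sum(b < a for a, b in zip(delta_first, delta_second))

print(f"mean ΔRMSD, first column:  {mean_first:.2f} Å")
print(f"mean ΔRMSD, second column: {mean_second:.2f} Å (SD {sd_second:.2f} Å)")
print(f"targets won: first {first_better}, second {second_better}")
```

The per-target comparison also shows the pattern discussed in the abstract: the consensus column deteriorates mainly on the targets with few models below 5 Å.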
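Table 6. Comparison of consensus and non-consensus based methods in the estimation of the local model quality.
| Method | r | tau | ROC (avg) | ROC (all) | low10% | top10% |
|---|---|---|---|---|---|---|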
| QMEANclust_local | 2.2 | |||||
| selfQMEAN_local | 0.49 | 0.35 | 0.84 | 0.90 | 1.3 | 5.8 |
| QMEAN_local | 0.43 | 0.32 | 0.80 | 0.83 | 4.3 | |
| ProQres | 0.28 | 0.26 | 0.74 | 0.77 | 0.9 | 5.8 |
r = average Pearson's correlation coefficient; tau = Kendall's tau on a per-model basis; ROC = area under the ROC curve, averaged over all 98 targets (avg) or using all residues pooled together (all); low/top10% = average Cα distance of the 10% lowest/highest scoring residues per target.
Figure 3. Receiver operating characteristic (ROC) curves for the different local QMEAN versions and ProQres. A Cα distance cut-off of 2.5 Å has been used. Two alternative QMEANclust approaches have been tested, which combine the local Cα distances using the median or the weighted mean.
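The local quality measures can be illustrated with a short sketch. It treats residues with a Cα distance below the 2.5 Å cut-off as correctly modelled and assumes that a higher local score means a more reliable residue; the score orientation, the strict inequality at the cut-off and the function names are assumptions of this sketch. The area under the ROC curve is computed through its rank interpretation, the probability that a correctly modelled residue scores higher than an incorrectly modelled one.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], ca_distances: Sequence[float],
            cutoff: float = 2.5) -> float:
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    positives = [s for s, d in zip(scores, ca_distances) if d < cutoff]
    negatives = [s for s, d in zip(scores, ca_distances) if d >= cutoff]
    if not positives or not negatives:
        raise ValueError("need residues on both sides of the cut-off")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_distance_of_extremes(scores: Sequence[float],
                              ca_distances: Sequence[float],
                              fraction: float = 0.1) -> Tuple[float, float]:
    """Average Cα distance of the lowest- and highest-scoring residues."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(1, int(fraction * len(scores)))
    low = [ca_distances[i] for i in order[:k]]
    top = [ca_distances[i] for i in order[-k:]]
    return sum(low) / k, sum(top) / k


# Toy example with four residues; all numbers are made up.
toy_scores = [0.9, 0.2, 0.8, 0.1]
toy_distances = [1.0, 2.0, 3.0, 6.0]
auc = roc_auc(toy_scores, toy_distances)
low_top = mean_distance_of_extremes(toy_scores, toy_distances, fraction=0.25)
```

The pairwise loop is quadratic in the number of residues, which is acceptable for a single model; pooling all residues of all targets would call for a rank-based implementation.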