Abstract
SUMMARY: The identification of good protein structure models and their appropriate ranking is a crucial problem in structure prediction and fold recognition. For many alignment methods, rescoring alignment-induced models with structural information separates useful from less useful models better than the alignment score does. We introduce Vorescore, a template-based protein structure model rescoring system. The method uses Vorolign to score the model structure against the template used for modeling. It works on models from different alignment methods and incorporates knowledge from both the prediction method and the rescoring.
Year: 2010 PMID: 20823310 PMCID: PMC2935407 DOI: 10.1093/bioinformatics/btq369
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Overview of the VORSPI (box) and the VORESCORE methods. Proposed models are rescored with VOROLIGN against the template structure used (VORPSI) or against all members of the template's fold except the template's own family (VORESCORE), respectively.
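The two reference sets described in Fig. 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Domain` record, the SCOP-style identifiers and all function names are hypothetical, the structural comparison (VOROLIGN in the paper) is passed in as a generic `score` callable, and averaging the per-reference scores is an assumption, since the caption does not say how they are combined.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Domain:
    """Hypothetical classified template domain with SCOP-style identifiers."""
    name: str
    family: str  # e.g. "a.1.1.1"
    fold: str    # e.g. "a.1"


def template_only_references(template: Domain, library: Sequence[Domain]) -> List[Domain]:
    """Template-only variant: the model is compared with its own template.

    `library` is unused; it is kept so both variants share one signature.
    """
    return [template]


def fold_references(template: Domain, library: Sequence[Domain]) -> List[Domain]:
    """VORESCORE-style variant: members of the template's fold, excluding its family."""
    return [d for d in library
            if d.fold == template.fold and d.family != template.family]


def rescore(model: object, refs: Sequence[Domain],
            score: Callable[[object, Domain], float]) -> float:
    """Combine structural-alignment scores of `model` against each reference.

    Averaging is an assumption; `score` stands in for a VOROLIGN comparison.
    """
    if not refs:
        raise ValueError("no reference structures for this template")
    return mean(score(model, ref) for ref in refs)
```

Excluding the template's own family makes the fold-level variant judge a model by its agreement with more distant structural neighbors rather than with its closest relatives.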
Fold recognition rates for the CATHSCOP set, by maximum similarity level in the set

| Method | Family (4859) | Superfamily (3919) | Fold (2197) |
|---|---|---|---|
| GOTOH | 84.07% (4085) | 40.50% (1587) | 23.81% (523) |
| 123D | 77.94% (3784) | 30.75% (1205) | 21.94% (482) |
| PPA | 93.72% (4554) | 73.92% (2897) | 50.71% (1114) |
| HHALIGN | 94.11% (4573) | 76.24% (2988) | 47.06% (1034) |
| VOROLIGN | | | |
For each test, the number of targets in the set is given in parentheses. When all similarity levels are available (Family column), recognizing the correct fold is easy for the current best alignment methods (PPA and HHALIGN), which nearly match the performance of VOROLIGN. At the superfamily and fold levels, only more distant similarities are available, so the recognition rates of sequence-based methods are much lower. At the fold level, even the best methods fail on every second target. Even here, the structural neighborhoods exploited by VOROLIGN improve on the best sequence method (PPA) by more than 50% (relative), raising the fold recognition rate to over 75%. The maximum values in each column, i.e. the maximum fold recognition rate for each level, are indicated in bold.
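The rates in the table are simple proportions, and the ">50%" improvement must be relative, since PPA already reaches 50.71% at the fold level. The short sketch below checks this arithmetic; the helper names are hypothetical, and it assumes a target counts as recognized when its top-ranked template has the correct fold.

```python
def recognition_rate(hits: int, targets: int) -> float:
    """Percentage of recognized targets (assumed: top-ranked template has the correct fold)."""
    return 100.0 * hits / targets


def relative_improvement(new_rate: float, base_rate: float) -> float:
    """Relative gain of `new_rate` over `base_rate`, in percent."""
    return 100.0 * (new_rate - base_rate) / base_rate


# PPA at the fold level of the CATHSCOP set: 1114 of 2197 targets.
ppa_fold = recognition_rate(1114, 2197)
# A relative gain of more than 50% therefore requires a rate above 1.5 * ppa_fold.
threshold = 1.5 * ppa_fold
print(f"PPA fold level: {ppa_fold:.2f}%")              # PPA fold level: 50.71%
print(f">50% relative gain needs > {threshold:.2f}%")  # >50% relative gain needs > 76.06%
```

Since 1.5 × 50.71% ≈ 76.06%, a relative gain above 50% is consistent with the stated fold recognition rate of over 75%.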
Fold recognition rates on the test set, by maximum similarity level in the set

| Method | Family (410) | Superfamily (338) | Fold (181) |
|---|---|---|---|
| GOTOH | 22.68% (93) | 27.51% (93) | 16.57% (30) |
| 123D | 22.93% (94) | 22.19% (75) | 19.34% (35) |
| PPA | 81.46% (334) | 50.07% (176) | 20.99% (38) |
| HHALIGN | 89.76% (368) | 66.57% (225) | 37.02% (67) |
| VOROLIGN | | | |
The test set is a subset of the CATHSCOP set with 410 query proteins. It is somewhat harder for the sequence methods, but the fold recognition performance of VOROLIGN on the test set is about the same as on the comprehensive CATHSCOP set. The maximum values in each column, i.e. maximum fold recognition rate for each level, are indicated in bold.
Fig. 2. Model quality of GOTOH alignments. The figure shows the quality (measured by TM-score) of the models built from GOTOH alignments for the family-level recognition test. The y-axis shows the number of targets whose selected model has a TM-score larger than the value on the x-axis. Because of how the test set was constructed, the rates for GOTOH's top-ranked model are rather low, but, as the ‘best(GOTOH)’ rates show, there is a large number of high-quality models that perfect rescoring of the GOTOH models could select. Indeed, ‘VORESCORE’ improves the quality of the selected models significantly. Much better models are possible beyond the GOTOH alignments [‘best(all models)’].
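The curve described in Fig. 2 is a cumulative count: for each TM-score threshold on the x-axis, it counts the targets whose selected model exceeds that threshold. The sketch below also computes a "perfect rescoring" reference in the spirit of ‘best(GOTOH)’, taking the best TM-score among each target's candidate models. The function names and the toy TM-scores are invented for illustration.

```python
from typing import Dict, List, Sequence


def targets_above(tm_scores: Sequence[float], threshold: float) -> int:
    """Number of targets whose model has a TM-score strictly above `threshold`."""
    return sum(1 for s in tm_scores if s > threshold)


def quality_curve(tm_scores: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """y-values of the cumulative curve, one per x-axis threshold."""
    return [targets_above(tm_scores, t) for t in thresholds]


def oracle_scores(candidates: Dict[str, Sequence[float]]) -> List[float]:
    """Perfect rescoring: the best TM-score among each target's candidate models."""
    return [max(tms) for tms in candidates.values()]


# Invented TM-scores for three targets; the first entry is the alignment's top-ranked model.
candidates = {"t1": [0.35, 0.62], "t2": [0.48, 0.41], "t3": [0.22, 0.30]}
first_model = [tms[0] for tms in candidates.values()]
thresholds = [0.3, 0.5]
print(quality_curve(first_model, thresholds))                # [2, 0]
print(quality_curve(oracle_scores(candidates), thresholds))  # [2, 1]
```

The gap between the two curves is the room that a rescoring method could exploit without leaving the candidate models of a single alignment method.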
Rescore success rate for ROSETTA, PROQ and VORESCORE on the two alignment methods GOTOH and HHALIGN

| Method | Family | Superfamily | Fold |
|---|---|---|---|
| GOTOH | | | |
| Rescored model worse | | | |
| ROSETTA | 21.95% (90) | 24.56% (83) | 14.92% (27) |
| PROQ | 9.02% (37) | 11.83% (40) | 6.08% (11) |
| VORESCORE | | | |
| Rescored model better | | | |
| ROSETTA | 4.88% (20) | 4.14% (14) | 6.63% (12) |
| PROQ | 47.80% (196) | 34.02% (115) | 35.91% (65) |
| VORESCORE | | | |
| HHALIGN | | | |
| Rescored model worse | | | |
| ROSETTA | 11.22% (46) | 17.46% (59) | 20.44% (37) |
| PROQ | 6.14% (25) | 7.25% (24) | 10.23% (18) |
| VORESCORE | | | |
| Rescored model better | | | |
| ROSETTA | 5.12% (21) | 17.75% (60) | 30.39% (55) |
| PROQ | 8.35% (34) | | 40.91% (72) |
| VORESCORE | | 23.67% (80) | |
If the alignment method predicts with high confidence, all three rescoring methods simply accept this prediction. Otherwise, ROSETTA, PROQ and VORESCORE rescore only the models predicted by the respective alignment method (GOTOH or HHALIGN). We call the rescored model worse or better if the TM-score difference between the rescored model and the alignment method's first model is smaller than −0.05 or larger than 0.05, respectively, and neutral otherwise. The net improvement of a method is the difference between its numbers of better and worse models. The best performance among ROSETTA, PROQ and VORESCORE for both GOTOH and HHALIGN alignments and for each of the three levels (Family, Superfamily and Fold) is highlighted in bold.
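The decision rule and outcome classes in this caption can be written down directly. In the sketch below, the confidence value, its cut-off and the `rescore` callable are hypothetical placeholders, because the caption does not say how high confidence is determined; only the ±0.05 TM-score margin and the net-improvement definition come from the text.

```python
from enum import Enum
from typing import Callable, Sequence, TypeVar

M = TypeVar("M")


class Outcome(Enum):
    WORSE = "worse"
    NEUTRAL = "neutral"
    BETTER = "better"


def select_model(models: Sequence[M], confidence: float, cutoff: float,
                 rescore: Callable[[M], float]) -> M:
    """Keep the alignment method's first model if it is confident; otherwise rescore.

    `models` is in the alignment method's own ranking; `confidence`, `cutoff`
    and `rescore` are hypothetical stand-ins.
    """
    if confidence >= cutoff:
        return models[0]
    return max(models, key=rescore)


def classify(tm_rescored: float, tm_first: float, margin: float = 0.05) -> Outcome:
    """Compare the rescored model with the method's first model by TM-score."""
    diff = tm_rescored - tm_first
    if diff < -margin:
        return Outcome.WORSE
    if diff > margin:
        return Outcome.BETTER
    return Outcome.NEUTRAL


def net_improvement(outcomes: Sequence[Outcome]) -> int:
    """Number of better models minus number of worse models."""
    better = sum(o is Outcome.BETTER for o in outcomes)
    worse = sum(o is Outcome.WORSE for o in outcomes)
    return better - worse
```

The neutral band keeps small TM-score fluctuations from being counted as wins or losses, so the net improvement reflects only clear changes in model quality.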
Fig. 3. Comparative recognition performance of alignment and rescoring methods. The three panels show the recognition rates at the family (A), superfamily (B) and fold (C) levels over the relevant range of model qualities (TM-score >0.3 up to convergence). The relative performance is similar for all levels and all model quality ranges: profile alignments clearly outperform pairwise alignments, and rescoring methods in turn outperform profile alignments.
Rescore success rate of VORESCORE over GOTOH and HHALIGN on all models
| Model | Family | Superfamily | Fold |
|---|---|---|---|
| GOTOH | |||
| Worse | 1.46% (6) | 4.73% (16) | 1.66% (3) |
| Neutral | 17.80% (73) | 30.47% (103) | 31.49% (57) |
| Better | |||
| HHALIGN | |||
| Worse | 5.85% (24) | 5.62% (19) | 3.87% (7) |
| Neutral | | | 27.62% (50) |
| Better | 14.88% (61) | 38.76% (131) | |
Rescoring is based on both GOTOH and HHALIGN predictions and, additionally, on TM-align- and PPM-based models. We call the rescoring worse or better if the TM-score difference between the rescored model and the method's first model is smaller than −0.05 or larger than 0.05, respectively, and neutral otherwise. The largest of the three respective values is highlighted in bold.
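In this all-models setting, candidates from several sources (GOTOH, HHALIGN, TM-align and PPM) are pooled per target before rescoring, and the outcomes are reported as percentages with counts. The sketch below shows one way to pool the candidates and to format a table cell; the data layout and function names are hypothetical, and the same ±0.05 margin is assumed.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def pool(by_source: Dict[str, Dict[str, List[str]]]) -> Dict[str, List[Tuple[str, str]]]:
    """Merge per-target candidate lists from several sources, tagging each model with its source."""
    pooled: Dict[str, List[Tuple[str, str]]] = {}
    for source, per_target in by_source.items():
        for target, models in per_target.items():
            pooled.setdefault(target, []).extend((source, m) for m in models)
    return pooled


def tally(tm_diffs: Sequence[float], margin: float = 0.05) -> Dict[str, str]:
    """Format worse/neutral/better shares as 'percent (count)' table cells.

    `tm_diffs` holds TM-score(rescored) minus TM-score(method's first model), one per target.
    """
    labels = ["worse" if d < -margin else "better" if d > margin else "neutral"
              for d in tm_diffs]
    counts = Counter(labels)
    n = len(tm_diffs)
    return {k: f"{100 * counts[k] / n:.2f}% ({counts[k]})"
            for k in ("worse", "neutral", "better")}
```

For example, 6 worse outcomes among the 410 family-level targets give the cell "1.46% (6)" reported for GOTOH.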
Fig. 4. Theoretical model quality for the test set (fold level). The figure shows the number of models above a given TM-score threshold for several methods, compared with the theoretical optimum. Here, structure alignment-based models are also available for rescoring. The counts for the best possible template-based model, the best of all predicted models and the best VORESCORE models (rescoring of all models) are compared with the actual VORESCORE performance (on all predicted models) and the HHALIGN performance.