Wei Zhang, Song Liu, Yaoqi Zhou.
Abstract
Recognizing the structural fold of a protein is one of the challenges in protein structure prediction. We have developed a series of single (non-consensus) methods (SPARKS, SP(2), SP(3), SP(4)) that are based on weighted matching of two to four sequence- and structure-based profiles. The accuracy and sensitivity of fold recognition improve robustly as the number of matching profiles increases. Here, we introduce a new profile-profile comparison term based on real-value dihedral torsion angles. Together with an updated real-value solvent accessibility profile and a new variable gap-penalty model based on a fractional power of insertion/deletion profiles, the new method (SP(5)) yields a robust improvement over the previous SP methods. There is a 2% absolute increase (5% relative improvement) in alignment accuracy over SP(4) on two independent benchmarks. Moreover, SP(5) achieves a 7% absolute increase (22% relative improvement) in the success rate of recognizing correct structural folds, and a 32% relative improvement in the accuracy of models within the same fold in the Lindahl benchmark. In addition, the modeling accuracy of top-1 ranked models is improved by 12% over SP(4) for the difficult targets in the CASP7 test set. These results highlight the importance of harnessing predicted structural properties in challenging remote-homolog recognition. The SP(5) server is available at http://sparks.informatics.iupui.edu.
Year: 2008 PMID: 18523556 PMCID: PMC2391293 DOI: 10.1371/journal.pone.0002325
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Alignment accuracies for the Prosup and SALIGN benchmarks.
| Benchmark | SP3 | SP4 | SP5 |
| Prosup | 65.3±0.22% | 66.8±0.20% | 68.7±0.20% |
| SALIGN | 56.3±0.14% | 57.3±0.13% | 59.7±0.15% |
Prosup: one-to-one match given by the method and Prosup.
SALIGN: one-to-one match given by the method and TMalign.
Values: mean and standard error (estimated by bootstrap simulation with 10,000 resamplings of the data set).
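The bootstrap estimate of the standard error mentioned in the footnote can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `bootstrap_standard_error`, the fixed seed, and the per-pair accuracy values are all hypothetical, and only the general technique (resampling the data set with replacement and taking the spread of the resampled means) is assumed.

```python
import random
import statistics


def bootstrap_standard_error(values, n_resamples=10_000, seed=0):
    """Estimate the standard error of the mean by bootstrap resampling."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    resampled_means = []
    for _ in range(n_resamples):
        # Draw n values with replacement from the original data set.
        sample = rng.choices(values, k=n)
        resampled_means.append(sum(sample) / n)
    # The spread of the resampled means approximates the standard error.
    return statistics.stdev(resampled_means)


# Hypothetical per-pair alignment accuracies (illustrative, not from the paper).
accuracies = [0.71, 0.64, 0.58, 0.80, 0.66, 0.73, 0.61, 0.69]
se = bootstrap_standard_error(accuracies, n_resamples=2_000)
print(f"mean = {statistics.mean(accuracies):.3f}, bootstrap SE = {se:.3f}")
```

With a real benchmark, `values` would hold one accuracy per aligned pair, and `n_resamples` would be set to 10,000 to match the footnote.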
The success rate for recognizing proteins within the same family, superfamily, or fold in the Lindahl benchmark.
| Method | Family only, Top 1 (%) | Family only, Top 5 (%) | Superfamily only, Top 1 (%) | Superfamily only, Top 5 (%) | Fold only, Top 1 (%) | Fold only, Top 5 (%) |
| PSI-BLAST | 62.4 | 67.6 | 16.0 | 25.8 | 2.2 | 9.8 |
| SPARKS | 81.6 | 88.1 | 52.5 | 69.1 | 24.3 | 47.7 |
| HHpred | 82.9 | 87.1 | 58.8 | 70.0 | 25.2 | 39.4 |
| FOLDpro | 85.0 | 89.9 | 55.5 | 70.0 | 26.5 | 48.3 |
| SP3 | 81.6±0.07 | 86.8±0.06 | 55.3±0.11 | 67.7±0.11 | 28.7±0.14 | 47.4±0.16 |
| SP4 | 80.9±0.07 | 86.3±0.06 | 57.8±0.11 | 68.9±0.11 | 30.8±0.15 | 53.6±0.15 |
| SP5 | 82.4±0.07 | 87.6±0.06 | 59.8±0.11 | 70.0±0.11 | 37.9±0.15 | 58.7±0.16 |
| SP5 (43 proteins removed) | 81.6 | 87.0 | 59.9 | 70.2 | 37.4 | 58.6 |
The percentage in each cell is the fraction of proteins for which a correct match in the same family, superfamily, or fold is ranked first or within the top 5 templates.
From Ref. [10].
From Ref. [48].
From Ref. [11].
From Ref. [12].
This work.
This work (The 43 proteins with >30% sequence similarity to PREFAB training set are removed).
The standard error was estimated by bootstrap simulation on 10,000 re-sampling of the data set.
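A top-1 or top-5 success rate of the kind tabulated above can be computed as sketched below. The function `top_k_success_rate`, the query and template identifiers, and the toy rankings are hypothetical; the sketch only assumes that each query has a ranked template list and a known set of correct templates at the chosen level (family, superfamily, or fold).

```python
def top_k_success_rate(rankings, correct, k):
    """Percentage of queries with at least one correct template in the top k."""
    hits = 0
    for query, ranked_templates in rankings.items():
        correct_templates = correct.get(query, set())
        # A query counts as a success if any of its first k templates is correct.
        if any(t in correct_templates for t in ranked_templates[:k]):
            hits += 1
    return 100.0 * hits / len(rankings)


# Toy data (illustrative only): ranked templates per query, best first.
rankings = {
    "q1": ["t3", "t1", "t7"],
    "q2": ["t2", "t9", "t4"],
    "q3": ["t5", "t6", "t8"],
    "q4": ["t4", "t2"],
}
# Templates that share the query's fold (hypothetical assignments).
correct = {"q1": {"t1"}, "q2": {"t2"}, "q3": {"t0"}, "q4": {"t2"}}

top1 = top_k_success_rate(rankings, correct, k=1)
top5 = top_k_success_rate(rankings, correct, k=5)
print(f"Top 1: {top1:.1f}%  Top 5: {top5:.1f}%")
```

By construction, the top-5 rate can never be lower than the top-1 rate for the same method, which is a quick sanity check on any row of the table.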
The model quality of top-1 ranked models in the Lindahl benchmark, per protein.
| Method | Total | Family | Superfamily | Fold |
| SP3 | 0.358 (±0.03%) | 0.529 (±0.05%) | 0.232 (±0.05%) | 0.107 (±0.05%) |
| SP4 | 0.361 (±0.03%) | 0.532 (±0.05%) | 0.251 (±0.05%) | 0.116 (±0.05%) |
| SP5 | 0.374 (±0.03%) | 0.538 (±0.05%) | 0.257 (±0.05%) | 0.153 (±0.06%) |
Total: all 976 proteins.
Family: family only.
Superfamily: superfamily only.
Fold: fold only.
The mean MaxSub score and the standard error (estimated by bootstrap simulation on 10,000 re-sampling of the data set) for the first-ranked models.
The model quality of top-1 ranked models for the CASP7 test set.
| Method | Full | ALL | TBM | FM |
| SP3 | 0.364 (±0.20%) | 0.375 (±0.17%) | 0.408 (±0.19%) | 0.152 (±0.37%) |
| SP4 | 0.373 (±0.20%) | 0.387 (±0.17%) | 0.420 (±0.19%) | 0.153 (±0.32%) |
| SP5 | 0.383 (±0.21%) | 0.397 (±0.17%) | 0.431 (±0.18%) | 0.171 (±0.38%) |
Full: 95 full-chain targets.
ALL: all 124 domains (4 targets belong to both the TBM and FM categories).
TBM: 109 Template-based Modeling domains.
FM: 19 Free Modeling domains.
The mean MaxSub score and the standard error (estimated by bootstrap simulation on 10,000 re-sampling of the data set) for the top-1 model.
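The absolute and relative improvements quoted in the abstract can be checked against the tabulated SP4 and SP5 values with a short calculation. This is a sketch under the assumption that "relative improvement" means the gain divided by the SP4 value; the helper names are hypothetical, and because the table entries are rounded, the results need not match the abstract's figures exactly.

```python
def absolute_gain(new, old):
    """Difference between the new and old score, in the score's own units."""
    return new - old


def relative_gain(new, old):
    """Gain expressed as a percentage of the old score."""
    return 100.0 * (new - old) / old


# (SP4, SP5) pairs taken from the tables in this record.
comparisons = {
    "Lindahl fold-only top-1 success rate (%)": (30.8, 37.9),
    "Lindahl fold-only mean MaxSub": (0.116, 0.153),
    "CASP7 FM mean MaxSub": (0.153, 0.171),
}

for name, (sp4, sp5) in comparisons.items():
    print(
        f"{name}: absolute {absolute_gain(sp5, sp4):+.3f}, "
        f"relative {relative_gain(sp5, sp4):+.1f}%"
    )
```

For the fold-only MaxSub scores, this gives a relative gain of about 32%, and for the CASP7 FM domains about 12%, in line with the figures stated in the abstract.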