Dorota Latek, Andrzej Kolinski.
Abstract
BACKGROUND: Several contact prediction methods performed well in the Sixth Critical Assessment of Techniques for Protein Structure Prediction (CASP6). The most relevant were non-local contact predictions for targets from the most difficult categories: fold recognition-analogy and new fold. Such contacts could provide valuable structural information when no template structure can be found in the PDB.
Year: 2008 PMID: 18694501 PMCID: PMC2527566 DOI: 10.1186/1472-6807-8-36
Source DB: PubMed Journal: BMC Struct Biol ISSN: 1472-6807
Table 1. Description of the nine selected CASP6 contact predictors.
| Contact predictors | Method | Input data | Accuracy [%] | Coverage [%] |
| Baker | Neural network | Contact predictions from 24 servers, secondary structure predicted by JUFO, amino acid properties, PSI-BLAST generated PSSM, length of the protein sequence | 25.5 | 3.7 |
| PROFcon | Feed-forward neural network with back-propagation | Evolutionary profiles obtained using PSI-BLAST, predicted secondary structure and solvent accessibility, sequence conservation, biophysical features and "complexity" of residues | 24.2 | 3.6 |
| Baldi-group-server | Recursive neural network (RNN) | PSI-BLAST generated sequence profiles, correlated mutations, predicted secondary structure, solvent accessibility | 21.9 | 2.9 |
| GPCPRED | Genetic programming with self-organizing maps | PSI-BLAST generated sequence profiles, sequence separation | 17.4 | 2.7 |
| Karypis | Support Vector Machines | Sequence profiles, correlated mutations from multiple sequence alignment analysis, sequence conservation, sequence separation, predicted secondary structure | 11.0 | 1.5 |
| KIAS | CMA analysis | Multiple sequence alignment, hydrophobic packing of residues (data obtained from sequence conservation and hydrophobicity) | 11.0 | 1.7 |
| SAM-T04 | Neural network | Alignments, predicted secondary structure and propensities of residues in contact | 9.6 | 1.43 |
| Hamilton-Huber-Torda | Feed-forward neural network | Mutational correlations from multiple sequence alignments, biophysical class of contacting pair of residues, predicted secondary structure, sequence separation, length of protein sequence | 9.1 | 1.3 |
| CORNET | Neural network | PSI-BLAST generated sequence profiles, correlated mutations and sequence conservation, sequence separation | 2.5 | 0.34 |
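Mean accuracy and coverage were evaluated over all NF and FR-analogy targets. Accuracy and coverage are defined as in [4], for the top N/5 predicted contacts with a sequence separation of at least 12 residues (N is the length of the target sequence): $\mathrm{Acc} = \frac{TP}{TP + FP}$ and $\mathrm{Cov} = \frac{TP}{TP + FN}$, with TP the number of correctly predicted contacts, FP the number of incorrectly predicted contacts, and FN the number of real contacts that were not predicted.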
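The accuracy and coverage definitions can be sketched in Python. This is a minimal illustration under stated assumptions, not the assessors' code: the input format (predicted contacts as `(i, j, score)` triples, real contacts as residue-index pairs), the function name `accuracy_coverage`, and the reading of the separation threshold as a minimum of 12 residues are all assumptions.

```python
from typing import Dict, Iterable, Set, Tuple

Contact = Tuple[int, int]


def normalize(i: int, j: int) -> Contact:
    """Store a residue pair with the smaller index first."""
    return (i, j) if i < j else (j, i)


def accuracy_coverage(
    predicted: Iterable[Tuple[int, int, float]],
    real: Set[Contact],
    seq_len: int,
    min_sep: int = 12,
) -> Tuple[float, float]:
    """Accuracy and coverage (in %) of the top seq_len // 5 predicted contacts.

    Assumed input: predicted contacts as (i, j, score), where a higher score means
    higher confidence; real contacts as (i, j) residue-index pairs.
    """
    # Reference set: real contacts that pass the sequence-separation filter.
    real_long = {normalize(i, j) for i, j in real if abs(i - j) >= min_sep}

    # Keep the highest score for each distinct pair that passes the filter.
    best: Dict[Contact, float] = {}
    for i, j, score in predicted:
        if abs(i - j) >= min_sep:
            pair = normalize(i, j)
            best[pair] = max(score, best.get(pair, float("-inf")))

    ranked = sorted(best, key=best.get, reverse=True)
    top = ranked[: seq_len // 5]
    if not top or not real_long:
        return 0.0, 0.0

    tp = sum(1 for pair in top if pair in real_long)  # correctly predicted
    fp = len(top) - tp                                # predicted but not real
    fn = len(real_long) - tp                          # real but not predicted
    return 100.0 * tp / (tp + fp), 100.0 * tp / (tp + fn)
```

For a 100-residue target, `accuracy_coverage(preds, native_contacts, 100)` would score the 20 most confident long-range pairs. Duplicate pairs are merged before ranking so that a contact reported as both (i, j) and (j, i) is not counted twice.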
Table 2. Results of the contact-based ranking of Kolinski-Bujnicki's models for NF and FR/A CASP6 targets.
| Set of contact data | Number of predicted contacts | Accuracy [%] | Coverage [%] | Semi-accurate contacts [%] | False contacts [%] | ΔRMSD [Å] | Spearman corr. coeff. (RMSD) | Spearman corr. coeff. (GDT score) |
| N/2 top-scoring contacts from each of the best two predictors (a) | N | 19.58 | 12.91 | 49.37 | 19.59 | 1.069 | -0.300 | 0.329 |
| N/2 top-scoring contacts from each of the best three predictors (b) | 1.5 N | 17.21 | 15.22 | 46.90 | 21.13 | 1.453 | -0.321 | 0.275 |
| Consensus of the whole data from the best three predictors | N/2 | 23.94 | 9.12 | 53.17 | 17.93 | 1.393 | -0.325 | 0.344 |
| | N | 18.90 | 14.44 | 49.17 | 20.35 | 1.387 | -0.333 | 0.340 |
| | 1.5 N | 15.53 | 17.83 | 46.60 | 22.15 | 1.453 | -0.329 | 0.288 |
| Consensus of the whole data from the best five predictors (c) | N/2 | 23.78 | 8.97 | 52.47 | 17.67 | 1.272 | -0.196 | 0.360 |
| | N | 19.78 | 15.07 | 51.01 | 20.04 | 1.498 | -0.350 | 0.348 |
| | 1.5 N | 16.64 | 18.91 | 47.70 | 22.11 | 1.443 | -0.338 | 0.400 |
| Consensus of the whole data from all nine predictors | N/2 | 24.98 | 9.50 | 51.69 | 20.56 | 1.252 | -0.238 | 0.432 |
| | N | 20.63 | 15.74 | 50.58 | 21.85 | 1.365 | -0.333 | 0.400 |
| | 1.5 N | 18.08 | 20.37 | 49.35 | 23.39 | 1.322 | -0.342 | 0.440 |
(a) Predictors: Baker and PROFcon
(b) Predictors: Baker, PROFcon, GPCPRED
(c) Predictors: Baker, PROFcon, GPCPRED, Karypis, SAM-T04
Spearman correlation coefficients and the average difference in Cα RMSD between the first-ranked model and the best model (ΔRMSD) were computed for each set of contact data. For the Kolinski-Bujnicki group, the average ΔRMSD was 1.484 Å and the Spearman correlation coefficients were 0.213 (RMSD) and -0.138 (GDT score). Both Spearman coefficients improved for every set of contact data, and ΔRMSD decreased for every set except the consensus of N contacts from the best five predictors (1.498 Å). Apart from the mean accuracy and the mean coverage of the contact data, averaged over all targets, we also present the percentage of semi-accurate contacts (shifted by at most two residues with respect to a real contact) and of completely false contacts (shifted by more than five residues).
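The quantities in the contact-ranking table above can be sketched as follows. Every definition here is an assumption for illustration: the shift between a predicted contact (i, j) and a real contact (k, l) is taken as max(|i - k|, |j - l|), ΔRMSD is taken as the RMSD of the top-scoring model minus that of the best model, and the Spearman coefficient is assumed to be computed between the contact-based scores of a target's models and their RMSD (or GDT) values, which the text does not spell out. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Contact = Tuple[int, int]


def contact_shift(pred: Contact, real: Set[Contact]) -> int:
    """Smallest shift (in residues) between a predicted contact and any real contact.

    Assumed definition: the shift between (i, j) and (k, l) is max(|i - k|, |j - l|);
    both pairs are expected with the smaller index first.
    """
    i, j = pred
    return min(max(abs(i - k), abs(j - l)) for k, l in real)


def classify_contacts(predicted: Sequence[Contact], real: Set[Contact]) -> Dict[str, float]:
    """Percentages of semi-accurate (shift <= 2) and false (shift > 5) predictions."""
    shifts = [contact_shift(p, real) for p in predicted]
    n = len(shifts)
    return {
        "semi_accurate_pct": 100.0 * sum(s <= 2 for s in shifts) / n,
        "false_pct": 100.0 * sum(s > 5 for s in shifts) / n,
    }


def delta_rmsd(rmsds: Sequence[float], scores: Sequence[float]) -> float:
    """RMSD of the top-scoring (first-ranked) model minus RMSD of the best model."""
    first = max(range(len(scores)), key=lambda k: scores[k])
    return rmsds[first] - min(rmsds)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start : end + 1]:
            ranks[k] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # raises ZeroDivisionError if either input is constant
```

Under the sign convention assumed here, a negative coefficient against RMSD and a positive one against GDT both mean that higher-scoring models tend to lie closer to the native structure.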
Table 3. Comparison of the different approaches to contact-based modeling tested in this work.
| Set of contact data (a) | Number of predicted contacts | Avg. Cα RMSD [Å] | | | | |
| | | Contact-based ranking (b) | De novo folding: first (c) | De novo folding: best (d) | Refinement: first (c) | Refinement: best (d) |
| N/2 top-scoring contacts from each of the best two predictors | N | 9.58 | 8.93 | 7.53 | 7.69 | 7.10 |
| N/2 top-scoring contacts from each of the best three predictors | 1.5 N | 9.96 | 8.69 | 8.02 | 8.23 | 7.36 |
| Consensus of the whole data from the best three predictors | N/2 | 9.82 | 8.15 | 7.14 | 8.28 | 7.17 |
| | N | 9.80 | 8.82 | 7.35 | 8.11 | 7.30 |
| | 1.5 N | 9.81 | 9.21 | 8.03 | 8.68 | 7.73 |
| Consensus of the whole data from the best five predictors | N/2 | 9.83 | 8.92 | 7.92 | 8.44 | 7.02 |
| | N | 9.91 | 8.94 | 7.65 | 7.89 | 7.21 |
| | 1.5 N | 9.79 | 8.70 | 7.57 | 7.71 | 7.11 |
| Consensus of the whole data from all nine predictors | N/2 | 9.70 | 9.26 | 7.98 | 8.06 | 6.44 |
| | N | 9.77 | 8.79 | 7.52 | 7.63 | 6.41 |
| | 1.5 N | 9.64 | 8.42 | 7.07 | 8.02 | 6.99 |
(a) The same as in Table 2.
(b) RMSD computed only for models ranked as first.
(c) RMSD computed only for the model of each target that would most likely be selected as the first model, because it is the centroid of the most populated cluster obtained in the simulation.
(d) RMSD computed only for the best model of each target, i.e. the cluster centroid most similar to the native structure.
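Footnotes (c) and (d) distinguish the centroid of the most populated cluster from the centroid closest to the native structure. The sketch below illustrates that selection on a precomputed pairwise RMSD matrix of simulation snapshots. The greedy neighbor-count clustering, the `cutoff` parameter, and the function names are stand-ins chosen for illustration; the text does not specify the clustering procedure used with CABS.

```python
from typing import List, Sequence, Tuple


def greedy_clusters(dist: Sequence[Sequence[float]], cutoff: float) -> List[Tuple[int, List[int]]]:
    """Greedy neighbor-count clustering of structures from a pairwise RMSD matrix.

    Returns (centroid index, sorted member indices) pairs. Cluster sizes never
    increase from one cluster to the next, so the most populated cluster is first.
    """
    remaining = set(range(len(dist)))
    clusters: List[Tuple[int, List[int]]] = []
    while remaining:
        # Centroid: the remaining structure with the most remaining neighbors
        # within the cutoff (ties broken by the lowest index).
        centroid = max(
            sorted(remaining),
            key=lambda a: sum(dist[a][b] <= cutoff for b in remaining),
        )
        # The centroid is always included, so every pass removes at least one structure.
        members = sorted({centroid} | {b for b in remaining if dist[centroid][b] <= cutoff})
        clusters.append((centroid, members))
        remaining -= set(members)
    return clusters


def first_and_best(
    dist: Sequence[Sequence[float]], rmsd_to_native: Sequence[float], cutoff: float
) -> Tuple[int, int]:
    """Indices of the 'first' model (centroid of the largest cluster) and the
    'best' model (the centroid with the lowest RMSD to the native structure)."""
    clusters = greedy_clusters(dist, cutoff)
    first = clusters[0][0]
    best = min((c for c, _ in clusters), key=lambda c: rmsd_to_native[c])
    return first, best
```

The "first" model needs no knowledge of the native structure, whereas the "best" model does, which is why the latter is only available in retrospective evaluation.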
The performance of each method is given as the Cα RMSD of the final protein model, averaged over 14 targets from the NF and FR/A categories (T0198, T0199-3, T0209-1, T0212, T0215, T0230, T0235-2, T0239, T0248-1, T0262-1, T0272-1, T0272-2, T0280-2, T0281); this subset was chosen to shorten computation time. For this subset, the average Cα RMSD of the Kolinski-Bujnicki group's first models was 10.27 Å, and every method improved on this value for every set of contact data. The lowest average RMSD (6.41 Å) was obtained for the best models from the refinement simulations with restraints based on the consensus of N contacts from all nine predictors.
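The averages above are Cα RMSD values. A common way to compute a Cα RMSD is to superpose the model onto the native structure with the Kabsch algorithm and then take the root-mean-square deviation; the sketch below assumes that procedure, uses NumPy, and assumes the model and native coordinates are already matched residue by residue. The function names are illustrative.

```python
import numpy as np


def kabsch_rmsd(model: np.ndarray, native: np.ndarray) -> float:
    """Cα RMSD (same units as the input, e.g. Å) after optimal superposition.

    model, native: arrays of shape (n_residues, 3) with matching residue order.
    """
    p = model - model.mean(axis=0)   # center both coordinate sets
    q = native - native.mean(axis=0)
    h = p.T @ q                      # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so that the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def average_rmsd(pairs) -> float:
    """Mean Cα RMSD over (model, native) coordinate pairs, e.g. one pair per target."""
    return float(np.mean([kabsch_rmsd(m, n) for m, n in pairs]))
```

Because the superposition is optimal, the result is never larger than the RMSD computed from the raw, unsuperposed coordinates.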
Figure 1. Comparison of the CASP6 results of the Kolinski-Bujnicki group with post-CASP contact-based modeling. (a) Results of the contact-based ranking of the Kolinski-Bujnicki CASP6 models. The scoring function was based on the contact data set from the best two predictors (Baker and PROFcon). The accuracy of the contact data used to score the five models of each target is plotted against the RMSD of the model ranked first in CASP6 by the Kolinski-Bujnicki group (green squares) and against the RMSD of the model ranked first by the contact-based scoring function (red lines join corresponding points). Results for the NF and FR/A categories are shown separately. The largest improvement is observed for FR/A targets with a contact-prediction accuracy in the range of 15–30%. (b) The right-hand panels show the results of the refinement simulations in the same way. The refinement simulations performed better than the post-simulation ranking of the models.
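The caption does not give the form of the contact-based scoring function. As a hedged illustration only, the sketch below scores each model by the fraction of predicted contacts it satisfies, using an assumed 8 Å Cα-Cα distance cutoff, and ranks the models by that score; the score definition, the cutoff, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_score(
    model: Sequence[Coord], predicted: Sequence[Tuple[int, int]], cutoff: float = 8.0
) -> float:
    """Hypothetical score: fraction of predicted contacts whose Cα-Cα distance
    in the model is below the cutoff (in Å)."""
    if not predicted:
        return 0.0
    satisfied = sum(math.dist(model[i], model[j]) < cutoff for i, j in predicted)
    return satisfied / len(predicted)


def rank_models(
    models: Sequence[Sequence[Coord]],
    predicted: Sequence[Tuple[int, int]],
    cutoff: float = 8.0,
) -> List[int]:
    """Model indices ordered from highest to lowest contact score.

    sorted() is stable, so tied models keep their original (e.g. submitted) order.
    """
    scores = [contact_score(m, predicted, cutoff) for m in models]
    return sorted(range(len(models)), key=lambda k: scores[k], reverse=True)
```

Relying on a stable sort means that, when contact data cannot discriminate between two models, the group's original ranking acts as the tie-breaker.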
Figure 2. Contact-based scoring of the NF and FR/A models submitted as first by all groups in CASP6. For each model, the final score computed with the scoring function based on the contacts provided by the Baker group and PROFcon is plotted against GDT-TS. Although the contact-based scoring function could discard most wrong or low-quality models (GDT-TS < 40), additional discriminating tools appear necessary to assess models with GDT-TS > 40.
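GDT-TS averages the percentages of Cα atoms that lie within 1, 2, 4 and 8 Å of their native positions. The full measure searches over many superpositions (for example in LGA) to maximize each percentage; the simplified sketch below assumes a single, already computed superposition, so its value is at most what an exhaustive search over superpositions would give. The function name is illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def gdt_ts_fixed(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """Simplified GDT-TS (0-100) for a model already superposed on the native.

    Averages, over the 1, 2, 4 and 8 Å cutoffs, the percentage of residues whose
    Cα lies within the cutoff of its native position.
    """
    if not model or len(model) != len(native):
        raise ValueError("model and native must be non-empty and equally long")
    dists = [math.dist(a, b) for a, b in zip(model, native)]
    percentages = [
        100.0 * sum(d <= cutoff for d in dists) / len(dists)
        for cutoff in (1.0, 2.0, 4.0, 8.0)
    ]
    return sum(percentages) / len(percentages)
```

A model could then be placed on either side of the GDT-TS = 40 threshold discussed in the caption by comparing this value with 40.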
Figure 3. Contact maps presenting the results of the fold-refinement simulations. (a) Contact maps of selected CASP6 targets. The upper triangle of each map compares real (red) and predicted (green) contacts. In the lower triangle, the contact map of the Kolinski-Bujnicki first model (blue) is superposed on that of the final model obtained after the refinement simulations (grey). In most cases the contact maps improved after refinement. Some accurate contacts were rebuilt by CABS even though they had not been predicted (T0209-2). Some falsely predicted contacts in diffuse clusters were absent from the final model (T0281). Predicted contacts in dense, numerous clusters were reproduced in almost all cases (A and A' in the T0272-1 contact map), in contrast to sparse, diffuse contact clusters. (b) Lower triangles: contact maps of the T0272-1 models obtained from simulations with restraints based on the data sets shown in the upper triangles, with either the A or the B group of contacts modified. (1) The influence of the restraints based on the A group of contacts was reduced, relative to the original contact data in (a), by diffusing these contacts. The effect of the B-group restraints was intensified by increasing the number of these contacts (2) and by increasing the scaling factor of the restraint potential for these contacts (3).
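Panel (b) varies the restraints for two contact groups, A and B, by changing either their number or the scaling factor of the restraint potential. The functional form of the CABS restraint potential is not given here, so the sketch below assumes a flat-bottom harmonic penalty with one scaling factor per contact group, plus a helper that derives a model's contact set for maps of this kind. The 8 Å cutoff, the minimum sequence separation of 3, and all names are assumptions.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]
Contact = Tuple[int, int]


def contact_set(coords: Sequence[Coord], cutoff: float = 8.0, min_sep: int = 3) -> Set[Contact]:
    """Residue pairs (i < j, j - i >= min_sep) whose Cα atoms are closer than cutoff (Å)."""
    n = len(coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_sep, n)
        if math.dist(coords[i], coords[j]) < cutoff
    }


def restraint_energy(
    coords: Sequence[Coord],
    groups: Dict[str, List[Contact]],
    scale: Dict[str, float],
    d0: float = 8.0,
) -> float:
    """Hypothetical flat-bottom harmonic restraint energy.

    A restrained pair costs nothing while its Cα-Cα distance is <= d0 and
    scale[group] * (d - d0) ** 2 beyond that; groups missing from scale use 1.0.
    """
    energy = 0.0
    for name, contacts in groups.items():
        weight = scale.get(name, 1.0)
        for i, j in contacts:
            excess = math.dist(coords[i], coords[j]) - d0
            if excess > 0:
                energy += weight * excess ** 2
    return energy
```

In this assumed form, doubling the scaling factor of group B doubles that group's contribution to the energy, which is one way to model the intensification described in (3); adding more B contacts, as in (2), adds more penalty terms instead.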
Figure 4. Distinctive results of the two simulation methods that use predicted contacts: de novo folding and refinement. (a) Results of de novo folding, represented by the best model of target T0248-1 superimposed on the native structure (RMSD = 2.3 Å) and by its contact map, which shows real contacts together with fairly accurate, precisely predicted contacts (upper triangle), and the contacts of the best model from the folding simulation and of the Kolinski-Bujnicki first model (lower triangle). Model quality improves significantly, and the model's contact map is much closer to that of the native structure. (b) Results of the refinement simulations, represented by the best model of target T0215 (green) superimposed on the native structure (blue) (RMSD = 5.5 Å) and by a contact map constructed as in (a). Despite the low quality of the contact data predicted for T0215, the final refined model improved, although not significantly, compared with the original Kolinski-Bujnicki model (RMSD = 7.9 Å from the crystallographic Cα trace).