Predrag Kukic, Claudio Mirabello, Giuseppe Tradigo, Ian Walsh, Pierangelo Veltri, Gianluca Pollastri.
Abstract
BACKGROUND: Protein inter-residue contact maps provide a translation and rotation invariant topological representation of a protein. They can be used as an intermediary step in protein structure predictions. However, the prediction of contact maps represents an unbalanced problem as far fewer examples of contacts than non-contacts exist in a protein structure. In this study we explore the possibility of completely eliminating the unbalanced nature of the contact map prediction problem by predicting real-value distances between residues. Predicting full inter-residue distance maps and applying them in protein structure predictions has been relatively unexplored in the past.
Year: 2014 PMID: 24410833 PMCID: PMC3893389 DOI: 10.1186/1471-2105-15-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Reconstruction of Cα-traces from native and non-native maps
| Map type | RMSD [Å] | GDT_TS | TM-score |
| --- | --- | --- | --- |
| Binary | 4.38 (0.90, 14.98) | 0.72 (0.42, 0.96) | 0.77 (0.29, 0.97) |
| Binary ± 3Å | 4.05 (1.50, 12.44) | 0.64 (0.36, 0.82) | 0.74 (0.42, 0.90) |
| Binary ± 6Å | 4.26 (2.54, 9.78) | 0.53 (0.32, 0.67) | 0.64 (0.29, 0.78) |
| 4-Class | 1.04 (0.47, 6.90) | 0.94 (0.73, 1.00) | 0.95 (0.79, 0.98) |
| 4-Class ± 3Å | 1.41 (0.88, 6.80) | 0.85 (0.67, 0.93) | 0.90 (0.72, 0.96) |
| 4-Class ± 6Å | 2.25 (1.53, 4.08) | 0.70 (0.56, 0.81) | 0.81 (0.57, 0.88) |
| Distance | 0.48 (0.22, 0.87) | 0.99 (0.94, 1.00) | 0.99 (0.94, 0.998) |
| Distance ± 3Å | 0.96 (0.66, 1.46) | 0.92 (0.85, 0.98) | 0.94 (0.73, 0.99) |
| Distance ± 6Å | 1.62 (1.03, 4.20) | 0.81 (0.57, 0.88) | 0.87 (0.48, 0.96) |
The reconstruction of Cα-traces derived from binary contact maps, 4-class contact maps and distance maps. The native maps and the maps with a random error of 3Å and 6Å are used with the basic reconstruction protocol. Average RMSD [Å], GDT_TS [fraction] and TM-score, along with their range (min, max) are reported using the CASP7 targets.
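The three map types compared above can be illustrated with a minimal sketch that derives a distance map from Cα coordinates, thresholds it into a binary contact map, and adds a bounded random error. The 8 Å contact threshold and the uniform noise model are assumptions made for illustration; this section does not state the threshold or the error distribution that were actually used, and the function names are hypothetical.

```python
import math
import random


def distance_map(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def binary_contact_map(dmap, threshold=8.0):
    """Mark residue pairs closer than `threshold` (assumed value, in Å) as contacts."""
    return [[1 if d < threshold else 0 for d in row] for row in dmap]


def perturb(dmap, max_error, rng):
    """Add symmetric uniform noise in [-max_error, max_error] to off-diagonal entries.

    The uniform distribution is an assumption; distances are clamped at zero.
    """
    n = len(dmap)
    noisy = [row[:] for row in dmap]
    for i in range(n):
        for j in range(i + 1, n):
            value = max(0.0, dmap[i][j] + rng.uniform(-max_error, max_error))
            noisy[i][j] = noisy[j][i] = value
    return noisy


# Toy trace: four points on a straight line, 3.8 Å apart.
coords = [(3.8 * k, 0.0, 0.0) for k in range(4)]
native = distance_map(coords)
contacts = binary_contact_map(native)
noisy = perturb(native, 3.0, random.Random(0))
```

The binary map keeps only one bit per residue pair, which is one way to see why reconstruction from it is less constrained than reconstruction from the full distance map.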
Figure 1. The workflow of the algorithm. The overall workflow of the protein structure prediction algorithm, illustrated on Protein-Tyrosine Phosphatase 1B (PDB ID: 2HNP). The first stage includes predictions of protein secondary structure, contact density and relative solvent accessibility, as well as finding and ranking appropriate templates. Using the structural features from the first stage, the distance map is predicted in the second stage. In the last stage the actual 3D coordinates of all atoms and residues in the structure are reconstructed.
Performance of the distance map algorithm
| | 5.7±3.4 (7.1) | 6.5±3.5 (7.1) | 4.4±3.2 (4.6) | 3.1±2.1 (3.3) | 2.6±1.6 (3.1) | 2.4±1.5 (2.5) | 2.4±1.3 (2.5) | 2.3±1.3 (2.3) | 2.5±1.4 (2.4) | 2.6±1.9 (2.8) | 3.70±2.9 (4.52) |
| | 5.5±2.5 (7.0) | 6.3±2.8 (6.9) | 5.9±2.5 (6.6) | 5.8±2.6 (6.5) | 5.6±1.9 (6.6) | 5.7±2.0 (6.3) | 5.5±1.9 (6.3) | 5.6±2.0 (6.5) | 6.0±2.9 (6.7) | 6.1±3.2 (6.8) | 5.85±2.6 (6.75) |
| Compl. | 5.5±2.6 (7.1) | 6.3±2.8 (6.9) | 5.9±2.6 (6.7) | 5.8±2.5 (6.5) | 5.6±1.9 (6.6) | 5.7±2.1 (6.3) | 5.5±1.9 (6.2) | 5.7±2.0 (6.4) | 6.0±3.0 (6.7) | 6.1±3.1 (6.7) | 5.85±2.6 (6.75) |
| Correl. | 5.6±2.6 (7.1) | 6.3±2.5 (7.0) | 5.9±2.4 (6.7) | 5.8±2.4 (6.3) | 5.6±1.7 (6.6) | 5.6±1.6 (6.3) | 5.7±2.1 (6.6) | 5.6±1.9 (6.6) | 6.2±3.0 (6.8) | 6.3±3.4 (6.9) | 5.90±2.6 (6.81) |
RMSD [Å] of ab initio (AI) and template-based (TB) predictions of inter-residue distances as a function of sequence identity to the best template. RMSD is calculated over all residue pairs of each protein and then averaged over all proteins in the data set. Values in brackets are obtained by averaging across all residue pairs in the data set.
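The two averaging schemes in this caption can give different numbers, because pooling over residue pairs gives long proteins more weight. The sketch below shows one possible reading: the per-protein figure is the mean of per-protein RMSDs, and the bracketed figure is taken as the RMSD over all residue pairs pooled across proteins. The pooled definition is an assumption, since the caption does not spell out the formula, and the function names are hypothetical.

```python
import math


def pair_errors(pred, native):
    """Squared distance errors over all residue pairs i < j of one protein."""
    n = len(native)
    return [(pred[i][j] - native[i][j]) ** 2
            for i in range(n) for j in range(i + 1, n)]


def map_rmsd(pred, native):
    """RMSD between a predicted and a native distance map."""
    errs = pair_errors(pred, native)
    return math.sqrt(sum(errs) / len(errs))


def per_protein_mean(dataset):
    """Mean of per-protein RMSDs: every protein counts equally."""
    return sum(map_rmsd(p, n) for p, n in dataset) / len(dataset)


def pooled_rmsd(dataset):
    """RMSD over all residue pairs in the data set (assumed reading of the bracketed values)."""
    errs = [e for p, n in dataset for e in pair_errors(p, n)]
    return math.sqrt(sum(errs) / len(errs))


# Toy data: a 2-residue protein with a 2 Å error and a 3-residue protein with 1 Å errors.
small_native = [[0.0, 4.0], [4.0, 0.0]]
small_pred = [[0.0, 6.0], [6.0, 0.0]]
large_native = [[0.0, 4.0, 7.0], [4.0, 0.0, 4.0], [7.0, 4.0, 0.0]]
large_pred = [[0.0, 5.0, 8.0], [5.0, 0.0, 5.0], [8.0, 5.0, 0.0]]
dataset = [(small_pred, small_native), (large_pred, large_native)]

mean_of_rmsds = per_protein_mean(dataset)
pooled = pooled_rmsd(dataset)
```

In this toy case the per-protein mean is 1.5 Å, while the pooled value is lower because three of the four residue pairs come from the better-predicted protein.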
Figure 2. An example of the distance map prediction. An example of the template-based (left) and ab initio (right) distance map predicted for the protein with PDB ID: 3KHT (145 residues). The best template sequence identity to the query is 24.6%. Residue numbers are given on the axes, whereas the inter-residue distances [Å] are depicted by the colour scheme provided. Average RMSDs of the predicted template-based and ab initio maps are 2.86Å and 5.47Å respectively.
Figure 3. Distance map prediction vs. sequence separation. RMSD [Å] of the classical model predictions for residue pairs with sequence separation (a) between 6 and 11 residues, (b) between 12 and 23 residues, and (c) of more than 23 residues. The x-axis represents the sequence identity between the query and the best template.
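Restricting the RMSD to a sequence-separation range, as in the three panels above, amounts to filtering residue pairs by |i - j| before averaging. The following is a minimal sketch with a hypothetical function name; treating the range bounds as inclusive is an assumption.

```python
import math


def rmsd_in_separation_range(pred, native, lo, hi=None):
    """RMSD over residue pairs whose sequence separation j - i lies in [lo, hi].

    `hi=None` means no upper bound. Returns NaN if no pair falls in the range.
    """
    n = len(native)
    errs = [(pred[i][j] - native[i][j]) ** 2
            for i in range(n) for j in range(i + 1, n)
            if j - i >= lo and (hi is None or j - i <= hi)]
    return math.sqrt(sum(errs) / len(errs)) if errs else float("nan")


def toy_error(sep):
    """Synthetic error that grows with sequence separation (illustration only)."""
    if sep <= 11:
        return 1.0
    if sep <= 23:
        return 2.0
    return 3.0


n = 30
native = [[float(abs(i - j)) for j in range(n)] for i in range(n)]
pred = [[native[i][j] + toy_error(abs(i - j)) for j in range(n)] for i in range(n)]

short_range = rmsd_in_separation_range(pred, native, 6, 11)
medium_range = rmsd_in_separation_range(pred, native, 12, 23)
long_range = rmsd_in_separation_range(pred, native, 24)
```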
Performance for non-template regions
| TB classical | 7.3 | 8.3 | 6.8 | 4.7 | 6.0 | 4.0 | 4.7 | 4.1 | 4.8 | 4.8 | 5.67 |
| AI classical | 7.1 | 7.5 | 7.9 | 7.7 | 8.8 | 8.4 | 7.1 | 8.5 | 9.5 | 9.2 | 7.46 |
RMSD [Å] of ab initio (AI) and template-based (TB) predictions of inter-residue distances for non-template regions of the distance map.
Performance for template-covered regions
| TB classic. | 6.7 | 6.0 | 3.9 | 3.0 | 2.5 | 1.9 | 1.8 | 1.9 | 1.9 | 1.9 | 3.7 |
| Baseline | 8.8 (7.3) | 9.7 (7.2) | 5.2 (4.3) | 3.2 (2.9) | 2.7 (2.4) | 1.9 (1.8) | 1.6 (1.7) | 1.9 (1.8) | 1.8 (1.8) | 1.8 (1.8) | 5.1 (4.1) |
RMSD [Å] of template-based (TB) predictions of inter-residue distances for template-covered regions of the distance map. Baseline is a predictor that copies the distances from the best hit template or the weighted templates (given in brackets).
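A copy-based baseline of this kind can be sketched as follows. The alignment representation (a dictionary from query residue index to template residue index), the weighting scheme, and all function names are hypothetical; the caption does not say how the template weights were defined, so the weights here are simply supplied by the caller.

```python
def copy_from_template(n, alignment, template_map):
    """Copy template distances onto the aligned residue pairs of an n-residue query.

    `alignment` maps query index -> template index; uncovered pairs stay None.
    """
    pred = [[None] * n for _ in range(n)]
    for i, ti in alignment.items():
        for j, tj in alignment.items():
            pred[i][j] = template_map[ti][tj]
    return pred


def weighted_templates(n, templates):
    """Weighted average of copied distances over (weight, alignment, template_map) triples."""
    total = [[0.0] * n for _ in range(n)]
    weight_sum = [[0.0] * n for _ in range(n)]
    for weight, alignment, template_map in templates:
        copied = copy_from_template(n, alignment, template_map)
        for i in range(n):
            for j in range(n):
                if copied[i][j] is not None:
                    total[i][j] += weight * copied[i][j]
                    weight_sum[i][j] += weight
    return [[total[i][j] / weight_sum[i][j] if weight_sum[i][j] > 0 else None
             for j in range(n)] for i in range(n)]


# Toy example: template A covers query residues 0-1, template B covers 0-2.
template_a = [[0.0, 5.0], [5.0, 0.0]]
template_b = [[0.0, 7.0, 9.0], [7.0, 0.0, 4.0], [9.0, 4.0, 0.0]]
align_a = {0: 0, 1: 1}
align_b = {0: 0, 1: 1, 2: 2}

best_hit = copy_from_template(3, align_a, template_a)
weighted = weighted_templates(3, [(0.75, align_a, template_a),
                                  (0.25, align_b, template_b)])
```

Pairs that no template covers remain `None`, which corresponds to the non-template regions of the distance map.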
Reconstruction of CASP 9 targets
| Map type | GDT_TS | TM-score |
| --- | --- | --- |
| 4-Class (template) | 0.61 (0.11, 0.97) | 0.66 (0.18, 0.98) |
| 4-Class (ab initio) | 0.22 (0.09, 0.43) | 0.23 (0.12, 0.31) |
| Distance (template) | 0.54 (0.11, 0.91) | 0.62 (0.15, 0.91) |
| Distance (ab initio) | 0.22 (0.07, 0.44) | 0.24 (0.12, 0.43) |
The reconstruction of CASP 9 targets using predicted 4-class contact maps and distance maps. Average GDT_TS [fraction] and TM-score, along with their range (min, max) are reported.
Figure 4. Correlation between the quality of the predicted distance maps and the quality of the reconstructed structures exemplified on the CASP9 targets. The x-axis depicts the RMSD [Å] between the predicted and native distance maps of the CASP9 targets. The y-axis depicts the GDT_TS [%] score between the reconstructed and native CASP9 targets. The correlation coefficient between the RMSD and GDT_TS values is 0.78. Ab initio maps are given in red, whereas template-based maps are given in black. Proteins with different secondary structure content are shown separately.