Vijayalakshmi Chelliah, William R Taylor.
Abstract
BACKGROUND: The prediction of protein structure can be facilitated by the use of constraints based on a knowledge of functional sites. Without this information, it is still possible to predict which residues are likely to be part of a functional site, and this prediction can be used to select, from a variety of alternative model structures, those that would correspond to a functional protein.
Year: 2008 PMID: 18315844 PMCID: PMC2259414 DOI: 10.1186/1471-2105-9-S1-S13
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Topology cartoons for the training-set and independent test-set proteins. Each protein is shown with β-strands represented as triangles and α-helices as circles, and is identified by its PDB code. Training-set proteins: 3chy (Chemotaxis Y protein), 1coz (Glycerol-3-phosphate cytidylyltransferase), 1di0 (Lumazine synthase), 2trx (Thioredoxin), 1f4p (Flavodoxin). Independent test-set proteins: 1v9w, 1rlj, 1kjn, 1vq1, 1uxo, 1t57, 1vk2. An inverted triangle denotes a strand running in the opposite direction.
Figure 2. Clustering of the 200 decoy models. (a) The 200 decoy models obtained from the de novo protein structure prediction method. (b) Classification of the 200 models by fold-type. (c) Clustering of models of the same fold-type by pairwise superposition using SAP [18]. Models with ≤2 Å RMSD and ≥60% PID were clustered together.
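The clustering rule in panel (c) can be illustrated with a minimal sketch. It assumes that pairwise RMSD and PID values have already been computed (for example, from SAP superpositions) and stored in two square matrices, and it assumes single-linkage grouping: the caption gives only the cut-offs, not the linkage rule. The function name `cluster_models` and its parameters are hypothetical.

```python
from itertools import combinations


def cluster_models(rmsd, pid, max_rmsd=2.0, min_pid=60.0):
    """Group model indices into clusters.

    Two models are linked when their pairwise RMSD is <= max_rmsd (in Å)
    and their PID is >= min_pid (in %). Linked models end up in the same
    cluster (single linkage, which is an assumption of this sketch).
    """
    n = len(rmsd)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        # Follow parent pointers to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if rmsd[i][j] <= max_rmsd and pid[i][j] >= min_pid:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Within one fold-type, the number of clusters in this sketch is simply `len(cluster_models(rmsd, pid))`.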
Example decoy fold distribution for 3chy. For each fold-type, the table lists the strand and helix order in the fold (HI() denotes the helix order in layer I, SII() the strand order in layer II and HIII() the helix order in layer III), the number of models, the number of clusters and the score of the best model. In the second column, '-' denotes a change in the direction of the secondary structure element relative to the native. F1 and F7 are the correct folds and are in bold type.
| Fold type | Strand and helix order | No. of models of each fold type among the 200 models | No. of clusters (≤2 Å RMSD; ≥60% PID cut-off) | Score of the best model |
| F2 | HI(-1,5);SII(2,-1,3,4,5);HIII(2,3,4) | 3 | 2 | 202.21 |
| F3 | HI(1,5);SII(2,3,1,4,5);HIII(2,3,4) | 16 | 11 | 145.19 |
| F4 | HI(1,-3,-4);SII(2,1,-3,-4,-5);HIII(-2,-5) | 2 | 2 | 150.83 |
| F5 | HI(1,4);SII(2,3,1,4,-5);HIII(2,3,-5) | 1 | 1 | 108.62 |
| F6 | HI(1,3,5);SII(2,1,4,3,5);HIII(2,4) | 11 | 7 | 250.20 |
| F8 | HI(1,-5);SII(2,1,3,4,5);HIII(2,3,4) | 1 | 1 | 67.24 |
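The layer notation in the second column is regular enough to parse mechanically. The following is a minimal sketch, assuming the notation is exactly as shown in the table (an `H` or `S` prefix, a layer numeral written as one to three `I` characters, and a comma-separated list of signed element numbers). The function name `parse_fold` and the tuple layout are hypothetical choices for illustration.

```python
import re

# Matches one layer, e.g. "SII(2,-1,3,4,5)": kind (H or S), layer numeral, body.
LAYER_RE = re.compile(r"([HS])(I{1,3})\(([^)]*)\)")


def parse_fold(description):
    """Parse a fold description into a list of layers.

    Each layer is (kind, layer_number, elements), where kind is "H" (helix)
    or "S" (strand) and elements is a list of (element_number, flipped).
    flipped is True when the element carries a '-' sign, i.e. its direction
    is changed relative to the native.
    """
    layers = []
    for kind, numeral, body in LAYER_RE.findall(description):
        elements = []
        for token in body.split(","):
            token = token.strip()
            if not token:
                continue
            flipped = token.startswith("-")
            elements.append((int(token.lstrip("-")), flipped))
        # Layers I, II and III map to 1, 2 and 3 by counting the 'I's.
        layers.append((kind, len(numeral), elements))
    return layers
```

For example, `parse_fold("HI(1,-5);SII(2,1,3,4,5);HIII(2,3,4)")` yields three layers, with helix 5 in layer I marked as flipped.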
Figure 3. Example "proximity plots" for 3chy and 2trx. The "proximity plots" for the best models of each fold-type for (a) Chemotaxis Y protein (3chy) and (b) Thioredoxin (2trx) are shown. The thick blue line indicates the native crystal structure in both plots. For 3chy, F1 (thick green line) and F7 (thick red line) correspond to the correct fold. For 2trx, F1 (thick green line) corresponds to the correct fold.
Figure 4. "Summary plots" for the five training-set proteins. The "summary plots" for each of the five training-set proteins (a) 3chy, (b) 1coz, (c) 2trx, (d) 1f4p and (e) 1di0 are shown. In each plot, the "Fold Score" is plotted against the measure of structural correspondence to the native protein. (Note that both measures are plotted on reversed scales.) The best models lie towards the lower left corner of each plot, having a high score and high structural similarity to the native. The native structure itself is plotted as a large red dot; all folds that correspond to the native are also red, and the others are blue. The different symbols designate different decoy fold families.
Correct and incorrect folds in the top and bottom 25 ranked models. For each of the five proteins, the number of correct folds among the top 25 ranked models and the number of wrong folds among the bottom 25 ranked models are tabulated to show the strength of the method.
| PDB | Correct in top 25 ranked models (best) | Incorrect in bottom 25 ranked models (worst) |
| | 22 (top 4) | 13 (low 4) |
| | 14 (top 4) | 22 (low 11) |
| | 24 (top 22) | 25 (low 25) |
| | 7 (2nd) | 23 (low 16) |
| | 18 (top 7) | 15 (low 7) |
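The two main counts in this table can be reproduced with a short tally over a ranked model list. This is a minimal sketch, assuming each model is available as a `(score, is_correct_fold)` pair and that a higher score ranks better, as the summary plots indicate. The function name `tally_ranked` is hypothetical, and the parenthesised values in the table are not computed here.

```python
def tally_ranked(models, k=25):
    """Count correct folds in the top k and incorrect folds in the bottom k.

    models: iterable of (score, is_correct_fold) pairs.
    Returns (correct_in_top_k, incorrect_in_bottom_k).
    """
    # Rank from best to worst, assuming a higher score is better.
    ranked = sorted(models, key=lambda m: m[0], reverse=True)
    top = ranked[:k]
    bottom = ranked[max(len(ranked) - k, 0):]
    correct_top = sum(1 for _, ok in top if ok)
    incorrect_bottom = sum(1 for _, ok in bottom if not ok)
    return correct_top, incorrect_bottom
```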
Figure 5. Specificity-Sensitivity plots for the five training-set proteins. The Specificity-Sensitivity curves for each of the five training-set proteins (a) 3chy, (b) 1coz, (c) 2trx, (d) 1f4p and (e) 1di0 using the "Fold Score" (red) are shown. Sensitivity = TP/(TP+FN) and Specificity = TN/(TN+FP); that is, specificity is defined as the fraction of significant hits (hits with scores above a threshold) being correct, and sensitivity is defined as the fraction of possible correct hits being significant. (TP = True Positives, TN = True Negatives, FP = False Positives, FN = False Negatives.)
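As an illustration of the two formulas in the caption, the sketch below computes one (sensitivity, specificity) point for a given score threshold. It assumes a model counts as a significant hit when its score is greater than or equal to the threshold (the caption says only "above a threshold") and that correctness labels are known for every model. The function name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(scores, is_correct, threshold):
    """Return (sensitivity, specificity) at one score threshold.

    Sensitivity = TP / (TP + FN), Specificity = TN / (TN + FP),
    following the formulas given in the caption.
    """
    tp = fp = tn = fn = 0
    for score, correct in zip(scores, is_correct):
        significant = score >= threshold  # assumption: inclusive cut-off
        if significant and correct:
            tp += 1
        elif significant and not correct:
            fp += 1
        elif correct:
            fn += 1
        else:
            tn += 1
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

Sweeping `threshold` over the sorted scores and collecting the resulting pairs gives the points of a curve of this kind.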
RMSD values for larger proteins. For each of the proteins in the larger set, the RMSD of the best model in the top four ranked fold-types is tabulated (along with the fold-type in parentheses). Where this corresponds to the native fold, the value is in bold type.
| PDB (length) | Rank-1 | Rank-2 | Rank-3 | Rank-4 |
| 13.4 (F23) | 11.3 (F24) | 6.9 (F26) | ||
| 13.7 (F10) | 11.2 (F1) | 13.7 (F8) | ||
| 5.0 (F3) | 5.0 (F4) | 9.4 (F6) | ||
| 8.5 (F12) | 9.5 (F3) | 7.1 (F5) | 7.9 (F1) | |
| 13.7 (F2) | 11.4 (F14) | 8.9 (F6) | 11.8 (F11) | |
| 14.8 (F8) | 9.8 (F10) | 14.9 (F1) | 9.9 (F3) | |
| 16.4 (F13) | 14.7 (F4) | 14.5 (F2) | 15.9 (F10) |