Marcin Pawlowski, Michal J Gajda, Ryszard Matlak, Janusz M Bujnicki.
Abstract
BACKGROUND: Computational models of protein structure are usually inaccurate and exhibit significant deviations from the true structure. The utility of models depends on the degree of these deviations. A number of predictive methods have been developed to discriminate between globally incorrect and approximately correct models. However, only a few methods predict the correctness of different parts of computational models. Several Model Quality Assessment Programs (MQAPs) have been developed to detect local inaccuracies in unrefined crystallographic models, but it is not known whether they are useful for computational models, which usually exhibit different and much more severe errors.
Year: 2008 PMID: 18823532 PMCID: PMC2573893 DOI: 10.1186/1471-2105-9-403
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The distribution of GDT_TS scores indicating (dis)similarity between the native structures of CASP targets in the CM and FR(H) categories and the corresponding models: original ones (the CASP5&6 all) and their idealized versions (the CASP5&6+ all).
Figure 2. Absolute value of the correlation coefficient between MQAP scores or local residue features and the deviation of residues in the models (all CASP5&6+ all) compared with the native structures. The dendrogram on the right-hand side presents the results of cluster analysis. The linkage between parameters corresponds to the value of (1 – |Spearman's rank correlation coefficient|). This figure compares only the primary MQAPs, so it does not show results for MetaMQAP, which was developed based on these results. A detailed study of MetaMQAP's performance is presented in Figure 3.
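The linkage measure named in the Figure 2 caption, 1 – |Spearman's rank correlation coefficient|, can be illustrated with a minimal pure-Python sketch. The function names (`ranks`, `pearson`, `spearman`, `linkage_distance`) and the toy numbers are assumptions made for illustration; the source does not say which software or tie-handling rule was used, so average ranks for ties are assumed here.

```python
from math import sqrt

def ranks(values):
    """Return 1-based ranks; tied values share their average rank (assumed rule)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def linkage_distance(x, y):
    """Distance used for clustering: 1 - |Spearman's rho|."""
    return 1.0 - abs(spearman(x, y))

# Toy per-residue data (hypothetical values, not taken from the paper).
deviation = [0.5, 1.2, 3.0, 8.0, 15.0]
score_a = [-0.1, -0.3, -0.9, -2.0, -4.0]   # falls as deviation rises
score_b = [1.0, 2.0, 3.0, 4.0, 5.0]        # rises as deviation rises
print(spearman(deviation, score_a), linkage_distance(score_a, score_b))
```

Because the absolute value is taken, two scores that order residues identically but with opposite sign end up at distance zero, which is why such features cluster together in a dendrogram built on this measure.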
Figure 3. Benchmark of MQAPs comparing the predictive power of MetaMQAP with that of other MQAPs (CASP5&6+ test). Black bars represent absolute values of the correlation coefficient between MQAP scores and the deviation of residues in the models compared with the native structures. Blue bars present absolute values of the partial correlation, where the following parameters were used as controlling variables: global model quality (GDT_TS) and residue depth in the structure (ResDepth and BuriedArea). At the top of each column, the 95 % confidence interval of the correlation is shown.
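The partial correlation mentioned in the Figure 3 caption removes the influence of controlling variables before two quantities are compared. The sketch below uses the standard recursive formula on Pearson coefficients; whether the authors computed it this way, or on ranks, is not stated in the source, so this is only an assumed illustration. The names `pearson` and `partial_corr` and the toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def partial_corr(x, y, controls):
    """Correlation of x and y after removing linear effects of the controls.

    Applies r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    recursively, peeling off one control variable per level.
    """
    if not controls:
        return pearson(x, y)
    z, rest = controls[-1], controls[:-1]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy data (hypothetical): both variables are driven by the same control.
control = [1, 2, 3, 4, 5, 6]          # stands in for e.g. a burial measure
mqap_score = [2, 1, 3, 4, 4, 7]       # control plus independent noise
deviation = [2, 0, 4, 3, 7, 5]        # control plus independent noise
print(pearson(mqap_score, deviation))                  # clearly positive
print(partial_corr(mqap_score, deviation, [control]))  # close to zero
```

In this toy case the raw correlation is high only because both variables track the control; once the control is accounted for, almost nothing remains. A score that keeps a high partial correlation therefore carries information beyond the controlled variables.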
Description of scores returned by the primary MQAP methods as well as other local features analyzed in this work.
| VERIFY3D | 3D-1D profile score for a single residue |
| VERIFY3Dw5 | VERIFY3D score averaged over a 5 residue window |
| PROSApair | pair energy (atom-atom interactions) |
| PROSAsurf | surface energy (atom-solvent interactions) |
| PROSA | combination of PROSApair and PROSAsurf |
| ANOLEA | distance-dependent empirical potential. It evaluates the non-local environment (NLE) of each heavy atom in the model |
| ANOLEAw5 | ANOLEA score averaged over a 5 residue window |
| PROVE | average relative volume for all atoms of a residue |
| BALA | mean of a four-body statistical potential, applied to tetrahedral quadruplets of spatially neighbouring residues |
| REFINERloc | pseudoenergy of local contacts |
| REFINERnonloc | pseudoenergy of long-distance contacts |
| REFINERhydro | pseudoenergy of H-H bond interaction |
| REFINERbur | pseudoenergy of burial |
| REFINER | weighted sum of all REFINER pseudoenergies |
| TUNE | score based on a neural network that predicts the local quality of a residue from both local and non-local contacts of residues in the model |
| FractionPolar | fraction of non-polar residues in area of given residue (ENVIRONMENT) |
| BuriedArea | burial of the residue (ENVIRONMENT) |
| LocalNeighbours | number of residues within the distance of 10 Å in space and within 8 residues in the sequence |
| NonLocalNeighbours | number of residues within the distance of 10 Å in space and more distant than 8 residues in the sequence. |
| ResDepth | the distance between the C-α atom of a residue and the closest geometrically plausible position of a water molecule on the surface of the protein |
| PROQ | neural network-based predictor that estimates the quality of a protein model from a number of structural features. ProQ is optimized to find correct models, in contrast to other methods, which are optimized to find native structures |
| PROQRES | score based on a neural network that estimates local structure quality from atom-atom contacts, residue-residue contacts, secondary structure context, and solvent accessibility |
| AbsAccessibility | absolute value of solvent accessibility for all atoms of a residue (according to NACCESS) |
| RelAccessibility | ratio of the absolute solvent accessibility of a given residue to the solvent accessibility of the same type of residue (X) in a model tripeptide Ala-X-Ala (according to NACCESS) |
| LoopProb | probability of a loop conformation in secondary structure predicted by PSIPRED |
| HelixProb | probability of a helical conformation in secondary structure predicted by PSIPRED |
| StrandProb | probability of an extended conformation in secondary structure predicted by PSIPRED |
| SSAgreement | agreement between secondary structure predicted by PSIPRED and secondary structure observed in the model (calculated by DSSP) |
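Two kinds of descriptors in the table above are simple enough to sketch directly: the window-averaged scores (VERIFY3Dw5, ANOLEAw5) and the neighbour counts (LocalNeighbours, NonLocalNeighbours). The code below is a rough illustration under stated assumptions, not the authors' implementation: the window is assumed to be centred and truncated at chain termini, distances are assumed to be measured between one representative point per residue (for example C-α), and "within" is assumed to be inclusive. The function names are hypothetical.

```python
from math import dist

def window_average(scores, window=5):
    """Average each residue's score over a centred window.

    Assumption: near the chain ends the window is truncated rather than padded.
    """
    half = window // 2
    averaged = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        averaged.append(sum(scores[lo:hi]) / (hi - lo))
    return averaged

def neighbour_counts(coords, cutoff=10.0, seq_sep=8):
    """Count spatial neighbours of every residue, split by sequence separation.

    coords: one (x, y, z) point per residue, in sequence order (in angstroms).
    Returns (local, non_local): neighbours at most `cutoff` away that are at
    most `seq_sep` residues apart in sequence, or further apart, respectively.
    """
    n = len(coords)
    local, non_local = [0] * n, [0] * n
    for i in range(n):
        for j in range(n):
            if i == j or dist(coords[i], coords[j]) > cutoff:
                continue
            if abs(i - j) <= seq_sep:
                local[i] += 1
            else:
                non_local[i] += 1
    return local, non_local

# Toy chain (hypothetical): ten residues on a line, 3.8 angstroms apart,
# plus an eleventh residue folded back next to the first one.
chain = [(3.8 * k, 0.0, 0.0) for k in range(10)] + [(0.0, 5.0, 0.0)]
local, non_local = neighbour_counts(chain)
print(window_average([1, 2, 3, 4, 5]))
print(local[0], non_local[0])
```

The quadratic loop is adequate for a single protein chain; a spatial grid or k-d tree would be the usual choice for larger systems.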
Local (per residue) deviation for the best and worst residues according to different MQAPs.
| Method | 10 % highest quality residues: average | 10 % highest quality residues: std. deviation | 10 % lowest quality residues: average | 10 % lowest quality residues: std. deviation | Area under the ROC curve |
| TRUE DEVIATION | 0.44 | 0.25 | 31.3 | 13.24 | 1.000 |
| VERIFY3D | 2.36 | 4.22 | 12.23 | 14.98 | 0.699 |
| PROSApair | 1.83 | 3.47 | 11.27 | 11.86 | 0.751 |
| PROSA | 1.71 | 2.71 | 12.98 | 13.77 | 0.752 |
| ANOLEA | 2.12 | 3.89 | 9.17 | 11.04 | 0.685 |
| BALA | 1.50 | 2.08 | 15.19 | 16.71 | 0.767 |
| REFINER | 1.85 | 3.71 | 10.55 | 11.83 | 0.732 |
| PROQRES | 1.42 | 2.08 | 16.45 | 15.99 | 0.814 |
| MetaMQAP | 1.13 | 0.82 | 20.39 | 16.76 | 0.875 |
The table presents average 95 % confidence interval of average and standard deviation for residue deviation. For a well-performing method, the average for the 10 % highest quality residues should be low and the average for the 10 % lowest quality residues should be high. In general, the larger the gap between the average residue deviation of the best and worst quality residues, the more accurate the method (CASP5&6+ test). In addition, the area under the ROC curve (3 Å cutoff) is shown.
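The two evaluations described in the table note can be sketched as follows: the mean true deviation of the residues a method ranks in its best and worst 10 %, and the area under the ROC curve with a 3 Å cutoff. This is a minimal illustration with assumptions labelled: predictions are taken to be predicted deviations (higher means worse), residues deviating by more than the cutoff are treated as the positive ("incorrect") class, and the AUC is computed through its pairwise-ranking interpretation. The function names and numbers are hypothetical.

```python
def decile_means(predicted, true_dev, fraction=0.10):
    """Mean true deviation of the residues predicted best and predicted worst."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    k = max(1, int(len(predicted) * fraction))
    best = [true_dev[i] for i in order[:k]]    # lowest predicted deviation
    worst = [true_dev[i] for i in order[-k:]]  # highest predicted deviation
    return sum(best) / k, sum(worst) / k

def roc_auc(predicted, true_dev, cutoff=3.0):
    """Area under the ROC curve for separating residues above the cutoff.

    Equals the probability that a randomly chosen incorrect residue receives
    a higher predicted deviation than a randomly chosen correct one, with
    ties counted as one half.
    """
    incorrect = [p for p, d in zip(predicted, true_dev) if d > cutoff]
    correct = [p for p, d in zip(predicted, true_dev) if d <= cutoff]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in incorrect for q in correct)
    return wins / (len(incorrect) * len(correct))

# Toy data (hypothetical): ten residues, one of them mispredicted.
true_dev = [0.5, 1.0, 2.0, 4.0, 9.0, 20.0, 0.8, 1.5, 6.0, 30.0]
predicted = [0.5, 1.0, 5.0, 4.0, 9.0, 20.0, 0.8, 1.5, 6.0, 30.0]
print(decile_means(predicted, true_dev))
print(roc_auc(predicted, true_dev))
```

A perfect predictor reproduces the TRUE DEVIATION row of the table: the smallest possible mean for its best residues, the largest for its worst, and an AUC of 1.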
Figure 4. Absolute value of Spearman's rank correlation between the deviation of variants for each residue and their MQAP scores (calculated for the CASP5&6+ test). The results are shown for all residues as well as for classes of residues whose variants in our dataset exhibit mean deviations of less than 2 Å, 2–4 Å, 4–8 Å, and at least 8 Å. At the top of each column, the 95 % confidence interval of the correlation is shown.
Figure 5. Correlation between the global score (prediction) and the true model quality (GDT_TS). Panel A – Pearson's correlation coefficient between global model accuracy (expressed as a GDT_TS score) and predicted global score (CASP7 server models). Panel B – the mean Spearman's rank correlation between global model accuracy (GDT_TS) and predicted global score of model variants (CASP7 server models). Hatched bars – results for models evaluated by all 7 MQAPs considered here. Black bars – results for all CASP7 server models. At the top of each column, the 95 % confidence interval of the correlation is shown.
Figure 6. Correlation coefficient between the MetaMQAP global score and the model GDT_TS for each of the single-domain CASP7 targets.
Figure 7. The ranking abilities of MetaMQAP compared with the best MQAP methods in CASP7 (QA_556, QA_634, QA_713, and QA_704) and ProQ, represented as the GDT_TS score of the model with the highest MQAP ranking vs. the truly best server model for each target. The most significant mispredictions made by MetaMQAP are emphasized with red frames. This evaluation was performed on the set of all CASP7 server models.
Ability of MQAPs to detect the most accurate model in a set of alternative models (analysis performed on the set of all CASP7 server models).
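| Method | Average GDT_TS of top-ranked models | Percentile of GDT_TS: 1 | Percentile of GDT_TS: 5 | Percentile of GDT_TS: 25 | Percentile of GDT_TS: 50 (median) | Percentile of GDT_TS: 75 | Percentile of GDT_TS: 95 | Percentile of GDT_TS: 99 |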
| QA_634 | 57.74 | 8.85 | 18.1 | 35.38 | 57.32 | 73.63 | 93.70 | 96.85 |
| QA_556 | 64.03 | 13.11 | 17.66 | 46.60 | 68.24 | 81.01 | 94.71 | 99.14 |
| QA_704 | 62.23 | 16.65 | 22.31 | 40.90 | 62.97 | 78.56 | 93.34 | 97.10 |
| QA_713 | 64.67 | 15.34 | 21.65 | 41.78 | 68.17 | 81.58 | 95.15 | 97.07 |
| MetaMQAP | 60.78 | 14.93 | 20.39 | 37.75 | 64.13 | 81.11 | 95.64 | 98.18 |
| ProQ | 59.53 | 14.78 | 19.09 | 32.67 | 62.56 | 76.32 | 94.63 | 97.28 |
| Best server model | 70.93 | 23.20 | 30.98 | 54.63 | 72.05 | 85.47 | 96.13 | 99.20 |
The table shows the average GDT_TS score calculated for each top-ranked (according to different MQAPs) model for all targets, as well as cumulative scores for different percentiles of top-ranked models.
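The selection test summarized in this table can be sketched in a few lines: for every target, take the model that an MQAP ranks highest, record its GDT_TS, and compare the result with the GDT_TS of the truly best model. The sketch assumes that a higher MQAP score means a better predicted model; the data layout, function names, and target labels are hypothetical.

```python
from statistics import median

def top_ranked_gdt(models):
    """GDT_TS of the model with the highest MQAP score.

    models: list of (mqap_score, gdt_ts) pairs for one target.
    """
    return max(models, key=lambda m: m[0])[1]

def selection_summary(targets):
    """Summarize how well an MQAP picks models across targets.

    targets: dict mapping a target name to its list of (mqap_score, gdt_ts).
    Returns the average and median GDT_TS of the top-ranked models, plus the
    average GDT_TS of the truly best model for each target.
    """
    picked = [top_ranked_gdt(models) for models in targets.values()]
    best = [max(gdt for _, gdt in models) for models in targets.values()]
    return {
        "average_top_ranked": sum(picked) / len(picked),
        "median_top_ranked": median(picked),
        "average_best_available": sum(best) / len(best),
    }

# Toy data (hypothetical targets and scores, not CASP7 results).
targets = {
    "target_a": [(0.9, 60.0), (0.5, 70.0)],  # MQAP prefers the worse model
    "target_b": [(0.2, 40.0), (0.8, 50.0)],  # MQAP prefers the better model
}
print(selection_summary(targets))
```

The gap between the average for top-ranked models and the average for the best available models corresponds to the difference between each method's row and the "Best server model" row.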
Figure 8. Correlation between local MQAP scores and local model quality. Panel A – Pearson's correlation between predicted and observed residue deviation, calculated for CASP7 server models. Hatched bars – correlation for models evaluated by all 3 MQAPs (MetaMQAP, QA_634, and QA_692). Black bars – correlation for all CASP7 server models. In addition, we present the correlation calculated for our CASP7 predictions, QA_038. In the CASP7 experiment, QA_038 submitted scores for only a fraction of models (48% of all scorable residues). Panel B – Pearson's correlation between predicted and observed residue deviation as a function of residue difficulty (calculated for CASP7 server models, only for single-domain targets). Here we consider only residues scored by all 3 methods (MetaMQAP, QA_634, and QA_692).
Figure 9. Visual identification of potential errors in protein models using 'coloring' by MetaMQAP. The spectrum of colors from blue to red indicates the spectrum of residues predicted to be correct to incorrect. A) The crystal structure of the N-terminal GIY-YIG endonuclease domain of UvrC from Thermotoga maritima (PDB code 1ycz). B) A comparative model of the same protein based on an ideal alignment to a closely related structure of UvrC from Bacillus caldotenax (PDB code 1yd6). C) & D) Models with local 1 aa alignment shifts indicated by a white ellipse and predicted deviation from the native structure indicated by the shift of the color spectrum from blue towards yellow and red.
Published analyses describing the use of MetaMQAP prior to publication of this article.
| MnmC | Bifunctional tRNA methyltransferase and oxidoreductase | [ |
| R.Eco124I | Nuclease/ATPase subunit of Type I restriction-modification system | [ |
| Bud23 | RNA methyltransferase | [ |
| Mom | DNA modification enzyme | [ |
| Sgm | RNA methyltransferase | [ |
| MiaA, MiaB, MiaE | Enzymes involved in the ms2io6A biosynthesis pathway: a P-loop NTPase, a Radical SAM enzyme, and a diiron carboxylate oxidase | [ |
| M.EcoRII | DNA methyltransferase | [ |
| R.MvaI | Restriction endonuclease | [ |
| I-Ssp6803I | Homing endonuclease | [ |
| R.HphI | Restriction endonuclease | [ |