Ali H A Maghrabi, Liam J McGuffin.
Abstract
Methods that reliably estimate the likely similarity between the predicted and native structures of proteins have become essential for driving the acceptance and adoption of three-dimensional protein models by life scientists. ModFOLD6 is the latest version of our leading resource for Estimates of Model Accuracy (EMA), which uses a pioneering hybrid quasi-single model approach. The ModFOLD6 server integrates scores from three pure-single model methods and three quasi-single model methods using a neural network to estimate local quality scores. Additionally, the server provides three options for producing global score estimates, depending on the requirements of the user: (i) ModFOLD6_rank, which is optimized for ranking/selection, (ii) ModFOLD6_cor, which is optimized for correlations of predicted and observed scores and (iii) ModFOLD6 global for balanced performance. The ModFOLD6 methods rank among the top few for EMA, according to independent blind testing by the CASP12 assessors. The ModFOLD6 server is also continuously automatically evaluated as part of the CAMEO project, where significant performance gains have been observed compared to our previous server and other publicly available servers. The ModFOLD6 server is freely available at: http://www.reading.ac.uk/bioinf/ModFOLD/.
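To make the three global options concrete: the simplest way to derive a single global estimate from per-residue quality scores is to average them. The sketch below (Python, with hypothetical scores) illustrates only that idea; the ModFOLD6_rank and ModFOLD6_cor variants are separately optimized score combinations whose exact formulas are given in the paper, not reproduced here.

```python
import numpy as np

def global_from_local(local_scores):
    """Toy global quality estimate: the mean of per-residue scores in
    [0, 1]. Illustrative only; ModFOLD6's _rank/_cor/global variants
    use separately optimized score combinations."""
    return float(np.asarray(local_scores, dtype=float).mean())

# Hypothetical per-residue quality scores for a 5-residue model
print(global_from_local([0.92, 0.88, 0.75, 0.81, 0.90]))  # 0.852
```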
Year: 2017 PMID: 28460136 PMCID: PMC5570241 DOI: 10.1093/nar/gkx332
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Flow of data for local quality assessment scoring in ModFOLD6. The target sequence and 3D model were evaluated with three pure-single model scoring methods (Secondary Structure Agreement (SSA), Contact Distance Agreement (CDA) and ProQ2) and three quasi-single model methods (Disorder B-factor Agreement (DBA), ModFOLD5_single (MF5s) and ModFOLDclustQ_single (MFcQs)). The new methods developed for ModFOLD6 are highlighted in green. The per-residue scores from all six methods were combined into a single residue score using an artificial neural network (see Supplementary Figure S1).
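A minimal sketch of that combination step, assuming each component method contributes one per-residue feature and a small one-hidden-layer network maps the six features to a single local score; the layer size and weights below are placeholders, not the trained network of Supplementary Figure S1.

```python
import numpy as np

rng = np.random.default_rng(0)

def combine_local_scores(features, W1, b1, w2, b2):
    """Map an (n_residues, 6) matrix of per-residue scores from SSA, CDA,
    ProQ2, DBA, MF5s and MFcQs to one quality score per residue using a
    one-hidden-layer feed-forward network (sketch only)."""
    h = np.tanh(features @ W1 + b1)                # hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))    # sigmoid output in (0, 1)

# Placeholder weights standing in for the trained ANN
W1, b1 = rng.normal(size=(6, 10)), np.zeros(10)
w2, b2 = rng.normal(size=10), 0.0
local = combine_local_scores(rng.uniform(size=(5, 6)), W1, b1, w2, b2)
print(local)  # five per-residue quality estimates in (0, 1)
```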
Figure 2. ModFOLD6 server results for models submitted to CASP12 generated for target T0859 (PDB ID: 5jzr). (A) An example of the graphical output from the server showing the main results page with a summary of the results from each method (truncated here to fit page). Clicking on the thumbnail images in the main table allows results to be visualized in more detail. (B) A histogram of the local or per-residue errors for the top ranked model, with the residue number on the x-axis and the predicted residue error (distance of the Cα atom from the native structure in Å) on the y-axis, which may be downloaded. (C) Interactive views of models, which can be manipulated in 3D using the JSmol/HTML5 framework and/or downloaded for local viewing.
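On the conversion between bounded quality scores and the Cα distance errors plotted in panel (B): methods in this family commonly use an S-score transform, s = 1/(1 + (d/d0)²), which can be inverted to report errors in Å. The sketch below assumes d0 = 3.9 Å purely for illustration; the constant actually used by ModFOLD6 may differ.

```python
import math

D0 = 3.9  # distance constant in Å -- an assumed value, not confirmed for ModFOLD6

def distance_to_score(d, d0=D0):
    """S-score transform: maps a Cα error d (in Å) to a score in (0, 1]."""
    return 1.0 / (1.0 + (d / d0) ** 2)

def score_to_distance(s, d0=D0):
    """Inverse transform: recovers the predicted Cα error in Å from a score."""
    return d0 * math.sqrt(1.0 / s - 1.0)

# Round trip: a 2.5 Å error maps to a score and back to 2.5 Å
assert abs(score_to_distance(distance_to_score(2.5)) - 2.5) < 1e-9
```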
Table 1. Independent benchmarking of local scoring with CAMEO, using 6 months of common data to compare five publicly available methods (177 025 common residues, 725 common models, 113 650 high quality residues, 63 375 low quality residues)
| Method | AUC | StdErr | AUC 0–0.1 | AUC 0–0.1 rescaled |
|---|---|---|---|---|
| ModFOLD6 (server18) | | 0.00096 | | |
| ModFOLD4 (server7) | 0.8638 | 0.00099 | 0.0467 | 0.4669 |
| ProQ2 (server 8) | 0.8374 | 0.00107 | 0.0428 | 0.4283 |
| Verify3d (server0) | 0.7020 | 0.00134 | 0.0208 | 0.2081 |
| Dfire v1.1 (server1) | 0.6606 | 0.00138 | 0.0168 | 0.1675 |
Twenty-six weeks of data between 29 April 2016 and 21 October 2016 downloaded from http://www.cameo3d.org/. AUC = Area Under the ROC Curve. StdErr = Standard Error in the AUC score. AUC 0–0.1 = Area Under the ROC curve with False Positive Rate ≤ 0.1. The table is sorted by the AUC score. See also Supplementary Tables S1–5 for independent local score benchmarks.
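Note the relationship between the last two columns: the rescaled value is the partial AUC divided by its maximum possible value of 0.1 (e.g. 0.0467 / 0.1 ≈ 0.467). A self-contained numpy sketch of that computation, on hypothetical labels and scores:

```python
import numpy as np

def partial_auc(y_true, y_score, max_fpr=0.1):
    """Trapezoidal area under the ROC curve restricted to FPR <= max_fpr,
    returned together with the same area rescaled by 1/max_fpr (the
    'rescaled' column above). Assumes untied scores for simplicity."""
    order = np.argsort(-np.asarray(y_score, dtype=float))
    y = np.asarray(y_true)[order]
    tpr = np.concatenate([[0.0], np.cumsum(y) / y.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1 - y) / (1 - y).sum()])
    keep = fpr <= max_fpr                       # clip the curve at max_fpr
    fpr_c = np.concatenate([fpr[keep], [max_fpr]])
    tpr_c = np.concatenate([tpr[keep], [np.interp(max_fpr, fpr, tpr)]])
    pauc = float(np.sum(np.diff(fpr_c) * (tpr_c[:-1] + tpr_c[1:]) / 2.0))
    return pauc, pauc / max_fpr

# Hypothetical residue labels (1 = high quality) and predicted local scores
labels = np.array([1, 1, 0, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])
print(partial_auc(labels, scores))  # (0.05, 0.5) for this toy example
```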
Table 2. Independent benchmarking of global scoring with official CASP12 data
| Rank | Gr.Name | Gr.Model | GDT_TS AUC | LDDT AUC | CAD(AA) AUC | SG AUC |
|---|---|---|---|---|---|---|
| 1 | | QA072_1 | 0.993 | | | |
| 2 | | QA360_1 | | 0.988 | 0.885 | 0.949 |
| 3 | | QA201_1 | 0.994 | 0.988 | 0.878 | 0.944 |
| 4 | qSVMQA | QA120_1 | 0.982 | 0.983 | 0.862 | 0.937 |
| 5 | ProQ3 | QA213_1 | 0.985 | 0.978 | 0.892 | 0.916 |
| 6 | ProQ3_1_diso | QA095_1 | 0.982 | 0.978 | 0.891 | 0.922 |
| 7 | ProQ3_1 | QA302_1 | 0.981 | 0.977 | 0.889 | 0.917 |
| 8 | ProQ2 | QA203_1 | 0.944 | 0.971 | 0.921 | 0.932 |
| 9 | MUfoldQA_S | QA334_1 | 0.977 | 0.968 | 0.898 | 0.913 |
| 10 | MULTICOM-CLUSTER | QA287_1 | 0.956 | 0.968 | 0.893 | 0.921 |
The ability of methods to separate good models (accuracy score ≥ 50) from bad ones (<50) according to the GDT_TS, LDDT, CAD and SG scores is evaluated using the Area Under the ROC Curve (AUC) (see http://predictioncenter.org/casp12/doc/presentations/CASP12_QA_AK.pdf). Scores are calculated over all models for all targets (QA stage 1, select 20). Only the top 10 methods are shown and the table is sorted by the LDDT AUC score. Data are from http://predictioncenter.org/casp12/qa_aucmcc.cgi. See also Supplementary Tables S5–10.
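The evaluation behind this table binarizes each model's true accuracy at 50 (on the 0-100 GDT_TS-style scale) and scores predictors by how well their global estimates separate the two classes. A small sketch with hypothetical values, using the rank-statistic (Mann-Whitney) form of the AUC:

```python
import numpy as np

def auc_good_vs_bad(true_acc, pred_scores, threshold=50.0):
    """AUC for separating good models (true accuracy >= threshold) from
    bad ones, computed as the Mann-Whitney U statistic: the fraction of
    (good, bad) pairs the predictor ranks correctly, with half credit
    for ties."""
    y = np.asarray(true_acc, dtype=float) >= threshold
    pos = np.asarray(pred_scores, dtype=float)[y]
    neg = np.asarray(pred_scores, dtype=float)[~y]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Hypothetical GDT_TS values (0-100) and predicted global quality scores
gdt = np.array([72.0, 65.0, 48.0, 30.0, 55.0])
pred = np.array([0.80, 0.70, 0.55, 0.20, 0.60])
print(auc_good_vs_bad(gdt, pred))  # 1.0: perfect separation in this toy case
```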