Abstract
BACKGROUND: Protein structure prediction has made considerable progress over the last few decades, and a large number of models can now be predicted for a given sequence. Consequently, assessing the quality of predicted protein models is one of the key components of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue; they can be roughly divided into three categories: single methods, quasi-single methods and clustering (or consensus) methods. Although these methods have achieved considerable success at different levels, accurate protein model quality assessment is still an open problem.
Keywords: Learning-to-rank; Protein model quality assessment; Protein structure prediction
Year: 2017 PMID: 28545390 PMCID: PMC5445322 DOI: 10.1186/s12859-017-1691-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The overall flowchart of the proposed MQAPRank
Performance of MQAPRank and several leading methods on the CASP12 dataset based on GDT_TS score
| Method | Method type | Best 150 (a) Diff (c) ↓ | Best 150 MCC (d) ↑ | Best 150 AUC (e) ↑ | Best 150 Loss (f) ↓ | Sel20 (b) Diff ↓ | Sel20 MCC ↑ | Sel20 AUC ↑ | Sel20 Loss ↓ |
|---|---|---|---|---|---|---|---|---|---|
| MQAPRank | quasi-clustering |  |  |  |  | 5.76 | 0.41 | 0.93 | 7.18 |
| MUfoldQA_C | clustering | 5.51 | 0.84 |  | 7.46 | 3.82 | 0.15 | 0.96 |  |
| Davis-consensus | clustering | 6.78 | 0.83 |  | 7.68 | 5.61 | 0.00 | 0.78 | 15.56 |
| ModFOLD6_cor | quasi-single | 6.75 | 0.86 |  | 10.55 | 6.70 |  |  | 1.28 |
| MUfoldQA_S | single | 8.90 | 0.71 | 0.93 | 13.15 |  | 0.76 | 0.98 | 2.56 |
(a) Best 150: the dataset comprising the best 150 models submitted for a target according to the benchmark consensus method. (b) Sel20 (Select 20): the dataset comprising 20 models spanning the whole range of server model difficulty on each target. (c) Diff: the average difference between the predicted and GDT_TS scores. (d) MCC: Matthews correlation coefficient (the threshold is 50 GDT_TS). (e) AUC: the area under the ROC curve. (f) Loss: the loss in quality between the best available model and the predicted best model. Bold values indicate the highest performance.
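The Diff, MCC and Loss definitions above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the evaluation code used for these tables: Diff is taken as a mean absolute difference, a model counts as "good" when its score is at least 50 GDT_TS (whether the boundary is inclusive is not stated in the source), and the function names `diff`, `loss` and `mcc` are hypothetical. The first example reuses four (GDT_TS, MQAPRank) pairs reported for target T0912; the MCC inputs are made-up toy values.

```python
import math

def diff(pred, true):
    """Mean absolute difference between predicted and true GDT_TS scores (assumed form)."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def loss(pred, true):
    """True GDT_TS of the best available model minus that of the top-ranked model."""
    top = max(range(len(pred)), key=lambda i: pred[i])  # index of the predicted best model
    return max(true) - true[top]

def mcc(pred, true, threshold=50.0):
    """Matthews correlation after labelling models good/bad at the threshold (>= is assumed)."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, true):
        predicted_good, truly_good = p >= threshold, t >= threshold
        if predicted_good and truly_good:
            tp += 1
        elif predicted_good:
            fp += 1
        elif truly_good:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when a class is empty and the coefficient is undefined (a common convention).
    return (tp * tn - fp * fn) / denom if denom else 0.0

# T0912TS005_5, T0912TS220_2, T0912TS357_3, T0912TS357_1
true_gdt = [43.85, 40.38, 40.10, 39.94]
mqaprank = [47.40, 40.31, 39.83, 39.93]
print(diff(mqaprank, true_gdt))  # roughly 0.975 on these four decoys
print(loss(mqaprank, true_gdt))  # 0.0: the top-ranked decoy is also the best of the four

# Hypothetical toy scores that straddle the 50 GDT_TS threshold
print(mcc([65, 45, 42, 20], [60, 55, 40, 30]))
```

The tabulated values are computed over whole datasets, so the numbers printed here are only meant to show the arithmetic.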
Fig. 2 Comparison of the performance on the Diff metric between MQAPRank and other methods. a MUfoldQA_C. b Davis-consensus. c ModFOLD6_cor. d MUfoldQA_S. (The line x = y is shown for reference. Because a smaller Diff value indicates better performance, the method with fewer scatter points is better in this figure.)
The GDT_TS scores and the scores predicted by different methods for the first 15 decoy models of target T0912 in the Best 150 dataset
| Decoy model | GDT_TS | MQAPRank | MUfoldQA_C | Davis-consensus | ModFOLD6_cor | MUfoldQA_S |
|---|---|---|---|---|---|---|
| T0912TS005_1 |  |  | 32.58 | 25.02 | 32.24 | 34.87 |
| T0912TS220_1 |  |  | 34.96 | 26.41 | 34.37 | 36.12 |
| T0912TS005_3 |  |  | 33.20 | 25.49 | 32.75 | 35.14 |
| T0912TS005_4 |  |  | 31.95 | 24.55 | 32.00 | 34.32 |
| T0912TS479_1 |  | 43.79 |  |  |  |  |
| T0912TS005_5 | 43.85 | 47.40 | 33.32 | 25.56 | 33.01 | 35.65 |
| T0912TS005_2 | 43.77 |  | 32.24 | 24.88 | 32.63 | 34.62 |
| T0912TS479_4 | 42.47 | 42.06 |  |  |  |  |
| T0912TS183_4 | 41.31 | 41.76 |  |  |  |  |
| T0912TS287_1 | 40.79 | 41.63 | 37.45 |  |  | 38.94 |
| T0912TS357_2 | 40.63 | 40.36 |  | 28.28 | 33.78 |  |
| T0912TS236_1 | 40.63 | 41.52 | 37.47 |  |  | 38.99 |
| T0912TS220_2 | 40.38 | 40.31 | 32.43 | 24.82 | 33.17 | 34.19 |
| T0912TS357_3 | 40.10 | 39.83 | 37.91 | 28.20 | 33.79 |  |
| T0912TS357_1 | 39.94 | 39.93 |  | 28.19 | 33.59 | 39.05 |
Bold values indicate the first five decoy models with the highest GDT_TS score
Performance of MQAPRank and several leading methods on the CASP11 dataset based on GDT_TS score
| Method | Method type | Best 150 Diff | Best 150 MCC | Best 150 AUC | Best 150 Loss | Best 150 mPCC (a) | Best 150 PCC (b) | Sel20 Diff | Sel20 MCC | Sel20 AUC | Sel20 Loss | Sel20 mPCC | Sel20 PCC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MQAPRank | quasi-clustering |  |  |  |  |  |  |  |  | 0.97 | 9.55 | 0.77 | 0.91 |
| MULTICOM-REFINE | clustering | 6.06 | 0.87 |  | 7.62 | 0.68 | 0.94 | 7.99 | 0.61 |  | 5.20 | 0.90 | 0.92 |
| DAVIS-QAconsensus | clustering | 6.17 | 0.87 | 0.98 | 7.74 | 0.68 | 0.94 | 7.33 | 0.62 |  | 5.51 | 0.90 |  |
| Pcons-net | clustering | 7.50 | 0.81 | 0.98 | 5.28 | 0.71 | 0.94 | 9.08 | 0.57 |  |  | 0.91 | 0.93 |
| MULTICOM-CLUSTER | single | 13.2 | 0.66 | 0.91 | 7.06 | 0.43 | 0.79 | 12.4 | 0.62 | 0.92 | 9.47 | 0.71 | 0.82 |
| MQAPsingleA | quasi-single | 13.8 | 0.60 | 0.90 | 8.95 | 0.65 | 0.75 | 9.66 | 0.68 | 0.95 | 3.64 |  | 0.88 |
(a) mPCC: the mean, over target proteins, of Pearson's correlation coefficient between the predicted and GDT_TS scores for each target
(b) PCC: Pearson's correlation coefficient between the predicted and GDT_TS scores over all models. Bold values indicate the highest performance on the corresponding evaluation metric
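The difference between mPCC and PCC is easy to miss, so here is a small Python sketch of the two as defined in the footnotes above. It is an illustration only: the helper names `pearson`, `mpcc` and `pcc` and the dictionary layout (target name mapped to a pair of predicted and true score lists) are assumptions, and the two targets contain made-up toy scores rather than CASP11 data.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sx * sy)

def mpcc(targets):
    """Mean of the per-target correlations (one coefficient per target, then averaged)."""
    return sum(pearson(p, t) for p, t in targets.values()) / len(targets)

def pcc(targets):
    """Single correlation computed over the models of all targets pooled together."""
    pooled_pred = [v for p, _ in targets.values() for v in p]
    pooled_true = [v for _, t in targets.values() for v in t]
    return pearson(pooled_pred, pooled_true)

# Hypothetical toy data: target name -> (predicted scores, true GDT_TS scores)
toy = {
    "target_A": ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),  # perfectly correlated
    "target_B": ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),  # perfectly anti-correlated
}
print(mpcc(toy))  # per-target coefficients are +1 and -1, so they average to 0
print(pcc(toy))   # pooling the six models gives a different value
```

The toy case shows why both numbers are reported: a method can rank models well within each target (high mPCC) while being poorly calibrated across targets (lower PCC), or the reverse.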
Performance of MQAPRank on the 3DRobot dataset based on GDT_TS score
| Method | Method type | Diff | MCC | AUC | Loss | mPCC | PCC |
|---|---|---|---|---|---|---|---|
| MQAPRank | quasi-clustering |  |  |  |  |  |  |
| RFMQA | single | 9.73 | 0.74 | 0.96 | 1.70 | 0.92 | 0.87 |
| ModFOLDclust2 | clustering | 11.42 | 0.80 |  | 7.51 | 0.95 | 0.90 |
| Pcons | clustering | 25.12 | 0.17 |  | 5.19 | 0.96 | 0.90 |
Bold values indicate the highest performance on the corresponding evaluation metric