Wenbo Wang¹, Junlin Wang¹, Zhaoyu Li¹, Dong Xu¹˒², Yi Shang¹
Abstract
Protein tertiary structure prediction is an active research area that has attracted significant attention recently owing to the success of DeepMind's AlphaFold. Methods that can accurately evaluate the quality of predicted models are of great importance. Although many model quality assessment (QA) methods have been developed, their accuracy is not consistently high across different QA performance metrics and diverse target proteins. In this paper, we propose MUfoldQA_G, a new multi-model QA method that aims to simultaneously optimize two commonly used QA performance metrics: Pearson correlation and average GDT-TS difference. The method builds on two new algorithms, MUfoldQA_Gp and MUfoldQA_Gr. MUfoldQA_Gp uses a new technique that combines information from protein templates and reference protein models to maximize the Pearson correlation. MUfoldQA_Gr employs a new machine learning technique that resamples the training data and retrains adaptively to learn a consensus model that outperforms naïve consensus while minimizing the average GDT-TS difference. MUfoldQA_G then combines the results of MUfoldQA_Gr and MUfoldQA_Gp so that the final predictions achieve an average GDT-TS difference close to that of MUfoldQA_Gr while keeping the same high Pearson correlation as MUfoldQA_Gp. In the CASP14 QA categories, MUfoldQA_G ranked first in Pearson correlation and second in average GDT-TS difference.
Keywords: Multi-model QA methods; Protein model quality assessment; Protein structure prediction
Year: 2021 PMID: 34900138 PMCID: PMC8636996 DOI: 10.1016/j.csbj.2021.11.021
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
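The abstract names two QA metrics that MUfoldQA_G tries to optimize at once. The sketch below shows one way to compute them, assuming (as the CASP14 table captions suggest) that each metric is computed per target over that target's models and then averaged over all targets, and that the GDT-TS difference is the mean absolute difference between predicted and observed GDT-TS on a 0 to 1 scale. The function names and the `targets` dictionary layout are hypothetical, not taken from the paper.

```python
import math
from statistics import mean


def pearson(pred, obs):
    """Pearson correlation between predicted scores and observed GDT-TS for one target.

    Raises ZeroDivisionError if either list is constant (correlation undefined).
    """
    mp, mo = mean(pred), mean(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)


def gdt_ts_difference(pred, obs):
    """Assumed definition: mean |predicted - observed| GDT-TS over one target's models."""
    return mean(abs(p - o) for p, o in zip(pred, obs))


def evaluate(targets):
    """Average both metrics over targets; `targets` maps target ID -> (pred, obs)."""
    pearsons = [pearson(p, o) for p, o in targets.values()]
    diffs = [gdt_ts_difference(p, o) for p, o in targets.values()]
    return mean(pearsons), mean(diffs)


# Toy data for one hypothetical target: predicted scores vs. observed GDT-TS.
toy = {"target_A": ([0.5, 0.6, 0.7], [0.4, 0.6, 0.8])}
print(evaluate(toy))
```

A high Pearson correlation only says the predicted scores rank and space the models like the true GDT-TS values; a low GDT-TS difference additionally requires the absolute values to be right, which is why the two metrics can disagree.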
Fig. 1. An illustration of a multi-criteria performance comparison of scores generated by three different QA methods.
Fig. 2. Running time of MUfoldQA_Gr on CASP12 targets (Intel(R) Xeon(R) Gold 6140 CPU; MATLAB R2019b on Linux).
Fig. 3. An illustration of how MUfoldQA_G merges two sets of predictions. (A) A small artificial dataset that shows intuitively how the merging process works. (B) MUfoldQA_G transforming the results of MUfoldQA_Gp and MUfoldQA_Gr on the real-world target T1019s1; the x-axis is the true GDT-TS value and the y-axis is the predicted score. (B1) Results from MUfoldQA_Gp. (B2) Results from MUfoldQA_Gr. (B3) Results from MUfoldQA_G, computed from the results of MUfoldQA_Gp and MUfoldQA_Gr.
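The exact merging rule is not given in this excerpt. The sketch below is one hypothetical rule consistent with the reported behavior, where MUfoldQA_G keeps exactly the Pearson correlation of MUfoldQA_Gp (0.8401 on CASP12, 0.8938 on CASP13) while moving its absolute scores toward MUfoldQA_Gr: a per-target positive affine rescaling of the Gp scores onto the mean and standard deviation of the Gr scores. Pearson correlation is unchanged by a positive affine transformation, so the Gp correlation is preserved. This is an assumption, not the authors' algorithm, and `merge_scores` is a hypothetical name.

```python
from statistics import mean, pstdev


def merge_scores(gp_scores, gr_scores):
    """Hypothetical merge rule for one target (not the paper's published algorithm).

    Maps the Gp scores onto the mean and standard deviation of the Gr scores with a
    non-negative affine transform, so the Gp ranking is never reversed and, when the
    scale is positive, the Pearson correlation of the Gp scores is unchanged.
    """
    if len(gp_scores) != len(gr_scores) or not gp_scores:
        raise ValueError("need two non-empty score lists of equal length")
    mu_p, sd_p = mean(gp_scores), pstdev(gp_scores)
    mu_r, sd_r = mean(gr_scores), pstdev(gr_scores)
    if sd_p == 0:
        # Constant Gp scores carry no ranking signal; fall back to the Gr mean.
        return [mu_r] * len(gp_scores)
    scale = sd_r / sd_p  # zero only if all Gr scores are equal
    return [mu_r + scale * (s - mu_p) for s in gp_scores]


gp = [0.20, 0.40, 0.90]  # toy Gp-style scores for one target
gr = [0.50, 0.55, 0.70]  # toy Gr-style scores for the same models
print(merge_scores(gp, gr))
```

Matching only the first two moments is the simplest choice that leaves the correlation untouched; a real merging rule could use richer information from MUfoldQA_Gr as long as it stays monotone and linear in the Gp scores.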
Fig. 4. Performance comparison between MUfoldQA_Gr and MUfoldQA_G in terms of average GDT-TS difference.
MUfoldQA_Gr pretraining cross-validation results measured in RMSE.
| Test set | Training error, Consensus (RMSE ×100) | Training error, MUfoldQA_Gr Pre (RMSE ×100) | Test error, Consensus (RMSE ×100) | Test error, MUfoldQA_Gr Pre (RMSE ×100) |
|---|---|---|---|---|
| CASP5 | 9.84 | 15.22 | ||
| CASP6 | 10.92 | 9.54 | ||
| CASP7 | 10.90 | 7.49 | ||
| CASP8 | 10.37 | 11.95 | ||
| CASP9 | 10.80 | 9.79 | ||
| CASP10 | 10.74 | 9.87 | ||
| CASP11 | 10.76 | 8.28 | ||
| CASP12 | 10.70 | 8.98 | ||
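The table reports RMSE scaled by 100, and its Test set column lists one CASP round per row, which reads like leave-one-CASP-out cross-validation. Under that assumption, the sketch below shows the error measure and the split; `rmse_x100` and `leave_one_casp_out` are illustrative names, and GDT-TS is assumed to be on a 0 to 1 scale.

```python
import math


def rmse_x100(pred, obs):
    """RMSE between predicted and observed GDT-TS (0-1 scale), multiplied by 100 as in the table."""
    if len(pred) != len(obs) or not pred:
        raise ValueError("pred and obs must be non-empty and of equal length")
    mse = sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred)
    return 100 * math.sqrt(mse)


def leave_one_casp_out(rounds):
    """Yield (test_round, training_rounds) so that each CASP round is held out once."""
    for held_out in rounds:
        yield held_out, [r for r in rounds if r != held_out]


rounds = [f"CASP{i}" for i in range(5, 13)]  # CASP5..CASP12, as listed in the table
for test_round, train_rounds in leave_one_casp_out(rounds):
    print(test_round, "trained on", len(train_rounds), "rounds")
```

Scaling by 100 simply turns an error of 0.0984 into the 9.84 shown in the table, so the entries can be read as GDT-TS percentage points.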
Performance comparison between Naïve Consensus, MUfoldQA_Gr, MUfoldQA_Gp, and MUfoldQA_G on CASP12 dataset.
| Method | Average GDT-TS Difference | Pearson Correlation |
|---|---|---|
| Naïve Consensus | 0.06222 | 0.7899 |
| MUfoldQA_Gr | 0.04930 | 0.8183 |
| MUfoldQA_Gp | 0.05520 | 0.8401 |
| MUfoldQA_G | 0.04948 | 0.8401 |
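The Naïve Consensus baseline in this table is not defined in this excerpt. A common definition in multi-model QA scores each model by its average structural similarity to all other models submitted for the same target. The sketch below assumes that definition and a precomputed symmetric pairwise similarity matrix (for example, pairwise GDT-TS); it is illustrative and not necessarily the paper's exact baseline.

```python
from typing import Sequence


def naive_consensus(similarity: Sequence[Sequence[float]]) -> list[float]:
    """Score model i by its mean similarity to every other model of the same target.

    `similarity[i][j]` is assumed to be a precomputed structural similarity
    (e.g. pairwise GDT-TS on a 0-1 scale); the diagonal is ignored.
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("consensus needs at least two models")
    return [
        sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


sim = [
    [1.0, 0.8, 0.4],
    [0.8, 1.0, 0.6],
    [0.4, 0.6, 1.0],
]
print(naive_consensus(sim))  # the model at index 1 agrees most with the others
```

This baseline rewards models that resemble the majority, which works well when most submitted models are close to the native structure but misleads when the majority shares the same error.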
Fig. 5. Performance comparison between MUfoldQA_G and other top QA methods, including DeepFold-Boom, ModFOLD6_cor, and Wallner.
Performance comparison between Naïve Consensus, MUfoldQA_Gr, MUfoldQA_Gp, and MUfoldQA_G on CASP13 dataset.
| Method | Average GDT-TS Difference | Pearson Correlation |
|---|---|---|
| Naïve Consensus | 0.07365 | 0.8792 |
| MUfoldQA_Gr | 0.05677 | 0.8818 |
| MUfoldQA_Gp | 0.05837 | 0.8938 |
| MUfoldQA_G | 0.05760 | 0.8938 |
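A quick arithmetic check on the CASP12 and CASP13 tables makes the trade-off concrete: how much MUfoldQA_G lowers the average GDT-TS difference (AGD) relative to naïve consensus, and how far it stays above MUfoldQA_Gr. The values are copied from the two tables; the `relative_reduction` helper is an illustrative name.

```python
# Average GDT-TS difference values copied from the CASP12 and CASP13 tables.
AGD = {
    "CASP12": {"Naive Consensus": 0.06222, "MUfoldQA_Gr": 0.04930,
               "MUfoldQA_Gp": 0.05520, "MUfoldQA_G": 0.04948},
    "CASP13": {"Naive Consensus": 0.07365, "MUfoldQA_Gr": 0.05677,
               "MUfoldQA_Gp": 0.05837, "MUfoldQA_G": 0.05760},
}


def relative_reduction(baseline: float, method: float) -> float:
    """Fractional reduction in AGD relative to a baseline (lower AGD is better)."""
    return (baseline - method) / baseline


for casp, row in AGD.items():
    r = relative_reduction(row["Naive Consensus"], row["MUfoldQA_G"])
    gap = row["MUfoldQA_G"] - row["MUfoldQA_Gr"]
    print(f"{casp}: AGD {r:.1%} lower than naive consensus; {gap:.5f} above MUfoldQA_Gr")
```

This gives reductions of roughly 20.5% (CASP12) and 21.8% (CASP13), with MUfoldQA_G within 0.00018 and 0.00083 of MUfoldQA_Gr, respectively, while its Pearson correlation equals that of MUfoldQA_Gp in both tables.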
Pearson correlation coefficient between predicted and observed model quality scores in CASP14, averaged over all targets (top 20 groups).
| Ranking | Group No | Group Name | Pearson | Sample Size |
|---|---|---|---|---|
| 1 | QA446 | MUfoldQA_G | 0.7460 | 67 |
| 2 | QA433 | DAVIS-EMAconsensus | 0.7426 | 67 |
| 3 | QA263 | DAVIS-EMAconsensusAL | 0.7392 | 67 |
| 4 | QA075 | MULTICOM-CLUSTER | 0.7313 | 67 |
| 5 | QA035 | ModFOLDclust2 | 0.7310 | 67 |
| 6 | QA214 | MESHI_consensus | 0.7279 | 66 |
| 7 | QA032 | MESHI | 0.7276 | 65 |
| 8 | QA216 | EMAP_CHAE | 0.7218 | 67 |
| 9 | QA149 | Bhattacharya-Server | 0.7046 | 67 |
| 10 | QA460 | Yang_TBM | 0.7029 | 67 |
| 11 | QA198 | MULTICOM-CONSTRUCT | 0.6962 | 67 |
| 12 | QA140 | Yang-Server | 0.6894 | 67 |
| 13 | QA187 | MULTICOM-HYBRID | 0.6851 | 67 |
| 14 | QA379 | Wallner | 0.6785 | 67 |
| 15 | QA409 | UOSHAN | 0.6652 | 67 |
| 16 | QA275 | MULTICOM-AI | 0.6557 | 67 |
| 17 | QA167 | ModFOLD8 | 0.6185 | 67 |
| 18 | QA209 | BAKER-ROSETTASERVER | 0.6107 | 67 |
| 19 | QA183 | tFold-CaT | 0.6009 | 67 |
| 20 | QA024 | DeepPotential | 0.5810 | 66 |
*Seder2020 and Seder2020hard each submitted only one prediction, whereas every other group submitted at least 65, so including them would make the comparison unfair. We therefore removed these two groups from the ranking.
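The footnote describes excluding groups with too few predictions before ranking. A minimal sketch of that step, assuming a simple minimum-coverage threshold (the `min_targets` value and the `GroupResult` type are hypothetical), sorts the remaining groups by mean Pearson in descending order. The three rows used here are copied from the table above.

```python
from dataclasses import dataclass


@dataclass
class GroupResult:
    name: str
    mean_pearson: float  # Pearson averaged over the targets the group predicted
    n_targets: int       # the table's "Sample Size" column


def rank_groups(results, min_targets):
    """Drop groups that covered too few targets, then sort by mean Pearson (higher is better)."""
    eligible = [r for r in results if r.n_targets >= min_targets]
    return sorted(eligible, key=lambda r: r.mean_pearson, reverse=True)


top = [
    GroupResult("DAVIS-EMAconsensus", 0.7426, 67),
    GroupResult("MESHI", 0.7276, 65),
    GroupResult("MUfoldQA_G", 0.7460, 67),
]
for rank, r in enumerate(rank_groups(top, min_targets=65), start=1):
    print(rank, r.name, r.mean_pearson, r.n_targets)
```

For the average GDT-TS difference table the same filter applies, but the sort would be ascending (drop `reverse=True`), because a smaller difference is better.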
Average GDT-TS difference (AGD) between predicted and observed values in CASP14, averaged over all targets (top 20 groups).
| Ranking | Group No | Group Name | AGD (×100) |
|---|---|---|---|
| 1 | QA433_2 | DAVIS-EMAconsensus | 6.737 |
| 2 | QA446_2 | MUfoldQA_G | 7.233 |
| 3 | QA214_2 | MESHI_consensus | 7.240 |
| 4 | QA032_2 | MESHI | 7.254 |
| 5 | QA035_2 | ModFOLDclust2 | 7.358 |
| 6 | QA216_2 | EMAP_CHAE | 7.396 |
| 7 | QA460_2 | Yang_TBM | 8.044 |
| 8 | QA409_2 | UOSHAN | 8.365 |
| 9 | QA140_2 | Yang-Server | 8.553 |
| 10 | QA075_2 | MULTICOM-CLUSTER | 8.886 |
| 11 | QA263_2 | DAVIS-EMAconsensusAL | 9.230 |
| 12 | QA198_2 | MULTICOM-CONSTRUCT | 9.240 |
| 13 | QA379_2 | Wallner | 9.993 |
| 14 | QA187_2 | MULTICOM-HYBRID | 10.573 |
| 15 | QA275_2 | MULTICOM-AI | 11.100 |
| 16 | QA257_2 | P3De | 12.020 |
| 17 | QA073_2 | RaptorX-QA | 12.060 |
| 18 | QA024_2 | DeepPotential | 12.239 |
| 19 | QA081_2 | MUFOLD | 12.557 |
| 20 | QA209_2 | BAKER-ROSETTASERVER | 12.682 |