Jianquan Ouyang, Ningqiao Huang, Yunqi Jiang.
Abstract
BACKGROUND: Quality assessment of predicted protein tertiary structure models, in which the best-quality structures are selected from a pool of decoys, is a major challenge in protein structure prediction and is crucial for determining a model's utility and potential applications. Single-model quality assessment estimates a model's quality from that model alone. In general, the Pearson correlation achieved by a quality assessment method increases with the quality of the model pool. However, there is no consensus on the best method for selecting a few good models from a poor-quality model pool.
Keywords: Linear combination; Poor quality protein structural; Protein model quality assessment; Protein structure ranking
Year: 2020 PMID: 32334508 PMCID: PMC7183596 DOI: 10.1186/s12859-020-3499-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison results of various QA methods from Stage 1
| QA methods | Corr. on stage 1 | Top 1 GDT-TS on stage 1 / Z-score sum | Best model GDT-TS on stage 1 / Z-score sum |
|---|---|---|---|
| Ours (Random Search) | 0.42 | 27.02/59.62 | 29.43/89.74 |
| Ours (Linear Regression) | 0.42 | 27.42/59.62 | 29.18/89.74 |
| Ours (Random Forest) | 0.32 | 22.66/14.27 | 26.65/58.09 |
| Ours (Multilayer Perceptron) | 0.37 | 25.24/45.25 | 29.15/85.34 |
| DOPE | 0.21 | 23.06/22.11 | 26.53/51.61 |
| GOAP | 0.21 | 23.48/23.79 | 27.09/63.87 |
| ProQ4 | 0.22 | 23.45/22.58 | 27.38/61.14 |
| ProQ3D | 0.22 | 23.88/33.54 | 26.87/64.30 |
| DeepQA | 0.21 | 22.65/14.88 | 26.07/51.27 |
The Z-score for a target is calculated from the GDT-TS of the selected model and the GDT-TS of all models of that target. The Z-score sum is the sum of these Z-scores over all targets.
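The Z-score sum described in the note above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the conventional definition (selected score minus the pool mean, divided by the pool standard deviation), uses the population standard deviation, and treats a zero-variance pool as contributing 0. The function names and the input layout are hypothetical.

```python
from statistics import mean, pstdev


def z_score(selected_gdt, pool_gdt):
    """Z-score of the selected model's GDT-TS relative to all models of one target.

    Assumption: population standard deviation; the source does not say which
    estimator was used.
    """
    mu = mean(pool_gdt)
    sigma = pstdev(pool_gdt)
    if sigma == 0:
        # All models of the target have the same GDT-TS, so no model stands out.
        return 0.0
    return (selected_gdt - mu) / sigma


def z_score_sum(targets):
    """Sum the per-target Z-scores.

    `targets` maps a target name to a pair:
    (GDT-TS of the selected model, list of GDT-TS values of all models).
    """
    return sum(z_score(selected, pool) for selected, pool in targets.values())
```

A QA method that always picks an average model scores a sum near 0, whereas consistently picking models well above the pool mean gives a large positive sum.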
Performance of various QA methods measured by GDT-TS and TM-scores (CASP13 FM domains, poor quality dataset)
| QA methods | Corr. TM-score on stage 2 | Corr. GDT-TS on stage 2 | Top 1 TM-score on stage 2 / Z-score sum | Top 1 GDT-TS on stage 2 / Z-score sum | Best model TM-score on stage 2 / Z-score sum | Best model GDT-TS on stage 2 / Z-score sum |
|---|---|---|---|---|---|---|
| Ours | 0.79 | 0.80 | 44.91/45.43 | 39.58/49.17 | 51.33/66.94 | 44.16/66.47 |
| DOPE | 0.48 | 0.48 | 40.05/29.04 | 34.67/31.73 | 44.27/42.28 | 38.32/44.79 |
| GOAP | 0.40 | 0.42 | 33.70/11.66 | 28.94/11.87 | 42.77/39.61 | 36.89/40.07 |
| ProQ4 | 0.47 | 0.43 | 32.70/13.74 | 27.87/12.80 | 43.93/45.57 | 37.70/45.93 |
| ProQ3D | 0.61 | 0.62 | 42.26/41.89 | 36.52/45.08 | 48.94/61.72 | 42.39/64.74 |
| DeepQA | 0.55 | 0.55 | 34.23/13.92 | 29.33/16.21 | 47.05/55.47 | 40.22/56.76 |
The Z-score for a target is calculated from the GDT-TS of the selected model and the GDT-TS of all models of that target. The Z-score sum is the sum of these Z-scores over all targets.
Performance of various QA methods measured by GDT-TS and TM-scores (CASP13 TBM-hard domains, poor quality dataset)
| QA methods | Corr. TM-score on stage 2 | Corr. GDT-TS on stage 2 | Top 1 TM-score on stage 2 / Z-score sum | Top 1 GDT-TS on stage 2 / Z-score sum | Best model TM-score on stage 2 / Z-score sum | Best model GDT-TS on stage 2 / Z-score sum |
|---|---|---|---|---|---|---|
| Ours | 0.72 | 0.72 | 49.22/22.93 | 41.10/21.45 | 55.88/28.86 | 45.16/28.66 |
| DOPE | 0.42 | 0.41 | 47.31/13.09 | 38.25/15.03 | 53.25/21.36 | 43.05/22.84 |
| GOAP | 0.34 | 0.36 | 45.51/12.67 | 37.00/14.07 | 50.89/20.15 | 41.12/20.78 |
| ProQ4 | 0.44 | 0.48 | 36.26/1.39 | 27.76/0.99 | 41.25/21.85 | 53.14/23.34 |
| ProQ3D | 0.68 | 0.68 | 52.92/20.78 | 41.27/20.22 | 56.02/24.51 | 43.90/24.01 |
| DeepQA | 0.46 | 0.47 | 39.76/0.61 | 29.35/0.14 | 47.77/11.68 | 31.16/11.64 |
The Z-score for a target is calculated from the GDT-TS of the selected model and the GDT-TS of all models of that target. The Z-score sum is the sum of these Z-scores over all targets.
Six features of our method
| Feature Name | Descriptions |
|---|---|
| DOPE score | A statistical potential score for assessment and prediction of protein structures. |
| GOAP score | A generalized orientation-dependent, all-atom statistical potential score. |
| Secondary structure penalty score for helix | The ratio of differences between the alpha-helix predicted from the amino acid sequence and that of the model as parsed by DSSP. |
| Secondary structure penalty score for strand | The ratio of differences between the beta-strand predicted from the amino acid sequence and that of the model as parsed by DSSP. |
| Solvent accessibility penalty score | The difference between the predicted solvent accessibility and that of the model as parsed by DSSP. |
| Contact penalty score | The percentage of predicted contacts that match the model structure. |
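Given the "Linear combination" keyword and the six features listed above, one plausible reading is that each model is scored by a weighted sum of its features and the models of a target are ranked by that sum. The sketch below illustrates this under stated assumptions: every feature has already been normalized so that higher means better, and the feature keys, function names, and weights are hypothetical placeholders rather than values from the paper.

```python
# Hypothetical keys for the six features in the table above.
FEATURES = [
    "dope",
    "goap",
    "ss_helix_penalty",
    "ss_strand_penalty",
    "solvent_accessibility_penalty",
    "contact_penalty",
]


def combined_score(features, weights):
    """Weighted linear combination of the six feature values of one model.

    Assumption: each feature is pre-normalized so that a larger value
    indicates a better model.
    """
    return sum(weights[name] * features[name] for name in FEATURES)


def rank_models(models, weights):
    """Return model names sorted from best to worst combined score.

    `models` maps a model name to a dict of its six feature values.
    """
    return sorted(
        models,
        key=lambda name: combined_score(models[name], weights),
        reverse=True,
    )
```

The first entry of the ranking is the "Top 1" model whose GDT-TS or TM-score is reported in the comparison tables.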
Fig. 1 GDT-TS range of ab initio predictions made using Rosetta for targets in 31 FM (Free Modelling) domains of CASP13. The range of the generated decoys for each target is represented by a box plot showing, from left to right, the minimum, lower quartile, median, upper quartile, and maximum values
Fig. 2 Correlation matrix of our method’s features obtained using CASP12 data. The true score corresponds to the ground-truth GDT-TS score
Fig. 3 Ranges of the top 50 weight combinations obtained using CASP12 data and random search. The weight distribution of each feature is shown as a box plot in the colour given in the legend, indicating the minimum, lower quartile, median, upper quartile, and maximum values
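The random search behind Fig. 3 could look roughly like the sketch below. It is an illustration under assumptions, not the authors' procedure: weights are drawn uniformly from [0, 1), each candidate is scored by the Pearson correlation between the weighted feature sum and the true GDT-TS, and the best `top_k` candidates are kept (50 in the figure). The objective, sampling range, and all names are hypothetical.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def random_search(feature_rows, true_scores, n_trials=1000, top_k=50, seed=0):
    """Sample random weight vectors and keep the top_k by correlation.

    `feature_rows` holds one list of feature values per model;
    `true_scores` holds the matching ground-truth GDT-TS values.
    Returns a list of (correlation, weights) pairs, best first.
    """
    rng = random.Random(seed)
    n_features = len(feature_rows[0])
    results = []
    for _ in range(n_trials):
        weights = [rng.random() for _ in range(n_features)]
        predicted = [
            sum(w * f for w, f in zip(weights, row)) for row in feature_rows
        ]
        results.append((pearson(predicted, true_scores), weights))
    # Sort on the correlation only, so ties never compare the weight lists.
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:top_k]
```

Plotting the per-feature spread of the returned weight vectors as box plots would give a figure of the same kind as Fig. 3.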