Renzhi Cao, Debswapna Bhattacharya, Jie Hou, Jianlin Cheng.
Abstract
BACKGROUND: Protein quality assessment (QA), which is used to rank and select protein models, has long been viewed as one of the major challenges in protein tertiary structure prediction. In particular, estimating the quality of a single protein model, which is important for selecting a few good models from a large pool consisting mostly of low-quality models, is still a largely unsolved problem.
Keywords: Deep belief network; Machine learning; Protein model quality assessment; Protein structure prediction
Year: 2016 PMID: 27919220 PMCID: PMC5139030 DOI: 10.1186/s12859-016-1405-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
16 features for benchmarking DeepQA
| Feature Name | Feature descriptions |
|---|---|
| (1). Surface score (SU) | The total area of exposed nonpolar residues divided by the total area of all residues |
| (2). Exposed mass score (EM) | The percentage of mass for exposed area, equal to the total mass of exposed area divided by the total mass of all area |
| (3). Exposed surface score (ES) | The total exposed area divided by the total area |
| (4). Solvent accessibility score (SA) | The difference of solvent accessibility predicted by SSpro4 [ |
| (5). RF_CB_SRS_OD score [ | A novel distance dependent residue-level potential energy score. |
| (6). DFIRE2 score [ | A distance-scaled all atom energy score. |
| (7). Dope score [ | A new statistical potential discrete optimized protein energy score. |
| (8). GOAP score [ | A generalized orientation-dependent, all-atom statistical potential score. |
| (9). OPUS score [ | A knowledge-based potential score. |
| (10). ProQ2 score [ | A single-model quality assessment method by machine learning techniques. |
| (11). RWplus score [ | A new energy score using pairwise distance-dependent atomic statistical potential function and side-chain orientation-dependent energy term |
| (12). ModelEvaluator score [ | A single-model quality assessment score based on structural features using support vector machine. |
| (13). Secondary structure similarity score (SS) | The difference of secondary structure information predicted by Spine X [ |
| (14). Secondary structure penalty score (SP) | Calculated from the predicted secondary structure alpha-helix and beta-sheet matching with the one parsed by DSSP. |
| (15). Euclidean compact score (EC) | The pairwise Euclidean distance of all residues divided by the maximum Euclidean distance (3.8) of all residues. |
| (16). Qprob [ | A single-model quality assessment score that utilizes 11 structural and physicochemical features by feature-based probability density functions. |
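The two area-ratio features in the table, SU and ES, follow directly from their descriptions. The sketch below is a minimal illustration only: the input format (one tuple per residue with its exposed and total area) and the `NONPOLAR` residue set are assumptions made for this example, since the source does not specify either.

```python
# Hypothetical nonpolar residue set; the source does not list which residues count.
NONPOLAR = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO", "GLY"}


def surface_features(residues):
    """Compute the SU and ES ratios from per-residue areas.

    residues: list of (name, exposed_area, total_area) tuples (assumed format).
    """
    total_area = sum(total for _, _, total in residues)
    if total_area <= 0:
        raise ValueError("total residue area must be positive")
    exposed_all = sum(exposed for _, exposed, _ in residues)
    exposed_nonpolar = sum(
        exposed for name, exposed, _ in residues if name in NONPOLAR
    )
    return {
        # (1) exposed nonpolar area divided by the total area of all residues
        "SU": exposed_nonpolar / total_area,
        # (3) total exposed area divided by the total area
        "ES": exposed_all / total_area,
    }


# Made-up areas, for illustration only.
example = [("ALA", 20.0, 100.0), ("LYS", 60.0, 200.0), ("LEU", 10.0, 100.0)]
features = surface_features(example)
```

In practice the per-residue areas would come from a solvent-accessibility tool; that step is outside this sketch.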
Fig. 1 The Deep Belief Network architecture for DeepQA
The accuracy of Deep Belief Network (DBN), Support Vector Machine, and Neural Network in terms of Mean Absolute Error (MAE) based on cross validation of the training datasets with 16 features, together with the average per-target correlation and loss on stage 1 and stage 2 of the CASP11 datasets for all three techniques. The P-value gives the significance of the difference between DBN and each of the other two methods
| Technique | MAE based on cross validation | Corr. on stage 1/P-value | Loss on stage 1/P-value | Corr. on stage 2/P-value | Loss on stage 2/P-value |
|---|---|---|---|---|---|
| Deep Belief Network | 0.08 | 0.63/- | 0.09/- | 0.34/- | 0.06/- |
| Support Vector Machine | 0.12 | 0.58/1.97E-01 | 0.10/6.17E-01 | 0.32/4.45E-04 | 0.07/7.41E-01 |
| Neural Network | 0.08 | 0.51/9.74E-04 | 0.12/8.35E-02 | 0.25/1.05E-05 | 0.07/1.19E-01 |
| Mean | 0.09 | 0.57/9.88E-02 | 0.10/3.50E-01 | 0.30/2.28E-04 | 0.07/4.30E-01 |
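The three quantities reported in this table can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes Pearson correlation for the per-target correlation, and it assumes loss is the gap between the true quality of the best model in a target's pool and the true quality of the model ranked first by the predictor. The function names and the toy data are hypothetical.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true quality scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def selection_loss(predicted, actual):
    """True score of the best model minus true score of the top-ranked model."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(actual) - actual[top]


def per_target_average(targets):
    """targets maps a target id to (predicted_scores, true_scores)."""
    corrs = [pearson(p, a) for p, a in targets.values()]
    losses = [selection_loss(p, a) for p, a in targets.values()]
    return sum(corrs) / len(corrs), sum(losses) / len(losses)


# Toy data for two hypothetical targets, three models each.
targets = {
    "T1": ([0.2, 0.5, 0.9], [0.3, 0.4, 0.8]),
    "T2": ([0.6, 0.4, 0.1], [0.5, 0.7, 0.2]),
}
avg_corr, avg_loss = per_target_average(targets)
```

Averaging per target, rather than pooling all models, keeps easy targets with many good models from dominating the result.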
Average per-target correlation and loss for DeepQA and other top performing single-model QA methods on CASP11. The table is ranked based on the average per-target loss on stage two of CASP11. P-value of Wilcoxon signed ranked sum test* between DeepQA and other methods is also included in the table
| QA methods | Corr. on stage 1/P-value | Loss on stage 1/P-value | Corr. on stage 2/P-value | Loss on stage 2/P-value |
|---|---|---|---|---|
| DeepQA | 0.64/- | 0.09/- | 0.42/- | 0.06/- |
| ProQ2 | 0.64/4.80E-01 | 0.09/8.32E-01 | 0.37/2.84E-03 | 0.06/9.95E-01 |
| Qprob | 0.63/8.08E-01 | 0.10/9.38E-01 | 0.38/8.63E-03 | 0.07/7.12E-01 |
| VoroMQA | 0.56/1.60E-04 | 0.11/2.73E-01 | 0.40/2.57E-01 | 0.07/9.14E-01 |
| ProQ2-refine | 0.65/6.08E-02 | 0.09/9.17E-01 | 0.37/4.71E-03 | 0.07/4.86E-01 |
| Wang_SVM | 0.66/5.49E-02 | 0.11/7.98E-02 | 0.36/1.54E-02 | 0.09/4.91E-02 |
| raghavagps-qaspro | 0.35/3.79E-13 | 0.16/1.87E-04 | 0.22/1.92E-10 | 0.09/1.02E-03 |
| Wang_deep_2 | 0.63/9.98E-01 | 0.12/7.18E-02 | 0.31/2.16E-06 | 0.09/8.22E-03 |
| Wang_deep_1 | 0.61/3.06E-01 | 0.13/1.64E-03 | 0.30/5.93E-06 | 0.09/5.00E-03 |
| Wang_deep_3 | 0.63/7.18E-02 | 0.12/3.15E-02 | 0.30/8.22E-03 | 0.09/8.22E-03 |
| FUSION | 0.10/8.43E-14 | 0.15/9.78E-04 | 0.05/1.81E-13 | 0.11/2.83E-07 |
| RFMQA | 0.54/1.61E-01 | 0.12/8.74E-01 | 0.29/3.80E-03 | 0.08/3.80E-03 |
| ProQ3 | 0.65/1.62E-01 | 0.07/3.60E-02 | 0.38/4.44E-01 | 0.06/4.09E-01 |
| ResQ* | 0.67/- | 0.05/- | 0.58/- | 0.09/- |
| ModFOLDclust2 | 0.74/3.96E-05 | 0.05/6.34E-04 | 0.56/1.80E-03 | 0.07/1.41E-01 |
| Mean | 0.57 | 0.11 | 0.33 | 0.08 |
* The Wilcoxon signed ranked sum test was performed on the correlation and loss of targets between each method against DeepQA
* ResQ was evaluated on 54 targets in CASP11; the local quality scores were converted into a global quality score by equation . More detailed results can be found in Additional file 1: Table S4
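The paired comparison behind the P-values in this table can be sketched in plain Python. This is a minimal illustration that assumes a standard two-sided Wilcoxon signed-rank test with a normal approximation and no tie or continuity correction; the source does not state which variant or software was used, so values computed this way need not match the table exactly. The function name and inputs are hypothetical.

```python
import math


def wilcoxon_signed_rank(xs, ys):
    """Return (W+, W-, approximate two-sided P-value) for paired samples.

    xs and ys are per-target values (for example correlation or loss)
    of two methods, listed in the same target order.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Normal approximation to the null distribution of the rank sum.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_value


# Toy paired values for five hypothetical targets.
w_plus, w_minus, p_value = wilcoxon_signed_rank([5, 7, 9, 4, 8], [4, 5, 6, 0, 3])
```

For small numbers of targets an exact test is preferable to the normal approximation; a statistics library would normally handle that choice.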
Model selection ability on ab initio datasets for DeepQA, ProQ2, Dope2, and RWplus score based on TM-score and RMSD, and their summation of Z-score
| QA methods | TM-score on top 1 model/SUM Z-score (>0.0) | RMSD on top 1 model/SUM Z-score (<0.0) | TM-score on best of top 5/SUM Z-score (>0.0) | RMSD on best of top 5/SUM Z-score (<0.0) |
|---|---|---|---|---|
| DeepQA | 0.23/0.86 | 19.01/-0.76 | 0.26/1.78 | 17.14/-1.52 |
| ProQ2 | 0.22/0.40 | 19.73/-0.37 | 0.25/1.28 | 17.93/-1.04 |
| Dope | 0.22/0.49 | 19.55/-0.51 | 0.24/1.13 | 18.10/-1.00 |
| RWplus | 0.22/0.53 | 19.68/-0.35 | 0.25/1.49 | 17.38/-1.41 |
| Mean | 0.22/0.68 | 19.49/-0.64 | 0.25/1.46 | 17.64/-1.26 |
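The summed Z-score columns can be read as follows; this sketch is an assumption about the procedure, since the source gives only the column headers. It assumes that, for each target, the selected model's score is standardized against all models in that target's pool, and that only Z-scores above 0.0 (TM-score, higher is better) or below 0.0 (RMSD, lower is better) are summed. The function names are hypothetical.

```python
import statistics


def selected_z_score(pool_scores, selected_index):
    """Z-score of the selected model relative to its target's model pool."""
    mean = statistics.fmean(pool_scores)
    sd = statistics.pstdev(pool_scores)  # population standard deviation
    if sd == 0:
        return 0.0  # every model has the same score, so selection cannot help
    return (pool_scores[selected_index] - mean) / sd


def summed_z(z_scores, higher_is_better=True):
    """Sum only the Z-scores on the favourable side of 0.0."""
    if higher_is_better:
        return sum(z for z in z_scores if z > 0.0)
    return sum(z for z in z_scores if z < 0.0)


# Toy pool of TM-scores for one hypothetical target; model 2 was selected.
z = selected_z_score([0.1, 0.2, 0.3], 2)
tm_sum = summed_z([1.0, -0.5, 0.25], higher_is_better=True)
rmsd_sum = summed_z([1.0, -0.5, 0.25], higher_is_better=False)
```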
Average per-target correlation and loss on stage 1 and stage 2 for DeepQA and its training features on CASP11. The significance of the difference between DeepQA and each individual feature was assessed by the Wilcoxon signed ranked sum paired t-test*, and the P-value is included to represent the improvement of DeepQA over its input features
| QA methods | Corr. on stage 1/P-value | Loss on stage 1/P-value | Corr. on stage 2/P-value | Loss on stage 2/P-value |
|---|---|---|---|---|
| DeepQA | 0.64/- | 0.09/- | 0.42/- | 0.06/- |
| Dope | 0.54/1.77E-06 | 0.11/0.0421 | 0.30/4.63E-10 | 0.08/2.76E-01 |
| EC score | 0.37/4.29E-11 | 0.18/5.71E-07 | 0.02/3.23E-14 | 0.14/2.08E-10 |
| GOAP score | 0.54/2.74E-05 | 0.13/0.0016 | 0.31/5.07E-07 | 0.07/1.06E-01 |
| ModelEvaluator score | 0.56/0.0001 | 0.10/0.2160 | 0.28/1.87E-09 | 0.08/1.99E-02 |
| OPUS score | 0.43/2.14E-11 | 0.12/0.0588 | 0.30/4.53E-09 | 0.08/3.54E-01 |
| Qprob score | 0.63/0.8080 | 0.09/0.9382 | 0.38/8.63E-03 | 0.06/7.12E-01 |
| RWplus score | 0.54/4.80E-06 | 0.14/0.0009 | 0.30/9.41E-09 | 0.08/4.49E-02 |
| SP score | 0.47/3.07E-10 | 0.14/0.0067 | 0.26/6.17E-10 | 0.10/1.10E-05 |
| SU score | 0.50/3.78E-09 | 0.18/4.94E-07 | 0.19/6.34E-11 | 0.11/3.95E-07 |
| Mean | 0.52/0.09 | 0.13/0.14 | 0.27/0.00 | 0.09/0.17 |
* The Wilcoxon signed ranked sum paired t-test was performed on the correlation and loss of targets between each feature against DeepQA