Renzhi Cao1, Debswapna Bhattacharya1, Badri Adhikari1, Jilong Li1, Jianlin Cheng2.
Abstract
MOTIVATION: Sampling structural models and ranking them are the two major challenges of protein structure prediction. Traditional protein structure prediction methods generally use one or a few quality assessment (QA) methods to select the best-predicted models, an approach that cannot consistently select the better models or rank a large number of models well.
Year: 2015 PMID: 26072473 PMCID: PMC4553833 DOI: 10.1093/bioinformatics/btv235
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Details of all 14 QA methods
| Methods | Type | Features |
|---|---|---|
| | Single | Structural, physical, chemical features |
| OPUS-PSP | S | Contact potentials based on side chain functional groups |
| ProQ2 | S | Structural features |
| RWplus | S | Side-chain orientation dependent potential |
| | S | Structural features, contacts |
| | S | Structural features, contacts, disorder, conservation |
| RF_CB_SRS | S | Distance dependent statistical potential |
| SELECTpro | S | Energy-based (h-bond, angle, electrostatics, vdw) |
| Dope | S | Statistical potential |
| DFIRE2 | S | Energy-based potential |
| ModFOLDclust2 | Multi | Pairwise model similarity (geometry) |
| | M | Pairwise model similarity |
| Pcons | M | Pairwise model similarity |
| | M+S | Weighted pairwise model similarity |
| MULTICOM (human) | Consensus | Average ranking |
The highlighted methods were developed in-house. S: single-model method; M: multi-model method.
Fig. 1. The workflow of the MULTICOM method, which comprises six steps. (1) A pool of tertiary structure models is predicted for a target protein. (2) Models are scored and ranked by different QA methods. (3) Models are clustered into groups based on structural similarity. (4) The consensus of the individual QA rankings and other information are synthesized to generate the final ranking of all the models. (5) The final ranking and the clustering results are integrated to select the top five diverse models for submission. (6) The top five models are combined to generate five refined models to be submitted to CASP11.
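Steps (2), (4) and (5) of the workflow can be illustrated with a minimal Python sketch. It assumes that the consensus is a plain average of per-method ranks (the "average ranking" listed for MULTICOM in the QA method table) and that diversity is enforced by taking at most one model per structural cluster before filling any remaining slots. The function names, the input layout and the tie-breaking rule are hypothetical and are not taken from the MULTICOM implementation, which also synthesizes "other information" that is not modelled here.

```python
from statistics import mean


def average_rank_consensus(scores_by_method):
    """Rank models by their average rank across QA methods.

    scores_by_method maps a QA method name to {model_name: score}.
    Assumption: a higher score always means a better model.
    """
    ranks = {}
    for method_scores in scores_by_method.values():
        # Rank 1 is the best model according to this QA method.
        ordered = sorted(method_scores, key=method_scores.get, reverse=True)
        for rank, model in enumerate(ordered, start=1):
            ranks.setdefault(model, []).append(rank)
    avg_rank = {model: mean(r) for model, r in ranks.items()}
    # Lower average rank is better; ties are broken by model name.
    return sorted(avg_rank, key=lambda m: (avg_rank[m], m))


def select_diverse_top(ranking, cluster_of, k=5):
    """Pick k models, preferring one model per cluster (hypothetical rule)."""
    chosen, seen_clusters = [], set()
    for model in ranking:
        if cluster_of[model] not in seen_clusters:
            chosen.append(model)
            seen_clusters.add(cluster_of[model])
        if len(chosen) == k:
            return chosen
    # Fewer clusters than k: fill up with the best remaining models.
    for model in ranking:
        if model not in chosen:
            chosen.append(model)
        if len(chosen) == k:
            break
    return chosen
```

Rank averaging sidesteps the fact that the individual QA methods report scores on different scales, which is one plausible reason to combine rankings rather than raw scores.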
The top 10 tertiary structure predictors ranked by the sum of the Z-scores of their first models, together with the sum of the Z-scores of the best of their five submitted models
| Server name | Sum of Z/rank | Sum of Z of best of five/rank |
|---|---|---|
| MULTICOM (human) | 57.49/1 | 78.42/1 |
| Zhang-Server | 53.62/2 | 70.57/3 |
| QUARK | 51.90/3 | 71.93/2 |
| Nns | 35.07/4 | 51.79/6 |
| Myprotein-me | 34.11/5 | 52.73/5 |
| MULTICOM-CLUSTER | 31.39/6 | 39.03/10 |
| MULTICOM-CONSTRUCT | 31.33/7 | 38.65/11 |
| RBO_Aleph | 30.77/8 | 40.65/9 |
| BAKER-ROSETTASERVER | 28.80/9 | 63.64/4 |
| MULTICOM-NOVEL | 25.71/10 | 43.43/7 |
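The ranking criterion used in this table, the sum of per-target Z-scores, can be sketched as follows. The sketch assumes that each Z-score is computed from GDT-TS against the mean and population standard deviation of all predictors on the same target, and that the scores are summed without any thresholding; the exact CASP11 assessment procedure may differ. All function names and the input layout are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(gdt_by_predictor):
    """Z-score of each predictor's GDT-TS on a single target."""
    values = list(gdt_by_predictor.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All predictors scored the same: no predictor stands out.
        return {p: 0.0 for p in gdt_by_predictor}
    return {p: (v - mu) / sigma for p, v in gdt_by_predictor.items()}


def rank_by_z_sum(gdt_by_target):
    """Sum Z-scores over targets and sort predictors, best first.

    gdt_by_target maps a target name to {predictor_name: GDT-TS}.
    """
    totals = {}
    for per_target in gdt_by_target.values():
        for predictor, z in z_scores(per_target).items():
            totals[predictor] = totals.get(predictor, 0.0) + z
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Passing the GDT-TS of each predictor's first model gives the analogue of the first column; passing the best GDT-TS among its five submitted models gives the analogue of the second.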
The top 10 predictors ranked by the total number of times their models were selected by our MULTICOM predictor, on all human targets or on template-based modeling (TBM) human targets only
| Rank | Servers on all human targets | Num. on all | Servers on TBM | Num. on TBM |
|---|---|---|---|---|
| 1 | Zhang-Server | 58 | Zhang-Server | 43 |
| 2 | BAKER-ROSETTASERVER | 36 | BAKER-ROSETTASERVER | 27 |
| 3 | QUARK | 29 | QUARK | 22 |
| 4 | RBO_Aleph | 29 | myprotein-me | 20 |
| 5 | myprotein-me | 28 | Nns | 19 |
| 6 | Nns | 21 | Seok-server | 14 |
| 7 | Seok-server | 17 | RBO_Aleph | 13 |
| 8 | MULTICOM-REFINE | 10 | MULTICOM-REFINE | 8 |
| 9 | FUSION | 7 | RaptorX | 4 |
| 10 | RaptorX | 5 | FUSION | 4 |
Comparison of MULTICOM with each QA method and the two different consensus methods (one based on 6 QA methods and another one based on 14 QA methods) on the average GDT-TS score and Z-score of the top models selected, and the significance of difference between each QA method and MULTICOM
| QA method | Ave. GDT-TS score on all | Ave. GDT-TS score on TBM | Ave. Z-score on all | P-value | Ave. Z-score removed |
|---|---|---|---|---|---|
| MULTICOM | 0.374 | 0.425 | 1.364 | – | – |
| Consensus of 14 QA scores | 0.369 | 0.420 | 1.217 | – | – |
| Consensus of 14 Z-scores | 0.357 | 0.402 | 1.406 | – | – |
| | 0.351 | 0.407 | 0.893 | 1.831e-05 | 1.338 |
| | 0.343 | 0.387 | 0.887 | 1.19e-02 | 1.365 |
| | 0.340 | 0.383 | 0.861 | 5.612e-03 | |
| ModFOLDclust2 | 0.339 | 0.399 | 0.734 | 2.074e-04 | 1.356 |
| APOLLO | 0.338 | 0.403 | 0.584 | 9.331e-05 | 1.379 |
| | 0.334 | 0.382 | 0.819 | 1.861e-03 | 1.360 |
| Pcons | 0.333 | 0.397 | 0.565 | 1.831e-05 | 1.325 |
| | 0.333 | 0.378 | 0.870 | 9.840e-03 | 1.334 |
| | 0.329 | 0.367 | 0.826 | 1.662e-03 | 1.360 |
| QApro | 0.328 | 0.371 | 0.783 | 2.889e-02 | 1.430 |
| RWplus | 0.327 | 0.373 | 0.752 | 5.193e-04 | 1.365 |
| | 0.326 | 0.366 | 0.793 | 5.784e-03 | 1.356 |
| | 0.300 | 0.343 | 0.372 | 7.13e-05 | 1.365 |
| | 0.297 | 0.347 | 0.559 | 1.192e-02 | 1.340 |
Italic font denotes single-model methods.
The total number of times that each QA method performed better than the other QA methods, on all human targets or on template-based modeling (TBM) human targets only
| QA methods | Frequency on all targets | QA methods | Frequency on TBM |
|---|---|---|---|
| MULTICOM | 17 | MULTICOM | 11 |
| QApro | 12 | QApro | 8 |
| | 11 | | 7 |
| | 9 | | 7 |
| | 9 | | 7 |
| | 9 | | 6 |
| | 8 | | 6 |
| | 8 | | 6 |
| | 8 | | 6 |
| | 4 | | 4 |
| | 4 | | 3 |
| APOLLO | 4 | | 3 |
| ModFOLDclust2 | 3 | ModFOLDclust2 | 3 |
| Pcons | 2 | Pcons | 2 |
| | 1 | | 1 |
Italic denotes single-model methods.
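A frequency of this kind can be tallied with a short Python sketch. It assumes that a QA method "performs better" on a target when the model it ranks first has the highest true GDT-TS among the top picks of all QA methods, and that tied methods each receive credit; the source does not state the exact criterion or how ties were handled. The function name and input layout are hypothetical.

```python
from collections import Counter


def win_counts(top_gdt_by_target):
    """Count, per QA method, the targets on which its top pick was best.

    top_gdt_by_target maps a target name to
    {qa_method: true GDT-TS of the model that method ranked first}.
    Assumption: every method tied for the best score gets one win.
    """
    wins = Counter()
    for per_method in top_gdt_by_target.values():
        best = max(per_method.values())
        for method, gdt in per_method.items():
            if gdt == best:
                wins[method] += 1
    return wins
```

Restricting the input to TBM targets would give the analogue of the second pair of columns.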
Fig. 2. Tertiary structure prediction of domain 2 of T0783 (T0783-D2). (A) The superposition of the MULTICOM human TS1 model on domain 2 with the native structure. (B) The distribution of 191 models in the model pool. (C) The plot of the true GDT-TS scores of models against their predicted ranking.
Fig. 3. Tertiary structure prediction of domain 1 of T0767 (T0767-D1). (A) The superposition of the MULTICOM human TS1 model on domain 1 with the native structure. (B) The distribution of 195 models in the model pool. (C) The plot of the true GDT-TS scores of models against their predicted ranking.
Fig. 4. The plot of the difference between the initial GDT-TS scores before model combination and the GDT-TS scores after model combination against the initial GDT-TS scores of the top one models of 42 targets.