Zheng Wang, Jesse Eickholt, Jianlin Cheng.
Abstract
MOTIVATION: Protein structure prediction is one of the most important problems in structural bioinformatics. Here we describe MULTICOM, a multi-level combination approach to improve the various steps in protein structure prediction. In contrast to methods that look for the single best template, alignment or model, our approach combines complementary and alternative templates, alignments and models to achieve better accuracy on average.
Year: 2010 PMID: 20150411 PMCID: PMC2844995 DOI: 10.1093/bioinformatics/btq058
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. A multi-level combination pipeline for protein structure prediction.
Implementation details of the five MULTICOM server predictors and the one MULTICOM human predictor
| Steps | Methods | M-CLUSTER | M-RANK | M-CMFR | M-REFINE | MUProt | MULTICOM |
|---|---|---|---|---|---|---|---|
| (1) Template identification and ranking | PSI-BLAST | ✓ | ✓ | ✓ | |||
| Hhsearch | ✓ | ✓ | |||||
| COMPASS | ✓ | ||||||
| FOLDPro | ✓ | ✓ | ✓ | ||||
| (2) Template combination | Greedy algorithm | ✓ | ✓ | ✓ | |||
| (3) Model generation | Modeller | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| ROSETTA | ✓ | ||||||
| MULTICOM models | ✓ | ✓ | |||||
| CASP8 server models | ✓ | ||||||
| (4) Model evaluation | ModelEvaluator (SVM) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| SPIKER (clustering) | ✓ | ||||||
| (5) Model combination and refinement | Global-local algorithm | ✓ | ✓ | ✓ |
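Step (2) of the table names a greedy algorithm for template combination without spelling it out. The sketch below is a minimal illustration under stated assumptions: templates are ranked by E-value, the top-ranked template is always kept, and each further template is added only if it aligns to at least `min_new_residues` query positions not yet covered. The `Template` class, the thresholds and the coverage criterion are hypothetical and are not MULTICOM's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Template:
    """Hypothetical template hit for one query sequence."""
    name: str      # template identifier (illustrative)
    evalue: float  # alignment E-value; lower is better
    start: int     # first aligned query residue (0-based, inclusive)
    end: int       # last aligned query residue (0-based, exclusive)


def greedy_combine(templates, min_new_residues=10, max_evalue=1e-3):
    """Greedily pick templates that extend query coverage.

    The best-ranked template is always kept; each later template is added
    only if it covers at least `min_new_residues` query positions that the
    already chosen templates do not cover.
    """
    ranked = sorted(templates, key=lambda t: t.evalue)
    chosen, covered = [], set()
    for t in ranked:
        if chosen and t.evalue > max_evalue:
            break  # list is sorted, so every remaining hit is weaker
        new = set(range(t.start, t.end)) - covered
        if not chosen or len(new) >= min_new_residues:
            chosen.append(t)
            covered |= new
    return chosen, covered


if __name__ == "__main__":
    hits = [
        Template("A", 1e-30, 0, 120),
        Template("B", 1e-10, 100, 200),
        Template("C", 1e-8, 10, 110),   # adds no new coverage, skipped
        Template("D", 0.5, 150, 250),   # E-value too weak, search stops
    ]
    chosen, covered = greedy_combine(hits)
    print([t.name for t in chosen], len(covered))  # ['A', 'B'] 200
```

Ranking by coverage gain rather than by score alone is what lets a lower-ranked template contribute a region the top template misses, which is the kind of complementarity the abstract describes.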
Evaluation results of MULTICOM predictors for the first and the best-of-five models (inside parentheses) on 120 CASP8 targets
| Predictor | Avg. TM | Avg. GDT-TS | Avg. MaxSub | GDT-TS S.E.ᵃ |
|---|---|---|---|---|
| MULTICOM | 70 (72) | 63 (65) | 60 (62) | 1.99 (1.94) |
| M-REFINE | 67 (69) | 60 (62) | 56 (58) | 2.06 (1.99) |
| M-CLUSTER | 67 (70) | 60 (62) | 56 (59) | 2.03 (1.99) |
| MUProt | 67 (69) | 60 (61) | 59 (58) | 2.04 (1.98) |
| M-RANK | 66 (68) | 59 (61) | 55 (57) | 2.05 (2.02) |
| M-CMFR | 66 (68) | 58 (61) | 55 (57) | 2.04 (2.02) |
ᵃ Standard error of the average GDT-TS score of the first/best models.
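The first-model and best-of-five averages in this table, and the S.E. column, can be computed from per-target scores. A minimal sketch, assuming S.E. is the sample standard deviation divided by √n (the usual definition; the exact formula is not stated in this record). The input lists are hypothetical, not data from the paper.

```python
import math
import statistics


def first_and_best(per_target_models):
    """Split per-target GDT-TS lists (models in submission order, up to five)
    into first-model scores and best-of-five scores."""
    first = [models[0] for models in per_target_models]
    best = [max(models) for models in per_target_models]
    return first, best


def mean_and_se(scores):
    """Mean of per-target scores and its standard error,
    SE = sample standard deviation / sqrt(n)."""
    mean = statistics.fmean(scores)
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, se


if __name__ == "__main__":
    # Hypothetical GDT-TS scores for three targets, not data from the paper.
    per_target = [[60, 62, 58], [70, 68, 75], [40, 45, 41]]
    first, best = first_and_best(per_target)
    print(first, best)  # [60, 70, 40] [62, 75, 45]
    print(mean_and_se(first))
    print(mean_and_se(best))
```

Because the best-of-five score for each target is at least its first-model score, the best-of-five average can never fall below the first-model average.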
Top 10 server predictors evaluated on the first models (out of five submissions) of the 50 TBM-HA domains
| Predictor | Domain No. | Sumᵃ | Avg. GDT-TSᵇ |
|---|---|---|---|
| Zhang-Server | 50 | 27 | 88 |
| RAPTOR | 50 | 25 | 87 |
| MULTICOM-REFINE | 50 | 24 | 87 |
| MUProt | 50 | 24 | 87 |
| Phyre_de_novo | 50 | 24 | 87 |
| MULTICOM-CLUSTER | 50 | 23 | 87 |
| HHpred5 | 50 | 23 | 85 |
| MULTICOM-RANK | 50 | 22 | 86 |
| HHpred2 | 50 | 22 | 86 |
| pro-sp3-TASSER | 50 | 21 | 86 |
The standard errors of the average GDT-TS scores of MULTICOM-REFINE, MUProt, MULTICOM-CLUSTER and MULTICOM-RANK are 0.84, 0.81, 0.81 and 0.96, respectively. This analysis considers only predictors that predicted more than 46 domains. For details about each CASP8 server, see the CASP8 meeting abstracts (http://www.predictioncenter.org/casp8/doc/CASP8_book.pdf).
ᵃ Sum of the Z-scores of GDT-TS.
ᵇ Average GDT-TS score.
ᶜ Zhang (2009).
ᵈ Xu et al. (2009).
ᵉ Kelley et al. (2008).
ᶠ Hildebrand et al. (2009).
ᵍ Zhou et al. (2009).
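The Sum column is defined as the sum of GDT-TS Z-scores, and each ranking is restricted to predictors that modeled more than a given number of domains. A minimal sketch, assuming Z-scores are computed per domain across all predictors using the population standard deviation; CASP assessors may apply further conventions (for example, removing outliers or zeroing negative Z-scores) that are not reproduced here. The dictionary layout and the `min_domains` parameter are hypothetical.

```python
import statistics
from collections import defaultdict


def sum_of_zscores(scores, min_domains):
    """Sum of per-domain GDT-TS Z-scores for each predictor.

    scores: dict mapping (predictor, domain) -> GDT-TS of the first model.
    Z-scores are computed per domain across all predictors that modeled it
    (population standard deviation). Only predictors that predicted more
    than `min_domains` domains are reported.
    """
    by_domain = defaultdict(dict)
    n_domains = defaultdict(int)
    for (predictor, domain), gdt in scores.items():
        by_domain[domain][predictor] = gdt
        n_domains[predictor] += 1

    totals = defaultdict(float)
    for per_pred in by_domain.values():
        values = list(per_pred.values())
        if len(values) < 2:
            continue  # a Z-score needs at least two predictions
        mu = statistics.fmean(values)
        sd = statistics.pstdev(values)
        if sd == 0:
            continue  # every predictor tied on this domain
        for predictor, gdt in per_pred.items():
            totals[predictor] += (gdt - mu) / sd

    return {p: totals[p] for p, n in n_domains.items() if n > min_domains}


if __name__ == "__main__":
    # Hypothetical scores for three predictors on two domains.
    demo = {
        ("A", "d1"): 80, ("B", "d1"): 60, ("C", "d1"): 70,
        ("A", "d2"): 70, ("B", "d2"): 50, ("C", "d2"): 60,
    }
    ranking = sorted(sum_of_zscores(demo, min_domains=1).items(),
                     key=lambda kv: kv[1], reverse=True)
    print(ranking)
```

Summing Z-scores rewards predictors that beat the field on many domains, which is why a predictor can rank higher on Sum while having a slightly lower average GDT-TS.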
Top 10 human and server predictors and MULTICOM predictors on the first models (of the five possible submissions) of 64 TBM domains from human/server targets
| Predictor | Domain No. | Sum | Average GDT-TS |
|---|---|---|---|
| IBT_LT | 64 | 67 | 65 |
| DBAKER | 64 | 64 | 64 |
| Zhang | 64 | 56 | 64 |
| fams-ace2 | 64 | 52 | 63 |
| Zhang-server | 64 | 52 | 63 |
| TASSER | 64 | 51 | 63 |
| SAM-T08-human | 62 | 51 | 62 |
| ZicoFullSTP | 64 | 50 | 61 |
| Zico | 64 | 48 | 61 |
| MULTICOM | 64 | 48 | 61 |
The standard error of the average GDT-TS scores of MULTICOM on these domains is 2.37. This analysis only considers predictors that predicted more than 60 domains.
ᵃ Venclovas and Margelevicius (2009).
ᵇ Raman et al. (2009).
ᶜ Zhang (2009).
ᵈ Terashi et al. (2008).
ᵉ Indicates a server predictor; otherwise, the predictor is a human predictor.
ᶠ Zhou et al. (2009).
ᵍ Karplus (2008).
ʰ Girgis et al. (2008).
ⁱ Girgis and Fischer (2008).
Top 10 CASP8 server predictors on the first models (of five possible submissions) of 154 TBM domains
| Predictor | Domain No. | Sum | Avg. GDT-TS |
|---|---|---|---|
| Zhang-server | 154 | 104 | 71 |
| RAPTOR | 154 | 86 | 69 |
| Pro-sp3-TASSER | 154 | 81 | 68 |
| Phyre_de_novo | 154 | 79 | 68 |
| HHpred5 | 154 | 79 | 66 |
| BAKER-ROBETTA | 154 | 76 | 67 |
| METATASSER | 154 | 75 | 67 |
| HHpred4 | 154 | 75 | 67 |
| MULTICOM-CLUSTER | 154 | 73 | 67 |
| MULTICOM-REFINE | 154 | 71 | 67 |
The standard errors of the average GDT-TS scores of MULTICOM-CLUSTER and MULTICOM-REFINE are 1.66 and 1.69, respectively.
This analysis only considers predictors that predicted more than 150 domains.
ᵃ Raman et al. (2009).
ᵇ Zhou et al. (2009).
ᶜ Hildebrand et al. (2009).
Fig. 2. Comparison of the experimental structure of domain 1 of T0435 (a), the first MULTICOM model (b), and four models combined by MULTICOM (c-f). GDT-TS scores are listed in parentheses. MULTICOM model (b) is the best of all server and human models for this domain. (c) is the second-best server model and the best of the models that MULTICOM selected and combined. MULTICOM correctly predicted a beta-strand [green in (b)] that none of the four combined models predicted correctly [green, (c-f)]. In addition, a helix (red) was modeled correctly in (b) and (c) but in none of the other models [red, (d-f)]. This indicates that the model combination algorithm can detect and combine high-quality regions of its input models and further refine other regions to achieve a better overall quality.
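The GDT-TS values in these captions average the percentage of residues modeled within 1, 2, 4 and 8 Å of the experimental structure, GDT-TS = (P₁ + P₂ + P₄ + P₈)/4. A minimal sketch, assuming the model has already been superposed on the experimental structure and the Cα coordinates are paired residue by residue. The standard GDT calculation (as in LGA) instead searches many superpositions to maximize each P_d, so this simplified version will generally give a lower score.

```python
import math


def gdt_ts(model_ca, native_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS for a model that is already superposed on the native.

    model_ca, native_ca: equal-length sequences of (x, y, z) C-alpha
    coordinates in Angstroms, paired residue by residue.
    Returns (P1 + P2 + P4 + P8) / 4, where Pd is the percentage of residues
    whose model C-alpha lies within d Angstroms of the native one.
    """
    if len(model_ca) != len(native_ca) or not native_ca:
        raise ValueError("need equal-length, non-empty coordinate lists")
    dists = [math.dist(m, nat) for m, nat in zip(model_ca, native_ca)]
    n_res = len(native_ca)
    percents = [100.0 * sum(d <= c for d in dists) / n_res for c in cutoffs]
    return sum(percents) / len(cutoffs)


if __name__ == "__main__":
    native = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
    model = [(0, 0, 0), (1, 1.5, 0), (2, 3, 0), (3, 10, 0)]
    # Per-residue deviations are 0, 1.5, 3 and 10 Angstroms.
    print(gdt_ts(model, native))  # 56.25
```

Averaging over four cutoffs makes the score reward both near-exact regions (1-2 Å) and a roughly correct overall fold (4-8 Å), so a model can gain GDT-TS by fixing a single misplaced strand or helix, as in panel (b).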
Fig. 3. Comparison of the experimental structure of domain 2 of T0501 (a), the first MULTICOM model (b), and eight of the 20 models MULTICOM combined (c-j). GDT-TS scores are listed in parentheses. (b) is the best of all server and human models for this domain. (c) is the best server model. METATASSER did not rank this model (c) as its top model, but MULTICOM included it in the combination. In this case, the combined model (b) achieved a better quality than every other model, whether or not MULTICOM combined it.
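Figure 3 shows that a model its own server did not rank first (METATASSER's model c) still entered MULTICOM's combination pool. A minimal sketch of how such a pool might be formed, assuming every submitted model is scored by pairwise consensus (its mean structural similarity to all other models) and the models sufficiently similar to the top-ranked one are kept for combination. This is a generic consensus scheme swapped in for illustration; it is not SPIKER or MULTICOM's global-local algorithm, and the function names and the 50.0 threshold are hypothetical.

```python
def consensus_rank(similarity):
    """Rank models by their mean similarity to all other models.

    similarity: square matrix (list of lists) where similarity[i][j] is a
    structural similarity score, e.g. GDT-TS, between models i and j.
    Returns model indices, best first.
    """
    n = len(similarity)
    if n < 2:
        return list(range(n))

    def mean_similarity(i):
        return sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)

    return sorted(range(n), key=mean_similarity, reverse=True)


def select_for_combination(similarity, top, min_similarity=50.0):
    """Keep the top model plus every other model whose similarity to it is at
    least `min_similarity`; a later step would merge these models."""
    return [top] + [j for j in range(len(similarity))
                    if j != top and similarity[top][j] >= min_similarity]


if __name__ == "__main__":
    # Hypothetical pairwise GDT-TS matrix for three candidate models.
    sim = [[100, 80, 30],
           [80, 100, 40],
           [30, 40, 100]]
    order = consensus_rank(sim)
    print(order)                                   # [1, 0, 2]
    print(select_for_combination(sim, order[0]))   # [1, 0]
```

Because the pool is built from all submitted models rather than from each server's first choice, a strong model that its own server ranked low can still be selected.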