Mirco Michel, Sikander Hayat, Marcin J Skwark, Chris Sander, Debora S Marks, Arne Elofsson.
Abstract
MOTIVATION: It has recently been shown that the quality of protein contact prediction from evolutionary information improves significantly when direct and indirect information are separated. For sufficiently large protein families, the predicted contacts contain enough information to predict the structure of many of these families. Contact prediction methods have, however, improved since the first such studies. Here, we ask how much the final models improve when these improved contact predictions are used.
Year: 2014 PMID: 25161237 PMCID: PMC4147911 DOI: 10.1093/bioinformatics/btu458
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. PconsFold pipeline. Based on a given protein sequence, amino acid contacts are predicted with PconsC. These contacts then facilitate protein folding with Rosetta. In the end, PconsFold outputs a structural model for the given sequence.
Fig. 2. Model quality in TM-score for adjustments of two different Rosetta parameters. (a) Performance distributions for two different sample sizes of 20 000 (left) and 2000 (right) decoy structures. The black boxes indicate the upper and lower quartiles, with white dots at the medians of the distributions. For each protein in the full PSICOV dataset, the top-ranked model was selected from the decoys by its Rosetta score and compared with the native structure. (b) Effects of adjustments to the well-depth parameter of the FADE function. A low absolute well-depth (left side) puts low weight on predicted constraints. Higher absolute values of well-depth (right side) weight the constraints more strongly. A subset of 14 proteins of the PSICOV dataset was used here.
Fig. 3. Folding performance on the full PSICOV dataset. (a) The number of contacts used in structure prediction is plotted against average TM-score for three different methods: PconsFold (green circles), Rosetta/plmDCA (blue triangles) and Rosetta/PSICOV (black squares). For each protein, the number of top-ranked contacts was selected relative to its sequence length. A value of 1.0 on the x-axis represents one contact per residue on average. Error bars indicate standard errors. (b) TM-scores are compared with the PPV of underlying contact maps for PconsFold (using PconsC). The colours represent all four CATH fold classes. Lines are fitted to the data to illustrate performance differences between the fold classes.
Average TM-scores for top-ranked models
| Method | EVfold-PLM | Rosetta/plmDCA | PconsFold |
|---|---|---|---|
| Rosetta | – | 0.50 | 0.55 |
| Pcons | 0.47 | 0.47 | 0.53 |
| ProQ2 | 0.36 | 0.46 | 0.51 |
| DOPE | 0.46 | 0.32 | 0.36 |
Models were ranked by different model quality assessment programs (MQAPs).
Fig. 4. Analysis of contact maps in native structures and top-ranked models. PPVs were calculated for the sets of contacts that were used during folding (the 1.0 · l top-ranked contacts, where l is the sequence length) with a Cβ distance cutoff of 8 Å in the structures. (a) PPV values for PconsC contacts on native structures (x-axis) against PPVs on the top-ranked models from PconsFold (y-axis). The colours represent TM-scores of models against native structures. (b) Native structure of 1JWQ. Lines represent all predicted contacts. The colour scheme indicates spatial distances of residue pairs in the structure. The PPV is 0.83. (c) Predicted contacts in the top-ranked model for 1JWQ with the same colour scheme. This model has a TM-score of 0.62 and a PPV of 0.46.
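The PPV calculation described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `top_contacts` and `ppv`, the input layout (a list of `(i, j, score)` tuples and one representative atom coordinate per residue), the strict `<` comparison against the cutoff, and the toy coordinates are all assumptions made for the example.

```python
import math

def top_contacts(scored_pairs, seq_len, factor=1.0):
    """Return the factor * seq_len highest-scoring residue pairs.

    scored_pairs: list of (i, j, score) tuples (hypothetical layout).
    """
    n = int(round(factor * seq_len))
    ranked = sorted(scored_pairs, key=lambda p: p[2], reverse=True)
    return [(i, j) for i, j, _ in ranked[:n]]

def ppv(contacts, coords, cutoff=8.0):
    """Fraction of selected pairs closer than `cutoff` (in Å) in a structure.

    coords: one (x, y, z) tuple per residue; which atom represents a
    residue is left to the caller (an assumption of this sketch).
    """
    if not contacts:
        return 0.0
    hits = sum(1 for i, j in contacts
               if math.dist(coords[i], coords[j]) < cutoff)
    return hits / len(contacts)

# Toy data: four residues on a line, 5 Å apart (not from the paper).
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
scored = [(0, 1, 0.9), (0, 2, 0.8), (1, 2, 0.7), (0, 3, 0.6), (2, 3, 0.1)]

selected = top_contacts(scored, seq_len=len(coords), factor=1.0)
value = ppv(selected, coords)
print(selected, value)
```

Evaluating the same selected pairs once against the native coordinates and once against a model's coordinates gives the two PPVs that panel (a) plots against each other.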
Fig. 5. TM-score comparison for top-ranked models of the proteins in the PSICOV dataset. The decoys for each method were re-ranked using Pcons to assess the performance of the structure prediction process independent of the model ranking scheme. The colours represent all four CATH fold classes. (a) PconsFold compared with EVfold-PLM. (b) Rosetta/plmDCA compared with EVfold-PLM.
TM-scores for top-ranked models comparing EVfold with mean field, EVfold-PLM and PconsFold with 20 000 decoys
| Protein | EVfold | EVfold-PLM | PconsFold |
|---|---|---|---|
| BPT1_BOVIN | 0.49 | 0.25 | 0.57 |
| CADH1_HUMAN | 0.55 | 0.54 | 0.53 |
| CD209_HUMAN | 0.39 | 0.64 | 0.54 |
| CHEY_ECOLI | 0.65 | 0.66 | 0.82 |
| ELAV4_HUMAN | 0.57 | 0.61 | 0.80 |
| O45418_CAEEL | 0.48 | 0.62 | 0.65 |
| OMPR_ECOLI | 0.35 | 0.44 | 0.59 |
| OPSD_BOVIN | 0.50 | 0.55 | 0.56 |
| PCBP1_HUMAN | 0.25 | 0.43 | 0.60 |
| RASH_HUMAN | 0.70 | 0.62 | 0.67 |
| RNH_ECOLI | 0.54 | 0.66 | 0.61 |
| SPTB2_HUMAN | 0.37 | 0.51 | 0.74 |
| THIO_ALIAC | 0.55 | 0.56 | 0.83 |
| TRY2_RAT | 0.53 | 0.78 | 0.54 |
| YES_HUMAN | 0.35 | 0.31 | 0.57 |
| Mean | 0.48 | 0.55 (0.09*) | 0.64 (0.04**) |
a: The UniProt entry A8MVQ9_HUMAN of the EVfold publication was renamed to CD209_HUMAN.
b: This value was corrected; the original publication reported the value for the best possible model.
*P-value for a one-sample t-test of TM-score difference to EVfold TM-scores.
**P-value for a one-sample t-test of TM-score difference to EVfold-PLM TM-scores.
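The column means and the test statistic behind these P-values can be recomputed from the table with the Python standard library. The sketch below assumes that the one-sample t-test is applied to the per-protein TM-score differences between two methods, tested against a mean of zero; that reading of the footnotes, and the helper name `t_statistic`, are assumptions of this example. Because the table values are rounded to two decimals, the results may deviate slightly from those reported.

```python
import math
import statistics

# TM-scores copied from the table, in row order (BPT1_BOVIN ... YES_HUMAN).
evfold = [0.49, 0.55, 0.39, 0.65, 0.57, 0.48, 0.35, 0.50,
          0.25, 0.70, 0.54, 0.37, 0.55, 0.53, 0.35]
evfold_plm = [0.25, 0.54, 0.64, 0.66, 0.61, 0.62, 0.44, 0.55,
              0.43, 0.62, 0.66, 0.51, 0.56, 0.78, 0.31]
pconsfold = [0.57, 0.53, 0.54, 0.82, 0.80, 0.65, 0.59, 0.56,
             0.60, 0.67, 0.61, 0.74, 0.83, 0.54, 0.57]

def t_statistic(a, b):
    """One-sample t statistic of the paired differences a - b against zero."""
    diffs = [x - y for x, y in zip(a, b)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err

means = [round(statistics.mean(col), 2) for col in (evfold, evfold_plm, pconsfold)]
t_plm_vs_evfold = t_statistic(evfold_plm, evfold)
t_pcons_vs_plm = t_statistic(pconsfold, evfold_plm)

print(means)
print(round(t_plm_vs_evfold, 2), round(t_pcons_vs_plm, 2))
```

Turning each statistic into a P-value requires the t distribution with n − 1 = 14 degrees of freedom, which the standard library does not provide; a statistics package such as SciPy could be used for that step.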