Jesse Eickholt, Zheng Wang, Jianlin Cheng.
Abstract
BACKGROUND: Protein residue-residue contact prediction is important for protein model generation and model evaluation. Here we develop a conformation ensemble approach to improve residue-residue contact prediction. We collect a number of structural models stemming from a variety of methods and implementations. The various models capture slightly different conformations and contain complementary information which can be pooled together to capture recurrent, and therefore more likely, residue-residue contacts.
Year: 2011 PMID: 21989082 PMCID: PMC3200154 DOI: 10.1186/1472-6807-11-38
Source DB: PubMed Journal: BMC Struct Biol ISSN: 1472-6807
Figure 1. A conformation ensemble approach for residue-residue contact prediction. The starting point for our conformation ensemble contact predictor is a collection of structural models. From each model in the ensemble, the residue-residue contacts are extracted and then counted across all models. This list of contacts is then normalized and the most commonly occurring contacts are selected as the predicted contacts.
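The pipeline in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the 8 Å distance cutoff, the minimum sequence separation of 6 and the representation of a model as a mapping from residue number to one coordinate per residue are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations
import math


def extract_contacts(coords, threshold=8.0, min_sep=6):
    """Return residue pairs (i, j), i < j, that are closer than `threshold`.

    `coords` maps a residue number to an (x, y, z) tuple. The cutoff and the
    minimum sequence separation are assumptions for this sketch.
    """
    contacts = set()
    for (i, a), (j, b) in combinations(sorted(coords.items()), 2):
        if j - i >= min_sep and math.dist(a, b) < threshold:
            contacts.add((i, j))
    return contacts


def ensemble_contacts(models, top_n, **kwargs):
    """Count contacts across all models, normalize, and keep the top_n."""
    counts = Counter()
    for coords in models:
        counts.update(extract_contacts(coords, **kwargs))
    # Most frequent first; ties are broken by residue numbers for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(pair, n / len(models)) for pair, n in ranked[:top_n]]


# Toy ensemble of three models with three residues each (hypothetical data).
toy_models = [
    {1: (0, 0, 0), 10: (5, 0, 0), 20: (50, 0, 0)},
    {1: (0, 0, 0), 10: (5, 0, 0), 20: (50, 0, 0)},
    {1: (0, 0, 0), 10: (30, 0, 0), 20: (3, 0, 0)},
]
toy_prediction = ensemble_contacts(toy_models, top_n=2)
```

In the toy ensemble the pair (1, 10) occurs in two of three models and is ranked ahead of (1, 20), which occurs in only one.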
Precision and recall of conformation ensemble contact predictions on CASP8 FM targets
| Evaluation criteria | Medium range contacts | Long range contacts |
|---|---|---|
| Top L/5 | .48 (.18) | .36 (.08) |
| Top L/5, δ = 1 | .70 (.24) | .61 (.13) |
| Top L/5, δ = 2 | .77 (.26) | .69 (.14) |
The performance of the conformation ensemble approach on the free modelling (FM) targets from CASP8. The input ensembles were sets of server submitted tertiary structure predictions for each FM target during CASP8. L is the sequence length of each target domain. δ is the neighbourhood size in residues. For δ = 1, a prediction is considered correct if a true contact occurs within ± 1 residues of the prediction. The precision of the predictions is shown first with the recall in parentheses.
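The neighbourhood criterion can be illustrated with a short Python sketch. Two points are assumptions rather than details taken from the text: a prediction (i, j) is counted as correct when some true contact has both residue indices within δ of i and j, and recall is computed symmetrically as the fraction of true contacts that have a prediction within δ. The function names are hypothetical.

```python
def within(pair, others, delta):
    """True if any contact in `others` has both indices within delta of `pair`."""
    i, j = pair
    return any(abs(i - a) <= delta and abs(j - b) <= delta for a, b in others)


def precision_recall(predicted, true, delta=0):
    """Precision and recall of predicted contacts with neighbourhood size delta."""
    predicted, true = list(predicted), list(true)
    hits = sum(within(p, true, delta) for p in predicted)
    covered = sum(within(t, predicted, delta) for t in true)
    precision = hits / len(predicted) if predicted else 0.0
    recall = covered / len(true) if true else 0.0
    return precision, recall


# Hypothetical contacts for a single target.
predicted = [(10, 40), (12, 60)]
true = [(11, 40), (30, 70), (12, 60), (50, 90)]
exact = precision_recall(predicted, true, delta=0)
relaxed = precision_recall(predicted, true, delta=1)
```

With δ = 0 only (12, 60) matches exactly. With δ = 1 the prediction (10, 40) is also accepted because the true contact (11, 40) lies one residue away.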
Precision and recall of conformation ensemble contact predictions on CASP9 FM targets
| Evaluation criteria | Medium range contacts | Long range contacts |
|---|---|---|
| Top L/5 | .34 (.18) | .30 (.05) |
| Top L/5, δ = 1 | .55 (.27) | .48 (.07) |
| Top L/5, δ = 2 | .64 (.29) | .56 (.08) |
The performance of the conformation ensemble approach on the free modelling (FM) targets from CASP9. The input ensembles were sets of server submitted tertiary structure predictions for each FM target during CASP9. δ is the neighbourhood size in residues. L is the sequence length of each target domain. The precision of the predictions is shown first with the recall in parentheses.
Comparison of contact predictors on top L/5 predictions for CASP9 FM targets
| Prediction Methods | Medium range contacts | Long range contacts |
|---|---|---|
| Conformation ensemble | .34 | .30 |
| SVMcon | .19 | .19 |
| BAKER-ROSETTASERVER ensemble | .27 | .20 |
| Zhang-Server ensemble | .28 | .23 |
The precision of predicted contacts obtained by various contact prediction methods. For our conformation ensemble, we used sets of server submitted tertiary structure predictions for each FM target during CASP9. SVMcon is a machine learning, sequence-based contact prediction method. The BAKER-ROSETTASERVER ensemble and Zhang-Server ensemble predictions were made by applying the conformation ensemble approach to the structural predictions made by each predictor during CASP9. L is the sequence length of each target domain.
Precision of top L/5 contact predictions obtained from filtered ensembles on CASP9 FM targets
| Filter type | Medium range contacts | Long range contacts |
|---|---|---|
| Remove-poor | .34 | .35 |
| Remove-top | .32 | .25 |
| Only-top | .32 | .37 |
The performance of the conformation ensemble approach when applied to filtered ensembles. The input ensembles were filtered sets of server submitted tertiary structure predictions for each FM target during CASP9. For Remove-poor, ModelEvaluator was used and any model with a predicted GDT-TS score of less than 30 was removed from an ensemble. For Remove-top, the top 20 models when ranked by TM-Score were removed from an ensemble. For Only-top, the ensemble consisted of only the top 20 models when ranked by TM-Score. L is the sequence length of each target domain.
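The three filters can be expressed as simple selections over per-model scores. The sketch below assumes each ensemble is available as a dictionary from a model name to its score (predicted GDT-TS for Remove-poor, TM-Score for the other two); the function names and the data layout are hypothetical, and the scores themselves would come from external tools.

```python
def remove_poor(predicted_gdt, cutoff=30.0):
    """Keep models whose predicted GDT-TS score is at least `cutoff`."""
    return [name for name, score in predicted_gdt.items() if score >= cutoff]


def rank_by_score(scores):
    """Model names ordered from highest to lowest score (ties by name)."""
    return sorted(scores, key=lambda name: (-scores[name], name))


def remove_top(tm_scores, n=20):
    """Drop the n best models by TM-Score and keep the rest."""
    return rank_by_score(tm_scores)[n:]


def only_top(tm_scores, n=20):
    """Keep only the n best models by TM-Score."""
    return rank_by_score(tm_scores)[:n]


# Hypothetical scores for a four-model ensemble; n is reduced to 2 for brevity.
gdt_pred = {"m1": 45.0, "m2": 28.5, "m3": 30.0, "m4": 12.0}
tm = {"m1": 0.61, "m2": 0.35, "m3": 0.48, "m4": 0.22}
kept_poor = remove_poor(gdt_pred)
kept_rest = remove_top(tm, n=2)
kept_best = only_top(tm, n=2)
```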
Figure 2. Contact maps for CASP9 targets T0618 and T0624. Visualized contact maps for (a) T0618 and (b) T0624. The lower portion of each figure represents true long range contacts (colored red) extracted from the experimental structure. The upper portion shows the top L/5 predicted long range contacts obtained from the conformation ensemble. The contacts cover several distinct regions of long range interaction and show their proximity to true contacts.
The average loss on CASP9 FM targets
| Ranking Mechanism | Avg. Loss (in GDT-TS score) |
|---|---|
| Scoring w/conformation ensemble contacts | 0.07 |
| MULTICOM (QA) | 0.07 |
| MULTICOM-CLUSTER (QA) | 0.08 |
| Random baseline measure | 0.17 |
Models were ranked by how well they satisfy the contacts predicted by the conformation ensemble approach. The random baseline measure is the loss of the middlemost model in a group when ranked by GDT-TS score.
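A rough Python sketch of this kind of ranking follows. It rests on assumptions not spelled out in the text: a model's score is taken to be the fraction of predicted contacts it contains, and loss is taken to be the GDT-TS of the best model in the group minus the GDT-TS of the selected model. All names and numbers are hypothetical.

```python
def contact_satisfaction(model_contacts, predicted):
    """Fraction of predicted contacts that are present in the model."""
    predicted = list(predicted)
    if not predicted:
        return 0.0
    return sum(p in model_contacts for p in predicted) / len(predicted)


def selection_loss(gdt, scores):
    """Best GDT-TS in the group minus the GDT-TS of the top-scoring model."""
    chosen = max(scores, key=scores.get)
    return max(gdt.values()) - gdt[chosen]


def baseline_loss(gdt):
    """Loss of the middlemost model when the group is ranked by GDT-TS."""
    ranked = sorted(gdt.values(), reverse=True)
    return ranked[0] - ranked[len(ranked) // 2]


# Hypothetical group of five models with true GDT-TS on a 0-1 scale.
gdt = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.0, "e": 0.375}
predicted = [(10, 40), (12, 60), (20, 80), (25, 90)]
contacts = {
    "a": {(10, 40), (12, 60)},
    "b": {(10, 40)},
    "c": set(),
    "d": set(),
    "e": {(10, 40), (12, 60), (20, 80)},
}
scores = {m: contact_satisfaction(c, predicted) for m, c in contacts.items()}
loss = selection_loss(gdt, scores)
baseline = baseline_loss(gdt)
```

Here model "e" satisfies the most predicted contacts and is selected, so the loss is the gap between it and the best model "a".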
Figure 3. Key long range interactions for T0618. Several tertiary structure predictors had difficulty arranging the helical bundles for this target. Our conformation ensemble approach correctly predicted several key long range interactions for this target which help pull the helical bundles together. The input ensemble was the collection of server submitted models for T0618 during CASP9. The long range interactions are 16-53 (red), 119-153 (green) and 83-116 and 27-116 (orange).
Representation of predicted contact clusters in an ensemble
| Target | Num. Clusters | Cluster Coverage (percentage of models from ensemble with stated coverage) | | | |
|---|---|---|---|---|---|
| T0534[31-80,257-384] | 5 | 5(.05) | 4(.14) | 3(.16) | 2(.17) |
| T0534[81-255] | 3 | 3(.11) | 2(.24) | 1(.23) | 0(.40) |
| T0544[1-135] | 7 | 7(.07) | 6(.12) | 5(.13) | 4(.14) |
| T0550[178-339] | 13 | 11(.01) | 10(.01) | 9(.07) | 8(.09) |
| T0561[1-109,112-161] | 5 | 5(.12) | 4(.12) | 3(.20) | 2(.16) |
| T0571[197-331] | 6 | 6(.03) | 5(.08) | 4(.10) | 3(.16) |
| T0608[29-117] | 5 | 5(.06) | 4(.08) | 3(.11) | 2(.17) |
| T0621[2-170] | 6 | 6(.03) | 5(.10) | 4(.19) | 3(.22) |
Long range contacts predicted by the conformation ensemble are clustered for a number of CASP9 FM targets. The cluster coverage is the proportion of models in the ensemble that cover a given number of clusters recovered by the ensemble method. A cluster is considered covered by a model if the model contains a contact within 4 residues of the cluster's representative contact. For each target, the cluster coverages are calculated for the top four cluster counts. Num. Clusters (column 2) is the total number of true contact clusters recovered by the ensemble method for the target. The remaining columns list the proportion of models in the pool that contain a specific number of the true clusters. The results show that the ensemble method can recover more contact clusters even though the proportion of models in the pool having high cluster coverage is very low.
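The coverage calculation described in this legend can be sketched as follows. The sketch assumes that "within 4 residues" means both residue indices of a model contact lie within 4 positions of the cluster representative, and that each model is given as a set of (i, j) contacts; the function names and the toy data are hypothetical.

```python
from collections import Counter


def clusters_covered(model_contacts, representatives, window=4):
    """Number of cluster representatives with a model contact within `window`."""
    return sum(
        any(abs(i - a) <= window and abs(j - b) <= window
            for a, b in model_contacts)
        for i, j in representatives
    )


def coverage_distribution(models, representatives, window=4):
    """Map each observed cluster count to the proportion of models reaching it."""
    counts = Counter(
        clusters_covered(m, representatives, window) for m in models
    )
    return {k: counts[k] / len(models) for k in sorted(counts, reverse=True)}


# Two hypothetical cluster representatives and a four-model ensemble.
representatives = [(10, 50), (20, 80)]
ensemble = [
    {(12, 53), (21, 80)},  # covers both clusters
    {(10, 50)},            # covers one cluster
    {(100, 150)},          # covers none
    {(15, 50)},            # 5 residues away from (10, 50): covers none
]
distribution = coverage_distribution(ensemble, representatives)
```

The resulting dictionary corresponds to one row of the table: the keys are cluster counts in descending order and the values are the proportions of models with exactly that coverage.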