Joan Segura1, Manuel Alejandro Marín-López2, Pamela F Jones1, Baldo Oliva2, Narcis Fernandez-Fuentes1.
Abstract
The experimental determination of the structure of protein complexes cannot keep pace with the generation of interactomic data, resulting in an ever-expanding gap. As the structural details of protein complexes are central to a full understanding of the function and dynamics of the cell machinery, alternative strategies are needed to circumvent the bottleneck in structure determination. Computational protein docking is a valid and valuable approach to model the structure of protein complexes. In this work, we describe a novel computational strategy to predict the structure of protein complexes based on data-driven docking: VORFFIP-driven dock (V-D2OCK). This new approach uses our recently described method for predicting functional sites in protein structures, VORFFIP, to define the region to be sampled during docking, and structural clustering to reduce the number of models to be examined by users. V-D2OCK has been benchmarked using a validated and diverse set of protein complexes and compared to a state-of-the-art docking method. Its speed and accuracy compared to contemporary tools justify the potential use of V-D2OCK for high-throughput, genome-wide protein docking. Finally, we have developed a web interface that allows users to browse and visualize V-D2OCK predictions from the convenience of their web browsers.
Year: 2015 PMID: 25763838 PMCID: PMC4357426 DOI: 10.1371/journal.pone.0118107
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. V-D2OCK workflow.
(a) (a’) The V-PATCH algorithm is used to define the protein binding sites; (b) rigid-body docking is driven by the interface predictions; and (c) in the clustering stage, docking poses are structurally clustered and the clusters’ centroids are selected as representatives.
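The clustering stage in (c) can be illustrated with a minimal sketch. The caption does not state which clustering algorithm or distance cutoff V-D2OCK uses, so the greedy seed-based scheme, the `distance` callback, and the `cutoff` parameter below are all assumptions made for illustration only.

```python
def greedy_cluster(poses, distance, cutoff):
    """Greedy clustering sketch (an assumed scheme, not the published one).

    `poses` is expected to be sorted best-scored first. The first unassigned
    pose seeds a cluster and absorbs every unassigned pose within `cutoff`.
    Returns a list of clusters, each a list of indices into `poses`.
    """
    unassigned = list(range(len(poses)))
    clusters = []
    while unassigned:
        seed = unassigned[0]
        members = [i for i in unassigned
                   if distance(poses[seed], poses[i]) <= cutoff]
        clusters.append(members)
        taken = set(members)
        unassigned = [i for i in unassigned if i not in taken]
    return clusters


def centroid(members, poses, distance):
    """Index of the member with the smallest summed distance to the others."""
    return min(members,
               key=lambda i: sum(distance(poses[i], poses[j]) for j in members))


# Toy usage: poses are 1-D numbers and the "structural" distance is |a - b|.
toy_poses = [0.0, 0.5, 1.0, 10.0, 10.4]
toy_distance = lambda a, b: abs(a - b)
toy_clusters = greedy_cluster(toy_poses, toy_distance, cutoff=1.0)
toy_representatives = [centroid(c, toy_poses, toy_distance) for c in toy_clusters]
```

In a real setting `distance` would be a structural measure between two docking poses, such as an RMSD over ligand atoms; the toy numbers only show the control flow.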
Statistical performance of V-PATCH and fixed thresholds.
| Method | R(%) | P(%) | F1 | MCC |
|---|---|---|---|---|
| | 61 | 27 | 0.37 | 0.34 |
| | 60 | 22 | 0.32 | 0.29 |
| | 60 | 24 | 0.34 | 0.30 |
V-PATCH and fixed thresholds using raw and normalized scores were used to compare performance. Results are shown for recall (R), precision (P), the F1 score (F1) and the Matthews correlation coefficient (MCC).
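For readers who want to check the reported figures, the four metrics follow the standard confusion-matrix definitions. The sketch below uses those textbook formulas; the function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Recall, precision, F1 and MCC from confusion-matrix counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1, "mcc": mcc}


def f1_from_percent(recall_pct, precision_pct):
    """F1 as the harmonic mean of recall and precision given in percent."""
    r, p = recall_pct / 100.0, precision_pct / 100.0
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=50, fp=50, tn=850, fn=50)
```

Applying `f1_from_percent` to the R and P columns of the table reproduces the tabulated F1 values to two decimals, for example `f1_from_percent(61, 27)` for the first row.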
Effect of clustering on the quality of the models.
| # of solutions to cluster (a) | CAPRI A (b) | CAPRI B (b) | CAPRI C (b) | CAPRI D (b) | # Docking poses (c) |
|---|---|---|---|---|---|
| All poses | 1 | 46 | 75 | 53 | 4509 |
| Centroids, all clusters | 0 | 21 | 93 | 61 | 898 |
| Centroids, top 1000 clusters | 0 | 20 | 83 | 72 | 635 |
| Centroids, top 200 clusters | 0 | 17 | 73 | 85 | 218 |
| Centroids, top 100 clusters | 0 | 13 | 60 | 102 | 162 |
| Centroids, top 50 clusters | 0 | 9 | 47 | 119 | 96 |
(a) Sets of solutions used: all poses, centroids for all clusters, and centroids for the top 1000, top 200, top 100 and top 50 clusters, respectively. (b) CAPRI evaluation system, where A, B, C and D are the numbers of predictions classified as high quality (three stars), medium quality (two stars), acceptable and wrong, respectively. (c) Average number of docking models.
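The trade-off between model quality and the number of poses to inspect can be summarized from the counts above. The sketch below derives, for each set, the fraction of test cases with at least an acceptable model (A + B + C over the row total); this derived summary is not reported in the source, and the row labels are taken from footnote (a).

```python
# (label, A, B, C, D, average number of docking poses), copied from the table.
ROWS = [
    ("all poses",        1, 46, 75,  53, 4509),
    ("all clusters",     0, 21, 93,  61,  898),
    ("top 1000",         0, 20, 83,  72,  635),
    ("top 200",          0, 17, 73,  85,  218),
    ("top 100",          0, 13, 60, 102,  162),
    ("top 50",           0,  9, 47, 119,   96),
]


def acceptable_or_better(a, b, c, d):
    """Fraction of cases whose best model is acceptable or better."""
    total = a + b + c + d
    return (a + b + c) / total if total else 0.0


# Pair each set with its derived fraction and its average pose count.
SUMMARY = [(label, acceptable_or_better(a, b, c, d), n_poses)
           for label, a, b, c, d, n_poses in ROWS]

for label, fraction, n_poses in SUMMARY:
    print(f"{label:>12}: {fraction:.1%} acceptable or better, {n_poses} poses on average")
```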
Fig 2. Relationship between l-RMSD (Å) and interface coverage (%).
RMSD was calculated using the main-chain atoms. The interface coverage is the lower of the two coverages of the predicted binding sites, in either the ligand or the receptor. Red empty circles and green empty triangles represent the best l-RMSD using all docking poses and the best poses among the top 200 clusters, respectively.
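The two quantities plotted here can be sketched as follows. The caption does not give the exact coverage formula, so the sketch assumes coverage is the percentage of native interface residues that fall inside the predicted binding site. The RMSD helper assumes the two coordinate lists are already paired atom by atom and expressed in the same frame, so no superposition is performed. All function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two paired lists of (x, y, z) tuples, without superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def interface_coverage(predicted, native):
    """Percentage of native interface residues found in the predicted site (assumed definition)."""
    native = set(native)
    if not native:
        return 0.0
    return 100.0 * len(native & set(predicted)) / len(native)


def lowest_coverage(pred_receptor, native_receptor, pred_ligand, native_ligand):
    """Lower of the receptor and ligand coverages, as described in the caption."""
    return min(interface_coverage(pred_receptor, native_receptor),
               interface_coverage(pred_ligand, native_ligand))
```

Taking the minimum over the two partners means a complex only scores well when the binding site is well predicted on both the receptor and the ligand.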
Fig 3. Success rates for all test cases (left) and medium/difficult cases (right) on Benchmark v4.0.
PatchDock[15], ES3DC potential[27] and ZRANK scores[28] are shown as solid, dashed and dotted lines respectively.
Fig 4. Examples of structural models.
Rows (from top to bottom) show the comparison between native and predicted structures of protein complexes: camelid VHH domain and porcine pancreatic alpha-amylase (PDB code 1kxq)[29], BET3 and TPC6B core of TRAPP (PDB code 2cfh)[30], and Pol II epsilon and Hot proofreading complex (PDB code 2ido)[31]. Columns (from left to right) show: 1) the structure of the native and predicted complex, where the receptor is depicted as a surface (grey) and the ligand as a ribbon representation (native: dark grey; predicted: orange); 2) a surface representation of both receptor (left) and ligand (right) and the overlap (red) between native (dark grey) and predicted (orange); 3) a surface representation as in 2) showing the overlap (red) between the predicted interface (green) and the docking interface (orange).