| Literature DB >> 31199455 |
Cunliang Geng, Yong Jung, Nicolas Renaud, Vasant Honavar, Alexandre M J J Bonvin, Li C Xue.
Abstract
MOTIVATION: Protein complexes play critical roles in many aspects of biological function. Three-dimensional (3D) structures of protein complexes are critical for gaining insights into the structural basis of interactions and their roles in the biomolecular pathways that orchestrate key cellular processes. Because of the expense and effort associated with experimental determination of 3D protein complex structures, computational docking has evolved into a valuable tool for predicting the 3D structures of biomolecular complexes. Despite recent progress, reliably distinguishing near-native docking conformations from a large number of candidate conformations, the so-called scoring problem, remains a major challenge.
Year: 2020 PMID: 31199455 PMCID: PMC6956772 DOI: 10.1093/bioinformatics/btz496
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Schematic workflow of our graph kernel-based scoring method. Docking models for a protein–protein complex are first represented as graphs by treating the interface residues as graph nodes and the intermolecular contacts they form as graph edges. Interface features are added to the graph as node or edge labels (only PSSM profiles as node labels in this case). Then, each of the interface graphs of the docking models is compared to the interface graphs of both the positive (native) structure and negative (non-native) models. This graph comparison generates a similarity matrix for the docking models, with the number of rows and columns corresponding to the number of docking models and the total number of positive and negative graphs, respectively. Next, the support vector machine takes the graph kernel matrix as input and predicts decision values that are used as the GraphRank score. The final scoring function iScore is a linear combination of the GraphRank score and HADDOCK energetic terms (van der Waals, electrostatic and desolvation energies). The weights of this linear combination are optimized using a genetic algorithm (GA) over the BM4 HADDOCK dataset
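The workflow in Figure 1 can be sketched in a few lines of Python. This is a minimal, illustrative sketch only: the helper names are hypothetical, a toy label-histogram similarity stands in for the random-walk graph kernel, a similarity difference stands in for the SVM decision value, and the fixed weights stand in for the GA-optimized combination described in the paper.

```python
from collections import Counter

def interface_graph(contacts, node_labels):
    """Interface residues become nodes, intermolecular contacts become edges.
    contacts: iterable of (residue_i, residue_j) pairs across the interface."""
    nodes = {r: node_labels[r] for pair in contacts for r in pair}
    edges = {frozenset(pair) for pair in contacts}
    return nodes, edges

def graph_kernel(g1, g2):
    """Toy graph similarity: overlap of node-label histograms plus shared edges.
    A crude stand-in for the random-walk kernel used by GraphRank."""
    (n1, e1), (n2, e2) = g1, g2
    h1, h2 = Counter(n1.values()), Counter(n2.values())
    node_sim = sum(min(h1[label], h2[label]) for label in h1)
    edge_sim = len(e1 & e2)
    return node_sim + edge_sim

def graphrank_score(model, positives, negatives):
    """Stand-in for the SVM decision value: mean similarity to native
    interfaces minus mean similarity to non-native ones."""
    pos = sum(graph_kernel(model, g) for g in positives) / len(positives)
    neg = sum(graph_kernel(model, g) for g in negatives) / len(negatives)
    return pos - neg

def iscore(graphrank, evdw, eelec, edesolv, w=(1.0, 0.01, 0.01, 0.01)):
    """Linear combination of GraphRank and HADDOCK energy terms.
    The weights here are placeholders; the paper fits them with a GA."""
    return w[0] * graphrank + w[1] * evdw + w[2] * eelec + w[3] * edesolv
```

A model whose interface graph resembles the native structure more than the non-native decoys receives a higher GraphRank score, which is then blended with the energetic terms to give the final iScore.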
Rankings of GraphRank and iScore in comparison with the scorer groups on the CAPRI score set
| Scoring function/group | Performance | # Submitted targets |
|---|---|---|
| iScore | 9/2***/5** | 13 |
| Weng | 8/3***/2** | 9 |
| Bonvin | 8/2***/3** | 9 |
| Bates | 8/1***/4** | 10 |
| GraphRank | 8/1***/4** | 13 |
| Zou | 7/4***/1** | 9 |
| Wang | 6/2***/3** | 6 |
| Fernandez-Recio | 5/2***/3** | 8 |
| Elber | 5/1***/1** | 5 |
| Wolfson | 4/1*** | 5 |
| Camacho | 3/2***/1** | 5 |
| … and many others | | |
Note: In total, 37 scorer groups were assessed (Supplementary Table S3), but only scorer groups that submitted predictions for at least 5 of the 13 CAPRI targets are shown here. The scoring functions/groups are ordered by performance. The number of targets with submitted predictions is shown for each function/group.
Fig. 2. Success rate of the HADDOCK score, GraphRank and iScore on the BM4 HADDOCK training dataset over the top N clusters of models
Fig. 3. Success rates measured at cluster level on four sets of docking program-specific models for the newly added protein–protein complexes in BM5. GraphRank and iScore are compared with scoring functions from HADDOCK (A), SwarmDock (B), pyDock (C) and ZDock (D) on the docking models of the corresponding docking program, respectively
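The success-rate metric plotted in Figures 2 and 3 is simple to state: a target counts as a success if at least one acceptable-or-better model appears among its top N ranked entries. A minimal sketch, assuming the usual CAPRI quality encoding (0 = incorrect, 1 = acceptable, 2 = medium, 3 = high); the function name is illustrative, not from the paper's code:

```python
def success_rate(ranked_quality_per_target, n):
    """Fraction of targets with at least one acceptable-or-better model
    (quality >= 1) among the top-n ranked models or clusters.

    ranked_quality_per_target: one list of quality codes per target,
    ordered by the scoring function's ranking."""
    hits = sum(any(q >= 1 for q in ranked[:n])
               for ranked in ranked_quality_per_target)
    return hits / len(ranked_quality_per_target)
```

Sweeping n from 1 upward yields the success-rate curves over the top N clusters shown in the figures.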
Comparison of GraphRank and iScore with IRaPPA on docking program-specific models of BM5 protein–protein complexes
| Docking models | #Complexes | GraphRank | iScore | IRaPPA |
|---|---|---|---|---|
| SwarmDock | 18 | 7/1***/6** | 10/2***/6** | 12/1***/6** |
| pyDock | 14 | 5/3** | 6/3** | 10/3** |
| ZDock | 10 | 4/3** | 6/5** | 8/3** |
Note: The top 10 models are selected and evaluated for each complex. The scoring performance for each complex is reported as the number of complexes with acceptable or better models (hits), followed by the number with high-quality (indicated with ***) or medium-quality (**) models. The overall performance of each method on all complexes is reported here. For example, 7/1***/6** means that a scoring function is successful on 7 complexes; of those 7, 1 has at least a *** model and 6 have at least a ** model in the top 10.
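The compact hits/***/** notation used in these tables can be reproduced from the best model quality each complex achieves in the top 10. A minimal sketch, assuming the mutually exclusive reading suggested by the counts (*** and ** tally complexes whose best top-10 model is high or medium quality, respectively); the function name and quality encoding (0 = incorrect, 1 = acceptable, 2 = medium, 3 = high) are illustrative:

```python
def capri_summary(best_quality_per_complex):
    """Format per-complex best top-10 model quality in the tables'
    hits/#high***/#medium** notation."""
    hits = sum(q >= 1 for q in best_quality_per_complex)   # acceptable or better
    high = sum(q == 3 for q in best_quality_per_complex)   # best model is ***
    medium = sum(q == 2 for q in best_quality_per_complex) # best model is **
    parts = [str(hits)]
    if high:
        parts.append(f"{high}***")
    if medium:
        parts.append(f"{medium}**")
    return "/".join(parts)
```

For instance, seven hits of which one is high quality and six are medium quality would be rendered as 7/1***/6**, matching the SwarmDock/GraphRank entry above.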
Comparison of GraphRank and iScore with CAPRI best performing group per target on the CAPRI score set
| CAPRI targets | GraphRank | iScore | CAPRI best | # Total models | #Near-native |
|---|---|---|---|---|---|
| T29 | 4 | 4 | 9/5** | 1979 | 166 |
| T30 | 0 | 0 | 0 | 1148 | 2 |
| T32 | 4/1** | 4/1** | 2 | 599 | 15 |
| T35 | 0 | 0 | 1 | 497 | 3 |
| T37 | 2/1** | 4/2** | 6/1*** | 1364 | 97 |
| T39 | 0 | 0 | 0 | 1295 | 4 |
| T40 | 4/3** | 4/1*** | 10/10*** | 1987 | 535 |
| T41 | 8 | 10/2** | 10/2*** | 1101 | 347 |
| T46 | 3 | 4 | 4 | 1570 | 24 |
| T47 | 8/5***/3** | 10/6***/4** | 10/10*** | 1015 | 608 |
| T50 | 0 | 4/3** | 7/6** | 1447 | 133 |
| T53 | 5/1** | 5/1** | 8/3** | 1360 | 122 |
| T54 | 0 | 0 | 0 | 1304 | 19 |
| Total | 8/1***/4** | 9/2***/5** | 10/4***/3** | | |
Note: The top 10 models are selected and evaluated for each target. Values are labeled in green/red (in the original table) when the performance of our scoring functions is better/worse than that of the best CAPRI scoring group. The scoring performance for each target is reported as the number of acceptable or better models (hits), followed by the number of high-quality (indicated with ***) or medium-quality (**) models. For example, 8/2** means that there are 8 hits in total among the top 10 models, 2 of which are medium-quality models. The overall performance of each method on all 13 targets (the last row) is reported in a similar way. For example, 9/2***/5** means that a scoring function is successful on 9 targets; of those 9, 2 have at least a *** model and 5 have at least a ** model in the top 10. Note that the CAPRI best column consists of results from 37 different groups (refer to Table 3 for a comparison of the performance per group and Supplementary Table S3 per target).