| Literature DB >> 27846259 |
Edrisse Chermak, Renato De Donato, Marc F Lensink, Andrea Petta, Luigi Serra, Vittorio Scarano, Luigi Cavallo, Romina Oliva.
Abstract
Correctly scoring protein-protein docking models to single out native-like ones is an open challenge. It is also an object of assessment in CAPRI (Critical Assessment of PRedicted Interactions), the community-wide blind docking experiment. We introduced in the field the first pure consensus method, CONSRANK, which ranks models based on their ability to match the most conserved contacts in the ensemble they belong to. In CAPRI, scorers are asked to evaluate a set of available models and to select the top ten, based on their own scoring approach. Scorers' performance is ranked by the number of targets/interfaces for which they provide at least one correct solution. By this measure, blind testing in CAPRI Round 30 (a joint prediction round with CASP11) showed that the critical cases for CONSRANK are targets with multiple interfaces or targets for which only a very small number of correct solutions are available. To address these challenging cases, CONSRANK has now been modified to include a contact-based clustering of the models as a preliminary step of the scoring process. We used agglomerative hierarchical clustering based on the number of inter-residue contacts shared between models. Two criteria, with different thresholds, were explored in the cluster generation, fixing either the minimum number of shared contacts or the total number of clusters. For each clustering approach, after selecting the ten most populated clusters, CONSRANK was run on each cluster and its top-ranked model was selected, within the limit of ten models per target. We applied our modified scoring approach, Clust-CONSRANK, to SCORE_SET, a set of CAPRI scoring models recently made available by the CAPRI assessors, and to the subset of homodimeric targets in CAPRI Round 30 for which CONSRANK failed to include a correct solution among the ten selected models.
Results show that, for the challenging cases, the clustering step typically enriches the ten top-ranked models in native-like solutions. Indeed, the best-performing clustering approaches we tested more than double the number of cases for which at least one correct solution is included within the top ten ranked models.
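The Clust-CONSRANK workflow described in the abstract — consensus scoring by conserved contacts, contact-based agglomerative clustering, then the top-scored model from each of the ten most populated clusters — can be sketched in pure Python. This is an illustrative sketch, not the authors' implementation: the data layout (models as sets of residue-pair contacts), the `min_shared` threshold handling, and all function names are assumptions; the published method also explores complete linkage and maximum-cluster-number criteria (see the Methods of the article).

```python
from itertools import combinations

def consrank_scores(models):
    """CONSRANK-style consensus score: a model's score is the average
    conservation rate, across the ensemble, of the contacts it makes."""
    n = len(models)
    freq = {}
    for contacts in models.values():
        for c in contacts:
            freq[c] = freq.get(c, 0) + 1
    return {
        name: (sum(freq[c] for c in contacts) / (n * len(contacts))
               if contacts else 0.0)
        for name, contacts in models.items()
    }

def cluster_by_contacts(models, min_shared):
    """Single-linkage agglomerative clustering (union-find): merge two
    models' clusters whenever they share >= min_shared contacts."""
    names = list(models)
    parent = {m: m for m in names}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    for a, b in combinations(names, 2):
        if len(models[a] & models[b]) >= min_shared:
            parent[find(a)] = find(b)
    clusters = {}
    for m in names:
        clusters.setdefault(find(m), []).append(m)
    # Most populated clusters first
    return sorted(clusters.values(), key=len, reverse=True)

def clust_consrank(models, min_shared, top_clusters=10):
    """Pick the top-CONSRANK model from each of the most populated clusters."""
    scores = consrank_scores(models)
    return [max(cluster, key=lambda m: scores[m])
            for cluster in cluster_by_contacts(models, min_shared)[:top_clusters]]
```

On a toy ensemble where three models share a common interface and one binds elsewhere, the selection returns one representative per cluster, so a minority binding mode still places a model in the final list — the behaviour that motivates the clustering step.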
Year: 2016 PMID: 27846259 PMCID: PMC5112798 DOI: 10.1371/journal.pone.0166460
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Scoring results for the analysed targets/interfaces.
| Target.Interface | # models | NL | %NL | H | M | A | I | R | CONSRANK | S25 | S30 | C40 | C50 | C60 | C80 | MC200 | MC/5 | MC/10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 2083 | 144 | 6.9 | 2 | 72 | 70 | 1629 | 310 | 10/9* | 2/2* | 2/2* | 4/3* | 4/2* | 4/2* | 3/2* | 2/1* | 2/1* | 2/1* |
|  | 1343 | 2 | 0.15 | 0 | 0 | 2 | 1104 | 237 | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* |
|  | 599 | 15 | 2.5 | 0 | 3 | 12 | 557 | 27 | 0/0* | 0/0* | 2/0* | 0/0* | 0/0* | 1/0* | 1/0* | 2/0* | 2/0* | 1/0 |
|  | 499 | 2 | 0.40 | 0 | 0 | 2 | 465 | 32 | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* |
|  | 1500 | 78 | 5.2 | 8 | 35 | 35 | 1060 | 362 | 9/9* | 0/0* | 0/0* | 3/2* | 1/1* | 0/0* | 0/0* | 1/1* | 1/1* | 2/1* |
|  | 1400 | 4 | 0.29 | 0 | 3 | 1 | 1257 | 139 | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 1/1* |
|  | 2180 | 354 | 16 | 90 | 141 | 123 | 1531 | 295 | 10/10* | 2/2* | 1/1* | 5/4* | 6/4* | 7/5* | 6/4* | 1/1* | 4/4* | 2/2* |
|  | 2180 | 134 | 6.2 | 86 | 22 | 26 | 1751 | 295 | 0/0* | 1/1* | 1/1* | 2/2* | 1/1* | 1/1* | 1/1* | 1/1* | 1/1* | 1/1* |
|  | 1200 | 299 | 25 | 2 | 99 | 198 | 730 | 171 | 10/5* | 1/0* | 1/0* | 3/1* | 5/2* | 5/2* | 4/2* | 4/1* | 5/2* | 3/1* |
|  | 1699 | 24 | 1.4 | 0 | 0 | 24 | 1297 | 378 | 0/0* | 0/0* | 1/0* | 0/0* | 0/0* | 1/0* | 0/0* | 1/0* | 0/0* | 0/0* |
|  | 1051 | 600 | 57 | 278 | 301 | 21 | 388 | 63 | 10/10* | 1/1* | 1/1* | 8/8* | 6/6* | 4/4* | 1/1* | 6/6* | 8/8* | 2/2* |
|  | 1451 | 124 | 8.6 | 0 | 35 | 89 | 1141 | 184 | 0/0* | 2/0* | 2/1* | 0/0* | 1/0* | 1/1* | 2/0* | 3/1* | 2/0* | 2/2* |
|  | 1400 | 101 | 7.2 | 0 | 9 | 92 | 1090 | 209 | 3/0* | 1/0* | 1/0* | 0/0* | 1/0* | 2/0* | 1/0* | 1/0* | 1/0* | 1/0* |
|  | 1400 | 19 | 1.4 | 0 | 1 | 18 | 1195 | 185 | 0/0* | 0/0* | 0/0* | 1/0* | 0/0* | 0/0* | 0/0* | 1/0* | 0/0* | 1/0* |
|  | 914 | 6 | 0.66 | 0 | 0 | 6 | 659 | 249 | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* |
|  | 999 | 20 | 2.0 | 0 | 7 | 13 | 701 | 278 | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* |
|  | 999 | 63 | 6.3 | 0 | 7 | 56 | 658 | 278 | 0/0* | 2/0* | 2/0* | 3/0* | 3/0* | 1/0* | 2/0* | 2/0* | 2/0* | 1/0* |
|  | 999 | 2 | 0.20 | 0 | 0 | 2 | 719 | 278 | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* | 0/0* |
|  | 1010 | 30 | 3.0 | 0 | 5 | 25 | 942 | 38 | 0/0* | 1/0* | 1/0* | 1/0* | 1/0* | 1/0* | 0/0* | 0/0* | 0/0* | 1/0* |
|  | 1010 | 25 | 2.5 | 0 | 8 | 17 | 947 | 38 | 0/0* | 1/0* | 1/0* | 2/1* | 1/0* | 2/1* | 1/0* | 1/0* | 1/0* | 1/0* |
Columns 1–9: features of the analysed targets/interfaces. H, M, A, I and R indicate the numbers of high-quality, medium-quality, acceptable, incorrect and removed models; NL stands for native-like, i.e. the sum of the H, M and A models. Columns 10–19: results of the scoring with the original CONSRANK algorithm and with the different combined CONSRANK-clustering approaches. Each column reports the total number of NL/H+M* models per target/interface. "S" stands for Single and "C" for Complete linkage, with the number indicating the contact threshold used in the clustering; MC200, MC/5 and MC/10 indicate a clustering approach with the maximum number of clusters fixed to 200, or to 1/5 or 1/10 of the total number of models per target, respectively (see Methods). Positive results were highlighted in cyan in the original table.
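The three maximum-cluster-number criteria in the footnote (MC200, MC/5, MC/10) amount to a simple cap on how many clusters are generated per target. A minimal sketch, with a hypothetical helper name:

```python
def max_clusters(n_models, rule):
    """Cap on the number of clusters per target (hypothetical helper):
    MC200 fixes it at 200; MC/5 and MC/10 scale it with the ensemble size."""
    if rule == "MC200":
        return min(200, n_models)
    if rule == "MC/5":
        return max(1, n_models // 5)
    if rule == "MC/10":
        return max(1, n_models // 10)
    raise ValueError(f"unknown rule: {rule}")
```

For a target with 1451 models, MC/10 would thus allow at most 145 clusters, while MC200 would allow 200 regardless of ensemble size.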
Fig 1. Schematic representation of the CONSRANK and Clust-CONSRANK workflow.
Number of interfaces for which at least one acceptable/high-medium quality (*) solution has been selected by each scoring approach.
| Method | Interfaces with ≥ 1 NL/HM* |
|---|---|
| CONSRANK | 6/5* |
| S25 | 10/4* |
| S30 | 12/5* |
| C40 | 10/7* |
| C50 | 11/6* |
| C60 | 12/7* |
| C80 | 10/5* |
| MC200 | 13/7* |
| MC/5 | 11/6* |
| MC/10 | 14/8* |
Fig 2. T50 scoring.
(a) Contact map of the X-ray structure, obtained with COCOMAPS [28] (left), and consensus map from the 1451 available models (right). (b, c) 3D representation of the T50 target experimental structure together with the models selected by CONSRANK (b) and by Clust-CONSRANK MC/10 (c). The X-ray receptor and ligand are colored silver and gold, respectively. Ligands of the models selected by CONSRANK are colored deep blue, while incorrect and correct solutions selected by MC/10 are colored light blue and hot pink, respectively.
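A consensus map like the one in panel (a) is, in essence, the per-residue-pair contact frequency over the model ensemble. A minimal sketch of that computation (the function name is hypothetical, and contacts are assumed to be 0-indexed `(receptor_residue, ligand_residue)` pairs; the published maps were produced with the COCOMAPS/CONSRANK tools):

```python
def consensus_map(models, n_res_rec, n_res_lig):
    """Consensus contact map: for each receptor-ligand residue pair,
    the fraction of models in which that pair is in contact."""
    grid = [[0.0] * n_res_lig for _ in range(n_res_rec)]
    for contacts in models:
        for i, j in contacts:
            grid[i][j] += 1.0
    n = len(models)
    for row in grid:
        for j in range(n_res_lig):
            row[j] /= n  # count -> frequency
    return grid
```

Hot spots in such a map mark the contacts most conserved across the ensemble, which is exactly the signal CONSRANK rewards when ranking models.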
Fig 3. T86 scoring.
(a) Consensus map (from the 1010 models) and contact maps of the two assessed target interfaces (above), and consensus maps from the models in the 2nd, 3rd and 7th MC/10 clusters (below). Highlighted regions in the maps correspond to specific models/interfaces; for the color code, see below. (b, c) 3D representation of the T86 target experimental structure together with the models selected by CONSRANK (b) and by Clust-CONSRANK MC/10 (c). The X-ray receptor is colored silver, while the ligands at interfaces 1 and 2 are colored gold and copper, respectively. Ligands of the models selected by CONSRANK are colored deep blue, incorrect solutions selected by MC/10 are colored light blue, and correct solutions matching interfaces 1 and 2 are colored hot pink and green, respectively.