Shahabeddin Sotudian1, Israel T Desta2, Nasser Hashemi1, Shahrooz Zarbafian1, Dima Kozakov3, Pirooz Vakili1, Sandor Vajda2,4, Ioannis Ch Paschalidis1,2,5.
Abstract
We develop a Regression-based Ranking by Pairwise Cluster Comparisons (RRPCC) method to rank clusters of similar protein complex conformations generated by an underlying docking program. The method leverages robust regression to predict the relative quality difference between any pair of clusters and combines these pairwise assessments to form a ranked list of clusters, from higher to lower quality. We apply RRPCC to clusters produced by the automated docking server ClusPro and, depending on the training/validation strategy, we show improvement by 24-100% in ranking acceptable or better quality clusters first, and by 15-100% in ranking medium or better quality clusters first. We compare the RRPCC-ClusPro combination to a number of alternatives, and show that very different machine learning approaches to scoring docked structures yield similar success rates. Finally, we discuss the current limitations on sampling and scoring, looking ahead to further improvements. Interestingly, some features important for improved scoring are internal energy terms that occur only due to the local energy minimization applied in the refinement stage following rigid body docking.
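The idea of turning pairwise predictions into a ranked list can be illustrated with a minimal sketch. The abstract does not state how the pairwise assessments are combined, so the rule used here (summing each cluster's net predicted advantage over all other clusters) is an assumption, not the authors' procedure. The names `rank_by_pairwise_differences`, `toy_predict_diff`, `WEIGHTS`, and the toy feature values are all hypothetical; `toy_predict_diff` stands in for a trained robust regressor.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]


def rank_by_pairwise_differences(
    clusters: Dict[str, Features],
    predict_diff: Callable[[Features, Features], float],
) -> List[str]:
    """Rank cluster ids from higher to lower predicted quality.

    predict_diff(a, b) is a hypothetical stand-in for a trained regressor:
    a positive value means cluster a is predicted to be better than b.
    The aggregation rule (net sum of differences) is an assumption.
    """
    net = {cid: 0.0 for cid in clusters}
    for a, b in combinations(clusters, 2):
        d = predict_diff(clusters[a], clusters[b])
        net[a] += d  # a gains what it is predicted to win by
        net[b] -= d  # b loses the same amount
    # Highest net advantage first; ties broken by id for determinism.
    return sorted(net, key=lambda cid: (-net[cid], cid))


# Toy predictor: a linear model on the feature difference, made-up weights.
WEIGHTS = (1.0, -0.5)


def toy_predict_diff(a: Features, b: Features) -> float:
    return sum(w * (x - y) for w, x, y in zip(WEIGHTS, a, b))


# Made-up two-feature descriptions of three clusters.
toy_clusters = {"c1": (0.2, 1.0), "c2": (0.9, 0.4), "c3": (0.5, 0.5)}
ranking = rank_by_pairwise_differences(toy_clusters, toy_predict_diff)
print(ranking)
```

With a predictor that is linear in the feature difference, as in this toy case, the net-sum rule reduces to sorting by a single per-cluster score; the pairwise formulation matters more when the predictor is not of that form.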
Keywords: Machine learning; Protein docking; Ranking
Year: 2021 PMID: 33995918 PMCID: PMC8102165 DOI: 10.1016/j.csbj.2021.04.028
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Performance comparison of ClusPro and RRPCC using 3-fold cross-validation.
| 1 | 3 | 6 | 7 | 1 | 6 | 6 | 2 | 3 | 3 | 1 | 3 | 3 | 8 | 3 | ||
| 1 | 3 | 6 | 7 | 3 | 7 | 8 | 3 | 4 | 4 | 3 | 4 | 4 | 10 | 7 | ||
| 1 | 3 | 7 | 7 | 2 | 5 | 8 | 2 | 5 | 5 | 2 | 3 | 5 | 11 | 5 | ||
| 9 (50%) | 19 | 21 | 6 | 18 | 22 | 7 (17%) | 12 | 12 | 6 | 10 | 12 | 29 | 15 | |||
| 10 | 9 | 16 | 17 | 6 | 16 | 16 | 4 | 7 | 9 | 3 | 7 | 7 | 21 | 14 | ||
| 10 | 9 | 14 | 18 | 8 | 14 | 17 | 4 | 6 | 10 | 4 | 7 | 9 | 23 | 10 | ||
| 10 | 3 | 12 | 17 | 3 | 13 | 17 | 3 | 5 | 7 | 3 | 5 | 7 | 22 | 12 | ||
| 21 (24%) | 42 | 52 | 17 | 43 | 50 | 11 (10%) | 18 | 26 | 10 | 19 | 23 | 66 | 36 | |||
| 10 | 3 | 8 | 9 | 2 | 8 | 9 | 1 | 2 | 2 | 0 | 2 | 2 | 11 | 4 | ||
| 10 | 5 | 8 | 10 | 4 | 8 | 9 | 1 | 3 | 4 | 1 | 2 | 4 | 16 | 6 | ||
| 10 | 8 | 10 | 12 | 8 | 10 | 12 | 3 | 5 | 5 | 3 | 5 | 5 | 19 | 7 | ||
| 16 (14%) | 26 | 31 | 14 | 26 | 30 | 5 (25%) | 10 | 11 | 4 | 9 | 11 | 46 | 17 | |||
Performance comparison of ClusPro and RRPCC using BM5 additions as the test set.
| 10 | 4 (100%) | 7 | 8 | 2 | 7 | 7 | 3 (50%) | 3 | 3 | 2 | 3 | 3 | ||
| 1 | 4 (100%) | 7 | 7 | 2 | 7 | 7 | 2 (100%) | 4 | 4 | 1 | 4 | 4 | ||
| 1 | 2 (100%) | 7 | 7 | 1 | 6 | 7 | 1 | 1 | 1 | 0 | 1 | 1 | ||
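
The parenthesized percentages in the two tables above appear to be relative gains over a baseline count in the same row, for example `9 (50%)` alongside a count of 6, or `4 (100%)` alongside a count of 2. Because the column headers were lost, which column is the baseline is an inference from the arithmetic, not something the tables state. A minimal sketch of that calculation, with the hypothetical helper name `relative_improvement_pct`:

```python
def relative_improvement_pct(new_count: int, baseline_count: int) -> int:
    """Relative gain of new_count over baseline_count, as a rounded percent."""
    if baseline_count <= 0:
        raise ValueError("baseline_count must be positive")
    return round(100 * (new_count - baseline_count) / baseline_count)


# (new, baseline) pairs read off the table rows; the pairing is an assumption.
examples = [(9, 6), (7, 6), (21, 17), (4, 2)]
percentages = [relative_improvement_pct(n, b) for n, b in examples]
print(percentages)
```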
Case-by-case results for ClusPro with and without RRPCC-based ranking, compared with ATTRACT results ranked by SOAP and PAUL-SOAP.
| PDBID | Type | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2VXT | A | – | * | * | – | * | * | * | * | * | * | * | * |
| 3L5W | A | – | – | – | – | – | * | * | * | * | * | * | * |
| 3MXW | A | * | * | * | * | * | * | * | * | * | * | * | * |
| 4DN4 | A | – | * | * | – | * | * | – | – | * | – | * | * |
| 4G6J | A | – | – | – | – | – | – | – | – | – | – | – | – |
| 4G6M | A | – | * | * | * | * | * | – | * | * | – | * | * |
| 2A1A | E | – | – | – | – | – | – | – | – | – | – | – | – |
| 2GAF | E | – | * | * | – | – | – | – | – | – | – | – | – |
| 2YVJ | E | * | * | * | * | * | * | * | * | * | * | * | * |
| 3A4S | E | * | * | * | * | * | * | – | * | * | * | * | * |
| 3H11 | E | – | * | * | * | * | * | – | * | * | – | * | * |
| 3K75 | E | – | – | – | – | – | – | – | – | – | – | – | – |
| 3PC8 | E | – | – | – | * | * | * | * | * | * | * | * | * |
| 3VLB | E | – | * | * | – | * | * | * | * | * | * | * | * |
| 4HX3 | E | – | – | – | – | – | – | – | – | – | – | – | – |
| 1M27 | O | – | * | * | – | * | * | – | – | – | – | – | – |
| 2GTP | O | – | – | – | – | – | – | – | – | – | – | – | – |
| 2X9A | O | – | * | * | – | * | * | – | – | – | – | – | – |
| 3BX7 | O | – | – | * | – | * | * | – | – | – | – | – | – |
| 3DAW | O | – | * | * | * | * | * | – | – | * | – | – | * |
| 3F1P | O | – | – | – | – | * | * | – | * | * | – | * | * |
| 3L89 | O | – | * | * | – | * | * | – | – | * | – | * | * |
| 3S9D | O | * | * | * | * | * | * | * | * | * | * | * | * |
| Total | A | 1 | 4 | 4 | 2 | 4 | 5 | 3 | 4 | 5 | 3 | 5 | 5 |
| Total | E | 2 | 5 | 5 | 4 | 5 | 5 | 3 | 5 | 5 | 4 | 5 | 5 |
| Total | O | 1 | 5 | 6 | 2 | 7 | 7 | 1 | 2 | 4 | 1 | 3 | 4 |
| Total | All | 4 | 14 | 15 | 8 | 16 | 17 | 7 | 11 | 14 | 8 | 13 | 14 |

Type: Antibody-containing (A), Enzyme-containing (E), and Others (O).
Comparing RRPCC-based ranking with the original ClusPro and SWARMDOCK.
| 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 3 | 7 | 7 | 6 | 7 | 7 | 5 | 9 | 9 | 5 | 10 | 12 | |
| 2 | 12 | 13 | 4 | 13 | 14 | 0 | 5 | 6 | 7 | 8 | 10 | |
| Total | 5 | 20 | 21 | 10 | 21 | 22 | 5 | 14 | 15 | 12 | 18 | 22 |
Fig. 1. The most important features in Enzymes. The first two features, var-mem-Elec (variance of the Coulombic electrostatic potential of each member of the cluster) and mincomp-hbond-sr-bb (backbone-backbone hydrogen bonds close in primary sequence), are both relevant to intermolecular interactions.
Fig. 2. The most important features in Antibodies. The first two features are movedcomp-rama-prepro (Ramachandran preferences) and var-mem-Elec (variance of the Coulombic electrostatic potential of each member of the cluster).
Fig. 3. The most important features in Others. The first three features are piper-ref (reference energy of each amino acid), mincomp-pro-close, and piper-pro-close (proline ring closure energy and energy of the psi angle of the preceding residue).
Fig. 4. The eight features common to all three classes of protein complexes.