Zeynep Kurkcuoglu1, Alexandre M J J Bonvin1. 1. Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, the Netherlands.
Abstract
Incorporating the dynamic nature of biomolecules in the modeling of their complexes is a challenge, especially when the extent and direction of the conformational changes taking place upon binding is unknown. Estimating whether the binding of a biomolecule to its partner(s) occurs in a conformational state accessible to its unbound form ("conformational selection") and/or the binding process induces conformational changes ("induced-fit") is another challenge. We propose here a method combining conformational sampling using ClustENM-an elastic network-based modeling procedure-with docking using HADDOCK, in a framework that incorporates conformational selection and induced-fit effects upon binding. The extent of the applied deformation is estimated from its energetical costs, inspired from mechanical tensile testing on materials. We applied our pre- and post-docking sampling of conformational changes to the flexible multidomain protein-protein docking benchmark and a subset of the protein-DNA docking benchmark. Our ClustENM-HADDOCK approach produced acceptable to medium quality models in 7/11 and 5/6 cases for the protein-protein and protein-DNA complexes, respectively. The conformational selection (sampling prior to docking) has the highest impact on the quality of the docked models for the protein-protein complexes. The induced-fit stage of the pipeline (post-sampling), however, improved the quality of the final models for the protein-DNA complexes. Compared to previously described strategies to handle conformational changes, ClustENM-HADDOCK performs better than two-body docking in protein-protein cases but worse than a flexible multidomain docking approach. However, it does show a better or similar performance compared to previous protein-DNA docking approaches, which makes it a suitable alternative.
Incorporating the dynamic nature of biomolecules in the modeling of their complexes is a challenge, especially when the extent and direction of the conformational changes taking place upon binding is unknown. Estimating whether the binding of a biomolecule to its partner(s) occurs in a conformational state accessible to its unbound form ("conformational selection") and/or the binding process induces conformational changes ("induced-fit") is another challenge. We propose here a method combining conformational sampling using ClustENM-an elastic network-based modeling procedure-with docking using HADDOCK, in a framework that incorporates conformational selection and induced-fit effects upon binding. The extent of the applied deformation is estimated from its energetical costs, inspired from mechanical tensile testing on materials. We applied our pre- and post-docking sampling of conformational changes to the flexible multidomain protein-protein docking benchmark and a subset of the protein-DNA docking benchmark. Our ClustENM-HADDOCK approach produced acceptable to medium quality models in 7/11 and 5/6 cases for the protein-protein and protein-DNA complexes, respectively. The conformational selection (sampling prior to docking) has the highest impact on the quality of the docked models for the protein-protein complexes. The induced-fit stage of the pipeline (post-sampling), however, improved the quality of the final models for the protein-DNA complexes. Compared to previously described strategies to handle conformational changes, ClustENM-HADDOCK performs better than two-body docking in protein-protein cases but worse than a flexible multidomain docking approach. However, it does show a better or similar performance compared to previous protein-DNA docking approaches, which makes it a suitable alternative.
The dynamic nature of biomolecules enables them to fulfill their functions in the cell such as catalysis, signaling, and regulation, while interacting with other bioentities. Understanding their structure and dynamics is therefore key to elucidate biological mechanisms, identify the underlying causes of diseases at molecular level and develop new therapeutical strategies. The extent of biomolecular flexibility can range from local side chain rotations of a few angstroms to global displacements like domain motions in the order of nanometers.1 Although experimental methods like X‐ray crystallography, Nuclear Magnetic Resonance (NMR) and cryo‐Electron Microscopy (EM) reveal biomolecular structures at atomic resolution, information about their mobility and flexibility is challenging to assess, especially for highly flexible biomolecules or supramolecules. Therefore, there is a need for complementary computational techniques to reveal the flexible properties and dynamics of such systems.2, 3, 4In regard to molecular recognition mechanisms, the puzzle still exists whether a biomolecule binds to a pre‐existing conformer out of the pool of conformers existing in solution for a given partner (“conformational selection”) or rather conformational changes are induced upon binding (“induced‐fit”).5, 6 There is increasing evidence showing that different states are indeed accessible in the absence of the partner6, 7 but it is also likely that both mechanisms play a role in molecular recognition.8, 9 A study on a set of complexes with rather limited conformational changes10 has addressed the ability of current computational techniques to capture the conformational changes upon binding and concluded that current “conformational selection” methods can capture 22% of unbound‐bound transitions, “induced‐fit” methods about 57%; but backbone motions (21%) are not yet adequately covered by any technique.Previously, ClustENM has been developed as an iterative and unbiased conformer generation technique, combining global modes from Elastic Network Model (ENM), with energy minimization and clustering.11, 12 ClustENM was applied to systems of different sizes and oligomeric states including ribosome,11, 13 highly flexible proteins like adenylate kinase and calmodulin,11, 12 and to protein‐small ligand docking.12 The generated conformers were in good agreement with experimental structures and conformations from molecular dynamics simulations. This illustrated the efficiency of ClustENM especially for large systems where molecular dynamics could become computationally time‐consuming and encounter difficulties in sampling large conformational changes. The method also has the potential to capture induced‐fit effects upon binding, as demonstrated for calmodulin where a peptide was docked onto an intermediate state between the unbound and bound forms: The conformational sampling applied to the docked complex indeed allowed to sample the bound state.12Following this observation, we propose here a method combining molecular docking with conformer generation by ClustENM, to account both for conformational selection and induced‐fit binding mechanisms. For this, we combine conformational sampling of the individual partners (pre‐docking sampling) with that of the docked models (post‐docking sampling). As molecular docking tool we use HADDOCK,14, 15, 16 an information‐driven approach for the modeling of biomolecular complexes, which has demonstrated sustained performance in CAPRI ‐ the blind modeling experiment for the prediction of complexes,17, 18 and is a widely used method for integrative modeling of biomolecular complexes.19 HADDOCK can handle mixed systems of proteins, nucleic acids and small molecules.To assess the performance of our ClustENM‐HADDOCK approach, we apply it to the protein‐protein flexible multidomain docking benchmark20 and to a subset of the protein‐DNA benchmark.21 Both datasets were used in previous studies focusing on modeling conformational changes, which allows us to make performance comparisons. Notably, we apply our method without prior knowledge of the extent and direction of the conformational changes.
MATERIALS AND METHODS
Dataset
We applied our method on two different datasets:The flexible multidomain docking benchmark (FMD)20 (Table 1A), where the ligands (smaller partner) are mainly rigid (0.5‐1.7 å) while the receptors undergo varying degrees of conformational change (between 1.6‐19.6 å) corresponding mainly to domain‐domain motions. In this case, in order to concentrate on the conformational change aspects, we assume we have an ideal interface information.
Table 1
List of protein‐protein (A) and protein‐DNA complexes (B) used for benchmarking
A. Protein‐protein flexible multidomain dataset20
Complex ID
Receptor ID
Ligand ID
Backbone conformational changes [å]
Receptor
Ligand
1IRA37
1G0Y_R38
1ILR_139
19.6
0.7
1H1V40
1D0N_B41
1IJJ_B42
13.8
1.6
1Y6443
1UX5_A44
2FXU_A45
10.3
1.1
1F6M46
1CL0_A47
2TIR_A48
7.3
0.9
1FAK49
1QFK_HL50
1TFH_B51
6.3
1.0
1ZLI52
2JTO_A53
1KWM_A54
4.0
0.6
1E4K55
3AVE_AB56
1FNL_A57
2.9
1.7
1IBR58
1F59_A59
1QG4_A60
2.9
1.1
1KKL61
1JB1_A62
2HPR63
2.2
0.5
1NPE64
1KLO_A65
1NPE_A64
1.9
‐
1DFJ66
2BNH_A67
9RSA_B68
1.6
0.7
A subset of the protein‐DNA benchmark21 for which experimental information about interfaces is available (Table 1B). Both protein and DNA (“ligand”) undergo conformational changes, but we focused here on the flexibility of the DNA.List of protein‐protein (A) and protein‐DNA complexes (B) used for benchmarkingBoth datasets have been previously used in benchmarking studies with HADDOCK. In the first case a multibody docking approach was followed in which the protein showing rigid‐body motion conformational changes was cut into domains with connectivity restraints between them,20 while in the second case a two steps docking strategy was followed in which DNA conformations from the first docking run were used to generate an ensemble of pre‐bent DNA conformations for a second, ensemble‐based docking stage.21
ClustENM method
ClustENM is an unbiased, iterative conformational sampling method combining ENM, energy minimization and clustering.11, 12 In ClustENM, first the initial structure is energetically minimized in implicit solvent, then a modified version of ENM (mixed resolution ENM)22 is applied to it to extract low frequency normal modes. The modified version of ENM employed in ClustENM involves placing the nodes at the residue centroids where node pairs within a cut‐off distance of 10 å are connected by elastic springs, thereby representing the residues as low‐resolution nodes. The spring constant connecting the node pairs is proportional to the total number of interacting atom pairs between the connected residues (bonded or nonbonded) within the specified cut‐off. Based on the sudden break in the smooth progression of eigenvalues in the eigenvalue spectrum, we define the number of modes “m” which are to be linearly combined using coefficients −1, 0 and 1 to obtain deformation vectors (3m vectors in total). The minimized structure is deformed in the direction of the deformation vectors with a fixed deformation RMSD d, where the deformation vector of a residue centroid is applied to all the atoms of that residue. The resulting structures are clustered based on mutual RMSD of heavy atoms with a cut‐off equal to d. The cluster containing the parent structure is discarded and a representative structure from each cluster is selected. Minimization‐ENM‐deformation‐clustering steps are applied on each representative structure for k generations.
Energy minimization
For proteins, the energy minimization in implicit solvent was performed using NAMD v2.1023 with the CHARMM22 force field.24 A generalized Born implicit solvent model was used with a 16 å cut‐off for non‐bonded interactions and a 14 å cut‐off for Born radius. The ion concentration was set to 0.3 M. 4000 steps of conjugate gradient were applied.The DNA structures and protein‐DNA complexes were minimized in implicit solvent using AMBER14 with the ff14SB force field parameters.25 The pairwise generalized Born model26, 27 was used with a 16 å cut‐off for non‐bonded interactions. We used the modified generalized Born theory‐based Debye‐Huckel limiting law for ion screening of interactions.28 The concentration of 1‐1 mobile counter‐ions in solution was set to 0.1 M. Steepest descent was applied for 500 steps followed by conjugate gradient with a maximum number of steps of 4000 and a convergence criterion of 0.01 kcal/mol/å.
Deformation Parameter d Estimation
The advantage of ClustENM is its computational efficiency in yielding conformers at atomic resolution and its reduction in the redundancy of the sampled conformations due to clustering. However, the degree of conformational change is usually arbitrarily defined by the user. To overcome this problem and inspired from mechanical tensile testing on material, we deformed the structures in the direction of the slow modes to assess the extent of flexibility. First, we determined the relevant low frequency modes based on the sudden break in eigenvalue spectrum; then we deformed the structure constantly in their direction and determined the energy of the deformed state by energy‐minimization. Based on the energy vs strain (deformation) curve and the reference energy state of the initial, non‐deformed structure, we assessed the maximum deformation the system could handle. The results of such energy‐deformation curves for the ribonuclease inhibitor structure (1DFJ) are shown in Figure 1. According to the consensus of four modes, the structure can handle a RMSD deformation of 1‐2 å, which actually covers the observed 1.6 å RMSD conformational change upon binding to its partner. This prediction scheme is applied to all receptors in the flexible multidomain protein docking benchmark and the DNA structures in the protein‐DNA dataset to determine the deformation RMSD d used in ClustENM. The same procedure is then applied to the docked complexes, to estimate d for post‐docking ClustENM using these complexes as starting point.
Figure 1
Tensile test on ribonuclease inhibitor structure along the direction of normal modes. Based on eigenvalue spectrum shown on top, the first four slowest modes are selected for application of tensile testing. The structure is first minimized (“Start”), then deformations of 1‐5 å are introduced in both negative and positive directions along each mode. To visualize the type of motions, the first and second slowest modes are also shown in the right panel. The energy of the structure increases as the deformation along an individual mode increases. The stable region corresponds to the deformations that keep the energy level below the reference state: For the first and second modes the energy starts increasing after 3 å, for the third mode after 2 å and the fourth mode after 1 å in both directions. The consensus of the four modes hints to a maximum deformation of 1‐2 å for this structure [Color figure can be viewed at http://wileyonlinelibrary.com]
Tensile test on ribonuclease inhibitor structure along the direction of normal modes. Based on eigenvalue spectrum shown on top, the first four slowest modes are selected for application of tensile testing. The structure is first minimized (“Start”), then deformations of 1‐5 å are introduced in both negative and positive directions along each mode. To visualize the type of motions, the first and second slowest modes are also shown in the right panel. The energy of the structure increases as the deformation along an individual mode increases. The stable region corresponds to the deformations that keep the energy level below the reference state: For the first and second modes the energy starts increasing after 3 å, for the third mode after 2 å and the fourth mode after 1 å in both directions. The consensus of the four modes hints to a maximum deformation of 1‐2 å for this structure [Color figure can be viewed at http://wileyonlinelibrary.com]The ClustENM parameters used for the conformer generation in the pre‐ and post‐sampling stages are listed in Table 2A and B for the FMD and protein‐DNA datasets, respectively.
Table 2
ClustENM pre‐ and post‐sampling parameters
A. Protein‐protein complexes of the FMD benchmark
Sampling prior to docking (receptor only)
Sampling after docking (on complex structure)a
Complexes
Deformation RMSD [å]
Number of modes
Deformation RMSD [å]
Number of modes
1IRA
3
3
1
5
1H1V
2
3
1
4
1Y64
4
3
2/3
3
1F6M
2
3
1
5/4
1FAK
2
3
1
3
1ZLI
1
3
1
5
1E4K
1
4
2/3
3
1IBR
2
5
1
5
1KKL
2
5
1/2
3
1NPE
3
4
1/2
3
1DFJ
2
4
1
4/5
The presence of two different values indicates the use of different parameters for the complexes. The first value was used for the complex from the best cluster and the second value for the second‐best cluster. Single value means same parameters were used for both complex.
ClustENM pre‐ and post‐sampling parametersThe presence of two different values indicates the use of different parameters for the complexes. The first value was used for the complex from the best cluster and the second value for the second‐best cluster. Single value means same parameters were used for both complex.
HADDOCK settings
We used the HADDOCK2.2 webserver16 for all dockings in this study. For the FMD benchmark, the docking was performed with “ideal” interface information, that is, the interface residues from one subunit within a 5 å distance from any atom of the partner were selected as active. No passive residues were defined. This effectively brings the interfaces together during the docking without predefining their relative orientation since no specific contacts are defined. Random removal of ambiguous interface restraints (AIR) was switched off. Since we use ensembles of conformations, the sampling was increased to 10 000/400/400 structures for rigid (it0), semi‐flexible(it1) and water stages, respectively. Clustering was performed using the fraction of common contacts (FCC)29 with a cut‐off of 0.75 and a minimum cluster size of 4.For protein‐DNA docking, the experimental restraints described in the study by van Dijk and Bonvin, 201021 were used to drive the docking. Since we do not use the ideal interface information here, random removal of AIRs was switched on. The sampling was increased to 10 000/400/400 for the it0/it1/water stages, respectively. Because of the high charge of DNA, the value for the dielectric constant was set to 78. FCC clustering was used with 0.75 cut‐off and a minimum cluster size of 4.Details on the data used to drive the docking for both protein‐protein and protein‐DNA systems are given in Tables S1 and S2 of the Supporting Information in Data S1.
HADDOCK scoring
The HADDOCK score was used to select the top pose of the top two cluster for the post‐docking sampling stage and also to score the models resulting from that post‐docking sampling. The HADDOCK scoring function (HS)17, 30 is a linear weighted sum of energetic terms:where , , and stand for van der Waals, Coulomb electrostatics, desolvation and restraint energies, respectively. The non‐bonded components of the score (, ) were calculated with the OPLS forcefield,31 the desolvation energy is a solvent accessible surface area‐dependent empirical term32 which estimates the energetic gain or penalty of burying specific sidechains upon complex formation. The restraint energy term was only used for ranking the docking poses but was excluded for the scoring of the post‐docking sampling models.
Quality assessment
The quality of the modeled complexes was assessed based on the well‐established CAPRI criteria33:Interface RMSD (i‐RMSD): Backbone RMSD of the interface residues, defined as all residues having at least one atom within 10 å of an atom on the other partnerLigand RMSD (l‐RMSD): Backbone RMSD of the ligand (smaller partner in the complex) after fitting on the backbone of the receptorFraction of native contacts (fnat): The number of correct residue‐residue contacts in the predicted complex divided by the number of residue‐residue contacts in the target complex. Two residues are defined to be in contact if they have any atom within 5 å distance to each other.Using i‐RMSD, l‐RMSD and fnat, the quality of models is defined as:High (three‐stars): fnat ≥0.5 with l‐RMSD ≤1 å or i‐RMSD ≤1 åMedium (two‐stars): fnat ≥0.3 with 1 å < l‐RMSD ≤5 å or 1 å < i‐RMSD ≤2 åAcceptable (one‐star): fnat ≥0.1 with 5 å < l‐RMSD ≤10 å or 2 å < i‐RMSD ≤4 åIncorrect (zero‐star): fnat <0.1
General workflow
The modeling pipeline combines conformational selection and induced‐fit mechanisms using conformational sampling prior and after complexation, respectively. Pre‐sampling was applied to the receptor in the case of protein‐protein docking, and to the DNA for the protein‐DNA dataset. The pipeline consists of the following steps (also shown in Figure 2):
Figure 2
Pipeline for ClustENM‐HADDOCK [Color figure can be viewed at http://wileyonlinelibrary.com]
The initial structure is energetically minimized and subjected to elastic network modeling to define the parameters for ClustENM. The eigenvalue spectrum is extracted to decide on the number of modes (“m”) to be used. The deformation test is applied to specify the deformation RMSD (“d”) in ClustENM.Conformers are generated using ClustENM with the parameters defined in Step 1. For the proteins in the FMD dataset the number of generations is set to two for each system. For the DNA structures in the protein‐DNA dataset we increased the number of generations to four since DNA conformational sampling is computationally much cheaper due to the DNA's smaller size. During the generation, clusters containing the parent structures are discarded.A maximum of 20 conformations are selected based on the minimization energy, to be used in ensemble docking. The initial structure is included in the ensemble regardless of its energy value.Ensemble semi‐flexible docking is performed in HADDOCK.The best scoring docked model from the best two scoring clusters are selected for the second round of ClustENM (post‐sampling).Parameters for ClustENM are determined for each structure (same as in Step 1)ClustENM is applied to each selected docked complex for two generations using parameters from Step 6 resulting in the final models.The final models are scored using HADDOCK scoring and their quality is assessed.Pipeline for ClustENM‐HADDOCK [Color figure can be viewed at http://wileyonlinelibrary.com]
RESULTS AND DISCUSSION
Sampling prior to docking
Protein‐protein cases from the FMD benchmark
Starting from the unbound state of the receptors in the FMD benchmark, two generations of ClustENM sampling were applied prior to docking. The RMSDs (closest and range) of the receptor conformers to the reference bound state are given in Table 3 and the resulting ClustENM ensembles are shown in Figure 3. The RMSD range for the ensemble reflects the blind sampling of ClustENM: Some of the sampled conformations get closer to the bound state and others move further away from it. The number of conformers at the end of two generations of sampling is not so large thanks to the clustering steps in ClustENM. Two generations of ClustENM is clearly not sufficient to reach bound‐like conformations, especially for the challenging cases like 1IRA and 1H1V, where the RMSD of the closest structure to the bound state is almost the same as the unbound state (less than 1 å deviation from unbound). Figure 3 shows the generated ensemble for each case, which also reveals the expected‐to‐fail cases of 1IRA, 1H1V, 1Y64, 1FAK where the large domain motions could not be sampled in two generations. The remaining cases however do not reflect this issue.
Table 3
RMSD values from the bound conformation for the receptors in FMD benchmark
Complex ID
Receptor RMSD [å] unbound
Receptor RMSD [å] closest generated
Receptor RMSD [å] range in ensemble
Number of conformers inensemble
1IRA
19.6
19.1
19.1‐20.4
8
1H1V
13.8
13.2
13.2‐14.0
11
1Y64
10.3
6.98
6.98‐17.0
16
1F6M
7.32
6.40
6.40‐10.0
17
1FAK
6.33
5.57
5.57‐7.58
16
1ZLI
4.04
3.73
3.73‐5.06
15
1E4K
2.91
2.85
2.85‐3.53
6
1IBR
2.96
1.55
1.55‐6.26
20
1KKL
2.64
2.72
2.72‐3.34
8
1NPE
1.91
1.52
1.52‐4.86
22
1DFJ
1.55
1.09
1.09‐4.61
20
Figure 3
Ensembles of receptor conformers generated by ClustENM for protein‐protein docking. The starting structure is shown in blue, the bound conformation in red and the ClustENM generated conformers in green [Color figure can be viewed at http://wileyonlinelibrary.com]
RMSD values from the bound conformation for the receptors in FMD benchmarkEnsembles of receptor conformers generated by ClustENM for protein‐protein docking. The starting structure is shown in blue, the bound conformation in red and the ClustENM generated conformers in green [Color figure can be viewed at http://wileyonlinelibrary.com]
Protein‐DNA dataset
A standard B‐DNA conformation built using our 3D‐DART webserver34 was the starting conformation for all protein‐DNA cases. We applied four generations of ClustENM on the DNA structures. The RMSD ranges of the resulting ensemble are given in Table 4. As in the case of the protein FMD benchmark, both close‐to‐bound and further‐away DNA conformations were sampled. Figure 4 shows the sampled DNA conformers for each case.
Table 4
RMSD values from the bound conformation for the DNA structures in the protein‐DNA dataset
Complex ID
RMSD [å]B‐DNA
RMSD [å]Closest generated
RMSD range [å]Ensemble
Number of conformers inensemble
1BY4
1.63
1.71
1.63‐4.01
20
3CRO
2.66
1.69
1.79‐4.06
20
1AZP
3.95
2.90
2.90‐4.12
20
1JJ4
3.66
2.00
2.00‐7.80
20
1A74
8.34
4.98
4.98‐9.18
20
1ZME
4.77
2.58
2.58‐5.82
20
Figure 4
Ensembles of DNA conformers generated by ClustENM for protein‐DNA dockings. Starting structures are shown in blue, bound structure in red and the generated conformers in orange [Color figure can be viewed at http://wileyonlinelibrary.com]
RMSD values from the bound conformation for the DNA structures in the protein‐DNA datasetEnsembles of DNA conformers generated by ClustENM for protein‐DNA dockings. Starting structures are shown in blue, bound structure in red and the generated conformers in orange [Color figure can be viewed at http://wileyonlinelibrary.com]
Docking
After the pre‐docking conformational sampling, the resulting ensembles of conformers were used for ensemble docking with HADDOCK following the settings specified in Materials and Methods. This docking phase corresponds to the “conformational selection” part of our pipeline (with some induced fit as well since the flexible stage of HADDOCK allows for limited conformational changes). In this stage, ideally, HADDOCK should select the best conformers in its docking workflow from rigid body to final flexible refinement in explicit solvent. The best scoring docking pose from the top 2 ranked clusters were subsequently selected for the post‐docking induced‐fit part of the pipeline. The selected poses and their i‐RMSD, l‐RMSD, fnat and CAPRI quality metrics are shown in Figures 5 and 6 for the FMD and protein‐DNA docking, respectively.
Figure 5
Protein‐protein docking top pose from the best two clusters of HADDOCK (green receptor/blue ligand) superimposed onto bound structure (red) [Color figure can be viewed at http://wileyonlinelibrary.com]
Figure 6
Protein‐DNA docking top pose from the best two clusters of HADDOCK (green receptor/orange DNA) superimposed on bound structure (gray) [Color figure can be viewed at http://wileyonlinelibrary.com]
Protein‐protein docking top pose from the best two clusters of HADDOCK (green receptor/blue ligand) superimposed onto bound structure (red) [Color figure can be viewed at http://wileyonlinelibrary.com]Protein‐DNA docking top pose from the best two clusters of HADDOCK (green receptor/orange DNA) superimposed on bound structure (gray) [Color figure can be viewed at http://wileyonlinelibrary.com]In the FMD dataset, we were able to obtain models of at least acceptable quality in 7/11 cases. The failed cases are, as expected, 1IRA, 1H1V, 1Y64, and 1FAK, which all undergo large domain motions. As already explained in the “Sampling prior docking” section, the conformational sampling using only two generations was clearly not sufficient to model such large conformational change in these cases.As for the protein‐DNA complexes, HADDOCK was able to obtain at least acceptable quality structures in 4/6 cases as shown in Figure 6. The binding mode in 1AZP is reverted (180° rotation), which already nullifies its chance to be a successful case at the end of the pipeline. In this particular case, however, near native poses were generated during docking but did not end up in the top 2 clusters.
Post‐docking sampling
We applied a second round of ClustENM to the best ranking model of the top 2 ranked clusters obtained from docking to perform the “induced‐fit” stage of our pipeline. The resulting models were ranked using the HADDOCK score calculated after a short energy minimization of each model.17 The results for protein‐protein and protein‐DNA systems are presented in the following sections.The quality of the starting and final conformers is summarized in Table 5, together with the rank of the best generated conformer. Additionally, we show the change in i‐RMSD, l‐RMSD and fnat through the “generations” of post‐sampling ClustENM in Figures S1–S3 in Data S1, respectively. We were able to obtain acceptable/medium quality (1‐2‐star) poses ranked within the top 20 in 7/11 cases, 4 of which were ranked as top 1. Unsurprisingly, the failed cases are no other than 1IRA, 1H1V, 1Y64 and 1FAK. Figures S1–S3 in Data S1 indicate that the quality of the starting docked model (which is the result of pre‐sampling followed by docking) is the limiting factor for the final complex structures. Indeed the “induced‐fit” sampling phase does not dramatically change the quality of the final conformers for the FMD benchmark. Conformational sampling prior to docking is thus contributing more to the quality of the final complexes than post‐sampling in the case of the protein‐protein FMD benchmark. The final complex structures are shown in Figure 7.
Table 5
Quality and rank of the sampled conformers after the docking—FMD benchmark
Quality—Number of conformers
Quality/Rank
Complex ID
Starting docked models
All ‐post‐docking ClustENM generated complex models
Best conformer
First acceptable or better model
1IRA
0*,0**,0***—2
0*,0**,0***—58
—
—
1H1V
0*,0**,0***—2
0*,0**,0***—17
—
—
1Y64
0*,0**,0***—2
0*,0**,0***—31
—
—
1F6M
1*,0**,0***—2
22*,0**,0***—72
*/15
*/15
1FAK
0*,0**,0***—2
0*,0**,0***—30
‐
‐
1ZLI
1*,1**,0***—2
20*,7**,0***—30
**/1
**/1
1E4K
1*,0**,0***—2
13*,0**,0***—28
*/1
*/1
1IBR
1*,0**,0***—2
27*,0**,0***—63
*/1
*/1
1KKL
0*,1**,0***—2
12*,3**,0***—30
**/20
*/16
1NPE
1*,1*,0***—2
17*,17**,0***—34
**/15
*/1
1DFJ
0*,2**,0***—2
0*,18**,0***—18
**/1
**/1
Figure 7
Final protein‐protein complexes for the protein‐protein FMD benchmark. The receptor is shown in green, the ligand in blue and the reference bound structure in red. The quality of the best conformer (listed in Table 5) together with its i‐RMSD [å], l‐RMSD [å] and fnat is given for each protein‐protein complex. In cases where no acceptable or better model is present, the quality of best scoring complex is given [Color figure can be viewed at http://wileyonlinelibrary.com]
Quality and rank of the sampled conformers after the docking—FMD benchmarkFinal protein‐protein complexes for the protein‐protein FMD benchmark. The receptor is shown in green, the ligand in blue and the reference bound structure in red. The quality of the best conformer (listed in Table 5) together with its i‐RMSD [å], l‐RMSD [å] and fnat is given for each protein‐protein complex. In cases where no acceptable or better model is present, the quality of best scoring complex is given [Color figure can be viewed at http://wileyonlinelibrary.com]
Protein‐DNA cases
Table 6 shows the quality of the starting and final conformers with the rank of the best conformer for the protein‐DNA dataset. The change in i‐RMSD, l‐RMSD and fnat through the ClustENM generations are shown in Figures S4–S6 in Data S1, respectively. We obtained 1‐2 star poses for 5 out of 6 cases, 3 of which were ranked as top 1 and one as top 2 and the last one as 21st. As expected, there is no successful conformation generated for 1AZP.
Table 6
Quality and rank of the sampled conformers after the docking—protein‐DNA dataset
Quality—Number of conformers
Quality/Rank
Complex ID
Starting docked models
All post‐docking ClusENM generated complex models
Best conformer
First acceptable or better model
1BY4
1*, 1**, 0***—2
59*,15**,0***—75
**/1
**/1
3CRO
1*,1**,0***—2
84*,74**,0***—158
**/1
**/1
1AZP
0*,0**,0***—2
0*,0**,0***—174
—
—
1JJ4
1*,1**,0***—2
235*,52**,0***—287
**/2
*/1
1A74
2*,0**,0***—2
34*,0**,0***—53
*/1
*/1
1ZME
0*,0**,0***—2
5*,0**,0***—91
*/21
*/21
Quality and rank of the sampled conformers after the docking—protein‐DNA datasetDifferent from the protein‐protein FMD benchmark, sampling after docking for induced‐fit increases the quality of the models for the protein‐DNA cases as shown in Figures S4–S6 in Data S1. For example, in the case of 1ZME, we were able to obtain 1‐star quality structures by starting from zero quality docking poses. Nevertheless, the quality of the starting structures has again the most impact on the quality of the final conformers (shown in Figure 8).
Figure 8
Final ClustENM‐HADDOCK models for protein‐DNA complexes (green protein/orange DNA), aligned on bound structure (red). The quality of the best conformer (listed in Table 6) together with its i‐RMSD [å], l‐RMSD [å] and fnat is given for each protein‐DNA complex
Final ClustENM‐HADDOCK models for protein‐DNA complexes (green protein/orange DNA), aligned on bound structure (red). The quality of the best conformer (listed in Table 6) together with its i‐RMSD [å], l‐RMSD [å] and fnat is given for each protein‐DNA complex
Comparison with previous studies
FMD benchmark: ClustENM‐HADDOCK vs Multidomain Flexible docking and Two‐Body Docking
We compared the performance of our pipeline with that of the multidomain flexible docking and two‐body docking approaches reported in the study of Karaca and Bonvin20 (Table 7). For multidomain flexible docking, the possible hinge regions were predicted using HingeProt35; the proteins were cut at those hinges and connectivity restraints were defined between the domains, allowing rigid‐body motion conformational changes during docking. The two‐body docking was only allowing for limited conformational changes in the interface. The results indicate that the ClustENM‐HADDOCK approach performs much better than two‐body docking but worse than the multidomain flexible docking approach of Karaca and Bonvin.20 The latter is especially advantageous to effectively incorporate rigid‐body hinge motions compared to our elastic network‐based conformational sampling, where two generations of ClustENM are clearly insufficient for the systems undergoing extremely large conformational changes (e.g. 1IRA, 1H1V, 1Y64). This might however depend on the type of motions since, in a previous study,11 for the adenylate kinase system undergoing conformational change of 7 å for example, we were able to observe full closing and opening of the domains when we applied ClustENM for 5 generations.
Table 7
Comparison of ClustENM‐HADDOCK with previous protocols for FMD benchmark
ClustENM‐HADDOCK
Flexible Multidomain Docking results20
2‐Body Docking results20
Complex
Quality/Rank
i‐RMSD [å]
Fnat
Quality/Rank
i‐RMSD [å]
Fnat
Quality/Rank
i‐RMSD [å]
Fnat
1IRA
—
17.5
0.08
*/1
3.9
0.55
—
17.5
0.04
1H1V
—
9.3
0.21
*/11
4.6
0.49
—
11.9
0.08
1Y64
—
11.1
0.45
*/5
3.9
0.48
—
10.3
0.07
1F6M
*/15
5.2
0.36
*/1
3.5
0.69
—
14.1
0.00
1FAK
—
5.9
0.27
**/37
2.8
0.55
—
11.4
0.01
1ZLI
**/1
2.9
0.48
**/1
2.1
0.74
—
14.8
0.02
1E4K
*/1
3.9
0.48
**/5
2.3
0.70
*/1
4.1
0.58
1IBR
*/1
3.6
0.35
**/1
2.3
0.63
—
9.6
0.11
1KKL
**/20
2.1
0.57
**/1
2.2
0.67
*/1
3.1
0.56
1NPE
**/15
1.4
0.80
**/16
1.2
0.95
**/1
1.7
0.85
1DFJ
**/1
1.9
0.69
**/5
2.0
0.68
**/116
1.8
0.63
Comparison of ClustENM‐HADDOCK with previous protocols for FMD benchmark
Protein‐DNA dataset: ClustENM‐HADDOCK vs unbound‐unbound docking using canonical B‐DNA models and unbound‐unbound docking using custom‐build B‐DNA models
We compared our protein‐DNA results with those reported by van Dijk and Bonvin21 to assess our performance against (i) one round of unbound docking starting from canonical B‐DNA models (“Unbound flex”) and (ii) two‐rounds of unbound docking where the second round uses an ensemble of custom‐built bent B‐DNA models obtained from an analysis of the conformations obtained during the first docking stage (“DNA lib”). The results in Table 8 indicate that ClustENM‐HADDOCK approach performs better than “Unbound flex” in 4/6 cases, the same in 1 case (1AZP) and worse in one case (1ZME) when the top 10 best scoring structures are considered. Generated acceptable quality solutions for 1ZME from ClustENM‐HADDOCK were not scored in the top 10 unfortunately (rank of first acceptable model is 21st). The “DNA lib” approach on the other hand was able to obtain at least acceptable quality models in all cases. One thing to note is that for both “Unbound flex” and “DNA lib” approaches, a multidomain flexible docking method (ie, cutting from the hinge(s) and using connectivity restraints) was used for the protein in the 1ZME case, which prevents a one‐to‐one comparison.
Table 8
Comparison of the performance of ClustENM‐HADDOCK with the previous protocols by van Dijk and Bonvin21 for the protein‐DNA dataset
ClustENM‐HADDOCK
Unbound flex21
DNA lib21
Complex
Quality
i‐RMSD [å]
Fnat
Quality
i‐RMSD [å]
Fnat
Quality
i‐RMSD [å]
Fnat
1BY4
7*,3**,0***
4.55 (0.70)
0.47 (0.05)
5*,0**,0***
5.87 (1.71)
0.17 (0.05)
4*,3**,0***
4.91 (2.32)
0.27 (0.09)
3CRO
8*,2**,0***
3.19 (0.60)
0.50 (0.01)
6*,2**,0***
3.29 (0.68)
0.27 (0.07)
3*,7**,0***
2.62 (0.73)
0.40 (0.06)
1AZP
0*,0**,0***
6.25 (0.52)
0.37 (0.06)
0*,0**,0***
6.68 (2.26)
0.04 (0.04)
5*,0**,0***
4.00 (0.45)
0.10 (0.04)
1JJ4
5*,5**,0***
2.39 (0.18)
0.51 (0.03)
6*,0**,0***
4.55 (0.58)
0.16 (0.07)
9*,1**,0***
3.62 (0.38)
0.21 (0.07)
1A74
10*,0**,0***
3.81 (0.40)
0.45 (0.02)
8*,0**,0***
6.30 (0.46)
0.14 (0.04)
9*,1**,0***
3.37 (0.37)
0.24 (0.05)
1ZME
0*,0**,0***
8.08 (0.70)
0.50 (0.01)
4*,0**,0***
5.29 (0.59)
0.12 (0.06)
8*,0**,0***
4.63 (0.80)
0.15 (0.04)
Note: Values between parenthesis correspond to standard deviations.
Comparison of the performance of ClustENM‐HADDOCK with the previous protocols by van Dijk and Bonvin21 for the protein‐DNA datasetNote: Values between parenthesis correspond to standard deviations.Based on the quality of the modeled complexes generated by our ClustENM‐HADDOCK approach, our pipeline offers a nice alternative for protein‐DNA docking cases.
CONCLUSIONS
We applied our ClustENM‐HADDOCK pre‐ and post‐sampling method to protein‐protein and protein‐DNA complexes to incorporate dynamics and simulate conformational selection and induced‐fit effects upon binding. The method does not use any prior knowledge on the direction and extent of the conformational changes. The extent of the applied deformation is estimated from its energetical costs, inspired from mechanical tensile testing on material. Our pre‐sampling/docking/post‐sampling approach was able to produce acceptable to medium quality models in 7 out of 11 cases for the protein‐protein complexes and 5 out of 6 for the protein‐DNA complexes. We observed that the conformational selection (“pre‐sampling” and also docking) has the highest impact on the quality of the final models especially in the case of protein‐protein complexes. For protein‐DNA complexes, in contrast, the induced‐fit stage of the pipeline (post‐sampling) significantly improves the quality of the final models.Comparison of our method with previous studies showed that the previously proposed flexible multidomain docking remains the best performing approach, provided that reliable hinge region information is available. In the absence of such information, our approach could however be a good alternative to incorporate flexibility. As for the protein‐DNA cases, our ClustENM‐HADDOCK approach does perform better than standard flexible docking and shows a comparable performance with a two‐stage docking approach in which a library of pre‐bent DNA conformation is used in a second docking stage. This makes ClustENM‐HADDOCK a suitable alternative for modeling conformational changes in protein‐DNA binding.We also observed that 2‐generations of ClustENM for pre‐sampling was not sufficient in the cases of large conformational changes (eg, 1IRA, 1H1V, 1Y64). For these cases, the number of ClustENM generations could be increased further to obtain large domain motions. Additionally, here we considered only the best two scoring clusters from the docking as starting points for post‐sampling. Instead of two, we could select more clusters from docking and apply post‐sampling. This would have increased our chance of success for the 1AZP case for example (although near native poses were generated during the docking, these did not end up in the top 2 clusters).Data S1. Supporting InformationClick here for additional data file.
Authors: E A Galburt; M S Chadsey; M S Jurica; B S Chevalier; D Erho; W Tang; R J Monnat; B L Stoddard Journal: J Mol Biol Date: 2000-07-21 Impact factor: 5.469
Authors: Burak T Kaynak; James M Krieger; Balint Dudas; Zakaria L Dahmani; Mauricio G S Costa; Erika Balog; Ana Ligia Scott; Pemra Doruker; David Perahia; Ivet Bahar Journal: Front Mol Biosci Date: 2022-02-04