Literature DB >> 31441121

Pre- and post-docking sampling of conformational changes using ClustENM and HADDOCK for protein-protein and protein-DNA systems.

Zeynep Kurkcuoglu¹, Alexandre M J J Bonvin¹.

Abstract

Incorporating the dynamic nature of biomolecules in the modeling of their complexes is a challenge, especially when the extent and direction of the conformational changes taking place upon binding is unknown. Estimating whether the binding of a biomolecule to its partner(s) occurs in a conformational state accessible to its unbound form ("conformational selection") and/or the binding process induces conformational changes ("induced-fit") is another challenge. We propose here a method combining conformational sampling using ClustENM-an elastic network-based modeling procedure-with docking using HADDOCK, in a framework that incorporates conformational selection and induced-fit effects upon binding. The extent of the applied deformation is estimated from its energetical costs, inspired from mechanical tensile testing on materials. We applied our pre- and post-docking sampling of conformational changes to the flexible multidomain protein-protein docking benchmark and a subset of the protein-DNA docking benchmark. Our ClustENM-HADDOCK approach produced acceptable to medium quality models in 7/11 and 5/6 cases for the protein-protein and protein-DNA complexes, respectively. The conformational selection (sampling prior to docking) has the highest impact on the quality of the docked models for the protein-protein complexes. The induced-fit stage of the pipeline (post-sampling), however, improved the quality of the final models for the protein-DNA complexes. Compared to previously described strategies to handle conformational changes, ClustENM-HADDOCK performs better than two-body docking in protein-protein cases but worse than a flexible multidomain docking approach. However, it does show a better or similar performance compared to previous protein-DNA docking approaches, which makes it a suitable alternative.

Entities: Chemical Disease Gene Mutation Species

Keywords: biomolecular complexes; conformational flexibility; elastic network modeling

Year: 2019 PMID： 31441121 PMCID： PMC6973081 DOI： 10.1002/prot.25802

Source DB: PubMed Journal: Proteins ISSN： 0887-3585

INTRODUCTION

The dynamic nature of biomolecules enables them to fulfill their functions in the cell such as catalysis, signaling, and regulation, while interacting with other bioentities. Understanding their structure and dynamics is therefore key to elucidate biological mechanisms, identify the underlying causes of diseases at molecular level and develop new therapeutical strategies. The extent of biomolecular flexibility can range from local side chain rotations of a few angstroms to global displacements like domain motions in the order of nanometers.1 Although experimental methods like X‐ray crystallography, Nuclear Magnetic Resonance (NMR) and cryo‐Electron Microscopy (EM) reveal biomolecular structures at atomic resolution, information about their mobility and flexibility is challenging to assess, especially for highly flexible biomolecules or supramolecules. Therefore, there is a need for complementary computational techniques to reveal the flexible properties and dynamics of such systems.2, 3, 4 In regard to molecular recognition mechanisms, the puzzle still exists whether a biomolecule binds to a pre‐existing conformer out of the pool of conformers existing in solution for a given partner (“conformational selection”) or rather conformational changes are induced upon binding (“induced‐fit”).5, 6 There is increasing evidence showing that different states are indeed accessible in the absence of the partner6, 7 but it is also likely that both mechanisms play a role in molecular recognition.8, 9 A study on a set of complexes with rather limited conformational changes10 has addressed the ability of current computational techniques to capture the conformational changes upon binding and concluded that current “conformational selection” methods can capture 22% of unbound‐bound transitions, “induced‐fit” methods about 57%; but backbone motions (21%) are not yet adequately covered by any technique. Previously, ClustENM has been developed as an iterative and unbiased conformer generation technique, combining global modes from Elastic Network Model (ENM), with energy minimization and clustering.11, 12 ClustENM was applied to systems of different sizes and oligomeric states including ribosome,11, 13 highly flexible proteins like adenylate kinase and calmodulin,11, 12 and to protein‐small ligand docking.12 The generated conformers were in good agreement with experimental structures and conformations from molecular dynamics simulations. This illustrated the efficiency of ClustENM especially for large systems where molecular dynamics could become computationally time‐consuming and encounter difficulties in sampling large conformational changes. The method also has the potential to capture induced‐fit effects upon binding, as demonstrated for calmodulin where a peptide was docked onto an intermediate state between the unbound and bound forms: The conformational sampling applied to the docked complex indeed allowed to sample the bound state.12 Following this observation, we propose here a method combining molecular docking with conformer generation by ClustENM, to account both for conformational selection and induced‐fit binding mechanisms. For this, we combine conformational sampling of the individual partners (pre‐docking sampling) with that of the docked models (post‐docking sampling). As molecular docking tool we use HADDOCK,14, 15, 16 an information‐driven approach for the modeling of biomolecular complexes, which has demonstrated sustained performance in CAPRI ‐ the blind modeling experiment for the prediction of complexes,17, 18 and is a widely used method for integrative modeling of biomolecular complexes.19 HADDOCK can handle mixed systems of proteins, nucleic acids and small molecules. To assess the performance of our ClustENM‐HADDOCK approach, we apply it to the protein‐protein flexible multidomain docking benchmark20 and to a subset of the protein‐DNA benchmark.21 Both datasets were used in previous studies focusing on modeling conformational changes, which allows us to make performance comparisons. Notably, we apply our method without prior knowledge of the extent and direction of the conformational changes.

MATERIALS AND METHODS

Dataset

We applied our method on two different datasets: The flexible multidomain docking benchmark (FMD)20 (Table 1A), where the ligands (smaller partner) are mainly rigid (0.5‐1.7 å) while the receptors undergo varying degrees of conformational change (between 1.6‐19.6 å) corresponding mainly to domain‐domain motions. In this case, in order to concentrate on the conformational change aspects, we assume we have an ideal interface information.

Table 1

List of protein‐protein (A) and protein‐DNA complexes (B) used for benchmarking

A. Protein‐protein flexible multidomain dataset20
Complex ID	Receptor ID	Ligand ID	Backbone conformational changes [å]
Complex ID	Receptor ID	Ligand ID	Receptor	Ligand
1IRA37	1G0Y_R38	1ILR_139	19.6	0.7
1H1V40	1D0N_B41	1IJJ_B42	13.8	1.6
1Y6443	1UX5_A44	2FXU_A45	10.3	1.1
1F6M46	1CL0_A47	2TIR_A48	7.3	0.9
1FAK49	1QFK_HL50	1TFH_B51	6.3	1.0
1ZLI52	2JTO_A53	1KWM_A54	4.0	0.6
1E4K55	3AVE_AB56	1FNL_A57	2.9	1.7
1IBR58	1F59_A59	1QG4_A60	2.9	1.1
1KKL61	1JB1_A62	2HPR63	2.2	0.5
1NPE64	1KLO_A65	1NPE_A64	1.9	‐
1DFJ66	2BNH_A67	9RSA_B68	1.6	0.7

A subset of the protein‐DNA benchmark21 for which experimental information about interfaces is available (Table 1B). Both protein and DNA (“ligand”) undergo conformational changes, but we focused here on the flexibility of the DNA. List of protein‐protein (A) and protein‐DNA complexes (B) used for benchmarking Both datasets have been previously used in benchmarking studies with HADDOCK. In the first case a multibody docking approach was followed in which the protein showing rigid‐body motion conformational changes was cut into domains with connectivity restraints between them,20 while in the second case a two steps docking strategy was followed in which DNA conformations from the first docking run were used to generate an ensemble of pre‐bent DNA conformations for a second, ensemble‐based docking stage.21

ClustENM method

ClustENM is an unbiased, iterative conformational sampling method combining ENM, energy minimization and clustering.11, 12 In ClustENM, first the initial structure is energetically minimized in implicit solvent, then a modified version of ENM (mixed resolution ENM)22 is applied to it to extract low frequency normal modes. The modified version of ENM employed in ClustENM involves placing the nodes at the residue centroids where node pairs within a cut‐off distance of 10 å are connected by elastic springs, thereby representing the residues as low‐resolution nodes. The spring constant connecting the node pairs is proportional to the total number of interacting atom pairs between the connected residues (bonded or nonbonded) within the specified cut‐off. Based on the sudden break in the smooth progression of eigenvalues in the eigenvalue spectrum, we define the number of modes “m” which are to be linearly combined using coefficients −1, 0 and 1 to obtain deformation vectors (3m vectors in total). The minimized structure is deformed in the direction of the deformation vectors with a fixed deformation RMSD d, where the deformation vector of a residue centroid is applied to all the atoms of that residue. The resulting structures are clustered based on mutual RMSD of heavy atoms with a cut‐off equal to d. The cluster containing the parent structure is discarded and a representative structure from each cluster is selected. Minimization‐ENM‐deformation‐clustering steps are applied on each representative structure for k generations.

Energy minimization

For proteins, the energy minimization in implicit solvent was performed using NAMD v2.1023 with the CHARMM22 force field.24 A generalized Born implicit solvent model was used with a 16 å cut‐off for non‐bonded interactions and a 14 å cut‐off for Born radius. The ion concentration was set to 0.3 M. 4000 steps of conjugate gradient were applied. The DNA structures and protein‐DNA complexes were minimized in implicit solvent using AMBER14 with the ff14SB force field parameters.25 The pairwise generalized Born model26, 27 was used with a 16 å cut‐off for non‐bonded interactions. We used the modified generalized Born theory‐based Debye‐Huckel limiting law for ion screening of interactions.28 The concentration of 1‐1 mobile counter‐ions in solution was set to 0.1 M. Steepest descent was applied for 500 steps followed by conjugate gradient with a maximum number of steps of 4000 and a convergence criterion of 0.01 kcal/mol/å.

Deformation Parameter d Estimation

The advantage of ClustENM is its computational efficiency in yielding conformers at atomic resolution and its reduction in the redundancy of the sampled conformations due to clustering. However, the degree of conformational change is usually arbitrarily defined by the user. To overcome this problem and inspired from mechanical tensile testing on material, we deformed the structures in the direction of the slow modes to assess the extent of flexibility. First, we determined the relevant low frequency modes based on the sudden break in eigenvalue spectrum; then we deformed the structure constantly in their direction and determined the energy of the deformed state by energy‐minimization. Based on the energy vs strain (deformation) curve and the reference energy state of the initial, non‐deformed structure, we assessed the maximum deformation the system could handle. The results of such energy‐deformation curves for the ribonuclease inhibitor structure (1DFJ) are shown in Figure 1. According to the consensus of four modes, the structure can handle a RMSD deformation of 1‐2 å, which actually covers the observed 1.6 å RMSD conformational change upon binding to its partner. This prediction scheme is applied to all receptors in the flexible multidomain protein docking benchmark and the DNA structures in the protein‐DNA dataset to determine the deformation RMSD d used in ClustENM. The same procedure is then applied to the docked complexes, to estimate d for post‐docking ClustENM using these complexes as starting point.

Figure 1

Tensile test on ribonuclease inhibitor structure along the direction of normal modes. Based on eigenvalue spectrum shown on top, the first four slowest modes are selected for application of tensile testing. The structure is first minimized (“Start”), then deformations of 1‐5 å are introduced in both negative and positive directions along each mode. To visualize the type of motions, the first and second slowest modes are also shown in the right panel. The energy of the structure increases as the deformation along an individual mode increases. The stable region corresponds to the deformations that keep the energy level below the reference state: For the first and second modes the energy starts increasing after 3 å, for the third mode after 2 å and the fourth mode after 1 å in both directions. The consensus of the four modes hints to a maximum deformation of 1‐2 å for this structure [Color figure can be viewed at http://wileyonlinelibrary.com]

Table 2

ClustENM pre‐ and post‐sampling parameters

A. Protein‐protein complexes of the FMD benchmark
	Sampling prior to docking (receptor only)		Sampling after docking (on complex structure)a
Complexes	Deformation RMSD [å]	Number of modes	Deformation RMSD [å]	Number of modes
1IRA	3	3	1	5
1H1V	2	3	1	4
1Y64	4	3	2/3	3
1F6M	2	3	1	5/4
1FAK	2	3	1	3
1ZLI	1	3	1	5
1E4K	1	4	2/3	3
1IBR	2	5	1	5
1KKL	2	5	1/2	3
1NPE	3	4	1/2	3
1DFJ	2	4	1	4/5

The presence of two different values indicates the use of different parameters for the complexes. The first value was used for the complex from the best cluster and the second value for the second‐best cluster. Single value means same parameters were used for both complex.

ClustENM pre‐ and post‐sampling parameters The presence of two different values indicates the use of different parameters for the complexes. The first value was used for the complex from the best cluster and the second value for the second‐best cluster. Single value means same parameters were used for both complex.

HADDOCK settings

We used the HADDOCK2.2 webserver16 for all dockings in this study. For the FMD benchmark, the docking was performed with “ideal” interface information, that is, the interface residues from one subunit within a 5 å distance from any atom of the partner were selected as active. No passive residues were defined. This effectively brings the interfaces together during the docking without predefining their relative orientation since no specific contacts are defined. Random removal of ambiguous interface restraints (AIR) was switched off. Since we use ensembles of conformations, the sampling was increased to 10 000/400/400 structures for rigid (it0), semi‐flexible(it1) and water stages, respectively. Clustering was performed using the fraction of common contacts (FCC)29 with a cut‐off of 0.75 and a minimum cluster size of 4. For protein‐DNA docking, the experimental restraints described in the study by van Dijk and Bonvin, 201021 were used to drive the docking. Since we do not use the ideal interface information here, random removal of AIRs was switched on. The sampling was increased to 10 000/400/400 for the it0/it1/water stages, respectively. Because of the high charge of DNA, the value for the dielectric constant was set to 78. FCC clustering was used with 0.75 cut‐off and a minimum cluster size of 4. Details on the data used to drive the docking for both protein‐protein and protein‐DNA systems are given in Tables S1 and S2 of the Supporting Information in Data S1.

HADDOCK scoring

The HADDOCK score was used to select the top pose of the top two cluster for the post‐docking sampling stage and also to score the models resulting from that post‐docking sampling. The HADDOCK scoring function (HS)17, 30 is a linear weighted sum of energetic terms:where , , and stand for van der Waals, Coulomb electrostatics, desolvation and restraint energies, respectively. The non‐bonded components of the score (, ) were calculated with the OPLS forcefield,31 the desolvation energy is a solvent accessible surface area‐dependent empirical term32 which estimates the energetic gain or penalty of burying specific sidechains upon complex formation. The restraint energy term was only used for ranking the docking poses but was excluded for the scoring of the post‐docking sampling models.

Quality assessment

The quality of the modeled complexes was assessed based on the well‐established CAPRI criteria33: Interface RMSD (i‐RMSD): Backbone RMSD of the interface residues, defined as all residues having at least one atom within 10 å of an atom on the other partner Ligand RMSD (l‐RMSD): Backbone RMSD of the ligand (smaller partner in the complex) after fitting on the backbone of the receptor Fraction of native contacts (fnat): The number of correct residue‐residue contacts in the predicted complex divided by the number of residue‐residue contacts in the target complex. Two residues are defined to be in contact if they have any atom within 5 å distance to each other. Using i‐RMSD, l‐RMSD and fnat, the quality of models is defined as: High (three‐stars): fnat ≥0.5 with l‐RMSD ≤1 å or i‐RMSD ≤1 å Medium (two‐stars): fnat ≥0.3 with 1 å < l‐RMSD ≤5 å or 1 å < i‐RMSD ≤2 å Acceptable (one‐star): fnat ≥0.1 with 5 å < l‐RMSD ≤10 å or 2 å < i‐RMSD ≤4 å Incorrect (zero‐star): fnat <0.1

General workflow

The modeling pipeline combines conformational selection and induced‐fit mechanisms using conformational sampling prior and after complexation, respectively. Pre‐sampling was applied to the receptor in the case of protein‐protein docking, and to the DNA for the protein‐DNA dataset. The pipeline consists of the following steps (also shown in Figure 2):

Figure 2

Pipeline for ClustENM‐HADDOCK [Color figure can be viewed at http://wileyonlinelibrary.com]

The initial structure is energetically minimized and subjected to elastic network modeling to define the parameters for ClustENM. The eigenvalue spectrum is extracted to decide on the number of modes (“m”) to be used. The deformation test is applied to specify the deformation RMSD (“d”) in ClustENM. Conformers are generated using ClustENM with the parameters defined in Step 1. For the proteins in the FMD dataset the number of generations is set to two for each system. For the DNA structures in the protein‐DNA dataset we increased the number of generations to four since DNA conformational sampling is computationally much cheaper due to the DNA's smaller size. During the generation, clusters containing the parent structures are discarded. A maximum of 20 conformations are selected based on the minimization energy, to be used in ensemble docking. The initial structure is included in the ensemble regardless of its energy value. Ensemble semi‐flexible docking is performed in HADDOCK. The best scoring docked model from the best two scoring clusters are selected for the second round of ClustENM (post‐sampling). Parameters for ClustENM are determined for each structure (same as in Step 1) ClustENM is applied to each selected docked complex for two generations using parameters from Step 6 resulting in the final models. The final models are scored using HADDOCK scoring and their quality is assessed. Pipeline for ClustENM‐HADDOCK [Color figure can be viewed at http://wileyonlinelibrary.com]

RESULTS AND DISCUSSION

Sampling prior to docking

Protein‐protein cases from the FMD benchmark

Starting from the unbound state of the receptors in the FMD benchmark, two generations of ClustENM sampling were applied prior to docking. The RMSDs (closest and range) of the receptor conformers to the reference bound state are given in Table 3 and the resulting ClustENM ensembles are shown in Figure 3. The RMSD range for the ensemble reflects the blind sampling of ClustENM: Some of the sampled conformations get closer to the bound state and others move further away from it. The number of conformers at the end of two generations of sampling is not so large thanks to the clustering steps in ClustENM. Two generations of ClustENM is clearly not sufficient to reach bound‐like conformations, especially for the challenging cases like 1IRA and 1H1V, where the RMSD of the closest structure to the bound state is almost the same as the unbound state (less than 1 å deviation from unbound). Figure 3 shows the generated ensemble for each case, which also reveals the expected‐to‐fail cases of 1IRA, 1H1V, 1Y64, 1FAK where the large domain motions could not be sampled in two generations. The remaining cases however do not reflect this issue.

Table 3

RMSD values from the bound conformation for the receptors in FMD benchmark

Complex ID	Receptor RMSD [å] unbound	Receptor RMSD [å] closest generated	Receptor RMSD [å] range in ensemble	Number of conformers inensemble
1IRA	19.6	19.1	19.1‐20.4	8
1H1V	13.8	13.2	13.2‐14.0	11
1Y64	10.3	6.98	6.98‐17.0	16
1F6M	7.32	6.40	6.40‐10.0	17
1FAK	6.33	5.57	5.57‐7.58	16
1ZLI	4.04	3.73	3.73‐5.06	15
1E4K	2.91	2.85	2.85‐3.53	6
1IBR	2.96	1.55	1.55‐6.26	20
1KKL	2.64	2.72	2.72‐3.34	8
1NPE	1.91	1.52	1.52‐4.86	22
1DFJ	1.55	1.09	1.09‐4.61	20

Figure 3

Ensembles of receptor conformers generated by ClustENM for protein‐protein docking. The starting structure is shown in blue, the bound conformation in red and the ClustENM generated conformers in green [Color figure can be viewed at http://wileyonlinelibrary.com]

RMSD values from the bound conformation for the receptors in FMD benchmark Ensembles of receptor conformers generated by ClustENM for protein‐protein docking. The starting structure is shown in blue, the bound conformation in red and the ClustENM generated conformers in green [Color figure can be viewed at http://wileyonlinelibrary.com]

Protein‐DNA dataset

A standard B‐DNA conformation built using our 3D‐DART webserver34 was the starting conformation for all protein‐DNA cases. We applied four generations of ClustENM on the DNA structures. The RMSD ranges of the resulting ensemble are given in Table 4. As in the case of the protein FMD benchmark, both close‐to‐bound and further‐away DNA conformations were sampled. Figure 4 shows the sampled DNA conformers for each case.

Table 4

RMSD values from the bound conformation for the DNA structures in the protein‐DNA dataset

Complex ID	RMSD [å]B‐DNA	RMSD [å]Closest generated	RMSD range [å]Ensemble	Number of conformers inensemble
1BY4	1.63	1.71	1.63‐4.01	20
3CRO	2.66	1.69	1.79‐4.06	20
1AZP	3.95	2.90	2.90‐4.12	20
1JJ4	3.66	2.00	2.00‐7.80	20
1A74	8.34	4.98	4.98‐9.18	20
1ZME	4.77	2.58	2.58‐5.82	20

Figure 4

Ensembles of DNA conformers generated by ClustENM for protein‐DNA dockings. Starting structures are shown in blue, bound structure in red and the generated conformers in orange [Color figure can be viewed at http://wileyonlinelibrary.com]

RMSD values from the bound conformation for the DNA structures in the protein‐DNA dataset Ensembles of DNA conformers generated by ClustENM for protein‐DNA dockings. Starting structures are shown in blue, bound structure in red and the generated conformers in orange [Color figure can be viewed at http://wileyonlinelibrary.com]

Docking

After the pre‐docking conformational sampling, the resulting ensembles of conformers were used for ensemble docking with HADDOCK following the settings specified in Materials and Methods. This docking phase corresponds to the “conformational selection” part of our pipeline (with some induced fit as well since the flexible stage of HADDOCK allows for limited conformational changes). In this stage, ideally, HADDOCK should select the best conformers in its docking workflow from rigid body to final flexible refinement in explicit solvent. The best scoring docking pose from the top 2 ranked clusters were subsequently selected for the post‐docking induced‐fit part of the pipeline. The selected poses and their i‐RMSD, l‐RMSD, fnat and CAPRI quality metrics are shown in Figures 5 and 6 for the FMD and protein‐DNA docking, respectively.

Figure 5

Figure 6

Protein‐DNA docking top pose from the best two clusters of HADDOCK (green receptor/orange DNA) superimposed on bound structure (gray) [Color figure can be viewed at http://wileyonlinelibrary.com]

Protein‐protein docking top pose from the best two clusters of HADDOCK (green receptor/blue ligand) superimposed onto bound structure (red) [Color figure can be viewed at http://wileyonlinelibrary.com] Protein‐DNA docking top pose from the best two clusters of HADDOCK (green receptor/orange DNA) superimposed on bound structure (gray) [Color figure can be viewed at http://wileyonlinelibrary.com] In the FMD dataset, we were able to obtain models of at least acceptable quality in 7/11 cases. The failed cases are, as expected, 1IRA, 1H1V, 1Y64, and 1FAK, which all undergo large domain motions. As already explained in the “Sampling prior docking” section, the conformational sampling using only two generations was clearly not sufficient to model such large conformational change in these cases. As for the protein‐DNA complexes, HADDOCK was able to obtain at least acceptable quality structures in 4/6 cases as shown in Figure 6. The binding mode in 1AZP is reverted (180° rotation), which already nullifies its chance to be a successful case at the end of the pipeline. In this particular case, however, near native poses were generated during docking but did not end up in the top 2 clusters.

Post‐docking sampling

We applied a second round of ClustENM to the best ranking model of the top 2 ranked clusters obtained from docking to perform the “induced‐fit” stage of our pipeline. The resulting models were ranked using the HADDOCK score calculated after a short energy minimization of each model.17 The results for protein‐protein and protein‐DNA systems are presented in the following sections. The quality of the starting and final conformers is summarized in Table 5, together with the rank of the best generated conformer. Additionally, we show the change in i‐RMSD, l‐RMSD and fnat through the “generations” of post‐sampling ClustENM in Figures S1–S3 in Data S1, respectively. We were able to obtain acceptable/medium quality (1‐2‐star) poses ranked within the top 20 in 7/11 cases, 4 of which were ranked as top 1. Unsurprisingly, the failed cases are no other than 1IRA, 1H1V, 1Y64 and 1FAK. Figures S1–S3 in Data S1 indicate that the quality of the starting docked model (which is the result of pre‐sampling followed by docking) is the limiting factor for the final complex structures. Indeed the “induced‐fit” sampling phase does not dramatically change the quality of the final conformers for the FMD benchmark. Conformational sampling prior to docking is thus contributing more to the quality of the final complexes than post‐sampling in the case of the protein‐protein FMD benchmark. The final complex structures are shown in Figure 7.

Table 5

Quality and rank of the sampled conformers after the docking—FMD benchmark

	Quality—Number of conformers		Quality/Rank
Complex ID	Starting docked models	All ‐post‐docking ClustENM generated complex models	Best conformer	First acceptable or better model
1IRA	0,0,0**—2	0,0,0**—58	—	—
1H1V	0,0,0**—2	0,0,0**—17	—	—
1Y64	0,0,0**—2	0,0,0**—31	—	—
1F6M	1,0,0**—2	22,0,0**—72	*/15	*/15
1FAK	0,0,0**—2	0,0,0**—30	‐	‐
1ZLI	1,1,0**—2	20,7,0**—30	**/1	**/1
1E4K	1,0,0**—2	13,0,0**—28	*/1	*/1
1IBR	1,0,0**—2	27,0,0**—63	*/1	*/1
1KKL	0,1,0**—2	12,3,0**—30	**/20	*/16
1NPE	1,1,0***—2	17,17,0**—34	**/15	*/1
1DFJ	0,2,0**—2	0,18,0**—18	**/1	**/1

Figure 7

Final protein‐protein complexes for the protein‐protein FMD benchmark. The receptor is shown in green, the ligand in blue and the reference bound structure in red. The quality of the best conformer (listed in Table 5) together with its i‐RMSD [å], l‐RMSD [å] and fnat is given for each protein‐protein complex. In cases where no acceptable or better model is present, the quality of best scoring complex is given [Color figure can be viewed at http://wileyonlinelibrary.com]

Quality and rank of the sampled conformers after the docking—FMD benchmark Final protein‐protein complexes for the protein‐protein FMD benchmark. The receptor is shown in green, the ligand in blue and the reference bound structure in red. The quality of the best conformer (listed in Table 5) together with its i‐RMSD [å], l‐RMSD [å] and fnat is given for each protein‐protein complex. In cases where no acceptable or better model is present, the quality of best scoring complex is given [Color figure can be viewed at http://wileyonlinelibrary.com]

Protein‐DNA cases

Table 6 shows the quality of the starting and final conformers with the rank of the best conformer for the protein‐DNA dataset. The change in i‐RMSD, l‐RMSD and fnat through the ClustENM generations are shown in Figures S4–S6 in Data S1, respectively. We obtained 1‐2 star poses for 5 out of 6 cases, 3 of which were ranked as top 1 and one as top 2 and the last one as 21st. As expected, there is no successful conformation generated for 1AZP.

Table 6

Quality and rank of the sampled conformers after the docking—protein‐DNA dataset

	Quality—Number of conformers		Quality/Rank
Complex ID	Starting docked models	All post‐docking ClusENM generated complex models	Best conformer	First acceptable or better model
1BY4	1, 1, 0**—2	59,15,0**—75	**/1	**/1
3CRO	1,1,0**—2	84,74,0**—158	**/1	**/1
1AZP	0,0,0**—2	0,0,0**—174	—	—
1JJ4	1,1,0**—2	235,52,0**—287	**/2	*/1
1A74	2,0,0**—2	34,0,0**—53	*/1	*/1
1ZME	0,0,0**—2	5,0,0**—91	*/21	*/21

Quality and rank of the sampled conformers after the docking—protein‐DNA dataset Different from the protein‐protein FMD benchmark, sampling after docking for induced‐fit increases the quality of the models for the protein‐DNA cases as shown in Figures S4–S6 in Data S1. For example, in the case of 1ZME, we were able to obtain 1‐star quality structures by starting from zero quality docking poses. Nevertheless, the quality of the starting structures has again the most impact on the quality of the final conformers (shown in Figure 8).

Figure 8

Final ClustENM‐HADDOCK models for protein‐DNA complexes (green protein/orange DNA), aligned on bound structure (red). The quality of the best conformer (listed in Table 6) together with its i‐RMSD [å], l‐RMSD [å] and fnat is given for each protein‐DNA complex

Comparison with previous studies

FMD benchmark: ClustENM‐HADDOCK vs Multidomain Flexible docking and Two‐Body Docking

We compared the performance of our pipeline with that of the multidomain flexible docking and two‐body docking approaches reported in the study of Karaca and Bonvin20 (Table 7). For multidomain flexible docking, the possible hinge regions were predicted using HingeProt35; the proteins were cut at those hinges and connectivity restraints were defined between the domains, allowing rigid‐body motion conformational changes during docking. The two‐body docking was only allowing for limited conformational changes in the interface. The results indicate that the ClustENM‐HADDOCK approach performs much better than two‐body docking but worse than the multidomain flexible docking approach of Karaca and Bonvin.20 The latter is especially advantageous to effectively incorporate rigid‐body hinge motions compared to our elastic network‐based conformational sampling, where two generations of ClustENM are clearly insufficient for the systems undergoing extremely large conformational changes (e.g. 1IRA, 1H1V, 1Y64). This might however depend on the type of motions since, in a previous study,11 for the adenylate kinase system undergoing conformational change of 7 å for example, we were able to observe full closing and opening of the domains when we applied ClustENM for 5 generations.

Table 7

Comparison of ClustENM‐HADDOCK with previous protocols for FMD benchmark

	ClustENM‐HADDOCK			Flexible Multidomain Docking results20			2‐Body Docking results20
Complex	Quality/Rank	i‐RMSD [å]	Fnat	Quality/Rank	i‐RMSD [å]	Fnat	Quality/Rank	i‐RMSD [å]	Fnat
1IRA	—	17.5	0.08	*/1	3.9	0.55	—	17.5	0.04
1H1V	—	9.3	0.21	*/11	4.6	0.49	—	11.9	0.08
1Y64	—	11.1	0.45	*/5	3.9	0.48	—	10.3	0.07
1F6M	*/15	5.2	0.36	*/1	3.5	0.69	—	14.1	0.00
1FAK	—	5.9	0.27	**/37	2.8	0.55	—	11.4	0.01
1ZLI	**/1	2.9	0.48	**/1	2.1	0.74	—	14.8	0.02
1E4K	*/1	3.9	0.48	**/5	2.3	0.70	*/1	4.1	0.58
1IBR	*/1	3.6	0.35	**/1	2.3	0.63	—	9.6	0.11
1KKL	**/20	2.1	0.57	**/1	2.2	0.67	*/1	3.1	0.56
1NPE	**/15	1.4	0.80	**/16	1.2	0.95	**/1	1.7	0.85
1DFJ	**/1	1.9	0.69	**/5	2.0	0.68	**/116	1.8	0.63

Comparison of ClustENM‐HADDOCK with previous protocols for FMD benchmark

Protein‐DNA dataset: ClustENM‐HADDOCK vs unbound‐unbound docking using canonical B‐DNA models and unbound‐unbound docking using custom‐build B‐DNA models

We compared our protein‐DNA results with those reported by van Dijk and Bonvin21 to assess our performance against (i) one round of unbound docking starting from canonical B‐DNA models (“Unbound flex”) and (ii) two‐rounds of unbound docking where the second round uses an ensemble of custom‐built bent B‐DNA models obtained from an analysis of the conformations obtained during the first docking stage (“DNA lib”). The results in Table 8 indicate that ClustENM‐HADDOCK approach performs better than “Unbound flex” in 4/6 cases, the same in 1 case (1AZP) and worse in one case (1ZME) when the top 10 best scoring structures are considered. Generated acceptable quality solutions for 1ZME from ClustENM‐HADDOCK were not scored in the top 10 unfortunately (rank of first acceptable model is 21st). The “DNA lib” approach on the other hand was able to obtain at least acceptable quality models in all cases. One thing to note is that for both “Unbound flex” and “DNA lib” approaches, a multidomain flexible docking method (ie, cutting from the hinge(s) and using connectivity restraints) was used for the protein in the 1ZME case, which prevents a one‐to‐one comparison.

Table 8

Comparison of the performance of ClustENM‐HADDOCK with the previous protocols by van Dijk and Bonvin21 for the protein‐DNA dataset

	ClustENM‐HADDOCK			Unbound flex21			DNA lib21
Complex	Quality	i‐RMSD [å]	Fnat	Quality	i‐RMSD [å]	Fnat	Quality	i‐RMSD [å]	Fnat
1BY4	7,3,0**	4.55 (0.70)	0.47 (0.05)	5,0,0**	5.87 (1.71)	0.17 (0.05)	4,3,0**	4.91 (2.32)	0.27 (0.09)
3CRO	8,2,0**	3.19 (0.60)	0.50 (0.01)	6,2,0**	3.29 (0.68)	0.27 (0.07)	3,7,0**	2.62 (0.73)	0.40 (0.06)
1AZP	0,0,0**	6.25 (0.52)	0.37 (0.06)	0,0,0**	6.68 (2.26)	0.04 (0.04)	5,0,0**	4.00 (0.45)	0.10 (0.04)
1JJ4	5,5,0**	2.39 (0.18)	0.51 (0.03)	6,0,0**	4.55 (0.58)	0.16 (0.07)	9,1,0**	3.62 (0.38)	0.21 (0.07)
1A74	10,0,0**	3.81 (0.40)	0.45 (0.02)	8,0,0**	6.30 (0.46)	0.14 (0.04)	9,1,0**	3.37 (0.37)	0.24 (0.05)
1ZME	0,0,0**	8.08 (0.70)	0.50 (0.01)	4,0,0**	5.29 (0.59)	0.12 (0.06)	8,0,0**	4.63 (0.80)	0.15 (0.04)

Note: Values between parenthesis correspond to standard deviations.

Comparison of the performance of ClustENM‐HADDOCK with the previous protocols by van Dijk and Bonvin21 for the protein‐DNA dataset Note: Values between parenthesis correspond to standard deviations. Based on the quality of the modeled complexes generated by our ClustENM‐HADDOCK approach, our pipeline offers a nice alternative for protein‐DNA docking cases.

CONCLUSIONS

We applied our ClustENM‐HADDOCK pre‐ and post‐sampling method to protein‐protein and protein‐DNA complexes to incorporate dynamics and simulate conformational selection and induced‐fit effects upon binding. The method does not use any prior knowledge on the direction and extent of the conformational changes. The extent of the applied deformation is estimated from its energetical costs, inspired from mechanical tensile testing on material. Our pre‐sampling/docking/post‐sampling approach was able to produce acceptable to medium quality models in 7 out of 11 cases for the protein‐protein complexes and 5 out of 6 for the protein‐DNA complexes. We observed that the conformational selection (“pre‐sampling” and also docking) has the highest impact on the quality of the final models especially in the case of protein‐protein complexes. For protein‐DNA complexes, in contrast, the induced‐fit stage of the pipeline (post‐sampling) significantly improves the quality of the final models. Comparison of our method with previous studies showed that the previously proposed flexible multidomain docking remains the best performing approach, provided that reliable hinge region information is available. In the absence of such information, our approach could however be a good alternative to incorporate flexibility. As for the protein‐DNA cases, our ClustENM‐HADDOCK approach does perform better than standard flexible docking and shows a comparable performance with a two‐stage docking approach in which a library of pre‐bent DNA conformation is used in a second docking stage. This makes ClustENM‐HADDOCK a suitable alternative for modeling conformational changes in protein‐DNA binding. We also observed that 2‐generations of ClustENM for pre‐sampling was not sufficient in the cases of large conformational changes (eg, 1IRA, 1H1V, 1Y64). For these cases, the number of ClustENM generations could be increased further to obtain large domain motions. Additionally, here we considered only the best two scoring clusters from the docking as starting points for post‐sampling. Instead of two, we could select more clusters from docking and apply post‐sampling. This would have increased our chance of success for the 1AZP case for example (although near native poses were generated during the docking, these did not end up in the top 2 clusters). Data S1. Supporting Information Click here for additional data file.

76 in total

1. Structural view of the Ran-Importin beta interaction at 2.3 A resolution.

Authors: I R Vetter; A Arndt; U Kutay; D Görlich; A Wittinghofer
Journal: Cell Date: 1999-05-28 Impact factor: 41.582

2. Assessment of blind predictions of protein-protein interactions: current status of docking methods.

Authors: Raúl Méndez; Raphaël Leplae; Leonardo De Maria; Shoshana J Wodak
Journal: Proteins Date: 2003-07-01

3. The hyperthermophile chromosomal protein Sac7d sharply kinks DNA.

Authors: H Robinson; Y G Gao; B S McCrary; S P Edmondson; J W Shriver; A H Wang
Journal: Nature Date: 1998-03-12 Impact factor: 49.962

4. Conformational changes and cleavage by the homing endonuclease I-PpoI: a critical role for a leucine residue in the active site.

Authors: E A Galburt; M S Chadsey; M S Jurica; B S Chevalier; D Erho; W Tang; R J Monnat; B L Stoddard
Journal: J Mol Biol Date: 2000-07-21 Impact factor: 5.469

5. Molecular mechanism of multispecific recognition of Calmodulin through conformational changes.

Authors: Fei Liu; Xiakun Chu; H Peter Lu; Jin Wang
Journal: Proc Natl Acad Sci U S A Date: 2017-05-01 Impact factor: 11.205

6. Refined structures of the active Ser83-->Cys and impaired Ser46-->Asp histidine-containing phosphocarrier proteins.

Authors: D I Liao; O Herzberg
Journal: Structure Date: 1994-12-15 Impact factor: 5.006

Review 7. The role of dynamic conformational ensembles in biomolecular recognition.

Authors: David D Boehr; Ruth Nussinov; Peter E Wright
Journal: Nat Chem Biol Date: 2009-11 Impact factor: 15.040

8. Structural comparison of fucosylated and nonfucosylated Fc fragments of human immunoglobulin G1.

Authors: Shigeki Matsumiya; Yoshiki Yamaguchi; Jun-ichi Saito; Mayumi Nagano; Hiroaki Sasakawa; Shizuo Otaki; Mitsuo Satoh; Kenya Shitara; Koichi Kato
Journal: J Mol Biol Date: 2007-02-22 Impact factor: 5.469

9. Structure of extracellular tissue factor complexed with factor VIIa inhibited with a BPTI mutant.

Authors: E Zhang; R St Charles; A Tulinsky
Journal: J Mol Biol Date: 1999-02-05 Impact factor: 5.469

Review 10. Identifying and Visualizing Macromolecular Flexibility in Structural Biology.

Authors: Martina Palamini; Anselmo Canciani; Federico Forneris
Journal: Front Mol Biosci Date: 2016-09-09

10 in total

Review 1. Challenges in protein docking.

Authors: Ilya A Vakser
Journal: Curr Opin Struct Biol Date: 2020-08-21 Impact factor: 6.809