Pre- and post-docking sampling of conformational changes using ClustENM and HADDOCK for protein-protein and protein-DNA systems.

Zeynep Kurkcuoglu1, Alexandre M J J Bonvin1.   

Abstract

Incorporating the dynamic nature of biomolecules in the modeling of their complexes is a challenge, especially when the extent and direction of the conformational changes taking place upon binding are unknown. Estimating whether the binding of a biomolecule to its partner(s) occurs in a conformational state accessible to its unbound form ("conformational selection") and/or whether the binding process induces conformational changes ("induced-fit") is another challenge. We propose here a method combining conformational sampling using ClustENM, an elastic network-based modeling procedure, with docking using HADDOCK, in a framework that incorporates conformational selection and induced-fit effects upon binding. The extent of the applied deformation is estimated from its energetic cost, inspired by mechanical tensile testing on materials. We applied our pre- and post-docking sampling of conformational changes to the flexible multidomain protein-protein docking benchmark and a subset of the protein-DNA docking benchmark. Our ClustENM-HADDOCK approach produced acceptable to medium quality models in 7/11 and 5/6 cases for the protein-protein and protein-DNA complexes, respectively. The conformational selection (sampling prior to docking) has the highest impact on the quality of the docked models for the protein-protein complexes. The induced-fit stage of the pipeline (post-sampling), however, improved the quality of the final models for the protein-DNA complexes. Compared to previously described strategies to handle conformational changes, ClustENM-HADDOCK performs better than two-body docking in protein-protein cases but worse than a flexible multidomain docking approach. However, it does show a better or similar performance compared to previous protein-DNA docking approaches, which makes it a suitable alternative.
© 2019 The Authors. Proteins: Structure, Function, and Bioinformatics published by Wiley Periodicals, Inc.

Keywords:  biomolecular complexes; conformational flexibility; elastic network modeling

Year:  2019        PMID: 31441121      PMCID: PMC6973081          DOI: 10.1002/prot.25802

Source DB:  PubMed          Journal:  Proteins        ISSN: 0887-3585


INTRODUCTION

The dynamic nature of biomolecules enables them to fulfill their functions in the cell such as catalysis, signaling, and regulation, while interacting with other bioentities. Understanding their structure and dynamics is therefore key to elucidating biological mechanisms, identifying the underlying causes of diseases at the molecular level and developing new therapeutic strategies. The extent of biomolecular flexibility can range from local side chain rotations of a few angstroms to global displacements like domain motions on the order of nanometers.1 Although experimental methods like X-ray crystallography, Nuclear Magnetic Resonance (NMR) and cryo-Electron Microscopy (EM) reveal biomolecular structures at atomic resolution, information about their mobility and flexibility is challenging to assess, especially for highly flexible biomolecules or supramolecules. Therefore, there is a need for complementary computational techniques to reveal the flexible properties and dynamics of such systems.2, 3, 4 In regard to molecular recognition mechanisms, it remains an open question whether a biomolecule binds to a pre-existing conformer out of the pool of conformers existing in solution for a given partner (“conformational selection”) or whether conformational changes are induced upon binding (“induced-fit”).5, 6 There is increasing evidence showing that different states are indeed accessible in the absence of the partner,6, 7 but it is also likely that both mechanisms play a role in molecular recognition.8, 9 A study on a set of complexes with rather limited conformational changes10 has addressed the ability of current computational techniques to capture the conformational changes upon binding and concluded that current “conformational selection” methods can capture 22% of unbound-bound transitions and “induced-fit” methods about 57%, but backbone motions (21%) are not yet adequately covered by any technique.
Previously, ClustENM has been developed as an iterative and unbiased conformer generation technique, combining global modes from the Elastic Network Model (ENM) with energy minimization and clustering.11, 12 ClustENM was applied to systems of different sizes and oligomeric states including the ribosome,11, 13 highly flexible proteins like adenylate kinase and calmodulin,11, 12 and to protein-small ligand docking.12 The generated conformers were in good agreement with experimental structures and conformations from molecular dynamics simulations. This illustrated the efficiency of ClustENM especially for large systems where molecular dynamics could become computationally time-consuming and encounter difficulties in sampling large conformational changes. The method also has the potential to capture induced-fit effects upon binding, as demonstrated for calmodulin where a peptide was docked onto an intermediate state between the unbound and bound forms: the conformational sampling applied to the docked complex indeed made it possible to sample the bound state.12 Following this observation, we propose here a method combining molecular docking with conformer generation by ClustENM, to account for both conformational selection and induced-fit binding mechanisms. For this, we combine conformational sampling of the individual partners (pre-docking sampling) with that of the docked models (post-docking sampling). As the molecular docking tool we use HADDOCK,14, 15, 16 an information-driven approach for the modeling of biomolecular complexes, which has demonstrated sustained performance in CAPRI, the blind modeling experiment for the prediction of complexes,17, 18 and is a widely used method for integrative modeling of biomolecular complexes.19 HADDOCK can handle mixed systems of proteins, nucleic acids and small molecules.
To assess the performance of our ClustENM‐HADDOCK approach, we apply it to the protein‐protein flexible multidomain docking benchmark20 and to a subset of the protein‐DNA benchmark.21 Both datasets were used in previous studies focusing on modeling conformational changes, which allows us to make performance comparisons. Notably, we apply our method without prior knowledge of the extent and direction of the conformational changes.

MATERIALS AND METHODS

Dataset

We applied our method to two different datasets. The first is the flexible multidomain docking benchmark (FMD)20 (Table 1A), where the ligands (smaller partners) are mainly rigid (0.5-1.7 Å) while the receptors undergo varying degrees of conformational change (1.6-19.6 Å), corresponding mainly to domain-domain motions. In this case, in order to concentrate on the conformational change aspects, we assume that ideal interface information is available.
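Table 1

List of protein-protein (A) and protein-DNA complexes (B) used for benchmarking

A. Protein-protein flexible multidomain dataset20

| Complex ID | Receptor ID | Ligand ID | Backbone conformational change, receptor [Å] | Backbone conformational change, ligand [Å] |
| --- | --- | --- | --- | --- |
| 1IRA (ref. 37) | 1G0Y_R (ref. 38) | 1ILR_1 (ref. 39) | 19.6 | 0.7 |
| 1H1V (ref. 40) | 1D0N_B (ref. 41) | 1IJJ_B (ref. 42) | 13.8 | 1.6 |
| 1Y64 (ref. 43) | 1UX5_A (ref. 44) | 2FXU_A (ref. 45) | 10.3 | 1.1 |
| 1F6M (ref. 46) | 1CL0_A (ref. 47) | 2TIR_A (ref. 48) | 7.3 | 0.9 |
| 1FAK (ref. 49) | 1QFK_HL (ref. 50) | 1TFH_B (ref. 51) | 6.3 | 1.0 |
| 1ZLI (ref. 52) | 2JTO_A (ref. 53) | 1KWM_A (ref. 54) | 4.0 | 0.6 |
| 1E4K (ref. 55) | 3AVE_AB (ref. 56) | 1FNL_A (ref. 57) | 2.9 | 1.7 |
| 1IBR (ref. 58) | 1F59_A (ref. 59) | 1QG4_A (ref. 60) | 2.9 | 1.1 |
| 1KKL (ref. 61) | 1JB1_A (ref. 62) | 2HPR (ref. 63) | 2.2 | 0.5 |
| 1NPE (ref. 64) | 1KLO_A (ref. 65) | 1NPE_A (ref. 64) | 1.9 | |
| 1DFJ (ref. 66) | 2BNH_A (ref. 67) | 9RSA_B (ref. 68) | 1.6 | 0.7 |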
The second dataset is a subset of the protein-DNA benchmark21 for which experimental information about interfaces is available (Table 1B). Both protein and DNA (“ligand”) undergo conformational changes, but we focused here on the flexibility of the DNA. Both datasets have been previously used in benchmarking studies with HADDOCK. In the first case a multibody docking approach was followed in which the protein showing rigid-body motion conformational changes was cut into domains with connectivity restraints between them,20 while in the second case a two-step docking strategy was followed in which DNA conformations from the first docking run were used to generate an ensemble of pre-bent DNA conformations for a second, ensemble-based docking stage.21

ClustENM method

ClustENM is an unbiased, iterative conformational sampling method combining ENM, energy minimization and clustering.11, 12 In ClustENM, the initial structure is first energy-minimized in implicit solvent, then a modified version of ENM (mixed resolution ENM)22 is applied to it to extract low frequency normal modes. The modified version of ENM employed in ClustENM places the nodes at the residue centroids, and node pairs within a cut-off distance of 10 Å are connected by elastic springs, thereby representing the residues as low-resolution nodes. The spring constant connecting a node pair is proportional to the total number of interacting atom pairs between the connected residues (bonded or nonbonded) within the specified cut-off. Based on the sudden break in the smooth progression of eigenvalues in the eigenvalue spectrum, we define the number of modes “m” which are to be linearly combined using coefficients −1, 0 and 1 to obtain deformation vectors (3^m vectors in total). The minimized structure is deformed in the direction of the deformation vectors with a fixed deformation RMSD d, where the deformation vector of a residue centroid is applied to all the atoms of that residue. The resulting structures are clustered based on mutual RMSD of heavy atoms with a cut-off equal to d. The cluster containing the parent structure is discarded and a representative structure from each cluster is selected. The minimization-ENM-deformation-clustering steps are applied to each representative structure for k generations.
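The mode-combination step described above can be illustrated with a minimal sketch. It assumes the m selected modes are already available as an array of per-node displacement vectors; the function name `deformation_vectors` and the array layout are hypothetical and are not taken from the ClustENM implementation. The all-zero combination, which would leave the structure unchanged, is skipped here, so 3^m − 1 vectors are returned.

```python
import itertools
import numpy as np

def deformation_vectors(modes, d):
    """Combine m modes with coefficients -1, 0, 1 and scale each result to RMSD d.

    modes: array of shape (m, n_nodes, 3), one displacement vector per node and mode.
    Returns an array of shape (3**m - 1, n_nodes, 3).
    """
    m = modes.shape[0]
    vectors = []
    for coeffs in itertools.product((-1, 0, 1), repeat=m):
        if not any(coeffs):
            continue  # all-zero combination: no deformation
        # Linear combination of the modes, shape (n_nodes, 3)
        v = np.tensordot(np.array(coeffs, dtype=float), modes, axes=1)
        # RMSD of this displacement over the nodes
        rmsd = np.sqrt((v ** 2).sum(axis=1).mean())
        vectors.append(v * (d / rmsd))  # rescale to the target RMSD d
    return np.array(vectors)

# Toy example: 3 orthonormal "modes" for 10 nodes (random, for illustration only)
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(30, 3)))
toy_modes = q.T.reshape(3, 10, 3)
toy_vectors = deformation_vectors(toy_modes, d=2.0)
print(toy_vectors.shape)
```

In an all-atom setting, each node's displacement would then be applied to every atom of the corresponding residue, as the text describes.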

Energy minimization

For proteins, the energy minimization in implicit solvent was performed using NAMD v2.1023 with the CHARMM22 force field.24 A generalized Born implicit solvent model was used with a 16 Å cut-off for non-bonded interactions and a 14 Å cut-off for Born radius. The ion concentration was set to 0.3 M. 4000 steps of conjugate gradient were applied. The DNA structures and protein-DNA complexes were minimized in implicit solvent using AMBER14 with the ff14SB force field parameters.25 The pairwise generalized Born model26, 27 was used with a 16 Å cut-off for non-bonded interactions. We used the modified generalized Born theory-based Debye-Hückel limiting law for ion screening of interactions.28 The concentration of 1-1 mobile counter-ions in solution was set to 0.1 M. Steepest descent was applied for 500 steps followed by conjugate gradient with a maximum number of steps of 4000 and a convergence criterion of 0.01 kcal/mol/Å.

Deformation Parameter d Estimation

The advantage of ClustENM is its computational efficiency in yielding conformers at atomic resolution and its reduction in the redundancy of the sampled conformations due to clustering. However, the degree of conformational change is usually arbitrarily defined by the user. To overcome this problem, and inspired by mechanical tensile testing on materials, we deformed the structures in the direction of the slow modes to assess the extent of flexibility. First, we determined the relevant low frequency modes based on the sudden break in the eigenvalue spectrum; then we deformed the structure stepwise in their direction and determined the energy of each deformed state by energy minimization. Based on the energy vs strain (deformation) curve and the reference energy state of the initial, non-deformed structure, we assessed the maximum deformation the system could handle. The results of such energy-deformation curves for the ribonuclease inhibitor structure (1DFJ) are shown in Figure 1. According to the consensus of four modes, the structure can handle an RMSD deformation of 1-2 Å, which covers the observed 1.6 Å RMSD conformational change upon binding to its partner. This prediction scheme is applied to all receptors in the flexible multidomain protein docking benchmark and the DNA structures in the protein-DNA dataset to determine the deformation RMSD d used in ClustENM. The same procedure is then applied to the docked complexes, to estimate d for post-docking ClustENM using these complexes as starting point.
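As a rough illustration of how such an energy-deformation curve could be read, the sketch below finds, for each mode, the largest deformation for which the minimized energy stays at or below the reference energy. The function name `stable_limit` and all energy values are invented for illustration (they are chosen only to mimic the qualitative behavior described for 1DFJ), and only one deformation direction is treated. How the per-mode limits are merged into a consensus d is a judgment that the text does not formalize, so it is left out.

```python
def stable_limit(deformations, energies, reference_energy):
    """Largest deformation (in Å) up to which the energy stays at or below the reference."""
    limit = 0.0
    for d, e in sorted(zip(deformations, energies)):
        if e > reference_energy:
            break  # first deformation that costs energy relative to the reference
        limit = float(d)
    return limit

# Hypothetical energies (relative to the minimized start, reference = 0.0)
steps = [1, 2, 3, 4, 5]  # deformation RMSD in Å
curves = {
    1: [-6.0, -4.0, -1.0, 5.0, 14.0],
    2: [-5.0, -3.0, -0.5, 6.0, 16.0],
    3: [-4.0, -1.0, 3.0, 9.0, 18.0],
    4: [-2.0, 2.0, 7.0, 13.0, 22.0],
}
limits = {mode: stable_limit(steps, e, reference_energy=0.0) for mode, e in curves.items()}
print(limits)
```

With these made-up curves the per-mode limits are 3, 3, 2 and 1 Å, which is the kind of pattern from which a 1-2 Å consensus would be drawn.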
Figure 1

Tensile test on the ribonuclease inhibitor structure along the direction of normal modes. Based on the eigenvalue spectrum shown on top, the four slowest modes are selected for application of tensile testing. The structure is first minimized (“Start”), then deformations of 1-5 Å are introduced in both negative and positive directions along each mode. To visualize the type of motions, the first and second slowest modes are also shown in the right panel. The energy of the structure increases as the deformation along an individual mode increases. The stable region corresponds to the deformations that keep the energy level below the reference state: for the first and second modes the energy starts increasing after 3 Å, for the third mode after 2 Å and for the fourth mode after 1 Å in both directions. The consensus of the four modes points to a maximum deformation of 1-2 Å for this structure [Color figure can be viewed at http://wileyonlinelibrary.com]

The ClustENM parameters used for the conformer generation in the pre- and post-sampling stages are listed in Table 2A and B for the FMD and protein-DNA datasets, respectively.
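Table 2

ClustENM pre- and post-sampling parameters

A. Protein-protein complexes of the FMD benchmark

| Complex | Sampling prior to docking (receptor only): deformation RMSD [Å] | Sampling prior to docking (receptor only): number of modes | Sampling after docking (on complex structure)^a: deformation RMSD [Å] | Sampling after docking (on complex structure)^a: number of modes |
| --- | --- | --- | --- | --- |
| 1IRA | 3 | 3 | 1 | 5 |
| 1H1V | 2 | 3 | 1 | 4 |
| 1Y64 | 4 | 3 | 2/3 | 3 |
| 1F6M | 2 | 3 | 1 | 5/4 |
| 1FAK | 2 | 3 | 1 | 3 |
| 1ZLI | 1 | 3 | 1 | 5 |
| 1E4K | 1 | 4 | 2/3 | 3 |
| 1IBR | 2 | 5 | 1 | 5 |
| 1KKL | 2 | 5 | 1/2 | 3 |
| 1NPE | 3 | 4 | 1/2 | 3 |
| 1DFJ | 2 | 4 | 1 | 4/5 |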

^a The presence of two different values indicates the use of different parameters for the two complexes. The first value was used for the complex from the best cluster and the second value for the complex from the second-best cluster. A single value means that the same parameters were used for both complexes.


HADDOCK settings

We used the HADDOCK2.2 webserver16 for all dockings in this study. For the FMD benchmark, the docking was performed with “ideal” interface information, that is, the residues of one subunit within a 5 Å distance from any atom of the partner were selected as active. No passive residues were defined. This effectively brings the interfaces together during the docking without predefining their relative orientation since no specific contacts are defined. Random removal of ambiguous interaction restraints (AIRs) was switched off. Since we use ensembles of conformations, the sampling was increased to 10 000/400/400 structures for the rigid-body (it0), semi-flexible (it1) and water refinement stages, respectively. Clustering was performed using the fraction of common contacts (FCC)29 with a cut-off of 0.75 and a minimum cluster size of 4. For protein-DNA docking, the experimental restraints described in the study by van Dijk and Bonvin, 201021 were used to drive the docking. Since we do not use the ideal interface information here, random removal of AIRs was switched on. The sampling was increased to 10 000/400/400 for the it0/it1/water stages, respectively. Because of the high charge of DNA, the value for the dielectric constant was set to 78. FCC clustering was used with a 0.75 cut-off and a minimum cluster size of 4. Details on the data used to drive the docking for both protein-protein and protein-DNA systems are given in Tables S1 and S2 of the Supporting Information in Data S1.
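The “ideal” interface definition used for the FMD benchmark amounts to a simple distance filter on the reference complex. The sketch below shows one way this could be computed with NumPy, assuming atomic coordinates and per-atom residue numbers have already been parsed; the function name `interface_residues` and its arguments are hypothetical and unrelated to the HADDOCK webserver.

```python
import numpy as np

def interface_residues(coords_a, resids_a, coords_b, cutoff=5.0):
    """Residue numbers of partner A with at least one atom within `cutoff` Å of any atom of B.

    coords_a: (n_atoms_a, 3) array; resids_a: residue number of each atom in A;
    coords_b: (n_atoms_b, 3) array.
    """
    # All pairwise atom-atom distances between the two partners
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    close = (dist <= cutoff).any(axis=1)  # atoms of A near any atom of B
    return sorted(set(np.asarray(resids_a)[close].tolist()))

# Toy example: three single-atom residues on a line and one partner atom
toy_a = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
toy_b = np.array([[12.0, 0.0, 0.0]])
print(interface_residues(toy_a, [1, 2, 3], toy_b))
```

The full pairwise distance matrix keeps the sketch short; for large complexes a neighbor search would be more memory-efficient.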

HADDOCK scoring

The HADDOCK score was used to select the top pose of each of the top two clusters for the post-docking sampling stage and also to score the models resulting from that post-docking sampling. The HADDOCK scoring function (HS)17, 30 is a linear weighted sum of energetic terms:

$$HS = w_{vdw} E_{vdw} + w_{elec} E_{elec} + w_{desolv} E_{desolv} + w_{AIR} E_{AIR}$$

where $E_{vdw}$, $E_{elec}$, $E_{desolv}$ and $E_{AIR}$ stand for van der Waals, Coulomb electrostatics, desolvation and restraint energies, respectively, and each $w$ is the weight of the corresponding term. The non-bonded components of the score ($E_{vdw}$, $E_{elec}$) were calculated with the OPLS force field;31 the desolvation energy is a solvent accessible surface area-dependent empirical term32 which estimates the energetic gain or penalty of burying specific sidechains upon complex formation. The restraint energy term was only used for ranking the docking poses but was excluded for the scoring of the post-docking sampling models.
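A minimal sketch of such a weighted sum is given below, with a switch for leaving out the restraint term as done for the post-docking sampling models. The default weights (1.0, 0.2, 1.0, 0.1) are an assumption: they are the values commonly documented for the final water-refinement stage of HADDOCK, not values stated in this section. The function name `haddock_score` and the energies in the example are made up.

```python
def haddock_score(e_vdw, e_elec, e_desolv, e_air=0.0,
                  weights=(1.0, 0.2, 1.0, 0.1), include_restraints=True):
    """Linear weighted sum of energy terms.

    weights: (w_vdw, w_elec, w_desolv, w_air); the defaults are an assumption,
    see the accompanying text.
    """
    w_vdw, w_elec, w_desolv, w_air = weights
    score = w_vdw * e_vdw + w_elec * e_elec + w_desolv * e_desolv
    if include_restraints:
        score += w_air * e_air  # restraint term, used when ranking docking poses
    return score

# Made-up energies for one model
docking_rank_score = haddock_score(-40.0, -200.0, 10.0, e_air=50.0)
post_sampling_score = haddock_score(-40.0, -200.0, 10.0, e_air=50.0,
                                    include_restraints=False)
print(docking_rank_score, post_sampling_score)
```

Lower (more negative) scores indicate better-ranked models.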

Quality assessment

The quality of the modeled complexes was assessed based on the well-established CAPRI criteria33:

Interface RMSD (i-RMSD): Backbone RMSD of the interface residues, defined as all residues having at least one atom within 10 Å of an atom on the other partner.
Ligand RMSD (l-RMSD): Backbone RMSD of the ligand (smaller partner in the complex) after fitting on the backbone of the receptor.
Fraction of native contacts (fnat): The number of correct residue-residue contacts in the predicted complex divided by the number of residue-residue contacts in the target complex. Two residues are defined to be in contact if they have any atom within 5 Å distance of each other.

Using i-RMSD, l-RMSD and fnat, the quality of models is defined as:

High (three-stars): fnat ≥ 0.5 with l-RMSD ≤ 1 Å or i-RMSD ≤ 1 Å
Medium (two-stars): fnat ≥ 0.3 with 1 Å < l-RMSD ≤ 5 Å or 1 Å < i-RMSD ≤ 2 Å
Acceptable (one-star): fnat ≥ 0.1 with 5 Å < l-RMSD ≤ 10 Å or 2 Å < i-RMSD ≤ 4 Å
Incorrect (zero-star): fnat < 0.1
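The quality thresholds above can be expressed as a cascade of checks from best to worst, as in the sketch below. The function name `capri_quality` is hypothetical. One point is an assumption beyond the wording above: a model with fnat ≥ 0.1 whose l-RMSD and i-RMSD both exceed the acceptable limits is treated as incorrect here.

```python
def capri_quality(fnat, l_rmsd, i_rmsd):
    """Return the number of stars (0-3) for a model; RMSD values in Å."""
    if fnat >= 0.5 and (l_rmsd <= 1.0 or i_rmsd <= 1.0):
        return 3  # high
    if fnat >= 0.3 and (l_rmsd <= 5.0 or i_rmsd <= 2.0):
        return 2  # medium
    if fnat >= 0.1 and (l_rmsd <= 10.0 or i_rmsd <= 4.0):
        return 1  # acceptable
    return 0      # incorrect (assumption: includes fnat >= 0.1 with too-large RMSDs)

# Made-up examples
for fnat, l_rmsd, i_rmsd in [(0.6, 0.8, 0.9), (0.6, 3.0, 1.5), (0.2, 8.0, 3.5), (0.05, 2.0, 1.0)]:
    print(fnat, l_rmsd, i_rmsd, "*" * capri_quality(fnat, l_rmsd, i_rmsd) or "incorrect")
```

Because the checks run from high to acceptable, a model is assigned the best category whose conditions it meets, so the lower bounds on the RMSD intervals do not need to be tested explicitly.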

General workflow

The modeling pipeline combines conformational selection and induced-fit mechanisms using conformational sampling prior to and after complexation, respectively. Pre-sampling was applied to the receptor in the case of protein-protein docking, and to the DNA for the protein-DNA dataset. The pipeline consists of the following steps (also shown in Figure 2):
Figure 2

Pipeline for ClustENM‐HADDOCK [Color figure can be viewed at http://wileyonlinelibrary.com]

1. The initial structure is energetically minimized and subjected to elastic network modeling to define the parameters for ClustENM. The eigenvalue spectrum is extracted to decide on the number of modes (“m”) to be used. The deformation test is applied to specify the deformation RMSD (“d”) in ClustENM.
2. Conformers are generated using ClustENM with the parameters defined in Step 1. For the proteins in the FMD dataset the number of generations is set to two for each system. For the DNA structures in the protein-DNA dataset we increased the number of generations to four since DNA conformational sampling is computationally much cheaper due to the DNA's smaller size. During the generation, clusters containing the parent structures are discarded.
3. A maximum of 20 conformations are selected based on the minimization energy, to be used in ensemble docking. The initial structure is included in the ensemble regardless of its energy value.
4. Ensemble semi-flexible docking is performed in HADDOCK.
5. The best scoring docked model from each of the two best scoring clusters is selected for the second round of ClustENM (post-sampling).
6. Parameters for ClustENM are determined for each structure (same as in Step 1).
7. ClustENM is applied to each selected docked complex for two generations using the parameters from Step 6, resulting in the final models.
8. The final models are scored using HADDOCK scoring and their quality is assessed.
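The ensemble selection step of the pipeline (lowest-energy conformers plus the initial structure) can be sketched as follows. The function name `select_ensemble` and the conformer labels are hypothetical, and the sketch assumes that the initial structure is added on top of the energy-selected conformers; the text does not specify whether it counts toward the maximum of 20.

```python
def select_ensemble(initial, conformers, max_conformers=20):
    """Pick conformers for ensemble docking.

    initial: label of the starting structure, always kept.
    conformers: list of (label, minimization_energy) for the generated conformers.
    """
    ranked = sorted(conformers, key=lambda item: item[1])  # lowest energy first
    selected = [label for label, _ in ranked[:max_conformers]]
    # Assumption: the initial structure is added in addition to the selected conformers
    return [initial] + selected

# Made-up example: 25 generated conformers with increasing energies
generated = [(f"conf_{i}", -1000.0 + 10.0 * i) for i in range(25)]
ensemble = select_ensemble("initial", generated)
print(len(ensemble), ensemble[:3])
```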

RESULTS AND DISCUSSION

Sampling prior to docking

Protein‐protein cases from the FMD benchmark

Starting from the unbound state of the receptors in the FMD benchmark, two generations of ClustENM sampling were applied prior to docking. The RMSDs (closest and range) of the receptor conformers to the reference bound state are given in Table 3 and the resulting ClustENM ensembles are shown in Figure 3. The RMSD range for the ensemble reflects the blind sampling of ClustENM: some of the sampled conformations get closer to the bound state and others move further away from it. The number of conformers at the end of two generations of sampling remains modest thanks to the clustering steps in ClustENM. Two generations of ClustENM are clearly not sufficient to reach bound-like conformations, especially for challenging cases like 1IRA and 1H1V, where the RMSD of the closest structure to the bound state is almost the same as that of the unbound state (less than 1 Å deviation from unbound). Figure 3 shows the generated ensemble for each case, which also reveals the expected-to-fail cases of 1IRA, 1H1V, 1Y64 and 1FAK, where the large domain motions could not be sampled in two generations. The remaining cases, however, do not show this issue.
Table 3

RMSD values from the bound conformation for the receptors in FMD benchmark

| Complex ID | Receptor RMSD [Å], unbound | Receptor RMSD [Å], closest generated | Receptor RMSD [Å], range in ensemble | Number of conformers in ensemble |
| --- | --- | --- | --- | --- |
| 1IRA | 19.6 | 19.1 | 19.1-20.4 | 8 |
| 1H1V | 13.8 | 13.2 | 13.2-14.0 | 11 |
| 1Y64 | 10.3 | 6.98 | 6.98-17.0 | 16 |
| 1F6M | 7.32 | 6.40 | 6.40-10.0 | 17 |
| 1FAK | 6.33 | 5.57 | 5.57-7.58 | 16 |
| 1ZLI | 4.04 | 3.73 | 3.73-5.06 | 15 |
| 1E4K | 2.91 | 2.85 | 2.85-3.53 | 6 |
| 1IBR | 2.96 | 1.55 | 1.55-6.26 | 20 |
| 1KKL | 2.64 | 2.72 | 2.72-3.34 | 8 |
| 1NPE | 1.91 | 1.52 | 1.52-4.86 | 22 |
| 1DFJ | 1.55 | 1.09 | 1.09-4.61 | 20 |
Figure 3

Ensembles of receptor conformers generated by ClustENM for protein‐protein docking. The starting structure is shown in blue, the bound conformation in red and the ClustENM generated conformers in green [Color figure can be viewed at http://wileyonlinelibrary.com]


Protein‐DNA dataset

A standard B‐DNA conformation built using our 3D‐DART webserver34 was the starting conformation for all protein‐DNA cases. We applied four generations of ClustENM on the DNA structures. The RMSD ranges of the resulting ensemble are given in Table 4. As in the case of the protein FMD benchmark, both close‐to‐bound and further‐away DNA conformations were sampled. Figure 4 shows the sampled DNA conformers for each case.
Table 4

RMSD values from the bound conformation for the DNA structures in the protein-DNA dataset

| Complex ID | RMSD [Å], B-DNA | RMSD [Å], closest generated | RMSD range [Å], ensemble | Number of conformers in ensemble |
| --- | --- | --- | --- | --- |
| 1BY4 | 1.63 | 1.71 | 1.63-4.01 | 20 |
| 3CRO | 2.66 | 1.69 | 1.79-4.06 | 20 |
| 1AZP | 3.95 | 2.90 | 2.90-4.12 | 20 |
| 1JJ4 | 3.66 | 2.00 | 2.00-7.80 | 20 |
| 1A74 | 8.34 | 4.98 | 4.98-9.18 | 20 |
| 1ZME | 4.77 | 2.58 | 2.58-5.82 | 20 |
Figure 4

Ensembles of DNA conformers generated by ClustENM for protein‐DNA dockings. Starting structures are shown in blue, bound structure in red and the generated conformers in orange [Color figure can be viewed at http://wileyonlinelibrary.com]


Docking

After the pre-docking conformational sampling, the resulting ensembles of conformers were used for ensemble docking with HADDOCK following the settings specified in Materials and Methods. This docking phase corresponds to the “conformational selection” part of our pipeline (with some induced fit as well, since the flexible stage of HADDOCK allows for limited conformational changes). In this stage, ideally, HADDOCK should select the best conformers in its docking workflow from rigid-body docking to final flexible refinement in explicit solvent. The best scoring docking pose from each of the top 2 ranked clusters was subsequently selected for the post-docking induced-fit part of the pipeline. The selected poses and their i-RMSD, l-RMSD, fnat and CAPRI quality metrics are shown in Figures 5 and 6 for the FMD and protein-DNA docking, respectively.
Figure 5

Protein‐protein docking top pose from the best two clusters of HADDOCK (green receptor/blue ligand) superimposed onto bound structure (red) [Color figure can be viewed at http://wileyonlinelibrary.com]

Figure 6

Protein‐DNA docking top pose from the best two clusters of HADDOCK (green receptor/orange DNA) superimposed on bound structure (gray) [Color figure can be viewed at http://wileyonlinelibrary.com]

In the FMD dataset, we were able to obtain models of at least acceptable quality in 7/11 cases. The failed cases are, as expected, 1IRA, 1H1V, 1Y64, and 1FAK, which all undergo large domain motions. As already explained in the “Sampling prior to docking” section, the conformational sampling using only two generations was clearly not sufficient to model such large conformational changes in these cases. As for the protein-DNA complexes, HADDOCK was able to obtain at least acceptable quality structures in 4/6 cases as shown in Figure 6. The binding mode in 1AZP is reversed (180° rotation), which already rules out a successful outcome at the end of the pipeline. In this particular case, however, near-native poses were generated during docking but did not end up in the top 2 clusters.

Post‐docking sampling

We applied a second round of ClustENM to the best ranking model of each of the top 2 ranked clusters obtained from docking to perform the “induced-fit” stage of our pipeline. The resulting models were ranked using the HADDOCK score calculated after a short energy minimization of each model.17 The results for protein-protein and protein-DNA systems are presented in the following sections. The quality of the starting and final conformers is summarized in Table 5, together with the rank of the best generated conformer. Additionally, we show the change in i-RMSD, l-RMSD and fnat through the “generations” of post-sampling ClustENM in Figures S1–S3 in Data S1, respectively. We were able to obtain acceptable/medium quality (1-2-star) poses ranked within the top 20 in 7/11 cases, 4 of which were ranked as top 1. Unsurprisingly, the failed cases are again 1IRA, 1H1V, 1Y64 and 1FAK. Figures S1–S3 in Data S1 indicate that the quality of the starting docked model (which is the result of pre-sampling followed by docking) is the limiting factor for the final complex structures. Indeed, the “induced-fit” sampling phase does not dramatically change the quality of the final conformers for the FMD benchmark. Conformational sampling prior to docking thus contributes more to the quality of the final complexes than post-sampling in the case of the protein-protein FMD benchmark. The final complex structures are shown in Figure 7.
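Table 5

Quality and rank of the sampled conformers after the docking: FMD benchmark

| Complex ID | Starting docked models: number of conformers per quality (of total) | All post-docking ClustENM generated complex models: number of conformers per quality (of total) | Best conformer: quality/rank | First acceptable or better model: quality/rank |
| --- | --- | --- | --- | --- |
| 1IRA | 0*, 0**, 0*** (of 2) | 0*, 0**, 0*** (of 58) | | |
| 1H1V | 0*, 0**, 0*** (of 2) | 0*, 0**, 0*** (of 17) | | |
| 1Y64 | 0*, 0**, 0*** (of 2) | 0*, 0**, 0*** (of 31) | | |
| 1F6M | 1*, 0**, 0*** (of 2) | 22*, 0**, 0*** (of 72) | */15 | */15 |
| 1FAK | 0*, 0**, 0*** (of 2) | 0*, 0**, 0*** (of 30) | | |
| 1ZLI | 1*, 1**, 0*** (of 2) | 20*, 7**, 0*** (of 30) | **/1 | **/1 |
| 1E4K | 1*, 0**, 0*** (of 2) | 13*, 0**, 0*** (of 28) | */1 | */1 |
| 1IBR | 1*, 0**, 0*** (of 2) | 27*, 0**, 0*** (of 63) | */1 | */1 |
| 1KKL | 0*, 1**, 0*** (of 2) | 12*, 3**, 0*** (of 30) | **/20 | */16 |
| 1NPE | 1*, 1**, 0*** (of 2) | 17*, 17**, 0*** (of 34) | **/15 | */1 |
| 1DFJ | 0*, 2**, 0*** (of 2) | 0*, 18**, 0*** (of 18) | **/1 | **/1 |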
Figure 7

Final protein-protein complexes for the protein-protein FMD benchmark. The receptor is shown in green, the ligand in blue and the reference bound structure in red. The quality of the best conformer (listed in Table 5) together with its i-RMSD [Å], l-RMSD [Å] and fnat is given for each protein-protein complex. In cases where no acceptable or better model is present, the quality of the best scoring complex is given [Color figure can be viewed at http://wileyonlinelibrary.com]


Protein‐DNA cases

Table 6 shows the quality of the starting and final conformers with the rank of the best conformer for the protein‐DNA dataset. The change in i‐RMSD, l‐RMSD and fnat through the ClustENM generations are shown in Figures S4–S6 in Data S1, respectively. We obtained 1‐2 star poses for 5 out of 6 cases, 3 of which were ranked as top 1 and one as top 2 and the last one as 21st. As expected, there is no successful conformation generated for 1AZP.
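Table 6

Quality and rank of the sampled conformers after the docking: protein-DNA dataset

| Complex ID | Starting docked models: number of conformers per quality (of total) | All post-docking ClustENM generated complex models: number of conformers per quality (of total) | Best conformer: quality/rank | First acceptable or better model: quality/rank |
| --- | --- | --- | --- | --- |
| 1BY4 | 1*, 1**, 0*** (of 2) | 59*, 15**, 0*** (of 75) | **/1 | **/1 |
| 3CRO | 1*, 1**, 0*** (of 2) | 84*, 74**, 0*** (of 158) | **/1 | **/1 |
| 1AZP | 0*, 0**, 0*** (of 2) | 0*, 0**, 0*** (of 174) | | |
| 1JJ4 | 1*, 1**, 0*** (of 2) | 235*, 52**, 0*** (of 287) | **/2 | */1 |
| 1A74 | 2*, 0**, 0*** (of 2) | 34*, 0**, 0*** (of 53) | */1 | */1 |
| 1ZME | 0*, 0**, 0*** (of 2) | 5*, 0**, 0*** (of 91) | */21 | */21 |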
Unlike for the protein-protein FMD benchmark, sampling after docking for induced-fit increases the quality of the models for the protein-DNA cases, as shown in Figures S4–S6 in Data S1. For example, in the case of 1ZME, we were able to obtain 1-star quality structures by starting from zero-star docking poses. Nevertheless, the quality of the starting structures again has the most impact on the quality of the final conformers (shown in Figure 8).
Figure 8

Final ClustENM-HADDOCK models for protein-DNA complexes (green protein/orange DNA), aligned on bound structure (red). The quality of the best conformer (listed in Table 6) together with its i-RMSD [Å], l-RMSD [Å] and fnat is given for each protein-DNA complex


Comparison with previous studies

FMD benchmark: ClustENM‐HADDOCK vs Multidomain Flexible docking and Two‐Body Docking

We compared the performance of our pipeline with that of the multidomain flexible docking and two-body docking approaches reported by Karaca and Bonvin20 (Table 7). For multidomain flexible docking, possible hinge regions were predicted using HingeProt35; the proteins were cut at those hinges and connectivity restraints were defined between the domains, allowing rigid-body conformational changes during docking. Two-body docking allowed only limited conformational changes at the interface. The results indicate that the ClustENM-HADDOCK approach performs much better than two-body docking but worse than the multidomain flexible docking approach of Karaca and Bonvin.20 The latter is particularly effective at incorporating rigid-body hinge motions. In our elastic network-based conformational sampling, by contrast, two generations of ClustENM are clearly insufficient for systems undergoing extremely large conformational changes (e.g. 1IRA, 1H1V, 1Y64). This might, however, depend on the type of motion: in a previous study11 of the adenylate kinase system, which undergoes a conformational change of 7 Å, we observed full closing and opening of the domains when ClustENM was applied for 5 generations.
Table 7

Comparison of ClustENM‐HADDOCK with previous protocols for FMD benchmark

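CH: ClustENM-HADDOCK; FMD: flexible multidomain docking results20; 2B: two-body docking results20. A "-" in a Quality/Rank column indicates that no acceptable or better model was obtained.

| Complex | CH Quality/Rank | CH i-RMSD [Å] | CH Fnat | FMD Quality/Rank | FMD i-RMSD [Å] | FMD Fnat | 2B Quality/Rank | 2B i-RMSD [Å] | 2B Fnat |
|---|---|---|---|---|---|---|---|---|---|
| 1IRA | - | 17.5 | 0.08 | */1 | 3.9 | 0.55 | - | 17.5 | 0.04 |
| 1H1V | - | 9.3 | 0.21 | */11 | 4.6 | 0.49 | - | 11.9 | 0.08 |
| 1Y64 | - | 11.1 | 0.45 | */5 | 3.9 | 0.48 | - | 10.3 | 0.07 |
| 1F6M | */15 | 5.2 | 0.36 | */1 | 3.5 | 0.69 | - | 14.1 | 0.00 |
| 1FAK | - | 5.9 | 0.27 | **/37 | 2.8 | 0.55 | - | 11.4 | 0.01 |
| 1ZLI | **/1 | 2.9 | 0.48 | **/1 | 2.1 | 0.74 | - | 14.8 | 0.02 |
| 1E4K | */1 | 3.9 | 0.48 | **/5 | 2.3 | 0.70 | */1 | 4.1 | 0.58 |
| 1IBR | */1 | 3.6 | 0.35 | **/1 | 2.3 | 0.63 | - | 9.6 | 0.11 |
| 1KKL | **/20 | 2.1 | 0.57 | **/1 | 2.2 | 0.67 | */1 | 3.1 | 0.56 |
| 1NPE | **/15 | 1.4 | 0.80 | **/16 | 1.2 | 0.95 | **/1 | 1.7 | 0.85 |
| 1DFJ | **/1 | 1.9 | 0.69 | **/5 | 2.0 | 0.68 | **/116 | 1.8 | 0.63 |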
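The success rates behind the comparison in Table 7 can be tallied with a short sketch. It records only the star rating per complex and protocol (an empty string where no acceptable or better model was obtained), transcribed from Table 7; the variable and function names are illustrative assumptions.

```python
# Star ratings transcribed from Table 7, in the order
# (ClustENM-HADDOCK, flexible multidomain docking, two-body docking).
# An empty string means no acceptable or better model was obtained.
PROTOCOLS = ("ClustENM-HADDOCK", "Flexible multidomain", "Two-body")
TABLE7 = {
    "1IRA": ("", "*", ""),
    "1H1V": ("", "*", ""),
    "1Y64": ("", "*", ""),
    "1F6M": ("*", "*", ""),
    "1FAK": ("", "**", ""),
    "1ZLI": ("**", "**", ""),
    "1E4K": ("*", "**", "*"),
    "1IBR": ("*", "**", ""),
    "1KKL": ("**", "**", "*"),
    "1NPE": ("**", "**", "**"),
    "1DFJ": ("**", "**", "**"),
}


def success_counts(table):
    """Number of complexes per protocol with at least a one-star model."""
    counts = dict.fromkeys(PROTOCOLS, 0)
    for stars in table.values():
        for name, rating in zip(PROTOCOLS, stars):
            if rating:
                counts[name] += 1
    return counts


counts = success_counts(TABLE7)
for name in PROTOCOLS:
    print(f"{name}: {counts[name]}/{len(TABLE7)}")
```

This gives 7/11 for ClustENM-HADDOCK, 11/11 for flexible multidomain docking and 4/11 for two-body docking, consistent with the ordering stated in the text.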

Protein-DNA dataset: ClustENM-HADDOCK vs unbound-unbound docking using canonical B-DNA models and unbound-unbound docking using custom-built B-DNA models

We compared our protein-DNA results with those reported by van Dijk and Bonvin21 to assess our performance against (i) one round of unbound docking starting from canonical B-DNA models (“Unbound flex”) and (ii) two rounds of unbound docking, in which the second round uses an ensemble of custom-built bent B-DNA models derived from an analysis of the conformations obtained during the first docking stage (“DNA lib”). The results in Table 8 indicate that, when the top 10 best scoring structures are considered, the ClustENM-HADDOCK approach performs better than “Unbound flex” in 4/6 cases, the same in one case (1AZP) and worse in one case (1ZME). ClustENM-HADDOCK did generate acceptable quality solutions for 1ZME, but none of them scored in the top 10 (the first acceptable model is ranked 21st). The “DNA lib” approach, on the other hand, obtained at least acceptable quality models in all cases. Note that for both the “Unbound flex” and “DNA lib” approaches, a multidomain flexible docking method (ie, cutting at the hinge(s) and using connectivity restraints) was used for the protein in the 1ZME case, which prevents a one-to-one comparison.
Table 8

Comparison of the performance of ClustENM‐HADDOCK with the previous protocols by van Dijk and Bonvin21 for the protein‐DNA dataset

CH: ClustENM-HADDOCK; UF: Unbound flex21; DL: DNA lib21.

| Complex | CH Quality | CH i-RMSD [Å] | CH Fnat | UF Quality | UF i-RMSD [Å] | UF Fnat | DL Quality | DL i-RMSD [Å] | DL Fnat |
|---|---|---|---|---|---|---|---|---|---|
| 1BY4 | 7*, 3**, 0*** | 4.55 (0.70) | 0.47 (0.05) | 5*, 0**, 0*** | 5.87 (1.71) | 0.17 (0.05) | 4*, 3**, 0*** | 4.91 (2.32) | 0.27 (0.09) |
| 3CRO | 8*, 2**, 0*** | 3.19 (0.60) | 0.50 (0.01) | 6*, 2**, 0*** | 3.29 (0.68) | 0.27 (0.07) | 3*, 7**, 0*** | 2.62 (0.73) | 0.40 (0.06) |
| 1AZP | 0*, 0**, 0*** | 6.25 (0.52) | 0.37 (0.06) | 0*, 0**, 0*** | 6.68 (2.26) | 0.04 (0.04) | 5*, 0**, 0*** | 4.00 (0.45) | 0.10 (0.04) |
| 1JJ4 | 5*, 5**, 0*** | 2.39 (0.18) | 0.51 (0.03) | 6*, 0**, 0*** | 4.55 (0.58) | 0.16 (0.07) | 9*, 1**, 0*** | 3.62 (0.38) | 0.21 (0.07) |
| 1A74 | 10*, 0**, 0*** | 3.81 (0.40) | 0.45 (0.02) | 8*, 0**, 0*** | 6.30 (0.46) | 0.14 (0.04) | 9*, 1**, 0*** | 3.37 (0.37) | 0.24 (0.05) |
| 1ZME | 0*, 0**, 0*** | 8.08 (0.70) | 0.50 (0.01) | 4*, 0**, 0*** | 5.29 (0.59) | 0.12 (0.06) | 8*, 0**, 0*** | 4.63 (0.80) | 0.15 (0.04) |

Note: Values in parentheses correspond to standard deviations.

Based on the quality of the modeled complexes generated by our ClustENM-HADDOCK approach, our pipeline offers a viable alternative for protein-DNA docking cases.
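The better/same/worse tally quoted for Table 8 can be reproduced with the sketch below. It assumes that two protocols are compared per complex by the number of acceptable-or-better models among the top 10 scoring structures; this criterion is our reading of the text, not a definition stated in it, and the names used are illustrative.

```python
# Top-10 quality counts (one-star, two-star, three-star) transcribed from Table 8.
TABLE8 = {
    "1BY4": {"ClustENM-HADDOCK": (7, 3, 0), "Unbound flex": (5, 0, 0), "DNA lib": (4, 3, 0)},
    "3CRO": {"ClustENM-HADDOCK": (8, 2, 0), "Unbound flex": (6, 2, 0), "DNA lib": (3, 7, 0)},
    "1AZP": {"ClustENM-HADDOCK": (0, 0, 0), "Unbound flex": (0, 0, 0), "DNA lib": (5, 0, 0)},
    "1JJ4": {"ClustENM-HADDOCK": (5, 5, 0), "Unbound flex": (6, 0, 0), "DNA lib": (9, 1, 0)},
    "1A74": {"ClustENM-HADDOCK": (10, 0, 0), "Unbound flex": (8, 0, 0), "DNA lib": (9, 1, 0)},
    "1ZME": {"ClustENM-HADDOCK": (0, 0, 0), "Unbound flex": (4, 0, 0), "DNA lib": (8, 0, 0)},
}


def compare(table, ours, other):
    """Tally complexes where `ours` has more, equal or fewer top-10 hits.

    Assumed criterion: number of acceptable-or-better models in the top 10.
    """
    tally = {"better": 0, "same": 0, "worse": 0}
    for row in table.values():
        a, b = sum(row[ours]), sum(row[other])
        key = "better" if a > b else "worse" if a < b else "same"
        tally[key] += 1
    return tally


vs_unbound = compare(TABLE8, "ClustENM-HADDOCK", "Unbound flex")
print(vs_unbound)
```

Under this assumption the tally against “Unbound flex” is 4 better, 1 the same (1AZP) and 1 worse (1ZME), matching the figures given in the text.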

CONCLUSIONS

We applied our ClustENM-HADDOCK pre- and post-sampling method to protein-protein and protein-DNA complexes to incorporate dynamics and simulate conformational selection and induced-fit effects upon binding. The method does not use any prior knowledge of the direction and extent of the conformational changes. The extent of the applied deformation is estimated from its energetic cost, inspired by mechanical tensile testing of materials. Our pre-sampling/docking/post-sampling approach produced acceptable to medium quality models in 7 out of 11 cases for the protein-protein complexes and 5 out of 6 for the protein-DNA complexes. We observed that the conformational selection stage (“pre-sampling” and also docking) has the highest impact on the quality of the final models, especially for protein-protein complexes. For protein-DNA complexes, in contrast, the induced-fit stage of the pipeline (post-sampling) significantly improves the quality of the final models. Comparison of our method with previous studies showed that the previously proposed flexible multidomain docking remains the best performing approach, provided that reliable hinge region information is available. In the absence of such information, our approach could be a good alternative for incorporating flexibility. For the protein-DNA cases, our ClustENM-HADDOCK approach performs better than standard flexible docking and comparably to a two-stage docking approach in which a library of pre-bent DNA conformations is used in the second docking stage. This makes ClustENM-HADDOCK a suitable alternative for modeling conformational changes in protein-DNA binding. We also observed that two generations of ClustENM for pre-sampling were not sufficient in cases of large conformational changes (eg, 1IRA, 1H1V, 1Y64). For these cases, the number of ClustENM generations could be increased further to obtain large domain motions.
Additionally, we considered only the two best scoring clusters from the docking as starting points for post-sampling. Selecting more clusters from docking for post-sampling would have increased our chance of success, for example in the 1AZP case, where near-native poses were generated during the docking but did not end up in the top 2 clusters.

Data S1. Supporting Information