Literature DB >> 33554146

Protocol for hit-to-lead optimization of compounds by auto in silico ligand directing evolution (AILDE) approach.

Longcan Mei^1,2, Fengxu Wu^1,2, Gefei Hao^1,2,3, Guangfu Yang^1,2,4.

Abstract

Hit-to-lead (H2L) optimization is crucial for drug design, which has become an increasing concern in medicinal chemistry. A virtual screening strategy of auto in silico ligand directing evolution (AILDE) has been developed to yield promising lead compounds rapidly and efficiently. The protocol includes instructions for fragment compound library construction, conformational sampling by molecular dynamics simulation, ligand modification by fragment growing, as well as the binding free energy prediction. For complete details on the use and execution of this protocol, please refer to Wu et al. (2020).

Entities: Chemical

Keywords: Bioinformatics; Protein biochemistry; Structural biology

Mesh：

Substances：
Ligands

Year: 2021 PMID： 33554146 PMCID： PMC7856476 DOI： 10.1016/j.xpro.2021.100312

Source DB: PubMed Journal: STAR Protoc ISSN： 2666-1667

Before you begin

Auto in silico ligand directing evolution (AILDE) is an effective approach to rapidly explore the structure-activity relationship for H2L optimization (Wu et al., 2020). AILDE is utilized to optimize a hit compound by minor chemical modifications on its scaffold. Minimal increases or losses for the binding efficiency of ligand could be obtained by these modifications. Hence, this method can help us for the exploration of the chemical space around a hit compound and promote the optimization of the hit compound into more drug-like lead compounds. Figure 1 shows the workflow of AILDE. AILDE is based on molecular dynamics (MD) simulation complemented with one-step free energy perturbation (FEP) (Hao et al., 2015; Wu et al., 2018). MD simulation is firstly performed on the protein-hit complex. The equilibrated conformational ensemble is then collected. The hit ligand in each conformation is modified to new ligand with substitution using the fragment library. The binding free energy changes between protein-hit and hit analogs are calculated. New leads are finally determined based on the rank of the binding affinity changes.

Figure 1

Workflow of the AILDE method

Once an initial protein-hit complex structure is provided, AILDE will conduct the hit-to-lead optimization according to the following steps: (1) molecular dynamics simulation: this allow to simulate the structural flexibility of the protein-hit complex; (2) collecting the conformational ensemble: adequately collect the representative conformations of a protein-hit complex; (3) generating the hit analogues: a fragment growing based method is utilized to convert a hit to analogue compounds. The conformations of protein-analogues complex are obtained from the representative conformations of the protein-hit complex; (4) binding free energy change: comparing the binding free energy of protein-hit and protein-analogues complexes. Identifying the potential lead compounds according to the rank of the binding free energy changes.

Workflow of the AILDE method Once an initial protein-hit complex structure is provided, AILDE will conduct the hit-to-lead optimization according to the following steps: (1) molecular dynamics simulation: this allow to simulate the structural flexibility of the protein-hit complex; (2) collecting the conformational ensemble: adequately collect the representative conformations of a protein-hit complex; (3) generating the hit analogues: a fragment growing based method is utilized to convert a hit to analogue compounds. The conformations of protein-analogues complex are obtained from the representative conformations of the protein-hit complex; (4) binding free energy change: comparing the binding free energy of protein-hit and protein-analogues complexes. Identifying the potential lead compounds according to the rank of the binding free energy changes. We should provide a productive computational environment for the usage of AILDE. We need a high performance computer with a Unix-based operating system to perform this protocol. A graphics processing unit (GPU) equipment is advised to be used to accelerate the process of molecular dynamics simulation. The prerequisite software (in the materials and equipment section) is freely available on the corresponding websites. The detail introduction about the installation is provided in their user manuals. These software are utilized in different procedures. We download the source code of AILDE at https://github.com/fwangccnu/AILDE. After configuration of the computational environment, we prepare the input data according to the following steps.

Preparing the input structure file of the protein complex with a hit ligand

Timing: ∼15 min Download the structure file of our interest protein complex with a hit ligand from RCSB Protein Data Bank (PDB) (https://www.rcsb.org/). For example, search the PDB ID of 2P3B in PDB database, and obtain the result of the crystal structure of the subtype B wild type HIV protease complexes with TL-3 inhibitor. And save the PDB format structure file. Open the structure file by Chimera software, keep the hit ligand and its direct interacting protein chain and delete all the irrelative molecules, and save as a processed PDB format structure file. Figure 2A shows the example of deleting the irrelative atoms by Chimera software. A detail document about the usage of Chimera is available on the website (https://www.cgl.ucsf.edu/chimera/docindex.html).

Figure 2

The complex structure file preparation

(A) Illustration of selecting and deleting the irrelative atoms from the complex by Chimera software.

(B) Example of renaming amino acids according to their protonated states and adding a “TER” record. The positions of proton in different protonated states are highlighted in red: HID represents histidine with hydrogen on the delta nitrogen, HIE on the epsilon nitrogen, and HIP on both nitrogens.

(C) Illustration of the structure preparation by the Dock Prep module in Chimera. This module can perform some complex operations, such as deleting solvent and isolated ions, keeping only the conformation with highest occupancy, repairing missing atoms, and adding hydrogen atoms.

The complex structure file preparation (A) Illustration of selecting and deleting the irrelative atoms from the complex by Chimera software. (B) Example of renaming amino acids according to their protonated states and adding a “TER” record. The positions of proton in different protonated states are highlighted in red: HID represents histidine with hydrogen on the delta nitrogen, HIE on the epsilon nitrogen, and HIP on both nitrogens. (C) Illustration of the structure preparation by the Dock Prep module in Chimera. This module can perform some complex operations, such as deleting solvent and isolated ions, keeping only the conformation with highest occupancy, repairing missing atoms, and adding hydrogen atoms. Text edit the processed structure file. Rename MSE residues as MET, and rename HIS as HID, HIE, or HIP based on the protonated states (Figure 2B). And add a “TER” record to indicate that the chains are not physically connected to each other. If there are alternative conformations of the residues, keep only the highest occupancy. These operations can be conducted by Chimera software (Figure 2C). Alternatively, we can obtain the complex structure of a protein with a hit ligand by molecular docking method (Meng et al., 2011). CRITICAL: It requires the complex structure file to strictly abide by a standard PDB format, and a description about PDB format file is accessed at http://www.wwpdb.org/documentation/file-format-content/format33/v3.3.html.

Key resources table

Materials and equipment

Equipment

Computer: a computational workstation installed with proper software listed below is required. List of hardware of the computational workstation: CPU: Intel Core i5-6400 3.3GHz 4-core processor Memory: Kingston KVR16N11/8 8GB DDR3-1600 GPU: GeForce GTX 1060 3GB Storage: Western Digital WD10EZEX 1TB

Software

Chimera (version 1.13 or later, https://www.cgl.ucsf.edu/chimera/) OpenBabel (version 2.3.2 or later, http://openbabel.org/wiki/Main_Page) Python (version 2.7, https://www.python.org/) Amber and AmberTools (version 16, https://ambermd.org/) Autogrow (http://autogrow.ucsd.edu/) R package (version 3.0.0 or later, https://www.r-project.org/) A molecular visualizing software for viewing the complex structures, such as VMD (https://www.ks.uiuc.edu/Research/vmd/) or PyMOL (https://pymol.org/2/).

Step-by-step method details

A fragment library construction

Timing: ∼3 h Create the three-dimensional structures of all fragments by Chimera software (Figure 3). The fragments are used to link with the hit compound to generate the hit analogs (Hoffer et al., 2018). All the hydrogen atoms should be deleted in the fragment molecules, except the hydrogen atoms located at the linking point.

Figure 3

Construction of stereoscopic conformation of a fragment molecule

The Build Structure module in Chimera provides several ways to manipulate a molecular structure. This snapshot shows the procedure for building a benzene molecule.

Construction of stereoscopic conformation of a fragment molecule The Build Structure module in Chimera provides several ways to manipulate a molecular structure. This snapshot shows the procedure for building a benzene molecule. Due to the fixed format of a PDB file, it is easy to do these by the scripts in batch mode. We can delete all the hydrogen atoms in a PDB file except the linking point of a hydrogen atom (serial number: id) by a line of bash command: awk '{if ($NF == "H" && $2 != id) {} else {print}}' fragment_H.pdb > fragment_noH.pdb where id is the serial number of a hydrogen atom as the linking point, fragment_H.pdb is the pdb file of a fragment molecule with all hydrogen atoms, fragment_noH.pdb is the fragment molecule with only a hydrogen atom. For a large number of fragments, we just need to use the loop function to deal with each individual repeatedly: for i in $(ls ∗.pdb) do fragname=$(basename ${i} .pdb) for j in $( awk '{if ($NF == "H") {print $2}}' ${i}) do awk '{if ($NF == "H" && $2 != “‘${j}’”) {} else {print}}' ${i} > ${fragname}_${j}.pdb done done We can save the above bash codes in a text file (for example: a file named del_H.sh) in the same directory as the fragment PDB files. And run the scripts by type the following in the command line: bash del_H.sh The processed PDB files of fragments are generated in the directory. Perform the conformational optimization on the fragment molecules by Chimera software. Save the structure files of those processed fragment molecules in PDB format. The fragment molecules can be download from public fragment databases (Irwin and Shoichet, 2005; Visini et al., 2017; Yang et al., 2018). The related website links are available in the key resources table.

Preparing input parameter data

Timing: ∼5 min Prepare an input parameter file, in which all the parameters for AILDE calculation are contained. Create a parameter file which contains all the necessary parameters for AILDE protocol (Figure 4). Mandatory parameters are “step,” “complex,” “ligand_name,” “ligand_charge,” “parallel,” “MDstart,” “MDend,” “mm_pbsa,” “md_request,” and “entropy.” Each parameter has a default value.

Figure 4

Preparing the input parameter file

(A) The snapshot of webpage for downloading the example parameter file at https://github.com/fwangccnu/AILDE.

(B) The content in the example parameter file.

Preparing the input parameter file (A) The snapshot of webpage for downloading the example parameter file at https://github.com/fwangccnu/AILDE. (B) The content in the example parameter file. An example parameter file is provided in the AILDE source code (Figure 4). A standard format must be followed to define the parameters. Among those parameters, the following three parameters should be adjusted according to actual situation: complex: define the structure file name of the protein complex with hit ligand. ligand_name: define the name of the hit ligand in the complex structure file. md_request: define whether structural relaxation is performed on the newly obtained hit analogs-protein complex by molecular dynamics (MD) simulation. Other parameters can be maintained as default value and the detail descriptions about these parameters are recorded in the template parameter file (Troubleshooting 1). CRITICAL: The parameters should be accordant with the actual situations and abide by the format in the example parameter file.

Molecular dynamics simulation of the protein-hit complex

Timing: ∼3 h In this stage, conformational sampling on the protein complex with hit ligand is conducted by molecular dynamics simulation (Maximova et al., 2016). The topology and coordinate file for the solvated protein-hit complex is generated by tleap module in Amber program. Protein was parameterized by Amber ff14SB force filed and the hit compound molecule by the general Amber force field (GAFF). AM1-BCC charge was assigned for the hit compound by antechamber module in Amber. The complex system was then solvated in the cube box of the TIP3P water model with a distance of 10 Å between the solute and box. The counter-ions (Cl− or Na+) were then added to neutralize the solvated protein-hit system. Three cycles of energy minimization are performed on the system by sander module in Amber program. Each cycle includes 5,000 steps of steepest descent, followed by 5,000 steps of conjugated gradient minimization with a final convergence criterion of 0.2 kcal∙mol−1. The following shows a brief description about three cycles of energy minimization: Cycle 1: the heavy atoms in protein and hit compound are restrained with an elastic constant of 50 kcal∙mol−1∙Å−2. The water molecules, ions, and hydrogen atoms are allowed to move for relaxation. Cycle 2: protein side-chains are also allowed to relax in this step. Only the heavy atoms in the backbone of the protein are restrained with an elastic constant of 50 kcal∙mol−1∙Å−2. Cycle 3: the whole system is minimized. Molecular dynamics simulation is then performed on the protein-hit complex with the periodic boundary condition in the NPT ensemble. The detail steps are described as the following: The solvated system is heated gradually from 0 to 298 K in the NVT ensemble during a period of 500 ps. The heated system is equilibrated in the NPT ensemble during a period of 1 ns. Production simulation at the temperature of 298 K and the pressure of 1 atm. The SHAKE algorithm is used to constrain all covalent bonds involving hydrogens. The particle mesh Ewald (PME) algorithm is used to calculate long-range electrostatic interactions and the short-range van der Waals (vdW) interactions is handled with a cutoff distance of 10 Å. The integration of the equations of motion is conducted at a time step of 2 fs. Finally, 1 ns production simulation is performed and the structure snapshots are collected every 1 ps (Troubleshooting 2). The fluctuations of the protein and hit compound are measured by root-mean-square deviations (RMSD) against their initial structures. The stability of the MD simulation is reflected by their RMSD values, which reveal the conformational changes during the time period of MD simulation. A plateau RMSD value means that the protein-hit complex is in the equilibration state, and that the simulation moves toward the stability status.

Collecting the conformational ensemble of protein-hit complex

Timing: ∼10 min After the molecular dynamics simulation, we collected the representative conformations from the equilibrated simulation trajectories. Consideration of structural flexibility has an effect on the accuracy of binding free energy calculation. Predicting free energy changes using structural ensembles has better performance than that using static structures (Benedix et al., 2009). 50 snapshots are extracted from the last 1 ns simulation trajectory by the cpptraj module in Amber program.

Fragment growing to generate the hit analogs

Timing: ∼1 h In this stage, AILDE generates hit analogs by linking the fragment molecules with the hit ligand in the initial protein complex (Figure 5).

Figure 5

Lead compounds generation by fragment growing strategy

This illustrates the process of converting a hit compound to a lead compound by fragment linking with the former.

All fragment molecules in our fragment library are linked with the hit ligand by AutoGrow software. The generated compounds with the protein are saved as protein-analogs complexes. The complex structures are relaxed by energy minimization via one of the following two strategies: When the parameter “md_request” is set to “NO,” structural relaxation without molecular dynamics simulation will be performed. First, all the backbone atoms of protein and the ligand are restrained with an elastic constant of 50 kcal∙mol−1∙Å−2. Then the whole system is minimized. Each of the two minimization steps includes 5,000 cycles of steepest descent and 5,000 cycles of conjugated gradient optimization with a convergence criterion of 0.2 kcal∙mol−1. When the parameter “md_request” is set to “YES,” structural relaxation with molecular dynamics simulation will be performed. The above two cycles of energy minimizations is performed. Then a short period of MD simulation is performed to relax the side-chain atoms. All the structure snapshots of protein-hit complex are relaxed by the same strategy as mentioned above (Troubleshooting 3). In our previous study (Wu et al., 2020), structural relaxation with molecular dynamics simulation may provide more accurate binding free energy results, but at the cost of more computing resources. Lead compounds generation by fragment growing strategy This illustrates the process of converting a hit compound to a lead compound by fragment linking with the former.

Predicting free energy changes for binding affinities

Timing: ∼2 h In this stage, we calculate the binding affinity (ΔGbind) of protein-hit and protein-analogs complexes, then compare the differences of binding free energy change (ΔΔG) between protein-hit and protein-analogs complexes. The binding affinity ΔGbind can be decomposed into three main energy terms, including the molecular mechanical gas-phase interaction energy (ΔEMM), desolvation energy (ΔGsol) and the change of conformational entropy (–TΔS). ΔEMM and ΔGsol are calculated by the MMPBSA module in Amber program. –TΔS is calculated using our in-house scripts. The entropy contribution is temperature-dependent and can be divided to two contributions: solvation entropy (ΔS) and conformational entropy (ΔS). The solvation entropy consists of the apolar and polar solvation entropy, which is typically given by an empirical formula that is proportional to the apolar and polar solvent accessible surface area differences. We use the molsurf and surf modules in Amber program to calculate the solvent accessible surface area. The conformational entropy is related to the change of the number of rotatable bonds during the binding process. The change of the conformational entropy is proportional to the number of the lost rotatable bonds during the binding process. The conf module in Amber is used to calculate the number of the rotatable bonds. More details can be found in the previous work (Pan et al., 2008). The binding affinity ΔGbind for a complex is the average value of 50 structural snapshots. The binding affinity differences ΔΔG between the protein-hit and protein-analogs complexes are then obtained as ΔΔG = ΔGbind (Analogs) – ΔGbind (Hit).

Expected outcomes

Figure 6 illustrates the examples of the expected outcomes of this protocol.

Figure 6

Examples of the expected outcomes of AILDE method

(A) The calculated binding free energy of the lead compounds with a protein receptor.

(B) Heatmap showing the activities of the lead compounds obtained by fragment linking. The horizontal axis represents the fragment molecules, and the vertical axis represents the atomic number of the atoms connecting the fragments. The darker the red, the more active the compound is. The darker the blue, the less active the compound is.

Examples of the expected outcomes of AILDE method (A) The calculated binding free energy of the lead compounds with a protein receptor. (B) Heatmap showing the activities of the lead compounds obtained by fragment linking. The horizontal axis represents the fragment molecules, and the vertical axis represents the atomic number of the atoms connecting the fragments. The darker the red, the more active the compound is. The darker the blue, the less active the compound is. (C) The interaction modes of the lead compounds with protein receptor. We obtain a series of hit analogs from the initial hit molecule. The binding affinity of each analog compound interacting with the protein has been evaluated. Then the binding affinity changes between the protein-analogs and protein-hit complexes are calculated. A negative change of binding free energy indicates an increase in the binding affinity, while a positive change of binding free energy indicates a decrease in the binding affinity. Potential lead compounds are identified based on the rank of the changes of binding free energy. Figure 6A shows the results of the binding affinity changes, which contain the information about the analog compound structures, fragment molecules, substituent positions, and binding affinity changes. In addition, a heatmap is plotted to graphically display the relationship between the generated analog compounds and their binding affinity changes (Figure 6B). The horizontal axis represents fragment molecules, and the vertical axis represents hydrogen atoms where the fragment molecules are linked. The darker the red, the more active the compound is. The darker the blue, the less active the compound is. AILDE not only generates the analog compounds, also predict their binding poses in the protein. The binding modes of the compounds interacting with the protein are explored by analyze the complex structures (Figure 6C).

Limitations

AILDE is based on an assumption that there are no substantial changes in the binding modes between protein-hit and protein-analogs complexes. AILDE could not achieve a good performance in a complicated situation of activity cliffs. This is caused by the compounds with similar structures, but have large differences in binding affinity (Stumpfe and Bajorath, 2012). In this protocol, a small fragment library is applied to generate the potential lead compounds. It is suggested to utilize larger chemical space to search and predict more possible lead compounds. The molecular weight of a fragment molecule is recommend to be less than 90 g/mol for a good performance.

Troubleshooting

Problem 1

AILDE reminders that there are any files not found.

Potential solution

All the input files for AILDE are identified in the parameter file. We should prepare a suitable parameter file according to the example file provided in AILDE source code. We recommend changing the paths of the necessary data file in the example file.

Problem 2

The molecular dynamics simulation terminates abnormally. We recommend using the protein complex with only one ligand compound as the initial structure to perform the molecular dynamics simulation. In preparing the input structure file of the protein complex with a hit ligand, we could remove the unconcerned chemical compounds in the complex structure.

Problem 3

The structures of the hit analogs generated by fragment growing are weird. We recommend that the conformational optimization should be performed for the fragment molecules in the step of a fragment library construction. Chimera software could be used to conduct the optimization.

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Gefei Hao (gefei_hao@foxmail.com).

Materials availability

This study did not generate new unique reagents.

Data and code availability

The online web service of AILDE can be accessed at http://chemyang.ccnu.edu.cn/ccb/server/AILDE. The code of AILDE can be downloaded at https://github.com/fwangccnu/AILDE.

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Deposited data

Three-dimensional structure	RCSB Protein Data Bank (PDB)	PDB: 2P3B
ZINC	(Irwin and Shoichet, 2005)	http://zinc12.docking.org/browse/subsets/
FDB-17	(Visini et al., 2017)	https://gdb.unibe.ch/tools/
PADFrag	(Yang et al., 2018)	http://chemyang.ccnu.edu.cn/ccb/database/PADFrag/index.php/help/download

Software and algorithms

AILDE source code	This study	https://github.com/fwangccnu/AILDE

11 in total

1. Exploring activity cliffs in medicinal chemistry.

Authors: Dagmar Stumpfe; Jürgen Bajorath
Journal: J Med Chem Date: 2012-01-27 Impact factor: 7.446

2. ZINC--a free database of commercially available compounds for virtual screening.

Authors: John J Irwin; Brian K Shoichet
Journal: J Chem Inf Model Date: 2005 Jan-Feb Impact factor: 4.956

3. Modeling the catalysis of anti-cocaine catalytic antibody: competing reaction pathways and free energy barriers.

Authors: Yongmei Pan; Daquan Gao; Chang-Guo Zhan
Journal: J Am Chem Soc Date: 2008-03-15 Impact factor: 15.419

4. Predicting free energy changes using structural ensembles.

Authors: Alexander Benedix; Caroline M Becker; Bert L de Groot; Amedeo Caflisch; Rainer A Böckmann
Journal: Nat Methods Date: 2009-01 Impact factor: 28.547

5. Integrated Strategy for Lead Optimization Based on Fragment Growing: The Diversity-Oriented-Target-Focused-Synthesis Approach.

Authors: Laurent Hoffer; Yuliia V Voitovich; Brigitt Raux; Kendall Carrasco; Christophe Muller; Aleksey Y Fedorov; Carine Derviaux; Agnès Amouric; Stéphane Betzi; Dragos Horvath; Alexandre Varnek; Yves Collette; Sébastien Combes; Philippe Roche; Xavier Morelli
Journal: J Med Chem Date: 2018-06-22 Impact factor: 7.446