Literature DB >> 19255632

Pre-docking filter for protein and ligand 3D structures.

Alisa Wilantho1, Sissades Tongsima, Ekachai Jenwitheesuk.   

Abstract

Virtual drug screening using protein-ligand docking techniques is a time-consuming process, which requires high computational power for binding affinity calculation. There are millions of chemical compounds available for docking. Eliminating compounds that are unlikely to exhibit high binding affinity from the screening set should speed-up the virtual drug screening procedure. We performed docking of 6353 ligands against twenty-one protein X-ray crystal structures. The docked ligands were ranked according to their calculated binding affinities, from which the top five hundred and the bottom five hundred were selected. We found that the volume and number of rotatable bonds of the top five hundred docked ligands are similar to those found in the crystal structures and corresponded with the volume of the binding sites. In contrast, the bottom five hundred set contains ligands that are either too large to enter the binding site, or too small to bind with high specificity and affinity to the binding site. A pre-docking filter that takes into account shapes and volumes of the binding sites as well as ligand volumes and flexibilities can filter out low binding affinity ligands from the screening sets. Thus, the virtual drug screening procedure speed is increased.

Entities:  

Keywords:  binding affinity; docking filter; ligand volume; protein binding site; virtual drug screening

Year:  2008        PMID: 19255632      PMCID: PMC2646187          DOI: 10.6026/97320630003189

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Virtual screening techniques are becoming increasingly more important in drug discovery. A popular method for virtual screening is molecular docking [1,2], which selects small-molecule structures from databases such as ChemBank [3], ChemPDB [4], KEGG [5], and NCI [6] and docks them into the protein binding site [7]. These processes involve the prediction of binding energies and analysis of molecular binding modes, which are time consuming and computationally expensive. The two-dimensional (2D) fingerprint technique, a virtual screening method which measures the structural similarity of molecules has been developed to address the above problems [8]. The similarity search is based upon the “similar property principle”, which states that molecules that are structurally similar are likely to have similar properties [9]. This technique uses a ligand with known chemical properties, inhibitory activities, or binding modes for a target of interest as a reference for searching similar ligands in the database regardless of the shape and size of the protein binding site. The accuracy of this method depends on which similarity coefficient is used [1-12], and the Tanimoto coefficient is most popularly employed [13]. Based on the “lock-and-key” principle, we propose a novel pre-docking procedure that matches the sizes of the ligand with the protein binding site, and optimizes the grid-box size before docking. This simple procedure dramatically reduces the size of screening ligand sets, significantly reducing time and effort required for virtual drug screening.

Methodology

Preparation of ligand and protein structures for docking

Preparation of ligand structures

The ligands used in this study were downloaded from Ligand.Info Meta-Database [14]. Ligands set-1 consisted of 2344 structures from ChemBank and set-2 consisted of 4009 structures from ChemPDB. The downloaded ligands in the SDF format were first converted to the Protein Data Bank (PDB) format using Open Babel [15]. The Gasteiger charges and rotatable bonds were then assigned to the PDB ligands using AutoDockTool [16]. All rotatable bonds were allowed to move freely.

Preparation of protein structures

Twenty-one protein X-ray crystal structures from the Protein Data Bank [17] were downloaded. The proteins and their PDB structure identifiers (PDB ID) are given in Table 1 (supplementary material). Of the twenty-one protein structures, sixteen have co-crystallized ligands (X-ray ligand) in the binding site. The ligand contained in each protein structure was removed from the binding site and saved to a new file. The missing atoms in each protein structure were searched for and fixed using SwissPDB [18]. The Gasteiger charges and the solvation term were then added to the protein structure using the AutoDockTool.

Ligand SMILES string similarity search

The ligands extracted from the X-ray crystal structure obtained from the previous step were converted to the SMILES string format, and used as an input for similarity search against the ligands in the ChemBank and the ChemPDB sets in step of the Preparation of ligand structures using Tanimoto coefficient cutoffs of 0.5, 0.6, and 0.7, respectively.

Calculation of ligand molecular volume

The volumes of the ligands in the screening set were calculated using Mol_Volume version 1.0 [19]. The van der Waals radii value for each atom type was derived from the CHARMM 22 force field. The radius of the spherical probe (R_PROBE) was set to 2.0 Å, and the GRID_STEP was set to 0.5 Å. The volumes of the ligands extracted from the X-ray crystal structures were calculated using the same protocol. The extracted ligand name (X-ray ligand) and its calculated volumes are shown in Table 2 (supplementary material).

Calculation of protein binding site molecular volume

Protein binding site volumes were calculated using the CASTp server (http://sts-fw.bioengr.uic.edu/castp) [20]. The solvent probe radius used for volume calculation was 1.4 Å. CASTp identifies all surface pockets with the chosen volume values, and then displays them on the computer screen. Pockets calculated by CASTp that matched the pocket resolved by crystallography were selected, and the volume of that calculated pocket was taken as the volume of the protein binding site.

Protein-ligand docking

Grid-box generation

The grid parameter file of each protein was generated using AutoDockTool. A grid-box was generated that was large enough to cover the entire protein binding site and accommodate all ligands to move freely. The number of grid points in x, y, and z-axes were 60×60×60. The distance between two connecting grid points was 0.375 Å. The center of the ligand in the X-ray crystal structure was used as the center of the grid-box. For protein structures that do not have ligands in the binding site, the center of the binding site was estimated from the structure and taken as the center of the grid-box.

Ligand docking

AutoDock4 and a Lamarckian Genetic Algorithm (LGA) [21] were used for protein-fixed ligand-flexible docking calculations. Ten search attempts (ga_run parameter) were performed for each ligand. The maximum number of energy evaluations before the termination of LGA run was 2500000 and the maximum number of generations of the LGA run before termination was 27000. Other docking parameters were set to the software’s default values. After docking, the ligands were ranked according to their protein-ligand affinity (calculated inhibitory constant, Ki).

Discussion

Size and shape of protein binding site

In this study, we categorized protein binding sites according to their sizes and shapes. Protein binding sites were classified as small (less than 1200 Å3) or large (greater than or equals to 1200 Å3). Protein binding site shapes were classified as either simple or complex. The protein binding site classifications are shown in Table 1 (supplementary material). Sixteen protein structures had co-crystallized ligands bound in the binding site. The calculated volumes show that the majority of ligands are larger (305–5922 Å3) than the binding sites (1040–2690 Å3) in particular those in the small binding site group. However, the average volumes of the ligands (1684 Å3) and the binding sites (1638 Å3) are very similar. The protein binding site typically accommodates 50–70% of the ligand, with the remainder of the ligand occupying pockets adjacent to the binding site. For example, 50% of the GDP ligand (volume = 1460 Å3) was contained within the small binding pocket (volume = 594 Å3) of the “Filamenting temperature-sensitive mutant Z” protein (PDB ID: 1RQ7), while the rest of GDP ligand occupied pockets close to or floating over the binding site. This suggests that the optimal ligand size can potentially exceed the binding site volume.

X-ray ligand docking and ranking

To verify the docking procedure utilized in this work, we re-docked the original X-ray ligand back to its corresponding protein binding site. The X-ray ligands along with all other ligands in the screening set were ranked according to the calculated Ki. The X-ray ligands were ranked in the top ten percentiles and were also able to move back to the original positions with the root mean square deviations of less than 3 Å.

Docked ligand size and flexibility

The top 500 and the bottom 500 ligands ranked according to the Ki value for each protein were selected for further analysis. A scatter plot of the molecular volumes and the number of active torsion bonds for these ligands is shown in Figure 1. The top 500 ligands are clearly coincident with the sixteen X-ray ligands, which occupy volumes of 800-2800 Å3, whereas the bottom 500 ligands occupy volumes outside this range, with 95% much smaller (300–900 Å3). There does not appear to be any correlation between the number of active torsion bonds and calculated Ki; however, the majority of the top 500 ligands have twenty or fewer active torsion bonds. These data suggest that ligands with high binding affinity are constrained by their size (volume 800–2800 Å3) and flexibility (20 or fewer active torsion bonds). For untested ligands, these parameters could be useful to prioritize docking calculations, so that priority is given to ligands of optimal size and flexibility.
Figure 1

The volume and the number of active torsion bonds of the top 500 (red) and the bottom 500 docked ligands (blue) ranked according to the calculated inhibitory constant (Ki). Of the twenty-one protein X-ray crystal structures used in this study, sixteen structures had ligand bound in the binding site. The top 500 ligands generally had structural profiles in terms of volume and number of active torsion bond similar to those of the X-ray ligands (green) while the bottom 500 ligands were, on average, 800-900 Å3 smaller than the X-ray ligands.

Optimal size of the grid-box

In this study, a very large grid-box (10830 Å3, 22.125 Å on each side) was used because we wanted to ensure that the grid-box could cover the entire binding site, and that all ligands in the screening sets had enough space to enter and move freely in the grid-box. The volume of the grid-box was 10830 Å3 while the volume of the largest protein binding site was only 5921.8 Å3 (PDB ID: 1N8W). We hypothesized that using a very large grid-box would allow the binding of some ligands to extend beyond the actual binding site, with non-specific binding into adjacent pockets. We tested this hypothesis by generating minimal grid-boxes that perfectly encompassed the entire binding site for each protein. The box sizes and their dimensions are shown in Table 3 (supplementary material). The top 500 ligands were left on protein at the docked positions. Filtering of the top 500 docked ligands was performed to test how ligands occupy space beyond the protein binding site. Six thresholds of decreasing stringency, allowing progressively more of the ligand atoms to be outside the minimal grid–box were used (Table 4 in supplementary material). It is clear that on average, the top 500 ligands cannot fit entirely within the minimal grid-boxes extending outside them, since 10.8% of the ligands were rejected, even when a very relaxed 30% threshold was employed (Table 4 in supplementary material). Visual inspection of the docked structures revealed that the protein binding sites contain at least one opening space, and parts of the docked ligands were always outside of the minimal box on this side. The rejected ligands might be either too large or too long to fit entirely within the minimal box, or their chemical properties may not match perfectly well with the binding pocket so that parts of their structures bind preferentially with adjacent pockets Reducing the grid-box size would significantly reduce CPU time for docking calculation, an important consideration for drug-discovery when potentially millions of compounds are screened. However, it is clear from our data that this would also increase the false negative rate, leading to some high binding-affinity ligands to be missed. These false-negatives would likely include molecules with long linear shapes, or with branches which extend beyond the target binding site and bind to adjacent pockets, in particular on the opening space side (see above). We propose that the optimal grid-box size allows approximately two-thirds of ligand molecule to occupy the target binding site, with the remaining one third able to bind with adjacent pockets. Grid-boxes of this size provide the optimal balance between the number of screening ligands and the CPU time required for docking.

SMILES strings similarity of the docked ligands and the X-ray ligands

We further explored whether the top 500 and the bottom 500 ligands docked on each protein were chemically similar to the X-ray ligand extracted from the protein-ligand co-crystal structures using the SMILES strings similarity search. The results show that, in general, more of the top 500 ligands matched with the X-ray ligands than of the bottom 500 ligands (Table 5 in supplementary material). On the other hand, even at a Tanimoto coefficient of 0.5, only thirteen ligands in the top500 list matched with the X-ray ligand. This indicates that most of the potential hits were chemically dissimilar to the X-ray ligands, yet similar in size as discussed above. Conversely, at a Tanimoto coefficient of 0.5, six out of the bottom 500 ligands matched with the X-ray ligand, suggesting that these ligands although similar to the X-ray ligand have chemical properties that are unfavorable to interactions with the binding site.

Conclusion

There are millions of ligand structures currently available in public databases. Virtual screening of these ligands against a protein target using protein-ligand docking methods requires lengthy calculations on a high performance computer. This is a major obstacle that prevents several research groups in academia, especially those with limited computer resources to conduct research in this field. To overcome this problem, several computational techniques have been developed to reduce the calculation time. In this study, we show that ligand size may be used as an initial criterion for prioritizing ligands for docking. The ligands that are greatly different in size to that of the X-ray ligand, or to the binding site volume may be set to have low priority for screening or removed from the screening set, since these usually have low calculated binding affinity (Ki). On the other hand, ligands of optimum size within ±1000 Å3 of the X-ray ligand or binding site volume may be assigned higher priority since they tend to bind with higher calculated affinity. In addition to clustering ligands into groups, adjusting the grid-box size would also help limit the number of ligands to be screened. The optimum grid-box allows one-third of the ligand to lie outside of the target binding site. Our new screening procedure, which takes into account the ligand size, the binding site volume, and the grid-box size, is easy to perform and could significantly reduce time and effort required for virtual drug screening.
  18 in total

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

Review 2.  Ligand.Info small-molecule Meta-Database.

Authors:  Marcin von Grotthuss; Grzegorz Koczyk; Jakub Pas; Lucjan S Wyrwicz; Leszek Rychlewski
Journal:  Comb Chem High Throughput Screen       Date:  2004-12       Impact factor: 1.339

Review 3.  Similarity-based virtual screening using 2D fingerprints.

Authors:  Peter Willett
Journal:  Drug Discov Today       Date:  2006-10-20       Impact factor: 7.851

4.  Multifingerprint based similarity searches for targeted class compound selection.

Authors:  Thierry Kogej; Ola Engkvist; Niklas Blomberg; Sorel Muresan
Journal:  J Chem Inf Model       Date:  2006 May-Jun       Impact factor: 4.956

5.  New similarity-based algorithm and its application to classification of anticonvulsant compounds.

Authors:  Alan Talevi; Julián J Prieto; Luis E Bruno-Blanch; Eduardo A Castro
Journal:  J Enzyme Inhib Med Chem       Date:  2007-06       Impact factor: 5.051

6.  Molecular docking.

Authors:  Garrett M Morris; Marguerita Lim-Wilby
Journal:  Methods Mol Biol       Date:  2008

7.  Modeling, docking, and simulation of the major facilitator superfamily.

Authors:  John Holyoake; Victoria Caulfeild; Stephen A Baldwin; Mark S P Sansom
Journal:  Biophys J       Date:  2006-09-15       Impact factor: 4.033

8.  CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues.

Authors:  Joe Dundas; Zheng Ouyang; Jeffery Tseng; Andrew Binkowski; Yaron Turpaz; Jie Liang
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

9.  The Blue Obelisk-interoperability in chemical informatics.

Authors:  Rajarshi Guha; Michael T Howard; Geoffrey R Hutchison; Peter Murray-Rust; Henry Rzepa; Christoph Steinbeck; Jörg Wegner; Egon L Willighagen
Journal:  J Chem Inf Model       Date:  2006 May-Jun       Impact factor: 4.956

10.  ChemBank: a small-molecule screening and cheminformatics resource database.

Authors:  Kathleen Petri Seiler; Gregory A George; Mary Pat Happ; Nicole E Bodycombe; Hyman A Carrinski; Stephanie Norton; Steve Brudz; John P Sullivan; Jeremy Muhlich; Martin Serrano; Paul Ferraiolo; Nicola J Tolliday; Stuart L Schreiber; Paul A Clemons
Journal:  Nucleic Acids Res       Date:  2007-10-18       Impact factor: 16.971

View more
  1 in total

1.  Comparative evaluation of several docking tools for docking small molecule ligands to DC-SIGN.

Authors:  Gregor Jug; Marko Anderluh; Tihomir Tomašič
Journal:  J Mol Model       Date:  2015-06-04       Impact factor: 1.810

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.