Application of the 4D fingerprint method with a robust scoring function for scaffold-hopping and drug repurposing strategies.

Adel Hamza¹, Jonathan M Wagner, Ning-Ning Wei, Stefan Kwiatkowski, Chang-Guo Zhan, David S Watt, Konstantin V Korotkov.

Abstract

Two factors contribute to the inefficiency associated with screening pharmaceutical library collections as a means of identifying new drugs: (1) the limited success of virtual screening (VS) methods in identifying new scaffolds and (2) the limited accuracy of computational methods in predicting off-target effects. We recently introduced a 3D shape-based similarity algorithm in the SABRE program, which encodes a consensus molecular shape pattern of a set of active ligands into a 4D fingerprint descriptor. Here, we report a mathematical model for shape similarity comparisons and ligand database filtering using this 4D fingerprint method and benchmark the scoring function HWK (Hamza-Wei-Korotkov) using the 81 targets of the DEKOIS database. Subsequently, we applied our combined 4D fingerprint and HWK scoring function VS approach to scaffold-hopping and drug repurposing using the National Cancer Institute (NCI) and Food and Drug Administration (FDA) databases, and we identified new inhibitors with different scaffolds of MycP1 protease from the mycobacterial ESX-1 secretion system. In experimental evaluation, nine compounds from the NCI database and three from the FDA database displayed IC50 values ranging from 70 to 100 μM against MycP1 and possessed high structural diversity, which provides departure points for further structure-activity relationship (SAR) optimization. In addition, this study demonstrates that the combination of our 4D fingerprint algorithm and the HWK scoring function may provide a means for identifying repurposed drugs for the treatment of infectious diseases and may be used in the drug-target profile strategy.

Year: 2014 PMID: 25229183 PMCID: PMC4210175 DOI: 10.1021/ci5003872

Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956

Introduction

Computational methodologies utilized for in silico high throughput screening (HTS) are a critical component of drug discovery approaches.[1−7] Within the available in silico HTS approaches, methodologies that combine ligand- and structure-based screening procedures find the widest application.[1,8] The challenge in any HTS virtual screening (VS) platform is to develop an algorithm that is sufficiently fast and robust to evaluate many compounds while maintaining sufficient accuracy to identify a subset of biologically active compounds (i.e., hits) that have diverse structural scaffolds (i.e., scaffold-hopping). We sought to employ in silico screening to evaluate the repurposing of current drugs for a new therapeutic target.[9−11] Drug-repurposing maximizes the potential value of each hit by screening well-known compounds that have minimal toxicity and/or few side-effects.[12−14] Comparative studies of well-established ligand- and docking-based approaches concluded that shape-based ligand screening yielded markedly better outcomes than protein docking schemes.[15−18]

A ligand-based computational method involved two essential elements: (1) an efficient similarity measure and (2) a reliable scoring method. The similarity measure varied among different methods and focused on three factors: pharmacophores, molecular shapes, and molecular fields. The molecular-shape approaches maximized the overlap of shapes and determined a similarity value based on the degree of shape overlap. Over the years, despite the investment made in developing scoring functions for molecular-shape approaches, none possessed both accuracy and general applicability. Every scoring function had its advantages as well as its limitations. Consequently, investigators turned to the consensus-scoring technique, which improved the probability of finding solutions by combining the scores from multiple scoring functions or using different reference molecules.[15,19−22]

We recently developed an efficient 3D shape-based similarity algorithm encoding the consensus molecular shape pattern of a set of active ligands into one descriptor, called the 4D fingerprint (Figure 1). The 4D fingerprint formalism was originally proposed by Hopfinger and co-workers, who developed the 4D quantitative structure–activity relationship (4D-QSAR) model.[23] The 4D-QSAR model estimates molecular similarity measures as a function of conformation, alignment, and atom type.[24] The resulting descriptor values were the occupancy measures for the atoms in the investigated set of bioactive molecules. While the similarity measures achieved excellent predictions for a variety of enzyme inhibitors,[25−27] the weakness of this approach lies in the occupancy measures for the atoms (or pharmacophoric groups), which may also be present in similar, “inactive” compounds.[28]

Figure 1

Ligand and structure shape-based VS approach using the 4D fingerprint. The resulting 4D fingerprint encoded in the 3D shape of the candidate ligand B is docked and ranked using the HWK scoring function. The application of the 4D fingerprint to the ligand B decreases the interaction (purple arrow) with the receptor.

The 4D fingerprint approach implemented in the Shape-Approach-Based Routines Enhanced (SABRE) program possessed a number of attractive advantages over other VS methods.[29,30] First, it depended explicitly on 3D shape, not on the underlying chemical structure, and thus it excelled in identifying novel chemical scaffolds based on a set of known active ligands (scaffold-hopping). The iterative 4D fingerprint approach was particularly robust for several reasons: (i) the 4D fingerprint descriptors were very sensitive to the details of the molecular shape of active ligands, reducing the need to use multiple conformers of multiple query structures; (ii) the method excelled by incorporating the spatial distributions of chemical features of similar inactive ligands during the optimization and screening procedures; (iii) the algorithm was fast and had the ability to scan a library of millions of compounds in a matter of hours. The method unified ligand- and structure-based 4D fingerprint VS approaches by docking the shape-filtered ligand structures into the receptor-binding cavity. Finally, running searches using this methodology was remarkably easy and required only that the end-user supply a query structure and runtime parameters to control the number of hits that were returned. Despite these advantages, the 4D fingerprint method, as previously reported, suffered from a weakness in the empirical HWZ scoring function[17] for ranking and selecting the active ligands from large databases. To remedy this deficiency, we modified the shape-based VS algorithm of the SABRE program and implemented a new, robust scoring function that accommodated the diversity of ligand scaffolds with an accuracy that exceeded our prior efforts.

Tuberculosis (TB) is a chronic and complex disease resulting from infection with the bacterium Mycobacterium tuberculosis. TB remains an important public health problem worldwide, with 8.6 million estimated cases and 1.3 million deaths attributed to the disease in 2012.[31] In order to combat the spread of TB, particularly resistant strains of M. tuberculosis, it is necessary to identify new molecular targets for TB drugs and to develop methods for screening ligands as potential drug candidates that are more efficient than those used in the past. Historically, high-throughput screening (HTS) approaches coupled with in vitro testing served to identify promising hits with anti-TB activity. While successful in some cases, the HTS approach frequently failed in the antibacterial drug discovery area due to the poor ADMET properties and insufficient or improper molecular diversity of the compounds screened.[32,33] The mycobacterial ESX secretion system, also referred to as the type VII secretion system, represents a promising new target for TB drug development.[34,35] The ESX secretion system is a specialized system unique to mycobacteria that secretes a large number of proteins necessary for M. tuberculosis virulence.[36−38] Each ESX secretion system includes a membrane-associated subtilisin-like protease, called the mycosin: MycP1–MycP5 (numbered according to gene cluster). MycP1 from the ESX-1 system hydrolyzes the ESX-associated protein B (EspB) during secretion,[39−41] and this processing affects virulence in a mouse model of TB infection.[42] The recent description of the molecular structure and substrate specificity of MycP1[43−45] prompted interest in MycP1 as a promising target for structure-based drug design.
Recently, we applied the combined ligand- and structure-based virtual screening procedure and 4D fingerprint algorithm to identify new inhibitors for MycP1 protease.[30] The study reported here extends our previous work and reports a rational approach for ranking the ligand databases and demonstrates the performance of the novel HWK scoring function using 81 targets from the DEKOIS database.[46] Validation of the efficiency of the VS method for scaffold-hopping and drug repurposing involved the application of this methodology for the identification of diverse inhibitor scaffolds against MycP1 protease and experimental testing of some of these scaffolds in an in vitro enzyme assay.

Methods

We have recently developed a ligand and structure shape-based VS algorithm implemented within the SABRE program.[29,30] Unlike other ligand-based shape overlapping methods,[47−49] our approach efficiently detected the key pharmacophore groups of the active ligands responsible for binding to the target. The main advance of our methodology resided in the consideration of “virtual” but similar inactive structures (decoys) during the consensus molecular-shape detection process (Figure 1). After similarity scoring, the selected structures were ranked according to their shape complementarity in the receptor-binding site. This report highlights the major steps of the algorithm and describes the approach used for the development of this scoring function.

Enhanced Molecular Shape-Density Model

The molecular shape density function φ(r) of a ligand is expressed in terms of the shape density functions of individual atoms and their overlap (eq 1), in which each atom i with coordinates R = (X, Y, Z) is described by a spherical Gaussian (eq 2),[50−53] where σ is the van der Waals radius of the atom i. The molecular volume V of the ligand is defined by eq 3,[54] the volume v of an atom i by eq 4, the intersection volume of atom pairs by eq 5, and the overlap volume of two molecules A and B by eq 6.

In the SABRE algorithm, the shape-density model is enhanced and defined as a linear combination of weighted atomic Gaussian functions.[18,29,30] Thus, the molecular shape-density is the sum of all individual weighted pharmacophore densities, and the molecular volume is defined by eq 7, where Vpharm,k = ∑∫dr ρ(r) is the partial volume of the pharmacophore group k and is defined as a linear combination of atomic Gaussian functions. The optimal coefficients {C} are determined by iteratively adjusting the coefficients of the set of known active ligands {A} in the presence of virtual decoy structures {B} (virtual decoys are inactive similar compounds that are not necessarily synthetically feasible or identified in the previous VS rounds) until they satisfy two criteria. This algorithm quickly builds a consensus molecular-shape pattern in which the optimal coefficients {C} define a 4D fingerprint of the entire set of active ligands that also effectively excludes structurally similar, but inactive, ligands (decoys) (Figure 1).[29,30]
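The atomic Gaussians described above make overlap volumes available in closed form, which is what keeps shape comparison fast. The following is a minimal sketch under stated assumptions, not the SABRE implementation: the amplitude p = 2√2 is one common parametrization of Gaussian shape models, the exponent is chosen so that a single atom integrates to its hard-sphere volume, only first-order pair terms are kept, and the per-atom weights are a hypothetical stand-in for the 4D fingerprint coefficients {C}. All function names and coordinates are illustrative.

```python
import math

P = 2.0 * math.sqrt(2.0)  # assumed Gaussian amplitude (not taken from the paper)
# Exponent prefactor chosen so that one atom integrates to (4/3)*pi*sigma**3.
KAPPA = math.pi * (3.0 * P / (4.0 * math.pi)) ** (2.0 / 3.0)


def alpha(sigma):
    """Gaussian exponent for an atom with van der Waals radius sigma."""
    return KAPPA / sigma ** 2


def atom_volume(sigma):
    """Integral of a single atomic Gaussian over all space."""
    return P * (math.pi / alpha(sigma)) ** 1.5


def pair_overlap(r_i, sigma_i, r_j, sigma_j):
    """Closed-form integral of the product of two atomic Gaussians."""
    a_i, a_j = alpha(sigma_i), alpha(sigma_j)
    d2 = sum((x - y) ** 2 for x, y in zip(r_i, r_j))
    decay = math.exp(-a_i * a_j * d2 / (a_i + a_j))
    return P * P * decay * (math.pi / (a_i + a_j)) ** 1.5


def overlap_volume(mol_a, mol_b):
    """First-order overlap V_AB summed over all atom pairs of A and B.

    Each atom is (xyz, sigma, weight). The weight is a hypothetical
    stand-in for a 4D fingerprint coefficient; use 1.0 for the plain model.
    """
    return sum(
        w_i * w_j * pair_overlap(r_i, s_i, r_j, s_j)
        for r_i, s_i, w_i in mol_a
        for r_j, s_j, w_j in mol_b
    )


CARBON = 1.70  # van der Waals radius in angstroms
mol_a = [((0.0, 0.0, 0.0), CARBON, 1.0), ((1.5, 0.0, 0.0), CARBON, 1.0)]
mol_b = [((0.2, 0.0, 0.0), CARBON, 1.0)]
v_ab = overlap_volume(mol_a, mol_b)
```

Lowering the weight of an atom shrinks its contribution to every overlap it takes part in, which is the mechanism by which a weighted model can de-emphasize features shared with decoys.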

Rational Approach for Developing a Robust and Efficient Scoring Function

Given a set of known active ligands {A} with volumes V{A} and a query structure A with volume VA, we perform the shape filtering and ranking of the candidate molecule B with volume VB. During the VS process, we observe two trends: (i) VB ≤ VA or (ii) VB ≥ VA. As a result, we can rank the structure B according to condition (i), condition (ii), or the combination of (i) and (ii) for two different ligands B and B′. Thus, we have the following possible outcomes:

(i) The volume of the candidate structure B is smaller than the volume of the query A: VB ≤ VA. The maximal overlap volume VAB for the two structures is restricted as VAB ≤ VB, which is rewritten as eq 8.

(ii) The volume of the structure B is larger than the volume of the query A: VB ≥ VA, and VAB ≤ VA is rewritten as eq 9.

(iii) We illustrate the combined case by considering two candidate structures with VB ≤ VA and VB′ ≥ VA. For clarity, we consider that they have the same overlap volume with the query structure and, consequently, VAB ≈ VAB′. Adding eqs 8 and 9, we obtain the Tanimoto scoring function.[55]

We rewrote eqs 8 and 9 as smooth Gaussian distributions and defined the scoring function HWK (Hamza–Wei–Korotkov), which converges to one for optimal similarity (eqs 11 and 12). The Tanimoto function and eqs 11 and 12 clearly reveal that the ranking of the candidate structure B is determined by several inhomogeneous criteria. First, for a fixed overlap volume, eq 11 gives the highest score to the ligand B with the smallest size, even if it possesses fewer similar chemical features than other ligands with larger volumes. Second, the VAB term takes into account the overlap volume of the full ligand instead of the volume of the key chemical features present in both the query A and the candidate ligand B. This results in a higher ranking for the largest ligand B, since the overlap volume varies with the full size of the ligand. Third, we recently demonstrated the weakness of the Tanimoto scoring function when used for filtering the 3D shape of the ligands and found that the Tanimoto function efficiently ranks only the ligands with a volume comparable to that of the query.[17]

These drawbacks can be overcome by taking into account specific atom-type information, such as the consensus molecular shape pattern or “4D fingerprint” of the set of known active ligands. According to the 4D fingerprint approach, the volume of the query A and the volumes of the set of active ligands {A} are defined by eq 7. The optimization of the coefficients {C} leads to the residual volume Vε ≪ VAPHARM, and the value of the VAPHARM term in eq 7 ranges across the interval [min V{A}PHARM; max V{A}PHARM]. In the following demonstration, we assume that the candidate B is an active ligand and similar to the query A in the set of active ligands {A}, since if B is dissimilar the overlap volume converges to a small value. Using the computed 4D fingerprint coefficients, the volume of B is defined by eq 7 and is equal to the sum of the weighted partial volumes of the key pharmacophore groups, and its size lies in the interval of the set of active ligand volumes V{A}*.

Three possible scenarios exist, and in each the optimal similarity is reached for VB*/VA* ≈ VA*/VB* → 1. If VB* ≤ VA*, then VB* ∈ [min(V{A}*); VA*]. If VB* ≥ VA*, then VB* ∈ [VA*; max(V{A}*)]. According to eqs 13 and 14 and the 4D fingerprint coefficients, the Tanimoto scoring function is written as T = VAB/(VB* + VA* – VAB). As a result, the candidate ligands with a volume slightly smaller or larger than the query volume are ranked equivalently when using the 4D fingerprint method with the Tanimoto equation. Finally, we observe that the weighting coefficients of the 4D fingerprint adjust and group the unknown “active” candidate structures with miscellaneous volume sizes and scaffolds into three classes relative to the query size. This confers the advantage of ranking these three types of shape more effectively using the HWK scoring function (HWK–, HWK+, HWKTanimoto).
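The size dependence discussed in this section is easy to see numerically. The sketch below evaluates the volume Tanimoto coefficient T = VAB/(VA + VB − VAB) alongside the two overlap fractions implied by the bounds VAB ≤ VB and VAB ≤ VA. The volumes are hypothetical, and the Gaussian-smoothed HWK equations themselves are not reproduced here.

```python
def tanimoto(v_a, v_b, v_ab):
    """Volume Tanimoto coefficient T = V_AB / (V_A + V_B - V_AB)."""
    return v_ab / (v_a + v_b - v_ab)


def overlap_fractions(v_a, v_b, v_ab):
    """Fraction of the candidate B and of the query A covered by the overlap."""
    return v_ab / v_b, v_ab / v_a


v_query = 100.0   # hypothetical query volume (cubic angstroms)
v_overlap = 80.0  # the same overlap is assumed for both candidates

for label, v_cand in (("smaller candidate", 80.0), ("larger candidate", 120.0)):
    t = tanimoto(v_query, v_cand, v_overlap)
    frac_b, frac_a = overlap_fractions(v_query, v_cand, v_overlap)
    print(f"{label}: T={t:.3f}  V_AB/V_B={frac_b:.3f}  V_AB/V_A={frac_a:.3f}")
```

With the overlap held fixed, the smaller candidate scores T = 0.800 and is fully covered (VAB/VB = 1), whereas the larger candidate drops to T ≈ 0.571, even though both cover the same 80% of the query. This is the bias toward small ligands that the text attributes to overlap-based scores.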

Shape-Fitting Procedure

The docking approach described in our previous work combined the performance and speed of the ligand-based 4D fingerprint method with the shape characteristics of the receptor binding site.[29] The current SABRE docking algorithm encodes both the 4D fingerprint and the novel HWK scoring function, and it generates alignments where patterns with similar binding character are oriented in a similar fashion in the binding site of the receptor (Figure 1). During the rigid docking process, the SABRE program takes into account only the pharmacophore groups present in VBPHARM (VB*) that interact in designated ways with key receptor atoms. Five important chemical features were assigned to an atom type: hydrogen bond donor, hydrogen bond acceptor, acidic center (negatively charged at physiological pH 7), basic center (positively charged at pH 7), and metal chelation.

The main novelty of the SABRE docking approach is that the pairwise interactions between the key pharmacophoric groups of the ligand (defined by the 4D fingerprint, Figure 1) and the receptor atoms are calculated using the Gaussian function G. The pairwise interaction of atoms i (ligand atoms) and j (receptor atoms) is defined by eq 15, where DEqtype is the standard distance between the heavy atoms i and j for each “type” of interaction (e.g., hydrogen bond or electrostatic interactions) and d is the distance between the two atom centers. The parameter ωtype is freely adjustable and controls the distribution of the Gaussian function. The parameter λtype controls the weight of the interaction type and depends on the 4D fingerprint coefficients. The pairwise interaction is attractive for λtype > 0 and repulsive for λtype < 0. The total of n pairwise interactions between the pharmacophoric groups of the ligand and the key receptor atoms is defined by the geometric mean GTOTAL(d) (eq 16). Thus, the combination of the total pairwise interactions GTOTAL(d) and eqs 10–12 takes into account both the 4D fingerprint and the key interacting pharmacophoric groups of the ligand and leads to improved enrichment of the VS process. The HWKDock scoring function of the SABRE docking method is summarized by three equations (eqs 17–19).

It is interesting to note that a ligand with a high similarity score (eqs 10–12) is reranked with a lower score if its chemical features are close to repulsive receptor atoms. Therefore, the scoring strategy developed in the docking method combines the fast and efficient ligand-shape-based 4D fingerprint VS with an extremely quick calculation of the interactions between the ligand pharmacophoric groups and the key receptor atoms. In addition, we observe in Figure 1 that the interaction (purple arrow) involving the pharmacophore groups present in both active and decoy structures becomes negligible.

Analysis of the HWKDock function (eqs 17–19) highlights a new strategy for improving scaffold-hopping and drug repurposing performance. During the VS campaign, the shape size of the query is fundamental and orients the choice of the scoring equations. Thus, if the shape of the query is small and does not completely fill the receptor binding cavity, HWKDock+ is appropriate for identifying structural hits with volumes comparable to or larger than the query volume. However, hits with volumes in the interval [min(V{A}*); VA*] of the set of active ligands (eq 14) are also ranked with a high HWKDock+ score. In contrast, if the shape of the query complements the receptor binding cavity, HWKDock– is better suited to identify hits with smaller volumes while keeping a high overlap volume with the pharmacophore groups of the query. Finally, HWKDockTanimoto is effective for identifying hits with a shape comparable to the query (i.e., somewhat smaller or larger structures that fit into the receptor binding cavity while retaining structural diversity). Consequently, three different classes of hits emerge based on the equations of the HWK function that selected them. In each list, the compounds are first ranked according to query similarity using the 4D fingerprint approach, and diversity is achieved by selecting compounds ranked highly using one of these scoring equations.

Evaluation of VS Efficiency and Robustness Using the Novel HWK Scoring Function

The DEKOIS (version 2.0) database of annotated active compounds and decoys was used to validate the HWK scoring function.[46] The DEKOIS database is a publicly available VS test database consisting of 81 targets. For our purposes, the ratio of the number of decoys to the number of active ligands was fixed at 30. We used the DEKOIS database instead of the 40 targets of the DUD database for our VS test in order to measure the robustness of the scoring function when screening a large number of targets. The effectiveness of the SABRE program was evaluated using the enrichment factor (EF) at a given percentage of the database screened,[56−58] one of the most commonly encountered measures for estimating the prediction accuracy of VS algorithms, and the area under the receiver operating characteristic (ROC) curve (AUC). To test the efficiency of the HWK scoring function defined by eqs 10–12, we screened each target 10 times using a different set of five randomly selected active ligands (as templates) and reported both the highest performance (ROC AUC value) and the enrichment factor EF at 1% for the 81 targets. Screening results using the empirical HWZ scoring function were reported for comparison.[17] For each screening test, the five template structures were first removed from the list of active ligands.
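Both metrics can be computed directly from a ranked list of active/decoy labels. The sketch below uses the standard definitions (EF as the hit rate in the top slice divided by the hit rate in the whole library, AUC as the fraction of active/decoy pairs ranked in the right order). The function names and the toy ranking are illustrative and are not taken from SABRE.

```python
def enrichment_factor(labels_ranked, fraction=0.01):
    """EF at a given fraction of the ranked database.

    labels_ranked: 1 for active, 0 for decoy, ordered from best to worst score.
    """
    n_total = len(labels_ranked)
    n_top = max(1, int(round(fraction * n_total)))
    hits_top = sum(labels_ranked[:n_top])
    hits_total = sum(labels_ranked)
    return (hits_top / n_top) / (hits_total / n_total)


def roc_auc(labels_ranked):
    """ROC AUC as the fraction of (active, decoy) pairs ranked correctly.

    Assumes no tied scores, so the ranking alone determines the AUC.
    """
    n_act = sum(labels_ranked)
    n_dec = len(labels_ranked) - n_act
    decoys_seen = 0
    misordered = 0
    for label in labels_ranked:
        if label:
            misordered += decoys_seen  # decoys ranked above this active
        else:
            decoys_seen += 1
    return 1.0 - misordered / (n_act * n_dec)


# Toy ranking: 10 actives and 300 decoys (ratio 30), one decoy at rank 3.
ranking = [1, 1, 0] + [1] * 8 + [0] * 299
ef_1pct = enrichment_factor(ranking, 0.01)
auc = roc_auc(ranking)
```

With 30 decoys per active, actives make up 1/31 of the library, so EF cannot exceed 31 as long as the top slice holds no more compounds than there are actives. This gives a scale for reading EF1% values on this benchmark.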

Identification of Potential Inhibitors of MycP1 Protease

The detailed VS procedure was described in our previous work[30] and is summarized in the Supporting Information (Supplementary Figure SI-1). Briefly, the 4D fingerprint algorithm defined in the 3D-shape-based similarity method of SABRE was used as the first filter of the National Cancer Institute (NCI) Open Database Compounds (Release 4, ∼265 000 structures) and of the Food and Drug Administration (FDA) database (1217 compounds) downloaded from the ZINC database.[59] It is important to note that the 4D fingerprint was generated using both the previously identified leads (active ligands) and inactive compounds (considered as decoys).[30] The multiple conformation states of each ligand in the database were generated using OMEGA (OpenEye Scientific Software).[60−62] Thereafter, we utilized the “docking option” of the SABRE program to place the filtered conformations of each ligand into the active site of MycP1 (PDB ID: 4HVL) and ranked them using the HWK scoring function.

Drug Repurposing Approach Using the 4D Fingerprint

According to eq 14, the candidate structure B is similar to the set of known active ligands {A} if VB* ∈ [min(V{A}*); max(V{A}*)] and VB* ≈ VAPHARM ≈ VAB, which results in HWK converging to 1. The volume VAPHARM is defined by the 4D fingerprint coefficients {C} that encode the chemical features of the consensus molecular shape pattern of known active ligands (eq 7) for the specific binding target. This suggests that the best fitted and most highly ranked ligands from the VS of the database have similar 4D fingerprint coefficients and thus should interact with the receptor of the known active ligands (concept of the drug-target profile). It is important to note that this approach considers only the 4D fingerprint and the fast-fitting method implemented in the SABRE program. To validate the effectiveness of SABRE for drug repurposing, we conducted a ligand- and structure-based VS procedure using the FDA database.

In Vitro Assay of MycP1 Inhibitors

Recombinant Mycobacterium thermoresistibile MycP1 was expressed and purified as reported previously.[43] A quenched, fluorescent peptide assay was used to measure the activity of MycP1 in the presence of inhibitors. MycP1 was used to digest 20 μM of the fluorescent substrate, AbzAVKAASLGK(Dnp)OH (GenScript Inc.). Potential MycP1 inhibitors identified by SABRE were diluted to a concentration of 150 μM, and assays were measured in 96-well format. Compounds that were considered hits showed less than 50% activity compared to controls (DMSO-buffer blank). The same in vitro assay was used to measure the inhibitory concentration 50% (IC50) of the most promising hits. For IC50 measurements, inhibitors were added at 0, 5, 10, 50, 100, 200, 350, and 500 μM concentrations. Initial rates of fluorescent peptide hydrolysis were measured then incorporated into dose–response curves using GraphPad Prism.
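The hit criterion in this assay reduces to a ratio of initial rates. The sketch below shows that arithmetic only: the function names and the example rates are hypothetical, and the dose–response fitting done in GraphPad Prism is not reproduced.

```python
def percent_inhibition(rate_with_compound, rate_control):
    """Percent inhibition from initial hydrolysis rates relative to the control."""
    return 100.0 * (1.0 - rate_with_compound / rate_control)


def is_hit(rate_with_compound, rate_control, max_residual_activity=50.0):
    """A compound counts as a hit if residual activity (in %) is below the threshold."""
    residual = 100.0 * rate_with_compound / rate_control
    return residual < max_residual_activity


# Hypothetical initial rates (arbitrary fluorescence units per minute).
control_rate = 200.0
inhibition = percent_inhibition(54.0, control_rate)  # about 73% inhibition
hit = is_hit(54.0, control_rate)                     # residual activity is 27%
```

Expressed this way, "less than 50% activity compared to controls" and "more than 50% inhibition" are the same threshold, which is how the screening criterion here relates to the % inhibition values reported for the tested compounds.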

Results and Discussion

Performance of the SABREHWK Scoring Function

We evaluated the accuracy of the HWK scoring function and compared the current values to those obtained with the empirical HWZ scoring function[17] using the multiconformational states of the decoys and active ligands of the 81 targets from the DEKOIS database. The merits of the scoring function became clear as it accurately ranked compounds with subtle structural changes. In the present VS trial, we selected query molecules according to the procedures presented in Kirchmair et al.[63] and previously used to describe the performance of the algorithms.[29,49] As shown in Figure 2, we evaluated the AUC for each target of the DEKOIS database and the average AUC using the SABREHWK and SABREHWZ scoring functions. The average AUC values of the best performing query for the 81 DEKOIS targets using the HWK and HWZ scoring functions were 0.875 ± 0.054 and 0.851 ± 0.054, respectively. The two scoring functions had similar overall performance for some of the 81 targets based on the AUC metric; however, an analysis of the complete set of targets revealed that SABREHWK performed more consistently than SABREHWZ in terms of AUC, with an average AUC ≥ 0.9 for 26 targets and 0.9 > AUC ≥ 0.8 for 51 targets. Moreover, SABREHWK did not fail for any of the 81 targets screened. In comparison, SABREHWZ ranked the screening results with an AUC > 0.9 for only 17 of the 81 DEKOIS targets (11 targets out of 81 have AUC ≤ 0.8). The detailed results for each target are displayed in Table SI-1. One of the advantages of the SABREHWK approach is that the VS performance combining the 4D fingerprint and the novel HWK scoring function depended less on the screened targets, as already observed in our previous benchmark tests using the HWZ function with the 40 DUD targets.[17,18,29]

Figure 2

Comparison of the areas under the ROC curves (AUC) for the 81 DEKOIS targets using the SABREHWK and SABREHWZ scoring functions.

Analysis of the Enrichment Factor Using the SABREHWK Scoring Function

The efficiency of the SABREHWK scoring function was evaluated using the enrichment factor at 1% (EF1%), and the results were also compared to those using the SABREHWZ function (Figure 3 and Table SI-1). The average EF1% values for the 81 targets using the novel HWK and empirical HWZ score-based virtual screening were 21.8 ± 5.0 and 15.5 ± 5.9, respectively. The SABREHWK method performed more consistently, resulting in an EF1% below 10 for only two targets, whereas the SABREHWZ method provided enrichment factors below 10 for 24 targets. Thus, the enrichments achieved with SABREHWK are considerably better than those obtained with the empirical HWZ scoring function, indicating that the novel scoring function was more efficient in identifying hits with notably different scaffolds compared to the query structure. Therefore, on the basis of the AUC and EF values, these results indicated that the novel HWK score demonstrated an improved and robust VS performance, albeit with the caveat that we used only 81 targets in this study.

Figure 3

Comparison of the enrichment factor EF at 1% for the 81 DEKOIS targets using the SABREHWK and SABREHWZ scoring functions.

Identification of Novel Inhibitors of MycP1 Protease

The SABRE program was generally applicable for ranking any bioactive scaffold classes with the exception of inactive decoys. The recognition of a wide variety of structurally different ligand classes was an important goal of our virtual screening strategy. The MycP1 protease represented a challenge for both ligand- and structure-based virtual-screening approaches. Indeed, only the crystal structure of the apo form of the enzyme was available, and the protein active site is relatively large, which decreased the probability of successfully identifying and ranking the correct pose of the screened ligands. The structures of MycP1 inhibitors that we previously reported were available, and visual analysis of their putative poses in the active site revealed that the binding mode of compound 1 was reasonable.[30] These compounds have chemical features that enabled SAR (structure–activity relationship) studies and generated a novel 4D fingerprint. For the purpose of SAR, an intuitive strategy for scaffold-hopping used the hit compound 1 as query and the HWKDock+ scoring function to rank hits with larger volume sizes from the NCI database. We constructed such a ranking of the best 1000 structures (top-1000) according to the docking-score function. The pharmacophore model reduced the number of these structures to a small subset of promising MycP1 lead candidates. The 135 hits derived from the NCI database were superimposed with the binding query (compound 1) and visually inspected. Forty molecules were selected from the hits and tested in vitro for inhibitory activity against MycP1. Notably, 9 compounds out of the 40 were able to inhibit MycP1 by more than 50% when added at 150 μM (Table 1 and Figure 4), and one compound showed an IC50 less than 100 μM. Compound 2 inhibited MycP1 activity in the low micromolar range with an IC50 of 76.8 μM and does not include substructures described as Pan Assay Interference Compounds (PAINS).[64]

Table 1

Experimentally Determined Inhibitory Activity of the 13 Compounds Selected from the Virtual Screening

database	compound	name	% inhibition at 150 μM	IC₅₀ (μM)ᵃ	PAINS filterᶜ
NCI database	1 (query)	NSC-357905	73.0%	48.0ᵇ	pass
	2	NSC-67021	71.5%	76.8	pass
	3	NSC-270375	75.8%		pass
	4	NSC-67931	63.8%		pass
	5	NSC-356820	68.5%		fail
	6	NSC-206155	54.7%		pass
	7	NSC-614859	54.3%		pass
	8	NSC-207092	55.6%		fail
	9	NSC-111151	52.5%		fail
	10	NSC-641874	57.3%		pass
FDA database	11	Hydroxystilbamidine	79.1%	85.6	pass
	12	Diminazene	58.0%		fail
	13	Thiacetazone	80.1%		pass

ᵃ Only IC50 < 100 μM are reported.

ᵇ IC50 value according to ref (30).

ᶜ Pan Assay Interference Compounds (PAINS) remover, see ref (64).

Figure 4

Structural scaffold of MycP1 inhibitors identified during the VS of the NCI database.

The VS procedure identified these compounds based on their common chemical features (4D fingerprint) present in the subset of known active ligand structures and their fit in the binding pocket. The coefficients of the 4D fingerprint efficiently encoded the spatial distributions of pharmacophoric points providing the alignment of compounds relative to the binding site surfaces. Each point accounted for an important chemical feature such as hydrogen bond donors/acceptors and negatively/positively charged groups. The basic physicochemical features of the known MycP1 compounds included the potential to establish hydrogen bonds as donors with the Thr156, Ser202, Glu203, and Thr333 residues (Figure 5). Furthermore, analysis of the docking results revealed that the lead compound 2 fit well within the binding site cavity. Compound 2 formed hydrogen bonds with the Thr156, Thr333, Ser202, and Glu203 residues of MycP1. The results were in agreement with our previous docking studies pointing out Ser202 and Thr156 as key residues to stabilize the ligand scaffold in the MycP1 catalytic binding site.[30] A detailed analysis of the docking mode of the 11 compounds (Figure 4) revealed a close match between the pattern of hydrophobic and hydrogen bond donor pharmacophoric points of these hits and the pharmacophore model defined in our previous study.[30]

Figure 5

Stick view of the binding of compounds 2 (NSC-67021) and 11 (Hydroxystilbamidine) in the MycP1 active site.

As shown in Table 2, the VS procedure ranked the 9 lead compounds at different cut-offs among the initial 1000 docked structures. One lead compound was present in the top-30, corresponding to 11% coverage, and all 9 lead compounds were among the top-300 of the filtered NCI database. These results highlight the merits of our 4D fingerprint VS approach when combined with the novel HWK scoring function. We also compared this simple approach to a more complex approach that included other likely query conformations. We modeled three plausible binding poses of compound 1 (the query) with different conformations in the MycP1 cavity and redocked the top-1000 ligands using these three conformations, as shown in Table 2. The fusion approach markedly improved the percentage of retrieved lead compounds in the top-75 and further underscored the potential of the 4D fingerprint and HWKDock scoring VS procedure for identifying lead compounds using the structure of an unliganded receptor.
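The text does not state how the docking results from the three query poses were fused. As a minimal sketch only, the following assumes a MAX-fusion rule (each ligand keeps its best score over the three runs) and assumes that a higher score is better; the function names, ligand names, and score values are all hypothetical and are not taken from the SABRE program.

```python
from typing import Dict, List


def max_fusion(score_tables: List[Dict[str, float]]) -> Dict[str, float]:
    """Keep, for every ligand, its highest score across all docking runs.

    Assumption: higher score = better pose. A ligand missing from a run
    is simply ignored for that run.
    """
    fused: Dict[str, float] = {}
    for table in score_tables:
        for ligand, score in table.items():
            if ligand not in fused or score > fused[ligand]:
                fused[ligand] = score
    return fused


def rank_ligands(scores: Dict[str, float]) -> List[str]:
    """Sort ligand names from best to worst score (ties broken by name)."""
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical scores for three query poses; all values are invented.
pose_a = {"lig1": 0.62, "lig2": 0.40, "lig3": 0.55}
pose_b = {"lig1": 0.58, "lig2": 0.71, "lig3": 0.50}
pose_c = {"lig1": 0.60, "lig2": 0.45, "lig4": 0.66}

fused_scores = max_fusion([pose_a, pose_b, pose_c])
fused_ranking = rank_ligands(fused_scores)
print(fused_ranking)
```

Under this rule a ligand that docks poorly against one pose can still rank highly if another pose suits it, which is one plausible reason a fused list could recover more leads at a given cut-off.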

Table 2

Percent of Lead Compounds Recovered at Different Cutoffs of the Final Docked and Ranked Structures

cut-off (top structures)	number of leads (a)	% coverage of leads (b)
30	1	11
75	4	44
150	5	55
300	9	100

(a) Total lead structures = 9.

(b) % coverage of leads = (number of leads in the top/total lead) × 100.
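The coverage formula in the table footnote can be checked with a few lines of Python. The lead counts and the total of 9 are taken from Table 2; the function and variable names are illustrative only.

```python
def percent_coverage(leads_in_top: int, total_leads: int) -> float:
    """% coverage = (number of leads in the top / total leads) x 100."""
    if total_leads <= 0:
        raise ValueError("total_leads must be positive")
    return 100.0 * leads_in_top / total_leads


# Lead counts per cut-off, as listed in Table 2.
leads_at_cutoff = {30: 1, 75: 4, 150: 5, 300: 9}
TOTAL_LEADS = 9

coverage = {
    cutoff: percent_coverage(count, TOTAL_LEADS)
    for cutoff, count in leads_at_cutoff.items()
}

for cutoff, pct in coverage.items():
    print(f"top-{cutoff}: {pct:.1f}%")
```

The exact values are 11.1%, 44.4%, 55.6%, and 100.0%, so the whole-number percentages in the table appear to be truncated rather than rounded (5/9 is listed as 55).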

Assessment of the 4D Fingerprint and HWK Scoring Function for Drug Repurposing

Combining this newly developed computational method (the 4D fingerprint and the HWK scoring function) with in vitro enzyme inhibition studies proved a useful approach for evaluating drugs already on the market for another therapeutic purpose as potential agents for treating TB. To demonstrate the applicability of this integrated virtual and experimental screening to drug repurposing, we virtually screened the FDA-approved drug database of 1,217 compounds (corresponding to 3,358 structures including tautomers) using the 4D fingerprints previously generated during the NCI database screening. Hits were evaluated using the aforementioned MycP1 enzyme assay. To increase the structural diversity of the compounds identified by this process, we conducted the VS procedure three times, using compound 1, 8, or 9 as the query in each round. The choice of the structural query was critical to the success of this approach. As described in Methods, the scaffold diversity depended on the selected scoring equation (HWK–, HWK+, or HWKT (Tanimoto)), which in turn depended on the query size. Thus, compound 1, discovered in our previous work, was used as a query because it has the highest affinity for MycP1. Compounds 8 (larger volume than compound 1) and 9 (smaller volume than compound 1) were selected for their different volumes and their structural diversity relative to compound 1. The goal of our screen was to find hits with diverse structural scaffolds and volumes comparable to those of the queries. Thus, the resulting docked ligands of the FDA database were ranked using the HWKDockTanimoto scoring function (eq 19).
Because the screening of the NCI database and the benchmark test using the 4D fingerprint of the SABRE program demonstrated a high enrichment factor at 1% of the screened database, we visually inspected the binding modes of the best 30 structures (∼1%) identified within the FDA database. We focused in particular on four compounds, based on their high HWKDockTanimoto docking scores and their interactions with the key residues (Thr156 and Ser202) of the MycP1 binding cavity. These four compounds were chosen for in vitro inhibition assays, and three of the four exhibited more than 50% inhibition of MycP1 at 150 μM (Table 1), which validated the merits of our VS approach. Finally, the active compounds were filtered for Pan Assay Interference Compounds (PAINS) (Table 1); the results indicated that the hydroxystilbamidine scaffold may be used as a starting structure for further optimization. The three active FDA-approved drugs, compounds 11, 12, and 13, were ranked by SABRE near the top, at positions 2, 24, and 3 out of 1217, respectively. The high scores for compounds 11 and 13 were attributable to the HWKDockTanimoto scoring function, which took into account the ligand similarity as well as the optimal ligand/receptor pharmacophore model (eq 19). Of the three, compound 11 had the greatest effect, with an IC50 of 85.6 μM, whereas the two other leads had IC50 > 100 μM. In addition, we noted the low structural similarity between the three identified FDA compounds (Figure 6). As observed for the other leads, compound 11 formed hydrogen bonds with the Thr156, Glu203, and Thr333 residues of the MycP1 active site (Figure 5). Interestingly, compound 11 is typically used as a histochemical stain to study the distribution and localization of biomarkers,[65] and these results suggested that it or its analogs may be repurposed for inhibition of MycP1.
More importantly, these preliminary findings show that the SABRE algorithm with HWK scoring provides an efficient means of identifying new uses for current drugs and encourage us to pursue the application of this methodology to drug repurposing for other medically relevant drug targets.
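To put the reported rank positions in context, the short calculation below expresses each position as a percentage of the screened list. The positions (2, 24, 3), the library size (1217), and the inspected fraction (30 of 3358 structures) come from the text above; the helper name is illustrative only.

```python
def rank_percentile(position: int, library_size: int) -> float:
    """Rank position expressed as a percentage of the ranked list."""
    if not 1 <= position <= library_size:
        raise ValueError("position must lie within the ranked list")
    return 100.0 * position / library_size


LIBRARY_SIZE = 1217                        # FDA-approved compounds screened
reported_positions = {11: 2, 12: 24, 13: 3}  # compound number -> rank position

percentiles = {
    compound: rank_percentile(position, LIBRARY_SIZE)
    for compound, position in reported_positions.items()
}

# Fraction of the 3358 docked structures that was inspected visually.
inspected_fraction = 100.0 * 30 / 3358

for compound, pct in percentiles.items():
    print(f"compound {compound}: top {pct:.2f}% of the ranked list")
print(f"visually inspected: {inspected_fraction:.2f}% of all structures")
```

On these numbers, compounds 11 and 13 fall within the top 0.3% of the list and compound 12 within the top 2%.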

Figure 6

Structural scaffolds of MycP1 inhibitors identified during the VS of the FDA database. The percentage of inhibition and IC50 are displayed.

Assessment of Lead Scaffold Diversity

Published data suggested counting hits only when the chemotype of a molecule differs from the template chemotype and from any other chemotype already in the hit list.[66] This approach yields a chemotype enrichment that emphasizes the discovery of ligands with different chemotype properties. We assessed the novelty of the 12 confirmed hits by comparing their structural similarities using a “simple 2D descriptor”.[67] We computed the pairwise similarity index of our 13 compounds (query + 12 leads) using the Molecular ACCess System (MACCS) structural keys (166 bits) and represented the structural diversity as a heat map (Figure 7). The MACCS similarity indexes were calculated using Open Babel.[68] The map visualizes 13 × 13 = 169 pairwise comparisons, color-coded by similarity values ranging from red (low similarity) to dark blue (high similarity). We observed only two dark blue lobes, consistent with high similarity between the compounds (MACCS index > 0.8); most of the compounds were dissimilar. This result supported the increased structural diversity (MACCS index < 0.6) of the new lead compounds obtained using the combined 4D fingerprint and HWK scoring function. Considering the high degree of substructure encoded in each VS round, it was not surprising that the 4D fingerprint algorithm performed well at finding diverse chemotypes.
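As a minimal sketch of the pairwise comparison behind such a heat map, the code below computes a Tanimoto similarity matrix from fingerprints stored as sets of "on" bit positions. It assumes the similarity index is the Tanimoto coefficient and does not call Open Babel; the three toy fingerprints are invented for illustration and are not MACCS keys of the actual compounds.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either set."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_matrix(fps: Dict[str, FrozenSet[int]]) -> List[List[float]]:
    """Full n x n matrix (symmetric, with 1.0 on the diagonal)."""
    names = list(fps)
    return [[tanimoto(fps[r], fps[c]) for c in names] for r in names]


def similar_pairs(
    fps: Dict[str, FrozenSet[int]], threshold: float
) -> List[Tuple[str, str]]:
    """Distinct compound pairs whose similarity exceeds the threshold."""
    return [
        (x, y)
        for x, y in combinations(fps, 2)
        if tanimoto(fps[x], fps[y]) > threshold
    ]


# Toy fingerprints (hypothetical bit positions, not real MACCS keys).
toy_fps = {
    "cmpd_A": frozenset({1, 5, 9, 42, 100, 120}),
    "cmpd_B": frozenset({1, 5, 9, 42, 100, 120, 130}),
    "cmpd_C": frozenset({7, 8, 150}),
}

matrix = similarity_matrix(toy_fps)
close_pairs = similar_pairs(toy_fps, threshold=0.8)
print(close_pairs)
```

For n compounds the matrix has n × n entries but only n(n − 1)/2 distinct off-diagonal pairs, since the matrix is symmetric and every compound is identical to itself.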

Figure 7

Heat map of the MACCS similarity index for the 13 compounds (12 leads + query).

Conclusion

We report a rational method for the design of the novel scoring function HWK and validated its performance using a large number of targets from the DEKOIS database. The VS approach using the 4D fingerprint and the HWK scoring function provided high enrichment factors in detecting active compounds at an early stage for the 81 screened databases. We validated the efficiency of the combined 4D fingerprint and HWK scoring function in a scaffold-hopping strategy through the identification of nine novel lead compounds in a short hit list from the VS of the NCI database. The VS round ranked these compounds in the top-300 of the database, and one of them displayed an IC50 comparable to that of the reference structure. In the absence of new drugs for infectious diseases like TB, it made sense to develop a VS strategy capable of exploring databases of current drugs used to treat diseases other than infectious diseases and potentially repurposing some of them for TB treatment.[9−11] The merit of this approach lies in the obvious point that these commercially available drugs lack significant toxicity or side effects.[12−14] To test this notion, we screened the FDA database using our screening approach and identified three FDA-approved compounds as potential lead structures. One of these compounds displayed an IC50 of 85.6 μM against MycP1 protease. The distribution of pairwise structural similarities presented in the heat map revealed that the 13 compounds (the query and the 12 leads resulting from the VS of the NCI and FDA databases) were structurally diverse. In summary, this study represents a comprehensive quantification of a VS approach for scaffold-hopping and drug repurposing and provides a solid strategy for the discovery of new classes of MycP1 inhibitors.
