
Discovery of Novel Inhibitors of Bacterial DNA Gyrase Using a QSAR-Based Approach.

Ritu Jakhar¹, Alka Khichi¹, Dev Kumar¹, Mehak Dangi¹, Anil Kumar Chhillar².

Abstract

Type II topoisomerases such as DNA gyrase introduce negative supercoils into bacterial DNA in an ATP-dependent manner. DNA gyrase is essential in all bacteria but absent from eukaryotes, making it an attractive target for antibacterials. Ciprofloxacin is a clinically approved drug, but its clinical effectiveness is compromised by the emergence of resistance in both Gram-positive and Gram-negative bacteria. Thus, it is vital to identify novel compounds that can efficiently inhibit DNA gyrase, and quantitative structure-activity relationship (QSAR) modeling is a quick and economical means to do so. A QSAR-based virtual screening approach was applied to identify new gyrase inhibitors using an in-house-generated combinatorial library of 29828 compounds built from seven ciprofloxacin scaffold structures. The QSAR model was built using a data set of 271 compounds, which were labeled as positive (inhibitors) or negative (noninhibitors) from existing data reported in in vitro studies. The best QSAR model was a neural network developed in Orange and evaluated by 5-fold cross-validation; it was based on five PaDEL descriptors and had an accuracy and sensitivity of 83%. Screening the in-house-built combinatorial library with this model identified 675 compounds as potential inhibitors of DNA gyrase. These compounds were then docked with DNA gyrase using AutoDock to compare the binding mode and score of the screened compounds, and 615 compounds exhibited a docking score comparable to or lower than that of ciprofloxacin. Of these, the top five analogues, 902b, 9699f, 4419f, 5538f, and 898b, have binding scores of -13.81, -12.95, -12.52, -12.43, and -12.41 kcal/mol, respectively. MD simulations of these five analogues for 100 ns supported the stability of their interactions with Escherichia coli DNA gyrase. 
Ninety-one per cent of the analogues screened by the QSAR model displayed better binding energy than ciprofloxacin, demonstrating the efficacy of the generated model. The NN-QSAR model proposed in this manuscript can be downloaded from https://github.com/ritu225/NN-QSAR_model.git.


Year: 2022 PMID: 36120069 PMCID: PMC9476201 DOI: 10.1021/acsomega.2c04310

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

Antimicrobial resistance (AMR), caused by mutations in microbes, renders drugs less effective and has emerged as a major global public health crisis. Globally in 2019, 1.27 million deaths were directly attributed to bacterial drug resistance.[1] Recent estimates project that AMR could cost up to US$100 trillion in healthcare and that deaths may rise to 10 million annually by 2050.[2] Clinical limitations of antimicrobials such as hypersensitivity,[3,4] hepatotoxicity,[5] low serum levels,[6] nephrotoxicity,[7] multiple drug–drug interactions,[8] and high rates of clinical failure,[9] as well as the large number of deaths due to infectious diseases, highlight the vital need to develop novel drugs. DNA gyrase is an important antimicrobial drug target[10,11] that resolves DNA topological problems in bacteria.[12] It is a tetrameric complex consisting of two GyrA and two GyrB subunits that participate in the catalytic introduction of negative supercoils at the cost of ATP hydrolysis. To date, fluoroquinolones are the only antimicrobial agents that directly inhibit DNA replication, acting on DNA gyrase and topoisomerase IV. Ciprofloxacin is a second-generation fluoroquinolone that inhibits DNA gyrase and exhibits remarkable antibacterial activity.[13] It is a broad-spectrum antibacterial drug that is moderately active against Gram-positive and highly effective against Gram-negative bacteria. It has been approved by the Food and Drug Administration (FDA) for the treatment of sexually transmitted diseases such as chancroid and gonorrhea; respiratory tract, genitourinary, skin, bone, joint, and gastrointestinal infections; and anthrax, typhoid fever, prostatitis, plague, and salmonellosis.[14] The clinical effectiveness of ciprofloxacin is compromised by the emergence of resistance in Gram-positive and Gram-negative bacteria. 
In a study conducted between 2016 and 2017, 15.3% of Klebsiella pneumoniae and Escherichia coli samples overall showed resistance to ciprofloxacin.[15] Despite the approval of many fluoroquinolones, there has been a persistent effort to discover new quinolones to overcome the emergence of bacterial resistance.[16] Because the development of a pioneer drug is uncertain, as its therapeutic use has not been clinically validated,[17] analogues of ciprofloxacin were generated to remove a significant source of uncertainty from the overall risk. Many researchers have successfully developed quantitative structure–activity relationship (QSAR) models to predict potential inhibitors of bacterial DNA gyrase.[18−24] QSAR is a theoretical approach that correlates molecular structure with biological activity to help in designing and predicting hypothetical drug candidates.[23,24] It is one of the strongest ligand-based techniques for predicting the features crucial to the design of highly active congeneric compounds that can tackle the emergence of multidrug-resistant bacteria.[25−27] Abdel-Aziz et al.[18] generated a two-dimensional (2D)-QSAR model using the VLife Molecular Design Suite from 22 different bulky arenesulfonamido derivatives using norfloxacin and ciprofloxacin as scaffolds. They used multiple linear regression (MLR) to derive the 2D-QSAR equations, with the biological activity as the dependent variable and the descriptors as independent variables. The analysis of the 2D-QSAR models provided details on the relationship linking structure and activity and offered clues for structural modifications that could improve the activity. El-Gamal et al.[20] formulated QSAR models based on molecular descriptors computed with the Calculate Molecular Properties protocol for quinoline-3-carbonitrile derivatives. 
The model was generated to explore the structural requirements controlling the antimicrobial activities of the synthesized compounds against Bacillus subtilis. Almost all of the reported QSAR models for bacterial DNA gyrase are regression models. In this article, we have developed a novel QSAR model to predict putative inhibitors of DNA gyrase. Our approach combined various statistical methods to choose the most informative subset of molecular descriptors for the generation of the QSAR model. Additionally, our QSAR model is a classification model, which makes its predictions straightforward to interpret. The QSAR model was used to screen the combinatorial library; QSAR is one of the most potent virtual screening methods due to its high throughput and reasonable hit rate.[28] To develop the QSAR model, a large and diverse data set of compounds mainly inhibiting the Gram-negative DNA GyrA subunit was assembled through literature searches. Several molecular descriptors were then computed, and the relevant descriptors were retained through feature selection. Finally, a dual approach, in which the in-house-built combinatorial library was screened first by the NN-QSAR model and then by molecular docking, was used for precise prediction of novel DNA gyrase inhibitors.

Results and Discussion

Feature Selection

To develop a classification QSAR model, a data set of 271 compounds was prepared through literature searches; of these, 157 compounds were assigned to the positive data set and 114 to the negative data set on the basis of the IC50 values reported in in vitro studies (Table S1). The PaDEL, RDKit, and CDK tools generated a large number of molecular descriptors for the development of the QSAR classification model. However, the number of descriptors must be reduced to prevent overfitting and to improve the sensitivity and specificity of the developed model. This was achieved with six different feature selection methods from TANAGRA[29] and Orange,[30] which are open-source machine learning tools available online. Three methods from Orange (ANOVA, chi-square, and information gain ratio & Gini decrease) and three from TANAGRA (Fisher filtering, runs filtering, and StepDisc) were used for feature reduction. The reduced feature sets from these methods generated satisfactory models, with accuracy ranging between 63 and 83%. The best model statistics were observed for PaDEL descriptors reduced using the StepDisc method, and this descriptor set was selected for further study (Table 1).

Table 1

Fivefold Cross-Validation Statistics, Feature Selection Method, and Descriptor Type of All of the Models with AUC Greater than 0.85

model	descriptors	features	precision	accuracy	AUC
neural network	PaDEL	StepDisc	0.87	0.83	0.91
naive Bayes	PaDEL	StepDisc	0.87	0.8	0.88
naive Bayes	RDKit	StepDisc	0.85	0.76	0.83
SVM	PaDEL	StepDisc	0.83	0.8	0.87
random forest	PaDEL	StepDisc	0.83	0.79	0.87
logistic regression	PaDEL	StepDisc	0.82	0.79	0.89
logistic regression	RDKit	StepDisc	0.81	0.75	0.86
neural network	RDKit	StepDisc	0.8	0.77	0.86
random forest	RDKit	StepDisc	0.79	0.76	0.87

No relevant CDK descriptor was obtained by the runs filtering, StepDisc, and ANOVA selection methods. Since different methods were used to select relevant features, different feature sets were expected to emerge for model development. However, autocorrelation descriptors such as ATSC6c were picked by all methods except runs filtering, and AATSC6c was picked by most of the Orange selection methods (Table 2).

Table 2

List of Most Relevant Descriptors Retrieved after the Application of Different Feature Selection Methods

TANAGRA			Orange
runs filtering	StepDisc	Fisher filtering	information gain ratio & Gini decrease	ANOVA	Chi-square
RDKit Descriptors
Energy	NumAmideBonds	NumAmideBonds	NumSaturatedHeterocycles	MQN40	NumAmideBonds
LabuteASA	Slogp_VSA7	NumSaturatedHeterocycles	NumAmideBonds	MQN39	NumSaturatedHeterocycles
TPSA	MQN2	NumUnspecifiedSterocenters	MQN41	MQN38	NumSterocenters
AMW	Smr_VSA5	NumAliphaticHeterocycles	Slogp_VSA8	MQN34	MQN41
exactMW	MQN42	NumSaturatedRings	MQN35	MQN23	NumUnspecifiedSterocenters
PaDEL Descriptors
AlogP	ATSC6c	ATSC6c	ATSC6c	ATSC6c	ATSC6c
AlogP2	AATSC5e	AATSC6c	AATSC6c	AATSC6c	AATSC6c
AMR	AATSC4m	AATS8e	ATSC1c	ATSC5c	ATSC5c
Apol	ATSC7c	ATSC5c	ATS7s	ATSC6s	ATSC6s
naAromAtom	AATS8i	ATSC8s	ATSC8c	ATSC6e	ATSC6e
CDK Descriptors
	VP-2	GRAV-4	VC-6		Wlamb...unity
		ECCEN	SC-6		Wnu1.unity
		GRAV-1	SC-4		SC-6
		GRAVH-1	Wnu1.unity		MDEC-34
		MOMI-Z	Wlamb...unity		VC-6

The autocorrelation descriptors encode both the molecular structure and the physicochemical properties attributed to atoms as vectors. They capture the relative positions of atoms or atom properties by measuring the separation between atom pairs in terms of the number of bonds. The sums of all values calculated for a given small molecule are collected in a histogram.[31] Each atom pair contributes the product of the two atoms' property values, so the descriptor name indicates both the lag and the weighting property: ATSC6c is weighted by charge, AATSC5e by Sanderson electronegativity, AATSC4m by mass, ATSC7c (lag 7) by charge, and AATS8i (lag 8) by first ionization potential. Several models were also developed using a combination of attributes selected by different feature selection methods. However, the accuracy of almost all such models was below 60%, so this strategy was abandoned.
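
To make the lag-and-weight idea concrete, the following is a minimal sketch of a centered Broto-Moreau-style autocorrelation on a molecular graph. It assumes the usual textbook definition (sum over atom pairs at a given bond distance of the product of mean-centered property values, optionally averaged over the number of pairs); PaDEL's exact conventions, such as hydrogen handling, may differ. The function names and the four-atom toy chain with made-up property values are illustrative only.

```python
from collections import deque


def topological_distances(adjacency):
    """All-pairs shortest path lengths (in bonds) via breadth-first search."""
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


def centered_autocorrelation(adjacency, weights, lag, averaged=False):
    """Sum of products of mean-centered atom properties over pairs `lag` bonds apart."""
    mean_w = sum(weights) / len(weights)
    centered = [w - mean_w for w in weights]
    dist = topological_distances(adjacency)
    n = len(weights)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i][j] == lag]
    total = sum(centered[i] * centered[j] for i, j in pairs)
    if averaged:
        # Averaged variant: divide by the number of contributing atom pairs.
        return total / len(pairs) if pairs else 0.0
    return total


# Toy example (hypothetical): a linear chain of four atoms with invented property values.
chain = [[1], [0, 2], [1, 3], [2]]
props = [1.0, 2.0, 3.0, 4.0]
lag1 = centered_autocorrelation(chain, props, lag=1)
lag2_avg = centered_autocorrelation(chain, props, lag=2, averaged=True)
```

With charges as the property and a lag of 6, the same calculation would correspond in spirit to ATSC6c; with the averaged flag set it would correspond to the AATSC family.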

Model Performance

Six supervised learning classifiers (decision tree, SVM, random forest, neural network, naïve Bayes, and logistic regression) were used to develop QSAR models for each selected descriptor set. The models were tested for external predictivity with the Test and Score widget in Orange, with the data set randomly split into a 30% test set and a 70% training set. Model performance varied drastically among descriptors and classifiers: PaDEL descriptors selected by runs filtering performed worst for all of the classifiers, and the performance of CDK descriptors was average. In comparison, RDKit and PaDEL StepDisc descriptors produced the best-performing models. The RDKit and PaDEL StepDisc reduced descriptors were therefore assessed using a more rigorous 5-fold cross-validation step. The best model was generated by Orange's neural network (NN) module with PaDEL descriptors selected using TANAGRA's StepDisc method, the ReLU activation function, and the Adam optimizer. The confusion matrix and output statistics of the top model are given in Table 3 and Figure 1, respectively. The best model statistics are reasonably good, with a precision of 87%, specificity of 82%, and accuracy and sensitivity of 83%. The misclassification error of the NN model was also low (0.13 on training data and 0.17 for 5-fold cross-validation); the lower the misclassification error, the better the classification model. As observed in the confusion matrix (Table 3), there are no major variations in the TP, TN, FP, and FN values, suggesting that the developed NN model generalizes well and is not overfitted.

Table 3

Tabulations of the Output of the Neural Network (a) on the Training Set and (b) 5-Fold Cross-Validation

(a)
confusion matrix
	prediction
actual	positive	negative	sum
positive	64	13	77
negative	12	100	112
sum	76	113	189
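
As a worked check, the standard classification statistics can be recomputed from the training-set counts in panel (a). This is a minimal sketch that assumes, as the table labels indicate, that rows are actual classes and columns are predicted classes; the function name is illustrative and not part of Orange.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary classification statistics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_error": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }


# Counts from Table 3(a): actual positive row (64, 13), actual negative row (12, 100).
training = classification_metrics(tp=64, fn=13, fp=12, tn=100)
for name, value in training.items():
    print(f"{name}: {value:.3f}")
```

The resulting training-set misclassification error, 25/189 or about 0.13, agrees with the value quoted in the text.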

Figure 1

Bar graph representing the various evaluation parameters of the NN model for the training set, test set, and 5-fold cross-validation.

Y-Randomization Test and Cross-Correlation Matrix

The y-scrambling test shows that the performance of the NN-QSAR model deteriorated after randomization of the activity variable. The accuracy was 56% for the randomized data set, whereas it was more than 80% for the genuine model. Other parameters were also reduced after randomization; e.g., the area under the curve (AUC) was 47%, specificity was 31%, and precision was 60% (Figure S1). These results confirm that the NN-QSAR model was not the result of mere chance. Table 4 shows the correlation matrix between the descriptors selected by the StepDisc method. The table indicates that there is no strong cross-correlation between the descriptors of the top selected model, which supports their independence and the significance of the model.
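
The logic of the y-randomization test can be sketched as follows: a model is cross-validated once on the true labels and then repeatedly on shuffled labels, and the scores are compared. This is only an illustration under stated assumptions; a simple one-dimensional nearest-centroid classifier stands in for the neural network, and the data are an invented, well-separated toy set rather than the descriptors used in this study.

```python
import random


def nearest_centroid_cv_accuracy(xs, ys, k=5):
    """k-fold cross-validated accuracy of a 1-D nearest-centroid classifier."""
    n = len(xs)
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        centroids = {}
        for label in set(ys[i] for i in train_idx):
            members = [xs[i] for i in train_idx if ys[i] == label]
            centroids[label] = sum(members) / len(members)
        for i in test_idx:
            predicted = min(centroids, key=lambda c: abs(xs[i] - centroids[c]))
            correct += predicted == ys[i]
    return correct / n


def y_randomization(xs, ys, n_rounds=10, seed=0):
    """Cross-validated accuracies obtained after shuffling the activity labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        scores.append(nearest_centroid_cv_accuracy(xs, shuffled))
    return scores


# Invented toy data: two well-separated classes on a single descriptor axis.
xs = [0.1 * i for i in range(20)] + [5.0 + 0.1 * i for i in range(20)]
ys = [0] * 20 + [1] * 20

true_accuracy = nearest_centroid_cv_accuracy(xs, ys)
scrambled_scores = y_randomization(xs, ys)
mean_scrambled = sum(scrambled_scores) / len(scrambled_scores)
```

A genuine structure-activity relationship should give a true-label score clearly above the scrambled-label scores, which is the pattern reported above for the NN-QSAR model.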

Table 4

Cross-Correlation Data of the Selected Descriptors for the Selected NN-QSAR Model

	ATSC6c	AATSC5e	AATSC4m	ATSC7c	AATS8i
ATSC6c	1.000	–0.216	–0.366	–0.607	–0.437
AATSC5e		1.000	–0.313	+0.015	+0.260
AATSC4m			1.000	+0.277	+0.123
ATSC7c				1.000	+0.145
AATS8i					1.000
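
The claim of no strong cross-correlation can be checked directly against the values in Table 4. The sketch below transcribes the upper triangle and reports the strongest pair; the 0.8 cutoff used to flag collinear descriptors is an assumption chosen for illustration, not a threshold stated in this study.

```python
# Upper triangle of Table 4 (Pearson correlations between the five selected descriptors).
table4 = {
    ("ATSC6c", "AATSC5e"): -0.216,
    ("ATSC6c", "AATSC4m"): -0.366,
    ("ATSC6c", "ATSC7c"): -0.607,
    ("ATSC6c", "AATS8i"): -0.437,
    ("AATSC5e", "AATSC4m"): -0.313,
    ("AATSC5e", "ATSC7c"): 0.015,
    ("AATSC5e", "AATS8i"): 0.260,
    ("AATSC4m", "ATSC7c"): 0.277,
    ("AATSC4m", "AATS8i"): 0.123,
    ("ATSC7c", "AATS8i"): 0.145,
}


def strongest_pair(correlations):
    """Descriptor pair with the largest absolute correlation."""
    return max(correlations.items(), key=lambda item: abs(item[1]))


def pairs_above(correlations, threshold):
    """Descriptor pairs whose absolute correlation reaches the threshold."""
    return sorted(pair for pair, r in correlations.items() if abs(r) >= threshold)


pair, r = strongest_pair(table4)
collinear = pairs_above(table4, threshold=0.8)  # 0.8 is an illustrative cutoff
print(pair, r, collinear)
```

The strongest relationship is the moderate negative correlation between ATSC6c and ATSC7c, both of which are charge-weighted descriptors at adjacent lags.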

QSAR Model Predictions for an In-House-Built Combinatorial Library

Combinatorial chemistry assisted in the design of a virtual library of ciprofloxacin derivatives. The process involved designing ciprofloxacin scaffolds by substituting cyclopropyl, propyl, fluoroethyl, Me, t-Bu, CF3, and 2,4-difluorophenyl at the N-1 position.[32] The structures of the ciprofloxacin templates and the predefined substitution sites R1, R2, and R3 are shown in Figure 7. The substituents for R1 were chosen keeping in mind that aromatic amines or bulky groups are favorable,[33,34] whereas smaller groups were substituted at the R2 and R3 positions.[35−37] Using the virtual combinatorial library tool from cheminfo.org, 29828 analogues were designed to create the virtual library, and after ADMET profiling using the SwissADME tool, 2550 analogues were retained for further screening (Table 5). These compounds were then filtered through the NN-classification-based QSAR model; 675 analogues were predicted to show inhibitory activity against the DNA gyrase A subunit.

Figure 7

Scaffold structures used to generate the combinatorial library.

Table 5

Number of Analogues Designed from Seven Different Scaffold Structures and the Number of Analogues Retained after ADMET Profiling, NN-QSAR Screening, and Docking

scaffold	no. of analogues	no. of analogues after ADMET	positive predicted analogues	docking
a	4941	490	95	84
b	4404	791	65	57
c	4953	459	180	164
d	3696	188	100	93
e	3672	301	137	124
f	4415	178	59	58
g	3747	143	39	35
total	29828	2550	675	615
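
The screening funnel in Table 5 can be summarized as stage-to-stage retention fractions. The sketch below transcribes the per-scaffold counts from the table and recomputes the totals; the variable and function names are illustrative.

```python
# Per-scaffold counts from Table 5: (designed, after ADMET, QSAR positive, docking hits).
funnel = {
    "a": (4941, 490, 95, 84),
    "b": (4404, 791, 65, 57),
    "c": (4953, 459, 180, 164),
    "d": (3696, 188, 100, 93),
    "e": (3672, 301, 137, 124),
    "f": (4415, 178, 59, 58),
    "g": (3747, 143, 39, 35),
}
stages = ("designed", "after ADMET", "QSAR positive", "docking hits")

# Column totals across the seven scaffolds.
totals = tuple(sum(row[i] for row in funnel.values()) for i in range(len(stages)))


def retention(counts):
    """Fraction of compounds carried from each stage to the next."""
    return [counts[i + 1] / counts[i] for i in range(len(counts) - 1)]


fractions = retention(totals)
for (start, end), fraction in zip(zip(stages, stages[1:]), fractions):
    print(f"{start} -> {end}: {fraction:.1%}")
```

The last fraction, 615/675 or about 91%, is the figure quoted in the abstract for QSAR-selected analogues that docked at least as well as ciprofloxacin.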

Validation of the Docking Protocol

The cocrystallized ligand JHN was redocked to the E. coli DNA gyrase to validate the docking protocol. The redocked pose of JHN had a root-mean-square deviation (RMSD) of 1.42 Å from the crystallographic pose (Figure 2).
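
For reference, the RMSD between a crystallographic pose and a redocked pose can be computed as in the sketch below. It assumes both poses list the same atoms in the same order and share the receptor coordinate frame, so no superposition is applied. The coordinates are hypothetical and are not those of JHN.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered sets of 3D coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical three-atom poses (in angstroms); each atom is displaced by exactly 1 A.
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
redocked_pose = [(0.0, 0.0, 1.0), (1.5, 1.0, 0.0), (2.5, 1.5, 0.0)]
deviation = rmsd(crystal_pose, redocked_pose)
```

A redocking RMSD below roughly 2 Å is a commonly used rule of thumb for an acceptable docking protocol, and the reported 1.42 Å falls within it.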

Figure 2

Structure depicting the molecular docking of E. coli DNA gyrase nucleoprotein to the JHN ligand. (a) Chains A and C of the DNA binding and cleavage domain (PDB id: 6RKU) in complex with the crystallized ligand (yellow) and its redocked pose (blue). (b) Zoom-in view of the active site labeled with interacting residues.

Molecular Docking Studies of the Combinatorial Library

Docking of ciprofloxacin and the compounds of the combinatorial library within the active site of the E. coli DNA gyrase (PDB code: 6RKU) was done using AutoDock version 4.2. The binding energies of ciprofloxacin and of the positively and negatively screened analogues with gyrase are depicted as scatter plots (Figures 3 and S2). Out of the positively predicted analogues, 615 analogues with a binding energy lower than that of the standard drug ciprofloxacin (−6.77 kcal/mol) were identified as potential lead compounds for gyrase inhibition. The binding energies, chemical structures, and binding interactions of ciprofloxacin and the top 5 hits are shown in Table 6 and Figure 4, respectively. The top five scoring analogues of each scaffold are also predicted as positive by the NN-QSAR model (Figure S1).

Figure 3

Scatter plot for the binding energy of ciprofloxacin analogues designed using seven different scaffold structures (a–g) and docked to E. coli DNA gyrase. The top 5 analogues on the basis of the binding affinity score are labeled.

Table 6

Chemical Structure, Binding Energy, and Interacting Residues of Ciprofloxacin and Its Top 5 Analogues Discovered on the Basis of Dual Screening through the NN-QSAR Model and Docking Studies

Note: The last letter (A–H) following the residue number in interacting residues represents the chain ID as per the PDB convention.

Figure 4

Three-dimensional (3D) orientations of ciprofloxacin (yellow) and its top-scoring analogues (green) are illustrated (i–vi), together with two-dimensional views of the binding interactions of ciprofloxacin and the top 5 hits with the E. coli DNA gyrase (PDB id: 6RKU). (A, i) Ciprofloxacin, (B, ii) 902b, (C, iii) 9699f, (D, iv) 4419f, (E, v) 5538f, and (F, vi) 898b. Hydrogen bonds are shown by the pink arrow, hydrophobic interactions by the green spline line, π–π interactions by green lines, π–cation interactions by the red line, and the salt bridge by the red-blue line.

The binding scores show that analogues 902b, 9699f, 4419f, 5538f, and 898b have a more favorable predicted affinity for gyrase than ciprofloxacin and all of the other library analogues. The binding score of 902b is about 2-fold that of ciprofloxacin (−6.77 kcal/mol), and the scores of the other four are about 1.8- to 1.9-fold that of the reference inhibitor. The binding interactions between these top five analogues and gyrase indicate that their inhibitory mechanism is similar to that of other fluoroquinolones like ciprofloxacin, which bind noncovalently at the DNA–enzyme interface in the cleavage-ligation active site.[38−42] Ciprofloxacin inhibits E. coli DNA gyrase by forming a bridge and H-bond with Asp82C; the complex is also stabilized by the base stacking interaction with DA17G.

In 902b, the amino groups at R1 and R3 form a H-bond and a salt bridge with Asp82C, respectively, while the amino group at R2 forms a H-bond with adenylate. The complex is further stabilized by a polar interaction with Ser83A and hydrophobic interactions with Met120A and Ala119A; in addition, the nitro group in the pyrazine ring forms a π–cation interaction with DT16G. Similarly, 9699f forms a H-bond with Asp82C through the thiol group substituted at R3. Its naphthalene rings form base stacking interactions with DA17G and DA17H of DNA, and DT16H of DNA forms a π–cation interaction with the amino group at R3, stabilizing the complex. 898b forms a H-bond and a π–cation interaction with adenine and thymine of DNA. Its amino group at R2 forms a H-bond and a salt bridge with Asp82A and Asp82C, and it also forms hydrophobic interactions with Val71C and Met120C. All of the complexes form H-bond and π–cation interactions with DNA, while only 9699f forms base stacking interactions.

4419f binds to the gyrase via a H-bond network between the amino groups at R1, R2, and R3 and Asp82C, DT16G, and Asp82A, respectively. The amino group at R2 forms a π–cation interaction, and the R3 amino group forms a salt bridge with Asp82A. The chlorobenzene at R3 forms hydrophobic interactions with Ala67A, Val70A, Gly71A, Ile74A, Ala67C, Val70C, Gly71C, Ile74C, Met120A, and Met120C. 5538f is situated in a hydrophobic pocket of gyrase created by Ala67A, Val70A, Gly71A, Ile74, Ile74C, Gly71C, Val70C, Ala67C, Met120A, and Met120C. Its amino groups at R1, R2, and R3 form H-bonds with Asp82A, DA17G, and Asp82C, respectively; the R3 amino group also forms a salt bridge with Asp82C and DG15G, and the R2 amino group interacts with DA17H via a π–cation interaction.
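
The fold comparisons quoted for the docking scores can be reproduced with simple arithmetic on the reported values. The sketch below divides each analogue's score by the ciprofloxacin score; the function name is illustrative.

```python
# Docking scores in kcal/mol as reported in this study.
CIPROFLOXACIN_SCORE = -6.77
analogue_scores = {
    "902b": -13.81,
    "9699f": -12.95,
    "4419f": -12.52,
    "5538f": -12.43,
    "898b": -12.41,
}


def score_ratio(score, reference):
    """Ratio of an analogue's docking score to the reference score."""
    return score / reference


ratios = {
    name: round(score_ratio(score, CIPROFLOXACIN_SCORE), 2)
    for name, score in analogue_scores.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the ciprofloxacin score")
```

Because docking scores are estimated binding free energies, a score ratio is a convenient ranking aid rather than a fold change in binding constant, which depends exponentially on the free energy.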

Molecular Dynamic Simulation Analysis

A detailed MD simulation study was performed to obtain insights into the interaction pattern and binding mode of DNA GyrA with ciprofloxacin and its top five screened analogues. It was carried out using Desmond software for a duration of 100 ns.[43] Since docking studies only provide a static view of interactions between the ligand and the protein, MD simulations were run to analyze the dynamic behavior of the interactions. The root-mean-square deviations (RMSDs) of the Cα atoms and the corresponding ligands during the production phase, relative to the starting structures, are shown in Figure 5. 9699f showed the largest fluctuations, peaking at 2.9 Å, followed by 5538f at 2.3 Å and 898b at 2.1 Å, whereas ciprofloxacin, 902b, and 4419f peaked at 1.1, 1.2, and 1.8 Å, respectively. The 898b analogue formed a stable complex with an RMSD rising to 1.1 Å until 56 ns; a slight fluctuation was observed between 56 and 57 ns, and after 57 ns, the RMSD settled at an average of 1.6 Å. Even though 9699f peaks at 2.3 Å, there are no major variations in its RMSD plot, and it maintained an average RMSD of 2 Å. The 5538f complex fluctuated more during the first 30 ns; once equilibrated, it remained consistent throughout the simulation.

Figure 5

Graphs as generated during molecular dynamic simulations: (A) RMSD-Cα atoms and (B) RMSD ligands.

As shown in Figure 6, during the MD simulation, ciprofloxacin and 902b formed contacts mainly with Asp82A, Asp82C, Ser83A, and Ser83C. 9699f has two amino groups at R3 that form H-bonds with Asp82A and Asp82C, and the carbonyl oxygen at R1 constantly maintained its interaction with Asp82C. Similarly, in 4419f, the amino groups at R1 and R3 interacted via H-bonds with Asp82A and Asp82C, while the residues Val70A, Ile74A, Met120A, Val70C, and Ile74C were involved in hydrophobic interactions. In the case of 5538f, in addition to the H-bonds, the carbonyl and amino groups at R1 formed water bridges with Asp82A, Asp82C, Gly81C, and Arg121A. In the 898b complex, the amino group at R2 and the hydroxyl group at R3 persistently maintained interactions with Asp82A and Ala117A. The other residues Ser83A, Asp87A, Ser116A, Ala117A, and Asp82C also maintained contacts consistently throughout the simulation. The simulation results suggest that the ligands can bind properly at the active site of the GyrA subunit. The MD simulation analysis also showed that Asp82A, Asp82C, Ser83A, and Ser83C, along with a few hydrophobic residues (Val70A, Ile70A, Ala119A, Ile74C, and Met120A), stabilize the protein–ligand complex. The simulation studies demonstrate that the residues Asp82A and Asp82C are important for H-bond interactions.

Figure 6

Desmond MD-calculated ligand–protein interactions at the binding site of GyrA of E. coli (6RKU). (A) Ciprofloxacin, (B) 902b, (C) 9699f, (D) 4419f, (E) 5538f, and (F) 898b.

Conclusions

DNA gyrase is a crucial enzyme, catalyzing the complex reactions of DNA supercoiling in prokaryotes, making it an important target for antibacterial drugs. To discover novel inhibitors for DNA gyrase, we generated a classification-based QSAR model. A data set assembled by literature searches was identified as inhibitors and noninhibitors of bacterial DNA gyrase based on the outcome of in vitro studies. This data set was used to develop a QSAR classification model. The classifiers were created using a broad range of molecular descriptors from PaDEL, CDK, and RDKit. Even though all of the reduced descriptor sets produced satisfactory models, the PaDEL descriptors consistently resulted in the best models. ATSC6c and AATSC6c are the most popular descriptors picked by almost every feature selection method. During QSAR model development, over 90 models were developed, but the NN model based on five PaDEL descriptors performed the best with the curated data set. The best model performed reasonably well with an accuracy of 83%, precision of 87%, sensitivity of 83%, and specificity of 82%. The QSAR model was used to screen an in-house-built library of 2550 analogues; 675 hit candidates were predicted with the potential to inhibit bacterial DNA gyrase. The molecular docking studies suggested 615 analogues with a better binding affinity than ciprofloxacin. In particular, 902b was the most potent gyrase inhibitor predicted (binding affinity = −13.81 kcal/mol). This study suggested novel inhibitors of bacterial DNA gyrase through QSAR screening; the docking studies suggested that the selected analogues could act as promising inhibitors of E. coli DNA gyrase. After the dual approach of screening using the NN-QSAR model and docking, the top five inhibitors of DNA gyrase reported are 902b, 9699f, 4419f, 5538f, and 898b. These analogues were subjected to MD simulations to observe their binding stability. 
With an average RMSD of 1.4 Å (902b, 0.7 Å; 9699f, 2 Å; 4419f, 1.5 Å; 5538f, 1.7 Å; and 898b, 1.1 Å), all of the protein–ligand complexes displayed promising binding stability, with 902b and 898b displaying the highest stability, their RMSD values remaining below 1.4 Å for the whole simulation. The binding energy and simulation studies suggest that, among the analyzed analogues, compound 902b can be considered the most promising candidate against DNA gyrase. The NN-QSAR model reported in this manuscript can be downloaded from GitHub at https://github.com/ritu225/NN-QSAR_model.git.

Methods

Generation of the DNA Gyrase Data Set

A data set of 271 compounds targeting the E. coli DNA GyrA subunit[44,45] was built from reported in vitro studies, comprising 157 positive and 114 negative compounds. Compounds with reported MIC/IC50 values comparable to or lower than that of ciprofloxacin were included in the positive data set, and compounds with higher MIC/IC50 values populated the negative data set. The compounds NXL101 and AZD9742, despite being good inhibitors of bacterial DNA gyrase, were included in the negative data set because of their associated cardiovascular safety risks.[46] Although the combinatorial library was generated from ciprofloxacin derivatives, the QSAR data set included derivatives of ciprofloxacin, norfloxacin, and novel bacterial type II topoisomerase inhibitors (NBTIs), because diverse training data can yield models that capture unique or complementary information.[47] The list of compounds and details of the data set are available in the Supporting Information (Table S1).
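
The labeling rule described above can be sketched as a small function. This is an illustration under stated assumptions: "comparable to or lower than" is simplified to a less-than-or-equal comparison, the safety override mirrors the treatment of NXL101 and AZD9742, and the numeric values in the example are hypothetical rather than taken from Table S1.

```python
def label_compound(potency_value, reference_value, safety_concern=False):
    """Assign a compound to the positive or negative set.

    potency_value and reference_value are MIC or IC50 values in the same units;
    lower values mean a more potent compound.
    """
    if safety_concern:
        # Potent inhibitors with known safety risks are still treated as negatives.
        return "negative"
    return "positive" if potency_value <= reference_value else "negative"


# Hypothetical values for illustration only (same arbitrary units).
reference = 1.0
examples = {
    "more potent analogue": label_compound(0.4, reference),
    "weaker analogue": label_compound(8.0, reference),
    "potent but unsafe": label_compound(0.2, reference, safety_concern=True),
}
```

In practice the comparison would need values measured under comparable assay conditions, which a fixed cutoff like this does not capture.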

Curation of Chemical Structures

The 2D structures of the compounds included in the data set curated to develop the QSAR model were downloaded as .SDF (Structure Data File) files from the chemical database PubChem (https://pubchem.ncbi.nlm.nih.gov/) by text searches. ChemDraw Ultra 12.0[48] was used to draw the compounds retrieved from literature searches. Structures of 271 chemical compounds in the .SDF file format were curated in a folder, which was then used to calculate molecular descriptors.

Calculation of Molecular Descriptors

Three different sets of descriptors were used to describe the molecular structures of the 271 compounds in the data set. A total of 863 1D, 2D, and 3D PaDEL descriptors (http://www.yapcwsoft.com/dd/PaDELdescriptor/) were calculated using a standalone Java application.[49] A set of 127 RDKit molecular descriptors was calculated in KNIME 4.4 (Konstanz Information Miner), a free and open-source data analytics tool for integrated data access and mining, statistics, visualization, and reporting.[50] A set of 268 Chemistry Development Kit (CDK) descriptors was calculated with the CDK GUI cheminformatics toolkit (http://www.rguha.net/code/java/cdkdesc.html), an open-source tool that computes chemical information from SDF and SMI input formats.[51] The descriptors were further processed with Orange (https://orangedatamining.com/),[29] an open-source machine learning and data visualization tool, and then with TANAGRA (https://eric.univ-lyon2.fr/~ricco/tanagra/en/tanagra.html),[30] a free data mining, statistical learning, and machine learning software.

Development of the QSAR Model

Feature Selection

Unnecessary features reduce a model’s overall accuracy, increase its complexity, weaken its ability to generalize, and bias it. Feature selection is therefore an essential step in building a QSAR model. In Orange, the Rank widget offers scoring methods such as information gain ratio, Gini, ANOVA, and chi-square for feature selection. Information gain (IG) measures how much the presence or absence of a descriptor contributes to the correct classification of a compound; a descriptor that is a good indicator for assigning compounds to the correct class receives a high value.[52,53] The Gini index quantifies a descriptor’s ability to distinguish between classes. In binary classification, it takes a maximum value of 0.5, and the smaller the Gini index, the more relevant the feature. Analysis of variance (ANOVA) is a statistical method for comparing means across groups; descriptors that are independent of the target class are removed. The chi-square method tests the relationship between a descriptor and the class; a descriptor that is independent of the class is discarded.

TANAGRA also provides tools for selecting the most significant descriptors. Three filtering methods were used in this study: Fisher filtering, runs filtering, and StepDisc (stepwise discriminant analysis). Fisher filtering is a univariate ranking method based on Fisher’s ANOVA; descriptors were selected at a p-value of 0.001. This method does not take redundancy among the input attributes into account. Runs filtering ranks the descriptors according to their importance, and descriptors were selected at the 0.001 level of significance. StepDisc was run with the forward search strategy at a 0.05 level of significance. In this approach, the most significant descriptor is added at each step, and the search stops when no relevant descriptor is left to add.

Data preparation is crucial for any QSAR analysis. Descriptor values may range widely, so normalization converts the descriptors to a similar scale, which improves the training stability and performance of the model. The Preprocess widget in Orange was used for this purpose, and features were normalized to μ = 0 and σ² = 1. The reduced sets of CDK, RDKit, and PaDEL descriptors prepared using Orange and TANAGRA are listed in Table .
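The scoring and normalization ideas described above can be illustrated with a minimal pure-Python sketch. It is not the Orange or TANAGRA implementation, whose exact scoring conventions may differ; the function names and the toy data are hypothetical, and the descriptor is assumed to be binarized (present or absent) for the information gain calculation.

```python
import math
from collections import Counter


def gini_index(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(present, labels):
    """Entropy reduction obtained by splitting on a present/absent descriptor."""
    n = len(labels)
    gain = entropy(labels)
    for flag in (True, False):
        subset = [y for f, y in zip(present, labels) if f == flag]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain


def standardize(values):
    """Rescale one descriptor column to mean 0 and (population) variance 1."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # constant descriptor: nothing to scale
    return [(v - mean) / std for v in values]


# Hypothetical toy data: 1 = active, 0 = inactive.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
informative = [True, True, True, True, False, False, False, False]
uninformative = [True, True, False, False, True, True, False, False]

ig_informative = information_gain(informative, labels)
ig_uninformative = information_gain(uninformative, labels)
balanced_gini = gini_index(labels)
scaled = standardize([2.0, 4.0, 6.0])
```

In this toy case the descriptor that tracks the class label receives the full one bit of information gain, the descriptor that is independent of the class receives none, and the balanced two-class label set gives the maximum binary Gini index of 0.5.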

Supervised Learning Algorithms

QSAR seeks a significant relationship between molecular descriptors and biological activity, and classification models were developed on this basis. Six classifiers were applied to the reduced and normalized descriptors using Orange: decision tree, support vector machine, random forest, neural network, naive Bayes, and logistic regression.

Train-Test Module and Cross-Validation

The external predictivity of the models was evaluated using the Test and Score widget in Orange. Two validation schemes were used: the test on test data method and 5-fold cross-validation. Cross-validation splits the data into five folds. The compounds in one fold are held out at a time, a model is built from the remaining folds, and the activity of the held-out compounds is predicted for testing. This process is repeated for all of the folds. For the test on test data method, the data was passed from the Data Sampler widget; 70% of the data was selected as the training set and the remaining 30% as the test set. Test data is not used in the construction of the models; it is used only to evaluate their performance.
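The two validation schemes can be sketched in a few lines of pure Python. This is an illustration of the index bookkeeping only, not the Orange widgets; the function names and the seed are hypothetical, and the split here is a plain random shuffle without class stratification.

```python
import random


def train_test_split(n_samples, train_fraction=0.7, seed=0):
    """Return (train, test) index lists for a single random hold-out split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists, holding out one fold at a time."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 271 compounds, as in the data set described above.
train, test = train_test_split(271)
fold_pairs = list(k_fold_indices(271))
```

For 271 compounds the hold-out split gives 190 training and 81 test compounds, and each compound appears in exactly one of the five held-out folds.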

Y-Randomization Test

A y-scrambling test was applied to examine the extent to which the classification models obtained were the result of mere chance. The activity labels of the compounds in the data set were shuffled, and models were redeveloped following the same procedure as used for the selected models. The performance of these models was evaluated using the same metrics as for the selected ones. If the selected model is not a chance result, its performance should be better than that of the models built with the shuffled data.
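The logic of the y-scrambling test can be sketched as follows. The classifier (a one-descriptor nearest-centroid rule), the leave-one-out evaluation, the toy data, and all names are assumptions made for illustration; the study itself used the Orange classifiers and the validation schemes described above.

```python
import random


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the class whose mean descriptor value is closest to x."""
    centroids = {}
    for label in set(train_y):
        members = [v for v, y in zip(train_x, train_y) if y == label]
        centroids[label] = sum(members) / len(members)
    # sorted() makes tie-breaking deterministic
    return min(sorted(centroids), key=lambda label: abs(x - centroids[label]))


def loo_accuracy(xs, ys):
    """Leave-one-out accuracy of the nearest-centroid rule."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


def y_randomization(xs, ys, n_rounds=50, seed=0):
    """Rebuild and score the model on shuffled activity labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = ys[:]
        rng.shuffle(shuffled)  # break the descriptor-activity link
        scores.append(loo_accuracy(xs, shuffled))
    return scores


# Hypothetical toy data: one descriptor that separates the two classes.
xs = [float(v) for v in range(20)]
ys = [0] * 10 + [1] * 10

true_score = loo_accuracy(xs, ys)
scrambled_scores = y_randomization(xs, ys)
mean_scrambled = sum(scrambled_scores) / len(scrambled_scores)
```

A model that captures a real descriptor-activity relationship should score clearly above the average of the scrambled runs, as `true_score` does here relative to `mean_scrambled`.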

Model Performance Metrics

To compare and assess model performance, accuracy, area under the curve (AUC), and precision were the main parameters used. Some other standard metrics were also used. In the metric definitions, TN, FN, TP, and FP are the counts of true negative, false negative, true positive, and false positive compounds, respectively. Accuracy is the number of correct predictions as a fraction of all predictions made, (TP + TN)/(TP + TN + FP + FN). Precision is the number of true positives divided by the total number of positive predictions, TP/(TP + FP). The values of the metrics were recorded for every model from the results of the test on test data exercise.
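As a worked illustration, the metrics can be computed directly from the four confusion-matrix counts. Accuracy and precision follow the definitions given above; sensitivity and specificity use their standard textbook definitions, which are assumed here rather than taken from the text. The counts in the example are hypothetical and are not results from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard binary classification metrics from confusion counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,     # correct predictions / all predictions
        "precision": tp / (tp + fp),       # true positives / predicted positives
        "sensitivity": tp / (tp + fn),     # true positives / actual positives
        "specificity": tn / (tn + fp),     # true negatives / actual negatives
    }


# Hypothetical counts for a test set of 100 compounds.
metrics = classification_metrics(tp=40, tn=43, fp=7, fn=10)
```

The sketch assumes every denominator is nonzero; a test set with no predicted positives, for example, would need an explicit guard for precision.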

Screening of the Analogue Library Using the QSAR Model

Building of the Analogue Library of Ciprofloxacin

A combinatorial library for the scaffold structures in Figure was generated with the virtual combinatorial library tool provided by cheminfo.org (http://www.cheminfo.org/). The combinations used for R1, R2, and R3 are listed in Table S2 (Supporting Information). The SwissADME Web tool was used to compute key physicochemical, pharmacokinetic, drug-likeness, and related parameters for the library compounds.
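Combinatorial enumeration of this kind amounts to taking the Cartesian product of the substituent lists for each scaffold. The sketch below shows the idea only; it is not the cheminfo.org tool, and the scaffold names and substituent lists are hypothetical placeholders rather than the entries of Table S2. It also assumes that every scaffold carries all three positions.

```python
from itertools import product


def enumerate_library(scaffolds, r_groups):
    """Yield one record per scaffold and per combination of substituents."""
    positions = sorted(r_groups)  # e.g. ["R1", "R2", "R3"]
    for scaffold in scaffolds:
        for combo in product(*(r_groups[p] for p in positions)):
            yield {"scaffold": scaffold, **dict(zip(positions, combo))}


# Hypothetical placeholders, not the substituents used in the study.
scaffolds = ["scaffold_a", "scaffold_b"]
r_groups = {
    "R1": ["H", "CH3", "F"],
    "R2": ["H", "Cl"],
    "R3": ["OH", "NH2"],
}

library = list(enumerate_library(scaffolds, r_groups))
```

The library size is the number of scaffolds multiplied by the sizes of the substituent lists, here 2 × 3 × 2 × 2 = 24 records.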

Molecular Docking

Molecular docking was performed to investigate whether the analogues predicted as positive by the in-house-built QSAR model bind E. coli DNA gyrase (PDB id 6RKU) with better affinity than the standard, ciprofloxacin. Polar hydrogen atoms and Kollman charges were assigned to 6RKU using AutoDockTools version 1.5.6, and the receptor was energy-minimized with Swiss-PdbViewer. A grid box with dimensions of 30.0, 25.0, and 60.0 Å was generated and centered on the receptor-binding site at x, y, and z coordinates of 158.22, 158.38, and 148.77, respectively. Molecular docking was performed with AutoDock embedded in Raccoon software.[54] For docking, the combinatorial library was prepared with LigPrep’s ligand preparation protocol,[55] which generated diverse tautomeric, ionization, and stereochemical variants of the analogues after energy minimization. Docked structures were visualized with Chimera and PyMOL.[56,57] The 2D schematic representation of the protein–ligand interaction was generated with the 2D sketcher panel in Maestro (Figure ).

Figure. Scaffold structures used to generate the combinatorial library.
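The search space implied by the grid box can be checked with a small calculation. The sketch below takes the center and dimensions reported above at face value as lengths in Å; this is an assumption, since AutoDock grid parameter files express box size as a number of grid points together with a spacing. The function names are hypothetical.

```python
def grid_box_bounds(center, size):
    """Return ((min, max), ...) along x, y, z for a box given center and size."""
    return tuple((c - s / 2.0, c + s / 2.0) for c, s in zip(center, size))


def inside_box(point, bounds):
    """True if a 3D point lies within the box on every axis."""
    return all(lo <= p <= hi for p, (lo, hi) in zip(point, bounds))


CENTER = (158.22, 158.38, 148.77)  # x, y, z from the text
SIZE = (30.0, 25.0, 60.0)          # box dimensions from the text, assumed in Å

bounds = grid_box_bounds(CENTER, SIZE)
```

Under this assumption the box spans roughly 143.22 to 173.22 along x, 145.88 to 170.88 along y, and 118.77 to 178.77 along z, and `inside_box` can be used to confirm that binding-site atoms of interest fall within it.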

Molecular Dynamic Simulations

The best-ranked conformations of the top five analogues and ciprofloxacin in complex with the 6RKU receptor were further assessed for conformational flexibility, dynamic behavior, and stability by MD simulation. The Desmond package from Schrödinger was used for the MD simulations, and the system was prepared with the “system builder” panel in Schrödinger.[43] The complexes were placed in an orthorhombic box sized 10 × 10 × 10 Å and solvated with TIP3P water molecules, and the OPLS3 force field was applied for MD calculations.[58,59] The system was neutralized with NaCl ions at the physiological monovalent ion concentration of 0.15 M. The system was equilibrated in the NPT ensemble at a constant pressure of 1.0325 bar and a temperature of 300 K, with a simulation time of 100 ns. The simulation quality analysis, simulation event analysis, and simulation interaction diagram tools of Desmond were used to analyze the trajectory files.
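To give a sense of scale for the 0.15 M salt concentration, the number of ion pairs needed for a given solvent volume follows from N = c × V × N_A. The sketch below is plain arithmetic, not the Desmond system builder; the solvent volume is a hypothetical value chosen for illustration, and ions added only to neutralize the net charge of the complex are not counted.

```python
AVOGADRO = 6.02214076e23           # particles per mole
LITERS_PER_CUBIC_ANGSTROM = 1e-27  # 1 Å^3 = 1e-30 m^3 = 1e-27 L


def ion_pairs_for_concentration(molar, volume_cubic_angstrom):
    """Number of salt ion pairs giving the requested molarity in the volume."""
    liters = volume_cubic_angstrom * LITERS_PER_CUBIC_ANGSTROM
    return round(molar * liters * AVOGADRO)


# Hypothetical solvent volume of 100 x 100 x 100 Å, not taken from the study.
n_pairs = ion_pairs_for_concentration(0.15, 100.0 ** 3)
```

For this hypothetical volume, about 90 Na+/Cl- pairs correspond to 0.15 M.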

