Type II topoisomerases such as DNA gyrase introduce negative supercoils into bacterial DNA in an ATP-dependent manner. DNA gyrase is essential in all bacteria but absent from eukaryotes, making it an attractive target for antibacterials. Ciprofloxacin is a clinically approved gyrase inhibitor, but its clinical effectiveness is compromised by the emergence of resistance in both Gram-positive and Gram-negative bacteria. Thus, it is vital to identify novel compounds that can efficiently inhibit DNA gyrase, and quantitative structure-activity relationship (QSAR) modeling is a quick and economical means to do so. A QSAR-based virtual screening approach was applied to identify new gyrase inhibitors using an in-house-generated combinatorial library of 29828 compounds built from seven ciprofloxacin scaffold structures. The QSAR model was built using a data set of 271 compounds, which were classified as positive (inhibitors) or negative (noninhibitors) on the basis of data reported in in vitro studies. The best QSAR model, a neural network developed in Orange and assessed by 5-fold cross-validation, was based on five PaDEL descriptors and had an accuracy and sensitivity of 83%. Screening of the in-house-built combinatorial library with this model identified 675 compounds as potential inhibitors of DNA gyrase. These compounds were further docked to DNA gyrase using AutoDock to compare their binding modes and scores, and 615 compounds exhibited a docking score comparable to or lower than that of ciprofloxacin. Of these, the top five analogues, 902b, 9699f, 4419f, 5538f, and 898b, have binding scores of -13.81, -12.95, -12.52, -12.43, and -12.41 kcal/mol, respectively. MD simulations of these five analogues for 100 ns supported the stability of their interactions with Escherichia coli DNA gyrase.
Ninety-one percent of the analogues predicted as positive by the QSAR model displayed a better binding energy than ciprofloxacin, demonstrating the efficacy of the generated model. The NN-QSAR model proposed in this manuscript can be downloaded from https://github.com/ritu225/NN-QSAR_model.git.
Antimicrobial resistance (AMR), caused by mutations in microbes, renders drugs less effective and has emerged as a major global public health crisis. In 2019, a total of 1.27 million deaths worldwide were directly attributed to bacterial drug resistance.[1] Recent estimates suggest that AMR could cost up to US$100 trillion in healthcare and that deaths may rise to 10 million annually by 2050.[2] Clinical limitations of antimicrobials such as hypersensitivity,[3,4] hepatotoxicity,[5] low serum levels,[6] nephrotoxicity,[7] multiple drug–drug interactions,[8] and high rates of clinical failure,[9] together with the large number of deaths due to infectious diseases, highlight the vital need to develop novel drugs.

DNA gyrase is an important antimicrobial drug target[10,11] that resolves DNA topological problems in bacteria.[12] It is a tetrameric complex consisting of two GyrA and two GyrB subunits that catalyzes the introduction of negative supercoils at the cost of ATP hydrolysis. To date, fluoroquinolone antimicrobial agents are the sole direct inhibitors of DNA replication, acting by inhibiting DNA gyrase and topoisomerase IV.

Ciprofloxacin is a second-generation fluoroquinolone that inhibits DNA gyrase and exhibits remarkable antibacterial activity.[13] It is a broad-spectrum antibacterial drug that is moderately active against Gram-positive and highly effective against Gram-negative bacteria. It has been approved by the Food and Drug Administration (FDA) for the treatment of sexually transmitted diseases such as chancroid and gonorrhea; respiratory tract, genitourinary, skin, bone and joint, and gastrointestinal infections; anthrax; typhoid fever; prostatitis; plague; and salmonellosis.[14] The clinical effectiveness of ciprofloxacin is compromised by the emergence of resistance in Gram-positive and Gram-negative bacteria. In a study conducted between 2016 and 2017, 15.3% of Klebsiella pneumoniae and Escherichia coli samples overall showed resistance to ciprofloxacin.[15]

Despite the approval of many fluoroquinolones, there has been a persistent effort to discover new quinolones to overcome the emergence of bacterial resistance.[16] Because the development of a pioneer drug is uncertain, since its therapeutic use has not been clinically validated,[17] analogues of ciprofloxacin were generated to reduce this uncertainty and improve the overall chance of success.

Many researchers have successfully developed quantitative structure–activity relationship (QSAR) models to predict potential inhibitors of bacterial DNA gyrase.[18−24] QSAR is a theoretical approach that correlates molecular structure with biological activity to help in designing and predicting hypothetical drug candidates.[23,24] It is a powerful ligand-based technique for predicting the features crucial to the design of highly active congeneric compounds that can tackle the emergence of multidrug-resistant bacteria.[25−27] Abdel-Aziz et al.[18] generated a two-dimensional (2D)-QSAR model using the VLife Molecular Design Suite from 22 different bulky arenesulfonamido derivatives using norfloxacin and ciprofloxacin as scaffolds. They used multiple linear regression (MLR) to derive the 2D-QSAR equations, with the biological activity as the dependent variable and the descriptors as independent variables. The analysis of the 2D-QSAR models detailed the relationship linking structure and activity and offered clues for structural modifications that could improve the activity. El-Gamal et al.[20] formulated QSAR models based on molecular descriptors computed with the Calculate Molecular Properties protocol for quinoline-3-carbonitrile derivatives. The model was generated to explore the structural requirements controlling the antimicrobial activities of the synthesized compounds against Bacillus subtilis.

Almost all of the reported QSAR models for bacterial DNA gyrase are regression models. In this article, we have developed a novel QSAR model to predict putative inhibitors of DNA gyrase. Our approach combined various statistical methods to choose the most informative subset of molecular descriptors for the generation of the QSAR model. Additionally, our QSAR model is a classification model, which allows a straightforward interpretation of the predictions.

The QSAR model was implemented to screen the combinatorial library. QSAR-based screening is one of the most potent virtual screening methods due to its high throughput and reasonable hit rate.[28] To develop the QSAR model, a large and diverse data set of compounds mainly inhibiting the Gram-negative DNA GyrA subunit was identified through literature searches. Several molecular descriptors were then computed, and relevant descriptors were retained through feature selection. Finally, a dual approach, in which the in-house-built combinatorial library was screened first by the NN-QSAR model and then by molecular docking, was used for precise prediction of novel DNA gyrase inhibitors.
Results and Discussion
Feature Selection
To develop a classification QSAR model, a data set of 271 compounds was prepared through literature searches; of these, 157 compounds were included in the positive data set and 114 in the negative data set on the basis of the IC50 values reported in in vitro studies (Table S1). The PaDEL, RDKit, and CDK tools generated a large number of molecular descriptors that were used for the development of the QSAR classification model. However, the number of descriptors must be reduced to prevent overfitting of the data and to enhance the sensitivity and specificity of the developed model. This was achieved by applying six different feature selection methods using TANAGRA[29] and Orange,[30] which are open-source machine learning tools available online. Three methods from Orange (ANOVA, chi-square, and information gain ratio & Gini decrease) and three methods from TANAGRA (Fisher filtering, runs filtering, and StepDisc) were used for feature reduction. The reduced feature sets from these methods generated satisfactory models, with accuracies ranging between 63 and 83%. The best model statistics were observed for the PaDEL descriptors reduced using the StepDisc method, and this descriptor set was selected for further study (Table 1).
Table 1
Fivefold Cross-Validation Statistics, Feature Selection Method, and Descriptor Type of All of the Models with AUC Greater than 0.85
model                descriptors  feature selection  precision  accuracy  AUC
neural network       PaDEL        StepDisc           0.87       0.83      0.91
naive Bayes          PaDEL        StepDisc           0.87       0.80      0.88
naive Bayes          RDKit        StepDisc           0.85       0.76      0.83
SVM                  PaDEL        StepDisc           0.83       0.80      0.87
random forest        PaDEL        StepDisc           0.83       0.79      0.87
logistic regression  PaDEL        StepDisc           0.82       0.79      0.89
logistic regression  RDKit        StepDisc           0.81       0.75      0.86
neural network       RDKit        StepDisc           0.80       0.77      0.86
random forest        RDKit        StepDisc           0.79       0.76      0.87
No relevant CDK descriptor was obtained by the runs filtering, StepDisc, and ANOVA selection methods. Since different methods were used to select relevant features, different feature sets were expected for model development. However, autocorrelation descriptors such as ATSC6c were picked by all methods except runs filtering, and AATSC6c was picked by most of the Orange selection methods (Table 2).
Table 2
List of Most Relevant Descriptors Retrieved after the Application of Different Feature Selection Methods
TANAGRA                                                    Orange
runs filtering  StepDisc       Fisher filtering            information gain ratio & Gini decrease  ANOVA    Chi-square
RDKit Descriptors
Energy          NumAmideBonds  NumAmideBonds               NumSaturatedHeterocycles                MQN40    NumAmideBonds
LabuteASA       Slogp_VSA7     NumSaturatedHeterocycles    NumAmideBonds                           MQN39    NumSaturatedHeterocycles
TPSA            MQN2           NumUnspecifiedSterocenters  MQN41                                   MQN38    NumSterocenters
AMW             Smr_VSA5       NumAliphaticHeterocycles    Slogp_VSA8                              MQN34    MQN41
exactMW         MQN42          NumSaturatedRings           MQN35                                   MQN23    NumUnspecifiedSterocenters
PaDEL Descriptors
AlogP           ATSC6c         ATSC6c                      ATSC6c                                  ATSC6c   ATSC6c
AlogP2          AATSC5e        AATSC6c                     AATSC6c                                 AATSC6c  AATSC6c
AMR             AATSC4m        AATS8e                      ATSC1c                                  ATSC5c   ATSC5c
Apol            ATSC7c         ATSC5c                      ATS7s                                   ATSC6s   ATSC6s
naAromAtom      AATS8i         ATSC8s                      ATSC8c                                  ATSC6e   ATSC6e
CDK Descriptors
VP-2
GRAV-4
VC-6
Wlamb...unity
ECCEN
SC-6
Wnu1.unity
GRAV-1
SC-4
SC-6
GRAVH-1
Wnu1.unity
MDEC-34
MOMI-Z
Wlamb...unity
VC-6
The autocorrelation descriptors encode both molecular and physicochemical properties attributed to atoms as vectors. These descriptors capture the relative positions of atoms or atom properties by calculating the separation between atom pairs in terms of the number of bonds (the lag). The sums of all values calculated for a given small molecule are collected in a histogram.[31] Atom properties enter as a weighting coefficient, which is the product of the atom properties for each pair: ATSC6c is weighted by charge (lag 6), AATSC5e by Sanderson electronegativity (lag 5), AATSC4m by mass (lag 4), ATSC7c by charge (lag 7), and AATS8i by the first ionization potential (lag 8).

Several models were also developed using a combination of attributes selected by different feature selection methods. However, the accuracy of almost all such models was below 60%, so this strategy was abandoned.
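As a rough illustration of how a lag-based autocorrelation descriptor of the kind discussed above is assembled, the sketch below sums the product of centred atom properties over all atom pairs separated by exactly k bonds and also reports the pair-averaged value. It assumes the commonly used centred Moreau-Broto form (each atom property minus the molecular mean) and is not PaDEL's implementation; the four-atom chain, its property values, and the function names are hypothetical.

```python
from collections import deque


def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def centred_autocorrelation(adjacency, prop, lag):
    """Return (summed, pair-averaged) centred autocorrelation at the given lag."""
    mean = sum(prop.values()) / len(prop)
    total, n_pairs = 0.0, 0
    for a in adjacency:
        dist = bond_distances(adjacency, a)
        for b in adjacency:
            # Count each unordered pair once, and only pairs exactly `lag` bonds apart.
            if a < b and dist.get(b) == lag:
                total += (prop[a] - mean) * (prop[b] - mean)
                n_pairs += 1
    averaged = total / n_pairs if n_pairs else 0.0
    return total, averaged


# Hypothetical linear chain 0-1-2-3 with made-up atomic property values (e.g., charges).
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
charges = {0: -0.2, 1: 0.1, 2: 0.1, 3: 0.0}

atsc2, aatsc2 = centred_autocorrelation(chain, charges, lag=2)
print(f"lag-2 sum = {atsc2:.3f}, lag-2 average = {aatsc2:.3f}")
```

Swapping the property dictionary (charge, mass, electronegativity, ionization potential) and the lag is what distinguishes one descriptor of this family from another in this simplified picture.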
Model Performance
Six supervised learning classifiers (decision tree, SVM, random forest, neural network, naïve Bayes, and logistic regression) were used to develop QSAR models for each selected descriptor set. The models were tested for external predictivity with the Test and Score widget in Orange, in which the data set was randomly split into a 30% test set and a 70% training set. Model performance varied drastically among descriptors and models: the PaDEL descriptors selected by runs filtering performed worst for all of the classifiers, and the performance of the CDK descriptors was average. In comparison, the RDKit and PaDEL StepDisc descriptors yielded the best-performing models. The RDKit and PaDEL StepDisc reduced descriptors were therefore assessed using a more rigorous 5-fold cross-validation step.

The best model was generated by Orange’s neural network (NN) module with PaDEL descriptors selected using TANAGRA’s StepDisc method, the ReLU activation function, and the Adam optimizer. The confusion matrix and output statistics of the top model are given in Table 3 and Figure 1, respectively. The best model statistics are reasonably good, with a precision of 87%, specificity of 82%, and accuracy and sensitivity of 83%. The misclassification error of the NN model was also low (0.13 on training data and 0.17 for 5-fold cross-validation); the lower the misclassification error, the better the classification model. As observed in the confusion matrix (Table 3), there are no major variations in the TP, TN, FP, and FN values, suggesting that the developed NN model generalizes well and is not overfitted.
Table 3
Tabulations of the Output of the Neural Network (a) on the Training Set and (b) 5-Fold Cross-Validation
(a) confusion matrix
                     prediction
actual      positive  negative  sum
positive    64        13        77
negative    12        100       112
sum         76        113       189
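The statistics quoted for the model follow from confusion-matrix counts in the usual way. The sketch below applies the standard definitions to the training-set counts of Table 3(a), assuming that rows are the actual classes and columns the predicted classes; the function name is illustrative and is not part of Orange.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary-classification statistics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "misclassification": (fp + fn) / total,
    }


# Counts read from Table 3(a), assuming rows are actual classes and columns are predictions.
training = classification_metrics(tp=64, fn=13, fp=12, tn=100)

for name, value in training.items():
    print(f"{name}: {value:.2f}")
```

Under this reading, the training misclassification rate is 25/189, or about 0.13, which agrees with the training error quoted in the text; the cross-validation statistics would be obtained in the same way from the corresponding matrix (b).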
Figure 1
Bar graph representing
the various evaluation parameters of the
NN model for the training set, test set, and 5-fold cross-validation.
Y-Randomization Test and Cross-Correlation Matrix
It is evident from the y-scrambling test that the performance of the NN-QSAR model deteriorated after randomization of the activity variable. The accuracy was 56% for the randomized data set, whereas it was more than 80% for the genuine model. Other parameters were also reduced after randomization; e.g., the area under the curve (AUC) was 47%, specificity was 31%, and precision was 60% (Figure S1). These results confirm that the NN-QSAR model was not the result of mere chance.

Table 4 shows the correlation matrix between the descriptors selected by the StepDisc method. The table indicates that there is no strong cross-correlation between the descriptors of the top selected model (the largest absolute coefficient is 0.607, between ATSC6c and ATSC7c), which supports their independence and the significance of the model.
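The idea behind the y-scrambling test in this section can be sketched as follows: a model is scored on the true labels, the labels are then shuffled repeatedly, and the score on the shuffled labels is expected to fall toward chance. This is a minimal illustration only. It substitutes a one-descriptor nearest-class-mean classifier with leave-one-out scoring for the neural network, and the descriptor values, labels, number of rounds, and seed are all invented for the example.

```python
import random


def loo_accuracy(xs, ys):
    """Leave-one-out accuracy of a nearest-class-mean classifier on one descriptor."""
    hits = 0
    for i in range(len(xs)):
        train = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j != i]
        means = {}
        for label in (0, 1):
            values = [x for x, y in train if y == label]
            if values:
                means[label] = sum(values) / len(values)
        # Predict the class whose training mean is closest to the held-out value.
        predicted = min(means, key=lambda label: abs(xs[i] - means[label]))
        hits += predicted == ys[i]
    return hits / len(xs)


def y_randomization(xs, ys, rounds=50, seed=0):
    """Return (accuracy on true labels, mean accuracy over label-shuffled copies)."""
    rng = random.Random(seed)
    true_accuracy = loo_accuracy(xs, ys)
    scrambled = []
    for _ in range(rounds):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        scrambled.append(loo_accuracy(xs, shuffled))
    return true_accuracy, sum(scrambled) / rounds


# Toy, well-separated descriptor values for 6 inactive (0) and 6 active (1) compounds.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9]
ys = [0] * 6 + [1] * 6

true_acc, scrambled_acc = y_randomization(xs, ys)
print(f"true labels: {true_acc:.2f}, scrambled labels (mean): {scrambled_acc:.2f}")
```

A large gap between the two numbers is the signature that the model has learned a real structure-activity relationship rather than fitting noise.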
Table 4
Cross-Correlation Data of the Selected Descriptors for the Selected NN-QSAR Model
          ATSC6c   AATSC5e  AATSC4m  ATSC7c   AATS8i
ATSC6c    1.000    –0.216   –0.366   –0.607   –0.437
AATSC5e            1.000    –0.313   +0.015   +0.260
AATSC4m                     1.000    +0.277   +0.123
ATSC7c                               1.000    +0.145
AATS8i                                        1.000
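A quick way to check the claim of no strong cross-correlation is to scan the off-diagonal coefficients of Table 4 for the largest absolute value and for any pair above a chosen cut-off. The sketch below does this with the values transcribed from the table; the 0.8 cut-off is an assumption made for illustration and is not stated in the text.

```python
# Upper-triangle coefficients transcribed from Table 4.
correlations = {
    ("ATSC6c", "AATSC5e"): -0.216,
    ("ATSC6c", "AATSC4m"): -0.366,
    ("ATSC6c", "ATSC7c"): -0.607,
    ("ATSC6c", "AATS8i"): -0.437,
    ("AATSC5e", "AATSC4m"): -0.313,
    ("AATSC5e", "ATSC7c"): 0.015,
    ("AATSC5e", "AATS8i"): 0.260,
    ("AATSC4m", "ATSC7c"): 0.277,
    ("AATSC4m", "AATS8i"): 0.123,
    ("ATSC7c", "AATS8i"): 0.145,
}

THRESHOLD = 0.8  # assumed cut-off for a "strong" correlation; not given in the text

# Pair with the largest absolute coefficient, and all pairs at or above the cut-off.
strongest_pair, strongest_r = max(correlations.items(), key=lambda item: abs(item[1]))
flagged = [pair for pair, r in correlations.items() if abs(r) >= THRESHOLD]

print(f"strongest pair: {strongest_pair} (r = {strongest_r})")
print(f"pairs with |r| >= {THRESHOLD}: {flagged}")
```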
QSAR Model Predictions for an In-House-Built Combinatorial Library
Combinatorial chemistry assisted in the design of a virtual library of ciprofloxacin derivatives. The process involved designing ciprofloxacin scaffolds by substituting cyclopropyl, propyl, fluoroethyl, Me, t-Bu, CF3, and 2,4-difluorophenyl at the N-1 position.[32] The structures of the ciprofloxacin templates and the predefined substitution sites R1, R2, and R3 are shown in Figure 7. The substituents for R1 were chosen keeping in mind that aromatic amines or bulky groups are favorable,[33,34] whereas smaller groups were substituted at the R2 and R3 positions.[35−37] Using the virtual combinatorial library tool by cheminfo.org, 29828 analogues were designed to create the virtual library, and after ADMET profiling using the SwissADME tool, 2550 analogues were retained for further screening (Table 5). These compounds were further filtered through the NN-classification-based QSAR model; 675 analogues were predicted to show inhibitory activity against the DNA gyrase A subunit.
Figure 7
Scaffold structures used to generate the combinatorial
library.
Table 5
Number of Analogues Designed from Seven Different Scaffold Structures and the Number of Analogues Retained after ADMET Profiling, NN-QSAR Screening, and Docking
scaffold  no. of analogues  no. of analogues after ADMET  positive predicted analogues  docking
a         4941              490                           95                            84
b         4404              791                           65                            57
c         4953              459                           180                           164
d         3696              188                           100                           93
e         3672              301                           137                           124
f         4415              178                           59                            58
g         3747              143                           39                            35
total     29828             2550                          675                           615
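The screening funnel summarized in Table 5 can be checked with a few lines of arithmetic. The sketch below re-derives the column totals from the per-scaffold counts and computes the fraction of compounds passing from each stage to the next; the variable names are illustrative only.

```python
# Per-scaffold counts transcribed from Table 5:
# (designed, after ADMET, predicted positive by NN-QSAR, retained after docking)
table5 = {
    "a": (4941, 490, 95, 84),
    "b": (4404, 791, 65, 57),
    "c": (4953, 459, 180, 164),
    "d": (3696, 188, 100, 93),
    "e": (3672, 301, 137, 124),
    "f": (4415, 178, 59, 58),
    "g": (3747, 143, 39, 35),
}

stages = ["designed", "after ADMET", "NN-QSAR positive", "after docking"]
totals = [sum(row[i] for row in table5.values()) for i in range(len(stages))]

# Fraction of each stage that survives into the next one.
pass_rates = [totals[i + 1] / totals[i] for i in range(len(totals) - 1)]

for name, total in zip(stages, totals):
    print(f"{name}: {total}")
for (src, dst), rate in zip(zip(stages, stages[1:]), pass_rates):
    print(f"{src} -> {dst}: {rate:.1%}")
```

The last ratio, 615 of 675, is the roughly 91% of QSAR-positive analogues that also scored better than ciprofloxacin in docking.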
Validation of the Docking Protocol
To validate the docking protocol, the cocrystallized ligand JHN was redocked to the E. coli DNA gyrase. The redocked pose reproduced the crystallographic pose with a root-mean-square deviation (RMSD) of 1.42 Å (Figure 2).
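For readers unfamiliar with the redocking criterion, the RMSD between a crystallographic pose and a redocked pose is the square root of the mean squared distance between matched atoms. The sketch below is a minimal version that assumes both poses list the same atoms in the same order and already share a coordinate frame (no superposition or symmetry correction); the coordinates are invented and are not taken from 6RKU, and the 2.0 Å cut-off is a commonly used convention assumed here rather than a value from the text.

```python
import math


def rmsd(reference, pose):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) coordinates."""
    if len(reference) != len(pose) or not reference:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum(
        (xr - xp) ** 2 + (yr - yp) ** 2 + (zr - zp) ** 2
        for (xr, yr, zr), (xp, yp, zp) in zip(reference, pose)
    )
    return math.sqrt(squared / len(reference))


# Hypothetical heavy-atom coordinates in Å (not taken from 6RKU).
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.4, 0.0)]
redocked_pose = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (2.5, 1.4, 0.0)]

CUTOFF = 2.0  # Å; a commonly used acceptance threshold, assumed here
value = rmsd(crystal_pose, redocked_pose)
print(f"RMSD = {value:.2f} Å, protocol accepted: {value <= CUTOFF}")
```

By the same criterion, the reported redocking RMSD of 1.42 Å would fall within the assumed cut-off.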
Figure 2
Structure depicting the molecular docking of E.
coli DNA gyrase nucleoprotein to the JHN ligand. (a)
Chains A and C of the DNA binding and cleavage domain (PDB id: 6RKU) in complex with
the crystallized ligand (yellow) and its redocked pose (blue). (b)
Zoom-in view of the active site labeled with interacting residues.
Molecular Docking Studies of the Combinatorial Library
Docking of ciprofloxacin and the compounds of the combinatorial library within the active site of the E. coli DNA gyrase (PDB code: 6RKU) was performed using AutoDock version 4.2. The binding energies of ciprofloxacin and of the positively and negatively screened analogues with gyrase are depicted as scatter plots (Figures 3 and S2). Of the positively predicted analogues, 615 had a binding energy lower than that of the standard drug ciprofloxacin (−6.77 kcal/mol) and were identified as potential lead compounds for gyrase inhibition. The binding energy, chemical structure, and binding interactions of ciprofloxacin and the top 5 hits are shown in Table 6 and Figure 4, respectively. The top five scoring analogues of each scaffold are also predicted as positive by the NN-QSAR model (Figure S1).
Figure 3
Scatter plot for the binding energy of
ciprofloxacin analogues
designed using seven different scaffold structures (a–g) and
docked to E. coli DNA gyrase. Top 5
analogues on the basis of the binding affinity score are labeled.
Table 6
Chemical Structure, Binding Energy, and Interacting Residues of Ciprofloxacin and Its Top 5 Analogues Discovered on the Basis of Dual Screening through the NN-QSAR Model and Docking Studies^a
^a Note: The last letter (A–H) following the residue number in interacting residues represents the chain ID as per the PDB convention.
Figure 4
Three-dimensional (3D) orientations of ciprofloxacin (yellow) and its top-scoring analogues (green) (i–vi) and two-dimensional views of the binding interactions of ciprofloxacin and the top 5 hits with the E. coli DNA gyrase (PDB id: 6RKU) (A–F). (A, i) Ciprofloxacin, (B, ii) 902b, (C, iii) 9699f, (D, iv) 4419f, (E, v) 5538f, and (F, vi) 898b. Hydrogen bonds are shown by pink arrows, hydrophobic interactions by green spline lines, π–π interactions by green lines, pi–cation interactions by red lines, and salt bridges by red-blue lines.
The binding scores show that analogues 902b, 9699f, 4419f, 5538f, and 898b bind to gyrase more favorably than ciprofloxacin and all of the other library analogues. The binding score of 902b is about 2-fold that of ciprofloxacin (−6.77 kcal/mol), and the scores of the other four analogues are about 1.8- to 1.9-fold that of the reference inhibitor.

The binding interactions between these top five analogues and gyrase reveal that the inhibitory mechanism of the analogues is similar to that of other fluoroquinolones such as ciprofloxacin, which bind noncovalently at the DNA–enzyme interface in the cleavage-ligation active site.[38−42] Ciprofloxacin inhibits E. coli DNA gyrase by forming a bridge and a H-bond with Asp82C; the complex is also stabilized by a base stacking interaction with DA17G. In 902b, the amino groups at R1 and R3 form a H-bond and a salt bridge with Asp82C, respectively, while the amino group at R2 forms a H-bond with adenylate. The complex is further stabilized by a polar interaction with Ser83A and hydrophobic interactions with Met120A and Ala119A; in addition, the nitro group in the pyrazine ring forms a pi–cation interaction with DT16G. Similarly, 9699f forms a H-bond with Asp82C through the thiol group substituted at R3. Its naphthalene rings form base stacking interactions with DA17G and DA17H of DNA, and DT16H of DNA forms a pi–cation interaction with the amino group at R3, stabilizing the complex.

898b forms a H-bond and a pi–cation interaction with adenine and thymine of DNA. The amino group at R2 forms a H-bond and a salt bridge with Asp82A and Asp82C. It also forms hydrophobic interactions with Val71C and Met120C. All of the complexes form H-bond and pi–cation interactions with DNA, while only 9699f forms base stacking interactions. 4419f binds to the gyrase via a H-bond network between the amino groups at R1, R2, and R3 and Asp82C, DT16G, and Asp82A, respectively. The amino group at R2 also forms a pi–cation interaction, and the R3 amino group forms a salt bridge with Asp82A. In addition, the chlorobenzene at R3 forms hydrophobic interactions with Ala67A, Val70A, Gly71A, Ile74A, Ala67C, Val70C, Gly71C, Ile74C, Met120A, and Met120C. 5538f is situated in a hydrophobic pocket of gyrase created by Ala67A, Val70A, Gly71A, Ile74, Ile74C, Gly71C, Val70C, Ala67C, Met120A, and Met120C. The amino groups at R1, R2, and R3 form H-bonds with Asp82A, DA17G, and Asp82C, respectively; the R3 amino group also forms a salt bridge with Asp82C and DG15G, while the R2 amino group interacts with DA17H via a pi–cation interaction.
Molecular Dynamic Simulation Analysis
A detailed MD simulation study was performed to obtain insights into the interaction pattern and binding mode of DNA GyrA with ciprofloxacin and its top five screened analogues. It was carried out using Desmond software for a duration of 100 ns.[43] Since docking studies provide only a static view of the interactions between the ligand and the protein, MD simulations were executed to analyze the dynamic behavior of the interactions. The root-mean-square deviations (RMSDs) of the Cα atoms and the corresponding ligands during the production phase, relative to the starting structure, are shown in Figure 5. 9699f showed the largest fluctuations, with a peak of up to 2.9 Å, followed by 5538f at 2.3 Å and 898b at 2.1 Å, whereas ciprofloxacin, 902b, and 4419f peaked at 1.1, 1.2, and 1.8 Å, respectively. The 898b analogue forms a stable complex, with the RMSD rising to 1.1 Å until 56 ns; a slight fluctuation was observed between 56 and 57 ns, and after 57 ns, the RMSD remained at an average of 1.6 Å. Even though 9699f peaks at 2.3 Å, there are no major variations in the RMSD plot, and it maintained an average RMSD of 2 Å. Initially, the 5538f complex showed more fluctuations until 30 ns; once equilibrated, it remained consistent throughout the simulation.
Figure 5
Graphs as generated during
molecular dynamic simulations: (A) RMSD-Cα
atoms and (B) RMSD ligands.
As shown in Figure 6, during the MD simulation, ciprofloxacin and 902b formed contacts mainly with Asp82A, Asp82C, Ser83A, and Ser83C. 9699f has two amino groups at R3, which form H-bonds with Asp82A and Asp82C, and its carbonyl oxygen at R1 constantly maintained an interaction with Asp82C. Similarly, in 4419f, the amino groups at R1 and R3 interacted via H-bonds with Asp82A and Asp82C, while the residues Val70A, Ile74A, Met120A, Val70C, and Ile74C were involved in hydrophobic interactions. In the case of 5538f, in addition to the H-bonds, the carbonyl and amino groups at R1 formed water bridges with Asp82A, Asp82C, Gly81C, and Arg121A. In the 898b complex, the amino group at R2 and the hydroxyl group at R3 persistently maintained interactions with Asp82A and Ala117A. The other residues, Ser83A, Asp87A, Ser116A, Ala117A, and Asp82C, also maintained contacts consistently throughout the simulation. The simulation results suggest that the ligands can properly bind at the active site of the GyrA subunit. The MD simulation analysis also showed that interactions with Asp82A, Asp82C, Ser83A, and Ser83C and with a few hydrophobic residues (Val70A, Ile70A, Ala119A, Ile74C, and Met120A) stabilize the protein–ligand complexes. The simulation studies demonstrate that the residues Asp82A and Asp82C are important for H-bond interactions.
Figure 6
Desmond MD-calculated
ligand–protein interactions at the
binding site of GyrA of E. coli (6RKU).
(A) Ciprofloxacin, (B) 902b, (C) 9699f, (D) 4419f, (E) 5538f, and
(F) 898b.
Conclusions
DNA gyrase is a crucial enzyme that catalyzes the complex reactions of DNA supercoiling in prokaryotes, making it an important target for antibacterial drugs. To discover novel inhibitors of DNA gyrase, we generated a classification-based QSAR model. A data set assembled through literature searches was divided into inhibitors and noninhibitors of bacterial DNA gyrase based on the outcome of in vitro studies and was used to develop the QSAR classification model. The classifiers were created using a broad range of molecular descriptors from PaDEL, CDK, and RDKit. Even though all of the reduced descriptor sets produced satisfactory models, the PaDEL descriptors consistently resulted in the best models. ATSC6c and AATSC6c were the descriptors selected most often, being picked by almost every feature selection method. During QSAR model development, over 90 models were developed, but the NN model based on five PaDEL descriptors performed the best with the curated data set. The best model performed reasonably well, with an accuracy of 83%, precision of 87%, sensitivity of 83%, and specificity of 82%.

The QSAR model was used to screen an in-house-built library of 2550 analogues; 675 hit candidates were predicted to have the potential to inhibit bacterial DNA gyrase. The molecular docking studies suggested 615 analogues with a better binding affinity than ciprofloxacin. In particular, 902b was the most potent gyrase inhibitor predicted (binding energy = −13.81 kcal/mol). This study suggests novel inhibitors of bacterial DNA gyrase through QSAR screening, and the docking studies suggest that the selected analogues could act as promising inhibitors of E. coli DNA gyrase. After the dual approach of screening using the NN-QSAR model and docking, the top five inhibitors of DNA gyrase reported are 902b, 9699f, 4419f, 5538f, and 898b. These analogues were subjected to MD simulations to observe their binding stability. With an average RMSD of 1.4 Å (902b, 0.7 Å; 9699f, 2 Å; 4419f, 1.5 Å; 5538f, 1.7 Å; and 898b, 1.1 Å), all of the protein–ligand complexes displayed promising binding stability, with 902b and 898b displaying the highest stability, with RMSD values below 1.4 Å for the whole simulation process. The binding energy and simulation studies suggest that, among the analyzed analogues, compound 902b can be considered the most promising candidate against DNA gyrase. The NN-QSAR model reported in this manuscript can be downloaded from GitHub through the provided link https://github.com/ritu225/NN-QSAR_model.git.
Methods
Generation of the DNA Gyrase Data Set
A data set of 271 compounds targeting the E. coli DNA GyrA subunit[44,45] was built from different reported in vitro studies, comprising 157 positive and 114 negative compounds. Compounds with reported MIC/IC50 values comparable to or lower than those of ciprofloxacin were included in the positive data set, and compounds with higher MIC/IC50 values populated the negative data set. The compounds NXL101 and AZD9742, despite being good inhibitors of bacterial DNA gyrase, were included in the negative data set because of associated cardiovascular safety risks.[46] Although the combinatorial library was generated from ciprofloxacin derivatives, the QSAR data set included derivatives of ciprofloxacin, norfloxacin, and NBTIs (novel bacterial type II topoisomerase inhibitors) because diverse training data can result in models able to capture unique or complementary information.[47] The list of compounds and details of the data set are available in the Supporting Information (Table S1).
Curation of Chemical Structures
The
2D structures of the compounds included in the data set curated to
develop the QSAR model were downloaded as .SDF (Structure Data File)
files from the chemical database PubChem (https://pubchem.ncbi.nlm.nih.gov/) by text searches. ChemDraw Ultra 12.0[48] was used to draw the compounds retrieved from literature searches.
Structures of 271 chemical compounds in the .SDF file format were
curated in a folder, which was then used to calculate molecular descriptors.
Calculation of Molecular Descriptors
Three
different sets of descriptors described the molecular structures
of the 271 compounds in the data set, as follows.
A total of 863 1D, 2D, and 3D PaDEL
descriptors (http://www.yapcwsoft.com/dd/PaDELdescriptor/) were calculated
using a standalone Java application.[49]
RDKit was used to calculate 127 molecular descriptors
in KNIME 4.4 (Konstanz Information Miner), a free and open-source
data analytics tool for integrated data access and mining, statistics,
visualization, and reporting.[50]
A set of 268 Chemistry Development
Kit (CDK) descriptors was calculated with the CDK GUI cheminformatics
toolkit (http://www.rguha.net/code/java/cdkdesc.html), an open-source tool that computes chemical information from SDF
and SMI input.[51]
The descriptors were further processed with the open-source
machine learning and data visualization tool Orange (https://orangedatamining.com/)[29] and then with TANAGRA
(https://eric.univ-lyon2.fr/~ricco/tanagra/en/tanagra.html),[30] a free data mining, statistical learning,
and machine learning software.
Development of the QSAR Model
Feature Selection
Adding unnecessary
features while training a model reduces overall accuracy,
increases complexity, weakens the model’s generalization
capability, and biases the model. Hence, feature selection is
an essential step in building a QSAR model.
In Orange,
the Rank widget offers scoring methods such as information gain ratio, Gini,
ANOVA, and chi-square for feature selection. Information gain (IG)
measures how much the presence or absence of a descriptor contributes
to the correct classification of a compound. It assigns the highest
value to a descriptor that is a good indicator for assigning the compound
to the correct class.[52,53] The Gini index
quantifies a descriptor’s ability to distinguish between classes.
In a binary classification, it can take a maximum value of 0.5. The
smaller the value of the Gini index, the more relevant the feature.
Analysis of variance (ANOVA) is a statistical method
used to compare means across different groups. In this
test, descriptors that are independent of the target
class are removed. The chi-square method tests the relationship between the descriptor
and the class; if the descriptor is independent of
the class, it is discarded.
Within TANAGRA, three filtering methods were used to select the most significant
descriptors:
Fisher filtering, runs filtering, and StepDisc (stepwise discriminant
analysis). Fisher filtering is a univariate Fisher’s ANOVA
ranking method; descriptors were selected
at a p-value of 0.001. It does
not take into account redundancy among the input attributes. Runs
filtering ranks the descriptors according to their importance; the
descriptors were selected at the 0.001 level of significance. StepDisc
was used with the forward search
strategy at a 0.05 level of significance. In this approach, the
most significant descriptor is added at each step, and the search stops when
no relevant descriptor is left to add.
Data preparation is crucial
for any QSAR analysis. Descriptor
values may range widely, so normalization converts the
descriptors to a similar scale, which improves the training stability
and performance of the model. The Preprocess widget in Orange was
used for this purpose, and features were normalized to μ = 0
and σ² = 1. The reduced sets of CDK, RDKit, and PaDEL
descriptors prepared using Orange and TANAGRA are listed in Table .
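As a rough illustration of the scoring and normalization ideas described above, the following Python sketch computes the textbook Gini impurity and information gain for a single threshold split on one descriptor, and standardizes a descriptor to zero mean and unit variance. It does not reproduce the internals of Orange's Rank or Preprocess widgets or of TANAGRA's filters. The descriptor values, labels, and threshold are hypothetical, and the use of the population standard deviation is an assumption.

```python
import math
from statistics import mean, pstdev


def gini(labels):
    """Gini impurity of a list of class labels (0.5 is the binary maximum)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(
        (labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels)
    )


def split_scores(values, labels, threshold):
    """Weighted Gini and information gain for the split 'value <= threshold'."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    weighted_gini = w_left * gini(left) + w_right * gini(right)
    info_gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    return weighted_gini, info_gain


def standardize(values):
    """Rescale values to mean 0 and (population) variance 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical descriptor that separates the two classes perfectly.
values = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
labels = [0, 0, 0, 1, 1, 1]
weighted_gini, info_gain = split_scores(values, labels, threshold=0.5)
scaled = standardize(values)
print(gini(labels), weighted_gini, info_gain)
```

A descriptor that separates the classes well drives the weighted Gini toward 0 and the information gain toward its maximum, which is the behavior the ranking relies on.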
Supervised Learning Algorithms
QSAR seeks a
significant relationship between molecular descriptors
and biological activity, and a classification
model was developed on this basis. Six different classifiers were applied to the
reduced and normalized descriptors using Orange:
Decision tree
Support vector machines
Random forest
Neural network
Naive Bayes
Logistic regression
Train-Test Module and Cross-Validation
The external predictivity of the models
was assessed using the Test and Score widget in Orange. Two validation
schemes were used: a test on held-out test data and 5-fold cross-validation.
The cross-validation method
splits the data into five folds. The model is tested by holding out the compounds
of one fold at a time; the model is built from the remaining
folds, and the activity of the held-out compounds is predicted.
This process is repeated for all of the folds. For the
test on test data, the data were passed from the Data
Sampler widget; 70% of the data were selected as the training set and
the remaining 30% as the test set. The test data are not used to construct
the models, only to evaluate their performance.
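The two validation schemes can be sketched in plain Python as below. This is a minimal illustration, not Orange's implementation: the function names and the random seed are hypothetical, and no stratification by class is applied.

```python
import random


def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle items and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each item is held out exactly once."""
    rng = random.Random(seed)
    idx = list(range(n_items))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# 271 is the data set size given in the text; the indices stand in for compounds.
train, test = train_test_split(list(range(271)))
folds = list(k_fold_indices(271, k=5))
print(len(train), len(test))
print([len(test_idx) for _, test_idx in folds])
```

Fixing the seed keeps the split reproducible, which matters when several classifiers are compared on the same partition.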
Y-Randomization Test
A y-scrambling
test was applied to examine the extent to which the classification
models obtained were the result of mere chance. The activity labels
of the compounds in the data set were shuffled, and
the models were redeveloped following the same procedure as used for
the selected models. The performance of these models was evaluated
using the same metrics as for the selected ones. If the selected model
does not reflect a chance correlation, its performance should be better
than that of the models built with the shuffled data.
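The idea of the y-randomization test can be illustrated with a small, self-contained Python sketch. To keep it short, a one-descriptor nearest-centroid classifier scored by leave-one-out accuracy stands in for the classifiers used in the study. The data, the number of shuffling rounds, and the seed are hypothetical.

```python
import random
from statistics import mean


def loo_accuracy(values, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier on one descriptor."""
    correct = 0
    for i, (x, y) in enumerate(zip(values, labels)):
        centroids = {}
        for c in set(labels):
            members = [
                v for j, (v, lab) in enumerate(zip(values, labels)) if j != i and lab == c
            ]
            if members:
                centroids[c] = mean(members)
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        correct += predicted == y
    return correct / len(values)


def y_randomization(values, labels, n_rounds=20, seed=0):
    """Re-evaluate the model on shuffled labels; return one score per round."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        scores.append(loo_accuracy(values, shuffled))
    return scores


# Hypothetical descriptor values: two well-separated groups of ten compounds.
values = [0.1 * i for i in range(10)] + [5 + 0.1 * i for i in range(10)]
labels = [0] * 10 + [1] * 10

real_score = loo_accuracy(values, labels)
scrambled_scores = y_randomization(values, labels)
print(real_score, mean(scrambled_scores))
```

A model that captures a real structure-activity relationship should score clearly above the average of the scrambled runs.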
Model Performance Metrics
To compare
and assess the models’ performance, accuracy, area under
the curve (AUC), and precision were mainly used. Some other standard
metrics were also used, defined in terms of
TN, FN, TP, and FP, the counts of
true negative, false negative, true positive, and false positive compounds,
respectively. Accuracy is defined as the number of correct predictions
divided by the total number of predictions. Precision
is the number of true positives divided by the total number of positive
predictions. The values of the metrics were recorded for every
model from the results of the test on test data.
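The definitions above translate directly into code. The sketch below computes accuracy and precision as defined in the text. Sensitivity and specificity are included on the assumption that they are among the other standard metrics. The confusion-matrix counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # correct predictions / all predictions
        "precision": tp / (tp + fp),      # true positives / predicted positives
        "sensitivity": tp / (tp + fn),    # true positives / actual positives
        "specificity": tn / (tn + fp),    # true negatives / actual negatives
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=35, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```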
Screening of the Analogue Library Using the QSAR Model
Building of the Analogue Library of Ciprofloxacin
A combinatorial library for the scaffold structures in Figure was generated with the virtual combinatorial library tool provided
by cheminfo.org (http://www.cheminfo.org/). The combinations used for R1, R2, and R3 are listed in Table S2 (Supporting Information). The SwissADME
web tool was used to compute key physicochemical, pharmacokinetic,
drug-likeness, and related parameters for the library compounds.
Molecular Docking
Molecular docking
was performed to investigate whether the analogues predicted as positive
by the in-house QSAR model have better binding
affinity toward E. coli DNA gyrase
(PDB ID 6RKU) than the standard ciprofloxacin. Polar hydrogen
atoms and Kollman charges were assigned to 6RKU using AutoDockTools version
1.5.6, and Swiss-PdbViewer was used for energy minimization of the receptor.
A grid box with dimensions of 30.0, 25.0, and 60.0 Å was generated
and centered on the receptor-binding site at x, y, and z coordinates
of 158.22, 158.38, and 148.77, respectively. Molecular docking was
performed by AutoDock embedded in Raccoon software.[54] For docking, the combinatorial library was prepared by
LigPrep’s ligand preparation protocol.[55] The protocol generated diverse tautomeric, ionization, and stereochemical
variants of the analogues after energy minimization. Docked structures
were visualized with Chimera and PyMOL.[56,57] The 2D schematic
representation of the protein–ligand interaction was generated
with the 2D sketcher panel in Maestro (Figure ).
Scaffold structures used to generate the combinatorial
library.
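For orientation, the extent of the docking search space reported in the Molecular Docking section can be worked out from the grid center and dimensions. The sketch below assumes the three sizes are edge lengths in Å of an axis-aligned box. It does not reproduce how AutoDock defines its grid (number of points and spacing), and the helper names are hypothetical.

```python
def grid_box_bounds(center, size):
    """Return (min_corner, max_corner) of an axis-aligned box."""
    min_corner = tuple(c - s / 2 for c, s in zip(center, size))
    max_corner = tuple(c + s / 2 for c, s in zip(center, size))
    return min_corner, max_corner


def is_inside(point, min_corner, max_corner):
    """Check whether a 3D point lies within the box (bounds included)."""
    return all(lo <= p <= hi for p, lo, hi in zip(point, min_corner, max_corner))


center = (158.22, 158.38, 148.77)  # x, y, z coordinates from the text
size = (30.0, 25.0, 60.0)          # assumed to be edge lengths in angstroms

min_corner, max_corner = grid_box_bounds(center, size)
print(min_corner, max_corner)
print(is_inside(center, min_corner, max_corner))
```

A check like `is_inside` can help confirm that the residues of the binding site fall within the search space before docking is started.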
Molecular Dynamics Simulations
The best-ranked
conformations of the top five analogues and ciprofloxacin
in complex with the 6RKU receptor were further assessed for conformational
flexibility, dynamic behavior, and stability by MD simulations.
The Desmond package (Schrödinger) was used for the MD simulations, and the
system was prepared with the “System Builder”
panel in Schrödinger.[43] The complexes were
placed in an orthorhombic box sized 10 × 10 × 10 Å
and solvated with TIP3P water molecules, and the OPLS3 force
field was applied for the MD calculations.[58,59] The system
was neutralized with NaCl ions at a physiological monovalent ion concentration
of 0.15 M. To equilibrate the system, the NPT ensemble
was applied at a constant pressure of 1.0325 bar and a temperature of 300 K,
with a simulation time of 100 ns. The Simulation Quality Analysis,
Simulation Event Analysis, and Simulation Interaction Diagram tools
of Desmond were used to analyze the trajectory files.
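Stability in such trajectories is typically summarized by the root-mean-square deviation (RMSD) of each frame from a reference structure. The sketch below shows only the basic formula on hypothetical coordinates. It performs no structural superposition and is not a substitute for the Desmond analysis tools.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    No alignment is performed; the structures are assumed to be
    already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom reference and a frame shifted by 1 angstrom along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, frame))
```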