Abdulmujeeb T Onawole1, Ibnelwaleed A Hussein1, Mohammed A Saad1,2, Musa E M Ahmed1, Hassan Nimir3. 1. Gas Processing Center, College of Engineering, Qatar University, Doha 2713, Qatar. 2. Chemical Engineering Department, College of Engineering, Qatar University, Doha 2713, Qatar. 3. Chemistry Department, College of Arts and Sciences, Qatar University, Doha 2713, Qatar.
Abstract
Sulfate-reducing bacteria (SRB), such as Desulfobacter postgatei are found in oil wells. However, they lead to the release of hydrogen sulfide. This in turn leads to the iron sulfide scale formation (pyrite). ATP sulfurylase is an enzyme present in SRB, which catalyzes the formation of adenylyl sulfate (APS) and inorganic pyrophosphatase (PPi) from ATP and sulfate. This reaction is the first among many in hydrogen sulfide production by D. postgatei . Consensus scoring using molecular docking and machine learning was used to identify three potential inhibitors of ATP sulfurylase from a database of about 40 million compounds. These selected hits ((S,E)-1-(4-methoxyphenyl)-3-(9-((m-tolylimino)methyl)-9,10-dihydroanthracen-9-yl)pyrrolidine-2,5-dione; methyl 2-[[(1S)-5-cyano-2-imino-1-(4-phenylthiazol-2-yl)-3-azaspiro[5.5]undec-4-en-4-yl]sulfanyl]acetate; and (4S)-4-(3-chloro-4-hydroxy-phenyl)-1-(6-hydroxypyridazin-3-yl)-3-methyl-4,5-dihydropyrazolo[3,4-b]pyridin-6-ol), known as A, B, and C, respectively) all had good binding affinities with ATP sulfurylase and were further analyzed for their toxicological properties. Compound A had the highest docking score. However, based on the physicochemical and toxicological properties, only compound C was predicted to be both safe and effective as a potential inhibitor of ATP sulfurylase, hence the preferred choice. The molecular interactions of compound C revealed favorable interactions with the following residues: LEU213, ASP308, ARG307, TRP347, LEU224, GLN212, MET211, and HIS309.
Sulfate-reducing bacteria (SRB), such as Desulfobacter postgatei are found in oil wells. However, they lead to the release of hydrogen sulfide. This in turn leads to the iron sulfide scale formation (pyrite). ATP sulfurylase is an enzyme present in SRB, which catalyzes the formation of adenylyl sulfate (APS) and inorganic pyrophosphatase (PPi) from ATP and sulfate. This reaction is the first among many in hydrogen sulfide production by D. postgatei . Consensus scoring using molecular docking and machine learning was used to identify three potential inhibitors of ATP sulfurylase from a database of about 40 million compounds. These selected hits ((S,E)-1-(4-methoxyphenyl)-3-(9-((m-tolylimino)methyl)-9,10-dihydroanthracen-9-yl)pyrrolidine-2,5-dione; methyl 2-[[(1S)-5-cyano-2-imino-1-(4-phenylthiazol-2-yl)-3-azaspiro[5.5]undec-4-en-4-yl]sulfanyl]acetate; and (4S)-4-(3-chloro-4-hydroxy-phenyl)-1-(6-hydroxypyridazin-3-yl)-3-methyl-4,5-dihydropyrazolo[3,4-b]pyridin-6-ol), known as A, B, and C, respectively) all had good binding affinities with ATP sulfurylase and were further analyzed for their toxicological properties. Compound A had the highest docking score. However, based on the physicochemical and toxicological properties, only compound C was predicted to be both safe and effective as a potential inhibitor of ATP sulfurylase, hence the preferred choice. The molecular interactions of compound C revealed favorable interactions with the following residues: LEU213, ASP308, ARG307, TRP347, LEU224, GLN212, MET211, and HIS309.
Sulfate-reducing
bacteria (SRB) have been a persistent problem
in the oil and gas industry as their presence in oil and gas reservoirs
are abundant. SRB are one of the main sources of hydrocarbons as one
of the predominant sources of sulfide and sulfur during the maturation
of oil reservoirs.[1,2] They help in sulfur formation
by the incomplete oxidation of sulfur. This sulfur in turn leads to
serious problems, such as iron sulfide (pyrite) scale formation that
hinders the injectivity of water injection wells, corrosion of iron,
and contamination of produced gas by generating hydrogen sulfide (H2S).[3−6] These problems sometimes lead to deaths during oil and gas production,
such as the fatalities caused by hydrogen sulfide poisoning during
offshore operations in the North Sea.[7] SRBs
have been tagged has the major perpetrators of microbial-influenced
corrosion.[8−10]SRBs operate mostly in anaerobic conditions
and use sulfate as
an electron acceptor during their metabolism of energy.[3] They are prevalent in water and land environments
as long as they are anoxic and such anoxic environments include oil
and natural gas wells.[11]D. postgatei is one of the common SRBs that are rod-like
or elliptical. Unlike other SRBs, D. postgatei converts acetate to carbon dioxide (CO2) and H2S[3,12] as described in eq .However, in the reduction of sulfate to H2S, some
enzymes
are involved (Table ), which make the reaction feasible. ATP sulfurylase catalyzed the
first reaction in the series.[13,14]
Table 1
Reactions Involved in Respiratory
Sulfate Reduction
reaction
enzyme involved
SO42– + ATP4– →
APS2– + PPi4–
ATP sulfurylase
PPi + H2O →
2 Pi
inorganic pyrophosphatase
APS + H+ + 2e– → HSO3– + AMP
APS reductase
HSO3– + 6H+ + 6e– →
HS– + 3H2O
bisulfite reductase
Existing solutions to alleviate the effects
of SRBs include the
use of biocides like aldehydes and amines, which are often hazardous
and mutagenic.[15] Also, dissolved oxygen
and nitrates facilitate an oxidizing environment. However, a few SRBs
thrive under aerobic conditions.[7,11,14,16] Also, heavy metals are used,
but they are mostly toxic.Recent works[17−19] have proffered
environmentally friendly formulations
that can be used in scale removal. These formulations include chelating
agents such as DTPA (diethylenetriamine pentaacetate) acid, which
removes oil field scales by chelating with the iron in the scale.
This is much better than using HCl, which exacerbates the situation
by producing the toxic hydrogen sulfide gas. However, these formulations
do not solve the problem from the origin. Hence, a root cause analysis
solution is imperative in preventing iron sulfide scale formation.
This could be achieved by eliminating SRBs present in oil and gas
reservoirs, which is the central idea of this work. The elimination
of SRBs is possible by inhibiting the enzymes responsible for sulfate
reduction present in SRB. Computational techniques such as virtual
screening using molecular docking and artificial intelligence techniques,
such as machine and/or deep learning, have been extensively used in
the pharmaceutical industry in discovering and designing novel compounds
that inhibit certain proteins to create a pharmacological effect.[20−24] Moreover, machine-learning tools have been suggested to give a synergistic
effect in finding improved hit rates.[25−27] This process can be
translated to oil field chemistry in finding a safer and longer lasting
solution to sulfur production by SRB. Moreover, these methods can
help in screening large databases of compounds and also predict their
toxicity properties, which can help as a guide in selecting safe compounds
that can be used as inhibitors of ATP sulfurylase.[28−30]There
is a dearth of literature using this approach in reducing
sulfur production in oil fields. A relatively recent work[31] screened about 15 compounds to find potential
inhibitors for Archaeoglobus fulgidus, which though not a bacterium but is also a sulfate-reducing
organism. However, the number of compounds screened was quite small
and this reduced the possibility of finding an effective compound
that will be able inhibit the enzymes responsible for sulfate reduction.
Herein, in this work, we use computer-aided drug design (CADD) techniques,
such as homology modeling, molecular docking, and machine learning,
to select hits via consensus scoring by screening 100,000 compounds
from a database of about 40 million compounds. Unlike the earlier
work mentioned that screened 15 compounds, this work screens 100,000
compounds from a large database of about 40 million compounds. The
selected hits from this work have the potential to inhibit ATP sulfurylase,
hence alleviating the threat of SRBs in iron sulfide scale formation
particularly those scales formed by pyrite. This will consequently
increase production in the oil and gas industry.
Results
and Discussion
Protein Structure and Validation
Homology modeling is a renowned tool in computer-aided drug design
that has been used in finding potential inhibitors for diseases.[32,33] The protein sequence of both the template protein (A. vinosum) and the target protein (D. postgatei) showed similarities (Figure )in their sequences using the
clustal format.[34] The structure was validated
using a Ramachandran plot (Figure ).
Figure 1
Protein sequence of ATP sulfurylase of D. postgatei (D.) and A. vinosum (A.) (PDB ID: 4DNX) in clustal format.
Similar residues in both organisms have the same color code in the
sequence.
Figure 2
(a) Tertiary structure and (b) the Ramachandran
plot of ATP sulfurylase
(D. postgatei). (Red: helix; yellow:
strand; white: coil; squares: proline; triangles: glycine; squares:
proline; circle: others).
Protein sequence of ATP sulfurylase of D. postgatei (D.) and A. vinosum (A.) (PDB ID: 4DNX) in clustal format.
Similar residues in both organisms have the same color code in the
sequence.(a) Tertiary structure and (b) the Ramachandran
plot of ATP sulfurylase
(D. postgatei). (Red: helix; yellow:
strand; white: coil; squares: proline; triangles: glycine; squares:
proline; circle: others).The homology-modeled structure (Figure a) depicted about 10 helices, which are connected
by strands and coils. With the aid of the Ramachandran plot as a validation
tool (Figure b). The
Ramachandran plot depicts the allowed values of ψ (psi) versus
φ (phi) angles corresponding to many confirmations, for a specific
amino acid.[35] The structure is made up
of 423 residues with 389, 27, and 7 being in the favored, allowed,
and outlier regions, respectively. The upper and lower left regions
of the Ramachandran plot correspond to the β-pleated and right-handed
α-helices, while the middle of the right region indicates the
left-handed α-helices. For a superior quality protein, it is
expected that the outlier percentage should not exceed 5%, while the
favored and allowed regions should have a combined total of more than
90%.[36−38] The seven residues in the outlier make up 1.7%, while
the favored and allowed regions both constitute 98.3%; hence, the
structure is of excellent quality and hence is reliable for molecular
docking.Besides the Ramachandran plot, the alignment RMSD (root-mean-square
deviation) is calculated for only the core residues of both the homology-modeled
and template proteins (Table S1). The RMSD
value is mostly within the recommended range of 0 to 1.2 Å for
most of the core residues.[39] Nevertheless,
the quality of the modeled protein can be analyzed by several parameters[40,41] (Table ). The score
is the alignment score that falls between 0 and the domain sequence
length. A score of 0 implies that the protein is of the worst quality;
hence, the higher the score the better the quality. The score value
is 363; hence, it implies it is of good quality. The P-value indicates the relative quality of the model such that the
smaller the P-value, the better the model quality.
A P-value that is less than 0.0001 is considered
a model of good quality. The P-value of the modeled
protein is 3.80 × 10–13, which implies a good
quality model. The uGDT (Global Distance Test) measures the absolute
model quality. For a protein with more than 100 residues, such as
the ATP sulfurylase of D. postgatei (423 residues), a uGDT value that is greater than 50 indicates a
good quality model,[42] which is correct
for the homology model (309). The uSeqID is the number of identical
residues in the alignment. A higher uSeqID value implies a better
model, which is also correct for this model (157).
Table 2
Model Parameters of the Homology-Modeled
Structure of ATP Sulfurylase of D. postgatei
parameter
value
score
363
P-value
3.80 × 10–13
uGDT
309
uSeqID
157
Docking Simulations
Using the vote
rank approach, the consensus scoring (Figures and 4) from the two
virtual screenings (molecular docking and machine learning) resulted
in the selection of three compounds as hits, namely, MCULE-798447760801
(A), MCULE-123221368002 (B), and MCULE-938606270701 (C), which have
the following IUPAC names: (S,E)-1-(4-methoxyphenyl)-3-(9-((m-tolylimino)methyl)-9,10-dihydroanthracen-9-yl)pyrrolidine-2,5-dione;
methyl 2-[[(1S)-5-cyano-2-imino-1-(4-phenylthiazol-2-yl)-3-azaspiro[5.5]undec-4-en-4-yl]sulfanyl]acetate;
and (4S)-4-(3-chloro-4-hydroxy-phenyl)-1-(6-hydroxypyridazin-3-yl)-3-methyl-4,5-dihydropyrazolo[3,4-b]pyridin-6-ol.
These compounds would henceforth be referred to as compounds A, B,
and C (Figure ). Compound
A was the 5th and 21st top-scored compound for AutoDock Vina and KDEEP, respectively, while compound B corresponds
to the 11th and 3rd top-scored compound for AutoDock Vina and KDEEP, respectively. Compound C was the 17th
and 20th top-scored compound for AutoDock Vina and KDEEP, respectively. All the selected compounds had good
binding free energies from the molecular docking score that corresponds
to −8.7, −8.4, and −8.2 kcal/mol for compounds
A, B, and C, respectively. The negative values of the docking score
implied that they all bound to the target protein with compound A
having the highest binding energy. The pKd values from KDEEP were 6.56, 7.03, and
6.58 for compounds A, B, and C, respectively, with compound B having
the highest pKd value (Tables S1 and S2, Supporting Information).
Figure 3
Top 5% normalized scored
compounds from AutoDock Vina (molecular
docking).
Figure 4
Top 5% normalized scored compounds from KDEEP (machine learning).
Figure 5
Selected
compounds A, B, and C from consensus scoring of two different
virtual screening methods.
Top 5% normalized scored
compounds from AutoDock Vina (molecular
docking).Top 5% normalized scored compounds from KDEEP (machine learning).Selected
compounds A, B, and C from consensus scoring of two different
virtual screening methods.
Binding Modes and Molecular Interactions of
Selected Compounds
The molecular docking provided insight
into the mode of binding and the molecular interactions of the selected
compounds in the protein target. Compound A showed many favorable
molecular interactions with the amino residues in the binding pocket
of the target protein (Figure ). They include hydrogen bonding with GLU287 and 212; alkyl
interactions with ALA310, LEU318, and TRP347; sulfur interaction with
MET280; pi-anion interaction with ASP308; and pi-pi interactions with
HIS221 and HIS309. Compound B also had the favorable interactions.
However, unlike compound A, it had one unfavorable interaction with
ALA310, which is due to the donor-donor interaction by the two hydrogen
atoms (Figure ). The
favorable interactions include hydrogen bonding with ARG307, GLN212,
and HIS309; sulfur interaction with HIS291; pi-pi interaction with
TRP347; and alkyl interaction with MET280 and TYR282. Like compound
B, compound C also has an unfavorable interaction due to the donor–donor
interaction between an oxygen atom and nitrogen atom from compound
C and ARG214, respectively (Figure ). However, its favorable
interactions include van der Waals interaction with LEU213; hydrogen
bonding interactions with ASP308, ARG307, and TRP347; pi-sigma interaction
with LEU224; pi-amide interaction with GLN212; and alkyl interactions
with MET211 and HIS309. The high number of favorable interactions
and the absence of unfavorable interaction in compound A is possibly
responsible for its highest binding energy (−8.7 kcal/mol)
compared to the other selected hits. The presence of GLN212, HIS309,
and TRP347 in all the molecular interactions of all three compounds
with the protein confirms the same binding pocket. To further validate
the homology model, the active sites of both the homology model (D. postgatei) and the template protein (A. vinosum) were compared with the aid of an active
site identification tool, AADS.[43] Though
it is important to note that the two proteins are of different species,
and they do not have the same number of residues; that is, the homology
model contains 423 residues, while the template protein has 396 residues.
Nevertheless, there are similarities. All the amino residues present
in all the binding pockets of the amino residues were also found in
the active site identification tool (GLN212, HIS309, and TRP347).
Hence, confirming that the binding pocket used was the true binding
site. The amino residues ARG201 and ARG214; HIS208 and HIS218; and
LEU233 and LEU244 were found at the same position of both the template
protein and homology-modeled protein, respectively (Table S4), further validating the structure of the homology
model (Figure ).
Figure 6
(a) Binding
mode and (b) molecular interactions of compound A.
Figure 7
(a) Binding mode and (b) molecular interactions of compound B.
Figure 9
Toxicity alerts are in red font for compounds
A and B.
Figure 8
(a) Binding mode and (b) molecular interactions of compound
C.
(a) Binding
mode and (b) molecular interactions of compound A.(a) Binding mode and (b) molecular interactions of compound B.(a) Binding mode and (b) molecular interactions of compound
C.Toxicity alerts are in red font for compounds
A and B.
Physicochemical
and Toxicological Properties
of Selected Compounds
The environmental safety of the use
of any of the selected hits is quite important as a compound may be
quite effective but due to its toxicological properties may be quite
unsafe. In silico methods have been predominantly used in predicting
the physicochemical and toxicological properties in the early stages
of discovering potential inhibitors as it is much faster and safer
and avoids unnecessary deaths of many animals that may be used for
tests.[29,44−48] Moreover, their physicochemical properties (Table ) that were determined
using MCULE[49] are often used for gaining
insight into their absorption, distribution, and properties. The renowned
Lipinski’s RO5 (molar mass and Log P values
should not exceed 500 g/mol and 5, respectively) means that the number
of hydrogen bond acceptors (HBA) and hydrogen bond donors (HBD) must
not be greater than 10 and 5, respectively; hence, the name RO5 is
due to the multiples of five being the maximum limit for the rules.[50] Only compound C did not violate any of the rules;
however, both compounds A and B had Log P values
greater than 5. The refractivity, which contributes to the absorption
and distribution properties of the compound, is expected to be between
40 and 130[51] of which only compound A exceeded
a value of 151.04. Another property, which contributes to the absorption
and distribution of a molecule, is the polar surface area (PSA) that
is expected to be less than 140 Å2[52] of which compound B exceeded with a value of 152.4. The
solubility of the selected hits was predicted using admetsar.[53] For a compound to be soluble, it is expected
to be between −1 and −5[54] of which all the selected hits qualify. Hence, it can be concluded
that only compound C has the most suitable physicochemical properties
required among the selected hits.
Table 3
Physicochemical Properties
of the
Selected Hits
property
A
B
C
mass
498.57
452.59
371.78
Log P
6.02
5.49
2.59
HBA
5
6
8
HBD
0
2
3
PSA
58.97
152.4
116.65
RO5 violations
1
1
0
refractivity
151.04
128.98
99.43
atoms
64
55
40
rings
7
4
4
heavy atoms
38
31
26
hydrogen
atoms
26
24
14
chiral centers
4
1
1
solubility
–4.01
–3.67
–3.18
The toxicity properties (Table ) were predicted using pkCSM webtool,[55] the toxicity checker,[49] and
a DL-AOT (deep learning-acute oral toxicity) prediction server.[56] The AMES toxicity refers to the mutagenicity
of the molecule, while the maximum recommended tolerated dose (MRTD)
signifies an estimation of the toxic dose threshold in humans. It
is recommended to be less than or equal to 0.477 log mg/kg/day. Fortunately,
all the selected hits are safe with regard to mutagenicity and MRTD.
The skin sensitization refers to the hazard involved if the compound
is applied dermally. The results show that all the selected hits are
safe. The LD50 (lethal dose) value for the oral rat acute
toxicity considers the toxic potency of a compound. It is the amount
of a compound that if given at once would cause the death of 50% of
a group of test animals. The DL-AOT webtool predicts the LD50 values in four categories, namely, (1) danger/poison, (2) warning,
(3) caution, and (4) none required/safe. The values for compounds
A, B, and C correspond to the warning, caution, and safe, respectively.
The minnow toxicity refers to the LC50 (lethal concentration)
value, which is the concentration of a compound required to cause
the death of 50% of a group of flathead minnows. A value below 0.5
log mM is regarded as high acute toxicity. Only compound C was predicted
to be safe with a value of 2.014. The toxicity checker[49] tool was used to deduce the toxicity alerts
in the selected hits. However, it depicted that the N—C=O
group in the imidazole ring and the methoxy group are the toxic alerts
for compound A (Figure ), while the ester group is the toxic alert for compound B.
Consequently, compound A could be modified using the toxic alerts
as a guide. The oxygen in the methoxy group could be replaced with
a nitrogen atom making it a tertiary amine. Also, the two carbonyl
groups attached to the imidazole ring should be replaced with any
halogen atom. Nevertheless, it showed no alerts for compound C confirming
it as a safe compound.
Table 4
Toxicological Properties
of the Selected
Hits
properties
A
B
C
units
AMES toxicity
No
No
No
categorical (Yes/No)
max. tolerated dose
(human)
0.302
–0.573
–0.151
numeric (log mg/kg/day)
skin sensitization
No
No
No
categorical
(Yes/No)
oral rat acute toxicity
(LD50)
2.77
2.97
3.16
log(mg/kg)
minnow toxicity
–2.262
–0.987
2.014
numeric (log mM)
Conclusions
The use of virtual screening
and applying a consensus scoring method,
which combines both molecular docking and machine learning, helped
in selecting three compounds as hits from a database of about 40 million
compounds. The molecular docking results showed that all the compounds
have negative binding energies with compound A having the highest
docking score. The molecular interactions revealed that the high binding
affinity observed in compound A is most likely due to the high number
of favorable interactions and the absence of unfavorable interactions,
which occurred in compounds B and C. However, based on the physicochemical
and toxicological properties, compound C is the best choice, as it
does not violate any of the Lipinski’s RO5 and other recommended
properties that relate to absorption and distribution properties.
Moreover, compound C was predicted to be the only safe compound with
respect to LD50 and LC50 values for both the
acute toxicity for both rats and flathead minnows. Nevertheless, compound
C can be modified to increase its binding affinity by replacing the
hydroxyl group, which is a hydrogen bond donor and is responsible
for the unfavorable interaction, with a carbonyl group that is a hydrogen
bond acceptor and hence would form a favorable interaction with ARG214.
Nevertheless, whatever modification(s) are made to compound C it is
important to take into consideration the effect it would have on its
physicochemical and toxicological properties. Summarily, when modifying
selected hit compounds, there is often a trade-off between increasing
binding affinity and acquiring good physicochemical and toxicological
properties. It is recommended that compound C should be validated
using molecular dynamics and in vitro experiments to confirm its inhibitory
activity against Desulfobacter postgatei (a prominent sulfate-reducing bacteria). A positive result would
reduce the level of sulfur produced in oil and gas wells, which will
subsequently prevent formation of iron sulfide scales such as pyrite.
These findings will help in directing future experimental research
on this subject. However, an important point to note is the stability
and compatibility of these potential inhibitors with other existing
chemicals, which are already in use for field applications in the
industry. This could pose as a challenge if the reactivity of these
compounds may lead to production of harmful substances when added
with other chemicals such as chelating agents used for field applications.
A possible way of solving such a challenge if it occurs is to further
modify this compound based on the structural activity relationships
such that it is able to carry out both activities of being an inhibitor
of SRB and a scale remover. The lesser the number of chemicals are
used in the field, the better, provided that such chemicals are multipurpose
in providing a safer and more economical solution.
Methods
Target Protein Preparation
Selecting
a protein target is the first and most crucial step in structure-based
virtual screening. Unfortunately, as there was no structure of ATP
sulfurylase in the protein databank (PDB), hence the structure was
designed using homology modeling with the aid of the Raptor program.[40,41] The sequence of the target protein was taken from the National Center
for Biotechnology Information (NCBI) GenBank database (https://www.ncbi.nlm.nih.gov/protein) with the accession number: EIM64703.[57,58] As homology models are dependent on the quality of an existing crystal
structure,[59] the model was based on the
template of the crystal structure of ATP sulfurylase of Allochromatium vinosum (PDB ID: 4DNX)[60,61] with a resolution of 1.6 Å, which implies that the resolution
is high as electron density maps derived from the X-ray crystallographic
data of a protein structure with a resolution of at most 2.0 Å
are deemed high-resolution structures.[62]
Virtual Screening
A blind docking
process was employed for the virtual screening as the structure is
based on a homology model. This process involves considering the whole
of the protein during the molecular docking process to find potential
binding sites and is used when a binding site is unknown.[63−65] The PyRx software[66] was used to determine
the binding parameters using the maximize option, which enclosed the
entire protein molecule, since the binding site is not known and blind
docking is being employed. This resulted in a binding site center
of 2.0027, −1.9107, and 24.2677 for the X, Y, and Z axes, respectively. The size of
the box for the docking was 100 × 100 × 100 Å.[3] These parameters ensured that the whole structure
was covered during the virtual screening process. The homology-modeled
structure was virtually screened against 100,000 compounds randomly
chosen from the MCULE database,[49] which
consists of precisely 39,884,964 compounds at the time of this work.
To select the 100,000 compounds randomly, the basic properties option
was used. This option filters chemical compounds from the database
on the basis of the following conditions: The filtered compounds should
not violate more than one of the Lipinski’s rule of 5 (RO5);[50,67,68] the compounds should not exceed
a maximum of 10 rotatable bonds, a maximum of five halogen atoms,
a maximum of five chiral centers; and a minimum of 10 heavy atoms
and a minimum of one aromatic ring.[69] The
criteria used for filtering 100,000 compounds from the MCULE database
are based on common properties found in drug-like compounds. This
criteria of filtering ensured that the selected compounds had the
desirable physicochemical properties, which are important during absorption
and toxicity studies and have been used in our previous works on finding
potential inhibitors against the Zika and Ebola virus diseases.[70,71] The filtration is brought to a halt after finding the first 100,000
compounds from the MCULE database that met this criteria. AutoDock
Vina was used for the molecular docking of the filtered compounds
with the target protein.[72] Only the top-scored
500 compounds were kept from the virtual screening.
Consensus Scoring
Consensus scoring
is a process of combining of two or more scoring algorithms to rank
the binding affinities of ligands to a target protein. Consensus scoring
has been known to reduce the rate of false positives[73−76] while also serving as a validation tool for the binding affinities.
Hence, the use of consensus scoring has led to improved hit rates.
Machine learning has proved to be a useful tool in calculating binding
affinities in a much shorter time than molecular docking.[77,78] The KDEEP machine-learning tool, which
uses convolutional neural networks (CNN), was used for a second virtual
screening. The KDEEP gives the binding
affinity result in pKd such that the higher
the pKd value, the stronger the binding
affinity. Lately, artificial intelligence techniques, such as machine
and/or deep learning, are gaining considerable attention in the drug
discovery process. This is partly due to it being much faster than
molecular docking. In addition, having a different method rather than
another molecular docking program with a different scoring method
helps to reduce the number of false positives.[62,79] The datasets used in building KDEEP were
from PDBbind (v.2016) database[80] that contained
13,308 protein–ligand complexes, and their corresponding experimentally
determined binding affinities were collected from literature and the
Protein Data Bank (www.rscb.org).[81] The dataset was split into training
and test datasets. The CNN method was applied using the 3D descriptors
described in Jiménez et al.’s[82] study and was used to create the model. Keras and Tensorflow were
used to implement the model.[83] The model
was compared with other scoring functions, such as cyscore[84] and RF-score,[85] and
was found to perform better with respect to average correlation. KDEEP also served as an updated benchmark for
even unseen data for predicting binding affinity for protein–ligand
affinity.[86] The results of both virtual
screenings (molecular docking and KDEEP) were normalized such that values close to 1 and 0 corresponded
to the top- and low-scored values, respectively. A vote rank method
was used in the consensus scoring process to select the hits that
mutually appeared in the top 5% scored ligands from the two virtual
screenings.[52,69] That is, the top-scored 25 (5%
of 500) compounds in molecular docking (AutoDock Vina) were compared
to the top-scored 25 compounds in KDEEP. The compounds that appeared on both lists were the selected compounds
from the consensus scoring. This method is known as the rank by voting
method.[73] The workflow of the methodology
employed is depicted in Figure .
Authors: Robert D Clark; Alexander Strizhev; Joseph M Leonard; James F Blake; James B Matthew Journal: J Mol Graph Model Date: 2002-01 Impact factor: 2.518
Authors: Wim Buijs; Ibnelwaleed A Hussein; Mohamed Mahmoud; Abdulmujeeb T Onawole; Mohammed A Saad; Golibjon R Berdiyorov Journal: Ind Eng Chem Res Date: 2018-07-05 Impact factor: 3.720
Authors: Abdulmujeeb T Onawole; Kazeem O Sulaiman; Temitope U Kolapo; Fatimo O Akinde; Rukayat O Adegoke Journal: Virus Res Date: 2020-05-15 Impact factor: 3.303
Authors: Abdulmujeeb T Onawole; Ibnelwaleed A Hussein; Mohammed A Saad; Nadhem Ismail; Ali Alshami; Mustafa S Nasser Journal: Polymers (Basel) Date: 2022-06-09 Impact factor: 4.967