Kazuki Z Yamamoto¹, Nobuaki Yasuo², Masakazu Sekijima¹
¹ Department of Computer Science, Tokyo Institute of Technology, Yokohama, 226-8501, Japan
² Academy for Convergence of Materials and Informatics, Tokyo Institute of Technology, Tokyo, 152-8550, Japan
Abstract
In addition to vaccines, antiviral drugs are essential for suppressing COVID-19. Although several inhibitor candidates have been reported for the SARS-CoV-2 main protease, most are highly polar peptidomimetics with poor oral bioavailability and cell membrane permeability. Here, we conducted structure-based virtual screening and in vitro assays to obtain hit compounds belonging to a new chemical space, excluding peptidyl secondary amides. In total, 180 compounds were subjected to the primary assay at 20 μM, and nine compounds with inhibition rates of >5% were obtained. The IC50 values of six compounds were determined in dose–response experiments and were on the order of 10⁻⁴ M. Although nitro groups were enriched in the substructures of the hit compounds, they did not contribute significantly to the binding interactions in the predicted docking poses. Predicted physicochemical properties suggested good oral absorption. These new scaffolds are promising candidates for future optimization.
Caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), coronavirus disease 2019 (COVID-19) became a pandemic in 2020 and remains highly prevalent.[1] Although several effective vaccines have been developed[2−4] and are being widely administered,[5,6] the disease is far from eradicated because of poor public compliance with containment protocols, vaccine breakthrough infections, and the emergence of mutant strains.[7,8] In addition, although antiviral drugs such as remdesivir have shown some efficacy in drug repositioning studies,[9] no effective and specific SARS-CoV-2 antiviral drugs are available.
The most common strategies for identifying new anticoronaviral drugs target RNA-dependent RNA polymerase and the main protease (3-chymotrypsin-like protease).[10,11] The main protease is an enzyme that cleaves the viral polyprotein and is essential for viral replication.[12] It exhibits a glutamine-specific cleavage activity that has not been observed in human proteases,[13,14] and it is highly conserved among coronaviruses such as SARS coronavirus and Middle East respiratory syndrome coronavirus,[15] making it a suitable target for drug discovery.[16]
Here, we summarize representative main protease inhibitors reported to date. Their structural formulas, including those of the earliest and most recent inhibitors, are shown in Figure 1a. Many of these inhibitors are peptidomimetics.
Figure 1
Examples of known main protease inhibitors. (a) Structural formulas of inhibitors. (b) Binding pose of PF-07321332 in the main protease (PDB ID: 7VH8). The 3D representation is a wall-eye stereogram rendered with PyMOL; the 2D interaction diagram was generated with Molegro Virtual Docker.
N3 is a substrate-mimicking covalent inhibitor identified in a study of SARS coronavirus (SARS-CoV).[17] Acting as a Michael acceptor, it binds covalently to a cysteine residue in the active site; however, because of its high polarity, it exhibits low membrane permeability and is not effective in vivo.[18,19]
GC376 is a dipeptide-based main protease inhibitor that was originally developed for treating feline infectious peritonitis and has broad-spectrum anticoronaviral activity.[20−22]
Pfizer is currently conducting clinical trials of an oral (PF-07321332; nirmatrelvir) and an intravenous (PF-07304814; lufotrelvir) candidate inhibitor.[23−25] The optimization of PF-07321332 started from PF-00835231, the active form of PF-07304814. A nitrile group was introduced as a covalent warhead that reacts with the cysteine residue in the active site (Figure 1b).[26] After several optimization steps, the final structure of PF-07321332 emerged as a derivative that merges the pharmacophores of boceprevir[19] and PF-00835231.
GRL-2420 is a tripeptide-based inhibitor[27] that was originally identified in a study of SARS-CoV.
CVD-0013943 is an inhibitor discovered in the COVID Moonshot project,[28] an open-science challenge to fight the global pandemic.[29−31] CVD-0013943 is smaller than the peptidic inhibitors above and was shown to have low toxicity
but also low metabolic stability.[32,33]
Various other inhibitors are currently being developed by pharmaceutical companies;[34−36] however, the structural formulas of some of these inhibitors have not been disclosed.
This survey reflects the general tendency of protease inhibitors to contain amide structures:[37,38] many active compounds identified to date are highly polar amides, as shown in Figure 1a. Because the main protease functions inside virus-infected cells,[16] inhibitors must cross the cell membrane to reach it. Amide compounds sometimes show low membrane permeability because of their polarity, or are degraded by proteases,[39,40] so structural modification is necessary in some cases, particularly to achieve oral bioavailability. Identifying active nonamide compounds would therefore expand the chemical space of hit compounds and increase the success rate of novel drug discovery.
In this study, we created a subset of a screening compound library that excludes peptidyl secondary amides and searched it for nonamide hits using structure-based virtual screening (SBVS), a rational in silico method based on physicochemical simulation. We used SBVS rather than ligand-based virtual screening (LBVS) to select candidates for the assay because SBVS is better suited to identifying novel scaffolds.[41−44] The 180 compounds selected by SBVS were subjected to enzyme inhibition assays, and six active compounds were identified.
Results
In Silico Screening
The Enamine library (3 341 762 compounds) was filtered to 99 765 compounds to exclude peptidyl secondary amides. These compounds were ranked by conventional rigid docking simulation,[45] using three Protein Data Bank (PDB) structures as targets: 6M0K,[46,47] 7JKV,[27,48] and Mpro-x12073.[49] Based on the docking scores and visual inspection, 180 compounds were selected for in vitro assays. The 180 compounds are listed in Table S1 in the Supporting Information, together with the PDB ID of the docking target structure from which each selection was derived.
Figure 2 compares the properties of the 180 compounds evaluated in this study with those of two sets of known hit compounds: ChEMBL-registered compounds and COVID Moonshot compounds with submicromolar activities. The 180 compounds assayed in this study contain no peptidyl secondary amides; however, compounds containing lactams or tertiary amides were not excluded (Figure 2a). Principal component analysis (PCA) plots of each compound set based on Morgan fingerprints are shown in Figures 2b and 2c. Although the sets overlapped somewhat in chemical space, the 180 compounds assayed in this study generally occupied a new region.
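The "contribution ratio" reported for PC1 and PC2 in Figure 2 is the fraction of total variance explained by each principal component. As a dependency-free illustration of PCA on fingerprint bit vectors, the Python sketch below builds the covariance matrix explicitly and extracts the first two components by power iteration with deflation. It is a toy written for illustration: the function name pca_2d and the example vectors are hypothetical, a dense covariance matrix in pure Python is practical only for short vectors, and full-length Morgan fingerprints (for example, generated with RDKit) would normally be processed with an SVD-based PCA from a numerical library.

```python
import math
import random

def pca_2d(vectors, n_iter=500, seed=0):
    """Project vectors onto their first two principal components.

    Returns (coords, ratios): 2D coordinates for each input vector and the
    explained variance ratio ("contribution ratio") of PC1 and PC2.
    """
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - means[j] for j in range(d)] for v in vectors]
    # Sample covariance matrix (d x d)
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    total_var = sum(cov[a][a] for a in range(d))
    rng = random.Random(seed)
    components, eigvals = [], []
    for _ in range(2):
        w = [rng.random() for _ in range(d)]
        for _ in range(n_iter):  # power iteration: w <- C w / |C w|
            w = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            w = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (variance along w)
        lam = sum(w[a] * cov[a][b] * w[b] for a in range(d) for b in range(d))
        components.append(w)
        eigvals.append(lam)
        # Deflation: remove the found component before finding the next one
        cov = [[cov[a][b] - lam * w[a] * w[b] for b in range(d)] for a in range(d)]
    coords = [[sum(x[j] * c[j] for j in range(d)) for c in components]
              for x in centered]
    ratios = [lam / total_var for lam in eigvals]
    return coords, ratios

# Toy 4-bit "fingerprints" (hypothetical)
fps = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 1], [0, 0, 1, 1], [1, 0, 0, 1]]
coords, ratios = pca_2d(fps)
print(ratios)
```

With thousands of sparse bits, each component typically explains only a few percent of the variance, which is consistent with the small contribution ratios reported in the Figure 2 caption.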
Figure 2
Comparison of the compound sets. (a) Distribution of molecular weight and number of amide groups of compounds for each data source. Among the compounds showing inhibitory activity against the main protease, those with IC50 < 1 μM were extracted from ChEMBL and COVID Moonshot. (b) PCA plots of chemical space for compounds from each data source and the compounds shown in Figure 1a. Contribution ratios: PC1, 0.0548; PC2, 0.0384. (c) Enlargement of the dense part of Figure 2b.
Primary Assay
The 180 compounds selected by in silico screening were examined with an in vitro fluorescence assay. The mechanism of the assay system is shown in Figure 3a. The assay system was validated using GC376[20] as a positive control (Figure 3b). All test compounds were assayed at 20 μM. Of the 180 compounds, nine showed inhibition rates of >5% (see Table S1 in the Supporting Information).
Figure 3
In vitro assay setup and results. (a) Schematic of the assay system; the decrease in fluorescence caused by the inhibitor was measured. (b) Validation of the assay system with the dose–response curve of the positive control compound GC-376. (c) Dose–response curves of the six hit compounds whose IC50 values were determined, with the IC50 values shown alongside each graph.
Dose–Response Experiment
Dose–response experiments were conducted for the compounds whose inhibition rates in the primary assay were >5%. The dose–response curves are shown in Figure 3c. The concentrations of the compounds that reduced enzyme activity by 50% (IC50) were determined for six compounds: Z391132396, Z166626994, Z819866548, Z2094146478, Z1159100304, and Z324552662. In the counter assay, no signal interference was detected for these six compounds. Their structural formulas are shown in Figure 4a, and Figure 4b shows their positions in the PCA plot of Figure 2c.
Figure 4
Hit compounds and their positions: (a) structural formulas of the six hit compounds; (b) positions of the six hits in the PCA plot of Figure 2c; and (c) docking poses of the six hit compounds (white) and ligand x12073 (yellow) superimposed at the active site.
Redocking of the Six Hit Compounds
Because five of the six hit compounds were candidates obtained from docking against the Mpro-x12073 structure, all six hit compounds were redocked into this structure. The superimposed docking poses of the six compounds (white) are shown in Figure 4c, with ligand x12073 in yellow. All ligands except Z324552662 occupied positions that roughly overlapped with ligand x12073 in the P1–P2 pocket; Z324552662 protruded into the P1′ pocket. The nitro groups faced the outside of the cavity. The 2D diagrams of the docking poses are shown in Figure S1a in the Supporting Information.
ADME Prediction of Compounds
Absorption, distribution, metabolism, and excretion (ADME) properties of the compounds assayed in this study and of the sets of known inhibitors were predicted using SwissADME[50] (see Table S2 in the Supporting Information). The six compounds identified in this study generally satisfied Lipinski's Rule of Five and were predicted to be orally absorbable. Their structural alerts include nitro groups, whereas none of the structures corresponded to pan-assay interference compounds (PAINS; compounds prone to false positives).[51] IC50 values, structural alerts, and c log P values for each compound are shown in Table 1.
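For reference, the Rule of Five checks four thresholds (molecular weight ≤ 500, log P ≤ 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors), and Lipinski's original formulation flags poor absorption as more likely when two or more of them are violated. The Python sketch below uses these classic thresholds; SwissADME applies its own variant of the rule, and the function names and the molecular weight, donor, and acceptor values in the example are hypothetical (only the c log P of 4.066 for Z324552662 comes from Table 1).

```python
def lipinski_violations(mol_weight, logp, hbd, hba):
    """Count how many of the four Rule of Five thresholds are exceeded."""
    return sum([
        mol_weight > 500,  # molecular weight (Da)
        logp > 5,          # octanol-water partition coefficient
        hbd > 5,           # hydrogen bond donors
        hba > 10,          # hydrogen bond acceptors
    ])

def passes_rule_of_five(mol_weight, logp, hbd, hba, max_violations=1):
    """Classic interpretation: at most one violation is tolerated."""
    return lipinski_violations(mol_weight, logp, hbd, hba) <= max_violations

# Hypothetical descriptors; only logp=4.066 is taken from Table 1 (Z324552662)
print(passes_rule_of_five(mol_weight=410.0, logp=4.066, hbd=1, hba=5))
```

Because the library filter described in the Experimental Section already limits molecular weight to below 450 and acceptors to at most 7, compounds that pass it can violate the rule only through log P or donor count.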
Table 1
Properties of the Six Hit Compounds: IC50 Values, Structural Alerts, and c log P Values
ID             IC50 (μM)   Structural alerts                 c log P (ZINC db)
Z391132396     154         Nitro-Sp2                         1.95
Z166626994     222         Nitro-Sp2                         2.529
Z819866548     189         Nitro-Sp2                         1.864
Z2094146478    281         –                                 2.877
Z1159100304    273         Nitro-Sp2, Sulfone-Cyclic         1.804
Z324552662     291         Propenals, Alkene-Internal-Sp2    4.066
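To make the "order of 10⁻⁴ M" comparison concrete, the IC50 values in Table 1 can be converted to pIC50 = −log10(IC50 [M]) and to the fold improvement needed to reach 1 μM, the cutoff used for the reference sets in Figure 2a. The Python sketch below uses only the Table 1 values; the function names are illustrative.

```python
import math

# IC50 values (μM) from Table 1
IC50_UM = {
    "Z391132396": 154,
    "Z166626994": 222,
    "Z819866548": 189,
    "Z2094146478": 281,
    "Z1159100304": 273,
    "Z324552662": 291,
}

def pic50(ic50_um):
    """pIC50 = -log10(IC50 in mol/L); 1 μM = 1e-6 M."""
    return -math.log10(ic50_um * 1e-6)

def fold_to_target(ic50_um, target_um=1.0):
    """Fold improvement in potency needed to reach target_um."""
    return ic50_um / target_um

for cid, ic50 in IC50_UM.items():
    print(f"{cid}: pIC50 = {pic50(ic50):.2f}, fold to 1 uM = {fold_to_target(ic50):.0f}")
```

The resulting pIC50 values fall between about 3.5 and 3.8, and reaching 1 μM would require roughly 150- to 300-fold gains in potency.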
Discussion
Of the 180 compounds assayed, six showed main protease inhibitory activity at high concentrations with determinable IC50 values: a tertiary amide compound, a sulfonamide compound, a compound containing a lactam structure, and three compounds without an amide bond. The SwissADME predictions suggest a reasonable level of membrane permeability, consistent with the avoidance of amides in the screening process. Some hit compounds may be weakly reactive; for example, the nitro-containing compounds are weak electrophiles. Because we did not perform a counter assay using other enzymes, the target specificity of these compounds remains unknown. Twelve of the 180 compounds assayed contained nitro groups, as did 4 of the 6 hit compounds.
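To put a number on this enrichment, the Python sketch below computes the fold enrichment of nitro-containing compounds among the hits and a one-sided hypergeometric tail probability, that is, the chance that at least 4 of 6 compounds drawn at random from the 180 assayed compounds would contain a nitro group. This is an illustrative calculation under the simplifying null hypothesis that the presence of a nitro group is unrelated to activity, not a test reported in the study; the function names are hypothetical.

```python
from math import comb

N_TOTAL = 180     # compounds assayed
N_NITRO = 12      # assayed compounds containing a nitro group
N_HITS = 6        # hits with determined IC50
K_NITRO_HITS = 4  # hits containing a nitro group

def fold_enrichment(k, n, K, N):
    """Nitro fraction among hits divided by nitro fraction among all assayed compounds."""
    return (k / n) / (K / N)

def hypergeom_sf(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / total

print(fold_enrichment(K_NITRO_HITS, N_HITS, N_NITRO, N_TOTAL))
print(hypergeom_sf(K_NITRO_HITS, N_HITS, N_NITRO, N_TOTAL))
```

With these counts, nitro-containing compounds are about 10-fold enriched among the hits and the tail probability is on the order of 10⁻⁴, which supports describing them as enriched; a count-based calculation of this kind says nothing about the binding mechanism.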
Because nitro-containing compounds were enriched among the hits, the electrophilic nature of the nitro group may have conferred reactivity toward the cysteine protease. Formation of a thiohydroximate adduct in the reaction of a nitro group with an active-site cysteine residue has been reported.[52] In the fragment screening conducted prior to the COVID Moonshot project,[53] an electrophile library[54,55] oriented toward covalent binding was used. Because these electrophile libraries did not include nitro groups, they covered a different chemical space from that examined in the present study. However, the predicted docking poses (Figure S1a in the Supporting Information) do not suggest covalent interactions, because the nitro group of each hit compound is not located close to the active-site cysteine. In addition, because these nitro groups do not make strong interactions in the docking poses, they could be replaced with bioisosteres[56,57] or removed if necessary (Figure S1b in the Supporting Information). In the docking poses, the residue interactions were generally consistent with important hot spots reported previously.[58]
All hit compounds showed IC50 values on the order of
10⁻⁴ M, which is weaker than the submicromolar activities of the currently known sets of amide compounds. Depending on the case, hit-to-lead optimization can increase activity by hundreds to thousands of fold.[59] For example, the activity of an amide compound can sometimes be improved by cyclization, which restricts the dihedral freedom of the amide bond to the active conformation.[60] However, because the structure of the main protease active site fluctuates widely,[61] it may be desirable for a compound to retain some conformational freedom to accommodate these fluctuations. In contrast, compounds without a peptidyl secondary amide structure are thought to be more stable against cleavage by proteases, and structural optimization may yield a more stable and active inhibitor in vivo.[62]
Because our hit compounds do not contain a peptidyl secondary amide at the P1–P2 position (Figure 4c), their structures may be useful as reference scaffolds when amide substitution is needed to optimize other peptidomimetic inhibitors. In addition, most assay datasets reported to date for the main protease, both positive and negative, consist of compounds containing peptidyl secondary amide structures; our dataset is therefore valuable because it expands the chemical space that has been assayed. Some inhibitors are currently being evaluated in clinical trials;[23,24] however, given the possible future emergence of resistant viruses, it is important to have many candidate compounds. Increasing the number and structural diversity of hit compounds is therefore beneficial for drug discovery.
In conclusion, given their predicted physicochemical suitability as oral drugs, the new scaffolds identified in this study should contribute to the advancement of anticoronaviral drug research.
Experimental Section
Filtering of Screening Compound Library
The September
2020 version of the Enamine Collection,[63] consisting of 3 341 762 compounds, was used as the
compound library. To obtain a set of compounds without peptidyl secondary
amide bonds, the library was filtered using the following criteria:
20 ≤ heavy atom count ≤ 30
1 ≤ hydrogen bond donor (HBD) count
2 ≤ hydrogen bond acceptor (HBA) count ≤ 7
0.1 ≤ fraction of sp3 carbons (Fsp3) ≤ 0.45
2 ≤ rotatable bond (RB) count ≤ 6
2 ≤ aromatic ring count ≤ 3
1 ≤ aromatic heterocycle count
1 ≤ aliphatic ring count ≤ 2
(aromatic ring count + aliphatic ring count) ≤ 4
300 < molecular weight < 450
contains no amide bond except tertiary amide
After filtering, 99 765 compounds remained.
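The criteria above can be expressed as a single predicate over precomputed descriptors. The Python sketch below assumes each compound is a dict with hypothetical keys (for example "heavy_atoms" and "has_non_tertiary_amide") computed beforehand with a cheminformatics toolkit; descriptor definitions, such as how acceptors or aromatic heterocycles are counted, differ between toolkits, so this illustrates the logic rather than the exact pipeline used.

```python
# Inclusive (lo, hi) bounds; None means no bound on that side.
RANGES = {
    "heavy_atoms": (20, 30),
    "hbd": (1, None),
    "hba": (2, 7),
    "fsp3": (0.1, 0.45),
    "rotatable_bonds": (2, 6),
    "aromatic_rings": (2, 3),
    "aromatic_heterocycles": (1, None),
    "aliphatic_rings": (1, 2),
}

def passes_filter(d):
    """Return True if the descriptor dict d satisfies every filtering criterion."""
    for key, (lo, hi) in RANGES.items():
        value = d[key]
        if lo is not None and value < lo:
            return False
        if hi is not None and value > hi:
            return False
    if d["aromatic_rings"] + d["aliphatic_rings"] > 4:
        return False
    if not (300 < d["mol_weight"] < 450):  # strict inequalities, as in the criteria
        return False
    # Only tertiary amides are allowed; any primary or secondary amide excludes the compound.
    return not d["has_non_tertiary_amide"]

# Hypothetical compound that satisfies every criterion
example = {
    "heavy_atoms": 25, "hbd": 1, "hba": 5, "fsp3": 0.3, "rotatable_bonds": 4,
    "aromatic_rings": 2, "aromatic_heterocycles": 1, "aliphatic_rings": 1,
    "mol_weight": 350.4, "has_non_tertiary_amide": False,
}
print(passes_filter(example))
```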
Structure-Based Virtual Screening
Compound Conformer Generation
After filtering as described above, conformers were generated for the 99 765 remaining compounds using GYPSUM-DL software (version 1.1.7)[64] with the execution options --max_variants_per_compound 1 and --use_durrant_lab_filters. The resulting structures were saved as SDF files.
Protein Model Preparation
The protein models for docking
simulation were prepared using Molegro Virtual Docker (version 7.0.0).[45] The source PDB structures were 6M0K,[47] 7JKV,[48] and Mpro-x12073[49] (COVID Moonshot project[31]). The models were prepared using the Protein Preparation Wizard in Molegro Virtual Docker (default settings).
Compound Selection by Docking Simulation
Using the compound conformers and protein models, docking simulations were performed with Molegro Virtual Docker.[45] The search space was set as an 8 Å sphere centered at the active site. Docking was performed via the Docking Wizard in Molegro Virtual Docker (scoring function: PLANTS score; algorithm: MolDock SE; after docking: energy minimization and H-bond optimization enabled). The compounds were ranked by the Rerank score (a linear combination of steric, van der Waals, hydrogen bonding, and electrostatic interaction terms) and by the LE3 score (Rerank score divided by heavy atom count). A 2D scatter plot of Rerank score versus LE3 was drawn, and we manually chose compounds that were outliers with better (lower) values in the distribution. High-scoring compounds were further assessed by visual inspection for key hydrogen bonds and shape complementarity. Briefly, we selected docking poses that were anchored in the active-site cavity by hydrogen bonds at both ends of the compound or at three or more points. Docking poses whose geometric centers were too close to the cavity walls, or that filled the cavity poorly, were avoided, as were compounds with unfavorable torsions in the docked conformer. Finally, 180 compounds were chosen for in vitro analysis (Table S1 in the Supporting Information).
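The score-based part of this selection can be sketched in Python as follows. LE3 is computed as defined above (Rerank score divided by heavy atom count). As a hypothetical, automated stand-in for the manual choice of outliers on the Rerank–LE3 scatter plot, the sketch keeps compounds on the Pareto front, i.e., those for which no other compound is at least as good (lower) on both axes and strictly better on one. The class and function names, compound names, and scores are invented for illustration.

```python
from typing import NamedTuple

class Pose(NamedTuple):
    name: str
    rerank: float     # Rerank score; lower is better
    heavy_atoms: int

def le3(p: Pose) -> float:
    """LE3 as defined in the text: Rerank score divided by heavy atom count."""
    return p.rerank / p.heavy_atoms

def dominates(q: Pose, p: Pose) -> bool:
    """q dominates p if it is no worse on both scores and strictly better on one."""
    no_worse = q.rerank <= p.rerank and le3(q) <= le3(p)
    better = q.rerank < p.rerank or le3(q) < le3(p)
    return no_worse and better

def pareto_front(poses):
    """Return the poses that no other pose dominates."""
    return [p for p in poses if not any(dominates(q, p) for q in poses)]

poses = [
    Pose("A", -120.0, 30),  # best Rerank
    Pose("B", -100.0, 20),  # best LE3
    Pose("C", -90.0, 25),   # dominated by B
    Pose("D", -110.0, 28),  # dominated by A
]
print([p.name for p in pareto_front(poses)])
```

In the study, the outliers were chosen by eye and then filtered by visual inspection of the poses; a Pareto filter only reproduces the first, score-based step.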
3CL Protease In Vitro Fluorescence Assay
Briefly, SARS-CoV-2
3CL protease (0.6 ng/mL) was incubated with fluorogenic 3CL substrate
(40 μM) and test compound (20 μM) in 25 μL of assay
buffer for 4 h at 25 °C. This experiment was performed by Bienta
(Biology Service Division of Enamine Ltd., Kyiv, Ukraine).
Materials
SARS-CoV-2 3CL protease (untagged) (Catalog
No. 100823), internally quenched fluorogenic (FRET) 3CL protease substrate
(Catalog No. 79952), 3CL protease assay buffer (Catalog No. 79956),
and reference protease inhibitor GC376 (Catalog No. 78013) were purchased
from BPS Bioscience (San Diego, CA, USA). In addition, 384-well low-volume black polystyrene microplates with nonbinding surfaces (Item No. 4514) were obtained from Corning (Corning, NY, USA).
Primary Assay Protocol
In each 384-well plate, 150 nL of each compound was placed in columns 3–22, and dimethyl sulfoxide (DMSO) was added to columns 1–2, except for wells A1–A2 and B1–B2, which were filled with the reference protease inhibitor GC-376 at its IC50. Columns 23 and 24 were filled with 150 nL of the reference compound GC-376 at a saturating concentration of 3.7 μM. Next, 7.5 μL of 3CL protease 2× solution (4.5 μg/mL in 1× assay buffer with 1 mM DTT) was added to all wells using a Multidrop Combi Reagent Dispenser (Thermo Fisher Scientific, Waltham, MA, USA). The enzyme was preincubated with the compounds for 30 min at room temperature (25 °C) with slow shaking. Then, 7.5 μL of 3CL substrate 2× solution (30 μM in assay buffer with 1 mM DTT) was added to each well using the Multidrop Combi Reagent Dispenser. The final concentration of test compounds was 20 μM. The plate was incubated for 20 min at room temperature, and fluorescence (excitation 360 nm, emission 460 nm) was read on a Paradigm reader (Molecular Devices, Sunnyvale, CA, USA).
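The final in-well concentrations follow from the dispensed volumes (150 nL of compound plus 7.5 μL each of the 2× enzyme and 2× substrate solutions, 15.15 μL in total), using C_final = C_stock × V_added / V_total. The Python sketch below performs this arithmetic. Two quantities are derived here rather than stated in the protocol and should be read as assumptions: the DMSO fraction assumes compounds are dispensed in neat DMSO, and the compound stock concentration is the value implied by the stated 20 μM final concentration.

```python
# Volumes dispensed per well (from the protocol), in μL
compound_ul = 150 / 1000   # 150 nL of test compound
enzyme_ul = 7.5            # 3CL protease 2x solution
substrate_ul = 7.5         # 3CL substrate 2x solution
total_ul = compound_ul + enzyme_ul + substrate_ul

def final_conc(stock_conc, added_ul, total_ul=total_ul):
    """Dilution: C_final = C_stock * V_added / V_total (units of stock_conc kept)."""
    return stock_conc * added_ul / total_ul

enzyme_final_ug_ml = final_conc(4.5, enzyme_ul)       # from 4.5 μg/mL
substrate_final_um = final_conc(30.0, substrate_ul)   # from 30 μM
dmso_percent = 100 * compound_ul / total_ul           # assumes neat-DMSO compound solutions
# Stock concentration implied by a 20 μM final compound concentration (derived)
compound_stock_um = 20.0 * total_ul / compound_ul

print(f"total {total_ul:.2f} uL; enzyme {enzyme_final_ug_ml:.2f} ug/mL; "
      f"substrate {substrate_final_um:.2f} uM; DMSO {dmso_percent:.2f}%; "
      f"compound stock {compound_stock_um:.0f} uM")
```

Because the 150 nL compound addition slightly dilutes the two 2× solutions, their final concentrations end up just under 1× (about 2.23 μg/mL enzyme and 14.85 μM substrate).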
Data Analysis of Primary Assay
Each high-throughput screening plate contained test compounds in columns 3–22 (one compound per well), controls (enzyme, no compound) in columns 1 and 2, and blanks (saturating concentration of the reference compound GC-376) in columns 23 and 24. The high-throughput screening percent inhibition was calculated for each compound from its signal in fluorescence units, the mean of the plate controls, and the mean of the plate blanks using the following equation:
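A common form of this normalization, given here as an assumption (algebraically equivalent forms exist), is % inhibition = 100 × (mean_controls − F) / (mean_controls − mean_blanks), where F is the signal of a test well. The Python sketch below applies it to a plate represented as a hypothetical dict mapping (row, column) to fluorescence units and then applies the >5% hit threshold used in the primary assay. Excluding wells A1–A2 and B1–B2, which received GC-376 at its IC50, from the control mean is also an assumption based on the plate layout described above; the function names are illustrative.

```python
from statistics import mean

ROWS = "ABCDEFGHIJKLMNOP"  # 16 rows x 24 columns = 384 wells
GC376_IC50_WELLS = {("A", 1), ("A", 2), ("B", 1), ("B", 2)}

def percent_inhibition(signal, mu_controls, mu_blanks):
    """0% at the enzyme-only control level, 100% at the fully inhibited blank level."""
    return 100.0 * (mu_controls - signal) / (mu_controls - mu_blanks)

def analyze_plate(plate, threshold=5.0):
    """plate: {(row, col): fluorescence}. Returns {(row, col): % inhibition} for hit wells."""
    controls = [v for (r, c), v in plate.items()
                if c in (1, 2) and (r, c) not in GC376_IC50_WELLS]
    blanks = [v for (r, c), v in plate.items() if c in (23, 24)]
    mu_c, mu_b = mean(controls), mean(blanks)
    hits = {}
    for (r, c), v in plate.items():
        if 3 <= c <= 22:  # test-compound columns
            inhibition = percent_inhibition(v, mu_c, mu_b)
            if inhibition > threshold:
                hits[(r, c)] = inhibition
    return hits
```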
Dose–Response Analysis
Using the same protocol as in the primary assay, the six compounds were titrated in 3-fold serial dilutions to give 8-point curves starting at 900 μM (n = 4). The dose–response data were analyzed in GraphPad Prism 8.0.2 software (GraphPad, Inc., La Jolla, CA, USA), and the IC50 values were determined by curve fitting.
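The dilution scheme and a simplified curve fit can be sketched in Python as follows. The concentrations follow directly from the text (3-fold steps, 8 points, top concentration 900 μM). For the fit, the sketch uses a four-parameter logistic (4PL) model with the bottom and top plateaus fixed at 0% and 100% inhibition and finds log IC50 and the Hill slope by a coarse grid search. This is a stand-in for, not a reproduction of, the nonlinear least-squares fitting performed in GraphPad Prism; the function names and the noise-free synthetic data are invented.

```python
import math

def dilution_series(top_um=900.0, factor=3.0, points=8):
    """Concentrations (μM) of a serial dilution starting at top_um."""
    return [top_um / factor ** i for i in range(points)]

def four_pl(conc_um, bottom, top, log_ic50, hill):
    """Four-parameter logistic: % inhibition rising with log10 concentration."""
    return bottom + (top - bottom) / (1.0 + 10 ** ((log_ic50 - math.log10(conc_um)) * hill))

def fit_ic50_grid(concs_um, responses, bottom=0.0, top=100.0):
    """Grid search over log10(IC50/μM) in [-1, 3] (step 0.01) and Hill slope
    in [0.1, 3.0] (step 0.1) with fixed plateaus; returns (IC50 in μM, Hill)."""
    best = None
    for i in range(401):
        log_ic50 = -1.0 + 0.01 * i
        for j in range(1, 31):
            hill = 0.1 * j
            sse = sum((r - four_pl(c, bottom, top, log_ic50, hill)) ** 2
                      for c, r in zip(concs_um, responses))
            if best is None or sse < best[0]:
                best = (sse, log_ic50, hill)
    return 10 ** best[1], best[2]

# Synthetic, noise-free data generated from IC50 = 150 μM and Hill = 1
concs = dilution_series()
data = [four_pl(c, 0.0, 100.0, math.log10(150.0), 1.0) for c in concs]
ic50, hill = fit_ic50_grid(concs, data)
print(f"IC50 ~ {ic50:.0f} uM, Hill ~ {hill:.1f}")
```

With the reported IC50 values of 154 to 291 μM, the 900 μM top concentration is only about 3 to 6 times the IC50, so the upper plateau is unlikely to be well defined by the data; constraining it, as in this sketch, is one common way to stabilize such fits.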