Son Tung Ngo1,2, Nguyen Minh Tam2,3, Minh Quan Pham4,5, Trung Hai Nguyen1,2. 1. Laboratory of Theoretical and Computational Biophysics, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam. 2. Faculty of Applied Sciences, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam. 3. Computional Chemistry Research Group, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam. 4. Graduate University of Science and Technology, Vietnam Academy of Science and Technology, Hanoi 100000, Vietnam. 5. Institute of Natural Products Chemistry, Vietnam Academy of Science and Technology, Hanoi 100000, Vietnam.
Abstract
The COVID-19 pandemic has killed millions of people worldwide since its outbreak in December 2019. The pandemic is caused by the SARS-CoV-2 virus whose main protease (Mpro) is a promising drug target since it plays a key role in viral proliferation and replication. Currently, developing an effective therapy is an urgent task, which requires accurately estimating the ligand-binding free energy to SARS-CoV-2 Mpro. However, it should be noted that the accuracy of a free energy method probably depends on the protein target. A highly accurate approach for some targets may fail to produce a reasonable correlation with the experiment when a novel enzyme is considered as a drug target. Therefore, in this context, the ligand-binding affinity to SARS-CoV-2 Mpro was calculated via various approaches. The molecular docking approach was manipulated using Autodock Vina (Vina) and Autodock4 (AD4) protocols to preliminarily investigate the ligand-binding affinity and pose to SARS-CoV-2 Mpro. The binding free energy was then refined using the fast pulling of ligand (FPL), linear interaction energy (LIE), molecular mechanics-Poisson-Boltzmann surface area (MM-PBSA), and free energy perturbation (FEP) methods. The benchmark results indicated that for docking calculations, Vina is more accurate than AD4, and for free energy methods, FEP is the most accurate method, followed by LIE, FPL, and MM-PBSA (FEP > LIE ≈ FPL > MM-PBSA). Moreover, atomistic simulations revealed that the van der Waals interaction is the dominant factor. The residues Thr26, His41, Ser46, Asn142, Gly143, Cys145, His164, Glu166, and Gln189 are essential elements affecting the binding process. Our benchmark provides guidelines for further investigations using computational approaches.
The COVID-19 pandemic has killed millions of people worldwide since its outbreak in December 2019. The pandemic is caused by the SARS-CoV-2 virus whose main protease (Mpro) is a promising drug target since it plays a key role in viral proliferation and replication. Currently, developing an effective therapy is an urgent task, which requires accurately estimating the ligand-binding free energy to SARS-CoV-2Mpro. However, it should be noted that the accuracy of a free energy method probably depends on the protein target. A highly accurate approach for some targets may fail to produce a reasonable correlation with the experiment when a novel enzyme is considered as a drug target. Therefore, in this context, the ligand-binding affinity to SARS-CoV-2Mpro was calculated via various approaches. The molecular docking approach was manipulated using Autodock Vina (Vina) and Autodock4 (AD4) protocols to preliminarily investigate the ligand-binding affinity and pose to SARS-CoV-2Mpro. The binding free energy was then refined using the fast pulling of ligand (FPL), linear interaction energy (LIE), molecular mechanics-Poisson-Boltzmann surface area (MM-PBSA), and free energy perturbation (FEP) methods. The benchmark results indicated that for docking calculations, Vina is more accurate than AD4, and for free energy methods, FEP is the most accurate method, followed by LIE, FPL, and MM-PBSA (FEP > LIE ≈ FPL > MM-PBSA). Moreover, atomistic simulations revealed that the van der Waals interaction is the dominant factor. The residues Thr26, His41, Ser46, Asn142, Gly143, Cys145, His164, Glu166, and Gln189 are essential elements affecting the binding process. Our benchmark provides guidelines for further investigations using computational approaches.
SARS-CoV-2, a novel coronavirus, causes severe acute respiratory syndromes and is
responsible for millions of deaths worldwide since its first outbreak in December 2019 in
Wuhan, Hubei Province, China.[1−4] The virus has been thought to originate from bats and can quickly
transfect between humans.[5] The spreading speed is high since it is able
to exist in aerosol especially.[6] Despite the efforts to limit the spread
of the virus, more than 80 million people were infected within a year. The outbreak of the
virus effectuates the COVID-19 pandemic. Therefore, the development of an effective therapy
is thus much more urgent for community health. Although remdesivir was first approved as the
anti-viral drug for treating COVID-19,[7] it is probably considered a
controversial decision[8] since the drug showed disappointing
trials.[9,10] Searching
an appropriate treatment for COVID-19 is accordingly a matter of great urgency.The coronavirus genome sequence, encoding more than 20 different proteins, is known as the
largest RNA virus sequence, which is approximately 26–32 kb in
length.[11,12] The
SARS-CoV-2 virus forms >82% homologous RNA genomes to SARS-CoV.[1] The
SARS-CoV-2main protease (Mpro), being more than 96% identical to SARS-CoVMpro, is one of
the most pivotal proteins because of its direct involvement with viral replication and
proliferation.[12,13]
In particular, SARS-CoV-2Mpro splits 11 polyproteins into polypeptides, which are used for
replication and to encapsulate a new virus.[12] Several probes in both
computational and experimental studies were used to develop an effective therapy and
preliminary results were acquired;[14−18]
however, as mentioned above, it was a controversial decision,[8] and
moreover, an effective drug inhibiting SARS-CoV-2Mpro is still unattainable.Currently, computer-aided drug design (CADD) is routinely employed to rapidly screen
probable inhibitors for preventing the biological function of a specific
enzyme.[19,20] The time
and cost for the development of a therapy thus decrease. In particular, the Gibbs free
energy difference of the noncovalent chemical reaction between SARS-CoV-2Mpro and its
inhibitors, ΔG, can commonly be computed using molecular dynamics
(MD) simulations because it connects with the inhibition constant,
Ki, an important metric disclosing the binding process among
biomolecules. Reliable calculation of ΔG is one of the most pivotal
issues in CADD.[21−23] Therefore, numerous
approaches have been developed to resolve the problem.[23] In order to
screen a large number of candidates, which can be up to several million compounds, the
computational probe is usually operated via two steps: initial screening of
thousands/millions of compounds via rapid protocols such as quantitative
structure–activity relationship,[24] molecular docking,[25] and machine learning;[26] ΔG was
then realized using MD simulations, in which the popular free energy estimation approaches
include the linear interaction energy (LIE),[27] fast pulling of ligand
(FPL),[28] molecular mechanism-Poisson–Boltzmann (generalized
Born) surface area (MM-PB(GB)SA),[29,30] nonequilibrium molecular dynamics (NEMD),[31]
thermodynamic integration,[32] free energy perturbation (FEP)
approaches,[33] and so forth. However, it should be noted that the
precision and accuracy of the ligand-binding affinity approaches somehow depend on the
enzyme targets.[34−40]
Therefore, in this work, we benchmarked the performance of docking protocols involving
Vina[25] and AD4[41] applying on the SARS-CoV-2 target.
Consequently, MD simulations were then performed to investigate the dynamics of the
SARS-CoV-2 + inhibitor complexes. The relaxed complexes were then used as initial
conformations for probing ligand-binding affinity using four free energy schemes including
FPL,[28] LIE,[27] molecular
mechanics-Poisson–Boltzmann surface area (MM-PBSA),[32,33] and FEP.[33] The obtained
observations probably guide for further investigations using computational approaches.
Materials and Methods
Structure of the Receptor and Ligands
The X-ray diffraction structure of SARS-CoV-2Mpro was obtained from the Protein Data
Bank (PDB) with the PDB ID 7JYC.[42] The structure of 34 ligands was taken from the
PubChem database[43] referring to the previous work,[44−51]
and their 2D structure is reported in Table S1 in the Supporting Information.
Molecular Docking Simulations
Vina[25] and AD4[41] were manipulated to dock the
available inhibitors to the binding cleft of SARS-CoV-2Mpro. The binding cleft was
selected as the binding region of the compound named narlaprevir.[42] In
particular, the grid size for docking was chosen as 24 × 24 × 24 Å
according to the previous work.[16,17] The modeling of docking simulations is described in Figure A.
Figure 1
Computational scheme via molecular docking, steered molecular dynamics (SMD), and MD
simulations. (A) Modeling of molecular docking simulations. Inhibitors were docked to
the binding cleft of SARS-CoV-2 Mpro, which was limited in the docking box with a
volume of ca. 13.82 nm3. (B) Starting structure of the SARS-CoV-2 Mpro +
inhibitor complex for estimating the ligand-binding affinity via the FPL scheme. (C)
Initial shape of the SARS-CoV-2 Mpro + inhibitor complex for MD simulations. (D)
Starting conformation of the solvated inhibitor. The cyan balls describe the
neutralized Na+ ions.
Computational scheme via molecular docking, steered molecular dynamics (SMD), and MD
simulations. (A) Modeling of molecular docking simulations. Inhibitors were docked to
the binding cleft of SARS-CoV-2Mpro, which was limited in the docking box with a
volume of ca. 13.82 nm3. (B) Starting structure of the SARS-CoV-2Mpro +
inhibitor complex for estimating the ligand-binding affinity via the FPL scheme. (C)
Initial shape of the SARS-CoV-2Mpro + inhibitor complex for MD simulations. (D)
Starting conformation of the solvated inhibitor. The cyan balls describe the
neutralized Na+ ions.
Autodock Vina (Vina)[25]
The performance of Vina depends on the parameter’s exhaustiveness, which was
chosen to be 8.[34,52]
The largest energy difference between docking shapes was set to 7 kcal
mol–1. The largest ligand-binding affinity was selected as the best
docking structure.
Autodock4 (AD4)[41]
AD4 was used with a grid size of 72 × 72 × 72 Å and with a spacing of
0.333 Å. Autogrid4 was selected to perform the docking. The inhibitor was docked to
SARS-CoV-2Mpro with a genetic algorithm (GA) run of 10, a population size of 150, and a
number of 27 000 generations. The GA number of evaluations was 250 000.
The lowest binding free energy cluster was selected as the best docking
conformation.
MD Simulations
MD simulations were performed to improve the docking results of the available inhibitors
to SARS-CoV-2Mpro by using GROMACS version 5.1.5.[53] In particular,
SARS-CoV-2Mpro and neutralized ions were described via the Amber99SB-iLDN force
field.[54] The TIP3P water model was simultaneously used to represent
water molecules.[55] The protonation states of the protein were predicted
by GROMACS using canonical pKa values in a neutral solution
since it probably adjusts the ligand-binding free energy,[56] in which
the protonation state of the catalytic dyad including His41 and Cys145 was also predicted
(cf. Figure S1 in the Supporting Information). The most possible distance between
the His41Hε atom and Cys145Sγ is ca. 0.44 nm (Figure S1A), which is in good agreement with the experimental data.[42] Consequently, the inhibitor was illustrated using a general Amber force
field[57] with the help of AmberTools18 and ACPYPE
packages.[58,59] It
should be noted that before parameterizing the ligand, the quantum chemical calculation
using the B3LYP functional at the 6-31G(d,p) level of theory was performed to obtain
chemical information of the inhibitor during which the restrained electrostatic potential
(RESP) approach was used to assign atomic charges upon quantum simulations using an
implicit solvent environment, ε = 78.4.[57] Moreover, the
SARS-CoV-2Mpro + inhibitor complex was placed into a rectangular or dodecahedron periodic
boundary (PBC) condition box with a volume of 506 or 820 nm3 corresponding to
SMD (Figure B) or unbiased MD (Figure C) simulations, respectively. The soluble system hence
encompasses ca. 50,000 or 80,000 atoms, respectively, involving SARS-CoV-2Mpro, ligands,
water molecules, and neutralized Na+ ions. Moreover, in order to carry out the
perturbation simulations, the ligand was individually simulated in a dodecahedron PBC box
with a volume of ca. 85 nm3 (Figure D). The soluble ligand system consists of ca. 8000 atoms, involving ligands,
water molecules, and counterbalanced ions.The parameters for operating MD simulations were described in previous
works.[16,17] In
particular, the integral was attempted every 2 fs. A nonbonded pair was enumerated within
a radius of 0.9 nm. The van der Waals (vdW) interaction was assessed using the cutoff
scheme, while the electrostatic (cou) interaction was determined via the fast particle
mesh Ewald electrostatics scheme.[60] The solvated system was initially
minimalized via the steepest descent approach. The canonical (NVT) and
isobaric–isothermal (NPT) simulations, with lengths of 0.1 and 2.0
ns, respectively, were then followed to equilibrate the system. The final conformations of
NPT simulations of the solvated complex were operated as the initial
structure of SMD or MD simulations, a length of 0.5 or 20.0 ns, respectively. Moreover,
the solvated inhibitor system was run for 5.0 ns. Each system was imitated two times for
getting better samples.
Free Energy Calculations
FPL Scheme
An externally harmonic force was applied to dissociate the inhibitor from the
SARS-CoV-2Mpro binding cleft as mentioned in the Supporting Information. In particular, cantilever spring constant,
v = 600 kJ mol–1 nm–2, and
pulling velocity, k = 0.005 nm ps–1, were selected as
forced parameters.[28] During SMD simulations, the pulling work,
W, was recorded to be used as a critical factor to estimate the
ligand-binding affinity[28] since it correlates with the binding free
energy, ΔG, via the isobaric–isothermal Jarzynski
equality.[61] The pulling work W is calculated as
follows
LIE Calculation
The ligand-binding free energy, ΔGLIE, was computed
as the mean of vdW and cou interaction differences of the inhibitor with its neighboring
atoms upon incorporation, that is, the individual ligand in the solvent (unbound
state—denoted as subscript u) and the inhibitor in the binding mode with
SARS-CoV-2Mpro (bound state—denoted as subscript b). The formula of the LIE
approach can be expressed as
followsThe coefficient γ, a constant, is associated with the alteration of the
hydrophobic nature of the binding cleft conceding to various species of inhibitors,
whereas the coefficients α and β are rating parameters for nonpolar and
polar interactions, respectively.[62] Additional information about the
LIE approach is described in the Supporting Information.
MM-PBSA Analysis
The ligand-binding affinity, ΔGMM-PBSA, can be
assessed using MD simulations via the MM-PBSA approach[29,30] as
followswhere ΔEcou,
ΔEvdW, ΔGsur, and
ΔGPB correspond to the energetic changes in cou,
vdW, nonpolar, and polar interactions, respectively;
TΔS is the entropic contribution to
ΔGMM-PBSA. In particular,
ΔEcou and ΔEvdW
were computed using GROMACS tools “gmx energy”. The nonpolar metrics,
ΔGsur, was determined via the Shrake–Rupley
formula,[63] which is ΔGsur =
γSASA + β, where γ = 0.0072 kcal mol–1
Å–2 and β = 0.[64] The polar component,
ΔGPB, was assessed by numerically resolving the
Poisson–Boltzmann equation using an implicit solvent model.[65,66] Finally, the entropic term can be
probed via normal mode approximation.[67] Additional information about
the MM-PBSA approach is described in the Supporting Information.
Double-Annihilation Binding Free Energy Investigation
The inhibitor was changed from bound state to unbound
state by using λ-alteration simulations,[68] which concur at
λ = 0 and λ = 1. Several values of the coupling parameter λ were used
to complete this task. The free energy change,
ΔGλ=0→1 =
−kBTln⟨e–Δ⟩λ=0,
corresponds to the work of the ligand annihilation process, whose information is
described in detail in the Supporting Information. The value can be assessed via the Bennett
acceptance ratio scheme.[69] The binding free energy between SARS-CoV-2Mpro and the inhibitor, ΔGFEP, is thus obtained due to
the difference of the free energy changes upon two ligand annihilation processes
involving a demolishing inhibitor in the solvated complex, ΔGλ=0→1Comp, and an
inhibitor, ΔGλ=0→1lig,[22] as
follows
Analysis Tools
Because the protonation states of ligands possibly alter the protein–ligand
binding,[70] the chemicalize webserver, a tool of ChemAxon, was used to
assess the protonation states of inhibitors. The Adaptive Poisson-Boltzmann Solver (APBS)
webserver was used to determine the surface charge of the protease.[66,71] The correlation error was
calculated using 1000 rounds of the bootstrapping method.[72] The
intermolecular nonbonded contact (NBC) between the ligand atoms and the residual
SARS-CoV-2Mpro was confirmed when the pair between their nonhydrogen atoms is smaller
than 4.5 Å. The intermolecular hydrogen bond (HB) between the Mpro residues and the
inhibitors was endorsed when the angle ∠ acceptor (A)-hydrogen (H)-donor (D) is
larger than 3π/4 and the pair A–D is smaller than 3.5 Å. ROC-AUC was
calculated using the Scikit-Learn library.[73] Since ROC-AUC calculations
require ligands to be assigned a binary label, we classify the ligands according to their
experimental binding affinity and split them into two halves. The ligands in the first
half having experimental binding free energy below the median were assigned the
strong binder label, whereas those in the second half were assigned the
weak binder label. ROC-AUC was used to benchmark different
computational methods in terms of discriminating between strong and weak binders.
Results and Discussion
Molecular Docking Calculations
The obtained results are shown in Tables and
S1 in the Supporting Information. Initially, we assessed the docking results
against the relevant experimental data including binding affinity and native binding
poses.[44−50]
The assessment includes two parts: correlation between docking and experimental
ligand-binding affinity and successful docking rate.[34] The estimated
correlation coefficients for Vina and AD4 are RVina = 0.60
± 0.13 and RAD4 = 0.47 ± 0.21, respectively. This
indicates that the docking energies of Vina are more strongly correlated with experiments
than those of AD4. Moreover, the root-mean-square error (RMSE) of Vina is lower than that
of AD4, that is, RMSEVina = 1.78 ± 0.17 and RMSEVina = 1.97
± 0.17 kcal mol–1, respectively. Although AD4 required much more
computing resources than Vina does, its docking performances lagged behind Vina. It is
probably caused by the difference in scoring functions as indicated by prior
observations,[34] in which AD4 uses a hybrid physical-based/empirical
scoring function, while Vina uses an empirical scoring function.[25,41] Furthermore, in the previous
work,[74] AD4 gave poor correlation, R = 0.36, with
ΔGJarzynski, which is obtained via NEMD
simulations,[31] a much more accurate free energy approach. Therefore,
it may be argued that Vina is the appropriate protocol for preliminary assessment of the
ligand-binding affinity to SARS-CoV-2Mpro.
Table 1
Calculated Results in Comparison with the Experimental Values of Some Compounds
to SARS-CoV-2 Mpro
no
name
ΔGVina
ΔGAD4
W
ΔGLIE
ΔGMM-PBSA
ΔGFEP
ΔGEXPa
1
7Jb
–7.4
–6.0
95.7 ± 6.1
–15.04 ± 0.30
–19.3 ± 1.03
–17.95 ± 2.74
–8.69[44]
2
11ab
–7.3
–8.1
109.7 ± 3.1
–14.78 ± 0.81
–29.67 ± 0.30
–18.95 ± 0.52
–9.96[45]
3
11bb
–7.4
–8.0
91.3 ± 7.9
–13.99 ± 0.36
–14.41 ± 1.85
–16.53 ± 0.59
–10.13[45]
4
11r
–6.8
–6.9
96.6 ± 8.2
–15.98 ± 1.92
–15.14 ± 1.61
–20.89 ± 0.51
–9.23[46]
5
13ab
–7.6
–7.6
64.7 ± 10.6
–10.07 ± 0.59
–0.71 ± 0.87
–10.94 ± 2.51
–7.70[46]
6
13bb
–7.7
–7.4
81.3 ± 6.1
–16.16 ± 2.00
–19.93 ± 3.96
–16.47 ± 0.32
–8.45[46]
7
baicalein
–6.8
–5.7
36.5 ± 8.0
–10.36 ± 2.57
–8.88 ± 2.92
–8.40 ± 2.23
–8.25[47]
8
boceprevirc
–7.1
–8.8
54.5 ± 1.8
–11.75 ± 0.85
–9.74 ± 0.48
–7.65 ± 1.31
–7.37[48]
9
calpain inhibitor I
–5.3
–5.4
50.2 ± 4.9
–10.77 ± 0.87
–7.51 ± 0.17
–6.41 ± 0.37
–6.94[48]
10
calpain inhibitor IIc
–5.6
–5.3
74.1 ± 22.9
–12.21 ± 0.22
–14.92 ± 6.95
–9.09 ± 2.39
–8.23[48]
11
calpain inhibitor XIIc
–6.3
–5.1
51.8 ± 5.7
–11.98 ± 0.30
–24.73 ± 1.21
–9.27 ± 0.88
–8.69[48]
12
calpeptin
–4.9
–6.1
33.0 ± 5.4
–9.75 ± 1.07
–4.37 ± 1.89
–3.43 ± 0.97
–6.81[48]
13
carmofurb
–5.6
–6.0
39.1 ± 5.9
–9.30 ± 1.78
–3.97 ± 0.98
–7.12 ± 3.20
–7.86[49]
14
GC-373b
–7.2
–6.6
53.1 ± 8.3
–12.16 ± 0.41
–12.04 ± 1.13
–10.32 ± 1.55
–8.76[50]
15
MG-115
–6.1
–5.4
57.8 ± 2.2
–11.34 ± 0.57
–8.14 ± 1.21
–9.19 ± 0.73
–7.53[48]
16
MG-132c
–6.2
–5.2
71.4 ± 9.1
–11.50 ± 0.39
–12.13 ± 3.06
–8.48 ± 0.41
–7.41[48]
17
narlaprevirb
–7.5
–5.9
69.9 ± 2.1
–12.69 ± 0.05
–22.75 ± 0.25
–6.57 ± 0.50
–7.18[48]
18
PX-12b
–3.9
–4.8
32.1 ± 1.0
–8.67 ± 0.05
–32.44 ± 0.73
–2.56 ± 0.30
–6.39[49]
19
shikonin
–6.1
–6.0
27.3 ± 6.9
–9.37 ± 0.27
–0.74 ± 4.30
–3.01 ± 0.95
–6.58[49]
20
tideglusib
–6.6
–7.1
36.5 ± 3.1
–9.92 ± 0.27
–10.56 ± 2.93
–4.26 ± 0.12
–7.95[49]
The experimental binding free energies were gained based on the IC50
value, approximating that the one equals to the inhibition constant
Ki and based on the assumption of the contribution of
the covalent binding energy is small.
The covalent binding inhibitors.
The possible slow covalent binding inhibitors. The unit is kcal
mol–1.
The experimental binding free energies were gained based on the IC50
value, approximating that the one equals to the inhibition constant
Ki and based on the assumption of the contribution of
the covalent binding energy is small.The covalent binding inhibitors.The possible slow covalent binding inhibitors. The unit is kcal
mol–1.The inhibitor-binding pose was also obtained using this process. The docking pose forms a
small root-mean-square deviation (RMSD) with respect to the experimental pose. It was
considered as a successfully docked conformation if the RMSD is smaller than 2
Å.[34] In particular, nine compounds including 7j, 11a, 11b, 13b,
baicalein, boceprevir, calpeptin,
GC-373, and narlaprevir were reported to have the experimental binding
poses with the PDB IDs 6XMK,[44]6LZE,[45]6M0K,[45]6Y2F,[46]6M2N,[47]7K40,[42]7AKU,[75]6WTK,[50] and
7JYC,[42]
respectively. Over these systems, the successful docking rate of Vina is ca. 67% with a
mean RMSD of 1.97 ± 0.32 Å, as presented in Figure . It is significantly better than those by AD4 with the RMSD
between docked and experimental structures of 3.22 ± 0.33 Å, as represented in
Figure S2 in the Supporting Information. Therefore, it may be concluded that
Vina not only formed the proper affinity results but also showed the suitable binding pose
to SARS-CoV-2Mpro.
Figure 2
Comparison of docking (cyan) and experimental binding (yellow) poses of 11a, 11b,
13b, and boceprevir to SARS-CoV-2 Mpro. The surface charge, ranging from −5 to
5, of the protease was computed via the APBS webserver. The docking results were
obtained by using the Vina package.
Comparison of docking (cyan) and experimental binding (yellow) poses of 11a, 11b,
13b, and boceprevir to SARS-CoV-2Mpro. The surface charge, ranging from −5 to
5, of the protease was computed via the APBS webserver. The docking results were
obtained by using the Vina package.Because the molecular docking simulations often use many constraints/approximations to
accelerate the calculation speed, the results often need to be refined using more accurate
protocols.[17,18] In
this context, because Vina formed the most suitable binding affinity and pose as discussed
above, we have chosen the docking structures provided by this approach as initial
structures for simulating via SMD/MD techniques. The ligand-binding free energy
calculation methods were thus carried out.[23,35] The performance of free energy calculations based on
SMD/MD trajectories was thus assessed.
Steered MD Simulation
FPL is an efficient technique to quickly classify the ligand-binding affinity.[28] This approach successfully estimated the affinities of several
inhibitors binding to SARS-CoV-2Mpro, which suggested a shortlist of potent compounds
to further evaluate via perturbation simulations.[16] A benchmark with
11 compounds was then used later on, indicating that the correlation coefficient of the
pulling work, W, and the experimental binding free energy,
ΔGEXP, are appropriate with a value of
RFPL = −0.76 ± 0.10.[17]
However, due to the small size of the testing set, the obtained results are probably
unstable due to the large value of the computed error. Consequently, the value did not
show superiority over Vina docking with RVina = 0.72 ±
0.14, which is within the computed error.[17] In this context, we
benchmarked again this approach for evaluating the ligand-binding affinity versus
SARS-CoV-2Mpro with a larger testing set. The FPL scheme was thus used for refining the
obtained docking results, which were provided by Vina. In FPL simulations, an externally
pulling force was applied to extract inhibitors from bound to
unbound states. The recorded rupture force,
Fmax, and pulling work, W, during the
simulations are given in Table S2 in the Supporting Information. The F value in
time dependence is also shown in Figure S3 in the Supporting Information. The average of W
values falls in the range 18.3 ± 1.4 to 111.3 ± 6.0 kcal
mol–1, providing a median of 56.0 ± 5.0 kcal
mol–1, while the mean rupture force Fmax
is within the range from 279.5 ± 12.7 to 1040.6 ± 68.9 pN, giving a median of
581.9 ± 41.2 pN. The ligand-binding affinity is possibly ranked via the
W value, which formed an appropriate correlation, R
= −0.51 ± 0.15 (cf. Table S2), with the respective experiments.[44−50] The obtained
coefficient indicated that FPL is significantly worse than Vina docking,
RVina = 0.60 ± 0.13, in predicting the ligand-binding
affinity of SARS-CoV-2Mpro. The poorly correlated outcomes of FPL probably appear due
to SMD simulations performed using the conformations provided by the short NPT
simulations, which may not be sufficient to reach the equilibrium states. Therefore, the
unbiased MD simulations with a length of 20 ns were performed after NPT simulations and
were reported below. We used the last conformations of MD simulations as starting
structures of FPL calculations. The obtained outcomes are reported in Tables and S3 in the Supporting Information, in which the F value
during SMD simulations is reported in Figure S4 in the Supporting Information. The
Fmax and W values were thus altered and
ranged from 342.0 to 961.4 pN and 27.3 to 109.7 kcal mol–1, forming
median values of 644.0 ± 39.2 pN and 61.3 ± 5.3 kcal mol–1,
respectively. The obtained correlation between W and
ΔGEXP was thus increased from R =
−0.51 ± 0.15 to RFPL = −0.74 ± 0.11
(cf. Figure ). The FPL technique is thus able
to improve upon the docking results; however, the equilibrated simulations are required
to be performed carefully.
Figure 3
Association of pulling work W and
ΔGEXP. The W values were
calculated via eq . The
ΔGEXP values were computed when the half-maximal
inhibitory concentration, IC50, was assumed to be equal to the inhibition
constant, Ki.
Association of pulling work W and
ΔGEXP. The W values were
calculated via eq . The
ΔGEXP values were computed when the half-maximal
inhibitory concentration, IC50, was assumed to be equal to the inhibition
constant, Ki.
Unbiased MD Simulations
As mentioned above, the FPL results based on the rather short relaxation time of only 2
ns were probably limited since it may not be sufficient to reach the equilibrium states.
Moreover, the SARS-CoV-2Mpro Cα atoms were restrained, probably
hindering the structural change of the complexed system to equilibrium states. The
complexed conformation including SARS-CoV-2Mpro and the ligand in the best docking pose
provided by Vina was thus solvated and equilibrated via unbiased MD simulations with a
length of 20 ns. The accuracy of the following FPL calculations was thus increased
significantly (Figure ). During the unbiased
MD simulation, the complexes almost reached the equilibrium states after 5 ns (cf.
Figure S5 in the Supporting Information). Furthermore, the all-atom RMSD
of SARS-CoV-2Mpro was found to be in good agreement with the previous work[76] (cf. Figure S6 in the Supporting Information). Therefore, the snapshots over an
interval of 10–20 ns with a stride of 10 ps were collected for binding free
energy calculation via the LIE and MM-PBSA approaches. In addition, the structures
extracted from MD trajectories of 2.5–5 ns of the solvated inhibitor system were
also involved in free energy calculations via the LIE approach.In order to probe the factors controlling the binding process of inhibitors to Mpro,
the intermolecular NBC and HB between inhibitors and individual residues of SARS-CoV2
Mro were investigated using equilibrium snapshots of all complexes. The obtained
outcomes are presented in Figure S7 in the Supporting Information, which mentioned 30 residues
establishing NBC to inhibitors over more than 15% of the appraised shapes (40,000
snapshots totally). However, there are only 19/30 residues that created HB to
inhibitors. Shortening the list, we have only counted residues, which simultaneously
showed NBC and HB to inhibitors with a probability being higher than 42 and 4%,
respectively. It should be noted that 47 ± 5 and 6 ± 2% amounts correspond to
the averaged values over 30 residues. Nine residues were obtained and are described in
Figure . We may argue that the residues
Thr26, His41, Ser46,
Asn142, Gly143, Cys145,
His164, Glu166, and Gln189 are
critical elements governing the binding process of ligands to SARS-CoV-2Mpro. The large
contribution of these residues to the ligand-binding process implies that the
protonation state of these residues involving the catalytic dyad probably alters the
ligand-binding affinity. Therefore, the issue should be carefully carried out in further
work. Furthermore, possible mutations at these residues could change much the
ligand-binding free energy to SARS-CoV-2Mpro.
Figure 4
Critical residues forming NBC and HB to the inhibitors. The results were obtained
over the equilibrium snapshots of MD simulations of all complexes.
Critical residues forming NBC and HB to the inhibitors. The results were obtained
over the equilibrium snapshots of MD simulations of all complexes.In addition, the clustering method was then applied to characterize the structural
change of nine critical residues during the equilibrium conformations of all complexes.
The calculation was performed with a nonhydrogen atomic RMSD cutoff of 1.2 Å over
40 000 structures of nine residues in stabilizing bound states with 20
inhibitors. One cluster was found, which is shown as colorful residues in Figure . The representative structure of nine
critical residues was compared with the starting conformation, which is in gray color.
The differences between the MD refined and starting structures are noted as red arrows
in Figure . The significant structural changes
are the flexing of the residue Asn142 and the rotation of the hydroxyl
and thiol side chains of Ser46 and Cys145,
respectively. The side chain residues probably rotate to form HB to inhibitors.
Moreover, overall, the difference between the representative structure and the initial
conformation is ca. 1.0 Å, implying the stability of the SARS-CoV-2Mpro active
site during MD simulations.
Figure 5
Representative structures of nine critical residues via the nonhydrogen RMSD
clustering calculation with a cutoff of 1.2 Å. The colorful residues represent
the MD refined structure in comparison with the initial structure, which is denoted
in gray color. Red arrows imply the change of these residues during MD
simulations.
Representative structures of nine critical residues via the nonhydrogen RMSD
clustering calculation with a cutoff of 1.2 Å. The colorful residues represent
the MD refined structure in comparison with the initial structure, which is denoted
in gray color. Red arrows imply the change of these residues during MD
simulations.
Binding Free Energy Calculation by Using the LIE Scheme
The difference between the averaged vdW and cou interaction energies between each
inhibitor to SARS-CoV-2Mpro, bound state, and the solution,
unbound state, as long as ΔGEXP is
given in Tables and S4 in the Supporting Information. The binding free energy,
ΔGLIE, is computed using eq . Traditionally, the parameters α and β were chosen as
0.18 and 0.50, respectively.[27,77] However, similar to the Aβ oligomeric system,[78] no correlation, R = −0.13 ± 0.20, was
observed between the calculated and experimental values. This is probably due to the
shallow binding cleft of SARS-CoV-2Mpro, which is similar to the ligand–surface
binding in the case of Aβ oligomer.[78] Therefore, the parameters
including α = 0.288, β = −0.049, and γ = −5.880 of the
Aβ system[78] were proposed to be used for calculating the
ligand-binding free energy of the SARS-CoV-2Mpro + inhibitor complex. Interestingly,
the set of parameters gave a correlation coefficient RLIE =
0.73 ± 0.09 and RMSE = 4.12 ± 0.40 kcal mol–1 (Figure ). Absolutely, the LIE approach formed
similar accuracy outcomes, RLIE = 0.73 ± 0.09, compared
to FPL simulations, RFPL = −0.74 ± 0.11.
Moreover, the negative parameter β may imply the loss of cou interactions of
inhibitors upon association (cf. Table S4) or it may be argued that the vdW interactions control the
binding process of inhibitors to the protease. It is in good agreement with the previous
outcomes[16,18] and
obtained results via MM-PBSA and perturbation calculations below. Furthermore, the
negative value γ implies that the hydrophobic interactions between inhibitors and
SARS-CoV-2Mpro are strong as mentioned in the conclusion about the superiority of the
above vdW term. In addition, although the LIE showed a good Pearson correlation,
ΔGLIE overestimates
ΔGEXP with an amount of ca. 3.89 kcal
mol–1 (see Table ). It is
probably caused by the lower hydrophobic interaction between the SARS-CoV-2Mpro +
inhibitor complexes compared with the Aβ-complexed system or the incorrect
imitation of the interaction between inhibitors and the surrounding
atoms.[79,80]
Overall, it may be argued that the binding process of the SARS-CoV-2Mpro + inhibitor
complex is similar to the Aβ oligomer + ligand complex, but the hydrophobic
contacts of the Mpro complex are weaker than the Aβ ones.
Figure 6
Comparison of ΔGLIE and
ΔGEXP. The calculated binding free energy was
computed using eq with the parameters
α = 0.288, β = −0.049, and γ = −5.880 referring the
Aβ oligomer + inhibitor complex. The ΔGEXP
values were computed when the half-maximal inhibitory concentration,
IC50, was assumed to be equal to the inhibition constant,
Ki.
Comparison of ΔGLIE and
ΔGEXP. The calculated binding free energy was
computed using eq with the parameters
α = 0.288, β = −0.049, and γ = −5.880 referring the
Aβ oligomer + inhibitor complex. The ΔGEXP
values were computed when the half-maximal inhibitory concentration,
IC50, was assumed to be equal to the inhibition constant,
Ki.
Establishing the Ligand-Binding Free Energy via the MM-PBSA Protocol
The equilibrium conformations of the SARS-CoV-2Mpro + inhibitor complex during MD
simulations were implemented for estimating the binding free energy using continuum
models[29,30]
following eq . It should be highlighted that our
group has successfully calculated the ligand-binding free energy for various
biomolecules using the MM-PBSA method.[78,81−83] The obtained outcomes are described in Tables and S5 in the Supporting Information. In particular,
ΔGMM-PBSA overestimates
ΔGEXP with a value of ca. 5.60 kcal
mol–1, which is slightly larger than that given by the LIE protocol.
Moreover, the MM-PBSA method provides a poor accuracy in comparison with the
corresponding experiments, RMM-PBSA = 0.32 ± 0.29 and
RMSE = 10.15 ± 1.92 kcal mol–1 (Figure ). It is in good agreement with the previous study[84] that MM-PBSA formed a correlation with the experiment with a value of
RMM-PBSA = 0.25 upon investigating 15 complexes.
Interestingly, as mentioned above that the binding process of inhibitors to SARS-CoV-2Mpro is quite similar to that of the inhibitors to the Aβ oligomer, the Pearson
correlations of two systems are similar, RMM-PBSASARS-CoV-2 = 0.32 versus
RMM-PBSAAβ = 0.27.[78] The poor accuracy of the MM-PBSA approach
applying on SARS-CoV-2Mpro is possibly similar to the Aβ system that is probably
caused by the selection of an inappropriate dielectric constant, ε, and roughly
entropic approximation.[35,78,85] Furthermore, the ε issue was also consolidated
via the inhibitor interaction diagram analysis (cf. Table S2 in the Supporting Information) where the solvation exposure of
inhibitors is absolutely complicated. Therefore, further investigation to characterize
factors affecting the accuracy of MM-PBSA applying on SARS-CoV-2Mpro should be
performed before the approach is widely used for screening potential inhibitors for the
Mpro target.
Figure 7
Comparison of ΔGMM-PBSA and
ΔGEXP. The calculated binding free energy was
computed using eq . The
ΔGEXP values were computed when the half-maximal
inhibitory concentration, IC50, was assumed to be equal to the inhibition
constant, Ki.
Comparison of ΔGMM-PBSA and
ΔGEXP. The calculated binding free energy was
computed using eq . The
ΔGEXP values were computed when the half-maximal
inhibitory concentration, IC50, was assumed to be equal to the inhibition
constant, Ki.
Determination of Ligand-Binding Free Energy Using the FEP Method
In recent reports,[18,84] the FEP simulation successfully determined the ligand-binding free
energy and is known as the most accurate free energy method.[23,86] However, although the
perturbation results correlate with the respective experiments,[18,84] the Pearson coefficient diffused
in a large range from 0.54 to 0.94. In particular, FEP simulations determined the
ligand-binding free energy of 11 inhibitors to SARS-CoV-2Mpro with high accuracy,
RFEP = 0.94 ± 0.04.[18] In a
different study, perturbation simulations also formed a Pearson correlation
RFEP = 0.54 when 15 complexes were considered.[84] Therefore, in this work, we benchmarked the FEP performance on a larger
set from multi-sources, which would probably provide a clarification for the
accomplishment of this approach.The final structures of MD simulations mimicking the solvated complex and ligand were
utilized as the input of λ-alteration simulations. The obtained results are
reported in detail in Tables and S6 in the Supporting Information. The perturbation simulations provide the
highest accuracy results with a Pearson correlation of RFEP
= 0.85 ± 0.06 (cf. Figure ). Although
RFEP is quite high, it is far from being perfect. The
inaccurate outcomes are probably caused by the insufficient simulating the
ligand−covalent binding interaction of conventional MD simulations. Because some
covalent and slow covalent binding inhibitors were employed upon investigations as noted
in Table , the binding free energy of these
compounds to SARS-CoV-2Mpro involves two components including covalent and noncovalent
binding free energy.[87] The further expensive computing approach such
as QM/MM or PDLD/S-LRA/β should be thus carried out to refine the binding
process.[87−89] However, in this work,
the computational investigations were limited in the framework of the classical
simulation, which is widely used and good enough to estimate the potential inhibitors
for Mpro. Moreover, in average over complexes, the
ΔGFEP value is −9.87 ± 1.20 kcal
mol–1, which overestimates ca. 1.87 kcal mol–1
compared to the mean of the experimental values. The difference is significantly smaller
than those obtained by LIE, ca. 3.89 kcal mol–1, and MM-PBSA, 5.60
kcal mol–1, methods. The difference between the mean of experimental
and computational values probably comes from the incorrect simulations of the
interaction between inhibitors and neighboring atoms.[79,80] The rough assumption of the IC50
equals the inhibition constant Ki, when calculated the
experimental binding free energy, also adopts a shifting possibility. Furthermore, the
obtained results by λ-alteration simulations also revealed the binding process of
the SARS-CoV-2Mpro inhibitor. The vdW interaction is dominant in the binding process of
ligands to Mpro, which is in good agreement with the previous
observations[16,18]
because the average values of ΔGcou and
ΔGvdW are −2.82 ± 0.83 and −7.05
± 0.49 kcal mol–1, respectively.
Figure 8
Comparison of ΔGFEP and
ΔGEXP. The calculated binding free energy was
computed using eq . The
ΔGEXP values were computed when the half-maximal
inhibitory concentration, IC50, was assumed to be equal to the inhibition
constant, Ki.
Comparison of ΔGFEP and
ΔGEXP. The calculated binding free energy was
computed using eq . The
ΔGEXP values were computed when the half-maximal
inhibitory concentration, IC50, was assumed to be equal to the inhibition
constant, Ki.
Calculating the Binding Affinities of Other SARS-CoV-2 Inhibitors
The binding free energy of some available SARS-CoV-2 inhibitors to the Mpro was also
evaluated using the assessed approaches. The outcomes are described in Tables and S7–S10 and Figures S8 and S9 in the Supporting Information.
Although the inhibitory concentration of these compounds was extracted from cell culture
experiments,[51] indicating that drug targets probably differ from
SARS-CoV-2Mpro such as RNA polymerase, appropriate correlations between the calculated
results and experimental data were recorded. Therefore, it may be argued that there are
many compounds aiming at inhibiting Mpro. In particular, in good agreement with the
above evaluation, Vina showed the higher correlation, RVina
= 0.78 ± 0.23, compared with the AD4 package, RAD4 =
0.48 ± 0.23. The binding poses of these compounds to SARS-CoV-2Mpro were thus used
as the initial structures for SMD/MD refined simulations. The Pearson correlations
between FEP, LIE, MM-PBSA, and FPL compared with the experimental data are
RFEP = 0.70 ± 0.16, RLIE
= 0.67 ± 0.28, RMM-PBSA = 0.00 ± 0.26, and
RFPL = −0.71 ± 0.17, respectively. The MM-PBSA
approach is different from others since it is very weakly correlated with experiments.
Moreover, although FEP, LIE, and FPL showed appropriate results, the linear relationship
was decreased. The discrepancies occurred since some compounds probably target on the
RNA polymerase rather than Mpro.[14]
Table 2
Calculated Results in Comparison with the Experimental Values of Some Compounds
to SARS-CoV-2
no
name
ΔGVina
ΔGAD4
W
ΔGLIE
ΔGMM-PBSA
ΔGFEP
ΔGEXPa
1
bazedoxifene
–7.5
–8.1
47.4 ± 9.6
–11.12 ± 1.02
–5.13 ± 1.60
–5.25 ± 2.47
–7.48[51]
2
ciclesonide
–7.4
–8.9
55.9 ± 1.9
–12.89 ± 0.47
–11.58 ± 2.80
–9.87 ± 0.40
–7.34[51]
3
digitoxin
–8.1
–8.1
72.5 ± 5.2
–13.22 ± 0.93
–0.92 ± 2.95
–16.19 ± 3.88
–9.09[51]
4
favipiravir
–4.9
–4.9
16.0 ± 1.5
–7.23 ± 0.35
–2.31 ± 1.56
–0.99 ± 0.51
–4.52[51]
5
gilteritinib
–7.5
–8.5
37.7 ± 2.5
–12.15 ± 0.53
–11.55 ± 3.06
–8.02 ± 0.35
–7.08[51]
6
lopinavir
–6.3
–5.1
41.8 ± 5.1
–12.39 ± 1.94
–9.30 ± 3.95
–4.72 ± 2.92
–6.59[51]
7
mefloquine
–6.5
–6.5
45.7 ± 3.0
–10.51 ± 0.16
–9.84 ± 0.06
–3.05 ± 1.36
–7.34[51]
8
mequitazine
–6.6
–7.7
24.6 ± 2.3
–9.33 ± 0.76
–1.98 ± 4.66
–8.88 ± 0.41
–7.03[51]
9
niclosamide
–6.6
–6.3
41.9 ± 6.0
–10.95 ± 0.57
–8.38 ± 1.95
–8.77 ± 0.40
–8.97[51]
10
osajin
–7.0
–7.7
27.6 ± 4.4
–11.45 ± 0.14
–15.26 ± 3.00
–4.15 ± 1.14
–7.41[51]
11
penfluridol
–6.9
–8.0
59.6 ± 0.5
–10.55 ± 0.38
–0.27 ± 3.41
–10.51 ± 1.51
–7.26[51]
12
phenazopyridine
–6.0
–6.0
23.9 ± 3.2
–9.96 ± 0.56
–5.90 ± 2.98
–3.80 ± 0.58
–6.23[51]
13
proscillaridin
–7.6
–7.4
57.8 ± 4.5
–11.85 ± 0.38
–7.32 ± 1.33
–14.56 ± 2.65
–7.79[51]
14
remdesivir
–6.5
–4.5
37.8 ± 3.9
–12.00 ± 0.31
–28.69 ± 2.94
–8.91 ± 5.65
–6.96[51]
The experimental binding free energies were gained based on the IC50
value, approximating that the one equals to the inhibition constant
Ki. The unit is kcal mol–1.
The experimental binding free energies were gained based on the IC50
value, approximating that the one equals to the inhibition constant
Ki. The unit is kcal mol–1.
Area under the ROC Curve—ROC-AUC
The ROC-AUC values for SARS-CoV-2Mpro inhibitors and other SARS-CoV-2 inhibitors are
shown in Tables S11 and S12. ROC-AUC is a commonly used metric to measure the ability of a binary
classifier in distinguishing between two labels. The results indicate that for Mpro
inhibitors, FEP and LIE are the best classifiers, followed by MM-PBSA, FPL, Vina, and
AD4. However, for other SARS-CoV-2 inhibitors, the two docking methods give better
ROC-AUC than the binding free energy methods do, except for the FPL method.
Conclusions
In this context, in order to benchmark which is the appropriate free energy approach for
probing the binding free energy of inhibitors to the SARS-CoV-2Mpro, we have carried out
both molecular docking and MD simulations. Vina and AD4 were employed for docking
imitations. We have initially demonstrated that the Vina package is better than the AD4
protocol in both predicting the ligand-binding affinity, RVina =
0.60 ± 0.13, and binding pose of ligands, where the successful docking rate is of ca.
67%, to the SARS-CoV-2Mpro target. Surprisingly, AD4 produced poorly correlated results
with coefficients of RAD4 = 0.47 ± 0.21. It should be noted
that the poor accuracy of AD4 was also revealed when the docking results were compared with
the NEMD simulations, R = 0.36.[74]MD simulations would then be accomplished. The FEP approach provided the most accurate
results, RFEP = 0.85 ± 0.06, compared with the respective
experiments. Interestingly, the LIE and FPL approaches also formed good correlation
coefficients, RLIE = 0.73 ± 0.09 and
RFPL = −0.74 ± 0.11, while using significantly
lower computing resources compared to FEP, respectively. However, an appropriate relaxed
simulation, which is similar to the prepared input for FEP/LIE/MM-PBSA calculations, was
required to reach equilibrium states before FPL was carried out. Because the successful
docking rate is ca. 67%, the short NPT simulation may not be sufficient to reach the
equilibrium states. The MM-PBSA method poorly correlates with the experimental data,
RMM-PBSA = 0.32 ± 0.29, as agreed in the recent
outcomes.[84]Atomistic simulations also revealed that the vdW interaction rigidly dominates the cou
interaction during the binding process of inhibitors to the SARS-CoV-2Mpro. Moreover, the
residues Thr26, His41, Ser46,
Asn142, Gly143, Cys145,
His164, Glu166, and Gln189 play
essential roles in frequently forming NBC and HB to inhibitors.
Authors: Garrett M Morris; Ruth Huey; William Lindstrom; Michel F Sanner; Richard K Belew; David S Goodsell; Arthur J Olson Journal: J Comput Chem Date: 2009-12 Impact factor: 3.376
Authors: Sunghwan Kim; Paul A Thiessen; Evan E Bolton; Jie Chen; Gang Fu; Asta Gindulyte; Lianyi Han; Jane He; Siqian He; Benjamin A Shoemaker; Jiyao Wang; Bo Yu; Jian Zhang; Stephen H Bryant Journal: Nucleic Acids Res Date: 2015-09-22 Impact factor: 16.971
Authors: Athri D Rathnayake; Jian Zheng; Yunjeong Kim; Krishani Dinali Perera; Samantha Mackin; David K Meyerholz; Maithri M Kashipathy; Kevin P Battaile; Scott Lovell; Stanley Perlman; William C Groutas; Kyeong-Ok Chang Journal: Sci Transl Med Date: 2020-08-03 Impact factor: 17.956
Authors: Chunlong Ma; Michael Dominic Sacco; Brett Hurst; Julia Alma Townsend; Yanmei Hu; Tommy Szeto; Xiujun Zhang; Bart Tarbet; Michael Thomas Marty; Yu Chen; Jun Wang Journal: Cell Res Date: 2020-06-15 Impact factor: 46.297
Authors: Ahmed M Tolah; Lamya M Altayeb; Thamir A Alandijany; Vivek Dhar Dwivedi; Sherif A El-Kafrawy; Esam I Azhar Journal: Pharmaceuticals (Basel) Date: 2021-11-24
Authors: Guillem Macip; Pol Garcia-Segura; Júlia Mestres-Truyol; Bryan Saldivar-Espinoza; María José Ojeda-Montes; Aleix Gimeno; Adrià Cereto-Massagué; Santiago Garcia-Vallvé; Gerard Pujadas Journal: Med Res Rev Date: 2021-10-26 Impact factor: 12.388