Literature DB >> 35642423

Relative Binding Free Energy between Chemically Distant Compounds Using a Bidirectional Nonequilibrium Approach.

Abstract

In the context of advanced hit-to-lead drug design based on atomistic molecular dynamics simulations, we propose a dual topology alchemical approach for calculating the relative binding free energy (RBFE) between two chemically distant compounds. The method (termed NE-RBFE) relies on the enhanced sampling of the end-states in bulk and in the bound state via Hamiltonian Replica Exchange, alchemically connected by a series of independent and fast nonequilibrium (NE) simulations. The technique has been implemented in a bidirectional fashion, applying the Crooks theorem to the NE work distributions for RBFE predictions. The dissipation of the NE process, negatively affecting accuracy, has been minimized by introducing a smooth regularization based on shifted electrostatic and Lennard-Jones non bonded potentials. As a challenging testbed, we have applied our method to the calculation of the RBFEs in the recent host-guest SAMPL international contest, featuring a macrocyclic host with guests varying in the net charge, volume, and chemical fingerprints. Closure validation has been successfully verified in cycles involving compounds with disparate Tanimoto coefficients, volume, and net charge. NE-RBFE is specifically tailored for massively parallel facilities and can be used with little or no code modification on most of the popular software packages supporting nonequilibrium alchemical simulations, such as Gromacs, Amber, NAMD, or OpenMM. The proposed methodology bypasses most of the entanglements and limitations of the standard single topology RBFE approach for strictly congeneric series based on free-energy perturbation, such as slowly relaxing cavity water, sampling issues along the alchemical stratification, and the need for highly overlapping molecular fingerprints.

Entities: Chemical

Mesh：

Substances：
Ligands

Year: 2022 PMID： 35642423 PMCID： PMC9202353 DOI： 10.1021/acs.jctc.2c00295

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.578

Introduction

Alchemical free-energy methods based on atomistic molecular dynamics simulations[1−6] are becoming an important tool in modern drug design, often serving as the last confirmative step for lead identification in the silico drug-discovery process. The impressive rise of computer power provided by high performing computing (HPC) facilities, the consequent drop of the parallel computing cost, and the constant improvement in efficiency, accuracy, and reliability of algorithms and protocols[7−11] have made in silico screening competitive with the traditional medicinal chemistry practice in the early preclinical stages. The industrial interest in state-of-the-art computer-based drug discovery is testified by the deals, for a total exceeding 100 million dollars, signed by pharmaceutical giants Sanofi, AstraZeneca, Bayer, and BMS in 2015–2020, with Schrödinger, Inc., an American company active in the development of software for computational chemistry.[12,13] In industrial settings, MD-based alchemical methods are most often used for ranking the relative binding affinities of a series of compounds to a protein target in order to prioritize them for further wet-lab assessement.[14,15] The relative binding free energy (RBFE) is obtained by computing the free energy cost of transmuting compound A into compound B by way a stratification of nonphysical hybrid λ states (the alchemical path) connecting the two compounds in the bound and unbound arms of a thermodynamic cycle. In Figure , the two alchemical transformations in the cycle (bound and bulk states) are indicated by the green arrows, and their free-energy difference equals the difference of the two absolute binding free energies (ABFE), ΔGA and ΔGB. The alchemical cost (ΔGAB) is generally computed as a sum of free-energy contributions between contiguous λ states using free-energy perturbation (FEP)[16] or, equivalently, thermodynamic integration (TI),[17] requiring independent MD simulations for all λ-states (typically from 10 to 20).

Figure 1

(a) Thermodynamic cycle in RBFE calculations. (b) Dual topology (upper) and single topology (lower) schemes. The gray structure refers to the fully decoupled region.

(a) Thermodynamic cycle in RBFE calculations. (b) Dual topology (upper) and single topology (lower) schemes. The gray structure refers to the fully decoupled region. The FEP-based or TI-based calculation of the RBFE is much simpler than the direct calculations of the ABFE. The latter requires the setup of a thermodynamic cycle where the ligand is decoupled in the bound state and recoupled in bulk again via λ stratifications. The decoupling process in the bound state is complicated by the tendency of the noninteracting (gas-phase) ligand to drift away from the pocket with the necessity of introducing system-dependent restraints, and by severe sampling issues when the protein switches from the holo form to the apo form.[18] Nonetheless, in the recent literature, there are few examples of successful FEP-based ABFE calculations in blind SAMPL challenges[19,20] for symmetrical host–guest systems and (retrospectively) in protein–ligand systems.[21−23] In the last case, however, the quality of the prediction was found to be strongly dependent on a prior knowledge of the ligand pose.[22] The current consensus in RBFE calculations is based on the so-called single topology scheme (ST). In this approach (see Figure ), compound A is transformed to a strictly congeneric partner B by interpolating the bonded and nonbonded potential parameters between the two end-states representing A and B. In the most straightforward ST implementation, the two compounds share a common topological structure with the same number of atoms, and a single atom is transformed to another (e.g., H into a halogen atom). When the number of atoms differs as in a so-called “R-group perturbation”[24,25] (e.g., H into CH3 or CH into N in an aromatic cycle), “dummy” atoms are added to the common core structure. While these atoms are fully interacting on one end-state, at the other end, they are bound to the molecule with zero nonbonded parameters, except for 14 nonbonded interactions involving the dummy atoms.[26] The effect of the dummy atoms should not influence the RBFE as long as this spurious contribution cancels in the two arms of the thermodynamic cycle. Such assumption was recently questioned in ref (24), where the authors showed that the inattentive selection of the bonded terms between dummy and physical atoms may produce errors of the order of kBT, even when evaluating relative hydration free energies between small molecules such as methane and ammonia. ST-RBFE can be applied to congeneric compounds differing by two or more substituents on a common scaffold. In this case, to avoid the sampling issue related to large transmutation free energies and/or to the enhanced conformational activity of the protein residues upon massive ligand changes, intricate ”perturbation graphs” gradually connecting the interesting molecules must be devised, thereby spending computational resources in determining RBFEs between uninteresting connecting molecules.[27,28] Single topology RBFE becomes further problematic when dealing with the so-called scaffold hopping transformations involving ring breaking or shrinking, a technique that is widely used in the medicinal chemistry practice.[29] Recently, many ST schemes have been proposed for RBFE calculations to tackle scaffold hopping.[25,30] However, the approaches are involved, highly system-dependent, and, as such, not easily amenable to be implemented in automated high-throughput virtual screening for pharmaceutical applications. It should be finally said that the accuracy of ST RBFE calculations has been demonstrated to be dependent on the input poses to construct the perturbation map in congeneric series if sampling is incomplete,[15] a fact that may further limit the utility of ST RBFE in industrial projects. The dual topology (DT) scheme is an alternative and conceptually simpler approach in RBFE calculation. In the DT scheme, A and B are present at the same time in both arms of the thermodynamic cycle. In the intermediate λ-states, the two compounds interact with the environment and have no mutual nonbonded interactions, so that they can occupy the same volume while the simultaneous alchemical transformations of A into B and B into A occur. Apparently, DT might be considered the most straightforward approach when the two end-state molecules are chemically distant. However, as noted in ref (31), a dual-topology calculation is, in essence, the same as two absolute free-energy calculations where coupling (creation), and decoupling (annihilation) are executed simultaneously in opposite directions. Hence, DT schemes are believed to suffer from the same pathologies exhibited by FEP-based ABFE calculations, with difficulties in sampling and slow convergence.[31] The DT scheme has been revived in two recent papers.[31,32] In ref (32), Gallicchio and co-workers proposed an extension for RBFE of their FEP-based alchemical transfer method (ATM)[33] for ABFE, a two-arm alchemical technology where the common intermediate state is a mixture of the solvated bound and unbound complex instead of vacuum as in standard FEP for ABFE. In the ATM-RBFE variant, the two end-states are the A-bound complex and B in bulk and vice versa, while the common intermediate state is an AB alchemical mixture in the bound state and in bulk. In the work by Roux and co-workers,[31] the alchemical end-states in the cycle of Figure contain both A and B, one fully coupled and the other decoupled, sharing a common substructure where each pair of corresponding atoms of A and B is holonomically constrained to share similar coordinates at all time during the λ-alchemical simulations. Both technologies are based on FEP and, in principle, could be used to evaluate RBFE for scaffold hopping transformations. In practice, these ingenious DT methods, requiring convergence on all intermediate λ states, still face severe sampling issues when the alchemical part of the hybrid A+B molecule is large. Both techniques have been either applied to strictly congeneric compounds[31] or tested on structurally similar compounds imposing an appropriate restraining potential that maintains the ligand alignment in order to enhance the convergence of the calculations.[32] In this paper, we propose a straightforward dual topology alchemical approach for calculating the binding free energy between two arbitrary chemical compounds, with disparate connectivity, volume, and net charge. The method relies on the enhanced sampling of the A(B) and B(A) end-states in bulk and in the bound state, alchemically connected by a series of independent fast nonequilibrium (NE) simulations affording the calculation of a NE work distribution. Enhanced sampling is performed using an efficient solute tempering Hamiltonian Replica exchange scheme (ST-HREM)[34] and is required only for the true physical states corresponding to the solvated complex and the compound in bulk. The hybrid dual topology starting end-states are generated by combining gas-phase configurations of the ghost molecule with the ST-HREM sampling of the true end-states. The method constitutes an extension of our NE alchemy for ABFE calculation termed virtual double system single box (vDSSB),[35−38] and, unlike the ABFE variant (unidirectional by design[38]), can be also implemented in a bidirectional fashion, applying the Crooks theorem to the forward and reverse work distribution and exploiting the accuracy and precision of the Bennett Acceptance Ratio.[39] In this paper, we have used our approach for RBFE on the important test bed provided by the recent SAMPL9 challenge,[40] involving the binding affinity of a series of noncongeneric compounds for the WP6 macrocyclic host.[41]

Theoretical Background

NE alchemy for RBFE is implemented using the thermodynamic cycle of Figure as in standard FEP alchemy, but replacing the intermediate λ-states simulations with a swarm of fast NE alchemical transitions. This methodology was recently applied[42] to a standard benchmark dataset for relative binding free energy of strictly congeneric series.[10,11] NE alchemy was systematically compared to ST-FEP and ST-FEP+, which is an FEP variant implemented in the proprietary code Desmond[7] featuring the enhanced sampling of the binding pocket in the intermediate states of the alchemical path.[10,43] It was found that[42] “the non-equilibrium free energy calculations performed comparably to FEP+ and reached a mean unsigned error of 3.70 kJ mol–1”. In ref (42), NE alchemy was applied in a rather conservative fashion, strictly retracing the standard ST FEP-based alchemical approach for congeneric ligands with R-group perturbations. Basically, the λ-states equilibrium simulations in FEP were substituted by a few hundreds of NE alchemical simulations in both directions, each lasting 50 ps, recovering the RBFE from the crossing point of the forward and reverse work distributions via the Bennett Acceptance Ratio.[44] No enhanced sampling of the end-states was performed, nor was an attempt made to check whether the technology could be extended to compounds not sharing a large common core structure. Here, as in ref (42), we use a swarm of independent NE simulations to perform the alchemical connection from A to B, recovering the forward PAB(Wb/u) and time-reversal backward PBA(−Wb/u) work distributions. At variance with ref (42), we use the DT scheme, with the forward (eq ) and reverse (eq ) transformations being defined asIn this notation, (A), (B) denotes the decoupled ghost, while the superscripts b or u indicate that the alchemical NE process is conducted in the bound and unbound state, respectively. The sampling of the four end-states of Figure , {A(B)}b, {B(A)}b, are obtained using replicates of Hamiltonian Replica Exchange with solute tempering[45] with only intrasolute scaling[34] along the replica progression (i.e., leaving the solvent cold), hence affording an intrasolute temperature of thousands of Kelvin, using a limited number of replicas. During the alchemical simulations, the ghost-environment interaction potential is gradually switched on, while that of the partner is simultaneously switched off. In the NE unbound transition, the two molecules do not sense each other and are characterized by the full intramolecule potential. In the bound NE transitions, the two molecules do not sense each other but their centers of mass (COM) are tethered via a simple harmonic potential of the form Vrstr = 1/2K(ΔR)2, with ΔR being the COM–COM distance and with the force constant K set to 2 kcal mol–1 Å–2. The COM–COM weak restraint between the two ligands does not affect, in any way, their relative conformations during the transitions, as the two alchemical molecules do not interact with each other, and serves only to keep the growing and annihilating molecules in the binding pocket at all times. The COM–COM restraint may have a limited impact on the dissipation of the process (defined as the difference between the mean work and the underlying free energy) but not on the final free energy values between the thermodynamic end-states (a rigorous proof is given in the Appendix). The latter can be recovered exploiting the Crooks theorem on the forward and reverse work distributions, or using the Jarzynski identity[46] on the forward or reverse distributions. Note that, if, in the alchemical process, the net charge of the system changes, then a finite size correction must be applied a posteriori to the RBFE. In NE-RBFE calculations, these finite-size corrections are trivially given by the difference of the finite-size corrections of the corresponding ABFE′ (see eq 9 of ref (47)). The strength of NE alchemy for RBFE, compared to FEP, lies in the fact that the need for canonical sampling in the intermediate states of the alchemical path is totally bypassed. Accurate sampling is needed only for the physical end states involving one molecule, with the dual topology initial configurations being trivially obtained by combining these physical states with gas-phase configurations of the ghost partner. The computational paradigm is shifted from single molecule time averages to a corresponding ensemble averages derived from time-unordered canonical configurations sampled with replicates of HREM simulations. The end-state canonical distribution of the initial equilibrium state is reflected in the resulting work distribution obtained in the driven alchemical transitions to the final nonequilibrium end-state. If, for example, a molecule can bind with two possible metastable poses with different free energies separated by large energy barriers, then the work distribution where this molecule is rapidly annihilated and the other is created is expected to exhibit a corresponding bimodal character. The work distribution can hence be seen as the fingerprint of the initial canonical sampling, blurred by the dissipation of the process defined as Wdiss = ⟨W⟩ – ΔG. The latter grows linearly with the speed[48] of the process and is noticeably dependent on the selected alchemical protocol. In our case, provided that the molecules have similar volumes, thanks to the COM–COM restraint, the growth process occurs with little resistance (i.e., small dissipation) in an “empty” region occupied by the (unperceived) annihilating partner. Therefore, NE-RBFE is not equivalent to two NE-ABFE calculations. The variance or dissipation in the DT NE-alchemical process is generally much less than the sum of the variance of the annihilation and growth processes combined (see Figure S19 in the SI). When the volume of the two molecules is disparate, steric clashes while a molecule is growing can dramatically enhance the NE work in most of the NE trajectories, negatively affecting the accuracy and precision of the free-energy estimate. To further limit the dissipation in the early stage of the growth process, we use a modification of the soft-core regularization proposed in ref (49), based on shifted, rather than softened, electrostatic and Lennard-Jones potential of the formwhere the shifted origins α = 0.35 Å and β = 4.0 Å have been tuned to minimize the dissipation. On the Zenodo public repository https://zenodo.org/record/6385017, we provide two small movies that illustrate the evolution of the Beutler soft-core[49] potential and of the shifted potential of eq , from λ = 1, corresponding to the ghost molecule, to λ = 0 corresponding to the full interacting compound.

Methods

Simulation Setup and Parameters

Guests and host PDB files were taken from the SAMPL9 GitHub site.[40] The protonation state of all species is indicated in Figure . For the two chiral compounds, the tested G3 guest (2-bicyclo[2.2.1]heptanyl]azanium) corresponds to the 1R 2R 4S diastereoisomer. The tested G12 guest (hexamethyladamantane-2,6-diammonium) has 2 and 6 carbon atoms of the R type. The force field (FF) parameters and topology of the host and guests molecules were prepared using the PrimaDORAC interface,[50] based on the GAFF2[51] parameter set. For G4, the silicon-related parameters were taken from ref (52). The initial bound state was prepared using the Autodock Vina code.[53] The bound complexes and the ghost ligands were solvated in ∼1600 and 512 OPC3[54] water molecules, respectively. Long-range electrostatic interactions were treated using the Smooth Particle Mesh Ewald (SPME) method.[55] A background neutralizing plasma was assumed within the SPME method.[56] The cutoff of the Lennard-Jones interactions was set to 13 Å. All simulations, HREM or nonequilibrium, were performed in the NPT ensemble under standard conditions, using an isotropic Parrinello–Rahman Lagrangian[57] and a series of Nosé thermostats[58] for pressure and temperature control, respectively. Bond constraints were imposed on X–H bonds only, where X is a heavy atom. All other bonds were assumed to be flexible. All simulations have been done using the hybrid OpenMP-MPI program ORAC[59] on the CRESCO6 cluster.[60]

Figure 2

Guests (Gn) and host (WP6) of the SAMPL9 challenge. The green arrows indicate the pairs for which we computed the RBFEs. The latter (in kcal/mol) are reported in green and red for the BAR and BAR* estimates, respectively (see below). Primary cycles for cycle closure assessment are shown in the highlighted green region.

HREM Simulations

HREM simulations have been performed for the 13 complexes Gx-WP6 and the 13 solvated ligand, for a total of 26 HREM runs. HREM uses n = 16 and n = 12 replicas for the bound and unbound states, respectively, with a maximum scaling factor of S = 0.1 (corresponding to a temperature of 3000 K) involving only the intrasolute bonded and nonbonded interactions.[34] The scaling factors along the replica progression are computed according to the protocol S = S( with m = 1, ..., n. HREM simulations were performed in a single parallel job, using the ORAC option BATTERY,[59,61] in eight replicates, each lasting 6 ns for the bound ligand and 2 ns for the ligand in bulk, for a total of 48 and 12 ns for the bound and unbound state, respectively. The exchange rates were in the range of 25%–50%. The HREM jobs for the bound state engaged 128 MPI processes, each using 6 OpenMP threads for a total of 768 cores. The jobs were completed within ∼8 wall clock hours on the CRESCO6[60] cluster. For the unbound state, we used 96 MPI processes with 6 threads each, for a total of 576 cores, taking a wall clock time of 1 h. On the Zenodo platform (https://zenodo.org/record/6385017), we provide the HREM-generated trajectories of the bound and unbound state for all 13 guest–host systems.

Preparation of the Initial States for the NE Dual Topology Runs

The dual topology end-states for the bound systems, {A(B)}b and {B(A)}b, are prepared by combining 196 phase-space points of the solvated complex taken from the 48 ns HREM, with an equivalent number of gas-phase configurations of the ghost molecule also sampled with HREM in a separate simulation (8 ns of the isolated molecule) and performed on a low-end workstation within <5 min. In each starting configuration for the bound state, the COM of the ghost molecule is made coincident with the COM of the fully coupled partner. The dual topology end-states for the ligand in bulk, {A(B)}u and {B(A)}u are prepared by combining 96 gas-phase configurations of the ghost molecules with 96 snapshots taken at regular intervals from the 16 ns HREM simulation of the fully coupled compound in bulk solvent.

NE-Alchemy Runs

Each of 196 NE-alchemy DT transitions for the complex lasted 0.72 ns with identical and simultaneous linear protocols for Lennard-Jones and electrostatic interactions and were run in both directions (i.e., {A(B)}b → {(A)B}b and {B(A)}b → {(B)A}b. During the alchemical transition, the COM–COM harmonic potential helps to maintain both molecules in the binding site (see section ). For the unbound arm, we produced 96 NE-alchemy DT runs, each lasting 0.36 ns in the forward and reverse direction. Therefore, for each couple AB, we run 392 NE-alchemical trajectories in the bound state, for a total simulation time of ∼280 ns and 192 trajectories in bulk for a total time of ∼70 ns. A parallel job for the bound state in one direction uses 1176 cores (196 MPI × 6 threads) delivering the work file within <2 wall clock hours on CRESCO6. For the unbound state, the job in one direction uses 576 cores (96 MPI× 6 threads) for 20 wall clock minutes. The computational demand of BAR-based (bidirectional) NE-RBFE is similar to that that of our ABFE SAMPL9 submisson,[47] where we used ∼400 annihilation (unidirectional) NE trajectories 1.44 ns long in the bound state.

Results

In Figure , we show the 13 guests and the WP6 host of the SAMPL9 challenge. In total, we have computed 18 RBFE, as indicated by the green arrows in Figure . In Table , for each pair, we report the Tanimoto coefficient[62] (computed using the PubChem fingerprints[63]), the volume and charge change, and the change in the number of rings.

Table 1

Tanimoto Coefficient (T), Volume Change (ΔV), Charge Change (ΔQ), and Change in the Number of Rings (Δnrings) in the RBFE Calculations

pair	T	ΔV(Å³)	ΔQ(e)	Δn_rings
g03 → g01	0.55	72.01	0	–1
g04 → g01	0.21	76.98	0	+1
g12 → g01	0.44	–39.90	–1	–3
g05 → g02	0.53	–41.91	0	–2
g10 → g02	0.37	12.58	0	+1
g04 → g03	0.15	4.97	0	+2
g05 → g03	0.38	–145.55	–1	–2
g07 → g03	0.61	4.29	0	+1
g11 → g03	0.40	9.54	0	+2
g13 → g03	0.17	–35.63	–1	0
g10 → g05	0.34	54.48	–1	+3
g11 → g06	0.50	30.97	0	+1
g12 → g06	0.43	–90.48	–1	–3
g08 → g07	0.21	–77.36	+1	0
g10 → g07	0.24	–95.36	–1	0
g09 → g08	0.22	35.17	–1	–3
g10 → g08	0.80	–18.00	–2	0
g12 → g11	0.29	–121.45	–1	–4

In typical ST-RBFE perturbations for isocharged congeneric series,[10,64] the mean Tanimoto coefficient is in the range of 0.85–0.98, and the volume change involves, in most cases, the volume of one single substituent (e.g., H → CH3). In this study, as can be seen from Table , each transformation is characterized, in most cases, by a Tanimoto coefficient T of <0.5.[65] When T > 0.5, as in g10 → g08 and g07 → g03, the transformation involves a charge change of 2e and a variation of the number of rings, respectively. Many of the transformations are characterized by at least one index of chemical dissimilarity, as measured by T, ΔV, and Δnrings.

Bidirectional and Unidirectional NE-RBFE Estimates

In Table , we report various estimates of the RBFE for the 18 pairs. The notation gx → gy signifies that the corresponding RBFE refers to the transmutation of gx into gy, with the DT scheme indicated as gx(gy) → (gx)gy. The quantity δGfs refers to the finite size correction under periodic boundary conditions and PME[56] when the transmutation involves two ligands with different charges. The finite charge corrections for the absolute binding free energy of the 13 guests are reported in ref (47). The raw values of the work for all the forward and reverse transformations involving the 18 pairs are provided on the Zenodo public repository (https://zenodo.org/record/6385017), along with a simple bash scripts for processing the raw data to yield bidirectional (BAR-based) or unidirectional (Jarzynski or Gaussian) estimates.

Table 2

Results of Free Energy Calculations for the AB Trasmutation for 18 Pairs of the SAMPL9 Challengesa

pair	BAR	BAR*b	δG_fs	J(FF)	J(RR)	J*(FR)	J*(FR)	G(FF)	G(RR)
g03 → g01	3.7 ± 0.6	3.4	0.0	2.6 ± 0.8	3.8 ± 1.6	2.0 ± 0.7	4.3 ± 1.3	0.5 ± 2.0	6.5 ± 2.0
g04 → g01	–0.4 ± 0.6	–0.5	0.0	–0.2 ± 0.7	–0.5 ± 1.1	–0.6 ± 1.0	–0.1 ± 0.7	–3.2 ± 2.0	3.3 ± 1.6
g12 → g01	8.6 ± 0.9	8.6	–2.3	9.0 ± 0.9	8.7 ± 1.2	8.8 ± 0.8	8.6 ± 1.2	3.7 ± 3.5	9.6 ± 1.9
g05 → g02	–5.1 ± 0.4	–5.3	0.0	–3.0 ± 0.4	–7.2 ± 0.8	–3.2 ± 0.6	–7.2 ± 0.7	–4.2 ± 0.8	–
g10 → g02	–1.4 ± 0.6	–1.6	0.0	–0.2 ± 1.3	–3.6 ± 1.0	0.3 ± 1.6	–4.2 ± 0.6	–2.8 ± 2.1	–0.2 ± 1.6
g04 → g03	–4.0 ± 0.3	–4.0	0.0	–4.0 ± 1.0	–5.0 ± 0.7	–4.3 ± 0.5	–5.0 ± 0.5	–5.6 ± 1.0	–4.1 ± 0.6
g05 → g03	–1.3 ± 0.7	–1.3	–2.3	1.3 ± 0.7	–2.2 ± 3.3	1.5 ± 0.7	–2.2 ± 2.7	0.9 ± 0.7	–
g07 → g03	–1.6 ± 0.3	–1.6	0.0	–2.6 ± 2.2	–2.1 ± 0.5	–2.5 ± 1.8	–2.7 ± 0.5	–3.3 ± 1.2	–0.7 ± 0.8
g11 → g03	–2.9 ± 0.3	–2.9	0.0	–2.3 ± 0.7	–3.7 ± 1.0	–2.6 ± 0.6	–3.6 ± 0.8	–3.7 ± 1.0	–3.0 ± 0.7
g13 → g03	–4.2 ± 0.4	–4.2	–2.3	–3.0 ± 0.6	–5.3 ± 0.8	–2.8 ± 0.8	–5.5 ± 0.6	–7.9 ± 1.5	–3.9 ± 1.2
g10 → g05	3.0 ± 0.5	3.0	0.0	3.5 ± 2.0	2.0 ± 1.3	3.6 ± 2.2	1.9 ± 1.2	–	2.4 ± 0.8
g11 → g06	–2.9 ± 0.3	–2.8	0.0	–1.8 ± 1.0	–3.2 ± 1.2	–1.6 ± 1.1	–3.1 ± 1.0	–3.9 ± 1.1	–2.5 ± 1.1
g12 → g06	6.4 ± 0.3	6.4	–2.3	7.7 ± 0.5	5.8 ± 0.9	7.9 ± 0.4	5.6 ± 1.1	5.8 ± 0.9	7.6 ± 1.2
g08 → g07	–3.1 ± 0.8	–3.5	2.0	–3.4 ± 2.7	–3.6 ± 1.3	–1.2 ± 2.2	–6.0 ± 1.8	–4.7 ± 2.1	2.9 ± 3.4
g10 → g07	2.7 ± 0.4	2.6	–2.3	4.8 ± 0.9	0.0 ± 0.6	4.8 ± 1.1	0.0 ± 0.4	3.2 ± 1.1	4.6 ± 1.6
g09 → g08	4.3 ± 0.8	4.3	–2.0	4.6 ± 1.7	3.8 ± 1.2	6.2 ± 1.1	1.5 ± 2.1	2.1 ± 2.0	6.9 ± 2.0
g10 → g08	5.7 ± 0.9	5.8	–4.3	7.2 ± 1.2	4.6 ± 2.1	8.3 ± 1.4	3.2 ± 2.0	4.4 ± 1.7	6.4 ± 2.3
g12 → g11	9.2 ± 0.4	9.2	–2.3	10.2 ± 0.6	9.2 ± 0.8	10.1 ± 0.6	9.3 ± 0.7	9.0 ± 0.9	11.0 ± 1.7

BAR* errors (not reported) are <0.1 kcal/mol in all cases. Nota bene: all reported estimates include δGfs.

Legend: BAR, Bennett acceptance ratio bidirectional estimate; BAR*, Bennett acceptance ratio bidirectional estimate using vDSSB; δGfs, finite size correction due to charge change (see text); J(FF), Jarzynski forward AB process; J(RR), Jarzynski reverse BA process; J*(FR), vDSSB estimate with convolution of bound forward and unbound reverse distributions; J*(RF), vDSSB estimate with unbound forward and bound reverse distributions; G(FF), Gaussian estimate forward; and G(RR), Gaussian estimate reverse. All estimates are in kcal/mol. Reported errors have been computed by bootstrap with resampling. BAR* errors (not reported) are <0.1 kcal/mol in all cases. Nota bene: all reported estimates include δGfs. BAR and BAR* are the two possible bidirectional estimates with the proposed approach. ΔΔGBAR corresponds to the difference of the bound and unbound arm perturbations (see Figure ), each computed using BAR on the forward and reverse transformation. ΔΔGBAR* is obtained by applying the BAR to the convolution of the forward and reverse vDSSB processes, i.e., J*(FR) and J*(RF) are unidirectional Jarzynski estimates based on the convolution of the work distribution of the forward bound process and the reverse unbound process and vice versa. G(FF) and G(RR) are unidirectional Gaussian estimated, computed separately on the two arms of the thermodynamic cycle. For the transformations involving g05, where this compound starts as a ghost, the distribution is markedly non-normal (see work distributions involving G5 in the SI) and no Gaussian estimate is reported. In the correlation plot of Figure , the two-arms standard BAR estimate is compared to the vDSSB-based bidirectional (BAR*) and unidirectional (J*(FR) and J*(RF)) estimates. Basically, BAR and BAR* are coincident.

Figure 3

vDSSB unidirectional and bidirectional estimates of the RBFE (see text) vs the BAR estimate.

vDSSB unidirectional and bidirectional estimates of the RBFE (see text) vs the BAR estimate. Both these estimates involve four transitions—two in the bound state and two in the unbound state—but they are not equivalent, as the overlap of the forward and reverse distribution in the convolution is smaller than that of the individual bound and unbound distribution, as can be seen in the example reported in Figure .

Figure 4

Work distributions in the G03 → G01 transformation.

Work distributions in the G03 → G01 transformation. The bound state forward work W(b) and the unbound state reverse work W(u) are two independent random variables. Hence, the mean value of their sum is given by the sum of the individual mean values, i.e., W̅vDSSB = W̅(b) + W̅(u) and W̅vDSSB = W̅(b) + W̅(u) and the variances are σ2(vDSSB) = σ2(b) + σ2(u) and σ2(vDSSB) = σ2(b) + σ2(u). Therefore, the corresponding convolution distributions PvDSSB(W) and PvDSSB(−W) have a tendency to widen and spread apart, reducing the overlap. However, accuracy is preserved, since the convolution is much more resolved than the individual distributions on the right-hand side (rhs) of eq . These facts are illustrated in Figure . BAR is notoriously the most accurate estimator in free-energy calculations. However, assessment of the unidirectional estimates is important, since these allow one to spare half of the computational time for NE alchemical stage. In this regard, J*(FR) does not coincide with J*(RF) as the dissipation can be different in the two senses (again, see Figure ). Generally, we note that J*(FR) and J*(RF) are overestimated and underestimated, with respect to the BAR estimates, respectively. By the same token, J(FF) and J(RR) (see Figure ) generally appear to overestimate and underestimate the BAR RBFE, respectively. The situation is reversed for the estimate G(FF) and G(RR) with the former being systematically underestimated, with respect to BAR. Evidently, our choice of the 18 transitions of Figure and the (arbitrary) assignment of their forward sense (see Table ) must have had a systematic impact on the unidirectional estimates.

Figure 5

(Upper panel) Correlation diagram between the BAR and J(FF) and J(RR) estimates. (Lower panel) Correlation diagram between the BAR and G(FF) and G(RR) estimates (see text).

(Upper panel) Correlation diagram between the BAR and J(FF) and J(RR) estimates. (Lower panel) Correlation diagram between the BAR and G(FF) and G(RR) estimates (see text). In Table , we have computed the Pearson correlation coefficient of the deviation from the BAR estimate of each of the unidirectional estimates of Table with the corresponding metrics for chemical dissimilarity given by the relative charge change (Cq = , the relative volume change (Cv = , and the Tanimoto coefficient T in the 18 alchemical transformations. The Tanimoto coefficient does not appear to be correlated to the observed discrepancies between bidirectional (BAR) estimate and unidirectional estimate. Some significant correlation is observed for the charge and for the volume change. Remarkably, for the latter, a moderate negative and weakly positive or null correlation is observed in the forward and reverse process, respectively. Stated in other terms, when the volume of the B ligand is much less than the volume of the A ligand, the unidirectional estimates has a tendency to deviate from the BAR estimate with a systematic sign. In the reverse direction, the correlation is significantly weaker, but the systematic deviation still remains. These data seem to suggest that volume and charge dissimilarity in the AB transition, rather than chemical fingerprints dissimilarity, are more likely to affect the accuracy of unidirectional estimates.

Table 3

Pearson’s Correlation Coefficient of the Deviation between Unidirectional and Bidirectional Estimates with the Charge, Tanimoto, and Volume Changesa in the 18 Transformations of Table

error	charge change, R(C_q)	change in Tanimoto coefficient, R(T)	volume change, R(C_v)	MSE (kcal/mol)
J*(FR)-BAR	0.57	–0.03	–0.60	1.01
J*(RF)-BAR	–0.54	0.08	0.24	–1.20
J(FF)-BAR	0.10	0.03	–0.57	0.77
J(RR)-BAR	0.09	0.07	0.24	–0.83
G(FF)-BAR	–0.04	0.10	–0.52	–1.36
G(RR)-BAR	0.47	–0.25	–0.13	1.49

See text.

See text. We conclude this section by comparing the results with vDSSB[47] for ABFE with the BAR-based RBFE reported in Table . vDSSB-ABFE are unidirectional estimates relying on the convolution of the bound state annihilation work distribution with the unbound state growth work histogram. The ABFE for the 13 guests of SAMPL9 are tabulated as the first entry in Table 2 in ref (47). They include the volume correction due to the receptor–ligand restraint (eq 8 in ref (47)) and the so-called finite-size correction (FSC)[56] that applies to charged ligands (eq 9 in ref (47)). For RBFE, while volume correction does not apply, FSC must be accounted for when, in the transmutation, the net charge of the ligand varies. Values of the relative FSC for all 18 transitions are reported in Table . The ABFE/RBFE correlation diagram is shown in Figure . ABFE and RBFE are strongly correlated (R = 0.98) exhibiting a mean unsigned error (MUE) of 1.5 kcal/mol, with most of the RBFE data lying in the 2 kcal/mol error region and within the ABFE error. In this regard, note that the mean error in the BAR-based RBFE is much less than that of the Jarzynski-based error of the ABFE. There is apparently no systematic bias between ABFE and RBFE, as testified by the negligible mean signed error (MSE) and by a best-fitting line with a ≈ 0.9 and b ≈ 0. Figure represents a strong mutual validation of the two NE-based techniques for ABFE[38,47] and RBFE calculation.

Figure 6

ABFE/RBFE correlation diagrams for the 18 transmutations of Table . White, green, and blue symbols indicate a charge change in the transmutation of 0, 1, and 2 electrons.

Cycle Closure Conditions

The so-called cycle closure condition[32,64] (CCC) represents a very stringent test for assessing the reliability and precision of RBFE calculations. The free-energy change in a cycle, regardless of the number of edges, should be zero. In Figure , the four primary three-edge cycles are highlighted in green. In two of these cycles (g01 → g03 → g04 → g01 and g10 → g02 → g05 → g10), the alchemical transmutations involve isocharged ligands with one and two net charges. In the other two cycles (g06 → g12 → g11 → g06 and g07 → g10 → g08 → g07), the ligand net charge may change by one or two electron units. Besides, CCC can be tested also on secondary cycles involving more than three edges. In Table , CCC has been tested using the BAR and BAR* bidirectional estimates for all possible primary (3 edges) and secondary cycles (4, 5, 6 edges) in the RBFE network of Figure . CCC is satisfied for the three-edge cycles with errors largely within the edge confidence intervals summed in quadrature. Remarkably CCC is still satisfied within the confidence interval in cycles up to six edges, irrespective of changes in chemical similarity, ligand charge, volume or number, and nature of rings.

Table 4

Cycles Closure Conditions in the Network of Figure

cycle	BAR (kcal/mol)	BAR* (kcal/mol)
g01 → g03 → g04 → g01	–0.1 ± 1.2	0.1
g10 → g02 → g05 → g10	0.7 ± 1.2	0.6
g06 → g12 → g11 → g06	0.0 ± 1.0	0.0
g07 → g10 → g08 → g07	–0.4 ± 1.4	–0.3
g12 → g01 → g03 → g11 → g12	1.3 ± 1.5	1.1
g03 → g05 → g10 → g07 → g03	0.6 ± 1.4	0.7
g03 → g05 → g02 → g10 → g07 → g03	1.2 ± 1.5	1.2
g03 → g05 → g02 → g10 → g08 → g07 → g03	–1.5 ± 1.9	–1.4

Comparison with Experimental RBFE

In Figure , we show the correlation plot between the 18 experimental[41] and calculated RBFE using NE-Alchemy. The latter include the finite-size correction δGfs reported in Table . Correlation is very good with a Pearson correlation coefficient of 0.82 and an MUE of 1.7 kcal/mol. In ref (47), we showed that the systematic overestimation of the absolute dissociation free energies of 2–3 kcal/mol was very likely, because of the large (+12 electrons) neutralizing background uniform charge distribution artificially causing the charged guest to favor the lower dielectric environment[66] (i.e., the WP6 cavity).

Figure 7

Correlation plot between the experimental[41] and calculated RBFE of Table . White, green, and blue symbols indicate a charge change in the tansmutation of 0, 1, and 2 electrons, respectively.

Correlation plot between the experimental[41] and calculated RBFE of Table . White, green, and blue symbols indicate a charge change in the tansmutation of 0, 1, and 2 electrons, respectively. For RBFEs, this systematic error largely cancels out, yielding an MSE of only 0.66 kcal/mol. Figure also shows that the only clear outlier involves g13 (Paraquat). The latter molecule was also an outlier in our SAMPL7 submission[36] for the cucurbituril-like open cavitand (CB-clip[67]) and was an outlier in our SAMPL9 submission,[47] once the absolute dissociation free energies were down-shifted by the systematic bias, because of the background plasma. As discussed in ref (36), the positive charge distribution delocalized on the aromatic rings of g13 (a peculiarity of this molecule, with respect to all other guests, both in CB-clip SAMPL7 and WP6 SAMPL9) is very likely systematically polarized by the electron-withdrawing groups decorating the rims of the WP6 or CB-clip hosts in the bound state, an effect that cannot be captured by a nonpolarizable FF such as ours. If we eliminate the outlier g13 → g03 from the set, the correlation between experimental and calculated RBFE becomes excellent (R = 0.90) with MUE and MSE dropping to 1.38 and 0.29 kcal/mol, respectively. In Figure , we finally report the comparison between experimental and calculated RBFE, taking g03 and g05 as references. These two guests are characterized by a strong chemical dissimilarity, with respect to volume, net charge, Tanimoto coefficient, and number of cycles. It should be stressed that most of the RBFEs refer, in this case, to indirect calculations involving two or more transmutations. For example, to arrive at g09, taking g03 as a reference, we have combined the results of four RBFEs as reported in the SI. The number flanking the symbols in the plot refers to the number of transmutations used in the final estimate (N). Note that the error bars in the calculations are generally higher the larger N is, as the final confidence interval is obtained by summing in quadrature the N errors. Remarkably, the agreement with experimental data is not significantly dependent on the reference choices, despite their chemical dissimilarity.

Figure 8

Calculated and experimental RBFE using G3 (left) and G5 (right) as references. R, MUE, and MSE have been computed by discarding the G13 outlier and the reference (see text). The number flanking the symbol represents the corresponding number of transmutations (shortest path) in the link (see Figure ). The 12 connection paths for calculated RBFEs are reported in the SI.

Conclusion

NE-RBFE, implemented via a straightforward dual topology bidirectional scheme, has been shown to provide accurate (BAR-based) results in the challenging SAMPL9 test bed, irrespective of the chemical distance between the two compounds. Cycle closure conditions involving compounds differing in charge, volume, and Tanimoto coefficient are satisfied within the confidence interval, irrespective of the number of edges in the cycle. These two remarkable features are not within reach of standard ST FEP-based RBFE calculation. Besides, according to ref (64), a ST-FEP-RBFE calculation between two strictly congeneric compounds requires, on per edge basis, a total of ∼500 ns in the bound state (3 ns on each of the 20 λ-states and 10 replicates for assessing the confidence intervals requiring 0.5 GB of disk space of storage for a post-processing BAR-based calculation. NE-RBFE requires, on per edge basis, <300 ns in total in the bound state, storing only the final work data (<2 kB) for later analysis. In NE-RBFE, there is no need of of introducing restrained dummy atoms, as done in ST-FEP-RBFE. As recently shown,[24] an inattentive choice of these restraints, by coupling the ligand–ligand internal coordinates, may produce a spurious contribution to the RBFE. The restraint in our approach is only between the COM of the two ligands with no impact, by design, on the internal degrees of freedom and on the final transmutation free energy as their cost cancels out at the end-states of the transformation. We have seen that NE-alchemy unidirectional estimates are accurate if the volume of the two compounds do not change much, even if the Tanimoto coefficient is well below 0.5. This gives the opportunity to avoid the HREM sampling on the arrival end-state and to cut the cost of NE-RBFE in half for nearly equal volume ligands, hence considerably accelerating hit-to-lead projects in drug design. In standard FEP-RBFE, slowly intercalating water molecules during the transition are a well-known limiting factor in convergence.[68] In NE-RBFE, this is no longer an issue, since the distribution of water molecules must be at equilibrium only at the end-states, which may be effectively sampled using efficient HREM schemes. In NE-RBFE, the alchemical path is crossed at fast speed and there is no need to canonically sample intermediate λ-states as in FEP-RBFE. Besides, in the dual topology NE-alchemical trajectories, thanks to the shifted potential of eq and to the COM–COM tethering potential, the ligand region is kept constantly devoid of water molecule if the two compounds have approximately the same volume and shape. When the volume or shape is different, water molecules may enter or not into the region made available by the annihilating partner, but this may have an impact on the dissipation observed in the final work distributions, largely tamed by the intrinsic accuracy of the vDSSB approach in the unidirectional estimates or by the intrinsic accuracy of the BAR (vDSSB-boosted or not) bidirectional approach, even when the overlap of the work distribution is negligible. We conclude by stressing that NE-RBFE can be implemented with little or no code modification in all popular MD programs that support HREM and NE alchemical transitions such as Gromacs, NAMD, or Amber. By affording the calculation of the RBFE for chemically distant (or scaffold-hopping-related) compounds in a matter of few hours on an HPC facility, NE-RBFE has the potential to become a powerful tool in the computer-based hit-to-lead drug discovery pipeline in industrial settings.

Data and Software Availability

PDB trajectory files, raw work data, and force field parameter files are available at the general-purpose open-access repository Zenodo (https://zenodo.org/record/6385017). The ORAC program (v6.1) is available for download under the GPL at the Web site http://www1.chim.unifi.it/orac/. Third-party software Autodock Vina can be downloaded from the Web site https://vina.scripps.edu/.

50 in total

1. Equilibrium free energies from nonequilibrium measurements using maximum-likelihood methods.

Authors: Michael R Shirts; Eric Bair; Giles Hooker; Vijay S Pande
Journal: Phys Rev Lett Date: 2003-10-02 Impact factor: 9.161

2. Good practices in free-energy calculations.

Authors: Andrew Pohorille; Christopher Jarzynski; Christophe Chipot
Journal: J Phys Chem B Date: 2010-08-19 Impact factor: 2.991

3. ORAC: a molecular dynamics simulation program to explore free energy surfaces in biomolecular systems at the atomistic level.

Authors: Simone Marsili; Giorgio Federico Signorini; Riccardo Chelli; Massimo Marchi; Piero Procacci
Journal: J Comput Chem Date: 2010-04-15 Impact factor: 3.376

4. Scaffold-hopping strategy: synthesis and biological evaluation of 5,6-fused bicyclic heteroaromatics to identify orally bioavailable anticancer agents.

Authors: Yen-Shih Tung; Mohane Selvaraj Coumar; Yu-Shan Wu; Hui-Yi Shiao; Jang-Yang Chang; Jing-Ping Liou; Paritosh Shukla; Chun-Wei Chang; Chi-Yen Chang; Ching-Chuan Kuo; Teng-Kuang Yeh; Chin-Yu Lin; Jian-Sung Wu; Su-Ying Wu; Chun-Chen Liao; Hsing-Pang Hsieh
Journal: J Med Chem Date: 2011-03-24 Impact factor: 7.446

5. The development of an Amber-compatible organosilane force field for drug-like small molecules.

Authors: Xue Dong; Xinghang Yuan; Zhenlei Song; Qiantao Wang
Journal: Phys Chem Chem Phys Date: 2021-05-26 Impact factor: 3.676

6. Statistical Mechanics of Ligand-Receptor Noncovalent Association, Revisited: Binding Site and Standard State Volumes in Modern Alchemical Theories.

Authors: Piero Procacci; Riccardo Chelli
Journal: J Chem Theory Comput Date: 2017-04-17 Impact factor: 6.006

7. Computing Relative Binding Affinity of Ligands to Receptor: An Effective Hybrid Single-Dual-Topology Free-Energy Perturbation Approach in NAMD.

Authors: Wei Jiang; Christophe Chipot; Benoît Roux
Journal: J Chem Inf Model Date: 2019-08-27 Impact factor: 4.956