Matteo Aldeghi1, Alexander Heifetz2, Michael J Bodkin2, Stefan Knapp3,4,5, Philip C Biggin1. 1. Structural Bioinformatics and Computational Biochemistry, Department of Biochemistry, University of Oxford , South Parks Road, Oxford OX1 3QU, U.K. 2. Evotec (U.K.) Ltd. , 114 Innovation Drive, Milton Park, Abingdon, Oxfordshire OX14 4RZ, U.K. 3. Structural Genomics Consortium, Nuffield Department of Clinical Medicine, University of Oxford , Old Road Campus Research Building, Roosevelt Drive, Oxford OX3 7DQ, U.K. 4. Target Discovery Institute, Nuffield Department of Clinical Medicine, University of Oxford , Roosevelt Drive, Oxford OX3 7BN, U.K. 5. Institute for Pharmaceutical Chemistry, Goethe University Frankfurt , 60438 Frankfurt, Germany.
Abstract
Binding selectivity is a requirement for the development of a safe drug, and it is a critical property for chemical probes used in preclinical target validation. Engineering selectivity adds considerable complexity to the rational design of new drugs, as it involves the optimization of multiple binding affinities. Computationally, the prediction of binding selectivity is a challenge, and generally applicable methodologies are still not available to the computational and medicinal chemistry communities. Absolute binding free energy calculations based on alchemical pathways provide a rigorous framework for affinity predictions and could thus offer a general approach to the problem. We evaluated the performance of free energy calculations based on molecular dynamics for the prediction of selectivity by estimating the affinity profile of three bromodomain inhibitors across multiple bromodomain families, and by comparing the results to isothermal titration calorimetry data. Two case studies were considered. In the first one, the affinities of two similar ligands for seven bromodomains were calculated and returned excellent agreement with experiment (mean unsigned error of 0.81 kcal/mol and Pearson correlation of 0.75). In this test case, we also show how the preferred binding orientation of a ligand for different proteins can be estimated via free energy calculations. In the second case, the affinities of a broad-spectrum inhibitor for 22 bromodomains were calculated and returned a more modest accuracy (mean unsigned error of 1.76 kcal/mol and Pearson correlation of 0.48); however, the reparametrization of a sulfonamide moiety improved the agreement with experiment.
Binding selectivity is a requirement for the development of a safe drug, and it is a critical property for chemical probes used in preclinical target validation. Engineering selectivity adds considerable complexity to the rational design of new drugs, as it involves the optimization of multiple binding affinities. Computationally, the prediction of binding selectivity is a challenge, and generally applicable methodologies are still not available to the computational and medicinal chemistry communities. Absolute binding free energy calculations based on alchemical pathways provide a rigorous framework for affinity predictions and could thus offer a general approach to the problem. We evaluated the performance of free energy calculations based on molecular dynamics for the prediction of selectivity by estimating the affinity profile of three bromodomain inhibitors across multiple bromodomain families, and by comparing the results to isothermal titration calorimetry data. Two case studies were considered. In the first one, the affinities of two similar ligands for seven bromodomains were calculated and returned excellent agreement with experiment (mean unsigned error of 0.81 kcal/mol and Pearson correlation of 0.75). In this test case, we also show how the preferred binding orientation of a ligand for different proteins can be estimated via free energy calculations. In the second case, the affinities of a broad-spectrum inhibitor for 22 bromodomains were calculated and returned a more modest accuracy (mean unsigned error of 1.76 kcal/mol and Pearson correlation of 0.48); however, the reparametrization of a sulfonamide moiety improved the agreement with experiment.
Binding selectivity is one of the most
important requirements for
the development of a safe therapeutic; in fact, large differences
in binding affinity between the intended target and similar proteins
are often sought in order to avoid unwanted and often unknown side
effects.[1] Conversely, in some cases a compound
with a specific promiscuous profile might be equally desirable for
reasons of efficacy.[2,3] Yet, the identification and design
of selective (or promiscuous) chemical tools is a difficult task,
in particular when working on protein families with conserved folds
and similar binding pocket residues. A typical example is provided
by the protein kinase family, for which different assay panels have
been developed in order to determine selectivity profiles of different
ligands.[4,5] The development of bromodomain (BRD) chemical
probes poses the same selectivity challenge due their conserved fold.[6] BRDs are epigenetic mark readers that specifically
recognize ε-N-lysine acetylation motifs; 61
human BRDs have been found in 46 human nuclear and cytoplasmic proteins,
and they are typically divided into eight families based on sequence
and structural similarity (Figure a).[7] With acetylation motifs
often found in macromolecular complexes implicated in chromatin remodeling,
DNA repair and cell-cycle control, and especially on histones, BRD
inhibitors are finding broad application in medicine and basic biological
research, in particular in oncology and inflammation. Despite their
sequence diversity, they all share a conserved fold (Figure b) that renders the design
of selective ligands a challenging task. Selective probes can, however,
lead to better preclinical target validation and profiling, and eventually
to better clinical outcomes.[6]
Figure 1
Ligands and
proteins considered in this study. (a) Phylogenetic
tree of the human bromodomain family; in green are the BRDs included
in the study. (b) Conserved fold of human bromodomains (PDB ID 2OSS). Depicted as red
spheres is the conserved water network found in most BRD binding pockets,
and as blue sticks the conserved hydrogen bond donor, an asparagine
residue in the majority of BRDs. A transparent surface shows the cavity
in between the ZA and BC loops forming the acetyl-lysine binding pocket.
Two clefts, placed in between the WP residues (conserved in subfamilies
I and II) and the ZA and BC loops and exploited by many bromodomain
inhibitors are indicated. (c) Chemical structures of the compounds.
Ligands and
proteins considered in this study. (a) Phylogenetic
tree of the human bromodomain family; in green are the BRDs included
in the study. (b) Conserved fold of human bromodomains (PDB ID 2OSS). Depicted as red
spheres is the conserved water network found in most BRD binding pockets,
and as blue sticks the conserved hydrogen bond donor, an asparagine
residue in the majority of BRDs. A transparent surface shows the cavity
in between the ZA and BC loops forming the acetyl-lysine binding pocket.
Two clefts, placed in between the WP residues (conserved in subfamilies
I and II) and the ZA and BC loops and exploited by many bromodomain
inhibitors are indicated. (c) Chemical structures of the compounds.Engineering selectivity adds considerable
complexity to the rational
design of new drugs, as it involves the optimization of multiple binding
affinities. Different strategies have been adopted for selectivity
design, such as shape and electrostatics complementarity, conformational
selection, water displacement, and allosteric binding. Yet, engineering
selectivity still remains a challenge for the drug design process.[1] Computationally, the prediction of binding selectivity
is a difficult task too, and not many generally applicable methodologies
are available to the computational and medicinal chemistry communities.
Henrich et al.[8] evaluated the performance
of target-specific scoring functions, developed from known sets of
binders, for three protein targets: thrombin, trypsin, and urokinase-type
plasminogen activator. The quality of the predictions appeared to
be system dependent and the authors discussed how inclusion of protein
flexibility and conserved waters might result in better docking methods.
Thanks to advances in theory and computing, however, the prediction
of binding affinities using physics-based biomolecular simulations
is becoming increasingly feasible. All-atom simulations naturally
take into account effects such as the dynamics of the binding pocket,
the role of the solvent, and entropy changes upon binding, which are
poorly captured with scoring-function-based approaches.[8,9] In particular, free energy calculations based on molecular dynamics
(MD) and alchemical pathways could offer an alternative and more rigorous
approach to the problem, and one that does not require the use of
training sets. Moreover, such calculations are becoming progressively
more accessible to current levels of computational power.[10,11] Recent studies have shown the potential of relative binding free
energy (RBFE) calculations for small-molecule lead optimization.[12−14] RBFE calculations can also allow the evaluation of free energy changes
upon protein mutation,[15−19] but they become unfeasible when a large number of residues differ
between the proteins. Absolute binding free energy (ABFE) calculations
are a more general approach as they do not require knowledge of any
affinity value in advance and, in principle, could be used to compare
the affinity of any ligand for any protein target. Such calculations
could thus be employed as a tool to probe ligand selectivity across
different binding pockets, in order to identify the scaffolds providing
the desired selectivity profile. For instance, Lin and co-workers
used absolute free energy calculations to explain the specificity
of Gleevec for Abl kinase versus similar homologous tyrosine kinases.[20−22] More recently, we have shown how calculations based on standard
implementations of alchemical transformations might be applied for
the accurate affinity estimation of diverse, drug-like organic molecules.[10]ABFE calculations are computationally
demanding and, despite the
few examples mentioned above, their performance has had only limited
testing against experimental data, with a focus on model systems.[23−26] Careful retrospective validation is therefore still needed before
the methodology can be confidently applied in a prospective drug discovery
scenario, with the ability to anticipate when these calculations might
be the most suitable tool and where potential pitfalls might lie.
Here, we test the robustness of ABFE calculations across different
binding pockets, in order to assess the accuracy of the computational
method against multiple proteins and thus evaluate its performance
and potential utility for the study of ligand selectivity. We calculated
the affinities for 36 complexes, involving 22 bromodomains and 3 ligands
(RVX-OH, RVX-208, and bromosporine; see Figure c), without any information on the affinities
or structures of the complexes. We compare the results with high-quality
isothermal titration calorimetry (ITC) data and discuss the performance
of the calculations in relation to their potential application in
drug design.
Methods
System Setup
Protein conformations were taken from
crystal structures as listed in Tables and 2 (more information on
the protein models is in Table S1). If
multiple chains were present in the asymmetric unit, only monomer
A would be kept for the calculations. Missing atoms in the crystals
were modeled with the WHAT-IF web interface[27,28] and all organic molecules were removed from the system, whereas
all crystallographic waters were retained. Proteins were then protonated
using the Gromacs tool (Pdb2gmx). The binding poses for RVX-208 and RVX-OH
were taken from a previous study,[10] where
the poses were identified via docking following the same procedure
here used for bromosporine. The preferred binding pose for bromosporine
was identified by docking the ligand in an apo X-ray structure of
BRD4(1) (PDB ID 2OSS) and then performing free energy calculations on the clustered poses.
For bromosporine and RVX-208, the highest affinity pose (as predicted
by the calculations) was assumed to be conserved and was modeled into
the pockets of the other bromodomains by structural alignment to BRD4(1)
with PyMOL. For RVX-OH, however, two alternative poses were tested
via free energy calculations for all bromodomains binding to this
ligand. When placing ligands into the pockets, crystallographic water
molecules within 1.4 Å of the organic molecule were removed in
order to avoid steric clashes. All other crystallographically resolved
water molecules were retained, so that the conserved network of water
molecules at the bottom of the cavities was preserved in the calculations.
After adding hydrogens with Maestro (v9.5, Schrödinger), the
ligands were parametrized with the general AMBER force field[29] (GAFF v1.7) using AmberTools14. Restrained electrostatic
potential (RESP) charges[30] (HF/6-31G*//HF/6-31G*)
for bromosporine were obtained with the PyRED server[31] using Gaussian 09 (Rev. D.01). RESP charges for the ligands
RVX-208 and RVX-OH were obtained with a geometry optimization followed
by a molecular electrostatic potential calculation, both at the HF/6-31G*
level of theory. ESP points were sampled according to the Merz–Kollman
scheme.[32,33] The RESP fit was done using antechamber in AmberTools14, while all QM calculations were performed in Gaussian
09 (Revision D.01). The PyRED server was not employed in the latter
cases due to its unavailability for a number of months in late 2015.
Gromacs topologies and coordinate files were generated from the Amber
ones using acpype (v.2014-08-27 Rev. 403).[34] The Amber99SB-ILDN force field was used for
the protein and the TIP3P model for water molecules.[35,36] The complexes were solvated in a dodecahedral box with periodic
boundary conditions and a minimum distance between the solute and
the box of 12 Å. Sodium and chloride ions were added to neutralize
the systems at the concentration of 0.15 M.
Table 1
Summary
of the Calculation Results
for RVX-OH and RVX-208, along with Information about the Experimental
Structural and Affinity Dataa
ΔGcalc (kcal/mol)
protein
ligand
PDB ID
pose
ΔGexp (kcal/mol)
H-pose
V-pose
ΔGcalc – ΔGexp (kcal/mol)
BRD2(1)
RVX-OH
4ALG
H
–8.5 ± 0.1
–9.8 ± 0.1
–6.5 ± 0.1
–1.3 ± 0.2
BRD2(2)
RVX-OH
4MR5
V
–8.8 ± 0.1
–7.6 ± 0.1
–7.9 ± 0.1
+0.9 ± 0.1
BRD3(1)
RVX-OH
3S91
H
–8.0 ± 0.1
–6.7 ± 0.1
–5.7 ± 0.1
+1.3 ± 0.2
BRD3(2)
RVX-OH
3S92
V
–8.8 ± 0.1
–6.6 ± 0.2
–7.4 ± 0.1
+1.5 ± 0.2
BRD4(1)
RVX-OH
2OSS
H
–9.0 ± 0.1
–9.9 ± 0.1
–6.6 ± 0.1
–0.9 ± 0.2
BRD4(2)
RVX-OH
2OUO
V
–8.8 ± 0.1
–5.0 ± 0.1
–8.5 ± 0.1
+0.3 ± 0.2
BRDT(1)
RVX-OH
4KCX
H
–7.7 ± 0.1
–7.7 ± 0.1
–6.3 ± 0.1
+0.0 ± 0.2
BRD2(1)
RVX-208
4ALG
V
–6.9 ± 0.1
n/a
–7.2 ± 0.1
–0.3 ± 0.2
BRD2(2)
RVX-208
4MR5
V
–8.5 ± 0.1
n/a
–7.5 ± 0.1
+0.9 ± 0.2
BRD3(1)
RVX-208
3S91
V
–7.1 ± 0.1
n/a
–5.5 ± 0.1
+1.6 ± 0.2
BRD3(2)
RVX-208
3S92
V
–8.8 ± 0.1
n/a
–8.7 ± 0.1
+0.1 ± 0.2
BRD4(1)
RVX-208
2OSS
V
–7.8 ± 0.1
n/a
–6.8 ± 0.1
+1.0 ± 0.2
BRD4(2)
RVX-208
2OUO
V
–9.1 ± 0.1
n/a
–9.7 ± 0.1
–0.6 ± 0.2
BRDT(1)
RVX-208
4KCX
V
–7.0 ± 0.1
n/a
–6.5 ± 0.1
+0.5 ± 0.2
ITC data were taken from Picaud
et al.[37] All uncertainties are one standard
error of the mean. ITC uncertainties are an estimate based on the
standard deviation of the ΔG values observed
in the ABRF-MIRG’02 inter-laboratory assessment.[38] Values for the difference ΔGcalc – ΔGexp may
appear inconsistent due to rounding. “PDB ID” refers
to RCSB Protein Data Bank code for the protein structure used for
the free energy calculations. “Pose” indicates in which
one of the two orientations the ligand is expected to bind given the
experimental evidence, where H is the horizontal pose and V is the
vertical pose. In square brackets are the 95% confidence intervals
of the statistics based on percentile bootstrap.
Table 2
Summary of the Calculation
Results
for Bromosporine, along with Information about the Experimental Structural
and Affinity Dataa
protein
family
PDB ID
ΔGexp (kcal/mol)
ΔGcalc (kcal/mol)
ΔGcalc – ΔGexp (kcal/mol)
ΔGcalcsulf (kcal/mol)
ΔGcalcsulf – ΔGexp (kcal/mol)
CECR2
I
3NXB
–10.7 ± 0.1
–11.5 ± 0.3
–0.8 ± 0.3
–10.6 ± 0.3
+0.1 ± 0.3
FALZ
I
3UV2
–6.8 ± 0.1
–9.8 ± 0.2
–3.0 ± 0.3
–9.2 ± 0.2
–2.4 ± 0.3
PCAF
I
3GG3
–7.0 ± 0.1
–10.7 ± 0.3
–3.7 ± 0.3
–9.6 ± 0.3
–2.5 ± 0.3
BRD2(1)
II
4ALG
–9.2 ± 0.1
–7.7 ± 0.3
+1.6 ± 0.3
–6.6 ± 0.3
+2.7 ± 0.3
BRD2(2)
II
4MR5
–9.6 ± 0.1
–9.7 ± 0.3
–0.0 ± 0.3
–8.8 ± 0.3
+0.9 ± 0.3
BRD3(1)
II
3S91
–9.1 ± 0.1
–7.2 ± 0.3
+1.9 ± 0.3
–6.1 ± 0.3
+3.0 ± 0.3
BRD3(2)
II
3S92
–9.6 ± 0.1
–11.7 ± 0.3
–2.0 ± 0.3
–11.0 ± 0.3
–1.3 ± 0.3
BRD4(1)
II
2OSS
–9.7 ± 0.1
–11.3 ± 0.3
–1.5 ± 0.3
–9.8 ± 0.3
–0.0 ± 0.3
BRD4(2)
II
2OUO
–9.8 ± 0.1
–10.0 ± 0.3
–0.2 ± 0.3
–9.2 ± 0.3
+0.6 ± 0.3
BRDT(1)
II
4KCX
–9.8 ± 0.1
–9.3 ± 0.3
+0.4 ± 0.3
–8.2 ± 0.3
+1.6 ± 0.3
CREBBP
III
4NYX
–7.6 ± 0.1
–10.8 ± 0.2
–3.1 ± 0.3
–10.3 ± 0.2
–2.6 ± 0.3
EP300
III
3I3J
–6.8 ± 0.1
–9.8 ± 0.3
–3.0 ± 0.3
–9.2 ± 0.3
–2.4 ± 0.3
BRD1
IV
5AME
–7.6 ± 0.1
–10.8 ± 0.2
–3.2 ± 0.3
–10.4 ± 0.2
–2.7 ± 0.3
BRD9
IV
4XY8
–9.7 ± 0.1
–12.0 ± 0.2
–2.3 ± 0.3
–11.1 ± 0.2
–1.4 ± 0.3
BRPF1B
IV
4LC2
–8.6 ± 0.1
–11.5 ± 0.3
–2.9 ± 0.3
–11.7 ± 0.3
–3.2 ± 0.3
BAZ2A
V
4QBM
–7.2 ± 0.1
–11.0 ± 0.3
–3.9 ± 0.3
–10.0 ± 0.3
–2.9 ± 0.3
TIF1
V
4YBM
–6.7 ± 0.1
–5.4 ± 0.3
+1.3 ± 0.3
–5.0 ± 0.3
+1.7 ± 0.3
TAF1(1)
VII
3UV5
–6.9 ± 0.1
–8.5 ± 0.3
–1.6 ± 0.3
–7.7 ± 0.3
–0.7 ± 0.3
TAF1(2)
VII
3UV4
–10.2 ± 0.1
–10.7 ± 0.3
–0.4 ± 0.3
–10.1 ± 0.3
+0.1 ± 0.3
TAF1L(2)
VII
3HMH
–9.7 ± 0.1
–10.5 ± 0.2
–0.8 ± 0.3
–10.1 ± 0.2
–0.4 ± 0.3
PB1(5)
VIII
3MB4
–6.4 ± 0.1
–7.0 ± 0.2
–0.6 ± 0.3
–6.5 ± 0.2
–0.1 ± 0.3
SMARCA4
VIII
2GRC
–6.2 ± 0.1
–6.7 ± 0.3
–0.5 ± 0.3
–5.6 ± 0.3
+0.6 ± 0.3
mean unsigned error
1.76 [1.66, 1.90] kcal/mol
1.54 [1.46, 1.68] kcal/mol
root mean square error
2.13 [2.03, 2.26] kcal/mol
1.88 [1.78, 2.02] kcal/mol
% ≤1.0 kcal/mol
36 [27, 41]
41 [32, 45]
% ≤2.0 kcal/mol
59 [50, 68]
59 [55, 64]
Pearson’s r
0.48 [0.41, 0.53]
0.43 [0.36, 0.49]
Spearman’s ρ
0.50 [0.41, 0.62]
0.46 [0.38, 0.58]
ITC data taken from Picaud et
al.[39] All uncertainties are one standard
error of the mean. ITC uncertainties are an estimate based on the
standard deviation of the ΔG values observed
in the ABRF-MIRG’02 inter-laboratory assessment.[38] ΔGcalc refers
to the calculated free energy values with the initial bromosporine
model; ΔGcalcsulf refers
to the model with modified torsional parameters for the benzensulfonamide
group. Values for the difference ΔGcalc – ΔGexp may appear inconsistent
due to rounding. In square brackets are the 95% confidence intervals
of the statistics based on percentile bootstrap.
ITC data were taken from Picaud
et al.[37] All uncertainties are one standard
error of the mean. ITC uncertainties are an estimate based on the
standard deviation of the ΔG values observed
in the ABRF-MIRG’02 inter-laboratory assessment.[38] Values for the difference ΔGcalc – ΔGexp may
appear inconsistent due to rounding. “PDB ID” refers
to RCSB Protein Data Bank code for the protein structure used for
the free energy calculations. “Pose” indicates in which
one of the two orientations the ligand is expected to bind given the
experimental evidence, where H is the horizontal pose and V is the
vertical pose. In square brackets are the 95% confidence intervals
of the statistics based on percentile bootstrap.ITC data taken from Picaud et
al.[39] All uncertainties are one standard
error of the mean. ITC uncertainties are an estimate based on the
standard deviation of the ΔG values observed
in the ABRF-MIRG’02 inter-laboratory assessment.[38] ΔGcalc refers
to the calculated free energy values with the initial bromosporine
model; ΔGcalcsulf refers
to the model with modified torsional parameters for the benzensulfonamide
group. Values for the difference ΔGcalc – ΔGexp may appear inconsistent
due to rounding. In square brackets are the 95% confidence intervals
of the statistics based on percentile bootstrap.
Docking
The ligands were docked
into an X-ray structure
of BRD4(1) (PDB ID 2OSS) using MOE (Chemical Computing Group, v2014.09). All water and organic
molecules were removed from the model, with the exception of five
water molecules at the bottom of the binding pocket, which are conserved
across most bromodomains. The ligands were initially drawn in 2D using
Marvin Sketch (ChemAxon, v6.1.0) and a stochastic conformational search
was performed in MOE for the generation of multiple 3D conformers.
The number of conformers generated was limited to one hundred and
duplicates ones (RMSD < 0.25 Å) were removed. Protein and
ligands were parametrized using the AMBER10:EHT force field option
in MOE. Thus, the protein parameters were taken from Amber ff99SB,[40] while the ligand bonded parameters were obtained
with 2D Extended Hückel Theory,[41] the VdW parameters were derived from GAFF,[29] and the charges from Bond Charge Increments.[42] Pharmacophores were used for the placement of the ligand
during docking. The pharmacophore query was built based on the structure
of bound acetyl-lysine, as found in a holo X-ray structure (PDB ID 3UVW), and consisted
of a hydrogen-bond acceptor site (mimicking the acetyl oxygen) and
a nonpolar site (mimicking the methyl moiety). The London ΔG function was used for the initial scoring of the poses,
and each binding pose was then minimized and rescored with the GBVI/WSA
ΔG function. Duplicate poses were removed based
on their hydrogen-bond and hydrophobic patterns using this option
in MOE. In addition, also poses with positive binding free energy
as predicted by the GBVI/WSA ΔG scoring function
were removed. The remaining poses were clustered by RMSD with a 3.0
Å cutoff in order to reduce the number of poses to test via free
energy calculations, while also retaining a diverse set of binding
orientations; in fact, similar binding orientations might interconvert
during the simulations, resulting in almost equivalent ensembles.
Finally, the best scoring poses within each cluster were selected
for free energy calculations. This is the same protocol adopted in
our previous study.[10]
Derivation
of Small-Molecule Torsional Parameters
The
parameters for the two different biaryl torsions present in bromosporine
and the two RVX ligands were optimized by least-squares fitting to
a QM torsion scan at the MP2/6-31G* level of theory (Gaussian 09;
Rev. D.01) performed every 5 degrees (Figure S1). Three torsion angles present in the benzensulfonamide moiety were
also later reparametrized specifically for bromosporine using the
program paramfit in AmberTools14.[43] Parameters were derived by fitting the MM energies to single-point
energy calculations at the MP2/6-31G* level of theory (Figure S2) performed with Gaussian 09 (Rev. A.02).
The parameter search was performed using the hybrid genetic algorithm
with a mutation rate of 0.1; all other input parameters were used
with their default values. A total of 514 conformations were generated
by systematic variation of the dihedral angles for the fitting of
3 torsions (4 terms per torsion) within the benzensulfonamide moiety.
Conformations with QM relative energies below 5 kcal/mol were assigned
weights of 1.0, conformations with energies between 5 and 10 kcal/mol
were assigned weights of 0.5, and conformations with energies above
10 kcal/mol were assigned weights of zero. Only the force constant
and the phase of the torsion were fit, while the periodicity was fixed.
The parameters for all these terms are shown in Figure S2.
Preliminary Molecular Dynamics
In
order to improve
the fit of the ligands into the binding pockets of the structures
used for the free energy calculations, a brief preliminary MD simulation
was run for each complex. All simulations were carried out in Gromacs
5.0.4 or Gromacs 5.1.[44] 10 000 energy
minimization steps were performed using a steepest descent algorithm
(the protein–ligand structures obtained after this step were
submitted to the CSM-Lig server for scoring).[45] The system was subsequently simulated for 0.5 ns in the canonical
ensemble and for 1 ns in the isothermal–isobaric ensemble with
harmonic position restraints applied to all solute heavy atoms with
a force constant of 1000 kJ mol nm. Temperature was coupled
using Langevin dynamics at the target temperature of 298.15 K, while
pressure was coupled using the Berendsen weak coupling algorithm with
a target pressure of 1 atm.[46−48] A 15 ns unrestrained run was
then performed in the NPT ensemble with the Parrinello–Rahman
pressure coupling algorithm.[49] The last
frame of the run provided the starting system conformation for the
free energy calculations. The last 5 ns of the unrestrained simulations
were also used to obtain equilibrium values for the protein–ligand
restraints used during the free energy cycle as described in the next
paragraph.
Free Energy Calculations
All calculations
were carried
out in Gromacs 5.0.4[44] (all input files
for the ABFE calculations are available as Supporting Information). The ligand van der Waals interactions were decoupled
and the charges annihilated using a linear alchemical pathway with
Δλ = 0.05 for the van der Waals and Δλ = 0.1
for the Coulombic transformations. For the addition of the ligand
restraints, 12 nonuniformly distributed λ values were used (0.0,
0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.5, 0.75, 1.0). A
total of 42 windows for the complex simulations and 31 windows for
the ligand simulations were therefore employed. For each window, 10 000
energy minimization steps were carried out using a steepest descent
algorithm. The system was subsequently simulated for 0.5 ns in the
canonical ensemble with harmonic position restraints applied to the
solute heavy atoms with a force constant of 1000 kJ mol nm.
Temperature was coupled using Langevin dynamics with 298.15 K as the
reference temperature.[47,48] A 1 ns position restrained run
in the isothermal–isobaric ensemble was then performed using
the Berendsen weak coupling algorithm and with a target pressure of
1 atm.[46] Several 15 ns unrestrained production
runs were performed for data collection using Hamiltonian-exchange
Langevin dynamics in the NPT ensemble with the Parrinello–Rahman
pressure coupling scheme.[49] A total of
3 million swaps between any state pair were attempted every 1000 time
steps, following the Gibbs sampling scheme proposed by Chodera and
Shirts.[50] The relative position and orientation
of the bound ligand with respect to the protein was restrained by
means of one distance, two angles, and three dihedral harmonic potentials
with force constant of 10 kcal mol Å [rad]. The equilibrium distance and angles for the restraints
were taken from the average values of the last 5 ns of the 15 ns preliminary
MD simulations. The analytical part of the contribution of this set
of restraints to the free energy can be evaluated as described by
Boresch et al.[51] The equation used to evaluate
this contribution also includes a correction for the standard-state
dependence of the binding free energy.[51] A soft-core potential was employed for the van der Waals interactions
transformed.[52] In all simulations, the
particle mesh Ewald (PME) algorithm[53] was
used for electrostatic interactions with a real space cutoff of 12
Å, a spline order of 6, a relative tolerance of 10, and a Fourier spacing of 1.0 Å. The
Verlet cutoff scheme was used with a van der Waals interaction cutoff
of 12 Å and a buffer tolerance of 0.005 kJ mol–1 ps–1.[54] The P-LINCS
constraint algorithm was used only on H-bonds.[55] The Gromacs long-range dispersion correction for energy
and pressure was used, and an additional numerical long-range dispersion
correction (EXP-LR) was applied as described by Shirts et al.[56] to address the failure of the isotropic assumption
of the analytical correction during simulations of protein–ligand
complexes. No correction for box-size effects was applied as it has
been shown that, contrary to charged ligands, such effects are negligible
for neutral solutes.[57]Relative free
energy calculations were performed in order to estimate the effect
of the sulfonamide torsional parameters on the calculated binding
free energies for bromosporine. The force constant and phase of the
dihedrals were interpolated between the two models using 12 uniformly
distributed λ values. The equilibration and production phases
were carried out as per the protocol described above for absolute
free energy calculations. No restraints have been used in this calculations,
and all ligand atoms were interacting with the environment during
the transformations. For the ligand in solution, five repeats were
carried out, while for the protein–ligand complexes simulations
a single repeat was performed.
Data Analysis
Free energy estimates were obtained with
the implementation of the multiple Bennet acceptance ratio (MBAR)
provided with the python package pymbar (https://github.com/choderalab/pymbar) and using the alchemical analysis tool (https://github.com/MobleyLab/alchemical-analysis).[56,58] The data from each lambda state were subsampled
in order to include only uncorrelated data-points by calculating the
autocorrelation time and statistical inefficiency of the derivative
of the potential energy with respect to lambda (dhdl). Only the dhdl components that were changing at
a particular lambda state were used as the observable for the calculation
of the autocorrelation times.[59] The equilibrated
region was determined by visualizing the forward and reverse convergence
plots as suggested by Klimovich et al.[58] Multiple such plots were generated by discarding data at increments
of 1 ns, from a minimum of 1 ns to a maximum of 12 ns, and identifying
the equilibration time producing a convergence plot with the least
drift in free energy in any one direction. The uncertainty of the
free energy estimate was taken from the MBAR asymptotic variance-derived
uncertainty as the square root of the statistical variance of the
total free energy. For the free energy of decoupling the ligand from
solution, the mean and sample standard deviation of multiple repeats
were used. For bromosporine and RVX-OH, the five initial conformations
of the ligand in solution were determined by the five docked poses;
for RVX-208, which only had two poses returned by the docking and
clustering procedure,[10] another three snapshots
were extracted from the simulations in order to provide three additional
random starting conformations. The final binding free energy is the
difference between the decoupling of the ligand from the solution
and from the solvated complex; the final uncertainty in the binding
free energies is thus the root sum square of the uncertainties of
ligand and complex calculations. The binding free energies for the
bromosporine models with modified torsional parameters were obtained
by adding the relative free energy values to the absolute free energy
ones that were predicted with the original model. The final uncertainty
was derived as the root sum square of the standard errors of the absolute
and relative free energies. The 95% confidence intervals (CIs) for
the performance statistics (mean unsigned error (MUE), root-mean-square
error (RMSE), Pearson product-moment correlation (rp), Spearman rank correlation (rs)) were calculated with percentile bootstrap,[60] after building 105 bootstrap samples by resampling
the experimental and calculated free energies according to their mean
and standard errors, assuming normality. Throughout the text we report
the 95% CI in square brackets immediately after the statistics.
Results
Predicting the Different Poses and Selectivity of Similar Ligands
RVX-208 is an orally available small molecule that is currently
being investigated in clinical trials for the treatment of atherosclerosis,
diabetes, and cardiovascular diseases. It has been found to inhibit
the bromo and extraterminal (BET) family of bromodomains.[37,61,62] RVX-208 shows a slight selectivity
(from about 8- to 21-fold in affinity) for the second (BD2) versus
the first bromodomains (BD1) of the BET family.[37] RVX-OH is the synthetic precursor of RVX-208 and also targets
the BET bromodomain family; however, it does not show the same preference
for the second bromodomains. The compound in fact rearranges in the
binding pockets of the first BET bromodomains and binds with a different
pose as compared to RVX-208, where it occupies the WP/ZA shelf (see Figure b for the bromodomain
structure). In such a way, RVX-OH achieves similar affinities for
both first and second BET bromodomains, abrogating the subtle selectivity
observed for RVX-208. The crystallographically observed binding poses
for RVX-208 and RVX-OH in the binding pockets of a first and second
BET bromodomains are shown in Figure . For ease of description, and consistently with the
orientation of the figures in this text, we denote the binding orientation
of RVX-OH in BD2 pockets as the “vertical” V-pose, and
the binding orientation of RVX-OH in BD1 pockets as the “horizontal”
H-pose, despite such naming not having any physical or biological
meaning. Note that it is not possible for RVX-208 to bind in the H-pose,
due to the presence of an additional hydroxyethyl group that would
clash with the protein. These ligands thus represent an interesting
case scenario to test the ability of the calculations to identify
both the binding modes and slight selectivity of a ligand for multiple
proteins. In this study, the 7 BET BRDs (out of 8) with at least a
crystal structure available in the RCSB Protein Data Bank (PDB) and
affinity data determined by ITC were considered.
Figure 2
Comparison of the X-ray
structures of RVX-208 and RVX-OH in the
first (BD1) and second (BD2) BET bromodomains binding pockets, and
the corresponding docked structures. RVX-208 binds with the same pose
to both BD1 and BD2, while RVX-OH adopts two distinct binding orientations.
The two ligands have been docked into an apo structure of BRD4(1)
(PDB ID 2OSS), and the docking poses best representing the crystal structures
are shown. These poses are taken from previous work.[10] BD1s are represented by white cartoons, and BD2s by gray
cartoons; RVX-208 is highlighted in light blue, and RVX-OH in light
green; the conserved Asn residue and network of waters are shown as
reference points.
Comparison of the X-ray
structures of RVX-208 and RVX-OH in the
first (BD1) and second (BD2) BET bromodomains binding pockets, and
the corresponding docked structures. RVX-208 binds with the same pose
to both BD1 and BD2, while RVX-OH adopts two distinct binding orientations.
The two ligands have been docked into an apo structure of BRD4(1)
(PDB ID 2OSS), and the docking poses best representing the crystal structures
are shown. These poses are taken from previous work.[10] BD1s are represented by white cartoons, and BD2s by gray
cartoons; RVX-208 is highlighted in light blue, and RVX-OH in light
green; the conserved Asn residue and network of waters are shown as
reference points.The docking
pose for RVX-208 was taken from a previous study, in which absolute
free energy calculations were shown to be able to identify the crystallographic
pose for this ligand in BRD4(1).[10] Similarly,
the two docking poses for RVX-OH corresponding to the V- and H-poses
were taken from the same study; in this case, too, it was shown how
the calculations identified the correct pose for BRD4(1). Here, however,
we were interested in assessing whether the calculations could predict
the difference in its binding orientation between BD1s and BD2s; therefore,
we rescored both the V-pose and H-pose for all 7 BRDs considered.
The relevant docking poses are shown in Figure . The two poses for RVX-OH, and the one for
RVX-208, were modeled into the structures of the other six BET bromodomains
considered. In order to allow the binding pockets to relax around
the ligands, a short (15 ns) MD simulation of the complexes was run
before performing the free energy calculations.Given the two potential
binding poses of RVX-OH for the BET bromodomains,
the calculations were run starting from both orientations for all
the seven BET bromodomains considered (no structure is yet available
for BRDT(2)) in order to test whether the computational protocol could
reliably identify the correct pose for the first (H-pose) versus second
(V-pose) BET bromodomains. Table summarizes the results of the calculations obtained
for RVX-OH, and shows how the orientation that would be expected to
be the more stable based on the crystallographic evidence was always
predicted to have a more negative binding free energy value, i.e.
a higher affinity, than the other competing orientation. For each
bromodomain/RVX-OH complex we then took the higher affinity pose as
being the most stable orientation, and compared the corresponding
binding free energy value to the ITC data. As previously mentioned,
RVX-208 can bind only in one orientation (V-pose), thus this exercise
was not carried out for this ligand. However, it needs to be kept
in mind that it is the retrospective nature of this study that has allowed
us to confidently make such an assumption. In a prospective drug discovery
scenario, it is likely that limited information will be available
on the compound being investigated, so that one might need to be more
careful when assuming a conserved pose across different proteins.Table and Figure summarize the correspondence
of the computational predictions to the ITC data for the two ligands
together (full breakdown of free energy terms in Table S2). It is possible to see how, in this case, the predictions
reproduced experimental values in absolute terms extremely well. All
of the predicted binding free energies were within 2 kcal/mol of the
ITC values, and about two-thirds were within 1 kcal/mol. This resulted
in low MUE and RMSE: 0.81 [0.74, 0.90] kcal/mol and 0.95 [0.87, 1.05]
kcal/mol, respectively. The correlation with ITC values was particularly
good too for absolute calculations, with a Pearson correlation (rp) of 0.75 [0.67, 0.80] and a Spearman (rs) of 0.78 [0.64, 0.85]. Considering RVX-208
alone, the calculations achieved rp =
0.84 [0.76, 0.90] and rs = 0.75 [0.71,
0.93]. On the other hand, when considering RVX-OH only, lower correlation
values with larger uncertainties were returned (rp = 0.49 [0.27, 0.66] and rs = 0.46 [0.14, 0.71]) despite the calculations performing well in
terms of absolute errors for both ligands. This is expected given
the small dynamic range of 1.3 kcal/mol when considering RVX-OH alone.
Figure 3
Scatter plot of calculated
versus experimental affinities for the
compounds RVX-208 and RVX-OH. The shaded gray areas indicate where
the 1 and 2 kcal/mol error boundaries lie. On the right-hand side
are the distributions of the calculated binding free energies for
the two ligands, binding to BD1s (light blue and light green) and
BD2s (dark blue and dark green); Gaussian curves have been fitted
to the data.
Scatter plot of calculated
versus experimental affinities for the
compounds RVX-208 and RVX-OH. The shaded gray areas indicate where
the 1 and 2 kcal/mol error boundaries lie. On the right-hand side
are the distributions of the calculated binding free energies for
the two ligands, binding to BD1s (light blue and light green) and
BD2s (dark blue and dark green); Gaussian curves have been fitted
to the data.If the predictions were grouped into first [BRD2(1), BRD3(1), BRD4(1),
BRDT(1)] and second [BRD2(2), BRD3(2), BRD4(2)] BET bromodomains,
it would be possible to anticipate the slight selectivity of RVX-208
for BD2s, as one can visually gather from Figure . In fact, the sample mean and standard deviation
of RVX-208 binding free energies for BD1s were of −6.5 ±
0.7 kcal/mol, while for BD2s of −8.6 ± 1.1 kcal/mol. A
two-tailed t test returns a p-value
of 0.03, which would suggest that the predicted difference in affinity
between the two groups is unlikely to be caused by chance. Contrariwise,
RVX-OH was predicted to have an average binding free energy of −8.5
± 1.6 kcal/mol for BD1s and of −7.9 ± 0.6 kcal/mol
for BD2s, showing a larger overlap of their distributions and consequently
returning a p-value of 0.57 when applying a two-tailed t test, correctly suggesting the probable lack of any preference
for either the first or second BET bromodomains.For comparison, we used
a recent machine learning scoring function
to test whether similar results might have been achievable with a
much cheaper computational method. More specifically, we used CSM-Lig,
a machine-learning method that relies on graph-based signatures and
showed a performance statistically comparable to RF-Score[63,64] and superior to 18 other common scoring functions.[45] The input protein–ligand structures were the docking
poses modeled in all BRDs considered after minimization of the whole
system in explicit solvent using Gromacs. The scoring function identified
the correct pose for RVX-OH five out of seven times, and returned
more modest correlation with experimental affinities (rp = 0.44 and rs = 0.35). Furthermore,
the predictions systematically underestimated the binding for these
two ligands (Table S5), returning large
values for the MUE and RMSE of 5.82 and 5.86 kcal/mol, respectively.
No prediction was within 2 kcal/mol of the experimental value. Figure S3 plots the predicted versus experimental
binding free energies and shows how the scoring-based predictions
deviated substantially from the line of identity.
Reproducing
the Affinity Profile of a Broad-Spectrum Inhibitor
Bromosporine
(Figure c) is a broad-spectrum
bromodomain inhibitor that binds across all
bromodomain families with a relatively wide range of affinities (from
∼25 μM to ∼10 nM; from −6.2 to −10.7
kcal/mol in binding free energy terms). As affinities for a number
of BRDs have been experimentally determined by ITC (Table ),[39] it is an ideal system to test the ability of free energy calculations
to capture the affinity profile of a ligand across many similar binding
pockets. As for RVX-OH and RVX-208, all bromodomains with at least
a crystal structure available in the PDB and affinity data determined
by ITC were considered, for a total of 22 bromosporine/BRD pairs.At the outset of this study, no structure of bromosporine in complex
with a bromodomain had been deposited in the PDB. We therefore docked
the ligand in the binding pocket of BRD4(1) in order to determine
its likely binding pose. Five different clusters of poses were identified
and free energy calculations were performed on one pose from each
cluster in order to determine the most stable orientation. These were
distinct poses that did not interconvert within the simulations time
scale. Figure provides
an overview of these docking poses, ranked by their predicted affinity.
In July 2015 the first bromosporine structure in complex with a human
bromodomain (BRPF1B, family IV) was released on the PDB (PDB ID 5C7N),[65] so that we were able to compare the docked poses to this
experimentally resolved structure. Figure a shows the superimposition of the docking
pose with highest predicted affinity in BRD4(1) and the X-ray structure
of bromosporine in complex with BRPF1B. The predicted pose resembled
well the crystallographic orientation, with a RMSD of 2.3 Å,
mainly due to a rotation of the ethyl carbamate side chain. The fact
that bromosporine was shown to bind to BRPF1B in the same fashion
as predicted for BRD4(1) also gave us confidence that the binding
mode for this ligand was conserved across bromodomain families. The
second pose (Figure c) was close to the experimental structure as well (RMSD of 3.4 Å),
but with the sulfonamide group directed toward the solvent rather
than making contacts with the protein. The remaining three poses (Figure d–f) were
bound in substantially different orientations as compared to the first
two docking poses and the crystallographic pose, as indicated by the
larger RMSDs of 4.6, 5.0, and 6.2 Å. The binding affinity of
bromosporine for BRD4(1) was, however, overestimated by over 1 kcal/mol:
the calculated free energy of the most stable pose was predicted to
be −11.3 ± 0.3 kcal/mol, while the ITC measure is of −9.7
± 0.1 kcal/mol. Correspondingly, the docking poses c, d, and
e are unexpectedly assigned large affinities while binding very differently
from the X-ray pose.
Figure 4
Docking poses for bromosporine in the binding pocket of
BRD4(1).
(a) Comparison of the pose with highest predicted affinity to the
X-ray structure of bromosporine in complex with BRPF1B (PDB-ID 5C7N).
(b–e) The five diverse bromosporine poses suggested by docking,
with their predicted binding affinity and RMSD with respect to the
binding mode found in the crystal structure.
Docking poses for bromosporine in the binding pocket of
BRD4(1).
(a) Comparison of the pose with highest predicted affinity to the
X-ray structure of bromosporine in complex with BRPF1B (PDB-ID 5C7N).
(b–e) The five diverse bromosporine poses suggested by docking,
with their predicted binding affinity and RMSD with respect to the
binding mode found in the crystal structure.After establishing the most likely binding orientation of
bromosporine
in BRD pockets via docking, the ligand was modeled into the X-ray
structures of the other 21 bromodomains, from 7 of the 8 bromodomain
families. As for the RVX ligands, a short MD simulation of the complexes
was run before performing free energy calculations for all the bromosporine/BRD
pairs. The results of the calculations are summarized in Table (full breakdown of
free energy terms in Table S3) and Figure a. A reasonable overall
agreement with experiment in terms of absolute values was achieved,
with MUE of 1.76 [1.66, 1.90] kcal/mol and RMSE of 2.13 [2.03, 2.26]
kcal/mol. Roughly a third of the results were within 1 kcal/mol of
the ITC values, and two-thirds within 2 kcal/mol. However, this left
another third of the results being off by at least 2 kcal/mol, which
corresponds to about a 30-fold error in the dissociation constant.
The correlation with experiment was reasonable too, but not as strong
as for RVX-OH/RVX-208, with rp = 0.48
[0.41, 0.53] and rs = 0.50 [0.41, 0.62].
The low number of samples from each bromodomain family hampered a
meaningful statistical analysis, and in fact a one-way analysis of
variance revealed that the error distribution of each subfamily did
not differ in a statistically significant manner from any other subfamily.
However, there seemed to be a pattern where calculations involving
bromodomains from subfamilies II, VII, and VIII, returned better results
than other subfamilies.
Figure 5
Scatter plots of calculated versus experimental
binding free energies
for bromosporine. (a) Calculated binding free energies for the initial
bromosporine model. (b) Calculated binding free energies for the bromosporine
model with optimized benzensulfonamide torsions. The shaded gray areas
indicate where the 1 and 2 kcal/mol error boundaries lie.
Scatter plots of calculated versus experimental
binding free energies
for bromosporine. (a) Calculated binding free energies for the initial
bromosporine model. (b) Calculated binding free energies for the bromosporine
model with optimized benzensulfonamide torsions. The shaded gray areas
indicate where the 1 and 2 kcal/mol error boundaries lie.Overall, in this set of predictions, the binding
free energies
are overestimated; in fact, the mean signed error was of −1.29
[−1.42, −1.17] kcal/mol. As we were interested in identifying
possible sources for this issue, we evaluated the effect of the charge
model on the results by carrying out relative calculations where RESP
charges were transformed into AM1-BCC ones (data not shown). However,
this change aggravated the overestimation issue, resulting in higher
MUE and RMSE (2.15 and 2.54 kcal/mol, respectively) and a more negative
mean signed error (−2.00 kcal/mol), while leaving the correlations
almost unvaried (rp = 0.46 and rs = 0.48). We then focused on the parameters
of the soft dihedrals present in the molecule, since such terms are
known to have limited transferability across different molecules and
might lead to inaccurate sampling of ligand conformations.[66] In particular, chemical groups such as sulfonamides
are especially challenging when considering that also quantum effects
like the interaction of the nitrogen lone pair with antibonding orbitals
involving the sulfur affect the torsional energy around the N–S
bond.[67,68] For these reasons, we decided to optimize
the torsional parameters that determine the ensemble of conformations
explored by the benzensulfonamide moiety in bromosporine. These dihedral
terms were optimized specifically for bromosporine using the program paramfit(43) where the target energies
were obtained with single-point energy calculations at the MP2/6-31G*
level of theory. The parameters derived in such a way are specific
for the bromosporine model we used, and provided a better set of parameters
in terms of agreement with QM energies in vacuo (Figure S2). In order to estimate the effect of
the new set of torsional parameters on the affinity predictions, RBFE
calculations were carried out. The predicted binding free energies
for the optimized model were derived by adding the ΔΔG obtained from the RBFE calculations, to the ΔGcalc values obtained with ABFE calculations
for the initial model. Table and Figure b summarize the results obtained for the bromosporine models with
modified torsional parameters for the benzensulfonamide group (full
breakdown of the free energy terms in Table S4). It is possible to note how the optimization of the sulfonamide
moiety had a positive effect on the absolute errors of the calculations,
with MUE decreasing to 1.54 [1.46, 1.68] kcal/mol and RMSE to 1.88
[1.78, 2.02] kcal/mol. The percentage of predicted binding free energy
values with error relative to ITC of less than 1 kcal/mol also increased
slightly, while the percentage of less than 2 kcal/mol remained constant.
Importantly, the strong overestimation problem was alleviated, so
that the errors were distributed more symmetrically around zero, with
a mean signed error of −0.53 [−0.65, −0.41] kcal/mol.
On the other hand, unexpectedly, the correlation between calculated
and experimental affinities slightly deteriorated (rp = 0.43 [0.36, 0.49] and rs = 0.46 [0.38, 0.58]). We further proceeded to derive optimized parameters
for other soft torsions found in bromosporine: the ones found in the
ethyl carbamate side chain, and again the one between the two aromatic
rings. However, new parameters for these dihedrals did not result
in improved agreement with experiment in terms of absolute error or
correlation (data not shown). It thus appeared that the sulfonamide
parameters only (among the one tested) were contributing toward the
systematic overestimation of the binding free energies for bromosporine.
Nonetheless, the modification did not completely solve the overestimation
issue, and the overall accuracy of the predictions for this ligand
was still modest as compared to what achieved with the two RVX ligands,
and what previously reported for the binding of 11 other compounds
to BRD4(1).[10] This suggests there probably
are other factors contributing toward the errors in the predictions
for this test case, which we have not yet been able to identify.As it was done for RVX-OH and RVX-208, the ABFE calculations were
also compared to a computationally cheaper, yet state of the art,
scoring function.[45] In this case too, the
CSM-Lig method returned lower correlation to experiment as compared
to ABFE, with rp = 0.22 and rs = 0.19 (Figure S3). Moreover,
the predictions deviated substantially from the ITC measurements,
returning a MUE of 5.38 kcal/mol and a RMSE of 5.62 kcal/mol. However,
contrary to the case in RVX-OH and RVX-208, for bromosporine this
approach overestimated binding, with most binding free energies being
around −14 kcal/mol (Table S6).
As for the RVX compounds, none of these predictions was within 2 kcal/mol
of the experimentally measured binding free energy.
Discussion
Here, we presented two test cases in which the prediction of the
different affinities for single ligands binding to multiple bromodomains
was attempted. Excellent agreement with experimental data was achieved
for the compounds RVX-OH and RVX-208, while the broad-spectrum inhibitor
bromosporine proved to be more challenging. It is important to note
how the quality of the ligand parameters might vary largely between
different small molecules depending on the chemical groups present.
The fact that reparameterization of the soft torsions in the benzensulfonamide
moiety for the bromosporine model alleviated the systematic overestimation
of binding affinities corroborates the hypothesis that transferability
of dihedral parameters in small molecules can be an issue. Parameterization
approaches where some of the ligand parameters, such as charges and
torsions, are derived on an ad hoc basis for the
specific molecule at hand might provide more accurate organic molecule
models for the study of dynamics and binding free energies.[66] Certainly, this is not the only issue with classical
force field models; nonetheless, it is one that can be readily tackled
without modification of current functional forms. Small-molecule force
fields are also relatively young when compared to protein force fields,
with much room for improvement. As demonstrated by the recent OPLS3
force field, little adjustments such as off-site charges and careful
parametrization that takes advantage of increasing availability of
experimental data can provide better models of protein–ligand
interactions within the framework of standard functional forms.[69] In this regard, also work on host–guest
systems, for which highly converged calculations can be obtained,
will prove valuable for the parametrization of the next generations
of small-molecule force fields.[70,71]For bromosporine,
even with the optimization of torsional parameters,
reasonable but not excellent agreement with ITC data was eventually
obtained, in particular in terms of correlation. This was caused by
large (>2 kcal/mol) overestimation or underestimation of the binding
free energy in a number of cases. The identification of the causes
for such disagreement with ITC proved challenging, as a large number
of potential reasons can be conceived. The imprecision of the calculations
alone cannot justify the magnitude of such errors (see SI Text T1 for a more detailed analysis of the
uncertainty estimates, and SI Text T2 for
a more comprehensive discussion on the possible sources of error).
However, severe sampling issues, such as the failure to access relevant
states (e.g., side-chain rotamers
or loop motions), might contribute toward the errors.[25,72] Identifying specific sampling deficiencies on a case-by-case basis
can, however, be both challenging and impractical, in particular with
the prospect of being able to apply these calculations to large numbers
of protein–ligand pairs. The combination of free energy calculations
with enhanced sampling techniques might thus be beneficial, as it
does not require the identification of specific degrees of freedom
that have been under-sampled.[12,73,74] Force field inaccuracies can also have a large effect on the calculations.
Aside from bonded terms, inaccuracies in the nonbonded parameters
can cause over/under-estimation of interaction energies and consequently
protein–ligand affinities. While parameters such as the Lennard-Jones
can be improved with larger sets of experimental data,[75,76] effects such as polarization, which can also affect the results
of the calculations across different proteins, will likely need a
more substantial alteration of the current models.[77−79] Cation-π
interactions have been shown to be important for the binding of certain
ligands to bromodomains such as CREBBP,[80−82] and this is an example
of a protein–ligand interaction that is unlikely to be captured
accurately by current classical force fields.[83] These are only some of the challenges that might, currently, negatively
affect the accuracy, reliability, and generality of binding free energy
calculations. Tautomerisation, changes of protonation states, and
effects due to the presence of the buffer, are other issues that may
arise and can also affect predicted protein–ligand binding
free energies.[83,84]Despite all the challenges,
the results presented here are encouraging.
In terms of absolute errors, the predictions achieved a RMSE below
the 1 kcal/mol mark for the RVX-OH/RVX-208 test case and below 2 kcal/mol
for bromosporine. The ability to quantitatively reproduce binding
free energies, rather than only correlate with them, has been a difficult
task for decades and for most computational approaches.[10,11] Now, physics-based models of chemical and biological matter and
computer simulations seem to be able to provide sufficient physical
accuracy and degree of sampling in order to reproduce one of the most
important thermodynamic quantities in biology and pharmacology. It
is, however, wise to exercise some cautious optimism. In fact, while
it is important to start assessing the accuracy and precision of the
calculations on specific systems, the test sets used so far for ABFE
calculations are limited in their size and diversity. As the computer
power available continues to increase, along with algorithmic improvements,[85] in the future it will be possible to expand
such validation studies to larger benchmark sets,[86] similarly to what has been only recently possible for RBFE
calculations.[12,19,87] With an expanded chemical and biological space used for testing,
the results will provide a better picture of the capabilities and
pitfalls of the methodology as well as its most fruitful fields of
applicability.We also showed how rigorous calculations based
on MD, in these
two test cases, outperformed a state-of-the-art machine learning scoring
function, providing superior agreement to experiment both in terms
of correlation and absolute error. Nonetheless, it is important to
realize that the difference in computational cost of the two approaches
is vast. While, currently, ABFE calculations need the use computer
clusters for hours to days, scoring functions can return a prediction
within seconds on a regular desktop machine. The calculations reported
here also required considerable human time for their setup. The use
of a few ad hoc in-house scripts aided the process;
however, ligand parametrization, generation of input files, setup
of restraints, and the data analysis all involved some degree of human
intervention. The lack of automated workflows might be partly due
the method still being explored, so that strong consensus around a
generally applicable “standard protocol” is missing.
Nonetheless, the human effort needed to familiarize and setup the
calculations does constitute an entry barrier for new users and might
prevent a widespread use. In fact, the development of automated and
user-friendly software has contributed to the spread of RBFE calculations
in only the past few years.[12,88,89] Credit must be given, however, to developers of freely available
simulation software who keep improving the capabilities of these programs
for free energy calculations.[90−92] We are confident that, gradually,
as interest in these calculations increases thanks to further developments
and testing, workflows will become more and more automated and user-friendly.
The reader who is already interested in learning how to run these
calculations can find some additional considerations, and references
to background information, in SI Text T3.In our opinion, ABFE calculations can have the highest potential
at the hit discovery and lead development stages, complementing established
techniques such as docking as well as emerging ones like RBFE calculations.
The objective is not to have perfect predictions and replace experiments,
but to shorten and reduce the cost of development cycles, and to promote
the testing of novel and synthetically challenging hypotheses. During
hit discovery, one can foresee virtual screening protocols where a
number of techniques are employed hierarchically.[93] Docking will always be faster than ABFE calculations, and
could be used for the initial enrichment of binders within the library
screened; this could be followed by intermediate approaches in terms
of accuracy and speed, such as implicit solvent calculations like
Molecular Mechanics Poisson–Boltzmann Surface Area (MMPBSA);[94] finally, for the most accurate rescoring and
ranking, ABFE calculations may be used. At the lead development stage,
such calculations could be employed to assess the likelihood of a
ligand with a very different scaffold to retain binding, or to assess
different ligands for their ability to hit the target selectively,
as discussed in this study.
Conclusion
In this work, we have
shown how absolute binding free energy calculations
based on molecular dynamics can be applied in order to computationally
predict the affinity profile of a ligand across multiple proteins
with no prior knowledge of the structure of the complexes or affinities
for similar ligands. The prediction of binding affinities and the
engineering of selectivity are both still major challenges in drug
design; however, with increasing computational power at our hands
and with more rigorous and general computational approaches, these
problems can start being tackled. It is here shown how a thorough
simulation protocol can, in the cases considered, achieve good correlation
and agreement with experimental values for drug-like ligands. It is,
however, evident that the accuracy of the calculations is system dependent,
and further investigation is needed in order to assess the effectiveness
of such predictions in different case scenarios and on different protein
and ligand systems. While the accuracy of these calculations for other
protein families still needs to be established, the performance of
the predictions here reported is encouraging.
Authors: Mindy I Davis; Jeremy P Hunt; Sanna Herrgard; Pietro Ciceri; Lisa M Wodicka; Gabriel Pallares; Michael Hocker; Daniel K Treiber; Patrick P Zarrinkar Journal: Nat Biotechnol Date: 2011-10-30 Impact factor: 54.908
Authors: Gabriel J Rocklin; Sarah E Boyce; Marcus Fischer; Inbar Fish; David L Mobley; Brian K Shoichet; Ken A Dill Journal: J Mol Biol Date: 2013-07-26 Impact factor: 5.469
Authors: Pietro Ciceri; Susanne Müller; Alison O'Mahony; Oleg Fedorov; Panagis Filippakopoulos; Jeremy P Hunt; Elisabeth A Lasater; Gabriel Pallares; Sarah Picaud; Christopher Wells; Sarah Martin; Lisa M Wodicka; Neil P Shah; Daniel K Treiber; Stefan Knapp Journal: Nat Chem Biol Date: 2014-03-02 Impact factor: 15.040
Authors: Zied Gaieb; Shuai Liu; Symon Gathiaka; Michael Chiu; Huanwang Yang; Chenghua Shao; Victoria A Feher; W Patrick Walters; Bernd Kuhn; Markus G Rudolph; Stephen K Burley; Michael K Gilson; Rommie E Amaro Journal: J Comput Aided Mol Des Date: 2017-12-04 Impact factor: 3.686
Authors: Andrea Rizzi; Steven Murkli; John N McNeill; Wei Yao; Matthew Sullivan; Michael K Gilson; Michael W Chiu; Lyle Isaacs; Bruce C Gibb; David L Mobley; John D Chodera Journal: J Comput Aided Mol Des Date: 2018-11-10 Impact factor: 3.686
Authors: David R Slochower; Niel M Henriksen; Lee-Ping Wang; John D Chodera; David L Mobley; Michael K Gilson Journal: J Chem Theory Comput Date: 2019-10-25 Impact factor: 6.006
Authors: Samuel C Gill; Nathan M Lim; Patrick B Grinaway; Ariën S Rustenburg; Josh Fass; Gregory A Ross; John D Chodera; David L Mobley Journal: J Phys Chem B Date: 2018-03-12 Impact factor: 2.991