Mihaela D Smilova1, Peter R Curran2,3, Chris J Radoux4, Frank von Delft1,5,6,7, Jason C Cole2, Anthony R Bradley4, Brian D Marsden1,8. 1. Centre for Medicines Discovery, University of Oxford, Old Road Campus Research Building, Roosevelt Drive, Headington, Oxford OX3 7DQ, U.K. 2. The Cambridge Crystallographic Data Centre (CCDC), Cambridge CB2 1EZ, U.K. 3. Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K. 4. Exscientia Ltd., The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K. 5. Diamond Light Source Ltd., Harwell Science and Innovation Campus, Didcot OX11 0DE, U.K. 6. Research Complex at Harwell. Harwell Science and Innovation Campus, Didcot OX11 0FA, U.K. 7. Department of Biochemistry, University of Johannesburg, Auckland Park 2006, South Africa. 8. Kennedy Institute of Rheumatology, NDORMS, University of Oxford, Oxford OX3 7DQ, U.K.
Abstract
Selectivity is a crucial property in small molecule development. Binding site comparisons within a protein family are a key piece of information when aiming to modulate the selectivity profile of a compound. Binding site differences can be exploited to confer selectivity for a specific target, while shared areas can provide insights into polypharmacology. As the quantity of structural data grows, automated methods are needed to process, summarize, and present these data to users. We present a computational method that provides quantitative and data-driven summaries of the available binding site information from an ensemble of structures of the same protein. The resulting ensemble maps identify the key interactions important for ligand binding in the ensemble. The comparison of ensemble maps of related proteins enables the identification of selectivity-determining regions within a protein family. We applied the method to three examples from the well-researched human bromodomain and kinase families, demonstrating that the method is able to identify selectivity-determining regions that have been used to introduce selectivity in past drug discovery campaigns. We then illustrate how the resulting maps can be used to automate comparisons across a target protein family.
Selectivity is a crucial property in small molecule development. Binding site comparisons within a protein family are a key piece of information when aiming to modulate the selectivity profile of a compound. Binding site differences can be exploited to confer selectivity for a specific target, while shared areas can provide insights into polypharmacology. As the quantity of structural data grows, automated methods are needed to process, summarize, and present these data to users. We present a computational method that provides quantitative and data-driven summaries of the available binding site information from an ensemble of structures of the same protein. The resulting ensemble maps identify the key interactions important for ligand binding in the ensemble. The comparison of ensemble maps of related proteins enables the identification of selectivity-determining regions within a protein family. We applied the method to three examples from the well-researched human bromodomain and kinase families, demonstrating that the method is able to identify selectivity-determining regions that have been used to introduce selectivity in past drug discovery campaigns. We then illustrate how the resulting maps can be used to automate comparisons across a target protein family.
The past decade has
seen an explosion in the availability of genomic
and structural data for a great number of biomolecular disease targets.[1,2] Rational drug discovery aims to use this knowledge to design chemical
and biological agents that modulate target activity.[2] Despite recent technological advances, the development
of such agents is exceedingly expensive, is not routine, and carries
the risk of failure of compound development program after many years.
There is a great need for methods that can streamline and automate
the process of developing novel drugs and small molecule probes—the
highly selective chemical agents used to investigate the underlying
biology of disease.[2] Fragment-based drug
discovery (FBDD) has established itself as a powerful tool to identify
compounds as the starting point for probe and drug development.[3,4] Fragment hits form a small number of high-quality interactions with
the protein, and compared to larger compounds traditionally used in
high-throughput screens, they have been shown to provide greater hit
rates resulting in more efficient exploration of relevant chemical
spaces.[5] A critical aspect of FBDD is the
elaboration of the initial, low-affinity fragment hits in a stepwise
and rationally guided manner. In the past two decades, FBDD approaches
have contributed four approved drugs,[6−9] and tens of FBDD-derived compounds are currently
in clinical trials.[10] FBDD approaches have
also influenced and enhanced drug discovery efforts that did not start
from an experimental fragment screen. Recently, the potency of an
existing transition-state analogue lead was enhanced by incorporating
insights from FBDD, demonstrating the utility of using the latter
synergistically to add value to existing projects.[11] Such examples of “fragment-assisted” drug
discovery are becoming increasingly common.[10,12]The use of methods to predict fragment hotspots, regions within
the protein’s binding site that make a disproportionately large
contribution to binding affinity,[13,14] has been reported
in the literature[15−18] and used both to determine potentially tractable pockets and subpockets
on the protein surface and to guide the rational design of inhibitors.[19] In 2016, Radoux et al. introduced a method for
hotspot mapping based on the wealth of structural interaction data
in the Cambridge Structural Database (CSD),[20] which takes the molecular context of fragment binding into account.
This method is a promising way for guiding the rational design of
inhibitors as the maps provide an intuitive visual guide to favorable
interactions within the binding site and can indicate suboptimal interactions
within the original hit. The maps also give an objective numerical
understanding of the features important for binding, which allows
them to serve as the basis of automated approaches for hit prioritization
and progression.Tuning compound selectivity, or introducing
polypharmacology, is
a challenge frequently encountered in the progression of hits and
their subsequent development into drugs and chemical probes. A popular
way to identify potential protein off-targets for a compound of interest
is to use a method for comparing binding sites. Algorithms for binding
site comparison have been documented in the literature, as reviewed
by Ehrt et al.[21] and more recently by Konc.[22] Binding site features can be encoded as fingerprints,
graph models, or grids.[21] Fingerprints
are highly computationally efficient and so can be used for comparisons
between large numbers of structures, such as when identifying distantly
related off-target effects. Grid-based methods are computationally
more intensive to generate but can encode nuanced information on the
interactions of the binding sites, which is beneficial in the comparison
of closely related proteins. Examples of grid-based methods for binding
site mapping include GRID,[23] FLAP,[24] APF,[25] DoGSiteScorer,[26] SiteMap,[27] and PLImap.[28] As the importance of accounting for protein
flexibility in binding site comparisons has become increasingly apparent,
methods that calculate binding pocket signatures from an ensemble
of conformations have been developed and employed successfully in
medicinal chemistry programs.[21] Examples
include the study by Österberg et al.[29] and more recently by Volkamer et al.[30] The former combined ensembles of AutoDock[31] interaction energy grids into a single grid by using Boltzmann-weighted
averages for the values at each point in space. These grids showed
improved performance in docking compared to those derived by taking
the mean or minimum values of the interaction energy. Volkamer and
co-workers used grids generated by DogSiteScorer,[26] a method for predicting pocket druggability, and then compared
the frequencies at the points observed in the target and off-target
ensembles. Building on this study, in 2018, Turk et al. used AutoGrid
atom-based energy grids using polar and apolar atom probes as a key
part of a computational pipeline to guide the automated selectivity
conversion of an Aurora kinase inhibitor for the TrkA kinase.[32] In terms of extracting information from an ensemble
of grids, Schmalhorst and Bergner have developed a method based on
SiteMap to identify structures with unique design opportunities within
the ensemble,[27] providing a further example
of the utility and opportunities that can be explored by combining
information from grid-based binding site representations.Fragment
hotspot mapping differs from other methods as it provides
a quantitative estimate of the propensity of a fragment to exploit
particular interactions within the binding site. In addition, the
visualization of the maps grants the user an intuitive and visual
understanding of the binding pocket. The combination of these two
features makes this approach particularly attractive for progressing
hits in a rational and data-driven way while also allowing for expert
insight and intuition. This adds value to drug discovery campaigns
by automating subjective decision making, leading the process to become
more objective, reliable, and scalable.As more structural data
become available during the early stages
of drug discovery campaigns, researchers face the challenge of processing
information from up to hundreds of individual protein–ligand
structures and distilling it into testable hypotheses. To address
the challenge of usefully combining information for structures of
the same protein target, we have developed an “ensemble”
hotspot map approach. By comparing two ensemble maps, a hotspot selectivity
map can be derived. This highlights the structural differences that
determine the selectivity of a compound for one protein over another
member of the same protein domain family. The ensemble and selectivity
maps were parameterized using retrospective examples of compounds
showing selectivity between proteins in the same family. The resulting
maps can be visualized within PyMOL[33] through
a script automatically generated by the Hotspots API[34] or in any molecular viewer that supports visualizing .ccp4
or. grd formats. The method has been adopted in-house at Exscientia
within a number of drug discovery programs. The code supplied in the
Github repository (https://github.com/prcurran/hotspots, https://github.com/ccdc-opensource/hotspots/tree/master, https://github.com/CMD-Oxford/hotspotEnsembles) includes PyMOL scripts for visualization.
Methods
Figure shows the
full workflow for generating ensemble and selectivity hotspot maps.
After curating the ensemble data, the selected structures are aligned
in the region of the binding site and prepared for the hotspot map
calculation. Ensemble and selectivity maps are then calculated as
described below.
Figure 1
Workflow for generating the ensemble and selectivity maps.
Protein
structures in the ensemble are aligned in the region of the binding
site using the CSD Python API, and ligands, metals, and waters are
removed. Hotspot maps for each structure are then calculated and combined
into an ensemble map. The ensemble maps for on and off-targets can
then be compared, highlighting areas predicting target-selective interactions.
The color coding is red for the hydrogen bond acceptor maps, blue
for hydrogen bond donor maps, and yellow for apolar maps. This color
coding is consistent throughout the manuscript.
Workflow for generating the ensemble and selectivity maps.
Protein
structures in the ensemble are aligned in the region of the binding
site using the CSD Python API, and ligands, metals, and waters are
removed. Hotspot maps for each structure are then calculated and combined
into an ensemble map. The ensemble maps for on and off-targets can
then be compared, highlighting areas predicting target-selective interactions.
The color coding is red for the hydrogen bond acceptor maps, blue
for hydrogen bond donor maps, and yellow for apolar maps. This color
coding is consistent throughout the manuscript.
Data Curation
and Structure Preparation
Human bromodomain
structures were collected using the SIENA tool on the ProteinsPlus
webserver.[35] The tool was queried through
its RESTful API; the full parameters are provided in the Supporting
Information, Table S1 and in Github repositories
(https://github.com/prcurran/hotspots, https://github.com/CMD-Oxford/hotspotEnsembles, https://github.com/ccdc-opensource/hotspots/tree/master), which
include the code for querying the API. The structures were protonated
using the Protoss web server.[36] Only structures
with completely modeled residues in the peptide binding site were
included, and no mutations in the binding site were allowed. Only
structures deposited after the year 2000 were used to ensure consistent
model refinement and processing.Human kinase structures were
downloaded from the KLIFS database, accessed in September 2020.[37] Only structures with the DFG-in conformation,
resolution better than 2.5 Å, and a KLIFS quality score above
7 were kept. No restrictions were placed on the position of the glycine-rich
loop or αC helix. All ligands were bound within the ATP-binding
pocket. Structures in KLIFS were protonated with Protoss and have
had alternative residue conformations removed. Structures with mutations
in the binding site were discarded. Structures with an identical ligand
(based on the ligand canonical SMILES) were removed, and the highest
resolution structure was used in the ensemble calculation.Ligands,
water, metals, and ions were removed from all structures
prior to calculating the hotspot maps. While metals and strongly bound
waters can contribute toward selectivity, in this case we were interested
in replicating scenarios where very little previous information is
known about the target, and where such features have been previously
identified, they can be retained through CSD Python API functionality,
as described elsewhere.[20]Finally,
to mimic crystallographic fragment-screening scenarios,
only structures with ligands (excluding solvents) with a molecular
weight below 300 Da were included in both the kinase and bromodomain
data sets. This molecular weight cutoff was chosen since restricting
the filter to a strict “rule of three” results in missing
key protein–fragment complexes that are historic starting points
for successful fragment-to-lead optimizations, notably p38α
structures 1WBW and 1W84.[38] The full list
of structures used in the case studies is provided in the Supporting
Information, Table S3. No unliganded structures
were included in the analysis.
ChEMBL Data Curation
Compound activities were downloaded
from the ChEMBL[39,40] database release 29 (July 2021),
following the protocol described by Bosc et al. with modifications.[41] All activities recorded against human BRD1 (CHEMBL2176774),
BRPF1 (CHEMBL3132741), BRD2 (CHEMBL1293289), BRD4 (CHEMBL1163125),
BRD7 (CHEMBL3085622), and BRD9 (CHEMBL1163125) were retrieved. Only
bioactivities with a standard relation of “=” and standard_flag
= True were considered. Mutant sequences, potential duplicates, or
data points with data validity comments were dropped. The assay type
was restricted to “B” (binding assays) and data sources
to src_id = 1 (“scientific literature).” Only assays
with standard units in nM were included, with standard_type = “IC50”
or “Kd.” Only entries with ChEMBL quality scores of
9 (human targets flagged as “SINGLE PROTEIN”) were included.
Activities against the second bromodomains of BRD2 and BRD4 as well
as against the BRPF1A isoform were removed using a keyword search
in the “assay_description” field. When multiple activity
values were reported for a compound/target pair, the lowest (most
potent) one was taken. Selectivity ratios were calculated by dividing
the standard value for the off-target by that for the on-target. The
selectivity ratios were calculated only for activities of the same
standard type (“IC50” or “Kd).” Only compounds
with activity values reported against at least two of the six bromodomain
targets were considered. In addition, selected compounds were required
to have a crystal structure in complex with at least one of the targets
for which activity had been reported. Crystal structures were retrieved
from the Protein Data Bank (PDB)[42] by querying
with the compound InChI.
Structure Alignment
For each structure,
the binding
site was defined by taking all residues within 5 Å of the binding
site ligand using the CSD Python API’s Protein.BindingSiteFromMolecule()
function with the distance parameter set to 5.0 and “whole_residues”
set to true, meaning that any residue that places a heavy atom within
5 Å of the ligand is included in the binding site definition.
The union of binding site residues from all protein structures within
the ensemble then gave the ensemble binding site. The CSD Python API
(version 3.0.4)[20] was used to align the
ensemble structures based on the residues in the ensemble binding
site using only their Cα atoms.
Fragment Hotspot Mapping
Fragment hotspot maps were
generated as previously described.[14,34] The default
values for the fragment hotspot mapping method, release 1.0.5 were
used. GHECOM[43] version 20200721 was used
as a pocket detection and buriedness estimation method. A total of
3000 probe rotations was used for the fragment probe sampling step,
as previously described.[34] We recommend
this value as it offers sufficient thoroughness in sampling while
still retaining a reasonable speed of calculation of the hotspot maps.[34] The default seven-atom fragment probes were
used as described by Radoux et al.[14] Maps
were then truncated to the region of the binding site to facilitate
downstream analysis.
Generating Ensemble Maps
The fragment
hotspot map algorithm
outputs a set of three maps, one for each interaction probe type (donor,
acceptor, and apolar). Highly scoring points in each map denote areas
where a fragment is likely to form this type of interaction. The hotspot
map for each probe for a single protein structure is a 3-dimensional
grid with a spacing of 0.5 Å. An ensemble map is generated by
calculating hotspot maps for multiple overlaid proteins in the same
reference grid frame. This results in a set of hotspot maps where,
for each probe type, grid points are assigned a set of values, one
from each protein structure. For the apolar grid, values are aggregated
by calculating the median, whereas for polar grids (donor and acceptor),
values are aggregated by calculating the median of the nonzero points
above a frequency cutoff. This difference is necessary as polar hotspots
are intrinsically smaller than apolar hotspots (fewer than a hundred
grid points in polar clusters compared to thousands of grid points
in a cluster for the apolar maps) and more dependent on the orientation
of the donor and acceptor groups. Including these zero values in the
calculation of the median leads to dilution of information (see Figure , final column).
Therefore, to avoid introducing consequential false negatives, only
the nonzero values for grid points that have a hotspot value of at
least 20% (of cases referred to as the “frequency” of
a point in Figure ) are considered when calculating the median. Apolar grid points
are more likely to have associated values in most of the structures
and individual zero values have less of an effect (see Figure ). Thus, in the apolar case,
the median map is calculated including zero values for all points.
Figure 5
Setting the frequency threshold parameter for the ensemble maps.
Ensemble maps are compiled by taking the median values of samples
at a grid point for all points that have nonzero values in at least
the threshold fraction of structures in the ensemble. The first three
columns demonstrate the effect of setting this parameter to 0 (all
points sampled at least once), 20, and 50% for the respective protein
ensembles. The last column shows the median maps for the ensemble
when zero values are also included.
Generating Selectivity Maps
For a given pair of proteins
(on-target and off-target), the off-target ensemble maps are subtracted
from the on-target ensemble maps to create difference maps. The difference
maps are characteristically sparse grids with occasional clusters
of nonzero values. To identify features within the difference map
and enable downstream analysis, the density-based clustering algorithm
HDBSCAN[44] is used. HDBSCAN does not require
any initial estimate on the number of clusters to identify, merely
requiring a single clustering parameter, which is the minimal number
of points in a cluster. We use a value of 7 for polar maps as that
is equivalent to the smallest spherical element in a voxelized grid
with a radius comparable to that of the polar probe atoms (oxygen
and nitrogen) and a value of 27 to approximate a methyl group as the
minimal apolar feature.Clusters are annotated for the on-target
(positive values) and off-target (negative values) regions of the
difference map. Any overlapping clusters (centroids less than 1.5
Å for rigid binding sites and 3 Å for more flexible targets
with flexible binding pockets) are disregarded. Each cluster is assigned
a score corresponding to the median value of all the points in that
cluster. Clusters with a median cluster score below 10 are not considered
selective, reflecting the minimum hotspot value, which is considered
to denote a favorable fragment interaction as previously reported.[14]
Results and Discussion
We envisage
that the ensemble and selectivity maps can be used
as a tool to guide the elaboration of fragment and lead-like compounds
by both highlighting favorable unexploited interactions in the vicinity
of the compound as well as identifying unfavorable interactions that
are currently made by a ligand. We demonstrate using three well-explored,
therapeutically significant examples that the ensemble and selectivity
map generation protocol can retrospectively rationalize observed compound
selectivity between members of the same protein family. The example
data sets include the bromodomain proteins BRD1 and BRPF1, the CAMK
family kinases p38α and ERK2, and the more distantly related
kinases PIM1 and CK2α. To mimic prospective crystallographic
fragment-screening data, the example data sets consist solely of crystallographic
structures in complex with fragment-sized bound ligands. While the
methodology presented is applicable to a wider range of protein structural
data (apo-structures, complexes with larger ligands, etc.) used in
compound optimization, in this study, we have chosen to focus on fragments
as there is a great need for methods in this space to drive the progression
from hits to leads with increased potency and selectivity, and these
methods need to be specifically developed and parameterized for use
with fragment-sized ligands. Finally, we show how the ensemble and
selectivity maps can enable automated analyses within the bromodomain
protein family.
Bromodomains: Selectivity between BRD1 and BRPF1
Bromodomain
proteins are a family of epigenetic regulators acting as readers of
histone tail modifications. They bind acetylated lysine residues on
histone protein tails usually through a highly conserved asparagine
residue in the bromodomain binding site.[45] Over the past decade, there has been significant pharmaceutical
interest in the development of selective bromodomain inhibitors as
epigenetic dysregulation underpins a number of human diseases, including
many cancers.[45−47] In the human bromodomain BRPF subfamily, the substitution
of a serine (S592) in BRD1 by a proline (P658) in the closely related
BRPF1 (Figure ) results
in a notable difference proximal to the conserved asparagine residue
within the binding sites of these two proteins. The serine backbone
nitrogen can form a hydrogen bond with ligand acceptor atoms within
the bromodomain binding pocket, while the corresponding proline backbone
nitrogen in BRPF1 cannot. Although this substitution can be identified
at the sequence level, visual inspection is needed to show that the
serine backbone NH in BRD1 is accessible from the binding site. However,
this is not sufficient information to conclude whether the presence
or absence of this interaction would make a measurable difference
in binding affinity and, therefore, whether it can be used to drive
selectivity between the two targets. To explore whether this information
can be automatically derived from our approach, ensemble maps were
calculated using 23 BRD1 and 26 fragment-bound BRPF1 structures (the
full list of PDB codes is provided in the Supporting Information, Table S3). Figure shows that the maps were able to identify the selective
binding feature, automatically capturing this information without
the need for visual inspection of large numbers of protein structures.
A review of the literature studies shows that this feature is exploited
by the BRD1-selective probe BAY-299, published in 2017 by Bouché
et al.[48] The original hit was found in
a high-throughput screen and already exhibited nanomolar affinity
for BRD1 while not exhibiting activity against BRPF1, BRPF3, and BRD4
up to 20 μM. The addition of a methyl group at position 6 of
the 1,3-dimethylbenzimidazolone core introduced a fourfold increase
in affinity for BRD1, locking the tricyclic group in a bioactive conformation.
The structure of this compound in complex with BRD1 was solved (PDB
ID: 5N49), which showed that one of the carbonyl groups on naphthalimide
makes the selective hydrogen bond to S592. The IC50 values show that
the compound has a little over 15-fold selectivity for BRD1 over BRPF1
(in a TR-FRET assay). As this compound had poor solubility, an alkyl
alcohol tail was added at position 4 on naphthalimide, yielding the
chemical probe BAY-299. The authors report the hydrogen bond with
S592 as a key factor driving the selectivity of these compounds for
BRD1 versus BRPF1. Generating selectivity maps from an ensemble was
critical in this example as the selective feature is not present in
all the individual maps from BRD1 structures (as exemplified by PDB
ID 5POS, Supporting Information, Figure S1). This is due to the sensitivity of hydrogen bonds to the orientation
of the donor and acceptor groups, where small twisting motions in
the backbone can mean that the feature is not detected in a minority
of conformations.
Figure 2
Selectivity in human bromodomains: BRD1 over BRPF1. The
figure
shows the selective precursor to the chemical probe BAY-299, which
was cocrystallized with BRD1 (PDB ID: 5N49). It forms two hydrogen
bonds with the protein in the binding pocket, shown as (1) and (2)
in the pop-out. In BRPF1 (orange ribbon and sticks), this interaction
(2) cannot occur because of the substitution of a proline (P658) at
this location. The acceptor selectivity maps for BRD1 over BRPF1 identify
this difference, shown by the area of acceptor propensity (red surface).
The table shows the compound’s IC50 values for both
proteins as reported by Bouché et al.[49]
Selectivity in human bromodomains: BRD1 over BRPF1. The
figure
shows the selective precursor to the chemical probe BAY-299, which
was cocrystallized with BRD1 (PDB ID: 5N49). It forms two hydrogen
bonds with the protein in the binding pocket, shown as (1) and (2)
in the pop-out. In BRPF1 (orange ribbon and sticks), this interaction
(2) cannot occur because of the substitution of a proline (P658) at
this location. The acceptor selectivity maps for BRD1 over BRPF1 identify
this difference, shown by the area of acceptor propensity (red surface).
The table shows the compound’s IC50 values for both
proteins as reported by Bouché et al.[49]
Designing Selectivity between
Closely Related Kinases: p38α
and ERK2
In the family of human protein kinases, an ATP-binding
pocket residue known as the gatekeeper is an important determinant
of selectivity.[30] This is exploited by
the p38α-selective inhibitor SB1 (SB203580) as shown in Figure , which possesses
selectivity over related MAPK kinases, notably ERK2.[49,50] The apolar selectivity maps for an ensemble of five fragment-bound
p38α structures against 17 ERK2 fragment-bound structures (Supporting
Information Table S3) can identify and
highlight the selective hydrophobic pocket that the inhibitor binds
in, as illustrated in Figure . As with the previous example, all structures in this case
study had fragment-sized bound ligands in order to recreate a prospective
fragment-screening scenario.
Figure 3
Kinase selectivity: identifying gatekeeper differences.
The figure
shows the overlaid structures of p38α (PDB ID 1A9U, in steel
blue) and ERK2 (PDB ID 4QP1, in peach). The close-up shows the binding
mode of SB1 (light green). The ligand’s fluorophenyl moiety
is situated in a hydrophobic back pocket located behind the gatekeeper
residue T106. The apolar selectivity map (yellow surface) for p38α
over ERK2 highlights this area as a favorable location to place a
selective apolar group, such as fluorophenyl in SB1. The table shows
the compound’s IC50 values for both proteins, as
reported by Wang et al.[50]
Kinase selectivity: identifying gatekeeper differences.
The figure
shows the overlaid structures of p38α (PDB ID 1A9U, in steel
blue) and ERK2 (PDB ID 4QP1, in peach). The close-up shows the binding
mode of SB1 (light green). The ligand’s fluorophenyl moiety
is situated in a hydrophobic back pocket located behind the gatekeeper
residue T106. The apolar selectivity map (yellow surface) for p38α
over ERK2 highlights this area as a favorable location to place a
selective apolar group, such as fluorophenyl in SB1. The table shows
the compound’s IC50 values for both proteins, as
reported by Wang et al.[50]The fluorophenyl group of SB1 occupies the selective hydrophobic
back pocket of p38α and clashes with the glutamine gatekeeper
of ERK2. The hotspot maps were able to identify this feature using
only fragment-bound structures as an input for the ensemble maps.
In fact, this back pocket is often explored by fragment-sized hits
(e.g., PDB ID 1W7H). In the case of fragment screening against p38α,
the selectivity maps could be used to indicate which fragments (and
chemical groups within the fragments) might be selective for p38α
over ERK2, providing suggestions for achieving selectivity at a very
early stage in the compound design process. Despite using only five
structures, all the bound ligands in the p38α ensemble have
unique Murcko scaffolds,[51] and they all
explore the selective pocket. This includes the minimal pharmacophore
chlorophenol (PDB ID 1WBO) that preferentially explores the selective
area behind the gatekeeper compared to the canonical kinase hinge
hotspot. The ERK2 ensemble contains more structures and higher ligand
diversity (19 structures and 11 unique scaffolds), and while structures
such as PDB ID 3ERK (not included in the ensemble as it is above the
300 Da limit) indicate that it is possible to reach that pocket in
ERK2, the majority of the ERK2 fragments do not. Despite a relatively
low number of structures in the p38α ensemble, and a significant
imbalance in the number of structures between the p38α and ERK2
ensembles, our approach is still able to help identify selective features.
CK2α and PIM1: Distantly Related Kinases that Bind the
Same Ligand
The above examples demonstrate that the selectivity
maps are able to retrospectively rationalize polar and apolar selectivity
features within protein subfamilies. In this final case study, we
explore a retrospective example of selectivity between kinases from
different subfamilies: human CK2α (CK2 subfamily) and human
PIM1 (CAMK subfamily).CX-4945 was originally developed as an
ATP-competitive, orally available CK2α inhibitor with nanomolar
affinity for its target but which also displayed off-target activity
for the PIM1 kinase.[52,53] In 2011, Battistutta et al. developed
a series of CK2α inhibitors among which CX-5279 (Figure ) retained affinity for CK2α
while achieving selectivity against PIM1.[53]
Figure 4
Tuning
the selectivity of CK2α inhibitors toward PIM1. The
binding mode of inhibitors in the ATP-binding pocket of CK2α
is shown on the left. The close-up shows the overlay of the CK2α-bound
structures of CX-5279 (PDB ID 3ROT, ligand only, shown in aqua) and
CX-4945 (PDB ID 3PE1, protein shown in light blue and ligand in dark
teal), showing the selective hotspot regions for CK2α. The apolar
and donor maps are shown as yellow and blue surfaces, respectively.
PIM1 (PDB ID 2C3I) is shown in peach for comparison. Residues giving
rise to the selective features are shown as sticks and labeled. The
red circle indicates the donor nitrogen in the cyclopropylamine group
of the selective compound and the underlying donor density (blue surface)
in the selectivity map. The IC50 values shown are as reported
by Battistutta et al.[54]
Tuning
the selectivity of CK2α inhibitors toward PIM1. The
binding mode of inhibitors in the ATP-binding pocket of CK2α
is shown on the left. The close-up shows the overlay of the CK2α-bound
structures of CX-5279 (PDB ID 3ROT, ligand only, shown in aqua) and
CX-4945 (PDB ID 3PE1, protein shown in light blue and ligand in dark
teal), showing the selective hotspot regions for CK2α. The apolar
and donor maps are shown as yellow and blue surfaces, respectively.
PIM1 (PDB ID 2C3I) is shown in peach for comparison. Residues giving
rise to the selective features are shown as sticks and labeled. The
red circle indicates the donor nitrogen in the cyclopropylamine group
of the selective compound and the underlying donor density (blue surface)
in the selectivity map. The IC50 values shown are as reported
by Battistutta et al.[54]Ensemble and selectivity maps were calculated for 28 CK2α
and 32 PIM1 fragment-bound structures (Supporting Information, Table S3). The apolar selectivity map for CK2α
over PIM1 reveals an area of apolar propensity in CK2α that
is inaccessible in PIM1 due to the conformation of residue F49 in
the PIM1 structures (Figure ). This difference stems from the dominant conformation of
this residue in the majority of ligand-bound PIM1 structures and so
it would be nontrivial to predict from the sequence alone and without
visual inspection of a large number of off-target structures. The
ensemble and selectivity maps automatically highlight this feature
without the need for human inspection and subjective judgment. The
starting inhibitor, CX-4945, has a chlorophenyl group in this location.
The selectivity maps suggest that placing a larger, highly lipophilic
substituent on the phenyl ring in that location may improve the selectivity
for CK2α over PIM1. In the closely related inhibitor CX-5279,
trifluoromethyl was substituted for chlorine, which resulted in improved
selectivity for CK2α, despite the relatively small increase
in molecular volume between the substituents (Figure ). The authors reported that increasing the
size of the apolar substituent at that position leads to a decrease
in potency for PIM1, providing a link between that substituent and
the selectivity of the compounds against PIM1. The authors suggested
that this is due to a clash with the inward-facing F49 in the PIM1
kinase structures.Another difference between CK2α and
PIM1 lies in the hinge
region and is identifiable by the selectivity maps. PIM1 contains
a proline (P123) at the position of the hinge valine (V116) in CK2α,
meaning it is unable to form one of the hydrogen bonds that CX-4945
makes with CK2α (Figure ). The PIM1 hinge also has an insertion of two residues. This,
coupled with the valine to proline substitution, prevents the donor
nitrogen in the cyclopropylamine group from forming a hydrogen bond
with the backbone acceptor. The donor selectivity map highlights this
feature (Figure ).
In the study reported in ref (54) in 2011, the SAR for the compound series that includes
CX-4945 showed no increase in affinity for CK2α when cyclopropylamine
was added.[53] Furthermore, the influence
of this moiety on selectivity against PIM1 is not explicitly shown.
However, the resulting ligand is less lipophilic and makes an additional
hydrogen bond, which is desirable in the design process; the selectivity
maps suggest this modification. In the case of the apolar selective
feature, a visual inspection of the ensemble structures reveals that
F49 consistently adopts the inward-facing conformation that clashes
with the selective ligand, increasing confidence in the selectivity
of the feature.
Exploring and Adjusting the Method Parameters
The quality
of the ensemble maps is highly dependent on the quality and quantity
of the input ensemble data. A key property of the ensemble is the
number of structures included. While crystallographic fragment-screening
experiments can provide up to tens of structures of ligands in complex
with the target protein, such data are not always available. The ensemble
of p38α structures above shows that even a small ensemble with
a diverse selection of ligands can be used to identify potentially
selective features. With smaller ensemble sizes, individual structures
contribute proportionally more to the ensemble map. Visual inspection,
or plots such as those shown in the Supporting Information Figure S1, can be used to establish which structures
contribute to a particular hotspot cluster. In the case of p38α,
all the maps contribute to the selective feature discussed. By reducing
the frequency threshold of the ensemble maps, the contribution of
individual structures to the maps can be amplified in larger ensembles
as well. A complementary computational method can then be used to
assess the significance of rare or unique conformations. For instance,
molecular dynamics-based methods such as dynamic undocking[54] can be used to assess the stability of the ligand–protein
complex and guide the decision whether that would be a desirable structure
to include in the ensemble. For an ensemble with tens of structures,
even at a 20–30% frequency threshold, signals arising from
rare conformations may be lost in noise from adjacent clusters. In
such cases, the question of the minimum representative ensemble arises.
In the examples above, we employed a ligand-based measure for ensemble
diversity—the number of unique Murcko scaffolds (Supporting
Information Table S4)—and considered
its value to guide the minimum number of structures in the ensemble.
Protein or protein-and-ligand-based methods, such as protein–ligand
interaction fingerprints[55,56] or backbone clustering
methods,[57] may be used to select a diverse
ensemble or even to generate ensemble maps for a group of structures.
For instance, in a lead optimization campaign, a subset of structures
in complex with ligands from the same series might be used to identify
unexploited hotspot interactions in this very specific context. A
general recommendation for ensembles with over 10 structures is to
build ensemble maps with a frequency threshold of 20% to identify
the most frequent hotspots throughout the ensemble. If a comparison
of these maps against an off-target generates no selective clusters,
switching to a lower threshold can reveal rarer features. Conversely,
if the maps appear noisy (see Figure , first column),
switching to a higher frequency threshold can reveal the most prominent
hotspots.Setting the frequency threshold parameter for the ensemble maps.
Ensemble maps are compiled by taking the median values of samples
at a grid point for all points that have nonzero values in at least
the threshold fraction of structures in the ensemble. The first three
columns demonstrate the effect of setting this parameter to 0 (all
points sampled at least once), 20, and 50% for the respective protein
ensembles. The last column shows the median maps for the ensemble
when zero values are also included.Another key point to consider when assembling a protein ensemble
is the presence of mutually exclusive protein conformational states.
In the kinase examples discussed above, only structures in the DFG-in
conformation were included. In the case where a protein can adopt
mutually exclusive states, we have found that compiling separate ensembles
for each state can provide more detailed information for designing
compounds against that particular conformation. Class imbalances within
the ensembles can result in artifacts when calculating selectivity
maps downstream. An example of this is shown in Supporting Information Figure S2, where a feature originating from the
DFG-out conformation of p38α passes the frequency cutoff in
that ensemble (35 structures were in the DFG-in conformation, 17 in
DFG-out, and 3 were classified as “out-like” in the
KLIFS database. The full list of PDB codes is provided in the Supporting
Information, Table S5). The corresponding
ERK2 ensemble is exclusively DFG-in (69 structures). When selectivity
maps of p38α over ERK2 were calculated, a false positive region
of density corresponding to the DFG-out conformation confounds the
interpretation of the maps.In the case of selectivity maps,
two key parameters to be considered
are the minimal distance between the centers of selective clusters
in the target and off-target selectivity maps as well the minimum
hotspot score a cluster must pass in order to be considered selective.
In principle, the ideal selective feature would both score highly
and be located at a reasonable distance from other features that possess
binding propensity for the off-target. The lower limit for the first
parameter is twice the step size of the grid: 1 Å between features.
The minimum distance is dependent on other factors such as the volume
and flexibility of the binding site. For the small and rigid bromodomain
binding sites, we found that using a cutoff of 1.5–2.0 Å
was sufficient to isolate the polar selective features, which is consistent
with values published previously.[58] For
kinases, a threshold of 3 Å was chosen as their binding sites
exhibit conformational diversity, so a value twice the minimal distance
was used as a precaution against false positives. In cases where little
information is available for the target and off-target proteins, prioritizing
highly scoring clusters at a distance of 3 Å or more from the
center of the closest off-target cluster would be a recommended starting
point, lowering this parameter only if no features are detected, with
the caveat that features will be identified with lower certainty.
Selectivity Maps Identify Selectivity-Determining Regions across
Subsets of Targets in the Same Protein Family
After ascertaining
that the selectivity maps are able to identify known selectivity features
between pairs of proteins within the same family, we developed a procedure
that would allow for automated and objective analyses across a target
protein family. We again chose to focus on human bromodomains as there
is a wealth of structural and activity data publically available for
validating the method predictions. In the example presented in Figure , BRD1 was chosen
as the target protein and compared to both high-sequence identity
(BRPF1, BRD7, and BRD9) and lower-sequence identity (BET bromodomains
BRD2(1) and BRD4(1)) off-targets. Ensemble maps were calculated for
all the proteins without applying the 300 Da cutoff on bound ligands,
in order to take advantage of all the information available for the
potential off-targets, as would be the case in a drug discovery project.
Selectivity maps were then calculated for BRD1 against each of the
off-target ensembles. The selectivity maps were combined into “summary”
selectivity maps, using the methodology developed for compiling the
ensemble maps. The frequency cutoff was set to zero, so a feature
had to be present in at least one of the individual selectivity maps
in order to be included. The summary selectivity maps contain information
on the specific off-targets against which this feature can be exploited.
As in the previous case studies, a hotspot score cutoff of 10 was
used for the selectivity maps. Two prominent features in the summary
selectivity maps are shown below in Figure , panel B. The Acceptor1 feature adjacent
to S592 includes contributions from the selectivity maps against BRPF1,
BRD4, BRD7, and BRD9, while the apolar feature (Apolar1) is scored
favorably against BRD7, BRD9, BRD2, and BRD4. We then identified compounds
in the literature with published activities against BRD1 and at least
one of the off-targets along with a published crystal structure in
complex with at least one of the target or off-targets.
Figure 6
Selectivity
maps identify selectivity-determining regions across
subsets of targets in the same protein family. (A) Human bromodomain
phylogenetic tree (adapted from https://www.thesgc.org/chemical-probes) showing the target (BRD1) circled in green and off-targets circled
in red. (B) Summary maps of the BRD1 selectivity maps against the
off-targets. The acceptor feature (Acceptor1) has contributions from
the BRPF1, BRD4, BRD7, and BRD9 selectivity maps and the apolar feature
(Apolar1) has contributions from BRD2, BRD4, BRD7, and BRD9. The apo-protein
structure shown is of BRD1, PDB ID 3CRW. (C) Selectivity ratios for
BRD1 versus off-targets for the eight compounds in the data set. Values
above 1 indicate selectivity for BRD1 and are colored in green. Values
below 1 are considered not selective and are colored in red, and combinations
for which information is not available in the data set are colored
in gray. The last two columns indicate whether the crystal structure
of the compound places a heavy atom in the predicted selective region.
(D) 2D structures of the compounds in the data set and their PDB molecule
IDs. The corresponding ChEMBL IDs and the PDB codes of the protein-bound
structures are provided in the Supporting Information Table S7.
Selectivity
maps identify selectivity-determining regions across
subsets of targets in the same protein family. (A) Human bromodomain
phylogenetic tree (adapted from https://www.thesgc.org/chemical-probes) showing the target (BRD1) circled in green and off-targets circled
in red. (B) Summary maps of the BRD1 selectivity maps against the
off-targets. The acceptor feature (Acceptor1) has contributions from
the BRPF1, BRD4, BRD7, and BRD9 selectivity maps and the apolar feature
(Apolar1) has contributions from BRD2, BRD4, BRD7, and BRD9. The apo-protein
structure shown is of BRD1, PDB ID 3CRW. (C) Selectivity ratios for
BRD1 versus off-targets for the eight compounds in the data set. Values
above 1 indicate selectivity for BRD1 and are colored in green. Values
below 1 are considered not selective and are colored in red, and combinations
for which information is not available in the data set are colored
in gray. The last two columns indicate whether the crystal structure
of the compound places a heavy atom in the predicted selective region.
(D) 2D structures of the compounds in the data set and their PDB molecule
IDs. The corresponding ChEMBL IDs and the PDB codes of the protein-bound
structures are provided in the Supporting Information Table S7.The procedure for querying activities and crystals identified eight
compounds for which high-quality data were available for binding to
BRD1 and for at least one of the off-targets, and which have also
been crystallized in complex with one of the proteins of interest.
We chose to limit our search to compounds with available crystal structures
in order to avoid introducing noise that would necessarily be created
with a docking procedure.The structures of the eight compounds
were rescored against the
summary selectivity maps using the Hotspots API (as previously described[14,34]) in order to identify substituent groups that interact with the
predicted selective areas. A hit was defined if at least one heavy
atom in the compound structure scored favorably for a particular feature
(last two columns in Figure , panel C). This analysis shows that compound 8LW, which is
selective for BRD1 against both BRPF1 and the more distantly related
targets, places appropriate atom types in both clusters. Compounds
that hit only the apolar cluster do not exhibit selectivity over BRPF1,
as expected. However, compound N48 demonstrates that the selective
apolar cluster on its own may not be sufficient to grant selectivity
over BRD9. This is not unexpected as selectivity is a complex phenomenon
achieved by the interplay of multiple structural features; hence,
covering multiple features that are selective against different subsets
of the protein family would result in a more selective ligand. This
principle has been used extensively for well-researched protein families
with known selectivity determinants such as kinases.[59] The selectivity maps have shown to be able to detect such
features, and as scoring compound poses against the maps is computationally
very fast (a few seconds per pose), these types of analyses can be
used to objectively score large numbers of docked potential follow-up
poses. The previously published Hotspots API[34] also allows for the extraction of pharmacophoric features from the
maps, which can then be provided as an input to programs such as CrossMiner[60] and used for the growing and merging of compounds.
Conclusions
Fragment hotspot mapping has previously been
shown to be a promising
method for guiding the hit-to-lead phase of drug and probe discovery
campaigns. Building upon this approach, we have introduced ensemble
hotspot maps to summarize important fragment-binding interactions
made by an ensemble of structures, extending the original hotspot
method to work over multiple conformations of the same protein. Importantly,
ensemble hotspot maps can then be used to highlight differences between
related proteins in the same family. These maps can highlight nuanced
differences between protein binding sites. We have shown three case
studies from well-researched protein families in which the selectivity
maps were able to identify binding site differences used to design
selective inhibitors. The method will be applied to further protein
families to obtain the recommended values for the various map parameters
as well as to cases in which X-ray structures of defragmented bound
ligands have been crystallized. Compiling ensemble maps from snapshots
of molecular dynamics trajectories is also being investigated to further
understand the behavior of binding site hotspots, especially in cases
where few experimental structures are available. Currently, the fragment
hotspot mapping method has been validated for apolar, hydrogen bond
donor, and hydrogen bond acceptor probes. These interactions represent
a large portion of protein–fragment intermolecular contacts,
but the method could in the future be extended by adding charged or
halogen fragment probes. The workflows for calculating ensemble and
selectivity maps can also be applied to process grid-based representations
from other binding site mapping methods although the values for the
map-specific parameters will likely have to be adjusted. Overall,
the ensemble and selectivity maps are a quick and scalable means to
summarize and intuitively present structural information from closely
related proteins and generate hypotheses on achieving selectivity.
Data and
Software Availability
The code for the ensemble
and hotspot maps along with the data from the examples presented above
and tutorials on installing the Hotspots API and its dependencies
can be found in the following Github repositories: https://github.com/ccdc-opensource/hotspots, https://github.com/prcurran/hotspots (latest version of the Hotspots API) and https://github.com/CMD-Oxford/hotspotEnsembles (this contains the data for the figures presented in this study
and the scripts used to generate it). The code is available for free
but is based on the CSD Python API[20,34] and the CSD
program SuperStar, which require a CSD license. The full data for
the case studies, including the prepared input protein structures,
are provided in the following link: https://doi.org/10.5281/zenodo.5574443.
Authors: Alex Preston; Stephen Atkinson; Paul Bamborough; Chun-Wa Chung; Peter D Craggs; Laurie Gordon; Paola Grandi; James R J Gray; Emma J Jones; Matthew Lindon; Anne-Marie Michon; Darren J Mitchell; Rab K Prinjha; Francesco Rianjongdee; Inmaculada Rioja; Jonathan Seal; Simon Taylor; Ian Wall; Robert J Watson; James Woolven; Emmanuel H Demont Journal: J Med Chem Date: 2020-08-20 Impact factor: 7.446
Authors: Fabrice Pierre; Peter C Chua; Sean E O'Brien; Adam Siddiqui-Jain; Pauline Bourbon; Mustapha Haddach; Jerome Michaux; Johnny Nagasawa; Michael K Schwaebe; Eric Stefan; Anne Vialettes; Jeffrey P Whitten; Ta Kung Chen; Levan Darjania; Ryan Stansfield; Kenna Anderes; Josh Bliesath; Denis Drygin; Caroline Ho; May Omori; Chris Proffitt; Nicole Streiner; Katy Trent; William G Rice; David M Ryckman Journal: J Med Chem Date: 2010-12-21 Impact factor: 7.446