A "fragment hit", a molecule of low molecular weight that has been validated to bind to a target protein, can be an effective chemical starting point for a drug discovery project. Our ability to find and progress fragment hits could potentially be improved by enhancing our understanding of their binding properties, which to date has largely been based on tacit knowledge and reports from individual projects. In the work reported here, we systematically analyzed the molecular and binding properties of fragment hits using 489 published protein-fragment complexes. We identified a number of notable features that these hits tend to have in common, including preferences in buried surface area upon binding, hydrogen bonding and other directional interactions with the protein targets, structural topology, functional-group occurrence, and degree of carbon saturation. In the future, taking account of these preferences in designing and selecting fragments to screen against protein targets may increase the chances of success in fragment screening campaigns.
A major challenge in
drug discovery research is the identification
of a suitable small molecule that binds the target and can serve as
a starting point for chemistry exploration and optimization. This
critical early hit-identification step significantly influences the
overall likelihood of success of a drug discovery project. Very small
molecules (molecular weight <300 Da),[1] referred to as “fragments”, can be ideal hits in some
respects, as they tend to have favorable physical properties and form
high-quality interactions with the target protein. Screening a library
of fragments by evaluating their binding to a target protein has proven
to be an effective method for identifying hits. From its origins as
an infrequently used NMR-based method,[2] fragment screening has evolved into an approach that has been widely
adopted by industry and academia[3−10] and has played a central role in the discovery of two approved drugs
to date[11−13] and in the identification of a number of clinical
candidates.[14]

Fragment screening
has a number of potential advantages over screening
larger compounds as an approach to the identification of starting
points for drug development. Although, in contrast to larger compounds,
a fragment typically has fewer interactions with the target protein
and thus lower affinity overall, thermodynamic[15] and probabilistic[16] models of
small molecule–macromolecule binding suggest that these fragment–protein
interactions are individually of greater energetic reward. Additionally,
screening can more effectively sample the chemical space of smaller,
less complex compounds,[17] which improves
the odds of identifying binders that have high ligand efficiency.[18] Fragments may thus provide medicinal chemists
with a greater number of promising opportunities and, owing to their
smaller size, greater flexibility in the optimization process.

These advantages notwithstanding, fragment-based approaches have
a number of limitations. It can be challenging to detect weak-affinity
fragment hits[19] and effectively distinguish
them from false positives.[3] Furthermore,
structural information describing the atomic interactions between
the fragment hit and its protein target is typically necessary for
successful fragment optimization, but such information can be difficult
to obtain.[20] These caveats hinder fragment-hit
identification, evaluation, and prioritization, thus significantly
limiting the reliability and success of fragment-based screening protocols.

Fragment-based approaches could potentially be improved with a
deeper understanding of the molecular properties of fragment hits
and how such fragments bind to their target proteins. To date, our
understanding of fragment–protein binding has largely been
derived from individual case studies rather than from a broad structural
analysis of validated fragment hits. Here, we have collected and reviewed
publicly available fragment-hit data, of which a substantial amount
was deposited since 2012, from a variety of fragment-based screening
campaigns.[21−23] We first describe the protein systems used in these
fragment screens, followed by a cheminformatics analysis of the fragment
hits to assess their properties (such as lipophilicity and structural
topology). We then analyze how the fragments interact with their targets
using various measures (such as the amount of buried polar surface
area (SA) and the number of hydrogen bonds (H-bonds) between the fragments
and the proteins). By studying a large set of fragment hits in this
manner, we elucidate some of their salient properties, quantifying
and adding to what is typically tacit knowledge among fragment-based
screening practitioners and medicinal chemists. Our findings could
potentially be used to improve the chances of success for fragment-based
screening methods, in particular by informing fragment design and
the evaluation of fragment hits.
Methods
Structures of the complexes were extracted from the Protein Data
Bank (PDB)[24] using the keyword “fragment”
in a text-based query performed on Jan 13, 2019 and then filtered
by keeping only the structures with a crystallographic resolution
≤2.5 Å and containing a ligand with a total number of
nonhydrogen atoms ≤20. The resulting list of 5115 complexes
was further refined by removing all entries that did not relate to
fragment screening, identification, and characterization by analysis
of the primary literature citation associated with each structure
as well as the examination of the components of the structure. PDB
entries where the chemical component consisted only of typical crystallization
buffer elements (e.g., glycerol), adjuvants (e.g., carboxylates),
or native cofactors/cosubstrates (e.g., pyridoxal phosphate), for
example, were not considered. Visual inspection of the resulting 1623
complexes supported by the corresponding literature sources served
to distinguish bona fide fragment hits from optimized fragments or
lead compounds and to identify the associated binding pockets. Since
crystallization buffers and conditions vary greatly across the considered
complexes, any chemical component from the crystallization buffer
that did not have biological relevance to the system (e.g., dimethyl
sulfoxide) was removed from the structure before further analysis.

Structural complexes where multiple fragments bound to the protein
within close structural proximity (<4 Å for their shortest
interatomic distance) were also removed. Structures in which the fragment
was tethered to the protein through covalent linkages were discarded.
Fragments bound to different binding sites on the same protein were
treated independently in cases where the evidence supports the relevance
of the alternate binding pockets, based on the primary literature
source and electron density analysis. In cases where multiple structures
were found of a single fragment bound to the same protein site, preference
was given to the structure with the highest atomic resolution and
clearest electron density, unless a significant change in the fragment-binding
conformation was detected (root-mean-square deviation > 1 Å for
the fragment heavy atoms between binding conformations), in which
case both structures were included. Fragments that displayed a real-space
correlation coefficient of less than 0.8 (which is the threshold value
proposed by Deller and Rupp for unambiguous fragment binding[25]) were not considered further. Fragment-binding
sites located at the interface between the protein and its copies
in the crystal lattice were not considered, to create a final data
set in which every pocket involves just one copy of the protein. The
489 complexes that remained on the list after these filtering steps
were then analyzed with respect to the observed protein–fragment
interactions.

The ionization, tautomer, and rotamer states of the fragment and
the amino acids in the binding pocket were generated using Protonate3D[26] and manually refined as needed based on typical
geometric features for molecular interactions (e.g., H-bonds),[27] the local protein environment and potential
interaction networks, the calculated pKa[28] for the fragment, and the pH of the
experimental crystallization conditions used. No heavy-atom coordinates
were modified during the preparation process. Water molecules are
only discussed for structures having a crystallographic resolution
≤1.5 Å,[29] and only if the water
molecules appear to be structurally relevant and are clearly supported
by the electron density maps (i.e., a real-space correlation coefficient
≥0.8). We define structurally relevant water molecules to be
those with ≥2 H-bonds to the protein and ≥1 H-bond
to the fragment.

Molecular properties of the fragments, including the number of
heavy atoms, rotatable bonds, chiral centers, formal charges, and
H-bond donors and acceptors in their protonation state at pH 7, were
calculated as implemented in MOE.[30] The
cxcalc command line script from ChemAxon[28] was used to compute the pKa values for
ionizable groups on the fragment hits. The dissimilarity distribution
of fragments was evaluated using extended connectivity fingerprints
(ECFP4)[31] with the screenmd command line
script within JChem from ChemAxon.[28] Octanol–water
partition coefficient (clog P) values for
the fragments were calculated using the classic algorithm within the
batch version of ACD/Percepta.[32,33] The number and identity
of ring assemblies, Bemis–Murcko frameworks, and the fraction
of sp3-hybridized carbon atoms (Fsp3) were calculated
using Vortex.[34] Protein pocket descriptors
were computed using default parameters in dpocket and implemented
as part of the α spheres-based methodology available in fpocket.[35] Here, selection of the relevant fragment hit
was used to explicitly define the associated binding pocket.

Surface-based descriptors including total, polar, and apolar solvent-accessible
surface areas were calculated in MOE[30] using
a 1.4 Å solvent radius probe. The differences in these values
between the unbound and bound states of fragment hit and protein yielded
the corresponding buried solvent-accessible surface areas. Molecular
interaction counts between the fragment and corresponding protein,
water, and metal ions were computed using a probabilistic receptor
potential within MOE.[30] Here, H-bonds,
metal coordination bonds, arene-based interactions, carbon–hydrogen
bonds, halogen bonds, and sulfur-mediated contacts are scored with
empirical type-based scoring functions using the extended Hückel
theory. These functions are trained using statistics derived from
contacts in the RCSB PDB, and each interaction is scored in terms
of the percentage likelihood of being geometrically ideal. The default
energy threshold of interaction (0.5 kcal mol⁻¹)
was used to identify relevant interactions. The H-bond count only
includes interactions involving oxygen or nitrogen atoms (whereas
weak H-bonds, which are interactions in which the hydrogen is covalently
bonded to a carbon atom or in which the acceptor is a halogen, are
omitted). Water molecules are only discussed in cases where they are
potentially relevant to the fragment–protein interaction, as
defined above. The degree of burial of an H-bond was calculated as
the difference between the solvent-accessible surface area in the
unbound and bound states for the protein atom involved in the H-bond.

Atom types for the fragment hits used in the frequency distribution
analysis were assigned based on the MMFF94 force field[36] as implemented in MOE. The atom-type-based frequencies
of molecular interactions were obtained by dividing the number of
observed interactions by the total number of occurrences for a given
atom type across the entire fragment set to control for over-representation
of specific functional groups and thus provide useful background information
for molecular design purposes.

A procedure analogous to the one described above was used to generate
a comparative data set in which larger ligands were bound to the protein
pockets that were observed in the original (fragment hit) data set;
the construction of this second data set was such that any given protein
pocket has a similar frequency of occurrence in each data set. Structures
of the complexes were extracted from the PDB[24] using the UniProtKB accession numbers[38] of the fragment–protein complexes previously identified in
a protein-based query, and then filtered by keeping only the structures
with a crystallographic resolution ≤2.5 Å and containing
a ligand with a total number of nonhydrogen atoms ≥25. This
list was further refined by removing all entries that did not contain
a chemical component that resulted from ligand screening and optimization,
using analysis of the primary literature citation associated with
each structure as well as examination of the components of the structure.
Structures in which the ligand was covalently bound to the protein
were discarded. Preparation of the ligand–protein complexes
for calculation of ligand–protein interactions and ligand properties
followed the same procedure described for the fragment–protein
complexes.

To ensure that the analysis was not skewed by particular protein
pockets that are over-represented in the data set, each data point
was normalized by the number of occurrences of the bound protein pocket
in the entire data set. A given observation for one of the 19 PDE10A–fragment
complexes surveyed in this work, for example, carries a weight of
5.26% in the final analysis. When a data normalization based on a
pocket-sequence-identity cutoff of 60%[37] was used, there were no significant changes in the results (Figures S1 and S2).
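The pocket-based normalization described above can be illustrated with a minimal Python sketch. It assumes that each observation carries a pocket label and that its weight is the reciprocal of the number of complexes sharing that pocket; the function names and the toy labels are hypothetical and are not taken from the original analysis.

```python
from collections import Counter


def pocket_weights(pocket_ids):
    """Return one weight per observation: 1 / (occurrences of its pocket)."""
    counts = Counter(pocket_ids)
    return [1.0 / counts[p] for p in pocket_ids]


def weighted_fraction(pocket_ids, flags):
    """Pocket-normalized fraction of observations whose flag is True."""
    weights = pocket_weights(pocket_ids)
    total = sum(weights)  # approximately the number of unique pockets
    hits = sum(w for w, f in zip(weights, flags) if f)
    return hits / total


# Hypothetical toy data: 19 complexes share one pocket, 1 complex has its own.
pockets = ["PDE10A_site"] * 19 + ["other_site"]
has_hbond = [True] * 19 + [False]

weights = pocket_weights(pockets)
print(round(100 * weights[0], 2))                       # 5.26
print(round(weighted_fraction(pockets, has_hbond), 3))  # 0.5
```

With this weighting, the 19 complexes in the first pocket together count as much as the single complex in the second, so the unweighted fraction of 95% becomes 50% after normalization.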
Results
General Characteristics of Fragment Hits and Their Protein Targets
Our data set consists
of 489 structures of a fragment bound to
a pocket of a protein or protein domain. The set contains 126 unique
proteins spanning 20 different protein families and 79 structural
domains (Figures and S3). As shown in Figure , 67% of the structures of the complexes
have a crystallographic resolution of ≤2 Å, with 1.03
Å as the best-reported resolution (PDB ID: 4Y4J). The structural
data is for the most part relatively recent; 79% of the structures
were deposited in the PDB since the beginning of 2012 (Figure S1). Transferases and hydrolases account
for 58% of the complexes, with 44 and 26 unique protein entries from
each class, respectively. The remainder of the data set is distributed
across 18 protein families, including DNA-binding proteins, oxidoreductases,
isomerases, and viral proteins (Figure ). Several important drug targets appear in the data
set, including poly ADP-ribose polymerase, carbonic anhydrase, β-lactamase,
estrogen receptor, DNA gyrase, Bruton’s tyrosine kinase, and
Janus kinase 2. The most frequent protein entries, representing 19%
of the data surveyed here, are the aspartic protease endothiapepsin
(N = 57),[39,40] cAMP and cAMP-inhibited
cGMP 3′,5′-cyclic phosphodiesterase 10A2 (PDE10A2, N = 19),[41,42] and heat-shock protein 90 (Hsp90, N = 18).[43,44]
Figure 1
Characteristics of the proteins, structures,
and binding pockets.
(a) Distribution of protein classes, as codified by the PDB.[24] Numeric labels indicate the number of unique
proteins belonging to each protein class, as indicated by their respective
UniProt access codes.[66] (b) Distribution
of crystallographic resolution values. (c) Distribution of fragment-binding
pocket volumes using dpocket,[35] normalized
by the occurrence of a given protein binding site.
A total of 168 unique fragment-binding pockets
were identified
in the 126 proteins in the data set, according to the selection criteria
described in the Methods section. To ensure
that the analysis was not skewed by the over-representation of certain
protein pockets, each binding data point was normalized by the number
of occurrences of the bound protein pocket in the data set. 82% of
the proteins (N = 103) feature only a single binding
site. The remaining 23 proteins bind fragments at more than one site,
with HIV-1 reverse transcriptase containing 7 different sites (Table S1). The fragment-binding pockets span
a 6-fold difference in size, as estimated using a Voronoi tessellation
and α sphere-based method[35] (Figure ), ranging from a
small cleft on human cyclophilin D accommodating pyrrolidine-1-carbaldehyde
(PDB ID: 3R54) to a voluminous funnel-shaped pocket on hepatitis C virus polymerase
NS5B bound to 4-(2-phenylhydrazinyl)-1H-pyrazolo[3,4-d]pyrimidine (PDB ID: 4IH5).

In total, 462 unique fragments are covered in the present analysis.
21 of these fragments occur more than once in the data set, either
because they bind different proteins or because they bind to distinct
pockets on the same protein (Table ). None of these 21 fragments match the pan-assay interference
compounds (PAINS) structural filters defined by Saubern et al.[45] Across the whole data set, only four fragments
(pyrocatechol, PDB: 4K7I; 4-methylbenzene-1,2-diol, PDB: 4K7N; 4-(tert-butyl)benzene-1,2-diol,
PDB: 4K7O; 4-(2-amino-1-hydroxyethyl)benzene-1,2-diol,
PDB: 4Y4J) were
flagged as potential PAINS hits, all with the polyphenolic structural
alert. The fragments range in size from 6 to 20 heavy atoms, with
81% of the fragment set containing between 10 and 16 heavy atoms (Figure ). At the extremes,
2-chloro-1H-imidazole (with 6 heavy atoms) binds
to the BAZ2B bromodomain (PDB ID: 5E9K), and N-[2-(morpholin-4-yl)phenyl]thiophene-3-carboxamide
(with 20 heavy atoms) binds to soluble epoxide hydrolase (PDB ID: 3WKD). Calculated octanol–water
partition coefficient (clog P) values[32] range from −2.4 to 4.8, with 68% of the
compounds displaying clog P < 2, as summarized
in Figure . Examples
of the extremes of lipophilicity in this data set include 4-acetylpiperazin-2-one
complexed with the bromodomain-containing protein 1 (clog P: −2.4, PDB ID: 5AME) and 2-(5-chloro-3-methylbenzo[b]thiophen-2-yl)acetic acid complexed with farnesyl pyrophosphate
synthase (clog P: 4.8, PDB ID: 3N1V). 71% of the fragment
hits have a significant proportion (20–30%) of their heavy
atoms as nitrogen or oxygen atoms that are capable of accepting or
donating H-bonds. More than half of the fragments (68%) display a
net formal charge of 0, and for the remainder of the (charged) fragments,
twice as many have a net negative formal charge (22%) as a net positive
charge (11%). Only eight fragments could have zwitterionic character,
with both strongly basic and acidic functionalities (predicted pKa > 9 and <5, respectively).
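As an illustration of the zwitterion criterion stated above (a strongly basic group with predicted pKa > 9 together with a strongly acidic group with predicted pKa < 5), the following Python sketch flags candidate fragments. The input format, the function name, and the pKa values are assumptions made for this example; for basic groups, the pKa is taken to be that of the conjugate acid.

```python
def is_potential_zwitterion(groups, base_cutoff=9.0, acid_cutoff=5.0):
    """Flag a fragment that has both a strong base and a strong acid.

    groups: list of (kind, pKa) tuples, where kind is "acid" or "base".
    For "base" entries the pKa refers to the conjugate acid.
    """
    strong_base = any(kind == "base" and pka > base_cutoff for kind, pka in groups)
    strong_acid = any(kind == "acid" and pka < acid_cutoff for kind, pka in groups)
    return strong_base and strong_acid


# Hypothetical predicted pKa values (illustrative only, not from the data set).
amino_acid_like = [("acid", 2.3), ("base", 9.7)]
weak_base_with_acid = [("acid", 4.2), ("base", 5.2)]

print(is_potential_zwitterion(amino_acid_like))      # True
print(is_potential_zwitterion(weak_base_with_acid))  # False
```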
Table 1
Fragment Hits That Bind to More than One Protein Pocketᵃ
ᵃ The captions show the PDB IDs of the corresponding complexes. In cases where the fragment bound distinct
pockets on the same protein, the number of pockets bound is indicated
in parentheses.
Figure 2
Characteristics
of the fragment hits (N = 462,
black profiles). Distributions of the (a) number of heavy atoms, (b)
ACD calculated log P, (c) number of atoms
capable of accepting or donating H-bonds expressed as a percentage
of the fragment’s heavy atoms, (d) fraction of sp3-hybridized carbon atoms (Fsp3), (e) number of rotatable
bonds, (f) number of chiral centers, (g) number of ring assemblies,
and (h) formal charges are shown. (i) Frequency distribution of Tanimoto
distances calculated based on ECFP4 for all possible fragment pairs.
Distributions for a representative commercial fragment library[46] (N = 1794, gray profiles) are
included.
In terms
of structural complexity, the vast majority (>90%) of
the fragment hits are achiral, with limited carbon saturation (fraction
of sp3-hybridized carbon atoms Fsp3 < 0.5),
and up to three rotatable bonds and two ring assemblies (Figure ). The fragment hits
are structurally diverse (extended connectivity fingerprint 4 (ECFP4)-based
Tanimoto distance >0.7 for 97% of the set; Figure ). The fragment hits presented here compare
well to an established commercial fragment library;[46] the most notable difference is that the latter has a ∼20%
greater proportion of fragments with a high degree of carbon saturation
and a high number of ring assemblies (Figure ). The fragment hits in this work are described
by 52 unique Bemis–Murcko frameworks.[47] Of the 52 unique molecular frameworks, 33 (63%) occur only once,
whereas five frameworks account for 71% of the total. These five frameworks
are: monocyclic 6- and 5-membered rings; bicyclic 5-6- and 6-6-fused
ring systems; and 5-6 rings connected by one bond (Figure ). When hybridization, heteroatoms,
and exocyclic carbonyl groups are considered, the present set encompasses
138 unique ring assemblies, of which 78 (57%) appear only once. Benzene,
pyridine, pyrazole, thiophene, and indole taken together account for
49% of the ring assemblies, as shown in Figure . Among the other ring assemblies with more
than 10 representatives, piperidine and indazole each occur in several
distinct binding pockets (Figure ).
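The diversity measure used above can be sketched in a few lines of Python. The example assumes that each fingerprint is available as a set of integer feature identifiers; generating real ECFP4 fingerprints requires a cheminformatics toolkit, which is not shown here. The function names and toy fingerprints are hypothetical, and the pairwise reading of the 0.7 threshold is an assumption.

```python
from itertools import combinations


def tanimoto_distance(bits_a, bits_b):
    """Tanimoto distance = 1 - |A ∩ B| / |A ∪ B| for two feature sets."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def fraction_of_pairs_above(fingerprints, threshold=0.7):
    """Fraction of all fingerprint pairs whose distance exceeds threshold."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("at least two fingerprints are required")
    distant = sum(1 for a, b in pairs if tanimoto_distance(a, b) > threshold)
    return distant / len(pairs)


# Hypothetical toy fingerprints (sets of hashed feature identifiers).
fps = [{1, 2, 3, 4}, {3, 4, 5, 6}, {7, 8, 9}, {1, 2, 3, 5}]

print(tanimoto_distance(fps[0], fps[3]))  # 0.4
print(fraction_of_pairs_above(fps))       # 0.5
```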
Figure 3
(a) Bemis–Murcko frameworks occurring more than 10 times
across the fragment hits. The number of observations and frequency
of occurrence as a percentage of the total observed frameworks (N = 52, see Table S2 for the
full list) are indicated in the caption. Values in parentheses indicate
the number of unique proteins the framework was found bound to. (b)
The most frequently occurring individual ring assemblies. The number
of observations and frequency of occurrence as a percentage of the
total observed individual ring assemblies (N = 138, Table S3) are indicated in the caption. Values
in parentheses indicate the number of unique proteins bound by the
ring assembly in this data set.
Characteristics of Fragment Hits: Biological–Target Interactions
We analyzed the
fragment–pocket complexes to assess the
molecular interactions between the two binding partners. As molecular
interactions are conceptual models that describe complex physical
phenomena, general considerations regarding surface contacts and atomic
contacts are summarized first, followed by more specific interaction
models.

Most of the fragment hits (73%) bury more than 80% of
their total solvent-accessible surface area (SA) upon binding. Even
the two fragments that are the most exposed to solvent upon binding
still hide 50 and 57% of their surface in the bound pose (PDB IDs: 5JAN and 5J4H, respectively);
these fragments are the only ones that bury <60% of their SA. In
contrast, 21 fragments are completely engulfed by the protein (Figures and 5a,b). Notably, 77% of fragment hits bury more than 80% of
their polar SA, and 53% of hits bury over 90% of their polar SA. The
apolar SA of fragments is also largely buried upon binding, although
to a slightly lesser extent, with 70 and 42% of the set burying >80
and >90% of the lipophilic surface, respectively. The polar fraction
of the buried SA of a protein pocket varies significantly, ranging
from 5 to 74% of the total protein buried surface. In terms of absolute
values, fragments bury on average 336 Å² upon binding,
with a substantial tendency to bury more apolar than polar SA (Figure ). Accordingly, the
ratio of apolar to polar SA buried has a mean of 3.1 and a median
of 2.2 (Table S4).
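The surface-burial quantities reported above follow from simple differences between the unbound and bound solvent-accessible surface areas. The Python sketch below shows the arithmetic for the fragment side only; it assumes that the polar and apolar SA values (in Å²) have already been computed by an external program, and the function names and numeric inputs are hypothetical.

```python
def percent_buried(sa_free, sa_bound):
    """Percentage of the free-state SA that is lost upon binding."""
    if sa_free == 0:
        return None  # undefined when the fragment has no such surface
    return 100.0 * (sa_free - sa_bound) / sa_free


def burial_summary(polar_free, polar_bound, apolar_free, apolar_bound):
    """Summarize buried polar, apolar, and total SA for one fragment."""
    polar_buried = polar_free - polar_bound
    apolar_buried = apolar_free - apolar_bound
    return {
        "polar_pct": percent_buried(polar_free, polar_bound),
        "apolar_pct": percent_buried(apolar_free, apolar_bound),
        "total_pct": percent_buried(polar_free + apolar_free,
                                    polar_bound + apolar_bound),
        "total_buried": polar_buried + apolar_buried,
        # Ratio of apolar to polar buried SA; undefined if no polar SA is buried.
        "apolar_to_polar": (apolar_buried / polar_buried
                            if polar_buried > 0 else None),
    }


# Hypothetical SA values in Å² (illustrative only).
summary = burial_summary(polar_free=100.0, polar_bound=10.0,
                         apolar_free=300.0, apolar_bound=60.0)
print(summary["polar_pct"], summary["apolar_pct"], summary["total_pct"])
```

In this toy case the fragment buries 90% of its polar SA, 80% of its apolar SA, and 330 Å² in total.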
Figure 4
Surface characteristics
of the fragment–pocket complexes.
(a) Polar and apolar buried SA of fragments, expressed as a percentage
of their total polar and apolar SA, respectively. The total buried
SA of fragments is also indicated as a percentage of the whole-fragment
SA. (b) Extents of the total, apolar, and polar SA that are buried
by the fragments upon binding. (c) Buried polar SA of the protein.
Figure 5
Selected fragment hits bound to proteins. (a)
Maximum and (b) minimum
solvent exposure of fragments (PDB IDs: 3OMQ and 5JAH, respectively). (c) Maximum number of
polar interactions (PDB ID: 3FGD). (d) Maximum number of additional directional interactions
(PDB ID: 5EGS). (e) and (f) Examples of fragments binding to multiple pockets
within the same protein (PDB IDs: 5FPO and 5CLP, respectively). Fragments are depicted
as bold sticks (cyan carbon atoms). Relevant H-bonds and additional
directional interactions are indicated as dashed black lines. Relevant
protein pocket surfaces are displayed as gray mesh.
Having analyzed the binding of fragment hits to
their targets with
surface-based descriptors, we next evaluated their molecular interactions.
H-bonds to protein and water molecules, as well as metal coordination
bonds, were identified based on geometric criteria.[27] As shown in Figure , 92% of the fragment–protein complexes are stabilized
by at least one H-bond to the protein or to a structural water or
by a coordination bond to a structural metal ion. In one complex,
seven such molecular interactions were noted (PDB ID: 3FGD, Figure c). A large majority of H-bonds
between fragments and proteins (88%) are completely buried (Figure ).
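The criteria given in the Methods for a structurally relevant water molecule (resolution ≤1.5 Å, real-space correlation coefficient ≥0.8, ≥2 H-bonds to the protein, and ≥1 H-bond to the fragment) can be expressed as a small filter. The Python sketch below is illustrative only: the `Water` record, the function names, and the way the polar interactions are summed are assumptions, and the H-bond counts are taken as precomputed inputs.

```python
from dataclasses import dataclass


@dataclass
class Water:
    rscc: float              # real-space correlation coefficient
    hbonds_to_protein: int   # precomputed H-bond count to protein atoms
    hbonds_to_fragment: int  # precomputed H-bond count to fragment atoms


def is_structural_water(water, resolution, max_resolution=1.5, min_rscc=0.8):
    """Apply the resolution, density-fit, and H-bond criteria to one water."""
    return (resolution <= max_resolution
            and water.rscc >= min_rscc
            and water.hbonds_to_protein >= 2
            and water.hbonds_to_fragment >= 1)


def count_polar_interactions(protein_hbonds, metal_bonds, waters, resolution):
    """Sum protein H-bonds, metal bonds, and H-bonds to structural waters."""
    water_hbonds = sum(w.hbonds_to_fragment for w in waters
                       if is_structural_water(w, resolution))
    return protein_hbonds + metal_bonds + water_hbonds


# Hypothetical complex at 1.2 Å resolution with two candidate waters.
waters = [Water(0.92, 2, 1), Water(0.95, 1, 1)]
print(count_polar_interactions(2, 0, waters, resolution=1.2))  # 3
```

Only the first water passes the filter, because the second makes a single H-bond to the protein.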
Figure 6
Polar interactions established
by the fragment hits. (a) Frequency
distribution of polar interaction counts per fragment complex, including
H-bonds to protein or water and coordination bonds to metal ions.
Fragment complexes that do not display any water H-bonds or metal
coordination bonds (54 and 95% of the total, respectively) have been
omitted from the histogram for clarity. (b) Frequency distribution
of fragment H-bonds to protein amino acids at the side-chain and the
backbone level. (c) Frequency distribution of buried surface area
for protein atoms involved in H-bonds to fragments. (d) Frequency
distribution of H-bonds between fragment atoms (neutral and ionizable)
and protein amino acid side chains.
A group of 37 complexes (from 8 unique proteins) feature
fragment
hits bound to the structural metal ions present in the pockets, namely,
zinc, manganese, and iron ions (Table S5). Negatively charged oxygen atoms from fragment carboxylic acid
groups are the atoms that most frequently establish coordination bonds
to these metal ions (Table S5). A maximum
of two metal coordination bonds was recorded for a given fragment-pocket
entry in the current data set (PDB ID: 5ACW).

Of the 116 X-ray structures with resolution ≤1.5 Å, 53
(46% of the total, with 20 unique binding sites) have structural water
molecules with at least two H-bonds to the protein and one to the
fragment hit. The maximum number of individual water-based H-bonds
observed for a fragment was 3 (PDB ID: 5MOH). Nitrogen fragment atoms have a higher
occurrence of H-bonds to water than do oxygen fragment atoms (Table S6). Anilinic nitrogen atoms and sp3-hybridized nitrogen atoms in aliphatic amine groups display
the highest frequency of water H-bonds in the data set (0.33 and 0.52,
respectively; Table S6).

Fragment H-bonds to proteins stabilize 89% of the bound complexes,
with 74% of the entries displaying between one and three H-bonds.
The highest number of fragment–protein H-bonds in a complex
is 6 (PDB ID: 4Y4G) (Figure ). Side-chain
atoms account for 58% of the total protein–fragment interactions.
H-bonds occurring between ionizable functional groups on both the
fragment and the amino acid side chain represent 17% of the total
number of H-bonds in the data set (Figure ). As expected, the side chains of polar
amino acids make the greatest contribution to H-bonds between proteins
and fragments. Aspartic acid and serine each provide more than 10%
of the total H-bonds observed in the fragment–protein complexes.
Interestingly, glycine, which establishes H-bonds exclusively through
its backbone atoms, also accounts for greater than 10% of total H-bonds
observed, occurring in 27 unique protein pockets (Figure ). Cysteine, isoleucine, phenylalanine,
proline, and tryptophan each account for fewer than 2% of H-bonds.
Histidine and glutamic acid have a preference for H-bonds to ionizable
and neutral fragment atoms, respectively (Figure ).

The occurrence of H-bonds to the protein based on specific fragment
atom types is summarized in Table . After normalizing for the overall atom-type occurrence
in our data set, nitrogen and oxygen atoms show a similar preference
for H-bond formation, with 0.62 and 0.61 H-bonds per atom, respectively.
Positively charged nitrogen atoms establish the highest number of
H-bonds to protein residues, with aliphatic amines displaying a higher
likelihood of forming such bonds than aromatic or conjugated ones
(cf., sp2 NH+ and sp3 NH+, Table ). The only
negatively charged nitrogen atoms that form H-bonds to proteins are
embedded in heterocyclic systems (Table S5). Tetrazole moieties, for example, were found to establish up to
three H-bonds to serine and threonine residues of CTX-M-9 class A
β-lactamase (PDB IDs: 3G2Y and 3G32). In contrast, the deprotonated nitrogen atoms of sulfonamide mainly
coordinate metal ions (Table S5). H-bond
donors in the form of neutral nitrogen atoms are represented by five
different functional groups (Table ). Among the functional groups with more than 50 observations,
anilinic nitrogen atoms form the highest number of H-bonds per atom
(0.80), followed by heterocyclic and amidic NHs (0.66 and 0.61, respectively).
H-bond acceptors featuring a neutral, unprotonated nitrogen atom engage
in H-bonds with the protein in one-third of cases. As shown in Table , 4 out of 10 nitrile
functionalities form an H-bond in this data set (e.g., PDB ID: 4Y4T). Interestingly,
one aliphatic amine may potentially accept an H-bond from the protein,
based on the surrounding atomic environment, crystallization pH (7.5),
and predicted pKa (7.3) for the fragment
species (PDB ID: 5FYU).
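The protonation argument for the aliphatic amine can be made quantitative with the Henderson-Hasselbalch relation. The sketch below is illustrative and not part of the original analysis: it assumes a single ionizable site, treats the predicted pKa as exact, and ignores any pKa shift induced by the protein environment. The function name is hypothetical.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Fraction of a monoprotic site in its deprotonated form at a given pH.

    Henderson-Hasselbalch: [A]/[HA] = 10 ** (pH - pKa), so the deprotonated
    fraction is 1 / (1 + 10 ** (pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Values reported in the text for the aliphatic amine (PDB ID: 5FYU):
# crystallization pH 7.5 and predicted pKa 7.3 (pKa of the conjugate acid).
# For an amine, the deprotonated form is the neutral free base, which is
# the species able to accept an H-bond.
amine_neutral = fraction_deprotonated(ph=7.5, pka=7.3)
print(f"Estimated neutral (acceptor-capable) fraction: {amine_neutral:.2f}")
```

Under these assumptions, roughly 60% of the amine population would be neutral at the crystallization pH, which is consistent with the suggestion that it may act as an H-bond acceptor.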
Table 2
Fragment Atoms Involved in H-Bonds to the Protein (a)

atom type               total occurrence   H-bond occurrence   ratio
nitrogen                      833                 517           0.62
  N (neutral)                 328                  97           0.30
    sp2                       317                  92           0.29
    sp                         10                   4           0.40
    sp3                         1                   1           1.00
  N–                           29                   7           0.24
    heterocyclic               29                   7           0.24
  N+                          104                 150           1.44
    sp2                        41                  50           1.22
    sp3                        63                 100           1.59
  NH                          372                 263           0.71
    amide                     137                  83           0.61
    anilinic                  138                 110           0.80
    heterocyclic               79                  52           0.66
    sulfonamide                11                   8           0.73
    hydrazide                   7                  10           1.43
oxygen                        720                 441           0.61
  O (neutral)                 363                 171           0.47
    carbonyl                  214                 142           0.66
    sulfonamide                38                  20           0.53
    ether                      78                   5           0.06
    aromatic                   33                   4           0.12
  O–                          270                 202           0.75
    carboxylic acid           192                 188           0.98
    sulfonic acid               6                   5           0.83
    phenol                     57                   5           0.09
    nitro                      14                   3           0.21
    N-oxide                     1                   1           1.00
  OH                           87                  68           0.78
    aliphatic                  30                  46           1.53
    aromatic                   57                  22           0.39

(a) The total occurrence of the various atomic types in the fragment set
based on the MMFF94 force field definitions,[36] corresponding subclass
based on atomic hybridization or associated functional group, and the
calculated number of H-bonds and occurrence ratios are presented.
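The normalization used in Table 2 is simply the number of observed H-bonds divided by the number of atoms of that type. As a minimal sketch (the dictionary layout and function names are illustrative assumptions, while the counts are transcribed from the table), the class-level rows can be summed to recover the element-level totals and ratios:

```python
# Counts transcribed from Table 2: (total occurrence, H-bond occurrence).
TABLE2 = {
    "nitrogen": {
        "N (neutral)": (328, 97),
        "N-": (29, 7),
        "N+": (104, 150),
        "NH": (372, 263),
    },
    "oxygen": {
        "O (neutral)": (363, 171),
        "O-": (270, 202),
        "OH": (87, 68),
    },
}


def hbonds_per_atom(total: int, hbonds: int) -> float:
    """H-bond occurrence normalized by atom-type occurrence (2 decimals)."""
    return round(hbonds / total, 2)


def element_summary(classes: dict) -> tuple:
    """Sum the class-level counts and return (total, hbonds, ratio)."""
    total = sum(t for t, _ in classes.values())
    hbonds = sum(h for _, h in classes.values())
    return total, hbonds, hbonds_per_atom(total, hbonds)


for element, classes in TABLE2.items():
    print(element, element_summary(classes))
```

A ratio above 1 (for example, 1.44 for N+) means that atoms of that type form more than one H-bond each on average.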

Negatively charged oxygen atoms occur in five different chemical
environments in this data set. Carboxylic acid groups, which occur
the most frequently, form on average one H-bond per oxygen atom (Table ). All other functional
groups within this subclass occur much less frequently (N ≤ 57). Notably, there is one phenolic group in the data set
that may capture two H-bonds (PDB ID: 3GVB), based on the crystallization pH (8.7)
and calculated pKa (8.7). Alcoholic groups
on fragment hits secure on average 0.78 H-bonds, with aliphatic alcohols
having a higher H-bond formation frequency than their phenolic counterparts
(1.53 and 0.39, respectively). Unprotonated oxygen atoms as H-bond
acceptors represent the most conspicuous oxygen atom class (N = 363, 50%), with an average of 0.47 H-bonds per atom.
Carbonyl oxygen atoms from amide, urea, ester, and ketone moieties
display, on average, 0.66 H-bonds each, compared to 0.53 H-bonds each
for oxygen atoms in sulfonamide groups. Ether and aromatic oxygen
atoms display H-bonds in only 6 and 12% of cases, respectively.

Only 7% (N = 34) of the fragment–pocket
complexes analyzed here do not contain any of the polar interactions
described above. The distribution of the fragment buried surface values
of this subset of fragments was not significantly skewed compared
to the remainder of the set. In half of these complexes, the fragment
establishes at least one H-bond to a water molecule, which in turn
makes an H-bond to the protein, thus acting as a bridge between the
fragment hit and the protein. Interestingly, when additional directional
interactions including arene- and sulfur-mediated contacts,[27,48] halogen bonds,[49] and carbon–H-bonds[50] are also considered, all complexes displayed
at least one such interaction between the fragment and protein. Figure summarizes the distribution
of additional directional interactions across the whole data set,
as assessed using MOE’s nonbonded-contacts detection algorithm[30] (Supporting Information). 56% of fragment hits establish at least one such interaction with
the corresponding protein. Arene-based interactions occur most frequently
(42% of cases), followed by carbon H-bonds (12%), sulfur-mediated
contacts (11%), and halogen bonds (3%). As many as seven different
non-H-bond interactions in a single complex have been found in the
current set (PDB ID: 5EGS, Figure d). Sulfur
atoms in fragment hits establish, on average, 0.84 interactions, with
sulfur-oxygen contacts the most frequent (0.4 per sulfur atom). Halogen
bonds and carbon H-bonds occur markedly less often: 0.13 and 0.02
per atom, respectively (Table S7).
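The per-type percentages above are fractions of complexes that show at least one interaction of a given class. The sketch below shows one way such a tally could be computed from per-complex interaction counts. It is not the MOE-based workflow used in the study: the record layout, the function name, and the toy data are all hypothetical.

```python
INTERACTION_TYPES = ("arene", "carbon_hbond", "sulfur", "halogen")


def summarize(complexes: dict) -> tuple:
    """Return (per-type fractions, fraction with any interaction).

    `complexes` maps a complex label to a dict of interaction counts;
    missing interaction types are treated as zero.
    """
    n = len(complexes)
    per_type = {
        t: sum(1 for counts in complexes.values() if counts.get(t, 0) > 0) / n
        for t in INTERACTION_TYPES
    }
    with_any = sum(
        1
        for counts in complexes.values()
        if any(counts.get(t, 0) > 0 for t in INTERACTION_TYPES)
    ) / n
    return per_type, with_any


# Hypothetical toy records, not taken from the data set.
toy = {
    "complex_A": {"arene": 2, "sulfur": 1},
    "complex_B": {"carbon_hbond": 1},
    "complex_C": {},
    "complex_D": {"arene": 1, "halogen": 1},
}

per_type, with_any = summarize(toy)
print(per_type)
print(with_any)
```

Because a single complex can contribute to several classes, the per-type fractions need not sum to the fraction of complexes with at least one interaction.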
Figure 7
Frequency distribution
of additional directional interactions per
fragment pocket including: arene-based interactions (i.e., arene–arene,
arene–cation, and arene–hydrogen), carbon H-bonds, sulfur-mediated
contacts, and halogen (i.e., iodine, bromine, and chlorine) bonds
to protein atoms. The fractions of fragment complexes without arene
(58%), carbon H-bonds (88%), sulfur contacts (89%), and halogen bonds
(96%) have been omitted from the graph for clarity.
Comparison of Fragment Hits and Larger Ligands
We constructed
a comparative data set of 445 protein–ligand complexes (Table S8) such that any given protein pocket
occurs with similar frequency as it does in the fragment data set.
The 439 unique ligands are on average twice as large and twice as
lipophilic as the fragment hits previously discussed and have a greater
number of ring assemblies, rotatable bonds, and sp3-hybridized
atoms in their structures. There are no significant differences between
the fragments and the ligands with respect to the distribution of
formal charges or the ratios of atoms capable of establishing H-bonds
(Figure S4).

The ligands are ∼10%
less likely than the fragments to bury their total SA, and they are
equally less likely to bury their polar and apolar SA. In absolute
terms, owing to their larger size and lipophilicity, ligands bury
a greater amount of total SA, mainly of apolar nature, when compared
to fragments (Figure S5).

On average, ligands establish one additional protein H-bond and
twice as many arene interactions compared to fragments, whereas water
H-bonds and additional directional interactions do not substantially
change. Despite the higher number of protein H-bonds established,
these bonds are 10% less likely to be fully buried, and the polar
atoms of ligands show a reduced share of H-bonds compared to those
of fragments (Figure S6).
Discussion
We compiled and analyzed a data set composed of 462 unique fragments
bound to 168 different pockets on 126 individual proteins, resulting
in a total of 489 fragment–pocket complexes (Figure ). This work was made possible
by a large number of crystallographic studies performed by scientists
across numerous organizations in the past two decades (Table S9). Several systems are over-represented
due to the large number of fragment-binding studies published for
these proteins,[39−44] but the data set is nevertheless diverse in terms of structural
domains and distinct protein families (Figure S1). From the analysis of this diverse data set, we observe
a number of notable fragment features, discussed below, that can be
used as a guide for the design, selection, and evaluation of fragments.
Binding Versatility of the Fragment Hits
21 fragments
bound to more than one protein pocket, in many cases on different
proteins (Table ).
The ability of fragments to bind to multiple proteins[16] reinforces the appeal of using fragment-based methods to
generate chemical starting points for drug discovery. In seven cases,
a single fragment bound to different binding sites on its target protein
(Table , Figure e,f). Fragment screening
is thus well suited to uncover and evaluate alternative binding sites
and target interaction mechanisms of potential therapeutic relevance.[51] Interestingly, with the exception of saturated
carbocyclic rings, these 21 fragments recapitulate most of the pharmacophoric
elements typically exploited for molecular interactions. If aptly
complemented with missing fragment pharmacophores and normalized for
relative pharmacophoric occurrence, we recommend these fragments as
a useful choice for a minimalistic, “first pass” library
for pilot fragment screens, particularly for X-ray crystallography
screening. Here, the so-called “promiscuity” of fragments
in a well-defined structural context is understood as a practical
advantage for mapping hotspots on a protein[39,52,53] and identifying fragment binders for further
optimization. In the absence of relevant structural information, however,
fragment promiscuity could be detrimental, especially when relying
on biophysical screening.[54]

The fragment hits are remarkably versatile in their interactions with different
binding pockets, as shown in Figure for selected examples. 5-Hydroxyindole, for example,
is fully engulfed by leukotriene A-4 hydrolase (PDB ID: 3FUH), in contrast to
the complex this fragment forms with the DNA repair and recombination
protein RadA, on which it binds to a highly solvent-exposed cleft
(PDB ID: 4B3C). Interestingly, in both complexes, the fragment nitrogen atom (and
not its 5-hydroxy group) is H-bonded to the protein. In another example
of fragment versatility, adenine exploits several of its pharmacophoric
elements when binding to the BAZ2B bromodomain and Hsp90 (PDB IDs: 5DYX and 2YED, respectively).
The tautomerism of adenine’s imidazole moiety further adds
to its adaptability; both the 7- and 9-position nitrogen atoms are
independently H-bonded to backbone carbonyl groups on the two proteins.
Another fragment, 5-nitro-benzimidazole, exemplifies how a structural
element typically frowned upon in recent medicinal chemistry practice
(i.e., the nitro group) can serve an important molecular-recognition
function in the early phases of drug discovery. Here, it secures H-bonds
to both neutral (serine) and charged (arginine) side chains in the
binding pockets of PDE10A2 (PDB ID: 4MSA) and nicotinamide phosphoribosyltransferase
(PDB ID: 4N9C), respectively. These examples highlight the ability of fragments
to effectively sample chemical space during screening.
Figure 8
Selected fragment hits
bound to different proteins. (a, b) Different
degrees of solvent exposure of 5-hydroxyindole bound to RadA (PDB
ID: 4B3C) and
leukotriene A-4 hydrolase (PDB ID: 3FUH), respectively. (c, d) Different tautomers
of adenine bound to BAZ2B bromodomain (PDB ID: 5DYX) and Hsp90 (PDB
ID: 2YED), respectively.
(e, f) Different interactions of the nitro group of 5-nitro-benzimidazole
bound to nicotinamide phosphoribosyltransferase (PDB ID: 4N9C) and PDE10A2 (PDB
ID: 4MSA), respectively.
Fragments are depicted as bold sticks (cyan carbon atoms) and relevant
H-bonds as dashed black lines. Relevant protein pocket surfaces are
displayed as gray mesh.
Properties of the Fragment Hits and Relevance to Fragment Design
The fragment hits described here mostly comply with the rule-of-three
(Ro3) guidelines,[1] with less than 5% of
the set deviating from any of the Ro3 parameters. The propensity of
fragment hits to display a quarter of their atoms as H-bond recognition
elements (Figure )
hints at a particular balance between exposed polarity and lipophilicity
that is most conducive to productive interactions with different proteins,
while ensuring that physicochemical properties are compatible with
fragment-screening experiments. We thus strongly suggest favoring
fragments with a polar atom fraction of ∼0.25 when evaluating
novel fragment topologies and pharmacophores during fragment-library
enrichment campaigns. Interestingly, in our independent data set of
protein–ligand complexes, we found that the ligands have a
similar fraction of polar atoms as the fragments (Figure S4), suggesting that this ∼0.25 polar atom fraction
may be of general utility in ensuring favorable interactions with
target proteins.

The limited degree of chirality and carbon sp3 saturation of the validated fragment hits is noteworthy,
especially when coupled to their wide chemical diversity and over-reliance
on a handful of topological skeletons (Figures −3). This
might reflect historic trends in the first generation of fragment
libraries, especially given the retrospective nature of the present
study. Recently, there has been renewed interest in fragment structural
complexity and three-dimensionality as driving forces in fragment
design. This is borne out by the fact that the primary difference
between the fragment hits presented here and those in a representative
commercial fragment library from an established vendor in the fragment-based
community[46] is the higher Fsp3 distribution of the latter (Figure ). Nevertheless, it is noteworthy that achiral, heteroaromatic
assemblies belonging to five topological frameworks can result in
productive binding against a diverse set of pocket shapes and features.
We suggest that the three-dimensional character of fragments (and
molecules in general) is misrepresented by typical two-dimensional
molecular descriptors such as Fsp3. Several of the aromatic
fragment hits in this study are indeed nonplanar both in their shape
and, more importantly, in the way that they interact with the protein.
This nonplanarity is afforded by virtue of ortho substituents (PDB
ID: 5JAO), monoatomic
linkers connecting individual rings (PDB ID: 2YE7), and a small amount
of hydrocarbon saturation (PDB ID: 5FYU).

In our opinion, fragments should be kept relatively simple during
fragment screening to maximize their potential interactions with proteins.
To this end, we view the current set of 462 validated fragment hits
as a relevant first approximation of a diverse fragment library with
adequate structural complexity, which could then be subjected to further
refinement based on, for example, target-specific hypotheses or diversity-optimization
goals. In accordance with probabilistic interaction models,[16] the comparison in this work between the molecular
properties of fragments and larger ligands (Figure S5) suggests that structural complexity can and should be built
in at a later stage, during fragment optimization. The challenge (and
an underappreciated differentiation element) is to devise adequate
synthetic protocols for fragment diversification.[55] As shown in Figure and Table S2, there are
still ample opportunities to generate novel fragment matter even when
considering limited structural complexity. For example, although the
data set of fragment hits presented here and the representative
commercial fragment library analyzed for comparison display comparable
distributions of molecular properties (Figure ), not a single fragment
structure is shared between the two sets. Furthermore,
7-membered rings are under-represented in both sets, both as individual
rings and as part of fused systems. Their peculiar conformational
preferences[56] and projected substitution
vectors represent interesting design features that are well suited
to a fragment-based context. A number of additional fragment design
considerations, directly derived from the analysis of the molecular
interactions between the fragment hits and target proteins studied
here, will be presented in the following sections.
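The ∼0.25 polar atom fraction suggested in this section can be checked for a candidate fragment directly from its heavy-atom composition. The sketch below is a simplification with stated assumptions: it counts every nitrogen and oxygen atom as a polar H-bond recognition element, which may differ from the atom typing used in the study, and the tolerance of 0.10 around the target is an arbitrary illustrative choice. The function names are hypothetical.

```python
# Assumption: N and O heavy atoms are treated as the polar recognition elements.
POLAR_ELEMENTS = frozenset({"N", "O"})


def polar_atom_fraction(heavy_atom_counts: dict) -> float:
    """Fraction of heavy atoms that are polar (N or O under this assumption)."""
    total = sum(heavy_atom_counts.values())
    if total == 0:
        raise ValueError("no heavy atoms supplied")
    polar = sum(n for el, n in heavy_atom_counts.items() if el in POLAR_ELEMENTS)
    return polar / total


def near_target(fraction: float, target: float = 0.25, tol: float = 0.10) -> bool:
    """True if the fraction lies within an (arbitrary) tolerance of the target."""
    return abs(fraction - target) <= tol


# Heavy-atom compositions from the molecular formulas of fragments named in the text.
hydroxyindole = {"C": 8, "N": 1, "O": 1}       # 5-hydroxyindole, C8H7NO
adenine = {"C": 5, "N": 5}                     # adenine, C5H5N5
nitrobenzimidazole = {"C": 7, "N": 3, "O": 2}  # 5-nitro-benzimidazole, C7H5N3O2

for name, atoms in [("5-hydroxyindole", hydroxyindole),
                    ("adenine", adenine),
                    ("5-nitro-benzimidazole", nitrobenzimidazole)]:
    f = polar_atom_fraction(atoms)
    print(f"{name}: {f:.2f} near 0.25: {near_target(f)}")
```

Individual hits can sit well away from the average, as adenine does here. The ∼0.25 value describes the distribution across the set rather than a strict filter.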
Interactions of the Fragment Hits
Fragments are smaller
than lead compounds, and thus tend to have fewer productive interactions
with target proteins (Figures S4 and S6). Although the fragment–protein interactions are typically
referred to as “higher-quality” interactions, the net
effect is a weaker binding affinity. The thermodynamics underlying
fragment–protein binding is an area of active study.[57−59] Our current analysis of fragment hit–protein complexes serves
to identify propensities in these interactions as approximated with
surface-based and interaction-count descriptors. It is envisaged that
these descriptors, together with the associated functional-group preference,
could support triaging of fragment-screening results in virtual campaigns
(e.g., molecular dynamics–based screening of a cocktail of
different fragments) by, for example, reducing the number of false
positives.
Solvent Exposure and Protein Complementarity
More than
70% of the fragment hits surveyed here, as validated by X-ray crystallography,
reduce their total solvent exposure by >80% upon binding, a finding
that is consistent with previously published comparisons of primary
and secondary fragment-binding sites from proprietary databases.[51] Accordingly, we find that the polar fraction
of the surfaces of fragments in this data set is almost entirely buried
(>80%) upon binding. The fragments consistently bury their polar
SA
regardless of the diverse physicochemical features of the observed
protein pockets. Fragments tend to bury on average about twice as
much apolar surface as polar surface (Figure ), in line with the observed polar/apolar
atomic composition of the fragments (Figure ). Importantly, larger ligands of higher
affinity reduce their total, polar, and apolar solvent exposure to
a lesser degree than do fragments (Figure S5). Thermodynamic analyses have shown, however, that fragments cannot
rely entirely on apolar desolvation as the main driver for binding.[57,60] Indeed, the large amounts of buried polar areas suggest a very effective
use of fragment H-bond donor and acceptor functionalities. In most
cases, the protein H-bond donors and acceptors are completely isolated
from solvent (Figure ) by virtue of significant apolar surface burial and are fully engaged
in H-bonds with the fragment (see next section).

Taken together,
these results are consistent with previous findings of enthalpically
favored binding events at protein hotspots that are composed of polar
sites buried in a lipophilic environment.[57,60] Given the observed fragment size and fraction of polar atoms, as
well as typical H-bond chemical functionalities, a potential strategy
to maximize molecular interaction diversity would be to present a
minimum set of individual polar pharmacophoric elements, as opposed
to distributing several pharmacophores on a given fragment. This strategy
would provide fragments with greater freedom to satisfy the geometric
constraints for optimal interactions. It would also result in better
sampling of the reduced pharmacophoric space during fragment screening,[16] and additional pharmacophores could be built
in and evaluated during the subsequent phase of fragment growing.
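The burial percentages discussed in this section compare the solvent-accessible surface area (SA) of a fragment in its free and bound states. As a minimal sketch, assuming the free and bound SA values have already been computed with a surface tool of choice, the buried fraction can be expressed as follows. The function name and the numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
def buried_fraction(sa_free: float, sa_bound: float) -> float:
    """Fraction of solvent-accessible surface lost upon binding.

    sa_free  : SA of the isolated fragment (in Å^2)
    sa_bound : SA of the fragment that remains exposed in the complex (in Å^2)
    """
    if sa_free <= 0:
        raise ValueError("sa_free must be positive")
    if not 0 <= sa_bound <= sa_free:
        raise ValueError("sa_bound must lie between 0 and sa_free")
    return 1.0 - sa_bound / sa_free


# Hypothetical fragment: 100 Å^2 polar and 200 Å^2 apolar surface when free,
# with 10 Å^2 polar and 35 Å^2 apolar surface still exposed when bound.
polar = buried_fraction(sa_free=100.0, sa_bound=10.0)
apolar = buried_fraction(sa_free=200.0, sa_bound=35.0)
total = buried_fraction(sa_free=300.0, sa_bound=45.0)

print(f"polar buried:  {polar:.0%}")
print(f"apolar buried: {apolar:.0%}")
print(f"total buried:  {total:.0%}")
```

In this made-up case, the fragment buries more than 80% of its total and polar surface, and the buried apolar area (165 Å^2) is a little under twice the buried polar area (90 Å^2), mirroring the trends described above.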
H-Bonds to Protein and Water
Stabilizing polar interactions
are a recurring feature of the majority of the fragment–hit
complexes (93%, Figure ). These include strong attractive interactions, such as H-bonds
to protein and to structural water molecules, as well as coordination
bonds to catalytic metal ions. 87% and 58% of the complexes are stabilized
by at least one or two H-bonds to the protein, respectively, most
of which are completely isolated from solvent upon fragment binding
(Figure ). This result
supports the previous finding of an average of two H-bonds per fragment
from a minimally overlapping data set (the two overlapping entries
are PDB IDs 3ESS and 3FGD).[57] Importantly, the substantial network of observed
H-bonds provides an interaction context for the systematic and significant
degree of fragment- and protein-polarity burial observed. The desolvation
of polar groups on the fragment and the protein, as directed by the
formed H-bonds, may result in an important enthalpic contribution
to fragment binding, enhancing apolar desolvation, as previously reported.[57,58,60] It is noteworthy that the larger
ligands cannot match the fragments’ share of H-bonds per polar
atom, and that the additional H-bonds formed tend to be more solvent-exposed
than the ones established by fragments (Figure S6).

Fragment hits display a slight preference (58%)
for establishing H-bonds to side-chain groups. Glycine is also highly
represented as an H-bond target. These findings emphasize the importance
of H-bond-site accessibility and geometric constraints, in addition
to the need to populate diverse H-bond functionalities in fragment
libraries, as summarized in Table . Functional groups with dual H-bond accepting and
donating character (e.g., amide or alcohol groups) are particularly
attractive for interaction-sampling purposes and can be further complemented
by groups whose protomeric and tautomeric states can be influenced
by the protein environment. Overall, the functional-group diversity
observed in the fragments analyzed in the current study strengthens
the conceptual appeal of fragment-based methods. An instructive example
of fragment adaptability to molecular interactions is provided by
selected fragments that demonstrate paired H-bonds to the side chain
of asparagine residues across the present data set (Figure ). In these fragments, nine
individual atom types engaged the asparagine in paired H-bonds. The
observed chemical and topological diversity is very inspiring for
molecular design purposes, as it indicates opportunities for original
bioisosteric replacements as well as for the optimization and diversification
of pharmacophoric elements.
Figure 9
Fragment hits H-bonded to the side chain of
asparagine. (a) Spatial
distribution of the fragment atoms interacting with asparagine side
chains (N = 70). (b–h) Diverse selection of
fragment hits engaged in paired H-bonds to the side chain of asparagine
(PDB IDs: 4CUR, 4LR6, 4TZ8, 4YK0, 5DYU, 5E3G, and 5E9Y). Fragment hits
and the asparagine side chain (backbone atoms omitted for clarity)
are depicted as bold sticks (cyan and light gray carbon atoms, respectively)
and relevant H-bonds as dashed black lines.

Water plays an important role in the binding of fragments
to proteins.[58,59] When only high-resolution (≤1.5
Å) structures are considered,
46% of the fragment hits establish at least one H-bond to structural
water molecules in the binding pocket. Water molecules could form,
for example, extended nonbonded interaction networks by filling pocket
cavities and offering interaction hotspots for fragments (Figure a,b). Importantly,
in several cases, water-mediated H-bonds are the only polar interaction
for the fragment hits (Figure b,c). This limits the ability of the energetic and solvation
approximations used in current modeling software to adequately characterize
and predict fragment binding using computational methods such as docking
and molecular dynamics.[59,61−63]
Figure 10
Fragment hits H-bonded to structural water molecules. (a, b) Water
molecules as part of extended nonbonded interaction networks (PDB
ID: 5MOH), (b,
c) water molecules as the only H-bond partners for the fragment (PDB
IDs: 5NOW and 4Y3P, respectively).
Fragment hits are depicted as bold sticks (cyan carbon atoms), water
molecules as red spheres, and relevant H-bonds as dashed black lines.
Beyond H-Bonds
Although we are still far from a complete
understanding of the energetics associated with H-bonds and metal-coordination
bonds, these classes of polar interactions are relatively well studied,
and medicinal chemists are accustomed to optimizing compounds based
on them. Additional types of directional molecular interactions have
only recently started to become more widely recognized, including
arene-based contacts,[27] weak H-bonds, such
as carbon H-bonds,[50] CH/π H-bonds,[64] halogen bonds,[49] and
sulfur-mediated contacts.[48] More than half
of the fragment hits display at least one such interaction, with arene
contacts being the most frequent (42%). Although their occurrence
is limited in comparison to canonical H-bonds, these additional directional
interactions are likely to make important contributions to overall
affinity in the context of fragment binding, where a reduced number
of atoms is available for interactions. In a number of fragment hit–protein
structures, such interactions stabilize the complex in the absence
of more specific polar interactions (such as H-bonds), as shown by
selected examples in Figure . Here, arene groups on the fragment hits are sandwiched against
peptide bonds (Figure a) and stacked against the aromatic side chains of phenylalanine
and tryptophan residues (Figure b–d).
Figure 11
Fragment hits engaged in arene-based interactions
with the protein
as the main attractive interaction in the absence of protein- and
water-mediated H-bonds. (a) Arene interaction with backbone amide
bonds (PDB ID: 4K2Y), (b, c) arene interaction with the side chains of phenylalanine
(PDB IDs: 4Y37 and 5I5W),
and (d) arene interaction with the side chain of tryptophan (PDB ID: 5JAN). Fragment hits
are depicted as bold sticks (cyan carbon atoms).

The large variety of fragment heteroaromatic arrangements that
are able to establish arene-type interactions is an indication of
future opportunities for the design of novel fragments and, more importantly,
for the generation of intellectual property during fragment optimization.
The thienodiazaborinine scaffold engaged in a face-to-face stacking
interaction with phenylalanine 291 in the binding pocket of endothiapepsin
(PDB ID: 4Y37, Figure ) is an
excellent example of under-represented and innovative heterocycles
that could open up relevant pharmacophoric and chemical spaces for
exploitation in fragment-based campaigns. The fine-tuning and optimization
of such interactions at a fragment level still represent a significant
challenge, given their conspicuous polarization and marked dispersive
characteristics. To this end, the ability to query and mine existing
structural data for nonbonded interactions, and to readily visualize[65] them in the context of a fragment-evolution
effort, would greatly facilitate progress in this area.
Conclusions
Analysis of the fragment–protein complexes curated here
highlights salient features of validated fragment hits originating
from fragment-based screening efforts. Despite limitations in sample
size at both the fragment and the protein levels, we believe that
this data set offers important insights relevant to hit-discovery
activities, including the design and selection of fragments for fragment-screening
libraries and the evaluation of the quality of fragment hits. The
observed topological and functional-group diversity of fragments coupled
with their polarity–lipophilicity balance could, for example,
inform fragment-library selection and expansion schemes. Likewise,
the observed surface and interaction-based propensities of the fragment–protein
complexes could support the development of intuitive classification
methods during in silico pocket and fragment-hit identification. As
the number of deposited protein structures with bound fragment hits
increases, the preliminary analysis presented here could be updated
and used to refine empirical potentials for protein–fragment
interactions and develop probabilistic models for molecular design
applications. Our analysis emphasizes the essential role played by
crystallographers and the importance of structural information in
fragment-based thinking and methodologies.

The structural and chemical details revealed by publicly available
protein–fragment hits have the potential to significantly impact
molecular design and drug discovery. We believe that the ability of
users of fragment-based approaches to distill this information for
compound design can be a significant determinant of success during
the fragment-hit evaluation and the fragment hit-to-lead phases. In
these processes, interactive visualization of bound fragment hits
across drug discovery projects could further enhance design and idea
generation.