In view of the increasing interest in and success of fragment-based drug discovery (FBDD), this Review describes the current chemistry challenges within the field: the design of new fragments and the elaboration of weakly binding fragments into nM leads guided by X-ray crystal structures.[1] Ideally, synthetic elaboration of fragment hits in three dimensions from many different growth vectors is worked out experimentally prior to fragment screening.[2,3] This will increase the chance of success during fragment-to-lead optimisation.[4-6]
As the field of small molecule drug discovery has advanced, the demand for molecular complexity has increased in line with ambitions to modulate the functions of increasingly complex human protein systems.[7,8] This evolution has resulted in calls to the synthetic organic chemistry community for advances in synthesis methodology to keep pace with the demands of modern drug design, in an attempt to avoid situations where desired molecules cannot be synthesised or, more commonly, are avoided in favour of designs that are more accessible.[9] Such calls are being met by recent advances in broadly impactful organic synthesis methodology, including transition-metal-catalysed couplings,[10-14] electrocatalysis,[15-18] photocatalysis,[19-23] and C–H activation,[24-27] together with new technologies such as high-throughput experimentation (HTE),[28-30] flow chemistry,[31,32] and artificial intelligence.[33-35]
Fragment-based drug discovery
Over the past 20 years fragment-based drug discovery (FBDD) has become widely used in pharma, biotech and academic institutes, and has identified over 40 compounds in clinical trials and four launched drugs: pexidartinib,[36] vemurafenib,[37] erdafitinib,[38] and venetoclax.[6,39] In FBDD, the binding of low molecular weight fragments to their target protein is typically characterised by high resolution X-ray crystallography.[40-42] This is critical not only in understanding and optimising the interactions underlying fragment binding, but also in providing insights for the progression into high affinity leads.[43]
The fragment-to-lead process involves the bespoke design of small molecules with 3D shape and electrostatics complementary to the target protein.[44-46] In order to engage in additional interactions with the protein, substituents need to be added to the starting fragment at precise positions referred to as growth vectors (Fig. 1). Clearly, the direction and synthetic tractability of growth vector elaboration are critical in defining the suitability of a fragment for further development, together with other properties relevant to drug discovery.[1,47] Furthermore, fragment elaboration may also reveal cryptic subpockets that result from residue movement to accommodate binding of the ligand.[48,49] In addition to these factors, the commercial availability of closely related analogues[50] with exemplified growth vectors, heterocycle core modifications, or well-established scaffold modifications is important to consider when selecting a fragment as a starting point for a fragment-to-lead program.
Fig. 1
Example of a protein–fragment crystal structure that is used to identify specific growth vectors (arrows) to guide fragment-to-lead elaboration.
For fragment optimisation, access to close analogues of the fragment hit determines the speed with which fragments can be evaluated and prioritised. This involves the analysis of protein–fragment X-ray crystal structures to identify suitable growth vectors for the introduction of common functional groups onto the fragment; the added groups should be limited to a heavy atom count (HAC) ≤ 6 (Fig. 2). Frequently, closely related analogues can be obtained from commercial suppliers to determine structure–activity relationships (SAR).[51] However, this process can be a limiting factor for unexemplified fragments given the timelines of many commercial drug discovery projects. Recently, several groups have explored synthetic methods to address this issue, such as the concept of ‘poised fragments’ by Brennan and colleagues, which utilises pre-functionalised fragments, and the derivatisation of DOS-derived fragments by the Spring group.[52-54]
Fig. 2
Representative examples of common functional groups added during fragment elaboration. Note – not an exhaustive list of functional groups. HAC = number of non-hydrogen atoms (heavy atom count).
In this Review, we describe a retrospective comparison of fragments from screening campaigns at Astex Pharmaceuticals. Through this retrospective analysis we have developed a working definition of fragment sociability: an unsociable fragment is one that has limited (if any) synthetic methodology to enable growth vector elaboration and few commercially available close analogues. In contrast, a sociable fragment is one supported by robust synthetic methodology that enables every growth vector to be elaborated and by a significant number of commercially available close analogues. To illustrate the concept of sociability we identified two X-ray hits from each of the selected programs (uPA, H-PGDS, and HCV NS3 protease-helicase): an unsociable fragment that was deprioritised due to synthetic intractability and a sociable fragment that progressed to a lead compound (Table 1).
Table 1
Fragments identified as binding to three protein targets and classified by sociability, as determined by published methods and close analogues available on eMolecules®
Protein target | Unsociable fragment (structure) | Published method | Commercially available close analogues | Sociable fragment (structure) | Published method | Commercially available close analogues
Urokinase-type plasminogen activator (uPA) | 1 | No[55] | <5 | 2 | Yes[56] | >100
Hematopoietic prostaglandin D2 synthase (H-PGDS) | 3 | Yes[57] | <10 | 4 | Yes[58] | >250
Hepatitis C virus NS3 protease-helicase | 5 | Yes[59] | <5 | 6 | Yes[60] | >1000
Results and discussion
Urokinase-type plasminogen activator (uPA)
Urokinase-type plasminogen activator (uPA) is a trypsin-like serine protease that catalyses the conversion of plasminogen to plasmin through amide bond hydrolysis.[61] Plasmin is responsible for a number of proteolytic processes that degrade components of the extracellular matrix, enabling cellular migration.[62] As such, uPA is linked to metastasis in cancer and is therefore a target for therapeutic intervention.[63]
We screened our fragment library against uPA and detected 105 X-ray hits bound in the catalytic site of the enzyme.[64] This enabled the team to identify the essential binding pharmacophore as the charged ammonium or amidine acting as a formal hydrogen bond donor (Fig. 3A). The pharmacophore interacts with the backbone carbonyls of Gly219 and Ser190 and forms a salt bridge with the side chain of Asp189. Among the fragments binding to the active site of the enzyme we identified fragments 1 and 2 (Fig. 3A and C). Additionally, both fragments make beneficial hydrophobic interactions with the protein.
Fig. 3
Urokinase-type plasminogen activator (uPA)-fragment co-complexes A) overlay of unsociable fragment 1 (orange) and sociable fragment 2 (green) bound to the active site of uPA. B) Overlay of sociable fragment 2 (green) and lead compound 7 (pink). C) Properties and biochemical potencies of unsociable fragment 1, sociable fragment 2, and lead compound 7. Red circle – binding pharmacophore, blue circle and arrow – growth vector.
Fragment 1 was a ligand efficient starting point (LE > 0.51) for a fragment-to-lead program. The X-ray structure of fragment 1 bound to uPA identified several potential growth vectors from the saturated ring. Fortuitously, during the initial screening campaign Abbott Laboratories (now AbbVie) reported on the development of naphthamidine inhibitors of uPA, which showed that growth towards the catalytic triad (Ser195, His57, and Asp102) resulted in a substantial gain in affinity.[65] Guided by the crystal structure and literature information, the optimal growth vector to access the catalytic machinery was identified as the 5-position of 1 (Fig. 3A and C). Although 1 is commercially available from several suppliers, its exact synthetic route is not reported[55] and close analogues elaborated at the 5-position were not commercially available; therefore early SAR could not be easily obtained. This situation would require a substantial investment in chemistry resource at an early stage in the program to develop a bespoke diastereo- and enantioselective synthetic route to close analogues of 1.[66] Due to this intractability, fragment 1 was not selected for optimisation against uPA.
In contrast to fragment 1, the orally active drug mexiletine 2 was identified as a more attractive fragment hit for follow-up.[67] While the clog P of 2 was quite high, this fragment has clear developability and substantial chemistry reported in the literature, so there was inherent confidence in it as a suitable starting point. Thorough exploration of the active site pocket from the 4-position of the fragment was enabled by chemistries such as Suzuki–Miyaura cross-coupling and amide bond formation. This facilitated rapid SAR gathering and led to the lead compound 7, a potent inhibitor of uPA with IC50 = 0.07 μM (LE = 0.31) (Fig. 3B and C).[64]
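The ligand efficiency (LE) values quoted for the fragments and leads relate potency to molecular size. The Review does not state the formula used; the minimal sketch below assumes the widely used definition LE = −ΔG/HAC, with ΔG approximated as RT ln(IC50) and expressed in kcal/mol per heavy atom. The function name, the default temperature and the example inputs are illustrative assumptions and are not values from the Review.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def ligand_efficiency(ic50_molar, heavy_atoms, temperature_k=298.15):
    """Approximate LE in kcal/mol per heavy atom.

    Assumption: the binding free energy is estimated as RT*ln(IC50),
    i.e. the IC50 is used in place of a true dissociation constant.
    """
    if ic50_molar <= 0 or heavy_atoms <= 0:
        raise ValueError("IC50 and heavy atom count must be positive")
    delta_g = R_KCAL * temperature_k * math.log(ic50_molar)
    return -delta_g / heavy_atoms


# Hypothetical compound: IC50 = 1 uM with 20 heavy atoms.
example_le = ligand_efficiency(1e-6, 20)
print(f"LE = {example_le:.2f}")
```

Because LE divides by the heavy atom count, a weakly binding but very small fragment can score higher than a potent but larger lead, which is consistent with fragment 1 being described as ligand efficient despite being deprioritised.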
Hematopoietic prostaglandin D2 synthase (H-PGDS)
H-PGDS is an enzyme responsible for the isomerisation of prostaglandin H2 to prostaglandin D2 (PGD2).[68] The biological effects of PGD2 include vasodilation, bronchoconstriction, and inhibition of platelet aggregation, among others. As such, H-PGDS has been indicated as a target for allergic rhinitis and other inflammatory disorders.[69]
Fragment screening against H-PGDS identified 76 fragments binding to the catalytic site of the protein.[70] These fragments share a chemical architecture that complements the binding pocket: a polar heterocycle linked via a carbon–carbon bond to an aromatic ring. This is illustrated in both fragments 3 and 4 (Fig. 4A and B). Additionally, the pyrazole moiety of 4 binds to the protein through a donor–acceptor interaction with Asp96 and Tyr152. This was an unexpected result, as H-PGDS is known to bind lipophilic aryl pharmacophores, whereas our fragment screening identified a strong polar interaction within the lipophilic pocket.[58]
Fig. 4
Hematopoietic prostaglandin D2 synthase (H-PGDS)-fragment co-complexes A) Overlay of unsociable fragment 3 (orange) and sociable fragment 4 (green). B) Overlay of sociable fragment 4 (green) and lead compound 8 (pink). C) Properties and biochemical potencies of fragments 3, 4 and the lead compound 8. Red circle – binding pharmacophore, blue circle and arrow – growth vector.
Similar to compound 2, fragment 3 is an approved oral drug ((+/−)-tetramisole, (+)-dexamisole, and (−)-levamisole),[57] and therefore represents an attractive starting point for hit-to-lead. This fragment does not undergo a formal hydrogen bond donor or acceptor interaction but has a beneficial π–π-stacking interaction with Trp104 and excellent shape complementarity with the protein (Fig. 4A). However, the route to 3 is linear and does not lend itself to the rapid generation of analogues, in particular from the sp3-growth vectors of the core bicycle. Thus, limited access to close analogues and a linear synthetic route rendered compound 3 an unsociable fragment. Coupled with the sub-optimal ligand efficiency (LE ∼ 0.29), this hit was deprioritised in favour of 4.
Fragment 4 is a weak but ligand efficient inhibitor of H-PGDS, and the aromatic substituent of 4 forms a π–π-stacking interaction with Trp104 (Fig. 4B). As with all sociable fragments, there was a substantial number of commercially available analogues that facilitated rapid follow-up of fragment 4 from the 3-position of the pyrazole and from the 4-position of the other aromatic ring. Once the commercially available compounds were exhausted, reliable synthetic chemistry enabled rapid progress to the lead compound 8, which transitioned into lead optimization (Fig. 4C).[58]
Hepatitis C virus NS3 protease-helicase (HCV NS3 protease-helicase)
The hepatitis C virus genome encodes ten viral proteins that ensure the propagation of the viral particle.[71,72] Of these proteins, the NS3 protein is a bifunctional enzyme that contains an N-terminal serine protease domain and a C-terminal helicase domain that are closely associated in the full-length protein.[73] While there had been extensive studies on the isolated protease domain[74] and early reports on the helicase domain,[75] we performed the first fragment screen on the full-length protein.
The fragment screen was performed against the HCV NS3 full-length genotype 1b holoenzyme.[76] The output from the screen identified a novel binding site at the interface of the protease and helicase domains.[77] It is a relatively lipophilic binding site with acidic amino acid residues (Asp527, Asp79, Glu628) lining the entrance to the tunnel. This was reflected in the prevalence of lipophilic fragment hits, such as compounds 5 and 6 (Fig. 5A and C).
Fig. 5
Hepatitis C virus NS3 protease-helicase (HCV NS3 protease-helicase)-fragment co-complexes A) Overlay of unsociable fragment 5 (orange) and sociable fragment 6 (green). B) Overlay of sociable fragment 6 (green) and lead compound 9 (pink). C) Properties and biochemical potencies of fragments 5 and 6 and the lead compound 9. Red circle – binding pharmacophore, blue circle and arrow – growth vector.
Compound 5, the commercially available efaroxan,[59] contains a semi-saturated bicycle with a quaternary stereocentre that interacts with the entrance to the tunnel of the protein (Fig. 5A). This was an essential growth vector to interact with the acidic residues around the entrance of the pocket. Another key growth vector that needs to be synthetically enabled originates from the 5-position of the aromatic ring. While 5 itself is commercially available and there is literature precedent for this compound, there is a limited number of commercially available analogues and the synthetic route is a lengthy linear process.[78,79] This necessitates the installation of the desired substitution or a synthetic handle at an early stage of the synthesis, which limits throughput and lengthens the design-synthesise-test cycle of medicinal chemistry, making this an unsociable fragment.[80]
The HCV NS3 fragment co-complexes of 6 enabled the effective deployment of structure-based drug design (SBDD) to identify viable growth vectors to target specific residues and sub-pockets within the protein to rapidly improve affinity. Using these structural data, a systematic exploration of the SAR in the tunnel site was carried out whilst maintaining the benzylamine as the binding pharmacophore. This process was facilitated by the large number of commercially available analogues with this motif. Additionally, the facile introduction of the benzylic stereocentre and ease of amine substitution enabled the efficient growth of the ligand around the tunnel entrance. Finally, exploration of the entrance to the tunnel from the benzylic amine growth vector culminated in the identification of the lead molecule 9, a potent allosteric inhibitor (IC50 = 0.1 μM, LE = 0.38) of the HCV NS3 protease-helicase (Fig. 5B and C).[77] When the aforementioned factors are considered, it is clear that compound 6 can be classified as a sociable fragment.
Method for the identification of unsociable fragments
The fragment network, recently described by Hall and co-workers, is based on a graph database, which is a data structure that is common in social media applications. In social media a node in the network represents a person and each edge in the network represents a friendship between two people. A person with many friendships can be thought of as sociable. In the fragment network a node in the network represents a fragment molecule and an edge represents a relationship between two fragments (based on their similarity). By analogy to social media we denote a fragment with many edges to be a sociable fragment.[81]
To identify unsociable fragments, we applied the fragment network to all fragments in the current version of our core screening library (1651 compounds). For each fragment, we interrogated the corresponding node in the fragment network and focused on edges between this node and neighbouring nodes that have a higher heavy atom count, indicating commercially available analogues that are growth vector enabled. By grouping the connecting edges by positional growth vector and comparing the number of observed positions at which a fragment is grown to the maximum theoretical number of growth points, we could estimate how many growth points are synthetically accessible. We then ranked the compounds in our fragment library according to this ratio of observed growth points. The number of protein targets against which each fragment had been observed as an X-ray hit was also used to rank the least sociable fragments. For the purposes of this work, only commercially available fragments available in eMolecules® were considered in the analysis.[82] This analysis resulted in the identification of 30 putative unsociable fragments.
The fragment network is our preferred methodology for the rapid assessment of sociability for a large number of fragments; however, sociability can also be assessed by a collection of substructure searches and analysis of individual fragments in standard searching tools such as SciFinder®. To further confirm the initial analysis performed by the fragment network, we performed a manual assessment of the 30 compounds for commercial availability of close analogues on SciFinder®. We subsequently went further and manually assessed the literature for synthetic methods to access the target molecule and analogues thereof within four synthetic steps. Robustness of the chemistry, with a focus on commonly utilised medicinal chemistry reactions, was also considered to ensure that suitable quantities of material could be obtained.[83] This manual process led to the identification of 12 compounds within our fragment library that we consider as unsociable fragments (Scheme 1).
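The ranking step described above can be sketched on a toy network. This is an illustrative sketch only, not the implementation of the fragment network: the node names, heavy atom counts, growth positions and X-ray hit counts are invented, and the tie-break (fragments with more X-ray targets listed first among equal ratios) is an assumption, since the text does not say how the two criteria were combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A molecule in the toy network (all values are made up)."""
    name: str
    heavy_atoms: int
    max_growth_points: int = 0  # theoretical number of growth positions
    xray_targets: int = 0       # targets with an X-ray hit for this fragment


NODES = {n.name: n for n in [
    Node("A", 10, max_growth_points=4, xray_targets=3),
    Node("B", 11, max_growth_points=5, xray_targets=1),
    Node("C", 9, max_growth_points=4, xray_targets=1),
    Node("A0", 8), Node("A1", 12), Node("A2", 13),
    Node("B1", 12), Node("B2", 12), Node("B3", 13), Node("B4", 14),
    Node("C1", 11),
]}

# Each edge is (fragment, neighbour, growth position on the fragment).
EDGES = [
    ("A", "A0", 3),                  # smaller neighbour, so it is ignored
    ("A", "A1", 1), ("A", "A2", 1),  # two analogues grown at the same position
    ("B", "B1", 1), ("B", "B2", 2), ("B", "B3", 3), ("B", "B4", 4),
    ("C", "C1", 2),
]


def growth_ratio(name, nodes=NODES, edges=EDGES):
    """Observed growth positions divided by the theoretical maximum."""
    frag = nodes[name]
    observed = {
        pos for src, dst, pos in edges
        if src == name and nodes[dst].heavy_atoms > frag.heavy_atoms
    }
    return len(observed) / frag.max_growth_points


def rank_least_sociable(names, nodes=NODES, edges=EDGES):
    """Lowest ratio first; assumed tie-break: more X-ray targets first."""
    return sorted(
        names,
        key=lambda n: (growth_ratio(n, nodes, edges), -nodes[n].xray_targets),
    )


ranking = rank_least_sociable(["A", "B", "C"])
print(ranking)  # ['A', 'C', 'B']
```

Grouping edges by position before counting means that many analogues grown from a single position do not make a fragment look more sociable than it is: fragment A has two larger neighbours but only one enabled growth position out of four.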
Scheme 1
12 fragments contained within our fragment library are considered unsociable fragments. These are examples of fragments that require organic methodology development to become sociable.
Synthetic routes could be envisioned for these unsociable fragments, but they may not be consistent with the time constraints of typical drug discovery projects, which operate under the pressure of pressing unmet medical needs. Some of these fragments could become more synthetically tractable if minor changes were made to the structure (e.g. introduction of a carbonyl at C-4 of 1 or removal of the –Et from 5) or if different growth vectors were selected. However, in general these changes disrupt the protein binding interactions and are therefore not suitable from an SBDD perspective. If there was methodology to selectively functionalize these unsociable fragments at each and every carbon growth vector with the common functional groups added during fragment elaboration (Fig. 2) while maintaining the binding pharmacophore, we would consider this a valuable development in FBDD and organic chemistry as a whole.
The Astex fragment library has been subjected to constant analysis to improve performance. These analyses have resulted in evolution of the library over several generations, taking into account these learnings.[1] Two key factors, synthetic tractability and synthetic vectors, have influenced the design and chemical space occupied by the library and are important in deciding which fragments are included in the library. This correlates with the low rate of unsociability for the fragments contained within the Astex library. However, if more esoteric fragments were socialized then they would warrant inclusion into our library and provide an opportunity to identify novel starting points for drug discovery.
Identification of false positives – the eye of an organic chemist
Computational analysis (such as our fragment network) can result in sociable fragments being identified as unsociable (apparent unsociable fragments), so it is important to engage a skilled organic chemist before deciding whether a compound is sociable, because an apparently unsociable classification can lead to an otherwise valuable fragment being deprioritised in favour of other chemical matter. For example, this is observed for simple bond disconnections or single-step functional group transformations, which are discussed in more detail below.
One class of false positive is a double scaffold in which two ring systems are joined by a simple linker or single bond (Scheme 2). Such molecules do not score well in the fragment network analysis because the growth points on each ring system are not well represented in commercial databases; yet analogues would be facile to synthesise, as the apparent unsociable fragment yields two highly sociable compounds upon disconnection. These functionalised reagents represent good starting points for diversification through robust chemical transformations.[52]
Scheme 2
Examples of apparently unsociable fragments and the single bond transformation that yields functionalised sociable reagents that enable rapid analogue synthesis via robust organic methods.
A second type of false positive manifests when a simple functional group modification results in an otherwise sociable fragment being misidentified. As a representative example, the dihydrobenzothiazine-dioxide (10) is poorly socialised when analysed by the fragment network.[84] However, ‘simplification’ to the reduced dihydrobenzothiazine (11) results in identification of several commercially available analogues.[85] It is important to note that synthetic elaboration of every carbon position of 11 is exemplified, thus enabling access to every growth vector of 10 through a single synthetic transformation. As such, complex molecules should be simplified to the core scaffold by a single bond-forming or bond-breaking chemical transformation to identify near neighbours (Scheme 3). We expect recent advances in AI to make a significant impact in this area of fragment sociability and growth vector elaboration in the near future.[86,87]
Scheme 3
Example of false positive ‘unsociable fragment’ based on functional group manipulation. Simplification of fragment 10 results in a more sociable compound 11 that is growth vector enabled at each carbon atom (selected commercially available examples identified by the fragment network).
Conclusions
FBDD is a key hit-finding technology for drug discovery and has enabled the discovery of several approved drugs. However, as the field of FBDD has developed over the past 20 years it has revealed the need for further development in the field of organic synthesis to successfully functionalise specific growth vectors of polar, unprotected small molecules using medicinal chemistry relevant transformations.[7,88] This Review is intended to inspire the development of organic methodology targeted at unsociable fragments in an effort to socialise them and thereby facilitate the development of novel medicines.[1,89]
Conflicts of interest
The authors are employees of Astex Pharmaceuticals.