
Advances in covalent drug discovery.

Lydia Boike^1,2,3, Nathaniel J Henning^1,2,3, Daniel K Nomura^4,5,6.

Abstract

Covalent drugs have been used to treat diseases for more than a century, but tools that facilitate the rational design of covalent drugs have emerged more recently. The purposeful addition of reactive functional groups to existing ligands can enable potent and selective inhibition of target proteins, as demonstrated by the covalent epidermal growth factor receptor (EGFR) and Bruton's tyrosine kinase (BTK) inhibitors used to treat various cancers. Moreover, the identification of covalent ligands through 'electrophile-first' approaches has also led to the discovery of covalent drugs, such as covalent inhibitors for KRAS(G12C) and SARS-CoV-2 main protease. In particular, the discovery of KRAS(G12C) inhibitors validates the use of covalent screening technologies, which have become more powerful and widespread over the past decade. Chemoproteomics platforms have emerged to complement covalent ligand screening and assist in ligand discovery, selectivity profiling and target identification. This Review showcases covalent drug discovery milestones with emphasis on the lessons learned from these programmes and how an evolving toolbox of covalent drug discovery techniques facilitates success in this field.


Year: 2022; PMID: 36008483; PMCID: PMC9403961; DOI: 10.1038/s41573-022-00542-z

Source DB: PubMed; Journal: Nat Rev Drug Discov; ISSN: 1474-1776; Impact factor: 112.288

Introduction

Covalent drugs incorporate a mildly reactive functional group that forms a covalent bond with protein targets to confer additional affinity beyond the non-covalent interactions involved in drug binding[1]. Historically, concerns about the interference of these reactive molecules with biological assays and potential lack of selectivity often discouraged further investigation[2,3]. Many early covalent drugs were discovered serendipitously and bind active sites to inhibit enzymatic activity[4]. These drugs often mimic a substrate transition state to enable covalent modification of a catalytic amino acid residue. Over the past 30 years, the rational design of covalent drugs has garnered increased interest, and covalently targeting non-conserved amino acids to increase selectivity has become commonplace[2,5]. The prolonged target engagement of covalent drugs can provide distinct pharmacodynamic profiles and exceptional potency[6]. The potential benefits of covalency have inspired medicinal chemists to explore the covalent drug space despite concerns about reactivity. In many cases, compromises between reactivity, selectivity and potency have produced safe and effective drugs. Key examples that we discuss here (Fig. 1 and Table 1) include the Bruton’s tyrosine kinase (BTK) inhibitor ibrutinib (AbbVie) and the epidermal growth factor receptor (EGFR) inhibitor osimertinib (AstraZeneca), with sales totalling US$8.43 billion and $4.33 billion in 2020, respectively[7,8]. Moreover, potent inhibition through covalent modification has enabled targeting of traditionally ‘undruggable’ proteins, exemplified by the approval of sotorasib (Amgen), which is an inhibitor of mutant KRAS(G12C), a GTPase that resisted decades of drug discovery efforts[9,10] (Fig. 1). 
At the same time, more traditional covalent targeting of protease active sites has continued to yield valuable drugs, such as nirmatrelvir (Pfizer), which inhibits the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease (Mpro)[11] (Fig. 1).

Fig. 1

Timeline of the development of major covalent drugs.

Table 1

Key examples of covalent drugs

Drug (company; former compound name)	Target	Approval^a	Refs.
Osimertinib (AstraZeneca; AZD9291)	Mutant-selective EGFR inhibitor	2015 for treatment of NSCLC	[44,45,48,49]
Ibrutinib (AbbVie; PCI-32765)	BTK inhibitor	2013 for treatment of mantle-cell lymphoma and, subsequently, many other B cell malignancies	[55,56,60]
Sotorasib (Amgen; AMG-510)	KRAS(G12C) inhibitor	2021 for treatment of NSCLC with KRAS^G12C mutation	[93–95]
Nirmatrelvir (Pfizer; PF-07321332)	SARS-CoV-2 main protease inhibitor	Authorized for emergency use by FDA in 2021 for treatment of COVID-19^b	[103,105]
Voxelotor (Global Blood Therapeutics; GBT-440)	Mutant-haemoglobin modulator	2019 for treatment of sickle cell anaemia	[142,143,145]

BTK, Bruton’s tyrosine kinase; COVID-19, coronavirus disease 2019; EGFR, epidermal growth factor receptor; NSCLC, non-small-cell lung cancer; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2. ^a Unless otherwise indicated, the date of the first approval by one of the major regulatory agencies is provided, which was the FDA in each case. ^b Nirmatrelvir is approved for use in combination with ritonavir (Paxlovid). The FDA’s emergency use authorization preceded approval by other agencies, but Paxlovid has not yet been fully approved by the FDA.

Fig. 1 legend: Each covalent drug is classified according to the drug type or type of disease it treats. Unless otherwise indicated, the date refers to the first approval by the US Food and Drug Administration. NSAID, non-steroidal anti-inflammatory drug.

Targeted covalent inhibitors are often discovered through structure-guided design by incorporating an electrophile into a ligand that would otherwise reversibly bind the target protein. The incorporated electrophile binds irreversibly to an amino acid on the target protein, introducing a covalent interaction in addition to the reversible interactions already at play. Covalent ligand screening is another ligand discovery approach that is becoming more common, whereby various methods are used to discover covalent ligands from libraries of electrophilic compounds[7,12,13]. This ‘electrophile-first’ approach is partly facilitated by the development of chemoproteomic platforms that enable rapid target identification and selectivity profiling of covalent ligands[14-18]. Combining the advances in covalent ligand screening and chemoproteomics with structural biology to empower medicinal chemistry has the potential to generate molecules that selectively bind challenging protein targets.

In this Review, we start by briefly highlighting historical examples of covalent drugs and their mechanisms of action. We then elaborate on milestones in covalent drug discovery over the past decade, categorizing our discussion on the basis of the discovery approach taken. Finally, we summarize the toolbox of emerging covalent drug discovery techniques, with emphasis on screening strategies and selectivity profiling.

History of covalent drugs

Compounds that contain protein-reactive functional groups have often been avoided in medicinal chemistry and excluded from compound screening collections owing to their potential for assay interference and off-target promiscuity. Many historical examples of covalent drugs were discovered to act through covalent mechanisms after their use was already widespread. One of the most prominent among these is the non-steroidal anti-inflammatory drug (NSAID) aspirin, which has been marketed since 1899 (ref.[19]) (Fig. 1). Aspirin’s mechanism of action was unknown until 1971, when it was discovered to exert its anti-inflammatory effects by acetylating Ser529 in the substrate-binding channel of cyclooxygenase 1, preventing conversion of the substrate arachidonic acid into prostaglandins[20].

Early covalent drugs also tend to be derived from or inspired by natural sources. β-Lactam antibiotics such as penicillin (Fig. 1), produced by Penicillium fungi, bind to penicillin-binding proteins (PBPs), which are involved in bacterial cell wall synthesis[21]. All PBPs contain active-site serine residues that can be acylated by penicillin, inhibiting PBP activity and leading to cell membrane rupture[21]. Another covalent antibiotic is the epoxide-containing fosfomycin (Fig. 1), which is produced by some Streptomyces bacteria and acts by reacting with the catalytic cysteine of UDP-N-acetylglucosamine-enolpyruvyl transferase (MurA) to disrupt peptidoglycan synthesis and induce membrane rupture[22-24].

Some covalent drugs are prodrugs with thiol-containing metabolites that form disulfide bonds to inactivate their targets[25]. The proton pump inhibitor omeprazole (Fig. 1), approved by the US Food and Drug Administration (FDA) in 1988 to treat gastro-oesophageal reflux disease, is an example of this and is also a drug that was brought to market before its mechanism of action was understood to be covalent. Both omeprazole and clopidogrel (Fig. 1), an antiplatelet medication used to prevent strokes and heart attacks, are activated by cytochrome P450 enzymes in the liver to produce bioactive thiol metabolites[26].

Covalent drugs have also been historically significant in cancer therapy. The pyrimidine nucleoside analogues 5-fluorouracil[27,28] and gemcitabine[29] are prodrugs used to inhibit thymidylate synthase and ribonucleotide reductase I, respectively, to treat a wide range of cancers (Fig. 1). Bortezomib (Fig. 1), a dipeptide boronic acid that covalently binds to and inhibits a catalytic threonine residue of the 26S proteasome, was approved by the FDA in 2003 to treat patients with multiple myeloma[30].

Covalent drugs have been used to treat a variety of diseases. However, focusing on covalency from the outset of a project, instead of discovering a covalent mechanism of action after the fact, provides opportunities to improve drug design. Recent work in this field showcases how covalent drug discovery tools present solutions to otherwise intractable drug discovery challenges.

Discoveries by ligand-first approaches

Major milestones of covalent drug discovery have been reached over the past decade, including the FDA approval of the first covalent EGFR inhibitor, afatinib, in 2013, the BTK inhibitor, ibrutinib, in 2013 and the discovery of other kinase inhibitors. To discover these compounds, mildly reactive electrophilic functional groups were incorporated into known reversible ligands to enhance the inhibition of protein function. These examples offer lessons for future programmes, as each compound must balance reactivity, potency and selectivity.

Covalent EGFR inhibitors

Overactivity of the receptor tyrosine kinase EGFR drives the progression of non-small-cell lung cancer (NSCLC), making EGFR a key drug target in oncology[31]. During clinical development in the early 2000s, the reversible, first-generation EGFR inhibitors gefitinib and erlotinib (Fig. 2a) were discovered to be effective against tumours harbouring somatic activating mutations in EGFR, either deletions in exon 19 or the L858R point mutation, which occur in 10–30% of patients with NSCLC[31-33]. However, the disease in these patients eventually still progressed; in 60% of cases this was due to the acquisition of the T790M ‘gatekeeper mutation’[34,35]. This mutation of the gatekeeper residue in the ATP-binding site of EGFR not only decreases the binding affinity of many reversible inhibitors for EGFR but also increases the binding affinity of EGFR for ATP[36].

Fig. 2

Progression of EGFR inhibitor structures.

Fig. 2 legend: Progression from first-generation (part a) to second-generation (part b) epidermal growth factor receptor (EGFR) inhibitors involved the addition of a reactive acrylamide electrophile (highlighted in red) to covalently bind a cysteine residue (Cys797) in EGFR. In the progression from second-generation to third-generation (part c) EGFR inhibitors, the quinazoline moiety is replaced with a pyrimidine unit to provide selectivity for the T790M mutant.

To overcome this problem, covalent second-generation inhibitors were strategically designed with acrylamide Michael acceptors to react with a cysteine residue (Cys797) in EGFR (Fig. 2b). Cys797 is located adjacent to the ATP-binding site, and irreversible binding of EGFR ligands to EGFR partially restores activity against the T790M gatekeeper mutant[33]. In addition to modest activity against T790M, covalent second-generation inhibitors provided prolonged suppression of EGFR signalling, suggesting that these covalent EGFR inhibitors could be more efficacious than reversible first-generation inhibitors such as erlotinib[33]. Afatinib (Boehringer Ingelheim) (Figs. 1 and 2b) was approved by the FDA in 2013 as a first-line treatment for patients with metastatic NSCLC with activating mutations in EGFR[37,38]. Despite the increased potency that covalent engagement brought against the disease target, the dose-limiting toxicity caused by inhibition of wild-type EGFR likely prevented afatinib from increasing overall survival when compared head-to-head with platinum-based chemotherapy in treating cancers bearing the T790M gatekeeper mutation[39]. Other second-generation inhibitors include neratinib (Puma), which potently inhibits HER2 by covalently binding Cys805 (a cysteine residue homologous to Cys797 on EGFR) and was approved by the FDA for treatment of HER2+ breast cancer in 2017, and dacomitinib (Pfizer) (Fig. 2b), which was approved by the FDA to treat NSCLC in 2018 (refs.[40-42]).

A third generation of EGFR inhibitors followed afatinib; these covalent inhibitors selectively target the T790M mutant over wild-type EGFR, and include WZ4002 (Dana–Farber Cancer Institute)[43], osimertinib[44,45] and rociletinib (Clovis Oncology; CO-1686)[46] (Fig. 2c). These compounds maintain the acrylamide group to covalently bind Cys797 but exchange the quinazoline moiety of first-generation and second-generation compounds for a pyrimidine to promote selectivity for mutant EGFR(T790M) (ref.[47]). Higher affinity for T790M over wild-type EGFR not only results in efficacy in cancers with the EGFR gatekeeper mutation but also contributes to an improved safety profile and enables a higher recommended dose for osimertinib than for afatinib[48]. Osimertinib was granted accelerated approval by the FDA in 2015 as a second-line treatment for NSCLC, and was approved as a first-line treatment for metastatic NSCLC in 2018 (ref.[49]). However, osimertinib depends on Cys797 for covalent binding, and C797X mutations account for 15% of cases of resistance to second-line osimertinib[50-52]. Generally, drugs whose efficacy relies on covalent binding to a specific nucleophilic amino acid are vulnerable to mutations at that site, which could lead to drug resistance.

The success of covalent EGFR inhibitors has validated the approach of covalently engaging non-catalytic, non-conserved cysteines adjacent to kinase active sites to increase the potency and modulate the pharmacodynamics of initially reversible ligands. Development of these drugs has shown that the acrylamide electrophile is reactive enough to engage a cysteine adjacent to an ATP-binding site but not so reactive as to induce haptenization and an adverse immune response. Incorporating covalent binding in EGFR inhibitors also enables selectivity between kinases through an interaction with a non-conserved cysteine instead of highly conserved active-site residues that typically interact with ATP.

Covalent BTK inhibitors

The discovery of covalent BTK inhibitors shares several themes with the discovery of covalent EGFR inhibitors, including the ligand-first approach and the use of Michael acceptor electrophiles. BTK became a target of interest in chronic lymphocytic leukaemia owing to its crucial role downstream of the B cell receptor[53]. Activation of the B cell receptor induces phosphorylation of BTK through Lyn and Syk kinases, and eventually activates transcription factors related to B cell proliferation, differentiation, cell migration and adhesion[54]. This key role in B cell development indicated that BTK was a relevant target for B cell malignancies.

In the early 2000s, scientists at Celera Genomics who were interested in using BTK inhibitors to treat rheumatoid arthritis used a structure-based approach to discover an acrylamide-containing inhibitor of the BTK kinase domain that could be used as a tool compound to fluorescently label BTK[55]. It was subsequently discovered that the tool compound itself, later named ibrutinib (Table 1), had sufficient activity and suitable physicochemical properties to advance into clinical studies[56-58]. Ibrutinib was approved by the FDA for the treatment of mantle-cell lymphoma in 2013 and subsequently for chronic lymphocytic leukaemia (CLL), Waldenstrom’s macroglobulinaemia and chronic graft versus host disease[59-63]. Similarly to EGFR inhibitors, ibrutinib binds to a cysteine residue (Cys481) adjacent to the ATP-binding site in BTK, and because only a few kinases have a homologous cysteine, ibrutinib should exhibit a degree of selectivity for BTK over other kinases[64]. The rapid clearance of ibrutinib (which has a half-life of 2–3 h) could also enable kinase selectivity; ibrutinib should maintain activity against BTK owing to prolonged covalent engagement, while the reversible inhibition of off-targets is minimized[65]. This combination of fast covalent engagement of BTK with rapid clearance might allow for selectivity in vivo despite the off-target kinase inhibition observed in biochemical assays.

Several other covalent BTK inhibitors have been approved or are currently in clinical trials and some of these highlight the variety of Michael acceptors that can be used as alternative electrophiles to acrylamides[66-68]. Most prominent among these is acalabrutinib (AstraZeneca), approved by the FDA in 2019 to treat CLL, which contains a butynamide electrophile instead of an acrylamide[69]. The butynamide electrophile is less reactive than an acrylamide, which, in addition to other substitutions, is hypothesized to account for the superior selectivity of acalabrutinib compared with ibrutinib for BTK and could be responsible for the reduced number of adverse cardiovascular events[70,71]. Further work has examined the use of cyanoacrylamides as electrophiles to design reversible covalent BTK inhibitors, which would ideally show increased potency and lower covalent off-target reactivity than irreversible covalent BTK inhibitors[72-74]. The long, tunable off-rates of reversible covalent inhibitors highlight the grey area that exists between reversible inhibition and irreversible covalent mechanisms.

Overcoming historical concerns relating to the potential toxicity of covalent drugs, the success of ibrutinib demonstrates that rationally designed covalent drugs can achieve acceptable safety profiles and blockbuster status. Ibrutinib, and covalent EGFR inhibitors, demonstrate that kinase inhibitors that target non-conserved cysteines adjacent to the ATP-binding site can be developed into selective and potent drugs. The pharmacokinetic and pharmacodynamic properties of ibrutinib allow for prolonged BTK blockade while reducing off-target kinase inhibition through rapid clearance in vivo. Notably, the performance of ibrutinib in treating B cell malignancies emphasizes that molecules once considered chemical biology tool compounds can become effective drugs.
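
The clearance argument for ibrutinib can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the Review: it assumes simple first-order (single-compartment) elimination after a single dose, uses the 2–3 h half-life quoted above, and ignores absorption, repeat dosing and resynthesis of BTK. The function name `fraction_remaining` is a hypothetical helper introduced here.

```python
def fraction_remaining(hours_elapsed: float, half_life_hours: float) -> float:
    """Fraction of free drug left after `hours_elapsed`.

    Assumes first-order elimination (an illustrative simplification,
    not a pharmacokinetic model from the Review).
    """
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    if hours_elapsed < 0:
        raise ValueError("hours_elapsed must be non-negative")
    return 0.5 ** (hours_elapsed / half_life_hours)


if __name__ == "__main__":
    # Half-life range quoted in the text for ibrutinib: 2-3 h.
    for half_life in (2.0, 3.0):
        for hours in (6.0, 12.0, 24.0):
            frac = fraction_remaining(hours, half_life)
            print(f"t1/2 = {half_life:.0f} h, t = {hours:>4.0f} h: "
                  f"{100 * frac:.3f}% of free drug remains")
```

Under these assumptions, less than 0.5% of free drug remains 24 h after a dose even at the slower 3 h half-life (0.5^8 ≈ 0.39%). Reversibly bound off-targets would therefore be largely released by then, whereas BTK that has already been covalently modified stays inhibited until new protein is synthesized.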

Other covalent kinase inhibitors

Covalent inhibitors have been used to selectively target kinases other than EGFR and BTK with non-conserved cysteine residues adjacent to their ATP-binding sites[75,76]. One example is Janus kinase 3 (JAK3), a non-receptor tyrosine kinase primarily expressed in leukocytes and involved in cytokine signalling[77]. Covalent targeting of the non-conserved Cys909 of JAK3 has yielded inhibitors selective for JAK3 over other JAK family members for the treatment of autoimmune diseases[77-81]. One of these inhibitors, ritlecitinib (Pfizer; PF-06651600), has shown promising results for patients with rheumatoid arthritis in a phase II clinical trial[82]. Several covalent inhibitors of fibroblast growth factor receptor 4 (FGFR4) target the non-conserved Cys552 residue in FGFR4 to confer selectivity over FGFR1, FGFR2 and FGFR3, as well as to overcome mutations that confer resistance to reversible FGFR inhibitors in hepatocellular carcinoma (HCC)[83,84]. The acrylamide-containing FGFR4 inhibitor fisogatinib (Blueprint Medicines; BLU-554) is currently the subject of a phase II clinical trial (NCT04194801). Aldehyde-containing roblitinib (Novartis; FGF401), which is a reversible covalent FGFR4 inhibitor that also reacts with Cys552, is also under clinical investigation (NCT02325739)[83,85]. Overall, the rational design of covalent kinase inhibitors that target non-conserved cysteines adjacent to the ATP-binding site has become a routine approach to enhancing the potency and selectivity of kinase inhibitors.

Discoveries by electrophile-first approaches

Covalent drugs are also discovered through electrophile-first approaches, meaning that the initial discovery process is rooted in finding a covalent ligand from the outset, instead of incorporating covalency into a known reversible ligand. Key examples of drugs discovered through this approach include the KRAS(G12C) inhibitor sotorasib and the SARS-CoV-2 Mpro inhibitor nirmatrelvir (Fig. 1).

Covalent KRAS(G12C) inhibitors

The discovery and development of covalent KRAS(G12C) inhibitors is one of the most exciting discovery-to-clinic stories featuring covalent drugs. KRAS is a GTPase-encoding oncogene that is mutated in about 25% of all cancers, most notably in pancreatic, colorectal and lung cancers[86]. Wild-type KRAS is carefully regulated between the active GTP-bound state and inactive GDP-bound state, but many KRAS mutations attenuate GTPase activity, leading to low rates of GTP hydrolysis and elevated RAS signalling, driving tumorigenesis[87]. Since the discovery of the role of KRAS in cancer nearly 30 years ago, attempts to drug it directly using traditional drug discovery methods have been unsuccessful[9,10]. KRAS does not have accessible pockets for reversible inhibitors to bind to, competitive inhibitors would need to overcome the picomolar binding affinities of GTP and GDP, and inhibitors active against wild-type KRAS could show on-target toxicity[86,88].

Covalent KRAS inhibitors against the G12C mutant are appealing for several reasons. First, targeting mutant KRAS could allow for selective cytotoxicity to cancer cells. Second, the affinity enabled by covalent binding would be advantageous as KRAS lacks easily ligandable pockets. Third, 12–14% of KRAS mutations in NSCLC are KRAS(G12C), presenting a promising patient group that would directly benefit from KRAS(G12C) inhibition[89]. Finally, position 12 in KRAS sits closely beneath the effector-binding region and the nucleotide-binding pocket, suggesting that covalent KRAS(G12C) ligands might affect KRAS function[87].

In 2013, researchers at the University of California, San Francisco reported the first mutant-selective covalent KRAS(G12C) inhibitor. The inhibitor (compound 12 in their study) was discovered through a disulfide-fragment screening approach known as tethering, whereby a library of 480 disulfide fragments was screened against KRAS(G12C) in the GDP-bound state using intact protein mass spectrometry (MS)[88,90]. Co-crystal structures of KRAS(G12C) showed that hit compounds bound to the switch II region, and subsequent medicinal chemistry efforts to exchange the disulfide moiety for acrylamide and vinyl sulfonamide electrophiles yielded KRAS(G12C) inhibitors that were active in vitro, including compound 12. Binding of compound 12 to the switch II pocket impaired KRAS signalling by shifting nucleotide affinity from favouring GTP to GDP and led to the accumulation of KRAS in its inactive state[91]. This novel mechanism for selective KRAS(G12C) inhibition set the stage for the development of clinical covalent KRAS(G12C) inhibitors.

In 2016, Wellspring Biosciences disclosed ARS-853, which is a selective covalent inhibitor of KRAS(G12C) with in cellulo efficacy in the low micromolar range[92]. Structure-guided optimization of compound 12 and use of a cellular liquid chromatography with tandem mass spectrometry (LC–MS/MS)-based assay to determine the degree of KRAS(G12C) engagement in H358 cells yielded ARS-853 (ref.[92]). ARS-853 treatment in KRAS(G12C)-dependent cell lines decreased the amount of active KRAS(G12C), inhibited downstream RAS signalling and induced apoptosis[92]. Although KRAS(G12C) had been thought to be constitutively active, the selective binding of ARS-853 to GDP-bound, inactive KRAS(G12C) provided evidence that KRAS mutants cycle between GTP-bound and GDP-bound states[92].

The discovery of clinical KRAS(G12C) inhibitors continued with ARS-1620, which was the result of an effort to overcome metabolic stability and bioavailability limitations of ARS-853 to facilitate in vivo studies of KRAS(G12C) inhibition[87]. ARS-1620 is based on a novel quinazoline core scaffold, designed to better occupy the switch II pocket and thus rigidify a more favourable conformation for covalent reaction between the acrylamide electrophile and cysteine[87]. Ultimately, ARS-1620 was identified as the first KRAS(G12C) inhibitor suitable for in vivo studies and showed efficacy in KRAS(G12C) patient-derived xenograft models treated at 200 mg kg^−1 once per day or twice per day[87]. The increased potency of this series of KRAS(G12C) inhibitors and success in in vivo models indicated that it might be possible to design clinically efficacious drugs.

Sotorasib (AMG-510) (Table 1) was the first selective KRAS(G12C) inhibitor to enter clinical trials in 2018 and was developed by Amgen, building on discoveries from a partnership with Carmot Therapeutics in which a custom library of small molecules that covalently bind cysteine were screened against KRAS(G12C)[93]. Molecules identified through this collaboration led to the discovery of a previously unknown pocket on KRAS (a cryptic pocket), which Amgen scientists exploited to discover sotorasib through structure-based design[94]. Sotorasib was designed to occupy the cryptic pocket by interacting with His95, Tyr96 and Gln99 (ref.[94]) (Fig. 3). A phase II clinical trial investigating sotorasib was successfully completed in 2020, and was followed by FDA approval for the treatment of adults with KRAS(G12C)-mutated locally advanced or metastatic NSCLC in May 2021 (ref.[95]).

Other covalent KRAS(G12C) inhibitors are entering clinical trials. Adagrasib (MRTX849) emerged from a joint drug discovery collaboration between Mirati Therapeutics and Array BioPharma, in which irreversible covalent inhibitors of KRAS(G12C) were identified; Mirati Therapeutics subsequently used structure-based design approaches to optimize adagrasib, which entered clinical trials in January 2019 (refs.[96,97]) (Fig. 3). JNJ-74699157 (ARS-3248; J&J and Wellspring Biosciences) was being investigated in patients with several types of advanced solid tumour that express KRAS(G12C), including NSCLC and colorectal cancer, but its clinical trials have been terminated[98].

Fig. 3

Aligned structures of KRAS(G12C) co-crystallized with adagrasib (MRTX849) and sotorasib (AMG-510).

The covalent inhibitors adagrasib (PDB ID: 6UT0) and sotorasib (PDB ID: 6OIM) are bound to the switch II pocket, which is adjacent to the GDP-binding pocket[93,96].

Designing small-molecule covalent KRAS(G12C)-selective inhibitors provides an elegant solution to drugging an undruggable cancer target. Before KRAS(G12C) inhibitors, recently discovered targeted covalent inhibitors in oncology were mostly identified using ligand-first approaches. The success of covalent KRAS(G12C) inhibitors validates an electrophile-first approach to covalent drug discovery and affirms the importance of covalent fragment screening techniques (discussed below). Optimization of initial hit compounds that emerge from covalent screening platforms, such as compound 12, can subsequently lead to programmes that produce potent covalent inhibitors such as sotorasib. In addition, the sotorasib story suggests that in other diseases in which a key protein target undergoes substitution of an amino acid to a cysteine residue, covalent inhibitors present an increasingly validated method to potentially provide precision therapy for patients.

SARS-CoV-2 main protease inhibitors

Vaccines against coronavirus disease 2019 (COVID-19) were developed at unprecedented speeds, and similar research momentum has led to the development of therapeutics that will benefit patients with COVID-19. In December 2021, the FDA issued an Emergency Use Authorization for Pfizer’s Paxlovid (a combination of nirmatrelvir and ritonavir) to treat mild-to-moderate COVID-19 (caused by SARS-CoV-2) in adults and some paediatric patients, marking the first authorized oral treatment for the disease[99]. Nirmatrelvir (Table 1) covalently inhibits the Mpro of SARS-CoV-2 (ref.[3]). This programme highlights how the adaptation of relatively inactive peptidomimetics into potent and selective cysteine protease inhibitors can be accomplished by the addition of cysteine-reactive covalent functional groups to target the protease active site and by structurally informed medicinal chemistry efforts.

SARS-CoV-2 is a virus with a single-stranded RNA genome that encodes two polyproteins (pp1a and pp1ab) as well as structural and accessory proteins[100]. Viral replication depends on successful cleavage of pp1a and pp1ab by the Mpro (also referred to as 3CLpro), which is a cysteine protease, into functional viral proteins[100]. The discovery of covalent inhibitors against SARS-CoV-2 Mpro emerged from extensive work on protease inhibitors for SARS-CoV-1, the virus that causes severe acute respiratory syndrome (SARS1)[100]. During the 2002–2003 SARS1 outbreak, researchers used a crystal structure of the homologous porcine transmissible gastroenteritis coronavirus (TGEV) Mpro bound to a hexapeptidyl chloromethylketone covalent cysteine-reactive inhibitor to provide a foundation for the design of covalent inhibitors against SARS-CoV-1 Mpro (ref.[101]). Because of the homology across all Mpro, this work enabled the discovery of rupintrivir (AG7088), which is a mechanism-based inhibitor of the human rhinovirus (HRV) Mpro (ref.[101]).

Because the SARS-CoV-1 outbreak subsided, work into developing coronavirus Mpro inhibitors slowed until the emergence of SARS-CoV-2 in 2019. SARS-CoV-2 Mpro shares 96% sequence identity with SARS-CoV-1 Mpro, and there is 100% sequence overlap of the catalytic sites[102]. Renewed interest in improving on previous chloromethylketone inhibitors motivated researchers to adapt rupintrivir into a SARS-CoV-2 Mpro inhibitor potent enough to obtain a co-crystal structure. This discovery in turn enabled identification of the α-hydroxymethylketone-containing antiviral PF-00835231, which demonstrated potent SARS-CoV-2 Mpro inhibition in an activity assay based on fluorescence resonance energy transfer (FRET), activity in antiviral cell-based assays, stability in plasma and low clearance in vivo[103]. In subsequent studies, the oral bioavailability of PF-00835231 was improved by replacing the α-hydroxymethylketone moiety with a nitrile group, which can also act as an electrophile[103]. Nitriles can covalently bind particularly reactive nucleophiles; however, the ease of thiol elimination from the thioimidate adduct makes nitriles more reversible than some other electrophiles, such as acrylamides[104]. Optimization from PF-00835231 eventually yielded PF-07321332, named nirmatrelvir (Fig. 4 and Table 1), which is a highly potent reversible covalent inhibitor of SARS-CoV-2 Mpro that displayed potent inhibition in a FRET-based assay across all human coronaviruses, while no inhibitory effects were seen against human cysteine or serine proteases. The in vivo efficacy of nirmatrelvir was demonstrated in a mouse-adapted SARS-CoV-2 (SARS-CoV-2 MA10) model.

Fig. 4

Nirmatrelvir in complex with SARS-CoV-2 main protease.

The nitrile group of nirmatrelvir (shown in green) reacts with Cys145 of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease (shown in light blue) to form a covalent thioimidate adduct. Extensive hydrogen-bond interactions (yellow dashed lines) occur throughout the pocket (PDB ID: 7RFW)[103].

In phase II/III clinical trial data released in November 2021, Paxlovid was shown to be highly effective at preventing progression to severe COVID-19 in symptomatic patients[105]. The emergence of this orally bioavailable drug for COVID-19 will help to ameliorate illness for non-hospitalized patients in high-risk groups[106]. Overall, SARS-CoV-2 Mpro covalent inhibitors provide a promising avenue for the treatment of coronavirus infections either as monotherapies or in combination with other antiviral drugs. The quick adaptation of previous protease inhibitors to selectively target SARS-CoV-2 Mpro is an example of using structure-based design while taking into account the valuable properties of covalent drugs. Researchers started with an electrophile-containing peptide and optimized both the peptide and electrophile to obtain a highly potent covalent inhibitor. In some ways, this story mirrors that of the discovery and optimization of JAK3 kinase inhibitors, in which EGFR inhibitors were adapted to target Cys909 of JAK3 (ref.[79]). Both JAK3 and mutant EGFR(T790M) contain a methionine gatekeeper, as well as a homologous cysteine adjacent to the ATP-binding site. These shared features enabled the discovery of potent JAK3-selective covalent inhibitors using an EGFR inhibitor as a starting point, similar to how rupintrivir provided the starting structure for the discovery of nirmatrelvir. Generally, when reactive cysteine residues are shared across a protein family, structurally guided adaptation of previously studied cysteine-reactive covalent inhibitors can lead to selective and potent drugs for other proteins with this feature.

HCV NS3/4a protease inhibitors

α-Ketoamide-based covalent inhibitors have been developed to treat hepatitis C virus (HCV) infection through inhibiting NS3/4a — a serine protease that cleaves the HCV polyprotein into multiple non-structural proteins required for replication[107]. Although HCV had been treated with a combination of PEGylated interferon-α and ribavirin, modest response rates and notable adverse events prompted studies that resulted in the discovery of NS3/4a protease inhibitors to treat HCV[108]. On the basis of initial observations that hexapeptide cleavage products could inhibit NS3/4a, the linear peptidomimetic inhibitors boceprevir (Merck)[109,110] (Fig. 1) and telaprevir (Vertex)[111,112] were designed. These compounds, along with narlaprevir[113], use a ketoamide electrophile to covalently engage the catalytic serine of NS3/4a. This covalent interaction is relatively reversible owing to elimination of the serine alcohol group from the protein–inhibitor adduct[114]. Boceprevir and telaprevir were effective in treating HCV and were approved by the FDA in 2011 after successful trials[115,116]. However, telaprevir was withdrawn in 2014 owing to adverse events, and boceprevir was discontinued by Merck in 2015 owing to the superiority of newer direct-acting antivirals, in particular the ledipasvir–sofosbuvir combination (Gilead), which targets the HCV polymerases NS5a and NS5b[117-119]. Nevertheless, the success of ketoamide-based NS3/4a inhibitors in increasing the efficacy of interferon–ribavirin therapy emphasizes the utility of the ketoamide group as a serine-reactive electrophile in designing covalent antivirals.

Covalent proteasome inhibitors

Bortezomib (Takeda) was the first boron-containing drug to be approved by the FDA, and was approved for treatment of multiple myeloma in 2003 (ref.[120]). Bortezomib was discovered through optimization of an aldehyde-containing proteasome substrate peptide, and set a precedent for the discovery of HCV NS3/4a inhibitors such as telaprevir[121-124]. Exchange of the aldehyde electrophile for a boronic acid substantially increased the potency of proteasome inhibition[121]. The effect of bortezomib comes from the boronic acid covalently binding to the hydroxy group of the β5 subunit N-terminal threonine of the 20S proteasome, leading to inhibition of the proteasome’s chymotrypsin-like activity[125]. Bortezomib takes advantage of the increased sensitivity of haematological cancers such as multiple myeloma and mantle-cell lymphoma to proteasome inhibition[123]. The development of bortezomib validated the proteasome as a cancer target, encouraging the discovery of other proteasome inhibitors. Medicinal chemistry efforts transformed the natural product epoxomicin into carfilzomib (Amgen), which was approved by the FDA in 2012 (ref.[126]) (Fig. 1). The epoxyketone moiety in carfilzomib forms a morpholino ring with the catalytic N-terminal threonine of the 20S proteasome, and this mechanism has been proposed to confer selectivity because most proteases do not have nucleophilic side chains at their N terminus[127]. Although the epoxyketone could have additional off-targets that bortezomib does not, this proposed mechanism of carfilzomib illustrates how covalency can help to drive selectivity through highly specific mechanisms. Further work has been done to identify orally bioavailable proteasome inhibitors to improve upon bortezomib, which is administered intravenously[128]. Ixazomib (Takeda) is a second-generation proteasome inhibitor that also contains a boronic acid group[129]. 
This drug is administered orally as the prodrug ixazomib citrate, a boronic ester that hydrolyses upon exposure to aqueous media or plasma[130,131]. Oprozomib (Amgen), a second-generation epoxyketone-containing proteasome inhibitor, can also be orally administered[132]. The discovery of these covalent proteasome inhibitors illustrates the utility of boronic acid electrophiles for targeting protease active sites and demonstrates how a covalent drug discovery project can use an unoptimized electrophile as a starting point and subsequently introduce an alternative electrophile. Just as a chloromethylketone group in a SARS-CoV-2 ligand was optimized to the nitrile in nirmatrelvir, bortezomib was discovered through optimization from an aldehyde. Although targeting especially nucleophilic protease active sites might provide greater flexibility in terms of electrophile choice, using highly reactive electrophiles to gain an initial foothold can be beneficial.

Other boron-containing drugs

Several additional boron-containing drugs inhibit serine hydrolases beyond the proteasome through formation of covalent adducts between catalytic serine residues and boron[133,134]. Most of these drugs contain benzoxaborole groups and have been discovered through screening of compound collections that contain boron-based electrophiles[135]. For example, the antifungal tavaborole (Pfizer) was discovered through focused screening of boron-containing compounds previously investigated as antibacterials, and was approved by the FDA to treat onychomycosis in 2014 (refs.[136,137]). Tavaborole covalently binds to the 2′ and 3′ hydroxy groups on the 3′ terminus of leucyl-tRNA, trapping the tRNA–tavaborole adduct in the editing site of leucyl-tRNA synthetase to block protein synthesis[138]. Crisaborole (Pfizer), a phosphodiesterase 4 inhibitor, was approved by the FDA to treat atopic dermatitis in 2016, and the β-lactamase inhibitor vaborbactam (Rempex) was approved to treat various bacterial infections in 2017 (refs.[139,140]). These boron-based drugs target serine hydrolases, and the weak boron–sulfur bond potentially provides selectivity for serine over cysteine hydrolases[121]. The reversibility of the serine–boron bond hinders chemoproteomic profiling experiments commonly used to characterize the selectivity of covalent ligands. However, because the serine hydrolase family is rich with potential drug targets, the numerous boron-containing drugs in clinical trials hold promise across a wide variety of disease types[141].

Mutant-haemoglobin modulators

Voxelotor (Global Blood Therapeutics) is a lysine-targeting covalent drug used to treat sickle cell anaemia, and the discovery of voxelotor (Table 1) was dependent on knowledge of heightened lysine side chain reactivity. Sickle cell anaemia is caused by a single mutation in the gene encoding the β-haemoglobin chain that induces polymerization of mutant haemoglobin (HbS) under hypoxic conditions[142]. An aldehyde-containing natural product and several synthetic aldehyde analogues were found to prevent polymerization by increasing the affinity of HbS for oxygen[143]. These aldehydes bind in a reversible covalent manner, forming a Schiff base with the N-terminal valine of the α-haemoglobin chain[143]. Almost 50 years ago, this N-terminal amine was discovered to have a particularly low pKa of 6.9, indicating that it is primarily unprotonated under physiological conditions and, thus, more nucleophilic[144]. Based on earlier aldehydes, voxelotor was discovered through a structure-guided effort to discover compounds that increase the oxygen affinity of HbS, and was designed to bind the HbS tetramer in a 1:1 stoichiometry, unlike the 2:1 ratio of earlier compounds[143,145]. With a remarkable red blood cell to plasma ratio of ~150 that likely reduces off-target effects, voxelotor was approved by the FDA in 2019 with a recommended dose of 1.5 g daily, which is an unusually high dose for a covalent drug[142,146]. The success of voxelotor is similar to that of ibrutinib in demonstrating that covalent drugs can be dosed at high amounts given favourable absorption, distribution, metabolism and excretion (ADME) properties. Furthermore, the discovery of voxelotor shows how the identification of unusually reactive amino acid residues, such as the α-haemoglobin N-terminus, provides opportunities for drug discovery.
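
The link between the low pKa of the α-haemoglobin N-terminal amine and its nucleophilicity can be made concrete with the Henderson–Hasselbalch relationship. The following is a minimal sketch rather than part of the original analysis: the physiological pH of 7.4 and the comparison pKa of 10.5 for a typical lysine side chain are assumed textbook values, and the function name is illustrative.

```python
def fraction_unprotonated(pka: float, ph: float) -> float:
    """Fraction of an amine present as the neutral free base (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PHYSIOLOGICAL_PH = 7.4      # assumption: typical blood pH
N_TERMINAL_PKA = 6.9        # value quoted in the text for the alpha-chain N terminus
TYPICAL_LYSINE_PKA = 10.5   # assumption: textbook value for a lysine side chain

n_terminal_free = fraction_unprotonated(N_TERMINAL_PKA, PHYSIOLOGICAL_PH)
lysine_free = fraction_unprotonated(TYPICAL_LYSINE_PKA, PHYSIOLOGICAL_PH)

print(f"N-terminal amine unprotonated: {n_terminal_free:.1%}")
print(f"Typical lysine unprotonated:   {lysine_free:.3%}")
```

Under these assumptions, roughly three-quarters of the N-terminal amine is in the reactive free-base form, whereas a typical lysine side chain is almost entirely protonated, which is consistent with the text's description of the N terminus as primarily unprotonated.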

The covalent drug discovery toolbox

Many covalent drug discovery programmes, including those for covalent EGFR and BTK inhibitors, have involved the addition of reactive functional groups to previously identified ligands. However, emerging technologies make it possible to approach covalent ligand discovery from an electrophile-first perspective, in which covalent ligands against protein targets of interest are identified before structure-based optimization. For example, this type of approach proved successful in drugging KRAS(G12C) and has facilitated the rapid discovery of E3 ligase ligands for targeted protein degradation applications. Activity-based protein profiling (ABPP) approaches have transformed the characterization of electrophilic compounds, facilitating selectivity profiling and target identification experiments. Special considerations must also be made in evaluating the binding affinity and reactivity of covalent ligands (Box 1). In this section, we discuss the toolbox of emerging techniques that covalent drug discovery relies on.

Although half-maximal inhibitory concentration (IC50) values are often used to measure the potency of covalent inhibitors, the kinact/Ki second-order rate constant (where Ki is the inhibition constant and kinact is the rate of enzyme inactivation) is preferred because it describes inhibitor potency in a time-independent manner[212,213]. The ratio kinact/Ki can be determined by measuring total occupancy over time in a binding assay, calculating the first-order rate constant of inhibition, kobs, and in turn kinact/Ki (ref. [212]). One example of the direct measurement of kinact/Ki involved a time-resolved fluorescence energy transfer assay in which competitive fluorescent probes were used to determine kinact/Ki values versus epidermal growth factor receptor (EGFR)[214]. Because this kinetic parameter can be resource-intensive to measure, researchers have formulated models that relate IC50 values to kinact/Ki (refs.[215-217]). 
For example, a model has been developed to estimate Ki and kinact from time-dependent IC50 data, from work on CYP450 enzyme inhibitors[216]. Moreover, a method has been designed that uses a competitive covalent probe with a known kinact/Ki value to facilitate the rapid calculation of inhibitor kinact/Ki values based on standard end point IC50 data[217]. Scientists working on a covalent JAK3 inhibitor programme at Pfizer, however, suggested that fixed-time-point IC50 values can serve as a valuable surrogate for kinact/Ki (ref. [218]). When using IC50 as a surrogate, extra care should be taken to confirm that target engagement is covalent (such as by measuring off-rates). However, with an appropriate time-point choice, IC50 values can be a useful tool to rapidly assess potency for covalent inhibitors. Regardless, determination of kinact/Ki provides the most complete information to medicinal chemists about the binding affinity and kinetics of covalent target engagement. To ensure that potent covalent inhibitors are not promiscuous ligands that lack selectivity, electrophilic compounds are frequently tested for reactivity, usually with respect to thiols. This measurement is often carried out using a glutathione (GSH) reactivity assay, whereby a compound is incubated with excess GSH and consumption of the compound is observed using liquid chromatography–mass spectrometry to determine a rate constant[69,219,220]. Alternatively, plate-based 5,5-dithio-bis-(2-nitrobenzoic acid) (DTNB; also known as Ellman’s reagent) and NMR-based GSH assays have also been used[156,221]. In a helpful resource for acrylamide-focused projects, two related studies used an NMR-based assay to explore the effects of various substitutions on acrylamide GSH reactivity[222,223]. Assaying for reactivity is a crucial step, particularly for key molecules and for compound series containing novel electrophiles.
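
The two measurements described above, kobs from occupancy over time and a rate constant from compound consumption in a GSH assay, both reduce to fitting a first-order decay. The sketch below is illustrative only and rests on stated assumptions: the data are synthetic, the inhibitor parameters (`TRUE_KINACT`, `TRUE_KI`) and the GSH rate are invented, the standard two-step model kobs = kinact[I]/(Ki + [I]) is assumed, and kinact/Ki is estimated as the slope of kobs against [I] in the low-concentration regime ([I] well below Ki). GSH is assumed to be in large excess, so compound loss is pseudo-first-order.

```python
import math


def fit_first_order_rate(times, fraction_remaining):
    """Least-squares fit of ln(fraction remaining) = -k * t through the origin; returns k."""
    numerator = sum(t * math.log(f) for t, f in zip(times, fraction_remaining))
    denominator = sum(t * t for t in times)
    return -numerator / denominator


def kinact_over_ki(concentrations, kobs_values):
    """Slope of kobs versus [I] through the origin; only valid when [I] << Ki."""
    numerator = sum(c * k for c, k in zip(concentrations, kobs_values))
    denominator = sum(c * c for c in concentrations)
    return numerator / denominator


# Synthetic occupancy time courses for a hypothetical inhibitor.
TRUE_KINACT = 0.01   # s^-1 (assumed)
TRUE_KI = 1e-5       # M (assumed)
times_s = [60.0, 120.0, 300.0, 600.0]
concentrations_m = [5e-8, 1e-7, 2e-7]   # all well below the assumed Ki

kobs_fitted = []
for conc in concentrations_m:
    true_kobs = TRUE_KINACT * conc / (TRUE_KI + conc)
    unmodified = [math.exp(-true_kobs * t) for t in times_s]   # 1 - occupancy
    kobs_fitted.append(fit_first_order_rate(times_s, unmodified))

estimate = kinact_over_ki(concentrations_m, kobs_fitted)
print(f"Estimated kinact/Ki: {estimate:.0f} M^-1 s^-1")

# Synthetic GSH reactivity data: fraction of compound remaining over time.
gsh_times_min = [0.0, 30.0, 60.0, 120.0]
compound_remaining = [math.exp(-0.01 * t) for t in gsh_times_min]   # assumed rate
k_gsh = fit_first_order_rate(gsh_times_min, compound_remaining)
gsh_half_life = math.log(2) / k_gsh
print(f"GSH half-life: {gsh_half_life:.1f} min")
```

The linear approximation slightly underestimates kinact/Ki as concentrations approach Ki. When saturation is observed, fitting the full hyperbolic relationship would recover kinact and Ki separately.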

Screening platforms

Many emerging covalent ligand screening platforms involve MS-based detection, enabled by covalent bond formation between the compound and protein. Other phenotypic, DNA-encoded or computational approaches have also been used, which are then paired with MS-based validation, ABPP-based experiments to inform selectivity and structural biology to enable medicinal chemistry. The most prominent methods for covalent screening are summarized in Table 2. The growth of commercial libraries of electrophilic fragment-like compounds is a key factor contributing to the rise of these electrophile-first discovery strategies.

Table 2

Comparison of screening methods for covalent drug discovery

Method	Key features	Limitations	Refs.
Intact protein MS	MS detection; compounds can be pooled; provides information on binding stoichiometry; high-throughput	Requires a large amount of purified protein; electrophile must be swapped when using disulfide tethering approaches	[88,156]
MS-based ABPP	MS detection; provides information on the selectivity of the ligand; can be done in complex biological systems (such as lysate or tissue)	Lower throughput than intact protein MS; sensitivity affected by peptide abundance and digestion conditions	[180,183]
Gel- or plate-based ABPP	Easy detection of hits through fluorescence of competitive reactive probe; can be paired with ABPP proteomics to assess selectivity of hit ligands	Lower throughput compared with assays that are not gel based; complicated by the presence of multiple reactive amino acid residues within a protein; requires purified protein	[189]
Phenotypic	ABPP enables rapid identification of covalent hits; potentially high-throughput; can use identical assays for discovering reversible ligands	Does not take advantage of known reactive hotspots identified in chemoproteomics data; requires control counter screens to eliminate untenable (for example, promiscuous or toxic) hit ligands	[186,211]
Covalent docking	Advanced software is widely available (such as DOCKovalent); scoring assesses non-covalent interactions	Difficult to model covalent bond formation (relies on assumptions about ligand–target interactions)	[167,169]
Covalent DNA-encoded libraries	Affords a massive library size	Enrichment complicated by covalent binding; limited library availability	[165]

ABPP, activity-based protein profiling; MS, mass spectrometry.

Intact protein MS

Originally, MS-based compound screening grew out of ‘tethering’, which is a technique employed since 2000 that uses libraries of compounds linked to disulfides to identify fragments that bind cysteine-adjacent pockets[90,147]. Molecules that bind undergo disulfide exchange with the cysteine (which could be endogenous or engineered) to form an adduct with the protein, and pooled screening with MS detection can identify the bound compound. Binding fragments can be combined or grown in a fragment-based approach to identify high-affinity ligands. This strategy was originally designed to identify reversible ligands for challenging targets, but covalent binding can be maintained by replacing the disulfide with an electrophile such as an acrylamide, as in the case of the initial covalent KRAS(G12C) inhibitors[88]. This approach has been used to discover compounds that modulate protein–protein interactions[148,149], and a more recent study employed a similar tethering strategy that uses aldehydes to form imines with lysine residues[150]. Over the past decade, covalent ligand discovery has shifted towards screening more drug-like electrophilic fragments[13]. In 2012, researchers curated 177 electrophilic compounds from the Pfizer compound collection and used an MS-based primary screening strategy to identify covalent inhibitors of the interaction between hypoxia-inducible factor 1α (HIF1α) and aryl hydrocarbon receptor nuclear translocator (ARNT)[151]. This approach was informed by determining an X-ray crystal structure of the HIF1α–ARNT complex[151]. Around the same time, the concept of tethering was expanded by using a small set of acrylamides and an MS-based assay to identify thymidylate synthase inhibitors, an approach termed kinetic template-guided tethering[152]. Building on these studies, an acrylate functionality was appended to 100 fragments to identify non-peptidic inhibitors of the cysteine protease papain[153]. 
Electrophilic fragments were pooled, and screening through electrospray ionization (ESI) MS led to the identification of hit compounds out of pooled experiments. Use of a similar library of acrylates enabled the discovery of covalent inhibitors for the HECT E3 ligase NEDD4-1 (ref.[154]). As with the HIF1α–ARNT protein–protein interaction inhibitors discovered at Pfizer, co-crystal structures were crucial to understanding the mechanism of these NEDD4-1 inhibitors, which prevent association of ubiquitin with the E3 ligase and thus induce a switch from a processive to a distributive mechanism. In another example, acrylate-based inhibitors of the RBR E3 HOIP were also discovered using MS-based screening, highlighting how covalent fragment screening approaches can be useful for protein classes that have been challenging to discover ligands against, such as E3 ligases[155]. More recently, a commercial library of 993 acrylamides and chloroacetamides was screened using intact MS to identify ligands of the deubiquitinase OTUB2 and the pyrophosphatase NUDT7 (ref.[156]). The authors used co-crystal structures with OTUB2 and NUDT7 in complex with the hit compounds to inform fragment growing to increase potency. Although previous studies had paired MS-based screening with structural information to identify cysteine residues in functional sites or to understand the mechanism of inhibition, in this case, the pairing of MS-based screening and structure-guided fragment-based drug discovery supported optimization of the potency of the hit compounds. The same compound collection was also used to screen against the peptidyl-proline cis–trans isomerase Pin1, which is overexpressed or activated in several tumour types but has been challenging to target selectively[157]. 
The resulting chloroacetamide sulfopin was shown to be selective for Pin1 in a covalent inhibitor target-site identification (CITe-Id) chemoproteomics experiment and was effective in regressing neuroblastoma growth in mice[157]. This result suggests that although chloroacetamides have disadvantages, including rapid metabolism, they can be valuable tool compounds with which to assess target relevance in various disease models. One particularly powerful example of covalent ligand screening is the discovery of the initial compounds in the series that led to the first approved KRAS(G12C) inhibitor, sotorasib. A library of 3,300 acrylamides was screened in three assays: a thiol reactivity assay, a RAF-coupled nucleotide exchange assay and an intact MS assay[158]. Combined with crystallographic data that showed how ligand binding revealed the presence of previously closed sub-pockets, this effort provided the basis for the rapid discovery of KRAS(G12C) inhibitors discussed above.

Covalent DNA-encoded libraries

DNA-encoded libraries (DELs) present an alternative approach that enables the screening of massive libraries of small molecules. Unlike MS or ABPP approaches, there is no specialized advantage of covalency in enabling DEL screening, but the throughput of DELs allows for the screening of much larger covalent libraries through a workflow of immobilization, enrichment, amplification and sequencing. The first reports of electrophilic protein–nucleic acid-encoded libraries described the targeting of protease active sites[159], but over the past 5 years, cysteine-targeted DNA-encoded or protein–nucleic acid-encoded libraries have been used to identify covalent ligands for bromodomains, including PCAF and BRD4, as well as for JNK1, MEK2 and HER2 (refs.[160-162]). Further work has explored improvements in the enrichment step of the covalent DEL screening workflow, which differs because covalent engagement prevents elution of DNA-tagged molecules[163]. A covalent ligand of mitogen-activated protein kinase kinase 6 (MAP2K6) was also identified serendipitously through screening of a DEL against a DNA-encoded protein library[164]. More recently, even larger covalent DELs (with approximately 100,000,000 members) have been developed and used to identify acrylamide-based and epoxide-based BTK inhibitors with novel scaffolds[165]. In this study, similar screening results were obtained after storing the library at –80 °C for several years, suggesting that the electrophilic compounds are sufficiently stable in this context. Expansion of the covalent DEL library size and the increasing commercial availability of DELs represent an exciting development, and although unique considerations with respect to enrichment workflow and compound stability must be kept in mind, DELs that contain electrophilic molecules may become more widespread in covalent ligand discovery.

Covalent docking

The advantages of various covalent docking methods in different covalent docking scenarios have been described elsewhere[166,167]. Most computational programs for covalent docking rely on direct-linking models, which sample conformations under the constraint of a predefined bond between a ligand and a corresponding amino acid site. The covalent docking platform GOLD relies on this assumption, and an example of its use was in the virtual screening for covalent inhibitors of the NEDD8-activating enzyme (NAE), in which three of the hits were confirmed as novel NAE inhibitors[168]. Development of the DOCKovalent method for virtual screening facilitated the discovery of boronic acid AmpC β-lactamase inhibitors, as well as cyanoacrylamide inhibitors of JAK3 (ref.[169]). DOCKovalent uses non-covalent docking methods to pre-generate conformations and states for an electrophilic virtual library and then samples each state against the target nucleophile. The same method has been used to identify new covalent inhibitors of the kinase MKK7 (ref.[170]), as well as compounds that bind KRAS(G12C) to destabilize the protein and accelerate nucleotide exchange[171]. Apart from screening, covalent docking is a useful tool for investigating binding modes of known covalent ligands. For example, GOLD has been used to model the binding modes of the aldehyde-containing proteasome inhibitor MG132 (ref.[172]). AutoDock uses a flexible side chain approach, whereby the covalent ligand is treated as an amino acid side chain and poses of this flexible ‘side chain’ are scored, using a physics-based scoring function that evaluates the energetics of ligand–protein interactions, as the remainder of the protein is held rigid[166]. Another method, called CovDock, is based on the Schrödinger Glide docking algorithm and Prime structure refinement methodology. 
CovDock uses traditional non-covalent docking approaches to dock a ligand to a protein target, and then models the covalent bond attachment and refines the complex[166]. This approach does not consider the reactivity of the electrophile, which can limit the ability to virtually study the differences in docking ligands with different electrophilic functional groups[167]. Overall, covalent docking software provides useful information on ligand–protein interactions when key assumptions can be made — namely, when the reactivity of the electrophile and the site of modification is known.

Chemoproteomics-enabled discovery

Chemoproteomic platforms enable the identification of covalent compounds and their corresponding ligandable sites on target proteins directly in complex biological systems. Advances in chemoproteomics have facilitated the discovery of covalent ligands against undruggable disease targets and enabled the selectivity profiling of covalent ligands across the proteome to identify targets and off-targets of these ligands. Here, we discuss the chemoproteomics profiling of reactive ligandable hotspots, ABPP screening platforms and target identification within the context of recent work relevant to drug discovery. Chemoproteomics experiments can provide key information on ligand selectivity and give early guidance for selecting targets for covalent drug discovery programmes.

Chemoproteomics profiling of reactive ligandable hotspots

ABPP facilitates the discovery of covalently ligandable sites and the corresponding ligands in complex biological samples. ABPP was pioneered by Cravatt and Bogyo using active-site-directed chemical probes that covalently target catalytic residues of various enzyme classes, including hydrolases, proteases and kinases[173,174]. This technique, often using gel-based assays, was employed to gain functional readouts of active enzymes in biological contexts[14,173,175]. ABPP probes contain a warhead that covalently reacts with nucleophilic amino acids (such as cysteine) and a reporter handle to monitor probe binding, such as a fluorophore, biotin or alkyne moiety for subsequent click chemistry-enabled applications[173] (Fig. 5a).

Fig. 5

Isotopic tandem orthogonal proteolysis–activity-based protein profiling.

a | Example of a reactive probe for activity-based protein profiling (ABPP) designed with a broadly reactive electrophilic warhead linked to an analytical handle. b | Schematic of the competitive isoTOP-ABPP (isotopic tandem orthogonal proteolysis–activity-based protein profiling) methodology. Treatment of cells or lysate with a protein-reactive compound prevents subsequent binding of the pan-reactive probe, and this competitive ligand binding can be detected by tandem mass spectrometry (MS/MS) after an enrichment step, to indirectly identify covalent protein targets for the compound of interest. TEV, tobacco etch virus.

Instead of focusing on active sites, more recent ABPP approaches use MS and broadly reactive chemical probes to also map allosteric sites[176]. In the first reports of the isoTOP-ABPP (isotopic tandem orthogonal proteolysis–activity-based protein profiling) approach, an iodoacetamide probe functionalized with an alkyne handle was used to identify hyper-reactive cysteines across the proteome[176,177] (Fig. 5b). The alkyne handle can be used to link the probe-modified protein to a tobacco etch virus (TEV) protease-cleavable tag that contains an azide group and biotin moiety separated by either an isotopically light or heavy valine (Fig. 5b). These functionalities enable enrichment of probe-modified peptides and tandem analysis of the light and heavy samples with MS, which, after controlling for run-to-run variability, allows for quantitative comparisons between samples, including competitive analysis of covalent compounds (Fig. 5b). Using this approach, it was discovered that the hyper-reactivity of cysteines predicts their functionality in catalysis and at sites of post-translational modifications[177]. Recent adaptations of isoTOP-ABPP have been developed to increase the coverage of cysteines across the proteome and to increase the throughput. 
For example, optimization of the sample preparation steps (namely single-pot, solid-phase-enhanced sample preparation (SP3)) and combination of this workflow with off-line fractionation and field asymmetrical waveform ion mobility spectrometry (FAIMS) allow for additional separation before MS detection, and enabled the identification of more than 30,000 reactive cysteines across a panel of tumour cell lines[178]. Additionally, as new probes are developed for other nucleophilic amino acids, reactivity profiling of other amino acid hotspots (such as lysines) across the proteome will allow for the expansion of this technology beyond cysteine. Overall, reactivity profiling generates large quantities of information across thousands of proteins, ultimately providing a relatively unbiased picture of nucleophilic (usually cysteine) amino acid reactivity. This information can be used to either select for appropriate protein targets in drug discovery programmes or to identify allosteric sites on proteins of interest that may have previously been considered un-ligandable or undruggable. As an example of targeting a traditionally undruggable protein with a covalent molecule, ABPP was paired with a MYC transcription factor activity assay to identify a covalent MYC ligand, EN4 (ref.[179]). EN4 targets Cys171, which is located within a predicted intrinsically disordered region of MYC, and showed selectivity on a proteome-wide scale in profiling of more than 1,500 cysteines using competitive isoTOP-ABPP. Cys171 was initially identified as a ligandable hotspot on MYC through analysis of compiled cysteine-reactive chemoproteomics data, and this information spurred the subsequent search for a selective covalent ligand against that cysteine.

Activity-based protein profiling screening platforms

The use of competitive isoTOP-ABPP was expanded to identify proteome-wide targets of a small covalent fragment library by competing individual acrylamide and chloroacetamide fragments against an iodoacetamide–alkyne probe[180]. In this study, more than 700 ligandable cysteines were identified, and information was provided about the proteome-wide selectivity of each covalent fragment in the library. The covalent ligands discovered with this approach and their corresponding ligandable sites were used to help elucidate the role of caspase 8 and caspase 10 in extrinsic apoptosis in T cells, showing that this approach can rapidly identify compounds that target proteins of biological interest. Several studies have built on this work, using isoTOP-ABPP to find new covalent ligands. For example, cysteine reactivity and ligandable sites were mapped in mutant Kelch-like ECH-associated protein 1 (KEAP1) and compared with those in wild-type KEAP1 NSCLC lines. The authors discovered compounds that bind to a ligandable cysteine on the nuclear receptor NR0B1, which is regulated by NRF2, a substrate of KEAP1 (ref.[181]). Cysteine ligandability has also been explored in activated T cells through the use of promiscuous fragment-like compounds, termed ‘scout fragments’, to map ligandability and functional assays to identify more structurally complex electrophilic compounds that suppress T cell activity[182]. With exciting implications for drug discovery, this approach was also used to identify several proteins that could be targeted covalently to impair T cell activity, including BIRC2 and BIRC3, the nucleosome remodelling deacetylase (NuRD) complex, and the kinases ITK and CYTIP. 
To dramatically increase sample throughput, a tandem mass tag (TMT)-based streamlined cysteine activity-based protein profiling (SLC-ABPP) methodology was designed and used to profile an electrophilic fragment library at an impressive depth of more than 8,000 reactive cysteine sites with a total instrument time of 18 min per compound[183]. As competitive isoTOP-ABPP becomes more high throughput, comprehensive selectivity and reactivity information will rapidly become available for diverse libraries of covalent compounds across a wide range of biological contexts and disease states. This information will enable rapid identification of covalent ligands against reactive hotspots and will provide selectivity information on each ligand. When paired with a parallel phenotypic screen against a desired outcome (such as cancer cell death), this methodology facilitates the identification of functional covalent ligands and their corresponding protein targets in a high-throughput manner.
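
The competitive readout described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the analysis pipeline used in the cited studies: the site names, the intensity values and the ratio cutoff of 4 (that is, at least 75% loss of probe labelling) are all assumptions chosen for the example. For each cysteine, the sketch computes the ratio of probe labelling in the vehicle control to that in the fragment-treated sample, flags sites at or above the cutoff as liganded, and reports the share of quantified sites that are liganded as a crude selectivity summary.

```python
from typing import Dict, List

# Hypothetical cutoff: a control/treated ratio >= 4 means that >= 75% of the
# probe labelling was blocked by the fragment (an assumption for illustration).
LIGANDED_RATIO_CUTOFF = 4.0


def competition_ratios(control: Dict[str, float],
                       treated: Dict[str, float]) -> Dict[str, float]:
    """Return control/treated probe-labelling ratios for sites quantified in both samples."""
    ratios: Dict[str, float] = {}
    for site, ctrl_signal in control.items():
        trt_signal = treated.get(site)
        # Skip sites that lack a usable (positive) measurement in either sample.
        if trt_signal is None or trt_signal <= 0 or ctrl_signal <= 0:
            continue
        ratios[site] = ctrl_signal / trt_signal
    return ratios


def liganded_sites(ratios: Dict[str, float],
                   cutoff: float = LIGANDED_RATIO_CUTOFF) -> List[str]:
    """Return the sorted names of sites whose ratio meets or exceeds the cutoff."""
    return sorted(site for site, ratio in ratios.items() if ratio >= cutoff)


def fraction_liganded(ratios: Dict[str, float],
                      cutoff: float = LIGANDED_RATIO_CUTOFF) -> float:
    """Share of quantified sites that are liganded (lower means more selective)."""
    if not ratios:
        return 0.0
    return len(liganded_sites(ratios, cutoff)) / len(ratios)


# Made-up probe-labelling intensities for one fragment (hypothetical site names).
control = {"PROT_A_C100": 1000.0, "PROT_B_C45": 800.0,
           "PROT_C_C12": 500.0, "PROT_D_C7": 300.0}
treated = {"PROT_A_C100": 100.0, "PROT_B_C45": 760.0, "PROT_C_C12": 480.0}

ratios = competition_ratios(control, treated)
hits = liganded_sites(ratios)
print("Liganded sites:", hits)
print("Fraction of quantified sites liganded:", round(fraction_liganded(ratios), 3))
```

Running the same calculation for every fragment in a library gives one set of liganded sites per compound, which is the kind of per-ligand selectivity information discussed above. A real analysis would also need replicate handling and quality filters, which are omitted here.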

Chemoproteomics platforms for target identification

IsoTOP-ABPP can also be used to identify the protein targets of known electrophilic drugs. As an example, this approach was applied to dimethyl fumarate, which is used to treat autoimmune disease. Although dimethyl fumarate has been used for three decades to treat psoriasis and was approved by the FDA in 2013 for the treatment of multiple sclerosis, the direct covalent targets of dimethyl fumarate remained unclear until more recently. In separate studies, chemoproteomic approaches were used to identify protein kinase Cθ (PKCθ) and IRAK4 as targets of dimethyl fumarate[184,185]. In both cases, covalent engagement of a cysteine residue disrupted a protein–protein interaction to modulate immune cell function. Disrupting the interaction of PKCθ with the costimulatory receptor CD28 reduced T cell activation, and disrupting the IRAK4–MYD88 interaction suppressed the production of interferon-α in plasmacytoid dendritic cells[184,185]. ABPP can also be used to identify off-targets and generally assess the selectivity of covalent molecules. For example, SLC-ABPP was used to analyse spleen tissue extracted from C57BL/6 mice treated with ibrutinib[183]. Of ~9,200 cysteine sites identified, BTK Cys481 was one of the cysteines most liganded by ibrutinib. Cys313 on B lymphocyte kinase (BLK), which, analogous to BTK, contains a cysteine within the ATP-binding pocket, was also identified as an off-target of ibrutinib[183]. Novel screening platforms can also be readily paired with isoTOP-ABPP target identification experiments. For example, a multiplexed in vivo screening platform was developed in which barcoded pancreatic ductal adenocarcinoma lines were pretreated with electrophilic compounds and injected into mice to observe the compound-dependent decrease in metastatic potential[186].
IsoTOP-ABPP experiments also enabled the identification of the lipase ABHD6 as the target of hit compounds from this screen, even though ABHD6 was not previously known to have a role in metastasis or cancer progression. Beyond identifying the lipase ABHD6 as crucial for metastatic fitness, this approach enabled screening in a biological context more relevant to the disease state through adaptation of covalent ligand screening to a multiplexed in vivo phenotypic assay. In general, target identification experiments using chemoproteomic platforms are crucial in investigating the mechanism of action of electrophilic compounds discovered through phenotypic assays.

Covalent ligand discovery for induced proximity modalities

Covalent ligands are not only useful as functional inhibitors, but also have important roles in emerging induced proximity modalities. Covalent drug discovery platforms have facilitated the expansion of targeted protein degradation approaches by enabling the discovery of covalent recruiters against E3 ubiquitin ligases[14,187-193]. Although most bifunctional degrader molecules (proteolysis-targeting chimeras (PROTACs)) recruit the E3 ligases cereblon (CRBN) or von Hippel–Lindau (VHL) protein to degrade target proteins, there are more than 600 E3 ligases with varying substrate scopes. Since 2019, covalent recruiters have been used to validate a large proportion of the E3 ligases that have been harnessed for targeted protein degradation, including RING finger protein 114 (RNF114), RNF4, DDB1 and CUL4-associated factor 16 (DCAF16), DCAF11, KEAP1 and, most recently, fem-1 homologue B (FEM1B)[187-193]. In 2019, isoTOP-ABPP was used to identify RNF114 as the target of the enone-containing natural product nimbolide, which was used to make bifunctional degraders of bromodomain-containing protein 4 (BRD4) and BCR–ABL[187]. In a separate study, scout fragments were used to construct bifunctional FKBP12 and BRD4 degraders, and the authors identified DCAF16 as the target E3 ligase responsible for degradation[190]. These discoveries led to the variety of covalent E3 recruiters now available, which have been reviewed elsewhere[194]. On the basis of analyses of chemoproteomic data sets assessing cysteine reactivity, 97% of E3 ligases possess reactive cysteines, suggesting that covalent approaches to harness more E3 ligases could continue to be successful[194]. Beyond degradation, the identification of non-inhibitory covalent ligands has the potential to contribute to the discovery of novel induced proximity modalities. 
For example, a targeted protein stabilization platform, termed deubiquitinase-targeting chimeras (DUBTACs), has been developed using a covalent deubiquitinase recruiter[195]. Through analysis of chemoproteomic data and an ABPP-based screen, a covalent OTUB1 recruiter was discovered that could be incorporated into bifunctional compounds that stabilize mutant cystic fibrosis transmembrane conductance regulator (CFTR), the degradation of which drives cystic fibrosis. In general, the identification of non-inhibitory, allosteric ligands through covalent ligand screening has the potential to facilitate recruitment of other enzymes (such as kinases and deacetylases) to target proteins and direct protein function to neosubstrates for therapeutic benefit.

Nucleophilic covalent ligands

Most protein-reactive covalent drugs are electrophilic, which enables them to react with nucleophilic amino acids. By contrast, nucleophilic drugs can react with electrophilic cofactors and post-translational modifications. Several hydrazine-containing compounds act as mechanism-based inhibitors of monoamine oxidase (MAO) A and B, whereby activation by MAO enables alkylation of a flavin cofactor, inhibiting the enzyme[196]. ABPP principles and ‘reverse-polarity’ probes have been employed to study electrophilic post-translational modifications, such as N-terminal pyruvoyl and glyoxylyl modifications, that can react with hydrazine-containing compounds[197]. Hydrazine-based probes have also been used to help identify and characterize ligands for proteins with electrophilic cofactors or post-translational modifications[198,199].

Lysine-directed covalent ligands

The low abundance of cysteine enables selectivity but limits opportunities for covalently targeting specific proteins of interest. This problem has driven scientists to investigate the targeting of other nucleophilic amino acids, particularly lysine. Owing to the low nucleophilicity of the ε-amino group of lysine under physiological conditions, the discovery of efficient lysine-targeting covalent ligands requires identification of unusually reactive lysines. IsoTOP-ABPP experiments using several lysine-directed probes have proved powerful in profiling lysine reactivity across the proteome[200,201]. Through isoTOP-ABPP experiments, more than 9,000 ligandable lysines were identified and more elaborated pentafluorophenol-containing or N-hydroxysuccinimide-ester-containing compounds could selectively label specific proteins of interest[200,201]. Building on these experiments, a library of approximately 180 electrophiles was assembled and isoTOP-ABPP experiments were then performed to assess the selectivity of different chemotypes and identify lysines ligandable with small molecules[202]. This study yielded more broadly reactive electrophiles, such as dicarboxaldehydes, that could be used for further lysine profiling experiments, but also identified less-reactive electrophiles, including N-acyl-N-alkyl sulfonamides, which had been previously used as tools for bioconjugation in cells[202,203]. Aside from voxelotor, which was discovered through optimization from fragment-like aldehydes, most lysine-targeting ligands have been designed using structure-based methods from an existing ligand, often through rationally placing an electrophilic sulfonyl fluoride, fluorosulfate or vinyl sulfone in an appropriate orientation to react with an ε-amino group of a lysine adjacent to an established binding site.
Such ligands include a kinetic transthyretin stabilizer[204], an isoform-selective PI3Kδ inhibitor[205] and inhibitors of cyclin-dependent kinase 2 (CDK2)[206] and Hsp90 (ref.[207]). A sulfonyl fluoride-bearing promiscuous kinase inhibitor that targets a conserved lysine in the ATP-binding site was also used as a probe to profile kinase inhibitor selectivity in live cells[208]. However, sulfonyl fluoride-based probes are not completely selective for lysine and have also been used to target tyrosine residues[209]. A recent preprint reported the use of a chemoproteomic approach to profile the amino acid reactivity preference of 54 different electrophiles, which will prove to be a great resource for covalent ligand discovery[210]. Combining comprehensive electrophile profiling, lysine-directed chemoproteomics and structure-guided approaches will enable scientists to leverage the abundance of lysine residues adjacent to ligand binding sites to enhance covalent drug discovery.

Outlook

Over the past decade, advances in covalent drug discovery have led to successful drugs, including inhibitors of EGFR, BTK, KRAS(G12C) and SARS-CoV-2 Mpro. The approvals of these drugs represent milestones that showcase the evolution of covalent drug discovery from a serendipitous effort to a field with established roadmaps for success. Adoption of electrophile-first discovery strategies represents a notable shift in the field. Ligand-first strategies will continue to be highly applicable for designing covalent drugs against proteins when existing reversible ligands are already known to bind near a nucleophilic amino acid such as cysteine. However, we anticipate that electrophile-first approaches will be increasingly employed, especially when the discovery of reversible ligands proves challenging. Electrophile-first approaches will be facilitated, in part, by chemoproteomics experiments that profile amino acid reactivity across the proteome, leading to the identification of novel ligandable residues (cysteines, for example) that could be targeted with electrophilic compounds. Additionally, we expect to see an increase in research that explores the use of reversible covalent mechanisms that strike a balance between potency and selectivity in various contexts. Reversible covalent compounds with slow off-rates might achieve a desired therapeutic effect while minimizing covalent off-target effects, and the use of more varied electrophiles will allow for increasingly tailored reactivity. We also look forward to further improvements in chemoproteomic workflows that enable additional multiplexing in MS experiments, which will be valuable for assessing compound selectivity and target engagement. Covalent drug discovery overcomes obstacles in designing ligands against otherwise undruggable protein targets. We expect that the unique features of covalent ligands will continue to spur the discovery of covalent drugs.
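
The point made above about reversible covalent compounds that dissociate slowly can be made concrete with first-order dissociation kinetics. The sketch below is an illustration under simplifying assumptions (a single first-order dissociation step, no rebinding and no protein resynthesis), and the dissociation rate constants are hypothetical values rather than measurements for any compound discussed in this Review. It uses the standard relations residence time = 1/k_off and half-life = ln(2)/k_off.

```python
import math


def residence_time(k_off_per_h: float) -> float:
    """Mean lifetime of the drug-target complex in hours (1 / k_off)."""
    if k_off_per_h <= 0:
        raise ValueError("k_off must be positive")
    return 1.0 / k_off_per_h


def half_life(k_off_per_h: float) -> float:
    """Time in hours for half of the complex to dissociate (ln 2 / k_off)."""
    return math.log(2) * residence_time(k_off_per_h)


def fraction_bound(k_off_per_h: float, hours: float) -> float:
    """Fraction of the initial complex remaining after `hours`, assuming no rebinding."""
    if k_off_per_h <= 0:
        raise ValueError("k_off must be positive")
    return math.exp(-k_off_per_h * hours)


# Hypothetical rate constants (per hour) for a fast- and a slow-dissociating compound.
examples = {"fast dissociation": 1.0, "slow dissociation": 0.05}

for label, k_off in examples.items():
    print(f"{label}: residence time {residence_time(k_off):.1f} h, "
          f"half-life {half_life(k_off):.1f} h, "
          f"fraction bound after 24 h {fraction_bound(k_off, 24.0):.3f}")
```

In this simplified picture, a lower k_off keeps the target engaged for longer after free compound has cleared, whereas an irreversible compound corresponds to the limit in which k_off approaches zero.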

214 in total

Review 1. A proton-pump inhibitor expedition: the case histories of omeprazole and esomeprazole.

Authors: Lars Olbe; Enar Carlsson; Per Lindberg
Journal: Nat Rev Drug Discov Date: 2003-02 Impact factor: 84.694

2. The T790M mutation in EGFR kinase causes drug resistance by increasing the affinity for ATP.

Authors: Cai-Hong Yun; Kristen E Mengwasser; Angela V Toms; Michele S Woo; Heidi Greulich; Kwok-Kin Wong; Matthew Meyerson; Michael J Eck
Journal: Proc Natl Acad Sci U S A Date: 2008-01-28 Impact factor: 11.205

Review 3. Activity-based protein profiling: from enzyme chemistry to proteomic chemistry.

Authors: Benjamin F Cravatt; Aaron T Wright; John W Kozarich
Journal: Annu Rev Biochem Date: 2008 Impact factor: 23.643

Review 4. Reactive-cysteine profiling for drug discovery.

Authors: Aaron J Maurais; Eranthie Weerapana
Journal: Curr Opin Chem Biol Date: 2019-03-18 Impact factor: 8.822

5. Identification of the Clinical Development Candidate MRTX849, a Covalent KRAS^G12C Inhibitor for the Treatment of Cancer.

Authors: Jay B Fell; John P Fischer; Brian R Baer; James F Blake; Karyn Bouhana; David M Briere; Karin D Brown; Laurence E Burgess; Aaron C Burns; Michael R Burkard; Harrah Chiang; Mark J Chicarelli; Adam W Cook; John J Gaudino; Jill Hallin; Lauren Hanson; Dylan P Hartley; Erik J Hicken; Gary P Hingorani; Ronald J Hinklin; Macedonio J Mejia; Peter Olson; Jennifer N Otten; Susan P Rhodes; Martha E Rodriguez; Pavel Savechenkov; Darin J Smith; Niranjan Sudhakar; Francis X Sullivan; Tony P Tang; Guy P Vigers; Lance Wollenberg; James G Christensen; Matthew A Marx
Journal: J Med Chem Date: 2020-04-06 Impact factor: 7.446

6. An Activity-Guided Map of Electrophile-Cysteine Interactions in Primary Human T Cells.

Authors: Ekaterina V Vinogradova; Xiaoyu Zhang; David Remillard; Daniel C Lazar; Radu M Suciu; Yujia Wang; Giulia Bianco; Yu Yamashita; Vincent M Crowley; Michael A Schafroth; Minoru Yokoyama; David B Konrad; Kenneth M Lum; Gabriel M Simon; Esther K Kemper; Michael R Lazear; Sifei Yin; Megan M Blewett; Melissa M Dix; Nhan Nguyen; Maxim N Shokhirev; Emily N Chin; Luke L Lairson; Bruno Melillo; Stuart L Schreiber; Stefano Forli; John R Teijaro; Benjamin F Cravatt
Journal: Cell Date: 2020-07-29 Impact factor: 41.582

7. Systematic Study of the Glutathione Reactivity of N-Phenylacrylamides: 2. Effects of Acrylamide Substitution.

Authors: Adam Birkholz; David J Kopecky; Laurie P Volak; Michael D Bartberger; Yuping Chen; Christopher M Tegley; Tara Arvedson; John D McCarter; Christopher Fotsch; Victor J Cee
Journal: J Med Chem Date: 2020-10-09 Impact factor: 7.446

Review 8. Design and discovery of boronic acid drugs.

Authors: Jessica Plescia; Nicolas Moitessier
Journal: Eur J Med Chem Date: 2020-03-30 Impact factor: 6.514

9. The mechanism of action of fosfomycin (phosphonomycin).

Authors: F M Kahan; J S Kahan; P J Cassidy; H Kropp
Journal: Ann N Y Acad Sci Date: 1974-05-10 Impact factor: 5.691

10. Discovery of a Functional Covalent Ligand Targeting an Intrinsically Disordered Cysteine within MYC.

Authors: Lydia Boike; Alexander G Cioffi; Felix C Majewski; Jennifer Co; Nathaniel J Henning; Michael D Jones; Gang Liu; Jeffrey M McKenna; John A Tallarico; Markus Schirle; Daniel K Nomura
Journal: Cell Chem Biol Date: 2020-09-22 Impact factor: 8.116