Literature DB >> 34697818

Haste makes waste: A critical review of docking-based virtual screening in drug repurposing for SARS-CoV-2 main protease (M-pro) inhibition.

Guillem Macip¹, Pol Garcia-Segura¹, Júlia Mestres-Truyol¹, Bryan Saldivar-Espinoza¹, María José Ojeda-Montes², Aleix Gimeno³, Adrià Cereto-Massagué⁴, Santiago Garcia-Vallvé^1,5, Gerard Pujadas^1,5.

Abstract

This review makes a critical evaluation of 61 peer-reviewed manuscripts that use a docking step in a virtual screening (VS) protocol to predict SARS-CoV-2 M-pro (M-pro) inhibitors in approved or investigational drugs. Various manuscripts predict different compounds, even when they use a similar initial dataset and methodology, and most of them do not validate their methodology or results. In addition, a set of known 150 SARS-CoV-2 M-pro inhibitors extracted from the literature and a second set of 81 M-pro inhibitors and 113 inactive compounds obtained from the COVID Moonshot project were used to evaluate the reliability of using docking scores as feasible predictors of the potency of a SARS-CoV-2 M-pro inhibitor. Using two SARS-CoV-2 M-pro structures and five protein-ligand docking programs, we proved that the correlation between the pIC50 and docking scores is not good. Neither was any correlation found between the pIC50 and the ∆G calculated with an MM-GBSA method. When a group of experimentally known inactive compounds was added, neither the docking scores or the ∆G were able to distinguish between compounds with or without M-pro experimental inhibitory activity. Performances improved when covalent and noncovalent inhibitors were treated separately, but were not good enough to fully support using a docking score as a cutoff value for selecting new putative M-pro inhibitors or predicting the relative bioactivity of a compound by comparison with a reference compound. The two sets of known SARS-CoV-2 M-pro inhibitors presented here could be used for validating future VS protocols which aim to predict M-pro inhibitors.

Entities: Chemical

Keywords: M-pro inhibitors; SARS-CoV-2 3C protease; drug repurposing; molecular docking; virtual screening

Mesh：

Substances：

Year: 2021 PMID： 34697818 PMCID： PMC8662214 DOI： 10.1002/med.21862

Source DB: PubMed Journal: Med Res Rev ISSN： 0198-6325 Impact factor: 12.388

INTRODUCTION

It has been more than a year since the onset of the coronavirus disease 2019 (COVID‐19) global pandemic, which has affected 219 countries and territories around the world. According to the Johns Hopkins Coronavirus map tracker, there have been more than 123 million cases worldwide. During the last year, more than 100 million people have recovered from the disease while over 2.7 million have died. The pandemic has had a detrimental effect on healthcare systems worldwide and has affected almost all aspects of life. When the World Health Organization declared the COVID‐19 outbreak a global emergency, governments started to enforce border shutdowns, travel restrictions, quarantines, curfews and other restrictions, which in turn fuelled an impending economic crisis and recession. The scientific community reacted to the COVID‐19 global pandemic in an outstandingly rapid and effective way, and thousands of articles were published in the first few months. Between January and May 2020, roughly 100 new articles on COVID‐19 were added every day to PubMed, with a median time from submission to acceptance of only 6 days. Although the nature of the COVID‐19 emergency warranted accelerated publishing, measures need to be taken to safeguard the integrity of the scientific evidence. There is concern among some scientists that poor quality research prevents an effective evidence‐based response. , Coronaviruses (CoVs) are a group of viruses containing a linear, positive sense single‐stranded RNA genome. Their genome size varies, but is larger than that of other types of virus. The genome is housed within an enveloping lipid bilayer in which three structural proteins—membrane (M), envelope (E) and spike (S)—are anchored. The E and M proteins are responsible for the shape of the virus, while the S protein mediates receptor attachment and viral and host cell membrane fusion. The nucleocapsid (N) protein binds to the viral RNA and forms a ribonucleoprotein that is packaged in the virus envelope. In addition to these four structural proteins, the CoV genome also codes for several nonstructural proteins, synthesized as two very large polyproteins that are cleaved by two or three virus proteases, and several auxiliary proteins. CoVs are classified within four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus and Deltacoronavirus. To date, seven strains of Alphacoronavirus and Betacoronavirus that infect humans have been identified. But our main concern here lies within the Betacoronavirus genus which contains the severe acute respiratory syndrome (SARS) viruses. SARS‐like viruses emerged as human infectious diseases in 2002, and since then have caused several epidemics and outbreaks. Before COVID‐19, outbreaks of SARS in 2003 and Middle East Respiratory Syndrome (MERS) in 2012 had already been reported as serious diseases caused by Betacoronaviruses. The causative agent of COVID‐19 is SARS‐CoV‐2. This virus has the capacity to affect multiple organs as well as the central nervous system and sometimes causes respiratory problems which might end up having fatal consequences. The SARS‐CoV‐2 genome consists of around 30,000 nucleotides and contains 12 functional open reading frames (ORFs): from 5′ to 3′, ORF1a, ORF1b, S, ORF3a, E, M, ORF6, ORF7a, ORF7b, ORF8, N, and ORF10. , Two‐thirds of the SARS‐CoV‐2 genome, located at the 5′ genomic region, codify the ORF1a and ORF1b genes, which encode the polyproteins pp1a and pp1ab. Polyprotein pp1ab contains the pp1a protein and its synthesis requires a −1 ribosomal frameshift. Both polyproteins are further cleaved by the main protease (M‐pro, also known as 3CLpro) and papain‐like (PLpro) protease to produce 16 proteins that include both proteases, an RNA‐dependent RNA polymerase, a helicase, two nucleases and a methyltransferase, among other proteins. Due to the lack of target‐specific drugs for treating COVID‐19, several strategies are being investigated. Among these strategies, several clinical trials are being developed to: (i) inhibit the RNA‐dependent RNA polymerase, , (ii) inhibit the viral M‐pro or PLpro proteases, , , , (iii) block the fusion with the host cells, , , (iv) enhance the innate immune system, (v) attenuate the inflammatory response, (vi) control the symptoms, and (vii) create pathogen‐specific artificial antigen‐presenting cells. Given its importance in the replication of SARS‐CoV‐2, M‐pro has become a key target for the development of antiviral drugs. M‐pro proteins from SARS‐CoV‐2 and SARS‐CoV show a high sequence (i.e., >96%) and structural similarity (i.e., an root mean‐squared deviation [RMSD] of 0.44 Å between the Cα of both structures). SARS‐CoV‐2 M‐pro has a length of 306 amino acids that form three different domains (domains I, II, and III). The substrate‐binding site is located at a cleft between domain I (residues 8–101) and domain II (residues 102–184), which share a common fold consisting of antiparallel ß‐barrel structures (Figure 1). Domain III (residues 201–303) consists of an antiparallel globular cluster of five alpha‐helices and is involved in M‐pro dimerization, which is essential for M‐pro activity. Unlike the conventional Ser(Cys)‐His‐Asp(Glu) triad found in other chymotrypsin‐like enzymes, SARS‐CoV‐2 M‐pro has a catalytic dyad formed by His41 and Cys145 (Figure 1) that catalyzes the hydrolysis of the peptide bond at highly specific sites of a polypeptide chain through a common nucleophilic‐type reaction. It was suggested that a water molecule might complete the catalytic triad by mediating crucial interactions between His41 and other important conserved residues, such as His164 and Asp187. While noncovalent small inhibitors have the ability to interact with different residues that lie within the binding pocket of SARS‐CoV‐2 M‐pro, covalent inhibitors are found to mainly interact with the catalytic cysteine residue after an initial noncovalent mechanistic step that ensures the warhead is correctly oriented. Thus, it is of great importance to consider covalent and noncovalent inhibitors as different, because their binding mode and mode of action are quite distinct. Most of the characterized SARS‐CoV‐2 M‐pro inhibitors were originally designed for other pathogens, such as SARS‐CoV, HIV and Hepatitis C viruses, which shows the importance of drug repurposing strategies.

Figure 1

SARS‐CoV‐2 M‐Pro homodimer (PDBid 6LU7). Panel (B) shows an overview of the main structural features of M‐Pro and panel (A) displays in detail one of its binding pockets where the co‐crystallized ligand N3, in grey, fits. Each of the two protomers is a different tone of blue. In both panels, the residues that form the binding pocket (according to Gimeno et al. ) are also highlighted in ball and stick format and the residues of the catalytic dyad (His41 and Cys145) are colored orange and labeled. This figure has been generated using the Flare software. , , Drug repurposing, also known as drug repositioning, is a well‐known strategy in drug development, that is used to identify new medical uses for already approved or investigational drugs. This has enormous advantages compared to traditional drug development from scratch. Briefly, drug repurposing shortens time‐consuming steps in drug development, as safety assessment and testing are usually completed, while it is also known to substantially diminish the investment needed. This, together with the decreased risk of failure, makes drug repurposing an interesting approach when time presses, such as in a global pandemic like COVID‐19. Given the high throughput of computational approaches, they are one of the mainstays of drug repurposing. Specifically, the most popular structure‐based in silico approach, protein‐ligand molecular docking, has been extensively used not only in drug discovery but also in drug repurposing. , Molecular docking is a receptor‐based VS method, that can be used to assess the possible binding poses of the ligands (e.g., a library of approved drugs) to a target (e.g., SARS‐CoV‐2 M‐pro). Thus, molecular docking makes it possible to interrogate thousands of drugs against the protein target at a relatively low computational cost, so it is a popular tool in computational and structural biology. Several articles have reviewed the SARS‐CoV‐2 M‐pro inhibitors discovered to date. , , , , , , , In this review, we focus on the computational methods, especially protein‐ligand docking, which have been used to discover new SARS‐CoV‐2 M‐pro inhibitors. We check 61 peer‐reviewed manuscripts that use at least one docking step in a virtual screening (VS) protocol to predict SARS‐CoV‐2 M‐pro inhibitors in approved or investigational drugs. We put special emphasis on the validation of the docking process and whether the use of docking scores or other estimates of binding affinity are a reliable way to select new putative inhibitors of SARS‐CoV‐2 M‐pro. For this purpose, we have taken a set of 150 M‐pro inhibitors with known experimental bioactivity from the literature and a second set of 81 M‐pro known inhibitors and 113 inactive compounds from the COVID Moonshot project to evaluate the reliability of using docking scores as feasible predictors of the potency of a SARS‐CoV‐2 M‐pro inhibitor.

RECENT VIRTUAL SCREENINGS EFFORTS THAT USE PROTEIN‐LIGAND DOCKING TO PREDICT NEW SARS‐COV‐2 M‐PRO INHIBITORS

Computational approaches can contribute to drug discovery by reducing the cost and time of drug development and speeding up the analyses of the interactions between a target and candidate drugs. Of all the VS methods, and specifically in drug repurposing, molecular docking is an attractive choice because it can predict the preferred orientation of a drug at the binding site of a target. Molecular docking methods use two independent steps: pose generation and scoring. The first step refers to the methods used to create different ligand poses within the binding site of the protein. The second, scoring, is required to rank the different poses and quantitatively estimate the pose quality. Scoring functions attempt to estimate the binding affinity between the ligand and the protein. We collected 61 articles that use a VS protocol based on protein‐ligand docking to predict drugs as SARS‐CoV‐2 M‐pro inhibitors. These articles were published in peer‐reviewed journals between the beginning of the COVID‐19 pandemic in early February 2020 and November 2020. It should be pointed out that preprints were not included. Table 1 outlines the most relevant information of these 61 articles. Further details and the drugs predicted in each can be found in Supporting Information File 1. As shown in Table 1, most of these articles used an initial database of approved drugs (e.g., the Drugbank database). A quarter of the articles also included a group of antiviral molecules, such as HIV or Hepatitis C protease inhibitors. The first crystallized structure of the SARS‐CoV‐2 M‐pro protein, the 6LU7 structure released on February 5, 2020 and crystallized with the covalent inhibitor N3 was largely used. Before this structure was available, eight articles used a structural model of the M‐pro constructed using the homologous protein from SARS‐CoV as a template. Subsequently, many other SARS‐CoV‐2 M‐pro structures were released and several articles used the M‐pro structure, such as 6W63, complexed with a noncovalent inhibitor. Most of the VS protocols reviewed here used AutoDock Vina or other docking methodologies available at the Schrödinger Drug Discovery suite (i.e., Glide HTVS, Glide SP and Glide XP; see Table 1). The majority of these VS protocols used only one docking program, although some of them used two or three. Generally, protocols using multiple docking programs are performed in a stepwise manner; i.e., the programs are used in a funnel approach to reduce the number of hits by using the different prediction algorithms and/or punctuation functions, which is indeed what happens when the increasingly exhaustive methodologies of the Schrödinger package Glide (Glide HTVS, SP, and XP) are used. Nonetheless, there is also room to use different programs in a parallel manner and then evaluate the equivalent poses predicted by each of them, as in our last contribution. Usually, the docking score of the best pose of a reference compound (normally, a known or expected inhibitor) is generally used as a cutoff value for selecting new putative M‐pro inhibitors. If the best docking score of a compound is more negative than the docking score of the reference compound, it is usually assumed that the compound will be more active than the reference one. When a reference compound is not used, the most negative docking score values tend to be used as a cutoff to select a group of hit compounds and analyze them in further detail. The predicted interactions between the substrate‐binding site of the M‐pro protein and each selected compound are usually analyzed. In addition, some reviewed articles use molecular dynamics (MD) to evaluate the binding dynamics of the group of selected hits to the M‐pro structure. The molecular mechanics generalized Born surface area (MM‐GBSA) or the molecular mechanics Poisson–Boltzmann surface area (MM‐PBSA) algorithms were generally used to estimate the binding free energies of the putative inhibitors.

Table 1

Outlines of the 61 articles that predict drugs to be SARS‐CoV‐2 M‐pro inhibitors

Compounds database	Counts	% articles
FDA‐approved	27	44.26
Approved	36	59.02
Drugbank	21	34.43
Selleckchem	7	11.48
ZINC	12	19.67
PubChem	5	8.20
ChEMBL	6	9.84
Antiviral	15	24.59
Others	3	4.92
PDB structure
6LU7	39	63.93
6W63	6	9.84
6M03	4	6.56
Model	5	8.20
Others	8	13.11
Docking software
AutoDock Vina	26	42.62
Glide HTVS	13	21.31
Glide SP	13	21.31
Glide XP	14	22.95
GOLD	4	6.56
Others	13	21.31
N° of docking programs
1	37	60.66
2	12	19.67
3	12	19.67
Molecular dynamics
Yes	35	57.38
No	26	42.62
Validation method
Actives enrichment in silico	3	4.92
In vitro	2	3.28
Re‐docking	18	29.51
No	41	67.21
Total of articlesa	61

Note: Information is shown about the initial database, the PDB structure, the protein‐ligand docking method, whether they use molecular dynamics or not, and the validation method used (if any).

Abbreviations: FDA, Food and Drug Administration.

% Of articles does not necessarily add 100% because some articles may be included in more than one group.

Outlines of the 61 articles that predict drugs to be SARS‐CoV‐2 M‐pro inhibitors Note: Information is shown about the initial database, the PDB structure, the protein‐ligand docking method, whether they use molecular dynamics or not, and the validation method used (if any). Abbreviations: FDA, Food and Drug Administration. % Of articles does not necessarily add 100% because some articles may be included in more than one group. The best way to validate the methodology and results of a VS protocol is to experimentally prove the bioactivity of some selected hits. Only two of the 61 articles reviewed confirmed their computational predictions in vitro. , When this is not possible, the methodology should be subject to a computational validation. A common strategy for validating the docking step in some of the 61 articles was re‐docking or self‐docking. Re‐docking consists of removing the ligand from a protein‐ligand complex to dock it back and evaluate if a docking program can predict its initial binding pose. The RMSD between the best docking pose and the crystallographic one, or the lowest RMSD value of all the predicted poses, gives an estimation of the ability of the docking program to reproduce the co‐crystallized binding pose of the ligand. RMSD values under 2.0 Å are usually considered to be acceptable, but values close to zero are ideal. Although 18 of the 61 VS protocols analyzed used re‐docking to estimate the docking accuracy (see Table 1 and Supporting Information File 1), only two , converted the extracted ligand to SMILES format and then back to 3D. This is done to lose the binding coordinates of the ligand and evaluate the docking accuracy from scratch, similar to how the compounds are predicted. Because docking calculations are very sensitive to the starting geometries of ligands, it may also be beneficial to re‐run the re‐docking with different starting conformations each time. In addition to re‐docking, it is also recommended to use cross‐docking to try to predict the binding pose of a ligand using a different PDB structure. In this way, the possible effect of protein flexibility can be evaluated or the best protein structure identified when several are available. However, re‐docking and cross‐docking cannot suitably validate an entire VS process or each of its steps. In addition, if docking scores are used to rank the bioactivity of the predicted compounds, it needs to be shown that they correlate with the bioactivity. Without a proper validation, the predictions obtained are speculative and dubious. A VS protocol can be computationally validated by applying the same VS protocol to a set of active compounds and to a set of inactive or decoy compounds (when a group of inactive compounds is not available). The more accurate the methodology is, the more true positives (i.e., active compounds that pass the VS procedure or VS step) and true negatives (i.e., inactive molecules that do not pass the VS procedure or VS step) there will be. To quantify the performance of the entire methodology or each VS step, several metrics can be used. Of these, the most popular are the area under a ROC curve and the Enrichment Factor (EF), which quantify the extent to which a step of the VS can be enriched with active compounds. Only three of the 61 articles reviewed used a group of known M‐pro inhibitors to confirm that their VS procedure was able to enrich the predicted compounds with these inhibitors. , , In one of the articles 12 M‐pro inhibitors, 10 weak M‐pro inhibitors and 114 DUDE‐generated decoys were screened to calculate the EF for each step of the VS and to choose the optimal scoring function for the final hit selection. Although some of the 61 articles used re‐docking as a validation step, more than the 67% did not apply a validation step (Table 1). This lack of validation of the structure‐based approaches to drug repurposing that target M‐pro and other SARS‐CoV‐2 proteins has been described elsewhere. , In a survey of the literature before 26 August, Bellera and coworkers reviewed 96 articles that reported the use of molecular docking for identifying potential compounds against SARS‐CoV‐2 M‐pro. None of the 96 articles reported any level of experimental validation and 40.6% of the articles did not perform any kind of in silico validation. We agree with Bellera and coworkers that “the preceding figures are surprising, considering the known fact that the performance of docking studies is highly dependent on the system under study, and the high false‐positive rate associated with docking‐based VS”. An interesting approach used in some of the articles reviewed , , , , was to add a pharmacophore‐based procedure before the docking step to enrich the database with some characteristics of the reported M‐pro inhibitors or M‐pro co‐crystallized ligands, or to consider only the compounds that can engage in specific interactions with the M‐pro active site. This strategy makes it possible to identify which of the initial number of compounds are most likely to be inhibitors, even though it requires prior knowledge of the interactions between the receptor and its ligands. Thus, it cannot be applied in early attempts, when little information is available, but it becomes a more powerful tool as more information arises. Figure 2 shows that the 61 VS protocols analyzed predict very few M‐pro inhibitors in common. It could be argued that this lack of agreement about the potential inhibitors is due to differences in the initial set of compounds analyzed and the methods used. However, Figure 3 shows that when different protocols using similar initial datasets are compared, such as FDA‐approved drugs (Figure 3A) or approved and investigational drugs (Figure 3B), very few compounds are predicted by two protocols even if the same PDB structure and docking program have been used (Figure 3). Some of the compounds most commonly predicted as inhibitors of SARS‐CoV‐2 M‐pro in several of the 61 VS protocols reviewed are: the covalent inhibitors of the HIV‐protease ritonavir, saquinavir, indinavir, lopinavir and nelfinavir; other antiviral compounds, such as the hepatitis C protease inhibitors simeprevir and paritaprevir; and the HIV integrase inhibitor raltegravir. All of these compounds were predicted by more than five different articles (Figure 2). As these antiviral compounds were initially used in clinical trials to combat the SARS‐CoV‐2 pandemic, they are also often selected to be included among the final hits of the VS protocols analyzed. Ritonavir inhibits cytochrome P450‐3A4, and is often simultaneously used with lopinavir or other HIV or hepatitis C protease inhibitors since it increases their half‐life. Although ritonavir was found to inhibit the SARS‐CoV‐2 M‐pro in cell culture, high concentrations are required to achieve significant inhibition, so it will be of limited clinical relevance. The results of several clinical trials that have evaluated the role of lopinavir/ritonavir combination therapy in patients with COVID‐19 have shown little or no improvement. , , Remdesivir was also predicted as an M‐pro inhibitor by 7 of the 61 articles. This compound was the first compound approved by the FDA to treat patients with severe symptoms of COVID‐19. However, remdesivir is an adenosine nucleoside triphosphate analog that inhibits the SARS‐CoV‐2 RNA‐dependent RNA polymerase and induces an irreversible chain terminator. So, it is surprising that it is predicted to be a SARS‐CoV‐2 M‐pro inhibitor. There is no experimental evidence to suggest that remdesivir binds to SARS‐CoV‐2 M‐pro. In addition, remdesivir is a prodrug, converted to its active form by host enzymes, and this was often not taken into account in the VS protocols analyzed. To the best of our knowledge, only one study has docked remdesivir and its active form to SARS‐CoV‐2 M‐pro and suggested that both molecules have a comparable affinity against SARS‐CoV‐2 M‐pro. More experiments are required to precisely determine if, in addition to inhibiting the RNA‐dependent RNA polymerase of SARS‐CoV‐2, remdesivir's mechanism of action includes the inhibition of M‐pro.

Figure 2

Figure 3

Circos plots showing the number of shared compounds among the predicted hits of different VS studies which use similar initial datasets. Panel (A) shows the number of compounds that are predicted simultaneously by two different articles that use a group of FDA‐approved drugs (between 1500 and 2800 drugs) as the initial dataset. Panel (B) shows the compounds predicted simultaneously by two articles that use a group of approved and investigational drugs (between 8500 and 13,563 drugs). The information shown is the PubMed ID of each article, the PDB structure and docking program used, the number of compounds predicted by each article and the number of common compounds. FDA, Food and Drug Administration; VS, virtual screening [Color figure can be viewed at wileyonlinelibrary.com]

Dendrogram and bar plot of the compounds that are most predicted to be inhibitors of M‐pro in several VS. The dendrogram clusters the compounds in terms of their structural pairwise similarity established by the Tanimoto coefficients calculated from the ECFP6 circular fingerprint from each compound. The bar plot represents the number of times that each compound has been predicted as a putative M‐pro inhibitor in the reviewed articles (VS occurrences). The in vitro activity, when available, is expressed as pIC50 values and represented in a green scale. The only compounds displayed are those that are predicted in two or more different VS or for which experimental pIC50 is available. The names of the compounds with known activity and those with five or more VS occurrences are displayed. VS, virtual screening Circos plots showing the number of shared compounds among the predicted hits of different VS studies which use similar initial datasets. Panel (A) shows the number of compounds that are predicted simultaneously by two different articles that use a group of FDA‐approved drugs (between 1500 and 2800 drugs) as the initial dataset. Panel (B) shows the compounds predicted simultaneously by two articles that use a group of approved and investigational drugs (between 8500 and 13,563 drugs). The information shown is the PubMed ID of each article, the PDB structure and docking program used, the number of compounds predicted by each article and the number of common compounds. FDA, Food and Drug Administration; VS, virtual screening [Color figure can be viewed at wileyonlinelibrary.com] Figure 2 also shows that the M‐pro inhibitors predicted by most articles are not the most active inhibitors. The IC50 values of lopinavir, saquinavir, indinavir and nelfinavir have been estimated to be 486, between 31.4 and 411, , 43.1, and 234 µM , respectively. In other studies, lopinavir, indinavir, nelfinavir, tipranavir and other HIV protease inhibitors have not shown any inhibitory activity between 20 and 200 µM. It has been suggested that the reduction in viral activity shown by nelfinavir and lopinavir in cell cultures is due to the inherent cytostatic and/or cytotoxic effects of these compounds. Dipyridamole, , baicalein, candesartan cilexetil , (predicted in 2, 1, and 2 studies, respectively) have the lowest IC50 values of the most predicted M‐pro inhibitors, but in the three cases, their respective IC50 of 0.6, 0.94 and 2.78 µM , are a long way from the most potent SARS‐CoV‐2 M‐pro inhibitors, such as PF‐00835231, MI‐21 and MI‐23, which have an IC50 in the nanomolar range. , As depicted in the dendrogram in Figure 2, in structural terms the most predicted compounds in VS studies vary considerably, probably due to the high flexibility induced by ligand binding and to the size of the M‐pro's binding site. , Also, relatively similar compounds (e.g., ritonavir and lopinavir) can have a quite different pIC50, which makes M‐pro a challenging target in docking and SAR studies, as a wide variety of scaffolds, activity and selectivity cliffs might have to be taken into account. To discover new therapeutics for COVID‐19 it would be useful to make a comprehensive and systematic assessment of the chemical space of known SARS‐CoV‐2 inhibitors. Although this approach opens up an interesting line in SARS research for the discovery of new SARS‐CoV‐2 M‐pro inhibitors, it falls outside the scope of the present work and, for the sake of brevity, no further discussion will be conducted. In the following sections we describe a group of known SARS‐CoV‐2 M‐pro inhibitors and inactive compounds, and show how their bioactivity can be used to validate a protein‐ligand docking step.

KNOWN SARS‐COV‐2 M‐PRO INHIBITORS

Figure S1 and Supporting Information File 2 show the 2D structure, bioactivity and other details of 150 known SARS‐CoV‐2 M‐pro inhibitors extracted from the literature. This set of inhibitors includes only those compounds whose inhibitory capacity, mainly expressed as the IC50 value, against M‐pro from SARS‐CoV‐2 has been determined in vitro. The bioactivities of these compounds have been obtained since the second trimester of 2020 and show how quickly the scientific community has responded to the challenge of the COVID‐19 pandemic, which appeared abruptly at the beginning of 2020. Most of these 150 compounds are found in other reviews that describe recent progress made in searching for SARS‐CoV‐2 M‐pro inhibitors. , , , , , , , Among these M‐pro inhibitors there are SARS‐CoV M‐pro inhibitors such as the compound MAC‐5576; HIV‐protease inhibitors such as atazanavir, darunavir, indinavir, lopinavir, nelfinavir, ritonavir and saquinavir; Hepatitis C protease inhibitors such as boceprevir, narlaprevir, simeprevir and telaprevir; Caspase‐3 inhibitors such as Z‐DEVD‐FMK and Z‐FA‐FMK; Calpain and other protease inhibitors; and inhibitors obtained from high‐throughput screening experiments. Most of the HIV, SARS‐CoV or hepatitis C protease inhibitors can make a covalent bond with the catalytic Cys145 of SARS‐CoV‐2 M‐pro. Among the 150 SARS‐CoV‐2 M‐pro inhibitors reviewed here, there are compounds with moderate or low inhibitory activity; compounds, such as carmofur, ebselen, disulfiram, PX‐2, shikonin and tideglusib, whose inhibition potencies decreased in the presence of reducing agents, such as DTT, and are considered to be nonspecific inhibitors , ; compounds such as shikonin, which quench the signal used to evaluate its activity and therefore may represent a false positive of the analysis ; and compounds such as the HIV‐protease inhibitors lopinavir, indinavir, nelfinavir saquinavir, and tipranavir, which have not shown M‐pro inhibitory activity in some concentrations analyzed. , , , To test the hypothesis that inhibiting the enzymatic activity of M‐pro will inhibit SARS‐CoV‐2 replication, a cellular antiviral assay is usually performed and an EC50 value calculated. This EC50 value is defined as the concentration of the compound that reduces the viral‐induced cytopathic effect by 50% of with respect to the virus control. Several SARS‐CoV‐2 M‐pro inhibitors have EC50 values that are slightly more than 10 times their IC50 values. Some of the 150 known SARS‐CoV‐2 M‐pro inhibitors, such as carmofur, darunavir, disulfiram, ebselen, MG‐132, PX‐2, shikonin and tideglusib, cannot inhibit SARS‐CoV‐2 replication in cell cultures or do inhibit it due to their cytotoxicity. , , , , This shows that not all the compounds that inhibit the SARS‐CoV‐2 M‐pro in vitro will be useful in vivo. Boceprevir, calpain inhibitors I, II, and XII, carmofur, cinanserin, disulfiram, ebselen, GC376, N3, narlaprevir, PX‐12, shikonin, telaprevir and tideglusib are some of the covalent inhibitors described for the SARS‐CoV‐2 M‐pro (see Figure S1 and Supporting Information File 2). The role of covalent inhibitors in drug discovery is controversial. Some of the risks they pose are toxicity, immune‐mediated drug hypersensitivity, nonspecificity, and hyperreactivity that might lead to further drug‐induced toxicity (e.g., hepatotoxicity, mutagenicity, or carcinogenicity). However, three potential benefits are that the high biochemical efficiency may lead to lower doses and reduced off‐target effects, lower drug resistance and fewer cases of decreased potency due to competing endogenous substrates. Reversible or irreversible covalent inhibitors that have some kind of specificity for a target or group of targets must first be recognized by the target binding site to place a reactive functional group, usually referred to as a warhead, in an appropriate position to form a covalent bond between the inhibitor and its target. , The formation of the covalent bond depends not only on the correct structural complementarity between the protein and the inhibitor, but also on the appropriate chemical reactivity of the inhibitor and the protein environment that stabilizes the covalent complex. The protonation state of catalytic Cys145 and His41 and His172 is crucial for the activity variation caused by pH changes and the M‐pro catalytic mechanism , , and may be crucial to the action of covalent inhibitors. The majority of SARS‐CoV‐2 covalent inhibitors are large substrate mimetics with an electrophilic warhead that covalently binds with the catalytic Cys145 sulfur atom. But other more nucleophilic cysteines, such as Cys44 located near the substrate binding pocket, are additional sites for covalent linkage. Table 2 shows such covalent warheads, as acrylamide, chloroacetamide, vinylsulfonamide, nitrile, a Michael acceptor group, alpha‐ketoamide, aldehyde, disulfide and several methyl ketone groups, which have been used here to identify putative M‐pro covalent inhibitors. Whereas some covalent warheads, such as acrylamide and chloromethyl ketones, form irreversible bonds with their biological targets, others, such as nitriles or electron‐deficient ketones, have been found to form a reversible covalent bond. , For irreversible covalent inhibitors, if a sufficient period of time is allowed, inhibition is expected to be complete. Thus, conventional IC50 measurements are of limited value for characterizing the potency of irreversible inhibitors because incubation for a different period of time would provide a different IC50 value. Of the 150 SARS‐CoV‐2 M‐pro inhibitors reviewed here, 73 present a covalent warhead (Supporting Information File 2) and are plausibly covalent inhibitors.

Table 2

Warhead	SMARTS	Examples
acrylamide	[C;H2:1]=[C;H1]C(N)=O
chloroacetamide	Cl[C;H2:1]C(N)=O
vinylsulfonamide	NS(=O)([C;H1]=[C;H2:1])=O
nitrile	N#[C:1]‐[*]	Cimetidine, isavuconazole
Michael_acceptors	C=!@CC=[O,S]	Cinanserin, MPI2, MPI9, N3
alpha‐ketoamide	C(=O)(C=O)N	Boceprevir, narlaprevir, telaprevir
aldehyde	[CX3H1](=O)	Leupeptin, 2413716‐71‐7
bisulfite_adduct_of_aldehyde	C(O)S(=[OX1])([O])(=[OX1])	GC376
urea_carbonyl	[NX3][CX3](=[OX1])([NX3,nX3])	Carmofur
bis(dialkylaminethiocarbonyl)disulfide	[CX3](=[SX1])SS[CX3](=[SX1])	Disulfiram
carbamoylsulfanyl	[NX3,nX3][C,c](=[OX1])([SX2,sx2])	Tideglusib
disulfide	[SX2][SX2]	PX‐12
Hydroxymethy_lketone	[CX3H0](=[OX1])[CH2][OH]	PF‐00835231
Alkoxymethyl_ketone	[CX3H0](=[OX1])[CH2][OX2H0]	GW‐0742, 2434601‐94‐0
Acyloxymethyl_ketone	[CX3H0](=[OX1])[CH2][OX2H0][CX3H0](=O)	2434601‐94‐0, 2434601‐96‐2
Fluoro, Chloro‐methyl_ketone	[CX3H0](=[OX1])[CH2][Cl,F]	Z‐DEVD‐FMK, Z‐FA‐FMK

Note: These covalent warheads were used to identify putative covalent inhibitors among the known SARS‐CoV‐2 M‐pro inhibitors (Figures S1 and S2) and SARS‐CoV‐2 M‐pro inactive compounds (Figure S3).

Covalent warheads that can be used to identify putative covalent inhibitors. It shows the SMARTS that can be used to identify each warhead and some examples of SARS‐CoV‐2 inhibitors that contain each warhead Note: These covalent warheads were used to identify putative covalent inhibitors among the known SARS‐CoV‐2 M‐pro inhibitors (Figures S1 and S2) and SARS‐CoV‐2 M‐pro inactive compounds (Figure S3). Among the most potent inhibitors of the SARS‐CoV‐2 M‐pro, the compounds PF‐00835231, MPI3, UAWJ248, MPI4 and MPI5 have a pIC50 higher than 7.5, and an IC50 in the nanomolar range. PF‐008325231 is a hydroxymethyl ketone covalent inhibitor selected as a development candidate for SARS‐CoV, the clinical advancement of which was suspended in 2003 due to the end of the SARS‐CoV outbreak. It is the most potent SARS‐CoV‐2 M‐pro inhibitor known to date, with an IC50 of 0.00027 µM. It is a highly selective pan‐coronavirus M‐pro inhibitor that can also inhibit SARS‐CoV‐2 replication in cells. Preclinical research is ongoing in the use of PF‐00835231 as a possible treatment for COVID‐19. Pfizer has initiated a clinical trial with PF‐07304814, a prodrug which metabolizes into PF‐008325231, against the M‐pros of SARS‐CoV‐2 and other coronaviruses. GC376 is one of the SARS‐CoV‐2 M‐pro covalent inhibitors that is most commonly used as a reference compound or positive control. Its IC50 activity ranges between 0.03 and 0.89 µM , , , , , , , , , , and this activity is consistent both in the absence and presence of the reducing agent DTT. GC376 was developed as an inhibitor of the main protease of the feline coronavirus FCoV, and it also showed activity against the M‐pro from MERS and SARS‐CoV viruses. GC376 is a prodrug and its bisulphite adduct is converted to an aldehyde that has been proved to form a covalent interaction with the catalytic Cys145 of the SARS‐CoV‐2 M‐pro. Several GC376 analogs or derived compounds have shown greater inhibitory activity than GC376. MPI3, MPI4, and MPI5 are reversible covalent inhibitors derived from the GC376 inhibitor. MPI3, MPI4, and MPI5 display an IC50 value of 0.0085, 0.015, and 0.033 µM, respectively. Although MPI3 is ranked second in potency as a SARS‐CoV‐2 M‐pro inhibitor, its inhibition of SARS‐CoV‐2 replication is much lower than other compounds, which may be explained by the compound's low cell stability. UAWJ248 is a GC376 analog with an alpha‐ketoamide reactive warhead that binds irreversibly to Cys145. The IC50 of UAWJ248 is 0.012 µM, 3.2‐fold more potent than GC376. Another set of SARS‐CoV‐2 M‐pro inhibitors and inactive compounds can be obtained from the COVID Moonshot project , set up to develop M‐pro inhibitors. It focused on turning early fragment‐screening results into potent compounds. Drug‐designers the world over have proposed more than 10,000 M‐pro inhibitors, and over 1000 compounds have been either ordered or synthesized and tested for activity against SARS‐CoV‐2 M‐pro using two complementary biochemical assays: a fluorescence based assay and a RapidFire Mass Spectrometry assay. The Fluorescence M‐pro inhibition assay consists of a screening in duplicate at 20 and 50 µM. Compounds with a value greater than 50% M‐pro inhibition at 50 µM are confirmed by a dose response assay and the value of IC50 is calculated. For the RapidFire Mass Spectrometry assay, compounds are triaged by testing the % of inhibition at a final concentration of 20 and 50 µM, and IC50 values are calculated from dose response curves. From the COVID Moonshot activity data available on the August 18, 2020, we selected those compounds that have a pIC50 difference between the two activity assays of less than two units. In addition, further quality control steps were carried out: (i) a compound was not selected if the percentage of inhibition at 20 µM was higher than the percentage at 50 µM or (ii) if the percentage of inhibition at 50 µM and the IC50 of a compound were both simultaneously higher than 50. Finally, compounds with undefined chirality were discarded. Figure S2 and Supporting Information File 3 show the final list of the 81 SARS‐CoV‐2 M‐pro inhibitors selected from the COVID Moonshot activity data. The pIC50 values of these 81 compounds are between 4.14 and 7.28, and 36 of them contain at least one of the covalent warheads defined in Table 2 (Supporting Information File 3) and are putative covalent inhibitors of the SARS‐CoV‐2 M‐pro. The activity data available from the COVID Moonshot initiative makes it possible to select a group of inactive compounds against the SARS‐CoV‐2 M‐pro. Although some articles have reported inactive compounds against SARS‐CoV‐2 M‐pro, such as the DDP‐4 inhibitors linagliptin, other gliptins and structural analogues, this kind of information is not usually available. From the COVID Moonshot activity data available on 18 August 2020, we selected the compounds with a percentage of inhibition less than 2% at 50 µM in the two different assays simultaneously (fluorescence and RapidFire). Compounds with undefined chirality were discarded. Thus, a group of 113 inactive compounds against SARS‐CoV‐2 were selected (see Figure S3 and Supporting Information File 4). Of these 113 inactive compounds, 29 present at least one of the covalent warheads from Table 2. The set of 81 M‐pro inhibitors from the COVID Moonshot project and the 150 M‐pro inhibitors extracted from the literature and presented above could be used to validate a VS procedure and/or a docking step that aims to predict new SARS‐CoV‐2 M‐pro inhibitors. The COVID Moonshot set of active compounds is less diverse than the set taken from the literature, but their inhibitory activities were estimated using the same procedure and laboratory, and all the inhibitors were confirmed by two different assays (fluorescence and RapidFire). In the section below, these sets of compounds are used to check whether protein‐ligand docking methods can rank these compounds in order of their potency and if they can discriminate between active and inactive compounds.

DOCKING METHODOLOGY NEEDS TO BE VALIDATED BEFORE USE

Although molecular docking programs can usually generate ligand poses that are similar to the crystallographic conformation for several targets, protein‐ligand docking has several limitations. One of these is that, for practical reasons, most docking methods use a rigid protein approximation, where the ligand is treated as flexible, but the protein conformation is restricted. , Some methods, such as ensemble docking which takes place in an ensemble of protein conformations, enable protein flexibility to be analysed. Another limitation is that, contrary to what is expected, the poses and docking scores obtained by docking can be heavily dependent on the ligand input structure, so small changes in the ligand input conformation can lead to large differences in the docked poses. One of the major limitations of the scoring functions used by docking programs is that they do not always make a useful prediction of ligand binding affinity. , Thus, the best energy score is not a reliable criterion for selecting the best solution in common docking applications. For these reasons, docking scores should not be used as a measure or threshold for compound activity, but rather to enrich the library by discarding those compounds that do not fit inside the binding site. It should also be noted that the docking scores of different docking programs cannot be directly compared because the scores are dependent on the force fields and protocols used for each program. Another limitation of molecular docking is how they deal with water molecules present in the binding site during the docking process , which can lead to errors when trying to ascertain the possible interaction between the ligand and the target. Finally, the ability of a docking program to separate decoys from active compounds is usually strongly dependent on the protein structure used and the similarity between the screened ligands and the co‐crystallized ones. We used the aforementioned sets of 150 M‐pro inhibitors extracted from the literature and 81 M‐pro inhibitors from the COVID Moonshot project to check whether docking scores can be used to predict the potency of a SARS‐CoV‐2 M‐pro inhibitor. To test this hypothesis we performed a protein‐ligand docking of both sets of compounds with two different 3D structures of the SARS‐CoV‐2 M‐pro protein, the 6LU7 and 6WQF, and three docking programs, i.e, FRED, Glide and Autodock Vina. For Glide, we used three distinct docking methodologies: Glide HTVS, Glide SP and Glide XP. So in total we considered 5 different docking methods, which appeared to be the most common in the reviewed articles (see Table 1). Briefly, the three methodological approaches from the Schrödinger package Glide differ from each one. Essentially, Glide HTVS and Glide SP use the same docking algorithm, but their main difference is that Glide HTVS minimizes the number of intermediate conformations during the docking procedure and the accuracy of the final torsional refinement and sampling. On the other hand, Glide XP uses a more exhaustive and sophisticated scoring function and sampling process, and is stricter in terms of ligand‐receptor shape complementarity. We used the 6LU7 structure because it is the most used structure in molecular docking studies of the SARS‐CoV‐2 M‐pro (see Table 1). The 6WQF structure of unliganded SARS‐CoV‐2 M‐pro was also used as it was suggested that this structure may provide more relevant information at physiological temperatures, which could be helpful in protein‐ligand molecular docking studies. Both M‐pro structures (6LU7 and 6WQF) were obtained from the Protein Data Bank and were superimposed using the alpha carbons (Cα) as reference points. The 6LU7 ligand (N3) was removed from the structure before the protein was prepared. Protein preparation was performed with different tools, depending on the docking software used. For Glide, the structures were prepared with the Protein Preparation Wizard with the following settings: (a) N and C termini were capped; (b) hydrogens were added after original hydrogens had been removed; (c) water molecules were removed; (d) Epik was used to generate tautomers and protonation states with neutral pH; (e) H‐bond assignment was optimized with PROPKA ; and (f) the structure was minimized with the default force field. The grid for docking was generated using Glide. The center coordinates corresponded to the centroid of N3 (−10.36, 12.46, 68.7) and the box size was set to 35 × 35 × 35 Å. In the case of FRED, the utility MakeReceptor was used to (a) define a box, (b) set a shape potential and (c) define the inner and outer contours of the grid based on those used to create the grid for Glide. The preparations needed for AutoDock Vina were performed with AutoDockTools which consisted of removing all waters and adding polar hydrogens. The grid was based on the same coordinates and size as the previous programs, and the M‐pro inhibitors were converted into isomeric SMILES format. Only achiral molecules or molecules with a fully defined chirality were used. Before the docking procedure with Autodock Vina and the three methods of Glide, Omega was used to generate the 3D conformation with the lowest energy and Ligprep was used to generate all the possible protonation states in the pH range 7.2 ± 1.0 and the default number of tautomers. The docking program FRED had its own pipeline that included (i) fixpka, tautomer and molcharge from the QUACPAC program which set the ionization states, enumerated the tautomeric forms and assigned atomic partial charges, and (ii) Omega which generated one conformation per compound and discarded unspecified chiralities. The most negative docking scores obtained for each compound and each protein‐ligand methodology were correlated with the inhibitory potency, measured as pIC50 values. Figure 4 and Figure S4 show that for both datasets, the correlations were far from satisfactory. The correlations improved if covalent and noncovalent inhibitors were treated separately (see Figure 4 and Figure S4). For the 6LU7 structure, and considering solely the covalent inhibitors extracted from the literature, correlations were negative, as one would expect because the more negative the docking score, the higher pIC50 value (Figure 4A–E). The best correlation had a Pearson's r coefficient of −0.335 (p value 0.003) for the covalent inhibitors when AutoDock Vina was used (Figure 4E). However, the Pearson correlation coefficients for the noncovalent inhibitors were positive or around zero (Figure 4A–E). For the set of 81 M‐pro inhibitors from the COVID MoonShot project (Figure 4F–J), most of the correlations were unexpectedly found to be positive. Conclusions were similar when the 6WQF structure was used (Figure S4). This lack of correlation between the bioactivities and docking scores for this target was previously reported , and questions the reliability of using a docking score as a cutoff value for selecting new putative M‐pro inhibitors or predicting the relative bioactivity of a compound by comparison with reference compounds or even with other predicted compounds.

Figure 4

Correlation between the highest docking scores from five different docking methods (FRED, Glide HTVS, Glide SP, Glide XP and AutoDock Vina) and the experimental inhibitory activity, measured as pIC50, of a group of 150 M‐pro inhibitors extracted from the literature (A–E) and 81 M‐pro inhibitors from the COVID Moonshot project (F–J). The protein from the 6LU7 PDB structure was used for docking. Data has been adjusted to a linear regression and the correlation coefficient is displayed in gray. Covalent and noncovalent inhibitors are also represented separately in orange and blue, respectively [Color figure can be viewed at wileyonlinelibrary.com] To overcome the limitations of docking programs, they can be combined with other methods. Standard molecular dynamics or binding free energy estimations, such as MM‐GBSA and MM‐PBSA enable the protein flexibility to be explored. MM‐GBSA or MM‐PBSA are popular methods for estimating ligand‐binding affinities with a modest computational effort. Their results are considered to be more accurate than those of most docking scoring functions and they have been widely used in virtual screening protocols to find M‐pro inhibitors (see Supporting Information File 1). Because these methods are not suitable for screening large libraries of compounds, they are usually used to analyze compounds selected by molecular docking. The best docking poses of a selected group of hits can be the input of the MM‐GBSA or MM‐PBSA algorithms which can be used to estimate their ligand‐binding affinities, expressed as a ∆G value. The combination of molecular docking and MM‐GBSA or MM‐PBSA re‐scoring has proven to be a promising strategy for identifying the correct binding poses and correctly ranking the binding affinities of a series of ligands. However, the success of these methods depends on the target structure and they cannot be used to predict true drug candidates in drug design without experimental verification because their accuracy is relatively limited. To assess how successfully these methods predict the bioactivity ranking of the two sets of known M‐pro inhibitors presented above, we used the MM‐GBSA minimization available at Prime to re‐score the best docking pose obtained with the five docking methods and two SARS‐CoV‐2 M‐pro structures, and to estimate the ∆G value of ligand‐binding affinity for each compound. Figure 5 and Figure S5 show the correlation between the pIC50 and the estimated ∆G values. A negative correlation is expected, because the higher the pIC50 of a compound is, the lower the ∆G value. Figure 5 and Figure S5 show that most correlations were positive, even though noncovalent and covalent inhibitors were treated separately. Moderate negative correlations were obtained for covalent inhibitors from the bibliography (Figure 5A–E and Figure S5A–E). The best Pearson r coefficient (with a value of −0.303) was obtained with the covalent inhibitors from the bibliography docked to the 6LU7 structure with the FRED program (Figure 5A). Thus, for the inhibitors and M‐pro structures assayed here, the MM‐GBSA algorithm was not able to correct the lack of correlation obtained with the docking programs.

Figure 5

Correlation between the binding energy, calculated as the ∆G of the MM‐GBSA method, and the experimental inhibitory activity, measured as pIC50, of a group of 150 M‐pro inhibitors extracted from the literature (A–E) and 81 M‐pro inhibitors from the COVID Moonshot project (F–J). The protein from the 6LU7 PDB structure was used for docking. The poses with the highest docking scores from each of the five different docking methods (FRED, Glide HTVS, Glide SP, Glide XP, and AutoDock Vina) were used as the input to the MM‐GBSA program to estimate the binding energy. Data has been adjusted to a linear regression and the correlation coefficient is displayed in gray. Covalent and noncovalent inhibitors are also represented separately in orange and blue, respectively. MM‐GBSA, molecular mechanics generalized Born surface area [Color figure can be viewed at wileyonlinelibrary.com] Although the docking scores or ∆G values estimated with the MM‐GBSA algorithm of known SARS‐CoV‐2 M‐pro inhibitors do not correlate with their potency, we wanted to find out whether these parameters made it possible to discriminate between SARS‐CoV‐2 M‐pro inhibitors and compounds with no SARS‐CoV‐2 M‐pro inhibitory activity. For this reason, we added a group of inactive compounds to the set of COVID Moonshot inhibitors, that is to say, compounds with no activity against M‐pro in the two assays used in the COVID Moonshot project. As inactive compounds do not have IC50 values, we used ROC curves to test whether M‐pro inhibitors could be distinguished from inactive compounds by either the docking score or ∆G estimated with MM‐GBSA. Again, we used the five aforementioned docking methods and the two SARS‐CoV‐2 M‐pro structures. ROC curves are used to assess the performance of a classification method. They represent true positive versus false positive rates and the area under the ROC curve (AUC) measures the performance across all possible classification thresholds. AUC values range from 0 to 1, and represent the probability that a random positive example is positioned before a random negative example, when they are sorted by decreasing docking scores or ∆G values. A random two‐class classification has an AUC value of 0.5. Figure 6 and Figure S6 show the ROC curves and AUC values for the COVID Moonshot M‐pro inhibitors and inactive compounds presented above when using the two M‐pro structures. Most of the AUC values obtained were near to 0.5 (Figure 6 and Figure S6), which means that M‐pro inhibitors generally cannot be distinguished from inactive compounds when using docking scores or ∆G values estimated by the MM‐GBSA method. In some cases, the predictions improved if noncovalent and covalent inhibitors were treated separately (Figure 6 and Figure S6). The best AUC obtained was 0.744 when the 6WQF M‐pro structure and the FRED docking poses of the noncovalent inhibitors were used to estimate the ∆G values with the MM‐GBSA method (Figure S6F). For the 6LU7 structure, the best AUC was obtained with the Glide XP docking poses of the noncovalent inhibitors and ∆G values estimated with the MM‐GBSA method (Figure 6I).

Figure 6

ROC curves obtained from the docking scores of five different docking methods (FRED, Glide HTVS, Glide SP, Glide XP and AutoDock Vina) (A–E) and the ∆G calculated with the MM‐GBSA program (F–J) of a group of known SARS‐CoV‐2 M‐pro inhibitors and compounds with no activity against M‐pro from the COVID Moonshot initiative. The protein from the 6LU7 PDB structure was used for docking. Covalent, noncovalent and the whole set of inhibitors are represented separately. The area under the curve (AUC) values are displayed for each group. MM‐GBSA, molecular mechanics generalized Born surface area [Color figure can be viewed at wileyonlinelibrary.com] To test if there is a correlation between the pIC50 of a group of SARS‐CoV‐2 M‐pro inhibitors and their ∆G values without using docking, from the COVID Moonshot activity data we selected a group of 40 noncovalent inhibitors that were co‐crystallized with the SARS‐CoV‐2 M‐pro (Supporting Information File 5). The 3‐D structures of the protein‐ligand complex of these compounds, available from Fragalysis, were used to estimate the protein‐ligand affinity, expressed as ∆G values, using the KDeep server. In this server, the structure of the protein and the ligand is represented using 8 pharmacophoric‐like features (i.e., hydrophobic, aromatic, hydrogen‐bond donor and acceptor, positive and negative ionizable, metallic and total excluded volume) and protein‐ligand affinity is predicted using a Deep Convolutional Neural Network pre‐trained with the PDBbind v.2016 database. Figure 7 shows that the correlation between the pIC50 and ∆G values is good, with a Pearson correlation coefficient of −0.688 (p value 9.32E‐07). However, this correlation decreased when the ligands were transformed to SMILES format to lose the binding coordinates of the ligand and were docked with the five docking programs used above (Figure 8 and Figure S7). The best correlation, with a Pearson coefficient value of −0.359 (p value 0.023), was obtained with the FRED docking program, a long way from the correlation of −0.688 obtained with the crystallographic poses. This shows that for the two SARS‐CoV‐2 M‐pro structures and the five docking methods used here, these methodologies were not able to predict the crystallographic pose as the best docking pose, as it has recently been shown. Again this reinforces the idea that it is essential that docking results be validated before use.

Figure 7

Figure 8

Correlation between the pIC50 and the ∆G values estimated with the KDeep server from the best docking poses from five different docking methods (FRED, Glide HTVS, Glide SP, Glide XP and AutoDock Vina) (A–E) and the 6LU7 M‐pro structure of a group of 40 noncovalent SARS‐CoV‐2 M‐pro inhibitors that were co‐crystallized with the SARS‐CoV‐2 M‐pro protein [Color figure can be viewed at wileyonlinelibrary.com]

Correlation between the pIC50 and the ∆G values estimated with the KDeep server from the crystallized protein‐ligand complexes of a group of 40 noncovalent SARS‐CoV‐2 M‐pro inhibitors that were co‐crystallized with the SARS‐CoV‐2 M‐pro protein [Color figure can be viewed at wileyonlinelibrary.com] Correlation between the pIC50 and the ∆G values estimated with the KDeep server from the best docking poses from five different docking methods (FRED, Glide HTVS, Glide SP, Glide XP and AutoDock Vina) (A–E) and the 6LU7 M‐pro structure of a group of 40 noncovalent SARS‐CoV‐2 M‐pro inhibitors that were co‐crystallized with the SARS‐CoV‐2 M‐pro protein [Color figure can be viewed at wileyonlinelibrary.com]

OUTLOOK

Although the SARS‐CoV‐2 pandemic needed an accelerated process of drug development, measures are required to safeguard this process. Dotolo and coworkers put this idea into words with the proverb “haste makes waste”, and remarked that “what is true for popular wisdom is doubly true for scientific research”. Overconfident statements and conclusions that are not fully supported by experimental evidence are frequently found in articles that use protein‐ligand docking to find SARS‐CoV‐2 M‐pro inhibitors. The methods used and the results obtained by computational approaches must be validated so that these methods are not devalued and efforts focus on the most promising compounds. If docking scores are used to rank or select a set of compounds or to compare the potency of two inhibitor compounds, it must first be shown that there is a correlation between docking scores and activity or potency, for example expressed as the IC50. If there is no such correlation, there is no reason to apply this methodology. A VS protocol can be computationally validated by applying the same VS protocol to a set of both active and inactive compounds. Therefore, the methodology is expected to be able to selectively identify the known active compounds from the validation set. The use of inactive compounds for this validation is crucial because they act as a negative control. Of course, the validation of a VS protocol with active and inactive compounds requires prior knowledge of the target, so it is not applicable if information is scarce, as is the case when targets are new or poorly understood. However, it must be used as an efficient validation tool when more information is available. The set of SARS‐CoV‐2 M‐pro inhibitors and inactive compounds reviewed here could be helpful in validating future screenings. The flexibility and plasticity of the SARS‐CoV‐2 M‐pro make it a challenging target for small‐molecule inhibitor design. Thus, techniques that have been applied to other targets are not always effective. When using any methodology, such as protein‐ligand docking, it is important to consider not only its well‐known advantages, but also its weaknesses. Binding free energies solely based on docking poses are, in general, insufficiently reliable. Here, in two large groups of SARS‐CoV‐2 M‐pro inhibitors, using two different SARS‐CoV‐2 M‐pro structures and five protein‐ligand docking methods we have shown that there is not a good correlation between the bioactivity and such affinity scores, as the docking scores or the ∆G calculated with the MM‐GBSA method. In addition, these affinity scores cannot distinguish between compounds with or without M‐pro inhibitory activity, especially if covalent and noncovalent inhibitors are analyzed together. This is not to suggest that protein‐ligand docking is useless. On the contrary, it is indeed an invaluable tool for reducing the initial compounds in a virtual screening when used in the initials steps, for plausible pose prediction and for semiquantitative assessment of the inhibitory power of a ligand. There are some successful examples of the use of protein‐ligand docking as an initial step to identify new SARS‐CoV‐2 M‐pro inhibitors, , , , , but none of them used protein‐ligand docking alone. Consensus docking, i.e., the use of different protein‐ligand docking programs, has proved to be also helpful. , When designing a new VS procedure to find SARS‐CoV‐2 M‐pro inhibitors it should be considered if either covalent or noncovalent inhibitors (or both) are expected. The covalent warheads provided herein can be used to differentiate between potentially covalent and noncovalent M‐pro inhibitors and to treat both sets separately. Covalent docking , is an alternative when the aim is to find covalent inhibitors, and it has been used successfully to find M‐pro inhibitors. , In addition to docking, other computational methods have been used to identify new SARS‐CoV‐2 M‐pro inhibitors. These methods include pharmacophores , , ; combined quantum mechanics/molecular mechanics (QM/MM) methods , ; a Docking Consensus Approach and Common Hits Approach that combine molecular dynamics simulations with molecular docking and pharmacophores, respectively ; interactive molecular dynamics simulation in virtual reality ; Quantitative Structure‐activity Relationship models, and binding free energies calculations. , , Among the various methods to calculate protein‐ligand binding affinities, Free Energy Perturbation (FEP) has been used successfully to predict new M‐pro inhibitors. , , Advances in computer power, enhanced sampling algorithms, and increase of force field accuracy has led to newer FEP implementations that has enabled accurate and reliable calculations of protein‐ligand binding free energies. Only when combined with other methods, should the results of automated protein‐ligand docking be considered worthy for further experimental testing. Supplementary file 1. Details of the 61 reviewed articles and the drugs they predicted to be SARS‐CoV‐2 M‐pro inhibitors. When available, information is provided about the methodology other experimental aspects, database identifiers for the predicted drugs (CAS number, UNII, DrugBank ID, PubChem ID, ChEMBL and ZINC), their IUPAC names and SMILES. Click here for additional data file. Supplementary file 2. Details of the 150 known SARS‐CoV‐2 M‐pro inhibitors extracted from the bibliography. Click here for additional data file. Supplementary file 3. Details of the 81 SARS‐CoV‐2 M‐pro inhibitors from the COVID Moonshot project used for validation. Click here for additional data file. Supplementary file 4. Details of the 113 compounds from the COVID Moonshot project that are inactive against SARS‐CoV‐2 M‐pro and which were used for validation. Click here for additional data file. Supplementary file 5. Details of the 40 SARS‐CoV‐2 M‐pro inhibitors from the COVID Moonshot project crystallized with the SARS‐CoV‐2 M‐pro and whose inhibitory activity is available. Click here for additional data file. Supporting information. Click here for additional data file.

124 in total

1. Lessons in molecular recognition: the effects of ligand and protein flexibility on molecular docking accuracy.

Authors: Jon A Erickson; Mehran Jalaie; Daniel H Robertson; Richard A Lewis; Michal Vieth
Journal: J Med Chem Date: 2004-01-01 Impact factor: 7.446

2. Effect of input differences on the results of docking calculations.

Authors: Miklos Feher; Christopher I Williams
Journal: J Chem Inf Model Date: 2009-07 Impact factor: 4.956

3. A novel coronavirus associated with severe acute respiratory syndrome.

Authors: Thomas G Ksiazek; Dean Erdman; Cynthia S Goldsmith; Sherif R Zaki; Teresa Peret; Shannon Emery; Suxiang Tong; Carlo Urbani; James A Comer; Wilina Lim; Pierre E Rollin; Scott F Dowell; Ai-Ee Ling; Charles D Humphrey; Wun-Ju Shieh; Jeannette Guarner; Christopher D Paddock; Paul Rota; Barry Fields; Joseph DeRisi; Jyh-Yuan Yang; Nancy Cox; James M Hughes; James W LeDuc; William J Bellini; Larry J Anderson
Journal: N Engl J Med Date: 2003-04-10 Impact factor: 91.245

4. Drug repurposing against SARS-CoV-2 using E-pharmacophore based virtual screening, molecular docking and molecular dynamics with main protease as the target.

Authors: K G Arun; C S Sharanya; J Abhithaj; Dileep Francis; C Sadasivan
Journal: J Biomol Struct Dyn Date: 2020-06-22

5. Both Boceprevir and GC376 efficaciously inhibit SARS-CoV-2 by targeting its main protease.

Authors: Lifeng Fu; Fei Ye; Yong Feng; Feng Yu; Qisheng Wang; Yan Wu; Cheng Zhao; Huan Sun; Baoying Huang; Peihua Niu; Hao Song; Yi Shi; Xuebing Li; Wenjie Tan; Jianxun Qi; George Fu Gao
Journal: Nat Commun Date: 2020-09-04 Impact factor: 17.694

6. Computational drug repurposing for the identification of SARS-CoV-2 main protease inhibitors.

Authors: Diego Fiorucci; Eva Milletti; Francesco Orofino; Antonella Brizzi; Claudia Mugnaini; Federico Corelli
Journal: J Biomol Struct Dyn Date: 2020-07-24

7. Ensemble Docking Coupled to Linear Interaction Energy Calculations for Identification of Coronavirus Main Protease (3CL^pro) Non-Covalent Small-Molecule Inhibitors.

Authors: Marko Jukič; Dušanka Janežič; Urban Bren
Journal: Molecules Date: 2020-12-09 Impact factor: 4.411

8. A Trial of Lopinavir-Ritonavir in Adults Hospitalized with Severe Covid-19.

Authors: Bin Cao; Yeming Wang; Danning Wen; Wen Liu; Jingli Wang; Guohui Fan; Lianguo Ruan; Bin Song; Yanping Cai; Ming Wei; Xingwang Li; Jiaan Xia; Nanshan Chen; Jie Xiang; Ting Yu; Tao Bai; Xuelei Xie; Li Zhang; Caihong Li; Ye Yuan; Hua Chen; Huadong Li; Hanping Huang; Shengjing Tu; Fengyun Gong; Ying Liu; Yuan Wei; Chongya Dong; Fei Zhou; Xiaoying Gu; Jiuyang Xu; Zhibo Liu; Yi Zhang; Hui Li; Lianhan Shang; Ke Wang; Kunxia Li; Xia Zhou; Xuan Dong; Zhaohui Qu; Sixia Lu; Xujuan Hu; Shunan Ruan; Shanshan Luo; Jing Wu; Lu Peng; Fang Cheng; Lihong Pan; Jun Zou; Chunmin Jia; Juan Wang; Xia Liu; Shuzhen Wang; Xudong Wu; Qin Ge; Jing He; Haiyan Zhan; Fang Qiu; Li Guo; Chaolin Huang; Thomas Jaki; Frederick G Hayden; Peter W Horby; Dingyu Zhang; Chen Wang
Journal: N Engl J Med Date: 2020-03-18 Impact factor: 91.245

9. Multiscale Simulations of SARS-CoV-2 3CL Protease Inhibition with Aldehyde Derivatives. Role of Protein and Inhibitor Conformational Changes in the Reaction Mechanism.

Authors: Carlos A Ramos-Guzmán; J Javier Ruiz-Pernía; Iñaki Tuñón
Journal: ACS Catal Date: 2021-03-25 Impact factor: 13.084

10. Benchmarking the Ability of Common Docking Programs to Correctly Reproduce and Score Binding Modes in SARS-CoV-2 Protease Mpro.

Authors: Shani Zev; Keren Raz; Renana Schwartz; Reem Tarabeh; Prashant Kumar Gupta; Dan T Major
Journal: J Chem Inf Model Date: 2021-05-28 Impact factor: 4.956

8 in total

Review 1. Insights into the Pharmacological Effects of Flavonoids: The Systematic Review of Computer Modeling.

Authors: Amir Taldaev; Roman Terekhov; Ilya Nikitin; Anastasiya Zhevlakova; Irina Selivanova
Journal: Int J Mol Sci Date: 2022-05-27 Impact factor: 6.208

2. Discovery of C-12 dithiocarbamate andrographolide analogues as inhibitors of SARS-CoV-2 main protease: In vitro and in silico studies.

Authors: Bodee Nutho; Patcharin Wilasluck; Peerapon Deetanya; Kittikhun Wangkanont; Patcharee Arsakhant; Rungnapha Saeeng; Thanyada Rungrotmongkol
Journal: Comput Struct Biotechnol J Date: 2022-05-30 Impact factor: 6.155

Review 3. A Review of the Current Landscape of SARS-CoV-2 Main Protease Inhibitors: Have We Hit the Bullseye Yet?

Authors: Guillem Macip; Pol Garcia-Segura; Júlia Mestres-Truyol; Bryan Saldivar-Espinoza; Gerard Pujadas; Santiago Garcia-Vallvé
Journal: Int J Mol Sci Date: 2021-12-27 Impact factor: 5.923

4. In Silico Drug Repositioning to Target the SARS-CoV-2 Main Protease as Covalent Inhibitors Employing a Combined Structure-Based Virtual Screening Strategy of Pharmacophore Models and Covalent Docking.

Authors: Luis Heriberto Vázquez-Mendoza; Humberto L Mendoza-Figueroa; Juan Benjamín García-Vázquez; José Correa-Basurto; Jazmín García-Machorro
Journal: Int J Mol Sci Date: 2022-04-03 Impact factor: 5.923

5. On drug discovery against infectious diseases and academic medicinal chemistry contributions.

Authors: Yves L Janin
Journal: Beilstein J Org Chem Date: 2022-09-29 Impact factor: 2.544

6. Epicatechin is a promising novel inhibitor of SARS-CoV-2 entry by disrupting interactions between angiotensin-converting enzyme type 2 and the viral receptor binding domain: A computational/simulation study.

Authors: Mohammed Baqur S Al-Shuhaib; Hayder O Hashim; Jafar M B Al-Shuhaib
Journal: Comput Biol Med Date: 2021-12-17 Impact factor: 6.698

Review 7. Haste makes waste: A critical review of docking-based virtual screening in drug repurposing for SARS-CoV-2 main protease (M-pro) inhibition.

Authors: Guillem Macip; Pol Garcia-Segura; Júlia Mestres-Truyol; Bryan Saldivar-Espinoza; María José Ojeda-Montes; Aleix Gimeno; Adrià Cereto-Massagué; Santiago Garcia-Vallvé; Gerard Pujadas
Journal: Med Res Rev Date: 2021-10-26 Impact factor: 12.388

8. Fullerenes against COVID-19: Repurposing C₆₀ and C₇₀ to Clog the Active Site of SARS-CoV-2 Protease.

Authors: Tainah Dorina Marforio; Edoardo Jun Mattioli; Francesco Zerbetto; Matteo Calvaresi
Journal: Molecules Date: 2022-03-16 Impact factor: 4.411

8 in total