Literature DB >> 33426509

A multi-pronged approach targeting SARS-CoV-2 proteins using ultra-large virtual screening.

Christoph Gorgulla^1,2,3, Krishna M Padmanabha Das^1,3, Kendra E Leigh⁴, Marco Cespugli⁵, Patrick D Fischer^1,3,6, Zi-Fu Wang¹, Guilhem Tesseyre⁷, Shreya Pandita⁷, Alec Shnapir⁷, Anthony Calderaio⁸, Minko Gechev⁷, Alexander Rose⁹, Noam Lewis⁷, Colin Hutcheson⁷, Erez Yaffe⁷, Roni Luxenburg⁷, Henry D Herce^1,3, Vedat Durmaz⁵, Thanos D Halazonetis¹⁰, Konstantin Fackeldey^11,12, J J Patten¹³, Alexander Chuprina¹⁴, Igor Dziuba¹⁵, Alla Plekhova¹⁶, Yurii Moroz^16,17, Dmytro Radchenko^14,17, Olga Tarkhanova¹⁶, Irina Yavnyuk¹⁴, Christian Gruber^5,18, Ryan Yust⁷, Dave Payne⁷, Anders M Näär¹⁹, Mark N Namchuk¹, Robert A Davey¹³, Gerhard Wagner¹, Jamie Kinney⁷, Haribabu Arthanari^1,3.

Abstract

The unparalleled global effort to combat the continuing severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic over the last year has resulted in promising prophylactic measures. However, a need still exists for cheap, effective therapeutics, and targeting multiple points in the viral life cycle could help tackle the current, as well as future, coronaviruses. Here, we leverage our recently developed, ultra-large-scale in silico screening platform, VirtualFlow, to search for inhibitors that target SARS-CoV-2. In this unprecedented structure-based virtual campaign, we screened roughly 1 billion molecules against each of 40 different target sites on 17 different potential viral and host targets. In addition to targeting the active sites of viral enzymes, we also targeted critical auxiliary sites such as functionally important protein-protein interactions.

Entities: Species

Keywords: Drugs; High-Performance Computing in Bioinformatics; Structural Biology; Virology

Year: 2021 PMID： 33426509 PMCID： PMC7783459 DOI： 10.1016/j.isci.2020.102021

Source DB: PubMed Journal: iScience ISSN： 2589-0042

Introduction

At the end of 2019, cases of pneumonia with an initially unknown etiology were identified in Wuhan city, in the Hubei Province of China (Zhou et al., 2020a; Wu et al., 2020; Andersen et al., 2020). The cause was determined to be a novel coronavirus (CoV) (Zhu et al., 2020), and by early July 2020, there were over 12 million confirmed cases worldwide (Dong et al., 2020a; Dong et al., 2020b) of what is now designated severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (Coronaviridae Study Group of the International Committee on Taxonomy of Viruses, 2020). A Betacoronavirus (lineage B) related to severe acute respiratory syndrome CoV (SARS-CoV), SARS-CoV-2 commonly causes fever, cough, myalgia, and/or fatigue (Huang et al., 2020; Roman et al., 2020). While even a year later, our clinical knowledge is still developing, in addition to asymptomatic and mild cases, dyspnea, lymphopenia, and anosmia, with or without dysgeusia, have also been reported as clinical features (Huang et al., 2020; Xydakis et al., 2020; Rothe et al., 2020; Zhou et al., 2020b), and complications can include acute respiratory distress syndrome, acute cardiac injury, and secondary infections (Huang et al., 2020). As of July 8th, 2020, over half a million deaths have been attributed to coronavirus disease 2019 (COVID-19) (Dong et al., 2020a; Dong et al., 2020b), and the rapid expansion in case number in combination with severe symptoms requiring hospitalization has resulted in an unprecedented unprecedented strain on the global healthcare system. Coronaviridae comprises a family of large positive-sense, single-stranded RNA viruses that derive their name from the “corona” that fringes the virions in electron micrographs (Almedia et al., 1968; Masters, 2006). CoV virions are composed of a lipid envelope decorated with spike (S) protein, which facilitates entry and causes their corona-like appearance (Neuman and Buchmeier, 2016). Envelope (E) protein, which contributes to virion assembly and viral pathogenesis, and membrane (M) protein, which also facilitates virion assembly, are also both embedded in this bilayer (Neuman and Buchmeier, 2016), and the viral genome, in close association with nucleoprotein (N), is encapsulated within. To initiate entry, the receptor-binding domain of S must engage with its receptor on the surface of its target cell, and several studies have already identified the SARS-CoV receptor, angiotensin-converting enzyme 2 (ACE2), as a receptor for SARS-CoV-2 (Zhou et al., 2020a; Yan et al., 2020a; Walls et al., 2020). While engagement with the receptor initiates conformational rearrangements in S, the spike protein must also be cleaved at its S2′ site as part of the entry process. Unlike the S1/S2 cleavage event, which can occur at any point from viral assembly to entry, S2′ cleavage likely only occurs during entry and involves host proteases at the cell surface, such as transmembrane protease, serine 2 (TMPRSS2), or in endosomes, such as cathepsins (Millet and Whittaker, 2015). Further conformational rearrangements in S result in membrane fusion, allowing the release of the nucleocapsid into the cytoplasm. As the genome is positive-sense, replication starts with the expression of ORF1a and ORF1ab. The resulting polyproteins (pp1a and pp1ab) are further processed into sixteen non-structural proteins (nsp1-16; Figure 1) that form, in conjunction with host proteins, membrane-associated replication and transcription complexes (Snijder et al., 2016). The genome is replicated via an intermediate negative-sense copy of the genome, and both structural and accessory proteins are expressed from 3′-co-terminal sub-genomic RNAs (de Wilde et al., 2017). Assembly occurs on membranes between the endoplasmic reticulum (ER) and the trans-Golgi network, with the virions budding into vesicular compartments that then fuse with the plasma membrane, releasing their cargo (de Wilde et al., 2017) (Figure 1).

Figure 1

Schematic of the viral life cycle of SARS-CoV-2

The genome organization is based on other coronaviruses and published predictions (Snijder et al., 2016; Wu et al., 2020). ACE2: angiotensin-converting enzyme 2; TMPRSS2: transmembrane protease, serine 2; RdRp: RNA-dependent RNA polymerase; ExoN: exonuclease; N7-MT N7-methyltransferase; 2′O-MTase: 2′O-methyltransferase; EndoU: uridylate-specific endonuclease; RTC: replication and transcription complex; ER: endoplasmic reticulum; TGN: trans-Golgi network.

Schematic of the viral life cycle of SARS-CoV-2 The genome organization is based on other coronaviruses and published predictions (Snijder et al., 2016; Wu et al., 2020). ACE2: angiotensin-converting enzyme 2; TMPRSS2: transmembrane protease, serine 2; RdRp: RNA-dependent RNA polymerase; ExoN: exonuclease; N7-MT N7-methyltransferase; 2′O-MTase: 2′O-methyltransferase; EndoU: uridylate-specific endonuclease; RTC: replication and transcription complex; ER: endoplasmic reticulum; TGN: trans-Golgi network. While low-pathogenicity human CoVs such as HCoV-229E, HCoV-OC43, HCov-NL63, and HCoV-HKU1 (Su et al., 2016) continually circulate and generally cause mild disease, the emergence of SARS-CoV (Ksiazek et al., 2003; Kuiken et al., 2003; Peiris et al., 2003; Drosten et al., 2003) in 2002 and Middle East respiratory syndrome CoV (MERS-CoV) (de Groot et al., 2013; Sander van et al., 2012; Zaki et al., 2012) in 2012 as the result of zoonotic jumps into the human population (Cui et al., 2018) demonstrated that CoVs could cause serious morbidity and mortality. Their importance made them the subject of extensive research and attractive therapeutic targets. While research is only just beginning on SARS-CoV-2, a comparison of the SARS-CoV-2 sequence early in the outbreak to previously identified CoV sequences revealed it to be most related to previously isolated bat SARS-CoV-like coronaviruses (Zhou et al., 2020a; Wu et al., 2020; Lu et al., 2020), and comparison to both a bat SARS-CoV-like virus (bat SL-CoVZC45), as well as a previous human SARS-CoV isolate (SARS-CoV Tor2), predicted that SARS-CoV-2 likely has a similar genomic organization and expresses similar proteins (Wu et al., 2020) (Figure 1). The assignment to the Betacoronavirus genus and in particular the similarity to SARS-CoV allows us to also draw on a field of already published work to infer promising targets within the SARS-CoV-2 life cycle for small molecule intervention. Therapeutics against viruses generally fall into two broad categories: (i) vaccines, which prime the host immune system to mount a targeted defense against infection by the virus, and (ii) other agents with antiviral effects, which include, but are not limited to, small molecule inhibitors, peptides, and biologics. Vaccine development traditionally takes several years or more, and there are many viruses for which there is still no vaccine, even after decades of research. However, the goal of developing new vaccines to target threats such as human immunodeficiency virus (HIV), influenza, and ebolaviruses has resulted in a great deal of research into next-generation vaccine platforms (Delany et al., 2014; Antrobus et al., 2014; Meyer et al., 2018). Before this year, there were no approved vaccines for any of the human CoVs, including SARS-CoV and MERS-CoV, but during the current outbreak, scientists have been able to rapidly leverage previous platform work into prophylactic vaccines for SARS-CoV-2 (Merryn Voysey et al., 2020; Jackson et al., 2020; Polack et al., 2020). In addition to vaccines, viral infection can be limited by the use of other therapeutics with antiviral properties. These therapeutics can be broadly divided into two categories: (i) drugs that target viral proteins and (ii) drugs that target host proteins, including the viral receptor, cellular proteins essential to the viral life cycle, and components of the immune system. Currently, available viral therapeutics include protease inhibitors, reverse transcriptase and polymerase inhibitors (both nucleoside analogs and non-nucleoside inhibitors), integrase inhibitors, and viral entry blockers (Figure S1). Although there are no drugs approved by the United States Food and Drug Administration (FDA) specifically for SARS-CoV, MERS-CoV, or SARS-CoV-2, there have been a number of efforts to target proteins of the viral machinery, including the proteases, helicase, and RNA-dependent RNA polymerase (RdRP). A partial list of these published small molecules is shown in Table S2. A well-publicized example, remdesivir, is an RdRP inhibitor originally developed for hepatitis C that was also previously tested against both SARS-CoV and MERS-CoV. It should be noted that subsequent studies on its use in the treatment of COVID-19 have yielded conflicting results, with some suggesting that remdesivir may reduce the time to recovery in patients with SARS-CoV-2 (Wang et al., 2020a; Beigel et al., 2020), while others showed no appreciable effect (W.H.O. Solidarity Trial Consortium, 2020). For every virus, there are targets such as the entry protein or the viral polymerase that have direct effects on the ability of the virus to replicate, even under ideal conditions for the virus (e.g., in the absence of an immune system). There are also other targets that, if inhibited, would reduce the severity of the viral infection, often referred to as the viral virulence. Some of these targets may only have an effect when inhibitors are evaluated in the context of a whole organism. For example, several viruses paralyze the host immune system by hijacking or sequestering key signaling proteins, and targeting the viral proteins that mediate this effect would reduce the virulence of the virus. In this study, we targeted an array of SARS-CoV-2 proteins in addition to two human proteins, and we posited that inhibitors, which target proteins that affect the viability and/or the virulence, could synergistically combine for more efficacious combination therapies. In silico screening methods enabled us to identify lead drug candidates for SARS-CoV-2 in an expedited manner, fueled by the availability of high-resolution structures of many of the SARS-CoV-2 proteins. Previous work has shown that the number of compounds screened in a structure-based in silico screen affects the quality of the resulting hits, and the potency of the hits derived increases with the number of compounds screened (Lyu et al., 2019). A flurry of recent papers has demonstrated that by using in silico screening to expand the chemical space (screening ∼100 million compounds), one can identify picomolar affinity inhibitors (Lyu et al., 2019; Wan et al., 2020; Stein et al., 2020). To become capable of screening billions of molecules in a relatively short period (weeks), we previously developed VirtualFlow, an open-source computational drug discovery platform (Gorgulla et al., 2020a). VirtualFlow, with its linear scaling feature, leverages the power of computing clusters to perform ultra-large-scale screens. Here, we used the computational power of the Google Cloud Platform (GCP) to identify molecules that could inhibit SARS-CoV-2. We screened over one billion molecules against each of fifteen SARS-CoV-2 proteins and two human proteins, ACE2 and TMPRSS2. For some of the proteins, we targeted multiple functional sites on the protein surface; an approach which provides the following two distinct advantages: (i) the possibility of increasing efficacy using a cocktail approach and (ii) the identification of alternative ways to target the same protein, a necessity when faced with the possibility of resistance mutations. Whenever available, we used recently determined high-resolution structures of SARS-CoV-2 proteins in our screens. A complete list of the target proteins along with the models used can be found in Table 1.

Table 1

Overview of the performed virtual screens

Protein index	Protein name	Alternative names	Screen ID	Target site	Structure used
1	ACE2	Angiotensin-converting enzyme 2	1	Spike RDB binding region – site 1	6m17
			2	Spike RDB binding region – site 1	6m18
			3	Spike RDB binding region – site 2	6m17
			4	Dynamic pocket 1 besides spike RDB binding region	DE Shaw MD simulation 10875754 frame 2715
			5	Dynamic pocket 2 besides spike RDB binding region	DE Shaw MD simulation 10875754 frame 5273
2	TMPRSS2	Transmembrane protease serine 2	6	Active site	SWISS-MODEL of TMPRSS2 – model 1
3	Spike	S-protein, S	7	Spike RDB – ACE2 interface	6w41
3	Spike	S-protein, S	8	Spike HR1 domain	6lxt
4	ORF7a	Protein 7a	9	Blind docking	6w37
5	nsp3-macrodomain	Phosphatase, (macro) X domain	10	Active site	6w6y chain A (closed active site)
5	nsp3-macrodomain	Phosphatase, (macro) X domain	11	Active site	6w6y chain B (open active site)
6	nsp3-PL^pro	PL^pro, PLP, papain-like protease	12	Active site	6w9c
			13	Accessory pocket	6w9c
			14	DUB binding site	6w9c
			15	Active site and accessory pocket	6wx4∗
7	nsp5	M^pro, main protease	16	Active site	6lu7
			17	Active site	6m0k∗
			18	Dimerization site	6wqf
			19	α-helix 5 attachment site	Hybrid in-house model
8	nsp7	Replicase polyprotein 1ab	20	Blind docking (nsp8 PPI, nsp12 PPI)	6wiq
9	nsp8	Primase complex	21	nsp7 PPI	6wiq
9	nsp8	Primase complex	22	nsp12 PPI	7bv1
10	nsp9	Replicase	23	Dimerization interface – site 1	6w4b
10	nsp9	Replicase	24	Dimerization interface – site 2	6w4b
11	nsp10		25	nsp16 PPI	6w4h
			26	nsp16 PPI	SWISS-MODEL of nsp10/14 - model 3
			27	nsp14 PPI	SWISS-MODEL of nsp10/14 - model 3
12	nsp12	RNA-dependent RNA polymerase (RdRP)	28	RNA binding interface – site 1	7bv1
			29	RNA binding interface – site 2	7bv1
			30	Nucleotide binding site	7bv1
			31	nsp8 PPI	6m71
			32	nsp7/8 PPI	7bv1
13	nsp13	Helicase	33	Active site	SWISS-MODEL nsp13 – model 1
			34	RNA binding interface – site 1	SWISS-MODEL nsp13 – model 1
			35	RNA binding interface – site 2	SWISS-MODEL nsp13 – model 1
14	nsp14	Exoribonuclease, N7 methyltransferase, N7-MTase	36	nsp10 PPI	SWISS-MODEL of nsp10/14 – model 3
			37	Active site (ExoN)	SWISS-MODEL of nsp10/14 – model 3
			38	Active site (N7-MT)	SWISS-MODEL of nsp10/14 – model 3
15	nsp15	Endoribonuclease, XendoU	39	Active site	6w01
16	nsp16	2′-O-MTase, 2'-O methyltransferase	40	nsp10 PPI	6w4b
16	nsp16	2′-O-MTase, 2'-O methyltransferase	41	Active site (2'-O MT)	6w4b
17	nucleoprotein	N, NC, NP, RNP, ribonucleocapsid protein	42	NTD – RNA binding site	6yi3
			43	NTD – oligomerization site	6yi3
			44	CTD – dimerization interface	6wji
			45	CTD – oligomerization site	6wji

A total of 45 virtual screens were performed, involving 17 different target proteins and 40 unique target sites. In each screen, approximately 1 billion molecules from the Enamine REAL library and approximately 10 million compounds from the in-stock compounds of the ZINC 15 database were screened. 6wx4∗: The side chain rotamer of Leu162 in PDB 6wx4 was rotated to open up the end of the tunnel that forms the active site (for more details, please refer to the main text). 6m0k∗: The side chain rotamers of the residues Met49, Ser46, and Cys145 of the PDB 6m0k crystal structure were modified to give increased access to the binding pocket (for more details, please refer to the main text and Figure S12).

Overview of the performed virtual screens A total of 45 virtual screens were performed, involving 17 different target proteins and 40 unique target sites. In each screen, approximately 1 billion molecules from the Enamine REAL library and approximately 10 million compounds from the in-stock compounds of the ZINC 15 database were screened. 6wx4∗: The side chain rotamer of Leu162 in PDB 6wx4 was rotated to open up the end of the tunnel that forms the active site (for more details, please refer to the main text). 6m0k∗: The side chain rotamers of the residues Met49, Ser46, and Cys145 of the PDB 6m0k crystal structure were modified to give increased access to the binding pocket (for more details, please refer to the main text and Figure S12). When performing in silico screening, the probability of finding a true hit (a hit that is validated experimentally) increases with the number of molecules screened and the accuracy of the chosen structure relative to what exists in solution inside the cell. Small changes in structure or dynamics can be accommodated by flexible docking, but large conformational changes decrease the ability of in silico screening methods to identify potent binders. Large conformational changes can be tackled by ensemble docking methods (molecular dynamics simulations), but the trade-off for these approaches is additional computational time. Targeting protein-protein interactions (PPIs) is also challenging for in silico screening. Typically, with PPIs, the off-rate between the two interacting proteins is slow, and the interaction interface is large compared to a small molecule inhibitor. Thus, targeting PPIs was approached in this study with the rationale that a small molecule engaging the monomer at the hot spot of the PPI interface could prevent complex formation, particularly if driven by an excess of the small molecule. In the absence of a solved dimeric/multimeric structure, we made the assumption that the structure of the monomer is similar to that of the protein in complex, a fact which may not always hold true. In light of such caveats, we describe any potential limitations that we anticipate could affect the experimental outcome for each of the targets that were screened, as well as generally for the study as a whole.

Results

Preventing coronavirus entry

The spike protein forms the highly glycosylated trimeric receptor-binding protein that decorates the virion surface and facilitates entry into the host cell through interaction with its receptor ACE2 (Hoffmann et al., 2020). The Binding of the receptor-binding domain (RBD) of S to ACE2 induces large conformational changes in the S2 domain. These changes are followed by processing by host proteases such as TMPRSS2 and cathepsins and result in the formation of a stable six-helix bundle (6-HB) by heptad repeats 1 and 2 (HR1 and HR2) culminating in the fusion of the viral and cellular membranes (Huang et al., 2020). In order to develop compounds capable of preventing SARS-CoV-2 from entering cells, we focused on the following three targets: (a) the interaction interface of the RBD and ACE2, (b) the host protease TMPRSS2, and (c) the HR2 hydrophobic binding groove of HR1.

ACE2-spike-RBD interaction

Recent studies have revealed molecular details of ACE2 binding with spike-RBD, and multiple high-resolution structures of the RBD in complex with ACE2 have been released (including Protein Data Bank (PDB): 6m17, 6m0j, 6vw1) (Yan et al., 2020a; Lan et al., 2020; Shang et al., 2020a). As the primary mediator of host cell attachment, the spike trimer is a clear potential target for therapeutic intervention, and in order to find an effective inhibitor of viral entry into host cells, we targeted both ACE2 and the RBD of the spike in parallel.

Targeting the spike protein interaction interface on ACE2

The primary physiological function of the receptor, ACE2, is the maturation of angiotensin, a signaling peptide with cardiovascular regulatory effects (Donoghue et al., 2000), making any targeting of the catalytic site of ACE2 unfavorable. Fortunately, the structure of the ACE2-RBD complex shows that the peptidase (PD) domain of ACE2 has two distinct lobes that are essential for substrate engagement, and while the RBD-ACE2 binding interface includes one of these lobes, it does not obstruct the PD active site (Yan et al., 2020a; Lan et al., 2020) (Figures 2 and S3–S6). This suggests that targeting the RBD interaction interface should not inhibit ACE2 functionality and therefore not affect angiotensin homeostasis. In fact, previous biochemical studies with SARS-CoV have shown that the ACE2 inhibitor, MLN-4760, which favors the closed state, does not interfere with S protein binding and vice versa, also arguing for the independence of ACE2 function and spike binding (Li et al., 2005).

Figure 2

The ACE2 receptor and an example compound from the top 0.0001% of screened compounds bound at the spike interaction interface (site1)

(A) The target protein ACE2 (gold) bound to the RBD of the spike protein (magenta) and an example compound (light pink) from the virtual screen bound to the spike interaction interface (site 1, around Glu37) (screen ID: 1).

(B) The electrostatic surface of the target protein (ACE2) bound to the RBD domain of the spike protein (magenta) and an example compound (light pink).

(D) Receptor residues within 4 Å of the ligand.

(E) Distribution of the docking scores of the top 100 virtual screening hits.

The ACE2 receptor and an example compound from the top 0.0001% of screened compounds bound at the spike interaction interface (site1) (A) The target protein ACE2 (gold) bound to the RBD of the spike protein (magenta) and an example compound (light pink) from the virtual screen bound to the spike interaction interface (site 1, around Glu37) (screen ID: 1). (B) The electrostatic surface of the target protein (ACE2) bound to the RBD domain of the spike protein (magenta) and an example compound (light pink). (C) An overview of the interactions between the ligand and the receptor structure. (D) Receptor residues within 4 Å of the ligand. (E) Distribution of the docking scores of the top 100 virtual screening hits. We have targeted four different sites on the ACE2 surface. Two of them are located on the spike interaction interface and would directly disrupt the ACE2-spike interaction (Figures 2, S3, and S4). For the first of these two sites (centered around Glu37), two independent virtual screens were carried out (screen IDs: 1 and 2): one using an ACE2 structure where the spike protein was bound (PDB: 6m17; screen ID: 1) and one using an ACE2 structure to which the RBD domain of the spike protein was bound (PDB: 6m18; screen ID: 2). The difference between these two conformations at the site of binding is relatively small (Root-mean-square deviation [RMSD]: 0.355 Å in the region of the first two N-terminal α-helices), yet still significant for structure-based virtual screening. The second site that we targeted on the spike binding interface (screen ID: 3) is centered around residue Gly354 and was chosen because there is a shallow hydrophobic pocket. The other two sites targeted were two dynamic pockets adjacent to the spike interface on ACE2, potentially capable of accommodating tighter binders (screen IDs: 4 and 5; Figures S5 and S6). In addition, compounds that bind to the two dynamic pockets could be chemically linked to each other in such a way that the linker, which would then pass over the spike interaction interface, could potentially block the binding of spike protein. In these latter two screens, we used conformations of ACE2 derived from molecular dynamics (MD) simulations conducted by D.E. Shaw Research (Shaw Research, 2020). In these MD simulations, ACE2 samples an open state, revealing the two dynamic/cryptic pockets that we targeted. These conformations have not been confirmed by experimental methods and thus are of an exploratory nature, and hits from these screens can be considered high-risk, high reward.

Targeting the ACE2 interaction interface on the spike RBD

In vitro measurements suggest that SARS-CoV-2 RBD binds to ACE2 with nanomolar affinity (Walls et al., 2020; Wrapp et al., 2020). Each PD domain of ACE2 can accommodate one RBD, and the interaction is mediated through polar interactions. Furthermore, it has previously been shown that the “up” conformation of the RBD is required for receptor binding and that two spike protein trimers can bind to a single ACE2 dimer (Yan et al., 2020a) (Figure S7). The RBD itself consists of a core containing an antiparallel β-sheet (β1 to β4 and β7), with three short α-helices and an extended loop. This extended loop engages the α1 helix of the ACE2-PD, and it is the center of this interface on the RBD that we have targeted by virtual screening (screen ID: 7) (Figure S7). The spike interaction interface on ACE2 is relatively flat and polar, which makes it challenging to identify molecules that bind to it with high affinity. This difficulty is reflected in the docking scores resulting from the virtual screen of this binding interface: the average docking score of the top 100 virtual hit compounds from this screen was approximately −8 kcal/mol, which is among the weakest docking score averages of all the target sites screened (Figure S8). The SARS-CoV-2 RBD has a significantly higher affinity for ACE2 as compared to that of the SARS-CoV RBD (Shang et al., 2020a; Wrapp et al., 2020); however, the full-length spike protein from SARS-CoV-2 shows an either similar or weaker affinity for ACE2 (Walls et al., 2020; Shang et al., 2020b) when compared to that of the full-length SARS-CoV spike. This contrast in binding affinities between the RBD alone and the full-length spike protein could be a consequence of RBD dynamics where the protein switches between RBD “up” and “down” states, as is also seen in the case of SARS-CoV (Yuan et al., 2017; Gui et al., 2016). There is limited information available as to whether the SARS-CoV-2 spike favors a “lying down” binding inactive state or a “standing up” state capable of receptor binding (Walls et al., 2020; Wrapp et al., 2020). Furthermore, while structural studies have looked at conformational flexibility in the spike protein (Turoňová et al., 2020; Ke et al., 2020), there is little information on the inherent dynamics within the RBD domain itself, which is where the targeted molecules are expected to bind. This means that it is possible that local protein dynamics could adversely affect the binding of some of the hits in ways that the screen is unable to predict. It should be noted that while targeting the flat surface of the spike-binding interface with ACE2 (screen IDs: 1-3) resulted in compounds with an average binding energy of −8.3 kcal/mol, targeting the dynamic pockets resulted in hits with the potential for higher affinity binding. The binding energy of hits in the dynamic pockets was as high as −10.3 kcal/mol. However, experimental validation of these hits would be required to confirm this apparent difference in affinity.

TMPRSS2

The essential priming of S during entry can be executed by the host serine protease TMPRSS2 in the case of SARS-CoV-2, making it a potential therapeutic target. In addition, recent research has shown that the TMPRSS2 inhibitor, camostat mesylate, can block viral entry in cell-based assays (Hoffmann et al., 2020). Effective inhibition of TMPRSS2 could have multiple applications since processing of ACE2 by TMPRSS2 has also been shown to be crucial for activation of S in SARS-CoV (Heurich et al., 2014). Although TMPRSS2 has been shown to activate protease-activated receptor 2 and hepatocyte growth factor, giving it possible roles in tumor metastasis and epithelial-mesenchymal transition (Lucas et al., 2014), the exact physiological function of this protein is still unknown. TMPRSS2-deficient mice showed no significant phenotypic changes (Kim et al., 2006), and thus, TMPRSS2 is considered to be dispensable for normal cellular function, making a TMPRSS2-specific inhibitor less likely to have side effects. TMPRSS2 itself contains a cytoplasmic domain, a transmembrane domain, an extracellular low-density lipoprotein receptor domain, a scavenger receptor cysteine-rich domain, and a C-terminal serine protease domain. The TMPRSS2 precursor consists of 492 amino acids and auto-catalytic cleavage at Arg255 generates the active protease (Afar et al., 2001). As a chymotrypsinogen-like protease, TMPRSS2 contains a catalytic triad consisting of residues His296, Asp345, and Ser441. As there are currently no experimental structures available for TMPRSS2, we used a homology model from SWISS-MODEL that is based on the human serine protease hepsin (PDB: 5ce1), which shares 33.82% sequence identity with TMPRSS2 (O15393, n.d.). Approximately 1 billion compounds were screened against the active site of this homology model (Figure S9). Since a homology model was used to screen for inhibitors, success would depend on how closely the model represents the actual structure. The active site of TMPRSS2 is well conserved when compared to the template, the serine protease hepsin, and also contains a catalytic triad consisting of Ser, His, and Asp residues (Figure S10). The residues around the binding pocket are also well conserved, making it more likely that the predicted structure is sufficiently similar to the true structure in the targeted region around the active site binding pocket (screen ID: 6). Our top 10 hits had an average binding energy of −11.3 kcal/mol, which should ideally correspond to nanomolar binding affinities. It must be noted that experiments with the TMPRSS2 inhibitor camostat mesylate in cultured cells showed that complete inhibition of viral entry is achieved only in conjunction with an inhibitor of cathepsins B and L (CatB/L), E-64d (Hoffmann et al., 2020); however, the impact of this residual priming of S by CatB/L on viral pathogenesis in vivo is yet to be determined.

HR1-HR2 interface

While the S1 subunit of the spike protein is crucial for host cell attachment, the S2 subunit plays an essential role in membrane fusion and subsequent entry into the host cell. Heptad repeats HR1 (residues 912–984) and HR2 (residues 1163–1213) in S2 interact to form the classic 6-HB of a class I entry protein (Xia et al., 2020a; White et al., 2008). The HR domains of the spike protein and their mode of interaction are known to be highly conserved across CoVs, making them an attractive target for the development of pan-CoV fusion inhibitors (Liu et al., 2004). The central trimeric coiled coil of HR1 has three hydrophobic grooves (Figure S11) that are proposed to be potential drug binding sites for inhibition of viral fusion. The recent crystal structure of HR1-L6-HR2 (PDB: 6lxt) displays a 6-HB structure with the HR1 domains forming a parallel trimeric coiled coil around the antiparallel HR2 domain. The hydrophobic groove formed by the HR1 helices is also the binding interface for the HR2 domain (Xia et al., 2020b). The hydrophobic interaction between HR1 and HR2 is in the area of the fusion core, and previous studies have shown that both HR1 and HR2 components of this fusion core interface could be effectively targeted using complementary peptides in SARS-CoV-2 and MERS-CoV (Xia et al., 2020a; Lu et al., 2014). We performed a blind docking using QuickVina W against the entire surface of the trimeric alpha helix (screen ID: 8; Figure S11). The resulting hits could not only potentially have high inhibitory activity against multiples CoVs but also have both prophylactic and therapeutic applications, as has been previously observed in the case of a peptide inhibitor (Xia et al., 2019). Ligands binding to each of the three different hydrophobic grooves could be identified and linked together in order to enhance the overall affinity and thereby increase the potency.

Preventing coronavirus replication and curtailing viral load

Once CoV enters the cell, translation of the replicase gene gives rise to two large polyproteins, PP1a and PP1ab, composed of nsp1-11 and nsp1-16, respectively. Mpro and PLpro cleave these polyproteins into individual nsps, which facilitate genomic replication and transcription of sub-genomic RNAs. In order to develop small molecules capable of inhibiting replication, we targeted the following five proteins and protein complexes, selected based on the availability of high-resolution structures and available information about their molecular mechanisms: (a) the CoV primary protease (Mpro, nsp5), (b) the papain-like protease (PLpro, nsp3), (c) the RNA replication complex, composed of the RNA-dependent RNA polymerase (RdRp) (nsp12) and its co-factors nsp7 and nsp8, (d) the helicase (nsp13), and (e) the ssRNA-binding protein nsp9.

Mpro (3CLpro)

Mpro (3CLpro), the main protease of CoV, cleaves PP1a and PP1ab into many of their constituent nsps (11 cleavage sites in PP1ab). The inhibition of Mpro would not only inhibit the protease itself but also hinder downstream processes by preventing the production of key viral proteins through inhibition of their proteolytic processing. Proteolytic cleavage by Mpro via the catalytic dyad (Cys-His) occurs at the P1 glutamine residue in the cleavage motif (Anand, 2003). Dimerization is thought to create a substrate binding cleft (Shi et al., 2008), and simulations suggest that only one monomer is active at a time (Chen et al., 2006). Mpro is highly conserved not only in SARS-CoV and SARS-CoV-2 but also among other viruses of the Nidovirales order (enveloped, positive-sense, single-stranded RNA viruses that produce nested 3′-co-terminal genomic RNAs), making it a potential target for the development of antiviral drugs with broad efficacy (Nukoolkarn et al., 2008). Using the same strategies that allowed crystallization of SARS-CoV Mpro (Yang et al., 2003), several groups determined the three-dimensional structure of SARS-CoV-2 Mpro in its apo form or bound to various inhibitors (Zhang et al., 2020; Jin et al., 2020a; Dai et al., 2020; Jin et al., 2020b; Su et al., 2020; and deposited only, PDB: 6y84). These structures revealed that the N-terminus of SARS-CoV-2 Mpro contains two domains (domain I, residues 10–99; domain II, residues 100–184) that adopt a chymotrypsin-like fold, whereas the Mpro C-terminal domain III (residues 201–303) is composed of five α-helices and is responsible for the intermolecular interactions critical for the dimerization. The protease active site of the picornavirus 3C-protease-like domain is located at the cleft formed between domains I and II and contains a Cys145-His41 catalytic dyad. A total of five protein segments comprising residues 25–27 and 44–50 from domain I and residues 140–143, 165–168, and 188–190 from domain II form the walls of the active site (Figure S12a). It is important to note that the Mpro active site seems to have significant conformational flexibility (Bzówka et al., 2020). Comparison of three apo structures and five structures in complex with inhibitors reveals differences in the conformation and position of the Gln189-containing loop and the short Ser46-containing α-helix (Figure S12b). Specifically, the cleft between domains I and II is narrower in the apo structures than in the structures of Mpro bound to inhibitors, consistent with an induced fit or conformational selection upon ligand binding. Given the plasticity of the active site, we used two different conformations of Mpro for in silico screening (screen IDs: 16 and 17; Figures S13 and S14). The first conformation corresponded to the structure of Mpro bound to the peptide-like inhibitor N3 (Jin et al., 2020a). For this search, the protein model used was identical to the protein structure reported in the PDB file, after removal of all water molecules and bound ligands (screen ID: 16; Figure S12c). The second conformation corresponded to the structure of Mpro bound to inhibitor 11b, an inhibitor with peptide-like features (Dai et al., 2020). However, the model used for screening differed slightly from the conformation reported in the PDB file. In the crystal structure, the side chain of Met49 projects into the active site and thus would potentially interfere with compound docking. To avoid this, a different Met49 side chain rotamer was used that did not project as much into the active site. This change resulted in steric clashes with the side chain of Ser46, for which selection of a different side chain rotamer resolved these incompatibilities. Finally, a rotamer where the side chain points toward the core of the protein was selected for Cys145. This rotamer was selected to allow the possible docking of compounds that could form covalent bonds with the Cys145 thiol group (Figure S12d). Screens against these two conformations, hereafter referred to as Mpro-N3 (screen ID: 16) and 6m0k∗ (screen ID: 17), respectively (Figure S12), identified compounds that docked with high binding energies. Interestingly, the 6m0k∗ conformation resulted in predicted binding free energies that were approximately 0.5 kcal/mol lower than the binding free energies (docking scores) obtained from screening the Mpro-N3 conformation. Moreover, among the top 1000 hits identified by screening each of the two conformations (with the application of the standard filtering criteria of logP less than 6, molecular weight less than 600 daltons, and no reactive groups), only 12 hits overlapped. These results illustrate how screening closely related protein conformations can lead to the identification of a diverse set of docked compounds, and this may be a generalizable strategy for in silico screening of proteins that exhibit conformational flexibility, such as SARS-CoV-2 Mpro (Bzówka et al., 2020). Since the dimerization of Mpro is known to be essential for its catalytic activity (Xia and Kang, 2011; Kuo et al., 2004; Hsu et al., 2005), we also targeted the dimerization interface, which relies primarily on the α1 helix of the C-terminal domain (screen ID: 18). Previous studies have shown that the inter-conversion of Mpro from monomer to dimer is mediated by an order-to-disorder transition of α5, and this aids domain swapping without exposure of the protein's hydrophobic core (Kang et al., 2012). Since the C-terminal region of Mpro is fully conserved between SARS-CoV and SARS-CoV-2, we used an NMR ensemble of the SARS-CoV Mpro C-terminal domain (PDB: 2liz) to build a hybrid model of a disordered C-terminus. We subsequently used this model and screened against the attachment site of the α5 helix (screen ID: 19). Potential binders at this position would be expected to trap the protein in its inactive monomeric form by preventing dimerization.

PLpro

SARS-CoV-2 nsp3 is a multi-domain protein containing an N-terminal ubiquitin-like fold, a glutamic-acid-rich acidic domain, a phosphatase domain, a SARS unique domain, a catalytically active PLpro domain, a marker domain (G2M), and two transmembrane domains (Báez-Santos et al., 2015). In addition to the protease activity, PLpro also possesses deubiquitination and deISGylation activities that aid in the disruption of the interferon regulatory factor 3 pathway and host innate immune responses (Barretto et al., 2005). The central α-helical “thumb” domain of PLpro harbors the catalytic Cys112, while the C-terminal domain is mostly β-sheet and includes the “palm” domain, which harbors the other two residues of the catalytic triad, His275 and Asp287, and the “fingers”, which coordinate a Zn2+ ion (PDB: 6w9c). PLpro also has two blocking loops (BL1 and BL2), targeting of which could prevent the substrate from entering the active site. Comparison of the crystal structures of PLpro in its unbound and peptide inhibitor-bound states (PDBs: 6w9c and 6wx4, respectively) revealed conformational changes in both the “finger” domain, as well as the BL2 loop. We therefore targeted the active site in the unliganded and the bound states, which correspond to the open and closed conformations of the active site (screen IDs: 12 and 15, respectively; Figures S15 and S16). The side chain rotamer of Leu162 of structure 6wx4 was rotated to open one end of the tunnel that forms the active site. Without this modification, the end of the tunnel would not be accessible with the docking programs we used. However, the co-crystal structure 6wx4 has a ligand extending through that tunnel, thus justifying the possibility of a small molecule passing through the tunnel. In addition to this, we also targeted the accessory cleft leading to the active site, which would prevent substrate access (screen ID: 13; Figure S17). In addition to the three sites that are described here, the ubiquitin contact site in PLpro, which is critical for the deubiquitination and deISGylation functions, was also targeted and is described in detail in a later section.

Replication complex: nsp7, 8, and 12

nsp7, nsp8, and nsp12 form a minimal 160-kDa complex capable of nucleotide polymerization with additional nsps playing important roles in RNA modification (Lehmann et al., 2015; Sevajol et al., 194). nsp12 encodes for the RdRp, while nsp7 and nsp8 are cofactors that may form a hexadecameric supercomplex (Smith et al., 2013). When compared to the SARS-CoV replicase, key functional residues are found to be fully conserved in SARS-CoV-2; however, substitutions have been found on the surface of the complex. The polymerase domain of nsp12 has an N-terminal nidovirus-unique N-terminal extension in addition to the “finger”, “palm”, and “thumb” domains common to polymerases (Gao et al., 2020) (Figures S18–S22). The seven catalytic motifs found in other known viral polymerases and which are involved in template and nucleotide binding and in catalysis are also well conserved in the SARS-CoV-2 polymerase (Gao et al., 2020; Kirchdoerfer and Ward, 2019). The SARS-CoV-2 RNA synthesis machinery is mediated largely by PPIs, and this provides us with an opportunity to target these interfaces using small molecules. The C-terminal head domain of nsp8 wraps around the nsp7 helical bundle, forming a hexadecameric complex. nsp8 has been suggested to possess RdRp activity of its own and produce the primers utilized by the primer-dependent nsp12 RdRp. This nsp12 activity requires the formation of a large oligomeric complex that brings active site residues into proximity with each other (Imbert et al., 2006; te Velthuis et al., 2011). A previously determined crystal structure (PDB: 2ahm) shows nsp7 and nsp8 capable of interacting in at least two ways: (i) residues on the C-terminal end of the nsp8 shaft form a hydrophobic core with residues in helix 1 of nsp7, and (ii) helix 1 and the short helix of nsp7 interact with one of the helices of the nsp8 head domain (Figure S23). The heterodimer formed by nsp7-nsp8 interaction binds to nsp12 on the polymerase “thumb” domain facing the nucleotide triphosphate (NTP) entry channel, thus placing the nsp12 polymerase “index finger” loop between nsp7-nsp8 and the polymerase “thumb” domain (Figure S23). In addition to this interaction, a second nsp8 monomer binds to the nsp12 interface domain in the vicinity of Leu117. We targeted the nsp12 binding interface on nsp8 (screen ID: 22; Figure S24), the nsp7, binding interface on nsp8 (screen ID: 21; Figure S25), the entire alpha-helical surface of nsp7, which includes the nsp8 and nsp12 binding interfaces (screen ID: 20; Figure S23), the nsp8 binding interface on nsp12 (screen ID: 31; Figure S18), two sites in the RNA binding region on nsp12 (screen IDs: 28 and 29; Figures S19 and S21), the nucleotide binding site (screen ID: 30; Figure S20), and the nsp7 binding interface on nsp12 (screen ID: 32; Figure S22). The nsp7 binding interface on nsp12 is relatively shallow, and to a lesser degree, the RNA binding sites are as well. One of the principal challenges of targeting an RNA binding interface is the strong avidity effect, frequently in the nanomolar range, that restricts the ability of ligands to displace the RNA. Moreover, biological processes involving RNA-protein interactions are also known to often be dynamics dependent. The average binding free energies of the top 100 hits from the screens of the RNA binding interfaces of nsp12 (screen IDs: 28 and 29) were −9.9 and −9.3 kcal/mol, whereas the docking score for the nucleotide binding site (screen ID: 30) was −9.7 kcal/mol. The screens targeting the PPIs (screen IDs: 31 and 32) resulted in hits with potential for much higher affinity binding with an average binding energy of −11.1 and −10.4 kcal/mol (Figure S8).

nsp13: helicase

nsp13, which is one of the few proteins that are fully conserved in SARS-CoV and SARS-CoV-2, has known activity as an RNA helicase, NTPase, dNTPase, RTPase, and DNA helicase (Ivanov et al., 2004). It is also known to interact with nsp12 (Adedeji et al., 2012). The helicase is 603 amino acids long, contains an N-terminal zinc binding domain, a stalk domain, RecA-like domains 1A, 1B, and 2A, and forms a triangular-shaped structure (Figure S26) (Jia et al., 2019). This protein has multiple potential target sites. Of these, we chose to target the active (ATP binding) site (screen ID: 33; Figure S26) and two sites on the RNA interaction interface (Screen IDs: 34 and 35: Figures S27 and 3).

Figure 3

The helicase (nsp13) and an example compound from the top 0.0001% of screened compounds bound at site 2 of the RNA binding interface

(A) The helicase (violet) bound to an example compound (light pink) at region 2 of the RNA binding interface. A docked DNA strand is shown in light gold (screen ID: 35).

(B) Electrostatic surface of the helicase to which an example compound (light pink) and a docked DNA strand (light gold) are bound.

(D) Residues within 4 Å of the inhibitor.

(E) Distribution of the docking scores of the top 100 virtual screening hits.

The helicase (nsp13) and an example compound from the top 0.0001% of screened compounds bound at site 2 of the RNA binding interface (A) The helicase (violet) bound to an example compound (light pink) at region 2 of the RNA binding interface. A docked DNA strand is shown in light gold (screen ID: 35). (B) Electrostatic surface of the helicase to which an example compound (light pink) and a docked DNA strand (light gold) are bound. (C) An overview of the interactions between the compound and the helicase structure. (D) Residues within 4 Å of the inhibitor. (E) Distribution of the docking scores of the top 100 virtual screening hits. We made use of a homology model derived from SWISS-MODEL repository (N2ngu3, n.d.) for these screens (Bienert et al., 2016); this model is expected to be reliable because the helicase of SARS-CoV-2 shares 99.8% sequence identity with the chosen template (PDB: 6jyt, structure of SARS-CoV nsp13). In fact, the only difference between the helicases of SARS-CoV and SARS-CoV-2 is a valine to isoleucine missense mutation in the C-terminus that is distant from our targeted sites. While this work was under review, additional structures of the helicase as a monomer and in a heteroligomeric complex with nsp12 and RNA were deposited (PDB: 6zsl; Yan et al., 2020b, PDB: 7xcm, 7xcn). The backbone RMSD of the newly deposited apo structure (PDB: 6zsl) when compared to the model we used is 1.46 Å. Previously reported hydrogen-deuterium exchange experiments suggest that the 1A and 2A domains of the helicase undergo a series of conformational changes during nucleotide binding and ATP hydrolysis (Jia et al., 2019). This potentially inherent flexibility could impact the plasticity of the screened binding pockets and therefore the reliability of the model screened. Our two virtual screens for the RNA binding interface (screen IDs: 34 and 35) resulted in top 100 hits with an average binding free energy of −10.2 and −10.7 kcal/mol, respectively, and the screen for the active site inhibitors (screen ID: 33) resulted in an average binding energy of −10.2 kcal/mol.

nsp9: ssRNA-binding protein

nsp9 is known to co-localize with nsp7, nsp8, and nsp10 within the replication complex and is presumed to play a role in RNA replication (Yang et al., 2014). A study on SARS-CoV nsp9 has shown that nsp9 is an ssRNA-binding protein indispensable for viral RNA replication in mouse models (Frieman et al., 2011). There are several high-resolution crystal structures available for nsp9 (Sutton et al., 2004; Hu et al., 2017), including those of the protein from SARS-CoV-2 (PDB: 6w4b and 6w9q). Since dimerization has been found to be essential for viral replication (Miknis et al., 2009), we chose to target the dimerization interface as a method of inhibiting viral replication. nsp9 contains seven antiparallel β-strands and an α-helix (Figure S28). The dimer interface is formed mainly by the parallel association of the C-terminal α-helices (Figure S29), and this surface, in particular the GXXXG motif, was targeted using VirtualFlow (screen ID: 23). We also targeted a shallow cavity around Phe75 that partially overlaps with the first targeted site centered on the α-helix (screen ID: 24). This second site interacts with the N-terminal RNA-binding domain (NTD) of the other nsp9 monomer and is more concave in shape than the first site, making it more pocket-like. The average docking score of the top 100 compounds of the second targeted site was −9.2 kcal/mol, which is a slight improvement over that of the first site with an average of −9.1 kcal/mol (Figure S8). The C-terminal α-helix, which is responsible for dimerization, has a high hydrophobic amino acid content, and dimer formation is mainly facilitated by hydrophobic interactions. Targeting a region with these characteristics could result in hits that are also hydrophobic and that interact in a non-specific manner, something which would need to be addressed during follow-up experimental validation. The interface also lacks deep pockets, making it additionally challenging to find specific, tight binding compounds. Finally, targeting the dimerization interface is made even more difficult because it is not clear if the protein adopts different conformations between its monomeric form, which is to what an inhibitor might bind, and its dimeric form, which is the form that was used for the screen.

Disrupting the ability of coronaviruses to evade or subvert host defenses

Initial studies on SARS-CoV-2 in ex vivo systems and on clinical samples suggest that infection produces lower levels of interferon and differential cytokine/chemokine profiles, even when compared to the closely related SARS-CoV (Daniel Blanco-Melo et al., 2020; Chu et al., 2020). This suggests that SARS-CoV-2 is able to escape, at least initial, host immune surveillance, and when combined with robust early replication (Chu et al., 2020; To et al., 2020), it allows the virus to establish a strong foothold in the host, forcing the immune system play catch-up. In light of this, targeting SARS-CoV-2 immune evasion factors, particularly in combination with other inhibitors, could have clinical benefits. CoVs have an RNA genome and during their life cycle produce RNA species normally absent in the host cell's cytoplasm, such as dsRNA and RNA with a 5′-triphosphate. Sensing these types of RNA intermediates is one way in which the host cell can detect viral infection and initiate an antiviral immune program (Barbalat et al., 2011). An effective way in which CoVs avoid recognition by these intrinsic immune sensors is by using methyltransferases to add a cap structure to viral RNA, thus mimicking the cell's own mRNA (Daffis et al., November 2010). CoVs also encode an endoribonuclease, which prevents the accumulation of viral RNA that could activate dsRNA sensors in the cell (Deng et al., 2018). The deubiquitination and deISGylation activities of PLpro further interfere with the host antiviral response by antagonizing the induction of type I interferon (IFN) pathways (Clementz et al., 2010). With the goal of preventing SARS-CoV-2 from being able to evade or subvert host defense mechanisms, we targeted the following four proteins: (a) the guanine-N7 methyltransferase/ExoN (nsp10-14) complex, (b) the 2′-O-ribose methyltransferase (nsp16), (c) the uridylate-specific endoribonuclease (NendoU/nsp15), and (d) the deubiquitination and deISGylation activities of PLpro.

Guanine-N7 methyltransferase/ExoN (nsp10-14 complex)

nsp14 is critical for viral replication and transcription, as it plays dual roles in proofreading and mRNA capping (Eckerle et al., 2007; Denison et al., 2011; Chen et al., 2009). Disruption of nsp14 exonuclease activity (Minskaia et al., 2006) has been shown to result in increased sensitivity of the virus to ribavirin (RBV) and 5-fluorouracil (Smith et al., 2013), demonstrating the importance of this 3′-mismatched dsRNA excision activity (Bouvet et al., 2012). The ExoN activity relies on heterodimer formation of nsp14 with nsp10, and disruption of this interaction has been demonstrated to result in lowered replication fidelity (Smith et al., 2015). Unlike many other RNA viruses, CoVs have a low mutation rate by virtue of this ExoN activity, which supported the expansion of CoV genomes (Ogando et al., 2019) and is directly linked to its virulence (Graham et al., 2012). Previous work with SARS-CoV has suggested that the limited efficacy of RBV in vivo may be due to excision by this ExoN activity (Ferron et al., 2018), and work in mouse hepatitis virus (MHV) has shown that a mutant strain lacking ExoN activity was more sensitive to remdesivir (Agostini et al., 2018). These data suggest that use of a nucleoside analog might be more effective if combined with an ExoN inhibitor (Ogando et al., 2019). nsp14 also functions as an S-adenosyl-methionine (SAM)-dependent guanine-N7 methyltransferase (N7-MTase) (Chen et al., 2009). Mutation studies in a replicon system have shown that this N7-MTase activity is critical for viral replication and transcription (Chen et al., 2009). This activity is used to add a cap 0 structure at the 5′ end of viral mRNA, thus mimicking a defining structural feature of the host mRNAs and assisting translation (Joseph et al., 1997), as well as evasion of host defenses (Decroly et al., 2011a). For nsp14, we targeted three sites: (i) the active site of the ExoN domain (screen ID: 37; Figure S30), (ii) the active site of the methyltransferase domain (screen ID: 38; Figure S31), and (iii) the PPI interface with nsp10 (screen ID: 36; Figure S32). In addition to this, we also targeted the nsp14 interaction interface on nsp10 (screen ID: 27; Figure S33). In the bimodular nsp14, the N-terminal domain of nsp14 up to Lys288 interacts solely with nsp10 and forms the ExoN domain, whereas the C-terminal domain possesses the N7-MTase activity. The ExoN domain is composed of a twisted parallel β-sheet made up of five β-strands bordered by α-helices on both sides (Figures S30–32) (Ma et al., 2015). The N7-MTase domain of nsp14 is made up of five β-strands, and the ligand binding pocket is situated between the β1 and β2 sheets (Ma et al., 2015). At the time of screening, no experimental structures were yet available for the SARS-CoV-2 nsp10-14 complex, so we used a high confidence homology model available in the SWISS MODEL repository (Bienert et al., 2016) constructed from the SARS-CoV ExoN/N7-MTase structure (PDB: 5c8s) (7bgwuu, n.d.) for our docking experiments. While nsp10 is 97% conserved between the two viruses, nsp14 has a slightly lower 95.1% identity. The nsp10 interaction site on nsp14 has a fairly hydrophobic shallow pocket, resulting in relatively high docking scores for the top scoring compounds. However, due to the sparsity of polar atoms at this site, it is challenging to find compounds that form sufficient electrostatic interactions to provide the compounds with specificity. By comparison, the ExoN active site is more polar in nature, forming a wide shallow pocket. The polarity should lead to a larger proportion of hits that bind with high specificity, but the absence of a deep pocket could affect the binding affinity. The third target site, the active site of the N7-MTase domain of nsp14, also harbors a relatively large hydrophobic cavity but still has polar elements. The suitability of each of these nsp14 pockets to accommodate ligands is reflected in their mean docking scores (for the top 100 hits for each target site): −10.4 kcal/mol for the PPI site, −13.4 kcal/mol for the N7-MTase active site, and −9.8 kcal/mol for the ExoN active site.

2′-O-ribose methyltransferase (nsp16)

RNA cap modifications are known to play a role in the host cell's identification of self-RNA. For example, foreign RNA which lacks 2′-O methylation is inhibited by IFIT1 (Hyde et al., 2014). Pathogenic viruses such as CoVs that replicate in the cytoplasm have developed tactics to escape recognition by the host innate immune system. One such coronaviral mechanism is 2′-O-methyl capping of viral RNA by the nsp10-nsp16 complex. nsp16 is an m7GpppA-specific, SAM-dependent, 2′-O-MTase that must form a complex with nsp10 for its function (Decroly et al., 2011b; Chen and Guo, 2016). We therefore targeted the nsp10 binding interface on nsp16 (Figure S34). nsp16 has a catalytic core comprising a β-sheet flanked by eleven α-helices (Figures S34 and S35). nsp10 has a small antiparallel β-sheet sandwiched between several α-helices and a large loop. The interaction surface of nsp10 with nsp16 is a mixture of hydrophobic, polar, and positively charged residues and helps in the stabilization of the SAM binding site (PDB: 6w75). However, since the interface is relatively flat and polar, it is challenging to find small molecules that bind with sufficient strength. Earlier structural studies showed that stabilization of the SAM binding pocket by the nsp10-16 interaction also expands the RNA-binding pocket of nsp16 (Chen et al., 2011), and short peptides derived from the interaction interface have been shown to inhibit 2′-O-methyltransferase activity of the nsp10-16 complex (Min et al., 2012). Therefore, in addition to targeting the interaction interface of nsp10 and nsp16 (screen ID: 40, Figure S34), we also targeted the putative SAM binding site (screen ID: 41, Figure S35). SAM binding has been shown to be essential for nsp10-16 complex formation and catalytic function in another highly pathogenic CoV, MERS-CoV (Aouadi et al., 2016). The screen targeting the nsp10 PPI and the SAM binding site resulted in top 100 hits with average docking scores of −9.6 kcal/mol and −11.4 kcal/mol, respectively. Besides these target sites on nsp16, we targeted the nsp16 interaction interface on nsp10 using two conformations of the protein (screen IDs: 25 and 26; Figures S36 and S37).

Uridylate-specific endoribonuclease

nsp15 is a uridylate-specific endoribonuclease (NendoU) carrying a C-terminal catalytic EndoU domain that has been described as having different roles in immune evasion in different coronviruses. A recent study in MHV, a murine CoV, showed that nsp15 was critical for evasion from host dsRNA sensors in macrophages (Deng et al., 2017), and in porcine deltacoronavirus, nsp15 was found to inhibit IFN-β production (Liu et al., 2019). The SARS-CoV-2 nsp15 monomer is composed of three distinct domains: (a) an N-terminal domain containing an antiparallel β-sheet and two α-helices, (b) a central domain composed of 10 β-strands and three short helices, and (c) a C-terminal NendoU domain composed of two antiparallel β-sheets and five α-helices (PDB: 6vww) (Kim et al., 2020). The SARS-CoV-2 nsp15 monomer is capable of forming dimeric, trimeric, and double-ring hexameric assemblies. It has been shown that the hexameric form is essential for enzymatic activity and that the hexamer is stabilized by interactions of the N-terminal oligomerization domain (Guarino et al., 2005). The NendoU active site itself is located in a groove between the two β-sheets, and conserved, crucial residues have been previously identified: His235, His250, Lys290, Thr341, Tyr343, and Ser294. With the aim of disrupting the endoribonuclease activity of nsp15, the active site was targeted in an in silico screen (screen ID: 39; Figure S38), and this resulted in a top 100 hits with an average docking score of −10.9 kcal/mol (Figure S8), which theoretically corresponds to nanomolar inhibitors; however, experimental validation is necessary to confirm this.

Deubiquitination and deISGylation activities of PLpro

PLpro (nsp3) cleaves at a consensus cleavage motif, LXGG, which is also the consensus sequence recognized by cellular deubiquitinating enzymes (DUBs). CoVs attenuate the host anti-viral response by deubiquinating key components of the interferon-mediated immune response. While PLpro uses its catalytic site for its deubiquitination function, ubiquitin itself makes contacts with the “palm” and “fingers” regions of PLpro (Ratia et al., 2014). This protein-protein interface is well conserved in SARS-CoV-2 and could be targeted in order to inhibit the deubiquitination function of nsp3. SARS-CoV has also been shown to possess deISGylation activity in which ISG-15, a protein modifier consisting of two tandem ubiquitin-like domains involved in modulating the innate immune response, is cleaved (Lindner et al., 2007). We therefore targeted the proximal “S1” ubiquitin contact site of the “palm” of nsp3 (Figure S39), inhibition of which could effectively block both the deubiquitination, as well as deISGylation, activities of PLpro (screen ID: 14). The binding interface is moderately hydrophobic and concave, making it an attractive drug target. This particular screen resulted in the top 100 ranking compounds exhibiting an average docking score of −9.2 kcal/mol (Figure S8).

nsp3 macrodomain

The macrodomain, sometimes called the X domain, is a highly conserved region of ∼180 amino acids that binds to ADP ribose (Karras et al., 2005) and which has been shown to be dispensable in RNA replication, at least in the context of an RNA replicon (Kusov et al., 2015). However, it has also been shown to have possible roles in evading the host innate immune response (Kuri et al., 2011; Eriksson et al., 2008). Viral macrodomains are known to have two major enzymatic activities: (i) ADP-ribose 1″-phosphatase activity where the phosphate is removed from ADP-ribose 1″-phosphate, and (ii) ADP-ribose hydrolase activity where ADP ribose is removed from mono- or poly-ADP-ribosylated proteins (Li et al., 2016). Previous biochemical studies showed that the nsp3 macrodomain is responsible for processing of ADP-ribose 1″-phosphate, which is a byproduct of pre-tRNA splicing (Snijder et al., 2003; Putics et al., 2005). This phosphatase activity has been found to be specific for ADP-ribose 1″-phosphate; however, the turnover was found to be very low (Putics et al., 2006; Singh Saikatendu et al., Joseph, 2005) with a k of 1.7–20 min−1, making it unlikely that this activity has a strong direct effect on viral virulence. ADP-ribosylation is a critical post-translational modification catalyzed mainly by poly (ADP-ribose) polymerases (PARPs) in which ADP-ribose is added to a protein (Michael, 2015). Several PARPs are known to be interferon-stimulated genes, and some PARPs have been shown to have antiviral activities. Studies in SARS-CoV and MERS-CoV have shown that these viruses were less virulent in the absence of the macrodomain and that this directly associated with changes in pro-inflammatory cytokine expression (Fehr et al., 2016; Fehr et al., 2014). Additional studies in HCoV-299E and SARS-CoV suggested that the nsp3 macrodomain may mediate the resistance to antiviral interferon responses (Kuri et al., 2011). Mutation of the macrodomain in SARS-CoV also induced a strong interferon and inflammatory cytokine response in the lungs of infected mice (Fehr et al., 2016). However, the exact mechanism linking the hydrolase/phosphatase activity of nsp3 and the observed cytokine production is unknown. The fact that (a) the macrodomain participates in reversing antiviral ADP ribosylation (Li et al., 2016; Fehr et al., July 2018) and (b) mutation of a highly conserved asparagine residue abolishes phosphatase and hydrolase activites and mitigates viral virulence (Fehr et al., 2016; McPherson et al., 2017) suggests that targeting the ADP-ribose binding pocket could be an effective therapeutic strategy, particularly in combination with other targets. We observed two different conformations for the active site in the same X-ray structure, which indicated protein dynamics that could affect the ability of the docking to accurately recapitulate the solution-state structure. We, therefore, targeted both conformations of the active site of the macrodomain: one in which the active site is more closed (screen ID: 10; Figure 4) and a second conformation in which the active site is more open (screen ID: 11; Figure S40). It is also important to note that the active site itself is very hydrophobic, which increases the risk that the resulting hits could bind non-specifically to membranes and hydrophobic pockets in other proteins, and this should be accounted for in any follow-up screening. The top 100 ranking compounds from screens 10 and 11 have average docking scores of −10.8 kcal/mol and −12.7 kcal/mol, respectively (Figure S8).

Figure 4

The phosphatase (closed conformation) and an example compound from the top 0.0001% of screened compounds bound at the enzymatic active site

(A) The phosphatase (violet), which is part of nsp3, and an example compound (light pink) from the virtual screen bound to the active site (closed conformation) (screen ID: 10).

(B) Electrostatic surface of the target protein to which an example compound (light pink) is bound.

(D) Residues within 4 Å of the inhibitor.

(E) Distribution of the docking scores of the top 100 virtual screening hits.

The phosphatase (closed conformation) and an example compound from the top 0.0001% of screened compounds bound at the enzymatic active site (A) The phosphatase (violet), which is part of nsp3, and an example compound (light pink) from the virtual screen bound to the active site (closed conformation) (screen ID: 10). (B) Electrostatic surface of the target protein to which an example compound (light pink) is bound. (C) An overview of the interactions between the inhibitor and the phosphatase structure. (D) Residues within 4 Å of the inhibitor. (E) Distribution of the docking scores of the top 100 virtual screening hits.

ORF7a

ORF7a is a non-essential accessory protein with a transmembrane helix at the C-terminus that has been shown to localize to the ER, Golgi, and cell surface (Nelson et al., 2005; Taylor et al., 2015). The assembly of ORF7a into viral particles suggests that the protein is important in the viral replication cycle and that it might have a function early in infection (Huang et al., 2006a). In SARS-CoV, ORF7a is known to interact with a “host restriction factor”, bone marrow stromal antigen 2 (BST-2) (Taylor et al., 2015), which inhibits viral replication by preventing virus budding from the plasma membrane (Sauter et al., 2010). ORF7a has been shown to localize to the ER when co-expressed with BST-2. Moreover, ORF7a interferes with the glycosylation of BST-2, which is essential for its tethering/restriction function (Taylor et al., 2015). There are also indications that ORF7a may play a role in virus-induced apoptosis (Schaecher et al., 2007); however, further studies are needed to understand the mechanism of this role. Structurally, ORF7a has an N-terminal signal peptide, an 80-amino-acid-long luminal domain, a transmembrane domain, and a short C-terminal cytoplasmic tail (Nelson et al., 2005). In the recent experimental structure of the SARS-CoV-2 ORF7a (PDB: 6w37 (Nelson et al., 2020)), the luminal domain displays structural characteristics similar to those of previously published ORF7a structures and is composed of a seven-stranded β-sandwich (Figure S41) with topological similarities to the Ig superfamily. Intriguingly, recent genomic analysis of a sample picked up during sentinel surveillance in the state of Arizona (USA) revealed a 27-amino-acid in-frame deletion in ORF7a, in a region corresponding to the N-terminal signal peptide (Holland et al., 2020). The functional consequences of this deletion, however, are not known. Since the molecular mechanism of the ORF7a interaction is largely unknown, we pursued a blind docking strategy for this protein, covering the entire protein structure (screen ID: 9; Figure S41). Interestingly, almost all top scoring compounds bound at a shallow cavity around Phe46, while a few bound to the opposite side near Leu31. During experimental validation, it would need to be verified whether these hits can inhibit the interaction of ORF7a with small glutamine-rich tetratricopeptide repeat-containing protein (SGT) (Fielding et al., 2006) or lymphocyte-associated antigen 1 (LFA-1) (Hänel and Willbold, 2007), both of which are thought to play a role in virus-host interactions and host cell cycle modulation. The presence of ORF7a on the viral surface has been suggested to mean that LFA-1 could be a potential viral attachment factor in leukocytes (Huang et al., 2006a; Hänel and Willbold, 2007); however currently, insufficient data are available about the possible residues and structural dynamics that would be involved in this interaction. Moreover, the rather flat surface of ORF7a, devoid of pockets or clefts, makes it challenging to find small molecules with selective and potent binding.

Disrupting the viral assembly and packaging

As the viral life cycle progresses, the structural proteins (S, E, M) are also expressed. These structural proteins are initially inserted into the ER but then transit to the ER-Golgi intermediate compartment, where they are incorporated into mature virions (de Haan and Rottier, 2005). Nucleocapsid protein (N) forms large oligomeric complexes with the replicated viral genome. The resulting ribonucleoprotein (RNP) complexes associate with M, facilitating packaging of the genome into a complete virion assembly. With limited structural information available about M and E, our efforts to find compounds that can disrupt the viral assembly and packaging were centered on the N.

Nucleocapsid protein

N packages the viral RNA into a helically ordered RNP (Stohlman et al., 1988) and plays a critical role in virion assembly by interacting with M. N is also known to play key roles in the regulation of viral RNA synthesis and modification of the host cell metabolism (Cong et al., 2019). N is composed of an NTD (Figures S42 and S43), an intrinsically disordered SR-rich linker, and a C-terminal dimerization domain (CTD, Figures S44 and S45). The published crystal structure of the RNA-binding domain reveals a β-sheet core sandwiched between two loops. The β-sheet core has five antiparallel β-strands plus a short helix and a longer β-hairpin between strands β2 and β5 (PDB: 6m3m) (Kang et al., 2020). The central β-strands of the β-sheet core contain a purine/pyrimidine monophosphate binding site, and this ribonucleotide binding mechanism is essential for virulence, making it an attractive drug docking site (Lin et al., 2014). The NTD is also expected to interact with the CTD during RNA packaging. We have therefore targeted two minimally overlapping regions of the NTD, which in combination cover the entire surface of the NTD (screen IDs: 42 and 43) (Figures S42 and S43). The first NTD site includes the ribonucleotide-binding site reported in (Lin et al., 2014). The CTD of N aids dimerization of the protein, as well as RNA binding. The monomer consists of eight α-helices and a β-hairpin. The dimerization interface itself is composed of three helices (α5, α6, and α7) and one β-hairpin from each monomer and involves a combination of hydrogen bonds and hydrophobic interactions, which result in a very stable dimer (PDB: 6wji). The fact that the C-terminal domain has also been found to bind RNA (Hsieh et al., 2005) highlights the critical role of this domain in the overall helical packaging of the viral genome (Chen et al., 2007). We targeted the dimerization interface of the CTD by removing one monomer from the crystallized dimer structure (screen ID: 44) (Figure S44), and we also screened the dimerization interface via a blind docking procedure against the entire dimer surface (screen ID: 45) (Figure S45). The goal of these screens was to identify a small molecule that will bind to the monomer at the dimer interface and disrupt the formation of the nucleocapsid. However, the structure of the monomer alone has not yet been solved, and given the extent of the interaction at the dimer interface, we anticipate considerable dynamics in the monomer. Therefore, the structure extracted from the dimer may not adequately describe the monomer structure, yielding imperfect hits. Another challenge of this target is that each virion has ∼65 spike trimers and with a ratio of S trimer to N of 1:4 (Neuman et al., 2006; Beniac et al., 2006), this amounts to ∼260 monomers or ∼130 dimer interfaces. The abundance of contacts that would need to be disrupted means that a higher concentration of the inhibitor would be required for efficacy. Despite these reservations, it is worth noting that the HIV nucleocapsid inhibitor lenacapvir (GS-6207/GS-CA1), which targets the dimerization interface of the capsid protein has shown significant efficacy and has made it to clinical trials recently (Yant et al., 2019; Daar et al., 2019; Sager et al., 2019). The screens targeting the RNA-binding site (screen ID: 42) and the N-terminal oligomerization site (screen ID: 43) resulted in hits with an average docking score of −10.3 kcal/mol and −9.78 kcal/mol, respectively. The screens targeting the C-terminal dimerization interface (screen ID: 44) and the oligomerization site (screen ID: 45) resulted in average docking scores of −12.7 and −10.4. kcal/mol, respectively (Figure S8).

Discussion

We screened against 17 proteins, 15 proteins of SARS-CoV-2 and two human proteins, targeting a total of 40 sites across these proteins in 45 screens with approximately 1.1 billion molecules per screen. Due to the unprecedented global situation, we wanted to make the screening results available to researchers prior to experimental validation in the hope that others may also be able to take advantage of our large multi-target data set. The average docking score of the top 100 filtered hits as ranked by the docking score for each target site can be seen in Figure S8. The top 1000 hits per virtual screen, along with additional information such as the docking scores, molecular properties, and docking poses, can be found online at https://vf4covid19.hms.harvard.edu/. In addition, the top million hits for each target site are available for download from the same website in DataWarrior format and from Mendeley Data (see the Resource Availability section for more details). As discussed earlier, not all the functional surfaces that were targeted had similar potential to accommodate a small molecule. The targeted site that yielded the weakest virtual hit compounds was the first (site 1) of the two targeted sites located on the spike interaction interface of ACE2. The reason for the relatively weak predicted binding affinities of the top scoring compounds is the flat and polar nature of this target site. These characteristics make it very challenging for small molecules to bind sufficiently well to the surface so as to disrupt the interaction with spike protein. The targeted site that yielded the virtual hits with the highest predicted binding affinities was the active site of the N7-MT domain of nsp14. The high affinities are likely a product of the remarkably deep, buried pocket of this active site and the relatively high number of hydrophobic sites within that pocket. If we consider the top 1000 filtered hits, the weakest screen (ACE2, screen ID: 2) had a mean docking score of −7.6 kcal/mol (2.3 μM) while nsp14 (screen ID: 37) had the best, with a mean docking score of −12.9 kcal/mol (0.3 nM). Although the trend was the same, the mean docking scores were better for the top 100 hits: −8.2 kcal/mol (1.1 μM) for ACE2 (screen ID: 2) and −13.4 kcal/mol (0.14 nM) for nsp14 (screen ID: 37). Finally, as expected, the mean docking scores were even better for the top 10 hits: −8.5 kcal/mol (0.5 μM) for ACE2 (screen ID: 2) and −14 kcal/mol (0.03 nM) for nsp14 (screen ID: 37). The mean docking scores of the top 100 hits for each screen are shown in Figure S8. Out of the 45 virtual screens, 41 had a mean docking score for the top 1000 compounds higher than −8.2 kcal/mol. These scores indicate that sub-micromolar binders should be identifiable from the hits for each target site. The hits derived from the rigid docking procedures described here could be further improved for potency by a second-stage screen, in which the target proteins are allowed to be flexible. This could be achieved either by ensemble docking or allowing flexible side chains, for example, by using GWOVina (Wong et al., 2020; Gorgulla et al., 2020b). Docking results could also be improved in second-stage screens by using alternative docking programs, such as AutoDock Vina, Smina, or Gnina (Trott and Olson, 2010; Koes et al., 2013; Ragoza et al., 2017). For special types of ligands such as compounds containing carbohydrates or halogens, specialized docking programs such as VinaXB or Vina-Carb could be utilized to further increase the accuracy of the results (Koebel et al., 2016; Nivedha et al., 2016). Small-molecule drug candidates that have been previously pre-approved by the FDA or other regulatory bodies for other indications and molecules that have been vetted for use in humans including preclinical candidates, neutraceuticals, metabolites, and select natural products are particularly desirable hits in any inhibitor screen (Kiplin Guy et al., 2020). This is because these compounds can be fast-tracked to clinical trials on the basis of previous data. Here, we mined for such compounds in the top 1% of our hits from the “in stock” compounds of the ZINC library. We found 161 drugs (with 491 occurrences across all the screens) that are “world approved” in our top 1%. We further filtered these drug candidates to only those with a docking score better than −8 kcal/mol and show these 80 approved drugs along with their screening target and docking score in Table S46. Of these 80 drugs, 16 of them are being considered in clinical trials for COVID-19. The complete list of “world approved” hits along with their binding targets and docking scores is available at https://vf4covid19.hms.harvard.edu/world-approved-drugs. Several hits from the “world approved” drugs are from the steroid family. Corticosteroids such as betamethasone, ciclesonide, flumetasone, and meprednisone acetate are already being investigated as COVID-19 treatments (Recovery Collaborative Group et al., 2020; Iwabuchi et al., 2020; Wang et al., 2020b). For example, betamethasone was identified as a hit for nsp14, the bifunctional enzyme that acts as a methyltransferase and an exoribonuclease. Betamethasone bound in the simulation to the active site of the exoribonuclease of nsp14 (Figure S47). While the usefulness of corticosteroids is largely attributed to their anti-inflammatory effects in the host, our screening results suggest that additional mechanisms of action that involve viral proteins may be possible. Other steroid hormones in clinical trial such as estradiol or dutasteride also came up as hits in our screens. Of the non-steroid approved drug hits, tyrosine kinase inhibitors occupy a significant fraction. Drugs such as imatinib (Glivec) are being tested in clinical trials due to their reported in vitro efficacy against SARS-CoV and MERS-CoV (Coleman et al., 2016; Dyall et al., 2014); however, their mode of action is not yet fully understood. In our screens, imatinib binds to two targets: (i) nsp14, at the active site of the methyltransferase, and (ii) the RNA-binding interface of the helicase, nsp13 (Figure S47). Of the 3897 “investigational” drugs in the ZINC 15 database, 137 were hits (500 occurrences), and of the 101378 “in-man” compounds, 401 were hits (1061 occurrences). A full list of these molecules, along with their respective target sites and docking scores, is available at (https://vf4covid19.hms.harvard.edu/investigational-drugs). Multiple occurrences of a hit molecule partly stem from the different docking scenarios that we used for the same target site. In addition, some of the compounds may be represented more than once in the list because of differences in tautomeric, isomeric, or protonation states, any of which earns the molecule a unique ID in the ZINC database. Despite the abundance of approved hits, some of which are already involved in or related to drugs in clinical trials, experimental confirmation of the specificity of hits from our screen against SARS-CoV-2 proteins is still needed to determine an antiviral mechanism of action.

Comparison to previously identified inhibitors

SARS-CoV-2 is not the first CoV to cause serious disease in the human population, and these other CoVs (SARS-CoV and MERS-CoV) have also been the subject of research into small-molecule inhibitors. There have been several efforts in the past, both experimental and computational, to target specific viral proteins from these viruses. The proteins that were most often targeted in these efforts were the proteases, MPro (Jin et al., 2020a; Park et al., 2012; Wang et al., 2017; Yang et al., 2005) and PLPro (Park et al., 2012; Ghosh et al., 2010; Chou et al., 2008; Cheng et al., 2015), the polymerase (Agostini et al., 2019; Sheahan et al., 2017), the spike protein (Xia et al., 2019; Liao et al., 2015; Kao et al., 2004), the endoribonuclease (Ortiz-Alcantara et al., 2010), and the helicase (Lee et al., 2017; Cho et al., 2015; Alnazawi et al., 2017). Although direct binding affinities were not available for most of the molecules, the inhibitory potential, IC50, or the effective potential, EC50, were reported. In order to determine if previously published SARS-CoV and MERS-CoV hits for which an exact binding site had been reported were also top hits against SARS-CoV-2, we calculated docking scores using the same computational methods used in the screens described here, allowing direct comparison with our hits. One of the best inhibitors for SARS-CoV MPro is an asymmetric aromatic disulfide molecule with an IC50 of 0.52 ± 0.06 μM (Wang et al., 2017). When we docked this molecule against SARS-CoV-2 Mpro, we got a docking score of −7.7 kcal/mol , whereas the top 100 hits from two of our recent screens described here, which targeted the active site of Mpro (screen IDs: 16 and 17), resulted in mean docking scores of −10.8 and −11.4 kcal/mol (Figure S8). While promising, this apparent improvement in binding affinity must still be validated experimentally. Similarly, in the case of PLpro, the best previously described active site inhibitor (Ghosh et al., 2010) had a binding free energy of −7.2 kcal/mol when re-screened, while the top 100 compounds from the two corresponding VirtualFlow screens (screen IDs: 12 and 15) have mean binding scores of −8.9 and −10.0 kcal/mol, respectively (Figure S8). Lee, et al. described a novel helicase inhibitor that can block both the dSDNA unwinding and ATP hydrolysis activities of the SARS-CoV helicase (Lee et al., 2017). Docking of this molecule to the SARS-CoV-2 helicase gave a score of −7.9 kcal/mol. We also targeted the active site of the SARS-CoV-2 helicase (screen ID: 33) in our own study and that screen resulted in a top 100 hits with a mean docking score of −10.2 kcal/mol. There have been multiple attempts to develop small molecule inhibitors that target the active site of the RdRp, considered a promising target, from both SARS-CoV and MERS-CoV (Agostini et al., 2019; Sheahan et al., June 2017). Docking of the best molecules from these previously published studies against the SARS-CoV-2 RdRp resulted in docking scores of −6.1 and −6.8 kcal/mol, while our screens (screen IDs: 28 and 29) resulted in compounds with mean docking scores of −11.1 and −9.7 kcal/mol, respectively, for the top 100 hits. It must be noted that, as stated before, the comparisons provided here are based on docking scores, and therefore any apparent improvements in binding still need to be experimentally validated. Multiple attempts have already been made to employ virtual screening of varying library sizes to identify approved drugs, as well as discover new molecules, that can effectively inhibit SARS-CoV-2 viral infection (Jin et al., 2020a; Strodel et al., 2020; Zhou et al., 2020c; Naik et al., 2020; Fischer et al., 2020; Ton et al., 2020; Panda et al., 2020; Yu et al., 2020; Singh and Florez, 2020; Kadioglu et al., 2020), with more than one screen is focused solely on Mpro (Strodel et al., 2020; Fischer et al., 2020; Yu et al., 2020; Singh and Florez, 2020). It is important to note that in the case of Mpro, the conformational plasticity of the active site means that screening against a single conformation, as is done in these studies, may be less effective at identifying functional hits. One work of note is that by Ton, et al., in which 1.3 billion compounds from the ZINC 15 library were screened against Mpro using a Deep Docking algorithm (Ton et al., 2020). While Deep Docking significantly accelerates the screening of ultra-large compound libraries, it relies heavily on using the docking scores of a small random subset of compounds to eliminate a large fraction of the compounds early in the docking workflow, and this can be detrimental to finding an absolute docking score for individual compounds. A recent mass-spectrometry-based study has identified several host proteins that interact with SARS-CoV-2 viral proteins, leading to the identification of multiple druggable host targets and a series of known modulating compounds for these targets that were then shown to have antiviral activity in cell-culture-based assays (Gordon et al., 2020). The efficacy of these already approved host-directed compounds lends support to the development of these or similar drugs as therapeutic interventions for SARS-CoV-2 infection. Coincidentally, some of the compounds from this mass spectrometry study were also identified as hits in our screens against diverse viral proteins. For example, “nafamostat”, an approved anticoagulant identified in Gordon, et al. as having a possible impact on cell entry, was also identified in our screens as a hit with good scores against three viral proteins: ORF7a, PLpro, and nucleoprotein. This compound is already undergoing clinical trials for COVID-19 (Ref:NCT04352400, NCT04418128). Similarly, “progesterone”, a common birth control drug that is also known to have anti-inflammatory properties, was identified in the proteomics study as a ligand with antiviral activity that targets sigma-1 or sigma-2 receptors. In our screens, this compound was identified as an inhibitor for nsp16, nsp7, and spike. Progesterone is also in clinical trials for the treatment of COVID-19 (NCT04365127). Silmitasertib (CX-4945), an inhibitor of casein kinase II, was identified both in the proteomics study as well as in another recent study that mapped the host phosphorylation landscape in the context of SARS-CoV-2 infection (Gordon et al., 2020; Bouhaddou et al., 2020). Silmitasertib was originally developed to be used in combination with chemotherapy. In our screen, silmitasertib showed potential binding to nsp16. If some of these host-protein-targeted compounds can also directly inhibit some viral proteins, as suggested by the docking scores derived from our screens, this could form the basis for a unique regimen of multi-target drugs to treat COVID-19 that acts on both viral as well as host proteins simultaneously. A list of compounds that are common to both the interactome study and our virtual screens, along with their potential targets, is provided in Table S48. To verify if the docking scenarios and procedures that were used in this study worked in principle, we carried out two types of evaluation studies with compounds that were previously identified as inhibitors and/or binders. These verification studies should be interpreted on a qualitative level as they are not able to recapitulate the unique advantages of the ultra-large scale of the screens presented here. First, we completed re-dockings of ligand-protein complexes for the macrodomain, Mpro, PLpro, nsp15, and nsp16. Here, the similarity of the binding geometry of the ligand, derived from re-dock, is compared with that of the available high-resolution experimental structure for the target protein and ligand. Approximately, 35% of the 29 compounds that we re-docked have an RMSD below 2.5 Å when compared to the experimental structure. As a relative benchmark of our performance, in the original AutoDock Vina publication, an average re-docking success rate of around 50% was reported when using rigid receptors (Trott and Olson, 2010). However, the docking accuracy in the aforementioned publication was set to a much higher value (i.e., exhaustiveness parameter set to 8), while we used QuickVina 2 in its fastest setting (exhaustiveness of 1) in our study due to the computational costs of screening a billion compounds per target. The key idea of our approach is to use the scale of the screen to increase the quality of the resulting top scoring compounds (Lyu et al., 2019; Gorgulla et al., 2020a; Gorgulla, 2018). One of the successfully re-docked compounds for nsp16 can be found in Figure S49. In addition, we carried out enrichment studies for Mpro and PLpro, as these were the only two targets with sufficient non-covalent binders reported. Enrichment studies assess the ability of the docking routines to re-identify and enrich previously identified binders when mixed with a pool of random compounds or decoys. Two different flavors of enrichment studies were performed: (i) using experimentally identified molecules and (ii) using the virtual hits identified in this study. The Directory of Useful Decoys, Enhanced (DUD-E) was used for generating decoy molecules based on the active compounds (Huang et al., 2006b; Mysinger et al., 2012). The complete details of these studies can be found in the Supplemental Information under "Enrichment and re-docking verification studies". The experimentally active compounds that were used in the enrichment studies are listed in Table S50, and the virtual hits that were used as active compounds are listed in Table S51. The enrichment factors at 2%, EF2, that were obtained for the experimental binders are 20 and 25 for Mpro and PLpro, respectively. The maximum possible EF2 is 50 for each of the two target proteins. The results obtained here are typical of similar docking procedures (Huang et al., 2006b; Mysinger et al., 2012). When the enrichment studies were performed with ten virtual hits, the enrichment factors at 2% are equal to the theoretical maximum value of 50, which reflects the effect of ultra-large-scale screening. Our ultra-large library size and the multi-pronged screening approach we present here is not only unprecedented in terms of scale, significantly enhancing the chances of finding an effective antiviral, but also provides a pool of potential antiviral compounds that could be developed further as pan-CoV drugs, as well as combination therapies. Combination regimens of reverse transcriptase inhibitors and protease inhibitors have been found to be effective as anti-retroviral therapy for HIV and have led to a dramatic improvement in HIV-infected patient health and survival worldwide (Vella et al., 2012; Kuritzkes et al., April 1999; Palella et al., 1998). Clinical studies have also shown that in cases of severe influenza, a combination of multiple antiviral drugs is much more effective than a monotherapy (Dunning et al., 2014; Hung et al., 2017). Therefore, it could also be beneficial to develop a combination therapy to treat severely affected patients with COVID-19. Considering that vaccine deployment will take time and that the long-term efficacy of any SARS-CoV-2 vaccines is still unknown, combination therapeutic strategies for the control and treatment of CoV infections should continue to be developed in parallel. The mutating nature of CoVs and the potential for the emergence of novel CoVs in the future make the development of broad spectrum antivirals desirable. For a protein to qualify as a candidate target for pan-CoV drug development, it should be (i) critical for the viral replication or virulence, and (ii) highly conserved across all known CoVs and considered likely to be conserved in emerging CoVs. While the structural proteins (S, N, M, and E) are not as well conserved, the accessory proteins are conserved to an even lesser extent across different CoV species (ProViz (Peter et al., 2016)). Highly conserved non-structural proteins and conserved host proteins would therefore be ideal targets for the development of broad spectrum CoV drugs (Allison and Bavari, 2019). While studies (Benvenuto et al., 2020) have suggested that the S and N genes of SARS-CoV-2 are undergoing episodic selection during human transmission, recent analyses of genome samples from across the world suggest that the greatest genetic diversity is occurring in S, ORF3a, and ORF8 (James, 2020). Structural genomic analyses across multiple virus classes, including SARS-CoV-2, have shown that intra-viral interactions are more conserved than viral host interactions (Srinivasan et al., 2020; Warren et al., 2013; Wang et al., 2016; Voitenko et al., 2016). The PPI interfaces involved in the heteromeric RNA polymerase complex (composed of nsp7, nsp8, and nsp12) are evolutionarily conserved (Srinivasan et al., 2020), and our top hits against these crucial PPIs represent promising candidates for the development of pan-CoV inhibitors. Similarly, structural genomic analysis of the methyltransferase complexes (nsp10-nsp14 and nsp10-nsp16) also reveals PPIs that are fully conserved with mutations limited to the surface (Srinivasan et al., 2020). Moreover, the substrate binding regions we used for PLpro, Mpro, nsp14, nsp16, and the RNA binding site of the N have all been shown to be highly conserved (Voitenko et al., 2016), and the correspondingly identified small molecules could also potentially be developed as pan-CoV therapeutics. Since most human CoVs differ in their use of host cell receptors and priming proteases (Millet and Whittaker, 2015), identifying a cell-based assay or functional animal model for developing pan-CoV therapeutics could be challenging. The use of normal human bronchial epithelia cells as a potential universal screening platform for multiple classes of CoVs shows promise (Pyrc et al., 2010; Chan et al., 2014; Sheahan et al., 2020), and the general need for high-throughput ex vivo testing models will hopefully drive development in this area. It is important to understand that the work described here is only the first step in the drug discovery process. Virtual screening seeks to use structure and modeling-guided approaches to identify higher quality starting hits. Hits from these screens still need to be experimentally validated for binding, inhibition, and activity in the context of viral infection. Some of these targets, which affect viral immune evasion rather than replication, may need more complex evaluation in cell culture or in animal models to determine efficacy. While our own experimental validation of hits from these screens has begun, we believe that making these results available to a broader audience for experimentation is important in light of the current pandemic. The alarming global spread of SARS-CoV-2 and the dearth of effective prophylaxis or treatment methods have resulted in a pressing need for accelerated drug discovery. Here, we make use of the recent chemical space revolution that resulted in ultra-large make-on-demand libraries to prioritize lead molecules and minimize the transition time from discovery to clinic. We have used the available-to-date SARS-CoV-2 structome to carry out detailed structure-based virtual screening against 17 different target proteins, frequently at multiple target sites, by leveraging the versatility of our recently developed in silico screening platform, VirtualFlow. The set of small molecule hits that we present here can aid drug development happening at multiple institutions worldwide, and the set of prospective drug molecules derived from this ultra-large multi-target screening campaign will potentially contribute to the building of a future arsenal of anti-coronaviral drugs, ready to tackle future outbreaks.

Limitations of the study

We present here results from our ultra-large-scale computational effort, and the hits have not been experimentally validated for their efficacy. Given the multi-pronged approach targeting 40 different sites on 17 proteins, it is a challenge for any one research group to validate all the targets in the timely manner that the urgency of the current pandemic warrants. We hope that the results provided here can be leveraged collectively by the research community to identify potent inhibitors of SARS-CoV-2. In addition, we have also discussed the potential and plausible challenges, if any, for each target site from a docking perspective that researchers should bear in mind while validating hits. When performing in silico screening, we generally choose a 3D surface on the target protein, referred to as the “docking box”, and evaluate the energetics of a molecule bound to that chosen surface. The docking box is judiciously chosen to target a known functional domain, either an active site, an allosteric site, or a PPI interface. In practice, the docking box is typically extended to accommodate secondary interactions with the small molecule, especially when competing with a substrate. However, occasionally, this results in identifying a tight binder that does not harbor any inhibitory function. It is important to take into account this possibility during follow-up experiments and test for binding using a direct binding assay such as isothermal titration calorimetry (ITC), surface plasmon resonance (SPR), microscale thermophoresis (MST), or nuclear magnetic resonance (NMR). In particular, NMR and X-ray crystallography structures would provide detailed information on the binding site and how the small molecule engages the target. It could still be possible to functionalize these non-inhibitory binders with warheads to degrade the targeted protein by established methods, referred to as proteolysis-targeting chimeras (PROTACs) (Lai & Crews, 2016; Morgan et al., 2017). However, the effectiveness of these PROTACs would depend on the concentration, rate of turnover, and accessibility of the target viral proteins. As noted before, when testing these compounds in viral assays, one should be cognizant of the nature of the target and the possible differential effects on viral viability versus viral virulence. It may also be useful to correlate “pre-approved” hits from the virtual screens with patient data, for example, to determine if patients' regular medication and supplement regimens reveal any hits that positively correlate with either a milder disease course or a faster time to recovery. This could help narrow down targets and provide justification for a more detailed and time-consuming experimental analysis of a hit.

Resource availability

Lead contact

Further information and requests for resources should be directed to and will be fulfilled by the Lead Contact, Dr. Haribabu Arthanari, hari@hms.harvard.edu.

Material availability

This study did not generate nor use any new or unique reagents.

Data and code availability

The virtual screening results and special subsets (e.g., the hits which are natural products) are available on the project homepage at https://vf4covid19.hms.harvard.edu/. The homepage includes the top 200 hits (partially processed/filtered) in csv format, the top ∼1000 hits (partially processed/filtered) in an interactive table with 3D visualization capabilities, as well as the top ∼1 million hits in csv format (raw data, unprocessed/unfiltered). The top 1 million hits (raw data, unprocessed/unfiltered) for each screen has been deposited on Mendeley Data, available at https://doi.org/10.17632/knnpnsdpyn.1. VirtualFlow, the open-source drug discovery platform which was used in this work for the virtual screenings, is available on GitHub at https://github.com/VirtualFlow and via the primary homepage of the project at https://virtual-flow.org/.

Methods

All methods can be found in the accompanying Transparent methods supplemental file.

17 in total

1. Biomolecular modeling thrives in the age of technology.

Authors: Tamar Schlick; Stephanie Portillo-Ledesma
Journal: Nat Comput Sci Date: 2021-05-20

2. The SARS-CoV-2 helicase as a target for antiviral therapy: Identification of potential small molecule inhibitors by in silico modelling.

Authors: Eleni Pitsillou; Julia Liang; Andrew Hung; Tom C Karagiannis
Journal: J Mol Graph Model Date: 2022-04-18 Impact factor: 2.942

3. It all clicks together: In silico drug discovery becoming mainstream.

Authors: Antonina L Nazarova; Vsevolod Katritch
Journal: Clin Transl Med Date: 2022-04

Review 4. Antiviral Drug Discovery for the Treatment of COVID-19 Infections.

Authors: Teresa I Ng; Ivan Correia; Jane Seagal; David A DeGoey; Michael R Schrimpf; David J Hardee; Elizabeth L Noey; Warren M Kati
Journal: Viruses Date: 2022-05-04 Impact factor: 5.818

Review 5. Relevance of Peroxisome Proliferator Activated Receptors in Multitarget Paradigm Associated with the Endocannabinoid System.

Authors: Ana Lago-Fernandez; Sara Zarzo-Arias; Nadine Jagerovic; Paula Morales
Journal: Int J Mol Sci Date: 2021-01-20 Impact factor: 5.923

6. Establishing an Analogue Based In Silico Pipeline in the Pursuit of Novel Inhibitory Scaffolds against the SARS Coronavirus 2 Papain-Like Protease.

Authors: Roxanna Hajbabaie; Matthew T Harper; Taufiq Rahman
Journal: Molecules Date: 2021-02-20 Impact factor: 4.927

Review 7. Resources and computational strategies to advance small molecule SARS-CoV-2 discovery: lessons from the pandemic and preparing for future health crises.

Authors: Natesh Singh; Bruno O Villoutreix
Journal: Comput Struct Biotechnol J Date: 2021-04-26 Impact factor: 7.271

Review 8. RNA helicases required for viral propagation in humans.

Authors: John C Marecki; Binyam Belachew; Jun Gao; Kevin D Raney
Journal: Enzymes Date: 2021-11-02

9. Non-covalent SARS-CoV-2 M^pro inhibitors developed from in silico screen hits.

Authors: Giacomo G Rossetti; Marianna A Ossorio; Stephan Rempel; Annika Kratzel; Vasilis S Dionellis; Samia Barriot; Laurence Tropia; Christoph Gorgulla; Haribabu Arthanari; Volker Thiel; Peter Mohr; Remo Gamboni; Thanos D Halazonetis
Journal: Sci Rep Date: 2022-02-15 Impact factor: 4.379

Review 10. Molecular Modeling Targeting Transmembrane Serine Protease 2 (TMPRSS2) as an Alternative Drug Target Against Coronaviruses.

Authors: Igor José Dos Santos Nascimento; Edeildo Ferreira da Silva-Júnior; Thiago Mendonça de Aquino
Journal: Curr Drug Targets Date: 2022 Impact factor: 2.937

A multi-pronged approach targeting SARS-CoV-2 proteins using ultra-large virtual screening.

Introduction

Results

Preventing coronavirus entry

ACE2-spike-RBD interaction

Targeting the spike protein interaction interface on ACE2

Targeting the ACE2 interaction interface on the spike RBD

TMPRSS2

HR1-HR2 interface

Preventing coronavirus replication and curtailing viral load

Mpro (3CLpro)

PLpro

Replication complex: nsp7, 8, and 12

nsp13: helicase

nsp9: ssRNA-binding protein

Disrupting the ability of coronaviruses to evade or subvert host defenses

Guanine-N7 methyltransferase/ExoN (nsp10-14 complex)

2′-O-ribose methyltransferase (nsp16)

Uridylate-specific endoribonuclease

Deubiquitination and deISGylation activities of PLpro

nsp3 macrodomain

ORF7a

Disrupting the viral assembly and packaging

Nucleocapsid protein

Discussion

Comparison to previously identified inhibitors

Limitations of the study

Resource availability

Lead contact

Material availability

Data and code availability

Methods

1. Biomolecular modeling thrives in the age of technology.

2. The SARS-CoV-2 helicase as a target for antiviral therapy: Identification of potential small molecule inhibitors by in silico modelling.

3. It all clicks together: In silico drug discovery becoming mainstream.

Review 4. Antiviral Drug Discovery for the Treatment of COVID-19 Infections.

Review 5. Relevance of Peroxisome Proliferator Activated Receptors in Multitarget Paradigm Associated with the Endocannabinoid System.

6. Establishing an Analogue Based In Silico Pipeline in the Pursuit of Novel Inhibitory Scaffolds against the SARS Coronavirus 2 Papain-Like Protease.

Review 7. Resources and computational strategies to advance small molecule SARS-CoV-2 discovery: lessons from the pandemic and preparing for future health crises.

Review 8. RNA helicases required for viral propagation in humans.

9. Non-covalent SARS-CoV-2 Mpro inhibitors developed from in silico screen hits.

Review 10. Molecular Modeling Targeting Transmembrane Serine Protease 2 (TMPRSS2) as an Alternative Drug Target Against Coronaviruses.

9. Non-covalent SARS-CoV-2 M^pro inhibitors developed from in silico screen hits.