
Production of Proteins of the SARS-CoV-2 Proteome for Drug Discovery.

Choon Kim¹, Kiran V Mahasenan¹, Atul Bhardwaj¹, Olaf Wiest¹, Mayland Chang¹, Shahriar Mobashery¹.

Abstract

The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) is the causative agent of the coronavirus disease of 2019 (COVID-19). Its genome encodes two open reading frames for two large polyproteins, PP1a and PP1ab. Two proteases contained within these polypeptide stretches process them into 15 discrete proteins essential for the assembly of the virion during its replication. We describe herein the cloning of the genes for the SARS-CoV-2 proteins, codon-optimized for expression in Escherichia coli, the production of the proteins, and their purification to homogeneity. The genes cloned included all but six: NSP6, which possesses eight transmembrane regions, and five small proteins/peptides (E, ORF3b, ORF6, ORF7b, and ORF10). These proteins are intended for the experimental validation of small-molecule binders as molecular template hits. Proof of concept was established with the ADP-ribosylhydrolase (ARH) domain of NSP3 through the discovery of small-molecule templates that could serve as the basis for further optimization. The hits include one submicromolar and a few low-micromolar binders of the ARH domain. The availability of these proteins in soluble form opens up opportunities to discover novel templates with potential as anti-COVID-19 pharmaceuticals.


Year: 2021; PMID: 34337272; PMCID: PMC8315141; DOI: 10.1021/acsomega.1c02984

Source DB: PubMed; Journal: ACS Omega; ISSN: 2470-1343

Introduction

A pneumonia-like condition of unknown cause was first detected late in 2019 in Wuhan, China.[1,2] The pathogen was soon identified as a coronavirus (CoV), and its RNA genome was sequenced shortly thereafter.[3] The World Health Organization (WHO) named the disease caused by the novel virus COVID-19 (coronavirus disease of 2019), and the novel coronavirus was named “severe acute respiratory syndrome CoV-2” (SARS-CoV-2). The WHO declared COVID-19 a pandemic on March 11, 2020.[4] At the time of writing, COVID-19 has been reported in 219 countries and territories, with >150 million confirmed cases and >3.1 million fatalities,[5] and these figures increase daily. SARS-CoV-2 has a positive-strand RNA genome of approximately 30 kb (Figure 1). The genome produces two long polyproteins (PP1a and PP1ab), which are processed into 16 nonstructural proteins (NSPs). It also encodes four structural proteins, namely the spike glycoprotein (S), envelope protein (E), membrane glycoprotein (M), and nucleocapsid phosphoprotein (N), and at least seven accessory proteins (ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8, and ORF9b).[6,7] SARS-CoV-2 is primarily transmitted through respiratory droplets.[8] The virus binds the angiotensin-converting enzyme 2 (ACE2) as its receptor on the cell surface,[9−12] and the complex is then internalized, mainly through endocytosis,[13] to initiate the infection. After the viral RNA enters the host cell, it is translated from two open reading frames (ORF1a and ORF1ab) to produce the two aforementioned polyproteins, PP1a and PP1ab.[7,14] The polyproteins are processed by two viral proteases contained within their own sequences, papain-like protease (PLpro) and 3C-like protease (3CLpro), which release the mature NSPs.[15] PP1a is cleaved into 10 fragments, NSP1–NSP10, whereas PP1ab is processed to produce all of the NSPs, NSP1–NSP16. 
The PLpro protease is a component of NSP3 and is believed to process the polyprotein to release NSP1, NSP2, and NSP3. The remaining NSPs are processed by 3CLpro, which is contained within NSP5. Both proteases are essential for viral replication.[15] Once the NSPs are produced, NSP12 (an RNA-dependent RNA polymerase), which must associate with other NSPs for catalytic competence, synthesizes both genomic RNA (gRNA) and subgenomic RNA (sgRNA). The sgRNAs serve as mRNAs for the production of the structural and accessory proteins.

Figure 1

Genomic structure of SARS-CoV-2.

The recent availability of several vaccines for SARS-CoV-2 is a critically needed development in addressing the COVID-19 pandemic. At the same time, there remains a genuine need for COVID-19 therapeutics. Initial efforts focused on repurposing existing antiviral treatments have so far yielded only a single FDA-approved treatment, with modest efficacy, underscoring the need for drugs conceived and designed expressly for SARS-CoV-2. Identifying initial hits for such efforts requires robust access to the individual SARS-CoV-2 proteins in high purity and sufficient quantity. As part of our efforts to target the SARS-CoV-2 proteome for the discovery of anti-COVID-19 drugs in a target-agnostic manner, we have optimized such methods, as outlined in this report. We cloned the genes for these proteins and assessed both the expression of each gene and the purification of the individual proteins. To demonstrate the use of this method for early-stage drug discovery, we describe one example: the use of the ADP-ribosylhydrolase (ARH) domain of NSP3 in the discovery of small-molecule templates that could serve as the basis for further optimization.

Materials and Methods

Cloning and Expression of the Genes of SARS-CoV-2

To express SARS-CoV-2 proteins in Escherichia coli, codon-optimized genes were synthesized by GenScript (Piscataway, NJ); all synthetic genes are listed in Table S1. The nsp1 gene was synthesized with segments encoding a 6xHis-tag, a Twin-Strep-tag, and a tobacco etch virus (TEV) protease cleavage site at its 5′-end; these segments replace the thrombin-cleavage site and the T7-tag of the expression vector pET-28a(+) to improve purification of the expressed proteins. The synthetic nsp1 gene carried an NcoI site at its 5′-end, an NdeI site after the sequence for the TEV cleavage site, and an XhoI site at its 3′-end. The pET-28a(+) vector and the nsp1 gene were digested with NcoI and XhoI (NEB, Ipswich, MA) in 1× CutSmart buffer for 6 h at 37 °C and purified on an agarose gel with a Zymoclean DNA gel recovery kit (Zymo Research, Irvine, CA) following the manufacturer’s instructions. The two fragments were then ligated with T4 DNA ligase in 1× T4 DNA ligase buffer (NEB, Ipswich, MA) at 16 °C overnight to generate the plasmid pETHST-NSP1. Competent E. coli DH5α cells were transformed with the ligation mixture by heat shock (42 °C, 90 s), and transformants were selected on LB plates containing 50 μg/mL kanamycin after incubation for 18 h at 37 °C. The nucleotide sequences at the 5′- and 3′-ends of the nsp1 gene in the resulting plasmid were confirmed by DNA sequencing with the T7 promoter and T7 terminator primers, respectively (MCLAB, South San Francisco, CA). The plasmid pETHST-NSP1 was then introduced into E. coli LOBSTR-RIL, a strain derived from BL21(DE3), to express NSP1 fused to the N-terminal 6xHis and Twin-Strep tags and the TEV cleavage site. To prepare the vector for the remaining genes, pETHST-NSP1 was digested with NdeI and XhoI in 1× CutSmart buffer at 37 °C overnight, and the linearized pETHST backbone was purified on an agarose gel with a Zymoclean DNA gel recovery kit. This backbone was used to clone all other synthetic genes, which were likewise cut with NdeI and XhoI. 
The nucleotide sequence of each gene was confirmed. E. coli LOBSTR-RIL was transformed with each plasmid carrying a SARS-CoV-2 gene. Overnight cultures (200 μL) of the transformed strains were diluted into 20 mL of fresh LB medium containing 50 μg/mL kanamycin and incubated at 37 °C with shaking at 180 rpm until the OD600 reached 0.6. Expression of the SARS-CoV-2 proteins was then induced with 0.4 mM IPTG, followed by incubation either at 37 °C for 2 h or at 16 °C for 18 h. The cells were harvested by centrifugation at 7000g for 15 min at 4 °C and frozen at −80 °C. The frozen cells were resuspended in 2 mL of 1× PBS with 5 mM EDTA and disrupted by sonication (10 cycles of 10 s pulse and 20 s rest, on ice) with a Branson 450 sonifier. One milliliter of the lysate was centrifuged at 18,000g for 20 min at 4 °C. The supernatant was transferred to a new microcentrifuge tube, and the pellet was resuspended in 800 μL of 1× PBS. The level of soluble protein in the supernatants was examined on 8–16% ExpressPlus SDS-PAGE gels (GenScript, Piscataway, NJ).
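Because every gene except nsp1 is moved into pETHST as an NdeI–XhoI fragment, an additional NdeI or XhoI site inside a synthetic gene would be cut during digestion. The Python sketch below is a hypothetical pre-cloning check, not part of the published protocol. It scans an insert for the recognition sequences of the enzymes named above (NcoI CCATGG, NdeI CATATG, XhoI CTCGAG) and assumes the insert is written 5′→3′ with the NdeI site at its start and the XhoI site at its end; the example sequences are made up.

```python
from __future__ import annotations

# Recognition sequences of the enzymes used in the cloning scheme.
# All three are palindromic, so scanning one strand finds sites on both strands.
SITES = {"NcoI": "CCATGG", "NdeI": "CATATG", "XhoI": "CTCGAG"}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Return the 0-based start of every occurrence of each recognition site."""
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for name, site in SITES.items():
        positions = []
        pos = seq.find(site)
        while pos != -1:
            positions.append(pos)
            pos = seq.find(site, pos + 1)  # step by one so overlapping hits are kept
        hits[name] = positions
    return hits


def check_ndei_xhoi_insert(insert: str) -> list[str]:
    """List problems for an insert meant to be cloned as an NdeI-XhoI fragment.

    Assumes (hypothetically) that the insert begins with its NdeI site and ends
    with its XhoI site; any further site would be cut during digestion.
    """
    hits = find_sites(insert)
    problems = []
    if hits["NdeI"] != [0]:
        problems.append(f"NdeI sites at {hits['NdeI']}, expected only [0]")
    xhoi_start = len(insert) - len(SITES["XhoI"])
    if hits["XhoI"] != [xhoi_start]:
        problems.append(f"XhoI sites at {hits['XhoI']}, expected only [{xhoi_start}]")
    return problems


# Made-up example inserts (not SARS-CoV-2 sequences).
clean = "CATATG" + "GCTAGCAAA" + "CTCGAG"
internal_ndei = "CATATG" + "AACATATGAA" + "CTCGAG"
print(check_ndei_xhoi_insert(clean))          # empty list: no problems
print(check_ndei_xhoi_insert(internal_ndei))  # flags the internal NdeI site
```

A codon-optimization service can usually be asked to avoid such sites; the check above only confirms the result.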

Purification of the SARS-CoV-2 Proteins

For demonstration, four proteins were purified: the ARH and PLpro domains of NSP3, NSP15, and NSP16. Overnight cultures (1 mL) of the LOBSTR strains harboring pETHST-NSP3N, pETHST-NSP3P, pETHST-NSP15, or pETHST-NSP16 were diluted into 100 mL of fresh LB medium containing 50 μg/mL kanamycin and incubated at 37 °C with shaking at 180 rpm. When the cultures reached an OD600 of 0.6, expression of the SARS-CoV-2 proteins was induced with 0.4 mM IPTG, and each protein was produced by further incubation at 16 °C for 18 h. The cells were harvested by centrifugation at 7000g for 20 min at 4 °C, and the pellets were stored at −80 °C. The frozen cells were resuspended in 3 mL of washing buffer I (50 mM Tris-Cl, pH 8.0, and 150 mM NaCl) containing 10 mM MgCl2, 50 μg/mL DNase I, 50 μg/mL lysozyme, and 1× Halt protease inhibitor cocktail (Thermo Fisher Scientific, Waltham, MA) and incubated for 30 min on ice. The cells were disrupted by sonication (15 cycles of 10 s pulse and 20 s rest) with a Branson 450 sonifier and centrifuged at 18,000g for 20 min at 4 °C to separate the soluble proteins from unbroken cells and inclusion bodies. The supernatants were filtered through 0.2 μm filters before purification. The N-terminally His- and Twin-Strep-tagged SARS-CoV-2 proteins were purified by two-step affinity chromatography, first on 1 mL of Strep-Tactin Sepharose resin (IBA Lifesciences, Göttingen, Germany) and then on 1 mL of Protino Ni-NTA agarose resin (Macherey-Nagel, Düren, Germany), following the manufacturers’ instructions.
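The additions in the culture step follow C1V1 = C2V2. The short Python calculation below works them out for the 100 mL cultures described above. The stock concentrations (1 M IPTG, 50 mg/mL kanamycin) are assumptions for illustration only; the protocol does not state them.

```python
def stock_volume(target_conc: float, final_volume: float, stock_conc: float) -> float:
    """Volume of stock giving target_conc in final_volume (C1*V1 = C2*V2).

    Concentrations must share a unit; the result has the unit of final_volume.
    The volume of the addition itself is ignored, which is reasonable when the
    stock is much more concentrated than the target.
    """
    if stock_conc <= target_conc:
        raise ValueError("stock must be more concentrated than the target")
    return target_conc * final_volume / stock_conc


culture_ml = 100.0
# Inoculum: 1 mL overnight culture into 100 mL fresh LB, roughly a 1:100 dilution.
inoculum_ml = 1.0
print(f"inoculum dilution about 1:{culture_ml / inoculum_ml:.0f}")

# Hypothetical stocks: IPTG at 1 M (1000 mM), kanamycin at 50 mg/mL (50,000 ug/mL).
iptg_ul = stock_volume(0.4, culture_ml, 1000.0) * 1000    # 0.4 mM final, in uL
kan_ul = stock_volume(50.0, culture_ml, 50_000.0) * 1000  # 50 ug/mL final, in uL
print(f"IPTG: {iptg_ul:.0f} uL, kanamycin: {kan_ul:.0f} uL")
```

With these assumed stocks, inducing a 100 mL culture at 0.4 mM takes 40 μL of IPTG.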

Strep-Tactin Sepharose Affinity Chromatography

One milliliter of the resin was equilibrated with three bed volumes of washing buffer I by gravity flow, and the filtered cell-lysate supernatant was then applied. The resin was washed with five bed volumes of washing buffer I, and the N-terminally His- and Twin-Strep-tagged SARS-CoV-2 proteins were eluted with two bed volumes of elution buffer (50 mM Tris-Cl, pH 8.0, 150 mM NaCl, and 2.5 mM desthiobiotin). Fractions from every step (flow-through, wash, and elution) were collected and examined by SDS-PAGE.

Protino Ni-NTA Agarose Affinity Chromatography

One milliliter of the resin was equilibrated with five bed volumes of washing buffer II (50 mM Tris-Cl, pH 8.0, 150 mM NaCl, and 20 mM imidazole), and the chromatography was performed by gravity flow. The eluates (2 mL) from the Strep-Tactin Sepharose resin were applied to the Ni-NTA resin and incubated for 30 min at room temperature with gentle rotation, because binding of His-tagged proteins to Ni-NTA resin is slow; the unbound proteins were then collected. The resin was washed with eight bed volumes of washing buffer II. The SARS-CoV-2 proteins were then eluted with stepwise increases in imidazole concentration (100, 200, 400, and 800 mM) and were typically recovered in the 200 and 400 mM fractions.

Cleavage of the 6xHis- and Twin-Strep-Tags by TEV Protease

Imidazole was removed from the eluates with an Amicon Ultra-15 centrifugal filter (MWCO 10 kDa; MilliporeSigma, Burlington, MA) by five cycles of concentration (4000g, 20 min) and dilution with 15 mL of washing buffer I. The protein concentration was then determined from the absorbance at 280 nm with a NanoPhotometer NP80 (IMPLEN, Los Angeles, CA), using the molar extinction coefficient and molecular weight of each protein. To cleave the 6xHis- and Twin-Strep-tags from the SARS-CoV-2 proteins, 6xHis-tagged TEV protease (MCLAB, South San Francisco, CA) was added at a 1:50 (w/w) ratio, and the solution was incubated at 4 °C for 48 h with gentle agitation. The progress of proteolysis was checked periodically by SDS-PAGE.
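The concentration measurement above is the Beer–Lambert law, c = A280/(ε·l), converted to mg/mL with the molecular weight, and the TEV amount then follows from the 1:50 (w/w) ratio, read here as protease:substrate. The Python sketch below shows that arithmetic; it is not the NP80's internal calculation. The ε estimate uses the per-residue values of Pace et al. (1995), as used by ExPASy ProtParam (Trp 5500, Tyr 1490, cystine 125 M⁻¹ cm⁻¹). The sequence, molecular weight, A280, and volume in the example are placeholders.

```python
# Per-residue molar extinction coefficients at 280 nm (M^-1 cm^-1), the values
# of Pace et al. (1995) as used by ExPASy ProtParam.
EPS_TRP, EPS_TYR, EPS_CYSTINE = 5500, 1490, 125


def extinction_coefficient(sequence: str, n_cystines: int = 0) -> int:
    """Estimate epsilon_280 from sequence; n_cystines = number of disulfide bonds."""
    seq = sequence.upper()
    return EPS_TRP * seq.count("W") + EPS_TYR * seq.count("Y") + EPS_CYSTINE * n_cystines


def concentration_mg_per_ml(a280: float, epsilon: float, mw: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert: c (mol/L) = A / (epsilon * l); times MW (g/mol) gives g/L = mg/mL."""
    return a280 / (epsilon * path_cm) * mw


def tev_mass_mg(protein_mg: float, ratio: float = 50.0) -> float:
    """Mass of TEV protease for a 1:ratio (w/w) protease:substrate digest."""
    return protein_mg / ratio


# Hypothetical numbers: the string only supplies 2 Trp and 1 Tyr as a stand-in for a
# real construct; MW 20,000 g/mol, A280 = 0.5 (1 cm path), 10 mL of protein solution.
eps = extinction_coefficient("MKWWYAAAGS")
conc = concentration_mg_per_ml(0.5, eps, 20_000)
total_mg = conc * 10.0
print(f"epsilon = {eps} M^-1 cm^-1, {conc:.3f} mg/mL, {total_mg:.2f} mg total")
print(f"TEV protease for a 1:50 (w/w) digest: {tev_mass_mg(total_mg):.3f} mg")
```

For a real construct, ε and MW should be computed from the exact expressed sequence, including the tags when the tagged protein is being measured.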

Isolation of the Tag-Free Proteins

The tag-free SARS-CoV-2 proteins were separated from the cleaved tags and the 6xHis-tagged TEV protease by passing the digest over Protino Ni-NTA agarose resin (Macherey-Nagel, Düren, Germany) and washing the resin with two bed volumes of washing buffer II. The flow-through and wash fractions, which contain the tag-free proteins, were pooled and concentrated with an Amicon Ultra-15 centrifugal filter (MWCO 10 kDa; MilliporeSigma, Burlington, MA). The purity of the proteins was assessed by SDS-PAGE, and the homogeneous protein was stored at −80 °C until use.

Structure-Based Virtual Screening

The lead-like library of 3.5 million compounds was downloaded from the ZINC15 database[16] and prepared using LigPrep.[17] The crystal structure of the ARH domain of NSP3 (6WCF, 1.06 Å resolution)[18] was prepared for ligand docking using Schrödinger Suite 2019. Following protocols well established in our groups,[19−21] the library was screened hierarchically with Glide[22−24] and AutoDock Vina.[25] The binding pocket was defined as a 30 Å box centered on the superimposed ADP-ribose ligand. The library was first screened using Glide HTVS; the top-scoring 10% of compounds were redocked using Glide SP, and the top-scoring 10% of those were redocked using Glide XP.[24] The top 6000 hits from Glide XP were re-evaluated with the complementary scoring function of AutoDock Vina.[25] The poses from Glide XP and AutoDock Vina were visualized and analyzed for interactions and chemical diversity to select candidates for purchase.
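The hierarchical protocol above is a funnel: each stage rescores only the survivors of the previous one with a slower, more accurate method. For the 3.5-million-compound library, keeping 10% after HTVS and 10% after SP sends roughly 350,000 compounds to SP and 35,000 to XP, of which the top 6000 are rescored with Vina. The Python sketch below captures only that bookkeeping. The scoring callables are hypothetical stand-ins for Glide HTVS/SP/XP and AutoDock Vina, which are external programs not called here, and the sketch assumes lower scores are better, as with Glide docking scores and Vina affinities.

```python
from typing import Callable, List, Optional, Sequence

Score = Callable[[str], float]  # compound ID -> docking score (lower is better)


def keep_best(compounds: Sequence[str], score: Score,
              fraction: Optional[float] = None, top_n: Optional[int] = None) -> List[str]:
    """Rank compounds by score and keep a fraction of them or a fixed number."""
    ranked = sorted(compounds, key=score)
    if fraction is not None:
        return ranked[: round(len(ranked) * fraction)]
    if top_n is not None:
        return ranked[:top_n]
    return ranked


def hierarchical_screen(library: Sequence[str], htvs: Score, sp: Score, xp: Score,
                        rescore: Score, top_xp: int = 6000) -> List[str]:
    """HTVS -> SP -> XP funnel, then re-rank the XP survivors with a second function."""
    after_htvs = keep_best(library, htvs, fraction=0.10)  # top 10% go to SP
    after_sp = keep_best(after_htvs, sp, fraction=0.10)   # top 10% go to XP
    after_xp = keep_best(after_sp, xp, top_n=top_xp)      # top hits go to rescoring
    return keep_best(after_xp, rescore)                   # re-rank only, no further cut


# Stage sizes for the 3.5-million-compound library described above.
n = 3_500_000
print(n, round(n * 0.10), round(round(n * 0.10) * 0.10), 6000)
```

The final re-ranking does not discard compounds because, per the text, the Vina-scored hits were inspected visually rather than filtered automatically.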

Determination of Binding Affinity of Potential Inhibitors to the ARH Domain of NSP3 by Microscale Thermophoresis (MST)

Labeling of the ARH Domain

The ARH domain of NSP3, purified to homogeneity as outlined above, was labeled with the dye RED-NHS 2nd Generation (NanoTemper Technologies, Munich, Germany) following the manufacturer’s instructions. Before labeling, 100 μL of a 10 μM solution of the ARH domain was buffer-exchanged twice into labeling buffer (130 mM NaHCO3 and 50 mM NaCl, pH 8.2–8.3) using a Zeba spin desalting column (7K MWCO; Thermo Fisher Scientific, Waltham, MA) by centrifugation at 1500g for 2 min. The ARH domain (90 μL) was labeled by incubation with 10 μL of 300 μM dye solution (in DMSO) for 30 min at room temperature in the dark. The labeled ARH domain was separated from uncoupled dye on a PD-10 desalting column (Cytiva Life Sciences, Marlborough, MA) with 1× PBS. The concentration of the labeled ARH domain was calculated with the equation given in the NanoTemper Technologies labeling instructions.
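The labeling reaction above fixes the molar ratio of dye to protein. The Python calculation below makes that ratio explicit, giving about 3.3 mol of dye per mol of protein with 10% DMSO in the reaction. It assumes the protein is still at 10 μM after the two buffer exchanges, which is an assumption rather than a measurement; the actual degree of labeling must still be obtained with NanoTemper's equation, which is not reproduced here.

```python
def picomoles(volume_ul: float, conc_um: float) -> float:
    """uL x uM = pmol."""
    return volume_ul * conc_um


# Volumes and concentrations from the labeling step. Assumes the ARH domain is
# still at 10 uM after buffer exchange (an assumption, not a measured value).
protein_pmol = picomoles(90.0, 10.0)  # 90 uL of ARH domain
dye_pmol = picomoles(10.0, 300.0)     # 10 uL of 300 uM RED-NHS dye in DMSO
total_ul = 90.0 + 10.0

dye_to_protein = dye_pmol / protein_pmol
print(f"dye:protein molar ratio = {dye_to_protein:.2f}")
print(f"in reaction: protein {protein_pmol / total_ul:.1f} uM, "
      f"dye {dye_pmol / total_ul:.1f} uM, DMSO {10.0 / total_ul:.0%}")
```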

Assessment of the Compound Library

Binding of the compounds to the labeled ARH domain was assessed by MST with a Monolith NT.115 Pico (NanoTemper Technologies, Munich, Germany). Briefly, a 10 mM stock solution of each compound was prepared in DMSO, diluted to 1 mM in DMSO, and then further diluted to 50 μM in assay buffer (1× PBS, 0.05% Tween-20). The labeled ARH domain was diluted to 10 nM in the same assay buffer. All solutions were centrifuged at 14,000g for 5 min to remove any precipitates, and the supernatants were carefully transferred into fresh microcentrifuge tubes. The diluted ARH domain (25 μL) was mixed with 25 μL of each compound solution and, as a control, with 25 μL of assay buffer containing 5% DMSO. Four capillaries were filled with the control mixture and four with the compound mixture. Binding was determined by comparing the MST traces of the compound capillaries with those of the control capillaries in the Monolith NT.115 Pico, and the traces were analyzed with MO.Affinity Analysis software (NanoTemper Technologies, Munich, Germany).
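Because the compound and the labeled protein are mixed 1:1, every concentration in the capillaries is half that of its working solution. The short Python calculation below, using only values stated above, gives the final single-dose conditions: 25 μM compound, 5 nM labeled ARH domain, and 2.5% DMSO in both the test and control capillaries, so that the control is DMSO-matched.

```python
def mix_equal_volumes(a: float, b: float) -> float:
    """Concentration of a component after mixing two equal volumes holding a and b."""
    return (a + b) / 2.0


# Compound: 10 mM DMSO stock -> 1 mM in DMSO -> 50 uM in assay buffer (a 1:20 step),
# so the 50 uM working solution carries 100% / 20 = 5% DMSO.
compound_um = 1000.0 / 20.0
compound_dmso = 100.0 / 20.0

protein_nm = 10.0  # labeled ARH domain in assay buffer (no DMSO)

# 25 uL protein + 25 uL compound (test) or 25 uL buffer with 5% DMSO (control).
final = {
    "compound (uM)": mix_equal_volumes(compound_um, 0.0),
    "ARH domain (nM)": mix_equal_volumes(protein_nm, 0.0),
    "DMSO, test (%)": mix_equal_volumes(compound_dmso, 0.0),
    "DMSO, control (%)": mix_equal_volumes(5.0, 0.0),
}
for name, value in final.items():
    print(f"{name}: {value:g}")
```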

Determination of the Dissociation Constants (Kd) of the Compounds

MST was also used to determine the dissociation constants (Kd) of the compounds that bound the ARH domain. Briefly, a twofold serial dilution series of 12 concentrations, starting at 800 μM compound, was prepared in assay buffer (1× PBS containing 0.05% Tween-20) supplemented with 8% DMSO. The labeled ARH domain was diluted to 10 nM with assay buffer. All solutions were centrifuged at 14,000g for 5 min. Then, 10 μL of the diluted ARH domain was mixed with 10 μL of each compound dilution, and the 12 mixtures were loaded into 12 capillaries. The MST trace of each capillary was measured with the Monolith NT.115 Pico, and the traces were analyzed with MO.Affinity Analysis software to calculate the Kd values. Each determination was performed in triplicate.
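A titration like this is commonly fit with the 1:1 binding model that accounts for depletion of free ligand, since the labeled protein is held at 5 nM (after mixing) while the compound runs from 400 μM down to about 0.2 μM. The Python sketch below is not the MO.Affinity Analysis implementation. It fits Fnorm = F_unbound + (F_bound − F_unbound)·f_bound(Kd) by a grid search over log Kd, solving the two baseline parameters by linear least squares at each grid point. The synthetic data (Kd = 2 μM, baselines 900 and 880) are made up to show that the fit recovers a known value.

```python
import math
from typing import List, Sequence, Tuple


def twofold_series(top: float, n: int = 12) -> List[float]:
    """top, top/2, top/4, ... (n points)."""
    return [top / 2**i for i in range(n)]


def fraction_bound(lig: float, prot: float, kd: float) -> float:
    """1:1 binding with ligand depletion (all concentrations in the same unit).

    Algebraically equal to (s - sqrt(s^2 - 4LP)) / 2P with s = L + P + Kd,
    rewritten as 2L / (s + sqrt(...)) to avoid cancellation when P << L + Kd.
    """
    s = lig + prot + kd
    return 2.0 * lig / (s + math.sqrt(s * s - 4.0 * lig * prot))


def fit_kd(lig: Sequence[float], fnorm: Sequence[float],
           prot: float) -> Tuple[float, float, float]:
    """Grid-search Kd from 1e-3 to 1e3 (log-spaced); return (kd, f_unbound, f_bound)."""
    best = None
    for i in range(601):
        kd = 10 ** (-3 + i / 100)
        x = [fraction_bound(c, prot, kd) for c in lig]
        n = len(x)
        mx, my = sum(x) / n, sum(fnorm) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0:
            continue
        slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, fnorm)) / sxx
        intercept = my - slope * mx  # Fnorm at zero occupancy
        sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, fnorm))
        if best is None or sse < best[0]:
            best = (sse, kd, intercept, intercept + slope)
    if best is None:
        raise ValueError("no usable grid point")
    return best[1], best[2], best[3]


# Final concentrations in uM after 1:1 mixing: 400 uM top compound, 5 nM protein.
ligand = twofold_series(800.0 / 2)
protein = 0.010 / 2
# Synthetic, noise-free data for a made-up Kd of 2 uM.
data = [900.0 - 20.0 * fraction_bound(c, protein, 2.0) for c in ligand]
kd, f_unbound, f_bound = fit_kd(ligand, data, protein)
print(f"Kd ~ {kd:.2f} uM, F_unbound ~ {f_unbound:.1f}, F_bound ~ {f_bound:.1f}")
```

With triplicate titrations, one option is to fit each replicate and report the mean and spread; another is to fit the averaged Fnorm values. The sketch does not choose between them.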

Results and Discussion

Overview of the Proteome

The functions of several SARS-CoV-2 proteins have been determined experimentally. Others have functions proposed on the basis of related coronaviruses such as SARS-CoV, MERS-CoV, and SARS-like-BatCoV, or on sequence analysis.[26−28] For a few proteins, no function can currently be surmised. We hypothesize that the compact genome of SARS-CoV-2 indicates that all of these genes are essential for the virus; however, no gene-knockout library for SARS-CoV-2 yet exists to test whether each gene is essential. The putative functions of the SARS-CoV-2 proteins are listed in Table 1. NSP3 is a large protein of ∼217 kDa containing multiple domains: the N-terminal ubiquitin-like domain, the ARH domain (X domain), the PLpro domain, the 3Ecto domain (which interacts with NSP4), and the C-terminal Y1/CoV-Y domain.[29,30] As described above, the PLpro domain releases NSP1, NSP2, and NSP3 from PP1a and PP1ab by cleaving on the C-terminal side of the Leu-Xaa-Gly-Gly sequence.[29] The PLpro domain is a target of antiviral agents developed before the emergence of SARS-CoV-2[30−32] and the focus of ongoing drug-repurposing efforts. Its inhibition would block the maturation of NSP3 and NSP4, which are involved in forming the double-membrane vesicles in which viral RNA synthesis takes place, resulting in rapid degradation of the viral RNA by host enzymes. NSP5 (containing 3CLpro) is the second protease of SARS-CoV-2 and consists of the N-terminal finger, the catalytic domain, and the C-terminal domain.[33] It cleaves 11 sites within PP1a and PP1ab to produce NSP4–NSP16, recognizing the (Leu/Val/Phe/Met)-Gln-(Ser/Ala/Gly/Asn) sequence as its cleavage site.[33,34] NSP12 is a large protein of ∼107 kDa and the only RNA-dependent RNA polymerase (RdRp) responsible for viral RNA synthesis. 
It replicates and transcribes viral RNA with its essential co-factors NSP7 and NSP8, which function as primases.[35] It also forms a complex with the single-stranded RNA-binding protein NSP9, which itself interacts with NSP8.[36] RdRp (NSP12) is the target of the antiviral agent remdesivir (GS-5734), a nucleotide-analogue prodrug originally developed by Gilead to treat Ebola and Marburg virus infections, for which it failed to show clinical efficacy. Repurposing studies in COVID-19 patients showed moderate efficacy, and remdesivir is currently the only FDA-approved antiviral agent for the treatment of COVID-19 (emergency-use authorization in May 2020).[37,38] NSP16 is a 2′-O-ribose methyltransferase (OMT) that matures the 5′-cap structure of the viral mRNA by 2′-O-methylation of the ribose of the first nucleotide.[39] The OMT activity of NSP16 requires its essential co-factor NSP10.
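The cleavage-site motifs quoted above for the two proteases (Leu-Xaa-Gly-Gly for PLpro, cleaved after the second Gly; (Leu/Val/Phe/Met)-Gln-(Ser/Ala/Gly/Asn) for 3CLpro, cleaved after Gln) can be written as regular expressions. The Python sketch below scans a protein sequence for them and splits it at the candidate sites. It is a rough illustration only: motifs this short also match many positions that are never cleaved, because real specificity depends on additional residues and structural context. The test sequence is made up.

```python
import re
from typing import List, Tuple

# Lookaheads make the matches zero-width, so overlapping motifs are all reported.
MOTIFS = {
    "3CLpro": (re.compile(r"(?=[LVFM]Q[SAGN])"), 2),  # cut between Gln (P1) and P1'
    "PLpro": (re.compile(r"(?=L.GG)"), 4),            # cut after the Gly-Gly
}


def candidate_cuts(seq: str) -> List[Tuple[int, str]]:
    """Return (cut position, protease); each cut falls just before seq[position]."""
    cuts = []
    for protease, (pattern, offset) in MOTIFS.items():
        for m in pattern.finditer(seq):
            cuts.append((m.start() + offset, protease))
    return sorted(cuts)


def fragments(seq: str, cuts: List[Tuple[int, str]]) -> List[str]:
    """Split seq at the given cut positions."""
    bounds = [0] + sorted({pos for pos, _ in cuts}) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


toy = "MAAAVLQSGGGLKGGAAA"  # made-up sequence, not from SARS-CoV-2
cuts = candidate_cuts(toy)
print(cuts)
print(fragments(toy, cuts))
```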

Table 1

Available Structures (X-ray, NMR, or EM) of 17 Proteins Encoded in the SARS-CoV-2 Genome and Three Proteins Encoded in the SARS-CoV Genome

protein	description	experimental structures in the PDB
NSP1	inhibits host translation by interacting with the 40S ribosomal subunit[42]	SARS-CoV-2; 7JQB (cryoEM), 7K7P and 7K3N
NSP2	involved in the disruption of intracellular host signaling during SARS-CoV infections[43]
NSP3	multidomain protein including the papain-like protease domain; involved in cutting the N-terminal part of PP1a and PP1ab to mature NSP1, NSP2, and NSP3 itself and altering many of the infected host proteins; interacts with NSP4[29]	SARS-CoV-2; PLpro domain (6WUU, 6WX4, 6W9C, 6WZU, 6YVA, 6WRH, 7KRX, 7KOK, and 7D6H), ARH domain (6WOJ, 6WEY, 6YWK, 6YWL, 6YWM, 6WEN, 6VXS, 6W02, 6WCF, 6W6Y, 7KXB, 7LG7, 6Z5T, 7BF5, 7C33, 7CZ4, and 7KG3)
NSP4	forms virally induced cytoplasmic double-membrane vesicles necessary for viral replication[44]
NSP5	3C-like protease maturing most of the NSPs (NSP4–NSP16) from PP1ab[33,34,45−48]	SARS-CoV-2; 137 structures (6LU7, 6M2Q, and 135 more)
NSP6	forms virally induced cytoplasmic double-membrane vesicles necessary for viral replication[44]
NSP7	cofactor of RNA-dependent RNA polymerase (NSP12)[35,49,50]	SARS-CoV-2; 7BV1, 7BW4, 7BV2, 6WTC, 6WIQ, 6WQD, 6M71, 7BTF, 6YYT, 6YHU, and 7DCD
NSP8	cofactor of RNA-dependent RNA polymerase (NSP12)[35,49,50]	SARS-CoV-2; 7BV1, 7BW4, 7BV2, 6WTC, 6WIQ, 6WQD, 6M71, 7BTF, 6YYT, 6YHU, and 7DCD
NSP9	single-stranded RNA-binding protein; interacts with NSP8[36]	SARS-CoV-2; 6W4B, 6WXD, and 6W9Q
NSP10	critical co-factor for the activity of 3′ → 5′ exonuclease (NSP14) and 2′-O-ribose methyltransferase (NSP16)[51]	SARS-CoV-2; 6WKQ, 7C2J, 7C2I, 6WKS, 6W61, 6WVN, 6WQ3, 6WJT, 6WRZ, 6W4H, 6W75, 6YZ1, 7KOA, 7L6T, 6ZPE, and 6ZCT
NSP11	the leftover of PP1a fragmentation[52]
NSP12	RNA-dependent RNA polymerase, responsible for replication and transcription of the viral RNA genome[35,49,50,53]	SARS-CoV-2; 7BTF, 7BV1, 7BV2, 6YYT, 7BW4, 7L1F, 7B3B, and 7D4F
NSP13	helicase; interacts with NSP12[54]	SARS-CoV-2; 7NN0 and 7NIO
NSP14	N-terminal 3′ → 5′ exonuclease domain and C-terminal N⁷-guanine-methyltransferase domain[51]	SARS-CoV; 5C8U, 5C8S, 5C8T, and 5NFY
NSP15	uridylate-specific endoribonuclease[55,56]	SARS-CoV-2; 6W01, 6WXC, 6WLC, 6X1B, 6VWW, 7K0R, 7KEG, 5S71, 7K1O, and 7K9P
NSP16	2′-O-ribose methyltransferase mediating the 5′-cap structure of viral mRNAs[39,51]	SARS-CoV-2; 6WKQ, 7C2J, 7C2I, 6WKS, 6W61, 6WVN, 6WQ3, 6WJT, 6WRZ, 6W4H, 6W75, 6YZ1, 7KOA, and 7L6T
S	spike glycoprotein interacting with the host receptor protein ACE2[12,13,40,57−61]	SARS-CoV-2; receptor-binding domain (6VSB, 7BZ5, 6M0J, 6M17, 6W41, 7C01, 6LZG, 6YM0, 6YLA, 6YOR, and 6VW1), S2 domain (6LXT and 6LVN), cryoEM of the full-length (6VXX, 6WPT, 6WPS, 6VYB, 6X29, 6X2C, 6X2A, and 6X2B)
ORF3a	accessory protein; activates the NLRP3 inflammasome[62]	SARS-CoV-2; 7KJR (cryoEM)
ORF3b	accessory protein; IFN antagonist[6]
E	envelope protein; mediates virus morphogenesis and assembly[63,64]	SARS-CoV; 5X29 and 2MM4
M	membrane glycoprotein; involved in virus morphogenesis and assembly[44,63]
ORF6	accessory protein; antagonizes type I interferon[65]
ORF7a	accessory protein; inhibits the antiviral function of bone marrow stromal antigen 2 (BST-2)[66]	SARS-CoV-2; 6W37 and 7CI3
ORF7b	accessory protein and a structural component of the SARS-CoV virion[67]	SARS-CoV; 6W37
ORF8	accessory protein; important for interspecies transmission of the virus; activates the NLRP3 inflammasome[68,69]	SARS-CoV-2; 7JX6
ORF9b	accessory protein; suppresses innate immunity by targeting mitochondria and the MAVS/TRAF3/TRAF6 signalosome[70]	SARS-CoV-2; 7KDT (cryoEM)
ORF9c	unknown[52]
N	nucleocapsid phosphoprotein; packages the viral RNA genome[41,71]	SARS-CoV-2; N-terminal RNA-binding domain (6YI3, 6M3M, 6WKP, 6VYO, and 7CR5), C-terminal dimerization domain (6WZQ, 6WZO, 6WJI, 6YUN, and 7C22)
ORF10	unknown; may not be produced[7]

PDB code(s) of the structures are listed for each protein, when available, together with the attributed functions for the viral proteins.

In addition to the NSPs, the four structural proteins are also important for the proliferation of the virus. Among them, the S and N proteins are good candidates for drug development. The S protein is a large protein of ∼141 kDa consisting of the receptor-binding S1 subunit and the membrane-fusion S2 subunit, which are generated by cleavage with a host protease, type II transmembrane serine protease (TMPRSS2).[40] This cleavage is essential for entry of the virus into the host cell. The C-terminal domain of the S1 subunit binds the host receptor protein ACE2, after which the fusion peptide in the S2 subunit interacts with the host cell membrane, leading to entry of the viral RNA. Blocking the receptor-binding sites of the S1 subunit can therefore protect the host cell from viral invasion. Finally, the N protein is a phosphoprotein that packages the viral RNA genome into a ribonucleoprotein complex to protect the genomic RNA, and it is the most abundant viral protein in the infected cell.[41] It forms a dimer through its C-terminal domains, constituting the basic building block of the nucleocapsid. It also plays an important role in enhancing the efficiency of sgRNA transcription and viral replication.[41]

Cloning of the SARS-CoV-2 Genes, Expression in E. coli, and Purification of the Recombinant Proteins

The genome of SARS-CoV-2 was analyzed to predict the properties of the translation products. The topology (e.g., signal sequences and transmembrane regions) and secondary structural elements were predicted with the XtalPred-RF server.[72] The gene for each protein was codon-optimized for production in E. coli and commercially synthesized, with the signal sequences and transmembrane segments excluded. We did not synthesize six genes: the one encoding NSP6, which possesses eight transmembrane regions, and five encoding small proteins/peptides, namely the envelope protein E, ORF3b, ORF6, ORF7b, and ORF10. In addition to the full-length synthetic genes, we synthesized six DNA fragments encoding parts of three large proteins: for NSP3, the N-terminal ARH domain, the PLpro domain, and the C-terminal domain; for NSP12, the RdRp domain; and for the S protein, the N-terminal S1 subunit and the C-terminal S2 subunit. The M and S proteins are believed to be glycoproteins, and the N protein is a phosphoprotein; because they are produced in E. coli, the recombinant proteins lack these post-translational modifications. We modified the expression vector pET-28a(+) by replacing the thrombin-cleavage site and the T7-tag with a Twin-Strep-tag and a TEV-cleavage site (Figure 2). We found that this modified plasmid gives better purification of the recombinant proteins: in many cases, TEV-mediated cleavage has been cleaner than thrombin cleavage, which aids purification of the target proteins to homogeneity. The resulting vector was named pETHST (Figure S1). It contains NdeI and XhoI restriction sites, between which the desired genes are inserted. The resulting proteins carry N-terminal His- and Twin-Strep-tags, which are used for purification and then removed by treatment with TEV protease. 
GenScript supplied each synthetic gene in the pUC57-Kan plasmid, and the sequences of the synthetic DNA were confirmed. All synthetic DNA fragments and pETHST were digested with NdeI and XhoI and purified on agarose gels. Each fragment was ligated into pETHST, and E. coli DH5α was transformed with the ligation product. The sizes of the plasmids from the transformants were confirmed in each case by digestion with NdeI and XhoI. Plasmids of the correct size were used to transform the LOBSTR-RIL strain, derived from E. coli BL21(DE3), for protein expression by IPTG induction.[73] The LOBSTR-RIL strain gives lower contamination of recombinant His-tagged proteins than its BL21(DE3) background, since it carries deletions of the natural His-rich proteins ArnA and SlyD. Gene expression is typically induced with 0.4 mM IPTG once the cultures reach an OD600 of 0.6, followed by further incubation either at 37 °C for 2 h or at 16 °C for 18 h. In general, the latter condition gives superior expression. Some proteins, however, yield more soluble protein after a short induction at 37 °C, even though most of the proteins precipitate as inclusion bodies under that condition. We therefore routinely test both conditions before selecting the more suitable one for each protein. Expression in each case was checked by SDS-PAGE (Figure S2), and the level of soluble protein for each construct is listed in Table 2.
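TEV protease recognizes Glu-Asn-Leu-Tyr-Phe-Gln-(Gly/Ser) and cuts between Gln and the Gly/Ser, so that residue stays on the N-terminus of the released protein. The Python sketch below models that step for a pETHST-style fusion: everything up to and including ENLYFQ (the tags plus the recognition site) is removed. The fusion in the example is a made-up placeholder, with a single Strep-tag II sequence (WSHPQFEK) standing in for the Twin-Strep-tag; the residues actually left on each protein depend on the sequence between the TEV site and the start of each gene, which is not given here.

```python
import re
from typing import Tuple

# ENLYFQ followed by Gly or Ser; the lookahead keeps the P1' residue out of the match.
TEV_SITE = re.compile(r"ENLYFQ(?=[GS])")


def tev_cleave(fusion: str) -> Tuple[str, str]:
    """Split a fusion protein at its first TEV site into (tag part, released protein)."""
    m = TEV_SITE.search(fusion)
    if m is None:
        raise ValueError("no TEV site (ENLYFQ followed by G or S) found")
    return fusion[: m.end()], fusion[m.end():]


# Placeholder fusion: Met, 6xHis, one Strep-tag II, TEV site, made-up target.
fusion = "M" + "HHHHHH" + "WSHPQFEK" + "ENLYFQ" + "GMKLV"
tag, protein = tev_cleave(fusion)
print(tag)      # bound on Ni-NTA along with the His-tagged TEV protease
print(protein)  # released protein keeps the Gly from the TEV site
```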

Figure 2

A SARS-CoV-2 gene is preceded by a 6xHis tag, a Twin-Strep tag, and a TEV cleavage site in pETHST. All synthetic SARS-CoV-2 genes except nsp1 were inserted between the NdeI and XhoI sites.

Table 2

Expression Levels of 6xHis-Twin-Strep-Tagged SARS-CoV-2 Proteins in E. coli LOBSTR-RIL

protein	soluble, 16 °C	soluble, 37 °C	insoluble, 16 °C	insoluble, 37 °C
NSP1	L	M	N	M
NSP2	M	L	N	H
NSP3	L	L	N	N
NSP3 ARH	M	L	N	H
NSP3 PLpro	L	N	L	H
NSP3C	H	L	L	H
NSP4N	L	N	M	H
NSP4C	L	L	N	H
NSP5	N	M	N	L
NSP7	L	H	N	L
NSP8	M	H	L	L
NSP9	L	H	N	L
NSP10	M	L	N	H
NSP12	L	L	H	H
NSP12P	L	L	H	H
NSP13	M	N	H	H
NSP14	L	N	L	H
NSP15	L	L	N	L
NSP16	M	M	L	H
S	L	N	H	H
S1	L	N	H	H
S2	L	N	M	H
ORF3a	M	L	L	H
M	L	N	L	H
ORF7a	L	L	L	H
ORF8	L	N	M	H
ORF9b	L	M	N	M
N	L	L	M	H
NS	M	H	N	M

Expression levels: L, low; M, moderate; H, high; N, none.
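Because each construct is tested at both temperatures, choosing a production condition from Table 2 can be reduced to a simple rule. The Python sketch below encodes a hypothetical rule (take the temperature with the higher soluble level; on a tie, prefer 16 °C, the condition the text describes as generally superior) and applies it to a few rows copied from Table 2. It is an illustration, not the authors' documented decision procedure.

```python
from typing import Tuple

# Ordinal scale for the expression levels used in Table 2.
LEVEL = {"N": 0, "L": 1, "M": 2, "H": 3}

# Soluble expression (16 °C, 37 °C) for a few constructs, copied from Table 2.
SOLUBLE = {
    "NSP3 ARH": ("M", "L"),
    "NSP3 PLpro": ("L", "N"),
    "NSP5": ("N", "M"),
    "NSP7": ("L", "H"),
    "NSP15": ("L", "L"),
    "NSP16": ("M", "M"),
}


def preferred_temperature(levels: Tuple[str, str]) -> int:
    """Return 16 or 37 (°C): the higher soluble level wins, ties go to 16 °C."""
    at_16, at_37 = levels
    return 37 if LEVEL[at_37] > LEVEL[at_16] else 16


for construct, levels in SOLUBLE.items():
    print(f"{construct}: induce at {preferred_temperature(levels)} °C")
```

Under this rule, the four proteins purified in this work (the ARH and PLpro domains of NSP3, NSP15, and NSP16) all come out at 16 °C, matching the condition used for their purification.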

The following is a typical procedure for purifying the recombinant proteins to homogeneity. After gene expression, the dual 6xHis- and Twin-Strep-tagged proteins are purified by two-step affinity chromatography: first on Strep-Tactin Sepharose resin (IBA Lifesciences) with elution by desthiobiotin, and then on Protino Ni-NTA agarose resin (Macherey-Nagel) with elution by imidazole, following the manufacturers’ instructions (Figure S3A,B for the ARH domain). The purity of the proteins is routinely checked by SDS-PAGE; in the vast majority of cases in which we have used this procedure, the purified protein is homogeneous. The homogeneous preparation is then treated with 6xHis-tagged TEV protease (MCLAB) at 4 °C for 24 h, or longer if necessary, to cleave the 6xHis- and Twin-Strep-tags. The tag-free SARS-CoV-2 proteins are separated from the cleaved tags and the 6xHis-tagged TEV protease by passage over Protino Ni-NTA agarose resin (Figure S3C for the ARH domain). This procedure yielded a homogeneous SARS-CoV-2 protein in all cases, as documented for four of the proteins in Figure 3.

Figure 3

Purification of the N-terminal ARH and PLpro domains of NSP3, NSP15, and NSP16 as representatives. M, molecular weight marker; S, the soluble fraction of the cell extract; I, the insoluble fraction of the cell extract; ST, Strep-Tactin affinity chromatography; Ni, Ni-NTA affinity chromatography; and TEV, cleavage with TEV protease. The protein sizes before and after TEV-protease treatment are indicated at the bottom.

Computational Studies of the ARH Domain of SARS-CoV-2

The purified target proteins are intended for in silico and in vitro screening for inhibitor discovery. As a representative example of this combined approach, we discuss the discovery of initial hits for inhibition of the NSP3 ARH domain. With 1920 amino acids, NSP3 is the largest protein encoded by the SARS-CoV-2 genome, and it consists of multiple domains.[29] One of its functions is to counteract the post-translational ADP-ribosylation that is part of the innate immune response of mammalian cells to viral infection.[74,75] RNA viruses, including coronaviruses, overcome this host defense through the activity of their ARH domains.[75−77] In vivo, this activity has been shown to be essential for the pathogenesis of coronaviruses and other RNA viruses.[76,78−80] Because the ARH domain is present in coronaviruses and other viruses, ARH activity is likely a broad-spectrum target for antiviral compound discovery.[75,81] Multiple crystal structures of the ARH domain have been solved at resolutions of 0.95–2.50 Å. They show that the domain recognizes ADP-ribose, the product of the reversal of ADP-ribosylation, in a C-shaped pocket spanning roughly 15 Å. The binding site can be parsed into three subpockets (Figure 4B, yellow dotted highlights). From left to right, these subpockets recognize the adenosine, pyrophosphate, and ribose moieties. The co-crystallized ligand binds poorly (Ki = 10 μM) but maps out the entire binding site (Figure 4A,B).[81]

Figure 4

ARH domain of the NSP3 protein of SARS-CoV-2. (A) Stereoview of the ribbon representation of the ARH domain, with the active site in the center. (B) Stereoview of the X-ray structure of the ARH domain depicted as a solvent-accessible surface in complex with ADP-ribose (PDB code 6W02). The adenosine, pyrophosphate, and ribose (from left to right) binding subpockets are shown by yellow broken highlights. (C) Stereoview of the superimposition of 21 top-ranking compounds docked to the NSP3 ARH domain active site (PDB code 6WCF; resolution of 1.06 Å), showing that all three subpockets are sampled for binding.

Structure-based virtual screening (SBVS) used the high-resolution X-ray crystal structure of the ARH domain (PDB ID 6WCF, 1.06 Å resolution).[18] The screen followed a hierarchical workflow with Glide (HTVS, SP, and XP)[22−24] and AutoDock Vina[25] (see the Materials and Methods for details). Compounds were finally selected on the basis of three criteria: visual inspection of the binding interactions, consensus ranking by Glide SP, Glide XP, and AutoDock Vina, and commercial availability. Figure 4C shows an overlay of the 21 top-scoring compounds from virtual screening. The docked structures span all three subpockets of the ARH active site. A set of 119 compounds was selected for purchase and experimental testing. Experimental studies of this first set identified seven binders to the ARH domain (Figure S4 in the Supporting Information). Hit expansion of three of the experimentally validated binders was carried out by a similarity search in ZINC15 to identify commercially available compounds in the neighboring chemical space. This search yielded a focused library of 515 compounds, which were then docked to the ARH binding pocket with Glide XP. After visual inspection as described above, 82 analogues were purchased for in vitro evaluation. Of these, 13 were experimentally validated as binders (Figure S5 in the Supporting Information).
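
The text names consensus ranking by Glide SP, Glide XP, and AutoDock Vina as one selection criterion but does not say how the consensus was computed. The following is a minimal sketch that assumes simple mean-rank fusion. Each program ranks the compounds by its own score, with lower being better, and compounds are then ordered by their average rank. The function names, compound IDs, and scores are hypothetical placeholders, not data from this study.

```python
# Hypothetical consensus ranking across three docking programs.
# Assumptions (not stated in the study): every score is "lower is better"
# (as for Glide and Vina scores in kcal/mol) and consensus = mean of ranks.


def rank_scores(scores):
    """Map compound ID -> rank (1 = best) for one program's scores."""
    ordered = sorted(scores, key=lambda cid: (scores[cid], cid))
    return {cid: rank for rank, cid in enumerate(ordered, start=1)}


def consensus_order(score_tables):
    """Order compounds by their mean rank across all programs.

    Compounds not scored by every program are dropped, so each mean is
    taken over the same number of ranks.
    """
    common = set.intersection(*(set(t) for t in score_tables.values()))
    rank_tables = [rank_scores({c: t[c] for c in common})
                   for t in score_tables.values()]
    mean_rank = {c: sum(r[c] for r in rank_tables) / len(rank_tables)
                 for c in common}
    # Ties in mean rank are broken by compound ID for a deterministic order.
    return sorted(common, key=lambda c: (mean_rank[c], c))


# Invented scores for four placeholder compounds (not data from this study).
scores = {
    "glide_sp": {"cpd1": -7.9, "cpd2": -6.1, "cpd3": -8.4, "cpd4": -5.0},
    "glide_xp": {"cpd1": -8.8, "cpd2": -7.5, "cpd3": -9.1, "cpd4": -4.2},
    "vina":     {"cpd1": -8.0, "cpd2": -8.3, "cpd3": -7.7, "cpd4": -6.0},
}
print(consensus_order(scores))
```

Rank averaging sidesteps the fact that Glide and Vina scores are on different scales. Score normalization and voting schemes are common alternatives. Visual inspection and vendor availability, the other two selection criteria, are not captured by any such script.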

Assessment of Binding of the Compounds to the ARH Domain of NSP3

As indicated above, 119 of the top-scoring compounds were selected for purchase. They were tested for binding to the ARH domain by microscale thermophoresis (MST). Each compound was prepared as a 10 mM stock solution in DMSO. In the first-pass experiments, the compounds were tested in groups of 10, with each compound at 1 mM in DMSO (Table S2). Each group was then diluted to 50 μM with the assay buffer (1× PBS, 0.05% Tween-20). The fluorescently labeled ARH domain was diluted to 10 nM with the same buffer. Before the MST measurements, all diluted solutions were centrifuged to remove any precipitate. The ARH domain was then mixed at a 1:1 ratio with either assay buffer containing 5% DMSO (control) or a compound mixture. Binding of each compound group was determined from the MST traces recorded on a Monolith NT.115 Pico. Any group showing binding with a signal-to-noise (S/N) ratio higher than 5.0 was selected for examination of the individual compounds. Binding was observed for compounds 95, 96, 98, 112, 114, 116, and 119 (Table S3; Figure S4 for the chemical structures). Compounds 95, 96, and 119 had S/N ratios >10 and were used to determine dissociation constants (Kd) (Figure 5A). For this purpose, each compound at a higher concentration (800 μM in 8% DMSO) served as the starting solution for 12 twofold serial dilutions with assay buffer containing 8% DMSO. Each dilution was centrifuged at 14,000g for 5 min before being mixed 1:1 with 10 nM ARH domain. The highest compound concentration in the assay mixture was therefore 400 μM, in 4% DMSO. Binding of the three compounds was monitored from the MST traces on the Monolith NT.115 Pico, and Kd values were calculated with MO.Affinity software. The resulting Kd values were 45.0 ± 1.8, 65.4 ± 3.8, and 36.3 ± 2.0 μM for 95, 96, and 119, respectively.
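
The concentrations in this protocol can be checked with a little arithmetic, and the Python sketch below reproduces the stated figures from the stated inputs. Pooling ten 10 mM stocks in equal volumes gives 1 mM of each compound. A 20-fold dilution to 50 μM in DMSO-free assay buffer leaves 5% DMSO, matching the control. Finally, 1:1 mixing with the labeled protein halves every concentration. The function names are illustrative. Two points are assumptions. First, we take "50 μM" to mean 50 μM per compound. Second, the text does not say whether the 800 μM starting solution counts as one of the 12 points, so the example assumes it does.

```python
# Worked concentration arithmetic for the MST protocol described above.
# Numeric inputs come from the text; function names are illustrative, and
# "1:1" is taken to mean mixing equal volumes.


def after_mixing(value, mix_fraction=0.5):
    """Concentration after mixing 1:1 (equal volumes) with another solution."""
    return value * mix_fraction


def pooled_screen(stock_mM=10.0, group_size=10, working_uM=50.0, protein_nM=10.0):
    """Final per-compound, DMSO and protein levels in a first-pass pool."""
    # Equal volumes of `group_size` stocks: each compound is diluted group_size-fold.
    pool_uM = stock_mM * 1000.0 / group_size    # 1000 uM (1 mM) of each compound
    fold = pool_uM / working_uM                 # 20-fold into DMSO-free buffer
    dmso_pct = 100.0 / fold                     # 5 % DMSO, as in the control
    return {
        "compound_uM": after_mixing(working_uM),  # per compound in the capillary
        "dmso_pct": after_mixing(dmso_pct),
        "protein_nM": after_mixing(protein_nM),   # labeled ARH domain
    }


def titration_series(start_uM, n_points):
    """Final assay concentrations of a twofold series after 1:1 mixing.

    Assumes the starting solution is the first of the n_points concentrations.
    """
    return [after_mixing(start_uM) / 2 ** i for i in range(n_points)]


print(pooled_screen())
series = titration_series(800.0, 12)
print(series[0], series[-1])
```

Under that assumption, the final assay concentrations of the first-set titration run from 400 μM down to about 0.2 μM, with DMSO constant at 4% because the dilutions were made in 8% DMSO buffer. The second set is quoted as having a highest concentration of 20 μM in 4% DMSO. If that value is likewise the final assay concentration (the 4% DMSO suggests so), `titration_series(40.0, n_points)` reproduces its top concentration.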

Figure 5

Structures of compounds and MST dose–response plots for the ARH domain of NSP3. (A) Chemical structures of three compounds from the first set; (B) structures of three derivatives from the second set; and (C) MST dose–response plots for compounds 200, 147, and 130 (from left to right).

Compounds 95, 96, and 119 were subjected to hit expansion, as described above. This led to a second set of 82 compounds (groups 13–21 in Table S2), which were purchased from MolPort Inc. (Latvia). Their binding to NSP3-ARH was examined as described for the first set, and 13 compounds exhibited binding (Table S4; Figure S5 for the chemical structures). Of these, three (123, 138, and 200) were derived from 95, four (147, 173, 175, and 190) from 96, and six (121, 122, 127, 128, 129, and 130) from 119. The binding affinity of each compound to the ARH domain was determined by MST as described above, except that the highest compound concentration was 20 μM in 4% DMSO. Kd values could be determined for five of these compounds:

- two analogues of 95: 123 (1.64 ± 0.70 μM) and 200 (0.45 ± 0.28 μM);
- one analogue of 96: 147 (1.77 ± 0.25 μM);
- two analogues of 119: 128 (3.16 ± 2.42 μM) and 130 (1.26 ± 1.01 μM).

Figure 5B,C shows the chemical structures and dose–response plots of the best analogue of each parent compound (200, 147, and 130). These hits include one submicromolar and four low-micromolar binders to the ARH domain of NSP3.

The work described here outlines the means to produce disparate SARS-CoV-2 proteins and to use them in the discovery of potential inhibitors, demonstrated here for NSP3-ARH as a proof of concept. These hits form the basis for medicinal chemistry development around the structural templates in the search for antivirals conceived expressly for SARS-CoV-2. Most of these viral targets remain open to such discovery efforts.
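
The Kd values reported above for both compound sets were calculated with MO.Affinity software. The sketch below illustrates what such a fit involves. It fits the standard quadratic 1:1 binding model to synthetic, noise-free data and recovers the Kd used to generate them. This model accounts for depletion by the labeled target, which is at 5 nM after the 1:1 mix with the 10 nM ARH stock. We assume that MO.Affinity's Kd model has this form, but the text does not state it. The grid-search strategy, the function names, the hypothetical 45 μM Kd, and the signal levels are all illustrative choices. Only the Python standard library is used.

```python
import math

# Sketch of a 1:1 binding fit with ligand depletion (quadratic model) for
# MST dose-response data. Assumed, not confirmed, to match MO.Affinity's
# Kd model; all data below are synthetic.


def fraction_bound(ligand_uM, target_uM, kd_uM):
    """Fraction of labeled target bound at given total ligand/target levels.

    Quadratic law-of-mass-action solution, rewritten as
    2L / (s + sqrt(s^2 - 4LT)) to avoid cancellation when T << L.
    """
    s = ligand_uM + target_uM + kd_uM
    return 2.0 * ligand_uM / (s + math.sqrt(s * s - 4.0 * ligand_uM * target_uM))


def linear_fit_rss(x, y):
    """Least-squares fit y = a + b*x; returns the residual sum of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx < 1e-30:  # model curve is flat at this Kd and cannot fit a titration
        return math.inf
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def fit_kd(conc_uM, signal, target_uM, log10_lo=-3.0, log10_hi=4.0, step=0.001):
    """Grid search over log10(Kd); unbound/bound signal levels are solved linearly."""
    best_rss, best_kd = math.inf, None
    for i in range(int(round((log10_hi - log10_lo) / step)) + 1):
        kd = 10.0 ** (log10_lo + i * step)
        fb = [fraction_bound(c, target_uM, kd) for c in conc_uM]
        rss = linear_fit_rss(fb, signal)
        if rss < best_rss:
            best_rss, best_kd = rss, kd
    return best_kd


# Synthetic check: 12-point twofold series from 400 uM, 5 nM labeled target,
# hypothetical Kd of 45 uM, arbitrary unbound/bound signal levels, no noise.
conc = [400.0 / 2 ** i for i in range(12)]
signal = [850.0 + 12.0 * fraction_bound(c, 0.005, 45.0) for c in conc]
print(round(fit_kd(conc, signal, 0.005), 1))
```

The unbound and bound signal levels enter the model linearly, so they are solved in closed form at each trial Kd. That leaves only a one-dimensional search over log Kd. Real data would additionally need noise handling, replicates, and uncertainty estimates such as the ± values reported above.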
