Yuekun Lang¹, Ke Chen¹, Zhong Li¹, Hongmin Li²
¹ Wadsworth Center, New York State Department of Health, 120 New Scotland Ave, Albany, NY 12208, USA.
² Wadsworth Center, New York State Department of Health, 120 New Scotland Ave, Albany, NY 12208, USA; Department of Biomedical Sciences, School of Public Health, University at Albany, 1 University Place, Rensselaer, NY 12144, USA.
Electronic address: Hongmin.li@health.ny.gov.
Abstract
Betacoronaviruses, one of the four genera of coronaviruses, include severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), severe acute respiratory syndrome coronavirus (SARS-CoV), and Middle East respiratory syndrome-related coronavirus (MERS-CoV), among others. These viruses threaten public health and cause dramatic economic losses. The nucleocapsid (N) protein is a structural protein of betacoronaviruses with multiple functions, such as forming viral capsids with viral RNA, interacting with the viral membrane protein to form the virus core with RNA, and binding several cellular kinases involved in signal transduction. In this review, we highlight the potential of the N protein as an antiviral target from several perspectives, including its structure, its functions, and antiviral strategies for combating betacoronaviruses.
There are four genera of coronaviruses (CoVs) in the subfamily Orthocoronavirinae of the family Coronaviridae: alphacoronaviruses (α-CoVs), betacoronaviruses (β-CoVs), gammacoronaviruses (γ-CoVs), and deltacoronaviruses (δ-CoVs) [1]. CoVs have been identified in many mammals and birds, including dogs, cats, horses, bats, cattle, swine, mice, whales, monkeys, ferrets, camels, turkeys, and chickens [2]. In humans, β-CoVs of zoonotic origin are of the greatest importance. Human coronaviruses (HCoVs), including HCoV-HKU1 and HCoV-OC43, are responsible for 10%–20% of common colds [2]. SARS-CoV caused the 2002–2003 severe acute respiratory syndrome (SARS) pandemic; Middle East respiratory syndrome (MERS)-CoV led to the 2012 MERS epidemic in the Middle East; and SARS-CoV-2 caused the coronavirus disease 2019 (COVID-19) pandemic. Outcomes of β-CoV infection in humans range from asymptomatic infection to respiratory infections, enteric infections, encephalitis, and, in the worst cases, death [[3], [4], [5], [6], [7], [8]]. There are currently four lineages within the genus Betacoronavirus: lineage A (HCoV-OC43, HCoV-HKU1, etc.), lineage B (SARS-CoV, SARS-CoV-2, etc.), lineage C (MERS-CoV, Tylonycteris bat coronavirus HKU4 (BtCoV-HKU4), etc.), and lineage D (Rousettus bat coronavirus HKU9 (BtCoV-HKU9), etc.) [9]. In this review, we mainly focus on the antiviral aspects of the N protein in zoonotic β-CoVs.

CoVs infect host cells primarily through receptor binding by the viral spike (S) protein; conformational changes in the S protein then trigger fusion of the viral and host cell membranes. After entry into the cytoplasm and uncoating, the nucleocapsid and viral genome are released, and viral replication and transcription begin in the cytoplasm. CoVs have one of the largest genomes of any known RNA virus: a positive-sense, single-stranded RNA of 26–32 kb packaged in an enveloped virion [10].
The 5′ two-thirds of the genome consists of the non-structural protein (NSP) coding regions, which encode two overlapping viral replicase polyproteins, pp1a and pp1ab [11]. The virally encoded papain-like (PL) and 3-chymotrypsin-like (3CL) proteases cleave these polyproteins into mature NSPs involved in RNA synthesis [[12], [13], [14]]. The remaining 3′ third of the genome encodes the structural proteins spike (S), envelope (E), membrane (M), and nucleocapsid (N), as well as nonessential accessory proteins, which are expressed from subgenomic (sg) mRNAs [11,15]. The replicated RNA genome forms a nucleocapsid with the N protein and is packaged into an immature virion. The virion matures in the Golgi and is transported in exocytic vesicles; after the vesicles fuse with the plasma membrane, the mature virus is released from the infected cell [16].
Structure of the β-CoV N protein
The N protein structure is conserved among β-CoVs. Amino acid (aa) sequence comparisons have revealed three distinct, highly conserved regions: a hand-shaped N-terminal domain (NTD) (Fig. 1C), a C-terminal domain (CTD) involved in dimerization (Fig. 1B), and a disordered central linker region (CLR) (Fig. 1A) [10,14,17]. All three regions are involved in RNA binding [18,19]. Because the N protein is poorly stable and highly dynamic, no crystal structure has been solved for any full-length coronavirus N protein. In solution, the full-length SARS-CoV N protein exists predominantly as a CTD-mediated dimer, which is considered the basic building block of the nucleocapsid (Fig. 1D) [[20], [21], [22]]. A structural model for a di-domain (DD) has been proposed by fitting small-angle X-ray scattering data for the SARS-CoV N protein [23]. The CTD dimer forms a core, with the NTDs branching out as two arms connected to the core via the CLR (Fig. 1F). Because the NTD and CTD fold independently, research groups usually study them separately.
Fig. 1
Structure of the β-CoV N protein. (A) C-I-TASSER structure model. Ribbon representations of the NTD (N-terminal RNA-binding domain, red box) and CTD (C-terminal dimerization domain, yellow box) were generated with PyMOL from coordinates in the Protein Data Bank (PDB IDs: NTD, 6M3M; CTD, 6WJI). The relative orientations of the NTD, CTD, and CLR are drawn arbitrarily to reflect the dynamic nature of the N protein. (B) CTD. (C) NTD. (D) CTD dimer. (E) Electrostatic surface of the NTD; blue denotes positive and red denotes negative charge potential. The pocket in the square indicates the RNA-binding site of the NTD; this pocket varies among β-CoVs. (F) Domain organization of the β-CoV N protein. (G) Crystal packing of the CTD 24-mer; yellow and orange ribbons represent β-CoV viral RNA strands wrapping around the helical oligomer. (H) Schematic of the docking of the NTD onto the CTD 24-mer complex; NTDs are represented by red ellipsoids. Structures were generated using PyMOL (The PyMOL Molecular Graphics System, Version 1.5.0.4, Schrödinger, LLC). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Although NTDs are divergent in both sequence and length within the Coronaviridae [24], they are relatively conserved within the genus Betacoronavirus. As shown in Fig. 2, the NTD has been mapped for HCoV-OC43 (aa 58–195) [25], SARS-CoV-2 (aa 46–174) [26], SARS-CoV (aa 45–181) [27], and MERS-CoV (aa 39–165) [28]. In zoonotic β-CoVs, the NTD adopts a right-handed (loops)-(β-sheet core)-(loops) sandwich fold that is conserved among all CoV NTDs (Fig. 2) [26,29]. The hand-shaped NTD consists of basic fingers, a hydrophobic basic palm, and an acidic wrist (Fig. 1E) [26,30]; residues in the middle of the hand are more conserved than those in the basic fingers and acidic wrist [26]. Crystal structures of β-CoV NTDs [26,28,31,32] have identified critical residues for RNA binding. The NTD pockets of SARS-CoV-2 and SARS-CoV are distinct from that of HCoV-OC43. For HCoV-OC43, the co-crystal structure of the NTD with adenosine monophosphate (AMP) revealed an AMP-binding site composed of Ser 64, Gly 68, Arg 122, Tyr 124, Tyr 126, and Arg 164 [33].
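The NTD boundaries quoted above are inclusive residue ranges, so each domain length is end − start + 1. As a quick check on the statement that β-CoV NTDs are similar in length, the short Python sketch below tabulates these lengths. It uses only the four ranges given in this section; the names `NTD_SPANS`, `span_length`, and `ntd_lengths` are illustrative, and because residue numbering differs between viruses, the sketch compares lengths only, not aligned positions.

```python
"""Lengths of beta-CoV N-protein NTDs from the residue ranges quoted in the text."""

# Inclusive (first, last) residue numbers of the NTD, as reported in this review.
NTD_SPANS: dict[str, tuple[int, int]] = {
    "HCoV-OC43": (58, 195),
    "SARS-CoV-2": (46, 174),
    "SARS-CoV": (45, 181),
    "MERS-CoV": (39, 165),
}


def span_length(first: int, last: int) -> int:
    """Number of residues in the inclusive range first..last."""
    if last < first:
        raise ValueError(f"invalid range {first}-{last}")
    return last - first + 1


def ntd_lengths(spans: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Map each virus to the length (in aa) of its mapped NTD."""
    return {virus: span_length(first, last) for virus, (first, last) in spans.items()}


if __name__ == "__main__":
    lengths = ntd_lengths(NTD_SPANS)
    for virus, n_aa in lengths.items():
        print(f"{virus:<11} {n_aa} aa")
    print(f"spread: {max(lengths.values()) - min(lengths.values())} aa")
```

With these ranges the NTD lengths fall between 127 aa (MERS-CoV) and 138 aa (HCoV-OC43), a spread of 11 residues.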
Fig. 2
Domain architectures of β-CoV N proteins. NTD: N-terminal RNA-binding domain; CTD: C-terminal dimerization domain. Multiple sequence alignment of the SARS-CoV-2 N-NTD (GenBank: NC_045512.2) with the N-NTDs of SARS-CoV (GenBank: NC_004718.3), MERS-CoV (GenBank: NC_019843.3), HCoV-OC43 (GenBank: NC_006213.1), HCoV-HKU1 (GenBank: NC_006577.2), and Tylonycteris bat coronavirus HKU4 (GenBank: MH002339.1). Red arrows indicate conserved residues of ribonucleotide-binding sites; dash-bordered boxes indicate residues that vary in the structural comparisons. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
The NTD and CTD are linked by the CLR, which is rich in serine and arginine residues. The CLR also contains abundant phosphorylation sites, which allow the N protein to participate in cell signaling [[34], [35], [36], [37], [38]]. The flexibility of the CLR facilitates direct interactions between the N protein and RNA [23,27,[39], [40], [41]]. However, because no structural information is available for the CLR and protein phosphorylation is reversible, opposing hypotheses on the role of the CLR in N protein oligomerization have been proposed based on different observations. Because phosphorylation can stabilize the N protein by reducing its total positive charge, hyperphosphorylation of the CLR could enhance and regulate oligomerization of the DD [42]. In contrast, other studies reported that oligomerization might be impaired when the CLR is phosphorylated [43,44]. These results suggest that phosphorylation may act as a key that locks or unlocks N protein oligomerization.

Similar to the NTD, the sequence and length of the CTD are relatively conserved within the genus Betacoronavirus, indicating similar structural and functional roles for the CTD [45].
The CTD of the SARS-CoV N protein is unstable as a monomer because it folds into an extended conformation with the topology α1-α2-α3-α4-α5-α6-β1-β2-α7-α8, leaving a large cavity in its center [45,46]. The CTD is stabilized by domain-swapped dimerization, in which a β-hairpin of one subunit inserts into the central cavity of the other [21,22,45,46,47,48]. The crystal structure of the SARS-CoV CTD revealed an octamer formed by two butterfly-shaped tetramers, which is responsible for the RNA-binding activity of the CTD [46]. The dimerization core has been identified at aa 281–365, and a positively charged groove involved in RNA binding has been mapped to aa 248–280, a region containing eight positively charged lysine and arginine residues [46,48]. In the crystal, these octamers form a helical supercomplex with a continuous positively charged surface [46]. Through non-specific electrostatic interactions between this surface and the negatively charged RNA backbone, viral RNA strands can bind and wrap around the CTD supercomplex (Fig. 1G, H) [46,49]. Like the N protein of human coronavirus 229E (HCoV-229E), all zoonotic β-CoV N proteins have a C-terminal tail peptide. In HCoV-229E, the disordered C-terminal tail is responsible for dimer–dimer association, and a peptide from this region (N377–389) has been reported to reduce HCoV-229E viral titers [47]. Understanding how the C-terminal tail peptide of the β-CoV N protein contributes to oligomerization may therefore help identify antiviral targets for drugs that combat β-CoVs by disrupting N protein self-association.
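The positively charged groove described above (aa 248–280, eight Lys/Arg residues) corresponds to a local Lys/Arg density of 8/33 ≈ 0.24. One simple way to look for such basic stretches in an N protein sequence is a sliding-window count of Lys and Arg. The Python sketch below is a hypothetical illustration of that idea, not the method used in [46] or [48], which mapped the groove from crystal structures. The function names, the window size, the threshold, and the toy sequence are assumptions, and a one-dimensional scan ignores the three-dimensional surface that actually binds RNA.

```python
"""Sliding-window scan for Lys/Arg-rich stretches in a protein sequence (illustrative)."""

BASIC = frozenset("KR")  # lysine and arginine, positively charged at neutral pH

# Density implied by the text: 8 Lys/Arg residues in the groove spanning aa 248-280.
REPORTED_GROOVE_DENSITY = 8 / (280 - 248 + 1)


def basic_density_scan(seq: str, window: int) -> list[tuple[int, float]]:
    """Return (1-based window start, fraction of K/R in the window) for every window."""
    seq = seq.upper()
    if not 0 < window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    flags = [1 if aa in BASIC else 0 for aa in seq]
    count = sum(flags[:window])
    scan = [(1, count / window)]
    for i in range(window, len(seq)):
        # Slide by one residue: add the residue entering, drop the one leaving.
        count += flags[i] - flags[i - window]
        scan.append((i - window + 2, count / window))
    return scan


def basic_stretches(seq: str, window: int, threshold: float) -> list[int]:
    """1-based start positions of windows whose K/R fraction is at least threshold."""
    return [start for start, frac in basic_density_scan(seq, window) if frac >= threshold]


if __name__ == "__main__":
    toy = "GSAAKRKRGKAASSDEAA"  # hypothetical sequence, not an N protein
    print(f"reported groove density: {REPORTED_GROOVE_DENSITY:.2f}")
    print(basic_stretches(toy, window=6, threshold=0.5))
```

On the toy sequence, windows starting at positions 2 through 7 pass the 0.5 threshold. Window size and threshold are free parameters; a 33-residue window would match the span of the SARS-CoV groove.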
Functions of the β-CoV N protein
The primary function of the β-CoV N protein is to package the viral genomic RNA (gRNA) into nucleocapsids. This structure not only protects the genome but also ensures that replication and transmission proceed in a timely and reliable manner. A correct and stable structure is therefore important for nucleocapsid function. The β-CoV N protein uses its dimer as the basic building block for nucleocapsid formation [30]; based on the crystal structure of the SARS-CoV CTD, the dimerization core has been identified at aa 281–365 [46].

The N protein, M protein, and gRNA form the internal spherical/icosahedral core of CoVs [50,51]. Although the M protein alone can self-assemble into virus-like particles (VLPs), both the density and uniformity of these particles are lower than those of VLPs formed by the M and N proteins together [50]. This suggests that incorporation of N into M vesicles stabilizes VLP formation. For SARS-CoV, aa 168–208 of the N protein and a highly polar, hydrophilic region of the M protein (aa 194–205) mediate their ionic interaction [[52], [53], [54]]. He et al. demonstrated that a serine/arginine-rich motif (SSRSSSRSRGNSR) in the CLR at aa 184–196 is also critical for multimerization of the N protein [55]. Because the same motif is involved in both N-N and N-M interactions, this region may be critical for maintaining the correct conformation of the N protein. Because N and M protein sequences are relatively conserved among zoonotic β-CoVs, the N-M and N-N interactions described for SARS-CoV are likely representative [56].
Similar N-M interactions have been observed in mouse hepatitis virus (MHV) and transmissible gastroenteritis virus (TGEV) [51,57,58], suggesting that CoVs use a common mechanism for N-M interaction even though N and M sequence homology among them is low [59,60].

During the early stage of the virus replication cycle, the N protein accumulates in the cytoplasm together with viral gRNA and nonstructural protein 3 (nsp3), suggesting that it might play a critical role in viral transcription and translation [[61], [62], [63]]. In MHV, a commonly used β-CoV model, both the CLR and the NTD interact directly with nsp3 as part of the replication-transcription complexes (RTCs) responsible for CoV RNA synthesis [64,65]. Binding of nsp3 to the CLR in MHV may induce a conformational change in the CLR, which in turn may regulate the intracellular localization [66] and/or other RNA-binding functions of the N protein. In addition, reverse-genetics rescue of SARS-CoV indicates that the N protein might enhance translation of viral mRNAs or subgenomic transcription [67].

The N protein has been detected in both the cytoplasm and the nucleolus for different CoVs, suggesting that nucleolar localization is a shared, possibly functionally significant, feature of the coronavirus family [35,[68], [69], [70], [71]]. For the SARS-CoV N protein, the nuclear export signal motif identified at aa 324-EVTPSGTWLT-334 (in the CTD) is the dominant signal determining N protein localization [72]. Phosphorylated N protein can be translocated from the nucleus to the cytoplasm by binding to the 14-3-3 protein [43,73], although N protein phosphorylation occurs in both compartments [20]. Moreover, absence of the SR-rich domain in the CLR of the SARS-CoV N protein can dramatically change N protein localization [55].
The functions of the N protein in the nucleolus remain elusive.

Through interactions with host components, the N protein can inhibit protein translation through an EF1α-mediated mechanism [70], modulate the host cell cycle by regulating cyclin-CDK activity [73], induce apoptosis [74], cause lung inflammation in SARS-CoV-infected patients by activating cyclooxygenase-2 (COX-2) [75], and inhibit interferon synthesis [76,77].
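The SR-rich motif discussed in this section (SSRSSSRSRGNSR, aa 184–196 [55]) also connects to the phosphorylation argument made for the CLR, namely that phosphorylation stabilizes the N protein by lowering its positive charge. The Python sketch below makes that bookkeeping explicit under deliberately crude assumptions: formal charges of +1 for Lys/Arg and −1 for Asp/Glu, approximately −2 per phosphate, and His and chain termini ignored. It is not a pKa-based calculation, it does not say which serines are phosphorylated in cells, and the helper names are hypothetical.

```python
"""Crude charge bookkeeping for the SARS-CoV N-protein SR-rich motif (illustrative)."""

from collections import Counter

SR_MOTIF = "SSRSSSRSRGNSR"  # aa 184-196 of the SARS-CoV N protein, as quoted in the text

# Simplifying assumptions (not a pKa-based model): Lys/Arg = +1, Asp/Glu = -1,
# His and the chain termini are ignored, and each phosphate contributes -2.
RESIDUE_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}
PHOSPHATE_CHARGE = -2
PHOSPHO_ACCEPTORS = frozenset("STY")  # serine, threonine, tyrosine


def composition(seq: str) -> Counter:
    """Residue counts of a one-letter protein sequence."""
    return Counter(seq.upper())


def approx_net_charge(seq: str, n_phospho: int = 0) -> int:
    """Approximate net charge of seq when n_phospho acceptor residues carry a phosphate."""
    seq = seq.upper()
    n_sites = sum(1 for aa in seq if aa in PHOSPHO_ACCEPTORS)
    if not 0 <= n_phospho <= n_sites:
        raise ValueError(f"n_phospho must be between 0 and {n_sites}")
    base = sum(RESIDUE_CHARGE.get(aa, 0) for aa in seq)
    return base + PHOSPHATE_CHARGE * n_phospho


if __name__ == "__main__":
    print(composition(SR_MOTIF).most_common())
    for n in range(0, 8):
        print(f"{n} phosphate(s): net charge {approx_net_charge(SR_MOTIF, n):+d}")
```

The motif contains seven serines and four arginines, so under these assumptions it scores +4 when unmodified, 0 with two phosphates, and −10 if all seven serines were phosphorylated.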
Antiviral strategies for β-CoVs
Since the 1960s, antivirals against many different viral diseases have been identified, acting through a variety of mechanisms [[78], [79], [80]]. Approximately 90 compounds have been formally licensed for use as clinical antiviral therapies; half of these are used to treat human immunodeficiency virus (HIV) infection [81,82]. Other approved antiviral drugs target hepadnaviruses, hepaciviruses, herpes simplex virus, influenza viruses, human cytomegalovirus, varicella-zoster virus, respiratory syncytial virus, and human papillomavirus [82]. However, because each licensed drug is approved only for specific viruses, antiviral therapies are still unavailable for many critical emerging viral diseases.

Ribavirin, a nucleoside analogue similar to remdesivir, inhibits a broad spectrum of viruses in vitro and in vivo through pleiotropic mechanisms, including inhibition of viral capping enzymes, lethal mutagenesis of viral RNA genomes, and inhibition of viral RNA synthesis [83,84]. In poliovirus, ribavirin treatment caused a 9.7-fold increase in mutagenesis and a 99.3% loss in infectivity, yet a single aa change in the viral RNA-dependent RNA polymerase (RdRp) can confer resistance; the mutant RdRp has higher fidelity than that of the parental strain [85,86].

Neuraminidase inhibitors (NAIs) are antiviral therapeutics used to treat infections caused by influenza A and B viruses. The primary function of neuraminidase (NA), a surface protein of influenza viruses, is to cleave sialic acids from the infected cell surface and thereby release newly formed mature viruses. After this release mechanism was identified, NAIs were developed to mimic the structure of sialic acids. Under NAI treatment, NA is inhibited, so mature influenza viruses cannot be released and spread further [87]. Unfortunately, prolonged use of NAIs has led to several NAI-resistant mutations in different strains worldwide [[88], [89], [90], [91]].
Although β-CoVs have a relatively low mutation rate owing to their proofreading machinery [92], they can still acquire resistance to a specific antiviral treatment through mutation over time, which could pose a risk to public health. To avoid selecting for resistance mutations with a single type of treatment, a combination of antiviral therapeutics with different mechanisms should be used. Combination antiretroviral therapy has saved more than nine million life-years in HIV infection; similarly, we should identify multiple antivirals to build combination therapies against other important pathogens such as zoonotic β-CoVs [93].

Currently, no antiviral drug is licensed in the U.S. for treatment of zoonotic β-CoV infections, although the investigational nucleotide analogue remdesivir, an inhibitor of the SARS-CoV-2 RNA-dependent RNA polymerase (RdRp), is being evaluated in clinical trials [94]. However, moving such drugs into human clinical trials requires careful consideration. In addition to causing potential side effects, antiviral drugs may worsen the situation if drug-resistant strains emerge. Also, unlike most antibiotics, which simply inactivate the pathogen, antivirals are designed against specific viral proteins that may be involved in normal host functions [95].

As shown in Table 1, previous experience suggests two main approaches to therapy development against β-CoVs: (1) virus-directed and (2) host-directed [96]. Because of the pandemic, numerous studies are being carried out worldwide, and new clinical trials are posted on ClinicalTrials.gov almost every day. Many strategies have been explored and reviewed previously [96]; in this review, we focus on antiviral strategies targeting the viral N protein.
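The case for combination therapy made above rests on a simple probabilistic step: if resistance to each drug arises independently, a virus must carry all of the corresponding mutations at once to escape a combination, so the per-drug probabilities multiply. The Python sketch below illustrates this with hypothetical numbers (a population of 10⁹ viral genomes and a per-drug resistance frequency of 10⁻⁶), which are assumptions chosen for illustration, not measurements for any β-CoV. The toy model ignores cross-resistance, fitness costs, and stepwise acquisition of resistance.

```python
"""Why combining antivirals helps against resistance: a toy independence model."""

import math


def joint_resistance_probability(per_drug: list[float]) -> float:
    """Probability that one genome resists every drug, assuming independent mutations."""
    p = 1.0
    for q in per_drug:
        if not 0.0 <= q <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        p *= q
    return p


def expected_resistant_genomes(population: float, per_drug: list[float]) -> float:
    """Expected number of genomes in the population resistant to all drugs."""
    return population * joint_resistance_probability(per_drug)


if __name__ == "__main__":
    # Hypothetical values for illustration only; not measured for any beta-CoV.
    population = 1e9  # viral genomes in an infected host (assumed)
    p_single = 1e-6   # chance a genome carries resistance to one drug (assumed)
    for n_drugs in (1, 2, 3):
        expected = expected_resistant_genomes(population, [p_single] * n_drugs)
        print(f"{n_drugs} drug(s): ~{expected:.0e} resistant genomes "
              f"(log10 = {math.log10(expected):.0f})")
```

Under these assumed numbers, roughly 10³ genomes would already resist a single drug, whereas the expected count drops to about 10⁻³ for two drugs, that is, most likely none in a single host.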
Table 1. β-CoV antiviral strategies.

Because the β-CoV N protein is a multifunctional structural protein with a conserved structure, it is an attractive target for antiviral drug discovery, and several groups are developing antiviral drugs that target it. Using quantum dot-conjugated RNA oligonucleotides on a biochip platform, Roh found that (−)-catechin gallate and (−)-gallocatechin gallate markedly inhibit the SARS-CoV N protein [97]. The compound N-(6-oxo-5,6-dihydrophenanthridin-2-yl)(N,N-dimethylamino)acetamide hydrochloride (PJ34) was identified as an N protein inhibitor that reduces the RNA-binding affinity of the N protein, leading to inhibition of viral replication at 10 μM (Table 2) [33]. Subsequently, the same group found that 6-chloro-7-(2-morpholin-4-yl-ethylamino)quinoxaline-5,8-dione (compound H3) acts as an RNA-binding inhibitor against HCoV-OC43 by targeting the NTD, significantly reducing the RNA-binding capacity of the N protein [98]. For SARS-CoV, nuclear magnetic resonance (NMR) was used to screen small molecules that bind the SARS-CoV NTD with low affinity (1 mM) [99]; 6-amino-4-hydroxy-naphthalene-2-sulfonic acid was identified as a candidate that binds the same NTD pocket as RNA. For SARS-CoV-2, Kang et al. identified a hydrophobic pocket consisting of Phe 57, Pro 61, Tyr 63, Tyr 102, Tyr 124, and Tyr 126 [26]. This pocket might therefore be a target site for drug screening, and the same group is conducting further research to test this hypothesis.
Table 2. β-CoV inhibitors targeting N proteins.

In recent research on the MERS-CoV N protein, structure-based antiviral drug design was used to stabilize non-native protein-protein interactions of the NTD, leading to abnormal N protein oligomerization (Fig. 3) [100]. The NTD structure revealed a conserved hydrophobic pocket at the dimerization interface, formed by W43, N66, N68, Y102, and F135 of one monomer, which accommodates the side chain of M38 of the second monomer. Virtual screening targeting this hydrophobic pocket identified 5-benzyloxygramine (P3) as a candidate binder, and in cell culture P3 significantly suppressed N protein expression at 100 μM [100]. Although antiviral drug discovery usually targets native protein-protein interactions (PPIs), identifying and stabilizing non-native PPIs is another strategy worth considering for countering CoV infections.
Fig. 3
Schematic describing the rationale used in designing the allosteric stabilizer described in [100]. An orthosteric stabilizer binds to the non-native interaction interface of the NTD and stabilizes the abnormal interaction between N proteins; as a result, the CTD cannot be packed correctly.
In addition, because only high-biosafety-level laboratories can handle virulent strains of zoonotic β-CoVs, which slows antiviral discovery, it is crucial to find suitable models for primary screening of antiviral candidates in lower-biosafety-level laboratories. Noninfectious VLPs should serve as functional and safe models for this purpose [50]. Because the N protein serves mainly as a structural protein, a change in its conformation might prevent VLPs from forming successfully [101,102]. VLPs can therefore be used to screen antiviral candidates in cell-based assays.
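As a rough illustration of how a primary VLP-based screen could be scored, the Python sketch below normalizes a hypothetical per-well readout of VLP production to untreated wells (0% inhibition) and to a control that fully blocks VLP formation (100% inhibition), and checks assay quality with the Z′-factor, Z′ = 1 − 3(σ₊ + σ₋)/|μ₊ − μ₋|. The readout, the control values, the compound names (`cpd_A`, `cpd_B`), and the 50% hit threshold are all assumptions; this review does not specify an assay format.

```python
"""Hit calling for a hypothetical cell-based VLP screen (illustrative sketch)."""

from statistics import mean, stdev


def z_prime(pos_ctrl: list[float], neg_ctrl: list[float]) -> float:
    """Z'-factor of an assay from positive- and negative-control readouts."""
    separation = abs(mean(pos_ctrl) - mean(neg_ctrl))
    return 1 - 3 * (stdev(pos_ctrl) + stdev(neg_ctrl)) / separation


def percent_inhibition(signal: float, neg_mean: float, pos_mean: float) -> float:
    """0% = untreated VLP signal (negative control), 100% = full-inhibition control."""
    return 100.0 * (neg_mean - signal) / (neg_mean - pos_mean)


def call_hits(samples: dict[str, float], pos_ctrl: list[float],
              neg_ctrl: list[float], threshold: float = 50.0) -> dict[str, float]:
    """Return compounds whose percent inhibition meets the threshold."""
    neg_mean, pos_mean = mean(neg_ctrl), mean(pos_ctrl)
    inhibition = {name: percent_inhibition(sig, neg_mean, pos_mean)
                  for name, sig in samples.items()}
    return {name: pi for name, pi in inhibition.items() if pi >= threshold}


if __name__ == "__main__":
    # Hypothetical readouts (arbitrary units) of VLP production per well.
    neg = [1000.0, 980.0, 1020.0]  # untreated wells: full VLP signal
    pos = [100.0, 110.0, 90.0]     # wells with a control that blocks VLP formation
    samples = {"cpd_A": 400.0, "cpd_B": 950.0}
    print(f"Z' = {z_prime(pos, neg):.2f}")
    print(call_hits(samples, pos, neg))
```

A commonly used rule of thumb treats Z′ of 0.5 or more as a well-separated assay. With these made-up controls Z′ is 0.90, and only `cpd_A` (about 67% inhibition) passes the threshold.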
Conclusions
There is currently an urgent unmet medical need for antiviral drugs with different mechanisms to combat β-CoV pandemics and epidemics. The N protein of β-CoVs is an attractive and suitable therapeutic target for three reasons: first, it has many conserved sites related to its structural characteristics; second, it has multiple functions, some of which are critical for virus replication; and third, VLPs can be used as a tool for primary screening of antiviral candidates in lower-biosafety-level laboratories.
Declaration of competing interest
The authors declare no competing financial interest.