Architecture and self-assembly of the SARS-CoV-2 nucleocapsid protein.

Qiaozhen Ye¹, Alan M V West¹, Steve Silletti², Kevin D Corbett¹,²,³.

Abstract

The COVID-19 pandemic is the most severe acute public health threat of the twenty-first century. To properly address this crisis with both robust testing and novel treatments, we require a deep understanding of the life cycle of the causative agent, the SARS-CoV-2 coronavirus. Here, we examine the architecture and self-assembly properties of the SARS-CoV-2 nucleocapsid protein, which packages viral RNA into new virions. We determined a 1.4 Å resolution crystal structure of this protein's N2b domain, revealing a compact, intertwined dimer similar to that of related coronaviruses including SARS-CoV. While the N2b domain forms a dimer in solution, addition of the C-terminal spacer B/N3 domain mediates formation of a homotetramer. Using hydrogen-deuterium exchange mass spectrometry, we find evidence that at least part of this putatively disordered domain is structured, potentially forming an α-helix that self-associates and cooperates with the N2b domain to mediate tetramer formation. Finally, we map the locations of amino acid substitutions in the N protein from over 38,000 SARS-CoV-2 genome sequences. We find that these substitutions are strongly clustered in the protein's N2a linker domain, and that substitutions within the N1b and N2b domains cluster away from their functional RNA binding and dimerization interfaces. Overall, this work reveals the architecture and self-assembly properties of a key protein in the SARS-CoV-2 life cycle, with implications for both drug design and antibody-based testing.

Keywords: COVID-19; SARS-CoV-2; coronavirus; crystal structure; nucleocapsid

Year: 2020 PMID: 32654247 PMCID: PMC7405475 DOI: 10.1002/pro.3909

Source DB: PubMed Journal: Protein Sci ISSN: 0961-8368 Impact factor: 6.993

INTRODUCTION

Severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) is the third coronavirus to cross from an animal reservoir to infect humans in the twenty‐first century, after SARS‐CoV and Middle‐East respiratory syndrome (MERS) coronavirus. Isolation and sequencing of SARS‐CoV‐2 were reported in January 2020, and the virus was found to be highly related to SARS‐CoV and to share a probable origin in bats. Since its emergence in December 2019 in Wuhan, China, the virus has infected over 10.5 million people and caused more than 500,000 deaths as of July 1, 2020 (https://coronavirus.jhu.edu). The high infectivity of SARS‐CoV‐2 and the worldwide spread of this ongoing outbreak highlight the urgent need for public health measures and therapeutics to limit new infections. Moreover, the severity of the atypical pneumonia caused by SARS‐CoV‐2 (COVID‐19), often requiring multiweek hospital stays and the use of invasive ventilators, highlights the need for therapeutics to lessen the severity of individual infections.

Current therapeutic strategies against SARS‐CoV‐2 target major points in the life cycle of the virus. The antiviral remdesivir, first developed against Ebola virus, inhibits the viral RNA‐dependent RNA polymerases of a range of coronaviruses including SARS‐CoV‐2 and has shown promise against SARS‐CoV‐2 in small‐scale trials in both primates and humans. Another target is the viral protease (Mpro/3CLpro), which is required to process viral polyproteins into their active forms. Finally, the transmembrane spike (S) glycoprotein mediates binding to host cells through the angiotensin converting enzyme 2 (ACE2) and transmembrane protease, serine 2 (TMPRSS2) proteins, and mediates fusion of the viral and host cell membranes. As the most prominent surface component of the virus, the spike protein is the major target of antibodies in patients, and is the focus of several current efforts at SARS‐CoV‐2 vaccine development. Initial trials using antibody‐containing plasma of convalescent COVID‐19 patients have also shown promise in lessening the severity of the disease.

While the above efforts target viral entry, RNA synthesis, and protein processing, there has so far been less emphasis on other steps in the viral life cycle. One critical step in coronavirus replication is the assembly of the viral genomic RNA and nucleocapsid (N) protein into a ribonucleoprotein (RNP) complex, which interacts with the membrane (M) protein and is packaged into virions. Electron microscopy analysis of related betacoronaviruses has suggested that the RNP complex adopts a helical filament structure, but recent cryoelectron tomography analysis of intact SARS‐CoV‐2 virions has revealed a beads‐on‐a‐string‐like arrangement of globular RNP complexes that sometimes assemble into stacks resembling helical filaments. Although the nucleocapsid protein is located within the viral particle rather than on its surface, patients infected with SARS‐CoV‐2 show higher and earlier antibody responses to it than to the surface spike protein. As such, a better understanding of the SARS‐CoV‐2 N protein's structure, and structural differences between it and N proteins of related coronaviruses including SARS‐CoV, may aid the development of sensitive and specific immunological tests.

Coronavirus N proteins possess a shared domain structure with an N‐terminal RNA‐binding domain and a C‐terminal domain responsible for dimerization. The assembly of the N protein into higher‐order RNP complexes is not well understood, but likely involves cooperative interactions between the dimerization domain and other regions of the protein, plus the bound RNA. Here, we present a high‐resolution structure of the SARS‐CoV‐2 N dimerization domain, revealing an intertwined dimer similar to that of related betacoronaviruses. We also analyze the self‐assembly properties of the SARS‐CoV‐2 N protein, and show that higher‐order assembly requires both the dimerization domain and the extended, disordered C‐terminus of the protein. Together with other work revealing the structure and RNA‐binding properties of the nucleocapsid N‐terminal domain, these results lay the groundwork for a comprehensive understanding of SARS‐CoV‐2 nucleocapsid assembly and architecture.

RESULTS

Structure of the

Betacoronavirus nucleocapsid (N) proteins share a common overall domain structure, with ordered RNA‐binding (N1b or N‐terminal domain/NTD) and dimerization (N2b or C‐terminal domain/CTD) domains separated by short regions with high predicted disorder (N1a, N2a, and spacer B/N3; Figure 1a). Self‐association of the full‐length SARS‐CoV N protein and the isolated C‐terminal region (domains N2b plus spacer B/N3; residues 210–422) was first demonstrated by yeast two‐hybrid analysis, and the purified full‐length protein was shown to self‐associate into predominantly dimers in solution. The structures of the N2b domain of SARS‐CoV and several related coronaviruses confirmed the obligate homodimeric structure of this domain, and other work showed that the region C‐terminal to this domain mediates further self‐association into tetramer, hexamer, and potentially higher oligomeric forms. Other studies have suggested that the protein's N‐terminal region, including the RNA‐binding N1b domain, can also self‐associate, highlighting the possibility that assembly of full‐length N into helical filaments is mediated by cooperative interactions among several interfaces.

FIGURE 1

Structure of the severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) nucleocapsid dimerization domain. (a) Domain structure of the SARS‐CoV‐2 nucleocapsid protein, as defined previously, with plot showing the Jalview alignment conservation score (three‐point smoothed; gray) and DISOPRED3 disorder propensity (red) for nine related coronavirus N proteins (SARS‐CoV, SARS‐CoV‐2, Middle‐East respiratory syndrome [MERS]‐CoV, HCoV‐OC43, HCoV‐HKU1, HCoV‐NL63, HCoV‐229E, infectious bronchitis virus [IBV], and Murine Hepatitis virus [MHV]). SR, serine/arginine rich domain; SB, spacer B. The boundary between SB and N3 is not well defined due to low identity between SARS‐CoV/SARS‐CoV‐2 and MHV N proteins. All purified truncations are noted at bottom. (b) Top‐down view of the SARS‐CoV‐2 N2b dimer, with one monomer colored as a rainbow (N‐terminus blue, C‐terminus red) and the other colored white. See Figure S1a for comparison with other structures of this domain. (c) Structural overlay of the SARS‐CoV‐2 N2b dimer (blue) and the equivalent domain of SARS‐CoV N (PDB ID 2CJR).

To characterize the structure and self‐assembly properties of the SARS‐CoV‐2 nucleocapsid, we first cloned and purified the protein's N2b dimerization domain (N2b; residues 247–364). We crystallized and determined two high‐resolution crystal structures of N2b: a 1.45 Å resolution structure of His6‐tagged N2b at pH 8.5, and a 1.42 Å resolution structure of N2b after His6‐tag cleavage, at pH 4.5 (see Materials and Methods and Table S1). These structures reveal a compact, tightly intertwined dimer with a central four‐stranded β‐sheet comprising the bulk of the dimer interface (Figure 1b). This interface is composed of two β‐strands and a short α‐helix from each protomer that extend toward the opposite protomer and pack against its hydrophobic core. The asymmetric units of both structures contain two N2b dimers, giving four crystallographically independent views of the N2b dimer. These four dimers differ only slightly, showing overall Cα r.m.s.d. values of 0.15–0.19 Å and with most variation arising from loop regions (Figure S1a). Our structures also overlay closely with four other recently deposited structures of the SARS‐CoV‐2 N2b domain (PDB IDs 6WJI, 6YUN, 6ZCO, and 7C22; all unpublished). One of these structures (PDB ID 7C22) adopts the same space group and unit cell parameters as our structure of untagged N2b. Including all of these structures, there are now nine independent crystallographic views of the SARS‐CoV‐2 N2b domain dimer (17 total protomers; the 6ZCO dimer is assembled from crystal symmetry) in five crystal forms at pH 4.5, 7.5, 7.8, and 8.5 (crystallization pH for 6ZCO is not reported). All of these structures overlay closely, with an overall Cα r.m.s.d. of 0.15–0.31 Å (Figure S1a).

The structure of N2b closely resembles that of related coronaviruses, including SARS‐CoV, infectious bronchitis virus, MERS‐CoV, and HCoV‐NL63. The structure is particularly similar to that of SARS‐CoV, with which the N2b domain shares 96% sequence identity; only five residues differ between these proteins' N2b domains (SARS‐CoV Gln268 ➔ SARS‐CoV‐2 Ala267, Asp291 ➔ Glu290, His335 ➔ Thr334, Gln346 ➔ Asn345, and Asn350 ➔ Gln349), and the structures are correspondingly similar with an overall Cα r.m.s.d. of 0.314 Å across the N2b dimer (Figure 1c).

A crystal structure of the SARS‐CoV N protein revealed a helical assembly of N2b domain dimers that was proposed as a possible structure for the observed helical nucleocapsid filaments in virions. We therefore examined the packing of N2b domain dimers in the six crystal structures of this domain, five of which show distinct space groups and unit cell parameters. We identified two dimer–dimer packing modes that appear in multiple crystal forms, with packing mode 1 appearing in five structures, and packing mode 2 appearing in four (Figure S1b). Neither of these packing modes would result in the assembly of a helical filament if repeated, nor do the dimer–dimer interfaces strongly correlate with conserved surfaces on the N2b domain. This evidence, combined with our finding that N2b forms solely dimers in solution (see below), suggests that packing of N2b domain dimers does not underlie higher‐order assembly of SARS‐CoV‐2 N protein filaments.
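The Cα r.m.s.d. values quoted in this section are root-mean-square distances between equivalent Cα atoms of two structures. The following is a minimal sketch of that final calculation, assuming the two coordinate sets have already been optimally superposed and paired residue by residue (the superposition step itself, normally done by a structure-alignment program, is not shown). The function name `ca_rmsd` and the toy coordinates are illustrative only and are not taken from the deposited structures.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in the input units, e.g. Å) between two
    equal-length lists of (x, y, z) C-alpha coordinates.

    Assumption: the two structures are already superposed and the atoms are
    listed in the same residue order in both lists.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))

# Toy three-atom example (hypothetical coordinates, not real structural data).
toy_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_b = [(0.0, 0.3, 0.0), (3.8, 0.0, 0.4), (7.6, 0.0, 0.0)]
toy_rmsd = ca_rmsd(toy_a, toy_b)
print(f"toy C-alpha r.m.s.d. = {toy_rmsd:.3f} Å")
```

Because the squared distances are averaged before the square root is taken, a few flexible loop residues can dominate the value, which is consistent with the observation that most variation among the N2b dimers arises from loop regions.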

N protein variation in

Since the first genome sequence of SARS‐CoV‐2 was reported in January 2020, over 38,000 full genomic sequences have been deposited in public databases (as of June 3, 2020). To examine the variability of the N protein in these sequences, we downloaded a comprehensive list of reported mutations within the SARS‐CoV‐2 N gene in a set of 38,318 genome sequences from the China National Center for Bioinformation, 2019 Novel Coronavirus Resource. Among these sequences, there are 10,979 instances of amino acid substitutions spread across 250 of the 419 amino acids of the N protein (Figure 2a, Table S2). While many of these substitutions arise only once in our dataset and may therefore reflect errors in sequencing or sequence assembly, most of them likely reflect true variation among circulating strains of SARS‐CoV‐2. As a whole, the reported substitutions are enriched in the three intrinsically disordered domains (N1a, N2a, and spacer B/N3), with a particularly high density of substitutions in the serine/arginine‐rich subdomain of N2a (SR in Figure 2a). The most common substitutions are R203K and G204R, which occur together as the result of a common trinucleotide substitution in genomic positions 28,881–28,883, from GGG to AAC (~4,100 of the 38,318 sequences in our dataset; Figure S2a,b). While positions 203 and 204 accounted for over two‐thirds of the total individual amino acid substitutions in this dataset, the N2a domain shows a strong enrichment of mutations even when these positions are not considered (Figure 2a). In contrast to the enrichment of missense mutations in the N2a domain, synonymous mutations were distributed relatively equally throughout the protein (Figure S2c, Table S2). Thus, these data suggest that the N2a domain is uniquely tolerant of mutations, in keeping with its likely structural role as a disordered linker between the RNA‐binding N1b domain and the N2b dimerization domain.
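To illustrate how a single trinucleotide substitution can produce two adjacent amino acid changes, the sketch below maps genomic positions 28,881–28,883 onto N gene codons and translates the affected codons before and after the GGG to AAC change. Two inputs are assumptions not stated in the text: the N open reading frame is taken to start at genomic position 28,274 (reference genome NC_045512.2), and the reference codons 203 and 204 are taken to be AGG and GGA. The codon table is deliberately limited to the four codons needed here.

```python
# Assumption: 1-based genomic coordinate of the first nucleotide of the N ORF
# in the reference genome (NC_045512.2). Not stated in the text.
N_ORF_START = 28274

# Minimal codon table covering only the codons used in this illustration.
CODON_TABLE = {"AGG": "R", "GGA": "G", "AAA": "K", "CGA": "R"}

def codon_number(genome_pos, orf_start=N_ORF_START):
    """1-based codon index within the ORF for a 1-based genomic position."""
    return (genome_pos - orf_start) // 3 + 1

def apply_substitution(window, window_start, sub_start, new_bases):
    """Replace bases in `window` (whose first base sits at genomic position
    `window_start`) beginning at genomic position `sub_start`."""
    i = sub_start - window_start
    return window[:i] + new_bases + window[i + len(new_bases):]

def amino_acid_changes(ref_window, mut_window, window_start):
    """List substitutions such as 'R203K' for an in-frame window."""
    changes = []
    for k in range(0, len(ref_window), 3):
        ref_aa = CODON_TABLE[ref_window[k:k + 3]]
        mut_aa = CODON_TABLE[mut_window[k:k + 3]]
        if ref_aa != mut_aa:
            changes.append(f"{ref_aa}{codon_number(window_start + k)}{mut_aa}")
    return changes

# Assumed reference codons 203-204 (AGG GGA), spanning positions 28880-28885.
window_start = N_ORF_START + 3 * (203 - 1)
ref_window = "AGGGGA"
mut_window = apply_substitution(ref_window, window_start, 28881, "AAC")
changes = amino_acid_changes(ref_window, mut_window, window_start)
print(mut_window, changes)
```

Under these assumptions the substitution spans the last two bases of codon 203 and the first base of codon 204, which is why it is scored as two separate amino acid substitutions (R203K and G204R) in each affected genome.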

FIGURE 2

N protein variability in severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) patient sequences. (a) Top: Plot showing the number of observed amino acid variants at each position in the N gene in 38,318 SARS‐CoV‐2 genomes (details in Table S2). The most highly‐mutated positions are R203 and G204, which are each mutated more than 4,000 times due to a prevalent trinucleotide substitution (Figure S2a,b). Bottom: Plots showing amino acid variants in the N1b and N2b domains. (b) Surface views of the N protein N1b domain (PDB ID 6VYO; Center for Structural Genomics of Infectious Diseases [CSGID], unpublished). At left, blue indicates RNA‐binding residues identified by NMR peak shifts (A50, T57, H59, R92, I94, S105, R107, R149, and Y172). At right, two views colored by the number of variants at each position observed in a set of 38,318 SARS‐CoV‐2 genomes. The two most frequently mutated residues are shown in stick view and labeled. Only one mutation (A50E, observed in one sequence) overlaps the putative RNA binding surface. (c) Cartoon view of the N protein N2b domain, with one monomer colored gray and the other colored by the number of variants at each position observed in a set of 38,318 SARS‐CoV‐2 genomes. The four most frequently mutated residues are shown in stick view and labeled.

While the majority of N protein mutations are in the N2a domain, we nonetheless identified 345 instances of amino acid variants in the RNA‐binding N1b domain, and 315 instances in the N2b domain. We mapped these onto high‐resolution structures of both domains (Figure 2b,c). Two high‐resolution crystal structures of the SARS‐CoV‐2 N1b domain have been determined (PDB ID 6M3M and 6VYO), and a recent NMR study determined a solution structure of the domain and defined its likely RNA binding surface (Figure 2b). In keeping with its functional importance, the identified RNA binding surface shows only a single mutation in this dataset (Figure 2b; middle panel). In the N2b domain, most mutations occur on surface residues, particularly in loop regions, while the functionally important dimer interface is largely invariant (Figure 2c).

Finally, the 38,318 SARS‐CoV‐2 genome sequences contain nine sequences with reported nonsense/premature stop codons in the N protein. Two of these are located at position 256 within the N2b domain, while the remaining seven are located in the spacer B/N3 region between positions 372 and 418 (Table S2).

Self‐association of the

Our structures of the SARS‐CoV‐2 N protein N2b domain reveal that, as in related coronaviruses, this domain mediates homodimer formation. We next systematically investigated the molecular basis for higher‐order self‐assembly of the SARS‐CoV‐2 nucleocapsid. We first purified the full‐length N protein (NFL) for analysis of its oligomeric state. While our initial attempts at purification yielded large aggregates significantly contaminated with nucleic acid (Figure S3a), purification of the protein in high‐salt buffer (1 M NaCl) and in the presence of both DNase and RNase yielded pure NFL (Figure S3b). Size exclusion chromatography coupled to multiangle light scattering (SEC‐MALS) of purified NFL revealed a heterogeneous population that is predominantly homotetrameric (Figure 3a).

FIGURE 3

The C‐terminus of N mediates tetramer formation. (a) Size exclusion chromatography (Superose 6 Increase 10/300 GL; void volume = 8.4 ml, total volume = 20.5 ml) coupled to multiangle light scattering (SEC‐MALS) analysis of full‐length severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) N. The measured MW of 190.0 kDa closely matches that of a tetramer (182.5 kDa). See Figure S3b for SDS‐PAGE analysis of all purified proteins. (b) SEC‐MALS (Superdex 200 Increase 10/300 GL; void volume = 7.3 ml, total volume = 20.6 ml; used for Panels (b–f)) analysis of SARS‐CoV‐2 N1ab (residues 2–175). The measured MW of 20.8 kDa closely matches that of a monomer (18.9 kDa). dRI, differential refractive index. (c) SEC‐MALS analysis of SARS‐CoV‐2 N1ab2a (residues 2–246). The measured MW of 25.0 kDa is slightly less than that of a monomer (26.2 kDa), reflecting partial proteolysis within the N2a domain (Figure S3b). (d) SEC‐MALS analysis of SARS‐CoV‐2 N2b. The measured MW (31.5 kDa) closely matches that of a homodimer (26.5 kDa). (e) SEC‐MALS analysis of SARS‐CoV‐2 N2b3. The measured MW (75.6 kDa) closely matches that of a homotetramer (77.4 kDa). (f) SEC‐MALS analysis of maltose binding protein (MBP)‐SARS‐CoV‐2 N3 (“Peak 1” black/dark blue; “Peak 2” gray/light blue). The measured MWs of Peak 1 (101.9 kDa) and Peak 2 (48.9 kDa) closely match those of a homodimer (101.7 kDa) and a monomer (50.9 kDa), respectively. The small peak at 10.5 ml suggests higher‐order self‐assembly. (g) Schematic summary of size exclusion and SEC‐MALS results on N protein constructs. See Figure S3c,d for SEC‐MALS analysis of N1b (residues 49–175) and N1b2a (residues 49–246). (h) Possible configurations of a SARS‐CoV‐2 N protein tetramer. Dimerization is mediated by the N2b domain, and these dimers self‐associate through the N3 region to form homotetramers. Left: Parallel arrangement of the putative N3 domain α‐helices; Right: antiparallel arrangement.

To determine the molecular basis for homotetramer assembly, we purified a series of truncation constructs encompassing the ordered N1b and N2b domains and their associated linker domains (N1a, N2a, and spacer B/N3; Figure 1a). We characterized four truncations encompassing the protein's N‐terminal regions, including N1ab (residues 2–175), N1b (residues 49–175), N1ab2a (residues 2–246), and N1b2a (residues 49–246). All four of these truncations are monomeric in solution as determined by SEC‐MALS (Figures 3b,c and S3c,d). We next analyzed N2b, which forms a homodimer in our crystal structures. As expected, N2b is dimeric in solution (Figure 3d).

Finally, we analyzed the contribution of the C‐terminal spacer B/N3 region to N protein self‐assembly. Prior work with the Murine Hepatitis Virus N protein showed that this region can, on its own, incorporate into nucleocapsid structures that lack the associated Membrane (M) protein, suggesting that the region mediates a homotypic interaction between N proteins. Other work with SARS‐CoV and HCoV‐229E N proteins also found that the C‐terminal spacer B/N3 region is required for higher‐order assembly of tetramers and larger oligomers. We purified a construct encoding N2b and the spacer B/N3 region (N2b3, residues 247–419) and found that it forms a homotetramer (Figure 3e). We also analyzed self‐assembly of the spacer B/N3 region on its own by performing SEC‐MALS analysis on this isolated region (N3, residues 365–419) fused to a His6‐maltose binding protein (MBP) tag. Initial purification of His6‐MBP‐N3 yielded two peaks on the final size exclusion column, which we separately pooled and analyzed by SEC‐MALS. We found that these two peaks correspond to a dimer and a monomer, respectively (Figure 3f). The pooled dimer population also showed a small population of potentially higher‐order oligomers (Figure 3f).
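The oligomeric-state assignments in Figure 3 follow from dividing each SEC‐MALS molecular weight by the calculated monomer mass and rounding to the nearest whole number of subunits. The sketch below reproduces that arithmetic using only the measured and expected masses quoted in the Figure 3 legend; the monomer mass for each construct is back-calculated from the expected oligomer mass, and the dictionary and function names are illustrative.

```python
# (measured MW in kDa, expected MW in kDa for the assigned state, assigned subunits)
# All numbers are taken from the Figure 3 legend.
SEC_MALS = {
    "NFL":           (190.0, 182.5, 4),
    "N1ab":          (20.8, 18.9, 1),
    "N1ab2a":        (25.0, 26.2, 1),
    "N2b":           (31.5, 26.5, 2),
    "N2b3":          (75.6, 77.4, 4),
    "MBP-N3 peak 1": (101.9, 101.7, 2),
    "MBP-N3 peak 2": (48.9, 50.9, 1),
}

def estimate_subunits(measured_kda, monomer_kda):
    """Return (ratio, nearest whole number of subunits, at least 1)."""
    ratio = measured_kda / monomer_kda
    return ratio, max(1, round(ratio))

estimates = {}
for name, (measured, expected, assigned) in SEC_MALS.items():
    monomer = expected / assigned          # back-calculated monomer mass
    ratio, subunits = estimate_subunits(measured, monomer)
    estimates[name] = subunits
    print(f"{name:14s} measured/monomer = {ratio:4.2f} -> {subunits}-mer")
```

The ratio for N2b (about 2.4) deviates most from a whole number, but it still rounds to a dimer, in agreement with the crystal structures.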
Together, these data suggest that assembly of betacoronavirus N protein filaments likely proceeds through at least three steps, each mediated by different oligomerization interfaces: (a) dimerization mediated by the N2b domain; (b) tetramerization mediated by the spacer B/N3 region (Figure 3g,h); and (c) oligomer/filament assembly mediated by cooperative RNA binding and potential higher‐order self‐association of N homotetramers. To gain structural insight into how the spacer B/N3 region mediates N protein self‐association, we performed hydrogen‐deuterium exchange mass spectrometry (HDX‐MS) on N2b and N2b3 (Figure 4). By probing the rate of exchange of amide hydrogen atoms with deuterium atoms in a D2O solvent, HDX‐MS provides information on the level of secondary structure and solvent accessibility across an entire protein. We found that H‐D exchange rates within N2b largely agreed with our crystal structure: regions in β‐strands or α‐helices showed low exchange rates consistent with high order, while loop regions showed increased exchange rates consistent with their likely flexibility (Figure 4a,c,d).

FIGURE 4

Hydrogen‐deuterium exchange mass spectrometry (HDX‐MS) analysis of N2b and N2b3. (a) Schematic showing the N2b sequence and structure, plus protein regions detected by HDX‐MS. Each peptide is colored by its fractional deuterium uptake during the course of the experiment (blue–white–magenta = 0–100% fractional uptake). (b) Schematic showing the N2b3 sequence and inferred structure (the α‐helix spanning residues 400–416 is predicted by PSI‐PRED), plus protein regions detected by HDX‐MS. Two sets of exchange rates are shown: fractional deuterium uptake in N2b3 (upper box) colored as in Panel (a), and relative uptake comparing N2b and N2b3 (lower box). (c) Structure of the N2b dimer, with one monomer colored by fractional deuterium uptake (blue–white–magenta = 0–75% fractional uptake). (d) Uptake plots for two peptides within the ordered N2b domain, with uptake in N2b indicated in blue and uptake in N2b3 indicated in green. The peptide covering residues 323–329 (located within a loop) is relatively exposed, while the peptide covering residues 330–336 (within a β‐strand) is strongly protected from H‐D exchange. (e) Uptake plots for three peptides in the C‐terminal region of N2b3, plotted by fractional deuterium uptake. Peptides covering residues 395–402 (yellow) and 403–411 (red) show more protection than residues 404–419, suggesting that this region is partially structured. See Figure S4a for each peptide plotted by absolute deuterium uptake.

Compared to N2b, N2b3 contains an additional 55 amino acids (residues 365–419). While residues 360–394 were not detected in our HDX‐MS analysis, we detected spectra for seven overlapping peptides spanning residues 395–419 at the protein's extreme C‐terminus (Figure 4b). While all of these peptides exhibited higher levels of exchange than the ordered N2b domain, peptides spanning the N‐terminal part of this region (particularly residues 395–402) showed a degree of protection compared to those at the extreme C‐terminus (residues 404–419; Figure 4e). This finding suggests that at least part of the spacer B/N3 domain possesses secondary structure and may mediate N2b3 tetramer formation. Indeed, a recent molecular dynamics simulation of the N3 domain suggests the existence of an α‐helix spanning residues 400–411 in this domain, and our own analysis using the PSI‐PRED server suggests an α‐helix spanning residues 400–416 (Figure 1a).

We next compared HDX‐MS exchange rates of N2b versus N2b3 for peptides within the N2b domain. We reasoned that if the C‐terminus of N2b3 mediates tetramer formation, it may do so by docking against a surface in the N2b domain, which may be detectable by reduced deuterium uptake in the involved region. Contrary to this expectation, we found that the H‐D exchange rates within the N2b domain were nearly identical between the two constructs, varying at most 2% in fractional deuterium uptake in individual peptides (Figure 4b,d). While these data do not rule out the possibility that the spacer B/N3 region docks against N2b, they nonetheless support our SEC‐MALS data showing that spacer B/N3 independently self‐associates to mediate N protein tetramer formation. We attempted to test this idea by measuring the association of His6‐MBP‐N3 with N2b, N2b3, and NFL in a pull‐down assay (Figure S4b). We were unable to detect binding of His6‐MBP‐N3 to any of these three constructs. As both N2b3 and NFL likely exist as pre‐formed tetramers, their failure to interact with additional His6‐MBP‐N3 is not surprising. The inability of N2b to interact with His6‐MBP‐N3, however, is consistent with the idea that the spacer B/N3 domain self‐interacts rather than binding the N2b domain.

DISCUSSION

Given the severity of the ongoing COVID‐19 pandemic, a deep understanding of the SARS‐CoV‐2 life cycle is urgently needed. Here, we examine the architecture and self‐assembly properties of the SARS‐CoV‐2 nucleocapsid protein, a key player in viral replication responsible for packaging viral RNA into new virions. Through two high‐resolution crystal structures, we show that this protein's N2b domain forms a compact, strand‐swapped dimer similar to that of related betacoronaviruses. While the N2b domain mediates dimer formation, we find that addition of the C‐terminal spacer B/N3 domain mediates formation of a robust homotetramer. We envision two possible modes of N protein tetramer assembly based on either parallel or antiparallel arrangement of the putative α‐helices in the N3 domain (Figure 3h). How these tetramers interact with viral RNA and self‐assemble into either helical filaments or the more recently observed globular viral RNP complexes will require higher‐level reconstitution and/or high‐resolution analysis of the internal structure of SARS‐CoV‐2 virions. Given the importance of nucleocapsid‐mediated RNA packaging to the viral life cycle, small molecules that inhibit nucleocapsid self‐assembly or mediate aberrant assembly may be effective at reducing the severity of infections and the infectivity of patients. The high resolution of our crystal structures will enable their use in virtual screening efforts to identify such nucleocapsid assembly modulators. Given the high conservation of the N2b domain in betacoronaviruses, these assembly modulators may also be effective at countering related viruses including SARS‐CoV. As SARS‐CoV‐2 is unlikely to be the last betacoronavirus to jump from an animal reservoir to humans, the availability of such treatments could drastically alter the course of future epidemics. 
The SARS‐CoV‐2 genome has been subject to unprecedented scrutiny, with over 38,000 individual genome sequences deposited in public databases as of early June 2020. We used this set of genome sequences to identify over 10,000 instances of amino acid substitutions in the N protein, and showed that these variants are strongly clustered in the protein's N2a linker domain. The ~650 substitutions we identified in the N1b and N2b domains were clustered away from these domains' RNA binding and dimerization interfaces, reflecting the functional importance of these surfaces. Given the early and strong antibody responses to the nucleocapsid displayed by SARS‐CoV‐2 infected patients, the distribution of mutations within this protein should be carefully considered as antibody‐based tests are developed. The high variability of the N2a domain means that individual patient antibodies targeting this domain may not be reliably detected with tests using the reference N protein, especially if these antibodies recognize residues 203 and 204, which are mutated in a large fraction of infections. At the same time, patient antibodies targeting the conserved N1b and N2b domains may in fact cross‐react with nucleocapsids of related coronaviruses like SARS‐CoV. The availability of a panel of purified N protein constructs now makes it possible to systematically examine the epitopes of both patient‐derived and commercial anti‐nucleocapsid antibodies.

MATERIALS AND METHODS

Cloning and protein purification

SARS‐CoV‐2 N protein constructs (NFL [residues 2–419], N1ab [2–175], N1ab2a [2–246], N1b [49–175], N1b2a [49–246], N2b [247–364], N2b3 [247–419]) were amplified by PCR from the IDT 2019‐nCoV N positive control plasmid (IDT cat. # 10006625; NCBI RefSeq YP_009724397) and inserted by ligation‐independent cloning into UC Berkeley Macrolab vector 2B‐T (AmpR, N‐terminal His6‐fusion; Addgene #29666) for expression in Escherichia coli. N3 (residues 365–419) was similarly inserted into UC Berkeley Macrolab vector 2C‐T (AmpR, N‐terminal His6‐MBP fusion; Addgene #29706). Plasmids were transformed into E. coli strain Rosetta 2(DE3) pLysS (Novagen), and cultures were grown in the presence of ampicillin and chloramphenicol to an OD600 of 0.8 at 37°C, induced with 0.25 mM IPTG, then grown for a further 16 hr at 18°C prior to harvesting by centrifugation. Harvested cells were resuspended in buffer A (25 mM Tris–HCl pH 7.5, 5 mM MgCl2, 10% glycerol, 5 mM β‐mercaptoethanol, 1 mM NaN3) plus 500 mM NaCl (1 M NaCl for NFL) and 5 mM imidazole pH 8.0. For purification, cells were lysed by sonication, then clarified lysates were loaded onto a Ni2+ affinity column (Ni‐NTA Superflow; Qiagen), washed in buffer A plus 300 mM NaCl and 20 mM imidazole pH 8.0, and eluted in buffer A plus 300 mM NaCl and 400 mM imidazole. For cleavage of His6‐tags, proteins were buffer exchanged in centrifugal concentrators (Amicon Ultra, EMD Millipore) to buffer A plus 300 mM NaCl and 20 mM imidazole, then incubated 16 hr at 4°C with TEV protease. Cleavage reactions were passed through a Ni2+ affinity column again to remove uncleaved protein, cleaved His6‐tags, and His6‐tagged TEV protease. Proteins were concentrated in centrifugal concentrators and purified by size‐exclusion chromatography (Superdex 200; GE Life Sciences) in gel filtration buffer (25 mM Tris–HCl pH 7.5, 300 mM NaCl, 5 mM MgCl2, 10% glycerol, 1 mM DTT). Purified proteins were concentrated and stored at 4°C for analysis.

Size exclusion chromatography coupled to multiangle light scattering

For SEC‐MALS, 100 μl of purified protein at 2–5 mg/ml was injected onto a Superdex 200 Increase 10/300 GL column or Superose 6 Increase 10/300 GL column (GE Life Sciences) in gel filtration buffer. Light scattering and refractive index profiles were collected by miniDAWN TREOS and Optilab T‐rEX detectors (Wyatt Technology), respectively, and molecular weight was calculated using ASTRA v. 6 software (Wyatt Technology).

Hydrogen‐deuterium exchange mass spectrometry

H‐D exchange experiments were conducted with a Waters Synapt G2S system. Samples (5 μl) containing 10 μM protein in gel filtration buffer were mixed with 55 μl of the same buffer made with D2O for several deuteration times (0 s, 1 min, 2 min, 5 min, 10 min) at 15°C. The exchange was quenched for 2 min at 1°C with an equal volume of quench buffer (3 M guanidine HCl, 0.1% formic acid). Proteins were cleaved with pepsin and separated by reverse‐phase chromatography, then directed into a Waters Synapt G2S quadrupole time‐of‐flight mass spectrometer. Peptides were identified using PLGS version 2.5 (Waters, Inc.), deuterium uptake was calculated using DynamX version 2.0 (Waters Corp.), and uptake was corrected for back‐exchange using DECA. Uptake plots were generated in Prism version 8.
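As an illustration of how fractional deuterium uptake can be derived from a measured mass shift, the sketch below applies one common convention; it is not a description of how DynamX or DECA performs the calculation. The assumptions are: the number of exchangeable backbone amides in a peptide is its length minus the N‐terminal residue and minus any prolines; each incorporated deuterium adds approximately 1 Da; and uptake is normalized by the D2O fraction of the labeling mixture (5 μl sample plus 55 μl D2O buffer, as described above). The peptide sequence and uptake value are hypothetical.

```python
def d2o_fraction(sample_ul, d2o_buffer_ul):
    """Fraction of D2O in the labeling reaction after mixing."""
    return d2o_buffer_ul / (sample_ul + d2o_buffer_ul)

def exchangeable_amides(peptide):
    """Backbone amide hydrogens that can exchange (assumed convention):
    skip the first residue (free N-terminal amine) and prolines, which
    have no amide hydrogen."""
    return sum(1 for residue in peptide[1:] if residue != "P")

def fractional_uptake(uptake_da, peptide, d_fraction):
    """Measured uptake (Da, roughly 1 Da per deuterium) divided by the
    maximum uptake possible at the given D2O fraction."""
    n_amides = exchangeable_amides(peptide)
    if n_amides == 0:
        raise ValueError("peptide has no exchangeable backbone amides")
    return uptake_da / (n_amides * d_fraction)

# Labeling mixture from the protocol: 5 ul sample + 55 ul D2O buffer.
labeling_fraction = d2o_fraction(5.0, 55.0)

# Hypothetical peptide and back-exchange-corrected uptake, for illustration only.
example_peptide = "AKPLEGSW"
example_uptake_da = 2.75
example_fraction = fractional_uptake(example_uptake_da, example_peptide, labeling_fraction)
print(f"D2O fraction = {labeling_fraction:.3f}, fractional uptake = {example_fraction:.2f}")
```

With this normalization, a peptide in a stable β‐strand or α‐helix would be expected to give a low fractional uptake over the 10 min time course, while a fully flexible peptide would approach 1.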

Crystallization and structure determination

For crystallization of untagged N2b, protein (40 mg/ml) in crystallization buffer (25 mM Tris–HCl pH 7.5, 200 mM NaCl, 5 mM MgCl2, and 1 mM Tris(2‐carboxyethyl)phosphine) was mixed 1:1 with well solution containing 100 mM sodium acetate pH 4.5, 50 mM sodium/potassium tartrate, and 34% polyethylene glycol (PEG) 3350 at 20°C in hanging drop format. For crystallization of His6‐tagged N2b, protein (40 mg/ml) in crystallization buffer was mixed 1:1 with well solution containing 100 mM Tris–HCl pH 8.5, 50 mM ammonium sulfate, and 38% PEG 3350 at 20°C in hanging drop format. Both untagged and His6‐tagged N2b formed large shard‐like crystals, and were frozen in liquid nitrogen directly from the crystallization drop without further cryoprotection. Diffraction data were collected at beamline 24ID‐E at the Advanced Photon Source. Diffraction datasets were processed with the RAPD pipeline (https://github.com/RAPD/RAPD/), which uses XDS for indexing and data reduction, and the CCP4 programs AIMLESS and TRUNCATE for scaling and conversion to structure factors. The structure of untagged N2b was determined by molecular replacement in PHASER using a dimer of the SARS‐CoV N2b domain (PDB ID 2GIB) as a template. The resulting model was manually rebuilt in COOT and refined in phenix.refine using positional, isotropic B‐factor, and TLS (one group per chain) refinement. The structure of His6‐tagged N2b was determined by molecular replacement from the structure of untagged N2b, then manually rebuilt and refined as above. Data collection statistics, refinement statistics, and database accession numbers for diffraction data and final structures can be found in Table S1. All structural figures were generated with PyMOL version 2.3.

Nickel pull‐down

Nickel pull‐down assays were performed in buffer A plus 300 mM NaCl and 10 mM imidazole pH 8.0. For each reaction, 10 μg bait (His6‐MBP‐N3) was mixed with an equal weight of each prey protein in 50 μl total reaction volume and incubated on ice for 30 min, after which a 5 μl “load” sample was removed. Then, 30 μl of a 50% slurry of Ni‐NTA Superflow beads (Qiagen) was added and the mixture was incubated with occasional mixing on ice for 30 min. Beads were washed three times with 1 ml buffer, then bound proteins were eluted with the addition of 30 μl buffer A plus 300 mM NaCl and 250 mM imidazole pH 8.0. For analysis, 10 μl of each eluate was run on SDS‐PAGE alongside the load samples.

Bioinformatics

To examine variation in SARS‐CoV‐2 sequences, we downloaded a list of variants in the N gene in 38,318 genome sequences from the China National Center for Bioinformation, 2019 Novel Coronavirus Resource (https://bigd.big.ac.cn/ncov?lang=en; downloaded June 3, 2020). We tabulated all missense and nonsense mutations, and graphed the number of amino acid variants at each codon in Prism version 8 (all variant frequencies are listed in Table S2). To examine the prevalence of the trinucleotide substitution at genome positions 28,881–28,883, we downloaded a set of 200 SARS‐CoV‐2 genomes from the NCBI Virus Resource (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=SARS‐CoV‐2%20taxid:2697049). We selected genomes with and without the substitution to show in Figure S2a. We used the NextStrain database to visualize the prevalence of the N protein G204R mutation, which is diagnostic of the GGG ➔ AAC trinucleotide substitution in positions 28,881–28,883. To visualize SARS‐CoV‐2 clade identity, we used the URL “https://nextstrain.org/ncov/global?c=clade_membership&l=unrooted.” To color by N protein residue 204 identity, we used the URL “https://nextstrain.org/ncov/global?c=gt-N_204&l=unrooted” (screenshots taken July 2, 2020).
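The link between genome positions 28,881–28,883 and N protein residue 204 can be checked with simple coordinate arithmetic. The sketch below is illustrative and was not part of the authors' analysis. It assumes the N coding sequence starts at genome position 28,274 (taken here from the NC_045512.2 reference annotation) and that reference codons 203 and 204 are AGG and GGA; neither value is stated in the text above, and both should be verified against the reference record before reuse.

```python
# Assumption: start of the N coding sequence in the reference genome (1-based).
N_CDS_START = 28274

# Minimal codon table covering only the codons used in this example.
CODON_TABLE = {"AGG": "R", "GGA": "G", "AAA": "K", "CGA": "R"}


def genome_to_codon(pos, cds_start=N_CDS_START):
    """Map a 1-based genome position to (codon number, position within codon)."""
    offset = pos - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence")
    return offset // 3 + 1, offset % 3 + 1


def codon_start(codon_number, cds_start=N_CDS_START):
    """Genome position of the first base of a codon."""
    return cds_start + (codon_number - 1) * 3


# Assumption: reference bases for codons 203-204 (AGG GGA).
window_start = codon_start(203)
reference = "AGGGGA"

# Apply the GGG -> AAC substitution at genome positions 28,881-28,883.
i = 28881 - window_start
assert reference[i:i + 3] == "GGG"
mutated = reference[:i] + "AAC" + reference[i + 3:]

ref_aa = [CODON_TABLE[reference[k:k + 3]] for k in (0, 3)]
mut_aa = [CODON_TABLE[mutated[k:k + 3]] for k in (0, 3)]

if __name__ == "__main__":
    for pos in (28881, 28882, 28883):
        print(pos, genome_to_codon(pos))
    print("reference:", reference, ref_aa)
    print("mutated:  ", mutated, mut_aa)
```

Under these assumptions the three substituted bases span the last two positions of codon 203 and the first position of codon 204, which is why a single trinucleotide change alters two adjacent residues and why residue 204 identity serves as a marker for it.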

AUTHOR CONTRIBUTIONS

Qiaozhen Ye: Investigation; methodology. Alan West: Investigation. Steve Silletti: Investigation. Kevin Corbett: Conceptualization; investigation; methodology; project administration; visualization; writing‐original draft; writing‐review and editing. Appendix S1: Supporting information.
