The highly conserved RNA-binding specificity of nucleocapsid protein facilitates the identification of drugs with broad anti-coronavirus activity.

Shaorong Fan¹,², Wenju Sun¹, Ligang Fan¹,²,³, Nan Wu¹,², Wei Sun¹, Haiqian Ma¹, Siyuan Chen⁴, Zitong Li⁴, Yu Li⁴, Jilin Zhang², Jian Yan¹,²,³.

Abstract

The binding of SARS-CoV-2 nucleocapsid (N) protein to both the 5'- and 3'-ends of genomic RNA has different implications arising from its binding to the central region during virion assembly. However, the mechanism underlying selective binding remains unknown. Herein, we performed the high-throughput RNA-SELEX (HTR-SELEX) to determine the RNA-binding specificity of the N proteins of various SARS-CoV-2 variants as well as other β-coronaviruses and showed that N proteins could bind two unrelated sequences, both of which were highly conserved across all variants and species. Interestingly, both sequences are virtually absent from the human transcriptome; however, they exhibit a highly enriched, mutually complementary distribution in the coronavirus genome, highlighting their varied functions in genome packaging. Our results provide mechanistic insights into viral genome packaging, thereby increasing the feasibility of developing drugs with broad-spectrum anti-coronavirus activity by targeting RNA binding by N proteins.

Entities: Chemical

Keywords: Conserved; Coronavirus; Nucleocapsid protein; RNA binding specificity; SARS-CoV-2; Virion assembly

Year: 2022 PMID: 36097552 PMCID: PMC9454191 DOI: 10.1016/j.csbj.2022.09.007

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155

Introduction

Following the first case reported in Wuhan, China, toward the end of 2019, COVID-19 was quickly recognized as a pandemic and has since infected >600 million people worldwide, claiming >6 million human lives as of September 2022 (according to WHO, https://covid19.who.int/). COVID-19 is caused by the novel coronavirus SARS-CoV-2, which belongs to the same β-coronavirus family as the deadly SARS-CoV and MERS-CoV that caused local outbreaks in 2003 and 2012, respectively. The SARS-CoV-2 genome comprises a 29,903-nt-long positive-sense, single-stranded RNA coding for 29 viral proteins that is flanked by noncoding RNA elements, beginning with a 265-nt-long 5′-untranslated region (UTR) and a few intergenic regions and ending with a 337-nt-long 3′ poly(A) tail [1]. The 29 viral genes code for four essential structural proteins, namely the spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins, as well as 25 nonstructural accessory proteins (NSP) that have not yet been fully characterized [2]. Upon entry into cells, SARS-CoV-2 usually hijacks cellular proteins for genome replication, protein synthesis and modifications, and virion particle packaging and release [3], [4].

To understand how viral proteins selectively bind viral RNAs, we performed high-throughput RNA systematic evolution of ligands by exponential enrichment (HTR-SELEX), a method previously used to profile the sequence-binding specificity of RNA-binding proteins [5], on all the viral proteins of SARS-CoV-2. Among all the viral proteins, strong and reproducible RNA sequence specificity was only observed for the nucleocapsid (N) protein, the most abundant protein in SARS-CoV-2 and a well-established nucleoprotein. Interestingly, we identified two dissimilar RNA sequence motifs that were equally favored by the N protein; their consensus sequences were as follows: “UCCGCUUGGCC” (hereafter, motif 1) and “UAAUAGCCGAC” (hereafter, motif 2) (Fig. 1a).
Both motifs were highly enriched in all 12 replicate experiments, which were performed by independent researchers using different batches of SELEX-input ligands and recombinant proteins (Extended Data Fig. 1). Dual binding specificity has been commonly observed for mammalian RNA-binding proteins (RBPs), as reported in multiple independent studies [5], [6], [7], [8]. To better represent the binding specificity (motif) of the N protein, we used the position weight matrix (PWM) for downstream analyses (Supplementary Table S1); the PWM is a commonly accepted quantitative model for describing the specific nucleic acid sequences preferred by a protein.

Fig. 1

RNA binding specificity of beta-coronavirus. a. Network analysis reveals the similarity of RNA binding specificity of RNA binding proteins, including all human RBPs with available motifs and nucleocapsid proteins of different beta-coronaviruses. Note that the subnetwork of coronavirus nucleocapsid proteins is disconnected from other subnetworks and zoomed in for clarification. Diamonds indicate individual motifs, and circles denote RBPs. The RBP is connected to its own binding motif. The edge between motifs is drawn if the SSTAT similarity score >1.0E−5. For details, please see Supplementary File S1. b. Network analysis of the similarity of RNA binding specificity of N proteins in different variants of SARS-CoV-2. The N protein of different SARS-CoV-2 variants is connected to its own binding motif. The edge between motifs is drawn if the SSTAT similarity score >1.0E−5. c. The comparison of binding site density of N protein on human transcriptome and viral genome. Left, binding density of motif 1. Right, binding density of motif 2. The ellipse illustrates binding density on respective viral genome, and the box plot represents binding density on human transcriptome. Green, BAT-COV; Red, MERS-COV; Blue, SARS-COV; Yellow, SARS-COV-2. d. The binding site density (KDE, bandwidth = 0.1) of N protein along SARS-CoV-2 genome. Upper, motif 1. Lower, motif 2. A binding site is significantly detected when the sequence matches the motif (P < 0.05). e. The binding site density (KDE, bandwidth = 0.1) of different N proteins along the corresponding viral genomes. Upper, motif 1. Lower, motif 2. A binding site is significantly detected when the sequence matches the motif (P < 0.05). Smoothened color curves represent different viruses. f. The cartoon illustrates the model of the process of viral genome packaging and virion particle assembly. Upper, the structure of N protein, including 5′-arm, RNA binding domain (RBD1), intrinsically disordered linker region (IDR), dimerization domain (RBD2) and 3′-arm. Lower, model of the viral genome packaging and virion assembly. In step 1, N protein forms a homodimer and binds to the 5′ and 3′UTRs. IDRs between different N proteins promote formation of LLPS. Subsequently in step 2, viral RNA full of motif 2 is exposed to more N proteins. Even binding of N proteins to motif 2 increases solubility, which prevents intermingling between large RNA molecules and dissolves the condensates. Finally, in step 3, the dissolved LLPS releases the packaged RNA genome and makes the N protein available to interact with other viral proteins, such as M and E proteins, to complete the virion assembly. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Since the first outbreak in late 2019, multiple SARS-CoV-2 variants have emerged and have successively become dominant. The efficiency of genome packaging has been reported to be critical to the viral life cycle [9]. Therefore, we asked whether the N protein mutations acquired by new variants promoted viral replication by enhancing the efficiency of RNA binding and virion assembly. To address this question, we analyzed the binding specificities of N proteins derived from three predominant SARS-CoV-2 variants: gamma, delta, and omicron (Supplementary Table S1; Extended Data Fig. 2a). Interestingly, we observed no overt changes in the RNA sequence specificity of the N proteins of any of these variants (Fig. 1b), suggesting that the overwhelming spread of new variants was not caused by alterations in the RNA binding by N proteins but more likely due to the enhanced recognition of S proteins by cellular receptors [10].

Besides SARS-CoV-2, multiple β-coronaviruses have demonstrated pathogenic characteristics, causing severe respiratory syndromes.
For example, SARS-CoV and MERS-CoV triggered deadly epidemics and garnered global attention in 2003 and 2012, respectively. To understand the evolutionary relationships among these viruses and compare their common and differential characteristics, we performed HTR-SELEX, a highly sensitive method that helps detect minor differences in nucleic acid-binding specificity [11], to examine the differential RNA-binding specificity of the N proteins of these viruses (Extended Data Fig. 2b). We also analyzed the bat coronavirus RaTG13 isolated from horseshoe bats (from a remote cave in Yunnan Province, BAT-CoV hereafter), by far the most closely related species (sequence similarity, 96%) to SARS-CoV-2 but unable to infect humans [12], [13]. Each experiment was conducted with at least four independent replicates (Extended Data Fig. 2c). Surprisingly, the N proteins from all these coronaviruses consistently displayed the same dual binding specificity to RNA sequences, identical to that observed for SARS-CoV-2 (Fig. 1a), indicating a highly conserved mode of virion packaging.

Furthermore, we compared the two motifs in terms of sequence specificity to human RBPs profiled using HTR-SELEX [5]; however, no significant similarity was observed with any of the known motifs (Fig. 1a; Supplementary File S1). Recently, by analyzing the binding modes of 144 RBPs across 202 datasets, Laverty et al. found that the single-stranded RNA sequence played a primary role in determining the binding specificity of most RBPs, although the secondary structure might also contribute [7]. The lack of sequence motif similarity between human RBPs and viral N proteins suggested that this highly unique binding specificity could enable the N protein to rapidly recognize viral RNA in the extremely complex pool of host cellular RNAs, even though both transcript types are generated by the host cell machinery.
Such a feature is vital for efficient viral genome assembly and particle replication. To further test this hypothesis, we scanned both the human cellular transcriptome and the four viral genomes against the two motifs. As expected, binding sites matching either of the motifs were virtually absent from the human transcriptome, with only a few binding sites occasionally present in some human RNA types (Supplementary Table S2). In contrast, hundreds of strong binding sites were significantly detected for both motifs in all four viral genomes, consistent with the fact that approximately 720–2,200 nucleocapsid protein molecules are associated with one copy of viral RNA genome per virion [14], [15], [16]. In comparison, the density of binding sites for both motif 1 and motif 2 is approximately 35–50 times higher in the viral genome than in the human transcriptome (Fig. 1c). Our results also suggest that the binding of N protein to both motifs is equally important for the life cycle of the virus. It is noteworthy that the case fatality rate of a β-coronavirus appears to be associated with the density of motif 1 sites in its genome. MERS-CoV, which has the highest case fatality rate among all coronaviruses, has the highest density of motif 1-binding sites in its genome, followed by SARS-CoV and then SARS-CoV-2 (Fig. 1c).

Next, we probed how the binding specificity of N proteins contributed to viral genome assembly and virion packaging. We scanned SARS-CoV-2 genomic RNA against each motif and constructed a density plot of the matched binding sites along the genome. Interestingly, the two motifs exhibited mutually complementary binding patterns, i.e., the highest density of motif 1 was found in the 5′- and 3′-UTRs of the RNA genome, where the relative frequency of motif 2 was low; in contrast, motif 2, but not motif 1, was consistently enriched in the central region, ranging from approximately the 5th to the 25th kilobase (Fig. 1d).
The differential distribution of the two motifs underscores the potentially different roles of the UTRs and the central region in viral genome packaging. Studies have suggested that phase separation is involved in selecting the single RNA genome and ensuring its compactness [17], [18], [19]. In such a model, the binding of N proteins to the 5′- and 3′-UTRs promotes liquid–liquid phase separation (LLPS), whereas the association of N proteins with the central region enhances the fluidity and solubilization of the viral genome and limits its entanglement with other large RNA molecules, thereby increasing the packaging efficiency [20], [21]. In coronaviruses in particular, when the viral genome is packaged, the association of N proteins with the central regions dissolves the condensates and anchors the single RNA genome onto the viral membrane via interactions between the N protein and the M protein [17], [22].

Although the RNA-binding specificity of N proteins is highly conserved among coronaviruses, the genomic sequence composition could be highly diverse. Thus, coronaviruses could still undergo genome packaging in different ways. For comparison, we mapped the N protein-binding site densities along individual genomes and noted that the distributions of both binding site types were highly similar to those of SARS-CoV-2 (Fig. 1e), confirming that β-coronaviruses could go through a conserved packaging process mediated by the dual binding specificities of N proteins to different viral genome regions.

The structure of the SARS-CoV-2 N protein has been determined; it contains two highly structured domains, an N-terminal RNA-binding domain (RBD1) and a C-terminal dimerization domain that also possesses RNA-binding ability (RBD2), separated by three intrinsically disordered regions (IDRs) [23], [24], [25], [26] (Fig. 1f). We suspected that RBD1 and RBD2 were responsible for recognition of motifs 1 and 2, respectively.
Therefore, we individually cloned the two domains into an E. coli expression vector and isolated soluble proteins for HTR-SELEX analysis (Supplementary Table S3; Extended Data Fig. 3). Each experiment was independently conducted at least three times. We barely observed any enrichment of either motif in any of the replicate experiments (data not shown), which indicated that RBD1 or RBD2 alone could not stably bind to any specific RNA sequence. Given the role of RBD2 in homodimerization of the N protein, our results suggest that N protein dimerization is also required for its binding to either motif. This difference in the sequence specificity can be attributed to certain topologies of the N protein homodimers. To this end, we propose a model stating that during virion particle replication initiation, N protein dimers bind the 5′- and 3′-UTRs, provoking LLPS via IDR aggregation (linker region) and consequently circularizing the RNA genome. Such a change results in the exposure of the central region for subsequent coating with more nucleocapsid proteins, consequently increasing the fluidity and solubility of the N protein–RNA complex and ultimately allowing N proteins to interact with other structural and accessory components, facilitating the virion particle assembly (Fig. 1f).

To control the severe health and economic impacts of COVID-19, extensive efforts have been made to identify effective drugs that can prevent its spread and ease its symptoms. Many antiviral drugs, including monoclonal antibodies, target the S protein. Owing to the fast mutation rate of SARS-CoV-2, the emergence of new variants can quickly reduce drug efficacy [27], [28]. Thus, it is more reasonable to target the less-mutable N protein. Small chemical compounds or synthesized peptides that are cytoplasmic membrane-permeable have been used to inhibit dimerization or RNA binding or to dissolve the LLPS in antiviral treatment [29], [30], [31].
For example, PJ-34 and H3, which are small-compound inhibitors of the N-terminal RNA-binding domains of N proteins, were found to be effective in impeding the replication of the human β-coronavirus HCoV-OC43 [32]. During the MERS-CoV outbreak, 5-benzyloxygramine (P3) was identified to inhibit N proteins by promoting abnormal aggregation. At a concentration of 100 μM, P3 could completely prevent viral production and replication [33]. Based on the high binding specificity of the N protein to the viral genome but not the host transcriptome, we could synthesize and deliver to cells, as antiviral drugs, short single-stranded RNAs with a sequence either identical or complementary to the consensus sequence “UCCGCUUGGCC.” Such small RNA molecules could prevent viral replication by competing with the viral genome for binding to the N protein. Given that both the RNA binding and genome composition are highly conserved among all β-coronaviruses, antiviral drugs targeting the RNA binding of N proteins would help fight against a broad spectrum of β-coronaviruses and could therefore be used in treating any novel coronavirus-related diseases that may occur in the future. Similar studies to assess the RNA-binding specificity of N proteins among other viral species (e.g., HIV, HBV, and Ebola virus) will continue, and the findings of the current and future studies will gradually expand our understanding of the process of viral genome packaging and facilitate the identification of effective therapies.

Online methods

Vector construction and protein purification

The N proteins of different coronaviruses and the RNA-binding domains (RBDs) of the SARS-CoV-2 N protein were cloned into the pET28a bacterial expression vector with an N-terminal 6× His tag using the Vazyme ClonExpress II One Step Cloning Kit (Vazyme Biotech). The resulting plasmids were transformed into the E. coli Rosetta (DE3) strain for protein production. The bacteria were cultured in LB, induced with 0.1 mM IPTG when the OD600 reached 0.6, and cultured for a further 16 h at 16 °C with vigorous shaking. The bacterial culture was centrifuged at 13,000×g for 5 min, and the pellet was resuspended in Buffer A (0.5 mg/ml Lysozyme, 50 mM Tris-HCl pH 7.5, 300 mM NaCl, 10 mM Imidazole, 1 mM DTT, 1 mM PMSF) at 4 °C, left to stand for 1 h, and sonicated for 10 min at 10-s intervals, followed by centrifugation at 13,000×g for 30 min at 4 °C. The supernatant was loaded onto a Ni-NTA column (Bio-Rad). The Ni-NTA beads were kept at 4 °C for 30 min before elution with Buffer A supplemented with 300 mM imidazole. Finally, the eluted proteins were analyzed by SDS-PAGE to confirm their purity and molecular weight. The cDNA and amino acid sequences of all proteins used in this study can be found in Supplementary Table S3a.

HTR-SELEX

The HTR-SELEX experimental procedure was modified from a previous study [5]. Briefly, we first produced the DNA templates for HTR-SELEX, which contained a T7 promoter, by PCR amplification. RNA input was transcribed from the DNA templates using T7 in vitro transcription (HiScribe™ T7 Quick High Yield RNA Synthesis Kit, NEB, E2050S) following the manufacturer’s instructions. The remaining DNA was digested with DNase I (RNase free). To enable the RNA to form secondary structures, the RNA input libraries were then heated to 70 °C followed by slow cooling. After that, 600 ng of N protein was mixed with 5 µL of RNA ligands (SELEX input) in 20 µL of binding buffer (50 mM NaCl, 1 mM MgCl2, 0.5 mM Na2EDTA, 1 U/µL RNase inhibitor, and 4% glycerol in 50 mM Tris-Cl, pH 7.5). After incubation for 15 min at 37 °C, the mixture was incubated at room temperature for a further 15 min. Subsequently, 150 µL of binding buffer containing 10 µL of pre-equilibrated Ni Sepharose 6 Fast Flow resin (GE Healthcare, 17-5318-01) was added to the protein-RNA mixture, followed by 2 h of gentle shaking at room temperature. The beads were then washed 12 times with 200 µL of binding buffer to remove the unbound RNA ligands. After removing the supernatant by centrifugation at 1000 × g, the beads were resuspended in 20 µL of elution buffer (0.5 µM RT-primer, 1 mM EDTA and 0.1% Tween 20 in 10 mM Tris-Cl buffer, pH 7.0) and heated to 70 °C for 5 min followed by slow cooling to 4 °C to allow the reverse transcription primers to anneal to the RNA. Finally, reverse transcription was conducted, followed by PCR amplification using the barcoded amplification primers listed in Supplementary Table S3b. The obtained PCR products were used as DNA templates for the next HTR-SELEX cycle. This SELEX process was repeated four times. PCR products from each SELEX cycle were also purified and sequenced on a BGI MGISEQ 2000 sequencer.

HTR-SELEX data analysis

The data were binned according to barcodes for each sample. After discarding the low-quality reads, the remaining sequences were trimmed to remove adaptor sequences. The remaining 40-nt regions were subjected to further analysis. PWMs were generated as described in Jolma et al. [5]. Briefly, the in-house software Autoseed identified the most frequent gapped and un-gapped k-mer sequences representing local maximum counts relative to similar sequences within the Huddinge neighborhoods [34]. In other words, the number of mis-alignable bases within the defined length between the compared sequences could be at most 1, and the count of each seed motif was higher than that of any similar sequence within a Huddinge distance of one. The initial set of motifs was manually curated to ensure that the final seeds did not include partial motifs that encompassed constant linker sequences (which displayed a strong positional bias on the ligand) or motifs that were recovered from a large number of experiments. PWM models were built from sequences matching the indicated seeds, allowing only sequencing reads within a Hamming distance of 1, i.e., at most one base in the matched sequence could differ from the seed. Individual results that were not supported by replicates were deemed inconclusive and were not included in the final dataset. Draft models were manually curated (by JY, LF and SF) to remove unsuccessful experiments and artefacts due to bottlenecks and aptamer selection. Then, we used the R package ggseqlogo to construct sequence logos for all PWMs. Final models for each replicate experiment were generated and stored in Supplementary Table S1.
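
The seed-matching step described above can be illustrated with a short Python sketch. It is a simplified illustration only and does not reproduce Autoseed or the exact counting scheme of Jolma et al. [5]: it collects every read window within a Hamming distance of 1 from a seed, tallies the bases at each position, and converts the counts to frequencies. The function names, the example reads, and the pseudocount of 1 are assumptions made for this example.

```python
from collections import Counter

BASES = "ACGU"

def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def build_pfm(reads, seed, max_dist=1):
    """Count bases per seed position over read windows within max_dist of the seed."""
    k = len(seed)
    counts = [Counter() for _ in range(k)]
    for read in reads:
        for i in range(len(read) - k + 1):
            window = read[i:i + k]
            if hamming(window, seed) <= max_dist:
                for pos, base in enumerate(window):
                    counts[pos][base] += 1
    # One row per motif position, columns ordered as in BASES.
    return [[c[b] for b in BASES] for c in counts]

def to_pwm(pfm, pseudocount=1.0):
    """Convert counts to per-position base frequencies (pseudocount is an assumption)."""
    pwm = []
    for row in pfm:
        total = sum(row) + pseudocount * len(BASES)
        pwm.append([(n + pseudocount) / total for n in row])
    return pwm

SEED = "UCCGCUUGGCC"  # motif 1 consensus from the text
# Hypothetical reads for illustration only (not study data).
reads = [
    "AAUCCGCUUGGCCAA",  # contains an exact match to the seed
    "GGUCCGCUAGGCCGG",  # contains a window with one mismatch
    "CCCCCCCCCCCCCCC",  # contains no window within distance 1
]
pfm = build_pfm(reads, SEED)
pwm = to_pwm(pfm)
```

Each row of pwm sums to 1, so the matrix can be read as a per-position base preference; a production pipeline would additionally handle gapped seeds and background correction, which this sketch omits.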

Binding site density analysis

The genome-wide target sites were scanned using FIMO [35] (v5.0.4, with the parameter setting: --norc --thresh 0.05) for all motifs. Coordinates and a summary of all predicted binding sites are available in Supplementary Table S2. To construct Fig. 1c, binding site density was transformed to Target Counts Per Kilobase using a custom script (available upon request). The kdeplot function of the Python package seaborn was used to calculate and plot binding site density with the parameter bandwidth = 0.1.
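
The two density calculations above can be sketched in plain Python. This is not the authors' custom script: the normalization function is only one plausible reading of "Target Counts Per Kilobase", and the hand-written Gaussian kernel density estimate stands in for seaborn's kdeplot. How the value 0.1 maps onto a kernel width depends on the seaborn version and argument used, so the bandwidth below (1 kb) and the site positions are arbitrary, hypothetical values.

```python
import math

GENOME_LENGTH_NT = 29903  # SARS-CoV-2 genome length stated in the text

def sites_per_kb(n_sites, seq_length_nt):
    """Normalize a raw binding-site count to sites per kilobase of sequence."""
    return n_sites / (seq_length_nt / 1000.0)

def gaussian_kde(positions, grid, bandwidth):
    """Plain Gaussian kernel density estimate evaluated at each grid point."""
    n = len(positions)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in positions)
        for g in grid
    ]

# Hypothetical motif-site positions in kb (illustration only, not study data).
site_positions_kb = [0.1, 0.2, 15.0, 29.6, 29.8]
grid = [i * 0.1 for i in range(-100, 400)]  # -10.0 kb to 39.9 kb in 0.1-kb steps
density = gaussian_kde(site_positions_kb, grid, bandwidth=1.0)

# Sites per kb for the hypothetical site list on the SARS-CoV-2 genome.
per_kb = sites_per_kb(len(site_positions_kb), GENOME_LENGTH_NT)
```

Dividing by sequence length is what makes a roughly 30-kb viral genome comparable with human transcripts of very different lengths, and the kernel density curve shows where along the genome the sites cluster.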

Network analysis of similarity between PWMs

SSTAT (parameters: 50% GC-content, pseudocount regularization, type I threshold 0.01) was used to calculate the PWM similarities between human RBPs and coronavirus N proteins as described in Fan et al. [36] (Supplementary Table S4). We constructed a similarity network in which diamonds represent motifs and circles represent RBPs (Fig. 1a, Supplementary File S1). The RBPs were connected to their motifs, and different motifs were further connected if the SSTAT similarity score was >1.0E−5. Cytoscape software (v3.7.2) was used to visualize the network.
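
The network construction rule above can be sketched as follows. The sketch assumes that pairwise similarity scores have already been computed (for example, exported from SSTAT); it does not call SSTAT or Cytoscape. All protein names, motif names, and scores are hypothetical placeholders.

```python
def build_network(rbp_motifs, similarity, threshold=1.0e-5):
    """Return RBP-to-motif edges plus motif-to-motif edges scoring above the threshold."""
    rbp_edges = [(rbp, m) for rbp, motifs in rbp_motifs.items() for m in motifs]
    motif_edges = [
        (a, b) for (a, b), score in similarity.items() if a != b and score > threshold
    ]
    return rbp_edges + motif_edges

def connected_components(edges):
    """Group nodes into subnetworks using an iterative depth-first search."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components

# Hypothetical inputs for illustration only (not study data).
rbp_motifs = {
    "N_SARS-CoV-2": ["N_SARS2_m1", "N_SARS2_m2"],
    "N_MERS-CoV": ["N_MERS_m1"],
    "HUMAN_RBP_X": ["X_m1"],
}
similarity = {
    ("N_SARS2_m1", "N_MERS_m1"): 0.8,   # above threshold: edge drawn
    ("N_SARS2_m1", "X_m1"): 1.0e-7,     # below threshold: no edge
    ("N_SARS2_m2", "X_m1"): 0.0,        # below threshold: no edge
}
edges = build_network(rbp_motifs, similarity)
components = connected_components(edges)
```

In this toy input the two N proteins fall into one subnetwork and the hypothetical human RBP into another, which is the kind of disconnection described for Fig. 1a.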

Data and code availability

Sequencing data generated in this study can be accessed via Gene Expression Omnibus (GEO) under accession number GSE209797. The data can be accessed through the link: [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE209797]. All primer sequences used in this study are available in Supplementary Table S1.

CRediT authorship contribution statement

Shaorong Fan: Methodology, Investigation, Writing – original draft. Wenju Sun: Data curation, Software, Visualization, Supervision, Writing – original draft. Ligang Fan: Methodology, Resources, Supervision, Writing – original draft. Nan Wu: Investigation. Wei Sun: Visualization. Haiqian Ma: Investigation. Siyuan Chen: Visualization, Writing – review & editing. Zitong Li: Visualization. Yu Li: Validation, Writing – review & editing. Jilin Zhang: Conceptualization, Methodology, Supervision, Writing – review & editing. Jian Yan: Conceptualization, Methodology, Supervision, Writing – original draft, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

36 in total

1. Omicron and Delta variant of SARS-CoV-2: A comparative computational study of spike protein.

Authors: Suresh Kumar; Thiviya S Thambiraja; Kalimuthu Karuppanan; Gunasekaran Subramaniam
Journal: J Med Virol Date: 2021-12-27 Impact factor: 2.327

2. Conservation of transcription factor binding specificities across 600 million years of bilateria evolution.

Authors: Kazuhiro R Nitta; Arttu Jolma; Yimeng Yin; Ekaterina Morgunova; Teemu Kivioja; Junaid Akhtar; Korneel Hens; Jarkko Toivonen; Bart Deplancke; Eileen E M Furlong; Jussi Taipale
Journal: Elife Date: 2015-03-17 Impact factor: 8.140

3. Development of SARS-CoV-2 Nucleocapsid Specific Monoclonal Antibodies.

Authors: James S Terry; Loran Br Anderson; Michael S Scherman; Carley E McAlister; Rushika Perera; Tony Schountz; Brian J Geiss
Journal: bioRxiv Date: 2020-09-03

4. A pneumonia outbreak associated with a new coronavirus of probable bat origin.

Authors: Peng Zhou; Xing-Lou Yang; Xian-Guang Wang; Ben Hu; Lei Zhang; Wei Zhang; Hao-Rui Si; Yan Zhu; Bei Li; Chao-Lin Huang; Hui-Dong Chen; Jing Chen; Yun Luo; Hua Guo; Ren-Di Jiang; Mei-Qin Liu; Ying Chen; Xu-Rui Shen; Xi Wang; Xiao-Shuang Zheng; Kai Zhao; Quan-Jiao Chen; Fei Deng; Lin-Lin Liu; Bing Yan; Fa-Xian Zhan; Yan-Yi Wang; Geng-Fu Xiao; Zheng-Li Shi
Journal: Nature Date: 2020-02-03 Impact factor: 69.504

5. The SARS-CoV-2 nucleocapsid phosphoprotein forms mutually exclusive condensates with RNA and the membrane-associated M protein.

Authors: Shan Lu; Qiaozhen Ye; Digvijay Singh; Yong Cao; Jolene K Diedrich; John R Yates; Elizabeth Villa; Don W Cleveland; Kevin D Corbett
Journal: Nat Commun Date: 2021-01-21 Impact factor: 14.919

6. Increased immune escape of the new SARS-CoV-2 variant of concern Omicron.

Authors: Jie Hu; Pai Peng; Xiaoxia Cao; Kang Wu; Juan Chen; Kai Wang; Ni Tang; Ai-Long Huang
Journal: Cell Mol Immunol Date: 2022-01-11 Impact factor: 11.530

7. SARS-CoV-2 nucleocapsid protein interacts with immunoregulators and stress granules and phase separates to form liquid droplets.

Authors: Syam Prakash Somasekharan; Martin Gleave
Journal: FEBS Lett Date: 2021-11-22 Impact factor: 3.864