
Identification of Conserved Epitopes in SARS-CoV-2 Spike and Nucleocapsid Protein.

Sergio Forcelloni^1, Anna Benedetti^2,3, Maddalena Dilucca^2, Andrea Giansanti^2,4.

Abstract

Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a novel virus that first emerged in Wuhan in December 2019. The spike glycoprotein and the nucleocapsid protein are the most common targets for the development of vaccines and antiviral drugs.
Objective: We analyze the rate of evolution along the sequences of the spike and nucleocapsid proteins in relation to the spatial locations of their epitopes, which were previously suggested to contribute to the immune response elicited by SARS-CoV-2 infection.
Methods: We compare homologous proteins of seven human coronaviruses: HCoV-229E, HCoV-OC43, SARS-CoV, HCoV-NL63, HCoV-HKU1, MERS-CoV, and SARS-CoV-2. We then focus on the local, structural order-disorder propensity of the protein regions where the SARS-CoV-2 epitopes are located.
Results: We show that most nucleocapsid protein epitopes overlap the RNA-binding and dimerization domains, and that some of them are characterized by a low rate of evolution. Similarly, spike protein epitopes are preferentially located in regions that are predicted to be ordered and well conserved, corresponding to heptad repeats 1 and 2. Interestingly, both the receptor-binding motif to ACE2 and the fusion peptide of the spike protein are characterized by a high rate of evolution.
Conclusion: Our results provide evidence for conserved epitopes that might help develop broad-spectrum SARS-CoV-2 vaccines.


Keywords: SARS-CoV-2; conservation score; conserved epitopes; nucleocapsid protein; order-disorder propensity; spike glycoprotein

Year: 2021 PMID: 35386436 PMCID: PMC8905637 DOI: 10.2174/1389202923666211216162605

Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.689

INTRODUCTION

Coronaviruses are a large family of viruses that may cause illness in animals. In humans, seven coronaviruses are known to cause respiratory infections, ranging from the common cold to more severe diseases. The first two coronaviruses, human CoV-229E (HCoV-229E) and human CoV-OC43 (HCoV-OC43), were discovered in the 1960s and cause relatively mild respiratory symptoms [1]. Human severe acute respiratory syndrome coronavirus (SARSr-CoV) was identified in 2003 and causes flu-like symptoms and, in the worst cases, atypical pneumonia [2]. The human coronavirus NL63 (HCoV-NL63), identified in 2004, and the human CoV-HKU1 (HCoV-HKU1), described in 2005 [3], generally cause upper respiratory disease in humans, which may progress to lower respiratory infections [4]. More recently, the pathogenic Middle East respiratory syndrome coronavirus (MERS-CoV), which appeared for the first time in 2012, was identified as the sixth human coronavirus [5]. Finally, a previously unknown coronavirus that probably originated in bats, SARS-CoV-2, was identified in December 2019 in Wuhan, China [6, 7]. SARS-CoV-2 caused an ongoing pandemic of severe pneumonia named coronavirus disease 2019 (COVID-19), which had affected over 4 million people worldwide and caused more than 300,000 deaths as of May 13, 2020 (https://ourworldindata.org/grapher/total-deaths-covid-19). Currently, six COVID-19 vaccines have been approved by the World Health Organization: two RNA vaccines (Pfizer-BioNTech and Moderna), two viral vector vaccines (Oxford-AstraZeneca and Johnson & Johnson), and two inactivated vaccines (Sinovac and Sinopharm-BBIBP) [8, 9]. In this context, the viral spike (S) glycoprotein and the nucleocapsid (N) protein are two of the main targets for antibody production and for the development of vaccines and antiviral drugs [9], owing to their ability to trigger a dominant and long-lasting immune response. The spike protein is a large type I transmembrane protein composed of approximately 1400 amino acids.
The S protein is an attractive target for vaccine development, as its surface expression renders it a direct target for the host immune response [10, 11]. Spike proteins assemble into trimers on the virion surface to form the distinctive crown-like structure and mediate contact with the host cell by binding to the ACE2 receptor, a process necessary for virus entry. The spike protein contains two subunits: the N-terminal S1 subunit, responsible for ACE2 receptor binding, and the C-terminal S2 subunit, responsible for membrane fusion. The S2 subunit is the more conserved of the two, while the S1 subunit differs even among strains of the same coronavirus. S1 contains two sub-domains (N-terminal and C-terminal) with receptor-binding functions. S2 contains two heptad repeats composed of hydrophobic residues, responsible for the formation of an α-helical coiled-coil structure that participates in the fusion of the viral and host cell membranes [11]. The nucleocapsid protein regulates viral genome transcription, replication, and packaging, and it is essential for viability [12]. It contains two structural domains: the N-terminal domain, which acts as a putative RNA-binding domain, and the C-terminal domain, which acts as a dimerization domain. The N protein is of potential interest for vaccine development because it is highly immunogenic and its amino acid sequence is highly conserved [13, 14]. T cell responses against the S and N proteins have been shown to be the most immunogenic and long-lasting in SARS-CoV patients. Furthermore, the B cell antibody response against the S and N proteins was also reported to be effective, although short-lived compared to the T cell response. The search for T cell and B cell epitopes that can stimulate a specific immune response against the S and N proteins represents a valuable strategy to identify targets for the development of a SARS-CoV-2 vaccine [15].
In a previous study, we showed that genes encoding N and S proteins tend to evolve faster than genes encoding matrix and envelope proteins [1]. This result suggested that the higher divergence observed for these two genes could represent a significant barrier in the development of antiviral therapeutics against SARS-CoV-2. Here, we perform an accurate analysis of the position-specific rates of evolution and the order-disorder propensities of the spike glycoprotein (S) and the nucleocapsid protein (N) of SARS-CoV-2. We thus provide an in-silico survey of the major nucleocapsid protein and spike protein epitopes, identifying a subset of them that are well-conserved among human coronaviruses and represent reliable candidates for broad-spectrum vaccines against SARS-CoV-2.

RESULTS

Identification of Conserved Epitopes in Spike and Nucleocapsid Proteins

The amino acid sequences of the nucleocapsid (N) protein and spike (S) glycoprotein from the seven human coronaviruses considered here (HCoV-229E, HCoV-OC43, SARS-CoV, HCoV-NL63, HCoV-HKU1, MERS-CoV, and SARS-CoV-2) were compared to assess the position-specific rates of evolution of these two proteins. We then investigated the relationship between the position-specific rates of evolution and the distribution of the epitopes that have previously been suggested to contribute to the immune response elicited by human SARS-CoV-2 infection. For this purpose, we considered the SARS-CoV B cell and T cell linear epitopes that map identically to the SARS-CoV-2 N and S proteins, as identified by Ahmed et al. [15]. It is worth noting that these epitopes were also considered in another study that aimed to provide a molecular and structural rationale for the potential role of the major nucleocapsid protein epitopes in conferring protection from SARS-CoV-2 infection [15]. Moreover, we also considered a high-quality set of previously identified linear epitopes derived directly from SARS-CoV-2 and collected from the literature (see Materials and Methods). We focused on the SARS-CoV-2 N and S proteins because both are the main targets of vaccines and antiviral drugs, owing to the dominant and long-lasting immune response previously reported against them in SARS-CoV [9, 15]. We aligned the homologous protein sequences of the seven human coronaviruses and used the resulting alignment to calculate a conservation profile with the software Rate4Site (see Materials and Methods) [16]. In Fig. (), we report the profile obtained for protein N, together with the functional regions/domains and the SARS-CoV-2 linear epitopes. In this profile, values greater than zero reflect faster evolution and values less than zero reflect slower evolution.
We note that both the RNA-binding domain (region 41-186) and the dimerization domain (region 258-361) correspond to regions with values less than zero, implying higher conservation than the rest of the protein sequence. Although the conservation profile generally hovers around 0, there are some regions where the conservation score is greater than one, indicating that these regions are likely to be under positive selection. Interestingly, the location of some of these regions corresponds to the presence of B cell and T cell epitopes (see Fig. S1 for the specific location and sequence of each epitope). We found that the vast majority of the epitopes shared between SARS-CoV and SARS-CoV-2 are positioned in highly variable regions. We suppose that the high amino acid variability of these regions might allow the virus to evade recognition by the host immune system. At the same time, we suggest that the source of variability in these protein regions is likely to be the host immune response. However, we also observed some epitopes in highly conserved regions spanning residues 70-120, 150-200, and 250-315, which may offer long-lasting protection against SARS-CoV-2. Similarly, we studied the conservation profile of protein S and the distribution of the associated linear epitopes (Fig. ). At the bottom of Fig. (), we report the functional regions of protein S. The receptor-binding domain (region 319-541) contains the receptor-binding motif to the human angiotensin-converting enzyme 2 (ACE2), an enzyme attached to the outer surface (cell membrane) of cells in the lungs, arteries, heart, kidney, and intestines, which has been identified as the functional receptor for SARS-CoV-2 [17]. As a transmembrane protein, ACE2 serves as the main entry point into cells for some coronaviruses, including HCoV-NL63, SARS-CoV, and SARS-CoV-2 [18-23].
More specifically, the binding of the spike protein of SARS-CoV and SARS-CoV-2 to the enzymatic domain of ACE2 on the cell surface results in endocytosis and translocation of both the virus and the enzyme into endosomes within the cell [24, 25]. Interestingly, both the receptor-binding motif to ACE2 (region 437-508) and the fusion peptide (amino acids 788-806, IYKTPPIKDFGGFNFSQIL, for SARS-CoV-2), the segment of the fusion protein that inserts into the target lipid bilayer and triggers virus-cell membrane fusion, are characterized by a high rate of evolution. Conversely, heptad repeats 1 and 2, which are known to play a crucial role in membrane fusion and viral entry [25], show lower rates of evolution. Moreover, we note that a large percentage of both B cell and T cell epitopes are located in the C-terminal region of the spike protein, at heptad repeats 1 and 2 in the S2 domain (see Fig. S2 for the sequences of the epitopes and their location along the sequence). Thus, in contrast to protein N, spike protein epitopes are mainly located in protein regions characterized by a lower rate of evolution. This observation suggests that the immune system has adapted to recognize slowly evolving regions of the S protein [26]. Finally, we observed that the SARS-CoV-derived B cell and T cell epitopes that map identically to SARS-CoV-2 proteins are more localized around functional sites than the epitopes derived directly from SARS-CoV-2 proteins N and S, which are more scattered throughout the protein sequence. This observation suggests an adaptation of the immune system to recognize functional regions of proteins N and S in SARS-CoV and SARS-CoV-2, potentially inducing long-lasting immunity against coronaviruses.
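The notion of an epitope that "maps identically" onto a protein can be read as an exact substring match. The following is a minimal sketch under that assumption; the function name `find_identical_matches` and the toy protein string are hypothetical and are not taken from the study.

```python
def find_identical_matches(peptide, protein):
    """Return 1-based inclusive (start, end) spans of every exact
    occurrence of `peptide` inside `protein`."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    hits = []
    start = protein.find(peptide)
    while start != -1:
        # Convert the 0-based index to 1-based residue numbering.
        hits.append((start + 1, start + len(peptide)))
        start = protein.find(peptide, start + 1)
    return hits


if __name__ == "__main__":
    # Toy sequence (hypothetical): three padding residues followed by the
    # fusion peptide sequence quoted in the text.
    toy_protein = "AAA" + "IYKTPPIKDFGGFNFSQIL"
    print(find_identical_matches("IYKTPPIKDFGGFNFSQIL", toy_protein))
```

On the full spike sequence, the same call would be expected to report the span quoted above for the fusion peptide, provided the sequence and numbering match those used in the text.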

Order-Disorder Propensities of Proteins S and N and Their Associated Epitopes

We predicted the structural order-disorder propensity of proteins S and N to investigate the relationship between disordered structure and the spatial distribution of the SARS-CoV-2-derived B cell and T cell epitopes. To estimate the structural stability of a protein from its sequence without relying on the structure, we used the energy estimation approach at the core of the IUPred2A disorder prediction method (see Materials and Methods) [27]. In Fig. (), we show the order-disorder propensity profile for protein N, together with the SARS-CoV-2-derived B cell and T cell epitopes. To interpret the results below, note that the score of each residue in the protein sequence ranges from 0 (strong propensity for an ordered structure) to 1 (strong propensity for a disordered structure). Specifically, each residue in the sequence was classified as ordered if its IUPred2A score is < 0.5 and as disordered if its score is > 0.5. We found a low but significant positive correlation between the order-disorder propensity profile and the conservation profile in Fig. () (Pearson correlation coefficient = 0.3, p-value < 0.00001), implying that disordered regions of protein N tend to evolve faster than ordered ones. Both the RNA-binding domain and the dimerization domain are predicted to be ordered over a large percentage of their residues. We found that the vast majority of SARS-CoV B cell and T cell epitopes that map identically to the SARS-CoV-2 N protein overlap these two functional regions, which are also predicted to consist of conserved amino acid sites (Fig. () and Fig. S3). In contrast, some SARS-CoV-2-specific epitopes are located in predicted disordered regions outside the RNA-binding domain and the dimerization domain. In line with a previous study [28], we note that these disordered epitopes appear to be linear, making them ideally suited for incorporation into peptide vaccines.
Nevertheless, it is worth noting that vaccines based on these epitopes may not be effective in the long term due to the high variability of the corresponding protein regions (Fig. ). Next, we studied the order-disorder propensity profile of the S protein (Fig. ). The spike protein is predicted to be ordered along the whole sequence. Both the receptor-binding domain and the fusion peptide are well structured. In this case too, we observe that the vast majority of the SARS-CoV-2-derived B cell and T cell epitopes are located in regions displaying a reduced disorder tendency (Fig. S4).
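The correlation reported above for protein N compares two per-residue profiles of equal length. A minimal sketch of the Pearson coefficient is given below, assuming both profiles are plain lists of floats indexed by residue; the function name `pearson` and the example values are illustrative only, and the p-value calculation is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical toy profiles, not data from the study.
    disorder = [0.1, 0.2, 0.6, 0.8]
    rate = [-0.5, -0.2, 0.4, 1.1]
    print(round(pearson(disorder, rate), 3))
```

A positive value means that residues with higher disorder scores tend to have higher evolutionary rates, which is the direction of the relationship described in the text.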

DISCUSSION

In this study, we performed a systematic analysis of the rate of evolution and the structural order-disorder propensities of proteins N and S in relation to the location of the SARS-CoV-2 epitopes derived from these proteins. Identification of conserved epitopes is crucial for designing broad-spectrum vaccines against the present outbreak of SARS-CoV-2 and against emerging SARS-CoV-2 variants that reduce the efficacy of existing vaccines. Indeed, high-affinity neutralizing antibodies against conserved epitopes could provide immunity to SARS-CoV-2 and protection against possible future pandemic viruses. The conservation score measures the evolutionary conservation of an amino acid position in a set of homologous protein sequences. The rate of evolution is not constant among amino acid sites: some positions evolve slowly and are commonly referred to as “conserved”, whereas other positions evolve rapidly and are referred to as “variable”. The rate variations correspond to different levels of purifying selection acting on these sites [29]. This selection can result from geometrical constraints on protein folding and structure, or from constraints at amino acid sites involved in enzymatic activity, ligand binding, or protein-protein interactions. Here, we used Rate4Site to calculate the rate of evolution of each residue in the amino acid sequences of proteins N and S. We then analyzed the structural properties of the protein regions where the epitopes are located by studying their order-disorder propensity. We show the presence of both conserved and non-conserved epitopes in terms of rate of evolution (Figs. and 3). Specifically, the vast majority of the SARS-CoV-2 epitopes for the N protein are located in the RNA-binding and dimerization domains (Fig. ). In this case, we find epitopes in both ordered and disordered regions (Fig. ).
Although we note that the vast majority of epitopes are located in regions with high rates of evolution, we also identify epitopes in conserved protein regions (Fig. ). Similarly, we observe SARS-CoV-2 epitopes for the S protein around heptad repeats 1 and 2, which could be more immunogenic against SARS-CoV-2 variants because of their low rate of evolution (Fig. ). We thus suggest that immune targeting of these conserved epitopes might offer protection against this novel coronavirus and its variants. Finally, it has been shown that numerous SARS S-protein-specific neutralizing antibodies recognize epitopes within the receptor-binding domain (RBD), thus blocking RBD-ACE2 binding and preventing viral infection [30]. However, we show here that both the receptor-binding motif within the RBD (region 437-508) and the fusion peptide (region 788-806) are characterized by high rates of evolution, indicating a tendency for these two regions to mutate and overcome host immunity.

MATERIALS AND METHODS

Data Sources

The complete coding genomic sequences of SARS-CoV-2 were obtained from the NCBI viral databases, accessed on 16 July 2021. In this study, we considered seven human coronaviruses: human CoV-229E (HCoV-229E), human CoV-OC43 (HCoV-OC43), human Severe Acute Respiratory Syndrome Coronavirus (SARSr-CoV), human coronavirus NL63 (HCoV-NL63), human CoV-HKU1 (HCoV-HKU1), Middle East Respiratory Syndrome coronavirus (MERS-CoV), and Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). We downloaded the coding sequences of these coronaviruses from the National Center for Biotechnology Information (NCBI) (available at https://www.ncbi.nlm.nih.gov/). For each virus, we investigated the evolutionary conservation and the structural disorder tendency of proteins N (UniProt ID: P0DTC9) and S (UniProt ID: P0DTC2), because they are regarded as important targets for the development of vaccines and antiviral drugs.

Sequence Alignment

To explore the evolutionary relationship among the proteins N and S in the seven human coronaviruses here considered, the selected protein sequences were aligned by using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) [31]. This tool is a multiple sequence alignment (MSA) program that uses seeded guide trees and HMM profile-profile techniques to generate alignments and phylogenetic trees of divergent sequences.
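Once the sequences are aligned, per-column quantities have to be related back to residue numbers of a reference protein, because gap characters shift the column index. The sketch below shows one way to build that mapping, assuming the aligned reference sequence is available as a string with `-` for gaps; the function name `column_to_residue` and the toy alignment row are hypothetical and do not describe the authors' pipeline.

```python
def column_to_residue(aligned_seq, gap="-"):
    """For each alignment column, return the 1-based residue number of the
    ungapped sequence, or None where the sequence has a gap."""
    mapping = []
    position = 0
    for char in aligned_seq:
        if char == gap:
            mapping.append(None)
        else:
            position += 1
            mapping.append(position)
    return mapping


if __name__ == "__main__":
    # Hypothetical aligned row: residues M, S, D with one gap column.
    print(column_to_residue("M-SD"))
```

Columns mapped to `None` carry no residue of the reference protein and would be skipped when a profile is plotted along that protein.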

Disorder Prediction

The structural order-disorder propensity of each protein was predicted using IUPred2A (https://iupred2a.elte.hu) [27], with the option for long disordered regions. Briefly, IUPred2A is a fast, robust, sequence-only predictor that identifies disordered protein regions using an energy estimation approach. The key component of the calculation is the energy estimation matrix, a 20 by 20 matrix whose elements characterize the general preference of each pair of amino acids to be in contact, as derived from a reference set of globular proteins. The prediction assigns each residue in the protein sequence a score ranging from 0 (strong propensity for an ordered structure) to 1 (strong propensity for a disordered structure), with 0.5 used as the threshold to classify residues as either ordered or disordered. In line with the original protocol [32], the position-specific estimates of the structural order-disorder propensity were then averaged over a window of 21 residues, and the average value was assigned to the central residue of the window (taking into account the limitations at both ends of the protein sequence).
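The smoothing and thresholding steps described above can be sketched as follows. This is an illustration under two stated assumptions: near the sequence ends the window is simply truncated to the residues that exist (the text does not specify how the ends are handled), and a score of exactly 0.5 is treated as ordered. The names `window_average` and `classify_residues` are hypothetical.

```python
def window_average(scores, window=21):
    """Average each score over a centered window of `window` residues.
    Assumption: the window is truncated at the ends of the sequence."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    n = len(scores)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def classify_residues(scores, threshold=0.5):
    """Label each residue as ordered or disordered.
    Assumption: a score exactly equal to the threshold counts as ordered."""
    return ["disordered" if s > threshold else "ordered" for s in scores]


if __name__ == "__main__":
    # Hypothetical raw scores, smoothed with a short window for readability.
    raw = [0.1, 0.9, 0.2, 0.8, 0.7]
    smooth = window_average(raw, window=3)
    print(classify_residues(smooth))
```

With the default `window=21`, each central residue receives the mean of itself and the ten residues on either side, matching the window size stated in the text.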

Rate of Evolution per Site

We calculated the per-site rate of evolution of the SARS-CoV-2 proteins N and S relative to their orthologous proteins in the other six human coronaviruses using Rate4Site (https://m.tau.ac.il/~itaymay/cp/rate4site.html) [16]. Rate4Site calculates the evolutionary rate at each site in the MSA using a probabilistic evolutionary model. This takes into account the stochastic process underlying sequence evolution within protein families and the phylogenetic tree of the proteins in the family. The conservation score at each site in the MSA corresponds to the site's evolutionary rate. The position-specific estimates of the rate of evolution were then averaged over a window of 21 residues, and the average value was assigned to the central residue of the window (taking into account the limitations at both ends of the protein sequence). The window size was the same as that used for disorder prediction above.
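Given a per-residue rate profile, a region can be summarized by its mean score and read against zero, since values below zero indicate slower evolution and values above zero indicate faster evolution. The sketch below assumes the profile is a list in which index 0 holds residue 1; the function names and the toy scores are hypothetical, and no output format of Rate4Site is implied.

```python
def region_mean(scores, start, end):
    """Mean score over residues start..end (1-based, inclusive)."""
    if not (1 <= start <= end <= len(scores)):
        raise ValueError("region lies outside the sequence")
    segment = scores[start - 1:end]
    return sum(segment) / len(segment)


def describe_region(mean_score):
    """Interpret the sign of a mean rate score."""
    if mean_score < 0:
        return "slower than average (more conserved)"
    if mean_score > 0:
        return "faster than average (more variable)"
    return "average"


if __name__ == "__main__":
    # Hypothetical four-residue profile, not data from the study.
    toy_scores = [-1.0, -0.5, 0.5, 2.0]
    for name, (start, end) in {"region A": (1, 2), "region B": (3, 4)}.items():
        mean_score = region_mean(toy_scores, start, end)
        print(name, mean_score, describe_region(mean_score))
```

Applied to a full-length profile, the same call with the domain boundaries quoted in the Results (for example 41-186 for the RNA-binding domain of protein N) would give one summary value per functional region.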

High Quality Set of Epitopes

In this study, we considered two groups of linear epitopes, consisting of continuous residues on the protein sequence of the proteins N and S. The first group comprises the whole set of SARS-CoV B cell and T cell epitopes that map identically to SARS-CoV-2 N and S proteins as identified by Ahmed et al. [15]. The second group consists of a high-quality set of previously identified epitopes directly extrapolated for SARS-CoV-2 [33-40].
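Because linear epitopes are continuous residue ranges, their relation to functional domains reduces to interval overlap. The following sketch uses the N protein domain boundaries quoted in the Results (41-186 and 258-361); the epitope coordinates in the example, and the names `overlap_length` and `domains_hit`, are hypothetical.

```python
# Domain boundaries for protein N as quoted in the Results (1-based, inclusive).
DOMAINS_N = {
    "RNA-binding domain": (41, 186),
    "dimerization domain": (258, 361),
}


def overlap_length(a, b):
    """Number of residues shared by two inclusive intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def domains_hit(epitope, domains):
    """Names of the domains that share at least one residue with the epitope."""
    return [name for name, span in domains.items()
            if overlap_length(epitope, span) > 0]


if __name__ == "__main__":
    # Hypothetical epitope spans, for illustration only.
    for epitope in [(100, 110), (180, 200), (200, 250)]:
        print(epitope, domains_hit(epitope, DOMAINS_N))
```

An epitope that returns an empty list lies entirely outside both domains, which is the situation described in the Results for some SARS-CoV-2-specific epitopes of protein N.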

CONCLUSION

In conclusion, our results suggest that targeting conserved regions of SARS-CoV-2 spike and nucleocapsid proteins with less plasticity and more structural constraint should have broader utility for antibody-based immunotherapy, neutralization, and prevention of escape variants.

39 in total

1. Effects of human anti-spike protein receptor binding domain antibodies on severe acute respiratory syndrome coronavirus neutralization escape and fitness.

Authors: Jianhua Sui; Meagan Deming; Barry Rockx; Robert C Liddington; Quan Karen Zhu; Ralph S Baric; Wayne A Marasco
Journal: J Virol Date: 2014-09-17 Impact factor: 5.103

2. Evolutionary Forces and Codon Bias in Different Flavors of Intrinsic Disorder in the Human Proteome.

Authors: Sergio Forcelloni; Andrea Giansanti
Journal: J Mol Evol Date: 2019-12-10 Impact factor: 2.395

3. Identification of Highly Conserved SARS-CoV-2 Antigenic Epitopes with Wide Coverage Using Reverse Vaccinology Approach.

Authors: Yasmin Hisham; Yaqoub Ashhab; Sang-Hyun Hwang; Dong-Eun Kim
Journal: Viruses Date: 2021-04-28 Impact factor: 5.048

Review 4. Receptor recognition and cross-species infections of SARS coronavirus.

Authors: Fang Li
Journal: Antiviral Res Date: 2013-08-29 Impact factor: 5.970

5. Codon Usage and Phenotypic Divergences of SARS-CoV-2 Genes.

Authors: Maddalena Dilucca; Sergio Forcelloni; Alexandros G Georgakilas; Andrea Giansanti; Athanasia Pavlopoulou
Journal: Viruses Date: 2020-04-30 Impact factor: 5.048

6. Preliminary Identification of Potential Vaccine Targets for the COVID-19 Coronavirus (SARS-CoV-2) Based on SARS-CoV Immunological Studies.

Authors: Syed Faraz Ahmed; Ahmed A Quadeer; Matthew R McKay
Journal: Viruses Date: 2020-02-25 Impact factor: 5.048

7. Comparative computational analysis of SARS-CoV-2 nucleocapsid protein epitopes in taxonomically related coronaviruses.

Authors: Bruno Tilocca; Alessio Soggiu; Maurizio Sanguinetti; Vincenzo Musella; Domenico Britti; Luigi Bonizzi; Andrea Urbani; Paola Roncada
Journal: Microbes Infect Date: 2020-04-14 Impact factor: 2.700

8. Analysis of preferred codon usage in the coronavirus N genes and their implications for genome evolution and vaccine design.

Authors: Abdullah Sheikh; Abdulla Al-Taher; Mohammed Al-Nazawi; Abdullah I Al-Mubarak; Mahmoud Kandeel
Journal: J Virol Methods Date: 2020-01-05 Impact factor: 2.014

9. SARS coronavirus entry into host cells through a novel clathrin- and caveolae-independent endocytic pathway.

Authors: Hongliang Wang; Peng Yang; Kangtai Liu; Feng Guo; Yanli Zhang; Gongyi Zhang; Chengyu Jiang
Journal: Cell Res Date: 2008-02 Impact factor: 25.617

10. Immunoinformatic Analysis of SARS-CoV-2 Nucleocapsid Protein and Identification of COVID-19 Vaccine Targets.

Authors: Sergio C Oliveira; Mariana T Q de Magalhães; E Jane Homan
Journal: Front Immunol Date: 2020-10-28 Impact factor: 7.561
