Literature DB >> 33889450

Bioinformatic prediction of immunodominant regions in spike protein for early diagnosis of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

Siqi Zhuang¹, Lingli Tang¹, Yufeng Dai¹, Xiaojing Feng¹, Yiyuan Fang¹, Haoneng Tang¹, Ping Jiang¹, Xiang Wu², Hezhi Fang³, Hongzhi Chen⁴.

Abstract

BACKGROUND: To contain the pandemics caused by SARS-CoV-2, early detection approaches with high accuracy and accessibility are critical. Generating an antigen-capture based detection system would be an ideal strategy complementing the current methods based on nucleic acids and antibody detection. The spike protein is found on the outside of virus particles and appropriate for antigen detection.
METHODS: In this study, we utilized bioinformatics approaches to explore the immunodominant fragments on spike protein of SARS-CoV-2.
RESULTS: The S1 subunit of spike protein was identified with higher sequence specificity. Three immunodominant fragments, Spike56-94, Spike199-264, and Spike577-612, located at the S1 subunit were finally selected via bioinformatics analysis. The glycosylation sites and high-frequency mutation sites on spike protein were circumvented in the antigen design. All the identified fragments present qualified antigenicity, hydrophilicity, and surface accessibility. A recombinant antigen with a length of 194 amino acids (aa) consisting of the selected immunodominant fragments as well as a universal Th epitope was finally constructed.
CONCLUSION: The recombinant peptide encoded by the construct contains multiple immunodominant epitopes, which is expected to stimulate a strong immune response in mice and generate qualified antibodies for SARS-CoV-2 detection.

Entities: Chemical

Keywords: Antigen-capture; COVID-19; Immunodominant fragments; SARS-CoV-2; Spike protein

Year: 2021 PMID： 33889450 PMCID： PMC8038641 DOI： 10.7717/peerj.11232

Source DB: PubMed Journal: PeerJ ISSN： 2167-8359 Impact factor: 2.984

Introduction

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is highly contagious and has caused more than one hundred million infection cases and over 2.4 million deaths (https://www.who.int/, as of February 15, 2021), posing a huge economic and social burden internationally (Lan et al., 2020; Shang et al., 2020). The reports of SARS-CoV-2 reinfection cases suggest that stronger international efforts are required to prevent COVID-19 re-emergence in the future (Zhan, Deverman & Chan, 2020). Nevertheless, the possibility of SARS-CoV-2 becoming a seasonal epidemic cannot be excluded (Shaman & Galanti, 2020). Even worse, the large number of asymptomatic infections greatly increase the difficulties of epidemic control (Rothe et al., 2020). At present, no specific drugs have been developed for SARS-CoV-2, and the effectiveness of the vaccines on the market still needs time to be evaluated. Therefore, early detection and isolation of infected people are still indispensable means to control the spread of the epidemic, which requires accurate, early, economical, and easy-to-operate diagnostic methods (Yan, Chang & Wang, 2020). The real-time reverse transcriptase-polymerase chain reaction (RT-PCR) and antibody-capture serological tests are currently the main diagnostic methods for SARS-CoV-2 (Ishige et al., 2020). As the golden standard, RT-PCR is highly reliable (Bustin & Nolan, 2020; Padoan et al., 2020). However, the implementation costs and relatively cumbersome operation problems make it a big challenge for large population screening (Thabet et al., 2020). The antibody-capture serological test is convenient, but seroconversion generally occurs in the second or third week of illness. Therefore, it is not ideal for the early diagnosis of infection (Hachim et al., 2020; Liu et al., 2020; Tang et al., 2020). The antigen-capture test is an alternative diagnostic method that relies on the immunodetection of viral antigens in clinical samples. Accordingly, this method could be applied for the detection of early infection no matter if the patient was asymptomatic or not (Ohnishi, 2008). Compared with RT-PCR based detection method, it is relatively inexpensive and can be used at the point-of-care. Rapid viral antigen detection has been successfully used for diagnosing respiratory viruses such as influenza and respiratory syncytial viruses (Cazares et al., 2020; Ji et al., 2011; Ohnishi et al., 2005, 2012; Qiu et al., 2005). The sensitivity and specificity of the antigen-capture detection system depend highly on the antigen employed to generate antibodies (Ohnishi et al., 2012). The spike protein is one of the structural proteins of SARS-CoV-2, with the majority located on the outside surface of the viral particles (Fehr & Perlman, 2015; Kumar et al., 2020; Woo et al., 2005). It has a 76.4% homology with the spike protein of SARS-CoV. Sunwoo et al. (2013) showed that the bi-specific spike protein derived monoclonal antibody system exhibited excellent sensitivity in SARS-CoV detection. The virus infection is initiated by the interaction of spike protein receptor-binding domain (RBD) and angiotensin-converting enzyme 2 (ACE2) on host cells. It is widely accepted that the spike protein is one of the earliest antigenic proteins recognized by the host immune system (Callebaut, Enjuanes & Pensaert, 1996; Chen et al., 2020c; Gomez et al., 1998; Lu et al., 2004; Sanchez et al., 1999). Nevertheless, the difficulties of using spike protein as an antigen are also obvious. Firstly, it is not easy to express and purify the full-length spike protein (Tan et al., 2004). Besides, the spike protein is highly glycosylated (Kumar et al., 2020) and prone to mutation (Wang et al., 2020a), which may counteract the sensitivity of antigen-capture based detection method. Hence, it is critical to truncating the glycosylation and mutation sites on spike protein as much as possible in antigen design (Meyer, Drosten & Muller, 2014; Tan et al., 2004). A study using the truncated spike protein to detect SARS-CoV achieved a diagnostic sensitivity of >99% and a specificity of 100% (Mu et al., 2008), which suggests that the truncated spike protein of SARS-CoV-2 could also be an appropriate candidate for the early diagnostic testing and screening of SARS-CoV-2. In this study, we analyzed the spike protein via bioinformatics tools to obtain immunodominant fragments. The predicted sequences were joined together as a novel antigen for the immunization of mice and antibody production. Epitopes information presented by this work may aid in developing a promising antigen-capture based detection system in pandemic surveillance and containment.

Method

Data retrieval and sequence alignment

Multiple bioinformatics analysis tools were used in this study, and the flowchart is depicted in Fig. 1. Coronaviruses had four genera composed of alpha-, beta-, gamma- and delta-coronaviruses. Among them, alpha- and beta- genera could infect humans. Seven beta-coronaviruses are known to infect humans (HCoV-229E, HCoV-OC43, HCoV-NL63, HCoV-HKU1, SARS-CoV, MERS-CoV, and SARS-CoV-2) (Kin et al., 2015; Su et al., 2016). We utilized the NCBI database to obtain the sequences of these human-related coronaviruses spike proteins, of which accession numbers were presented in Fig. 2A. The Clustal Omega Server-Multiple Sequence Alignment was used to analyze the sequence similarity. The analysis of the phylogenetic tree was calculated by the same server. In this study, we set parameters of Clustal Omega as default (Sievers et al., 2011). Additionally, we conducted the EMBOSS Needle Server-Pairwise Sequence Alignment (Needleman & Wunsch, 1970) to compare the whole sequence and several major domains between SARS-CoV-2 and SARS-CoV to find out the specific genomic regions on SARS-CoV-2.

Figure 1

Work flow chart.

Figure 2

Sequence alignment results of spike protein.

(A) Accession IDs and sequence identities of selected coronavirus spike protein. (B) Phylogenetic tree of spike proteins among selected coronavirus. (C) Sequence identity of major domains in spike protein between SARS-CoV-2 and SARS-CoV. (D) Sequence identity of domains in SARS-CoV-2 and SARS-CoV reflected by colors. From red to green, the color changing represents the sequence identity from high to low.

Sequence alignment results of spike protein.

Linear B-cell epitope prediction

Linear B-cell epitopes of the SARS-CoV-2 spike protein were calculated by ABCpred and Bepipred v2.0 servers. For ABCpred, we set a threshold of 0.8 to achieve a specificity of 95.50% and an accuracy of 65.37% for prediction. The window length was set to 16 (the default window length) in this study (Saha & Raghava, 2006). The BepiPred v2.0 combines a hidden Markov model and a propensity scale method. The score threshold for the BepiPred v2.0 was set to 0.5 (the default value) to obtain a specificity of 57.16% and a sensitivity of 58.56% (Jespersen et al., 2017). The residues with scores above 0.5 were predicted to be part of an epitope.

T-cell epitope prediction

The free online service TepiTool server, integrated into the Immune Epitope Database (IEDB), was used to forecast epitopes binding to mice MHC molecules (Paul et al., 2016). Alleles including H-2-Db, H-2-Dd, H-2-Kb, H-2-Kd, H-2-Kk, and H-2-Ld were selected for MHC-I binding epitopes analysis. We checked the “IEDB recommended” option during computation and retained sequences with predicted consensus percentile rank ≤1 as predicted epitope (Trolle et al., 2015). For MHC-II binding epitopes, alleles including H2-IAb, H2-IAd, and H2-IEd were selected for analysis. As the same as MHC-I binding computation, we chose the “IEDB recommended” option, and peptides with predicted consensus percentile rank ≤10 were identified as potential epitopes (Wang et al., 2010; Zhang et al., 2012).

Profiling and evaluation of selected fragments

The secondary structure of the SARS-CoV-2 spike protein (PDB ID: 6VSB chain B) was calculated by the PyMOL molecular graphics system using the SSP algorithm. PyMOL (http://www.pymol.org) is a python-based tool, which is widely used for visualization of macromolecules, such as SARS-CoV-2 spike protein in the current study (Yuan et al., 2016). Vaxijen2.0 server was utilized to analyze the antigenicity of epitopes and selected fragments. A default threshold of 0.4 was set and the prediction accuracy is between 70% and 89% (Doytchinova & Flower, 2007). The hydrophilicity of the selected fragment was analyzed by the online server ProtScale (Wilkins et al., 1999). Surface accessibility of predicted fragments was evaluated by NetsurfP, an online server calculating the surface accessibility and secondary structure of amino acid sequence (Petersen et al., 2009). Critical features such as allergenicity and toxicity were evaluated by online server AllerTOP v2.0 (Dimitrov et al., 2014) and ToxinPred (Gupta et al., 2013). In addition, we utilized IEDB (www.iedb.org) to search the selected fragments and epitopes to clarify whether these peptides have been experimentally verified (Vita et al., 2019). Protein sequence BLAST was performed to evaluate the possibility of cross-reactivity with other mouse protein sequences (Altschul et al., 1997).

Results

Sequence alignment of spike protein in different coronaviruses

We performed sequence alignment to determine the evolutionary relationships between SARS-CoV-2 and other beta-coronaviruses that could infect humans. According to the results of sequence alignment (Figs. 2A and 2B), SARS-CoV is the closest virus to SARS-CoV-2 among the seven HCoVs, exhibiting a 77.46% sequence identity. To better understand the divergence of spike protein sequences between SARS-CoV-2 and SARS-CoV, we further analyzed the sequences of main domains. Results showed that the S2 subunit was the most conserved domain with a 90.0% identity. RBM and NTD domains, which were located in the S1 subunit, exhibited 49.3% and 50.0% identity respectively (Figs. 2C and 2D). Hence, we chose the S1 subunit (amino acid 1-685) for the subsequent bioinformatics analysis given their high specificity.

Linear B-cell epitope prediction of S1 subunit in SARS-CoV-2 spike protein

The B-cell epitope is a surface accessible cluster of amino acids, which could be recognized by secreted antibodies or B-cell receptors and elicit humoral immune response (Getzoff et al., 1988). The immunodominant fragments should contain high-quality linear B-cell epitopes to stimulate antibody production effectively. The sequence of the SARS-CoV-2 S1 subunit was evaluated via ABCpred and BepiPred v2.0. A total of 31 peptides were identified by the ABCpred algorithm (Table S1). For the Bepipred v2.0 server, 14 epitopes were forecasted (Table S2). After antigenicity evaluation, 19 and 9 potential linear B-cell epitopes predicted by the ABCpred server and BepiPred v2.0 server were obtained respectively (Table 1). The peptides predicted by both bioinformatics programs are more likely to be an epitope recognized in vivo. After mapping the positions of peptides identified by these servers, 3 regions containing predicted epitopes were obtained. These regions could be preliminarily considered as candidates for immunodominant fragments (Fig. 3; Table 2).

Table 1

Linear B-cell epitopes predicted by ABCpred and BepiPred v2.0 with antigenicity score exceed the threshold value.

Tools	Position	Sequence	Length	Antigenicity(cut off ≥ 0.4)
ABCpred	583-598	EILDITPCSFGGVSVI	16	1.3971
	406-421	EVRQIAPGQTGKIADY	16	1.3837
	415-430	TGKIADYNYKLPDDFT	16	0.9642
	648-663	GCLIGAEHVNNSYECD	16	0.848
	288-303	AVDCALDPLSETKCTL	16	0.7905
	604-619	TSNQVAVLYQDVNCTE	16	0.7593
	307-322	TVEKGIYQTSNFRVQP	16	0.6733
	200-215	YFKIYSKHTPINLVRD	16	0.657
	257-272	GWTAGAAAYYVGYLQP	16	0.621
	329-344	FPNITNLCPFGEVFNA	16	0.6058
	245-260	HRSYLTPGDSSSGWTA	16	0.6017
	280-295	NENGTITDAVDCALDP	16	0.5804
	49-64	HSTQDLFLPFFSNVTW	16	0.5305
	492-507	LQSYGFQPTNGVGYQP	16	0.5258
	70-85	VSGTNGTKRFDNPVLP	16	0.5162
	236-251	TRFQTLLALHRSYLTP	16	0.5115
	266-281	YVGYLQPRTFLLKYNE	16	0.5108
	594-609	GVSVITPGTNTSNQVA	16	0.4651
	320-335	VQPTESIVRFPNITNL	16	0.4454
Bepipred v2.0	179-190	LEGKQGNFKNLR	12	1.1188
	404-426	GDEVRQIAPGQTGKIADYNYKLP	23	1.1017
	14-34	QCVNLTTRTQLPPAYTNSFTR	21	0.7594
	56-81	LPFFSNVTWFHAIHVSGTNGTKRFDN	26	0.6041
	208-222	TPINLVRDLPQGFSA	15	0.5531
	141-160	LGVYYHKNNKSWMESEFRVY	20	0.5308
	249-261	LTPGDSSSGWTAG	13	0.495
	306-321	FTVEKGIYQTSNFRVQ	16	0.4361
	615-644	VNCTEVPVAIHADQLTPTWRVYSTGSNVFQ	30	0.4259

Figure 3

Preliminary immunodominant fragments based on B-cell epitope prediction results.

The black squares represent epitopes predicted by ABCpred server, the black frames represent epitopes predicted by Bepipred v2.0 server, and the black lines with numbers on both ends represent the preliminary candidate immunodominant fragments.

Table 2

Details of epitopes in the preliminary immunodominant fragments selected according to linear B-cell epitope prediction results.

Regions	Epitope predicted by ABCpred			Epitope predicted by Bepipred v2.0
Regions	Position	Sequence	Antigenicity	Position	Sequence	Antigenicity
49-85	49-64	HSTQDLFLPFFSNVTW	0.5305	56-81	LPFFSNVTWFHAIHVSGTNGTKRFDN	0.6041
49-85	70-85	VSGTNGTKRFDNPVLP	0.5162	56-81	LPFFSNVTWFHAIHVSGTNGTKRFDN	0.6041
200-344	200-215	YFKIYSKHTPINLVRD	0.6570	208-222	TPINLVRDLPQGFSA	0.5531
	236-251	TRFQTLLALHRSYLTP	0.5115	249-261	LTPGDSSSGWTAG	0.4950
	245-260	HRSYLTPGDSSSGWTA	0.6017
	257-272	GWTAGAAAYYVGYLQP	0.6210
	266-281	YVGYLQPRTFLLKYNE	0.5108
	280-295	NENGTITDAVDCALDP	0.5804
	288-303	AVDCALDPLSETKCTL	0.7905	306-321	FTVEKGIYQTSNFRVQ	0.4361
	307-322	TVEKGIYQTSNFRVQP	0.6733
	320-335	VQPTESIVRFPNITNL	0.4454
	329-344	FPNITNLCPFGEVFNA	0.6058
	415-430	TGKIADYNYKLPDDFT	0.9642
583-663	583-598	EILDITPCSFGGVSVI	1.3971	615-644	VNCTEVPVAIHADQLTPTWRVYSTGSNVFQ	0.4259
	594-609	GVSVITPGTNTSNQVA	0.4651
	604-619	TSNQVAVLYQDVNCTE	0.7593
	648-663	GCLIGAEHVNNSYECD	0.8480

Preliminary immunodominant fragments based on B-cell epitope prediction results.

Murine T-cell epitope prediction of S1 subunit in SARS-CoV-2 spike protein

Though B cells are responsible for producing antibodies, humoral immunity is heavily dependent on the activation of T cells (Cho et al., 2019a). Helper T cells (Th) recognize antigen peptides presented by MHC-II molecules and facilitate the humoral immune response (Cho et al., 2019b; Mahon et al., 1995). During humoral immune responses, antigen-activated T cells could provide help in many aspects including directing antibody class switching and guiding the differentiation of antibody-secreting plasma cells as well as the properties of the B-cell antigen receptor (Cho et al., 2019a; Paus et al., 2006; Shulman et al., 2014). Therefore, the immunodominant fragments containing T-cell epitopes could offer essential help to powerful antibody production. The S1 subunit was selected for the prediction of T-cell epitopes. We utilized the TepiTool server to forecast MHC-I and MHC-II binding epitopes. A total of 35 MHC-I binding epitopes was predicted (Table S3), and 27 peptides were identified as MHC-II binding epitopes (Table S4). The antigenicity of these peptides was calculated via Vaxijen 2.0 server (Table 3). Combined with the MHC-II epitopes prediction results, the candidate immunodominant fragments were adjusted (Fig. 4). Compared with the preliminary candidate immunodominant fragments screened according to the linear B-cell epitope prediction, we added the Spike14-34 fragment into consideration because it contains a linear B epitope and an MHC-II binding epitope, both of which had high antigenicity scores (Table 4).

Table 3

MHC-II and MHC-I binding epitopes predicted by TepiTool server with antigenicity score exceed threshold value.

Type	Position	Sequence	Length	Allele	Core(smm-align)	Core(nn-align)	Percentile Rank	Antigenicity(cut off ≥ 0.4
MHC-II binding	538-552	CVNFNFNGLTGTGVL	15	H2-IAb	FNFNGLTGT	FNFNGLTGT	8.55	1.3281
	374-388	FSTFKCYGVSPTKLN	15	H2-IAb	FKCYGVSPT	YGVSPTKLN	6.45	1.0042
	199-213	GYFKIYSKHTPINLV	15	H2-Iab	KIYSKHTPI	YSKHTPINL	6.9	0.9278
	18-32	LTTRTQLPPAYTNSF	15	H2-IAb	TRTQLPPAY	TRTQLPPAY	9.9	0.79
	60-74	SNVTWFHAIHVSGTN	15	H2-IAb	VTWFHAIHV	TWFHAIHVS	9.1	0.7044
	263-277	AAYYVGYLQPRTFLL	15	H2-IAb	VGYLQPRTF	VGYLQPRTF	8.75	0.6073
	592-606	FGGVSVITPGTNTSN	15	H2-IAb	VITPGTNTS	VSVITPGTN	6	0.5825
	238-252	FQTLLALHRSYLTPG	15	H2-IEd	TLLALHRSY	TLLALHRSY	9.85	0.5789
	345-359	TRFASVYAWNRKRIS	15	H2-IAb	FASVYAWNR	YAWNRKRIS	7.45	0.4963
	215-229	DLPQGFSALEPLVDL	15	H2-IAb	FSALEPLVD	FSALEPLVD	6.05	0.4812
	140-154	FLGVYYHKNNKSWME	15	H2-IEd	GVYYHKNNK	YYHKNNKSW	6.4	0.4793
	512-526	VLSFELLHAPATVCG	15	H2-IAb	FELLHAPAT	FELLHAPAT	2.9	0.4784
	87-101	NDGVYFASTEKSNII	15	H2-Iab	YFASTEKSN	VYFASTEKS	6.85	0.4277
	52-66	QDLFLPFFSNVTWFH	15	H2-IAb	FLPFFSNVT	FLPFFSNVT	2.95	0.4159
	233-247	INITRFQTLLALHRS	15	H2-IAd	ITRFQTLLA	ITRFQTLLA	1.9	0.4118
MHC-I binding	643-651	FQTRAGCLI	9	H-2-Kk			0.6	1.7332
	612-620	YQDVNCTEV	9	H-2-Db			0.4	1.6172
	539-547	VNFNFNGLT	9	H-2-Kb			0.47	1.5069
	503-511	VGYQPYRVV	9	H-2-Kb			0.47	1.4383
	379-387	CYGVSPTKL	9	H-2-Kd			0.3	1.4263
	16-24	VNLTTRTQL	9	H-2-Kb			0.86	1.3468
	510-518	VVVLSFELL	9	H-2-Kb			0.43	1.0909
	202-210	KIYSKHTPI	9	H-2-Kb			0.27	0.7455
	168-176	FEYVSQPFL	9	H-2-Kk			0.5	0.6324
	268-276	GYLQPRTFL	9	H-2-Kd			0.2	0.6082
	505-513	YQPYRVVVL	9	H-2-Dd			0.3	0.5964
	488-496	CYFPLQSYG	9	H-2-Kd			0.64	0.578
	215-223	DLPQGFSAL	9	H-2-Dd			0.69	0.5622
	342-350	FNATRFASV	9	H-2-Kb			0.56	0.5609
	84-92	LPFNDGVYF	9	H-2-Ld			0.21	0.5593
	484-492	EGFNCYFPL	9	H-2-Kb			0.84	0.5453
	62-70	VTWFHAIHV	9	H-2-Kb			0.61	0.5426
	489-497	YFPLQSYGF	9	H-2-Dd			0.8	0.5107
	350-358	VYAWNRKRI	9	H-2-Kd			0.7	0.5003
	60-68	SNVTWFHAI	9	H-2-Kb			0.82	0.4892
	262-270	AAAYYVGYL	9	H-2-Kb			0.98	0.4605

Figure 4

Adjusted candidate immunodominant fragments according to MHC-II T-cell epitope prediction results.

The black squares represent epitopes predicted by ABCpred server, and the black frames represent epitopes predicted by Bepipred v2.0 server. The red frames denote MHC-II binding epitopes. The black lines with numbers on both ends represent the adjusted candidate fragments.

Table 4

Details of candidate immunodominant fragments adjusted according to the MHC-II binding T-cell epitopes prediction results.

Regions	Linear B-cell epitopes				MHC-II binding epitopes
Regions	Tools	Position	Sequence	Antigenicity	Position	Sequence	Antigenicity
14-34	Bepipred v2.0	14-34	QCVNLTTRTQLPPAYTNSFTR	0.7594	18-32	LTTRTQLPPAYTNSF	0.7900
49-101	Bepipred v2.0	56-81	LPFFSNVTWFHAIHVSGTNGTKRFDN	0.6041	52-66	QDLFLPFFSNVTWFH	0.4159
	ABCpred	49-64	HSTQDLFLPFFSNVTW	0.5305	60-74	SNVTWFHAIHVSGTN	0.7044
	ABCpred	70-85	VSGTNGTKRFDNPVLP	0.5162	87-101	NDGVYFASTEKSNII	0.4277
199-359	Bepipred v2.0	208-222	TPINLVRDLPQGFSA	0.5531	199-213	GYFKIYSKHTPINLV	0.9278
		249-261	LTPGDSSSGWTAG	0.4950
		306-321	FTVEKGIYQTSNFRVQ	0.4361
	ABCpred	200-215	YFKIYSKHTPINLVRD	0.6570
		236-251	TRFQTLLALHRSYLTP	0.5115	215-229	DLPQGFSALEPLVDL	0.4812
		245-260	HRSYLTPGDSSSGWTA	0.6017	215-229	DLPQGFSALEPLVDL	0.4812
		257-272	GWTAGAAAYYVGYLQP	0.6210	233-247	INITRFQTLLALHRS	0.4118
		266-281	YVGYLQPRTFLLKYNE	0.5108	233-247	INITRFQTLLALHRS	0.4118
		280-295	NENGTITDAVDCALDP	0.5804	238-252	FQTLLALHRSYLTPG	0.5789
		288-303	AVDCALDPLSETKCTL	0.7905	238-252	FQTLLALHRSYLTPG	0.5789
		307-322	TVEKGIYQTSNFRVQP	0.6733	263-277	AAYYVGYLQPRTFLL	0.6073
		320-335	VQPTESIVRFPNITNL	0.4454
		329-344	FPNITNLCPFGEVFNA	0.6058
		329-344	FPNITNLCPFGEVFNA	0.6058	345-359	TRFASVYAWNRKRIS	0.4963
583-620	ABCpred	583-598	EILDITPCSFGGVSVI	1.3971	592-606	FGGVSVITPGTNTSN	0.5825
		594-609	GVSVITPGTNTSNQVA	0.4651
		604-619	TSNQVAVLYQDVNCTE	0.7593

Adjusted candidate immunodominant fragments according to MHC-II T-cell epitope prediction results.

Immunodominant fragments refinement according to the glycosylation site distribution, mutation site distribution, and secondary structure

A profile of 24 glycosylation sites of SARS-CoV-2 spike protein has been reported (Shajahan et al., 2020). Since glycans could hinder the recognition of antigens by shielding the residues (Walls et al., 2019), protein glycosylation would affect the performance of antigen detection. Thus, glycosylation sites should be circumvented when selecting the immunodominant fragments. According to the study of Shajahan et al. (2020), 15 glycosylation sites were located in the S1 subunit of the spike protein. Hence, the fragments in this study were adjusted to Spike14-34, Spike49-101, Spike199-261, and Spike583-620. To retain antigenicity of the epitopes, the final identified fragments only contained 3 glycosylation sites which should have a minimum effect on antigen recognition. Rapid transmission of COVID-19 provides the SARS-CoV-2 with substantial opportunities for natural selection and mutations. To ensure the stability of the detection method, the immunodominant fragments were modified to avoid high-frequency mutation sites (Wang et al., 2020b). Spike14-34 were excluded for containing four high-frequency mutation sites. Fragment Spike49-101 was adjusted to Spike56-92, and fragment Spike583-620 was adjusted to Spike583-609. By adjusting the fragments, we avoided in a total of 8 high-frequency mutation sites (L5F, L18F, T29I, R21K/T, H49Y, L54F, S98F, D614G). The mainly mutant sites on the recent emergent highly infectious variants (including B.1.1.7, B.1.351, and P.1), such as N501Y, D614G, E484K, Y144del, K417N, and A570D were also not included in our fragments. The adjusted fragments contain none of the above high-frequency mutation sites, which might avoid the impact of mutations on detection performance and improve the detection efficiency in the future (Li et al., 2020; Tegally et al., 2020). The PyMOL was used to present the secondary structure of the spike protein (PDB ID: 6VSB) (Fig. S1). To keep the integrity of the secondary structure of the selected fragments, we extended the N- and C-ends with 2~5 residues, and the immunodominant fragments were finally adjusted to Spike56-94, Spike199-264, and Spike577-612. The epitopes and potential glycosylation sites contained in the selected immunodominant fragments were displayed in Fig. 5.

Figure 5

The epitopes and glycosylation sites on the selected immunodominant fragments.

(A–C) Present the predicted epitopes and glycosylation sites on fragment Spike56-94, Spike199-264 and Spike577-612 respectively. The black squares represent epitopes predicted by ABCpred server, and the black frames represent epitopes predicted by Bepipred v2.0 server. The red squares represent MHC-I binding epitopes, and the red frames represent MHC-II binding epitopes. The gray squares mean occupied glycosylation sites contained in the selected fragments.

The epitopes and glycosylation sites on the selected immunodominant fragments.

Profiling, evaluation, and visualization of selected immunodominant fragments

To further evaluate the antibody binding potentiality of these antigenic regions, the key features of the selected fragments such as antigenicity, hydrophilicity, surface accessibility, toxicity, and allergenicity were analyzed and presented (Table 5). The hydrophilicity and surface accessibility of the spike protein subunit 1 were calculated. The selected fragments of interest were submitted for computation of antigenicity, toxicity, and allergenicity. Three fragments presented relatively moderate hydrophilicity and surface accessibility. The proportion of hydrophilic amino acids in the selected fragments Spike56-94, Spike199-264, Spike577-612 are 48.72%, 45.45%, 33.33% respectively. The surface accessibility of these fragments calculated by the online server was shown in Table 5.

Table 5

Significant features of the selected immunodominant fragments.

The sequences marked as bold and italic in the table represent amino acids with hydrophilicity and surface accessibility respectively.

Fragments	Spike_56-94	Spike_199-264	Spike_577-612
Length(aa)	39	66	36
Sequence	LPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFAS	GYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAA	RDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLY
Antigenicity	0.4590	0.5774	0.9127
Domain	S1(NTD)	S1(NTD)	S1
Hydrophilicity fragments	LPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFAS	GYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAA	RDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLY
Surface Accessibility fragments	LPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFAS	GYFKIYSK HTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAA	RDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLY
Toxicity	Non-toxin	Non-toxin	Non-toxin
Allergenicity	non-allergen	non-allergen	probable allergen

Significant features of the selected immunodominant fragments.

The sequences marked as bold and italic in the table represent amino acids with hydrophilicity and surface accessibility respectively. The toxicity of the selected fragments was examined and no fragment was predicted to be toxic. The allergenicity was assessed and only fragment Spike577-612 was predicted to be a probable allergen. Attention should be paid to monitor potential allergic reactions when injecting the recombinant protein into mice. And the selected fragments were presented as the sphere in the trimer structure (Fig. 6). Next, we scanned the selected fragments utilizing the IEDB database to determine whether they were experimentally tested. The results showed that Spike200-215(IEDB ID: 1330367) and Spike238-252 (IEDB ID: 1329417) were identified experimentally as HLA class II epitope in SARS-CoV-2. Spike84-92 (IEDB ID: 1321049) and Spike202-210 (IEDB ID: 1319559) have been experimentally proved as HLA-B epitopes (Table S5). These findings enhanced the credibility of the current in silico analysis. The fragments identified would have a strong capacity in stimulating powerful antibody production.

Figure 6

Selected immunodominant fragments presented as spheres in the trimer structure of spike protein viewed by PyMOL.

Selected fragments were presented as red spheres, green cartoons denote unselected sections. (A, B, and C) denote fragments Spike56-94, Spike199-264, and Spike577-612 respectively.

Selected immunodominant fragments presented as spheres in the trimer structure of spike protein viewed by PyMOL.

Selected fragments were presented as red spheres, green cartoons denote unselected sections. (A, B, and C) denote fragments Spike56-94, Spike199-264, and Spike577-612 respectively.

Immunodominant fragments based recombinant antigen design

Three immunodominant fragments embody several linear B-cell epitopes, MHC-I binding, and MHC-II binding T-cell epitopes were selected. As a universal Th epitope, the PAN DR epitope [PADRE(AKFVAAWTLKAAA)] was added into the construction aiming to boost helper T cell activity (Alexander et al., 2000; Ghaffari-Nazari et al., 2015). (GGGGS)n is a wildly used flexible linker with the function of segmenting protein fragments, maintaining protein conformation, preserving biological activity, and promoting protein expression (Chen, Zaro & Shen, 2013). Finally, we combined the fragments and a PADRE epitope by linker peptide (GGGGS)2 and (GGGGS)3 (Chen, Zaro & Shen, 2013) (Fig. 7). The predicted antigenicity of the final construct (194 aa) was 0.5690 (Table 6). A protein BLAST for the final construct was conducted to evaluate the possibility of cross-reactivity. The BLAST result suggested that, except for the SARS-CoV-2 spike protein, no protein would cross-react with the construct (Raw data in the Supplemental Files), which indicated that our fragments possess good specificity.

Figure 7

A schematic diagram of recombinant peptide composed of selected fragments and a PADRE epitope.

Table 6

The structure and antigenicity of final recombinant peptides.

Final construct	PAN DR + (GGGGS)₂ + Spike_56-94 +(GGGGS)₃ + Spike_199-264 +(GGGGS)₃ + Spike_577-612
Sequence	AKFVAAWTLKAAAGGGGSGGGGSLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASGGGGSGGGGSGGGGSGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAGGGGSGGGGSGGGGSRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLY
Antigenicity	0.5690

Discussion

In this study, the immunodominant fragments within the S1 subunit of the SARS-CoV-2 spike protein were explored. The final construct consists of three immunodominant fragments Spike56-94, Spike199-264, Spike577-612, and a PADRE epitope. The recombinant antigen will be used to immunize mice to generate qualified antibody which could be applied for developing an antigen-capture based detection system. The antibody-based antigen capturing method is user-friendly, time-saving, and economical. Thus, it is an ideal complementary detection strategy especially for early diagnosis and large population screening. The monoclonal antibodies against SARS-CoV have been successfully applied in the immunological antigen-detection of SARS-CoV (Ohnishi, 2008). Accordingly, we explored the immunodominant fragments on the spike protein of SARS-CoV-2, which would provide aid in developing an accurate and fast antigen-capture based early detection system for SARS-CoV-2. We selected the S1 subunit for immunodominant fragments screening after divergence analysis. It had been reported that an S1 antigen-based assay of SARS-CoV could capture the virus as soon as the infection occurs (Sunwoo et al., 2013). Jong-Hwan Lee et al. designed a method that could seize and detect spike protein S1 subunit of SARS-CoV-2 using ACE2 receptor and S1-mAb (Lee et al., 2021). This finding suggests that it is appropriate to use the S1 subunit for specific and early diagnosis of SARS-CoV-2. Three immunodominant fragments (Spike56-94, Spike199-264, and Spike577-612) were identified in the present study. These sequences will be joined to construct recombinant peptides in the next step. Instead of using inactivated full-length spike protein, we designed a novel recombinant protein construct that increased sequence specificity as well as circumvented mutation sites and glycosylation sites. As the antigen design is based on bioinformatics study, the exact ability of the selected fragments to produce qualified antibodies for virus detection has yet to be determined by experiments. Noticeably, the spike protein of SARS-CoV-2 is heavily glycosylated. Glycans could shield epitopes during antibody recognition, which may interfere with the detection of viral proteins (Shajahan et al., 2020). About 17 N-glycosylation sites along with two O-glycosylation sites were found occupied in the spike protein of SARS-CoV-2 (Shajahan et al., 2020). We circumnavigated most glycosylation sites when selecting immunodominant fragments. The three selected fragments in this study only contain 3 glycosylation sites. In case these glycosylation sites impede the diagnostic performance, an additional deglycosylation step with N-glycanase should be applied for the test specimens (Dermani et al., 2019), which is a simple and efficient method for deglycosylation (Hirani, Bernasconi & Rasmussen, 1987; Huang et al., 2015; Lattová et al., 2016; Zheng, Bantog & Bayer, 2011). Alternatively, an eukaryotic expressing system could be employed to mimic the antigen presented in human cells. Though coronaviruses can find and repair errors during the replication process (Wang et al., 2020b), the SARS-CoV-2 genome still presents a large number of mutations. Mutations could not only help virus slip past our immune defense, but also spoil the efficiency of diagnostic tests (Chen et al., 2020b). In this study, we circumvented high-frequency mutation sites when selecting antigen fragments. In addition, our fragments also avoided RBD regions which are prone to mutation (Chen et al., 2020b). The construct finally built contained no high-frequency mutation. To date, several studies using predictive algorithms to analyze SARS-CoV-2 have been reported (Alam et al., 2020; Behmard et al., 2020; Can et al., 2020; Chen et al., 2020a; Dong et al., 2020; Poran et al., 2020; Saha, Ghosh & Burra, 2021; Sohail et al., 2021). However, most of these bioinformatics analyses against SARS-CoV-2 intended to develop effective vaccines to prevent infection and the identified sequences possess high homology with other viruses, especially SARS-CoV (Bhattacharya et al., 2020; Chen et al., 2020a; Robson, 2020). On the contrary, the fragments suitable for diagnosis should be unique when compared with other species to ensure the specificity of detection. Therefore, the results obtained from vaccine studies are not ideal for virus detection. In this study, attention was paid to the sequences with high variability, hence the immunodominant fragments identified are more specific. Distinct from vaccine studies, murine MHC alleles were selected in epitopes prediction in this study, so that the designed antigen could trigger a strong humoral immune response in mice. Furthermore, glycosylated sites and recently identified high-frequency mutation sites were deliberately avoided during the screening process to eliminate their potential adverse impact. In silico analysis has been widely used to mine and identify various pathogens as well as epitopes prediction (Kiyotani et al., 2020; Liò & Goldman, 2004; Qin et al., 2003; Robson, 2020; Shen et al., 2003). In this study, identified fragments were further scanned in the IEDB database, and found four peptides contained in the sequences were experimentally validated epitopes (Table S5), which reinforced the conclusion of the present study. In the following studies, we will immunize Balb/c mice with the designed antigen to generate mAbs which could be utilized for SARS-CoV-2 diagnosis after evaluating their sensitivity, specificity, and other related properties.

Conclusion

Through bioinformatics analysis, three immunodominant fragments were identified in the present study. After connected by flexible linkers, we acquired a final recombinant peptide with 194 residues. It was predicted to possess high antigenicity and specificity for SARS-CoV-2. Our next move is to express and purify the recombinant protein in a suitable expression system, followed by immunizing the mice with purified immunogen to obtain specific antibodies. The present study would provide aid in developing an antigen-capture based detection system.

The secondary structure presentation of SARS-CoV2 spike using PyMOL server.

The letter L with green shades represents the loop structure, the letter S with yellow shades denotes the sheet structure and the letter H with gray shades indicates the helix structure. The lines “-” represent gaps in the sequence when we use PyMOL to visually present it. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file.

Raw data.

All prediction results including linear B-cell epitope, T-cell epitope, hydrophilicity and surface accessibility of SARS-CoV-2 spike protein S1 subunit. Click here for additional data file.

82 in total

Review 1. The chemistry and mechanism of antibody binding to protein antigens.

Authors: E D Getzoff; J A Tainer; R A Lerner; H M Geysen
Journal: Adv Immunol Date: 1988 Impact factor: 3.543

2. Linear PADRE T helper epitope and carbohydrate B cell epitope conjugates induce specific high titer IgG antibody responses.

Authors: J Alexander; M F del Guercio; A Maewal; L Qiao; J Fikes; R W Chesnut; J Paulson; D R Bundle; S DeFrees; A Sette
Journal: J Immunol Date: 2000-02-01 Impact factor: 5.422

3. A general method applicable to the search for similarities in the amino acid sequence of two proteins.

Authors: S B Needleman; C D Wunsch
Journal: J Mol Biol Date: 1970-03 Impact factor: 5.469

4. Design of an epitope-based peptide vaccine against the SARS-CoV-2: a vaccine-informatics approach.

Authors: Aftab Alam; Arbaaz Khan; Nikhat Imam; Mohd Faizan Siddiqui; Mohd Waseem; Md Zubbair Malik; Romana Ishrat
Journal: Brief Bioinform Date: 2021-03-22 Impact factor: 11.622

5. Immunological detection of severe acute respiratory syndrome coronavirus by monoclonal antibodies.

Authors: Kazuo Ohnishi; Masahiro Sakaguchi; Tomohiro Kaji; Kiyoko Akagawa; Tadayoshi Taniyama; Masataka Kasai; Yasuko Tsunetsugu-Yokota; Masamichi Oshima; Kiichi Yamamoto; Naomi Takasuka; Shu-ichi Hashimoto; Manabu Ato; Hideki Fujii; Yoshimasa Takahashi; Shigeru Morikawa; Koji Ishii; Tetsutaro Sata; Hirotaka Takagi; Shigeyuki Itamura; Takato Odagiri; Tatsuo Miyamura; Ichiro Kurane; Masato Tashiro; Takeshi Kurata; Hiroshi Yoshikura; Toshitada Takemori
Journal: Jpn J Infect Dis Date: 2005-04 Impact factor: 1.362

6. Dynamic signaling by T follicular helper cells during germinal center B cell selection.

Authors: Ziv Shulman; Alexander D Gitlin; Jason S Weinstein; Begoña Lainez; Enric Esplugues; Richard A Flavell; Joseph E Craft; Michel C Nussenzweig
Journal: Science Date: 2014-08-29 Impact factor: 47.728

7. Genomic Analysis of 15 Human Coronaviruses OC43 (HCoV-OC43s) Circulating in France from 2001 to 2013 Reveals a High Intra-Specific Diversity with New Recombinant Genotypes.

Authors: Nathalie Kin; Fabien Miszczak; Wei Lin; Meriadeg Ar Gouilh; Astrid Vabret
Journal: Viruses Date: 2015-05-07 Impact factor: 5.048

8. RT-qPCR Testing of SARS-CoV-2: A Primer.

Authors: Stephen A Bustin; Tania Nolan
Journal: Int J Mol Sci Date: 2020-04-24 Impact factor: 5.923

9. Clinical Performance of Two SARS-CoV-2 Serologic Assays.

Authors: Mei San Tang; Karl G Hock; Nicole M Logsdon; Jennifer E Hayes; Ann M Gronowski; Neil W Anderson; Christopher W Farnsworth
Journal: Clin Chem Date: 2020-08-01 Impact factor: 8.327

10. The expression and antigenicity of a truncated spike-nucleocapsid fusion protein of severe acute respiratory syndrome-associated coronavirus.

Authors: Feng Mu; Dongsheng Niu; Jingsong Mu; Bo He; Weiguo Han; Baoxing Fan; Shengyong Huang; Yan Qiu; Bo You; Weijun Chen
Journal: BMC Microbiol Date: 2008-11-28 Impact factor: 3.605

2 in total

1. Investigate the interaction of testosterone/progesterone with ionic liquids on varying the anion to combat COVID-19: Density functional theory calculations and molecular docking approach.

Authors: Kamlesh Kumari; Ajay Kumar; Indra Bahadur; Prashant Singh
Journal: J Phys Org Chem Date: 2021-08-18 Impact factor: 2.155

2. Considering epitopes conservity in targeting SARS-CoV-2 mutations in variants: a novel immunoinformatics approach to vaccine design.

Authors: Mohammad Aref Bagherzadeh; Mohammad Izadi; Kazem Baesi; Mirza Ali Mofazzal Jahromi; Majid Pirestani
Journal: Sci Rep Date: 2022-08-18 Impact factor: 4.996

2 in total