
Vaccine design based on 16 epitopes of SARS-CoV-2 spike protein.

Jinlei He1, Fan Huang2, Jianhui Zhang1, Qiwei Chen1, Zhiwan Zheng1, Qi Zhou1, Dali Chen1, Jiao Li1, Jianping Chen1,3.   

Abstract

The global outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) urgently requires an effective vaccine for prevention. In this study, 66 epitopes containing pentapeptides of SARS-CoV-2 spike protein in the IEDB database were compared with the amino acid sequence of SARS-CoV-2 spike protein, and 66 potentially immune-related peptides of SARS-CoV-2 spike protein were obtained. Based on the single-nucleotide polymorphisms analysis of spike protein of 1218 SARS-CoV-2 isolates, 52 easily mutated sites were identified and used for vaccine epitope screening. The best vaccine candidate epitopes in the 66 peptides of SARS-CoV-2 spike protein were screened out through mutation and immunoinformatics analysis. The best candidate epitopes were connected by different linkers in silico to obtain vaccine candidate sequences. The results showed that 16 epitopes were relatively conservative, immunological, nontoxic, and nonallergenic, could induce the secretion of cytokines, and were more likely to be exposed on the surface of the spike protein. They were both B- and T-cell epitopes, and could recognize a certain number of HLA molecules and had high coverage rates in different populations. Moreover, epitopes 897-913 were predicted to have possible cross-immunoprotection for SARS-CoV and SARS-CoV-2. The results of vaccine candidate sequences screening suggested that sequences (without linker, with linker GGGSGGG, EAAAK, GPGPG, and KK, respectively) were the best. The proteins translated by these sequences were relatively stable, with a high antigenic index and good biological activity. Our study provided vaccine candidate epitopes and sequences for the research of the SARS-CoV-2 vaccine.
© 2020 Wiley Periodicals LLC.


Keywords:  SARS-CoV-2; epitope; non-synonymous mutation; spike protein; vaccine


Year:  2020        PMID: 33091154      PMCID: PMC7675516          DOI: 10.1002/jmv.26596

Source DB:  PubMed          Journal:  J Med Virol        ISSN: 0146-6615            Impact factor:   20.693


INTRODUCTION

The severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) has caused a worldwide pandemic that seriously threatens human health, and an effective vaccine is urgently needed to help people resist infection. The spike protein on the virion surface is thought to bind the host receptor angiotensin‐converting enzyme 2 (ACE2) and, as in SARS‐CoV, plays an important role in cell adhesion and virulence. Because the spike protein plays key roles in inducing neutralizing antibodies and protective immunity during SARS‐CoV infection, it is considered an important vaccine research target for SARS‐CoV‐2. In a report by Lucchese G from Universitätsmedizin Greifswald, the proteome of SARS‐CoV‐2 was compared with that of humans to search for pentapeptides unique to SARS‐CoV‐2, especially in the spike protein. That report found 107 unique pentapeptides in the SARS‐CoV‐2 spike protein, corresponding to 66 antigen epitopes in the Immune Epitope Database (IEDB, http://www.iedb.org) that contain pentapeptides of the SARS‐CoV‐2 spike protein. These epitopes had been experimentally proven to have immunologic relevance, and their templates were mainly derived from SARS‐CoV. As SARS‐CoV‐2 is a novel coronavirus closely related to SARS‐CoV, the corresponding peptides of SARS‐CoV‐2 may also be immunologically relevant. Therefore, based on the above studies, we aligned the 66 epitope sequences in the IEDB database with the amino acid sequence of SARS‐CoV‐2 spike protein to obtain the corresponding 66 peptides of SARS‐CoV‐2 spike protein, which may be candidate epitopes for vaccine research. Mutation analysis and immunoinformatics analysis were used to screen the best vaccine candidate epitopes from the 66 peptides of SARS‐CoV‐2 spike protein. Then, the best candidate epitopes were connected by different linkers in silico to obtain vaccine candidate sequences. Through structure, surface properties, and function analysis, the best vaccine candidate sequences were finally screened out.

MATERIALS AND METHODS

Sequence alignment of 66 epitopes in IEDB database to SARS‐CoV‐2 spike protein

We downloaded the spike protein amino acid sequence of SARS‐CoV‐2 isolate Wuhan‐Hu‐1 from GenBank (GenBank ID: QHD43416.1). The sequences of the 66 epitopes containing pentapeptides of SARS‐CoV‐2 spike protein were from Lucchese G's report and checked in the IEDB database. Then, the sequences of these epitopes were aligned with the amino acid sequence of SARS‐CoV‐2 spike protein to obtain 66 peptides at the corresponding sequence position of SARS‐CoV‐2 spike protein, which might be candidate epitopes of a vaccine.
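The alignment step can be illustrated with a minimal sketch. The alignment method is not specified here, so the ungapped sliding-window search below is an assumption for illustration only, and the function name `best_ungapped_match` is hypothetical. The fragment is spike residues 891‐913 as reconstructed from the peptide sequences listed in Table 1.

```python
def best_ungapped_match(epitope: str, protein: str, first_residue: int = 1):
    """Slide the epitope along the protein without gaps and return
    (start, end, matched_peptide, identities) for the window with the
    most identical residues. Positions are 1-based protein coordinates."""
    query = epitope.upper()  # IEDB epitopes in Table 1 mix upper and lower case
    best = None
    for i in range(len(protein) - len(query) + 1):
        window = protein[i:i + len(query)]
        identities = sum(a == b for a, b in zip(query, window))
        if best is None or identities > best[3]:
            start = first_residue + i
            best = (start, start + len(query) - 1, window, identities)
    return best


# Spike residues 891-913, assembled from peptides 891-907 and 897-913 in Table 1.
FRAGMENT = "GAALQIPFAMQMAYRFNGIGVTQ"

hit = best_ungapped_match("aMQMAYRF", FRAGMENT, first_residue=891)
print(hit)
```

For IEDB epitope 3176 this sketch returns positions 899‐906, which agrees with the corresponding row of Table 1. A full-length analysis would pass the complete spike sequence (GenBank QHD43416.1) with `first_residue=1`.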

Detection of nonsynonymous mutation sites of SARS‐CoV‐2 spike protein

Because nonsynonymous mutations in the viral amino acid sequence may affect the recognition of vaccine antigens, conserved sequences are generally preferred as vaccine candidate antigens. Therefore, the inclusion of mutation sites in candidate epitopes of SARS‐CoV‐2 should be avoided as much as possible. We searched the 2019 Novel Coronavirus Resource (2019nCoVR, https://bigd.big.ac.cn/ncov) from the China National Center for Bioinformation (CNCB) to obtain high‐quality genomic data of SARS‐CoV‐2 clinical isolates. A total of 1218 isolates from 34 countries around the world, sampled from June 1, 2020 to June 30, 2020, were selected for analysis. The countries are listed in Table S1. Among the single‐nucleotide polymorphisms (SNPs) of the spike protein, we counted the nonsynonymous mutations, that is, those that cause amino acid changes. Amino acid sites with nonsynonymous mutations that appeared twice or more among the 1218 isolates were considered to be easily mutated. The 66 peptides of SARS‐CoV‐2 spike protein were checked for the presence of easily mutated amino acid sites, and peptides containing such sites were noted for subsequent screening.
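The "twice or more" rule and the peptide check can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the record format, the function names, and the toy isolate identifiers are all hypothetical, and "appeared twice or more" is interpreted here as "seen in at least two isolates".

```python
from collections import Counter


def easily_mutated_sites(records, min_isolates=2):
    """records: iterable of (isolate_id, spike_site) pairs, one per
    nonsynonymous mutation. Each site is counted once per isolate, and
    sites seen in at least `min_isolates` isolates are returned sorted."""
    counts = Counter(site for _, site in set(records))
    return sorted(site for site, n in counts.items() if n >= min_isolates)


def sites_in_peptide(start, end, sites):
    """Return the easily mutated sites that fall inside a peptide (inclusive)."""
    return [s for s in sites if start <= s <= end]


# Hypothetical toy records; the real analysis covered 1218 isolates.
RECORDS = [
    ("iso1", 614), ("iso2", 614), ("iso3", 614),
    ("iso1", 653), ("iso4", 653),
    ("iso2", 660), ("iso5", 660),
    ("iso3", 1000),  # seen in one isolate only, so not "easily mutated"
]

SITES = easily_mutated_sites(RECORDS)
print(SITES)
print(sites_in_peptide(647, 663, SITES))
```

With these toy records, peptide 647‐663 is flagged for sites 653 and 660, mirroring how the "Easily mutated site" column of Table 1 is filled.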

Screening candidate vaccine epitopes in spike protein

Immune protective antigens among the peptides of SARS‐CoV‐2 spike protein were predicted using the immunoinformatics tool VaxiJen v2.0, toxic peptides were predicted using ToxinPred, and allergenic peptides were predicted using AllergenFP v.1.0. The ability of the epitopes to induce interferon‐γ (IFN‐γ), interleukin‐4 (IL‐4), and IL‐10 secretion was predicted using IFNepitope, IL4Pred, and IL‐10Pred, respectively. Peptides predicted to be nonantigenic, toxic, or allergenic were removed, and the remaining peptides were used as antigen epitopes for subsequent screening. The solvent accessibility of each amino acid of spike protein (template 6xr8.1) was predicted by SWISS‐MODEL to screen the epitopes that were more likely to be exposed on the surface of the spike protein. ABCpred and IEDB BepiPred Linear Epitope Prediction 2.0 were used to predict B‐cell epitopes. NetMHC 4.0 Server, Rankpep, and SYFPEITHI were used to predict T‐cell epitopes and HLA molecules. As different HLA types are expressed at dramatically different frequencies in different ethnicities, after obtaining the HLA class I and class II molecules recognized by these epitopes, we predicted the coverage rate of each epitope in different populations using Population Coverage in the IEDB Analysis Resource. Although some epitopes contained easily mutated sites, some of them might be strong neutralizing epitopes that induce strong protection and should also be considered in vaccine design. Therefore, according to the above analysis, the selected vaccine candidate epitopes for SARS‐CoV‐2 were those predicted to be relatively conserved, immunoprotective, nontoxic, and nonallergenic, able to promote the secretion of cytokines, and more likely to be exposed on the surface of the spike protein. They were both B‐ and T‐cell epitopes, which could identify a certain number of HLA molecules and had high coverage rates in different populations.
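The first filtering step (removing nonantigenic, toxic, or allergenic peptides) can be sketched as a simple predicate over the predicted labels. This is an illustrative sketch only: the `Prediction` type and `keep_candidates` function are hypothetical names, the labels would in practice come from the web tools named above, and the five example rows are taken from Table 2.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    position: str   # position in spike protein
    antigen: bool   # predicted protective antigen
    toxin: bool     # predicted toxic
    allergen: bool  # predicted allergenic


def keep_candidates(rows):
    """Keep peptides predicted antigenic, nontoxic, and nonallergenic."""
    return [r.position for r in rows
            if r.antigen and not r.toxin and not r.allergen]


# Five example rows transcribed from Table 2.
ROWS = [
    Prediction("875-891", antigen=False, toxin=False, allergen=True),
    Prediction("1025-1041", antigen=True, toxin=False, allergen=False),
    Prediction("647-663", antigen=True, toxin=True, allergen=False),
    Prediction("899-906", antigen=True, toxin=False, allergen=False),
    Prediction("373-389", antigen=True, toxin=False, allergen=True),
]

KEPT = keep_candidates(ROWS)
print(KEPT)
```

Only 1025‐1041 and 899‐906 pass in this subset, and both appear among the 28 epitopes carried forward to Table 3.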

Acquisition, analysis, and screening of vaccine candidate sequences

The selected vaccine candidate epitopes were connected by different linkers (no linker, GGGGS, GGGSGGG, EAAAK, GPGPG, AAY, and KK, respectively) to obtain vaccine candidate sequences. Bioinformatics tools were used to analyze and screen the vaccine candidate sequences. PredictProtein was used to predict the amino acid composition, secondary structure composition, solvent accessibility, and gene ontology terms of the candidate sequences. The flexibility and antigenic index of the candidate sequences were predicted using DNAStar software. The Expasy ProtParam tool was used to predict the half‐life and stability of the candidate proteins. Finally, through a comprehensive analysis, the best candidate vaccine sequences were selected; they will be prepared into vaccines, and their immune effects will be verified through animal experiments.
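Connecting epitopes with each linker amounts to a string join, as the minimal sketch below shows. The seven linkers are those listed above; the two epitopes (899‐906 and 1025‐1041 from Table 1) and their order are chosen purely for illustration, and `join_epitopes` is a hypothetical helper name.

```python
# Linkers listed in the text; the empty string stands for "no linker".
LINKERS = ["", "GGGGS", "GGGSGGG", "EAAAK", "GPGPG", "AAY", "KK"]


def join_epitopes(epitopes, linker):
    """Concatenate epitopes in the given order, placing the linker between them."""
    return linker.join(epitopes)


# Illustrative epitopes only (899-906 and 1025-1041 from Table 1).
EPITOPES = ["AMQMAYRF", "AATKMSECVLGQSKRVD"]

CANDIDATES = {linker or "none": join_epitopes(EPITOPES, linker)
              for linker in LINKERS}

for name, sequence in CANDIDATES.items():
    print(f"{name:8s} {sequence}")
```

Each resulting sequence would then be submitted to the structure, stability, and antigenicity tools described above.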

RESULTS

Epitope sequence alignment

After comparing the amino acid sequences of the 66 epitopes in the IEDB database with those at the corresponding positions of SARS‐CoV‐2 spike protein, 66 peptides belonging to SARS‐CoV‐2 spike protein were obtained; they are shown in Table 1. Among the 66 epitopes in the IEDB database, 60 epitopes were from the spike protein of SARS‐CoV, four epitopes were from hemagglutinin of influenza A virus, and two epitopes were from ribonucleoside‐diphosphate reductase large subunit‐like protein of human herpesvirus 6B. Among the 66 peptides of SARS‐CoV‐2 spike protein, six peptides (310‐317, 757‐764, 891‐907, 897‐913, 899‐906, and 1025‐1041) were completely consistent with the sequences of epitopes in the IEDB database; these are bolded in Table 1. Moreover, seven peptides (356‐372, 356‐373, 365‐381, 371‐387, 373‐389, 379‐395, and 418‐434) partially overlapped with the CR3022 epitope of SARS‐CoV‐2 published in Science by Yuan et al.; these are underlined in Table 1. CR3022 is a neutralizing antibody previously isolated from a convalescent SARS patient, and it targets a highly conserved epitope that enables cross‐reactive binding between SARS‐CoV and SARS‐CoV‐2. CR3022‐related epitopes may produce cross‐protective antibody responses against SARS‐CoV and SARS‐CoV‐2. Therefore, these peptides need to be focused on in subsequent experiments.
Table 1

Sequence alignment of 66 epitopes in IEDB database to SARS‐CoV‐2 spike protein

IEDB ID number | Epitope sequence | Organism | Position in spike protein | Amino acid sequence in spike protein | Easily mutated site
307 | aalvsgtatagWTFGAg | SARS‐CoV | 875‐891 | SALLAGTITSGWTFGAG | N/A
462 | aatkMSECVlgqskrvd | SARS‐CoV | 1025‐1041 | AATKMSECVLGQSKRVD | N/A
1460 | agclIGAEHvdtsyecd | SARS‐CoV | 647‐663 | AGCLIGAEHVNNSYECD | 653, 660
3176 | aMQMAYRF | SARS‐CoV | 899‐906 | AMQMAYRF | N/A
6011 | canlllqygsFCTQLnralsgia | SARS‐CoV | 749‐771 | CSNLLLQYGSFCTQLNRALTGIA | 769
6333 | cgpklstdliknqCVNFNfngltgtgvltpsskrfqpfqqfg | SARS‐CoV | 525‐566 | CGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFG | N/A
6334 | cgpklstdliknqCVNFNfngltgtgvltpsskrfqpfqqfgrdvsdftd | SARS‐CoV | 525‐574 | CGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTD | 574
7066 | csqnplaelkcsvksfeidkGIYQTsnfrvvpsgd | SARS‐CoV | 291‐325 | CALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTES | N/A
7217 | cttfddvqapnytqhtssmRGVYYPDeifr | SARS‐CoV | 15‐44 | CVNLTTRTQLPPAYTNSFTRGVYYPDKVFR | 17, 21, 22, 29
7383 | CYGVSatklndlcfsnv | SARS‐CoV | 379‐395 | CYGVSPTKLNDLCFTNV | 382
8239 | dfcgkGYHLMSfpqaap | SARS‐CoV | 1041‐1057 | DFCGKGYHLMSFPQSAP | N/A
12417 | eidkGIYQTsnfrvvps | SARS‐CoV | 307‐323 | TVEKGIYQTSNFRVQPT | N/A
15903 | ffSTFKCYGVSatklnd | SARS‐CoV | 373‐389 | SFSTFKCYGVSPTKLND | 382
18161 | fvfngtswfiTQRNFfs | SARS‐CoV | 1095‐1110 | FVSNGTHWFVTQRNFY | N/A
18515 | gaalqipFAMQMAYRFn | SARS‐CoV | 891‐907 | GAALQIPFAMQMAYRFN | N/A
21464 | gnliaprGYFKIrsgkssim | Influenza A virus | 192‐211 | FVFKNIDGYFKIYSKHTPIN | 211
22321 | gsFCTQLn | SARS‐CoV | 757‐764 | GSFCTQLN | N/A
24978 | htssmRGVYYPDeifrs | SARS‐CoV | 29‐45 | TNSFTRGVYYPDKVFRS | 29
25250 | IADYNYKLpddfmgcvl | SARS‐CoV | 418‐434 | IADYNYKLPDDFTGCVI | N/A
25293 | iaglIAIVMvtillccm | SARS‐CoV | 1221‐1237 | IAGLIAIVMVTIMLCCM | 1237
25378 | iapgqtgvIADYNYKLp | SARS‐CoV | 410‐426 | IAPGQTGKIADYNYKLP | N/A
25382 | iaprGYFKIrngkssimrsdapigtcssecit | Influenza A virus | 195‐226 | KNIDGYFKIYSKHTPINLVRDLPQGFSALEPL | 211, 215, 220
29728 | iywtivkpgdillinstgnliaprGYFKIrn | Influenza A virus | 175‐205 | FLMDLEGKQGNFKNLREFVFKNIDGYFKIYS | 180, 181
30987 | kGIYQTsn | SARS‐CoV | 310‐317 | KGIYQTSN | N/A
30988 | kGIYQTsnfrvvpsgdvvrf | SARS‐CoV | 310‐329 | KGIYQTSNFRVQPTESIVRF | N/A
31581 | kkisnCVADYsvlynst | SARS‐CoV | 356‐372 | KRISNCVADYSVLYNSA | N/A
31582 | kkisnCVADYsvlynstf | SARS‐CoV | 356‐373 | KRISNCVADYSVLYNSAS | N/A
33305 | ksfeidkGIYQTsnfrvv | SARS‐CoV | 304‐321 | KSFTVEKGIYQTSNFRVQ | N/A
33358 | ksivAYTMSlgadssia | SARS‐CoV | 690‐706 | QSIIAYTMSLGAENSVA | 690, 691, 701
33874 | kTSVDCnMYICGDSTEC | SARS‐CoV | 733‐749 | KTSVDCTMYICGDSTEC | N/A
36579 | liknqCVNFNfngltgt | SARS‐CoV | 533‐549 | LVKNKCVNFNFNGLTGT | N/A
36815 | lkcsvksfeidkGIYQT | SARS‐CoV | 299‐315 | TKCTLKSFTVEKGIYQT | N/A
36856 | lkgacscgsCCKFDedd | SARS‐CoV | 1244‐1260 | LKGCCSCGSCCKFDEDD | N/A
37758 | llrstsqksivAYTMSl | SARS‐CoV | 683‐699 | RARSVASQSIIAYTMSL | 688, 690, 691
39023 | lqygsFCTQLnralsgi | SARS‐CoV | 754‐770 | LQYGSFCTQLNRALTGI | 769
41177 | MAYRFNGIgvtqnvlye | SARS‐CoV | 902‐918 | MAYRFNGIGVTQNVLYE | N/A
42999 | mvtilLCCMTSCCsclk | SARS‐CoV | 1229‐1245 | MVTIMLCCMTSCCSCLK | 1237
43145 | nafnCTFEYisdafsld | SARS‐CoV | 162‐178 | SANNCTFEYVSQPFLMD | N/A
46379 | nvfqtqagclIGAEHvd | SARS‐CoV | 641‐657 | NVFQTRAGCLIGAEHVN | 653
46822 | PAICHegkayfpregvfvfngtswfitqrnffs | SARS‐CoV | 1079‐1111 | PAICHDGKAHFPREGVFVSNGTHWFVTQRNFYE | N/A
47479 | pFAMQMAYRFNGIgvtq | SARS‐CoV | 897‐913 | PFAMQMAYRFNGIGVTQ | N/A
49968 | pvsmakTSVDCnMYICGds | SARS‐CoV | 728‐746 | PVSMTKTSVDCTMYICGDS | N/A
50058 | pwyvwlgfiaglIAIVM | SARS‐CoV | 1213‐1229 | PWYIWLGFIAGLIAIVM | N/A
53202 | rasanlaatkMSECVlg | SARS‐CoV | 1019‐1035 | RASANLAATKMSECVLG | N/A
54989 | rnfttaPAICHegkayf | SARS‐CoV | 1073‐1089 | KNFTTAPAICHDGKAHF | 1078
58143 | sgncdvvigiinNTVYD | SARS‐CoV | 1123‐1139 | SGNCDVVIGIVNNTVYD | N/A
58730 | sivAYTMSl | SARS‐CoV | 691‐699 | SIIAYTMSL | 691
61554 | stdliknqCVNFNfn | SARS‐CoV | 530‐544 | STNLVKNKCVNFNFN | N/A
61598 | stffSTFKCYGVSatkl | SARS‐CoV | 371‐387 | SASFSTFKCYGVSPTKL | 382
62872 | tagWTFGAgaalqipfa | SARS‐CoV | 883‐899 | TSGWTFGAGAALQIPFA | N/A
63309 | tecanlllqygsFCTQL | SARS‐CoV | 747‐763 | TECSNLLLQYGSFCTQL | N/A
68971 | vigiinNTVYDplqpel | SARS‐CoV | 1129‐1145 | VIGIVNNTVYDPLQPEL | N/A
72205 | VYYPDeifrsdtlyltqd | SARS‐CoV | 36‐53 | VYYPDKVFRSSVLHSTQD | N/A
74173 | yicgDSTECanlllqyg | SARS‐CoV | 741‐757 | YICGDSTECSNLLLQYG | N/A
75920 | ysvlynstffSTFKCYG | SARS‐CoV | 365‐381 | YSVLYNSASFSTFKCYG | N/A
99918 | CTFEYisdafsld | SARS‐CoV | 166‐178 | CTFEYVSQPFLMD | N/A
100048 | gaalqipFAMQMAYRF | SARS‐CoV | 891‐906 | GAALQIPFAMQMAYRF | N/A
100230 | ksivAYTMSlgadssiay | SARS‐CoV | 690‐707 | QSIIAYTMSLGAENSVAY | 690, 691, 701
100300 | MAYRFNGIgvtqnvly | SARS‐CoV | 902‐917 | MAYRFNGIGVTQNVLY | N/A
100316 | nafnCTFEYisdafsldv | SARS‐CoV | 162‐179 | SANNCTFEYVSQPFLMDL | N/A
100537 | swfiTQRNFfspqii | SARS‐CoV | 1101‐1115 | HWFVTQRNFYEPQII | N/A
100711 | agclIGAEHvdtsyecdi | SARS‐CoV | 647‐664 | AGCLIGAEHVNNSYECDI | 653, 660
129239 | liaprGYFKIrsgkssi | Influenza A virus | 194‐210 | FKNIDGYFKIYSKHTPI | N/A
532052 | gtswfiTQRNFfspq | SARS‐CoV | 1099‐1113 | GTHWFVTQRNFYEPQ | N/A
873061 | mmcehiyytcvrTSVDCc | Human herpes virus 6B | 722‐739 | VTTEILPVSMTKTSVDCT | N/A
874104 | ytcvrTSVDCcmkgaep | Human herpes virus 6B | 729‐745 | VSMTKTSVDCTMYICGD | N/A

Note: In the table, the capitalized amino acid sequences were the sequences in SARS‐CoV‐2 spike protein. The underlined peptides (356‐372, 356‐373, 365‐381, 371‐387, 373‐389, 379‐395, and 418‐434) partially overlapped with the CR3022 epitope published in Science by Yuan et al. The bolded peptides (310‐317, 757‐764, 891‐907, 897‐913, 899‐906, and 1025‐1041) were completely consistent with the corresponding epitope sequences in the IEDB database.

After analyzing the SNPs of the spike protein in 1218 SARS‐CoV‐2 clinical isolates, we found a total of 52 nonsynonymous mutation sites that occurred twice or more, which were considered to be easily mutated and are marked in Figure 1A. The D614G mutation was the most frequent and appeared in 1101 SARS‐CoV‐2 clinical isolates. The D614G mutation was also reported by Korber et al. and might change the virulence of SARS‐CoV‐2, but further research is needed. We checked the 66 peptide sequences of SARS‐CoV‐2 for easily mutated sites, and peptides containing such sites were noted for subsequent screening. In total, 21 peptides containing easily mutated sites were found and are shown in Table 1. Peptides 15‐44, 195‐226, 683‐699, 690‐706, and 690‐707 each contained more than two easily mutated sites and should not be considered as vaccine epitopes.
Figure 1

Mutation analysis of severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) spike protein and prediction of epitope population coverage. (A) Amino acid sites with nonsynonymous mutations in the spike protein of 1218 clinical isolates of SARS‐CoV‐2. Sites where nonsynonymous mutations appeared twice or more among the 1218 isolates were considered to be easily mutated and are marked in the figure. In total, 52 easily mutated sites were found. (B) Prediction of population coverage rates of 28 epitopes in SARS‐CoV‐2 spike protein


Prediction of protective antigen, toxicity, allergenicity, and cytokine secretion of the 66 peptides

The prediction results of protective antigen, toxicity, allergenicity, and cytokine secretion of the 66 peptides are shown in Table 2. There were 26 peptides without immune protection (score lower than 0.4 in analysis tool), 6 peptides with toxicity (score higher than 0 in analysis tool), and 19 peptides with allergenicity. There were 28 epitopes that had the ability to induce IFN‐γ secretion, 42 epitopes had the ability to induce IL‐4 secretion, and 24 epitopes had the ability to induce IL‐10 secretion. After removing the nonimmunoprotective, toxic, or allergenic peptides, there were 28 remaining peptides as candidate epitopes for further screening. Among the 28 epitopes, only 897‐913, 899‐906, and 1025‐1041 epitopes were completely consistent with the sequences in the IEDB database, and only 371‐387 and 379‐395 epitopes partially overlapped with CR3022 epitope of SARS‐CoV‐2. Moreover, 371‐387, 379‐395, and 410‐426 epitopes exist in the binding region of spike protein and ACE2, which might be the important candidate vaccine targets. These six epitopes would be noted in the subsequent screening.
Table 2

Prediction of protective antigen, toxicity, allergenicity, and cytokine secretion of 66 peptides in SARS‐CoV‐2 spike protein

Position in spike protein | Amino acid sequence in spike protein | Protective antigen prediction | Toxicity prediction | Allergenicity prediction | IFN‐γ prediction | IL‐4 prediction | IL‐10 prediction
875‐891 | SALLAGTITSGWTFGAG | Nonantigen | Nontoxin | Allergen | Inducer | Noninducer | Noninducer
1025‐1041 | AATKMSECVLGQSKRVD | Antigen | Nontoxin | Nonallergen | Noninducer | Inducer | Noninducer
647‐663 | AGCLIGAEHVNNSYECD | Antigen | Toxin | Nonallergen | Inducer | Inducer | Inducer
899‐906 | AMQMAYRF | Antigen | Nontoxin | Nonallergen | Inducer | Inducer | Noninducer
749‐771 | CSNLLLQYGSFCTQLNRALTGIA | Antigen | Nontoxin | Nonallergen | Inducer | Noninducer | Inducer
525‐566 | CGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFG | Antigen | Nontoxin | Nonallergen | N/A | Noninducer | Inducer
525‐574 | CGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTD | Antigen | Nontoxin | Nonallergen | N/A | Noninducer | Inducer
291‐325 | CALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTES | Antigen | Nontoxin | Nonallergen | N/A | Inducer | Inducer
15‐44 | CVNLTTRTQLPPAYTNSFTRGVYYPDKVFR | Antigen | Nontoxin | Nonallergen | Inducer | Inducer | Inducer
379‐395 | CYGVSPTKLNDLCFTNV | Antigen | Nontoxin | Nonallergen | Noninducer | Inducer | Noninducer
1041‐1057 | DFCGKGYHLMSFPQSAP | Nonantigen | Nontoxin | Nonallergen | Inducer | Inducer | Noninducer
307‐323 | TVEKGIYQTSNFRVQPT | Antigen | Nontoxin | Nonallergen | Inducer | Inducer | Noninducer
373‐389 | SFSTFKCYGVSPTKLND | Antigen | Nontoxin | Allergen | Noninducer | Inducer | Noninducer
1095‐1110 | FVSNGTHWFVTQRNFY | Nonantigen | Nontoxin | Allergen | Noninducer | Inducer | Noninducer
891‐907 | GAALQIPFAMQMAYRFN | Antigen | Nontoxin | Allergen | Noninducer | Inducer | Noninducer
192‐211 | FVFKNIDGYFKIYSKHTPIN | Antigen | Nontoxin | Allergen | Inducer | Inducer | Inducer
757‐764 | GSFCTQLN | Antigen | Nontoxin | Allergen | Inducer | Noninducer | Noninducer
29‐45 | TNSFTRGVYYPDKVFRS | Nonantigen | Nontoxin | Allergen | Inducer | Noninducer | Inducer
418‐434 | IADYNYKLPDDFTGCVI | Antigen | Nontoxin | Allergen | Noninducer | Inducer | Noninducer
1221‐1237 | IAGLIAIVMVTIMLCCM | Antigen | Toxin | Allergen | Inducer | Inducer | Inducer
410‐426 | IAPGQTGKIADYNYKLP | Antigen | Nontoxin | Nonallergen | Inducer | Inducer | Noninducer
195‐226 | KNIDGYFKIYSKHTPINLVRDLPQGFSALEPL | Antigen | Nontoxin | Nonallergen | N/A | Noninducer | Inducer
175‐205 | FLMDLEGKQGNFKNLREFVFKNIDGYFKIYS | Nonantigen | Nontoxin | Nonallergen | N/A | Inducer | Inducer
310‐317 | KGIYQTSN | Nonantigen | Nontoxin | Allergen | Inducer | Inducer | Noninducer
310‐329 | KGIYQTSNFRVQPTESIVRF | Nonantigen | Nontoxin | Nonallergen | Inducer | Inducer | Noninducer
356‐372 | KRISNCVADYSVLYNSA | Nonantigen | Nontoxin | Nonallergen | Noninducer | Inducer | Inducer
356‐373 | KRISNCVADYSVLYNSAS | Nonantigen | Nontoxin | Nonallergen | Noninducer | Inducer | Inducer
304‐321 | KSFTVEKGIYQTSNFRVQ | Nonantigen | Nontoxin | Nonallergen | Inducer | Inducer | Noninducer
690‐706 | QSIIAYTMSLGAENSVA | Antigen | Nontoxin | Nonallergen | Inducer | Noninducer | Noninducer
733‐749 | KTSVDCTMYICGDSTEC | Nonantigen | Toxin | Nonallergen | Noninducer | Noninducer | Noninducer
533‐549 | LVKNKCVNFNFNGLTGT | Antigen | Nontoxin | Nonallergen | Noninducer | Inducer | Noninducer
299‐315 | TKCTLKSFTVEKGIYQT | Nonantigen | Nontoxin | Allergen | Inducer | Inducer | Noninducer
1244‐1260 | LKGCCSCGSCCKFDEDD | Nonantigen | Toxin | Nonallergen | Noninducer | Inducer | Noninducer
683‐699 | RARSVASQSIIAYTMSL | Antigen | Nontoxin | Nonallergen | Inducer | Noninducer | Noninducer
754‐770 | LQYGSFCTQLNRALTGI | Antigen | Nontoxin | Nonallergen | Inducer | Noninducer | Noninducer
902‐918 | MAYRFNGIGVTQNVLYE | Antigen | Nontoxin | Nonallergen | Noninducer | Noninducer | Noninducer
1229‐1245 | MVTIMLCCMTSCCSCLK | Nonantigen | Toxin | Nonallergen | Noninducer | Inducer | Inducer
162‐178 | SANNCTFEYVSQPFLMD | Nonantigen | Nontoxin | Nonallergen | Noninducer | Inducer | Noninducer
641‐657 | NVFQTRAGCLIGAEHVN | Antigen | Nontoxin | Allergen | Noninducer | Noninducer | Inducer
1079‐1111 | PAICHDGKAHFPREGVFVSNGTHWFVTQRNFYE | Nonantigen | Nontoxin | Nonallergen | N/A | Inducer | Inducer
897‐913 | PFAMQMAYRFNGIGVTQ | Antigen | Nontoxin | Nonallergen | Noninducer | Inducer | Noninducer
728‐746 | PVSMTKTSVDCTMYICGDS | Nonantigen | Nontoxin | Allergen | Noninducer | Noninducer | Noninducer
1213‐1229 | PWYIWLGFIAGLIAIVM | Antigen | Nontoxin | Nonallergen | Noninducer | Noninducer | Inducer
1019‐1035 | RASANLAATKMSECVLG | Antigen | Nontoxin | Nonallergen | Inducer | Noninducer | Noninducer
1073‐1089 | KNFTTAPAICHDGKAHF | Nonantigen | Nontoxin | Allergen | Inducer | Inducer | Inducer
1123‐1139 | SGNCDVVIGIVNNTVYD | Antigen | Nontoxin | Allergen | Noninducer | Noninducer | Noninducer
691‐699 | SIIAYTMSL | Antigen | Nontoxin | Allergen | Noninducer | Noninducer | Noninducer
530‐544 | STNLVKNKCVNFNFN | Antigen | Nontoxin | Nonallergen | Noninducer | Inducer | Noninducer
371‐387 | SASFSTFKCYGVSPTKL | Antigen | Nontoxin | Nonallergen | Noninducer | Inducer | Noninducer
883‐899 | TSGWTFGAGAALQIPFA | Nonantigen | Nontoxin | Nonallergen | Noninducer | Inducer | Noninducer
747‐763 | TECSNLLLQYGSFCTQL | Antigen | Nontoxin | Nonallergen | Inducer | Noninducer | Inducer
1129‐1145 | VIGIVNNTVYDPLQPEL | Antigen | Nontoxin | Nonallergen | Noninducer | Inducer | Inducer
36‐53 | VYYPDKVFRSSVLHSTQD | Nonantigen | Nontoxin | Nonallergen | Inducer | Noninducer | Inducer
741‐757 | YICGDSTECSNLLLQYG | Nonantigen | Nontoxin | Allergen | Inducer | Noninducer | Noninducer
365‐381 | YSVLYNSASFSTFKCYG | Nonantigen | Nontoxin | Nonallergen | Inducer | Inducer | Noninducer
166‐178 | CTFEYVSQPFLMD | Nonantigen | Nontoxin | Allergen | Noninducer | Inducer | Noninducer
891‐906 | GAALQIPFAMQMAYRF | Antigen | Nontoxin | Nonallergen | Noninducer | Inducer | Noninducer
690‐707 | QSIIAYTMSLGAENSVAY | Antigen | Nontoxin | Allergen | Inducer | Noninducer | Noninducer
902‐917 | MAYRFNGIGVTQNVLY | Antigen | Nontoxin | Nonallergen | Noninducer | Noninducer | Noninducer
162‐179 | SANNCTFEYVSQPFLMDL | Nonantigen | Nontoxin | Nonallergen | Inducer | Inducer | Noninducer
1101‐1115 | HWFVTQRNFYEPQII | Antigen | Nontoxin | Nonallergen | Noninducer | Inducer | Noninducer
647‐664 | AGCLIGAEHVNNSYECDI | Antigen | Toxin | Nonallergen | Inducer | Inducer | Inducer
194‐210 | FKNIDGYFKIYSKHTPI | Antigen | Nontoxin | Nonallergen | Noninducer | Inducer | Inducer
1099‐1113 | GTHWFVTQRNFYEPQ | Nonantigen | Nontoxin | Nonallergen | Noninducer | Inducer | Noninducer
722‐739 | VTTEILPVSMTKTSVDCT | Antigen | Nontoxin | Nonallergen | Noninducer | Inducer | Inducer
729‐745 | VSMTKTSVDCTMYICGD | Nonantigen | Nontoxin | Nonallergen | Noninducer | Noninducer | Noninducer

Note: In the table, the underlined peptides (356‐372, 356‐373, 365‐381, 371‐387, 373‐389, 379‐395, and 418‐434) partially overlapped with the CR3022 epitope published in Science by Yuan et al. The bolded peptides (310‐317, 757‐764, 891‐907, 897‐913, 899‐906, and 1025‐1041) were consistent with the corresponding epitope sequences in the IEDB database. N/A meant undetectable because the peptide length was beyond the range of the analysis system (≤30 amino acids).


Prediction of solvent accessibility, B‐ and T‐cell epitopes, and population coverage rates

The solvent accessibility prediction results of spike protein and the remaining 28 epitopes are shown in Figure S1, and the average solvent accessibility scores of amino acids for the 28 epitopes are shown in Table 3. There were 15 epitopes with an average solvent accessibility score higher than 20, which might be considered as vaccine candidates. The prediction results of B‐ and T‐cell epitopes, and of the HLA class I and class II molecules identified by the 28 epitopes, are shown in Table 3. Except for the 899‐906 epitope, whose amino acid sequence was too short to predict, all the other 27 epitopes were predicted to contain B‐cell epitopes, which might induce the production of neutralizing antibodies. The analysis results also suggested that all 28 epitopes were T‐cell epitopes: 25 could recognize both HLA class I and class II molecules, two could only recognize HLA class I molecules, and one could only recognize HLA class II molecules. However, among the six epitopes we focused on, only 371‐387, 379‐395, and 897‐913 could recognize a certain number of both HLA class I and class II molecules. Epitopes 410‐426, 899‐906, and 1025‐1041 could only recognize either HLA class I or class II molecules. The population coverage rates of HLA class I and class II molecules recognized by the 28 epitopes in different populations around the world are shown in Figure 1B. The highest population coverage rate of each epitope was found in Europe and North America, followed by East Asia and Oceania, and the population coverage rates of all epitopes in African populations were lower than in other populations. Among the 28 epitopes, 19 epitopes had a world population coverage rate of more than 50%. They were 15‐44, 194‐210, 195‐226, 291‐325, 307‐323, 371‐387, 410‐426, 525‐566, 525‐574, 683‐699, 690‐706, 722‐739, 747‐763, 749‐771, 754‐770, 891‐906, 897‐913, 1129‐1145, and 1213‐1229.
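The comparison of the six focus epitopes can be made concrete with a small sketch. The average solvent accessibility (SOA) scores and HLA class assignments below are transcribed from Table 3; the dictionary layout and variable names are hypothetical, and the threshold of 20 is the one stated in the text.

```python
# (average SOA score, has HLA class I hits, has HLA class II hits), from Table 3.
FOCUS = {
    "371-387": (24.57, True, True),
    "379-395": (15.48, True, True),
    "897-913": (12.19, True, True),
    "410-426": (8.61, True, False),
    "899-906": (4.06, True, False),
    "1025-1041": (12.33, False, True),
}

SOA_THRESHOLD = 20  # epitopes above this average score are treated as surface-exposed

exposed = [pos for pos, (soa, _, _) in FOCUS.items() if soa > SOA_THRESHOLD]
both_classes = [pos for pos, (_, c1, c2) in FOCUS.items() if c1 and c2]

print("Average SOA > 20:", exposed)
print("HLA class I and II:", both_classes)
```

Under these values, only 371‐387 clears the solvent accessibility threshold, while 371‐387, 379‐395, and 897‐913 are the focus epitopes with both class I and class II hits.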
Table 3

Prediction of B cell and T cell epitopes of 28 epitopes in SARS‐CoV‐2 spike protein

Position in spike protein | Amino acid sequence in spike protein | Average SOA score | B cell epitope prediction (ABCpred) | B cell epitope prediction (IEDB) | T cell epitope prediction (HLA class I molecule) | T cell epitope prediction (HLA class II molecule)
1025‐1041 | AATKMSECVLGQSKRVD | 12.33 | 2‐17 | 7‐14 | N/A | HLA‐DRB1 0101
899‐906 | AMQMAYRF | 4.06 | N/A | 0 | HLA‐A2402 | N/A
749‐771 | CSNLLLQYGSFCTQLNRALTGIA | 22.79 | 2‐17 | 17 | HLA‐A0301, HLA‐A2402, HLA‐B0801, HLA‐B1501, HLA‐B2705, HLA‐B3901 | HLA‐DRB1 0101, HLA‐DRB1 0301, HLA‐DRB1 0401, HLA‐DRB1 0701, HLA‐DRB1 1501
525‐566 | CGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFG | 25.59 | 1‐37 | 5‐39 | HLA‐A0101, HLA‐A0301, HLA‐A1101, HLA‐A2601, HLA‐B0702, HLA‐B0801, HLA‐B1501, HLA‐B3901, HLA‐B3902, HLA‐B5101 | HLA‐DRB1 0101, HLA‐DRB1 0301, HLA‐DRB1 0401, HLA‐DRB1 0404, HLA‐DRB1 0701, HLA‐DRB1 1501
525‐574 | CGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTD | 24.71 | 2‐21, 24‐43 | 7‐46 | HLA‐A0101, HLA‐A0301, HLA‐A1101, HLA‐A2601, HLA‐B0702, HLA‐B0801, HLA‐B1501, HLA‐B3901, HLA‐B3902, HLA‐B5101 | HLA‐DRB1 0101, HLA‐DRB1 0301, HLA‐DRB1 0401, HLA‐DRB1 0404, HLA‐DRB1 0701, HLA‐DRB1 1101, HLA‐DRB1 1501
291‐325 | CALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTES | 32.24 | 2‐28 | 18‐32 | HLA‐A0301, HLA‐A1101, HLA‐A2402, HLA‐A2601, HLA‐B5801 | HLA‐DRB1 0101, HLA‐DRB1 0401, HLA‐DRB1 0701, HLA‐DRB1 1101, HLA‐DRB1 1501
15‐44 | CVNLTTRTQLPPAYTNSFTRGVYYPDKVFR | 31.38 | 5‐24 | 5‐26 | HLA‐A0101, HLA‐A0201, HLA‐A0301, HLA‐A2601, HLA‐B0702, HLA‐B1501, HLA‐B1516, HLA‐B2705 | HLA‐DRB1 0101, HLA‐DRB1 0401, HLA‐DRB1 0701
379‐395 | CYGVSPTKLNDLCFTNV | 15.48 | 1‐16 | 6‐13 | HLA‐A2402 | HLA‐DRB1 0301, HLA‐DRB1 0701
307‐323 | TVEKGIYQTSNFRVQPT | 36.14 | 1‐16 | 6‐14 | HLA‐A0301, HLA‐A2402, HLA‐B5801 | HLA‐DRB1 0701, HLA‐DRB1 1101, HLA‐DRB1 1501
410‐426 | IAPGQTGKIADYNYKLP | 8.61 | 2‐17 | 6‐13 | HLA‐A0201, HLA‐B0702, HLA‐B1501, HLA‐B5101 | N/A
195‐226 | KNIDGYFKIYSKHTPINLVRDLPQGFSALEPL | 30.24 | 3‐22 | 8‐28 | HLA‐A0101, HLA‐A0201, HLA‐A2601, HLA‐B0702, HLA‐B0801, HLA‐B1501, HLA‐B3501, HLA‐B3901, HLA‐B5802 | HLA‐DRB1 0101, HLA‐DRB1 0401, HLA‐DRB1 0404, HLA‐DRB1 0701, HLA‐DRB1 1101, HLA‐DRB1 1501, HLA‐DQA1 0301, HLA‐DQB1 0302
690‐706 | QSIIAYTMSLGAENSVA | 30.17 | 1‐16 | 6‐13 | HLA‐A0201, HLA‐A2601, HLA‐B0702, HLA‐B0801, HLA‐B1501, HLA‐B1516, HLA‐B3901, HLA‐B5801 | HLA‐DRB1 0101, HLA‐DRB1 0401, HLA‐DRB1 0402, HLA‐DRB1 0701, HLA‐DRB1 1101, HLA‐DRB1 1501
533‐549 | LVKNKCVNFNFNGLTGT | 19.67 | 4‐17 | 8‐13 | HLA‐B0801, HLA‐B1501 | HLA‐DRB1 0404, HLA‐DRB1 1501
683‐699 | RARSVASQSIIAYTMSL | N/A | 2‐15 | 6‐13 | HLA‐A0101, HLA‐A0201, HLA‐A2601, HLA‐B0702, HLA‐B0801, HLA‐B1501, HLA‐B1516, HLA‐B2705, HLA‐B3901, HLA‐B4001, HLA‐B5801, HLA‐B5802 | HLA‐DRB1 0101, HLA‐DRB1 0301
754‐770 | LQYGSFCTQLNRALTGI | 25.03 | 2‐15 | 0 | HLA‐A0201, HLA‐A0301, HLA‐A1101, HLA‐B1501, HLA‐A2402, HLA‐B2705, HLA‐B3901 | HLA‐DRB1 0101, HLA‐DRB1 0301, HLA‐DRB1 0401, HLA‐DRB1 0701
902‐918 | MAYRFNGIGVTQNVLYE | 12.31 | 2‐15 | 6‐13 | HLA‐B2705 | HLA‐DRB1 0401, HLA‐DRB1 0402, HLA‐DRB1 0404, HLA‐DRB1 1501
897‐913 | PFAMQMAYRFNGIGVTQ | 12.19 | 2‐17 | 10, 12‐13 | HLA‐A2402, HLA‐A2601, HLA‐B0801, HLA‐B1501, HLA‐B2705, HLA‐B3901, HLA‐B5801 | HLA‐DRB1 0101, HLA‐DRB1 0401, HLA‐DRB1 0402, HLA‐DRB1 1501
1213‐1229 | PWYIWLGFIAGLIAIVM | N/A | 1‐16 | 0 | HLA‐A0201, HLA‐A2402, HLA‐A2601, HLA‐B3901 | HLA‐DRB1 0101, HLA‐DRB1 0301, HLA‐DRB1 0401, HLA‐DRB1 0402, HLA‐DRB1 0701, HLA‐DRB1 1101, HLA‐DRB1 1501
1019‐1035 | RASANLAATKMSECVLG | 14.31 | 2‐17 | 8‐13 | HLA‐A0301, HLA‐A1101, HLA‐B5801 | HLA‐DRB1 0101
530‐544 | STNLVKNKCVNFNFN | 28.43 | 3‐14 | 6‐11 | HLA‐B0801, HLA‐B1501 | HLA‐DRB1 0301
371‐387 | SASFSTFKCYGVSPTKL | 24.57 | 3‐16 | 8‐13 | HLA‐A0101, HLA‐A0301, HLA‐A2402, HLA‐B1501, HLA‐B3901, HLA‐B5801 | HLA‐DRB1 0401, HLA‐DRB1 1501
747‐763 | TECSNLLLQYGSFCTQL | 26.54 | 1‐16 | 7‐13 | HLA‐A0101, HLA‐A2402, HLA‐B0801, HLA‐B1501, HLA‐B3901 | HLA‐DRB1 0101, HLA‐DRB1 1501
1129‐1145 | VIGIVNNTVYDPLQPEL | 37.57 | 1‐16 | 5‐13 | HLA‐A0201, HLA‐A2402 | HLA‐DRB1 0701, HLA‐DRB1 0801
891‐906 | GAALQIPFAMQMAYRF | 9.67 | 1‐14 | 0 | HLA‐A2402, HLA‐A2601, HLA‐B0801, HLA‐B1501, HLA‐B3501, HLA‐B3901, HLA‐B4001, HLA‐B5301, HLA‐B5801 | HLA‐DRB1 0101, HLA‐DRB1 0401, HLA‐DRB1 0404, HLA‐DRB1 0701, HLA‐DRB1 1501
902‐917 | MAYRFNGIGVTQNVLY | 11.38 | 2‐15 | 5‐12 | HLA‐B2705 | HLA‐DRB1 0401, HLA‐DRB1 0402, HLA‐DRB1 0404, HLA‐DRB1 1501
1101‐1115 | HWFVTQRNFYEPQII | 23.55 | 2‐13 | 5‐11 | HLA‐A2402, HLA‐A2601, HLA‐B2705 | HLA‐DRB1 0801
194‐210 | FKNIDGYFKIYSKHTPI | 22.72 | 1‐16 | 9‐13 | HLA‐A0101, HLA‐A0201, HLA‐B0702, HLA‐B0801, HLA‐B3901 | HLA‐DRB1 1501, HLA‐DQA1 0301, HLA‐DQB1 0302
722‐739 | VTTEILPVSMTKTSVDCT | 13.03 | 1‐16 | 9‐14 | HLA‐A0101, HLA‐A0301, HLA‐B1516, HLA‐B3901 | HLA‐DRB1 0101, HLA‐DRB1 0701

Note: In the table, the underlined epitopes (371‐387 and 379‐395) overlapped with the CR3022 epitope published in Science by Yuan et al. The bolded epitopes (897‐913, 899‐906, and 1025‐1041) were consistent with the corresponding epitope sequences in the IEDB database. The prediction results of T‐cell epitope and MHC binding combined the results of three analysis tools: NetMHC 4.0 Server, Rankpep, and SYFPEITHI. N/A meant undetectable. HLA, human lymphocyte antigen; SOA, solvent accessibility.

Prediction of B cell and T cell epitopes of 28 epitopes in SARS‐CoV‐2 spike protein

Screening vaccine candidate epitopes

Based on the prediction results, epitopes with an average solvent accessibility score above 20 or a world population coverage rate above 50% were selected from the 28 epitopes, giving 21 epitopes. Eight of these 21 (15‐44, 195‐226, 371‐387, 525‐574, 683‐699, 690‐706, 749‐771, and 754‐770) contained easily mutated sites. Considering the importance of the 371‐387 epitope and the relatively few mutations in the 749‐771 and 754‐770 epitopes, these three were retained and the other five were excluded. This left 16 epitopes for vaccine preparation: 194‐210, 291‐325, 307‐323, 371‐387, 410‐426, 525‐566, 530‐544, 722‐739, 747‐763, 749‐771, 754‐770, 891‐906, 897‐913, 1101‐1115, 1129‐1145, and 1213‐1229. The 16 epitopes were relatively conserved, immunogenic, nontoxic, and nonallergenic, could induce the secretion of cytokines, and were more likely to be exposed on the surface of the spike protein. They were both B‐ and T‐cell epitopes, could recognize a certain number of HLA molecules, and had world population coverage rates of more than 50%.
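The screening logic in this paragraph can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function names are hypothetical, and the list of 21 screened epitopes is inferred from the text as the 16 retained epitopes plus the five that were dropped for containing easily mutated sites.

```python
# Thresholds quoted in the text: mean solvent accessibility (SOA) above 20
# OR world population coverage above 50 %.
SOA_MIN = 20.0
COVERAGE_MIN = 50.0


def passes_screen(mean_soa, world_coverage_pct):
    """Return True if an epitope meets either criterion.

    mean_soa may be None when the SOA score is undetectable (N/A).
    """
    soa_ok = mean_soa is not None and mean_soa > SOA_MIN
    return soa_ok or world_coverage_pct > COVERAGE_MIN


# The 21 epitopes that passed the screen (inferred from the text:
# the 16 finally kept plus the five that were excluded).
PASSED_SCREEN = [
    "15-44", "194-210", "195-226", "291-325", "307-323", "371-387",
    "410-426", "525-566", "525-574", "530-544", "683-699", "690-706",
    "722-739", "747-763", "749-771", "754-770", "891-906", "897-913",
    "1101-1115", "1129-1145", "1213-1229",
]

# Epitopes reported to contain easily mutated sites, and the three of
# them that were nevertheless retained.
MUTATION_PRONE = {
    "15-44", "195-226", "371-387", "525-574",
    "683-699", "690-706", "749-771", "754-770",
}
RETAINED = {"371-387", "749-771", "754-770"}


def apply_mutation_filter(passed, mutation_prone, retained):
    """Drop mutation-prone epitopes unless they are explicitly retained."""
    dropped = set(mutation_prone) - set(retained)
    return [epitope for epitope in passed if epitope not in dropped]


final_epitopes = apply_mutation_filter(PASSED_SCREEN, MUTATION_PRONE, RETAINED)
```

With these inputs the filter removes five epitopes, so 21 − 8 + 3 = 16 remain, matching the count given in the text.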

Vaccine candidate sequences acquisition and general analysis

The 16 candidate epitopes were merged into 11 peptides, which were connected with different linkers to obtain the vaccine candidate sequences. The schematic diagram of the tandem sequence of the 11 peptides is shown in Figure 2A. The candidate sequences were analyzed with PredictProtein, DNAStar, and the Expasy ProtParam tool to obtain their secondary structure and surface properties. The sequences with no linker and with linkers GGGGS, GGGSGGG, EAAAK, GPGPG, AAY, and KK contained 243, 293, 313, 293, 293, 273, and 263 amino acids, respectively. Their molecular weights were 27046.39 Da, 30199.25 Da, 31340.29 Da, 31751.62 Da, 30700.29 Da, 30099.73 Da, and 29609.79 Da, respectively. Their isoelectric points were 8.84, 8.84, 8.84, 8.76, 8.84, 8.78, and 9.85, respectively. As the N‐terminal amino acid of every sequence was phenylalanine (F), their estimated half‐lives were the same: 1.1 h (mammalian reticulocytes, in vitro), 3 min (yeast, in vivo), and 3 min (Escherichia coli, in vivo). Therefore, adding a methionine (M) at the N‐terminus of each sequence was considered to extend the half‐life of the protein. Moreover, the instability indices of the seven sequences were 27.39, 40.80, 38.52, 27.54, 23.62, 25.33, and 22.02, respectively. Because ProtParam classifies a protein with an instability index above 40 as unstable, the protein with linker GGGGS was classified as unstable and the other proteins as stable.
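The reported sequence lengths and stability classes can be checked with simple arithmetic. The sketch below assumes one linker copy at each of the 10 junctions between the 11 peptides, which is consistent with the reported lengths but is not stated explicitly in the text. It also uses the usual ProtParam convention that an instability index above 40 indicates a possibly unstable protein. The function names are hypothetical.

```python
BASE_LENGTH = 243           # residues in the no-linker construct (from the text)
N_PEPTIDES = 11             # peptides joined in tandem (from the text)
INSTABILITY_CUTOFF = 40.0   # ProtParam convention: above 40 is classed unstable

# Linkers and instability indices in the order reported in the text;
# the empty string stands for the construct without a linker.
LINKERS = ["", "GGGGS", "GGGSGGG", "EAAAK", "GPGPG", "AAY", "KK"]
INSTABILITY = [27.39, 40.80, 38.52, 27.54, 23.62, 25.33, 22.02]


def construct_length(linker, base_length=BASE_LENGTH, n_peptides=N_PEPTIDES):
    """Length of the tandem construct, assuming one linker per junction."""
    return base_length + (n_peptides - 1) * len(linker)


def is_stable(instability_index):
    """Apply the instability-index cutoff."""
    return instability_index < INSTABILITY_CUTOFF


lengths = {linker or "none": construct_length(linker) for linker in LINKERS}
unstable = [
    linker or "none"
    for linker, index in zip(LINKERS, INSTABILITY)
    if not is_stable(index)
]
```

Under this assumption, a 5‐residue linker adds 10 × 5 = 50 residues (243 + 50 = 293), and only the GGGGS construct (40.80) exceeds the cutoff.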
Figure 2

Schematic diagram of the tandem sequence of 11 vaccine candidate peptides and sequence analysis after connecting with different linkers. (A) Schematic diagram of the tandem sequence of the 11 vaccine candidate peptides. Peptide 897‐913 was predicted to have possible cross‐immunoprotection for SARS‐CoV and SARS‐CoV‐2. Peptides 371‐387 and 747‐771 contained easily mutated sites. (B) Amino acid composition, secondary structure composition, and solvent accessibility analysis of the different vaccine candidate sequences.

The amino acid composition, secondary structure composition, and solvent accessibility of the candidate sequences are shown in Figure 2B. The addition of different linkers changed the secondary structure composition of the proteins; in particular, the GPGPG and AAY linkers increased the proportion of loop structure. Loops are irregular structures that connect two secondary structure elements in proteins, and they often play important roles in function, including enzyme reactions and ligand binding. The addition of linkers also changed the solvent accessibility of the proteins: all six linkers increased solvent accessibility and exposed more amino acids on the protein surface. The flexibility and antigenic index results of the sequences are shown in Figure 3. Compared with the no‐linker sequence, every linker except AAY increased the flexibility and antigenic index of the sequence.
Figure 3

Analysis of flexibility and antigenic index of different vaccine candidate sequences


Functional analysis of vaccine candidate sequences

The predicted gene ontology terms of the sequences are shown in supplementary material Figure S2. Different linkers changed the number of molecular function and cellular component ontology terms of the protein sequences; the specific molecular function ontology predictions for the sequences are shown in supplementary material Figures S3‐S9. However, the different linkers had almost no effect on the biological process ontology. Interestingly, the sequences with linkers GGGGS and GPGPG had more molecular function or cellular component ontology terms than the other sequences, indicating that the GGGGS or GPGPG linker might increase some biological activities of the protein. However, the protein stability prediction above showed that the protein with the GGGGS linker was unstable, so the GGGGS linker was not a good choice.

Comprehensive selection of vaccine candidate sequences

Based on the above analysis, the protein sequence with linker GGGGS was predicted to be unstable and the sequence with linker AAY was predicted to reduce the flexibility and antigenic index of the protein. Therefore, considering the secondary structure, flexibility, antigenic index, solvent accessibility, stability, and function prediction results of the sequences, we finally selected five sequences (without linker, with linker GGGSGGG, EAAAK, GPGPG, and KK, respectively) as the SARS‐CoV‐2 vaccine candidate sequences. These vaccine candidate sequences contained T‐ and B‐cell epitopes exposed on the surface of spike protein and the HLA molecules recognized by the epitopes had high population coverage rates. These sequences were predicted to be stable, with high antigenic index and good biological activity, especially the sequence linked by GPGPG.
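The two exclusion rules in this paragraph can be written as a small filter. The sketch below is illustrative only: the `Candidate` record and its field names are hypothetical, and the flags simply encode what the text reports for each construct. It does not reproduce the authors' fuller weighing of secondary structure, solvent accessibility, and function predictions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    linker: str                   # "none" marks the construct without a linker
    predicted_stable: bool        # from the reported instability index
    lowers_flex_antigenic: bool   # relative to the no-linker construct


# Flags taken from the text: GGGGS was predicted unstable, and AAY was
# predicted to reduce flexibility and antigenic index.
CANDIDATES = [
    Candidate("none", True, False),
    Candidate("GGGGS", False, False),
    Candidate("GGGSGGG", True, False),
    Candidate("EAAAK", True, False),
    Candidate("GPGPG", True, False),
    Candidate("AAY", True, True),
    Candidate("KK", True, False),
]


def select_candidates(candidates):
    """Keep constructs that are stable and do not lower flexibility/antigenicity."""
    return [
        c.linker
        for c in candidates
        if c.predicted_stable and not c.lowers_flex_antigenic
    ]


selected = select_candidates(CANDIDATES)
```

Applied to the seven constructs, the filter leaves the five sequences named in the text.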

DISCUSSION

At present, scientists all over the world are stepping up research into COVID‐19 vaccines. According to the Draft landscape of COVID‐19 candidate vaccines (7 July 2020) published by the World Health Organization (WHO), 21 COVID‐19 candidate vaccines had been approved for clinical trials. These included five RNA vaccines, four inactivated vaccines, four DNA vaccines, four protein subunit vaccines, three viral vector vaccines, and one plant‐derived virus‐like particle vaccine. The vaccine in this study belongs to the class of epitope and peptide vaccines, increasing the diversity of COVID‐19 vaccine types. Previous studies on vaccines for other infectious diseases showed that different types of vaccines had their own limitations. In most cases, the immune effects of combining different types of vaccines were stronger than those of a single vaccine alone, and this had also been observed in SARS‐CoV vaccine research. Therefore, we recommend that future research and development of COVID‐19 vaccines consider the diversity of vaccine types, weigh the advantages and disadvantages of different types of vaccines, use different vaccines for immunization, and investigate heterologous prime‐boost vaccination.

Vaccine design is a complex issue with many factors to consider, the most important of which are the safety and effectiveness of the vaccine. When screening candidate epitopes in our study, nonsynonymous mutation sites in the sequence were considered to avoid candidate epitopes containing easily mutated sites that could affect antigen recognition. The toxicity and allergenicity of the epitopes were considered to ensure their safety. The immunogenicity of the antigens, the secretion of cytokines, the solvent accessibility of amino acids, and the recognition of MHC molecules were considered to ensure the effectiveness of the epitopes. The coverage of the epitopes in different populations was also considered to ensure their effectiveness in most populations. Moreover, when expressing the fusion protein, choosing an appropriate linker is very important for the design of the vaccine candidate sequence, because different linkers affect the correct folding, stability, biological activity, and immunogenicity of proteins. These predictions need many experiments to verify. However, immunoinformatics tools have greatly improved the efficiency and accuracy of epitope screening and the rationality of vaccine design, and they have been applied in much vaccine research.

In this study, 16 epitopes of the spike protein were predicted to be B‐ and T‐cell epitopes and selected as vaccine candidate epitopes for vaccine design. Among them, epitope 371‐387 partially overlapped with the CR3022 epitope of SARS‐CoV‐2 published in Science by Yuan et al. CR3022 can neutralize SARS‐CoV and is also able to interact with SARS‐CoV‐2. Moreover, epitope 375‐394, which overlaps the 371‐387 epitope in this study, was observed to stimulate robust secretion of IFN‐γ from splenocytes. Epitopes 375‐394, 525‐646, and 902‐926, which had an average positive rate of ≥ 50% (the percentage of convalescent sera from COVID‐19 patients reacting positively to the epitopes) among all 39 patients, contained or overlapped with epitopes 371‐387, 525‐566, and 891‐913 in this study. Ferretti et al. reported the spike protein epitopes recognized by memory CD8+ T cells of patients who had recovered from COVID‐19. Among them, the 378‐386 and 1208‐1216 epitopes partially overlap with the 371‐387 and 1213‐1229 epitopes screened in this study. Therefore, some of the epitopes selected in this study had been confirmed as antigenic epitopes in other studies and showed immune effects. As the multi‐epitope vaccine proposed in this study consists of fewer than 500 amino acids, we will consider connecting another strongly immunogenic peptide and choosing appropriate vaccine vectors (such as adenovirus vectors), drug delivery systems (such as PLGA microspheres or liposomes), and adjuvants (such as TLR adjuvants) to improve the immune effects of the vaccine.

In the previous results, we considered six epitopes important. Epitopes 897‐913, 899‐906, and 1025‐1041 were completely consistent with sequences in the IEDB database. Epitopes 371‐387 and 379‐395 partially overlapped with the CR3022 epitope of SARS‐CoV‐2. Epitopes 371‐387, 379‐395, and 410‐426 lie in the region of the spike protein that binds ACE2. However, only 371‐387, 410‐426, and 897‐913 were finally selected as vaccine candidate epitopes. Epitope 379‐395 contained the easily mutated site 382, had an average solvent accessibility score of only 15.48, and recognized few HLA molecules. Epitope 899‐906 had an average solvent accessibility score of only 4.06 and could not recognize HLA class Ⅱ molecules. Epitope 1025‐1041 also had a low average solvent accessibility score (12.33) and could not recognize HLA class Ⅰ molecules. Therefore, these three epitopes were not selected as vaccine candidates.

Another interesting finding was that, in the population coverage results of the 28 epitopes, the coverage rate of each epitope was high in Europe, North America, East Asia, and Oceania but low in East Africa, West Africa, South Africa, and Central Africa. We attributed this to differences among populations in the HLA molecules that recognize the epitopes. This difference might leave people in Africa less protected by the same vaccine than people in Europe, North America, East Asia, and Oceania. Whether it is necessary to prepare a specific vaccine tailored to the HLA subclasses of the African population remains to be studied.

CONCLUSIONS

According to the results of mutation and immunoinformatics analysis, we recommend 16 epitopes (194‐210, 291‐325, 307‐323, 371‐387, 410‐426, 525‐566, 530‐544, 722‐739, 747‐763, 749‐771, 754‐770, 891‐906, 897‐913, 1101‐1115, 1129‐1145, and 1213‐1229) of the spike protein as SARS‐CoV‐2 vaccine candidate epitopes. In particular, epitope 897‐913 was predicted to have possible cross‐immunoprotection for SARS‐CoV and SARS‐CoV‐2. The vaccine candidate sequences (without linker and with linkers GGGSGGG, EAAAK, GPGPG, and KK, respectively) were predicted to be relatively stable, with a high antigenic index and good biological activity, and we recommend these five sequences as candidate sequences for a SARS‐CoV‐2 vaccine. Our next project is to synthesize the gene sequences for cloning and expression, prepare vaccines for SARS‐CoV‐2, and verify their immune effects. The bioinformatics analysis method in our study can improve the accuracy and effectiveness of vaccine epitope screening and the rationality of vaccine design, and can also be applied to vaccine design for other infectious diseases.

CONFLICT OF INTERESTS

The authors declare that there is no conflict of interest.

AUTHOR CONTRIBUTIONS

Conceptualization: Jinlei He, Jianping Chen, and Jiao Li. Data curation: Jianhui Zhang. Formal analysis: Jinlei He and Fan Huang. Investigation: Qiwei Chen and Dali Chen. Methodology: Jinlei He, Fan Huang, and Jianhui Zhang. Project administration: Zhiwan Zheng and Qi Zhou. Supervision: Jianping Chen and Jiao Li. Writing‐original draft: Jinlei He and Fan Huang. Writing‐review & editing: Jianping Chen and Jiao Li.
