
In-silico analysis of recombinant protein vaccines based on the spike protein of Indonesian SARS-CoV-2 through a reverse vaccinology approach.

Riska A Febrianti1, Erlia Narulita1,2.   

Abstract

Objectives: This study aimed to produce a recombinant protein vaccine candidate based on an epitope of spike protein from Indonesian SARS-CoV-2 to provide immunogenicity and protection against future infection.
Methods: A reverse vaccinology approach was used to identify potential vaccine candidates by screening the pathogen's genome through computational analyses.
Results: Epitope vaccine candidates with the amino acid sequence of FKNHTSPDV were selected. This peptide is hydrophilic, does not induce autoimmune and allergic reactions, is antigenic, is classified as a stable protein, and is predicted to be present in the cell membrane. The selected epitope sequences were inserted into the plasmid vector pcDNA3.1(+) N-GST (thrombin). Inclusion of additional features of the gene encoding glutathione-S transferase, which can increase antigen expression and solubility, and the genes encoding NSP 1-4 proteins, which are essential in replication, added value to the produced recombinant protein.
Conclusion: Recombinant protein vaccine candidates with the FKNHTSPDV epitope have parameters sufficient for production on a laboratory scale for further testing.
© 2022 [The Author/The Authors].


Keywords:  Indonesia; Recombinant protein; Reverse vaccinology; SARS-CoV-2; Spike protein

Year:  2022        PMID: 35250426      PMCID: PMC8881762          DOI: 10.1016/j.jtumed.2022.02.007

Source DB:  PubMed          Journal:  J Taibah Univ Med Sci        ISSN: 1658-3612


Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19), has spread rapidly and led to a global pandemic. Limited preventive measures in the form of vaccination are available. As of August 2021, according to Worldometers data (https://www.worldometers.info/), the total number of COVID-19 cases was 219 million, and the death toll was nearly 4 million worldwide. In Indonesia, the percentage of deaths due to infection with the SARS-CoV-2 virus is higher than that in many other countries, at approximately 9.36%. Measures for treatment and prevention of COVID-19 to date have included the antiviral drug remdesivir, vaccines, and convalescent plasma transfusion, thus providing potential options for treating patients with severe COVID-19. However, the clinical benefits of remdesivir in patients with severe disease are limited. Recent studies have shown that the severity of COVID-19 is associated with an increase rather than a decrease in IgG response, and that convalescent plasma transfusions can be beneficial for patients only when given within 14 days after the onset of the disease rather than later. The key to fighting this pandemic is understanding the viral receptor recognition mechanisms that regulate infectivity, pathogenesis, and host range. SARS-CoV-2 and SARS-CoV recognize the same receptor, ACE2, in humans. The SARS-CoV-2 spike glycoprotein determines viral binding and invasion of target cells via the ACE2 receptor. Many studies have used this glycoprotein to develop a vaccine for COVID-19, because of its crucial role and high surface exposure. Therefore, the S protein interaction region was selected for inclusion in the structure of the designed vaccine. This region has a specific and highly conserved glycosylation pattern, which makes it a suitable antibody target. Neutralizing antibodies binding this site on the spike protein block viral binding and entry into host cells.
In contrast, the roles of T cells in the immune response to coronavirus infection are much more important than those of B cells. More than 70% of the T cell immune response targets the structural proteins of the coronavirus. T cells are stimulated by antigen-presenting cells (APCs). When APCs phagocytose viruses, only structural proteins from the coronavirus undergo processing via the major histocompatibility complex (MHC), because of the lack of non-structural proteins (NSPs) in viral particles. In the past, vaccine development relied heavily on immunological trials, in an expensive and time-consuming process. However, recent advances in immunological bioinformatics have provided a viable tool to significantly decrease the time and costs involved, and the risk of errors, when used in laboratory settings for vaccine development. This tool can decrease the costs of discovery and simulation in testing the antigenicity of vaccine candidates in early research stages. One development associated with this framework is the production of recombinant protein vaccines through a reverse vaccinology approach to identify recombinant vaccine epitope candidates. Recombinant protein vaccines stimulate the production of antibodies that interact with antigen proteins or virus particles. Recombinant protein vaccines do not replicate and do not have an infectious component of viral particles. These vaccines are thus considered to be safer than live virus vaccines. This technology has been tested extensively and found to elicit only very mild adverse effects.

Materials and Methods

Identification and download of SARS-CoV-2 Indonesian isolate sample sequences

The complete genome sequences of SARS-CoV-2 Indonesian isolates, amounting to 29 sequences representing the mutations in various provinces in Indonesia, were retrieved on the date on which this project started, March 5, 2021, from the GISAID website (https://www.gisaid.org/) without filtering coverage and with download of coding nucleotide sequences. SARS-CoV-2 wild-type spike protein originating from Wuhan (reference No. NC_045512.2) was used as a reference sequence for mapping the region of the wild-type spike protein-encoding gene in the complete genome sequence of Indonesian SARS-CoV-2, which had not been annotated at that time.

Alignment of Indonesian SARS-CoV-2 nucleotide sequences with the Wuhan reference sequence (wild type)

A total of 29 nucleotide sequences of the complete genome of Indonesian SARS-CoV-2 with spike protein were aligned with the wild type Wuhan sequence via the online tool Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/). The alignment was aimed at determining the positions of Indonesian SARS-CoV-2 spike protein nucleotides with reference to the Wuhan spike protein nucleotide sequence taken from the annotated complete genome.

Translation of Indonesian SARS-CoV-2 spike protein nucleotide sequences

The nucleotide sequences encoding the spike protein from each of the Indonesian SARS-CoV-2 sequences obtained in the previous stage were translated. The translation was aimed at obtaining the amino acid spike sequences with respect to the annotated Wuhan wild-type spike protein amino acid reference sequence, by using the online tool ExPASy (https://web.expasy.org/translate/).

Determination of the conservation of Indonesian SARS-CoV-2 spike protein sequences

The spike amino acid sequences of all samples were aligned for conservation analysis with the Antigenic Variability Analyzer (AVANA) tool. The threshold was set to 95%. This step was aimed at determining conserved and non-conserved regions. In the determination of B cell and T cell epitopes, only peptide sequences in conserved sites were considered as vaccine candidates.
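The idea of a per-position conservation threshold can be sketched as follows. This is a minimal illustration, not AVANA's actual algorithm (which is not described here): it assumes that a column of the alignment counts as conserved when its most frequent residue reaches the 95% threshold. The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def conserved_positions(aligned_seqs, threshold=0.95):
    """Flag each alignment column as conserved (True) or not (False).

    Assumption: a column is conserved when the share of its most
    frequent residue is at least `threshold`.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    flags = []
    for column in zip(*aligned_seqs):
        top_count = Counter(column).most_common(1)[0][1]
        flags.append(top_count / len(column) >= threshold)
    return flags


# Toy alignment (hypothetical data): 18 identical sequences plus 2 variants
# that differ at the fourth position, so that column is 90% identical.
toy_alignment = ["MFVFL"] * 18 + ["MFVAL"] * 2
print(conserved_positions(toy_alignment))
```

Under this assumed definition, an alignment of 29 sequences tolerates one deviating residue per column (28/29 is about 96.6%) but not two (27/29 is about 93.1%).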

Consensus sequence construction for Indonesian SARS-CoV-2 spike protein

Consensus sequence construction was performed from amino acid sequence alignment of the Indonesian SARS-CoV-2 spike protein with the online tool EMBOSS (https://www.ebi.ac.uk/Tools/msa/embosscons/).

Determination of B cell epitopes

The consensus sequence of the Indonesian SARS-CoV-2 spike protein was mapped against B cells through prediction of the linear epitopes of B cells by using the Bepipred 2.0 parameter in IEDB (http://tools.iedb.org/bcell/). Bepipred 2.0 is an immunoinformatics tool for predicting B-cell epitopes from antigen sequences. This step was aimed at determining regions that B cells can potentially recognize.

Determination of the epitopes of CD4 T cells and CD8 T cells

CD8 T cell epitopes were determined by mapping B cell epitope sequences of at least 9-mer length against human leukocyte antigen (HLA) class I, by using the online tool netCTLpan 1.1 (http://www.cbs.dtu.dk/services/NetCTLpan/). Epitope sequences recognized by CD8 T cells were then mapped against HLA class II to predict which epitopes are also identified by CD4 T cells, by using the online tool netMHCII 2.3 (http://www.cbs.dtu.dk/services/NetMHCII/). A peptide length setting of 9-mer was used for each web tool, and other settings followed the defaults. This study mapped peptide sequences against 56 class I HLA alleles and 22 class II HLA alleles found in the Indonesian population.
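With a peptide length setting of 9-mer, each B cell epitope is effectively scanned as a series of overlapping 9-residue windows. The web tools do this internally; the sketch below only illustrates the windowing, using the 40-residue B cell epitope at positions 1133 to 1172 from Table 3. The helper name `nonamers` is hypothetical.

```python
def nonamers(peptide, k=9):
    """Return every overlapping window of length k in the peptide."""
    return [peptide[i:i + k] for i in range(len(peptide) - k + 1)]


# B cell epitope no. 31 from Table 3 (positions 1133-1172, 40 residues).
epitope_31 = "VNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGI"

windows = nonamers(epitope_31)
print(len(windows))            # 40 - 9 + 1 windows
print("FKNHTSPDV" in windows)  # the epitope finally selected in this study
```

A peptide shorter than nine residues yields no windows, which is why only B cell epitopes of at least 9-mer length proceed to HLA mapping.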

Similarity analysis to human peptides

The detected epitope candidates with the potential to be recognized by B cells, CD8 T cells, and CD4 T cells were then tested for suitability through comparison against human non-redundant protein sequences [taxid: 9606] by using NCBI Blastp (http://blast.ncbi.nlm.nih.gov/Blast.cgi). The NCBI Blastp parameters were as follows: 1) 30,000 expectation value; 2) PAM30 matrix; 3) disabled low complexity filter; 4) composition-based statistics category set to “no adjustment”; 5) cutoff from 10e-4 to 10e-3. Nonameric sequences with a homologous identity to human self-peptides of seven-ninths or more, without gaps, were eliminated from the vaccine epitope candidates.
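The 7/9 elimination rule can be illustrated with a brute-force ungapped comparison. This is a simplified stand-in for the Blastp search, not a reimplementation of it: it slides the 9-mer along each protein and counts identical positions. The function names and the toy "human" protein are hypothetical.

```python
def max_ungapped_identity(epitope, protein):
    """Highest number of identical positions between the epitope and any
    same-length, ungapped window of the protein."""
    k = len(epitope)
    best = 0
    for i in range(len(protein) - k + 1):
        window = protein[i:i + k]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches)
    return best


def is_self_like(epitope, human_proteins, min_identical=7):
    """True when any protein contains a window with >= min_identical matches."""
    return any(
        max_ungapped_identity(epitope, protein) >= min_identical
        for protein in human_proteins
    )


# Hypothetical toy protein sharing 7 of 9 positions with the query 9-mer.
toy_protein = "AAAFKNHTSAAVAAA"
print(max_ungapped_identity("FKNHTSPDV", toy_protein))
print(is_self_like("FKNHTSPDV", [toy_protein]))
```

A real screen needs the full human proteome and a proper search tool; the sketch only makes the counting rule explicit.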

Analysis of hydrophobicity properties

Physicochemical analysis was performed by determining the solubility of the epitope according to the hydrophobicity value by using the ExPASy online tool (https://web.expasy.org/protparam/). This program predicts the hydrophobicity of short amino acid sequences (above six amino acids), isoelectric point, and molecular protein weight. It also uses a grand average of hydropathicity (GRAVY) score distribution, showing the value for each type of amino acid, with a range of −4.5 for arginine and +4.5 for isoleucine.
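The GRAVY score is the mean of the per-residue Kyte-Doolittle hydropathy values, so it can be checked by hand. The sketch below uses the standard Kyte-Doolittle scale, which is the scale with −4.5 for arginine and +4.5 for isoleucine mentioned above; the function name `gravy` is illustrative.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Grand average of hydropathicity: sum of residue values / length."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


for peptide in ("FKNHTSPDV", "VNLTTRTQL", "FNATRFASV"):
    score = gravy(peptide)
    label = "hydrophobic" if score > 0 else "hydrophilic"
    print(peptide, round(score, 3), label)
```

For FKNHTSPDV the residue values sum to −10.2, which divided by 9 gives about −1.133, matching the value reported for the selected epitope.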

Novelty analysis of the candidate vaccine epitope

The epitope recency test was performed by determining the presence of the epitopes in the epitope database in the IEDB Analysis Resources (https://www.iedb.org/).

Analysis of antigenicity and allergenicity, and determination of membrane topology for the vaccine epitope candidates

Antigenicity testing of epitope vaccine candidates was conducted with the online tool VaxiJen Server 2.0 (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html), with the threshold value set to 0.4 to improve the prediction accuracy and with virus selected as the target organism. The allergenicity analysis of the epitope candidates was conducted with the online tool AllerTOP v2.0, which has a predictive accuracy of 88.7%. Prediction of the selected candidate epitope's transmembrane topology was performed with TMHMM v.2.0 (http://www.cbs.dtu.dk/services/TMHMM/). The output of this tool was in the form of a probability graph of the possible epitope presentation locations and a statement indicating one of three location choices: inside, outside, or transmembrane.

Reverse translation of selected epitope sequences and codon optimization

Selected epitope sequences were back-translated to obtain nucleotide sequences in EMBOSS BackTranseq (https://www.ebi.ac.uk/Tools/st/embossbacktranseq/). The results of this back translation were then optimized with the NovoPro Labs online tool to obtain a nucleotide sequence with a codon adaptation index >0.8.
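A back-translated or codon-optimized sequence should still encode the same peptide, and its GC content follows directly from the nucleotide counts. The sketch below checks both for the two sequences reported in Table 8, using the standard genetic code. The codon adaptation index is not computed, because it depends on the reference codon usage table of the optimization tool. Function and variable names are illustrative.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence codon by codon (length must be a multiple of 3)."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def gc_content(dna):
    """Percentage of G and C nucleotides in the sequence."""
    return 100 * sum(base in "GC" for base in dna) / len(dna)


# Sequences as reported in Table 8 (spaces removed).
BACK_TRANSLATED = "TTCAAGAACCACACCAGCCCCGACGTG"
OPTIMIZED = "TTCAAAAACCACACTTCTCCGGACGTA"

print(translate(BACK_TRANSLATED), translate(OPTIMIZED))
print(round(gc_content(OPTIMIZED), 2))
```

Both sequences translate to FKNHTSPDV, and the optimized sequence contains 12 G or C bases out of 27, which gives the 44.44% GC content listed in Table 8.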

Construction of recombinant plasmid vectors

The plasmid selected as a vector was pcDNA3.1(+) N-GST (thrombin). The plasmid insert was designed in SnapGene software. The designed sequence was inserted via the Eco32I restriction site of the plasmid.

Results

Identification of SARS-CoV-2 sequence samples and mapping of spike protein-encoding genes

The accession codes of the complete genome sequences of Indonesian SARS-CoV-2 used in this study are listed in Table 1. The complete genome sequences were mapped by multiple sequence alignment against the Wuhan wild-type spike protein-coding nucleotide reference sequence (reference No. NC_045512.2) to determine the gene sequences encoding the spike protein. These sequences were used as the source of epitope candidates in this study.
Table 1

List of Indonesia SARS-CoV-2 complete genome sequence access codes downloaded from GISAID.

Number | Province | Accession number
1 | East Java/Sidoarjo | EPI_ISL_956315
2 | Banten/Tangerang | EPI_ISL_947327
3 | Jakarta | EPI_ISL_953427
4 | West Java | EPI_ISL_747241
5 | Central Java | EPI_ISL_791988
6 | Special Region of Yogyakarta | EPI_ISL_911709
7 | Aceh | EPI_ISL_791981
8 | Bangka Belitung Islands | EPI_ISL_747237
9 | North Sumatra | EPI_ISL_756401
10 | Lampung | EPI_ISL_791978
11 | Riau Islands | EPI_ISL_791985
12 | South Sumatra/Palembang | EPI_ISL_833039
13 | West Sumatra | EPI_ISL_910014
14 | Bengkulu | EPI_ISL_791979
15 | Bali | EPI_ISL_775596
16 | East Nusa Tenggara/Kupang | EPI_ISL_766048
17 | West Nusa Tenggara | EPI_ISL_775598
18 | South Kalimantan | EPI_ISL_753699
19 | Central Kalimantan | EPI_ISL_538502
20 | East Kalimantan | EPI_ISL_791983
21 | North Kalimantan | EPI_ISL_803876
22 | West Kalimantan | EPI_ISL_911750
23 | North Sulawesi/Manado | EPI_ISL_574623
24 | South Sulawesi/Makassar | EPI_ISL_833502
25 | North Maluku | EPI_ISL_791986
26 | West Papua | EPI_ISL_775597
27 | Papua/Timika | EPI_ISL_574603
28 | Jakarta | EPI_ISL_1118931
29 | Jakarta | EPI_ISL_1118933

Translation of the nucleotide sequences encoding Indonesian SARS-CoV-2 spike protein, analysis of sequence conservation, and construction of the amino acid consensus sequence of spike protein

Nucleotide sequences identified as spike protein-encoding genes from the results of alignment to the reference sequences were translated for each Indonesian SARS-CoV-2 sample to obtain amino acid sequences of spike proteins, on the basis of the reference reading frame of the spike protein-coding sequences from the annotated complete genome of Wuhan SARS-CoV-2 wild type. Sequence conservation analysis was conducted by using the alignment results of the amino acid spike sequences from all Indonesian SARS-CoV-2 samples. The analysis identified conserved and non-conserved regions of the sequence (Figure 1). Conserved areas of spike protein sequences are indicated in red on the chart, whereas non-conserved areas are in white. Amino acid residues in non-conserved regions were located at positions 74, 149, 249, 398, 513, 583, 51, 700–725, 775–795, 813, 838, 924, 1126, and 1298.
Figure 1

Determination of conserved regions of the spike protein sequence of Indonesian SARS-CoV-2. Conserved regions of spike protein sequences are marked in red, while unconserved regions are in white. Amino acid residues belonging to unconserved regions are at positions 74, 149, 249, 398, 513, 583, 51, 700-725, 775-795, 813, 838, 924, 1126, and 1298.

Epitope sequences in non-conserved regions were eliminated during the determination of B cell and T cell epitopes. The consensus sequence from Indonesian SARS-CoV-2 spike protein amino acid sequences was used in the epitope determination stage. The consensus sequence of spike protein amino acids is shown in Table 2.
Table 2

Indonesia SARS-CoV-2 spike protein consensus sequence.

>Consensus
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKxGCCSCGSCCKFDEDDSEPVLKGVKLHYT

Determination of the epitopes of B cells, CD4 T cells, and CD8 T cells

The full-length spike protein consensus sequence based on 29 Indonesian SARS-CoV-2 sequences was mapped against B cell epitopes to determine the locations of linear epitopes recognized by B cells, on the basis of the annotated epitope database and Bepipred 2.0 parameters in the IEDB webserver. In this stage, a total of 34 peptide sequences recognized by B cells were obtained, with lengths varying from 1-mer to 62-mer (Table 3). According to the conservation analysis results, eight of these peptide sequences contained amino acid residues in non-conserved regions and were excluded. In addition, a peptide sequence of at least 9 amino acids was required to proceed to the mapping stage against HLA class I and class II. Only 18 B-cell epitope peptide sequences, with lengths varying from 9-mer to 62-mer and located in conserved regions, proceeded to mapping against the HLA class I and class II alleles found in the Indonesian population. Mapping of B cell epitope sequences against HLA class I and class II was performed to determine the epitopes recognized by CD4 T cells and CD8 T cells. The list of HLA class I and class II alleles used is provided in Table 4.
Table 3

Prediction of B cell epitopes and analysis of their conservation.

No. | Start | End | Peptide | Length | Conservancy analysis (based on AVANA)
1 | 13 | 37 | SQCVNLTTRTQLPPAYTNSFTRGVY | 25 | Conserved region
2 | 59 | 81 | FSNVTWFHAIHVSGTNGTKRFDN | 23 | Non-conserved region
3 | 97 | 98 | KS | 2 | Conserved region
4 | 138 | 154 | DPFLGVYYHKNNKSWME | 17 | Non-conserved region
5 | 177 | 189 | MDLEGKQGNFKNL | 13 | Conserved region
6 | 206 | 221 | KHTPINLVRDLPQGFS | 16 | Conserved region
7 | 250 | 260 | TPGDSSSGWTA | 11 | Conserved region
8 | 293 | 296 | LDPL | 4 | Conserved region
9 | 304 | 322 | KSFTVEKGIYQTSNFRVQP | 19 | Conserved region
10 | 329 | 363 | FPNITNLCPFGEVFNATRFASVYAWNRKRISNCVA | 35 | Conserved region
11 | 369 | 393 | YNSASFSTFKCYGVSPTKLNDLCFT | 25 | Conserved region
12 | 404 | 426 | GDEVRQIAPGQTGKIADYNYKLP | 23 | Conserved region
13 | 440 | 501 | NLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTN | 62 | Conserved region
14 | 516 | 536 | ELLHAPATVCGPKKSTNLVKN | 21 | Conserved region
15 | 555 | 562 | SNKKFLPF | 8 | Conserved region
16 | 580 | 583 | QTLE | 4 | Non-conserved region
17 | 602 | 606 | TNTSN | 5 | Conserved region
18 | 617 | 632 | CTEVPVAIHADQLTPT | 16 | Conserved region
19 | 635 | 643 | VYSTGSNVF | 9 | Conserved region
20 | 656 | 666 | VNNSYECDIPI | 11 | Conserved region
21 | 672 | 690 | ASYQTQTNSPRRARSVASQ | 19 | Conserved region
22 | 695 | 710 | YTMSLGAENSVAYSNN | 16 | Non-conserved region
23 | 748 | 748 | E | 1 | Conserved region
24 | 773 | 779 | EQDKNTQ | 7 | Non-conserved region
25 | 786 | 800 | KQIYKTPPIKDFGGF | 15 | Non-conserved region
26 | 807 | 814 | PDPSKPSK | 8 | Non-conserved region
27 | 828 | 842 | LADAGFIKQYGDCLG | 15 | Non-conserved region
28 | 988 | 992 | EAEVQ | 5 | Conserved region
29 | 1035 | 1043 | GQSKRVDFC | 9 | Conserved region
30 | 1107 | 1118 | RNFYEPQIITTD | 12 | Conserved region
31 | 1133 | 1172 | VNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGI | 40 | Conserved region
32 | 1203 | 1206 | LGKY | 4 | Conserved region
33 | 1252 | 1267 | SCCKFDEDDSEPVLKG | 16 | Conserved region
34 | 1269 | 1269 | K | 1 | Conserved region
Table 4

List of class I and class II HLA (human leukocyte antigen) alleles used.

56 HLA class I alleles possessed by the Indonesian population: HLA-A01:01, HLA-A02:01, HLA-A02:03, HLA-A02:06, HLA-A02:11, HLA-A03:01, HLA-A11:01, HLA-A11:04, HLA-A24:02, HLA-A24:07, HLA-A24:10, HLA-A26:01, HLA-A29:01, HLA-A30:01, HLA-A32:01, HLA-A33:03, HLA-A34:01, HLA-A74:01, HLA-B07:02, HLA-B07:05, HLA-B08:01, HLA-B13:01, HLA-B13:02, HLA-B15:01, HLA-B15:02, HLA-B15:10, HLA-B15:12, HLA-B15:13, HLA-B15:17, HLA-B15:21, HLA-B15:25, HLA-B15:32, HLA-B18:01, HLA-B18:02, HLA-B27:06, HLA-B35:01, HLA-B35:02, HLA-B35:03, HLA-B35:05, HLA-B35:30, HLA-B37:01, HLA-B38:02, HLA-B39:15, HLA-B40:01, HLA-B40:02, HLA-B40:06, HLA-B41:01, HLA-B44:03, HLA-B48:01, HLA-B51:01, HLA-B51:02, HLA-B52:01, HLA-B56:01, HLA-B56:02, HLA-B56:07, HLA-B57:01, HLA-B58:01
22 HLA class II alleles possessed by the Indonesian population: DRB1_0101, DRB1_0301, DRB1_0401, DRB1_0402, DRB1_0403, DRB1_0405, DRB1_0406, DRB1_0701, DRB1_0802, DRB1_0901, DRB1_1001, DRB1_1101, DRB1_1201, DRB1_1301, DRB1_1302, DRB1_1401, DRB1_1404, DRB1_1405, DRB1_1407, DRB1_1454, DRB1_1501, DRB1_1602

On the basis of the mapping of class I HLA alleles in the Indonesian population, 63 epitope sequences had the potential to bind class I HLA and could potentially be CD8 T cell epitopes. The CD8 T cell epitope sequences were then mapped against HLA class II to determine whether the same epitope sequences could also be recognized by CD4 T cells. According to the mapping results against HLA class II, the core peptides from epitope sequences predicted to be recognized by CD8 T cells but containing non-specific amino acid profiles (marked with the symbol X) were eliminated as vaccine candidates. After selection according to HLA class II mapping results, 35 CD4 T cell epitope sequences were obtained. The determination of CD4 and CD8 T cell epitopes on the basis of the list of recognized HLA alleles can be found in Supplementary 1.

Analysis of similarity to the human peptide, hydrophobicity characteristics, and novelty of the candidate vaccine epitope

The results of the epitope tests based on the three analysis parameters are presented in Table 5. Of the 35 B-cell and T-cell epitope sequences, 19 showed a homologous identity of 7/9 without gaps to human self-peptides and were eliminated. The 16 remaining epitope candidates, which were considered to have low similarity to human peptides, were selected at this stage. Determination of the solubility or hydrophobicity properties of the epitope sequence was based on the average GRAVY score calculated in the ExPASy tool. A result above or below zero indicates that a protein is hydrophobic or hydrophilic, respectively.
Table 5

Similarities to self-peptide, hydrophobicity characteristics, and novelty of candidate epitopes.

Epitope candidate | Similarity to self-peptide | GRAVY score | Meaning | Epitope recency test | Selected epitope (go to the next step)
VNLTTRTQL | No similarity of 7/9 or more without gaps | −0.200 | Hydrophilic | New epitope (no reference in database) | VNLTTRTQL
LTTRTQLPP | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
TRTQLPPAY | No similarity of 7/9 or more without gaps | −0.922 | Hydrophilic | New epitope (2021 database reference, specific to SARS-CoV-2) | TRTQLPPAY
LVRDLPQGF | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
FTVEKGIYQ | No similarity of 7/9 or more without gaps | −0.200 | Hydrophilic | New epitope (no reference in database) | FTVEKGIYQ
GIYQTSNFR | No similarity of 7/9 or more without gaps | −0.822 | Hydrophilic | New epitope (no reference in database) | GIYQTSNFR
FASVYAWNR | No similarity of 7/9 or more without gaps | −0.044 | Hydrophilic | New epitope (2020 database reference, specific to SARS-CoV-2) | FASVYAWNR
FNATRFASV | No similarity of 7/9 or more without gaps | 0.433 | Hydrophobic | Not tested because it is hydrophobic | Eliminated
VFNATRFAS | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
TRFASVYAW | No similarity of 7/9 or more without gaps | 0.267 | Hydrophobic | Not tested because it is hydrophobic | Eliminated
YAWNRKRIS | No similarity of 7/9 or more without gaps | −1.456 | Hydrophilic | New epitope (no reference in database) | YAWNRKRIS
VYAWNRKRI | No similarity of 7/9 or more without gaps | −0.900 | Hydrophilic | New epitope (no reference in database) | VYAWNRKRI
ATRFASVYA | No similarity of 7/9 or more without gaps | 0.567 | Hydrophobic | Not tested because it is hydrophobic | Eliminated
EVFNATRFA | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
FKCYGVSPT | No similarity of 7/9 or more without gaps | 0.089 | Hydrophobic | Not tested because it is hydrophobic | Eliminated
CYGVSPTKL | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
ASFSTFKCY | No similarity of 7/9 or more without gaps | 0.267 | Hydrophobic | Not tested because it is hydrophobic | Eliminated
STFKCYGVS | No similarity of 7/9 or more without gaps | 0.178 | Hydrophobic | Not tested because it is hydrophobic | Eliminated
FERDISTEI | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
LYRLFRKSN | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
IYQAGSTPC | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
CYFPLQSYG | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
YFPLQSYGF | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
YRLFRKSNL | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
FRKSNLKPF | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
YNYLYRLFR | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
YQAGSTPCN | No similarity of 7/9 or more without gaps | −0.833 | Hydrophilic | New epitope (no reference in database) | YQAGSTPCN
FNCYFPLQS | No similarity of 7/9 or more without gaps | 0.133 | Hydrophobic | Not tested because it is hydrophobic | Eliminated
DISTEIYQA | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
EIYQAGSTP | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
LLHAPATVC | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
TQTNSPRRA | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
QTNSPRRAR | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
FKNHTSPDV | No similarity of 7/9 or more without gaps | −1.133 | Hydrophilic | New epitope (no reference in database) | FKNHTSPDV
ELDSFKEEL | Highest similarity: 7/9 without gap | - | - | Eliminated because of high similarity to self-peptide | Eliminated
On the basis of the results of the hydrophobicity analysis, nine epitope candidates were found to be hydrophilic. Epitope candidates that were hydrophobic were eliminated, because they are not considered ideal vaccine candidates. On the basis of epitope recency analysis performed on March 6, 2021, nine epitope candidates passing the hydrophobicity analysis were classified as new epitopes (not found in the database).

Analysis of antigenicity and allergenicity, and determination of membrane topology for the vaccine epitope candidate

On the basis of the analysis of the antigenicity of the nine candidates that passed the epitope vaccine parameters, seven epitopes had antigenicity scores above the threshold value set (>0.4). This result indicated that these epitopes are antigenic (Table 6), whereas the remaining two epitopes are non-antigenic because they had scores below the threshold. The results of allergenicity analysis indicated that six epitopes were predicted to trigger allergic reactions (i.e., to be allergens), whereas the remaining epitopes were non-allergenic. According to the results of these two analyses, three potential epitope candidates were antigenic and non-allergenic. In addition, the transmembrane topology prediction reported an inside location for eight of the nine candidates, meaning that these epitopes tend to be present on the inner surface of the membrane, whereas FKNHTSPDV was predicted to be outside (Table 6).
Table 6

Antigenicity, allergenicity, and membrane topology of the epitope candidates.

No. | Epitope candidate | Antigenicity score | Meaning | Allergenicity test | Topology | Potential epitope candidate
1 | TRTQLPPAY | 1.2923 | Antigenic | Non-allergen | Inside | TRTQLPPAY
2 | VNLTTRTQL | 1.3468 | Antigenic | Allergen | Inside | Eliminated
3 | GIYQTSNFR | 0.5380 | Antigenic | Allergen | Inside | Eliminated
4 | FTVEKGIYQ | −0.1987 | Non-antigenic | Allergen | Inside | Eliminated
5 | FASVYAWNR | 0.0713 | Non-antigenic | Allergen | Inside | Eliminated
6 | YAWNRKRIS | 0.8209 | Antigenic | Non-allergen | Inside | YAWNRKRIS
7 | VYAWNRKRI | 0.5003 | Antigenic | Allergen | Inside | Eliminated
8 | FKNHTSPDV | 0.4846 | Antigenic | Non-allergen | Outside | FKNHTSPDV
9 | YQAGSTPCN | 0.4992 | Antigenic | Allergen | Inside | Eliminated
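The filtering in Table 6 can be written as a simple two-condition screen: keep a candidate when its antigenicity score exceeds the 0.4 threshold and it is predicted to be a non-allergen. The sketch below applies that rule to the values copied from Table 6. The data layout and function name are illustrative, and the topology filter at the end is one possible reading of how the table separates the final candidate, not a rule stated by the authors.

```python
# (peptide, antigenicity score, predicted allergen?, topology), from Table 6.
TABLE_6 = [
    ("TRTQLPPAY", 1.2923, False, "Inside"),
    ("VNLTTRTQL", 1.3468, True, "Inside"),
    ("GIYQTSNFR", 0.5380, True, "Inside"),
    ("FTVEKGIYQ", -0.1987, True, "Inside"),
    ("FASVYAWNR", 0.0713, True, "Inside"),
    ("YAWNRKRIS", 0.8209, False, "Inside"),
    ("VYAWNRKRI", 0.5003, True, "Inside"),
    ("FKNHTSPDV", 0.4846, False, "Outside"),
    ("YQAGSTPCN", 0.4992, True, "Inside"),
]

ANTIGENICITY_THRESHOLD = 0.4


def potential_candidates(rows, threshold=ANTIGENICITY_THRESHOLD):
    """Keep peptides that are antigenic (score > threshold) and non-allergenic."""
    return [
        peptide
        for peptide, score, is_allergen, _topology in rows
        if score > threshold and not is_allergen
    ]


kept = potential_candidates(TABLE_6)
print(kept)

# Among the kept candidates, list those predicted to lie outside the membrane.
outside = [row[0] for row in TABLE_6 if row[0] in kept and row[3] == "Outside"]
print(outside)
```

The screen reproduces the three potential candidates listed in the last column of Table 6, and only FKNHTSPDV among them has an outside topology.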
The best epitope was selected on the basis of the parameters of the previous analysis. The selected epitope had an amino acid sequence of FKNHTSPDV, which had relatively high solubility according to the GRAVY value. Thus, the FKNHTSPDV sequence was chosen as the best epitope to potentially serve as a vaccine candidate in this study. The characteristics of the selected epitope candidate from the various stages of analysis performed are summarized in Table 7. The selected epitope sequence was then back-translated and optimized to obtain a nucleotide sequence encoding the epitope with codons that could be optimally expressed in the human body. The results of nucleotide sequence optimization are shown in Table 8.
Table 7

Characteristics of selected epitope candidates.

Characteristics | Information | Meaning
Similarity to self-peptide | No similarities in 7/9 or more were found without gaps | Not similar to self-peptide
Hydrophobicity | GRAVY score: −1.133 | Hydrophilic (soluble)
Molecular mass | 1044.13 |
Theoretical pI | 6.74 |
Stability | Stable protein |
Recency (based on IEDB database) | Not previously studied and not found in the database | New (novel)
Antigenicity | 0.4846 | Antigenic
Allergenicity | Non-allergen |
Topology | Outside |
Table 8

Selected epitope coding nucleotide sequences and codon optimization results.

Selected epitope (amino acid sequence) | Reverse-translated sequence (nucleotide sequence) | Optimized sequence | CAI | GC content (%)
FKNHTSPDV | TTCAAGAACCACACCAGCCCCGACGTG | TTCAAAAACCACACTTCTCCGGACGTA | 0.87 | 44.44

Construction of recombinant pcDNA3.1(+) N-GST (thrombin) plasmid

The results of the plasmid construction are shown in Figure 2. The red part of the plasmid construct is the part designed in this study. An ATG start codon, encoding the amino acid methionine, was added to the inserted gene. Additional features of the vaccine construct, such as the gene encoding the enzyme glutathione S-transferase (GST), which can increase the expression and solubility of antigens, and the genes encoding proteins NSP 1–4, which are essential in replication, conferred added value on the recombinant protein.
Figure 2

Construction results for plasmid pcDNA3.1(+) N-GST (thrombin)-epitope of Indonesian SARS-CoV-2. The red part of the plasmid construct is the part designed in this study. An ATG start codon, which encodes the amino acid methionine, was added to the inserted gene.


Discussion

The development of the SARS-CoV-2 vaccine in this study was based on a recombinant protein. Recombinant protein vaccines stimulate the production of antibodies that interact with antigen proteins or viral particles. The process of identifying and initially testing vaccine candidates can start from in silico studies using bioinformatics tools. Performing in silico studies beforehand can decrease the likelihood of failure and the losses incurred. This research began with downloading complete genome sequences of SARS-CoV-2 from the GISAID database. A total of 29 sequences from provinces in Indonesia were used; because of their high similarity to one another, they were considered to generally represent the complete genome of Indonesian SARS-CoV-2. The proofreading activity, which is regulated by NSP14 3′-to-5′ exoribonuclease (NSP14-ExoN) in the SARS-CoV-2 genome, is a key determinant of both coronavirus replication and recombination. Thus, the genome sequences of viruses belonging to this group are highly conserved. Conservation analysis has indicated a high similarity among SARS-CoV-2 sequences, reaching >99%. The target protein selected in this study was the spike protein, or surface glycoprotein. On the basis of previous research, the spike protein shows a relatively high antigenicity value and has the potential to provide vaccine epitopes that can induce an excellent immune response. The SARS-CoV-2 spike protein is also conserved relative to the spike protein of other human coronaviruses (hCoV). Such a conserved region is predicted not to undergo mutation, and to be responsible for a particular function or to provide a necessary structural characteristic. Determining conserved regions helps ensure that the epitopes used as candidate vaccines show promise for efficacy and for coverage of groups or clusters.
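
The idea of sequence conservation described above can be sketched in a few lines. The example below is illustrative only: the three aligned fragments are toy strings invented for this sketch (they are not GISAID sequences, and the residues flanking the epitope are placeholders), and the function names are hypothetical. It computes pairwise percent identity and checks whether every column spanned by a candidate peptide is identical across the alignment.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return 100 * sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def conserved_columns(alignment: list[str]) -> set[int]:
    """Indices of alignment columns in which all sequences carry the same residue."""
    return {i for i, column in enumerate(zip(*alignment)) if len(set(column)) == 1}


def is_conserved(alignment: list[str], start: int, length: int) -> bool:
    """True if every column in [start, start + length) is fully conserved."""
    conserved = conserved_columns(alignment)
    return all(i in conserved for i in range(start, start + length))


# Toy alignment (hypothetical data): the third sequence differs at index 2.
TOY_ALIGNMENT = [
    "QTLEFKNHTSPDVAL",
    "QTLEFKNHTSPDVAL",
    "QTMEFKNHTSPDVAL",
]

start = TOY_ALIGNMENT[0].find("FKNHTSPDV")
print(is_conserved(TOY_ALIGNMENT, start, 9))   # peptide window avoids the variable column
print(is_conserved(TOY_ALIGNMENT, 0, 9))       # window overlapping the variable column
```

A peptide whose window overlaps any variable column would be discarded under this criterion, which mirrors the rationale for restricting epitope candidates to conserved regions.
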
A peptide sequence of the Indonesian SARS-CoV-2 spike protein able to bind B cell receptors or immunoglobulin was determined to obtain a specific B cell epitope that induces the activation and function of B cells. If this epitope is recognized, it should cause B cells to proliferate and differentiate into plasma B cells, which produce antibodies that directly attack the virus, and into memory B cells, which support a faster and more effective immune response to infection. On the basis of the results of sequence conservation analysis, among the predicted peptide sequences that could be recognized by B cells (Table 3), we eliminated those that contained non-conserved amino acid residues or were shorter than 9-mers. To determine that the T cell epitopes were in a conserved region, we checked the peptide sequences mapped against HLA class I and class II. MHC class I is an APC ligand recognized by CD8 T cells. The ligand binds antigen, then presents it to be recognized by CD8 T cells as an antigenic determinant (epitope) with activity toward infected cells. A series of analyses of epitopes potentially recognized by B cells, CD4 T cells, and CD8 T cells was performed to select the best epitope as a vaccine candidate before moving to the next stage. The similarity of epitopes to human peptides (self-peptides) was analyzed to eliminate epitopes with the potential to induce autoimmune reactions. This analysis was fundamental to ensuring that the selected epitope did not have high similarity to human peptides, which could elicit antibodies or other immune responses against the host peptide itself. Hence, peptides with high similarity were eliminated. Novelty was assessed by cross-checking the epitope candidates against the IEDB database. Epitopes that had never been reported or studied before were selected as vaccine candidates in this study.
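
The self-peptide criterion reported in Table 7 ("no similarities in 7/9 or more without gaps") can be illustrated as an ungapped sliding-window comparison. This is a sketch under stated assumptions, not the screening tool used in the study: the function names are hypothetical, and the "proteome" entries are made-up placeholder strings, not real human proteins.

```python
def max_ungapped_identity(epitope: str, protein: str) -> int:
    """Largest number of identical positions between the epitope and any
    same-length, ungapped window of the protein (0 if the protein is shorter)."""
    k = len(epitope)
    best = 0
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        best = max(best, sum(a == b for a, b in zip(epitope, window)))
    return best


def is_self_like(epitope: str, proteome: list[str], min_matches: int = 7) -> bool:
    """True if any protein contains a window matching at least min_matches positions."""
    return any(max_ungapped_identity(epitope, p) >= min_matches for p in proteome)


EPITOPE = "FKNHTSPDV"

# Placeholder sequences for illustration only (NOT real human proteins).
UNRELATED = ["MKTAYIAKQRQISFVKSHFSRQ"]
NEAR_MATCH = ["GSHMFKNATSPEVLLK"]  # contains a window matching 7 of 9 positions

print(is_self_like(EPITOPE, UNRELATED))    # candidate would be kept
print(is_self_like(EPITOPE, NEAR_MATCH))   # candidate would be eliminated
```

A real screen would run the same comparison against the full human proteome; the 7-of-9 threshold is exposed as `min_matches` so that it can be tightened or relaxed.
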
Epitope solubility analysis based on the hydrophobicity value (GRAVY score) was performed to determine the epitope with the best solubility (hydrophilicity), which enables its interaction with the immune system in aqueous media. The results of the analyses of similarity, recency, and hydrophobicity are presented in Table 1. Antigenicity analysis was used to determine the ability of epitopes to be recognized as antigens by adaptive immune responses, particularly to stimulate the responses of B cells and T cells. Epitopes with good antigenicity can induce an adaptive immune response resulting in the production of memory cells, which “remember” viral antigens. Memory cells result in a faster and more effective immune response when an infection occurs. Seven epitopes were found to have antigenicity scores exceeding the threshold value set (>0.4). Allergenicity analysis was used to eliminate epitopes that are potential allergens and might induce allergic reactions in the body. Epitopes used as vaccine candidates should not cause allergic reactions that are harmful to the body. Thus, only epitope candidates that had high antigenicity and were not allergenic were selected as vaccine epitope candidates. After the chosen vaccine epitope candidate was back translated into the nucleotide sequence encoding the epitope, codon optimization was performed to increase the likelihood of the protein being expressed in humans. The selected epitope had an amino acid sequence of FKNHTSPDV, which has relatively high solubility on the basis of the GRAVY value. Thus, this sequence was chosen as the best epitope that could potentially serve as a vaccine candidate in this study. This epitope has also been identified in a previous study, which used a different sequence. The plasmid pcDNA3.1(+) N-GST (thrombin) was used as a vector in this study, and the target gene was inserted via the Eco32I restriction site.
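
Before cloning, it is common to confirm that an insert is in frame and does not itself contain the restriction site used for insertion. The sketch below illustrates such a check for the optimized sequence from Table 8 with the ATG start codon added. It rests on an assumption that is not stated in the text: the restriction enzyme is taken to be Eco32I, an isoschizomer of EcoRV with the recognition sequence GATATC. The function names are hypothetical.

```python
OPTIMIZED = "TTCAAAAACCACACTTCTCCGGACGTA"  # optimized sequence from Table 8

# Assumption: Eco32I (EcoRV isoschizomer) recognition sequence. It is
# palindromic, so scanning one strand also covers the complementary strand.
RESTRICTION_SITE = "GATATC"


def with_start_codon(cds: str) -> str:
    """Prepend ATG unless the coding sequence already starts with it."""
    return cds if cds.startswith("ATG") else "ATG" + cds


def site_positions(dna: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    return [i for i in range(len(dna) - len(site) + 1) if dna[i:i + len(site)] == site]


insert = with_start_codon(OPTIMIZED)
print(insert)
print(len(insert) % 3 == 0)                      # insert keeps the reading frame
print(site_positions(insert, RESTRICTION_SITE))  # internal sites, if any
```

An empty list of positions means that the enzyme would not cut within the insert itself under the assumed recognition sequence.
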
Escherichia coli was the primary choice for producing recombinant protein in this study, because of its low cost and ease of culture, and the availability of accessible related technology. Various recombinant proteins from bacteria, archaebacteria, and eukaryotes can be produced efficiently in E. coli. However, E. coli lacks the protein disulfide isomerase; recombinant proteins expressed in E. coli therefore cannot fold completely, which results in low solubility and activity of the protein produced. A strategy widely used to overcome this problem involves fusion of the protein with GST. GST fusion also simplifies the protein purification process. Moreover, the addition of the genes encoding NSP 1–4, which are essential proteins in replication, added value to the recombinant protein produced.

Conclusions

On the basis of the search results for an Indonesian SARS-CoV-2 spike protein epitope that can be recognized by selected B cells and T cells, we identified an epitope with the amino acid sequence FKNHTSPDV, which is hydrophilic, does not have the potential to induce autoimmune and allergic reactions, is antigenic, is classified as a stable protein, and is predicted to be present outside the cell membrane. The selected epitope sequence was inserted into the plasmid vector pcDNA3.1(+) N-GST (thrombin) with the addition of a GST sequence to increase the solubility and activity of the protein produced and the genes encoding NSP 1–4, which are essential in replication.

Source of funding

This work was part of research supported by the Research and Community Service Institute (LP2M), University of Jember, Indonesia, through Hibah Mendukung IDB contract number 2858/UN25.3.1/LT/2021 and thesis supervisor assignment letter number 1155/UN25.2/SP/2021.

Conflict of interest

The authors have no conflict of interest to declare.

Ethical approval

This article does not contain any studies involving animals or human participants performed by any of the authors.

Recommendation

SARS-CoV-2 mutations can increase the number of COVID-19 cases, and some mutations can decrease vaccine effectiveness. Recent advances in the field of immunological bioinformatics have provided a viable vaccine development tool to significantly decrease the time, costs, and risk of trial error.

Authors' contributions

RA and EN conceived and designed the study. RA conducted research, provided research materials, collected and organized data, and analyzed and interpreted data. EN supervised the research. RA and EN wrote the initial and final drafts of the article. All authors have critically reviewed and approved the final draft and are responsible for the content and similarity index of the manuscript.