Literature DB >> 32682041

Design of a multi-epitope vaccine against SARS-CoV-2 using immunoinformatics approach.

Samira Sanami¹, Milad Zandi², Behzad Pourhossein², Gholam-Reza Mobini³, Mohsen Safaei¹, Atena Abed¹, Pooria Mohammadi Arvejeh³, Fatemeh Amini Chermahini³, Morteza Alizadeh⁴.

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused COVID-19 disease in China. So far, no vaccine has licensed to protect against infection with COVID-19, therefore an effective COVID-19 vaccine needed. The aim of this study was to predict antigenic peptides of SARS-CoV-2 for designing the COVID-19 vaccine using immunoinformatic analysis. In this study, T and B-cell epitopes of S protein were predicted and screened based on the antigenicity, toxicity, allergenicity, and cross-reactivity with human proteomes. The epitopes were joined by the appropriate linker. LT-IIc as an adjuvant was attached to the end of the structure. The secondary and 3D structure of the vaccine was predicted. The refinement process was performed to improve the quality of the 3D model structure; the validation process is performed using the Ramachandran plot and ProSA z-score. The proposed vaccine's binding affinity to the HLA-A11:01 and HLA-DRB1_01:01 molecule was evaluated by molecular docking. Using molecular dynamics, the stability of vaccine-HLA complexes was also evaluated. Finally, in silico gene cloning was performed in the pET30a (+) vector. The findings suggest that the current vaccine may be a promising vaccine to prevent SARS-CoV-2 infection.

Entities: CellLine Chemical Disease Gene Species

Keywords: COVID-19; Epitopes; Vaccine

Mesh：

Substances：

Year: 2020 PMID： 32682041 PMCID： PMC7362859 DOI： 10.1016/j.ijbiomac.2020.07.117

Source DB: PubMed Journal: Int J Biol Macromol ISSN： 0141-8130 Impact factor: 6.953

Introduction

In December 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused COVID-19 disease in China, which was associated with a cluster of respiratory infections [1]. Infection with SARS-CoV-2 has spread to various countries and created a significant threat to public health. Following the outbreak of SARS-CoV in 2002 and 2003 and MERS-CoV outbreak in 2012, the SARS-CoV-2 outbreak is a recurrence of the previous occurrence with new pressure, and the bat is highly probable to be the origin of all types of CoV [2]. Researchers have confirmed the transmission of this virus from human to human, which has led to a further outbreak of the disease [3]. Coronaviruses are classified into four genera: Alpha, Beta, Gamma, and Delta Coronavirus. Alpha and Bata coronaviruses infect bats and human [4,5], SARS-CoV-2 is a member of β coronavirus genera [6]. SARS-CoV-2 is a single strand, positive-sense RNA virus with a genome of around 30 Kb in length [7]. In all coronaviruses, the structural proteins comprise spike (S), envelope (E), membrane protein (M), and nucleoprotein (N) [8,9]. SARS-CoV-2 uses the S protein as the key to binding and entering host cells therefore understanding its structure and function is important. The S protein is a type I transmembrane protein and highly glycosylated. Spike glycoproteins assemble into trimers on the viral particle surface. S glycoprotein has three domains including ectodomain, transmembrane, and endodomain. Ectodomain subdivided into S1 and S2 subunits, S1 is important for binding to host cell receptors, the amino acid sequences of S1 are more variable than S2 [10]. S protein is an attractive target for vaccine development for the following reasons: S protein is located at the virus surface and uses angiotensin-converting enzyme 2 receptor in respiratory epithelial cells to enter the host cells [11,12], it also can induce neutralizing antibodies [13]. The development and production of a vaccine are costly and will take many years to achieve this goal, the various strategies were used to reduce the time and costs of this process [14]. Reverse vaccinology or vaccinomics is a new method that combines immunogenomics and bioinformatics to develop novel vaccines [15]. This method has many benefits over conventional vaccinology, it decreases the vaccine development time and costs and it also makes possible to study non-cultivable or risky microorganisms [16]. Scientists are working very hard to combat this virus, but still, there is no drug or vaccine available to protect against infection with COVID-19 [17]. Thus, in this regard, we decided to design a new multi-epitope vaccine against SARS-CoV-2 infection using reverse vaccinology approaches. In this research, first, the CTL, HTL, and B-cell epitopes of the S protein were predicted using ProPred-1, ProPred, and ABCPred servers, respectively, and then were selected base on antigenicity, toxicity, allergenicity, and cross-reactivity with human proteomes. The selected epitopes were joined together using the appropriate linker to construct the main core of the multi-epitope vaccine. Since the multi-epitope vaccine has poor antigenicity, LT-IIc as an adjuvant to overcome this weakness was attached to the end of the structure. LT-IIc is a type II subfamily of LTBs consisting of 121 amino acids. Moreover, when co-administered with a multi-epitope vaccine it shows an increase in CD8+ T cell immune responses [18]. Next, the physicochemical properties of the construct were investigated, the 3D structure of the protein was predicted, and finally, its affinity to the MHC I and II molecules was investigated through docking, following that, was performed the molecular dynamics (MD) simulation of docking complexes. Finally, the vaccine coding gene expression in Escherichia coli (E. coli) was evaluated, and in silico gene cloning was performed in the pET30a (+) vector.

Methodology

Amino acid sequence retrieval

The amino acid sequences of SARA-COV-2 S protein (QII57328.1) and LT-IIc (H6W8F2) were obtained in FASTA format from NCBI (https://www.ncbi.nlm. nih.gov/) and Uniprot (https://www.uniprot.org/) databases, respectively.

Prediction and screening of epitopes

The binding epitopes to MHC I alleles were predicted using the ProPred-1 server (http://crdd.osdd.net/raghava/propred1/). ProPred-1 server is based on a method of predicting epitopes that bind to 47 different alleles of MHC I [19]. To predict HTL epitopes was used the ProPred server (http://crdd.osdd.net/raghava/propred/) which uses quantitative matrices to predict T- cell epitopes binding to 51 different alleles of MHC II [20]. In this study, peptide fragments with default parameters for MHC I and II alleles with the highest coverage of Iran's population were predicted. Gamma interferon (IFN-γ) is a cytokine that plays a key role in both innate and adaptive immune response; IFNepitope server (https://webs.iiitd.edu.in/raghava/ifnepitope/predict.php) determined the IFN-γ-inducing HTL epitopes. ABCpred server (http://crdd.osdd.net/raghava/abcpred/) using an artificial neural network predicted the B-cell epitopes. This is the first server developed using fixed-length patterns based on a recurrent neural network. The server predicts peptides 10–20 mer in length and the threshold is among 0.1 to 1. In this study, the server was predicted peptide fragments of 16 mer length, and the threshold was set at 0.89 [21]. Antigenicity, toxicity, and allergenicity were evaluated for predicted epitopes. The antigenicity of the epitopes were calculated by VaxiJen v2.0 server (http://www.ddgpharmfac.net/vaxijen/VaxiJen/VaxiJen.html), which is based on the transformation of the protein sequences auto cross-covariance (ACC) into uniform vectors of main amino acid properties. The performance accuracy of the VaxiJen server varied from 70 to 89%, based on the selected model organism. In this study, the VaxiJen v2.0 server determined the epitopes antigenicity based on a threshold of 0.4 and a virus model [[22], [23], [24]]. The epitope toxicity was determined using the ToxinPred server (http://crdd.osdd.net/raghava/toxinpred/). This server predicts of toxic or non-toxic peptides from a wide variety of peptides, along with their important physical and chemical properties such as hydrophobicity, hydropathicity, amphipathicity, molecular weight, and pI charge [25]. The AllerTOP v. 2.0 Server (https://www.ddg-pharmfac.net/AllerTOP/index.html) predicted epitopic allergenicity. The approach used in this server is based on the automatic cross-covariance (ACC) transformation of protein sequences into uniform vectors of equal length [26]. The PIR peptide matching program (https://research.bioinformatics.udel.edu/peptidematch/index.jsp) was used to investigate the presence or absence of similarity with the human proteome in predicted epitope sequences [27].

Vaccine design and prediction of features

The selected T and B-cell epitopes from the previous step were placed in the vaccine structure. KK and GPGPG linkers joined the screened T and B-cell epitopes in a structure of multi-epitope vaccine, respectively. The LT-IIc amino acid sequence was attached to the merged epitopes at the N-terminus using the EAAAK linker to boost the immunogenicity of the vaccine. An effective vaccine should not only cause strong immune responses but should also have appropriate physicochemical properties during the development cycle. Protparam server (https://web.expasy.org/protparam/) calculated the amino acid content and some of the designed vaccine's physicochemical properties such as molecular weight, theoretical pI, half-life in the mammalian reticulocytes, instability index, aliphatic index, and grand average of hydropathy (GRAVY) [28]. A standard method for determining the molecular weight of a protein, given the number of amino acids that make it up, is to multiply that number by the average weight of an amino acid, which is 110 Da [29]. The isoelectric point is the pH at which amino acid or protein loses its charge and is, therefore, unable to move in a direct current electric field. Knowing the theoretical pI of a protein can be very useful for selecting and optimizing the methods used for protein purification, including ion-exchange chromatography and isoelectric focusing electrophoresis [30]. The half-life is the time it takes for half of the protein to disappear inside the cell after synthesis [31]. The instability index of proteins indicates their stability in the test tube so that proteins with an instability index of less than 40 are predicted as stable proteins [32]. The aliphatic index of a protein is the relative volume of protein occupied by aliphatic side chains, this factor indicates the stability of the protein against heat [33]. The GRAVY is computed by dividing the sum of the hydropathic values of all amino acids by the number of residues in the sequence [34]. The GRAVY demonstrates the amphipathic nature of the proteins, the negative and positive values suggest the hydrophilic and hydrophobic nature of the designed vaccine, respectively [35]. SOLpro server (http://scratch.proteomics.ics.uci.edu/) predicted the protein tends to be soluble when overexpression in E. coli [36]. ANTIGENpro server (http://scratch.proteomics.ics.uci.edu/) was used to estimate the antigenicity of the vaccine, the accuracy of the server was determined to be 76% based on cross-validation experiments using the combined dataset [37]. Allergenicity and toxicity of the vaccine were determined by AllerTOP v. 2.0 and ToxinPred servers, respectively.

Secondary structure prediction

Prabi server (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_gor4.html) used GOR4 method to predict the secondary structure of the engineered vaccine. GOR IV with a mean accuracy of 64.4% uses all possible pair frequencies within a 17 amino acid residues window [38].

Vaccine 3D modeling, refinement, and validation

3Dpro server (http://scratch.proteomics.ics.uci.edu/) predicted the tertiary structure of the multi-epitope vaccine [36]. 3Drefine server (http://sysbio.rnet.missouri.edu/3Drefine/) improved the quality of the 3D model from the previous stage. 3Drefine refining protocol includes a two-step process as follows: (i) Optimization of the hydrogen bonding network and (ii) minimization of atomic energy by integrating physics with knowledge-based force fields [39]. Model validation is an important step to detect possible errors in crude structures and to compare the quality of models before and after the refinement process [40], for this purpose, RAMPAGE (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php) and ProSA (https://prosa.services.came.sbg.ac.at/prosa.php) servers were used in this study. Ramachandran plot was generated using the RAMPAGE server. The most important application of the Ramachandran plot is to predict the quality of different protein structures regarding experimental methods (X-ray crystallography, NMR, and Cryo-EM). Acceptable and nonacceptable quality protein structures consisted of a collection of torsional angles in the allowed and disallowed regions, respectively [41]. The ProSA server evaluated the overall quality of the protein models in the form of a z-score. When the predicted model's z-scores are outside the normal range of native proteins, it indicates an incorrect structure [42].

Molecular docking

ClusPro 2.0 server (https://cluspro.org) performed molecular docking between vaccine candidate and two of the most common binding alleles in the population of Iran including HLA-A11:01 with PDB entry of 1X7Q, and HLA-DRB1_01:01 with PDB entry of 1AQD. ClusPro can discriminate the hypothetical user-created structures and run any server-compatible docking algorithms. Additional functions allow the users to modify the structure, determine the attraction and repulsion, or identify pairwise distance restrictions [[43], [44], [45]].

Molecular dynamics (MD) simulation

MD simulation was performed using Gromacs 5.1.5 (GROningen MAchine for Chemical Simulations) software to investigate the behavior of vaccine, HLA-A11:01, and HLA-DRB1_01:01 molecules in docking structure [46]. The input structures were prepared using the amber ff99SB force field. The correct hydrogen status of histidines was established for all the histidines presented in the structure, and the disulfide bonds were defined for the protein. Then, the surface charge of the structure was neutralized by adding several sodium and chlorine ions. The protein was inserted into a layer of TIP3P water molecules at 10 Å using gmx solvate software. Energy minimization was performed on structures with the1000-step by steepest descent method to eliminate Van der Waals interactions and to form the hydrogen bonds between water and complex molecules. In the next step, the system temperature gradually increased from 0 up to 300 K for 200 ps in constant volume and was then equilibrated in constant pressure at 1000 ps. MD simulation was performed at 27 °C (300 K) for 40 ns. Non-bonding interactions with a distance of 10 Å were calculated using the PME method. To increase the computational speed, the SHAKE algorithm was used to limit the bonds involved in the hydrogen atom.

In silico cloning

Backtranslation of the amino acid sequence of the multi-epitope vaccine was conducted using the Gene infinity server (http://www.geneinfinity.org/sms/sms_backtranslation.html). The Codon Adaptation Index (CAI) and GC content of optimized DNA were assessed by the GenScript server (https://www.genscript.com/tools/rare-codon-analysis). Finally, the multi-epitope vaccine sequence was inserted into the pET-30a (+) vector by the SnapGene tool.

Result

Using the ProPred-1 server, which was set to default parameters, were predicted 15 CTL epitopes (9 mer) for human MHC-I alleles including HLA-A*0201, HLA-A*1101, HLA-B*3501, HLA-B*5101 and HLA-B*5301 (Table 1 ). The ProPred server predicted 657 HTL epitopes (9 mer) for the alleles of MHC-II, namely, DRB1*01:01, DRB1*03:01, DRB1*07:01, DRB1*11:01, DRB1*13:01 and DRB1*15:01. Next, 52 epitopes capable of binding to at least four types of MHC alleles were assessed using the IFNepitope server for their ability to induce IFN-γ, the result of this step suggested that 23 HTL epitopes were capable of inducing IFN-γ (Table 2 ). Using the default parameters of the ABCPred server were predicted 15 linear B-cell epitopes (Table 3 ). The predicted CTL, HTL, and B-cell epitopes were evaluated for overlapping, antigenicity, toxicity, allergenicity, and cross-reactivity with human proteomes. It is noteworthy that among the overlapped epitopes were selected epitopes that had a higher antigenicity and improved quality of the second and three-dimensional structure of the vaccine by being in the chimeric structure. Finally, 4, 8, and 6 epitopes were selected from CTL, HTL, and B-cell epitopes, respectively.

Table 1

Results of CTL epitope prediction. Four CTL epitopes were selected (labeled with *) for incorporation in the proposed vaccine amino acid sequence.

Pos	CTL epitope	VaxiJen score	Allergenicity	Toxicity	Cross-reactivity with human proteomes	Final decision
8	LPLVSSQCV	0.8267	Yes	No	Zero	–
56	LPFFSNVTW	1.0808	Yes	No	Zero	–
89	GVYFASTEK	0.7112	No	No	Zero	*
142	GVYYHKNNK	0.8264	Yes	No	Zero	–
269	YLQPRTFLL	0.4532	Yes	No	Zero	–
271	QPRTFLLKY	−0.5552	No	No	Zero	–
321	QPTESIVRF	−0.5551	Yes	No	Zero	–
550	GVLTESNKK	0.8797	No	No	Zero	*
712	IAIPTNFTI	0.7052	No	No	Zero	*
861	LPPLLTDEM	0.4723	Yes	No	Zero	–
1000	RLQSLQTYV	−0.2167	No	No	Zero	–
1052	FPQSAPHGV	−0.0058	No	No	Zero	–
1060	VVFLHVTYV	1.5122	No	No	Zero	*
1161	SPDVDLGDI	1.573	Yes	No	Zero	–
1212	WPWYIWLGF	1.4953	Yes	No	Zero	–

Table 2

Results of HTL epitope prediction. Eight HTL epitopes were selected (labeled with *) for incorporation in the proposed vaccine amino acid sequence.

Pos	IFN-γ -inducing HTL epitope	VaxiJen score	Allergenicity	Toxicity	Cross-reactivity with human proteomes	Final decision
1	FVFLVLLPL	0.8601	No	No	Zero	*
2	VFLVLLPLV	0.4651	No	No	Zero	–
3	FLVLLPLVS	0.4266	No	No	Zero	–
4	LVLLPLVSS	0.6523	Yes	No	Zero	–
41	VFRSSVLHS	−0.3392	Yes	No	Zero	–
117	LIVNNATNV	0.1942	No	No	Zero	–
266	VGYLQPRTF	1.2441	No	No	Zero	*
317	FRVQPTESI	0.9396	Yes	No	Zero	–
366	VLYNSASFS	0.4029	Yes	No	Zero	–
406	VRQIAPGQT	0.8675	Yes	No	Zero	–
448	YNYLYRLFR	−0.8692	Yes	No	Zero	–
488	YFPLQSYGF	0.5107	Yes	No	Zero	–
504	YQPYRVVVL	0.5964	No	No	Zero	*
510	VVLSFELLH	1.409	No	No	Zero	*
575	VRDPQTLEI	0.7025	No	No	Zero	*
950	VVNQNAQAL	0.4749	No	No	Zero	*
1000	LQSLQTYVT	−0.0592	No	No	Zero	–
1017	IRASANLAA	0.4455	Yes	No	Zero	–
1128	VIGIVNNTV	0.7774	Yes	No	Zero	–
1213	WYIWLGFIA	1.0356	No	No	Zero	*
1216	WLGFIAGLI	0.1544	Yes	No	Zero	–
1217	LGFIAGLIA	0.5797	No	No	Zero	*
1220	IAGLIAIVM	0.4716	No	No	Zero	–

Table 3

Results of linear B-cell epitope prediction. Six linear B-cell epitopes were selected (labeled with *) for incorporation in the proposed vaccine amino acid sequence.

Pos	linear B-cell epitope	VaxiJen score	Allergenicity	Toxicity	Cross-reactivity with human proteomes	Final decision
236	TRFQTLLALHRSYLTP	0.5115	No	No	Zero	*
245	HRSYLTPGDSSSGWTA	0.6017	No	No	Zero	*
257	GWTAGAAAYYVGYLQP	0.6210	No	No	Zero	*
307	TVEKGIYQTSNFRVQP	0.6733	Yes	No	Zero	–
470	TEIYQAGSTPCNGVEG	−0.0105	Yes	No	Zero	–
476	GSTPCNGVEGFNCYFP	0.2489	Yes	No	Zero	–
492	LQSYGFQPTNGVGYQP	0.5258	Yes	No	Zero	–
594	GVSVITPGTNTSNQVA	0.4651	No	No	Zero	*
648	GCLIGAEHVNNSYECD	0.848	Yes	Yes	Zero	–
879	AGTITSGWTFGAGAAL	0.2868	No	No	Zero	–
1058	HGVVFLHVTYVPAQEK	0.8847	No	No	Zero	*
1112	PQIITTDNTFVSGNCD	0.2404	Yes	No	Zero	–
1180	QKEIDRLNEVAKNLNE	0.0684	Yes	No	Zero	–
1206	YEQYIKWPWYIWLGFI	0.951	No	No	Zero	*
1247	CCSCGSCCKFDEDDSE	0.1273	Yes	Yes	Zero	–

Results of CTL epitope prediction. Four CTL epitopes were selected (labeled with *) for incorporation in the proposed vaccine amino acid sequence. Results of HTL epitope prediction. Eight HTL epitopes were selected (labeled with *) for incorporation in the proposed vaccine amino acid sequence. Results of linear B-cell epitope prediction. Six linear B-cell epitopes were selected (labeled with *) for incorporation in the proposed vaccine amino acid sequence. The final vaccine structure consists of four domains: LT-IIc, Linear B-cell, CTL, and HTL epitopes, which were placed in the vaccine construct by different linkers (Fig. 1 ). The Protparam results revealed that the multi-epitope vaccine comprised of 380 amino acids. The vaccine's molecular weight was 41.65 kDa and its theoretical pI value was 9.96. The half-life of the vaccine was 30 h in mammalian reticulocytes, more than 20 h in yeast, and more than 10 h in E. coli. The instability index, aliphatic index, and GRAVY were predicted 22.73, 91.08, and −0.035, respectively. The SOLpro server result showed the vaccine to be soluble. The antigenicity of the final sequence (including the adjuvant) and the main sequence (without adjuvant) of vaccine were predicted using the ANTIGENpro server to be 0.584792 and 0.512677, respectively. The vaccine sequence was predicted to be non-allergenic and non-toxic using AllerTOP v. 2.0 and ToxinPred servers, respectively.

Fig. 1

Graphical visualization of the multi-epitope vaccine. The 380 amino acid sequence of vaccine contains an adjuvant (green) at the N-terminal of multi-epitope sequence, B-cell epitopes (violet), CTL epitopes (cyan), and HTL epitopes (blue). The used linkers for fuse mentioned domains together were EAAAK (red), GPGPG (blue), and KK (yellow). The Prabi server predicted that 24.47, 25.26 and 50.26% of the total 380 amino acids of vaccine construct were alpha helix, extended strand, and random coil, respectively. Vaccine primary 3D model was predicted using the 3Dpro server. The 3Drefine server performed the process of refining the protein model. Five parameters, including 3Drefine, GDT-TS, GDT-HA, RMSD, RWplus, and MolProbity, ranked the optimized models. The higher GDT-TS, RMSD and GDT-HA scores, and the low scores of 3Drefine, MolProbity, and RWplus showed a higher quality model. Based on these factors, refined model No. 5 was selected, which is indicated in the first row of Table 4 , the raw 3D structure and refined model of the vaccine are depicted in Fig. 2A and B. The validation process was performed using the Ramachandran plot and the ProSA z-score. Ramachandran plot result revealed that in the primary model, 93.4%, 3.2%, and 3.4% of residues were located in favored, allowed, and outlier regions, respectively (Fig. 3A), while in the refined model, 93.1%, 5%, and 1.9% of residues were found in favored, allowed, and outlier regions, respectively (Fig. 3B). Z-score of the initial model was calculated −0.82 (Fig. 4A), while the Z-Score of the refined model was estimated −1.08 (Fig. 4B).

Table 4

The model refinement results. The refined model No. 5 was selected based on the 3Drefine score, GDT-TS, GDT-HA, RMSD, MolProbity, and RWPlus parameters.

Model	3D^refine score	GDT-TS	GDT-HA	RMSD (Å)	MolProbity	RWPlus
5	26,777.7	0.9980	0.9625	0.376	3.238	−49,406.905116
4	27,111.1	0.9993	0.9743	0.345	3.205	−49,297.629523
3	27,585.9	0.9993	0.9822	0.311	3.254	−49,016.949003
2	28,548.4	1.0000	0.9908	0.265	3.271	−48,826.479019
1	34,484.1	1.0000	0.9980	0.199	3.297	−48,423.046186

Fig. 2

Protein modeling and refinement. (A) The Three-dimensional structure of the proposed vaccine obtained from the 3Dpro server. (B) The final 3D model of the vaccine refined by the 3Drefine server.

Fig. 3

3D protein model validation using the Ramachandran plot before and after refining. (A) In the initial model, 93.4%, 3.2%, and 3.4% of residues were positioned in favored, allowed and outlier regions, respectively. (B) In the refined model 93.1%, 5%, and 1.9% of residues were observed in favored, allowed and outlier regions, respectively.

Fig. 4

ProSA-web z-score plot for primary and refined models. (A) The z-score in the primary model was −0.82, (B) while this value was changed to −1.08 in the refined model.

The model refinement results. The refined model No. 5 was selected based on the 3Drefine score, GDT-TS, GDT-HA, RMSD, MolProbity, and RWPlus parameters. Protein modeling and refinement. (A) The Three-dimensional structure of the proposed vaccine obtained from the 3Dpro server. (B) The final 3D model of the vaccine refined by the 3Drefine server. 3D protein model validation using the Ramachandran plot before and after refining. (A) In the initial model, 93.4%, 3.2%, and 3.4% of residues were positioned in favored, allowed and outlier regions, respectively. (B) In the refined model 93.1%, 5%, and 1.9% of residues were observed in favored, allowed and outlier regions, respectively. ProSA-web z-score plot for primary and refined models. (A) The z-score in the primary model was −0.82, (B) while this value was changed to −1.08 in the refined model. The interaction of the refined model of vaccine with HLA-I and HLA-II molecules was investigated using the ClusPro 2.0 server. Proteins with PDB entry of 1X7Q and 1AQD were identified separately as receptor and vaccine as a ligand to do molecular docking. The server provided 30 clusters from each complex and ranked the clusters based on the amount of energy. Cluster No. 0.00 in the results of the interaction between the HLA-A11:01 and HLA-DRB1_01:01 molecules with the multi-epitope vaccine showed the lowest energy weighted score of −1165.4 and −1279.1 Kcal/mol, respectively (Figs. 5A and 6A ). The PDBsum server was used to obtain accurate information on the interaction between residues in docking complexes. A total of 41 vaccine residues coupled with 39 residues of chain A from HLA-A11:01 molecule. A number of 18 hydrogen bonds were formed between the residues of the chain A from the HLA-A11:01 molecule and the residues of the vaccine, which included Pro105-His113, Glu53-Lys250, Glu53-His211, Asn174-Gln183, Glu166-Tyr109, Arg169-Tyr109, Lys146-Tyr361, Asp39-Lys259, Glu58-Lys27, Lys176-Gly376, Lys68-Asn112, Lys68-Ala379, Lys68-Leu377, Arg163-Ser18, Asp106-Asn98, Arg75-Ile367, Arg75-Ala368 and Thr80-Trp360 (Fig. 5B). In the vaccine-HLA-DRB1_01:01 complex 15 and 23 residues of the vaccine associated with 14 residues of the chain A and 24 residues of the chain B from the HLA-DRB1_01:01 molecule, respectively. Hydrogen bond of Thr162-Ser53 was established between the vaccine and chain A from HLA-DRB1 01:01 molecule and 14 hydrogen bonds, including Glu176-Lys4, Glu128-Lys4, Gln92-Ala34, Gly84-Arg149, His81-Arg149, Glu87-Arg149, Glu87-Lys39, Thr21-Lys39, Glu22-Gln42, Asn19-Val41, Asn82-Gly145, Asn82-Thr141, Thr77-Thr141and Gln70-Arg137 were formed between the chain B from HLA-DRB1 01:01 molecule and vaccine (Fig. 6B).

Fig. 5

Fig. 6

Molecular docking of the vaccine with the HLA-DRB1_01:01 molecule (chain A and B). (A) The cartoon depictions of the vaccine- HLA-DRB1_01:01 complex illustrated using the PyMOL software. The vaccine, chain A and B of HLA-A11:01 molecule has been shown in magenta, green and cyan, respectively. (B) Bonds between residues vaccine with HLA-DRB1_01:01 molecule (chain A and B). A number of 15 and 23 residues of the vaccine paired with 14 residues from the chain A and 24 residues from the chain B of the HLA-DRB1_01:01 molecule, respectively. A total of 15 hydrogen bonds (blue line) were formed between the residues of the chain A and B from the HLA-DRB1_01:01molecule and the residues of the vaccine.

Molecular docking of the vaccine with the HLA-A11:01 molecule (chain A). (A) The cartoon depictions of the vaccine-HLA-A11:01 complex illustrated using the PyMOL software. The vaccine, chain A and B of HLA-A11:01 molecule has been shown in magenta, green and cyan, respectively. (B) Bonds between residues vaccine with HLA-A11:01 molecule (chain A). A total of 41 residues of the vaccine associated with 39 residues of HLA-A11:01 molecular chain A. A number of 18 hydrogen bonds (blue line) were formed between the residues of the chain A from the HLA-A11:01 molecule and the residues of the vaccine. Molecular docking of the vaccine with the HLA-DRB1_01:01 molecule (chain A and B). (A) The cartoon depictions of the vaccine- HLA-DRB1_01:01 complex illustrated using the PyMOL software. The vaccine, chain A and B of HLA-A11:01 molecule has been shown in magenta, green and cyan, respectively. (B) Bonds between residues vaccine with HLA-DRB1_01:01 molecule (chain A and B). A number of 15 and 23 residues of the vaccine paired with 14 residues from the chain A and 24 residues from the chain B of the HLA-DRB1_01:01 molecule, respectively. A total of 15 hydrogen bonds (blue line) were formed between the residues of the chain A and B from the HLA-DRB1_01:01molecule and the residues of the vaccine.

Molecular dynamics simulation

The best molecular docking complexes were used as inputs to MD simulation. At this stage, MD simulation of both complexes was performed for 40 ns and the results were evaluated as root mean square deviation (RMSD), Root Mean Square Fluctuation (RMSF), and radius of gyration (Rg). RMSD between the created structures during MD simulation is an appropriate parameter to ensure the stability of the vaccine-HLA complexes. RMSD of the HLA-A11:01 molecule in the vaccine - HLA-A11:01 complex reached 0.2 nm after approximately 1000 ps and showed a slight variation around this amount until the end of the MD simulation. The RMSD of the vaccine showed a significant increase from the beginning of the MD simulation until 3500 ps, after that, there was a downward trend up to 5000, from this point to until 36,000 ps the trend was approximately increasing (Fig. 7A). The profile of RMSD changes of HLA-DRB1_01:01 molecule in the vaccine - HLA-DRB1_01:01 complex was similar to HLA-A11:01, while the rate of RMSD changes in the vaccine was relatively different. At the beginning of the MD simulation, the value of RMSD showed a sharp increase until 20,000 ps which reached 2 nm, of this point until the end of the MD simulation was observed no significant changes. (Fig. 7B).

Fig. 7

RMSD plots. (A) RMSD analysis of vaccine - HLA-A11:01complex indicated that during the MD simulation period, RMSD was approximately constant in the HLA-A11:01 molecule and increased in the vaccine. (B) RMSD analysis of vaccine - HLA-DRB1_01:01 complexes showed that RMSD value of HLA-DRB1_01:01 molecule did not change significantly until the end of the simulation time and RMSD of vaccine increased from the beginning of the simulation to 20,000 ps, then stabilized until the end of the simulation period. RMSF determined the flexibility of any residue in the vaccine- HLA complexes. RMSF result showed that the RMSF of most amino acids in the chain A of the HLA-A11:01 molecule was high and the maximum and minimum RMSF values were 0.17 and 0.05, respectively (Fig. 8A). Two regions on the chain B of the molecule showed relatively high fluctuations, which included regions 12–21 and 70–78. In the range of 12–21, the RMSF reached 0.23 nm (Fig. 8B). The study of vaccine structure in this complex showed high RMSF in most of its amino acids. The two regions of the vaccine showed very high levels of RMSF of 0.57 and 0.5 nm, which included amino acids in regions 65–80 and 300–320, respectively (Fig. 8C). Most parts in chain A of HLA-DRB1_01:01 molecule in the vaccine - HLA-DRB1_01:01 complex including amino acids in areas 32–38, 48–51, 92–100, 125–130, 154–157 and 167–171 showed high fluctuations (Fig. 9A). In chain B, the lowest and highest value of RMSF was 0.05 and 0.15 nm, respectively (Fig. 9B). The vaccine showed very high fluctuations in most areas and had a maximum RMSF of 0.55 nm (Fig. 9C).

Fig. 8

RMSF diagram for amino acid (A) chain A and (B) chain B of HLA-A11:01 molecule, and (C) vaccine. There are areas with high flexibility in the peaks.

Fig. 9

RMSF diagram for amino acid (A) chain A and (B) chain B of HLA-DRB1_01:01 molecule, and (C) vaccine. There are areas with high flexibility in the peaks.

RMSF diagram for amino acid (A) chain A and (B) chain B of HLA-A11:01 molecule, and (C) vaccine. There are areas with high flexibility in the peaks. RMSF diagram for amino acid (A) chain A and (B) chain B of HLA-DRB1_01:01 molecule, and (C) vaccine. There are areas with high flexibility in the peaks. Rg for the HLA-A11:01 molecule in the vaccine - HLA-A11:01 complex was 2.35 nm at the beginning of the simulation and showed no change from the beginning of the simulation to the end (Fig. 10A). While Rg of the HLA-DRB1_01:01 molecule in the vaccine - HLA-DRB1_01:01 complex was 2.4 nm at the beginning of the MD simulation and had a slight increase up to 1000 ps, then remained constant until the end of the MD simulation (Fig. 10B). In the presence of the HLA-A11:01 molecule, the Rg value of the vaccine was 3.7 at the beginning of the simulation, and during the simulation period, a slight increase and decrease were observed in this parameter, and at the end of the simulation time, the Rg value reached 3.5 (Fig. 10A). The Rg level of the vaccine in the presence of the HLA-DRB1_01:01 molecule showed a relatively significant decrease from the beginning of the simulation to 17,500 ps then remained constant from that point until the end of the simulation period (Fig. 10B).

Fig. 10

Rg plot in during MD simulation. (A) The Rg value of the HLA-A11:01 molecule in the vaccine - HLA-A11:01 complex did not show much difference from the beginning to the end of the simulation. The Rg vaccine from the beginning of the simulation to its end, despite a series of small peaks during the simulation period, eventually reached the initial value. (B) The Rg value of the HLA-DRB1_01:01 molecule in the vaccine - HLA-A11:01 showed no significant change from the beginning to the end of the simulation and the Rg of vaccine revealed a relatively significant decrease from the beginning of the simulation to 17,500 ps then remained constant until the end of the simulation period. The Gene infinity server conducted back translation of the vaccine construct. GenScript Tool was used to evaluating the key properties of the gene sequence including the Codon Adaptation Index (CAI) and GC content. The optimized nucleotide sequence CAI value was 1 and the GC content of the optimized sequence was 55.47% in E. coli. Recognition sites for BamHI and XhoI restriction enzymes were added to 5′ and 3′ end of the optimized gene, respectively. The adapted codon sequence was inserted into the pET30a (+) vector using SnapGene software (Fig. 11 ).

Fig. 11

In silico restriction cloning of the multi-epitope vaccine sequence into the pET30a (+) expression vector. The red region represents the vaccine coding gene and the black circle represents the vector backbone.

Discussion

The COVID-19 outbreak demonstrates a pandemic threat to global public health [47]. The SARS-CoV-2 can be transmitted from person to person, even when the infected individual has no clinical symptoms, and sometimes cannot be detected in the infected individual for several days or even weeks. There are cases where a person has clinical symptoms of COVID-19, but the diagnosis test is negative, so the best way to prevent coronavirus from occurring is to get a vaccine [48], but unfortunately, because of the novelty of the virus, there is still no licensed vaccine for the disease. However, researchers and pharmacists around the world are working at an unprecedented pace to develop an effective virus vaccine, and Clinical trials of several candidate vaccines have begun, including mRNA-1273, INO-4800, and ChAdOx1 nCoV-19 [49]. Reverse vaccinology technology has been widely used to determine the epitopes for the development of multi-epitope vaccines in various organisms. In recent months, several reverse vaccinology studies have also been conducted to design the COVID-19 vaccine. In a study conducted by Enayatkhani et al. the B-cell and T epitopes of N, ORF3a, and M proteins were predicted, a multi-epitope vaccine candidate was constructed, then were performed bioinformatics evaluations [50]. Peele et al. designed a vaccine candidate using B-cell and T epitopes of S protein by in silico methods [51]. Kalita et al. selected N, M, and S proteins as the target antigen for the prediction of T and B-cell epitopes and designed a multi-epitope vaccine against SARS-CoV-2 [48]. Guglielmo Lucchese predicted S protein antigenic peptides from SARS-CoV-2 using immunoinformatics methods [52]. Ahmed et al. predicted a collection of B cell and T cell epitopes derived from the S and N proteins of SARS-CoV-2 [53]. In each of the studies mentioned above, different proteins have been used to predict epitopes, also, the type of bioinformatics softwares, Linkers, and many other factors are different in each study, therefore the laboratory evaluations of each of these designed vaccines show different results. This work focussed on the in silico design of a potential multi-epitope vaccine for COVID-19 using S protein. It is important to identify B-cell and T epitopes of the target antigens in order to develop a multi-epitope vaccine. In the current study, we predicted a large number of CTL, IFN-γ-inducing HTL, and B-cell epitopes for S protein then selected 18 high-ranked epitopes based on antigenicity, toxicity, allergenicity, and cross-reactivity with human proteomes. Linkers play a dual role in the multi-epitope vaccine structure: (i) the prevention of the generation of junctional epitopes (neoepitopes), which is the major concern in designing multi-epitope vaccines, and (ii) promotion of the immune processing and presentation of HLA-II binding epitopes [40,54]. To design the vaccine construct, the selected T and B-cell epitopes were fused using the KK and GPGPG linkers, respectively. LT-IIc was attached to the N-terminal of the multi-epitope sequence by an EAAAK linker. The designed vaccine was evaluated for its physicochemical features, antigenicity, allergenicity, and toxicity. The vaccine's molecular weight was 41.65 kDa, which makes it an acceptable vaccine since proteins with a molecular weight of less than 110 kDa are considered to be more appropriate targets for the development of vaccines, as proteins with a lower molecular weight can be easily purified [29], The vaccine construct was basic in nature and the total numbers of negatively and positively charged residues were 16 and 48, respectively. The vaccine structure has a half-life of 30, 20, and 10 h in mammalian, yeast, and E. coli, respectively, therefore the vaccine is exposed to the immune system for a relatively long time and causes more immune responses [28]. The recombinant vaccine's instability index was calculated at 22.73, and as it was less than 40, it was considered to be a stable protein [28]. The calculated aliphatic index and the GRAVY were 91.08 and −0.035, respectively. The higher aliphatic index value suggests greater thermostability and the negative GRAVY value shows the vaccine's hydrophilic nature therefore it can show strong interaction with water molecules [55]. The sequences of vaccine (with and without adjuvant) were both antigenic, and adding the adjuvant to the N-terminal of the vaccine sequence increased the vaccine's antigenicity. Results also suggested that the vaccine was soluble, non-allergenic, and non-toxic. To obtain a high-quality 3D structure, the initial 3D structure was subjected to the refinement process. The quality of the initial and refined models was assigned using Ramachandran plot and ProSA z-score. Ramachandran plot data showed that 3.4% of residues of the initial model was located in the outlier region while this value was reduced to 1.9% after the refinement process. In addition, the z-score of primary model was calculated −0.82, while this parameter was determined −1.08 after refining. These results suggest that the three-dimensional structure of the vaccine has improved after refinement. ClusPro server conducted the docking process between the multi-epitope vaccine with HLA-A11:01 and HLA-DRB1_01:01 molecules, the value of lowest energy in the selected complexes of vaccine-HLA-A11:01 and vaccine-HLA-DRB1_01:01 were −1165.4 and −1279.1, respectively, which indicated that the interaction of the vaccine with the HLA-DRB1_01:01 to be stronger than its interaction with the HLA-A11:01. The results of the RMSD evaluation showed that the vaccine - HLA-DRB1_01:01 complex reached stability at 20 ns times, while the vaccine - HLA-A11:01 complex required more time to stabilize. The RMSF result suggested that the flexibility in the regions between vaccine-HLA in the both complex and the results of Rg showed that the vaccine structure in both complexes was densified during simulation, but gained more density in the vaccine-HLA-DRB1_01:01 complex. In practice, some essential factors including CAI and GC content of the gene need to be optimized to achieve an optimal level protein expression in E. coli. According to the results, the value of the CAI was 1, the CAI value was greater than 0.8, which suggests a higher probability of protein expression at a high level [56]. GC content was 55.47%, and the ideal percentage of GC content ranges from 30 to 70%. The results of bioinformatics evaluation of the vaccine showed that this vaccine candidate may have a high potency against SARS-CoV-2 but the in vitro and in vivo studies are needed to confirm these results.

Conclusion

The purpose of this research was to develop a COVID-19 vaccine using bioinformatics approaches. To achieve this aim were predicted T and B-cell epitopes of S protein and then screened for antigenicity, toxicity, allergenicity, and cross-reactivity with human proteomes. The eighteen high-ranked epitopes and LT-IIc were incorporated into the final construct. After evaluating the physicochemical properties of the vaccine construct, its secondary and three-dimensional structure was predicted. Structural quality, binding affinity to MHC-I and II, and stability of the docked complexes were assessed. Finally, the expression level of the vaccine in E. coli strain K12 was evaluated. Significant results were reported in this research, experimental confirmation of the engineered vaccine required to further assess the efficacy of this vaccine candidate.

Funding

None.

Declaration of competing interest

The authors have declared no competing interest.

17 in total

1. Design, construction and in vivo functional assessment of a hinge truncated sFLT01.

Authors: Fahimeh Zakeri; Hamid Latifi-Navid; Zahra-Soheila Soheili; Mehdi Sadeghi; Seyed Shahriar Arab; Shahram Samiei; Ehsan Ranaei Pirmardan; Sepideh Taghizadeh; Hamid Ahmadieh; Ali Hafezi-Moghadam
Journal: Gene Ther Date: 2022-09-16 Impact factor: 4.184

Review 2. Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2.

Authors: Kaifu Gao; Rui Wang; Jiahui Chen; Limei Cheng; Jaclyn Frishcosy; Yuta Huzumi; Yuchi Qiu; Tom Schluckbier; Xiaoqi Wei; Guo-Wei Wei
Journal: Chem Rev Date: 2022-05-20 Impact factor: 72.087

3. Partial sequence conservation of SARS-CoV-2 NSP-2, NSP-12, and Spike in stool samples from Abadan, Iran.

Authors: Milad Zandi; Saber Soltani; Alireza Tabibzadeh; Sepideh Nasimzadeh; Emad Behboudi; Armin Zakeri; Yousef Erfani; Shokrollah Salmanzadeh; Samaneh Abbasi
Journal: Biotechnol Appl Biochem Date: 2022-04-09 Impact factor: 2.724

Review 4. Future developments in biosensors for field-ready SARS-CoV-2 virus diagnostics.

Authors: Mona Fani; Milad Zandi; Saber Soltani; Samaneh Abbasi
Journal: Biotechnol Appl Biochem Date: 2020-10-19 Impact factor: 2.724

Review 5. In silico T cell epitope identification for SARS-CoV-2: Progress and perspectives.

Authors: Muhammad Saqib Sohail; Syed Faraz Ahmed; Ahmed Abdul Quadeer; Matthew R McKay
Journal: Adv Drug Deliv Rev Date: 2021-01-17 Impact factor: 17.873

6. Designing a next generation multi-epitope based peptide vaccine candidate against SARS-CoV-2 using computational approaches.

Authors: Ratnadeep Saha; Pratik Ghosh; V L S Prasad Burra
Journal: 3 Biotech Date: 2021-01-11 Impact factor: 2.406

7. Designing a conserved peptide-based subunit vaccine against SARS-CoV-2 using immunoinformatics approach.

Authors: Elijah Kolawole Oladipo; Ayodeji Folorunsho Ajayi; Olugbenga Samson Onile; Olumuyiwa Elijah Ariyo; Esther Moradeyo Jimah; Louis Odinakaose Ezediuno; Oluwadunsin Iyanuoluwa Adebayo; Emmanuel Tayo Adebayo; Aduragbemi Noah Odeyemi; Marvellous Oluwaseun Oyeleke; Moyosoluwa Precious Oyewole; Ayomide Samuel Oguntomi; Olawumi Elizabeth Akindiya; Victoria Oyetayo Aremu; Dorcas Olubunmi Aboderin; Julius Kola Oloke
Journal: In Silico Pharmacol Date: 2021-01-06

8. Computational construction of a glycoprotein multi-epitope subunit vaccine candidate for old and new South-African SARS-CoV-2 virus strains.

Authors: Olugbenga Oluseun Oluwagbemi; Elijah Kolawole Oladipo; Emmanuel Oluwatobi Dairo; Ayodele Eugene Ayeni; Boluwatife Ayobami Irewolede; Esther Moradeyo Jimah; Moyosoluwa Precious Oyewole; Boluwatife Mary Olawale; Hadijat Motunrayo Adegoke; Adewale Joseph Ogunleye
Journal: Inform Med Unlocked Date: 2022-01-15

9. Multi-Epitope Vaccine Design Using an Immunoinformatic Approach for SARS-CoV-2.

Authors: Ye Feng; Haiping Jiang; Min Qiu; Liang Liu; Shengmei Zou; Yun Li; Qianpeng Guo; Ning Han; Yingqiang Sun; Kui Wang; Lantian Lu; Xinlei Zhuang; Shanshan Zhang; Shuqing Chen; Fan Mo
Journal: Pathogens Date: 2021-06-11

10. Lessons Learned from Cutting-Edge Immunoinformatics on Next-Generation COVID-19 Vaccine Research.

Authors: Chiranjib Chakraborty; Ashish Ranjan Sharma; Manojit Bhattacharya; Sang-Soo Lee
Journal: Int J Pept Res Ther Date: 2021-07-10 Impact factor: 1.931