Literature DB >> 32295479

Reverse vaccinology approach to design a novel multi-epitope vaccine candidate against COVID-19: an in silico study.

Maryam Enayatkhani¹, Mehdi Hasaniazad¹, Sobhan Faezi², Hamed Gouklani¹, Parivash Davoodian¹, Nahid Ahmadi³, Mohammad Ali Einakian⁴, Afsaneh Karmostaji¹, Khadijeh Ahmadi¹.

Abstract

At present, novel Coronavirus (2019-nCoV, the causative agent of COVID-19) has caused worldwide social and economic disruption. The disturbing statistics of this infection promoted us to develop an effective vaccine candidate against the COVID-19. In this study, bioinformatics approaches were employed to design and introduce a novel multi-epitope vaccine against 2019-nCoV that can potentially trigger both CD4+ and CD8+ T-cell immune responses and investigated its biological activities by computational tools. Three known antigenic proteins (Nucleocapsid, ORF3a, and Membrane protein, hereafter called NOM) from the virus were selected and analyzed for prediction of the potential immunogenic B and T-cell epitopes and then validated using bioinformatics tools. Based on in silico analysis, we have constructed a multi-epitope vaccine candidate (NOM) with five rich-epitopes domain including highly scored T and B-cell epitopes. After predicting and evaluating of the third structure of the protein candidate, the best 3 D predicted model was applied for docking studies with Toll-like receptor 4 (TLR4) and HLA-A*11:01. In the next step, molecular dynamics (MD) simulation was used to evaluate the stability of the designed fusion protein with TLR4 and HLA-A*11:01 receptors. MD studies demonstrated that the NOM-TLR4 and NOM-HLA-A*11:01 docked models were stable during simulation time. In silico evaluation showed that the designed chimeric protein could simultaneously elicit humoral and cell-mediated immune responses. Communicated by Ramaswamy H. Sarma.

Entities: CellLine Chemical Disease Gene Species

Keywords: COVID-19; Coronavirus; Epitope; Immunoinformatics; Vaccine

Year: 2020 PMID： 32295479 PMCID： PMC7196925 DOI： 10.1080/07391102.2020.1756411

Source DB: PubMed Journal: J Biomol Struct Dyn ISSN： 0739-1102

Introduction

Coronavirus disease COVID-19 outbreak began in late December 2019 in Wuhan, the capital of Hubei Province, China (Wang et al., 2020). Scientists from all over the world are attempting to investigate this novel virus, known as 2019-nCoV, which is highly contagious, and to discover effective interventions to control and prevent the disease (Heymann, 2020; Huang et al., 2020). Coronaviruses are positive-sense single-stranded RNA viruses (ssRNA+) belonging to the Coronaviridae family. Human Coronaviruses HCoV-229E, HCoV-NL63, HCoV-OC43, and HCoV-HKU1 are observed in almost one-third of the common cold (Lim et al., 2016). However, recently some cases of human coronavirus infections have led to fatal endemics, including SARS (Severe Acute Respiratory Syndrome), MERS (Middle East Respiratory Syndrome) and COVID-19 that are common diseases between humans and animals whose belong to the genus Betacoronavirus of the Coronaviridae family (Al-Tawfiq et al., 2014). So far, the novel COVID-19 has caused more than 700,000 illnesses and more than 33,000 deaths worldwide (W.H.O., 2020). The genome size of this virus is about 30 kb and encodes structural and non-structural proteins like other coronaviruses. Structural proteins include S protein (Spike), E protein (Envelope), M protein (Membrane), and N protein (Nucleocapsid) (Ahmed et al., 2020). The increasing rate of COVID-19 disease and the high morbidity necessitate the development of a specific and safe vaccine candidate as soon as possible. There is very little known actually about the pathogenesis of the virus; therefore, an immunoinformatics-based approach to investigate the immunogenic epitopes and vaccine design using data from proteins sequencing of the COVID-19 is required. N protein is the only structural protein that associates with replicase-transcriptase complexes and binds with genomic RNA in coronaviruses (Cong et al., 2017). This protein is multifunctional and one of the most crucial structural components of coronaviruses. N protein is a structural and antigenic protein that is involved in packaging, transcription, and replication coronaviruses (4). This data showed that N protein is a suitable candidate for targeting drug and vaccine design because this protein is conserved, antigenic and multifunctional (6). Leung and et al. concluded that N protein can be a suitable vaccine candidate against SARS-Cov because induce strong antibody and this process may trigger cytokine production (Leung et al., 2004). Coronaviruses M protein also has a key role in the assembly of virions. The SARS-CoV M protein can interact with N protein and make a network of interactions with the genomic RNA (He et al., 2004). Ong and et al. the COVID-19 antigens such as S, N and M proteins introduced as a vaccine candidate (6). This protein has also been studied as an epitope vaccine candidate against SARS-CoV (7). Open reading frame 3a (ORF3a) is required for viral replication and virulence of SARS CoV. Severe induction of proinflammatory cytokine is a sign of SARS-CoV and MERS-CoV infections. ORF3a activates both pro-IL-1β gene expression and IL-1β secretion and leads to severe lung injury. (Siu et al., 2019). Also, ORF3a has an important role in SARS-CoV assembly or budding with the participation of M and E proteins (McBride & Fielding, 2012). These proteins are not only involved in the pathogenesis of the COVID-19 virus but also have high antigenicity (Chan et al., 2020; Siu et al., 2019; Xu et al., 2020). In this study E, M, N, ORF10, ORF8, ORF3a and M proteins were evaluated by available bioinformatics tools for designing an efficient multi-epitope vaccine for the stimulation of immune responses against COVID-19 infection. Since the COVID-19 has been recently discovered, little immunological information is available. Preliminary studies based on phylogenetic analyses of the COVID-19 whole genome have suggested that this virus is very similar to the SARS-CoV (79.7% Identify)(9, 14). Given the apparent similarity between the two viruses, it could be concluded that previous studies on the protective immune responses against SARS-CoV may be useful for developing a vaccine for COVID-19. Previous studies have suggested that both humoral and cellular immunity play important roles in protective responses against this virus (Deming et al., 2007; Yang et al., 2004). Studies revealed that the formation of antibodies against the N protein of SARS-CoV, an immunogenic protein that is highly expressed during infection, is relatively common in patients infected with this virus (Liu et al., 2004; Lin et al., 2003). Although these antibodies are effective, they have a shorter lifespan in recovering the patients. In addition to the specific humoral immunity, it has been shown that the CD4+ and CD8+ responses provide long-lasting protection against COVID-19. These studies showed that besides antibody-mediated immune response, cellular immunity is critical to induce protectivity in these infections (Liu et al., 2017). The concept of a multi-epitope vaccine is to efficiently identify and assemble B and T-cell epitopes that are more capable of stimulating the immune system and therefore can induce more potent and effective both arms of immune responses. Peptides and epitopes have shown to be desirable candidates for vaccine development due to their relatively easy production, chemical stability, and lack of infectious potential (Patronov & Doytchinova, 2013). The experimental design and production of multi-epitope vaccines have improved dramatically in recent years. These vaccines are mainly made up of B-cell, CD8+ cytolytic T-cell (CTLs) and CD4+ helper T-cells (HTLs) epitopes (Chiarella et al., 2009). Since the antigenic epitopes of a protein could be predicted and detected, therefore the whole protein is not suitable to stimulate an immune response (Testa & Philip, 2012; Zheng et al., 2017). During the development of a vaccine candidate against COVID-19, complex pathogenic mechanisms and numerous pathogenic factors should be considered in vaccine formulation. In the present study, we aimed to design a novel multi-epitope fusion protein (Nucleocapsid, ORF3a, and Membrane protein or NOM) containing more efficient antigenic epitopes-rich domains. The biological activity of the engineered fusion protein was assessed by bioinformatics tools using the interaction between the vaccine candidate and the innate immune system receptor (TLR4) and cellular immune system receptor (HLA-A*11:01). We strongly believe that the outcome of the present report will provide a potential vaccine candidate against 2019-nCoV.

Materials and methods

In this study, we designed a suitable vaccine candidate against COVID-19, by exploiting the programs of reverse vaccinology (Figure 1)

Figure 1.

Strategies employed in the overall study.

Retrieval of protein sequences

At first, the amino acid sequences of proteins were retrieved from the National Centre for Biotechnology Information (NCBI) at www.ncbi.nlm.nih.gov in FASTA format and performed for subsequent analysis (Table 1).

Table 1:

Amino acid sequences of proteins were retrieved from NCBI.

Name protein	Accession number	FASTA
Nucleocapsid protein	QIC53221.1	>QIC53221.1 nucleocapsid protein [Severe acute respiratory syndrome coronavirus 2]
[Severe acute respiratory syndrome coronavirus 2]	QIC53221.1	MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSG ARSKQRRPQGLPNNTASWFTALTQHGKEDLKFPRG QGVPINTNSSPDDQIGYYRRATRRIRGGDGKMKDLS PRYFYYLGTGPEAGLPYGANKDGIIWVATEGALNTP KDHIGTRNPANNAAIVLQLPQGTTLPKGFYAEGSRG GSQASSRSSSRSRNSSRNSTPGSSRGTSPARMAGN GGDAALALLLLDRLNQLESKMSGKGQQQQGQTVTK KSAAEASKKPRQKRTATKAYNVTQAFGRRGPEQTQ GNFGDQELIRQGTDYKHWPQIAQFAPSASAFFGMS RIGMEVTPSGTWLTYTGAIKLDDKDPNFKDQVILLNK HIDAYKTFPPTEPKKDKKKKADETQALPQRQKKQQT VTLLPAADLDDFSKQLQQSMSSADSTQA
Membrane protein	QIC53216.1	>QIC53216.1 membrane protein [Severe acute respiratory syndrome coronavirus 2]
[Severe acute respiratory syndrome coronavirus 2]	QIC53216.1	MADSNGTITVEELKKLLEQWNLVIGFLFLTWICLLQF AYANRNRFLYIIKLIFLWLLWPVTLACFVLAAVYRINWI TGGIAIAMACLVGLMWLSYFIASFRLFARTRSMWSF NPETNILLNVPLHGTILTRPLLESELVIGAVILRGHLRIA GHHLGRCDIKDLPKEITVATSRTLSYYKLGASQRVAG DSGFAAYSRYRIGNYKLNTDHSSSSDNIALLV
ORF10 protein[Severe acute respiratory syndrome coronavirus 2]	QIC53212.1	>QIC53212.1 ORF10 protein [Severe acute respiratory syndrome coronavirus 2]
	QIC53212.1	MGYINVFAFPFTIYSLLLCRMNSRNYIAQVDVVNFNLT
Envelope protein	QIC53206.1	>QIC53206.1 envelope protein [Severe acute respiratory syndrome coronavirus 2]
[Severe acute respiratory syndrome coronavirus 2]	QIC53206.1	MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRL CAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV
ORF8 protein	QIC53210.1	>QIC53210.1 ORF8 protein [Severe acute respiratory syndrome coronavirus 2]
[Severe acute respiratory syndrome coronavirus 2]	QIC53210.1	MKFLVFLGIITTVAAFHQECSLQSCTQHQPYVVDDP CPIHFYSKWYIRVGARKSAPLIELCVDEAGSKSPIQYID IGNYTVSCLPFTINCQEPKLGSLVVRCSFYEDFLEYH DVRVVLDFI
ORF3a protein	QIC53205.1	>QIC53205.1 ORF3a protein [Severe acute respiratory syndrome coronavirus 2]
[Severe acute respiratory syndrome coronavirus 2]	QIC53205.1	MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQ ASLPFGWLIVGVALLAVFQSASKIITLKKRWQLALSKG VHFVCNLLLLFVTVYSHLLLVAAGLEAPFLYLYALVY FLQSINFVRIIMRLWLCWKCRSKNPLLYDANYFLCWH TNCYDYCIPYNSVTSSIVITSGDGTTSPISEHDYQIGG YTEKWESGVKDCVVLHSYFTSDYYQLYSTQLSTDT GVEHVTFFIYNKIVDEPEEHVQIHTIDVSSGVVNPVM EPIYDEPTTTTSVPL

Amino acid sequences of proteins were retrieved from NCBI.

Selection of antigenic proteins

We selected six proteins of COVID-19 virus (Table 1) that have an essential role in virulence and replication of the virus, and previous studies have highlighted the necessity of these proteins in coronaviruses function. After the antigenic analysis of these proteins, three proteins of N, ORF3a, and M were selected for final analysis.

Prediction of T-cell (HLA class I and II) epitopes

1d sequence-based screening server RANKPEP was used to identify T-cell epitopes (Reche & Reinherz, 2007). This server predicts peptide binders to MHC molecules from protein sequences using the position-specific scoring matrix (PSSM). We have selected all HLA class I alleles from the selection panel of RANKPEP server for prediction of epitopes of HLA class I. To prediction of epitopes of HLA class II, we considered DRB1*0101, DRB1*0301, DRB1*0401, DRB1*0701, DRB1*0801, DRB1*1101, DRB1*1301, and DRB1*1501 that cover HLA variability of over 95% of the human population worldwide (Kruiswijk et al., 2020).

B-cell epitopes (linear) identification

For the prediction of B-cell epitopes, the amino acid sequence was analyzed using BepiPred and Kolaskar & Tongaonkar Antigenicity (http://www.iedb.org/) servers (Vita et al., 2019). Bepipred for linear epitope prediction uses both hidden Markov model and amino acid propensity scales methods. Kolaskar and Tongaonkar evaluate the protein for B cell epitopes using the physicochemical properties of the amino acids and their frequencies of occurrence in recognized B cell epitopes (Kolaskar & Tongaonkar, 1990; Mirza et al., 2016).

Selection of epitope-rich domains and the final sequence

According to the prediction results of the servers used, B cell epitopes and HLA class I and II epitopes that have had high scores were extracted and combined to generate multi-epitope protein. B cell and T cell epitope-rich domains of N, ORF3a and M proteins were selected and joined to each other with an AAA linker.

Antigenicity and allergenicity evaluation

Antigenicity of designed recombinant protein predicted using the VaxiJen v2.0 server. The VaxiJen classified antigens based on auto cross-covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties which is a novel alignment-independent method and overcome the limitations of alignment-dependent sequence methods (Doytchinova & Flower, 2007). The prediction of vaccine candidate allergenicity is essential. The allergenicity of the designed protein was computed by AllerTOP (http://www.ddg-pharmfac.net/AllerTOP/). AllerTOP method predicts recombinant protein allergenicity on auto cross-covariance ACC that describe residue hydrophobicity, size, abundance, helix- and β-strand forming propensities (Dimitrov et al., 2013). AllerTOP v.2 has the highest accuracy (88.7%) compared to several servers for allergen prediction (Dimitrov et al., 2013).

The physicochemical parameters

The analyzed parameters consisted of the molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity that were evaluated by ProtParam online server (http://us.expasy.org/tools/ protparam.html) (Gasteiger et al., 2005).

Secondary and tertiary structure prediction

GOR was used for the designed protein secondary structure prediction (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_gor4.html) (Kloczkowski et al., 2002). GOR4 predict protein secondary structure using information theory. The tertiary structure was built using the Galaxy web. GalaxyWEB server (http://galaxy.seoklab.org/ tbm) is based on the TBM method. This server detects similar proteins and alignment with the target sequence, then make the model and finally perform model refinement (Shin et al., 2014).

Tertiary structure refinement and validation

The best-modeled structure was refined using the Galaxy Refine server at (http://galaxy.seoklab.org/cgi-bin/submit.cgi?type=REFINE). Galaxy refined the model by molecular dynamics simulation. This method showed one of the best performances in improving protein structure quality. Analysis of the final 3 D model was made using MolProbity, ProSA and Ramachandran plot. Ramachandran plot obtained from RAMPAGE calculates torsional angles residue-by-residue in protein and indicates that residues are in allowed, favored or outlier regions (Oberholser, 2010). ProSA web used to recognize the errors in the generated 3 D models using atomic coordinates of the model. ProSA web was created Z-score (overall model quality) and a plot of residue energies of proteins (Wiederstein & Sippl, 2007). Clash analysis is very important for the validation of proteins. MolProbity is a structure-validation web service that calculates the clash score, Protein-geometry score, Poor rotamers, Ramachandran plot, and MolProbity score.

Defining discontinuous B-cell epitopes of designed protein

The interaction between antigen epitopes and antibodies is very essential to eliminate the infection. Conformational epitopes are the most important and the most prevalent epitopes that are recognized by antibodies. All conformational epitope prediction methods require the 3 D structures of proteins. Discontinuous epitopes of recombinant protein predicted using the ElliPro server. This web-based predicts discontinuous epitopes based on protein-antibody interactions. ElliPro server predicts conformational and linear B cell epitopes using Thornton’s method and by MODELLER program or BLAST search of PDB predict and visualize of antibody epitopes (Ponomarenko et al., 2008).

Obtaining and preparing the structures of immune receptors and NOM recombinant protein

The crystallographic structures of TLR4 (PDB ID: 2Z62 (Kim et al., 2007)) and HLA-A*11:01 (PDB ID: 5WJL (Culshaw et al., 2017)) were obtained from the PDB database (Berman et al., 2000). The structures were cleaned and crystallographic waters and co-crystallized molecules were deleted and only the monomer forms of each receptor were kept. Then, the structures of the two receptors and the NOM protein were prepared by the Dock prep tool in UCSF Chimera software (Pettersen et al., 2004) where hydrogen atoms and charge were added to the structures to make them ready for the next step. In the next step, to stabilize and relax the structures in an aqueous physiological environment, each of the monomer form of the receptors and the NOM protein were simulated for 100 ns. The details of the MD simulation will be explained later.

Protein-Protein molecular docking and refinement

After the MD simulation of each receptor and the NOM recombinant protein, the last frame of the trajectory of each simulation was extracted and used as the input structures for protein-protein molecular docking. The PatchDock webserver (Schneidman-Duhovny et al., 2005) with default parameters was used to predict the best positions and orientations where the two proteins can have the highest number of favorable interactions. Afterward, the top 10 solutions were subjected to Firedock webserver (Andrusier et al., 2007; Mashiach et al., 2008) to refine the interaction of protein-protein complexes resulted from molecular docking. The top two complexes from Coronavirus NOM protein-TLR4 and NOM recombinant protein-HLA-A*11:01 complexes were chosen based on global energies. These four complexes were then taken to the next step, MD simulation.

Molecular dynamics simulation

After performing the protein-protein molecular docking and finding the best orientations of the vaccine candidate and the receptor proteins to interact with one another, the two best-scored complexes were subjected to MD simulation. All of the MD simulations were done by GROMACS 2018 package (Abraham et al., 2015) and OPLS-AA force field (Jorgensen et al., 1996). The monomer form of each protein and also the docked complexes were placed in the center of a triclinic box with a distance of 1 nm from all edges and solvated with TIP3P water model (Jorgensen et al., 1983). Then, sodium and chloride ions were added to produce a neutral physiological salt concentration of 150 mM. Each system was energy minimized, using the steepest descent algorithm, until the Fmax was found to be smaller than 10 kJ.mol−1nm−1. All of the covalent bonds were constrained using the Linear Constraint Solver (LINCS) algorithm (Hess et al., 1997) to maintain constant bond lengths. The long-range electrostatic interactions were treated using the Particle Mesh Ewald (PME) method (Darden et al., 1993; Essmann et al., 1995) and the cut off radii for Coulomb and Van der Waals short-range interactions were set to 0.9 nm. Then 100 ps NVT (constant number of particles (N), volume (V), and temperature (T)) and 300 ps NPT (constant number of particles (N), pressure (P), and temperature (T)) equilibrations were performed for each system. After the necessary equilibrations, 100 ns of production run were performed for each of the four complexes. Finally, simulations were carried out under the periodic boundary conditions (PBC), set at XYZ coordinates to ensure that the atoms had stayed inside the simulation box, and the subsequent analyses were then performed using GROMACS utilities, VMD (Humphrey et al., 1996) and USCF Chimera, and also the plots and the figures were created using Daniel’s XL Toolbox (v 7.3.2) add-in (Kraus, 2014) and EzMol (Reynolds et al., 2018). All of the computations were performed on an Ubuntu desktop PC, with a [Intel(R) Xeon(R) CPU E5-2630 v3 + NVIDIA GeForce GTX 1080] configuration.

Mmpbsa binding free energy

The binding free energies of the protein-protein complexes were calculated by Molecular Mechanics-Poisson–Boltzmann Solvent-Accessible surface area, MMPBSA method (Kollman et al., 2000) using g_mmpbsa package (Kumari et al., 2014). In this method, the binding free energy is the result of the free energy of the complex minus the sum of the free energies of the ligand and the protein. In our case, the NOM recombinant protein is defined as the ligand and the immune receptors are defined as the receptors. The MMPBSA calculation was done for every ns of each simulation trajectories.

Results and discussion

Defining T-cell epitopes

The results of several studies have shown that strong virus-specific T-cell response is required for the elimination of respiratory virus infections such as SARS-CoV and influenza A and para-influenza. These studies conclude that future vaccine interventions should also consider strategies to enhance T cell response to provide robust long-term memory (Channappanavar et al., 2014; Janice Oh et al., 2012). Studies have shown that high levels of T cell responses against N protein were found 2 years after the patient’s recovery (Peng et al., 2006). Antibodies are essential to combat SARS-CoV infection, and the body needs SARS-CoV specific CD4+ T helper cells to produce these specific antibodies. Also, CD8+ cytotoxic T cells are important for recognizing and killing infected cells, especially in the lungs of infected individuals. We used the Rankpep server which covers almost all HLA supertypes to predicted different epitops from N, ORF3a and M proteins sequence according to HLA I and HLA II alleles. Antigenic epitopes with high binding affinity score were predicted that are summarized in table 2a–b.

Table 2a:

HLA I antigenic epitopes predicted using Rankpep.

Antigen	HLA_A0201	HLA_A0204	HLA_A0206	HLA_B0702	HLA_B51	HLA_B5401	HLA_B5301
Nucleocapsid protein	316-324	299-307		66-74	343-351	66-74	45-53
	GMSRIGMEV	KHWPQIAQF		FPRGQGVPI	DPNFKDQVI	FPRGQGVPI	LPNNTASWF
					45-53
					LPNNTASWF
					66-74
					FPRGQGVPI
ORF3a	39-47	39-47	45-53	35-43	41-49
	ASLPFGWLIV	ASLPFGWLI	WLIVGVALL	IPIQASLPF	LPFGWLIVG
	46-55	72-80	72-80		35-43
	LIVGVALLAV	ALSKGVHFV	ALSKGVHFV		IPIQASLPF
	64-73	220-228	82-90
	TLKKRWQLAL	STDTGVEHV	NLLLLFVTV
	79-87
	FVCNLLLLFV
Membrane protein	16-24	56-64	21-29	58-66	58-66	58-66
	LLEQWNLVI	LLWPVTLAC	NLVIGFLFL	WPVTLACFV	WPVTLACFV	WPVTLACFV
	15-23	96-104	65-73
	KLLEQWNLV	FIASFRLFA	FVLAAVYRI
	65-73		89-97
	FVLAAVYRI		GLMWLSYFI
			96-104
			FIASFRLFA

HLA I antigenic epitopes predicted using Rankpep. HLA II antigenic epitopes predicted using Rankpep.

Defining linear B-cell epitopes

Successful vaccination against viruses such as measles and rubella reflects the importance of protective antibodies. Protection against virus infection such as Coronaviruses depends on the simulation of neutralizing antibodies in addition to the T cell-mediated immunity. While cytotoxic lymphocytes can kill infected cells, antibodies have the potential to eliminate infected cells and prevent the infectious virus from infecting a cell (neutralization). SARS-CoV-specific Neutralizing antibodies block viral entry (Dörner & Radbruch, 2007; Hsueh et al., 2004). In this study, for the linear epitope prediction, the Bepipred server was employed. Bepipred analysis revealed several continuous predicted epitopes of N, ORF3a and M proteins. For cross-checking the predicted epitopes, the sequence of all three proteins was also predicted by Kolaskar & Tongaonkar Antigenicity. The linear B-cell epitopes are represented in (Table 3). Given that both cellular and humoral immune responses are essential against coronaviruses infection (Janice Oh et al., 2012), finally, epitopes that were shared between B cell and T-cell were selected.

Table 2b:

HLA II antigenic epitopes predicted using Rankpep.

Antigen	HLADRB10101	HLADRB10401	HLADRB10402	HLADRB10402	HLADRB10701	HLADRB10801	HLADRB11101	HLADRB11501
Nucleocapsid protein	298-305	52-60			346-354	41-49	87-95	50-58
	YKHWPQIAQ	WFTALTQHG			FKDQVILLN	RPQGLPNNT	YRRATRRIR	ASWFTALTQ
	354-362	86-94			300-308	348-356	86-94	360-368
	NKHIDAYKT	YYRRATRRI			HWPQIAQFA	DQVILLNKH	YYRRATRRI	YKTFPPTEP
	86-94	300-308					34-42
	YYRRATRRI	HWPQIAQFA					GARSKQRRP
	305-313	49-57					298-306
	AQFAPSASA	TASWFTALT					YKHWPQIAQ
	52-60						301-309
	WFTALTQHG						WPQIAQFAP
ORF3a	211-219	211-219	45-53	62-70	59-67		62-70	77-85
	YYQLYSTQL	YYQLYSTQL	WLIVGVALL	IITLKKRWQ	ASKIITLKK		IITLKKRWQ	VHFVCNLLL
	212-220	45-53	211-219	85-93	87-95		68-76	84-92
	YQLYSTQLS	WLIVGVALL	YYQLYSTQL	LLFVTVYSH	FVTVYSHLL		RWQLALSKG	LLLFVTVYS
	65-73				54-62
	LKKRWQLAL				AVFQSASKI
Membrane protein	32-40	28-36	28-36	37-45	65-73	44-52	31-39
	ICLLQFAYA	FLTWICLLQ	FLTWICLLQ	FAYANRNRF	FVLAAVYRI	RFLYIIKLI	WICLLQFAY
	65-73	55-63	49-57		48-56	55-63	39-47
	FVLAAVYRI	WLLWPVTLA	IKLIFLWLL		IIKLIFLWL	WLLWPVTLA	YANRNRFLY
	76-84	65-73				58-66	90-98
	ITGGIAIAM	FVLAAVYRI				WPVTLACFV	LMWLSYFIA
	80-88	71-79
	IAIAMACLV	YRINWITGG

Selected targets for designing the final vaccine

The main strategy in the present study was to design and construct a novel multi-epitope protein from COVID-19 based on in silico methods to elicit humoral and cellular immune responses. Due to the low immunogenicity of the epitope, we chose epitope-rich domains to generate a more diverse and robust response (Wieser et al., 2010). Based on in silico analysis, five epitope-rich domains including highly scored and shared epitopes between T and B-cell epitopes were selected and joined to each other with a three AAA linker (Table 4). The schematic diagram of designed vaccine domains with linker’s sites has been shown in Figure 2.

Table 3:

Predicted epitopes of N, ORF3a and M proteins via Bepipred and Kolaskar & Tongaonkar antigenicity.

Antigen	Server	Amino acid Position	Sequence
Nucleocapsid protein	Bepipred	1-51	MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSGARSKQRRPQGLPNNTAS
(QIC53221.1)		58-85	QHGKEDLKFPRGQGVPINTNSSPDDQIG
		93-104	RIRGGDGKMKDL
		306-310	QFAPS
		323-321	EVTPSGTWL
		338-347	KLDDKDPNFK
		361-390	KTFPPTEPKKDKKKKADETQALPQRQKKQQ
	Kolaskar & Tongaonkar	52-59	WFTALTQH
		69-75	GQGVPIN
		83-89	QIGYYRR
		299-315	KHWPQIAQFAPSASAFF
		333-339	YTGAIKL
		347-363	KDQVILLNKHIDAYKTF
ORF3a	Bepipred	17-28	QGEIKDATPSDF
(QIC53205.1)		61-71	KIITLKKRWQL
		213-214	QL
		216-225	STQLSTDTGV
	Kolaskar & Tongaonkar	44-58	GWLIVGVALLAVFQS
		74-100	SKGVHFVCNLLLLFVTVYSHLLLVAAG
		212-217	YQLYST
Membrane protein (QIC53207.1)	Bepipred	5-20	NGTITVEELKKLLEQW
		40-41	AN
		132-137	PLLESE
		180-191	KLGASQRVAGDS
	Kolaskar & Tongaonkar	29-38	LTWICLLQFA
		46-71	LYIIKLIFLWLLWPVTLACFVLAAVY
		83-91	AMACLVGLM
		93-101	LSYFIASFR

Figure 2.

The schematic diagram of the vaccine candidate construct consists of N, ORF3a and M proteins epitopes of the COVID-19 linked together with AAA linkers.

The schematic diagram of the vaccine candidate construct consists of N, ORF3a and M proteins epitopes of the COVID-19 linked together with AAA linkers. Predicted epitopes of N, ORF3a and M proteins via Bepipred and Kolaskar & Tongaonkar antigenicity.

Vaccine features

Assessment of antigenicity and allergenicity

Prediction of the vaccine candidate antigenicity represents a numerical criterion for the capability of the vaccine to bind to the B- and T-cell receptors and increase the immune response in the host cell. VaxiJen v2.0 was used to predict the antigenicity of the designed protein. VaxiJen analysis of NOM protein showed potent antigenicity 0.5999 with a 0.4% threshold. The results indicate that the designed protein sequences without adjuvant are antigenic. Allergen proteins induce an IgE antibody response (Dimitrov et al., 2013). The designed vaccine candidate must not show an allergic reaction to the body. The allergenicity of the sequence was predicted using the Allertop tool. Based on prediction approaches in Allertop, this protein was not recognized as an allergen.

The physicochemical parameters and protein secondary structure prediction

Primary structure analysis NOM designed protein (337 aa) has a molecular weight of 37513.57 D. The total number of positively charged residues (Arg + Lys) and negatively charged residues (Asp + Glu) were 19 and 32 respectively. The theoretical isoelectric point (PI) was calculated at 9.62. The instability index (< 40) indicates that the designed protein has high stability for the initiation of an immunogenic reaction. The instability index (II) was 35.75 which classifies the protein as stable. The aliphatic index of NOM recombinant protein was calculated 97.92 and indicates this protein has stability in several temperatures. Natively unfolded protein regions and alpha-helix are important forms of structural antigens that can be arranged in their native structure and thus identified by antibodies that are produced in response to infection. (Shey et al., 2019). The composition of the predicted secondary structure of the NOM multi-epitope vaccine candidate was 43.92% (alpha helix), 16.02% (extended strand), and 40.06% (random coil). All this information indicates the designed protein is suitable as a vaccine candidate.

Tertiary structure prediction and validation

Three-dimensional structure was modeled by GalaxyWEB for our designed protein (Figure 3a). This model was used for evaluation and refinement. For validation of the model, ProSA-web, Ramachandran plot and MolProbity were used that compare and analyze the protein structure. The ProSA Web determined Z-score for the best predicted 3 D model (Figure 3b). The Z-score of the NOM predicted model was -4.42, which is within the range of scores typically found for native proteins of similar size. The Ramachandran plot indicates that most of the residues are found in the favored and allowed regions (99.7%) and only 0.3% in the outlier region, this shows that the quality of the designed model is satisfactory. Phe 282, Asn311, Tyr228, Lys194, Asp223, Ser229, Ser4, His197, Tyr68 residues were observed in the allowed and outlier regions of Ramachandran plot (Figure 3c). In MolProbity analysis, the all-atom clash score was 9.37, the MolProbity score was 2.21. All structural images were created using PyMol (The PyMOL Molecular Graphics System, Version 1.1, Schrödinger, LLC).

Figure 3.

(a-c) Prediction and validation of tertiary structure of the NOM recombinant protein using (a) Prediction of the tertiary structure of the NOM recombinant protein, (b) ProSA web, (c) Ramachandran plot.

Defining discontinuous B-cell epitopes

Ellipro Server was used for predicting conformational B-cell epitopes from the 3 D structure of NOM recombinant protein. Discontinuous B-cell epitopes were predicted with scores ranging from0.679 to 0.945. Amino acid residues, the number of residues, sequence location as well as their scores have been listed in Tables 5. The graphical representation of the discontinuous epitopes has been displayed in Figure 4.

Table 5:

Conformational B-cell epitopes from vaccine protein using Ellipro server.

No.	Residues	Number of residues	Score
1	P1, S2, D3, S4, T5, G6, S7, N8, Q9, N10, G11, E12, S14, G15, A16, R17, S18, K19, Q20, R21, R22	21	0.945
2	K123, L124, D125, D126, K127, D128, P129, N130, F131, K132, D133, Q134, V135, I136	14	0.819
3	P23, Q24, G25, L26, P27, N28, N29, T30, A31, S32, W33, F34, T35, A36, L37, T38, Q39, H40, G41, K42, E43, D44, L45	23	0.779
4	I181, I182, T183, L184, K185, K186, R187, W188, Q189, L190, A191, L192, S193, K194, G195, V196, H197, F198, V199, C200, N201, L203, F244, I245, Y246, N247, K248, I249, V250, D251, E252, P253, A254, A255, A256, W257, N258, L259, V260, I261, G262, F263, L264, F265, L266, T267, W268, I269, C270, L271, L272, Q273, F274, A275, Y276, A277, N278, R279, N280, R281, F282, L283, I285, I286, I289, L304, A305, A306, Y308, R309, I310, N311, W312, I313, T314, G315, G316, I317, I319	79	0.679

Figure 4.

Three-dimensional representation of discontinuous epitopes of the NOM designed protein. The epitopes are represented by a yellow surface, and the bulk of the polyprotein is represented in grey sticks.

Establishing the stability of the initial structures

A vaccine can interact with different receptors of the immune system. We have docked designed final constructs (NOM) with TLR4 and HLA-A*11:01 receptors. The interaction between these receptors and NOM recombinant protein induce different immune responses. To achieve a stable and relaxed state of the NOM recombinant protein and the immune receptors, a MD simulation of 100 ns was performed for each structure. These simulations ensure that the structures are stable enough to be used for protein-protein molecular docking. After 100 ns of the production run, as it is shown in Figure 5, the RMSD (Root Mean Square Distance) values of the monomers show that the backbone structure of TLR4 (PDB ID: 2Z62) is very rigid and stable. The structure of the NOM recombinant protein also reached stability after about 25 ns after a considerable structural change. The backbone structure of the HLA-A*11:01 (PDB ID: 5WJL), however, showed big spikes in the RMSD values but all three structures were visually inspected and were considered stable enough for the next step of the project.

Figure 5.

The RMSD values of the simulated monomer forms of the proteins throughput the 100 ns of production runs.

Protein-protein molecular docking

To find the best orientation for optimal interaction of the NOM protein with the immune receptors we decided to use protein-protein molecular docking. We used PatchDock webserver which concentrates on recognizing and matching patterns of the surfaces of the proteins to put them in the best possible positions. Afterward, the top 10 best solutions were subjected to the Firedock webserver to refine the docked structures. The FireDock algorithms refine the docked complexes by side-chain rearrangement and soft rigid-body optimization. The ranking of the docked complexes is based on short-range, long-range, attractive and repulsive interaction energies between the residues of the two proteins, which are all summed up in the global energy or the binding energy of the complexes. We considered global energy as the main criterion for choosing the best complexes and chose the two top-scored solutions with the best global energy. As it is shown in Table 6, the solution numbers 5 and 6 from the NOM-TLR4 complex and the solution numbers 2 and 3, from NOM-HLA-A*11:01 complex, were chosen for further calculations. The total energy plots of the simulations are shown in Figure 6. The results showed a strong interaction between designed protein amino acids and receptors. The interactions between the binding receptors and the docked NOM protein were visualized using the UCSF Chimera program. Residues of HLA-A*1101(GLU53, ASN174, GLY56, ASN174, GLU53, PRO57) participated in the interaction with ARG49, SER97, GLN91, MET102, ARG49 and GLN91 residues of the candidate vaccine. Also, nine residues of TLR4 (TLR4 LYS123, ILE319, THR66, SER90, LYS124, LYS46, HIS85, LYS123) participated in the interaction with SER45, GLU199, GLN91, ALA84, MET328, GLN65, SER90, GLU68 residues of the designed candidate vaccine.

Table 6.

The rankings of the solution of the complexes of NOM protein and the immune receptors sorted by global energy (kJ/mol).

Complex	No of Solution	glob energy	aVdW	rVdW	aElec	rElec	laElec	lrElec
CoVir NOM-TLR4	6	−70.72	−35.8	15.25	−59.29	32.2	−9.04	7.84
CoVir NOM-TLR4	5	−36.56	−38.35	23.35	−35.83	0	−8.39	8.75
CoVir NOM-TLR4	7	−36.39	−24.48	25.24	0	0	0	0
CoVir NOM-TLR4	9	−36.21	−35.91	16.81	−16.15	12.49	−16.07	8.56
CoVir NOM-TLR4	4	−34.54	−20.58	9.45	−11.25	0	−5.55	0
CoVir NOM-TLR4	3	−21.92	−34.54	8.21	−25.98	58.08	−29.15	12.41
CoVir NOM-TLR4	2	−19.49	−23.45	11.3	−82.71	81.52	−17.3	0
CoVir NOM-TLR4	8	−18.15	−17.09	6.82	0	0	0	0
CoVir NOM-TLR4	10	−6.55	−17.77	6.58	−7.49	5.9	−8.21	7.88
CoVir NOM-TLR4	1	26.74	−3.48	49.43	0	0	−2.35	3.58
CoVir NOM-HLA-A*11:01	2	−26.26	−30.64	19.51	−26.6	52.43	−24.6	8.53
CoVir NOM-HLA-A*11:01	3	−21.9	−19.09	7.4	−47.3	14.42	−39.27	20.53
CoVir NOM-HLA-A*11:01	1	−5.38	−5.26	0.43	0	0	−5.89	3.65
CoVir NOM-HLA-A*11:01	8	−0.97	−8.76	3.27	0	0	0	0
CoVir NOM-HLA-A*11:01	5	0.15	−25.97	7.15	−37.47	90.84	−26.96	28.97
CoVir NOM-HLA-A*11:01	6	1.52	−0.8	0.61	0	0	0	0
CoVir NOM-HLA-A*11:01	9	1.74	−28.7	15.11	−90.47	136.21	−32.16	26.54
CoVir NOM-HLA-A*11:01	4	10.29	−10.6	22.49	−40.45	11.12	−24.32	24.47
CoVir NOM-HLA-A*11:01	7	18.39	−21.1	10.69	−35.18	99.31	−30.65	40.48
CoVir NOM-HLA-A*11:01	10	1498.09	−34.31	2561.4	−73.6	10.46	−7.59	18.87

Figure 6.

The total energy plots of the simulations.

The total energy plots of the simulations. The rankings of the solution of the complexes of NOM protein and the immune receptors sorted by global energy (kJ/mol).

Molecular dynamics simulation of top scored solutions

After the protein-protein molecular docking, 4 solutions were chosen for MD simulation, solution numbers 2 and 3 from NOM protein-HLA-A*11:01 complex and solution numbers 5 and 6 from NOM protein-TLR4 complex. Each was simulated for 100 ns. To examine their stability throughout the simulations period, the RMSD values of each protein was analyzed. As it is shown in Figure 7, each protein has different RMSD values and in some of them, the backbone atoms have considerable movements during the simulations. The reason is that the interactions can get optimized and rearranged by introducing water molecules and physiological conditions. The RMSD values of TLR4 in NOM protein-TLR4 complexes and the values of HLA-A*11:01 in NOM protein-HLA-A*11:01 complexes are very stable and the average values do not reach very high values. However, the RMSD values of the NOM protein in every complex reach high values which means that the structure of the vaccine has more movements to refine the interactions with the immune receptors. However, the interactions between the NOM recombinant protein and the immune receptors are quite strong and MMPBSA binding energy calculations exhibit great binding energies. As it is shown in Table 7, the Van der Waals and electrostatic energies of the complexes are strong enough to keep the two proteins in contact with each other. Moreover, the deviation of each value is very small which means that the interactions are very stable and consistent throughout the simulations. In another word, the higher scale of RMSD values with consistent binding energies only show that the structure of the NOM protein is well optimized for interacting with the immune receptors.

Figure 7.

The RMSD values of each protein in the simulated complexes throughout the 100 ns of production runs.

Table 7.

The Van der Waals, Electrostatic, Polar solvation, SASA and Binding Energy of protein complexes, kJ/mol, calculated by MMPBSA method.

Complex	Van der Waals	Electrostatic	Polar solvation	SASA	Binding Energy
CoVir NOM-HLA-A*11:01, sol no 2	−556 +/- 8	−2991.1 +/- 28.5	1354.4 +/- 20.5	−79.46 +/- 1.2	−2267.7 +/- 6.5
CoVir NOM-HLA-A*11:01, sol no 3	−381.9 +/- 8.7	−2705.2 +/- 11.5	729.4 +/- 10.9	−57.1 +/- 1	−2423.3 +/- 24.9
CoVir NOM-TLR4, sol no 5	−503 +/- 5.7	−1598.9 +/- 10.8	1202 +/- 6.1	−75.4 +/- 0.3	−978.4 +/- 7.6
CoVir NOM-TLR4, sol no 6	−407.6 +/- 5.9	−1406.8 +/- 23.4	1009.3 +/- 12.3	−64.8 +/- 0.6	−865.7 +/- 10.6

The RMSD values of each protein in the simulated complexes throughout the 100 ns of production runs. The Van der Waals, Electrostatic, Polar solvation, SASA and Binding Energy of protein complexes, kJ/mol, calculated by MMPBSA method. Five epitope-rich domains were selected In addition to the analysis described above, to analyze the fluctuations of the backbone atoms of structures of the proteins, we decided to perform RMSF (Root Mean Square Fluctuation) analysis. In this analysis, the average value of fluctuation of each residue during the simulation is plotted, Figure 8. As it is shown in Figure 8a, the RMSF values of NOM recombinant protein in five simulations indicate that the fluctuation of the monomer form of the vaccine in many regions is considerably more than the complex forms. This is a direct indication that the NOM recombinant protein is much more stable when it is in complex with the two immune receptors. Furthermore, the RMSF values of the NOM recombinant protein in complex with HLA-A*11:01 are lower compared with that of NOM recombinant protein in complex with TLR4. Also, with the evidence that the binding energies are also lower, we can conclude that NOM recombinant protein can bind to the HLA-A*11:01 better than the other immune receptor.

Figure 8.

The RMSF values of each protein in the simulated complexes compared to the simulated monomer forms of the proteins throughout the 100 ns of production runs. a, The comparison of the RMSF values of NOM recombinant protein in the complexes with the monomer form. b, The comparison of the RMSF values of HLA-A*11:01 in the complexes with the simulated monomer form. c, The comparison of the RMSF values of TLR4 in the complexes with the simulated monomer form. The graphical illustration of the monomer forms and the complex forms of the NOM recombinant protein and the HLA-A*11:01 and TLR4 immune receptors. In Figure 8b and Figure 8c, the RMSF values of the monomer forms of the immune receptors show lower values compared to the complex forms. This behavior is exactly the opposite of the NOM designed protein. This can be an indication that the structures are very stable in their natural function and their structures considerably change when they get into contact with the designed NOM protein. Furthermore, the structural illustration of the monomer forms and also the complex forms of the vaccine and the immune receptors, Figure 9, shows that the vaccine can fill the cavities and bind tightly to them, as it was proved by the binding energies.

Figure 9.

The graphical illustration of the monomer forms and the complex forms of the NOM recombinant protein and the HLA-A*11:01 and TLR4 immune receptors.

Conclusion

COVID-19 pandemic is much more than a health crisis. It leads to a political, social and economic crisis in the world. The development of a safe and effective vaccine could reduce the rate of this infection. Immunoinformatics methods are valuable in reducing time and cost in vaccine design and other fields of life sciences. We have predicted and validated NOM recombinant protein against HLA-A*11:01 and TLR4 receptors. Our evaluation was based on vaccine candidate structural analysis and molecular docking and MD simulations study. The NOM-TLR4 and NOM-HLA-A*11:01 complexes were very stable in their natural function with strong molecular interactions in around 100 ns. Higher binding energy even after the MD simulation of 100 ns confirmed the stability and specificity NOM-TLR4 and NOM- HLA-A*11:01 interaction. Our vaccine candidate can stimulate both cellular and humoral immunity, given that B and T-cell epitopes have been selected in the final construct. Taken all together, according to physicochemical evaluations as well as structural and immunological analyses, NOM recombinant protein could be considered as a possible vaccine candidate against COVID-19.

Table 4:

Five epitope-rich domains were selected

Antigen	Position	Antigenic determinant
Nucleocapsid protein	20–100	PSDSTGSNQNGERSGARSKQRRPQGLPNNTASWFTALTQHGK EDLKFPRGQGVPINTNSSPDDQIGYYRRATRRIRGGDGK
	300–370	HWPQIAQFAPSASAFFGMSRIGMEVTPSGTWLTYTGAIKLDDK DPNFKDQVILLNKHIDAYKTFPPTEPKK
ORF3a	40–100	SLPFGWLIVGVALLAVFQSASKIITLKKRWQLALSKGVHFVCNLLLLFVTVYSHLLLVAAG
	210–240	DYYQLYSTQLSTDTGVEHVTFFIYNKIVDEP
Membrane protein	20–100	WNLVIGFLFLTWICLLQFAYANRNRFLYIIKLIFLWLLWPVTLA CFVLAAVYRINWITGGIAIAMACLVGLMWLSYFIASF

49 in total

1. A unique antigen against SARS-CoV-2, Acinetobacter baumannii, and Pseudomonas aeruginosa.

Authors: Mohammad Reza Rahbar; Shaden M H Mubarak; Anahita Hessami; Bahman Khalesi; Navid Pourzardosht; Saeed Khalili; Kobra Ahmadi Zanoos; Abolfazl Jahangiri
Journal: Sci Rep Date: 2022-06-27 Impact factor: 4.996

Review 2. A review of COVID-19 biomarkers and drug targets: resources and tools.

Authors: Francesca P Caruso; Giovanni Scala; Luigi Cerulo; Michele Ceccarelli
Journal: Brief Bioinform Date: 2021-03-22 Impact factor: 11.622

3. Bioinformatics and immunoinformatics to support COVID-19 vaccine development.

Authors: Stephanie Ishack; Shari R Lipner
Journal: J Med Virol Date: 2021-04-23 Impact factor: 20.693

4. Antiretroviral drug activity and potential for pre-exposure prophylaxis against COVID-19 and HIV infection.

Authors: Dennis C Copertino; Bruno C Casado Lima; Rodrigo R R Duarte; Timothy R Powell; Christopher E Ormsby; Timothy Wilkin; Roy M Gulick; Miguel de Mulder Rougvie; Douglas F Nixon
Journal: J Biomol Struct Dyn Date: 2021-03-18 Impact factor: 5.235

Review 5. COVID-19 drug repurposing: A review of computational screening methods, clinical trials, and protein interaction assays.

Authors: Xueqing Wang; Yuanfang Guan
Journal: Med Res Rev Date: 2020-08-30 Impact factor: 12.944

Review 6. Bioinformatic HLA Studies in the Context of SARS-CoV-2 Pandemic and Review on Association of HLA Alleles with Preexisting Medical Conditions.

Authors: Mina Mobini Kesheh; Sara Shavandi; Parastoo Hosseini; Rezvan Kakavand-Ghalehnoei; Hossein Keyvani
Journal: Biomed Res Int Date: 2021-05-28 Impact factor: 3.411

7. An Immunoinformatics Approach for SARS-CoV-2 in Latam Populations and Multi-Epitope Vaccine Candidate Directed towards the World's Population.

Authors: Andrés Felipe Cuspoca; Laura Lorena Díaz; Alvaro Fernando Acosta; Marcela Katherine Peñaloza; Yardany Rafael Méndez; Diana Carolina Clavijo; Juvenal Yosa Reyes
Journal: Vaccines (Basel) Date: 2021-06-01

8. A computational study to disclose potential drugs and vaccine ensemble for COVID-19 conundrum.

Authors: Sajjad Ahmad; Yasir Waheed; Saba Ismail; Sumra Wajid Abbasi; Muzammil Hasan Najmi
Journal: J Mol Liq Date: 2020-11-10 Impact factor: 6.165

Review 9. COVID-19 pandemic: an overview of epidemiology, pathogenesis, diagnostics and potential vaccines and therapeutics.

Authors: Haneen Amawi; Ghina'a I Abu Deiab; Alaa A A Aljabali; Kamal Dua; Murtaza M Tambuwala
Journal: Ther Deliv Date: 2020-05-12

Review 10. COVID-19 vaccine research and development: ethical issues.

Authors: T Wibawa
Journal: Trop Med Int Health Date: 2020-10-19 Impact factor: 3.918