Raju Dash1, Rasel Das2, Md Junaid3, Md Forhad Chowdhury Akash4, Ashekul Islam5, Sm Zahid Hosen1. 1. Molecular Modeling and Drug Design Laboratory (MMDDL), Pharmacology Research Division, Bangladesh Council of Scientific and Industrial Research (BCSIR), Chittagong, Bangladesh. 2. Nanotechnology and Catalysis Research Center, University of Malaya, Kuala Lumpur, Malaysia. 3. Department of Pharmaceutical Sciences, North South University, Dhaka, Bangladesh. 4. Department of Pharmacy, BGC Trust University Bangladesh, Chittagong, Bangladesh. 5. Department of Biochemistry and Molecular Biology, University of Chittagong, Chittagong, Bangladesh.
Abstract
Ebola virus (EBOV) is one of the lethal viruses, causing more than 24 epidemic outbreaks to date. Despite having available molecular knowledge of this virus, no definite vaccine or other remedial agents have been developed yet for the management and avoidance of EBOV infections in humans. Disclosing this, the present study described an epitope-based peptide vaccine against EBOV, using a combination of B-cell and T-cell epitope predictions, followed by molecular docking and molecular dynamics simulation approach. Here, protein sequences of all glycoproteins of EBOV were collected and examined via in silico methods to determine the most immunogenic protein. From the identified antigenic protein, the peptide region ranging from 186 to 220 and the sequence HKEGAFFLY from the positions of 154-162 were considered the most potential B-cell and T-cell epitopes, correspondingly. Moreover, this peptide (HKEGAFFLY) interacted with HLA-A*32:15 with the highest binding energy and stability, and also a good conservancy of 83.85% with maximum population coverage. The results imply that the designed epitopes could manifest vigorous enduring defensive immunity against EBOV.
Ebola virus (EBOV) is one of the lethal viruses, causing more than 24 epidemic outbreaks to date. Despite having available molecular knowledge of this virus, no definite vaccine or other remedial agents have been developed yet for the management and avoidance of EBOV infections in humans. Disclosing this, the present study described an epitope-based peptide vaccine against EBOV, using a combination of B-cell and T-cell epitope predictions, followed by molecular docking and molecular dynamics simulation approach. Here, protein sequences of all glycoproteins of EBOV were collected and examined via in silico methods to determine the most immunogenic protein. From the identified antigenic protein, the peptide region ranging from 186 to 220 and the sequence HKEGAFFLY from the positions of 154-162 were considered the most potential B-cell and T-cell epitopes, correspondingly. Moreover, this peptide (HKEGAFFLY) interacted with HLA-A*32:15 with the highest binding energy and stability, and also a good conservancy of 83.85% with maximum population coverage. The results imply that the designed epitopes could manifest vigorous enduring defensive immunity against EBOV.
Ebola virus (EBOV) is an antisense-strand RNA virus from the Filoviridae family, and it is structurally filamentous.1 Although the initial discovery of EBOV was in 1976, till now more than 24 epidemics have been reported from Africa, mostly with the Zaire species (http://who.int/mediacentre/factsheets/fs103/en/).2–4 The genome of EBOV enciphers the seven structural proteins, ie, nucleoprotein (NP), viral structural proteins (VP35, VP40, VP30, and VP24), glycoprotein (GP), and RNA-dependent RNA polymerase (L).5Among these, three different versions of glycoprotein are transcribed by the GP gene.6–9 Both attachment protein (GP1) and entry/fusion protein (GP2) are expressed from the full length of the GP chain, which are synthesized from messenger RNAs (mRNAs), containing an additional nontemplated adenosine. The soluble GP (sGP) is synthesized from the unedited RNA transcript. On the contrary, small soluble GP (ssGP) is translated during this process by adding two additional adenosine residues.10 The GPs are expressed virally on the virion surface, which plays a crucial role in the catalysis of membrane fusion and amalgamation to host cells. As a result, it is considered not only a crucial component for vaccines but also an essential target for developing inhibitors and antibodies of attachment and fusion.11–13With the advances in genomics, proteomics, and the understanding of pathogens, the field of viral vaccine preparation has been recently expanded by a most promising approach, known as epitope-based vaccine design.14 Epitope represents the negligible immunogenic region of a protein sequence, which specifically elicits accurate immune responses.15 Various studies recently reported that the vaccination process based on epitope efficiently educes defensive immune responses against diverse pathogens.16–19 In this context, prediction of epitopes via in silico tools in vaccine designing process can significantly minimize the time and cost required in the development process.Thereby, based on the available GP sequences of EBOV, this study attempted to design effective epitope-based peptide vaccines (T-cell and B-cell epitope) using various in silico tools. These results offer new epitope vaccine candidates for vaccine development against EBOV.
Materials and methods
The methodologies used for peptide vaccine development are shown in Figure 1.
Figure 1
Graphical depiction of the methodologies used in peptide vaccine design.
Abbreviations: EBOV, Ebola virus; GP, glycoprotein; RMSF, root mean square function; RMSD, root mean square deviation; MD, molecular dynamics; HLA, human leukocyte antigen; IEDB, immune epitope database.
Protein sequence retrieval, evaluation analysis, and antigenic protein identification
All available sequences of the GP of EBOV were extracted from the UniProt database.20 After that, multiple sequence alignment was performed by using the ClustalW2 tool, and a phylogenetic tree was assembled by MEGA 6.021 software. And then, VaxiJen v2.022 was used to predict most efficient antigenic protein from the available protein sequences.
T-cell epitope identification and conservancy analysis
T-cell identification was done using the NetCTL 1.2 server,22 setting thresholds at 0.5, 0.89, and 0.94 for sensitivity and accuracy. MHC-I binding of the identified epitopes and epitope conservancy were then calculated using tools from the immune epitope database (IEDB).24–26 These tools calculate the half maximal inhibitory concentration (IC50) value of epitope binding to human leukocyte antigen (HLA) molecules using the stabilized matrix base method.26,27 The restriction for epitope identification was set to 12 MHC-I supertypes. Prior to the run, all the alleles were considered, and the length of the peptides was set at 9.0.
Prediction of population coverage and allergenicity assessment
The population coverage tool from IEDB was applied to determine the population coverage for every single epitope by selecting HLA alleles of the corresponding epitope.Allergenicity of the predicted epitope was calculated using AllerHunter,27 which can predict both nonallergens and allergens with a high level of accuracy, by comparing the input sequence with the sequence of known allergen.29
Molecular simulation analysis of HLA allele interaction
Design of the three-dimensional structure of epitope and HLA protein
The three-dimensional structures of all the five epitopes were predicted by a PEP-FOLD web-based server.30 For each sequence, this server predicted the five most provable structures, the best of which, having the lowest energy model, was chosen for further analysis.To validate the binding of identified epitope and HLA molecule, we considered the homology modeling as there is no relevant structure available in the protein data bank. We selected homology modeling using the most popular online protein fold recognition server, Phyre2,31 to generate the three-dimensional structure of HLA-A*32:1532 (accession id: AM422702). Then, ModRefiner33 was used to minimize and correct the hypothetical structure. The validation of the predicted structure was done using PROCHECK,34 verify 3D,35 ERRAT,36 PROVE,37 and QMEAN.38
Docking analysis
Molecular docking analysis was performed using AutoDock Vina,39 by considering HLA molecule as a protein and identified epitopes as ligands. First, we used the protein preparation wizard of UCSF Chimera40 to prepare the hypothetical protein for docking analysis by adding hydrogens and Gasteiger–Marsili charges.41 The prepared file was then converted into pdbqt format. The parameters used for the docking simulation were set to default. The size of the grid box in AutoDock Vina was kept at 36.3095, 54.3374, and 48.025, respectively, for X, Y, and Z. The energy range was kept at 4, according to the default setting. AutoDock Vina was implemented via the shell script offered by AutoDock Vina developers. Docking results were observed by negative score in kcal/mol, as binding affinity of ligands.39
Binding energy estimation and molecular dynamics (MD) simulation
The binding free energy of HLA-epitope complexes were calculated by using MM (CHARMm)42 – Generalized Born Surface Area (GBSA) and Poisson–Boltzmann Surface Area (PBSA) protocols, implemented in Accelrys Discovery Studio 2.5. Using implicit solvent models of GBSA and PBSA, the binding free energy (ΔGbind) for each epitope was calculated by maintaining salt concentration of 0.15 M. Default value was set for conformational entropy and ligand minimization. The distance cutoff value was set to 14.0 Å. The binding energy was calculated by using following equation:The entire dynamics simulation study for the HLA-epitope complex was accomplished in YASARA Dynamics software. Prior to simulation, the complex was cleaned and optimized the hydrogen bond network.43 After that, a cubic simulation cell was created with a periodic boundary condition, and the atoms of the complex were typed using the AMBER1444 force field. The pKa (acid dissociation constant) values of protein titratable amino acids were calculated and solvated the simulation box using the transferable intermolecular potential3 points (TIP3P) water model (density: 0.997 g/L−1). The system consistent with 46406 atoms was energy minimized using the steepest gradient approach (5000 cycles) followed by simulated annealing method. Restrained and unrestrained all-atom molecular dynamics simulation were performed in solvent using the PME method to describe long-range electrostatic interactions at a cut off distance of 8 Å at physiological conditions (298 K, pH 7.4, 0.9% NaCl).45 A multiple time step algorithm together with a simulation time step interval of 2.50 fs was chosen.46 Molecular dynamics simulations of 100 ns long were performed at constant temperature using a Berendsen thermostat and constant pressure. The MD trajectories were saved every 250 ps for analysis. The trajectories generated from the simulation were analyzed for the stability by various evaluative measures viz. RMSD, RMSF (RMS fluctuations), and initial and final protein backbone comparisons using YASARA structure built in macros and VMD software.
Identification of the B-cell epitope
To detect B-cell epitope, various tools from IEDB were used to identify the B-cell antigenicity, together with the Emini surface accessibility prediction,47 Kolaskar and Tongaonkar antigenicity scale,48 Karplus and Schulz flexibility prediction,49 and Bepipred linear epitope prediction analysis.50 Since antigenic parts of a protein belong to the beta turn regions,51 the Chou and Fasman beta turn prediction tool52 was also used.
Results
Evolutionary analysis of the GP sequences
A total of 46 GP sequences from the different variants of EBOV were collected from the UniProtKB database. Multiple sequence alignment analysis was then performed, and a phylogenetic tree (Figure 2) was constructed thereby. Using the unweighted pair-group method with arithmetic mean, a phylogram was constructed using the bootstrap with 1,000 replications in MEGA6.53 From the multiple sequence alignment analysis, it is clearly seen that protein sequences that isolated from various strains were having a close relationship. Also, from the multiple comparison result, the selected sequences of EBOV of the same subtype have 78%–99% similarity. This result also confers the possibilities of mutation in glycoprotein of all strains, which demonstrates a good agreement with the results from Veljkovic et al.54
Figure 2
Evolutionary divergence analysis of available glycoproteins of different strains of EBOV; results are represented in a phylogenetic tree.
Protein sequences in this study were considered to screen out using VaxiJen web server for the identification of potent antigenic protein. As a corollary, UniProtKB id: Q9YMG2 was identified as the most potent antigenic protein having a maximum total prediction score of 0.5390. Here the threshold of 0.5 is considered as the potent antigenicity.55 This sequence was used for further analysis.
Prediction of potent T-cell epitope
On the basis of the high combinatorial score, the five best epitopes were predicted by the NetCTL server from the selected protein sequence in a preselected environment. The identified epitopes are represented in Table 1. In combination with several methods such as proteasomal cleavage/transporter associated with antigen processing (TAP)/MHC-I combined predictor, MHC-I processing of the NetCTL server calculates an overall score for each peptide’s intrinsic potential from a protein for the designing of T-cell epitope. Peptides with a higher score represent higher processing capabilities.
Table 1
Five most potent T-cell epitopes, according to the overall score predicted by the NetCTL server
Number
Epitopes
Overall score (nM)
1
ATEDPSSGY
2.8069
2
TEDPSSGYY
1.5503
3
LFEVDNLTY
1.1224
4
HKEGAFFLY
1.0887
5
LLQLNETIY
0.9753
The five T-cell epitopes were subjected to MHC-I binding prediction, using the stabilized matrix base method. The epitopes that elicited higher affinity (IC50 <200 nM) were subjected to afterward analysis (Table 2). Notably, proteins are transformed into peptides by proteasome complex, which cleaved the peptide bonds. By combining with Class I MHC molecules, these peptides were deported to the cell membrane, where they were introduced to T helper cells.
Table 2
Interaction, binding, and conservancy of identified T-cell epitopes
Notes: MHC-I alleles that have an interacting affinity lower than 200 nm are represented, and total processing scores are shown as enclosed numbers.
Abbreviations: HLA, human leukocyte antigen; MHC-I, major histocompatibility complex I.
As shown in Table 2, a 9-mer T-cell epitope HKEGAFFLY was detected to interact with most MHC-I alleles, including HLA-B*15:02; HLA-C*03:03; HLA-A*32:15; HLA-B*15:03; HLA-B*27:20; HLA-C*12:03; HLA-A*68:23; HLA-B*40:13; and HLA-A*32:07 with higher affinity, among the five T-cell epitopes.Furthermore, this epitope retained the highest conservancy of 83.85%, according to the IEDB conservancy analysis, as tabulated in Table 2. As population coverage in vaccine design generally plays a crucial role, it was calculated in this study.
Population coverage and allergenicity analysis
The cumulative percentage of population coverage was obtained for the predicted epitope HKEGAFFLY. As shown in Table 3, the population coverage for East Africa was found to be 66.98%; in West and North Africa, it was 69.50% and 63.89%, respectively; and for Central Africa it was observed to be 75.93%. The population coverage was recorded at 55.88% for the East Asian region, which was a major hotspot for viral infection. For North America, the population coverage was found to be 58.69%. In current vaccine design pipeline, allergenicity is considered the most prominent barrier in vaccine designing, since most vaccines convert the immune system into an “allergic” reaction56 by inducting Type 2 T helper cells and immunoglobulin E. That is why we predicted allergenicity of the selected epitope by the AllerHunter web server, where the probability is >0.06. The epitope HKEGAFFLY was scored 0.00 (sensitivity =91.6%, specificity =89.3%), and was thus considered a nonallergen, according to the Food and Agriculture Organization/World Health Organization evaluation system of allergenicity prediction.
Table 3
Analysis of the population coverage for the proposed epitope against EBOV
Population
Coverage (%)a
Average hitb
PC90c
East Africa
66.98
1.00
0.23
West Africa
69.50
0.89
0.25
North Africa
63.89
1.13
026
Central Africa
75.93
1.04
0.23
North America
58.69
0.98
0.21
East Asia
55.88
2.11
0.29
Europe
67.57
1.00
0.23
Southeast Asia
63.00
1.45
0.30
Notes:
Projected population coverage.
Average number of epitope hits/HLA combinations recognized by the population.
Minimum number of epitope hits/HLA combinations recognized by 90% of the population.
Abbreviations: EBOV, Ebola virus; HLA, human leukocyte antigen.
Validation of predicted T-cell epitope
As described in the “Materials and methods” section, the hypothetical structure of HLA-A*32:15 protein was generated using the homology technique. The structure was then analyzed through various web-based protein validation software. As shown in Figure 3, the Ramachandran plot generated by the PROCHECK34 server showed that about 98.9% of the residues of protein are located in the most favored region, as against 0% in the outlier region and 1.1% in the generously allowed region. It should be noted that the protein model having >90% of the residues in the core and allowed regions can be considered a high-quality model.57 The hypothetical model was further analyzed using ERRAT and Verify 3D.58 For a good model, structure should retain an ERRAT score >80.00, against which the model in this study obtained an ERRAT score of 89.859.55 Verify 3D graph indicates that 100.00% of residues of this model had an averaged 3D-1D score of 0.2, which is good.59 Along with the QMEAN analysis, the protein model in our interest resulted in a Z-score of −1.33, and the total score was 0.636. This value denotes a higher quality of the model, where the acceptable score ranges between 0 and 1 (Figure 3B).38 On the basis of the results obtained from the aforementioned structural validation programs, the model (Figure 3C) showed much reliability and was considered for further study.
Figure 3
Evaluation of structural superiority by (A) Ramachandran plot, (B) QMEAN assessment, and (C) three-dimensional structure of final model of HLA-A*32:15.
Abbreviations: PDB, Protein Data Bank; HLA, human leukocyte antigen; SSE, secondary structure element; ACC, accessibility.
Molecular docking simulation revealed that the proposed epitopes bound in the cleft of the HLA-A*32:15 (Figure S2), where the highest binding affinity was −7.6 kcal/mol (Table S2, observed for the HKEGAFFLY epitope). The Chimera40 program was used to visualize the interactions of docked HLA-A–epitope complexes, as shown in Figures 4 and S2. Then, binding energy calculation was carried out to understand the binding of HLA with epitopes. Here the binding free energies of MM-GBSA and MM-PBSA are approximate free energies of binding, so a more negative value denotes stronger binding. From MM-GBSA analysis, the highest binding free energy was observed for HLA-A*32:15 with epitope (HKEGAFFLY) of –63.89 kj/mol (Table S1). On the contrary, the lowest binding free energy was obtained for ATEDPSSGY epitope, i.e. –44.86 kj/mol. In contrast of MM-GBSA, the HKEGAFFLY epitope was also resulted the highest binding energy of –38.48 kj/mol, while the lowest binding free energy was seen for TEDPSS-GYY epitope, –20.98 kj/mol. Since HKEGAFFLY epitope obtained the highest docking affinity and binding free energy, its complex subjected for molecular dynamics simulation.
Figure 4
Molecular interaction analysis of HLA-A*32:15 and epitope HKEGAFFLY complex, where the epitope HKEGAFFLY binds in the groove of the HLA-A*32:15, generated by Autodock Vina.
Notes: Here, blue color represents the corresponding distances of hydrogen bonding, the red stick model defines the epitope.
Abbreviation: HLA, human leukocyte antigen.
The 100 ns MD simulation of HLA-epitope (HLA-A*32:15-HKEGAFFLY) complex was carried out using AMBER14 force field, following the energy minimization protocol. The stability of the HLA-epitope complex by means of RMSD was calculated and rendered in Figure 5A. From the results, it is revealed that the HLA molecule was stabilized after 5 ns simulation and tended to remain in plateau phase thereafter for rest of the period. The RMSD value of HLA was observed to grow up quickly from 0.48 Å to 2.214 Å and remained stable in the range from 2.0 Å to 3 Å. In case of epitope, similar RMSD pattern was observed, where the order of magnitude was seen to fluctuate in some range. The average energy of the simulation was –578125.270 kj/mol; the average Coulombic charge and van der Waals interactions was –694749.662 kj/mol, 77122.511 kj/mol, respectively. We also calculated the contribution of each residue for both HLA and epitope in the simulation, in terms of RMSF and RMSD. As seen in Figure 6A, highest RMSD was observed for ARG residue at the position of 180 in HLA, while lowest RMSD observed for CYS100. However, this residue was also resulted highest RMSF value of 7.181 Å, while the rests of the residues were in lowest fluctuation. In case of epitope, the histidine residue at the first position and the tyrosine residue in 9th position were seen to be very much flexible, as these residues were resulted with highest RMSD and RMSF (Figure 6B). In the meanwhile, we calculated the number of hydrogen bond formed between the epitope and HLA molecule during the simulation. The results represented in Figure 5B, showed that hydrogen bond at initial stage was 236, and the range decreased to 160. During the simulation, the number of hydrogen bond was at a range of 160–210, which indicates the strong binding of epitope-HLA complex. Hence, all analyses lead to the conclusion that HKEGAFFLY is one of the most prominent T-cell epitopes for GP based designing of vaccine.
Figure 5
Time evolutions of (A) the backbone RMSD and (B) the number of hydrogen bonds formed between the epitope-HLA (HLA-A*32:15) complex in binding position. The results are presented with respect to the time (ps) of MD simulation, demonstrating the mechanical stability and flexibility of the complex.
Abbreviations: HLA, human leukocyte antigen; MD, molecular dynamics; RMSD, root mean square deviation.
Figure 6
(A) Plot showing the RMSF (black) and RMSD (red) values of C-alpha atoms from of MD simulation of HLA protein. (B) Plot showing the RMSF (black) and RMSD (red) values of C-alpha atoms from of MD simulation of 9-mer epitope. Here, Y axis represented the RMSD and RMSF values in molecular distance unit, ie, Angstrom (A), while X axis demonstrated the position of residue.
Abbreviations: RMSF, root mean square function; RMSD, root mean square deviation; MD, molecular dynamics; HLA, human leukocyte antigen.
B-cell epitope identification
For the identification of potential B-cell epitopes, amino acid scale base methods have been used in this study. Consistent with this protocol, we used diverse investigation processes for the calculation of an incessant B-cell epitope.According to the analysis of Kolaskar and Tongaonkar’s48 antigenicity prediction method, the average antigenicity was 1.028, while 1.225 and 0.894 were the maximum and minimum, respectively. The Kolaskar and Tongaonkar48 antigenicity prediction uses a semiempirical method to predict antigenicity on the basis of physico-chemical properties of the residues in a protein and their diversity in experimentally known epitopes, where values >1.00 were considered to denote a potential antigen. As summarized in Table 4 and Figure S3, 14 epitopes have been found to satisfy the threshold value, and also have the potentiality to express the B-cell response. Furthermore, the surface accessibility of the protein was also analyzed using the Emini surface accessibility prediction methods, since a potent B-cell epitope should be accessible through the surface.47 As shown in Figure S4 and Table 5, higher accessibility was found in regions 9–17 and 186–223 amino acid residues. Figure S5 represents the β-turns region identified by Chou and Fasman β-turn methods.52 According to the result, the region from 200 to 220 (in the region of 200–220 and 105–150) is regarded as β-turns as well as hydrophilic in nature. These are two properties required to be a potent B-cell epitope.60 Experimentally, antigenicity is related to the protein flexibility.61 That is why we implemented the Karplus and Schulz flexibility prediction method, where it was evident that the regions of 255–280 and 200–220 were regarded as the most flexible (Figure S6). Finally, based on the Hidden Markov model, the Bepipred linear epitope prediction tool was utilized to predict linear B-cell epitopes. The predicted result is rendered and tabulated in Figure S7 and Table 6. Hence, by comparing the foregoing results, the peptide sequences ranging from 186 to 220 are able to provoke the immune response as B-cell epitope for GP-based designing of vaccine.
Table 4
Prediction of antigenic region of the protein by Kolaskar and Tongaonkar antigenicity prediction method
Number
Start
End
Peptide
Length
1
4
11
TGILQLPR
8
2
17
56
TSFFLWVIILFQRTFSIPLGVIHNSTLQVSDVDKLVCRDK
40
3
63
69
LRSVGLN
7
4
76
82
ATDVPSA
7
5
89
99
RSGVPPKVVNY
11
6
118
126
GSECLPAAP
9
7
132
154
FPRCRYVHKVSGTGPCAGDFAFH
23
8
156
172
EGAFFLYDRLASTVIYR
17
9
177
189
AEGVVAFLILPQA
13
10
194
202
FSSHPLREP
9
11
211
221
SGYYSTTIRYQ
11
12
233
247
LFEVDNLTYVQLESR
15
13
249
259
TPQFLLQLNET
11
14
274
280
IWKVNPE
7
Table 5
Prediction of the Emini surface accessibility of the protein under study
Number
Start
End
Peptide
Length
1
9
17
LPRDRFKRT
9
2
56
63
KLSSTNQL
8
3
81
87
SATKRWG
7
4
111
117
LEIKKPD
7
5
188
193
QAKKDF
6
6
197
221
HPLREPVNATEDPSSGYYSTTIRYQ
25
7
227
232
TNETEY
6
8
244
249
LESRFT
6
9
263
270
SGKRSNTT
8
Table 6
Identification of the Bepipred linear epitope from the protein sequence
Number
Start
End
Peptide
Length
1
14
14
F
1
2
57
59
LSS
3
3
73
106
NGVATDVPSATKRWGFRSGVPPKVVNYEAGEWAE
34
4
114
131
KKPDGSECLPAAPDGIRG
18
5
141
148
VSGTGPCA
8
6
175
176
TF
2
7
191
193
KDF
3
8
198
215
PLREPVNATEDPSSGYYS
18
9
223
229
TGFGTNE
7
10
261
270
YTSGKRSNTT
10
11
279
285
PEIDTTI
7
Discussion
In recent trends, the primary focus of vaccine development is very much rely on GPs, as they are involved in cell attachment, fusion and entry as well as assist in invasion; and thus plays the role of pathogenesis of disease. The central role of GP of EBOV therefore makes an appropriate antigenic target for vaccine development; as a result, several reports have been published on this perspective.62–66 However, the information representing the population coverage in the worldwide are still limited. In such case, computational based epitiope screening is very much efficient in context of HLA class I molecules,67 and also much safe, high specificity and cost effective. Therefore, this study incorporated various immunoinformatics and molecular modelling tools to identify potential epitopes present in EBOV GPs.Initially, a set of 46 glycoprotein sequences from the different strains of EBOV has been subjected to perform multiple sequence alignment. Previous GP sequences analysis of different strains of each EBOV species revealed a high degree of sequence similarity,68,69 and thereby, it is believed that targeting GP from old strain could provide strong and cross reactive immunity against the new strain and previous outbreaks in 2014.70 Interestingly, in our molecular analysis, we have found ~98–99% conservation for the amino acid sequences of different strains within the species, which confers the degree of similarity and support the previous analysis. From this set of GPs, the most antigenic protein sequence was determined by Vaxijen server. Based on auto cross covariance (ACC), the Vaxijen server transform the protein sequence into uniform vectors of physicochemical properties of proteins. With 91% sensitive, 82% accuracy and 72 specificity, the l00-CV (leave one – out cross validation) was used to identify antigenicity of protein for viral species.71 The resultant antigenic protein (VaxiJen score ≥0.5) was then subjected for various immunoinformatics analysis, followed by IEDB web server.At the beginning five potent 9-mer epitopes have been predicted from NetCTL 1.2 server and selected for further study. Using the threshold of 0.5, the NetCTL 1.2 server predicts maximum number of epitopes without compromising the specificity or sensitivity levels, covering all 12 MHC class I supertypes.23 The five most potent epitopes are represented in Table 1, and the scores are the predicted MHC class I affinities in the form of –logIC50 and IC50 value.For MHC-I binding prediction, peptides with IC50 values <50 nM are considered high affinity, <500 nM for intermediate affinity, and <5000 nM for low affinity. Therefore, we selected maximum alleles having binding affinity <200 nM.72 It is advocated that T-cell epitope binding to specific multiple HLA supertypes are termed as promiscuous in vaccine design, since they effectively increase the coverage of higher proportions of human populations.73,74 According to the results, both HKEGAFFLY and LFEVDNLTY bind to the highest number of alleles. However, HKEGAFFLY represents highest conservancy and was hence considered as epitope of choice.We also validated each epitope by molecular docking simulation and MM-GBSA/MM-PBSA studies with HLA-A*32:15 protein, as it was found common in the results from MHC-I binding interaction analysis. Prior of docking simulation, the three dimensional structure of HLA molecule was prepared by using the Phyre2, followed by intensive mood. As a result, 179 residues (99%) of HLA-A*32:15 modelled at >90% accuracy. In these study, 2BCK_Chain A, crystal structure of HLA-A*2402 showed the highest similarity of 90% (Figure S1). The selected model has been chosen from the twenty models generated by Phyre2, on the basis of similarity and confidence level. Phyre2 is one of the best protein prediction servers that allows remote fold reorganization and homology detection. Using hidden Markov model (HMM), this server predicts the structure of given protein sequence by constructing backbone, loop modelling and adding side chains.75 However in intensive mode, additional ab initio approach is used for reconstruction of missing region, backbone and side chain.75 In docking simulation, among the other epitopes, HKEGAFFLY obtained highest binding affinity (Table S1, supplementary material). In addition, MM-PBSA and MM-GBSA techniques are frequently to re-rank docking poses from molecular docking study, as they achieve a much better performance than docking scoring functions.76 Nevertheless, the success rate of the absolute binding free energy prediction strongly depends on the systems.77 Thence, we used both of these solvation models to predict binding energy more accurately, where the results from MM-PBSA examine the accuracy and reliability of the results from MM-GBSA. Results of binding energy calculation are shown in Table S1. The relative magnitude of binding free energy obtained from GB methods is found to be consistent with those calculated using PB method, despite of the differences in the absolute value of salvation energy. As a corollary, these results also demonstrated the consistence with relative stabilities of HLA-epitope complexes. In previous published reports regarding in silico epitope identification of EBOV,78–81 the studies are limited to sequence-based scoring function techniques and some extend to docking simulation. These techniques have certain limitations, though these are very useful.82 In docking simulation of peptide and protein, it faces problem like peptide’s flexibility.83,84 Whereas, energy based approach like molecular mechanics and interaction energy scoring can add valuable information to sequence based results.85,86Therefore, we have performed MD simulation study of 100 ns long to enhance the predictive power of the peptide affinity calculations to MHC molecule. In molecular dynamics simulation, both epitope and HLA protein were seen to achieve equilibration, while different fluctuations of RMSD were seen by the time evolution. Higher RMSD values of epitope indicate the flexibility in binding with HLA molecule, during the simulation. These results were further confirmed by the analysis of per residue contributions in dynamics simulation by means of RMSF and RMSD. Low values of RMSF indicate the core region of the HLA protein was stable, while high values of RMSD demonstrated the motion of the protein during the simulation. In like manner, the RMSD and RMSF profiles of eptiope confirm the synergic conformation changes to accommodate the binding pocket of HLA. The hydrogen bond occupancy analysis between the HLA-epitope further confirmed the stability of the complex during the simulation. Overall, these results evidently demonstrate that both HLA and epitope have remarkable conformation changes to facilitate the binding and formed stable complex in thermodynamic environment (Figure 7).
Figure 7
Conformational illustration of HLA-epitope complex. Here, cartoon structure of green color represents the starting confirmation the complex, while red color represents the confirmation of last step in 100 ns long MD simulation.
Abbreviations: MD, molecular dynamics; HLA, human leukocyte antigen.
It is one of important factors in vaccine design that the distribution of HLA varies according to the diverse ethnic groups and geographic regions around the world. Therefore, wide range of population coverage must be considered during the designing of an effective design. According to the results from population coverage analysis, the epitope HKEGAFFLY showed wide range of population coverage in different regions of the world (Table 3), where the highest coverage was observed Central Africa; one of the most EBOV infected areas. This result indicates that it will specifically bind with the prevalent HLA molecules in the target population, where the vaccine will be employed.In other aspects, the B-cell epitope stimulates minimal immune unity, which is very much strong enough to elicit a potent humoral immune response, causing no harmful side effects to human body. Thereby, we are also calculated and found that the sequences ranging from 186–220 as a B-cell epitope, by taking consideration of amino acid property, hydrophilicity, accessibility, flexibility, turns, exposed surface, polarity and antigenic propensity. This study could provide a solid base for vaccine design.
Conclusion
In recent years, most vaccines have been developed based on B-cell immunity; however, the current strategy relies mostly on T-cell epitope owing to long-lasting immunity. Both B-cell and T-cell epitopes are offered in this study for stimulating immunity in several ways. The resulting peptides showed B-cell and T-cell selectivity, better conservancy, population coverage, and significant interaction with MHC-1 allele with good affinity. Above all, the predicted epitopes are anticipated to offer long-term and high protective immunity against EBOV.Multiple sequence alignment and secondary structure predictions of query sequence (HLA*A15:02) and template sequence (PDB: 2BCK).Notes: Green color represents alpha helix, whereas beta sheet is represented by blue color.Abbreviations: HLA, human leukocyte antigen; PDB, Protein Data Bank.Visualization of the best docked HLA–epitope complexes obtained from molecular docking analysis. The complexes are representing the interaction of HLA (HLA*A15:02) with (A) ATEDPSSGY, (B) TEDPSSGYY, (C) LFEVDNLTY, and (D) LLQLNETIY epitope.Abbreviation: HLA, human leukocyte antigen.Kolaskar and Tongaonkar antigenicity prediction of the most antigenic protein.Notes: Here, the x-axis and y-axis represent sequence position and antigenic propensity, respectively. The threshold value is 1.000. The regions above the threshold are antigenic, shown in yellow while, green color reflects the polypeptide regions that could not satisfy the threshold margin.Emini surface accessibility prediction of the most antigenic protein.Notes: Here, the x-axis and y-axis represent sequence position and antigenic propensity, respectively. The threshold value is 1.000. The regions above the threshold are antigenic, shown in yellow while, green color reflects the polypeptide regions that could not satisfy the threshold margin.Chou and Fasman β-turn prediction of the most antigenic protein.Notes: Here, the x-axis and y-axis represent sequence position and antigenic propensity, respectively. The threshold value is 1.000. The regions above the threshold are antigenic, shown in yellow while, green color reflects the polypeptide regions that could not satisfy the threshold margin.Karplus and Schulz flexibility prediction.Notes: Here, x-axis and y-axis represent position and score, respectively. The threshold is 1.00. The flexible regions are shown in yellow while, green color reflects the polypeptide regions that could not satisfy the threshold margin.Bepipred linear epitope prediction of the most antigenic protein.Notes: Here, x-axis and y-axis represent position and score, respectively. The threshold is 0.35. The flexible regions are shown in yellow while, green color reflects the polypeptide regions that could not satisfy the threshold margin.Results of binding affinity and binding energy calculations of the selected epitopes
Table S1
Results of binding affinity and binding energy calculations of the selected epitopes
Authors: Gloria P Monterrubio-López; Jorge A González-Y-Merchand; Rosa María Ribas-Aparicio Journal: Biomed Res Int Date: 2015-04-15 Impact factor: 3.411
Authors: Sylvie Chabot; Yusra Gimie; Karam Obeid; Jaekwan Kim; Clement A Meseda; Krishnamurthy Konduru; Gerardo Kaplan; Li Sheng Fowler; Jerry P Weir; Keith Peden; Marian E Major Journal: J Virol Date: 2022-09-07 Impact factor: 6.549