Mst Rubaiat Nazneen Akhand1, Kazi Faizul Azim2, Syeda Farjana Hoque3, Mahmuda Akther Moli3, Bijit Das Joy1, Hafsa Akter4, Ibrahim Khalil Afif5, Nadim Ahmed4, Mahmudul Hasan6. 1. Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, 3100, Bangladesh; Department of Biochemistry and Chemistry, Sylhet Agricultural University, Sylhet, 3100, Bangladesh. 2. Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, 3100, Bangladesh; Department of Microbial Biotechnology, Sylhet Agricultural University, Sylhet, 3100, Bangladesh. 3. Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, 3100, Bangladesh; Department of Pharmaceuticals and Industrial Biotechnology, Sylhet Agricultural University, Sylhet 3100, Bangladesh. 4. Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, 3100, Bangladesh. 5. Department of Biotechnology and Genetic Engineering, Noakhali Science and Technology University, Noakhali, Bangladesh. 6. Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, 3100, Bangladesh; Department of Pharmaceuticals and Industrial Biotechnology, Sylhet Agricultural University, Sylhet 3100, Bangladesh. Electronic address: mhasan.pib@sau.ac.bd.
Abstract
The present study aimed to predict a novel chimeric vaccine by simultaneously targeting four major structural proteins via the establishment of ancestral relationship among different strains of coronaviruses. Conserved regions from the homologous protein sets of spike glycoprotein, membrane protein, envelope protein and nucleocapsid protein were identified through multiple sequence alignment. The phylogeny analyses of whole genome stated that four proteins reflected the close ancestral relation of SARS-CoV-2 to SARS-COV-1 and bat coronavirus. Numerous immunogenic epitopes (both T cell and B cell) were generated from the common fragments which were further ranked on the basis of antigenicity, transmembrane topology, conservancy level, toxicity and allergenicity pattern and population coverage analysis. Top putative epitopes were combined with appropriate adjuvants and linkers to construct a novel multiepitope subunit vaccine against COVID-19. The designed constructs were characterized based on physicochemical properties, allergenicity, antigenicity and solubility which revealed the superiority of construct V3 in terms safety and efficacy. Essential molecular dynamics and normal mode analysis confirmed minimal deformability of the refined model at molecular level. In addition, disulfide engineering was investigated to accelerate the stability of the protein. Molecular docking study ensured high binding affinity between construct V3 and HLA cells, as well as with different host receptors. Microbial expression and translational efficacy of the constructs were checked using pET28a(+) vector of E. coli strain K12. However, the in vivo and in vitro validation of suggested vaccine molecule might be ensured with wet lab trials using model animals for the implementation of the presented data.
The present study aimed to predict a novel chimeric vaccine by simultaneously targeting four major structural proteiene">ns via the establishmeene">nt of aene">ncestral relationship among differeene">nt straiene">ns of n class="Species">coronaviruses. Conserved regions from the homologous protein sets of spike glycoprotein, membrane protein, envelope protein and nucleocapsidprotein were identified through multiple sequence alignment. The phylogeny analyses of whole genome stated that four proteins reflected the close ancestral relation of SARS-CoV-2 to SARS-COV-1 and bat coronavirus. Numerous immunogenic epitopes (both T cell and B cell) were generated from the common fragments which were further ranked on the basis of antigenicity, transmembrane topology, conservancy level, toxicity and allergenicity pattern and population coverage analysis. Top putative epitopes were combined with appropriate adjuvants and linkers to construct a novel multiepitope subunit vaccine against COVID-19. The designed constructs were characterized based on physicochemical properties, allergenicity, antigenicity and solubility which revealed the superiority of construct V3 in terms safety and efficacy. Essential molecular dynamics and normal mode analysis confirmed minimal deformability of the refined model at molecular level. In addition, disulfide engineering was investigated to accelerate the stability of the protein. Molecular docking study ensured high binding affinity between construct V3 and HLA cells, as well as with different host receptors. Microbial expression and translational efficacy of the constructs were checked using pET28a(+) vector of E. coli strain K12. However, the in vivo and in vitro validation of suggested vaccine molecule might be ensured with wet lab trials using model animals for the implementation of the presented data.
Novel coronavirusnamed n class="Species">SARS-CoV-2/2019-nCoV was identified at the end of 2019 in Wuhan, a city in the Hubei province of China, causing severe pneumonia that leads to huge death cases (Wang et al., 2020). Gradually this virus emerged as a new threat to the whole world and affecting almost all parts of the world. To date, the pathogen has affected 198 countries, and thus becoming a global public health emergency. Global public health concern with pandemic notion of COVID-19 was declared on January 30th, 2020 by the World Health Organization (WHO), 2020a, World Health Organization (WHO), 2020b. Again, an adverse situation has also been announced on 13 March 2020 for increasing the infections of COVID-19 (Kunz and Minder, 2020). Till April 10, 2020, total virus affected people around the world exceeded 1,633,272 and more than 97,601 committed death, while 366,610 people fully recovered from the infection World Health Organization (WHO), 2020a, World Health Organization (WHO), 2020b). The alarming situation is that the number of confirmed cases worldwide has exceeded one million by this time. It took more than three months to reach the first 10,000 confirmed cases, while required only 12 days to detect the next 100,000 cases. The situation is getting worse in European region. Total death cases in Italy, Spain, USA, France, United Kingdom was 14,681,11,744,7847,6507 and 4313 respectively (till April 4, 2020) and this number is exacerbating day by day (World Health Organization (WHO), 2020a, World Health Organization (WHO), 2020b).
Some common clinical manifestations of COVID-19 are n class="Disease">fever, sputum production and shortness in breath, cough, fatigue, sore throat and headache which leads to severe cases of pneumonia. A few patients also have gastrointestinal symptoms with diarrhea and vomiting (Guan et al., 2020). Though several early studies showed that the mortality rate for SARS-CoV-2 is not as high (2–3%), the latest global deathrate for COVID-19 is 3.4% which indicates the increasing trends (Wu et al., 2020). The investigation of Chinese Center for Diseases Control and Prevention (2020) revealed that the prevalence of COVID-19 is more apparent in the people ages 50 years rather than the lower age groups (Jeong-ho et al., 2020). High fever and Lymphocytopenia were found more common in Covid-19, though the frequency of the patient without fever condition is also higher than in the earlier outbreaks caused by SARS-CoV (1%) and MERS-CoV (2%) (Huang et al., 2020; Chen et al., 2020).
SARS-CoV-2 is a n class="Species">beta-coronavirus that has a positive sense, 26–32 kb in length, single stranded RNA molecule as its genetic material and belongs to the family Coronaviridae, order Nidovirales (Hui et al., 2020). It shares genome similarity with SARS-CoV (79.5%) and bat coronavirus (96%) (Zhou et al., 2020a, Zhou et al., 2020b; Zhu et al., 2020). However, there are still obscured hypothesis regarding the vector or carrier of SARS-CoV-2, though its detection was primarily linked to Wuhan’s Huanan Seafood wholesale market (Lu et al., 2020; World Health Organization (WHO), 2020a, World Health Organization (WHO), 2020b). Though the species of SARS-CoV-1 and bat coronavirus shares sufficient sequence similarities with the COVID-19, the known way mechanism of infection to the host, and the deathrate is quite different in case of the novel coronavirus. In addition, there is an evolutionary distance between SARS-CoV-1 and bat coronavirus as well as the COVID-19 (Hu et al., 2018; Wu et al., 2020; Wu 2020). Because of high sequence variability of the pathogen, many of the efforts that have been undertaken to develop vaccine against SARS-CoV remain unsuccessful (Graham et al. 2013). Therefore, there is an urgent need to develop vaccines for treatment of SARS-CoV-2 based on the understanding of actual evolutionary ancestral relationship. While some natural metabolites and traditional medication may come up with comfort and take the edge off few symptoms of COVID-19, there is no proof that existing treatment procedures can effectively combat against the diseased condition (World Health Organization (WHO), 2020a, World Health Organization (WHO), 2020b). However, inactivated or live-attenuated forms of pathogenic organisms are usually recommended for the initiation of antigen-specific responses that alleviate or reduce the possibility of host experience with secondary infections (Thompson and Staats 2011). Moreover, all of the proteins are not usually targeted for protective immunity, whereas only a few numbers of proteins are necessary depending on the microbes (Tesh et al., 2002, Li et al. 2014). Depending on sufficient antigen expression from experimental assays, traditional vaccine could take 15 years to develop, while sometimes can lead to undesirable consequences (Purcell et al. 2007; Petrovsky and Aguilar 2004).
Reverse vaccinology approach, on the other haene">nd, is aene">n effective way to develop vacciene">ne agaiene">nst n class="Disease">COVID-19. In this method, computation analysis towards genomic architecture of pathogenic candidate could predict the antigens of pathogens without the prerequisite to culture the pathogens in lab condition. Although, few pathogens that challenge to develop effective vaccines so far may become possible through such approach (Rappuoli 2000) which initiates a huge move in the development of vaccine against the deadly pathogens. The strategy included the comprehensive utilization of bioinformatics algorithm or tools to develop epitope based vaccine molecules, though further validation and experimental procedures are also needed (Moxon et al., 2019). In addition, peptide based subunit vaccines are biologically safer due to the absence of continuous in vitro culture during the production period, and also implies an appropriate activation of immune responses (Purcell et al. 2007; Dudek et al., 2010). Such immunoinformatic approaches have already been employed by the researchers to design vaccines against a number of deadly pathogens including Ebola virus (Khan et al. 2015), HIV (Pandey et al. 2018), Areanaviruses (Azim et al. 2019a), Marburgvirus (Hasan et al. 2019a), Norwalk virus (Azim et al. 2019b), Nipah virus (Saha et al. 2017), Influenza virus (Hasan et al. 2019b) and so on. At present, a suitable peptide vaccine to combat COVID-19 is urgently necessary that could efficiently generate enough immune response to destroy the virus. Hence, the study was designed to develop a chimeric recombinant vaccine against SARS-CoV-2 by targeting four major structural proteins of the pathogen, while revealing the evolutionary history of different species of coronavirus based on whole genome and protein domain-based phylogeny.
Materials and methods
Data acquisition
Complete Genomes of the COVID-19 aene">nd other n class="Species">coronaviruses were retrieved from the NCBI (https://www.ncbi.nlm.nih.gov/), using the keyword ‘coronavirus’ and the search option ‘nucleotide’. A total 61 complete genomes were retrieved, with unique identity (File S1). Protein sequence of the spike, envelope, membrane and nucleocapsid were also retrieved from the corresponding genome sequences found in NCBI (File S1).
Phylogeny construction and visualization
The complete genome sequences of coronaviruses aene">nd the sequeene">nces of viral n class="Chemical">proteins, such as spike glycoprotein, envelope, membrane and nucleocapsid were employed to construct different phylogenetic trees. Multiple sequence alignment (MSA) of the complete genome and protein sequences were performed using MAFFT v7.310 (Katoh and Standley 2013) tool. For the whole genome alignment, we used MAFFT Auto algorithm, while for the protein sequences alignment, MAFFT G-INS-I algorithm was used using default parameters. Next, alignment was visualized using the JalView-2.11 (Waterhouse et al. 2009). Alignment position with more than 50% gaps was pruned from coronavirus genome using Phyutility 2.2.6 program (Smith and Dunn 2008). Again, more than 20% gaps from the spikeprotein alignment were removed. PartitionFinder-2.1.1 (Lanfear et al., 2016) indicated the best fit substitution model of the completed genome sequences and the protein sequences. The phylogeny of the whole genome sequences of coronavirus was constructed using both the Maximum Likelihood Method and Bayesian Method. RAxML version 8.2.11 (Stamatakis 2014) with the substitution model GTRGAMMAI was used using 1000 rapid bootstrap replicates. MrBayes version 3.2.6 (Ronquist et al. 2012) with INVGAMMA model was used for the corona virus genomes. Phylogenetic analyses of four different protein sequences were performed by using RAxML-8.2.11 tool. For spike and nucleocapsidproteins, we found PROTGAMMAIWAG and PROTGAMMAIWAG as the best fit model, respectively. Again, PROTGAMMAWAG was the best fit model of evolution for both the membrane and envelope proteins. For the retrieval of the domain sequences of the stated protein sequences, InterPro database (https://www.ebi.ac.uk/interpro/) was utilized. Finally, the Interactive Tree of Life (iTOL; EMBL, Heidelberg, Germany) was used for the visualization of the phylogenetic trees. All the trees were rooted in the midpoint.
Identification of conserved regions as vaccine target
In the present study, reverse vaccinology technique was utilized to model a novel multiepitope subunit vaccine against 2019-nCoV. The scheme iene">n Fig. 1 represeene">nts the complete methodology that has beeene">n adopted to develop the fiene">nal vacciene">ne construct. Among 496 n class="Chemical">proteins (available in the NCBI database) from different strains of novel corona virus, four structural proteins, i.e. spike glycoprotein, membrane glycoprotein, envelope protein and nucleocapsidprotein, were prioritized for further investigation (File S2). After sequence retrieval from NCBI, the sequences were subjected to BLASTp analysis to find out the homologous protein sequences. Multiple sequence alignment was done by using Clustal Omega to identify the conserved regions (Sievers and Higgins 2014).The topology of each conserved regions were predicted by TMHMM Server v.2.0 (http://www.cbs.dtu.dk/services/TMHMM/), while the antigenicity of the conserved regions was determined by VaxiJen v2.0 (Doytchinova and Flower 2007a).
Fig. 1
Flow chart summarizing the protocol of multi-epitope subunit vaccine design against SARS-CoV-2 through reverse vaccinology approach.
T-cell epitope prediction, transmembrane topology screening and antigenicity analysis
Only the common fragments were used for T-Cell epitopes enumeration via T-Cell epitope prediction server of IEDB (http://tools.iedb.org/maiene">n/tcell/) (Vita et al., 2015). Agaiene">n, TMHMM server was utilized for the prediction of traene">nsmembraene">ne topology of predicted MHC-I aene">nd MHC-II biene">ndiene">ng peptides followed by aene">ntigeene">nicity scoriene">ng via VaxiJeene">n v2.0 server (Krogh et al. 2001; Doytchiene">nova aene">nd Flower 2007b). The epitopes which have aene">ntigeene">nic poteene">ncy were picked aene">nd used for precediene">ng aene">nalysis.
Flow chart summarizing the protocol of multi-epitope subuene">nit vacciene">ne desigene">n agaiene">nst n class="Species">SARS-CoV-2 through reverse vaccinology approach.
Conservancy analysis and toxicity profiling of the predicted epitopes
The level of conservancy scrutinizes the ability of epitope candidates to impart capacious spectrum immunity. Homologous sequence sets of the chosen antigenic proteiene">ns were retrieved from the NCBI database by utiliziene">ng BLASTp tool. Later, conservaene">ncy aene">nalysis tool (http://tools.iedb.org/conservaene">ncy/) iene">n IEDB was used to demonstn class="Species">rate the conservancy level of the predicted epitopes among different viral strains. The toxicity of non-allergenic epitopes was enumerated by using ToxinPred server (Gupta, 2013).
Population coverage and allergenicity pattern of putative epitopes
Among different ethnic societies and geographic spaces, the HLA distribution varies around the world. Population coverage study was conducted by usiene">ng IEDB populationn class="Species">coverage calculation server (Vita et al., 2015). To check the allergenicity of the proposed epitopes, four distinct servers i.e. AllergenFP (Dimitrov et al. 2014b), AllerTOP (Dimitrov et al., 2014a, Dimitrov et al., 2014b), Allermatch (Fiers et al., and Allergen Online (http://www.allergenonline.org/) servers were utilized.
Identification of B-cell epitopes and conservancy analysis
Three differeene">nt algorithms i.e. Bepipred Liene">near Epitope Prediction 2.0 (Jesperseene">n et al. 2017), Emiene">ni surface accessibility prediction (Emiene">ni et al. 1985) aene">nd Kolaskar aene">nd Tongaonkar aene">ntigeene">nicity scale (Kolaskar aene">nd Tongaonkar 1990) from IEDB predicted the poteene">ntial B-Cell epitopes withiene">n conserved fragmeene">nts of the choseene">n viral n class="Chemical">proteins. To cross check the epitope conservancy, we used 30 strains from 30 different countries using World Health Organization’s most infected countries’ list. We demonstrated the conservancy for both B-cell and T-cell epitopes used to design the final vaccine construct in this study.
Construction of vaccine molecules and prediction of allergenicity, antigenicity and solubility of the constructs
Top CTL, HTL aene">nd B cell epitopes were compiled to desigene">n the fiene">nal vacciene">ne constructs iene">n the study. Each vacciene">ne constructs commeene">nced with aene">n adjuvaene">nt followed by top CTL epitopes, n class="Disease">HTL epitopes and BCL epitopes respectively. For construction of novel corona vaccine, the chosen adjuvants i.e. L7/L12 ribosomal protein, beta defensin (a 45 mer peptide) and HABA protein (M. tuberculosis, accession number: AGV15514.1) were used (Rana and Akhter, 2016). Several linkers such as EAAAK, GGGS, GPGPG and KK in association with PADRE sequence were incorporated to construct fruitful vaccine sequences against COVID-19. The constructed vaccines were then analyzed whether they are non-allergenic by utilizing the following tool named Algpred (Azim et al., 2019a, Azim et al., 2019b). The most potential vaccine among the three constructs was then determined by assessing the antigenicity and solubility of the vaccines via VaxiJen v2.0 (Doytchinova and Flower 2007b) and Proso II server (Smialowski et al., 2007), respectively.
Physicochemical characterization and secondary structure analysis
ProtParam tool (https://web.expasy.org/n class="Chemical">protparam/), provided by ExPASy server (Hasan et al. 2019c) was used to functionally characterize (Gasteiger et al., 2005) the vaccine constructs. The studied functional properties were isoelectric pH, molecular weight, aliphatic index, instability index, hydropathicity, estimated half-life, GRAVY values and other physicochemical characteristics. Alpha helix, beta sheet and coil structures of the vaccine constructs were analyzed through GOR4 secondary structure prediction method using Prabi (https://npsa-prabi.ibcp.fr/). In addition, Espript 3.0 (Robert and Gouet 2014) was also used to predict the secondary structure of the stated protein sequences.
Homology modeling, structure refinement, validation and disulfide engineering
Vaccine 3D model was generated on the basis of perceene">ntage similarity betweeene">n target n class="Chemical">protein and available template structures from PDB by using I-TASSER (Peng and Xu 2011). The modelled structures were further refined via FG-MD refinement server. Structure validation was performed by Ramachandran plot assessment in RAMPAGE (Hasan et al. 2019b). By utilizing DbD2 server, probable disulfide bonds were designed for the anticipated vaccine constructs (Craig and Dombkowski, 2013). The value of energy was considered <2.5, while the chi3 value for the residue screening was chosen between −87 to +97 for the operation (Hasan et al. 2019b).
Conformational B-cell and IFN-α inducing epitopes prediction
The B-cell epitopes of putative vaccine molecules were predicted via ElliPro server (http://tools.iedb.org/ellin class="Chemical">pro/) with minimum score 0.5 and maximum distance of 7 Å (Ponomarenko et al., 2004). Moreover, IFN- inducing epitopes within the vaccine were predicted using IFNepitope with motif and SVM hybrid detection strategy (Hajighahramani et al., 2017).
Molecular dynamics and normal mode analysis (NMA)
Normal mode analysis (NMA) was performed to predict the stability and large scale mobility of the vaccine proteiene">n. The iMod server determiene">ned the stability of construct V3 by compariene">ng the esseene">ntial dyene">namics to the normal modes of n class="Chemical">protein (Aalten et al., 1997; Wüthrich et al., 1980). It is a recommended alternative to costly atomistic simulation (Tama and Brooks 2006; Cui and Bahar 2007) and shows much quicker and efficient assessments than the typical molecular dynamics (MD) simulations tools (Prabhakar et al. 2016; Awan et al. 2017). The main-chain deformability was also predicted by measuring the efficacy of target molecule to deform at each of its residues. The motion stiffness was represented via eigenvalue, while the covariance matrix and elastic network model was also analyzed.
Protein-protein interaction study
Patchdock server was prioritized for docking between different HLA alleles and the putative vaccine molecules (Azim et al. 2020). In addition, the superior construct was also docked with different human immuene">ne receptors such as n class="Gene">ACE 3, APN, DPP4 and TLR-8. The 3D structures of these receptors were retrieved from RCSB protein data bank. Detection of highest binding affinity between the putative vaccine molecules and the receptor was experimented based on the lowest interaction energy of the docked structure.
Codon adaptation and in silico cloning
JCAT tool was utilized for codon adaptation in order to fasten the expression of vaccine construct V3 in E. coli straiene">n K12. For this, some restriction eene">nzymes (i.e. BglI aene">nd BglII), Rho iene">ndepeene">ndeene">nt traene">nscription termiene">nation aene">nd n class="Chemical">prokaryote ribosome-binding site were put away from the work (Grote et al., 2005). After that, the mRNA sequence of constructed V3 vaccine was ligated within BglI (401) and BglII (2187) restriction site at the C-terminal and N-terminal sites respectively. SnapGene tool was utilized for in silico restriction cloning (Solanki and Tiwari 2018).
Results
COVID-19 exhibits close ancestral relation to SARS-CoV-1 and bat-coronavirus
In the phylogenetic analysis, we introduced different coronavirus from n class="Chemical">three different genera: alpha-coronavirus, beta-coronavirus and gamma-coronavirus. Total 61 species of the corona viruscovered 21 sub-genera (Table S1 and Fig. 2). These 61 species of corona viruses included 7 pathogenic species (Fig. 2), which are: COVID-19 or SARS-CoV-2, SARS-CoV-1 virus, MERS virus, HCoV-HKU1, HCoV-OC43, HCoV-NL63, HCoV-229E (Forni et al., 2017; Zhou et al., 2020a, Zhou et al., 2020b; Zumla et al. 2016). Among these, the first five species belong to the beta-coronavirus genera, while the last two belongs to the alpha-genera. Here, none of the members of Gammacoronavirus studied in this study were found to use human as potential host. Apart from the human coronaviruses, we introduced other coronaviruses which choose different species of bats, whale, turkey, rat, mink, ferret, swine, camel, rabbit, cow and others as host (Table S1).
Fig. 2
Phylogeny of 61 species of coronaviruses. Seven pathogenic human coronaviruses have been represented by blue star and the IDs have been made bold. COVID-19 clade has been shown with red colour. SARS-CoV-1 and MERS virus have been represented by orange and blue colors, respectively. The genera have been represented on the left coloured labels. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
The phylogeny of these species clearly revealed two broad clades (Fig. 2), where first large clade contains Gammacoronavirus aene">nd Alpha-n class="Species">coronavirus genera, while the other belongs to the beta-coronavirus. Within the beta-coronavirus clade, we found three clear divisions. HCoV-HKU1 and HCoV-OC43 have been placed in the first clade, while in the second clade we found MERS coronavirus. COVID-19 or SARS-COV-2 formed clade with SARS-COV-1 and bat beta-coronaviruses (Fig. 2), which is consistent with the previous finding (Ceraolo and Giorgi 2020). Though SARS-CoV-1 belongs to the same sub-genus as the COVID-19, the bat coronaviruses belong to two different sub-genera including Hibecovirus and Nobecovirus (File S1).
Evolution of spike proteins based on domain
Domain analysis of spiken class="Chemical">protein of coronaviruses reveals that they contain mainly one signature domains namely, coronavirus S2 glycoprotein (IPR002552), which is present in all the candidates. All other beta-coronavirus contains spike receptor binding protein (IPR018548), coronavirusspike glycoprotein hapted receptor 2 domain (IPR027400) and spike receptor binding domain superfamily (IPR036326). SARS-CoV-1 contains an extra domain, namely spike glycoprotein N-terminal domain (IPR032500), which is also present in some the sub-genera (Embecovirus) of beta-coronavirus, but not in COVID-19. One important finding in our study is that the COVID-19 candidates do not contain the domain spike glycoprotein (IPR042578), which is present in the SARS-CoV-1 (Fig. 3). The secondary structure prediction study shows a large numbers of cysteine residues which contribute to the formation of disulfide bonds within the spikeprotein. Most of them fall within the S1 spikeprotein, which is 654 amino acid long in SARS-CoV-1, while 672 amino acids long in COVID-19. The RGD motif which is conserved within the COVID-19 is present in the vicinity of the S1 protein. It exists as KGD that clearly demonstrates the mutation over the short time period. Again, the receptor binding domain and receptor binding motif analyses disclose variations within several region between the COVID19 and SARS-CoV-1 (File S2). The domain-based phylogenetic analysis reflects two main divisions, where the all the novel beta-coronavirus i.e., COVID19 form clade with the SARS-CoV-1; while other beta-coronavirus fall in another clade which further divide to give rise different sub-genera. This clearly shows that the COVID-19 exerts specific ancestral connection to the SARS-CoV-1 in terms of spike glycoproteins. Interestingly, our study also revealed close relatedness of both the SARS-CoV-1 and COVID-19 to the bat beta-coronavirus that belongs to the Hibecovirus sub-genus. However, in our study, the bat coronaviruses of Nobecovirus sub-genus did not fall into the same clade of novel coronaviruses. The phylogenetic study and MSA also revealed that, the functional portion of the spike glycoprotein domain and spike glycoprotein N-terminal domain might be lost from the COVID-19 during the course of evolution.
Fig. 3
Phylogeny of spike glycoprotein of coronavirus. The sub-genera have been labeled in the left table. The filled and unfilled circles show the presence and absence of the domains labeled on the top.
Domain architecture and ancestral state of envelope proteins
The envelope proteins of both n class="Species">beta-coronavirus and Alpha-coronavirus contain only one protein domain (IPR003873) namely, Nonstructural protein NS3 or small envelope protein E (NS3/E). This domain is well conserved in coronavirus and also found in murine hepatitis virus. On the other hands, the gamma coronavirus shows the exception, which possess (IPR005296) IBV3C protein domain, which thought to be expressed from the ORF3C gene of infectious bronchitis virus (Jia and Naqi 1997) (File S1, File S2). The length of the domains for the COVID-19, SARS-CoV and MARS virus are 75, 76 and 82 amino acids, respectively. on the contrary, the length of the gamma coronavirus candidate in our study which utilizes turkey as host is 99 amino acids.
The average length of the COVID-19n class="Gene">envelope proteins is 75 amino acids long. The NS3/E protein domains span the whole length of the protein and possess mainly one transmembrane domain, one non-cytoplasmic domain and a cytoplasmic domain. However, some species from the sub-genera of Embecovirus shows two transmembrane domains (Fig. 4, File S1, File S2). While in our in-silico study, we found 2 transmembrane domains in SARS-CoV-1 and 2–3 transmembrane domains in MERS virus, previous experiments proved that both contains only one α-helical transmembrane domains (Nieto-Torres et al. 2011; Surya et al. 2015). Though the computational analysis of CoVID-19envelope protein secondary structure shows 2 transmembrane domains, our domain analysis shows only one such domain in their structures. Again, the Turkeycorona viruses also possess the same transmembrane, non-cytoplasmic domain and cytoplasmic domain, with a variation in the orientation. The domain-based phylogeny of the envelope proteins of novel corona viruses reveals close ancestral relationship with the SARS-CoV-1 and bat coronaviruses (Fig. 4). In spite to the previous findings, where it was found that the envelope proteins of the MERS virus and SARS-CoV-1 exerted close proximity in terms of secondary structure and functions (Surya et al. 2015). Unlike to earlier finding, we got that gamma-coronavirus candidate in our study shows close connection with both SARS-CoV-1 and COVID-19 in terms of envelope proteins.
Fig. 4
Phylogeny of envelope proteins of coronaviruses. The sub-genera under three different genera have been shown on the left labels. The star signs represent the COVID-19 virus. 5 different pathogenic human corona viruses have been shown in bold form, including the MERS virus (blue) and SARS-CoV-1 (yellow). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Domain architecture and phylogeny of membrane proteins
Membrane proteiene">ns of all the n class="Species">coronavirus mainly contain coronavirus M matrix/glycoprotein (IPR002574) domain family. However, the candidates of alpha- and gamma-coronaviruses contain M matrix/glycoprotein: Alpha-coronavirus (IPR042551) and M matrix/glycoprotein: gamma-coronavirus (IPR042550) domains, respectively. The next two domains belong to the M matrix/glycoprotein domain family. The membrane proteins of coronaviruses range from 221 to 230 amino acid long. Computational analysis of secondary structures shows some variations of COVID-19 with SARS-CoV-1 and MERS virus (File S3). SARS-CoV-2 possesses alpha helical structure in their structure, while the other two completely devoid of this structure. Again, both SARS-CoV-1 and MARS contain parallel beta sheets, while it is completely absent in the novel corona virus (File S3). The phylogenetic analysis of the membrane supports that the novel coronavirus is closely connected to the SARS-CoV-1 virus membrane protein. As well as it produced connections with bat coronaviruses of Hibecovirus and Nobecovirus sub-genus (Fig. 5).
Fig. 5
Phylogeny of membrane proteins of coronavirus. The sub-genera under three different genera have been shown on the left labels. The blue and orange star represent the SARS-CoV1 and MERS virus, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Domain-based phylogeny of nucleocapsid proteins
The length of nucleocapsidn class="Chemical">proteins of beta-coronavirus genus ranges from 410 to 450 amino acids. Three signature domains are mainly present in the nucleocapsidproteins, which are: CoronavirusNucleocapsidprotein (IPR001218), NucleocapsidProteins C-terminal (IPR037179) and NucleocapsidProteins N-terminal (IPR037195). However, in our experiment, we didn't find these domains in HCoV-HKU1 (Fig. 6
); this virus belongs to Embecovirus sub-genus and contains only CoronavirusNucleocapsid I (IPR004876) domain. The candidates of alpha- and gamma-coronavirus have their special domains. According to our domain-based phylogeny study of nucleocapsidproteins, we found the close approximation of the COVID-19 with the SARS-CoV-1, which is consistent with the findings of phylogeny of whole genome. Strikingly, unlike spike and membrane proteins, the closest homologs of COVID-19 are not only the SARS-COV-1 and bat coronaviruses (Sub-genus: Hibecovirus and Nobecovirus), but it also includes the MERS viruses and other proteins which is from the Merbecovirus sub-genus.
Fig. 6
Phylogeny of nucleocapsid proteins of coronavirus. The sub-genera under three different genera have been shown on the left labels. The filled and unfilled circles show the presence and absence of the domains labeled on the top. COVID-19, SARS-CoV1 and MERS viruses are clades are labeled with blue, green and red colors, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Phylogeny of 61 species of coronaviruses. Seveene">n pathogeene">nic n class="Species">human coronaviruses have been represented by blue star and the IDs have been made bold. COVID-19 clade has been shown with red colour. SARS-CoV-1 and MERS virus have been represented by orange and blue colors, respectively. The genera have been represented on the left coloured labels. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Phylogeny of spike glycon class="Chemical">protein of coronavirus. The sub-genera have been labeled in the left table. The filled and unfilled circles show the presence and absence of the domains labeled on the top.
Phylogeny of envelope proteins of n class="Species">coronaviruses. The sub-genera under three different genera have been shown on the left labels. The star signs represent the COVID-19 virus. 5 different pathogenic humancorona viruses have been shown in bold form, including the MERS virus (blue) and SARS-CoV-1 (yellow). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Phylogeny of membrane proteiene">ns of n class="Species">coronavirus. The sub-genera under three different genera have been shown on the left labels. The blue and orange star represent the SARS-CoV1 and MERS virus, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Phylogeny of nucleocapsidn class="Chemical">proteins of coronavirus. The sub-genera under three different genera have been shown on the left labels. The filled and unfilled circles show the presence and absence of the domains labeled on the top. COVID-19, SARS-CoV1 and MERS viruses are clades are labeled with blue, green and red colors, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
A total 31, 24, 29 and 29 sequences of spike glycon class="Chemical">protein, membrane protein, envelope protein and nucleocapsidprotein were retrieved from the NCBI database belonging to different strains of SARS-CoV-2. Following by BLASTp analysis and Multiple Sequence Alignment, two conserved regions were detected for membrane glycoprotein and nucleocapsid, while single fragments were identified for both spike glycoprotein and envelope protein (Table 1
).Results showed that, all the conserved sequences except one from membrane glycoprotein met the criteria of default threshold standard in VaxiJen. Again, transmembrane topology scrutinizing showed that among the immunogenic conserved sequences from the corresponding proteins except spike glycoprotein met the criteria of desired exomembrane characteristics (Table 1).
Table 1
Identified conserved regions among different homologous protein sets.
Identified conserved regions among different homologous proteiene">n sets.
A plethora of immunogenic epitopes were generated from the conserved sequeene">nces that were able to biene">nd with most noteworthy number of HLA cells (Table S1, Table S2, Table S3, Table S4). Top epitopes with exomembraene">ne characteristics were raene">nked for each iene">ndividual n class="Chemical">protein after investigating their antigenicity score and transmembrane topology (Table 2).
Table 2
Predicted T-cell (CTL and HTL) epitopes of Spike glycoprotein, membrane protein, envelope protein and nucleocapsid protein.
Types
Proteins
Epitope
Start
End
Vaxijen score
No. of HLAs
Conservancy
CTL epitopes
Spike glycoprotein
ADYNYKLPD
26
34
1.3382
81
100% (47/47)
Membrane protein
GIAIAMACL
18
26
1.2059
54
10.6% (5/47)
LACFVLAAV
1
9
1.1825
27
93.6% (44/47)
LVGLMWLSY
26
34
1.0633
54
8.51% (4/47)
LACFVLAAVY
1
10
1.0354
27
91.5% (43/47)
CLVGLMWLSY
25
34
1.0255
27
8.51% (4/47)
LVIGAVILR
18
26
1.1027
54
8.51% (4/47)
ELVIGAVILR
17
26
0.9998
27
8.51% (4/47)
ESELVIGAV
15
23
0.9872
54
100% (47/47)
IGAVILRGH
20
28
0.9127
27
8.51% (4/47)
LESELVIGA
14
22
0.8597
27
23.4% (11/47)
Envelope protein
VKPSFYVYS
52
60
1.0547
27
8.33% (3/36)
FLLVTLAIL
26
34
0.9645
81
88.9% (32/36)
VVFLLVTLA
24
32
0.9374
27
83.3% (30/36)
LAILTALRL
31
39
0.8872
54
91.7% (33/36)
LNSSRVPDL
65
73
0.8553
54
5.56% (2/36)
LLFLAFVVF
18
26
0.8144
81
63.9% (23/36)
VFLLVTLAI
25
33
0.8134
54
88.9% (32/36)
LAFVVFLLV
21
29
0.7976
54
72.2% (26/36)
VSLVKPSFY
49
57
0.7476
81
11.11% (4/36)
FVVFLLVTL
23
31
0.7403
81
83.3% (30/36)
Nucleocapsid
RSGARSKQR
32
40
1.7874
81
6.41% (5/78)
Protein
DLSPRWYFY
103
111
1.7645
81
7.69% (6/78)
DGKMKDLSP
98
106
1.7554
27
7.69% (6/78)
KMKDLSPRW
100
108
1.7462
54
7.69% (6/78)
TQHGKEDLKF
57
66
1.646
54
7.69% (6/78)
KLDDKDPNF
144
152
2.6591
27
93.6% (73/78)
AIKLDDKDP
142
150
2.167
27
8.97% (7/78)
LDDKDPNFK
145
153
1.9433
54
93.6% (73/78)
GAIKLDDKDP
141
150
1.9075
27
39.7% (31/78)
TQGNFGDQE
88
96
1.8694
27
8.97% (7/78)
Spike glycoprotein
SFVIRGDEVRQIAPG
6
20
0.5882
27
8.70% (8/92)
DSFVIRGDEVRQIAP
5
19
0.1792
8.70% (8/92)
Membrane glycoprotein
LACFVLAAVYRINWI
1
15
1.2905
27
8.51% (4/47)
FVLAAVYRINWITGG
4
18
1.0230
27
8.51% (4/47)
AIAMACLVGLMWLSY
20
34
0.9526
27
8.51% (4/47)
IAIAMACLVGLMWLS
19
33
0.9464
27
8.51% (4/47)
ITGGIAIAMACLVGL
15
29
0.9310
27
6.38% (3/47)
LVIGAVILRGHLRIA
18
32
0.8769
27
8.51% (4/47)
ELVIGAVILRGHLRI
17
31
0.7972
27
8.51% (4/47)
PLLESELVIGAVILR
12
26
0.7261
27
8.51% (4/47)
SELVIGAVILRGHLR
16
30
0.6768
27
8.51% (4/47)
LESELVIGAVILRGH
14
28
0.6528
27
8.51% (4/47)
Envelope protein
VTLAILTALRLCAYC
29
43
0.8599
27
75.0% (27/36)
LAFVVFLLVTLAILT
21
35
0.8229
27
63.9% (23/36)
LLFLAFVVFLLVTLA
18
32
0.8122
27
58.3% (21/36)
VSLVKPSFYVYSRVK
49
63
0.7974
27
8.33% (3/36)
VVFLLVTLAILTALR
24
38
0.7559
27
77.8% (28/36)
VNVSLVKPSFYVYSR
47
61
0.7513
27
8.33% (3/36)
FLAFVVFLLVTLAIL
20
34
0.7476
27
63.9% (23/36)
LFLAFVVFLLVTLAI
19
33
0.7471
27
61.1% (22/36)
ILTALRLCAYCCNIV
33
47
0.7427
27
83.3% (30/36)
LVKPSFYVYSRVKNL
51
65
0.7311
27
8.33% (3/36)
Nucleocapsid protein
DLSPRWYFYYLGTGP
103
117
1.5180
27
7.69% (6/78)
LSPRWYFYYLGTGPE
104
118
1.3086
27
100% (78/78)
KDLSPRWYFYYLGTG
102
116
1.2051
27
7.69% (6/78)
GGDGKMKDLSPRWYF
96
110
1.169
27
7.69% (6/78)
GDGKMKDLSPRWYFY
97
11
1.1013
27
7.69% (6/78)
LDDKDPNFKDQVILL
145
159
1.4829
27
8.97% (7/78)
WLTYTGAIKLDDKDP
136
150
1.2787
27
6.41% (5/78)
DDKDPNFKDQVILLN
146
160
1.2508
27
8.97% (7/78)
AFFGMSRIGMEVTPS
119
133
1.1085
27
83.3% (65/78)
DPNFKDQVILLNKHI
149
163
1.1072
27
8.97% (7/78)
Conservancy analysis, toxicity profiling, population coverage and allergenicity pattern of the predicted epitopes
Epitopes from each proteiene">n showed high level of conservaene">ncy up to 100% (Table 2). Toxiene">nPred server predicted the relative n class="Disease">toxicity of each epitope which indicated that the top epitopes were non-toxin in nature (Table S5). Population coverage of four structures proteins were also done for the predicted CTL and HTL epitopes. From the screening, results showed that population of the various geographic regions could be covered by the predicted T-cell epitopes (Fig. 7).Finally, the allergenic epitopes were excluded from the list based on the evaluation of four allergenicity prediction server (Table S5).
Fig. 7
Population coverage analysis of spike protein (A), envelope protein (B), membrane protein (C) and nucleocapsid proteins (D).
Identification of B cell epitopes and conservancy analysis
Top B-cell epitopes were predicted for Spike glycon class="Chemical">protein, membrane protein, envelope protein and nucleocapsidprotein using 3 distinct algorithms (i.e. Bepipred Linear Epitope prediction, Emini Surface Accessibility, Kolaskar and Tongaonkar Antigenicity prediction) from IEDB. Epitopes were also allowed to analyze their vaxijen scoring and allergenicity (Table 3
). Results revealed that top B-cell and T-cell epitopes were highly conserved (up to100%) among different strains of SARS-CoV-2 (Table 4
), while 19 out of final 20 epitopes showed conservancy > 90%. Therefore, the designed vaccine construct is expected to confer broad range immunity in the host.
Table 3
Allergenicity and antigenicity assessment of predicted B-cell epitopes.
Protein
Algorithms
Top peptide sequence
Allergenicity
Vaxijen score
Spike glycoprotein
Bepipred Linear Epitope Prediction 2.0
IRGDEVRQIAPGQTGKIADYNYK
Non-allergen
1.0547
Emini surface accessibility prediction
GDEVRQ
Non-allergen
0.6701
Kolaskar and Tongaonkar antigenicity
VRQIAPG
Non-allergen
1.2611
Membrane glycoprotein
Bepipred Linear Epitope Prediction 2.0
MWSFNPETN
Non-allergen
0.5509
Emini surface accessibility prediction
LFARTRSMWSFNPET
Non-allergen
0.9033
Kolaskar and Tongaonkar antigenicity
AMACLVGLM
Non-allergen
0.6251
Envelope protein
Bepipred Linear Epitope Prediction 2.0
YVYSRVKNLNSSRVP
Non-allergen
0.4492
Emini surface accessibility prediction
PSFYVYSRVKNLNSSRVP
Non-allergen
0.5796
Kolaskar and Tongaonkar antigenicity
VNSVLLFLAFVVFLLVTLA
Non-allergen
0.5893
Nucleocapsid protein
Bepipred Linear Epitope Prediction 2.0
PGSSRGTSPARMAGNGG
Non-allergen
0.3854
Emini surface accessibility prediction
TEPKKDKKKKAD
Non-allergen
0.2378
Kolaskar and Tongaonkar antigenicity
HWPQIAQFAPSASAF
Non-allergen
0.4320
Table 4
Conservancy of the final epitopes among different SARS-CoV-2 strains (retrieved from the list of World Health Organization for top 30 infected regions).
Protein
Epitope
Conservancy
Spike glycoprotein
ADYNYKLPD
100.00% (31/31)
SFVIRGDEVRQIAPG
100.00% (31/31)
IRGDEVRQIAPGQTGKIADYNYK
100.00% (31/31)
GDEVRQ
100.00% (31/31)
VRQIAPG
100.00% (31/31)
Envelope protein
FVVFLLVTL
96.30% (26/27)
LCAYCCNIV
92.59% (25/27)
VTLAILTALRLCAYC
96.30% (26/27)
YVYSRVKNLNSSRVP
92.59% (25/27)
PSFYVYSRVKNLNSSRVP
92.59% (25/27)
Membrane glycoprotein
LVIGAVILR
96.67% (29/30)
LACFVLAAVYRINWI
100.00% (30/30)
MWSFNPETN
100.00% (30/30)
LFARTRSMWSFNPET
100.00% (30/30)
AMACLVGLM
100.00% (30/30)
Nucleocapsid protein
AIKLDDKDP
93.33% (28/30)
TEPKKDKKKKAD
93.33% (28/30)
HWPQIAQFAPSASAF
93.33% (28/30)
DLSPRWYFYYLGTGP
93.33% (28/30)
PGSSRGTSPARMAGNGG
76.67% (23/30)
Population coverage aene">nalysis of n class="Gene">spike protein (A), envelope protein (B), membrane protein (C) and nucleocapsidproteins (D).
Predicted T-cell (CTL and HTL) epitopes of n class="Gene">Spike glycoprotein, membrane protein, envelope protein and nucleocapsidprotein.
Allergenicity and antigenicity assessment of predicted B-cell epitopes.Conservancy of the final epitopes among different SARS-CoV-2 straiene">ns (retrieved from the list of World Health Orgaene">nization for top 30 n class="Disease">infected regions).
Three putative vacciene">ne molecules (i.e. V1, V2 aene">nd V3) were constructed, each comprisiene">ng a n class="Chemical">protein adjuvant, eight T-cell epitopes, twelve B-cell epitopes and respective linkers (Table S6). PADRE sequence was included to extend the efficacy and potency of the constructed vaccine. The putative vaccine constructs, V1, V2 and V3 were 397, 481 and 510 residues long respectively. However, allergenicity score of V3 (−0.89886723) revealed that it was superior among the three constructs in terms safety and efficacy. V3 also had a solubility score (0.60) (Fig. 8E) and antigenicity (0.58) over threshold value (Table 5).
Physicochemical characterization and secondary structure analysis of the construct
ProtParam tool was employed to aene">nalyze the physicochemical n class="Chemical">properties of V3. Molecular weight of 3 was scored as 55.181 kDa. The extinction coefficient of V3 was calculated as 63,830 at 0.1% absorption. It had been found that the protein would have net negative charge which was higher than the recommended pI 9.81. Aliphatic index and GRAVY value were found 77.80 and −0.383 respectively, which could express the thermostability and hydrophilic status of the V3 vaccine construct. Around sixty minutes in vitro half-life stability in mammalian reticulocytes was predicted for V3. The computed instability index (II) 36.98 classified the protein as a stable one. In contrast, Secondary structure of V3 exhibited to have 46.47% alpha helix, 15.00% sheet and 38.63% coil structure (Fig. S1).
Fig. S1
Secondary structure prediction of constructed vaccine protein V3.
Tertiary structure of the putative vaccine construct V3 was generated usiene">ng I-TASSER server (Fig. 8A aene">nd B). The server used 10 best templates with highest sigene">nificaene">nt (measured via Z-score) from the LOMETS n class="Chemical">threading program to model the 3D structure. After refinement, Ramachandran plot analysis revealed that 92.7% and 5.7% residues were in the favored and allowed regions respectively, while only 8 residues (1.6%) occupied in the outlier region (Fig. 8C). The overall quality factor determined by ERRAT server was 91.56% (Fig. 8D). 3D modelled structure of V1 and V2 are shown in Fig. S2. DbD2 server recognized 33 pairs of amino acid residue with the potentiality to create disulfide bond between them. After analysis chi3 and B-factor parameter of residue pairs on the basis of energy, only 2 pairs (PRO 277-THR 329 and LEU 425-CYS 435) met the criteria for disulfide bond formation which were changed with cysteine (Fig. S3).
Fig. S2
3D modelled structure of vaccine protein V1 and V2.
Fig. S3
Disulfide engineering of vaccine protein V3 (A: Initial form, B: Mutant form).
Conformational B-cell and IFN- inducing epitopes prediction
Ellipro server predicted a total 6 conformational B-cell epitopes from the 3D structure of the construct V3. Epitopes No. 1 were considered as the broadest conformational B cell epitopes with 25 amiene">no acid residues (Fig. 9 aene">nd Table S7). Results also revealed that predicted liene">near epitopes from 76 to 101, 131–143 aene">nd 48–55 were iene">ncluded iene">n the conformational B-cell epitopes. Moreover, the sequeene">nce of the fiene">nal vacciene">ne was scaene">nned for 15-mer IFN-iene">nduciene">ng epitopes. Results showed that there were 292 positive IFN- iene">nduciene">ng epitopes from which 20 had a score ≥ 5 (Table S8). Residues of 194–209 regions (GGGSLVIGAVILRGG) iene">n the vacciene">ne showed highest score of 17 (Table S8).
Fig. 9
Predicted conformational epitopes (A and B) and linear epitopes (C and D) within construct V3.
Molecular dynamics and normal mode analysis
Stability of the vaccine construct V3 was investigated through mobility aene">nalysis (Fig. 10A aene">nd B), B-factor, eigeene">nvalue aene">nd deformability aene">nalysis, n class="Species">covariance map and recommended elastic network model. Results revealed that the placements of hinges in the chain was insignificant (Fig. 10C) and the B-factor column gave an averaged RMS (Fig. 10D). The estimated higher eigenvalue 6.341333e−06 (Fig. 10E) indicated low chance of deformation of vaccine protein V3. The correlation matrix and elasticity of the construct have been shown in Fig. 10G and Fig. 10H, respectively.
Fig. 10
Normal Mode Analysis (NMA) of vaccine protein V3. The directions of each residues are given by arrows and the length of the line represented the degree of mobility in the 3D model (A and B). The main-chain deformability derived from high deformability regions indicated by hinges in the chain which are negligible (C). The experimental B-factor was taken from the corresponding PDB field and calculated from NMA (D). The eigenvalue represents the motion stiffness and directly related to the energy required to deform the structure (E). The variance associated to each normal mode is inversely related to the eigenvalue. Coloured bars show the individual (red) and cummulative (green) variances (F). The covariance matrix indicates coupling between pairs of residues, where they may be associated with correlated, uncorrelated or anti-correlated motions, indicated by red, white and blue colors respectively (G). The elastic network model identifies the pairs of atoms connected via springs. Each dot in the diagram is coloured based on extent of stiffness between the corresponding pair of atoms. The darker the greys, the stiffer the springs (H). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Homology modeling, structure validation and solubility prediction of construct V3 (A: Cartoon structure, B: Surface structure, C: Ramachandran Plot analysis, D: Quality factor analysis, E: Solubility analysis).Predicted conformational epitopes (A and B) and linear epitopes (C and D) within construct V3.Allergenicity, antigenicity and solubility prediction of designed vaccine constructs.The bold sequences are linker sequencesNormal Mode Analysis (NMA) of vaccine proteiene">n V3. The directions of each residues are giveene">n by arrows aene">nd the leene">ngth of the liene">ne represeene">nted the degree of mobility iene">n the 3D model (A aene">nd B). The maiene">n-chaiene">n deformability derived from high deformability regions iene">ndicated by hiene">nges iene">n the chaiene">n which are negligible (C). The experimeene">ntal B-factor was takeene">n from the correspondiene">ng PDB field aene">nd calculated from NMA (D). The eigeene">nvalue represeene">nts the motion stiffene">ness aene">nd directly related to the eene">nergy required to deform the structure (E). The variaene">nce associated to each normal mode is iene">nversely related to the eigeene">nvalue. Coloured bars show the iene">ndividual (red) aene">nd cummulative (greeene">n) variaene">nces (F). The n class="Species">covariance matrix indicates coupling between pairs of residues, where they may be associated with correlated, uncorrelated or anti-correlated motions, indicated by red, white and blue colors respectively (G). The elastic network model identifies the pairs of atoms connected via springs. Each dot in the diagram is coloured based on extent of stiffness between the corresponding pair of atoms. The darker the greys, the stiffer the springs (H). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Protein-protein docking
The structural interaction between HLA alleles and the designed vaccines were investigated by molecular docking approach. The server detected the complexed structure by focusiene">ng on complemeene">ntary score, ACE (Atomic Contact Eene">nergy) aene">nd estimated iene">nterface area of the compouene">nd (Table 6
). The molecular affiene">nity betweeene">n the putative vacciene">ne molecules V3 aene">nd several immuene">ne receptors were also experimeene">nted. The result showed that construct V3 iene">nteracted with each receptor with sigene">nificaene">ntly lower biene">ndiene">ng eene">nergy (Fig. 11
).
Table 6
Docking score of vaccine construct V3 with different HLA alleles and human immune receptors.
Vaccine construct
PDB ID's HLAs
Global energy
Hydrogen bond energy
ACE
Score
Area
V1
1a6a
−11.07
−2.61
8.85
17,640
2211.50
1h15
−17.51
−1.51
4.35
15,862
2302.20
2fse
−5.38
0.00
5.61
16,720
2493.50
2q6w
−10.48
−0.39
−1.45
15,348
2289.00
2seb
9.03
0.00
0.20
17,640
2903.00
3c5j
−25.18
−4.60
8.20
16,804
3022.80
V2
1a6a
−37.24
−3.11
−9.54
19,666
2703.90
1h15
−9.33
0.00
4.33
16,132
2312.70
2fse
−0.33
0.00
−0.60
16,702
2956.10
2q6w
−10.85
−3.06
1.20
15,062
1924.40
2seb
1.38
0.00
−0.79
18,008
3034.30
3c5j
10.58
−1.56
5.54
16,158
2385.30
V3
1a6a
−4.14
−1.49
12.24
14,358
2208.30
1h15
−3.15
−4.45
10.93
13,908
1637.70
2fse
−8.68
−2.76
9.98
18,340
3024.40
2q6w
−21.02
−2.41
5.05
17,060
2626.90
2seb
−17.54
−5.91
3.87
16,094
2946.80
3c5j
−2.30
0.00
−2.30
18,550
3041.00
Fig. 11
Docked complex of vaccine construct V3 with human ACE 2 (A), TLR- (B), DPP4 (C) and APN (D).
Docking score of vaccine construct V3 with different HLA alleles and human immuene">ne receptors.
Docked complex of vaccine construct V3 with humann class="Gene">ACE 2 (A), TLR- (B), DPP4 (C) and APN (D).
The Codon Adaptation Index (CAI) and GC content for the predicted codons of the putative vaccine constructs V1 were demonstrated as 1.0 aene">nd 51.56% respectively. Aene">n iene">nsert of 1542 bp was fouene">nd which lacked the restriction sites for BglI aene">nd BglII, thus n class="Chemical">providing comfort zone for cloning. The codons were inserted into pET28a(+) vector alongside two restriction sites (BglI and BglII) and a clone of 5125 base pair was generated (Fig. 12
).
Fig. 12
In silico restriction cloning of the gene sequence of final vaccine construct V3 into pET28a(+) expression vector (A: Restriction digestion of the vector pET28a(+) and construct V3 with BglII and BglI, B: Inserted desired fragment (V3 Construct) between BglII (401) and BglI (1943) indicated in red colour. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
In silico restriction cloning of the gene sequence of final vaccine construct V3 into pET28a(+) expression vector (A: Restriction digestion of the vector pET28a(+) and construct V3 with BglII and BglI, B: Inserted desired fragment (V3 Construct) between BglII (401) and BglI (1943) indicated in red colour. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Discussion
In December 2019, a new coronavirus prevaleene">nce flourished iene">n Wuhaene">n, Chiene">na, causiene">ng clutter among the medical commuene">nity, as well as to the rest of the world (Suene">n et al., 2020a, Suene">n et al., 2020b). The new species has beeene">n reene">named as n class="Species">2019-nCoV or, SARS-CoV-2, already causing considerable number infections and deaths in China, Italy, Spain, Iran, USA and to a growing degree throughout the world. The major outbreak and spread of SARS-CoV-2 in 2020 forced the scientific community to make considerable investment and research activity for developing a vaccine against the pathogen. However, owing to high infectivity and pathogenicity, the culture of SARS-CoV-2 needs biosafety level 3 conditions, which may obstructed the rapid development of any vaccine or therapeutics. It had been found that about 35 companies and academic institutions are engaged in such works (Spinney et al., 2020, Ziady, 2020). Among the potential SARS-CoV-2 vaccines in the pipeline, four have nucleic acid based designs, four involve non-replicating viruses or protein constructs, two contain live attenuated virus and one involves a viral vector (Pang et al. 2020), while only one, called mRNA-1273 (developed by NIAID collaboration with Moderna, Inc.), has confirmed to start phase-1 trial (NIH News Event, 2020). However, in this study we emphasized on a different approaches by prioritizing the advantages of different genome and proteome database using the immunoinformatic approach. Computational vaccine predictions were adopted by the researchers to design vaccines against both MERS-CoV (Sudhakar et al. 2013; Fernando et al., 2013) and SARS-CoV-1 (; Oany et al., 2014), targeting the outer membrane or functional proteins (Sharmin and Islam 2014). Several in silico strategies have also been employed to predict potential T cell and B cell epitopes against SARS-CoV-2, either emphasizing on spike glycoprotein or envelope proteins (Behbahani 2020; Rasheed et al., 2020). None of the studies, however, focused on other structural proteins. Moreover, random genetic changes and mutations in the protein sequences (Yin, 2020) may obstruct the development of effective vaccines and therapeutics against human coronavirus in the future. Hence, the present study was employed to identify the similarity and divergence among the close relatives of the target pathogen and develop a novel chimeric recombinant vaccine considering all major structural proteins i.e. spike glycoprotein, membrane glycoprotein, envelope protein and nucleocapsidprotein simultaneously.
Structural proteiene">ns are ofteene">n choseene">n as attractive therapeutic targets by the researchers due to their surface exposure. n class="Gene">Spike (S) protein is currently the most promising antigen formulation for SARS-CoV-2 vaccine research which is able to be directly recognized by host immune system (Wrapp et al. 2020) and already been used for vaccine development to prevent Severe Acute Respiratory Syndrome (SARS) and Middle-East Respiratory Syndrome (MERS) (Zhou et al. 2018; Du et al. 2009). However, Membrane (M) protein, the most abundant one on the virus surface (Neuman et al. 2011), was also reported to stimulate efficient neutralizing antibodies in SARS patients upon immunization (Pang et al. 2004). Moreover, structural investigation revealed the existence of T-cell epitope cluster on the trans-membrane domain of M protein with the ability to induce a strong cellular immune response (Liu et al. 2010). Thus, it can be utilized as a candidate antigen for developing SARS-CoV-2 vaccine. Researchers have already considered Envelope (E) protein for vaccine candidacy against COVID-19 due to high antigenicity (Abdelmageed et al. 2020). It was also investigated against SARS-CoV in 2003 and more recently, against MERS-CoV (Schoeman and Fielding 2019). Nucleocapsid (N) protein, on the other hand, is believed to be more conserved than spike or membrane glycoprotein with the potential to elicit broad-based cellular responses (Zhao et al. 2005). Previously, cellular response against the N protein of some animal coronaviruses enhanced recovery from viral infection (Glansbeek et al. 2002; Wasmoen et al., 1995; Wesseling et al. 1993). A study showed that N protein of SARS-CoV act as an important B cell immunogen, indicating its potential value in vaccine development against SARS (Zhao et al. 2005). All these data suggest that S, E, M and N protein can induce both humoral and cellular immune responses against SARS-CoV-2.
Phylogenetic analysis is used for fundamental as well as applied issues of virus research and it includes the development of antiviral drugs and vaccines (Gorbalenya and Lauber 2017). Here, we used phylogeny data to perform similarity analysis between different coronavirus subfamiles to fiene">nalize which structural n class="Chemical">protein to be used as vaccine target in this study. We wanted to target the proteins which are most common in each subfamily and exhibit significant similarities to design a broad and universally effective vaccine candidate. The results revealed that each of the four major structural proteins possess highly conserved domains among alpha-, beta- and gamma-coronaviruses. Therefore we considered all major structural proteins to design a multiepitope subunit vaccine which is expected to stimulate more effective and broad response in the host. The topology of the phylogenetic trees of the whole genome and the stated four proteins sequences from different species of coronaviruses reveal that SARS-CoV-1 and bat coronaviruses are the closest homologs of the novel coronaviruses. Our results infer a significant level of similarities within the COVID-19 and SARS-CoV-1 which was also aligned with the previous findings (Jaimes et al., 2020; Sun et al., 2020a, Sun et al., 2020b; Wu 2020). The sequence similarities between the SARS-CoV, bat coronaviruses and the COVID-19 from the reported studies (Hu et al. 2018; Wu et al. 2020; Wu 2020) suggests that those are distantly related, in spite those are capable of infecting the humans and therefore possess the adaptive convergent evolution. Interestingly, the COVID-19envelope proteins form clade with the Turkey coronavirus which belongs to gamma-coronavirus genus. So, in terms of envelope proteins, the envelope gene of turkey coronavirus might contribute to the convergence process, which needs further analysis. In addition, from the domain-based phylogeny of nucleocapsidproteins, it can be deduced that this protein might have originated in bats and was transmitted to camels and then later on choose human as the potential host. Overall, the COVID-19 might go through complex adaptation strategies in order to be transmitted into the human via different animals.
The homologous proteiene">n sets for four structural n class="Chemical">proteins of Coronavirus were sorted to identify conserved regions through BLASTp analysis and MSA. Only the conserved sequences were utilized to identify potential B-cell and T-cell epitopes for each individual protein (Table 1). Thus, our constructs are expected to stimulate a broad-spectrum immunity in host upon administration. Cytotoxic CD8 + T lymphocytes (CTL) play a crucial role to control the spread of pathogens by recognizing and killing diseased cells or by means of antiviral cytokine secretion (Garcia et al. 1999). Thus, T cell epitope-based vaccination is a unique process to confer defensive response against pathogenic candidates (Shrestha and Diamond, 2004). Approximately 800 MHC-I peptides (CTL epitopes) and 600 MHC-II peptides (HTL epitopes) were predicted via IEDB server, from which we screened the top ones through analyzing the antigenicity score, trans-membrane topology, conservancy level and other important physiochemical parameters employing a number of bioinformatics tools (Table 2). The top 10 epitopes from each protein was further assessed by investigating the toxicityprofile and allergenicity pattern. Different servers rely on different parameters to predict the allergenic nature of small peptides. Therefore, we used 4 distinct servers for such assessment and the epitopes predicted as non-allergen at least via 3 servers were retained for further analysis (Table S5). Vaccine initiates the generation of effective antibodies that are usually produced by B cells and plays effector functions by targeting specifically to a foreign particles (Cooper and Nemerow 1984). The potential B cell epitopes were generated by three different algorithms (Bepipred linear epitope prediction 2.0, Kolaskar and Tongaonkar antigenicity prediction and Emini surface accessibility prediction) from IEDB database (Table 3).
Suitable linkers and adjuvants were used to combine top finalized epitopes from each proteiene">n that led to develop multiepitope vacciene">ne molecules (Table S6). As PADRE sequeene">nce was usually recommeene">nded to lesseene">n the polymorphism of HLA molecules iene">n the population (Ghaffari-Nazari et al. 2015), it was also considered to construct the fiene">nal vacciene">ne molecule. Here, adjuvaene">nts would eene">nhaene">nce the immuene">nogeene">nicity of the vacciene">ne constructs aene">nd apn class="Chemical">propriate separation of epitopes in the host environment would be ensured by the linker (Yang et al. 2015). Allergenicity, physiochemical properties, antigenicity and three-dimensional structure of vaccine constructs were characterized, and it was concluded that V3 was superior to V1 and V2 vaccine constructs due to its highly negative allergenicity score (−0.89886723). Therefore, it is expected to be less likely to cause adverse side effects or hypersensitivity. V3 also had a solubility score (0.60) and antigenicity (0.58) over threshold value (Table 5). Hence, construct V3 was proposed as the most suitable and safe vaccine protein for further analysis.
The final construct also occupied by several interferon-α produciene">ng epitopes (Table S8). The vacciene">ne n class="Chemical">protein (V3) was subjected to di-sulfide engineering to enhance its stability. Analysis of the normal modes in internal coordinates by iMODS was employed to investigate the collective motion of vaccine molecules (López-Blanco et al., 2014). Negligible chance of deformability at molecular level was analyzed for the putative vaccine construct V3, thereby strengthening our prediction. Moreover, molecular docking was investigated to analyze the molecular affinity of the vaccine with different HLA molecules i.e. DRB1*0101, DRB5*0101, DRB3*0202, DRB1*0401, DRB3*0101 and DRB1*0301 (Table 6). It had been reported that a specific receptor-binding domain of CoVspikeprotein usually recognizes its host receptor ACE2 (angiotensin-converting enzyme 2) (Li et al., 2003; Li 2015). Previous studies also identified dipeptidyl peptidase 4 (DPP4) as a functional receptor for human coronavirus (Raj et al. 2013). Therefore, we performed another docking study prioritizing these immune receptors to strengthen our prediction (Fig. 11). Results showed that the designed construct bound with the selected receptors with minimum binding energy which was biologically significant. Finally, in-silico restriction cloning was adopted to check the suitability of construct V3 for entry into pET28a (+) vector and expression in E. coli strain K12 (Fig. 12).
Traditional ways to vaccine development are time consuming and laborious. Moreover, the result may not be always as expected or fruitful (Stratton et al., 2003; Hasaene">n et al., 2019a, Hasaene">n et al., 2019b). Iene">n silico prediction aene">nd prescreeene">niene">ng methods, on the contrary, offer some advaene">ntages while saviene">ng time aene">nd cost for n class="Chemical">production. Therefore, the present study may aid in the development of preventive strategies and novel vaccines to combat infections caused by 2019-nCoV. However, further wet lab trials involving model organism needs to be experimented for validating our findings.
The following are the supplementary data related to this articleSecondary structure prediction of constructed vaccineproteiene">n V3.
3D modelled structure of vaccineproteiene">n V1 aene">nd V2.
Disulfide eene">ngiene">neeriene">ng of vacciene">ne n class="Chemical">protein V3 (A: Initial form, B: Mutant form).
Table S1
Predicted CTL and HTL epitopes of n class="Gene">spike glycoprotein.
Table S2
Predicted CTL and HTL epitopes of membraene">ne.
Table S3
Predicted CTL and HTL epitopes of n class="Gene">envelope protein
Table S4
Predicted CTL and HTL epitopes of n class="Gene">nucleocapsid protein.
Table S5
Allergenicity pattern and toxicityn class="Chemical">profile of top T cell epitopes.
Table S6
Proposed CTL aene">nd n class="Disease">HTL epitopes for vaccine construction.
Table S7
Predicted conformational epitopes within construct V3.
Table S8
Predicted IFN alpha produciene">ng epitopes iene">n the vacciene">ne coene">nstruct V3.
File S1
NCBI IDs of the complete genome, spike glycon class="Chemical">protein, envelope protein, membrane protein and nucleocapsidprotein of coronavirus with Genera and Sub-Genera.
File S2
Retrieved proteiene">n sequeene">nces of major structural n class="Chemical">proteins of COVID-19.
File S3
Secondary structure and domain analysis of spike glycon class="Chemical">protein, envelope protein, membrane protein and nucleocapsidproteins.
Credit author statement
Mst Rubaiat Nazneen Akhand: Conceptualization, Methodology, Data curation, Formal aene">nalysis, Software aene">nd Writiene">ng - origiene">nal draft. Kazi Faizul Azim: Conceptualization, Methodology, Software, Writiene">ng - origiene">nal draft Syeda Farjaene">na Hoque: Software, Data Cun class="Species">ration and Writing - original draft Mahmuda Akther Moli: Software, Data Curation and Writing - original draft Bijit Das Joy: Software, Validation, and Visualization Hafsa Akter: Software, Validation and Visualization Ibrahim Khalil Afif: Software and Validation Nadim Ahmed: Software and Validation Mahmudul Hasan: Conceptualization, Supervision, Project administration, Resources and Writing - review & editing.
Funding information
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Declaration of Competing Interest
Authors declare that they have no conflict of interests.
Authors: Daniel Wrapp; Nianshuang Wang; Kizzmekia S Corbett; Jory A Goldsmith; Ching-Lin Hsieh; Olubukola Abiona; Barney S Graham; Jason S McLellan Journal: Science Date: 2020-02-19 Impact factor: 47.728
Authors: Miyssa I Abdelmageed; Abdelrahman H Abdelmoneim; Mujahed I Mustafa; Nafisa M Elfadol; Naseem S Murshed; Shaza W Shantier; Abdelrafie M Makhawi Journal: Biomed Res Int Date: 2020-05-11 Impact factor: 3.411
Authors: Muhammad Saqib Sohail; Syed Faraz Ahmed; Ahmed Abdul Quadeer; Matthew R McKay Journal: Adv Drug Deliv Rev Date: 2021-01-17 Impact factor: 17.873
Authors: Daniel Andrés Dos Santos; María Celina Reynaga; Juan Cruz González; Gabriela Fontanarrosa; María de Lourdes Gultemirian; Agustina Novillo; Virginia Abdala Journal: PeerJ Date: 2022-07-25 Impact factor: 3.061