Literature DB >> 32882432

Genome based evolutionary lineage of SARS-CoV-2 towards the development of novel chimeric vaccine.

Mst Rubaiat Nazneen Akhand¹, Kazi Faizul Azim², Syeda Farjana Hoque³, Mahmuda Akther Moli³, Bijit Das Joy¹, Hafsa Akter⁴, Ibrahim Khalil Afif⁵, Nadim Ahmed⁴, Mahmudul Hasan⁶.

Abstract

The present study aimed to predict a novel chimeric vaccine by simultaneously targeting four major structural proteins via the establishment of ancestral relationship among different strains of coronaviruses. Conserved regions from the homologous protein sets of spike glycoprotein, membrane protein, envelope protein and nucleocapsid protein were identified through multiple sequence alignment. The phylogeny analyses of whole genome stated that four proteins reflected the close ancestral relation of SARS-CoV-2 to SARS-COV-1 and bat coronavirus. Numerous immunogenic epitopes (both T cell and B cell) were generated from the common fragments which were further ranked on the basis of antigenicity, transmembrane topology, conservancy level, toxicity and allergenicity pattern and population coverage analysis. Top putative epitopes were combined with appropriate adjuvants and linkers to construct a novel multiepitope subunit vaccine against COVID-19. The designed constructs were characterized based on physicochemical properties, allergenicity, antigenicity and solubility which revealed the superiority of construct V3 in terms safety and efficacy. Essential molecular dynamics and normal mode analysis confirmed minimal deformability of the refined model at molecular level. In addition, disulfide engineering was investigated to accelerate the stability of the protein. Molecular docking study ensured high binding affinity between construct V3 and HLA cells, as well as with different host receptors. Microbial expression and translational efficacy of the constructs were checked using pET28a(+) vector of E. coli strain K12. However, the in vivo and in vitro validation of suggested vaccine molecule might be ensured with wet lab trials using model animals for the implementation of the presented data.

Entities: Chemical Disease Gene Species

Keywords: COVID-19; Chimeric vaccine; Evolutionary relationship; Molecular docking; Normal mode analysis; Restriction cloning; SARS-CoV-2

Mesh：

Substances：

Year: 2020 PMID： 32882432 PMCID： PMC7462568 DOI： 10.1016/j.meegid.2020.104517

Source DB: PubMed Journal: Infect Genet Evol ISSN： 1567-1348 Impact factor: 3.342

Introduction

Novel coronavirus named SARS-CoV-2/2019-nCoV was identified at the end of 2019 in Wuhan, a city in the Hubei province of China, causing severe pneumonia that leads to huge death cases (Wang et al., 2020). Gradually this virus emerged as a new threat to the whole world and affecting almost all parts of the world. To date, the pathogen has affected 198 countries, and thus becoming a global public health emergency. Global public health concern with pandemic notion of COVID-19 was declared on January 30th, 2020 by the World Health Organization (WHO), 2020a, World Health Organization (WHO), 2020b. Again, an adverse situation has also been announced on 13 March 2020 for increasing the infections of COVID-19 (Kunz and Minder, 2020). Till April 10, 2020, total virus affected people around the world exceeded 1,633,272 and more than 97,601 committed death, while 366,610 people fully recovered from the infection World Health Organization (WHO), 2020a, World Health Organization (WHO), 2020b). The alarming situation is that the number of confirmed cases worldwide has exceeded one million by this time. It took more than three months to reach the first 10,000 confirmed cases, while required only 12 days to detect the next 100,000 cases. The situation is getting worse in European region. Total death cases in Italy, Spain, USA, France, United Kingdom was 14,681,11,744,7847,6507 and 4313 respectively (till April 4, 2020) and this number is exacerbating day by day (World Health Organization (WHO), 2020a, World Health Organization (WHO), 2020b). Some common clinical manifestations of COVID-19 are fever, sputum production and shortness in breath, cough, fatigue, sore throat and headache which leads to severe cases of pneumonia. A few patients also have gastrointestinal symptoms with diarrhea and vomiting (Guan et al., 2020). Though several early studies showed that the mortality rate for SARS-CoV-2 is not as high (2–3%), the latest global death rate for COVID-19 is 3.4% which indicates the increasing trends (Wu et al., 2020). The investigation of Chinese Center for Diseases Control and Prevention (2020) revealed that the prevalence of COVID-19 is more apparent in the people ages 50 years rather than the lower age groups (Jeong-ho et al., 2020). High fever and Lymphocytopenia were found more common in Covid-19, though the frequency of the patient without fever condition is also higher than in the earlier outbreaks caused by SARS-CoV (1%) and MERS-CoV (2%) (Huang et al., 2020; Chen et al., 2020). SARS-CoV-2 is a beta-coronavirus that has a positive sense, 26–32 kb in length, single stranded RNA molecule as its genetic material and belongs to the family Coronaviridae, order Nidovirales (Hui et al., 2020). It shares genome similarity with SARS-CoV (79.5%) and bat coronavirus (96%) (Zhou et al., 2020a, Zhou et al., 2020b; Zhu et al., 2020). However, there are still obscured hypothesis regarding the vector or carrier of SARS-CoV-2, though its detection was primarily linked to Wuhan’s Huanan Seafood wholesale market (Lu et al., 2020; World Health Organization (WHO), 2020a, World Health Organization (WHO), 2020b). Though the species of SARS-CoV-1 and bat coronavirus shares sufficient sequence similarities with the COVID-19, the known way mechanism of infection to the host, and the death rate is quite different in case of the novel coronavirus. In addition, there is an evolutionary distance between SARS-CoV-1 and bat coronavirus as well as the COVID-19 (Hu et al., 2018; Wu et al., 2020; Wu 2020). Because of high sequence variability of the pathogen, many of the efforts that have been undertaken to develop vaccine against SARS-CoV remain unsuccessful (Graham et al. 2013). Therefore, there is an urgent need to develop vaccines for treatment of SARS-CoV-2 based on the understanding of actual evolutionary ancestral relationship. While some natural metabolites and traditional medication may come up with comfort and take the edge off few symptoms of COVID-19, there is no proof that existing treatment procedures can effectively combat against the diseased condition (World Health Organization (WHO), 2020a, World Health Organization (WHO), 2020b). However, inactivated or live-attenuated forms of pathogenic organisms are usually recommended for the initiation of antigen-specific responses that alleviate or reduce the possibility of host experience with secondary infections (Thompson and Staats 2011). Moreover, all of the proteins are not usually targeted for protective immunity, whereas only a few numbers of proteins are necessary depending on the microbes (Tesh et al., 2002, Li et al. 2014). Depending on sufficient antigen expression from experimental assays, traditional vaccine could take 15 years to develop, while sometimes can lead to undesirable consequences (Purcell et al. 2007; Petrovsky and Aguilar 2004). Reverse vaccinology approach, on the other hand, is an effective way to develop vaccine against COVID-19. In this method, computation analysis towards genomic architecture of pathogenic candidate could predict the antigens of pathogens without the prerequisite to culture the pathogens in lab condition. Although, few pathogens that challenge to develop effective vaccines so far may become possible through such approach (Rappuoli 2000) which initiates a huge move in the development of vaccine against the deadly pathogens. The strategy included the comprehensive utilization of bioinformatics algorithm or tools to develop epitope based vaccine molecules, though further validation and experimental procedures are also needed (Moxon et al., 2019). In addition, peptide based subunit vaccines are biologically safer due to the absence of continuous in vitro culture during the production period, and also implies an appropriate activation of immune responses (Purcell et al. 2007; Dudek et al., 2010). Such immunoinformatic approaches have already been employed by the researchers to design vaccines against a number of deadly pathogens including Ebola virus (Khan et al. 2015), HIV (Pandey et al. 2018), Areanaviruses (Azim et al. 2019a), Marburgvirus (Hasan et al. 2019a), Norwalk virus (Azim et al. 2019b), Nipah virus (Saha et al. 2017), Influenza virus (Hasan et al. 2019b) and so on. At present, a suitable peptide vaccine to combat COVID-19 is urgently necessary that could efficiently generate enough immune response to destroy the virus. Hence, the study was designed to develop a chimeric recombinant vaccine against SARS-CoV-2 by targeting four major structural proteins of the pathogen, while revealing the evolutionary history of different species of coronavirus based on whole genome and protein domain-based phylogeny.

Materials and methods

Data acquisition

Complete Genomes of the COVID-19 and other coronaviruses were retrieved from the NCBI (https://www.ncbi.nlm.nih.gov/), using the keyword ‘coronavirus’ and the search option ‘nucleotide’. A total 61 complete genomes were retrieved, with unique identity (File S1). Protein sequence of the spike, envelope, membrane and nucleocapsid were also retrieved from the corresponding genome sequences found in NCBI (File S1).

Phylogeny construction and visualization

The complete genome sequences of coronaviruses and the sequences of viral proteins, such as spike glycoprotein, envelope, membrane and nucleocapsid were employed to construct different phylogenetic trees. Multiple sequence alignment (MSA) of the complete genome and protein sequences were performed using MAFFT v7.310 (Katoh and Standley 2013) tool. For the whole genome alignment, we used MAFFT Auto algorithm, while for the protein sequences alignment, MAFFT G-INS-I algorithm was used using default parameters. Next, alignment was visualized using the JalView-2.11 (Waterhouse et al. 2009). Alignment position with more than 50% gaps was pruned from coronavirus genome using Phyutility 2.2.6 program (Smith and Dunn 2008). Again, more than 20% gaps from the spike protein alignment were removed. PartitionFinder-2.1.1 (Lanfear et al., 2016) indicated the best fit substitution model of the completed genome sequences and the protein sequences. The phylogeny of the whole genome sequences of coronavirus was constructed using both the Maximum Likelihood Method and Bayesian Method. RAxML version 8.2.11 (Stamatakis 2014) with the substitution model GTRGAMMAI was used using 1000 rapid bootstrap replicates. MrBayes version 3.2.6 (Ronquist et al. 2012) with INVGAMMA model was used for the corona virus genomes. Phylogenetic analyses of four different protein sequences were performed by using RAxML-8.2.11 tool. For spike and nucleocapsid proteins, we found PROTGAMMAIWAG and PROTGAMMAIWAG as the best fit model, respectively. Again, PROTGAMMAWAG was the best fit model of evolution for both the membrane and envelope proteins. For the retrieval of the domain sequences of the stated protein sequences, InterPro database (https://www.ebi.ac.uk/interpro/) was utilized. Finally, the Interactive Tree of Life (iTOL; EMBL, Heidelberg, Germany) was used for the visualization of the phylogenetic trees. All the trees were rooted in the midpoint.

Identification of conserved regions as vaccine target

In the present study, reverse vaccinology technique was utilized to model a novel multiepitope subunit vaccine against 2019-nCoV. The scheme in Fig. 1 represents the complete methodology that has been adopted to develop the final vaccine construct. Among 496 proteins (available in the NCBI database) from different strains of novel corona virus, four structural proteins, i.e. spike glycoprotein, membrane glycoprotein, envelope protein and nucleocapsid protein, were prioritized for further investigation (File S2). After sequence retrieval from NCBI, the sequences were subjected to BLASTp analysis to find out the homologous protein sequences. Multiple sequence alignment was done by using Clustal Omega to identify the conserved regions (Sievers and Higgins 2014).The topology of each conserved regions were predicted by TMHMM Server v.2.0 (http://www.cbs.dtu.dk/services/TMHMM/), while the antigenicity of the conserved regions was determined by VaxiJen v2.0 (Doytchinova and Flower 2007a).

Fig. 1

Flow chart summarizing the protocol of multi-epitope subunit vaccine design against SARS-CoV-2 through reverse vaccinology approach.

T-cell epitope prediction, transmembrane topology screening and antigenicity analysis

Only the common fragments were used for T-Cell epitopes enumeration via T-Cell epitope prediction server of IEDB (http://tools.iedb.org/main/tcell/) (Vita et al., 2015). Again, TMHMM server was utilized for the prediction of transmembrane topology of predicted MHC-I and MHC-II binding peptides followed by antigenicity scoring via VaxiJen v2.0 server (Krogh et al. 2001; Doytchinova and Flower 2007b). The epitopes which have antigenic potency were picked and used for preceding analysis. Flow chart summarizing the protocol of multi-epitope subunit vaccine desigene">n against SARS-CoV-2 through reverse vaccinology approach.

Conservancy analysis and toxicity profiling of the predicted epitopes

The level of conservancy scrutinizes the ability of epitope candidates to impart capacious spectrum immunity. Homologous sequence sets of the chosen antigenic proteins were retrieved from the NCBI database by utilizing BLASTp tool. Later, conservancy analysis tool (http://tools.iedb.org/conservancy/) in IEDB was used to demonstrate the conservancy level of the predicted epitopes among different viral strains. The toxicity of non-allergenic epitopes was enumerated by using ToxinPred server (Gupta, 2013).

Population coverage and allergenicity pattern of putative epitopes

Among different ethnic societies and geographic spaces, the HLA distribution varies around the world. Population coverage study was conducted by using IEDB population coverage calculation server (Vita et al., 2015). To check the allergenicity of the proposed epitopes, four distinct servers i.e. AllergenFP (Dimitrov et al. 2014b), AllerTOP (Dimitrov et al., 2014a, Dimitrov et al., 2014b), Allermatch (Fiers et al., and Allergen Online (http://www.allergenonline.org/) servers were utilized.

Identification of B-cell epitopes and conservancy analysis

Three different algorithms i.e. Bepipred Linear Epitope Prediction 2.0 (Jespersen et al. 2017), Emini surface accessibility prediction (Emini et al. 1985) and Kolaskar and Tongaonkar antigenicity scale (Kolaskar and Tongaonkar 1990) from IEDB predicted the potential B-Cell epitopes within conserved fragments of the chosen viral proteins. To cross check the epitope conservancy, we used 30 strains from 30 different countries using World Health Organization’s most infected countries’ list. We demonstrated the conservancy for both B-cell and T-cell epitopes used to design the final vaccine construct in this study.

Construction of vaccine molecules and prediction of allergenicity, antigenicity and solubility of the constructs

Top CTL, HTL and B cell epitopes were compiled to desigene">n the final vaccine constructs in the study. Each vaccine constructs commenced with an adjuvant followed by top CTL epitopes, HTL epitopes and BCL epitopes respectively. For construction of novel corona vaccine, the chosen adjuvants i.e. L7/L12 ribosomal protein, beta defensin (a 45 mer peptide) and HABA protein (M. tuberculosis, accession number: AGV15514.1) were used (Rana and Akhter, 2016). Several linkers such as EAAAK, GGGS, GPGPG and KK in association with PADRE sequence were incorporated to construct fruitful vaccine sequences against COVID-19. The constructed vaccines were then analyzed whether they are non-allergenic by utilizing the following tool named Algpred (Azim et al., 2019a, Azim et al., 2019b). The most potential vaccine among the three constructs was then determined by assessing the antigenicity and solubility of the vaccines via VaxiJen v2.0 (Doytchinova and Flower 2007b) and Proso II server (Smialowski et al., 2007), respectively.

Physicochemical characterization and secondary structure analysis

ProtParam tool (https://web.expasy.org/protparam/), provided by ExPASy server (Hasan et al. 2019c) was used to functionally characterize (Gasteiger et al., 2005) the vaccine constructs. The studied functional properties were isoelectric pH, molecular weight, aliphatic index, instability index, hydropathicity, estimated half-life, GRAVY values and other physicochemical characteristics. Alpha helix, beta sheet and coil structures of the vaccine constructs were analyzed through GOR4 secondary structure prediction method using Prabi (https://npsa-prabi.ibcp.fr/). In addition, Espript 3.0 (Robert and Gouet 2014) was also used to predict the secondary structure of the stated protein sequences.

Homology modeling, structure refinement, validation and disulfide engineering

Vaccine 3D model was generated on the basis of percentage similarity between target protein and available template structures from PDB by using I-TASSER (Peng and Xu 2011). The modelled structures were further refined via FG-MD refinement server. Structure validation was performed by Ramachandran plot assessment in RAMPAGE (Hasan et al. 2019b). By utilizing DbD2 server, probable disulfide bonds were designed for the anticipated vaccine constructs (Craig and Dombkowski, 2013). The value of energy was considered <2.5, while the chi3 value for the residue screening was chosen between −87 to +97 for the operation (Hasan et al. 2019b).

Conformational B-cell and IFN-α inducing epitopes prediction

The B-cell epitopes of putative vaccine molecules were predicted via ElliPro server (http://tools.iedb.org/ellipro/) with minimum score 0.5 and maximum distance of 7 Å (Ponomarenko et al., 2004). Moreover, IFN- inducing epitopes within the vaccine were predicted using IFNepitope with motif and SVM hybrid detection strategy (Hajighahramani et al., 2017).

Molecular dynamics and normal mode analysis (NMA)

Normal mode analysis (NMA) was performed to predict the stability and large scale mobility of the vaccine protein. The iMod server determined the stability of construct V3 by comparing the essential dynamics to the normal modes of protein (Aalten et al., 1997; Wüthrich et al., 1980). It is a recommended alternative to costly atomistic simulation (Tama and Brooks 2006; Cui and Bahar 2007) and shows much quicker and efficient assessments than the typical molecular dynamics (MD) simulations tools (Prabhakar et al. 2016; Awan et al. 2017). The main-chain deformability was also predicted by measuring the efficacy of target molecule to deform at each of its residues. The motion stiffness was represented via eigenvalue, while the covariance matrix and elastic network model was also analyzed.

Protein-protein interaction study

Patchdock server was prioritized for docking between different HLA alleles and the putative vaccine molecules (Azim et al. 2020). In addition, the superior construct was also docked with different human immune receptors such as ACE 3, APN, DPP4 and TLR-8. The 3D structures of these receptors were retrieved from RCSB protein data bank. Detection of highest binding affinity between the putative vaccine molecules and the receptor was experimented based on the lowest interaction energy of the docked structure.

Codon adaptation and in silico cloning

JCAT tool was utilized for codon adaptation in order to fasten the expression of vaccine construct V3 in E. coli strain K12. For this, some restriction enzymes (i.e. BglI and BglII), Rho independent transcription termination and prokaryote ribosome-binding site were put away from the work (Grote et al., 2005). After that, the mRNA sequence of constructed V3 vaccine was ligated within BglI (401) and BglII (2187) restriction site at the C-terminal and N-terminal sites respectively. SnapGene tool was utilized for in silico restriction cloning (Solanki and Tiwari 2018).

Results

COVID-19 exhibits close ancestral relation to SARS-CoV-1 and bat-coronavirus

In the phylogenetic analysis, we introduced different coronavirus from three different genera: alpha-coronavirus, beta-coronavirus and gamma-coronavirus. Total 61 species of the corona virus covered 21 sub-genera (Table S1 and Fig. 2). These 61 species of corona viruses included 7 pathogenic species (Fig. 2), which are: COVID-19 or SARS-CoV-2, SARS-CoV-1 virus, MERS virus, HCoV-HKU1, HCoV-OC43, HCoV-NL63, HCoV-229E (Forni et al., 2017; Zhou et al., 2020a, Zhou et al., 2020b; Zumla et al. 2016). Among these, the first five species belong to the beta-coronavirus genera, while the last two belongs to the alpha-genera. Here, none of the members of Gammacoronavirus studied in this study were found to use human as potential host. Apart from the human coronaviruses, we introduced other coronaviruses which choose different species of bats, whale, turkey, rat, mink, ferret, swine, camel, rabbit, cow and others as host (Table S1).

Fig. 2

The phylogeny of these species clearly revealed two broad clades (Fig. 2), where first large clade contains Gammacoronavirus and Alpha-coronavirus genera, while the other belongs to the beta-coronavirus. Within the beta-coronavirus clade, we found three clear divisions. HCoV-HKU1 and HCoV-OC43 have been placed in the first clade, while in the second clade we found MERS coronavirus. COVID-19 or SARS-COV-2 formed clade with SARS-COV-1 and bat beta-coronaviruses (Fig. 2), which is consistent with the previous finding (Ceraolo and Giorgi 2020). Though SARS-CoV-1 belongs to the same sub-genus as the COVID-19, the bat coronaviruses belong to two different sub-genera including Hibecovirus and Nobecovirus (File S1).

Evolution of spike proteins based on domain

Domain analysis of spike protein of coronaviruses reveals that they contain mainly one signature domains namely, coronavirus S2 glycoprotein (IPR002552), which is present in all the candidates. All other beta-coronavirus contains spike receptor binding protein (IPR018548), coronavirus spike glycoprotein hapted receptor 2 domain (IPR027400) and spike receptor binding domain superfamily (IPR036326). SARS-CoV-1 contains an extra domain, namely spike glycoprotein N-terminal domain (IPR032500), which is also present in some the sub-genera (Embecovirus) of beta-coronavirus, but not in COVID-19. One important finding in our study is that the COVID-19 candidates do not contain the domain spike glycoprotein (IPR042578), which is present in the SARS-CoV-1 (Fig. 3). The secondary structure prediction study shows a large numbers of cysteine residues which contribute to the formation of disulfide bonds within the spike protein. Most of them fall within the S1 spike protein, which is 654 amino acid long in SARS-CoV-1, while 672 amino acids long in COVID-19. The RGD motif which is conserved within the COVID-19 is present in the vicinity of the S1 protein. It exists as KGD that clearly demonstrates the mutation over the short time period. Again, the receptor binding domain and receptor binding motif analyses disclose variations within several region between the COVID19 and SARS-CoV-1 (File S2). The domain-based phylogenetic analysis reflects two main divisions, where the all the novel beta-coronavirus i.e., COVID19 form clade with the SARS-CoV-1; while other beta-coronavirus fall in another clade which further divide to give rise different sub-genera. This clearly shows that the COVID-19 exerts specific ancestral connection to the SARS-CoV-1 in terms of spike glycoproteins. Interestingly, our study also revealed close relatedness of both the SARS-CoV-1 and COVID-19 to the bat beta-coronavirus that belongs to the Hibecovirus sub-genus. However, in our study, the bat coronaviruses of Nobecovirus sub-genus did not fall into the same clade of novel coronaviruses. The phylogenetic study and MSA also revealed that, the functional portion of the spike glycoprotein domain and spike glycoprotein N-terminal domain might be lost from the COVID-19 during the course of evolution.

Fig. 3

Phylogeny of spike glycoprotein of coronavirus. The sub-genera have been labeled in the left table. The filled and unfilled circles show the presence and absence of the domains labeled on the top.

Domain architecture and ancestral state of envelope proteins

The envelope proteins of both beta-coronavirus and Alpha-coronavirus contain only one protein domain (IPR003873) namely, Nonstructural protein NS3 or small envelope protein E (NS3/E). This domain is well conserved in coronavirus and also found in murine hepatitis virus. On the other hands, the gamma coronavirus shows the exception, which possess (IPR005296) IBV3C protein domain, which thought to be expressed from the ORF3C gene of infectious bronchitis virus (Jia and Naqi 1997) (File S1, File S2). The length of the domains for the COVID-19, SARS-CoV and MARS virus are 75, 76 and 82 amino acids, respectively. on the contrary, the length of the gamma coronavirus candidate in our study which utilizes turkey as host is 99 amino acids. The average length of the COVID-19 envelope proteins is 75 amino acids long. The NS3/E protein domains span the whole length of the protein and possess mainly one transmembrane domain, one non-cytoplasmic domain and a cytoplasmic domain. However, some species from the sub-genera of Embecovirus shows two transmembrane domains (Fig. 4, File S1, File S2). While in our in-silico study, we found 2 transmembrane domains in SARS-CoV-1 and 2–3 transmembrane domains in MERS virus, previous experiments proved that both contains only one α-helical transmembrane domains (Nieto-Torres et al. 2011; Surya et al. 2015). Though the computational analysis of CoVID-19 envelope protein secondary structure shows 2 transmembrane domains, our domain analysis shows only one such domain in their structures. Again, the Turkey corona viruses also possess the same transmembrane, non-cytoplasmic domain and cytoplasmic domain, with a variation in the orientation. The domain-based phylogeny of the envelope proteins of novel corona viruses reveals close ancestral relationship with the SARS-CoV-1 and bat coronaviruses (Fig. 4). In spite to the previous findings, where it was found that the envelope proteins of the MERS virus and SARS-CoV-1 exerted close proximity in terms of secondary structure and functions (Surya et al. 2015). Unlike to earlier finding, we got that gamma-coronavirus candidate in our study shows close connection with both SARS-CoV-1 and COVID-19 in terms of envelope proteins.

Fig. 4

Phylogeny of envelope proteins of coronaviruses. The sub-genera under three different genera have been shown on the left labels. The star signs represent the COVID-19 virus. 5 different pathogenic human corona viruses have been shown in bold form, including the MERS virus (blue) and SARS-CoV-1 (yellow). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Domain architecture and phylogeny of membrane proteins

Membrane proteins of all the coronavirus mainly contain coronavirus M matrix/glycoprotein (IPR002574) domain family. However, the candidates of alpha- and gamma-coronaviruses contain M matrix/glycoprotein: Alpha-coronavirus (IPR042551) and M matrix/glycoprotein: gamma-coronavirus (IPR042550) domains, respectively. The next two domains belong to the M matrix/glycoprotein domain family. The membrane proteins of coronaviruses range from 221 to 230 amino acid long. Computational analysis of secondary structures shows some variations of COVID-19 with SARS-CoV-1 and MERS virus (File S3). SARS-CoV-2 possesses alpha helical structure in their structure, while the other two completely devoid of this structure. Again, both SARS-CoV-1 and MARS contain parallel beta sheets, while it is completely absent in the novel corona virus (File S3). The phylogenetic analysis of the membrane supports that the novel coronavirus is closely connected to the SARS-CoV-1 virus membrane protein. As well as it produced connections with bat coronaviruses of Hibecovirus and Nobecovirus sub-genus (Fig. 5).

Fig. 5

Phylogeny of membrane proteins of coronavirus. The sub-genera under three different genera have been shown on the left labels. The blue and orange star represent the SARS-CoV1 and MERS virus, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Domain-based phylogeny of nucleocapsid proteins

The length of nucleocapsid proteins of beta-coronavirus genus ranges from 410 to 450 amino acids. Three signature domains are mainly present in the nucleocapsid proteins, which are: Coronavirus Nucleocapsid protein (IPR001218), Nucleocapsid Proteins C-terminal (IPR037179) and Nucleocapsid Proteins N-terminal (IPR037195). However, in our experiment, we didn't find these domains in HCoV-HKU1 (Fig. 6 ); this virus belongs to Embecovirus sub-genus and contains only Coronavirus Nucleocapsid I (IPR004876) domain. The candidates of alpha- and gamma-coronavirus have their special domains. According to our domain-based phylogeny study of nucleocapsid proteins, we found the close approximation of the COVID-19 with the SARS-CoV-1, which is consistent with the findings of phylogeny of whole genome. Strikingly, unlike spike and membrane proteins, the closest homologs of COVID-19 are not only the SARS-COV-1 and bat coronaviruses (Sub-genus: Hibecovirus and Nobecovirus), but it also includes the MERS viruses and other proteins which is from the Merbecovirus sub-genus.

Fig. 6

Phylogeny of nucleocapsid proteins of coronavirus. The sub-genera under three different genera have been shown on the left labels. The filled and unfilled circles show the presence and absence of the domains labeled on the top. COVID-19, SARS-CoV1 and MERS viruses are clades are labeled with blue, green and red colors, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Phylogeny of 61 species of coronaviruses. Seven pathogenic human coronaviruses have been represented by blue star and the IDs have been made bold. COVID-19 clade has been shown with red colour. SARS-CoV-1 and MERS virus have been represented by orange and blue colors, respectively. The genera have been represented on the left coloured labels. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Phylogeny of spike glycoprotein of coronavirus. The sub-genera have been labeled in the left table. The filled and unfilled circles show the presence and absence of the domains labeled on the top. Phylogeny of envelope proteins of coronaviruses. The sub-genera under three different genera have been shown on the left labels. The star signs represent the COVID-19 virus. 5 different pathogenic human corona viruses have been shown in bold form, including the MERS virus (blue) and SARS-CoV-1 (yellow). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Phylogeny of membrane proteins of coronavirus. The sub-genera under three different genera have been shown on the left labels. The blue and orange star represent the SARS-CoV1 and MERS virus, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Phylogeny of nucleocapsid proteins of coronavirus. The sub-genera under three different genera have been shown on the left labels. The filled and unfilled circles show the presence and absence of the domains labeled on the top. COVID-19, SARS-CoV1 and MERS viruses are clades are labeled with blue, green and red colors, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) A total 31, 24, 29 and 29 sequences of spike glycoprotein, membrane protein, envelope protein and nucleocapsid protein were retrieved from the NCBI database belonging to different strains of SARS-CoV-2. Following by BLASTp analysis and Multiple Sequence Alignment, two conserved regions were detected for membrane glycoprotein and nucleocapsid, while single fragments were identified for both spike glycoprotein and envelope protein (Table 1 ).Results showed that, all the conserved sequences except one from membrane glycoprotein met the criteria of default threshold standard in VaxiJen. Again, transmembrane topology scrutinizing showed that among the immunogenic conserved sequences from the corresponding proteins except spike glycoprotein met the criteria of desired exomembrane characteristics (Table 1).

Table 1

Identified conserved regions among different homologous protein sets.

Protein	Conserved region	Vaxijen score	Topology
Spike glycoprotein	NVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDD	0.5471	Inside
Membrane protein	LACFVLAAVYRINWITGGIAIAMACLVGLMWLSYFIASFRLFARTRSMWSFNPETNILL	0.7352	Outside
Membrane protein	NVPLHGTILTRPLLESELVIGAVILRGHLRIAGHHLGRCDIKDLPKEI	0.2688	Outside
Envelope protein	MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV	0.6025	Outside
Nucleocapsid protein	MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSGARSKQRRPQGLPNNTASWFTALTQHGKEDLKFPRGQGVPINTNSSPDDQIGYYRRATRRIRGGDGKMKDLSPRWYFYYLGTGPEAGLPYGANKDGIIWVATEGALNTPKDHIGTRNPANNAAIVLQLPQGTTLPKGFYAEGSRGGSQASSRSSSRSRNS	0.3985	Outside
Nucleocapsid protein	RNSTPGSSRGTSPARMAGNGGDAALALLLLDRLNQLESKMSGKGQQQQGQTVTKKSAAEASKKPRQKRTATKAYNVTQAFGRRGPEQTQGNFGDQELIRQGTDYKHWPQIAQFAPSASAFFGMSRIGMEVTPSGTWLTYTGAIKLDDKDPNFKDQVILLNKHIDAYKTFPPTEPKKDKKKKADETQALPQRQKKQQTVTLLPAADLDDFSKQLQQSMSSADSTQA	0.5965	Outside

Identified conserved regions among different homologous protein sets. A plethora of immunogenic epitopes were generated from the conserved sequences that were able to bind with most noteworthy number of HLA cells (Table S1, Table S2, Table S3, Table S4). Top epitopes with exomembrane characteristics were ranked for each individual protein after investigating their antigenicity score and transmembrane topology (Table 2).

Table 2

Predicted T-cell (CTL and HTL) epitopes of Spike glycoprotein, membrane protein, envelope protein and nucleocapsid protein.

Types	Proteins	Epitope	Start	End	Vaxijen score	No. of HLAs	Conservancy
CTL epitopes	Spike glycoprotein	ADYNYKLPD	26	34	1.3382	81	100% (47/47)
	Membrane protein	GIAIAMACL	18	26	1.2059	54	10.6% (5/47)
		LACFVLAAV	1	9	1.1825	27	93.6% (44/47)
		LVGLMWLSY	26	34	1.0633	54	8.51% (4/47)
		LACFVLAAVY	1	10	1.0354	27	91.5% (43/47)
		CLVGLMWLSY	25	34	1.0255	27	8.51% (4/47)
		LVIGAVILR	18	26	1.1027	54	8.51% (4/47)
		ELVIGAVILR	17	26	0.9998	27	8.51% (4/47)
		ESELVIGAV	15	23	0.9872	54	100% (47/47)
		IGAVILRGH	20	28	0.9127	27	8.51% (4/47)
		LESELVIGA	14	22	0.8597	27	23.4% (11/47)
	Envelope protein	VKPSFYVYS	52	60	1.0547	27	8.33% (3/36)
		FLLVTLAIL	26	34	0.9645	81	88.9% (32/36)
		VVFLLVTLA	24	32	0.9374	27	83.3% (30/36)
		LAILTALRL	31	39	0.8872	54	91.7% (33/36)
		LNSSRVPDL	65	73	0.8553	54	5.56% (2/36)
		LLFLAFVVF	18	26	0.8144	81	63.9% (23/36)
		VFLLVTLAI	25	33	0.8134	54	88.9% (32/36)
		LAFVVFLLV	21	29	0.7976	54	72.2% (26/36)
		VSLVKPSFY	49	57	0.7476	81	11.11% (4/36)
		FVVFLLVTL	23	31	0.7403	81	83.3% (30/36)
	Nucleocapsid	RSGARSKQR	32	40	1.7874	81	6.41% (5/78)
	Protein	DLSPRWYFY	103	111	1.7645	81	7.69% (6/78)
		DGKMKDLSP	98	106	1.7554	27	7.69% (6/78)
		KMKDLSPRW	100	108	1.7462	54	7.69% (6/78)
		TQHGKEDLKF	57	66	1.646	54	7.69% (6/78)
		KLDDKDPNF	144	152	2.6591	27	93.6% (73/78)
		AIKLDDKDP	142	150	2.167	27	8.97% (7/78)
		LDDKDPNFK	145	153	1.9433	54	93.6% (73/78)
		GAIKLDDKDP	141	150	1.9075	27	39.7% (31/78)
		TQGNFGDQE	88	96	1.8694	27	8.97% (7/78)
	Spike glycoprotein	SFVIRGDEVRQIAPG	6	20	0.5882	27	8.70% (8/92)
	Spike glycoprotein	DSFVIRGDEVRQIAP	5	19	0.1792		8.70% (8/92)
	Membrane glycoprotein	LACFVLAAVYRINWI	1	15	1.2905	27	8.51% (4/47)
		FVLAAVYRINWITGG	4	18	1.0230	27	8.51% (4/47)
		AIAMACLVGLMWLSY	20	34	0.9526	27	8.51% (4/47)
		IAIAMACLVGLMWLS	19	33	0.9464	27	8.51% (4/47)
		ITGGIAIAMACLVGL	15	29	0.9310	27	6.38% (3/47)
		LVIGAVILRGHLRIA	18	32	0.8769	27	8.51% (4/47)
		ELVIGAVILRGHLRI	17	31	0.7972	27	8.51% (4/47)
		PLLESELVIGAVILR	12	26	0.7261	27	8.51% (4/47)
		SELVIGAVILRGHLR	16	30	0.6768	27	8.51% (4/47)
		LESELVIGAVILRGH	14	28	0.6528	27	8.51% (4/47)
	Envelope protein	VTLAILTALRLCAYC	29	43	0.8599	27	75.0% (27/36)
		LAFVVFLLVTLAILT	21	35	0.8229	27	63.9% (23/36)
		LLFLAFVVFLLVTLA	18	32	0.8122	27	58.3% (21/36)
		VSLVKPSFYVYSRVK	49	63	0.7974	27	8.33% (3/36)
		VVFLLVTLAILTALR	24	38	0.7559	27	77.8% (28/36)
		VNVSLVKPSFYVYSR	47	61	0.7513	27	8.33% (3/36)
		FLAFVVFLLVTLAIL	20	34	0.7476	27	63.9% (23/36)
		LFLAFVVFLLVTLAI	19	33	0.7471	27	61.1% (22/36)
		ILTALRLCAYCCNIV	33	47	0.7427	27	83.3% (30/36)
		LVKPSFYVYSRVKNL	51	65	0.7311	27	8.33% (3/36)
	Nucleocapsid protein	DLSPRWYFYYLGTGP	103	117	1.5180	27	7.69% (6/78)
		LSPRWYFYYLGTGPE	104	118	1.3086	27	100% (78/78)
		KDLSPRWYFYYLGTG	102	116	1.2051	27	7.69% (6/78)
		GGDGKMKDLSPRWYF	96	110	1.169	27	7.69% (6/78)
		GDGKMKDLSPRWYFY	97	11	1.1013	27	7.69% (6/78)
		LDDKDPNFKDQVILL	145	159	1.4829	27	8.97% (7/78)
		WLTYTGAIKLDDKDP	136	150	1.2787	27	6.41% (5/78)
		DDKDPNFKDQVILLN	146	160	1.2508	27	8.97% (7/78)
		AFFGMSRIGMEVTPS	119	133	1.1085	27	83.3% (65/78)
		DPNFKDQVILLNKHI	149	163	1.1072	27	8.97% (7/78)

Conservancy analysis, toxicity profiling, population coverage and allergenicity pattern of the predicted epitopes

Epitopes from each protein showed high level of conservancy up to 100% (Table 2). ToxinPred server predicted the relative toxicity of each epitope which indicated that the top epitopes were non-toxin in nature (Table S5). Population coverage of four structures proteins were also done for the predicted CTL and HTL epitopes. From the screening, results showed that population of the various geographic regions could be covered by the predicted T-cell epitopes (Fig. 7).Finally, the allergenic epitopes were excluded from the list based on the evaluation of four allergenicity prediction server (Table S5).

Fig. 7

Population coverage analysis of spike protein (A), envelope protein (B), membrane protein (C) and nucleocapsid proteins (D).

Identification of B cell epitopes and conservancy analysis

Top B-cell epitopes were predicted for Spike glycoprotein, membrane protein, envelope protein and nucleocapsid protein using 3 distinct algorithms (i.e. Bepipred Linear Epitope prediction, Emini Surface Accessibility, Kolaskar and Tongaonkar Antigenicity prediction) from IEDB. Epitopes were also allowed to analyze their vaxijen scoring and allergenicity (Table 3 ). Results revealed that top B-cell and T-cell epitopes were highly conserved (up to100%) among different strains of SARS-CoV-2 (Table 4 ), while 19 out of final 20 epitopes showed conservancy > 90%. Therefore, the designed vaccine construct is expected to confer broad range immunity in the host.

Table 3

Allergenicity and antigenicity assessment of predicted B-cell epitopes.

Protein	Algorithms	Top peptide sequence	Allergenicity	Vaxijen score
Spike glycoprotein	Bepipred Linear Epitope Prediction 2.0	IRGDEVRQIAPGQTGKIADYNYK	Non-allergen	1.0547
	Emini surface accessibility prediction	GDEVRQ	Non-allergen	0.6701
	Kolaskar and Tongaonkar antigenicity	VRQIAPG	Non-allergen	1.2611
Membrane glycoprotein	Bepipred Linear Epitope Prediction 2.0	MWSFNPETN	Non-allergen	0.5509
	Emini surface accessibility prediction	LFARTRSMWSFNPET	Non-allergen	0.9033
	Kolaskar and Tongaonkar antigenicity	AMACLVGLM	Non-allergen	0.6251
Envelope protein	Bepipred Linear Epitope Prediction 2.0	YVYSRVKNLNSSRVP	Non-allergen	0.4492
	Emini surface accessibility prediction	PSFYVYSRVKNLNSSRVP	Non-allergen	0.5796
	Kolaskar and Tongaonkar antigenicity	VNSVLLFLAFVVFLLVTLA	Non-allergen	0.5893
Nucleocapsid protein	Bepipred Linear Epitope Prediction 2.0	PGSSRGTSPARMAGNGG	Non-allergen	0.3854
	Emini surface accessibility prediction	TEPKKDKKKKAD	Non-allergen	0.2378
	Kolaskar and Tongaonkar antigenicity	HWPQIAQFAPSASAF	Non-allergen	0.4320

Table 4

Conservancy of the final epitopes among different SARS-CoV-2 strains (retrieved from the list of World Health Organization for top 30 infected regions).

Protein	Epitope	Conservancy
Spike glycoprotein	ADYNYKLPD	100.00% (31/31)
	SFVIRGDEVRQIAPG	100.00% (31/31)
	IRGDEVRQIAPGQTGKIADYNYK	100.00% (31/31)
	GDEVRQ	100.00% (31/31)
	VRQIAPG	100.00% (31/31)
Envelope protein	FVVFLLVTL	96.30% (26/27)
	LCAYCCNIV	92.59% (25/27)
	VTLAILTALRLCAYC	96.30% (26/27)
	YVYSRVKNLNSSRVP	92.59% (25/27)
	PSFYVYSRVKNLNSSRVP	92.59% (25/27)
Membrane glycoprotein	LVIGAVILR	96.67% (29/30)
	LACFVLAAVYRINWI	100.00% (30/30)
	MWSFNPETN	100.00% (30/30)
	LFARTRSMWSFNPET	100.00% (30/30)
	AMACLVGLM	100.00% (30/30)
Nucleocapsid protein	AIKLDDKDP	93.33% (28/30)
	TEPKKDKKKKAD	93.33% (28/30)
	HWPQIAQFAPSASAF	93.33% (28/30)
	DLSPRWYFYYLGTGP	93.33% (28/30)
	PGSSRGTSPARMAGNGG	76.67% (23/30)

Population coverage analysis of spike protein (A), envelope protein (B), membrane protein (C) and nucleocapsid proteins (D). Predicted T-cell (CTL and HTL) epitopes of Spike glycoprotein, membrane protein, envelope protein and nucleocapsid protein. Allergenicity and antigenicity assessment of predicted B-cell epitopes. Conservancy of the final epitopes among different SARS-CoV-2 strains (retrieved from the list of World Health Organization for top 30 infected regions). Three putative vaccine molecules (i.e. V1, V2 and V3) were constructed, each comprising a protein adjuvant, eight T-cell epitopes, twelve B-cell epitopes and respective linkers (Table S6). PADRE sequence was included to extend the efficacy and potency of the constructed vaccine. The putative vaccine constructs, V1, V2 and V3 were 397, 481 and 510 residues long respectively. However, allergenicity score of V3 (−0.89886723) revealed that it was superior among the three constructs in terms safety and efficacy. V3 also had a solubility score (0.60) (Fig. 8E) and antigenicity (0.58) over threshold value (Table 5).

Fig. 8

Table 5

Allergenicity, antigenicity and solubility prediction of designed vaccine constructs.

Vaccine constructs	Composition	Complete sequence of vaccine construct	Allergenecity (Threshold −0.4)	Antigenicit (Threshold 0.4)	Solubility (Threshold 0.45)
V1	Predicted CTL,HTL and BCL epitopes with defensin adjuvant and PADRE sequence	EAAAKGIINTLQKYYCRVRGGRCAVLSCLPKEEQIGKCSTRGRKCCRRKKEAAAKAKFVAAWTLKAAAGGGSADYNYKLPDGGGSLVIGAVILRGGGSFVVFLLVTLGGGSAIKLDDKDPGPGPGSFVIRGDEVRQIAPGGPGPGLACFVLAAVYRINWIGPGPGVTLAILTALRLCAYCGPGPGDLSPRWYFYYLGTGPKKIRGDEVRQIAPGQTGKIADYNYKKKGDEVRQKKVRQIAPGKKMWSFNPETNKKLFARTRSMWSFNPETKKAMACLVGLMKKYVYSRVKNLNSSRVPKKPSFYVYSRVKNLNSSRVPKKLCAYCCNIVKKPGSSRGTSPARMAGGGKKTEPKKDKKKKADKKHWPQIAQFAPSASAFKKAKFVAAWTLKAAAGGGS	−0.17698947	0.60	0.61
V2	Predicted CTL, HTL and BCL epitopes with L7/L12 ribosomal protein adjuvant and PADRE sequence	EAAKMAKLSTDELLDAFKEMTLLELSDFVKKFEETFEVTAAAPVAVAAAGAAPAGAAVEAAEEQSEFDVILEAAGDKKIGVIKVVREIVSGLGLKEAKDLVDGAPKPLLEKVAKEAADEAKAKLEAAGATVTVKEAAAKAKFVAAWTLKAAAGGGSADYNYKLPDGGGSLVIGAVILRGGGSFVVFLLVTLGGGSAIKLDDKDPGPGPGSFVIRGDEVRQIAPGGPGPGLACFVLAAVYRINWIGPGPGVTLAILTALRLCAYCGPGPGDLSPRWYFYYLGTGPKKIRGDEVRQIAPGQTGKIADYNYKKKGDEVRQKKVRQIAPGKKMWSFNPETNKKLFARTRSMWSFNPETKKAMACLVGLMKKYVYSRVKNLNSSRVPKKPSFYVYSRVKNLNSSRVPKKLCAYCCNIVKKPGSSRGTSPARMAGGGKKTEPKKDKKKKADKKHWPQIAQFAPSASAFKKAKFVAAWTLKAAAGGGS	0.13269078	0.55	0.64
V3	Predicted CTL, HTL and BCL epitopes with HABA adjuvant and PADRE sequence	EAAKMAENPNIDDLPAPLLAALGAADLALATVNDLIANLRERAEETRAETRTRVEERRARLTKFQEDLPEQFIELRDKFTTEELRKAAEGYLEAATNRYNELVERGEAALQRLRSQTAFEDASARAEGYVDQAVELTQEALGTVASQTRAVGERAAKLVGIELEAAAKAKFVAAWTLKAAAGGGSADYNYKLPDGGGSLVIGAVILRGGGSFVVFLLVTLGGGSAIKLDDKDPGPGPGSFVIRGDEVRQIAPGGPGPGLACFVLAAVYRINWIGPGPGVTLAILTALRLCAYCGPGPGDLSPRWYFYYLGTGPKKIRGDEVRQIAPGQTGKIADYNYKKKGDEVRQKKVRQIAPGKKMWSFNPETNKKLFARTRSMWSFNPETKKAMACLVGLMKKYVYSRVKNLNSSRVPKKPSFYVYSRVKNLNSSRVPKKLCAYCCNIVKKPGSSRGTSPARMAGGGKKTEPKKDKKKKADKKHWPQIAQFAPSASAFKKAKFVAAWTLKAAAGGGS	−0.89886723	0.58	0.60

The bold sequences are linker sequences

Physicochemical characterization and secondary structure analysis of the construct

ProtParam tool was employed to analyze the physicochemical properties of V3. Molecular weight of 3 was scored as 55.181 kDa. The extinction coefficient of V3 was calculated as 63,830 at 0.1% absorption. It had been found that the protein would have net negative charge which was higher than the recommended pI 9.81. Aliphatic index and GRAVY value were found 77.80 and −0.383 respectively, which could express the thermostability and hydrophilic status of the V3 vaccine construct. Around sixty minutes in vitro half-life stability in mammalian reticulocytes was predicted for V3. The computed instability index (II) 36.98 classified the protein as a stable one. In contrast, Secondary structure of V3 exhibited to have 46.47% alpha helix, 15.00% sheet and 38.63% coil structure (Fig. S1).

Fig. S1

Secondary structure prediction of constructed vaccine protein V3.

Tertiary structure of the putative vaccine construct V3 was generated using I-TASSER server (Fig. 8A and B). The server used 10 best templates with highest sigene">nificant (measured via Z-score) from the LOMETS threading program to model the 3D structure. After refinement, Ramachandran plot analysis revealed that 92.7% and 5.7% residues were in the favored and allowed regions respectively, while only 8 residues (1.6%) occupied in the outlier region (Fig. 8C). The overall quality factor determined by ERRAT server was 91.56% (Fig. 8D). 3D modelled structure of V1 and V2 are shown in Fig. S2. DbD2 server recognized 33 pairs of amino acid residue with the potentiality to create disulfide bond between them. After analysis chi3 and B-factor parameter of residue pairs on the basis of energy, only 2 pairs (PRO 277-THR 329 and LEU 425-CYS 435) met the criteria for disulfide bond formation which were changed with cysteine (Fig. S3).

Fig. S2

3D modelled structure of vaccine protein V1 and V2.

Fig. S3

Disulfide engineering of vaccine protein V3 (A: Initial form, B: Mutant form).

Conformational B-cell and IFN- inducing epitopes prediction

Ellipro server predicted a total 6 conformational B-cell epitopes from the 3D structure of the construct V3. Epitopes No. 1 were considered as the broadest conformational B cell epitopes with 25 amino acid residues (Fig. 9 and Table S7). Results also revealed that predicted linear epitopes from 76 to 101, 131–143 and 48–55 were included in the conformational B-cell epitopes. Moreover, the sequence of the final vaccine was scanned for 15-mer IFN-inducing epitopes. Results showed that there were 292 positive IFN- inducing epitopes from which 20 had a score ≥ 5 (Table S8). Residues of 194–209 regions (GGGSLVIGAVILRGG) in the vaccine showed highest score of 17 (Table S8).

Fig. 9

Predicted conformational epitopes (A and B) and linear epitopes (C and D) within construct V3.

Molecular dynamics and normal mode analysis

Stability of the vaccine construct V3 was investigated through mobility analysis (Fig. 10A and B), B-factor, eigenvalue and deformability analysis, covariance map and recommended elastic network model. Results revealed that the placements of hinges in the chain was insignificant (Fig. 10C) and the B-factor column gave an averaged RMS (Fig. 10D). The estimated higher eigenvalue 6.341333e−06 (Fig. 10E) indicated low chance of deformation of vaccine protein V3. The correlation matrix and elasticity of the construct have been shown in Fig. 10G and Fig. 10H, respectively.

Fig. 10

Normal Mode Analysis (NMA) of vaccine protein V3. The directions of each residues are given by arrows and the length of the line represented the degree of mobility in the 3D model (A and B). The main-chain deformability derived from high deformability regions indicated by hinges in the chain which are negligible (C). The experimental B-factor was taken from the corresponding PDB field and calculated from NMA (D). The eigenvalue represents the motion stiffness and directly related to the energy required to deform the structure (E). The variance associated to each normal mode is inversely related to the eigenvalue. Coloured bars show the individual (red) and cummulative (green) variances (F). The covariance matrix indicates coupling between pairs of residues, where they may be associated with correlated, uncorrelated or anti-correlated motions, indicated by red, white and blue colors respectively (G). The elastic network model identifies the pairs of atoms connected via springs. Each dot in the diagram is coloured based on extent of stiffness between the corresponding pair of atoms. The darker the greys, the stiffer the springs (H). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Homology modeling, structure validation and solubility prediction of construct V3 (A: Cartoon structure, B: Surface structure, C: Ramachandran Plot analysis, D: Quality factor analysis, E: Solubility analysis). Predicted conformational epitopes (A and B) and linear epitopes (C and D) within construct V3. Allergenicity, antigenicity and solubility prediction of designed vaccine constructs. The bold sequences are linker sequences Normal Mode Analysis (NMA) of vaccine protein V3. The directions of each residues are given by arrows and the length of the line represented the degree of mobility in the 3D model (A and B). The main-chain deformability derived from high deformability regions indicated by hinges in the chain which are negligible (C). The experimental B-factor was taken from the corresponding PDB field and calculated from NMA (D). The eigenvalue represents the motion stiffness and directly related to the energy required to deform the structure (E). The variance associated to each normal mode is inversely related to the eigenvalue. Coloured bars show the individual (red) and cummulative (green) variances (F). The covariance matrix indicates coupling between pairs of residues, where they may be associated with correlated, uncorrelated or anti-correlated motions, indicated by red, white and blue colors respectively (G). The elastic network model identifies the pairs of atoms connected via springs. Each dot in the diagram is coloured based on extent of stiffness between the corresponding pair of atoms. The darker the greys, the stiffer the springs (H). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Protein-protein docking

The structural interaction between HLA alleles and the designed vaccines were investigated by molecular docking approach. The server detected the complexed structure by focusing on complementary score, ACE (Atomic Contact Energy) and estimated interface area of the compound (Table 6 ). The molecular affinity between the putative vaccine molecules V3 and several immune receptors were also experimented. The result showed that construct V3 interacted with each receptor with sigene">nificantly lower binding energy (Fig. 11 ).

Table 6

Docking score of vaccine construct V3 with different HLA alleles and human immune receptors.

Vaccine construct	PDB ID's HLAs	Global energy	Hydrogen bond energy	ACE	Score	Area
V1	1a6a	−11.07	−2.61	8.85	17,640	2211.50
	1h15	−17.51	−1.51	4.35	15,862	2302.20
	2fse	−5.38	0.00	5.61	16,720	2493.50
	2q6w	−10.48	−0.39	−1.45	15,348	2289.00
	2seb	9.03	0.00	0.20	17,640	2903.00
	3c5j	−25.18	−4.60	8.20	16,804	3022.80
V2	1a6a	−37.24	−3.11	−9.54	19,666	2703.90
	1h15	−9.33	0.00	4.33	16,132	2312.70
	2fse	−0.33	0.00	−0.60	16,702	2956.10
	2q6w	−10.85	−3.06	1.20	15,062	1924.40
	2seb	1.38	0.00	−0.79	18,008	3034.30
	3c5j	10.58	−1.56	5.54	16,158	2385.30
V3	1a6a	−4.14	−1.49	12.24	14,358	2208.30
	1h15	−3.15	−4.45	10.93	13,908	1637.70
	2fse	−8.68	−2.76	9.98	18,340	3024.40
	2q6w	−21.02	−2.41	5.05	17,060	2626.90
	2seb	−17.54	−5.91	3.87	16,094	2946.80
	3c5j	−2.30	0.00	−2.30	18,550	3041.00

Fig. 11

Docked complex of vaccine construct V3 with human ACE 2 (A), TLR- (B), DPP4 (C) and APN (D).

Docking score of vaccine construct V3 with different HLA alleles and human immune receptors. Docked complex of vaccine construct V3 with human ACE 2 (A), TLR- (B), DPP4 (C) and APN (D). The Codon Adaptation Index (CAI) and GC content for the predicted codons of the putative vaccine constructs V1 were demonstrated as 1.0 and 51.56% respectively. Aene">n insert of 1542 bp was found which lacked the restriction sites for BglI and BglII, thus providing comfort zone for cloning. The codons were inserted into pET28a(+) vector alongside two restriction sites (BglI and BglII) and a clone of 5125 base pair was generated (Fig. 12 ).

Fig. 12

In silico restriction cloning of the gene sequence of final vaccine construct V3 into pET28a(+) expression vector (A: Restriction digestion of the vector pET28a(+) and construct V3 with BglII and BglI, B: Inserted desired fragment (V3 Construct) between BglII (401) and BglI (1943) indicated in red colour. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Discussion

In December 2019, a new coronavirus prevalence flourished in Wuhan, China, causing clutter among the medical community, as well as to the rest of the world (Sun et al., 2020a, Sun et al., 2020b). The new species has been renamed as 2019-nCoV or, SARS-CoV-2, already causing considerable number infections and deaths in China, Italy, Spain, Iran, USA and to a growing degree throughout the world. The major outbreak and spread of SARS-CoV-2 in 2020 forced the scientific community to make considerable investment and research activity for developing a vaccine against the pathogen. However, owing to high infectivity and pathogenicity, the culture of SARS-CoV-2 needs biosafety level 3 conditions, which may obstructed the rapid development of any vaccine or therapeutics. It had been found that about 35 companies and academic institutions are engaged in such works (Spinney et al., 2020, Ziady, 2020). Among the potential SARS-CoV-2 vaccines in the pipeline, four have nucleic acid based designs, four involve non-replicating viruses or protein constructs, two contain live attenuated virus and one involves a viral vector (Pang et al. 2020), while only one, called mRNA-1273 (developed by NIAID collaboration with Moderna, Inc.), has confirmed to start phase-1 trial (NIH News Event, 2020). However, in this study we emphasized on a different approaches by prioritizing the advantages of different genome and proteome database using the immunoinformatic approach. Computational vaccine predictions were adopted by the researchers to design vaccines against both MERS-CoV (Sudhakar et al. 2013; Fernando et al., 2013) and SARS-CoV-1 (; Oany et al., 2014), targeting the outer membrane or functional proteins (Sharmin and Islam 2014). Several in silico strategies have also been employed to predict potential T cell and B cell epitopes against SARS-CoV-2, either emphasizing on spike glycoprotein or envelope proteins (Behbahani 2020; Rasheed et al., 2020). None of the studies, however, focused on other structural proteins. Moreover, random genetic changes and mutations in the protein sequences (Yin, 2020) may obstruct the development of effective vaccines and therapeutics against human coronavirus in the future. Hence, the present study was employed to identify the similarity and divergence among the close relatives of the target pathogen and develop a novel chimeric recombinant vaccine considering all major structural proteins i.e. spike glycoprotein, membrane glycoprotein, envelope protein and nucleocapsid protein simultaneously. Structural proteins are often chosen as attractive therapeutic targets by the researchers due to their surface exposure. Spike (S) protein is currently the most promising antigen formulation for SARS-CoV-2 vaccine research which is able to be directly recognized by host immune system (Wrapp et al. 2020) and already been used for vaccine development to prevent Severe Acute Respiratory Syndrome (SARS) and Middle-East Respiratory Syndrome (MERS) (Zhou et al. 2018; Du et al. 2009). However, Membrane (M) protein, the most abundant one on the virus surface (Neuman et al. 2011), was also reported to stimulate efficient neutralizing antibodies in SARS patients upon immunization (Pang et al. 2004). Moreover, structural investigation revealed the existence of T-cell epitope cluster on the trans-membrane domain of M protein with the ability to induce a strong cellular immune response (Liu et al. 2010). Thus, it can be utilized as a candidate antigen for developing SARS-CoV-2 vaccine. Researchers have already considered Envelope (E) protein for vaccine candidacy against COVID-19 due to high antigenicity (Abdelmageed et al. 2020). It was also investigated against SARS-CoV in 2003 and more recently, against MERS-CoV (Schoeman and Fielding 2019). Nucleocapsid (N) protein, on the other hand, is believed to be more conserved than spike or membrane glycoprotein with the potential to elicit broad-based cellular responses (Zhao et al. 2005). Previously, cellular response against the N protein of some animal coronaviruses enhanced recovery from viral infection (Glansbeek et al. 2002; Wasmoen et al., 1995; Wesseling et al. 1993). A study showed that N protein of SARS-CoV act as an important B cell immunogen, indicating its potential value in vaccine development against SARS (Zhao et al. 2005). All these data suggest that S, E, M and N protein can induce both humoral and cellular immune responses against SARS-CoV-2. Phylogenetic analysis is used for fundamental as well as applied issues of virus research and it includes the development of antiviral drugs and vaccines (Gorbalenya and Lauber 2017). Here, we used phylogeny data to perform similarity analysis between different coronavirus subfamiles to finalize which structural protein to be used as vaccine target in this study. We wanted to target the proteins which are most common in each subfamily and exhibit significant similarities to design a broad and universally effective vaccine candidate. The results revealed that each of the four major structural proteins possess highly conserved domains among alpha-, beta- and gamma-coronaviruses. Therefore we considered all major structural proteins to design a multiepitope subunit vaccine which is expected to stimulate more effective and broad response in the host. The topology of the phylogenetic trees of the whole genome and the stated four proteins sequences from different species of coronaviruses reveal that SARS-CoV-1 and bat coronaviruses are the closest homologs of the novel coronaviruses. Our results infer a significant level of similarities within the COVID-19 and SARS-CoV-1 which was also aligned with the previous findings (Jaimes et al., 2020; Sun et al., 2020a, Sun et al., 2020b; Wu 2020). The sequence similarities between the SARS-CoV, bat coronaviruses and the COVID-19 from the reported studies (Hu et al. 2018; Wu et al. 2020; Wu 2020) suggests that those are distantly related, in spite those are capable of infecting the humans and therefore possess the adaptive convergent evolution. Interestingly, the COVID-19 envelope proteins form clade with the Turkey coronavirus which belongs to gamma-coronavirus genus. So, in terms of envelope proteins, the envelope gene of turkey coronavirus might contribute to the convergence process, which needs further analysis. In addition, from the domain-based phylogeny of nucleocapsid proteins, it can be deduced that this protein might have originated in bats and was transmitted to camels and then later on choose human as the potential host. Overall, the COVID-19 might go through complex adaptation strategies in order to be transmitted into the human via different animals. The homologous protein sets for four structural proteins of Coronavirus were sorted to identify conserved regions through BLASTp analysis and MSA. Only the conserved sequences were utilized to identify potential B-cell and T-cell epitopes for each individual protein (Table 1). Thus, our constructs are expected to stimulate a broad-spectrum immunity in host upon administration. Cytotoxic CD8 + T lymphocytes (CTL) play a crucial role to control the spread of pathogens by recognizing and killing diseased cells or by means of antiviral cytokine secretion (Garcia et al. 1999). Thus, T cell epitope-based vaccination is a unique process to confer defensive response against pathogenic candidates (Shrestha and Diamond, 2004). Approximately 800 MHC-I peptides (CTL epitopes) and 600 MHC-II peptides (HTL epitopes) were predicted via IEDB server, from which we screened the top ones through analyzing the antigenicity score, trans-membrane topology, conservancy level and other important physiochemical parameters employing a number of bioinformatics tools (Table 2). The top 10 epitopes from each protein was further assessed by investigating the toxicity profile and allergenicity pattern. Different servers rely on different parameters to predict the allergenic nature of small peptides. Therefore, we used 4 distinct servers for such assessment and the epitopes predicted as non-allergen at least via 3 servers were retained for further analysis (Table S5). Vaccine initiates the generation of effective antibodies that are usually produced by B cells and plays effector functions by targeting specifically to a foreign particles (Cooper and Nemerow 1984). The potential B cell epitopes were generated by three different algorithms (Bepipred linear epitope prediction 2.0, Kolaskar and Tongaonkar antigenicity prediction and Emini surface accessibility prediction) from IEDB database (Table 3). Suitable linkers and adjuvants were used to combine top finalized epitopes from each protein that led to develop multiepitope vaccine molecules (Table S6). As PADRE sequence was usually recommended to lessen the polymorphism of HLA molecules in the population (Ghaffari-Nazari et al. 2015), it was also considered to construct the final vaccine molecule. Here, adjuvants would enhance the immunogenicity of the vaccine constructs and appropriate separation of epitopes in the host environment would be ensured by the linker (Yang et al. 2015). Allergenicity, physiochemical properties, antigenicity and three-dimensional structure of vaccine constructs were characterized, and it was concluded that V3 was superior to V1 and V2 vaccine constructs due to its highly negative allergenicity score (−0.89886723). Therefore, it is expected to be less likely to cause adverse side effects or hypersensitivity. V3 also had a solubility score (0.60) and antigenicity (0.58) over threshold value (Table 5). Hence, construct V3 was proposed as the most suitable and safe vaccine protein for further analysis. The final construct also occupied by several interferon-α producing epitopes (Table S8). The vaccine protein (V3) was subjected to di-sulfide engineering to enhance its stability. Analysis of the normal modes in internal coordinates by iMODS was employed to investigate the collective motion of vaccine molecules (López-Blanco et al., 2014). Negligible chance of deformability at molecular level was analyzed for the putative vaccine construct V3, thereby strengthening our prediction. Moreover, molecular docking was investigated to analyze the molecular affinity of the vaccine with different HLA molecules i.e. DRB1*0101, DRB5*0101, DRB3*0202, DRB1*0401, DRB3*0101 and DRB1*0301 (Table 6). It had been reported that a specific receptor-binding domain of CoV spike protein usually recognizes its host receptor ACE2 (angiotensin-converting enzyme 2) (Li et al., 2003; Li 2015). Previous studies also identified dipeptidyl peptidase 4 (DPP4) as a functional receptor for human coronavirus (Raj et al. 2013). Therefore, we performed another docking study prioritizing these immune receptors to strengthen our prediction (Fig. 11). Results showed that the designed construct bound with the selected receptors with minimum binding energy which was biologically significant. Finally, in-silico restriction cloning was adopted to check the suitability of construct V3 for entry into pET28a (+) vector and expression in E. coli strain K12 (Fig. 12). Traditional ways to vaccine development are time consuming and laborious. Moreover, the result may not be always as expected or fruitful (Stratton et al., 2003; Hasan et al., 2019a, Hasan et al., 2019b). In silico prediction and prescreening methods, on the contrary, offer some advantages while saving time and cost for production. Therefore, the present study may aid in the development of preventive strategies and novel vaccines to combat infections caused by 2019-nCoV. However, further wet lab trials involving model organism needs to be experimented for validating our findings. The following are the supplementary data related to this article Secondary structure prediction of constructed vaccine protein V3. 3D modelled structure of vaccine protein V1 and V2. Disulfide engineering of vaccine protein V3 (A: Initial form, B: Mutant form).

Table S1

Predicted CTL and HTL epitopes of spike glycoprotein.

Table S2

Predicted CTL and HTL epitopes of membrane.

Table S3

Predicted CTL and HTL epitopes of envelope protein

Table S4

Predicted CTL and HTL epitopes of nucleocapsid protein.

Table S5

Allergenicity pattern and toxicity profile of top T cell epitopes.

Table S6

Proposed CTL and HTL epitopes for vaccine construction.

Table S7

Predicted conformational epitopes within construct V3.

Table S8

Predicted IFN alpha producing epitopes in the vaccine construct V3.

File S1

NCBI IDs of the complete genome, spike glycoprotein, envelope protein, membrane protein and nucleocapsid protein of coronavirus with Genera and Sub-Genera.

File S2

Retrieved protein sequences of major structural proteins of COVID-19.

File S3

Secondary structure and domain analysis of spike glycoprotein, envelope protein, membrane protein and nucleocapsid proteins.

Credit author statement

Mst Rubaiat Nazneen Akhand: Conceptualization, Methodology, Data curation, Formal analysis, Software and Writing - original draft. Kazi Faizul Azim: Conceptualization, Methodology, Software, Writing - original draft Syeda Farjana Hoque: Software, Data Curation and Writing - original draft Mahmuda Akther Moli: Software, Data Curation and Writing - original draft Bijit Das Joy: Software, Validation, and Visualization Hafsa Akter: Software, Validation and Visualization Ibrahim Khalil Afif: Software and Validation Nadim Ahmed: Software and Validation Mahmudul Hasan: Conceptualization, Supervision, Project administration, Resources and Writing - review & editing.

Funding information

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of Competing Interest

Authors declare that they have no conflict of interests.

91 in total

Review 1. Structural basis of T cell recognition.

Authors: K C Garcia; L Teyton; I A Wilson
Journal: Annu Rev Immunol Date: 1999 Impact factor: 28.527

2. Protein solubility: sequence based prediction and experimental verification.

Authors: Pawel Smialowski; Antonio J Martin-Galiano; Aleksandra Mikolajka; Tobias Girschick; Tad A Holak; Dmitrij Frishman
Journal: Bioinformatics Date: 2006-12-06 Impact factor: 6.937

3. PartitionFinder 2: New Methods for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic Analyses.

Authors: Robert Lanfear; Paul B Frandsen; April M Wright; Tereza Senfeld; Brett Calcott
Journal: Mol Biol Evol Date: 2017-03-01 Impact factor: 16.240

4. Immunoinformatics approaches to design a novel multi-epitope subunit vaccine against HIV infection.

Authors: Rajan Kumar Pandey; Rupal Ojha; Veeranarayanan Surya Aathmanathan; Muthukalingan Krishnan; Vijay Kumar Prajapati
Journal: Vaccine Date: 2018-03-20 Impact factor: 3.641

Review 5. Cytokines: the future of intranasal vaccine adjuvants.

Authors: Afton L Thompson; Herman F Staats
Journal: Clin Dev Immunol Date: 2011-07-31

6. A pneumonia outbreak associated with a new coronavirus of probable bat origin.

Authors: Peng Zhou; Xing-Lou Yang; Xian-Guang Wang; Ben Hu; Lei Zhang; Wei Zhang; Hao-Rui Si; Yan Zhu; Bei Li; Chao-Lin Huang; Hui-Dong Chen; Jing Chen; Yun Luo; Hua Guo; Ren-Di Jiang; Mei-Qin Liu; Ying Chen; Xu-Rui Shen; Xi Wang; Xiao-Shuang Zheng; Kai Zhao; Quan-Jiao Chen; Fei Deng; Lin-Lin Liu; Bing Yan; Fa-Xian Zhan; Yan-Yi Wang; Geng-Fu Xiao; Zheng-Li Shi
Journal: Nature Date: 2020-02-03 Impact factor: 69.504

7. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation.

Authors: Daniel Wrapp; Nianshuang Wang; Kizzmekia S Corbett; Jory A Goldsmith; Ching-Lin Hsieh; Olubukola Abiona; Barney S Graham; Jason S McLellan
Journal: Science Date: 2020-02-19 Impact factor: 47.728

8. Improving Multi-Epitope Long Peptide Vaccine Potency by Using a Strategy that Enhances CD4+ T Help in BALB/c Mice.

Authors: Haniyeh Ghaffari-Nazari; Jalil Tavakkol-Afshari; Mahmoud Reza Jaafari; Sahar Tahaghoghi-Hajghorbani; Elham Masoumi; Seyed Amir Jalali
Journal: PLoS One Date: 2015-11-10 Impact factor: 3.240

9. Design of a Multiepitope-Based Peptide Vaccine against the E Protein of Human COVID-19: An Immunoinformatics Approach.

Authors: Miyssa I Abdelmageed; Abdelrahman H Abdelmoneim; Mujahed I Mustafa; Nafisa M Elfadol; Naseem S Murshed; Shaza W Shantier; Abdelrafie M Makhawi
Journal: Biomed Res Int Date: 2020-05-11 Impact factor: 3.411

10. Screening and druggability analysis of some plant metabolites against SARS-CoV-2: An integrative computational approach.

Authors: Kazi Faizul Azim; Sheikh Rashel Ahmed; Anik Banik; Md Mostafigur Rahman Khan; Anamika Deb; Saneya Risa Somana
Journal: Inform Med Unlocked Date: 2020-06-09

6 in total

Review 1. Current and prospective computational approaches and challenges for developing COVID-19 vaccines.

Authors: Woochang Hwang; Winnie Lei; Nicholas M Katritsis; Méabh MacMahon; Kathryn Chapman; Namshik Han
Journal: Adv Drug Deliv Rev Date: 2021-02-06 Impact factor: 17.873

Review 2. In silico T cell epitope identification for SARS-CoV-2: Progress and perspectives.

Authors: Muhammad Saqib Sohail; Syed Faraz Ahmed; Ahmed Abdul Quadeer; Matthew R McKay
Journal: Adv Drug Deliv Rev Date: 2021-01-17 Impact factor: 17.873

3. Insights on the evolution of Coronavirinae in general, and SARS-CoV-2 in particular, through innovative biocomputational resources.

Authors: Daniel Andrés Dos Santos; María Celina Reynaga; Juan Cruz González; Gabriela Fontanarrosa; María de Lourdes Gultemirian; Agustina Novillo; Virginia Abdala
Journal: PeerJ Date: 2022-07-25 Impact factor: 3.061

Review 4. Immunology to Immunotherapeutics of SARS-CoV-2: Identification of Immunogenic Epitopes for Vaccine Development.

Authors: Apoorva Pandey; Swati Singh; Riya Madan
Journal: Curr Microbiol Date: 2022-09-05 Impact factor: 2.343

5. Combination of highly antigenic nucleoproteins to inaugurate a cross-reactive next generation vaccine candidate against Arenaviridae family.

Authors: Kazi Faizul Azim; Tahera Lasker; Rahima Akter; Mantasha Mahmud Hia; Omar Faruk Bhuiyan; Mahmudul Hasan; Md Nazmul Hossain
Journal: Heliyon Date: 2021-05-19

6. Spike protein-based epitopes predicted against SARS-CoV-2 through literature mining.

Authors: Wendong Li; Lin Li; Ting Sun; Yufei He; Guang Liu; Zixuan Xiao; Yubo Fan; Jing Zhang
Journal: Med Nov Technol Devices Date: 2020-10-08

6 in total