Current and prospective computational approaches and challenges for developing COVID-19 vaccines.

Woochang Hwang¹, Winnie Lei², Nicholas M Katritsis³, Méabh MacMahon⁴, Kathryn Chapman¹, Namshik Han⁵.

Abstract

SARS-CoV-2, the coronavirus that causes COVID-19, is zoonotic in origin and was first identified in humans in late 2019. As it spread around the world, there was an unprecedented effort to develop effective vaccines. Computational methods can speed up the long and costly process of vaccine development. Antigen selection, epitope prediction, and toxicity and allergenicity prediction are areas in which computational tools have already been applied as part of reverse vaccinology for SARS-CoV-2 vaccine development. However, computational methods have the potential to assist further. We review the approaches that have been used and highlight additional bioinformatic approaches and pharmacokinetic (PK) modelling as in silico methods that may be useful for SARS-CoV-2 vaccine design but remain unexplored. Because more novel viruses with pandemic potential are expected to arise in future, these techniques are not limited to SARS-CoV-2 but are also useful for responding rapidly to novel emerging viruses.

Year: 2021 PMID: 33561453 PMCID: PMC7871111 DOI: 10.1016/j.addr.2021.02.004

Source DB: PubMed Journal: Adv Drug Deliv Rev ISSN: 0169-409X Impact factor: 17.873

Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19), was first identified in Wuhan in December 2019; by December 2020 it had spread to 191 countries or regions, infecting over 75 million people and causing more than 1.5 million deaths [1]. The effects of SARS-CoV-2 infection range from asymptomatic to life-threatening [2], with most symptomatic patients having moderate symptoms. The pandemic has affected wider society around the world, leading to travel restrictions and severely disrupting global economies and many health services. In the absence of effective treatments to prevent the development of acute respiratory distress syndrome (ARDS) in COVID-19 patients, a vaccine is the fastest and most promising way to stop the spread of SARS-CoV-2 and bring the pandemic under control. Like severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV), SARS-CoV-2 is a coronavirus of zoonotic origin, and further zoonotic viruses with pandemic potential are likely to emerge, making similar crises probable in the future [3]. To respond effectively to novel emerging infectious diseases, it is necessary to refine and establish technologies that can rapidly develop effective, safe vaccines. Vaccines are a form of preventative medicine that provide adaptive immunity to disease. Generally, vaccines are made of an attenuated (live but weakened) or inactivated (killed with heat or chemicals) pathogen, or of one or more antigenic subcomponents of the pathogen (subunit and conjugate vaccines). The first vaccine, demonstrated by Edward Jenner, used cowpox to prevent smallpox infection [4]. Since Jenner’s day, vaccines have improved in step with new scientific discoveries and the technology used to develop them.
For instance, Goodpasture’s invention of methods to propagate viruses in chicken eggs led to the mass production of vaccines, including for typhus [5], [6]. More recently, recombinant DNA technology led to the subunit vaccine for hepatitis B [7]. The ability to engineer synthetic RNA enabled the current development of mRNA vaccine approaches [8]. mRNA vaccines encode an antigen of interest, which is translated in the body, where an immune response is generated against it. BNT162b2 and mRNA-1273, recently approved for SARS-CoV-2, are the first mRNA vaccines ever approved. Reverse vaccinology is an approach to developing new vaccines that starts from the sequenced genome of the pathogen [9], [10]. It was first applied successfully to meningococcus B, to predict alternative antigens for vaccine development [11], [12]. After four decades of unsuccessful attempts with conventional vaccine design, reverse vaccinology enabled a successful vaccine by overcoming the challenge of molecular mimicry between the meningococcal capsular polysaccharide and a human protein [13]. Reverse vaccinology can be used to select an antigen for a novel vaccine that can generate an immune response, through epitope selection and in silico screening of vaccine candidates, speeding up vaccine design and reducing its cost. An antigen is a molecule capable of producing an immune response, while an epitope is the site on an antigen that is recognised by the immune system. There are two types of epitopes: B cell epitopes, which are recognised by antibodies, and T cell epitopes, which are recognised by T cell receptors. The full SARS-CoV-2 genome was released in January 2020, allowing reverse vaccinology approaches to be applied to SARS-CoV-2 vaccine development while the virus was spreading around the world.
Modern vaccine design aims to maximise efficacy while minimising potential serious adverse effects, such as allergic reactions or toxic effects that damage the organism. To this end, newer approaches have shifted away from early live-attenuated and inactivated whole-pathogen vaccines towards purified antigens (subunits), as these have an improved safety profile [14]. In a snapshot of the global race for a SARS-CoV-2 vaccine, Funk et al. [15] report that “protein-based” (subunit) vaccines are the largest platform in development. However, subunit vaccines lack sufficient immunostimulatory capability on their own and are typically paired with adjuvants, compounds or formulations that provide the inflammatory cues needed to elicit the desired immune response [14]. Surprisingly, very little is currently known about vaccine absorption, distribution, metabolism, and excretion (ADME) [16]. However, a few physiologically based pharmacokinetic (PBPK) models of adjuvants exist in the literature, and they may be relevant to the COVID-19 vaccine development process. In this review, we focus on computational techniques applied at the very early stage of the vaccine development pipeline for SARS-CoV-2. We explore computational approaches to antigen selection, epitope prediction, adjuvant selection, toxicology and allergenicity prediction, and immune response modelling in the context of COVID-19.

Antigen selection

SARS-CoV-2 proteins targeted

SARS-CoV-2 has four structural proteins, E (envelope), M (membrane), N (nucleocapsid), and S (spike), and several non-structural proteins (nsps) [17]. The S protein is an attractive vaccine candidate because it mediates viral entry and its counterparts elicited immune responses in studies of SARS-CoV and MERS-CoV [18]. An early study searching for SARS-CoV-2 vaccine candidates used in silico methods to compare the sequences of the SARS-CoV-2 N and S proteins with experimentally determined B and T cell epitopes from SARS-CoV [18]. BNT162b2 [19], mRNA-1273 [20] and AZD1222 [21] are three recently approved SARS-CoV-2 vaccines, all of which use the S protein. BNT162b2 and mRNA-1273 are mRNA vaccines that both encode a prefusion-stabilised S protein, whereas AZD1222 is a replication-deficient viral vector vaccine that uses a chimpanzee adenovirus vector carrying the S protein gene. Most in silico epitope prediction studies also focused on the S protein, although several covered multiple structural proteins [22], [23], [24], and one focused exclusively on the E protein [25]. Ong et al. proposed a ‘cocktail’ vaccine using both structural and non-structural proteins [26].

Antigen prediction based on viral genomic sequences

Virus replication only happens within cells, so blocking viral entry prevents the virus from proliferating within the host. The S protein of SARS-CoV-2 is the part of the virus that enables it to enter our cells, and antibodies that recognise the S protein can block this entry. That is why most vaccine candidates use the S protein as the antigen. The focus on the S protein is logical; however, mutations that affect the function of this protein have already arisen [27]. If further mutations arise that reduce vaccine efficacy, antigens based on other viral proteins will be required. Since such an outcome is not unlikely, we need to identify alternative antigen candidates as a backup, so that we can respond in a timely manner. In addition to the S protein, other proteins, such as the N protein, M protein, nsps, open reading frames (ORFs) and accessory factors, may have the potential to serve as antigens. The key is to identify an antigen that can be recognised by T or B cell receptors and induce an immune response. VaxiJen, a publicly available server for predicting protective antigens, tumour antigens and subunit vaccines [28], represents each protein by the principal properties of its amino acids, which are converted into uniform-length vectors by auto cross covariance (ACC) transformation. VaxiJen therefore predicts protective antigens without relying on sequence alignment to previously known antigens, and several SARS-CoV-2 studies have used it for this purpose [22], [25], [29]. Vaxign-ML [30] is a reverse vaccinology system that predicts protective antigens for vaccine targets using the machine learning (ML) algorithm extreme gradient boosting, trained on data from Protegen, a database of protective antigens [31].
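To illustrate the idea behind alignment-independent antigen prediction, the sketch below computes an ACC vector from per-residue descriptors. It is a minimal sketch under stated assumptions: the descriptor table is a hypothetical input (VaxiJen uses published amino acid principal-property scales that are not reproduced here), lags start at 1, and no mean-centring is applied. What it shows is that proteins of any length map to a vector of fixed size (descriptors squared times the maximum lag), which a trained classifier can then score.

```python
from itertools import product


def acc_transform(seq, descriptors, max_lag):
    """Auto cross covariance (ACC) transform of a protein sequence.

    descriptors: dict mapping each amino acid to a tuple of d numeric
    properties (a hypothetical stand-in for published property scales).
    Returns d * d * max_lag values, regardless of sequence length.
    """
    # Residues missing from the descriptor table are skipped.
    values = [descriptors[aa] for aa in seq if aa in descriptors]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    d = len(values[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j, k in product(range(d), repeat=2):
            # j == k: auto covariance of one property along the chain;
            # j != k: cross covariance between two properties.
            total = sum(values[i][j] * values[i + lag][k] for i in range(n - lag))
            features.append(total / (n - lag))
    return features


# Toy two-property table (hypothetical values, for illustration only).
toy_scale = {"A": (1.0, 0.0), "G": (0.0, 1.0)}
print(len(acc_transform("AGAGAGAG", toy_scale, max_lag=3)))  # 12
```

The fixed output length is the design point: it lets sequences of different lengths be compared by one model without aligning them first.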

Antigen prediction based on cellular immune response

In searching for antigens beyond the S protein, approaches not based solely on the viral genomic sequence could be considered. One such approach is to search for antigens related to the cellular immune response. The cellular immune response protects against pathogens that enter and inhabit host cells during infection, as viruses including SARS-CoV-2 do. Although the antigen properties important for this type of immunity are not yet well characterised, it is an area worth exploring for COVID-19 vaccine development [32]. Gene expression (transcriptomic) data from host and virus could be combined with virus-host interaction data to build a virus infection network and predict antigens based on their relation to the host cellular immune response. SARS-CoV-2 proteins and their interactions with host factors have been associated with imbalanced host immune responses, such as elevated proinflammatory cytokine levels [33], which can lead to progression to severe COVID-19 [34]. To understand which SARS-CoV-2 proteins are associated with imbalanced host immune responses, computational biology can be used to identify genes encoding proteins associated with COVID-19 severity. Differential gene expression analysis of data from COVID-19 patients and non-COVID-19 controls has shown that neutrophil activation is significantly associated with COVID-19 status and severity [35]. We investigated this further by performing a meta-analysis of four transcriptomics datasets from COVID-19 severity studies (Table 1) and building a protein interactome network, to test the hypothesis that other computational approaches may be useful for identifying alternative antigens (Fig. 1).
We used EBSeq [37], an R package for gene- and isoform-level differential expression analysis of RNA-seq data, to identify differentially expressed genes (DEGs) in patients with 1) moderate and 2) severe COVID-19, as defined by World Health Organisation (WHO) standards [36]. Gene ontology (GO) enrichment analysis with g:Profiler, a web server developed for this purpose [37], identified significantly enriched biological functions among the differentially expressed genes, such as exocytosis, wound healing, and neutrophil degranulation, which are related to viral replication [38] and the adaptive immune system [39].
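The core of a GO over-representation test can be sketched with the hypergeometric distribution: given a background of annotated genes, a GO term annotating some of them, and a query list of DEGs, it asks how surprising the observed overlap is. This is a minimal standard-library sketch, not g:Profiler's implementation; g:Profiler applies its own multiple-testing correction (g:SCS) by default, whereas the sketch substitutes the simpler Bonferroni correction. The gene counts in the example are toy numbers.

```python
from math import comb


def enrichment_p(n_universe, n_term, n_query, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap).

    n_universe: annotated genes considered (background)
    n_term:     background genes annotated with the GO term
    n_query:    query genes (e.g. up-regulated DEGs) in the background
    n_overlap:  query genes annotated with the term
    """
    upper = min(n_term, n_query)
    tail = sum(
        comb(n_term, x) * comb(n_universe - n_term, n_query - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_query)


def bonferroni(p_values):
    """Scale each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy numbers: 20 background genes, 5 carry the term, 4 DEGs, 3 of them annotated.
p = enrichment_p(20, 5, 4, 3)
print(round(p, 4))  # 0.032
```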

Table 1

COVID-19 severity studies.

Study	Omics	Moderate (n)	Severe (n)	Cell type	Data availability	DEG analysis	Ref
Overmyer et al.	Bulk RNA-seq	51	55	PBMC	Available	EBSeq	[35]
Jain et al.	Bulk RNA-seq	10	3	PBMC	Available	DESeq2	[245]
Liu et al.	scRNA-seq	3	6	PBMC	NA	NA	[248]
Xu et al.	scRNA-seq	5	8	PBMC	Available	Seurat	[249]
Silvin et al.	scRNA-seq	1	2	PBMC	NA	NA	[250]
Arunachalam et al.	Bulk RNA-seq	4	12	PBMC	Available	DESeq2	[251]

Fig. 1

Antigen selection from transcriptomics analysis and protein interactome network analysis. (A) Venn diagram to show the numbers of DEGs in the COVID-19 moderate group and the severe group. (B) Bar graph to show the top 20 highly enriched functions of the up-regulated DEGs in the severe group. (C) SARS-CoV-2 proteins and their SHPs that are differentially expressed genes in the severe group. (D) Protein interactome network between the up-regulated DEGs and SHPs of nsp16.

To investigate which, if any, SARS-CoV-2 proteins are associated with an imbalanced immune response, a protein interactome network [40] can be built for each SARS-CoV-2 protein, capturing interactions between the host proteins that interact with it (SARS-CoV-2 protein-host protein interactions, SHPs) and the up-regulated differentially expressed genes in the severe COVID-19 group. Using this technique, we found that the host factors interacting with nsp16 are closely connected to up-regulated differentially expressed genes that play key roles in neutrophil degranulation and the immune system (Fig. 1C and D). Direct interactions between SHPs and the proteins encoded by differentially expressed genes can be visualised with STRING [42], an online tool for building protein–protein interaction networks. Nsp16 has been suggested as a vaccine target for SARS-CoV and MERS-CoV because it is a ribose 2′-O-methyltransferase, and this methylation helps coronaviruses avoid activating the type I interferon-dependent innate immune response to viral RNA [41]. Taken together, we suggest that this methodology shows promise for identifying antigens based on the cellular immune response.
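A simplified version of the SHP-DEG connectivity check can be written over a plain edge list. This sketch assumes the protein–protein interactions have already been exported (for example from STRING) as undirected pairs; the protein labels below are hypothetical placeholders, not results from our analysis. For each viral protein it counts how many of its host interactors are themselves up-regulated DEGs and how many direct interactions link its interactors to DEGs. Before calling a viral protein "closely connected", a real analysis would likely also compare these counts against a background expectation.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (protein_a, protein_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def shp_deg_connectivity(shps_by_viral_protein, degs, edges):
    """Summarise how each viral protein's host interactors (SHPs) touch DEGs."""
    adj = build_adjacency(edges)
    summary = {}
    for viral_protein, shps in shps_by_viral_protein.items():
        shps = set(shps)
        # Unordered pairs, so an SHP-DEG interaction is counted once even
        # when both partners are SHPs of the same viral protein.
        links = {
            frozenset((s, g))
            for s in shps
            for g in adj.get(s, set()) & degs
            if g != s
        }
        summary[viral_protein] = {
            "shps_that_are_degs": len(shps & degs),
            "shp_deg_interactions": len(links),
        }
    return summary


# Hypothetical placeholder labels, not real proteins or results.
shps = {"viral_protein_A": ["H1", "H2"], "viral_protein_B": ["H5"]}
up_degs = {"H2", "H3", "H4"}
ppi = [("H1", "H3"), ("H1", "H4"), ("H2", "H3"), ("H5", "H6")]
print(shp_deg_connectivity(shps, up_degs, ppi))
```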

Epitope prediction

B cell epitope prediction tools

Upon infection, the B cell immune response is initiated when B cell receptors (BCRs), which are membrane-bound immunoglobulins, recognise free-floating antigens. This activates B cells to differentiate into memory B cells or antibody-secreting plasma cells that mediate adaptive humoral immunity. Antibodies secreted by plasma cells can neutralise toxins and opsonise pathogens, and B cells can further activate CD4+ T cells by acting as antigen-presenting cells (APCs) [42]. Memory B cells retain antigen-specific immune memory, and memory B cells of different immunoglobulin classes have distinct functions [43]. Among them, the IgG+ subclass is the most heavily studied in vaccinology, as these cells preferentially differentiate into plasma cells [43]. B cell epitopes are parts of an antigen that may contain a solvent-exposed region and range from 5 to 20 amino acids (AA) in length. They are further divided into continuous (linear) and discontinuous (conformational) epitopes. Whereas continuous epitopes consist of consecutive solvent-exposed residues, discontinuous epitopes are formed by solvent-exposed residues that need not be sequential. Consequently, continuous epitopes, but not discontinuous epitopes, can be recognised by B cells even when the antigen is denatured. Although an estimated 90% of B cell epitopes are discontinuous, more tools are available and used for predicting continuous epitopes than discontinuous epitopes, and this imbalance is also reflected in COVID-19 vaccine design. Traditional methods for mapping B cell epitopes vary widely, including 3D structural studies of antigen–antibody complexes [44], antibody-binding peptide library screening [45], and functional assays with mutagenesis [46]. Although these experimental methods are more accurate than in silico prediction [47], epitope mapping experiments are restricted by cost and feasibility.
In silico epitope prediction is faster and can greatly improve the experimental success rate by prioritising promising candidates.

Continuous B cell epitope prediction in SARS-CoV-2

Earlier prediction tools for continuous B cell epitopes were built on small datasets and on AA propensity scales that characterise the physicochemical features of epitopes. For example, Parker and Hodges developed a hydrophilicity scale from high-performance liquid chromatography data, smoothed over a seven-residue window [48], which has been applied in a SARS-CoV-2 epitope design study [49]. Other scales include surface accessibility (Emini), flexibility based on the mobility of protein segments (Karplus and Schulz) [50], and β-turn propensity (the Chou-Fasman method) [51]. Rahman et al. [52] also used additional structural prediction models, including the RaptorX Property server for secondary structure and the Phyre2 server for tertiary structure [53]. The relatively poor performance of AA propensity scales prompted the development of ML-based methods [54], [55]. These tools were trained on feature vectors capturing selected properties of experimentally validated B cell epitopes and non-epitopes [56], [57], [58], [59], [60], [61]. Some tools used Support Vector Machine (SVM) classifiers, whereas others used Random Forests (RF) or artificial neural networks (ANN). Their accuracies and applications to SARS-CoV-2 are summarised in Table 2.
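The single-scale methods above share one mechanism: each residue is assigned a propensity value, values are averaged over a sliding window (seven residues in Parker's method), and stretches above a threshold are reported as candidate epitopes. The sketch below shows that mechanism with a caller-supplied scale. The toy scale and threshold in the example are hypothetical and are not the Parker, Emini, Karplus-Schulz or Chou-Fasman values, and servers such as IEDB may differ in details such as how sequence ends are handled.

```python
def window_profile(seq, scale, window=7):
    """Mean propensity over each full window, keyed by the window's centre.

    scale maps amino acid -> propensity value; unknown residues raise KeyError.
    Positions closer than window // 2 to either end get no score here.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a centre residue")
    half = window // 2
    values = [scale[aa] for aa in seq]
    return [
        (centre, sum(values[centre - half:centre + half + 1]) / window)
        for centre in range(half, len(values) - half)
    ]


def candidate_regions(profile, threshold, min_len=1):
    """Merge consecutive centres scoring >= threshold into (start, end) runs."""
    regions, run = [], []
    for centre, score in profile:
        if score >= threshold and run and centre == run[-1] + 1:
            run.append(centre)
            continue
        if run and len(run) >= min_len:
            regions.append((run[0], run[-1]))
        run = [centre] if score >= threshold else []
    if run and len(run) >= min_len:
        regions.append((run[0], run[-1]))
    return regions


toy_scale = {"K": 1.0, "D": 1.0, "L": -1.0}  # hypothetical values
profile = window_profile("LLKDKLL", toy_scale, window=3)
print(candidate_regions(profile, threshold=0.3))  # [(2, 4)]
```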

Table 2

Common B-cell epitope prediction tools.

Tool	Method	Accuracy	Utility in SARS-CoV-2 vaccine design publications
Continuous B-cell epitope prediction tools
Parker (Accessible on IEDB)	Single AA propensity scale: AA sequence hydrophilicity with smoothing values in a seven-residue window	61.21 AUC with epitopes of 16 AA [Liu et al., 2020] [252]	[Rakib et al., 2020] [49]: applied to filter B-cell epitopes predicted by BepiPred, Emini, and Kolaskar and Tongaonkar methods
Kolaskar and Tongaonkar (Accessible on IEDB)	Single AA propensity scale: based on physicochemical properties and frequencies of AA in experimentally determined B-cell epitopes	Self-reported optimal accuracy of 75%; 55.76 AUC with epitopes of 16 AA [Liu et al., 2020] [252]	[Rakib et al., 2020] [49]: identified 15 epitopes in N protein; [Rahman et al., 2020] [52]: identified 5 epitopes in the RBD regions of S, M, and E proteins; [Abdulhameed Odhar et al., 2020] [253]: identified 5 B cell epitopes on chain A of S protein; [Marino Gammazza et al., 2020] [254]: identified B cell epitope region in Replicase Polyprotein 1ab
Emini (Accessible on IEDB)	Single AA propensity scale: surface accessibility	60.76 AUC with epitopes of 16 AA [Liu et al., 2020] [252]	[Rakib et al., 2020] [49]: identified 9 epitopes in N protein; [Abdulhameed Odhar et al., 2020] [253]: identified 3 epitopes on chain A of S protein; [Yazdani et al., 2020] [255]: assessed B-cell epitopicity of predicted CTL epitopes in S, M, N, and E proteins; [Alam et al., 2020] [256]: identified 6 B-cell epitopes in S protein and transmembrane protein 199
Karplus and Schulz	Single AA propensity scale: flexibility prediction based on the mobility of protein segments	Unavailable	[Rakib et al., 2020] [49]: applied to filter B-cell epitopes predicted by BepiPred, Emini, and Kolaskar and Tongaonkar methods; [Alam et al., 2020] [256]: identified 6 B-cell epitopes in S protein and transmembrane protein 199
Chou and Fasman	Single AA propensity scale: β-turn prediction	52.70 AUC with epitopes of 16 AA [Liu et al., 2020] [252]	[Rakib et al., 2020] [49]: applied to filter B-cell epitopes predicted by BepiPred, Emini, and Kolaskar and Tongaonkar methods; [Alam et al., 2020] [256]: identified 6 B-cell epitopes in S protein and transmembrane protein 199
PREDITOP	Multiple AA propensity scales: based on AA hydrophilicity, accessibility, flexibility, and secondary structure properties	Self-reported optimal accuracy of 60%	Not utilised/cited in any SARS-CoV-2 vaccine design publications
BEEPro	Multiple AA propensity scales: based on antigenicity, hydrophilicity, hydrophobicity, accessible surface area, flexibility, interactivity, buriability, composition, polarity, volume, charge transfer and donor capability, hydrogen-bond donor capability, and secondary structure	Self-reported optimal accuracy of 99.29%	Not utilised/cited in any SARS-CoV-2 vaccine design publications
AAP	ML-based: SVM-based model trained on AA pairs	Self-reported optimal accuracy of 71.09%	[Tohidinia and Sefid, 2020] [205]: initial filtering of S protein epitopes
ABCpred	ML-based: ANN-based model trained on AA patterns	Self-reported optimal accuracy of 66.41%	[Behmard et al., 2020] [131]: identified nine 16-mer epitopes in S, E, M, and N proteins; [Dai et al., 2020] [76]: identified 11 16-mer epitopes in N proteins; [Rahman et al., 2020] [52]: identified 29 peptides in S, M, and E proteins; [Lon et al., 2020] [75]: 6 epitopes predicted in S, M, and E proteins; [He et al., 2020] [132]: predicted linear B-cell epitopicity of 16 sequences of S protein; [Vashi et al., 2020] [257]: 24 epitopes of various AA lengths predicted from S protein
APCpred	ML-based: SVM-based model trained on anchoring pair composition	Self-reported optimal accuracy of 72.94%	Not utilised/cited in any SARS-CoV-2 vaccine design publications
BepiPred (2.0) (Accessible on IEDB)	ML-based: RF-based model trained on epitopes annotated from antibody-antigen protein structures	Self-reported optimal AUC of 0.62	[Rahman et al., 2020] [52]: predicted 14 epitopes of S, E, and M proteins; [Rakib et al., 2020] [49]: 11 epitopes of various AA lengths identified in N protein; [Ayyagari et al., 2020] [112]: 3 epitopes identified in M protein; [He et al., 2020] [132]: 25 epitopes identified on S protein; [Khairkhah et al., 2020] [92]: 8 epitopes identified on S and N proteins
LBEEP	ML-based: SVM and AdaBoost-RF based model trained on the Dipeptide Deviation from Expected Mean (DDE) of epitopes	Self-reported optimal accuracy of 73%	Not utilised/cited in any SARS-CoV-2 vaccine design publications
Bayesb	ML-based: Bayes feature extraction	Self-reported optimal accuracy of 68.50%	Not utilised/cited in any SARS-CoV-2 vaccine design publications
BCPRED	ML-based: SVM-based model with various string kernels applied to fixed-length peptide sequences	Self-reported optimal AUC of 0.758	[Singh et al., 2020] [64]: identified 12 B-cell epitopes in S protein; [Singh et al., 2020] [64]: identified 2 linear epitopes in the structural glycoproteins; [Rahman et al., 2020] [52]: identified 4 epitopes shared by B cells and T cells; [Chauhan et al., 2020] [258]: 3 B cell epitopes identified in SARS-CoV-2 genome-associated proteins
iBCE-EL	ML-based: combination of 6 ML algorithms trained on epitope sequences	Self-reported optimal accuracy of 72.9%	[Dar et al., 2020] [238]: identified linear B cell epitopes among selected HLA class II epitopes in S protein; [Samad et al., 2020] [133]: identified 4 B cell epitopes in S protein; [Ahammad and Lira, 2020] [134]: identified 117 epitopes in S protein
SVMTriP	ML-based: SVM-based model trained on fixed-length tripeptide composition vectors	Self-reported optimal AUC of 0.702	[Banerjee et al., 2020] [259]: identified B-cell epitopes overlapping with CTL and HTL epitopes in N, M, E, Orf6, Orf7a, and Orf10 proteins
COBEpro	ML-based: SVM-based model scoring short peptide fragments, followed by epitopic propensity prediction	Self-reported optimal AUC of 0.829	Not utilised/cited in any SARS-CoV-2 vaccine design publications
EPMLR	ML-based: multiple linear regression model trained on the BEOracle dataset	Self-reported optimal AUC of 0.728	Not utilised/cited in any SARS-CoV-2 vaccine design publications
LBtope	ML-based: several models, including SVM-based and nearest-neighbour-based models, trained on the modified AAP profile of epitopes	Self-reported optimal accuracy of 86%	[Anand et al., 2020] [260]: identified 7 B cell epitopes in S, M, N, and Orf3a proteins
Discontinuous B-cell epitope prediction tools
CEP	Algorithms using residue accessibility and spatial distance cut-off	Self-reported optimal accuracy of 75%	Not utilised/cited in any SARS-CoV-2 vaccine design publications
DiscoTope (2.0) (Accessible on IEDB)	DiscoTope method incorporating spatial neighbourhood and half-sphere exposure information	Self-reported optimal AUC of 0.824	[Forni et al., 2020] [261]: identification of immunogenicity of antigenic variation in S and N proteins; [Arshad Dar et al., 2020] [238]: identification of 77 epitopes on S protein; [Ayyagari et al., 2020] [112]: identified 25 B-cell epitopes on M protein; [Grifoni et al., 2020] [109]: predicted discontinuous epitopes on 959 continuous epitopes on surface glycoprotein
ElliPro (Accessible on IEDB)	Thornton’s method integrated with a residue clustering algorithm and the MODELLER program for predicting and visualizing epitope structures	Self-reported optimal accuracy of 91%	[Rajesh Anand et al., 2020] [260]: 10 conformational epitopes identified on S, Orf3a, M, and N proteins; [Dar et al., 2020] [238]: identification of 148 epitopic residues on S protein; [Singh et al., 2020] [64]: scoring discontinuous epitope probability in 12 selected S proteins; [Sarkar et al., 2020] [234]: identified 4 potential regions in S, N, M, and E proteins; [Wagas et al., 2020] [65]: identified 5 conformational epitopes in S, N, M, and E proteins
PEPITO	Incorporating AA propensity scale with side chain orientation and solvent accessibility information of epitopes/non-epitopes	Self-reported optimal AUC of 75.38	Not utilised/cited in any SARS-CoV-2 vaccine design publications
SEPPA	Scoring based on comparison to local spatial information of surface residues of 82 antigen–antibody protein complexes	Self-reported optimal AUC of 0.742	Not utilised/cited in any SARS-CoV-2 vaccine design publications
PepMapper	Mimotope-based: an ensemble of MimoPro and Pep-3D-Search for epitope mapping	Self-reported optimal sensitivity of 1.00, specificity of 0.839, and precision of 0.256	Not utilised/cited in any SARS-CoV-2 vaccine design publications
Epitopia	ML-based: Naïve Bayes classifier trained on 3D antigen–antibody structures and epitope sequences	Self-reported optimal accuracy of 89.4%	Not utilised/cited in any SARS-CoV-2 vaccine design publications
EPSVR	ML-based: SVR-based model trained on residue epitope propensity, conservation score, side chain energy score, contact number, surface planarity score, and secondary structure composition	Self-reported optimal AUC of 0.597	Not utilised/cited in any SARS-CoV-2 vaccine design publications
CBTOPE	ML-based: SVM-based model trained on the physicochemical and sequence features of conformational B-cell epitopes	Self-reported optimal accuracy of 86.6%	Not utilised/cited in any SARS-CoV-2 vaccine design publications
BepiPred (2.0) (Accessible on IEDB)	ML-based: RF-based model trained on conformational epitopes annotated from antibody-antigen protein structures	Self-reported optimal AUC of 0.62 on conformational epitopes	Not utilised/cited in any SARS-CoV-2 vaccine design publications for discontinuous B-cell epitope prediction

This table lists common B-cell epitope prediction tools with their self-reported or independently evaluated accuracy, where available. For each SARS-CoV-2 vaccine design paper that used a tool, the result and the SARS-CoV-2 protein(s) analysed (if mentioned) are briefly summarised. Some tools with relatively high reported accuracies have yet to be used in SARS-CoV-2 vaccine design.
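Table 2 mixes two kinds of figures, percent accuracy and AUC, which are not directly comparable: accuracy depends on the chosen score threshold and on the ratio of epitopes to non-epitopes in the test set, whereas AUC summarises ranking quality over all thresholds (0.5 is chance, 1.0 is perfect). The sketch below computes AUC through its rank interpretation, the probability that a randomly chosen epitope scores higher than a randomly chosen non-epitope, with ties counted as one half. The scores are toy values from a hypothetical predictor.

```python
def roc_auc(epitope_scores, non_epitope_scores):
    """AUC as P(random epitope outscores random non-epitope); ties count 0.5."""
    if not epitope_scores or not non_epitope_scores:
        raise ValueError("need at least one score in each class")
    wins = 0.0
    for pos in epitope_scores:
        for neg in non_epitope_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(epitope_scores) * len(non_epitope_scores))


# Toy scores: 7.5 winning pairs out of 9.
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.4, 0.1]))
```

The pairwise loop is chosen for readability; for large benchmark sets a rank-based (Mann–Whitney U) computation gives the same value more efficiently.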

Multi-parametric algorithms based on AA propensity scales have also been developed, such as PREDITOP (based on 4 propensity scales), PEOPLE (based on 5 propensity scales), and BEEPro (based on 19 propensity scales) [55], [62], [63]. Despite their reportedly higher accuracy, these tools are not as popular as the easily accessible IEDB server, which acts as a single platform providing the methods of Emini (surface accessibility), Parker (hydrophilicity), Kolaskar and Tongaonkar (AA frequencies), Karplus and Schulz (flexibility), and Chou and Fasman (β-turn propensity) as a multi-parametric filter, as seen in multiple studies [64], [65], [66], [67]. Some SARS-CoV-2 epitope design studies also combined ML-based approaches and propensity scales to predict epitopes in their targets [68], [69], [70], [71], [72], [73]. Similarly, some studies kept only epitopes predicted by two or more ML-based tools [74], [75], [76]. It is unclear whether any specific combination is superior to others; further investigation may be needed to guide which tools should be integrated.
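The consensus strategy used by some studies, keeping only regions predicted by at least two tools, can be sketched as a per-residue vote. The sketch assumes each tool's output has already been reduced to 0-based, inclusive (start, end) residue intervals; the tool names and intervals in the example are hypothetical. Each tool contributes at most one vote per residue, so overlapping predictions from a single tool are not double-counted.

```python
def consensus_regions(predictions, seq_len, min_tools=2, min_len=1):
    """Regions covered by at least min_tools tools.

    predictions: dict tool_name -> list of 0-based inclusive (start, end).
    Returns 0-based inclusive (start, end) regions of length >= min_len.
    """
    if min_tools < 1:
        raise ValueError("min_tools must be at least 1")
    votes = [0] * seq_len
    for intervals in predictions.values():
        covered = set()  # one vote per tool per residue
        for start, end in intervals:
            covered.update(range(max(start, 0), min(end, seq_len - 1) + 1))
        for i in covered:
            votes[i] += 1
    regions, start = [], None
    for i, v in enumerate(votes + [0]):  # trailing 0 closes a final run
        if v >= min_tools and start is None:
            start = i
        elif v < min_tools and start is not None:
            if i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    return regions


# Hypothetical tool outputs on a 20-residue sequence.
preds = {
    "tool_a": [(2, 8)],
    "tool_b": [(5, 12)],
    "tool_c": [(10, 15)],
}
print(consensus_regions(preds, seq_len=20))  # [(5, 8), (10, 12)]
```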

Discontinuous B cell epitope prediction in SARS-CoV-2

In contrast, discontinuous B cell epitope prediction is more challenging than continuous B cell epitope prediction mainly due to two reasons: 1) limited 3D-structural information of antigen–antibody complex; 2) difficulties in isolating the discontinuous epitope for selective antibody production. Early computational tools rely on the identification of specific AA regions such as CEP (residue exposed to solvent) [77], DiscoTope1.2 (solvent accessibility and spatial information of AA) [78], and ElliPro (protruding region on antigen surfaces) [79]. In the latest version of DiscoTope2.0, a DiscoTope method was incorporated with spatial neighbourhood definition and half-sphere exposure and has achieved significant improvement in prediction accuracy (AUC from 0.791 to 0.824) [80]. Another approach classed as mimotope-based methods, identify conformational B cell epitope in proteins with known 3D structures. These include MIMOX [81], PEPITOPE [82], EpiSearch [83], MimoPro [84], and PEPMAPPER [85]. However, additional information including the 3D structure of the selected antigen and the antibody-affinity of the peptide is required for optimal prediction performances. Advances in computational power also led to development of ML-based prediction tools for discontinuous B cell epitopes. Specifically, EPSVR used the Support Vector Regression (SVR) method to integrate six epitope physiochemical properties and has been applied [86]. In the same publication, the authors also combined EPSVR with DiscTope2.0 [78], PEPITO [87], SEPPA [88], Epitopia [89], and EPCES [90] to construct EPMeta, which was reported to have a significantly higher performance than PEPITO and Disctope [86]. In contrast, BepiPRED-2.0, another version of the above-mentioned tools, has used a RF algorithm trained on the dataset of the antibody-antigen protein structures of conformational epitopes [60]. 
Although its reported accuracy for discontinuous B cell epitopes was fair, BepiPred-2.0 is more commonly applied to linear epitope prediction in SARS-CoV-2 vaccine studies; most studies instead used DiscoTope2.0 or ElliPro for discontinuous epitope prediction (see Table 2). Beyond continuous and discontinuous B cell epitope prediction, IgPred predicts which antibody class (isotype) a B cell epitope is likely to induce [91]. The tool was trained on over 14,000 epitopes using an SVM and can be used to identify epitopes with a preference for inducing IgG or IgA antibodies [92].
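DiscoTope2.0 is described above as adding half-sphere exposure to the DiscoTope method. A minimal sketch of the half-sphere exposure (HSE-up) idea, assuming each residue is represented only by its C-alpha and C-beta coordinates: for each residue it counts neighbouring C-alpha atoms inside a sphere, restricted to the half that the C-alpha to C-beta vector points into. The 13 Å radius and the neighbour cutoff in the hypothetical helper `exposed_residues` are illustrative assumptions, and this is not DiscoTope's scoring function, which also combines the exposure term with residue propensities.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def hse_up(ca: Sequence[Vec], cb: Sequence[Vec], radius: float = 13.0) -> List[int]:
    """For each residue, count other C-alpha atoms within `radius` angstroms that
    lie in the half sphere the CA->CB vector points into. Low counts mean few
    neighbours around the side chain, i.e. a more exposed residue."""
    counts = []
    for i, (ca_i, cb_i) in enumerate(zip(ca, cb)):
        side_chain_dir = _sub(cb_i, ca_i)
        n = 0
        for j, ca_j in enumerate(ca):
            if j == i:
                continue
            d = _sub(ca_j, ca_i)
            if math.sqrt(_dot(d, d)) < radius and _dot(d, side_chain_dir) > 0:
                n += 1
        counts.append(n)
    return counts


def exposed_residues(ca: Sequence[Vec], cb: Sequence[Vec], radius: float = 13.0,
                     max_neighbours: int = 10) -> List[int]:
    """Indices of residues whose HSE-up count is at or below a hypothetical cutoff."""
    return [i for i, c in enumerate(hse_up(ca, cb, radius)) if c <= max_neighbours]
```

Exposed residues found this way would still need to be grouped into spatially contiguous patches to form a candidate discontinuous epitope.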

T cell epitope prediction tools

Besides the humoral immune response, circulating T follicular helper (TFH) cells and other T lymphocytes are important in inducing antibody-producing plasma cells and long-lived memory B cells [93]. A recent report also indicated that antibody levels in COVID-19 patients correlate with levels of circulating TFH cells [94]. Another study reported that a population of SARS-CoV-2-specific T cells with a ‘stem-like memory phenotype’ is present in antibody-seronegative asymptomatic and mild COVID-19 patients [95]. Many other reports have highlighted the critical role of T cells in inducing an effective SARS-CoV-2-specific immune response [96], [97], [98]. T cells can be divided into two major types: CD4+ T cells (helper T cells) and CD8+ T cells (cytotoxic T cells). Their activation relies on three signals: 1) binding between the antigen presented on the major histocompatibility complex (MHC) of antigen-presenting cells (APCs) and the T cell receptor (TCR); 2) co-stimulation, i.e., binding of CD80 or CD86 on the APC by CD4+ T cells, or binding of CD70 or CD137 on the APC by CD8+ T cells; and 3) cytokines that determine the fate of CD4+ T cells (e.g., IL-12 for Th1, IL-4 for Th2, and IL-6 and IL-23 for Th17) [99]. Specifically, MHC I is recognised by CD8+ T cells and MHC II by CD4+ T cells. Hence, T cell epitope mapping can generally be divided into three steps: 1) antigen processing by APCs; 2) peptide binding to MHC molecules; and 3) recognition by the TCR. Here, we list some T cell epitope prediction tools that can potentially accelerate epitope selection for MHC multimer [100], lymphoproliferation [101], and ELISPOT assays [102].

CD4+ T cell epitope prediction in SARS-CoV-2

CD4+ T lymphocytes and their associated cells recognise exogenous antigens. When APCs encounter exogenous antigens, they take them up by phagocytosis into the endo/lysosomal pathway, where enzymes such as cathepsins and other proteases break them down into peptides that are then presented on class II MHC molecules [103]. Currently established CD4+ T cell epitope prediction methods focus on MHC II binding prediction. MHC II-binding peptides vary in length between 9 and 22 residues, but only a 9-AA peptide-binding core sits in the binding groove [104]. Compared with the MHC I molecule, the MHC II binding groove is open, allowing the ends of the peptide to extend beyond the groove, and its pockets are shallower. These characteristics contribute to the lower accuracy of peptide–MHC II binding prediction methods [105]. Table 3 lists peptide–MHC II binding prediction tools, their accuracy, and their applications in SARS-CoV-2 vaccine design studies. Notably, NetMHCIIpan 4.0, available on IEDB, was trained on large-scale MHC II ligand mass spectrometry (MS) data using the NNAlign_MA method [106], [107]. Some SARS-CoV-2 vaccine design studies used MixMHC2Pred instead. MixMHC2Pred was trained, using a motif deconvolution algorithm, on MS data covering over 99,000 MHC II peptides and has been shown to outperform the earlier version of NetMHCIIpan [108]. Another notable example is TepiTool, a pipeline that combines several tools to predict MHC I and II epitopes [109]; it has been used in several SARS-CoV-2 studies [29], [76], [80], [110].
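Because only a 9-AA core sits in the open MHC II groove, matrix- and alignment-based predictors must also find where that core lies within a longer ligand. The sketch below shows this register search with an additive position-specific scoring matrix. The matrix values in the hypothetical `toy_matrix` are invented and the anchor positions are only loosely modelled on MHC II pockets, so this illustrates the idea rather than reproducing SMM-align, NN-align or NetMHCIIpan.

```python
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Matrix = List[Dict[str, float]]  # one {residue: score} dict per core position


def score_core(core: str, matrix: Matrix) -> float:
    """Additive position-specific score of a binding core."""
    return sum(matrix[pos].get(aa, 0.0) for pos, aa in enumerate(core))


def best_core(peptide: str, matrix: Matrix) -> Tuple[int, str, float]:
    """Slide a core-length register along a longer MHC II ligand and return
    (offset, core, score) for the best-scoring register. The residues outside
    the core correspond to the flanks that extend beyond the open groove."""
    k = len(matrix)
    if len(peptide) < k:
        raise ValueError("peptide is shorter than the binding core")
    registers = [(i, peptide[i:i + k], score_core(peptide[i:i + k], matrix))
                 for i in range(len(peptide) - k + 1)]
    return max(registers, key=lambda r: r[2])


def toy_matrix() -> Matrix:
    """Invented 9-position matrix with anchor preferences at core positions
    1, 4, 6 and 9 (loosely inspired by MHC II pockets; values are made up)."""
    m: Matrix = [{aa: 0.0 for aa in AMINO_ACIDS} for _ in range(9)]
    m[0].update({"F": 2.0, "Y": 1.5, "L": 1.0})
    m[3].update({"L": 1.0, "M": 1.0})
    m[5].update({"S": 0.8, "T": 0.8})
    m[8].update({"L": 1.2, "V": 1.0})
    return m


if __name__ == "__main__":
    print(best_core("GGGFAALASAALGGG", toy_matrix()))
```

Real predictors learn the matrix (or a neural network) from binding or MS data and usually also score the flanking residues, which this sketch ignores.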

Table 3

Common T-cell epitope prediction tools.

Tool	Method	Accuracy	Utility in SARS-CoV-2 vaccine design publications
MHC I/II binding prediction tools
ARB (Accessible on IEDB)	Average relative binding (ARB) matrix-based prediction of binding propensity to MHC II	Self-reported optimal accuracy of 83%	Not utilised/cited in any SARS-CoV-2 vaccine design publications
RANKPEP (MHC I: RANKPEP)	Position-specific scoring matrix-based prediction of MHC-binding peptides	Self-reported optimal accuracy of 0.96 AUC for MHC II and 80% for MHC I	[He et al., 2020] [132]: used to predict MHC II epitopes among 28 epitopes from S protein [Yazdani et al., 2020] [255]: identified MHC II immunogenicity of epitopes selected from S, E, N, and M proteins
Tepitope (Accessible on IEDB as Sturniolo)	A matrix-based method utilising pocket profiles of MHC binding peptides	Self-reported optimal sensitivity of 80%	[Dong et al., 2020] [69]: predicted MHC II epitopes of various SARS-CoV-2 proteins in combination with the NN-align, SMM-align, and NetMHCIIpan methods [Martin and Cheng, 2020] [262]: predicted MHC II epitopes of the S glycoprotein in combination with the NN-align, SMM-align, CombLib and NetMHCIIpan methods [Srivastava et al., 2020] [236]: predicted MHC II epitopes of 11 ORF, structural and non-structural proteins in combination with NN-align, consensus, SMM-align, combinatorial library, and NetMHCIIpan
MHCAttnNet	A deep neural model trained on amino acid sequences of MHC epitopes	Self-reported optimal accuracy of 94.18%	Not utilised/cited in any SARS-CoV-2 vaccine design publications
MHCnuggets	A long short-term memory network trained on common or rare alleles of MHC epitopes	Self-reported optimal accuracy of 0.924 AUC	[Campbell et al., 2020] [126]: Predicted MHC-I epitopes with 5 other tools for HLA supertype covering 90% of the population in 57 countries
Combinatorial library (Accessible on IEDB)	Heuristic method translating positional scanning combinatorial library data of MHC I and II epitopes into main and secondary anchor positions with preferred residues for MHC-binding peptide prediction	Self-reported optimal accuracy of 0.935 AUC	[Martin and Cheng, 2020] [262]: predicted MHC II epitopes of the S glycoprotein in combination with the NN-align, SMM-align, Tepitope and NetMHCIIpan methods [Srivastava et al., 2020] [236]: predicted MHC II epitopes of 11 ORF, structural and non-structural proteins in combination with NN-align, consensus, SMM-align, Tepitope, and NetMHCIIpan [Yazdani et al., 2020] [255]: identified MHC II immunogenicity of epitopes selected from S, E, N, and M proteins, applied with ANN and SMM as a consensus method [Dong et al., 2020] [69]: Used as the ‘Combinatorial library’ method with other IEDB-accessible methods for predicting MHC epitopes in all proteins
MHC II binding prediction tools for CD4+ T-cell epitopes
NetMHCIIpan 4.0 (available on IEDB)	Indirect method; NNAlign_MA method trained on MHC II ligand MS data	Self-reported optimal accuracy of 0.952 PPV	[Liu et al., 2020] [128]: predicted MHC class II peptides in SARS-CoV-2 samples of different ethnic backgrounds [Ayyagari et al., 2020] [112]: Predicted 27 MHC II alleles in M protein [Khairkhah et al., 2020] [92]: Predicted 20 MHC II epitopes in S, N, and M proteins [Requena et al., 2020] [263]: Predicted 30 MHC II alleles with consideration of HLA frequencies by country [Liu et al., 2020] [264]: Filtered 22 potential MHC II epitope candidates in S, M, and N proteins [Dar et al., 2020] [238]: pan-MHC II binding prediction of surface glycoprotein
EpiDock	A peptide library of X-ray structures of HLA-DP2 proteins, with prediction performed using AutoDock [Morris et al., 2009] and RosettaDock [Lyskov and Gray 2008]	Self-reported optimal accuracy of 0.900 AUC	[Can et al., 2020] [233]: used to filter MHC II epitopes with low IC50 by docking to specific MHC II alleles
EpiTOP	Quantitative matrix-based approach in predicting MHC II binding based on proteochemometrics	Self-reported optimal accuracy of 93%	Not utilised/cited in any SARS-CoV-2 vaccine design publications
HLA-DR4Pred	SVM-based model in identifying HLA-DRB1*0401 binding peptides	Self-reported optimal accuracy of 86%	Not utilised/cited in any SARS-CoV-2 vaccine design publications
PREDIVAC	Specificity-determining residue (SDR)-based MHC II binding prediction covering 95% of MHC II allelic variants	Self-reported optimal accuracy of 0.872 AUC	[Dar et al., 2020] [238]: used to predict MHC II supertype allele interactions following prediction by NetMHCIIpan
SMM-align (Accessible on IEDB)	A stabilized matrix alignment method based on MHC II epitope amino acid preferences, using a Gibbs sampler and SVRMHC predictions	Self-reported optimal accuracy of 0.756 AUC	[Dong et al., 2020] [69]: predicted MHC II epitopes of various SARS-CoV-2 proteins in combination with the NN-align, Tepitope, and NetMHCIIpan methods [Martin and Cheng, 2020] [262]: predicted MHC II epitopes of the S glycoprotein in combination with the NN-align, Tepitope, CombLib and NetMHCIIpan methods [Srivastava et al., 2020] [236]: predicted MHC II epitopes of 11 ORF, structural and non-structural proteins in combination with NN-align, consensus, Tepitope, combinatorial library, and NetMHCIIpan [Campbell et al., 2020] [126]: MHC binder identification for epitope prediction targeting 90% of the population in 57 countries
NN-align (Accessible on IEDB)	ANN-based method trained on MHC II-binding epitopes, returning the binding core and affinity for each query	Self-reported average accuracy of 0.855 AUC	[Behmard et al., 2020] [131]: utilised to identify MHC II binding epitopes in S and N proteins [Safavi et al., 2020] [74]: Predicted MHC II binding epitopes with SMM-align, MHCpred and NetMHCIIpan 3.2 methods [Rehman et al., 2020] [67]: Used for predicting T helper cell epitopes in spike protein
Consensus (Accessible on IEDB)	Combination of NN-align, SMM-align, and the combinatorial peptide scanning library methods on IEDB	Self-reported average accuracy of 0.89 ± 0.05 AUC	[Yazdani et al., 2020] [255]: identified MHC-II immunogenicity of epitopes selected from S, E, N, and M proteins [Singh et al., 2020] [64]: used to predict 15-mer T-helper cell epitope from S protein [Mukherjee et al., 2020] [110]: used to filter MHC II epitopes with lower than specified IC50 threshold
CD8+ T-cell epitope prediction tools
Pcleavage	Proteasomal cleavage site prediction tool. An SVM-based classifier trained on 7-AA sequence windows representing cleavage sites	Self-reported optimal accuracy of 0.805 AUC	Not utilised/cited in any SARS-CoV-2 vaccine design publications
NetChop3.1 (Accessible on IEDB)	Proteasomal cleavage site prediction tool. Neural network-based, trained on sequence-encoded data	Self-reported optimal accuracy of 0.85 AUC	[Shomuradova et al., 2020] [265]: Predicted proteasomal cleavage scores of C-terminal AAs in S protein peptides [Safavi et al., 2020] [74]: Predicted proteasomal processing of T cell epitopes together with MAPP
PCPS	Proteasomal cleavage site prediction tool. Provides three models each for predicting immunoproteasome and proteasome cleavage sites, developed by training on residue fragments	Reported sensitivity of 0.88 and specificity of 0.57 in Gomez-Perosanz et al., 2019 [117]	Not utilised/cited in any SARS-CoV-2 vaccine design publications
TAPPred	Proteasomal fragment–TAP binding prediction tool. A cascade SVM-based method trained on a quantitative matrix of TAP-binding regions	Self-reported optimal Pearson's correlation coefficient of 0.81	Not utilised/cited in any SARS-CoV-2 vaccine design publications
TAPhunter	Proteasomal fragment–TAP binding prediction tool. An SVM-based model trained on TAP peptide fragments and composition effects	Self-reported optimal accuracy of 88%	Not utilised/cited in any SARS-CoV-2 vaccine design publications
TAPreg	Proteasomal fragment–TAP binding prediction tool. SVM-based method trained on single residue positions and residue combinations	Self-reported optimal Pearson’s correlation coefficient of 0.89 ± 0.03	Not utilised/cited in any SARS-CoV-2 vaccine design publications
Peters et al., 2003 (Accessible on IEDB as part of MHC-I Processing Predictions)	Stabilized matrix method (SMM) used to score sequence position in distinguishing TAP-binding epitopes	Self-reported optimal accuracy of 0.89 AUC	[Srivastava et al., 2020] [236]: Used to predict Cytotoxic T cell epitopes in the ‘MHC-I Processing Predictions’ pipeline [Anand et al., 2020] [260]: Predicted 11 antigenic T cell epitopes from structural proteins [Banerjee et al., 2020] [259]: Predicted TAP binding affinity as part of the IEDB server
MHCflurry2.0	MHC I binding prediction tool. A neural network-based model trained on MHC I binding affinity and MS data	Self-reported optimal accuracy of 0.94 AUC	[Campbell et al., 2020] [126]: Predicted MHC-I epitopes with 5 other tools for HLA supertype covering 90% of the population in 57 countries
NetCTLpan1.1	MHC I binding prediction tool. Integrates MHC–peptide binding prediction by NetMHCpan with proteasomal cleavage prediction by NetChop and TAP transport prediction by Peters et al., 2003	Self-reported optimal accuracy of 0.976 AUC	[Ayyagari et al., 2020] [112]: Identified MHC I epitopes on M protein [Safavi et al., 2020] [74]: predicted MHC I epitopes on 6 non-structural proteins [Mishra et al., 2020] [266]: predicted MHC I epitopes with PickPocket 1.1 from structural and non-structural SARS-CoV-2 proteins
NetMHCpan 4.1 (available on IEDB)	MHC I binding prediction tool. NNAlign_MA method trained on MHC I ligand MS data	Self-reported optimal accuracy of 0.8291 PPV	[Campbell et al., 2020] [126]: Predicted MHC-I epitopes with 5 other tools for HLA supertype covering 90% of the population in 57 countries [Liu et al., 2020] [128]: MHC-I epitope prediction for augmentation to increase vaccine population coverage [Dearlove et al., 2020] [267]: 9-mer MHC-I epitope prediction in S protein [Liu et al., 2020] [264]: Predicted MHC class I epitopes in an ensemble with MHCflurry
DeepLigand	MHC I binding prediction tool. A semi-supervised model trained on sequence features corresponding to secondary factors of MHC I binding epitopes	Self-reported optimal accuracy of 0.979 AUC	Not utilised/cited in any SARS-CoV-2 vaccine design publications
PickPocket1.1	MHC I binding prediction tool. A library of pocket residues and corresponding binding specificities, with consensus to the NetMHCpan method	Self-reported optimal accuracy of 0.92 AUC	[Mishra, 2020] [113]: Predicted 921 cytotoxic T lymphocyte epitopes in 10 SARS-CoV-2 proteins with NetCTLpan1.1 [Yazdani et al., 2020] [255]: Predicted MHC-I epitopes on IEDB server together with other listed tools [Campbell et al., 2020] [126]: Predicted MHC-I epitopes with 5 other tools for HLA supertype covering 90% of the population in 57 countries
HLA-CNN	A convolutional neural network trained on HLA-Vec, a distributed representation of the amino acids in epitopes	Self-reported average accuracy of 0.836 AUC	Not utilised/cited in any SARS-CoV-2 vaccine design publications
HLA presentation coverage prediction
EvalVax (Unlinked/Robust)	Evaluation companion to the OptiVax combinatorial optimisation tools, considering HLA allele frequencies. Unlinked: assumes independence between HLA loci. Robust: considers linkage disequilibrium of HLA genes between loci, using haplotype frequencies for population coverage estimates	Unavailable	[Liu et al., 2020] [128]: Used for ensuring high population coverage in T cell epitopes
ShinyNap	A visualization tool for predicting antigen immunogenicity based on HLA presentation	Unavailable	[Yarmarkovich et al., 2020] [24]: used to identify 65 peptide sequences with potential coverage of 99.4% of the population in the Bone Marrow Registry
TCR-peptide binding prediction
iPred	Estimates baseline frequencies of TCR specificities using a database of TCR sequences with known antigen specificities	Unavailable	[Shomuradova et al., 2020] [265]: Used for motif discovery in TCR repertoires for epitopes [Lee, 2020] [268]: used for predicting immunogenicity between the SARS-CoV-2 epitope and IEDB peptides
ERGO	NLP-based methods in predicting CD4+ or CD8+ TCR binders using large-scale TCR-peptide dictionaries	Self-reported optimal accuracy of 0.98 AUC	Not utilised/cited in any SARS-CoV-2 vaccine design publications
NetTCR	A CNN-based predictor trained on the AA sequences of peptides and of the CDR3 region of the TCR β-chain	Self-reported optimal accuracy of 0.727 AUC	Not utilised/cited in any SARS-CoV-2 vaccine design publications
TcellMatch	Multiple deep learning methods trained on data of MHC binding and TCR sequences from multimodal single-cell experiments	Self-reported optimal accuracy of 0.87 AUC	Not utilised/cited in any SARS-CoV-2 vaccine design publications
IFNepitope	SVM-based classifier applied to the AA composition and length of IFN-γ-inducing MHC class II peptides	Self-reported optimal accuracy of 81.39%	[Behmard et al., 2020] [131]: predicted IFN-γ release in selected epitopes [Ayyagari et al., 2020] [112]: predicted IFN-γ release in selected epitopes [He et al., 2020] [132]: predicted IFN-γ-inducing property of MHC epitopes [Samad et al., 2020] [133]: predicted IFN-γ-inducing property of MHC epitopes [Ahammad et al., 2020] [134]: predicted IFN-γ-inducing property of MHC epitopes
IL4pred	SVM-based classifier applied to IL-4-inducing peptides for Th2 cell epitope prediction	Self-reported optimal accuracy of 75.76%	[Behmard et al., 2020] [131]: predicted IL-4 release in selected epitopes [Sarkar et al., 2020] [234]: predicted IL-4-inducing property of MHC epitopes [He et al., 2020] [132]: predicted IL-4-inducing property of MHC epitopes [Samad et al., 2020] [133]: predicted IL-4-inducing property of MHC epitopes [Ahammad et al., 2020] [134]: predicted IL-4-inducing property of MHC epitopes
IL10pred	RF-based model trained on motifs and residues of IL-10-inducing peptides	Self-reported optimal accuracy of 0.88 AUC	[Behmard et al., 2020] [131]: predicted IL-10 release in selected epitopes [Sarkar et al., 2020] [234]: predicted IL-10-inducing property of MHC epitopes [He et al., 2020] [132]: predicted IL-10-inducing property of MHC epitopes [Samad et al., 2020] [133]: predicted IL-10-inducing property of MHC epitopes [Ahammad et al., 2020] [134]: predicted IL-10-inducing property of MHC epitopes

Common T-cell epitope prediction tools. This table shows some common T-cell epitope prediction tools and their reported or evaluated accuracy, where available. Their use in SARS-CoV-2 vaccine design papers is summarised briefly, including the result reported and, where mentioned, the SARS-CoV-2 protein to which they were applied. Several tools with relatively high reported accuracy have not yet been used in SARS-CoV-2 vaccine design.

CD8+ T cell epitope prediction in SARS-CoV-2

CD8+ T lymphocytes recognise endogenous antigens. Endogenous antigens are cleaved by the proteasome into peptides, which are then transported by the transporter associated with antigen processing (TAP) into the endoplasmic reticulum for loading onto MHC I molecules [103]. MHC I epitopes are usually shorter than MHC II epitopes, ranging from 9 to 11 AA in length, and sit in the MHC I groove delineated by two α-helices [104]. Although most predictions focus only on the intermediate step of MHC I binding affinity, tools are also available for predicting proteasomal cleavage sites, TAP binding affinity, and TCR binding, as listed in Table 3. Additionally, platforms such as NetCTLpan [111] provide end-to-end cytotoxic T lymphocyte (CTL) epitope prediction and have been used in several SARS-CoV-2 vaccine studies [74], [112], [113]. The C-terminus of MHC I epitopes is generated by proteasomal cleavage, including cleavage by immunoproteasomes [114]. The cleavage specificities of the immunoproteasome are determined by the residues at the cleavage site and at neighbouring positions [115]. ML classifiers such as Pcleavage [116], PCPS [117] and NetChop [118] have been trained to distinguish cleavage from non-cleavage sites. Following proteasomal cleavage, TAP binding is essential for transporting the peptide into the endoplasmic reticulum, where it binds MHC I. TAP transporters preferentially translocate peptides of 8–11 AA in length, similar to MHC I epitope lengths [119], and have a higher affinity for hydrophobic residues at specific positions [120]. Several methods have been developed from TAP motifs and consensus matrices, and from their subsequent application to ML techniques. A summary of all endogenous antigen processing prediction tools is beyond the scope of this review; instead, Table 3 presents the accuracy of the listed tools and some examples of their application to SARS-CoV-2 vaccine design.
Notably, some tools listed for MHC II binder prediction can also be used for MHC I prediction, because their models were trained on data for both MHC classes. For example, RANKPEP [121], [122] provides position-specific scoring matrices for both MHC I and MHC II binding peptides, and MHCnuggets [123] is a neural network-based model trained on MHC binders of both common and rare alleles. Other tools, such as NetMHCpan-4.1 [124], are specific to MHC I peptide prediction. Again, their use in SARS-CoV-2 studies and their reported accuracy are summarised in Table 3.
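NetCTLpan is described in Table 3 as integrating NetMHCpan binding, NetChop cleavage and TAP transport predictions. A minimal sketch of that kind of integration, assuming a linear weighted sum of the three per-peptide scores: the weights (0.15 and 0.05), the score scales and the threshold are placeholders rather than NetCTLpan's documented settings, and the per-step scores are assumed to come from separate upstream predictors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    peptide: str
    mhc: float       # predicted MHC I binding score (higher = stronger binder)
    cleavage: float  # predicted C-terminal proteasomal cleavage score
    tap: float       # predicted TAP transport score


def combined_score(c: Candidate, w_cleavage: float = 0.15, w_tap: float = 0.05) -> float:
    """Weighted sum of the three processing steps; the weights are placeholders."""
    return c.mhc + w_cleavage * c.cleavage + w_tap * c.tap


def rank_candidates(cands: List[Candidate], threshold: float,
                    w_cleavage: float = 0.15, w_tap: float = 0.05) -> List[Candidate]:
    """Candidates whose combined score is at or above `threshold`, best first."""
    scored = [(combined_score(c, w_cleavage, w_tap), c) for c in cands]
    kept = [(s, c) for s, c in scored if s >= threshold]
    kept.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in kept]
```

Keeping the MHC term unweighted reflects the view that MHC I binding is the most selective step, while cleavage and TAP scores act as tie-breakers; setting both weights to zero reduces the ranking to binding alone.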

HLA coverage

With the aim of inducing herd immunity against SARS-CoV-2 on a global scale, a vaccine candidate must be effective across the human population. Campbell et al. used HLA frequency data from the Allele Frequency Net Database (AFND) [125] to ensure that their epitope candidates could bind HLA alleles covering more than 90% of the population across 57 countries [126]. Other computational tools that account for HLA allele coverage can estimate immunogenicity in populations with different HLA backgrounds. Yarmarkovich et al. applied ShinyNap [127], a tool for scoring population-scale HLA presentation originally developed for cancer, to predicted SARS-CoV-2 epitopes; they also confirmed that their epitopes were not similar to endogenous human sequences [24]. Liu et al. created two tools for the optimisation (OptiVax) and evaluation (EvalVax) of epitope population coverage for peptide vaccines [128]. Additional information can be found in Table 3.
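Population coverage estimates such as the ‘Unlinked’ setting of EvalVax treat HLA loci as independent. A minimal sketch of that calculation, assuming Hardy–Weinberg equilibrium within each locus and independence between loci: a person counts as covered if at least one of their alleles at any locus presents an epitope. The allele frequencies in the usage are hypothetical, and linkage disequilibrium (handled by the ‘Robust’ setting) is deliberately ignored.

```python
from typing import Dict, Iterable, Set


def locus_coverage(allele_freqs: Dict[str, float], covered: Set[str]) -> float:
    """Probability that a person carries at least one covered allele at one locus,
    assuming Hardy-Weinberg equilibrium (two independently inherited alleles)."""
    f = sum(freq for allele, freq in allele_freqs.items() if allele in covered)
    return 1.0 - (1.0 - f) ** 2


def population_coverage(loci: Iterable[Dict[str, float]], covered: Set[str]) -> float:
    """Expected fraction of the population with at least one covered allele at
    any locus, assuming independence between loci (no linkage disequilibrium)."""
    p_uncovered = 1.0
    for allele_freqs in loci:
        p_uncovered *= 1.0 - locus_coverage(allele_freqs, covered)
    return 1.0 - p_uncovered


if __name__ == "__main__":
    # Hypothetical allele frequencies for two loci in one population.
    hla_a = {"A*02:01": 0.3, "A*01:01": 0.2, "other": 0.5}
    hla_b = {"B*07:02": 0.1, "other": 0.9}
    print(population_coverage([hla_a, hla_b], {"A*02:01", "B*07:02"}))
```

With these invented frequencies the estimate is 1 − 0.49 × 0.81, or about 0.60; repeating the calculation per country with real frequency data is what allows coverage targets such as those above to be checked.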

TCR binding prediction

Beyond predicting binding to MHC molecules, several methods directly predict whether MHC-bound peptides can be recognised by T cells (see Table 3). For example, ERGO uses Natural Language Processing (NLP) methods on a large-scale T cell receptor (TCR)–peptide dictionary to predict TCR–peptide binding and is applicable to both CD4+ and CD8+ T cell epitopes [129]. Tools are also available to predict the type of T helper response an epitope may elicit, such as IFNepitope, which predicts IFN-γ-inducing peptides associated with Th1 responses [130]. IL4pred, IL10pred and IFNepitope have been used together in several SARS-CoV-2 vaccine design studies to check the corresponding cytokine inducibility of MHC epitope candidates [131], [132], [133], [134], [135].
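IFNepitope and IL4pred are listed in Table 3 as SVM classifiers built on peptide composition. A minimal sketch of the kind of composition features such classifiers take as input: 20 amino acid composition values and 400 dipeptide composition values per peptide. The exact feature sets, motif features and trained models of those tools are not reproduced here; the resulting vectors could be passed to any SVM implementation (for example scikit-learn's `SVC`) together with labelled inducing and non-inducing peptides, which are assumed to be available.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aa_composition(peptide: str) -> List[float]:
    """Fraction of each of the 20 standard residues (20 features)."""
    pep = peptide.upper()
    if not pep:
        raise ValueError("empty peptide")
    return [pep.count(aa) / len(pep) for aa in AMINO_ACIDS]


def dipeptide_composition(peptide: str) -> List[float]:
    """Fraction of each of the 400 ordered residue pairs (400 features)."""
    pep = peptide.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    pairs = [pep[i:i + 2] for i in range(len(pep) - 1)]
    for pair in pairs:
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = len(pairs) or 1  # avoid division by zero for single residues
    return [counts[dp] / total for dp in DIPEPTIDES]


def feature_vector(peptide: str) -> List[float]:
    """Concatenated 420-dimensional composition vector for one peptide."""
    return aa_composition(peptide) + dipeptide_composition(peptide)
```

Composition features ignore residue order beyond neighbouring pairs, which keeps the vectors a fixed length regardless of peptide length; that is convenient for SVMs but discards positional information.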

Epitope prediction evaluation and challenges

Despite the extensive lists in Table 2 and Table 3, many more epitope prediction tools are available. Even though some models have similar or even better accuracy, only a minority are used in SARS-CoV-2 epitope design, possibly because of the convenience of web servers. Furthermore, several studies have benchmarked individual tools, but none has evaluated them in combination [60], [135], [136], [137], [138]. Identifying the best epitope prediction pipeline is beyond the scope of this review. However, we refer readers to Sohail et al. [135], who reviewed the performance of each T cell epitope prediction method used in SARS-CoV-2 vaccine design. A further concern is the novelty of epitopes predicted by studies that use the same prediction methods. Readers can also refer to Prachar et al. [137] and Sohail et al. [135] to avoid suboptimal tools and for guidance on other novel approaches. SARS-CoV-2 has an estimated evolutionary rate of 9.8 × 10⁻⁴ substitutions per site per year [139], and at least 15 variants were reported during 2020 [140]. It is therefore imperative that an epitope selected for vaccine development is conserved across variants as well as effective across multiple populations. The IEDB conservancy tool [141] has been used to check this for SARS-CoV-2 epitope candidates [23], [142], and Clustal Omega, EMBL-EBI's multiple sequence alignment tool, has also been used [22], [24]. Some studies have also screened for conservation between SARS-CoV-2 and other coronaviruses, such as SARS-CoV and MERS-CoV [24], [26]. In addition to the methods listed, Gupta et al. developed a platform of potential vaccine candidates and epitopes from SARS-CoV-2 proteins with predicted immunogenicity [91]. Similarly, Ahmed et al. (COVIDep) [18] and Wu et al. (COVIEdb) [143] presented databases of potential T and B cell epitopes for SARS-CoV-2.
COVIEdb also covers SARS-CoV and MERS-CoV, with regularly updated validation results [143]. These resources could be used either for cross-validation or to ensure that an epitope is novel relative to those reported in other studies.
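Conservancy screens like those described above ask what fraction of known sequences still contain a candidate epitope. A simplified sketch, assuming an ungapped comparison of the epitope against every equal-length window of each variant protein; the IEDB conservancy tool may compute identity differently, and the peptide and variant sequences in the usage are made up. The helper `expected_substitutions` turns the reported rate into a rough per-genome figure: at 9.8 × 10⁻⁴ substitutions per site per year and a genome of about 29,903 nucleotides (the length of the Wuhan-Hu-1 reference sequence), roughly 29 substitutions per genome per year would be expected, which is why conservancy checks need repeating as new variants appear.

```python
from typing import Sequence


def max_identity(epitope: str, protein: str) -> float:
    """Highest fraction of identical positions between the epitope and any
    ungapped, equal-length window of the protein (0.0 if the protein is shorter)."""
    k = len(epitope)
    if k == 0:
        raise ValueError("empty epitope")
    best = 0.0
    for i in range(len(protein) - k + 1):
        matches = sum(a == b for a, b in zip(epitope, protein[i:i + k]))
        best = max(best, matches / k)
    return best


def conservancy(epitope: str, variants: Sequence[str], min_identity: float = 1.0) -> float:
    """Fraction of variant sequences that contain the epitope at >= min_identity."""
    if not variants:
        return 0.0
    hits = sum(max_identity(epitope, v) >= min_identity for v in variants)
    return hits / len(variants)


def expected_substitutions(rate_per_site_year: float, genome_length: int,
                           years: float = 1.0) -> float:
    """Back-of-the-envelope number of substitutions per genome."""
    return rate_per_site_year * genome_length * years


if __name__ == "__main__":
    variants = ["AAYLQPRTFLLAA", "AAYLQPRTFILAA", "GGG"]  # made-up sequences
    print(conservancy("YLQPRTFLL", variants))
    print(conservancy("YLQPRTFLL", variants, min_identity=0.8))
    print(expected_substitutions(9.8e-4, 29903))
```

Lowering `min_identity` tolerates single substitutions within the epitope, which may or may not preserve MHC binding and antibody recognition; that judgement still requires the prediction tools discussed above.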

Immunogenicity prediction

The ongoing COVID-19 pandemic has urged the scientific community to find answers, both by repurposing existing drugs and by expediting the development of new vaccine candidates [15]. Currently, the immunogenicity predictions of most SARS-CoV-2 vaccine studies focus on the static interaction between the antigen and components of the immune system. These predictions, although useful, cannot explain how the immune system as a whole acquires pathogen-specific immune memory and fights off the pathogen. Such knowledge is only acquired in later in vivo studies, which may increase the risk of failure, decrease productivity, and escalate costs. Additionally, limited knowledge about the genomic instability and immunogenicity of genome-based (i.e., DNA, RNA and mRNA) vaccines encourages the development of novel strategies to study the dynamics of the vaccine immune response [144]. Indeed, an increasing number of publications advocate the use of relevant in silico modelling in vaccine design [145], [146], [147], [148]. In their review, Brown et al. [147] pointed out that systems-based approaches in biology and clinical medicine have been almost entirely data-driven, despite the limited insight these models give into the biological mechanisms involved. In contrast, mechanistic models are constrained by prior knowledge, have greater predictive value under extrapolation, and can be iteratively improved by comparison against newly acquired data, leading to gradual model refinement. They identify three aims with which mechanistic modelling is well aligned: 1) gaining a deeper understanding of the protective immune responses mounted against pathogens; 2) building a mechanistic context that enables a more thorough interpretation of clinical data sets; and 3) designing better-informed medical experiments and clinical trials, potentially leading to lower rates of therapeutic failure.

Rhodes et al. [146] focused instead on current dose-finding practices for new vaccines and proposed a novel in silico method, which they termed Immunostimulation/Immunodynamic (IS/ID) modelling. Typically, large dose ranges are first tested in small animal models, such as rats and mice, until a plateau in response is reached. Allometric scaling, which assumes that physiological parameters follow a power-law relationship with body measures such as weight or height, is then applied, even across species, to estimate effective human doses. However, applying allometric scaling to cross-species translation is challenging, as the immunological relationships involved are not well characterised. Additionally, dose escalation studies often rely on the long-standing assumption that the dose–response relationship is saturating (sigmoidal). However, in some cases a peaked dose–response curve has been observed instead. This exposes the risk of sub-optimal dosing decisions, such as choosing a higher dose than needed. The WHO has conducted retrospective dose-ranging studies on vaccines, in which fractional doses were found to be as immunogenic as, or in some cases more immunogenic than, the full licensed doses for diseases such as yellow fever, meningitis, and malaria [146]. The importance of informed dosing decisions cannot be overstated. For Oxford/AstraZeneca’s COVID-19 vaccine (AZD1222), an efficacy of 62.1% was reported in participants who received two standard doses, whereas an efficacy of 90% was observed in those who received a lower first dose followed by a standard second dose [21]. In this context, in silico immunogenicity prediction could help to better inform first-in-human dose selection as studies transition from the preclinical to the clinical setting.
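The contrast between saturating and peaked dose–response curves, and the power-law idea behind allometric scaling, can be made concrete with a few formulas. In this sketch the Hill function stands for the saturating (sigmoidal) assumption, while `peaked_response` multiplies it by a high-dose inhibition factor; that factor is one hypothetical way of producing a bell shape and is not the IS/ID model of Rhodes et al. For a Hill coefficient of 1, the peaked curve is maximal at the geometric mean of ED50 and the inhibition constant. `human_equivalent_dose` assumes an exponent of 0.67, a convention borrowed from body-surface-area scaling of drugs; as noted above, whether such scaling holds for vaccine immunogenicity is uncertain.

```python
import math


def hill_response(dose: float, emax: float, ed50: float, n: float = 1.0) -> float:
    """Saturating (sigmoidal) dose-response: rises monotonically towards emax."""
    return emax * dose ** n / (ed50 ** n + dose ** n)


def peaked_response(dose: float, emax: float, ed50: float, k_inhib: float,
                    n: float = 1.0) -> float:
    """Hypothetical bell-shaped curve: the Hill term damped by a high-dose
    inhibition factor, so the response falls again at large doses."""
    return hill_response(dose, emax, ed50, n) / (1.0 + dose / k_inhib)


def peak_dose(ed50: float, k_inhib: float) -> float:
    """Dose maximising peaked_response when n = 1: sqrt(ed50 * k_inhib)."""
    return math.sqrt(ed50 * k_inhib)


def allometric(weight_kg: float, a: float, b: float) -> float:
    """Generic allometric power law Y = a * W**b."""
    return a * weight_kg ** b


def human_equivalent_dose(animal_dose_mg_per_kg: float, animal_weight_kg: float,
                          human_weight_kg: float = 60.0, b: float = 0.67) -> float:
    """Scale a per-kg dose between species, assuming the total dose scales as
    W**b, so that the per-kg dose scales as W**(b - 1)."""
    return animal_dose_mg_per_kg * (animal_weight_kg / human_weight_kg) ** (1.0 - b)
```

Under the saturating model, escalating towards the plateau never lowers the response; under the peaked model, doses beyond `peak_dose` reduce it, which is the risk the text describes.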

Mathematical modelling of SARS-CoV-2 infection

Many mathematical models of the immune response exist in the literature, in which the complex immune system is simplified and described in formalisms, that is, algorithmic or mathematical frameworks. These often rely on ordinary differential equations (ODEs) or partial differential equations (PDEs), or follow an agent-based modelling (ABM) approach, and can be combined into multi-scale or trans-scale models of the system. Some of the formalisms commonly used to simulate humoral and/or cell-mediated immunity upon infection are summarised in Table 4. Additional examples of in silico modelling of immunogenicity are discussed in Hande et al. [149], Charoentong et al. [150], Dobrovolny et al. [151], and Narang et al. [152].
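As an illustration of the ODE formalism, a minimal sketch of the basic target cell-limited (TIV) model of within-host viral dynamics, the family of models used by Hernandez-Vargas and Velasco-Hernandez discussed below. This is the textbook form without any immune compartments, not their exact model: target cells T are infected at rate βTV, infected cells I die at rate δ, and free virus V is produced at rate p per infected cell and cleared at rate c. All parameter values in the usage are hypothetical and chosen only so that the viral load rises and falls.

```python
from typing import List, Tuple

State = Tuple[float, float, float]  # (target cells T, infected cells I, virus V)


def tiv_rhs(s: State, beta: float, delta: float, p: float, clearance: float) -> State:
    """dT/dt = -beta*T*V, dI/dt = beta*T*V - delta*I, dV/dt = p*I - clearance*V."""
    t_cells, infected, virus = s
    new_infections = beta * t_cells * virus
    return (-new_infections,
            new_infections - delta * infected,
            p * infected - clearance * virus)


def _shift(s: State, k: State, h: float) -> State:
    return (s[0] + h * k[0], s[1] + h * k[1], s[2] + h * k[2])


def simulate(s0: State, days: float, dt: float, beta: float, delta: float,
             p: float, clearance: float) -> List[State]:
    """Integrate the model with a fixed-step classical Runge-Kutta (RK4) scheme."""
    args = (beta, delta, p, clearance)
    traj = [s0]
    s = s0
    for _ in range(int(round(days / dt))):
        k1 = tiv_rhs(s, *args)
        k2 = tiv_rhs(_shift(s, k1, dt / 2), *args)
        k3 = tiv_rhs(_shift(s, k2, dt / 2), *args)
        k4 = tiv_rhs(_shift(s, k3, dt), *args)
        s = tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                  for x, a, b, c, d in zip(s, k1, k2, k3, k4))
        traj.append(s)
    return traj


if __name__ == "__main__":
    # Hypothetical parameters, chosen only so that the curve rises and falls.
    traj = simulate((1e7, 0.0, 1.0), days=20, dt=0.01,
                    beta=5e-7, delta=1.0, p=10.0, clearance=3.0)
    peak_step = max(range(len(traj)), key=lambda i: traj[i][2])
    print(f"virus peaks at day {peak_step * 0.01:.1f}")
```

In this model the viral peak is reached when target cells become depleted; immune effects such as T cell killing or a drug reducing β or p would be added as extra terms or equations.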

Table 4

Common formalisms used in modelling immunogenicity.

Formalism	Advantages	Disadvantages	Application to SARS-CoV-2 Immune Response Studies
Ordinary differential equations (ODEs)[190], [191], [192], [269]	Admit analytical solution; Can provide precise continuous-time dynamics of networks with multiple entities	Has a finite dimensional state vector; Unable to capture spatial dynamics; Relatively low scalability	[155], [156], [157], [158], [160], [163]
Partial differential equations (PDEs)[270]	Admit analytical solution; Allow derivatives of unknown function for dynamics in time and space; Has an infinite dimensional state	More complex than ODEs and DDEs; Computationally demanding; Relatively low scalability	[160], [161]
Impulsive differential equations (IDEs)[271]	Allow derivatives with sudden changes in state over a continuous model; Can be multi-parametric	Relatively low scalability; Impulses mostly occur at fixed moments; Traditional IDEs do not allow non-instantaneous impulses	[163], [272]
Cellular automata[273]	Able to exhibit complex system-level behaviour with qualitative results	Non-quantitative; Low scalability; Computationally demanding and time consuming; Difficult to transfer results into biological interpretation	[274]
Agent-based modelling[153]	Explores behaviour of individual entities; Some agents are adaptive in their behaviour; Depicts local interactions and environmental heterogeneity	Non-quantitative; Entity properties are discrete; Not suitable for intracellular networks	[159], [161], [275]

Common formalisms used in modelling immunogenicity. There is no globally best formalism for systems immunology modelling, and the choice of model depends on the study objective [149], [152]. Methods such as differential equations are more suitable for depicting a homogeneous system, whereas cellular automata and agent-based modelling are more commonly used to simulate a heterogeneous system [153], [154]. Cellular automata and differential equations are also more commonly used to study intracellular networks, whereas agent-based modelling is used to study networks composed of both innate and adaptive immunity. ODEs and PDEs are also preferable to cellular automata and ABM if numerical analysis is needed to validate the hypothesis of interest. For example, ODEs have been used in several simulations of the within-host response to SARS-CoV-2 infection [155], [156], [157], [158]. Alternatively, multi-formalism modelling can overcome the pitfalls of each individual formalism; examples of its application to SARS-CoV-2 immune response studies are seen in Sahoo et al. [155], Getz et al. [159], Peter et al. [160], and Fain et al. [161]. Hernandez-Vargas and Velasco-Hernandez [162] used ODE-based target cell-limited models to explore T cell-induced immunity upon SARS-CoV-2 infection. The model also included terms to evaluate the effect of introducing a hypothetical drug. Although the approach is succinct, this paper identified some potential mechanisms regarding the timeline of symptom development and the benefits of prophylactic medication upon SARS-CoV-2 infection. Similarly, Getz et al. [159] assembled a multidisciplinary team of scientists to develop a PhysiCell-based multi-scale model simulating critical processes during SARS-CoV-2 infection.
The framework uses agent-based modelling to represent 2D and 3D tissue environments, coupled with ODEs describing cell phenotypic parameters. In the study by Sahoo et al. [163], the model predicts that an intermediate-strength innate immune response combined with a weaker adaptive immune response will require a longer virus clearance time, consistent with the age-dependent severity of COVID-19 observed in clinical settings [164]. In another study, a mathematical model suggests that disease severity increases for people who are exposed more often to SARS-CoV-2 [156]. This is consistent with a clinical study in which those infected in a high-transmission setting in Wuhan had a high odds ratio for severe COVID-19 outcomes compared with those in low-transmission settings [22]. The model of Fain et al. [161] also indicates that the infection duration can extend up to 73 days, even though most clinical studies suggest a median duration of 14–20 days. This is supported by several clinical reports of patients who shed the virus for more than 60 days post-hospitalization [156], [165], [166]. These findings suggest that although the models are simplified, they can provide complementary insights into the infection. Later in this review, we discuss how mathematical models can be applied to study the vaccine immune response.
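To make the ODE formalism concrete, the following Python sketch implements a basic target cell-limited model of the kind referred to above, tracking target cells T, infected cells I and free virus V. It is a minimal sketch under stated assumptions: the parameter values are illustrative placeholders rather than values from [162] or any other cited study, the drug efficacy term `eps` (which scales down the infection rate) is a hypothetical way of representing an antiviral, and no immune response terms are included.

```python
# Sketch of a target cell-limited model of within-host viral dynamics.
# All parameter values are illustrative placeholders (units: per day),
# not values fitted or reported in the cited studies.


def tcl_rhs(state, beta, delta, p, c, eps):
    """Derivatives for target cells T, infected cells I and free virus V.

    eps is a hypothetical drug efficacy (0 = no drug, 1 = full block)
    that scales down the infection rate beta.
    """
    T, I, V = state
    infection = (1.0 - eps) * beta * T * V
    return (-infection, infection - delta * I, p * I - c * V)


def rk4_step(f, state, dt, *args):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = f(state, *args)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), *args)
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), *args)
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)), *args)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * g + d)
        for s, a, b, g, d in zip(state, k1, k2, k3, k4)
    )


def simulate(t_end=30.0, dt=0.01, beta=1e-7, delta=0.6, p=10.0, c=3.0,
             eps=0.0, T0=1e7, I0=0.0, V0=1.0):
    """Return a list of (time, (T, I, V)) tuples from t = 0 to t_end."""
    state = (T0, I0, V0)
    trajectory = [(0.0, state)]
    for step in range(1, int(round(t_end / dt)) + 1):
        state = rk4_step(tcl_rhs, state, dt, beta, delta, p, c, eps)
        trajectory.append((step * dt, state))
    return trajectory


untreated = simulate()
treated = simulate(eps=0.9)
print(max(v for _, (_, _, v) in untreated))  # untreated peak viral load
print(max(v for _, (_, _, v) in treated))    # peak with 90% infection block
```

A fixed-step fourth-order Runge-Kutta scheme keeps the example dependency-free; in practice an adaptive solver such as `scipy.integrate.solve_ivp` would normally be used, and the parameters would be estimated from patient viral load data.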

PK/PD studies in vaccine design

Mathematical modelling approaches have become an integral part of drug discovery and development in recent years and are used to study drug PK and pharmacodynamics (PD). Surprisingly, only limited knowledge is available on vaccine absorption, distribution, metabolism, and excretion (ADME) processes. Indeed, there is a long-standing belief that PK studies are irrelevant to evaluating vaccine efficacy. This is also reflected in current regulatory registration processes across the world, which do not require experimental PK studies for vaccine approval [16]. Here, we challenge the flawed premise of this notion by briefly revisiting how (adjuvanted) vaccines induce immunogenicity. Currently, PK modelling is rarely used to predict vaccine (antigen or adjuvant) immunogenicity. We present the few PK models that exist in the literature and discuss their relevance to the COVID-19 vaccine development process.

Timing and location in vaccine efficacy

The immune response can be broken down into two general categories: innate and adaptive. When an invading pathogen is encountered, cellular effectors of the innate immune response, such as macrophages and dendritic cells (DCs), detect it through activation of pattern recognition receptors (PRRs) that survey both the extracellular and intracellular space and respond to conserved pathogen-associated molecular patterns (PAMPs). This sets off a cascade of events, including antigen uptake by innate cells such as immature DCs, which subsequently differentiate into mature antigen-presenting cells (APCs). Mature APCs process the antigen into peptide fragments and present them on their surface in forms appropriate for inducing naïve T cell activation (priming). This interaction takes place within the lymph nodes (LNs), a pivotal meeting point between lymph-migrating APCs and naïve T and B cells (the principal drivers of adaptive immunity), thus establishing a link between the innate and adaptive arms of the immune response. Indeed, the adaptive response requires innate signals for its activation [14], [167], [168]. The LNs are areas of high T and B cell concentration; however, these cells are compartmentalised into specific sites. T cells reside in the paracortex (T cell zone), deeper in the LNs, together with a large population of migratory antigen-presenting DCs. B cells are located in the follicles (comprising the B cell zone), where non-migratory follicular DCs (FDCs) are also found. Upon exposure to antigen presented by APCs, naïve CD4+ T cells (T helper cells or Th) are activated and differentiate into functionally distinct subpopulations. This process is influenced by the available level of antigen, and higher doses are associated with greater production of follicular helper T cells (TFH). B cell priming requires binding to cognate antigen in its native form and is less sensitive to antigen levels [167], [168].
Following activation, B cells enter specialised regions of the follicles named germinal centres (GCs). GCs are further split into a light and a dark zone. In the dark zone, activated B cells undergo proliferation and somatic hypermutation of their antigen receptors; when a round of proliferation ends, GC B cells migrate to the light zone, where they acquire antigen from the FDCs, which is then processed and presented on their surface to TFH cells. This cyclic process is known as B cell affinity maturation, and signals provided by the TFH cells drive the fate of the B cells. Upon exiting the GCs, B cells become either antibody-producing cells (short-lived plasmablasts or long-lived plasma cells) or memory cells [167]. The aim of vaccination is the induction of immunological memory. Modern subunit vaccines lack the pathogenic features (PAMPs) required for sufficient activation of the innate immune response, which in turn affects the effectiveness of the downstream adaptive response. In this case, adjuvants are paired with purified antigen to provide the inflammatory cues that the antigen lacks. As discussed, however, efficient antigen transport to the LNs is a critical component of the cascade of events eventually leading to the formation of memory cells, and the use of adjuvants adds one more layer of complexity that needs to be considered during vaccine design. In their excellent review, Irvine et al. [167] point out that cell-mediated antigen/adjuvant transport to the draining LNs (dLNs) is a relatively inefficient process compared with direct lymphatic transport. Furthermore, adjuvants that fail to accumulate in the dLNs and instead reach the blood (wrong location) pose the additional risk of inducing systemic toxicity. All in all, it becomes apparent that the biodistribution profiles of the antigen and adjuvant should be evaluated together.
High antigen levels in the absence of inflammatory cues within the LNs, or vice versa (wrong timing), will result in a suboptimal immune response [167].

Predicting antigen immunogenicity

The introduction of IS/ID modelling, while still in its early stages, aims to translate PK/PD modelling from drug development to vaccine design. Such studies remain scarce in the vaccine literature. Although these models are not strongly mechanistic, they provide concrete insights into the predictive capabilities of PK/PD methods in vaccine design. Indeed, IS/ID modelling has already been used to facilitate animal model decision-making and to inform first-in-human dose selection. Rhodes et al. [169] use data from macaques to predict IFN-γ responses in humans following tuberculosis (TB) bacillus Calmette-Guérin (BCG) vaccination. Two macaque species, rhesus and cynomolgus, are the primary non-human primate models for TB. However, it has been shown that under the same experimental conditions, outcomes can be species-dependent, and even within the same species, the macaque's country of origin might affect the level of protection against infection and the response post-vaccination. The model adopts a compartmental structure, and nonlinear mixed-effects modelling (NLMEM) is used to describe the IFN-γ dynamics of two CD4+ T cell populations: transitional effector memory (TEM, short-lived) and resting “central” memory (CM, long-lived) cells. A proportion of TEM cells, whose rate of post-vaccination recruitment is time-dependent, undergo apoptosis, and the remainder transition to CM cells. Notably, this is a highly simplified model with 5 parameters in total, 3 of which are fitted to the experimental data. The model, fitted separately to macaque and human ex vivo IFN-γ data, describes the empirical data well. It is then used, among other purposes, to: 1) identify covariates that explain the within-population variation and 2) test which fitted macaque models best predict human IFN-γ responses.
Macaques were found to be best stratified by colony and humans by baseline BCG status, and these two stratifications were significantly associated with respect to the BCG-induced IFN-γ response. Indonesian cynomolgus macaques and Indian rhesus macaques best predicted the immune response of baseline BCG-naive humans, while Mauritian cynomolgus macaques best predicted that of baseline BCG-vaccinated humans. In a subsequent publication, Rhodes et al. [169] use the same IS/ID modelling framework to inform TB vaccine dose decision-making in humans. The model is calibrated on IFN-γ response data for the TB vaccine H56 adjuvanted with IC31® (H56 + IC31) in mice and humans, and on H1 + IC31 data in humans only. The mouse data were stratified by dose group, and the TEM to CM cell transition rate (βTEM) was the single parameter selected to differ among them. Subsequently, the βTEM-dose curve was estimated for mice, assuming a peaked curve profile based on prior studies, and was used to extrapolate to H56 + IC31 doses ranging from 0.01 to 50 µg. In humans, the single available experimental point for 50 µg H56/H1 + IC31 was used, together with an allometric scaling factor, to predict human βTEM values for H56 + IC31 doses in the 0.1–500 µg range, using the mouse-predicted curve as a starting point. This approach enabled the authors to predict that doses in the range of 0.8–8 µg may be as immunogenic in humans as larger doses, or more so. This prediction was later independently supported by a Phase 1/2a H56 + IC31 dose-ranging clinical trial [146]. To put the importance of this into perspective, there is an ongoing scientific effort to find suitable animal models for COVID-19 vaccine development [170]. The implications of using similar approaches in vaccine development range from accelerating the process to enabling the identification of animal models that are better suited to special human populations (such as the elderly or disease groups).
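The two-population structure described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the published model of Rhodes et al. [169]: the exponentially decaying recruitment function, the use of TEM + CM cell numbers as a stand-in for the IFN-γ readout, and all parameter values are assumptions. It does, however, mirror the idea of letting a single parameter (here `beta_tem`, the TEM to CM transition rate) differ between dose groups.

```python
import math

# Hypothetical two-population sketch: TEM cells are recruited after
# vaccination, some undergo apoptosis and the rest become CM cells.
# Recruitment function and parameter values are assumptions, not the
# published parameterisation.


def tem_cm_rhs(t, y, s0, k, mu, beta_tem, delta_cm):
    tem, cm = y
    recruitment = s0 * math.exp(-k * t)           # time-dependent recruitment
    d_tem = recruitment - (mu + beta_tem) * tem    # apoptosis + transition out
    d_cm = beta_tem * tem - delta_cm * cm          # slow loss of CM cells
    return (d_tem, d_cm)


def integrate(f, y0, t_end, dt, params):
    """Fixed-step RK4 for a non-autonomous system dy/dt = f(t, y, *params)."""
    t, y = 0.0, tuple(y0)
    out = [(t, y)]
    for _ in range(int(round(t_end / dt))):
        k1 = f(t, y, *params)
        k2 = f(t + dt / 2, tuple(a + dt / 2 * b for a, b in zip(y, k1)), *params)
        k3 = f(t + dt / 2, tuple(a + dt / 2 * b for a, b in zip(y, k2)), *params)
        k4 = f(t + dt, tuple(a + dt * b for a, b in zip(y, k3)), *params)
        y = tuple(a + dt / 6 * (p + 2 * q + 2 * r + s)
                  for a, p, q, r, s in zip(y, k1, k2, k3, k4))
        t += dt
        out.append((t, y))
    return out


def ifn_gamma_proxy(beta_tem, t_end=56.0, s0=100.0, k=0.3, mu=0.2, delta_cm=0.01):
    """Total responding cells (TEM + CM) at t_end, used as a stand-in readout."""
    tem, cm = integrate(tem_cm_rhs, (0.0, 0.0), t_end, 0.05,
                        (s0, k, mu, beta_tem, delta_cm))[-1][1]
    return tem + cm


# Hypothetical dose groups that differ only in beta_tem.
for beta_tem in (0.01, 0.05, 0.2):
    print(beta_tem, round(ifn_gamma_proxy(beta_tem), 1))
```

With apoptosis switched on, a larger `beta_tem` diverts more TEM cells into the long-lived CM pool before they die, so the proxy response at day 56 increases with `beta_tem`; a dose-response curve could then be built by mapping each dose onto a `beta_tem` value.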

Predicting adjuvant immunogenicity

Modern vaccine design aims to maximise efficacy while minimising potential serious adverse effects. To this end, newer approaches have shifted away from early live-attenuated and inactivated whole-pathogen vaccines towards platforms with improved safety profiles, such as recombinant viral vectored, nucleic acid-based, and protein subunit vaccines. Adjuvants play a multifaceted role in vaccine development; in addition to boosting the elicited immune response, an appropriate adjuvant can also broaden the antibody response and improve its magnitude and functionality, through a greater number of functional antibodies, antibodies with higher affinities, or both. Appropriate adjuvant selection can also steer the response towards desirable T cell phenotypes. For example, Dong et al. [33] propose that Th1-biased adjuvants may be preferable to those inducing Th2-type immune responses, since the latter have been connected with increased lung immunopathology in SARS-CoV-2 vaccines. As discussed, the PK profiles of adjuvants can have a profound impact on the immune response generated by vaccines that rely on them for inflammatory cues. Thus, in silico PK approaches could be used to assess adjuvant biodistribution profiles and, in turn, allow informed immunogenicity predictions to be made. PK models tend to adopt a compartmental representation, but the compartments involved often lack physiological meaning. Physiologically-based pharmacokinetic (PBPK) modelling differs from other PK modelling approaches in that each compartment corresponds to an organ or tissue, though lumped compartments are sometimes used for dimensionality reduction [171]. Because they are inherently mechanistic, such models have many advantages. Among their typical applications, PBPK models are often used to translate existing understanding to novel settings, e.g., different species or special populations.
Other important applications include, but are not limited to, informing dosing regimens and generating and testing hypotheses about the physiological processes that drive observed drug behaviour [172]. Viewed from a different angle, most of these applications are essentially trying to answer the same question: how may intrinsic factors (age, gender, race, weight, height, pregnancy, and organ dysfunction) influence exposure (PK) and/or response (PD) to a specific drug [173]? Achieving improved vaccine efficacy has significant advantages that become ever more relevant in a pandemic. Reducing the amount of antigen needed while maintaining the target antibody response would be expected to increase manufacturing capacity. Similarly, shortening the vaccination regimen, as in GlaxoSmithKline’s (GSK’s) Fendrix™ hepatitis B vaccine, where addition of the AS04® adjuvant enabled a reduction from a three-dose to a two-dose regimen, could ease the logistical burden of distributing doses on a massive scale [174]. Despite its popularity in drug development, PBPK modelling has rarely been used to study vaccine adjuvants. Tegenge and Mitkus [175] developed a whole-body PBPK (WB-PBPK) model for squalene-containing adjuvants in human vaccines. In their initial publication, the model is used to make predictions for the intramuscularly (IM) injected commercial MF59® adjuvant, a squalene-in-water (SQ/W) emulsion. In a subsequent study, the same generic WB-PBPK model is extended and applied to AS03®, an SQ/W adjuvant containing α-tocopherol. Both the original and the extended model adopt the typical PBPK structure and assumptions [171]: a collection of tissue-representing compartments connected by a circulating blood system.
Each tissue compartment is described by a mass-balance ODE under the assumptions of perfusion rate-limited, well-stirred conditions and is defined by its tissue volume and blood flow rate. These physiological parameters are specific only to the species of interest and are widely available in the literature. Because draining and distal LNs are included, the compartments are also described in terms of lymph flow rates, which are set to a fraction of the corresponding tissue blood flow. Niederalt et al. [176], making a similar assumption for lymphatic flow to Jones et al. [171], estimated this fraction by fitting to experimental concentration–time profiles. Similarly, in their original publication, Tegenge and Mitkus [175] assume a priori almost exclusive lymphatic transport of squalene, in part because of its highly lipophilic nature. In a later publication [177], an optimal fractional lymphatic transport value is estimated for a sheep PBPK model by fitting to available experimental data. This value is then used to inform a human PBPK model that is otherwise built on human-specific physiological parameters. This, while not ideal, is a testament to the translational power of this type of modelling: the ability to extrapolate across species allows predictions to be made even when data availability is limited. Only passive transport, driven by blood and lymphatic flow, is considered. To describe it, tissue-plasma partition coefficients for squalene were estimated using mechanistic tissue composition-based equations for highly lipophilic compounds [175]. These methods, belonging to the broader family of in vitro-in vivo extrapolation (IVIVE) techniques, estimate tissue distribution from the physicochemical and in vitro binding characteristics of the compound [171].
Inter-species extrapolation is therefore significantly simplified, since tissue-plasma ratios depend on differences in tissue composition among species, which are physiological parameters generally available in the literature [178]. Additionally, Tegenge and Mitkus [175] modelled emulsion cracking in the deltoid muscle and dLNs with first-order kinetics fitted to in vivo data from squirrel monkeys. Squalene metabolism in adipose tissue and liver, as well as faecal excretion, were also described by first-order kinetics fitted to in vivo data from rats and humans. An important point should be made here: PBPK modelling, despite introducing complexity, maintains a clear separation between physiological and substance-specific parameters. The former tend to be easily accessible from various sources in the literature, while for the latter, IVIVE techniques exist that reduce the dependence on in vivo data and thus allow easier extrapolation. Furthermore, when enough experimental PK data exist, more advanced estimation methods can be applied, even for high-dimensional parameter spaces [179]. Although not often used, PBPK models like those presented above could be applied to different scenarios to improve adjuvant selection with minimal effort. This is because separating the physiological parameters (which remain unchanged between models) from the substance-specific ones simplifies model parameterisation, and because IVIVE techniques, which are increasingly available, allow in vitro data, which are easier to generate, to be used. All in all, the few PBPK models applied in vaccine design to date concern only adjuvants. However, these models adopt a generic structure, in that they can be applied to different species and compounds with minimal changes or be extended to cover novel scenarios in a relatively straightforward manner.
As such, while PBPK modelling has traditionally been used in drug development, a compelling case can be made that it could be applied to vaccines with minimal changes and that vaccine design, in general, could benefit just as much from the advantages this technique offers.
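The mass-balance structure described above can be illustrated with a deliberately small PBPK sketch in Python. It assumes four compartments (an intramuscular injection site, a draining LN, blood, and a lumped rest-of-body tissue with first-order metabolism), perfusion rate-limited and well-stirred tissues, and lymph flow from the injection site set to a fixed fraction (`f_lymph`) of its blood flow, in the spirit of, but not reproducing, the squalene models of Tegenge and Mitkus [175], [177]. All parameter names and values are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PBPKParams:
    """Hypothetical parameters (volumes in L, flows in L/h, rates in 1/h)."""
    v_inj: float = 0.1      # injection-site muscle volume
    q_inj: float = 0.05     # injection-site blood flow
    kp_inj: float = 5.0     # injection-site tissue:plasma partition coefficient
    f_lymph: float = 0.2    # lymph flow as a fraction of tissue blood flow
    v_ln: float = 0.01      # draining LN volume (LN partition coefficient set to 1)
    v_blood: float = 5.0
    v_rest: float = 60.0    # lumped rest-of-body compartment
    q_rest: float = 300.0
    kp_rest: float = 10.0
    k_met: float = 0.05     # first-order metabolism in the lumped compartment


def pbpk_rhs(y, p):
    """Mass-balance ODEs for amounts in injection site, dLN, blood, rest of body."""
    a_inj, a_ln, a_blood, a_rest = y
    c_blood = a_blood / p.v_blood
    c_ln = a_ln / p.v_ln
    out_inj = a_inj / p.v_inj / p.kp_inj     # well-stirred outflow concentration
    out_rest = a_rest / p.v_rest / p.kp_rest
    lymph = p.f_lymph * p.q_inj              # lymph flow leaving the injection site
    d_inj = p.q_inj * (c_blood - out_inj)    # arterial in; venous + lymph out
    d_ln = lymph * (out_inj - c_ln)          # afferent lymph in, efferent lymph out
    d_rest = p.q_rest * (c_blood - out_rest) - p.k_met * a_rest
    d_blood = ((p.q_inj - lymph) * out_inj + lymph * c_ln
               + p.q_rest * out_rest - (p.q_inj + p.q_rest) * c_blood)
    return (d_inj, d_ln, d_blood, d_rest)


def simulate_im_dose(dose, p, t_end=48.0, dt=0.01):
    """Fixed-step RK4 after an intramuscular bolus; returns states at each step."""
    y = (dose, 0.0, 0.0, 0.0)
    traj = [y]
    for _ in range(int(round(t_end / dt))):
        k1 = pbpk_rhs(y, p)
        k2 = pbpk_rhs(tuple(a + dt / 2 * b for a, b in zip(y, k1)), p)
        k3 = pbpk_rhs(tuple(a + dt / 2 * b for a, b in zip(y, k2)), p)
        k4 = pbpk_rhs(tuple(a + dt * b for a, b in zip(y, k3)), p)
        y = tuple(a + dt / 6 * (w + 2 * x + 2 * u + v)
                  for a, w, x, u, v in zip(y, k1, k2, k3, k4))
        traj.append(y)
    return traj


base = PBPKParams()
traj = simulate_im_dose(10.0, base)
print("peak dLN amount:", max(state[1] for state in traj))
print("amount remaining at 48 h:", sum(traj[-1]))
```

Because every transfer term appears once as a loss and once as a gain, the total amount is conserved when `k_met` is set to zero, which is a useful sanity check for any hand-written PBPK model. Swapping `PBPKParams` for another species' volumes and flows, while keeping the substance-specific values, mirrors the separation of parameters discussed above.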

Predicting antigen ADME properties

The adjuvant-focused studies of Tegenge and Mitkus [175], [177] incorporate draining and distal LN compartments, assume tissue-level lymph flow to be a fixed fraction of the corresponding tissue blood flow rate, and assume local preferential lymphatic transfer. To date, several PBPK models include a more detailed representation of the lymphatic system [176], [180], [181], [182], [183]; such models are usually built to predict the biodistribution of monoclonal antibodies (mAbs) in animal models or humans. These models typically share a common framework, first introduced by Baxter et al. [184], who incorporated the two-pore formalism to describe transcapillary exchange of mAbs. Macromolecular compounds, such as proteins and peptides, convect and diffuse across the barrier between plasma and interstitial space by passing through large and small pores. Fluid recirculation also occurs, with small pores acting as filters that trap large molecules in the interstitial space, while excess fluid flux into the latter is taken up by the lymphatic system [176], [181]. Niederalt et al. [176] give a detailed overview of the model; notably, the hydrodynamic radius of the compound is the only drug-dependent parameter needed. The two-pore model of extravasation can therefore describe the passive transport of both antigenic proteins or peptides and endogenous IgG. Moreover, in contrast to the simpler approach of Tegenge and Mitkus [175], where each compartment represents a single tissue, almost every model here splits tissues into several sub-compartments [176], [181], [182], [185]. Besides the two already mentioned, plasma and interstitial space, other sub-compartments can include blood cells (vascular space), cellular space, and endosomal space [176]. This detailed tissue representation removes the need for a preferential lymphatic transfer assumption, as lymph flow is instead incorporated mechanistically, while also enabling better localisation of active processes and reactions.
Importantly, the additional parameterisation needed is physiologically focused (e.g., tissue-plasma partition coefficients), and therefore compound-independent, and can also be handled by IVIVE estimation techniques. A notable example of a localised active process is neonatal Fc receptor (FcRn) binding [176], [181], [185]. The endosomal space represents the region within the vascular endothelial cells where catabolic clearance of IgG and albumin fusion proteins takes place. These proteins are taken up from plasma and interstitial space through pinocytosis; in the endosomes, they bind reversibly to FcRn with high affinity because of the acidic environment. The protein-FcRn complex is recycled back to plasma and interstitial space, where it dissociates because FcRn binding affinity is low at neutral pH. Any unbound protein in the endosomal space is degraded. The parameters involved in FcRn binding are species-specific [176], [185]. Models that incorporate tissue-specific FcRn expression, by assuming that FcRn concentration is proportional to its tissue-specific mRNA expression, also exist in the literature [186]. This sub-model could also be of great interest for “albumin hitchhiking” [187], [188], where antigens are modified with a lipophilic albumin-binding domain to achieve greater accumulation in LNs.
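The salvage mechanism described above can be summarised, under strong simplifying assumptions, by an equilibrium calculation: the fraction of internalised protein bound to FcRn escapes degradation, and the remainder is catabolised. The Python sketch below assumes that FcRn is in excess over its ligand (so binding follows a simple hyperbola), that all bound protein is recycled and released at neutral pH, and that tissue FcRn levels scale in proportion to mRNA expression, as in the models cited in [186]. Function names and numbers are hypothetical; the cited PBPK models instead describe binding and recycling with explicit rate equations.

```python
import math

# Equilibrium sketch of FcRn-mediated salvage (assumptions: FcRn in excess,
# all bound protein recycled and released, all unbound protein degraded).


def fraction_bound(fcrn_um, kd_um):
    """Fraction of endosomal protein bound to FcRn at acidic pH."""
    return fcrn_um / (kd_um + fcrn_um)


def degradation_clearance(cl_uptake, fcrn_um, kd_um):
    """Part of the pinocytic uptake clearance that ends in degradation (L/h)."""
    return cl_uptake * (1.0 - fraction_bound(fcrn_um, kd_um))


def tissue_fcrn(fcrn_ref_um, mrna_tissue, mrna_ref):
    """FcRn level scaled in proportion to tissue mRNA expression (assumption)."""
    return fcrn_ref_um * mrna_tissue / mrna_ref


def half_life_h(volume_l, clearance_l_per_h):
    """Elimination half-life for a one-compartment approximation."""
    return math.log(2) * volume_l / clearance_l_per_h


# Illustrative numbers only: compare a tight and a weak FcRn binder.
for kd in (0.05, 5.0):
    cl = degradation_clearance(cl_uptake=0.1, fcrn_um=1.0, kd_um=kd)
    print(f"Kd={kd} uM -> CL={cl:.4f} L/h, t1/2={half_life_h(3.0, cl):.1f} h")
```

Even this crude version reproduces the qualitative behaviour the PBPK sub-models capture: tighter endosomal binding means less degradation and a longer half-life, and tissues with higher FcRn expression salvage more protein.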

Incorporating epitope binding affinities

The components discussed so far account for passive transport of both antigen (macromolecules) and adjuvant (treated as small molecules), including lymphatic transfer to LNs. They do so by adopting a fully mechanistic representation while requiring a minimal number of compound-specific parameters. For a complete integrated approach, however, there is still a need for a component describing immune response-mediated antigen/adjuvant clearance and the PD response. A straightforward implementation would be to incorporate pre-existing immune response models, such as those presented earlier. Quantitative systems pharmacology (QSP) techniques [189], [190], [191], [192] are of particular interest here. Jafarnejad et al. [192] use, among others, the formalism given by Chen et al. [190]; here, we focus on the latter, as it presents a generalised framework, including applications [190], that is better aligned with the objectives of this review. This is a multiscale ODE model that, at the whole-body level, accounts for the in vivo disposition of the antigen. For this, an empirical PK model is used that could, in principle, be replaced by the more detailed PBPK approach discussed above. Furthermore, mechanistic details are incorporated at both the subcellular and cellular level. The model considers the following immune cells: 1) DCs, chosen to represent all APCs, 2) CD4+ T helper cells, and 3) B cells, and includes activation and differentiation processes for all three. DC activation is modelled as driven purely by lipopolysaccharide (LPS); after maturation, at the subcellular level, DCs take up antigenic proteins, which are degraded into T-epitope peptides within the endosomes and then presented in MHC II complexes on their surface.
This highly mechanistic description of antigen presentation allows the number and MHC II binding affinities of T-epitopes, which can be obtained in vitro or predicted in silico as already discussed in this review, to be integrated. Indeed, the model is not only built specifically to allow this, but also permits incorporation of subject-specific MHC II allele genotypes [189], [190]. At the cellular level, naïve T cells activated by DC-presented antigen proliferate and differentiate into either memory T cells or functional T helper cells. The former can be directly activated by mature DCs, while the latter facilitate downstream B cell activation. B cells then proliferate and differentiate into antibody-secreting short-lived and long-lived plasma cells, or memory B cells.
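One way to see how T-epitope affinities and MHC II genotypes can enter such a model is an equilibrium competitive-binding calculation, sketched below in Python. This is a simplification, not the kinetic formulation of Chen et al. [190]: it assumes peptides are in excess over MHC II molecules, that a predicted IC50 can stand in for the dissociation constant, and that each allele copy contributes an equal, fixed number of MHC II molecules. Epitope names, concentrations and affinities are hypothetical.

```python
# Equilibrium sketch of competitive peptide loading onto MHC II.
# Assumptions: peptide in excess, IC50 used as a proxy for Kd, equal MHC II
# numbers per allele copy. All names and values are hypothetical.


def occupancy(peptide_conc, kd):
    """Fraction of MHC II molecules loaded with each peptide (competitive binding)."""
    weights = {name: peptide_conc[name] / kd[name] for name in peptide_conc}
    denom = 1.0 + sum(weights.values())
    return {name: w / denom for name, w in weights.items()}


def presented_complexes(peptide_conc, genotype_kds, mhc_per_allele):
    """Sum epitope-MHC II complexes over the affinity tables of both allele copies."""
    total = {name: 0.0 for name in peptide_conc}
    for kd in genotype_kds:  # one affinity table per allele copy
        for name, frac in occupancy(peptide_conc, kd).items():
            total[name] += frac * mhc_per_allele
    return total


# Hypothetical epitopes, concentrations (nM) and predicted IC50s (nM).
conc = {"ep1": 50.0, "ep2": 50.0, "self": 200.0}
allele_a = {"ep1": 20.0, "ep2": 500.0, "self": 1000.0}
allele_b = {"ep1": 800.0, "ep2": 30.0, "self": 1000.0}
print(presented_complexes(conc, [allele_a, allele_b], mhc_per_allele=1e5))
```

The same epitope set gives different presentation profiles for different genotypes, which is the mechanism by which subject-specific allele information can translate into subject-specific immunogenicity predictions.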

Integrated PBPK models of vaccines

So far, we have explored techniques for antigen PK/PD (the IS/ID modelling approach) and adjuvant PBPK modelling, focusing on in silico studies that could be translated directly to COVID-19 vaccine design with minimal to no changes. As already discussed, such models might have the potential to accelerate both preclinical and clinical studies. However, no integrated PBPK/PD modelling framework built around vaccine design exists to date; such an approach would need to comprise at least: 1) a typical PBPK structure, expanded to include lymphoid compartments and transport, 2) passive compound transport and clearance processes, describing both adjuvant and antigen biodistribution, 3) active immune response-related transport and clearance of adjuvant and antigen compounds, and 4) an immune response PD component capable of predicting concentration–time profiles of markers relevant to preclinical and clinical research.
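To make the four-part outline above more tangible, the following Python toy assembles greatly simplified versions of each component into a single ODE system: (1) injection-site and draining LN compartments, (2) first-order passive lymphatic transfer and LN clearance of antigen and adjuvant, (3) adjuvant-dependent active uptake of antigen in the LN, and (4) a PD response driven by that uptake. It is entirely hypothetical (every rate constant, the saturable adjuvant signal and the response equation are assumptions) and is not a proposal for the framework itself, but it reproduces the qualitative point made earlier that antigen and adjuvant need to reach the LN together.

```python
# Hypothetical toy combining four components in one right-hand side:
# (1) injection-site and LN compartments, (2) passive lymphatic transfer and
# LN clearance, (3) adjuvant-dependent active antigen uptake in the LN and
# (4) a PD response r. Structure and parameter values are assumptions.


def vaccine_rhs(y, k):
    ag_site, ag_ln, adj_site, adj_ln, r = y
    signal = adj_ln / (k["ec50"] + adj_ln)       # saturable inflammatory cue in the LN
    uptake = k["k_up"] * ag_ln * signal           # (3) APC-mediated antigen uptake
    return (
        -k["k_ag"] * ag_site,                                 # (2) antigen leaves site
        k["k_ag"] * ag_site - k["k_cl"] * ag_ln - uptake,     # (2) and (3) in the LN
        -k["k_adj"] * adj_site,                               # (2) adjuvant leaves site
        k["k_adj"] * adj_site - k["k_cl_adj"] * adj_ln,       # (2) adjuvant in the LN
        k["k_act"] * uptake - k["k_dec"] * r,                 # (4) PD response
    )


def peak_response(ag_dose, adj_dose, k, t_end=120.0, dt=0.02):
    """Integrate with fixed-step RK4 and return the peak of the response r."""
    y = (ag_dose, 0.0, adj_dose, 0.0, 0.0)
    peak = 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = vaccine_rhs(y, k)
        k2 = vaccine_rhs(tuple(a + dt / 2 * b for a, b in zip(y, k1)), k)
        k3 = vaccine_rhs(tuple(a + dt / 2 * b for a, b in zip(y, k2)), k)
        k4 = vaccine_rhs(tuple(a + dt * b for a, b in zip(y, k3)), k)
        y = tuple(a + dt / 6 * (p + 2 * q + 2 * s + w)
                  for a, p, q, s, w in zip(y, k1, k2, k3, k4))
        peak = max(peak, y[4])
    return peak


rates = {"k_ag": 0.5, "k_cl": 0.2, "k_adj": 0.5, "k_cl_adj": 0.2,
         "ec50": 0.5, "k_up": 1.0, "k_act": 1.0, "k_dec": 0.05}
print(peak_response(1.0, 1.0, rates))                      # co-localised arrival
print(peak_response(1.0, 1.0, {**rates, "k_adj": 0.01}))   # adjuvant arrives late
```

Setting either dose to zero yields no response, and slowing adjuvant transfer to the LN (`k_adj`) lowers the peak response even though the doses are unchanged. In a real framework, the first two components would come from a PBPK model and the last two from a QSP immune response model.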

Perspectives and challenges

Incorporating PBPK modelling of adjuvants into the design workflow of COVID-19 vaccines can provide significant benefits. First, it can accelerate the development process, partly by speeding up the transition from preclinical studies to first-in-human (Phase I) clinical studies. In their adjuvant models, Tegenge and Mitkus [175], [177] extrapolate from limited ex/in vivo animal data in mice and sheep to humans. In a similar fashion, Rhodes et al. [145] use PK/PD modelling to predict the most immunogenic vaccine doses in humans from limited human data and more extensive, but still limited, mouse ex vivo data. Taken together, these methods provide a significantly more refined approach to translating biodistribution profiles from animal models to humans than the currently used allometric scaling approach. Future incorporation of a physiologically-based (PB) component could further strengthen the flexibility and predictive power of such approaches. Improving dose translation capabilities is particularly important, as small animal models and non-human primates are used extensively in vaccine development; with regard to COVID-19, the WHO has been reporting periodically on progress in identifying suitable animal models, which are partly intended to accelerate preclinical testing of vaccine candidates [170], [193]. The same PK/PD framework was also used to better inform animal model selection [169]. Second, given a validated human PBPK model, the framework can allow extrapolation across human populations, potentially providing useful insights before, or during, Phase II and III trials. For example, given appropriate in vivo PK data in healthy volunteers, such as from Phase I trials, the expected PK behaviour could be estimated in specific populations, e.g., the elderly, people with underlying medical conditions, or underrepresented groups, often women and BAME groups.
Indeed, global evidence shows a higher COVID-19 burden with older age, male sex, obesity, and comorbidities. People from Black, Asian, and minority ethnic (BAME) groups in the UK and Black, Hispanic, and Native American groups in the USA are also at increased risk of COVID-19 complications and death [194]. In this case, predictions from computational methods capable of accounting for such covariates should be of increased importance. Notably, these are the same intrinsic factors discussed earlier, around which PBPK modelling is built. Such information might then be used to design better-informed Phase II and III clinical trials and could potentially lead to in silico-augmented trials, in which physical and virtual patients are combined [148]. Most importantly, given that appropriate biodistribution of both antigen and adjuvant is essential for vaccine efficacy, incorporating PBPK modelling in early vaccine development will increase the likelihood of progressing a successful vaccine. In all vaccine development, accelerating preclinical and clinical testing and reducing clinical trial costs would be beneficial, and these benefits are amplified under pandemic conditions that require rapid vaccine design and evaluation. However, while adjuvant PK data exist in the literature, sparse as they may be, antigen-related ADME processes are rarely evaluated. Indeed, antigen is administered in small amounts and at widely spaced intervals, making kinetic studies hard to implement [16]. A summary of PK parameters reported in vaccine development studies is given by Gómez-Mantilla et al. [16]. The sparsity of such data can also be attributed, at least in part, to the absence of regulatory guidance mandating their collection for vaccine approval. This data sparsity, together with the high dimensionality of the parameter space (88 parameters in total in [190]), as is often the case in PBPK/PD and mechanistic modelling in general, is expected to present a major challenge.
However, many of the parameters needed are species-specific, and Chen et al. [190] developed their model for both mouse and human. The ability to predict across species should, in turn, allow easier adaptation to both preclinical and clinical settings. Also, by incorporating MHC II allele frequencies and T-epitope binding affinities, Chen et al. [190] were able to predict the immune response of 1,000 virtual subjects against adalimumab. If combined with subject-level PK information from PBPK models and in silico-predicted T-epitope affinities of novel vaccine antigens, this could allow early prediction of vaccine immunogenicity profiles in different human populations. Such tools could prove useful wherever rapid vaccine development is prioritised, as in the ongoing COVID-19 pandemic.
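The virtual-subject idea can be illustrated with a short Python sketch that samples MHC II genotypes for a virtual population and estimates the fraction of subjects carrying at least one allele predicted to bind a candidate epitope. It assumes Hardy-Weinberg proportions (two independent allele draws per subject) and uses made-up allele names, frequencies and binder sets; it is far simpler than the virtual-population approach of Chen et al. [190], which propagates genotypes through a full mechanistic model.

```python
import random

# Hypothetical sketch: sample MHC II genotypes under Hardy-Weinberg
# assumptions and estimate epitope "coverage" in a virtual population.
# Allele names, frequencies and the binder set are placeholders.


def sample_population(freqs, n, seed=0):
    """Draw two alleles per virtual subject according to allele frequencies."""
    rng = random.Random(seed)
    alleles, weights = zip(*freqs.items())
    return [tuple(rng.choices(alleles, weights=weights, k=2)) for _ in range(n)]


def coverage(population, binders):
    """Fraction of virtual subjects with at least one binding allele."""
    hits = sum(1 for genotype in population if any(a in binders for a in genotype))
    return hits / len(population)


freqs = {"A1": 0.15, "A2": 0.15, "A3": 0.30, "A4": 0.40}
binders = {"A1", "A2"}          # alleles predicted to bind the epitope
population = sample_population(freqs, 1000)
print(coverage(population, binders))
# Expected under Hardy-Weinberg: 1 - (1 - 0.30) ** 2 = 0.51
```

In a fuller workflow, each sampled genotype would also carry covariates such as age or body weight, which a PBPK model could use to produce subject-level exposure alongside the genotype-dependent presentation.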

Toxicity and allergenicity prediction

Toxicity prediction

Various adverse clinical effects are associated with the currently authorised SARS-CoV-2 vaccines. Taking those approved by the UK authority, i.e., Comirnaty [19], mRNA-1273 [20], and AZD1222 [21], the common side effects include pain, swelling, and redness at the site of administration and reactogenicity symptoms, including mild or moderate fever, myalgia, or headache, within 7 days post-injection. These adverse effects are potentially caused by inflammatory cell infiltration at the injection site or result from the host immune response evoked by the vaccine components. Most importantly, these side effects are tolerable and do not pose lethal risks to the majority of the global population. This may not be the case, however, for some other vaccine candidates at the initial phase of vaccine design: some promising candidates may present life-threatening toxicity that is not picked up during in vitro or in vivo studies. More severe adverse events, including trauma, macrophagic myofasciitis (MMF), enhanced disease, and autoimmune disease, have been reported previously in toxicology studies of other vaccines [195], [196], [197], [198]. These studies demonstrate the double-edged potential of vaccine-mediated immune stimulation. Although the precise mechanisms of these severe adverse events are not well characterised, existing computational tools can be used to eliminate vaccine candidates with a high chance of evoking toxicity.

Non-specific toxicity predictions for SARS-CoV-2 vaccine epitopes

Peptide toxicity predictions have been widely applied to facilitate the design of therapeutics that lack toxicity [199]. Most prediction tools use ML approaches to classify peptides based on their physicochemical, therapeutic, and toxicological properties [200], [201], [202], [203]. Among the available tools, ToxinPred predicts peptide toxicity using an SVM classifier trained on SwissProt and TrEMBL peptide sequence data [201]. In ToxinPred, various features derived from the training data can be selected for toxicity prediction, including AA composition, dipeptide composition, binary profile pattern, and motif-based profile. It has a high accuracy of 94% (using dipeptide composition), and its web interface can generate analogues with minimal mutations to design a non-toxic version of the peptide. Its precision and convenience make ToxinPred the most common tool for toxicity prediction in published SARS-CoV-2 vaccine design studies [22], [204], [205], [206]. Alternatives to ToxinPred are available. For example, ToxClassifier, an SVM-based classifier trained on the authors' Tri-Blast Enhanced data, reports an accuracy of 96% [202]. Unlike ToxinPred, ToxClassifier has no 30 AA length limit, and upon query submission it also returns similar toxic or non-toxic sequences. TOXIFY is a recurrent neural network-based classifier trained on UniProtKB protein sequences [203]. TOXIFY has a reported accuracy of 97.4% and was shown to outperform ToxClassifier in classifying non-toxic molecules. More recently, Jain and Kihara proposed NNTox, a neural network-based classifier trained on SwissProt data, for predicting 11 sub-classes of toxins; NNTox has an overall accuracy of 0.8 and can take DNA sequence as input. PredSTP is also available for predicting toxins among sequential tri-disulphide peptides from the Knottin database [207], [208].
As far as we are aware from existing publications, only ToxinPred [209] has been used for toxicity prediction in SARS-CoV-2 vaccine design, despite the reported improved performance of these alternative tools. This contrasts with the combinatorial methodology used for epitope prediction in SARS-CoV-2 vaccine design studies. Applying multiple toxicity tools would allow more rigorous screening, eliminating more candidates that pose a potential safety hazard at the earliest stage of vaccine development.
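A consensus screen of this kind can be sketched as a simple voting rule. In the sketch below, each predictor is a placeholder callable that returns True when a peptide is flagged as toxic; in practice it would wrap results exported from a tool such as ToxinPred or ToxClassifier. No real tool API is called. The function name, the `max_flags` parameter, and the toy stand-in rules are hypothetical and carry no biological meaning.

```python
from typing import Callable, Iterable

# Returns True when a peptide is flagged as toxic. Each predictor is assumed to wrap
# results from one external tool (e.g. parsed from its exported result table).
Predictor = Callable[[str], bool]


def consensus_filter(
    peptides: Iterable[str],
    predictors: dict[str, Predictor],
    max_flags: int = 0,
) -> list[str]:
    """Keep peptides flagged as toxic by at most `max_flags` of the predictors.

    max_flags=0 is the strictest rule: a positive call from any one tool removes a peptide.
    """
    kept = []
    for peptide in peptides:
        flags = sum(1 for predict in predictors.values() if predict(peptide))
        if flags <= max_flags:
            kept.append(peptide)
    return kept


# Toy stand-in rules with no biological meaning, used only to show the voting logic.
toy_predictors = {
    "tool_a": lambda p: p.count("C") >= 2,
    "tool_b": lambda p: len(p) > 12,
}
print(consensus_filter(["ACDEFG", "CCAAGG", "ACDEFGHIKLMNP"], toy_predictors))  # ['ACDEFG']
```

Raising `max_flags` relaxes the screen (for example, requiring agreement from at least two tools before discarding a peptide), which trades safety margin for the number of candidates retained.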

Haemolytic toxicity predictions for SARS-CoV-2 vaccine epitopes

The tools mentioned above predict general (non-specific) toxicity. They can be used to assist candidate filtering alongside conventional in vitro toxicity assays, including lactate dehydrogenase (LDH) leakage, 3-(4,5-dimethyl-2-thiazolyl)-2,5-diphenyl-2H-tetrazolium bromide (MTT), and adenosine triphosphate (ATP)-based assays. Other tools predict a peptide's haemolytic toxicity, i.e., its ability to damage red blood cells. In conventional studies, this activity is measured by the amount of haemoglobin released from membrane-compromised erythrocytes [210]. With advances in computational power, haemolytic activity can now be predicted using tools such as HLPpred-Fuse [211], HemoPI [212], and HemoPImod [213]. Of these, HemoPI is built on three datasets: HemoPI-1 (552 haemolytic and 552 non-haemolytic peptides), HemoPI-2 (peptides with low haemolytic potency, n = 462), and HemoPI-3 (peptides with high haemolytic potency, n = 1,623) [212]. Users can select SVM models with or without motif-based features, trained on all or specific datasets according to their needs. For example, to filter a set of SARS-CoV-2 epitope candidates by their haemolytic potency, the hybrid model trained on HemoPI-1 or HemoPI-2 should be chosen. HemoPI is available as a web server and has been used in SARS-CoV-2 vaccine design studies [214], [215]. Similarly, HLPpred-Fuse was trained on haemolytic peptides using six ML classifiers, including SVM and random forests, to select 54 probabilistic features (e.g., binary profile, conjoint triad, and grouped dipeptide and tripeptide composition) for the final prediction model [211]. Its output gives the predicted haemolytic activity of each peptide together with a probability value. More recently, Kumar et al. [213] developed HemoPImod, a classification model based on four classifiers trained with multiple peptide features, including 3D descriptors.
The reported accuracy of HemoPImod is 78.29%, and it requires the structure of the modified peptide to be predicted with PEPstrMOD [216]. In contrast to HemoPI, neither HLPpred-Fuse nor HemoPImod appears to have been applied in SARS-CoV-2 vaccine design studies, possibly because of their recent release.
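The conventional haemolysis readout described above is usually normalised against two controls: erythrocytes in buffer alone (defined as 0% lysis) and erythrocytes fully lysed with a detergent such as Triton X-100 (defined as 100% lysis). Assuming that standard normalisation, with supernatant absorbance as the measure of released haemoglobin, percentage haemolysis can be computed as in the sketch below. Experimental values of this kind are what haemolytic predictors are ultimately trained and validated against. The function name and plate readings are hypothetical.

```python
def percent_haemolysis(a_sample: float, a_negative: float, a_positive: float) -> float:
    """Percentage haemolysis from supernatant absorbance readings.

    a_negative: erythrocytes in buffer only (0% lysis)
    a_positive: erythrocytes fully lysed with detergent (100% lysis)
    """
    span = a_positive - a_negative
    if span <= 0:
        raise ValueError("positive control must read higher than negative control")
    return 100.0 * (a_sample - a_negative) / span


# Hypothetical plate readings for three peptide wells.
readings = {"peptide_1": 0.08, "peptide_2": 0.42, "peptide_3": 0.95}
for name, absorbance in readings.items():
    print(name, round(percent_haemolysis(absorbance, a_negative=0.05, a_positive=1.05), 1))
```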

Vaccine component toxicity predictions with computational network analysis

Another potentially toxic component of many vaccines is the adjuvant. As mentioned in the Immunogenicity section, adjuvants are included to evoke an effective immune response, mostly through innate or humoral immunity. Of the authorised SARS-CoV-2 vaccines, CoronaVac uses alum as an adjuvant [217]. Other common adjuvants include mineral oils, diethylaminoethyl (DEAE)-dextran, enterotoxins, CpG oligodeoxynucleotides (ODNs), and cytokines. The toxicity of an adjuvant on its own is well studied and documented during its design stage. However, no predictive platform has yet been developed that accounts for the possibility that vaccine components in combination evoke greater toxicity [218], [219], [220]. Here, we hypothesise that a computational network analysis-based method can predict the toxicity of multiple vaccine components based on their protein interactors. As shown by Cheng et al. [221], network analysis of drug targets and disease proteins can be used to predict efficacious combination therapies. In that study, the drug-target network was constructed from binding affinity data (inhibition constant/potency, dissociation constant, median effective concentration, or median inhibitory concentration) drawn from six databases (DrugBank [222], Therapeutic Target Database [223], PharmGKB database [224], ChEMBL [225], BindingDB [226], and IUPHAR/BPS Guide to PHARMACOLOGY [227]), and adverse drug interactions were collected from DrugBank [222] and TWOSIDES [228] to identify drugs associated with specific adverse events. To study the potential side effects of a SARS-CoV-2 vaccine candidate, vaccine-protein binding data could be acquired through mass spectrometry-based proteomics in human cells in vitro. The data could then be used to construct a protein interactome, using a method similar to that of Chen et al. [229].
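Network-medicine analyses of this kind commonly quantify how close two protein sets lie in an interactome with a separation score, s_AB = d_AB − (d_AA + d_BB)/2. Here d_AA is the mean shortest-path distance from each protein in A to its nearest other member of A, and d_AB is the mean distance from each protein in A or B to the nearest protein in the other set. Negative values indicate overlapping network neighbourhoods. The sketch below assumes this definition on an unweighted, undirected interactome; it is not presented as Cheng et al.'s exact pipeline. Applying it to the interactors of a vaccine antigen (set A) and an adjuvant (set B) is a hypothetical use, and the protein names are placeholders.

```python
from collections import deque

Graph = dict[str, set[str]]


def build_graph(edges: list[tuple[str, str]]) -> Graph:
    """Undirected, unweighted interactome from a list of protein-protein edges."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph: Graph, source: str) -> dict[str, int]:
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closest_distance(graph: Graph, node: str, targets: set[str], skip_self: bool) -> int:
    """Distance from node to the nearest member of targets (optionally ignoring node itself)."""
    dist = bfs_distances(graph, node)
    reachable = [dist[t] for t in targets if t in dist and not (skip_self and t == node)]
    if not reachable:
        raise ValueError(f"{node} cannot reach any protein in the target set")
    return min(reachable)


def mean_within(graph: Graph, group: set[str]) -> float:
    """d_AA: mean distance from each member to its nearest other member (needs >= 2 members)."""
    return sum(closest_distance(graph, n, group, True) for n in group) / len(group)


def mean_between(graph: Graph, a: set[str], b: set[str]) -> float:
    """d_AB: mean over distances from each member of A to B and each member of B to A."""
    dists = [closest_distance(graph, n, b, False) for n in a]
    dists += [closest_distance(graph, n, a, False) for n in b]
    return sum(dists) / len(dists)


def separation(graph: Graph, a: set[str], b: set[str]) -> float:
    """s_AB = d_AB - (d_AA + d_BB) / 2; negative values indicate overlapping neighbourhoods."""
    return mean_between(graph, a, b) - (mean_within(graph, a) + mean_within(graph, b)) / 2


# Placeholder interactome: a simple chain P1-P2-P3-P4-P5-P6.
chain = build_graph([("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5"), ("P5", "P6")])
print(separation(chain, {"P1", "P2"}, {"P5", "P6"}))  # 2.5 (well separated)
print(separation(chain, {"P2", "P3"}, {"P3", "P4"}))  # -0.5 (overlapping)
```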
Various network algorithms, including random walk with restart (RWR) and centrality measures, can be used to extract key interacting proteins from the vaccine-protein interactome. Potential side effects can then be inferred from these key proteins by consensus filtering against the SIDER database [230] or by querying a toxicity-related protein database such as DITOP [231]. Such an approach assumes that the mechanisms underlying vaccine-induced toxicity are similar to those of drug-induced toxicity. The method would capture cell-type-specific toxicity, and it further presumes that the intracellular vaccine-protein interactions have a systemic effect. Alternatively, the interactome could be measured by applying proteomics to animal models administered the vaccine. Nevertheless, network analysis would not replace in vitro or in vivo studies but would serve as an auxiliary approach for understanding the mechanism of any potential toxicity arising from injection of the vaccine candidate.
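Random walk with restart ranks proteins by their proximity to a set of seed proteins, here assumed to be the measured vaccine interactors. A minimal sketch, assuming an undirected interactome stored as a symmetric adjacency dictionary and an arbitrary restart probability of 0.3, iterates p ← (1 − r)·W·p + r·p₀ until convergence. W moves probability mass uniformly from each node to its neighbours, and p₀ spreads the restart mass evenly over the seeds. The top-ranked proteins would be the candidates passed on to SIDER or DITOP; that lookup step is not shown, and the function name is hypothetical.

```python
def random_walk_with_restart(
    graph: dict[str, set[str]],
    seeds: set[str],
    restart: float = 0.3,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> dict[str, float]:
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    Assumes `graph` is a symmetric adjacency dict: every neighbour is also a key.
    """
    if not seeds or not set(seeds).issubset(graph):
        raise ValueError("seeds must be a non-empty subset of the graph's nodes")
    nodes = list(graph)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}  # restart term r * p0
        for node in nodes:
            moving = (1.0 - restart) * p[node]
            neighbours = graph[node]
            if neighbours:
                for nb in neighbours:  # spread mass uniformly over neighbours
                    nxt[nb] += moving / len(neighbours)
            else:
                for n in nodes:  # isolated node: send its mass back to the seeds
                    nxt[n] += moving * p0[n]
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


# Placeholder interactome: a path A-B-C-D-E with protein A as the only seed.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
scores = random_walk_with_restart(path, {"A"})
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C', 'D', 'E']
```

Because total probability is conserved at every step, the scores can be read directly as relative proximity to the seed set. A higher restart probability keeps the ranking more local to the seeds.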

Allergenicity prediction

Allergenicity is the ability of a substance to induce hypersensitivity. There are four types of hypersensitivity. In type I hypersensitivity, increased production of IgE primes mast cells to degranulate, releasing histamine and cytokines that cause the allergy symptoms. Type I hypersensitivity underlies asthma, allergic rhinitis, food allergies, and anaphylactic reactions. Hence, IgE binding affinity is frequently used for allergenicity prediction. In type II hypersensitivity, the immunoglobulins produced are IgG and IgM, and subsequent activation of the complement system leads to cell damage or lysis. In type III hypersensitivity, increased production of IgM, IgG, and IgA drives polymorphonuclear leukocyte chemotaxis and induces tissue damage, whilst type IV hypersensitivity is associated with delayed, Th1-mediated reactions in which activated macrophages or cytotoxic T cells cause direct cellular damage.

IgE binding affinity predictions for SARS-CoV-2 vaccine epitopes

As with epitope prediction, several ML-based classifiers are available for predicting peptide binding to IgE. AlgPred, for example, combines an SVM classifier trained on AA and dipeptide composition, a motif-based method, a search against IgE epitope datasets, and a BLAST search against allergen-representative peptides [56]. The latest version, AlgPred2.0, is trained on a larger dataset and has an improved reported AUC of 0.98 [232]. AlgPred has been used to predict the allergenicity of SARS-CoV-2 vaccine epitopes [204], [233], [234], [235], [236]. Another commonly used prediction tool is AllerTOP, a k-nearest neighbours classifier trained on 2,395 allergens [237]. AllerTOP has been reported to be more accurate than AlgPred and has been applied by Dar et al. [238], Behmard et al. [131], Ayyagari et al. [112], Dai et al. [76], and Das et al. [66]. Other popular prediction tools include AllerHunter [239], PREAL [240], ProAp [241], and AllergenFP [237]. Benchmarking these predictors is beyond the scope of this review. However, consensus-based filtering with multiple tools, as employed by Sirohi et al. [204] and Behmard et al. [131], may reduce the chance of retaining allergenic epitope candidates. Similarly, IgPred, an SVM-based B cell epitope prediction tool for predicting antibody class-specific epitopes, can be used to eliminate candidates with high similarity to known IgE epitopes [201].
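A k-nearest neighbours classifier, the model family used by AllerTOP, labels a query by majority vote among the most similar training sequences. The sketch below is a simplified stand-in and not AllerTOP itself. AllerTOP describes peptides with its own physicochemical descriptors, whereas this example assumes plain AA composition and Euclidean distance. The function names are hypothetical, and the training sequences are toy strings, not real allergens.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> list[float]:
    """Fraction of each standard amino acid in the sequence."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def knn_allergen_score(query: str, training: list[tuple[str, bool]], k: int = 3) -> float:
    """Fraction of the k nearest training sequences labelled allergenic (True)."""
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the training-set size")
    q = composition(query)
    ranked = sorted((math.dist(q, composition(seq)), label) for seq, label in training)
    votes = Counter(label for _, label in ranked[:k])
    return votes[True] / k


def is_predicted_allergen(query: str, training: list[tuple[str, bool]], k: int = 3) -> bool:
    # With an odd k and two classes, the majority vote cannot tie.
    return knn_allergen_score(query, training, k) > 0.5


# Toy training set: labels are arbitrary and only illustrate the voting mechanics.
toy_training = [("CCCCA", True), ("CCCAA", True), ("CCCCC", True),
                ("KKKKA", False), ("KKKAA", False), ("KKKKK", False)]
print(is_predicted_allergen("CCCCK", toy_training))  # True
```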

Predicting vaccine-induced enhanced disease

Computational modelling of immunogenicity can also identify the individuals who would benefit least from a COVID-19 vaccine candidate. Recent reports have suggested that SARS-CoV-2 vaccines may carry a risk of vaccine-induced enhanced disease, a condition possibly caused by a T helper 17 (Th17)-type immune response or by antibody-mediated mechanisms [242], [243]. Such immune enhancement can amplify SARS-CoV-2 infection and increase the severity of COVID-19 [243]. Although the condition is more commonly studied using in vitro and in vivo models [197], [243], mathematical models may be applied to elucidate the molecular and cellular conditions related to enhanced disease. For example, ODEs have been applied to predict the likelihood of antibody-mediated enhancement in influenza [197]. Alternatively, Carbo et al.'s method for predicting the probability of a Th17-type response could be adapted. Through these methods, enhancement-associated host factors could be identified and serve as predictive biomarkers to help prevent vaccine-related disease exacerbation [244]. More importantly, enhancement-associated properties of the vaccine itself could be identified and used to refine the vaccine design.
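One way an ODE model can express antibody-mediated enhancement is shown in the sketch below. It uses the standard target-cell-limited model of viral infection (uninfected target cells T, infected cells I, free virus V) and multiplies the infection rate by a hypothetical antibody modulation factor e, where e > 1 represents enhancement and e < 1 neutralisation. This is a toy illustration, not the influenza model cited above. All parameter values are arbitrary and dimensionless, and forward Euler integration is used only for simplicity.

```python
def peak_viral_load(
    enhancement: float = 1.0,
    beta: float = 2.0,
    delta: float = 1.0,
    p: float = 5.0,
    c: float = 5.0,
    target0: float = 1.0,
    virus0: float = 0.01,
    days: float = 30.0,
    dt: float = 0.001,
) -> float:
    """Peak free-virus level in a target-cell-limited model with a modulated infection rate.

        dT/dt = -e*beta*T*V
        dI/dt =  e*beta*T*V - delta*I
        dV/dt =  p*I - c*V

    e (`enhancement`) is a hypothetical antibody effect: e > 1 enhancement, e < 1 neutralisation.
    Integrated with forward Euler; all values are arbitrary and dimensionless.
    """
    T, I, V = target0, 0.0, virus0
    peak = V
    for _ in range(int(days / dt)):
        infection = enhancement * beta * T * V
        T, I, V = (
            T - dt * infection,
            I + dt * (infection - delta * I),
            V + dt * (p * I - c * V),
        )
        peak = max(peak, V)
    return peak


for e in (0.0, 1.0, 2.0):
    print(f"e = {e}: peak viral load {peak_viral_load(e):.3f}")
```

In this toy setting, a larger e raises the peak viral load, and e = 0 leaves only the decay of the initial inoculum. Fitting e and the other rates to in vitro or animal data would be needed before the model could say anything about a real vaccine.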

Discussion

Computational approaches can aid vaccine development in several ways and can accelerate the process (Fig. 2). Whereas traditional vaccine development typically takes ~10 years, computational techniques could shorten this timeline and reduce the associated costs. There has been a concerted global effort to develop vaccines for COVID-19, which has recently seen three vaccines approved with many more in the pipeline; although computational approaches have been exploited in this effort, several techniques remain underutilised. In this review, we covered antigen prediction, epitope prediction, adjuvant selection, toxicity and allergenicity prediction, and the applicability of ADME modelling in vaccine design. Fig. 2 also includes other approaches that can accelerate the vaccine development process, including vaccine efficacy prediction during clinical trial phases and multi-omics analysis to further our understanding of the disease and the vaccine.

Fig. 2

Summary of the advantages of using computational approaches in vaccine development. Computational approaches can accelerate vaccine development at various stages. The top panel shows a traditional vaccine development process that requires at least 10 years of research and validation. In the centre, the diagram shows how different computational approaches (text in teal) speed up the vaccine development process. In the bottom panel, the timeline of the vaccine design stage is enlarged. The box summarises the prospects and challenges proposed in this review at each stage of the suggested computer-assisted vaccine design tools. Processes associated with reverse vaccinology are shown in filled gold boxes.

Computational techniques for vaccine design have been used to select antigens, predict B and T cell epitopes, evaluate these potential epitopes for conservation across viral mutations and coverage across the human population, and predict immunogenicity and toxicity. Other computational techniques that could aid COVID-19 vaccine design have so far been underutilised. Some research groups have studied the immune signatures of moderate and severe COVID-19 patient cohorts to identify features that distinguish these cohorts [35], [245], [246]. The activities of antigen-specific T cells, virus-neutralising antibodies, and antigen-presenting cells during SARS-CoV-2 infection can reveal the underlying protective mechanisms, and these parameters can therefore inform vaccine epitope immunogenicity [162]. Attempts to develop COVID-19 vaccines have tended to focus on preventing viral entry into cells. For this reason, the S protein has been the focus of antigen selection.
However, other proteins could be promising, including those that activate the adaptive immune system; our meta-analysis of four transcriptomics datasets suggests that nsp16 warrants further investigation as a potential SARS-CoV-2 antigen, based on its interactions with proteins involved in the host immune response. This may be especially relevant if mutations arising in the S protein render the currently approved vaccines ineffective; in that scenario, alternative antigens would be needed to deal with the new variants. Multiple groups have applied reverse vaccinology to predict SARS-CoV-2 epitopes from AA sequence and structure. An issue with these approaches is that most studies use the same tools, relying mostly on web server software to predict epitopes. This over-reliance on specific tools and pipelines leads to a lack of methodological diversity, which could mean that promising epitopes are missed. Although newer, more accurate models have been developed, their uptake has been limited. There is also an urgent need for systematic benchmarking of different epitope prediction tools used in combination. Similarly, many studies have used structural databases to predict epitopes with reduced risks of allergenicity and toxicity. However, there is currently no computational approach for predicting the combined toxicity of all vaccine components, which may be achievable through network analysis. PK studies are rarely carried out for vaccines, even though the importance of PK in vaccine safety and efficacy is well recognised [167]. This is especially relevant to subunit vaccines, currently the largest COVID-19 vaccine platform, where the biodistribution of the adjuvant and the antigen needs to be well coordinated. However, the modelling applied so far has examined either the antigen or the adjuvant alone. Recently, several studies [145], [179] have advocated the use of in silico PK modelling in vaccine design.
It has previously been shown that adjuvant PK profiles in humans can be extrapolated from animal models. There are also multiple immune response models of human immunogenicity. PBPK/PD modelling does not yet appear to have been used in vaccine development, including for COVID-19, but it has the potential to lower antigen doses through better-informed adjuvant selection, accelerate the transition from preclinical animal models to human subjects, and make predictions for special populations that remain underrepresented in clinical trials. These techniques are worth exploring as part of robust computational vaccine design strategies for COVID-19. Should SARS-CoV-2 variants emerge with resistance to the immunity induced by approved vaccines [247], our proposed in silico approaches could accelerate the development of a vaccine against the new variant lineage. Specifically, antigen prediction based on the cellular immune response could suggest alternative antigens for a new vaccine, and the proposed immunogenicity and ADME prediction models could indicate whether the appropriate components of the immune system would be activated to achieve effective immunisation. These strategies should also be ready for deployment against further zoonotic diseases as they emerge. Increasing contact between humans and animals, driven by the continued expansion of human society and the loss of animal habitats, is expected to result in similar viral infections emerging in the future. A system for fast and safe vaccine development is needed now more than ever.
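The sketch below is a much simpler stand-in for the animal-to-human extrapolation and PBPK modelling discussed above. It applies conventional allometric scaling (clearance scaled by body weight to the power 0.75, volume of distribution to the power 1.0) and a one-compartment model with first-order elimination after a bolus dose. It ignores absorption from the injection site and lymphatic transport, both of which matter for intramuscular vaccines. The function names and the mouse parameter values are hypothetical and illustrative only.

```python
import math


def allometric_scale(value_animal: float, bw_animal_kg: float, bw_human_kg: float,
                     exponent: float) -> float:
    """Simple allometry: Y_human = Y_animal * (BW_human / BW_animal) ** exponent."""
    return value_animal * (bw_human_kg / bw_animal_kg) ** exponent


def concentration(dose_mg: float, cl_l_per_h: float, v_l: float, t_h: float) -> float:
    """One-compartment bolus model: C(t) = (Dose / V) * exp(-(CL / V) * t), in mg/L."""
    return dose_mg / v_l * math.exp(-(cl_l_per_h / v_l) * t_h)


def half_life_h(cl_l_per_h: float, v_l: float) -> float:
    """Elimination half-life t1/2 = ln(2) * V / CL."""
    return math.log(2) * v_l / cl_l_per_h


# Hypothetical mouse parameters for an adjuvant (illustrative only).
MOUSE_BW, HUMAN_BW = 0.025, 70.0  # kg
CL_MOUSE, V_MOUSE = 0.002, 0.01   # L/h, L
cl_human = allometric_scale(CL_MOUSE, MOUSE_BW, HUMAN_BW, 0.75)  # clearance: exponent 0.75
v_human = allometric_scale(V_MOUSE, MOUSE_BW, HUMAN_BW, 1.0)     # volume: exponent 1.0
print(f"Mouse t1/2 {half_life_h(CL_MOUSE, V_MOUSE):.1f} h; "
      f"scaled human t1/2 {half_life_h(cl_human, v_human):.1f} h")
```

With these exponents, the predicted half-life grows with the 0.25 power of the body-weight ratio. A PBPK model would replace this single ratio with organ-level physiology and tissue distribution.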

259 in total

1. Prediction of MHC class I binding peptides using profile motifs.

Authors: Pedro A Reche; John-Paul Glutting; Ellis L Reinherz
Journal: Hum Immunol Date: 2002-09 Impact factor: 2.850

2. B Cells Are the Dominant Antigen-Presenting Cells that Activate Naive CD4⁺ T Cells upon Immunization with a Virus-Derived Nanoparticle Antigen.

Authors: Sheng Hong; Zhimin Zhang; Hongtao Liu; Meijie Tian; Xiping Zhu; Zhuqiang Zhang; Weihong Wang; Xuyu Zhou; Fuping Zhang; Qing Ge; Bing Zhu; Hong Tang; Zhaolin Hua; Baidong Hou
Journal: Immunity Date: 2018-10-02 Impact factor: 31.745

3. Humoral and circulating follicular helper T cell responses in recovered patients with COVID-19.

Authors: Jennifer A Juno; Hyon-Xhi Tan; Stephen J Kent; Adam K Wheatley; Wen Shi Lee; Arnold Reynaldi; Hannah G Kelly; Kathleen Wragg; Robyn Esterbauer; Helen E Kent; C Jane Batten; Francesca L Mordant; Nicholas A Gherardin; Phillip Pymm; Melanie H Dietrich; Nichollas E Scott; Wai-Hong Tham; Dale I Godfrey; Kanta Subbarao; Miles P Davenport
Journal: Nat Med Date: 2020-07-13 Impact factor: 53.440

4. Computational perspectives revealed prospective vaccine candidates from five structural proteins of novel SARS corona virus 2019 (SARS-CoV-2).

Authors: Rajesh Anand; Subham Biswal; Renu Bhatt; Bhupendra N Tiwary
Journal: PeerJ Date: 2020-09-29 Impact factor: 2.984

5. Macrophagic myofasciitis: an emerging entity. Groupe d'Etudes et Recherche sur les Maladies Musculaires Acquises et Dysimmunitaires (GERMMAD) de l'Association Française contre les Myopathies (AFM).

Authors: R K Gherardi; M Coquet; P Chérin; F J Authier; P Laforêt; L Bélec; D Figarella-Branger; J M Mussini; J F Pellissier; M Fardeau
Journal: Lancet Date: 1998-08-01 Impact factor: 79.321

6. MimoPro: a more efficient Web-based tool for epitope prediction using phage display libraries.

Authors: Wen Han Chen; Ping Ping Sun; Yang Lu; William W Guo; Yan Xin Huang; Zhi Qiang Ma
Journal: BMC Bioinformatics Date: 2011-05-25 Impact factor: 3.307

7. Pepitope: epitope mapping from affinity-selected peptides.

Authors: Itay Mayrose; Osnat Penn; Elana Erez; Nimrod D Rubinstein; Tomer Shlomi; Natalia Tarnovitski Freund; Erez M Bublil; Eytan Ruppin; Roded Sharan; Jonathan M Gershoni; Eric Martz; Tal Pupko
Journal: Bioinformatics Date: 2007-10-31 Impact factor: 6.937

8. Vaccine Adjuvants: from 1920 to 2015 and Beyond.

Authors: Alberta Di Pasquale; Scott Preiss; Fernanda Tavares Da Silva; Nathalie Garçon
Journal: Vaccines (Basel) Date: 2015-04-16

9. Basic concepts in physiologically based pharmacokinetic modeling in drug discovery and development.

Authors: Hm Jones; K Rowland-Yeo
Journal: CPT Pharmacometrics Syst Pharmacol Date: 2013-08-14

10. Reverse vaccinology approach to design a novel multi-epitope vaccine candidate against COVID-19: an in silico study.

Authors: Maryam Enayatkhani; Mehdi Hasaniazad; Sobhan Faezi; Hamed Gouklani; Parivash Davoodian; Nahid Ahmadi; Mohammad Ali Einakian; Afsaneh Karmostaji; Khadijeh Ahmadi
Journal: J Biomol Struct Dyn Date: 2020-05-02
