In silico T cell epitope identification for SARS-CoV-2: Progress and perspectives.

Muhammad Saqib Sohail¹, Syed Faraz Ahmed¹, Ahmed Abdul Quadeer², Matthew R McKay³.

Abstract

Growing evidence suggests that T cells may play a critical role in combating severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Hence, COVID-19 vaccines that can elicit a robust T cell response may be particularly important. The design, development and experimental evaluation of such vaccines is aided by an understanding of the landscape of T cell epitopes of SARS-CoV-2, which is largely unknown. Due to the challenges of identifying epitopes experimentally, many studies have proposed the use of in silico methods. Here, we present a review of the in silico methods that have been used for the prediction of SARS-CoV-2 T cell epitopes. These methods employ a diverse set of technical approaches, often rooted in machine learning. A performance comparison is provided based on the ability to identify a specific set of immunogenic epitopes that have been determined experimentally to be targeted by T cells in convalescent COVID-19 patients, shedding light on the relative performance merits of the different approaches adopted by the in silico studies. The review also puts forward perspectives for future research directions.

Keywords: Allergenicity; COVID-19; Computational prediction; Coronavirus; Immunogenicity; Immunoinformatics; Peptide-HLA binding; Reverse vaccinology; SARS-CoV; Toxicity

Year: 2021 PMID: 33465451 PMCID: PMC7832442 DOI: 10.1016/j.addr.2021.01.007

Source DB: PubMed Journal: Adv Drug Deliv Rev ISSN: 0169-409X Impact factor: 17.873

Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of the ongoing COVID-19 pandemic, has resulted in major loss of human life worldwide. At present, prior to the mass distribution of functioning vaccines, governments have mainly resorted to traditional public health measures to control the spread of COVID-19, such as social distancing, quarantine, mask mandates, and community containment. The quest for an effective COVID-19 vaccine has been pursued at an unprecedented pace, with two vaccines already being authorized for emergency use by the U.S. Food and Drug Administration (FDA) [1] (in less than a year), and more than 300 vaccine candidates currently under trial [2]. Vaccines can stimulate both cellular and humoral responses, mediated by T cells and antibodies respectively. Thus far, the majority of COVID-19 vaccine candidates (including the approved ones) are based on the spike (S) protein [2], which is the major target of neutralizing antibodies [[3], [4], [5]]. However, there is growing evidence to suggest that stimulating broad T cell responses via vaccines may also be important [6,7]. This is supported by SARS-CoV-2 immunological studies reporting robust T cell responses to be correlated with favourable disease outcome [[8], [9], [10]] and to be potentially long-lasting [8]. Emerging data also suggest that the coordinated response of neutralizing antibodies and T cells is protective against COVID-19 [2,11]. Systematic longitudinal SARS-CoV-2 immunological studies on larger cohorts will help to further corroborate these findings, but nevertheless, they support the notion that T cells may play a prominent role in conferring protection against COVID-19. There is a large diversity of SARS-CoV-2 vaccines currently under trial and these have been developed using both conventional and modern approaches [12].
Vaccines developed using conventional approaches, based on using a live-attenuated or inactivated virus, have been effective in preventing infectious diseases caused by viruses such as polio, measles, rubella, chickenpox, and many others. These approaches, however, have not produced successful vaccines against other lethal viruses such as human immunodeficiency virus (HIV), hepatitis C virus, and dengue virus. Modern vaccine technologies (e.g., peptide, subunit, DNA/RNA) provide promising alternatives. In fact, for the case of SARS-CoV-2, the majority of vaccines in the advanced phase II or III of clinical trials are based on modern vaccine approaches [2]. These approaches introduce specific parts of the virus or their genes into the body to produce a targeted immune response. As a natural infection pathway is not followed, they can suffer from weak immunogenicity, though immunogenicity may be boosted by augmenting with appropriately developed adjuvants [13,14]. As for other viruses, T cells are activated by recognizing SARS-CoV-2 peptides (short linear amino acid sequences) presented on the surface of infected cells via human leukocyte antigen (HLA) molecules (Fig. 1). The goal of a COVID-19 vaccine is to mimic this procedure by stimulating immune cells that specifically recognize SARS-CoV-2, thereby preparing the immune system to combat the virus upon natural infection. A common challenge faced by modern vaccine approaches is to identify specific SARS-CoV-2 peptides that are capable of inducing a robust and protective T cell immune response. This is an important task in the overall vaccine development pipeline as it provides the basic recommendations for all downstream efforts to be subsequently pursued in vaccine synthesis, laboratory testing, and clinical trials.
Experimentally identifying SARS-CoV-2 peptides that elicit T cell responses is difficult, due in part to the enormous number of possible choices to test, and to the high genetic variability of major histocompatibility complex (MHC) genes that code for HLA molecules.

Fig. 1

Schematic illustration of T cell responses against SARS-CoV-2 and T cell epitope prediction using in silico approaches. (A) Viral peptides, derived from SARS-CoV-2 proteins after multiple intra-cellular processing steps, are presented on the surface of infected cells and antigen presenting cells via HLA class I and class II molecules, respectively. Naïve T cells, specialized in distinguishing foreign-peptides from self-peptides via training in the thymus, scan these peptide-HLA complexes to determine if the peptides belong to a foreign microbe. Recognition of a foreign-peptide leads to activation, proliferation, and differentiation of naïve T cells into effector cells. There are two main types of effector T cells: CD8+ T cells (or cytotoxic T lymphocytes; CTLs) that get activated by viral peptides bound to HLA class I molecules and help in killing the SARS-CoV-2 infected cells (right panel), while CD4+ T cells (or helper T lymphocytes) get activated by peptides bound to HLA class II molecules and help in further enhancing SARS-CoV-2-specific CD8+ T cell and antibody responses (left panel). These adaptive immune cells, activated by peptide-HLA complexes, can collectively mount a potent immune response against SARS-CoV-2. (B) In silico approaches analyze SARS-CoV-2 protein sequences to predict a number of potential HLA-I and HLA-II epitopes that can be used to guide experiments to characterize T cell responses in COVID-19 patients and to inform SARS-CoV-2 vaccine design.

While each person has 12 unique types of HLA alleles, currently more than 27,000 known HLA alleles are listed in the immune polymorphism database [15], and these vary in their peptide binding specificities.
With the availability of a large amount of data related to peptide-HLA binding, numerous attempts to solve the problem of T cell epitope identification (i.e., predicting peptides capable of eliciting T cell response) have been proposed that leverage this data through in silico methods [[16], [17], [18], [19]]. For SARS-CoV-2, very soon after the first genetic sequences were made available in January 2020, in silico methods began to be employed to predict and recommend T cell epitopes as potential targets for a SARS-CoV-2 vaccine (Fig. 1). In addition to guiding vaccine development, many of these predictions have been helpful in informing experimental studies directed towards understanding immune responses naturally elicited in convalescent COVID-19 patients (Fig. 1). This review discusses the rationale and features of the in silico methods and tools that have been employed so far for SARS-CoV-2 T cell epitope prediction. As we describe, a diverse set of computational techniques have been employed, often exploiting machine learning approaches, and in some cases exploiting the expected cross-reactivity of epitopes between genetically similar viruses. These in silico methods and tools have often been developed independently and in many cases have been trained using datasets related to other viruses or other microbes, thereby making it difficult to understand the relative performance of the epitope predictions for SARS-CoV-2. To help shed light on these questions, this review presents a comparison of the predictions of 61 SARS-CoV-2 in silico studies, revealing commonalities and differences among the specific SARS-CoV-2 epitopes predicted by different methods. We also assess and compare the predictions when applied to emerging data from nine experimental studies that have identified SARS-CoV-2 T cell epitopes targeted in convalescent COVID-19 patients. 
Insights into the current state of SARS-CoV-2 T cell epitope prediction are also put forward, together with perspectives on future research directions and opportunities.

In silico methods used for SARS-CoV-2 T cell epitope prediction

We queried PubMed on 8 September 2020 using the search terms “T cell, covid-19, epitopes, computational, and in silico”, which produced a list of 40 publications. After excluding those that did not report SARS-CoV-2 epitopes, this list was reduced to 31 publications (entries 1 to 31 in Table 1). Using the same search terms in Google Scholar on 8 September 2020, we gathered an additional 34 publications, giving a total of 65 SARS-CoV-2 in silico epitope prediction studies (Table 1). These studies can be broadly grouped into two classes based on their rationale for epitope prediction: those that predict SARS-CoV-2 epitopes using SARS-CoV immunological data by exploiting the genetic similarity between the two viruses (Ahmed et al. [20], Lee et al. [21], Grifoni et al. [22], and Ranga et al. [23]), and those that apply peptide-HLA binding prediction methods (the remaining 61 studies). We review and discuss each of these approaches in the following.

Table 1

List of reviewed in silico SARS-CoV-2 T cell epitope prediction studies.

No.	Study labelᵃ	HLA-I epitope predictionᵇ	HLA-II epitope predictionᶜ	Immunogenicityᵈ	IFN-γ productionᵉ	Conservation	Allergenicityᶠ	Toxicityᵍ	Autoimmunity	Vaccine construct
1	Ahmed2020 [20]	Using SARS-CoV immunological data	Using SARS-CoV immunological data	-	-	Y	-	-	-	-
2	Grifoni2020ʰ [22]	Using SARS-CoV immunological data, NetMHCpan-4.0	Using SARS-CoV immunological data, Tepitool	-	-	-	-	-	-	-
3	Ranga2020 [23]	Using SARS-CoV immunological data, NetCTL-1.2	-	-	-	-	-	-	-	-
4	Lee2020ʰ [21]	Using SARS-CoV immunological data, NetMHCpan-4.0	Using SARS-CoV immunological data	iPred	-	-	-	-	-	-
5	Baruah2020 [112]	NetCTL-1.2, NetChop, CTLPred	-	-	IFNepitope	-	-	-	-	-
6	Crooke2020 [96]	NetCTL-1.2, NetMHCpan-4.0	NetMHCIIpan-3.2	Vaxijen-2.0	-	-	AllerCatPro	ToxinPred	-	-
7	Ojha2020 [113]	NetCTL-1.2	IEDB (method NSⁱ)	-	-	-	-	-	-	Y
8	Wang2020 [114]	NetMHCpan-4.0	IEDB (recommended)	Vaxijen-2.0	-	-	-	-	Y	-
9	Poran2020 [115]	HLAthena	NeonMHC2	Response against few predicted epitopes tested in recovered patients	-	-	-	-	Y	-
10	UlQamar2020 [116]	IEDB (consensus)	IEDB (consensus)	Vaxijen-2.0	-	-	AllerTOP-2.0	NS	-	Y
11	Gupta2020vr [117]	NetCTLpan-1.1	IEDB (recommended)	Vaxijen-2.0	IFNepitope	-	AllerTOP-2.0	ToxinPred	-	-
12	Enayatkhani2020 [118]	RANKPEP	RANKPEP		-	-	AllerTOP-2.0	-	-	Y
13	Ong2020 [119]	Vaxign, IEDB (consensus)	Vaxign, IEDB (consensus)	-	-	-	-	-	Y	-
14	Abdelmageed2020 [120]	IEDB (consensus)	IEDB (recommended)	-	-	-	-	-	-	-
15	Mukherjee2020 [121]	Tepitool, NetMHCpan-4.0, nHLAPred, CTLPred	Tepitool	Vaxijen-2.0	-	Y	AllerTOP-2.0, AlgPred	ToxinPred	Y	-
16	Vashi2020 [122]	IEDB (method NS)	IEDB (method NS)	-	-	Y	-	-	-	-
17	Ahmad2020 [123]	MHCPred	MHCPred	Vaxijen-2.0	-	-	AllerTOP-2.0	-	Y	Y
18	Naz2020 [124]	Tepitool	IEDB (recommended)	Vaxijen-2.0, Calis et al.	-	-	AllerTOP-2.0	-	-	Y
19	Chen2020 [125]	NetMHCpan-4.0	IEDB (recommended)	Vaxijen-2.0, Calis et al.	-	-	AllerTOP-2.0	ToxinPred	-	Y
20	Martin2020 [126]	NetCTL-1.2	IEDB (recommended)	Vaxijen-2.0	IFNepitope	-	AllerTOP-2.0	ToxinPred	-	Y
21	Dong2020 [127]	NetCTL-1.2	IEDB (consensus)	-	IFNepitope	-	-	-	-	Y
22	Ghafouri2020 [128]	IEDB (method NS)	IEDB (method NS)	Vaxijen-2.0	-	-	AllerTOP-2.0	ToxinPred	-	Y
23	Banerjee2020 [129]	NetCTL-1.2	NetMHCII-2.3	-	-	-	-	-	-	Y
24	Samad2020 [130]	NetCTL-1.2	IEDB (consensus)	Vaxijen-2.0, Calis et al.	IFNepitope	-	AllerTOP-2.0	ToxinPred	-	Y
25	Bhatnager2020 [131]	NetMHCpan-4.0, CTLPred	IEDB (recommended 2.2)		IFNepitope	-	AlgPred, AllergenFP	NS	-	Y
26	Devi2020 [132]	NetCTL-1.2	IEDB (consensus)	Vaxijen-2.0	IFNepitope	-	AllerTOP-2.0	ToxinPred	-	Y
27	AbrahamPeele2020 [133]	NetCTL-1.2	IEDB (method NS)	Vaxijen-2.0, Calis et al.	IFNepitope	-	AllerTOP-2.0	ToxinPred	-	Y
28	Ismail2020 [134]	NetMHC-4.0, MHCPred	IEDB (consensus), MHCPred	Vaxijen-2.0	IFNepitope	-	AllerTOP-2.0	ToxinPred	Y	Y
29	Jakhar2020 [135]	NetCTL-1.2, IEDB (method NS)	NetMHCIIpan-3.0	Vaxijen-2.0, Calis et al.	IFNepitope	Y	-	ToxinPred	-	Y
30	Panda2020 [136]	NetCTL-1.2	-	Vaxijen-2.0	-	-	-	-	-	-
31	Campbell2020 [137]	pVACtools	pVACtools	-	-	-	-	-	-	-
32	Tilocca2020 [138]	IEDB (method NS)	IEDB (method NS)	-	-	-	-	-	-	-
33	Santoni2020 [139]	NetMHC-4.0, NetCTL-1.2	-	-	-	-	-	-	Y	-
34	Dijkstra2020 [140]	NetMHC	-	-	-	-	-	-	-	-
35	Prachar2020 [83]	NetMHC-4.0	NetMHCII-2.3	-	-	-	-	-	-	-
36	Ramaiah2020 [141]	-	IEDB (consensus)	-	-	-	-	-	-	-
37	Gupta2020 [142]	NetMHCpan-4.0	Sturniolo method	Vaxijen-2.0, Calis et al.			AllerTOP-2.0	ToxinPred
38	Srivastava2020 [143]	IEDB (consensus)	SMM-align, Sturniolo method	-	IFNepitope	-		ToxinPred	-	Y
39	Mitra2020 [144]	NetMHC-4.0, NetCTL-1.2, IEDB (consensus)	MHCPred, NetMHCIIpan-3.2, IEDB (consensus)	Vaxijen-2.0	IFNepitope	-	AllerTOP, AlgPred	ToxinPred	Y	Y
40	Singh2020 [145]	NetCTL-1.2, IEDB (consensus)	NetMHCIIpan-3.2	Vaxijen-2.0	IFNepitope	-	AllerTOP-2.0		Y	Y
41	Saha2020 [146]	ProPred1	ProPred	Vaxijen-2.0	-	-	-	-	-	Y
42	Nerli2020 [147]	NetMHCpan-4.0	-	Electrostatic surface potential	-	-	-	-	-	-
43	Liu2020 [108]	NetMHCpan-4.0, MHCflurry	NetMHCIIpan-4.0	-	-	-	-	-	Y	-
44	Khan2020 [148]	NetCTL-1.2	PREDIVAC	Calis et al.	-	-	AlgPred	ToxinPred	Y	-
45	Banerjee2020a [149]	-	IEDB (method NS)	Vaxijen-2.0	-	-	-	-	-	-
46	Bojin2020 [150]	IEDB (method NS)	IEDB (method NS)	-	-	-	-	-	-	-
47	NazneenAkhand2020 [151]	IEDB (method NS)	IEDB (method NS)	Vaxijen-2.0	IFNepitope	-	AllergenFP, AllerTOP	ToxinPred	-	Y
48	Feng2020 [152]	NetMHCpan, iNeo-Pred	-	NS	-	-	-	NS	Y	-
49	Bhattacharya2020 [153]	ProPred1	ProPred	Vaxijen-2.0	-	-	-	-	-	Y
50	Chauhan2020 [154]	NetCTL-1.2	IEDB (consensus), NetMHCIIpan-3.2	Vaxijen-2.0	IFNepitope	-	AlgPred, AllerTOP-2.0	-	-	Y
51	Fast2020 [155]	NetMHCpan-4.0	MARIA	-	-	-	-	-	-	-
52	Joshi2020 [156]	NetMHC-4.0, MHCPred	NetMHCIIpan-3.2, MHCPred	Vaxijen-2.0	-	-	-	ToxinPred	-	-
53	Kar2020 [157]	NetCTL-1.2, IEDB (consensus)	NetMHCIIpan-3.2	Vaxijen-2.0, Calis et al.	IFNepitope	-	AllerTOP-2.0, AllergenFP	-	-	Y
54	Qamar2020 [158]	IEDB (consensus)	IEDB (consensus)	Vaxijen-2.0	-	Y	AllergenFP	ToxinPred	-	-
55	Ahammad2020 [159]	NetCTL-1.2	IEDB (consensus)	Vaxijen-2.0, Calis et al.	IFNepitope	-	AllerTOP-2.0	ToxinPred	Y	Y
56	Kiyotani2020 [160]	NetMHCpan-4.0, NetMHC-4.0	NetMHCIIpan-3.1	-	-	-	-	-	-	-
57	Sarkar2020 [161]	NetMHCpan-4.0	IEDB (Sturniolo)	Vaxijen-2.0	-	-	AllerTOP-2.0, AllergenFP	ToxinPred	Y	Y
58	Romero-Lopez2020 [162]	Tepitool	IEDB (recommended 2.2)	Calis et al.	-	-	-	-	-	-
59	Sanami2020 [163]	ProPred1	ProPred	Vaxijen-2.0	IFNepitope	-	AllerTOP-2.0	ToxinPred	Y	Y
60	Kalita2020 [164]	NetCTL-1.2, IEDB (method NS)	NetMHCIIpan-3.0	-	IFNepitope	-	-	ToxinPred	-	Y
61	Rahman2020 [165]	SMM	SMM-align	-	IFNepitope		-	-	-	Y
62	Lin2020 [166]	IEDB (consensus)	IEDB (method NS)	Vaxijen-2.0	-	-	AllergenFP	ToxinPred	-	-
63	Yarmarkovich2020 [167]	NetMHC-4.0	NetMHCII-2.3	-	-	-	-	-	Y	-
64	Yazdani2020 [168]	NetMHC-4.0, CTLPred	IEDB (consensus), RANKPEP	Vaxijen-2.0	IFNepitope	-	AllergenFP	-	-	Y
65	Lucchese2020 [169]	Identified pentamers from the proteome	-	-	-	-	-	-	Y	-

ᵃ Studies that used SARS-CoV immunological data for prediction [1-4] are shown in bold font, while all other studies that used peptide-HLA binding prediction methods are in regular font. In silico studies [1-30] were obtained from PubMed, while the remaining studies [31-65] were obtained by searching Google Scholar.

ᵇ HLA-I epitope prediction: ProPred1 [54], SMM [58,59], CTLPred [51], Tepitool [66], nHLAPred [67], NetMHC-4.0 [40], NetChop [48], NetCTL-1.2 [49], NetMHCpan [37], HLAthena [17], MHCflurry [41], NetMHCpan-4.0 [16], iNeo-Pred [152], Vaxign [57], RANKPEP [52], pVACtools [65], NetMHC [36], NetCTLpan-1.1 [39], MHCPred [62], IEDB (consensus) [63].

ᶜ HLA-II epitope prediction: MHCPred [62], NeonMHC2 [19], NetMHCII-2.3 [42], NetMHCIIpan-3.0 [43], NetMHCIIpan-3.1 [44], NetMHCIIpan-3.2 [42], NetMHCIIpan-4.0 [18], SMM-align [60], RANKPEP [52], Sturniolo method [61], ProPred [55], pVACtools [65], PREDIVAC [56], IEDB (recommended) [43], MARIA [45], IEDB (consensus) [64], Vaxign [57], Tepitool [66], IEDB (recommended 2.2) [43].

ᵈ Immunogenicity: Vaxijen-2.0 [79], Calis et al. [80], iPred [81]; Electrostatic surface potential [147].

ᵉ IFN-γ production: IFNepitope [85].

ᶠ Allergenicity: AlgPred [91], AllerTOP [92], AllerTOP-2.0 [93], AllergenFP [94], AllerCatPro [95].

ᵍ Toxicity: ToxinPred [98].

ʰ Lee2020 [21] and Grifoni2020 [22] reported two sets of epitopes: one set based on using SARS-CoV immunological data and the other using NetMHCpan-4.0. For the purpose of this analysis, we have only considered the set of epitopes predicted using SARS-CoV immunological data.

ⁱ NS: Not specified.

Methods that exploit immunological data of SARS-CoV

Early phylogenetic analyses [24,25] suggested that, among human coronaviruses, SARS-CoV-2 is most similar to SARS-CoV, the virus that caused the 2003 SARS outbreak. In fact, the genetic similarity of SARS-CoV-2 and SARS-CoV was found to be quite high in the structural proteins (~76% in S and >90% in N, M, and E proteins) [20], which were known to induce robust and long-lasting T cell immunity against SARS-CoV [26]. Motivated by the high genetic similarity of SARS-CoV with SARS-CoV-2, multiple in silico studies [[20], [21], [22], [23]] used the information of T cell epitopes available from previous SARS-CoV immunological studies to predict likely targets of SARS-CoV-2 T cell responses (Table 1). This approach is well motivated, with cross-reactive T cell epitopes being reported previously for genetically similar viruses [27], including Zika virus, dengue virus, and other flaviviruses [[28], [29], [30], [31]]. Interestingly, this was also the basic idea behind the first successful vaccine against an infectious disease, developed by Jenner in 1796, which induced protective immunity against the smallpox virus through inoculation with a related cowpox virus. For identifying the SARS-CoV T cell epitopes likely to generate cross-reactive immune responses against SARS-CoV-2, Ahmed et al. [20] scanned all available SARS-CoV T cell epitopes in the ViPR database [32] that had previously tested positive in experimental HLA binding or T cell assays, and identified epitopes that had an exact match in the SARS-CoV-2 sequences available at that time. Subsequent studies by Lee et al. [21], Grifoni et al. [22], and Ranga et al. [23] used the SARS-CoV epitopes as well as peptide-HLA binding prediction methods (discussed in the next subsection) and proposed the common ones as potential SARS-CoV-2 epitopes.
The sequence data of SARS-CoV-2 continues to be deposited into public databases at an unprecedented rate, with the number of complete genome sequences available for SARS-CoV-2 in the GISAID database exceeding 65,000 (as of September 2020), much greater than that available for many other viruses (e.g., ~12,900 whole genome sequences are publicly available for HIV [33], and ~4400 for the hepatitis C virus [34]). Taking advantage of this data, the authors of [20] later proposed a web-based platform, COVIDep [35], for reporting SARS-CoV T cell epitopes as potential vaccine targets for SARS-CoV-2 based on the latest sequence data available. Compared to [20], which reported SARS-CoV epitopes that were fully conserved within the SARS-CoV-2 sequences available in February 2020, COVIDep provides a parameter that enables identification of SARS-CoV epitopes that are identical among a desired fraction of the latest available SARS-CoV-2 sequence data. The epitopes predicted using SARS-CoV immunological information have been used by multiple experimental studies for SARS-CoV-2 to probe the immune responses in convalescent COVID-19 patients. As will be discussed in Section 4, many epitopes predicted using this approach have been found to elicit a cross-reactive T cell response in patients.
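The exact-match screening described above can be illustrated with a minimal Python sketch. It is not the code used by Ahmed et al. [20] or by COVIDep; the function names, the `min_fraction` parameter, and the toy sequences are assumptions made for illustration only. Setting `min_fraction` to 1.0 mirrors the requirement of full conservation, while lower values mimic a tunable conservation threshold.

```python
from typing import Dict, Iterable, List


def conserved_fraction(epitope: str, sequences: Iterable[str]) -> float:
    """Fraction of sequences containing the epitope as an exact substring."""
    seqs = list(sequences)
    if not seqs:
        raise ValueError("at least one sequence is required")
    hits = sum(1 for seq in seqs if epitope in seq)
    return hits / len(seqs)


def select_epitopes(
    epitopes: Iterable[str],
    sequences: Iterable[str],
    min_fraction: float = 1.0,  # hypothetical parameter name
) -> Dict[str, float]:
    """Keep epitopes found unchanged in at least `min_fraction` of sequences."""
    seqs: List[str] = list(sequences)
    selected: Dict[str, float] = {}
    for epitope in epitopes:
        fraction = conserved_fraction(epitope, seqs)
        if fraction >= min_fraction:
            selected[epitope] = fraction
    return selected


if __name__ == "__main__":
    # Toy strings, not real viral sequences or real epitopes.
    toy_sequences = ["MKTAYIAKQR", "MKTAYIAKQR", "MKTAYLAKQR", "MKTAYIAKQR"]
    toy_epitopes = ["KTAYIAK", "MKTAY", "QQQQ"]
    print(select_epitopes(toy_epitopes, toy_sequences, min_fraction=1.0))
    print(select_epitopes(toy_epitopes, toy_sequences, min_fraction=0.75))
```

A substring test is the simplest possible matching rule; it assumes the sequences are already translated to amino acids and ignores alignment gaps and ambiguous residues, which a real analysis would need to handle.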

Methods based on prediction of peptide-HLA binding

The large majority of in silico SARS-CoV-2 vaccine design studies so far have predicted T cell epitopes using existing peptide-HLA binding prediction methods (see Table 1). A number of these studies also performed a subsequent refinement step where the set of epitopes was narrowed down by running computational tests to identify those capable of eliciting a robust and safe T cell response. In some cases, the refined set of predicted epitopes was used to design a vaccine construct by including appropriate signal sequences, proteasomal cleavage sites, and linkers, and these were further tested in silico for features such as immunogenicity, safety, and structural stability.

Initial epitope prediction methods

The in silico search for SARS-CoV-2 T cell epitopes has benefitted from years of research in developing peptide-HLA binding algorithms. These methods have matured over time thanks to increased availability of experimental data and methodological advancements. Most methods are specialized for predicting either HLA class I-restricted epitopes (i.e., CD8+ T cell epitopes) or HLA class II-restricted epitopes (i.e., CD4+ T cell epitopes), while some methods have been developed for predicting epitopes for both HLA classes. Based on the methodology employed, these T cell epitope prediction methods can be broadly divided into two groups: machine learning (ML)-based methods and (non-ML) bioinformatics-based methods. The current state-of-the-art ML epitope prediction methods utilize artificial neural networks (ANN). Such methods that have been used for SARS-CoV-2 HLA-I epitope prediction include NetMHC [36], NetMHCpan [37,38], NetCTLpan-1.1 [39], NetMHC-4.0 [40], HLAthena [17], MHCflurry [41] and NetMHCpan-4.0 [16]. Corresponding methods that have been used for prediction of SARS-CoV-2 HLA-II epitopes include NetMHCII-2.3 [42], NetMHCIIpan-3.0 [43], NetMHCIIpan-3.1 [44], NetMHCIIpan-3.2 [42], NetMHCIIpan-4.0 [18], NeonMHC2 [19] and MARIA [45] (for a historical perspective on the evolution of these methods, see [46]). The suffix -pan in the names of these methods indicates pan-specificity; i.e., the ability to predict peptide-HLA binding for a large set of alleles within the HLA class, including the ones that are absent in the training set. This feature has been made possible by integrating information about the amino acids characterizing the HLA binding groove in training the ANN [47]. The earlier ANN-based methods such as NetMHC, NetMHCpan, NetMHCII-2.3, NetMHCIIpan-3.0, NetMHCIIpan-3.1, etc., were mostly trained using data obtained from HLA binding assays, which characterize the binding affinity of synthetic peptides to HLA molecules.
The most recent ANN-based methods, such as MHCflurry, NetMHCpan-4.0, and NetMHCIIpan-4.0, additionally employ data from HLA ligand elution assays, which use advances in mass spectrometry techniques to isolate a large number of peptides that are naturally processed and presented by human cells expressing a single HLA. A few SARS-CoV-2 epitope prediction studies have also used methods, such as HLAthena and NeonMHC2, that were trained solely on HLA ligand elution assay data. However, it has been demonstrated for both HLA classes that the ANN models trained using both types of data provide superior performance to models trained on one type of data only [16]. Some methods used by SARS-CoV-2 epitope prediction studies include the HLA-I specific methods NetChop [48] and NetCTL-1.2 [49], which incorporated additional intracellular factors involved in HLA antigen presentation in an attempt to improve peptide-HLA binding prediction. These factors include proteasomal cleavage sites and transport efficiency of TAP (the transporter associated with antigen processing) in antigen presenting cells [50]. However, inclusion of these factors has been found to show only a marginal improvement in peptide-HLA binding prediction over the ANN-based methods trained solely on HLA binding assays [48]. While almost all recent epitope prediction methods are ANN-based, a few early ones (such as CTLPred [51] and RANKPEP [52]) were based on an alternative ML approach, support vector machines, and have also been used by SARS-CoV-2 studies (Table 1). Distinct from the ML methods just described, several SARS-CoV-2 studies have used (non-ML) bioinformatics methods to predict SARS-CoV-2 epitopes. These methods use position-specific scoring functions that assume each position in a peptide to be independently interacting with HLA.
They generally assign a score to the studied peptide according to position-specific amino acid features such as amino acid frequencies or amino acid physicochemical profiles (e.g., obtained from BLOSUM matrices [53]) at specific peptide positions. Such methods that have been used to predict SARS-CoV-2 epitopes include ProPred1 [54] for predicting HLA-I epitopes, ProPred [55] and Predivac [56] for predicting HLA-II epitopes, and Vaxign [57] for predicting epitopes for both HLA classes. Some bioinformatics methods also use scoring functions involving the interactions of pairs of peptide positions with the HLA. Methods in this category that have been used to predict SARS-CoV-2 epitopes include SMM [58,59] for predicting HLA-I epitopes, SMM-Align [60] and the method by Sturniolo [61] for predicting HLA-II epitopes, and MHCPred [62] for predicting epitopes restricted by both HLA classes. In general, these bioinformatics methods work for a limited set of HLA alleles, with the exception of Predivac [56] and the method by Sturniolo [61], both of which are pan-specific. Several SARS-CoV-2 studies have used the analysis resource provided by the Immune Epitope Database (IEDB) [43] for epitope prediction. The IEDB provides a collection of several of the above-mentioned prediction methods, and also recommends a best performing method for each HLA class. Some SARS-CoV-2 studies have used the IEDB’s HLA class-specific consensus methods, which predict peptide-HLA binding for their respective HLA class based on the consensus of several prediction methods [63,64]. A few studies have also used pipeline tools like pVACtools [65] and TepiTool [66], which allow users to predict epitopes for both HLA classes from a set of pre-defined ML and bioinformatics methods. Another method used for predicting SARS-CoV-2 epitopes is nHLAPred [67], which uses a combination of ML and bioinformatics methods to predict HLA-I-restricted epitopes.
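To make the idea of a position-specific scoring function concrete, the following Python sketch builds a simple log-odds matrix from a handful of peptides and scores a new peptide as the sum of independent per-position terms. It is a generic illustration, not the algorithm of ProPred1, Vaxign, or any other tool named above; the uniform background frequency, the pseudocount, and the toy "binder" peptides are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(binders: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-position log-odds scores estimated from equal-length peptides."""
    if not binders:
        raise ValueError("at least one binder peptide is required")
    length = len(binders[0])
    if any(len(p) != length for p in binders):
        raise ValueError("all binder peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    total = len(binders) + pseudocount * len(AMINO_ACIDS)
    pssm: List[Dict[str, float]] = []
    for pos in range(length):
        counts = Counter(p[pos] for p in binders)
        column = {
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score_peptide(peptide: str, pssm: List[Dict[str, float]]) -> float:
    """Sum of per-position scores; positions are treated as independent."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must match the matrix length")
    return sum(column[aa] for aa, column in zip(peptide, pssm))


if __name__ == "__main__":
    # Toy 9-mers invented for illustration; not experimentally measured binders.
    toy_binders = ["ALAKAAAAV", "ALAKAAAAL", "SLAKAAAAV"]
    matrix = build_pssm(toy_binders)
    for candidate in ["ALAKAAAAV", "DDDDDDDDD"]:
        print(candidate, round(score_peptide(candidate, matrix), 3))
```

Because the score is a plain sum over positions, it cannot capture the pairwise position effects that methods such as SMM are described as modelling; that limitation is inherent to the independence assumption.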
The variety of peptide-HLA binding prediction methods used by in silico studies to predict SARS-CoV-2 epitopes raises questions as to whether the predictions of these studies are overlapping or distinct, and whether there are specific methods that appear most appropriate for predicting SARS-CoV-2 epitopes. We explore these questions subsequently in Sections 3 and 4, where we show that multiple common SARS-CoV-2 epitopes have indeed been predicted by independent in silico studies, while also identifying methods whose predicted epitopes are found to have induced T cell responses in convalescent COVID-19 patients.

Refinement of predicted epitopes

Beyond epitope prediction based on peptide-HLA binding, a number of SARS-CoV-2 in silico studies used computational tools to screen the predicted epitopes for their immunogenicity, and for their ability to elicit a robust and safe T cell response. A brief description of these tools and the rationale for using them is discussed below.

Screening predicted epitopes for their immunogenicity and robustness

Presentation of a peptide by an HLA molecule, while necessary for inducing a T cell response, does not guarantee T cell recognition and activation. That is, presentation does not imply that the peptide will be immunogenic. Thus, it is important to assess the immunogenicity of the predicted epitopes obtained from peptide-HLA binding prediction methods. The specific factors that differentiate an immunogenic HLA-presented peptide from a non-immunogenic one are still not well known, though a number of factors have been suggested to be the cause of this difference. For example, immunogenicity of a peptide may increase due to abundance of peptide-HLA complexes displayed on cells [68,69], early expression of the protein to which the peptide belongs [63,70,71], competition with other peptide-HLA complexes for stimulating T cells [72,73], and low genetic similarity of the peptide to a self-peptide (i.e., a host derived peptide) [[74], [75], [76], [77], [78]]. Several existing computational tools have been used to assess the immunogenicity of the SARS-CoV-2 epitopes obtained from peptide-HLA binding prediction methods. Of these, Vaxijen-2.0 [79] is the most commonly used and can predict immunogenicity of both HLA-I and HLA-II epitopes. This method was originally developed to predict protein immunogenicity by accounting for higher order interactions between protein sequence positions and exploiting the physicochemical properties (hydrophobicity, molecular size, polarity) of amino acids. It was trained on a set of known immunogenic and non-immunogenic proteins of viral, bacterial and tumor origins. Another common method that has been used for determining immunogenicity of HLA-I-restricted peptides in SARS-CoV-2 studies is the one available on the IEDB (Calis et al. [80]). This method was developed by comparing a set of immunogenic and non-immunogenic presented peptides compiled from multiple experimental sources. 
Specifically, immunogenicity of presented peptides was found to depend largely on the amino acids present at positions 4-6 of the peptide, and on the presence of amino acids with large and aromatic side chains. A model exploiting these features was then developed to predict immunogenicity of a given HLA-I-restricted peptide. In addition to Vaxijen-2.0 and Calis et al., one SARS-CoV-2 study [21] used iPred [81] for immunogenicity prediction, which is also based on physicochemical properties of the amino acids in the peptide. A novel method to predict the immunogenicity of SARS-CoV-2 HLA-I restricted peptides was also proposed in Gao et al. [82], which utilizes a physics-based model and takes into account factors such as peptide-HLA binding affinity and similarity of a peptide with pathogen-derived and human-derived peptides. For model training and testing, a well-characterized dataset of immunogenic HIV T cell epitopes was used. This model was reported to outperform Calis et al. [80] in predicting immunogenic epitopes for the HIV dataset. Gao et al. used this method to filter the SARS-CoV-2 epitopes proposed in Ahmed et al. [20] and Prachar et al. [83] based on their predicted immunogenicity. For the specific case of peptides presented by HLA class II, the activated CD4+ T cells differentiate into T helper (Th) cells of various types having distinct effector functions [72]. Of particular importance are the Th1 and Th2 cells. Th1 cells secrete the interferon-gamma (IFN-γ) cytokine and promote cell-mediated immunity, while Th2 cells secrete the interleukin-4 cytokine and promote humoral immunity [84]. For inducing robust cell-mediated immunity, multiple in silico SARS-CoV-2 vaccine design studies have further screened the predicted immunogenic HLA-II peptides to identify those that are likely to induce IFN-γ (Table 1). 
This was done primarily using the computational method IFNepitope [85], an ML approach that uses a dataset comprising peptides experimentally determined to induce IFN-γ, along with peptides that induce cytokines other than IFN-γ. This method was trained on data of IFN-γ inducing epitopes obtained from IEDB. In addition to screening for immunogenicity, to ensure that stimulated T cell responses are robust to genetic variations arising during viral evolution, it is important to identify epitopes that are highly conserved. For SARS-CoV-2, while the mutation rate appears low due to the presence of a genetic proof-reading exoribonuclease nsp14 protein [86], it is still important to consider conservation of epitopes to avoid mutations that accumulate in the population [87]. Thus, a few in silico SARS-CoV-2 vaccine design studies have considered the conservation of predicted immunogenic epitopes among the available SARS-CoV-2 sequences (or a subset of them) for providing vaccine target recommendations (Table 1). The majority of the in silico SARS-CoV-2 studies have computed the conservation of predicted epitopes using in-house code, while a few have used the epitope conservancy tool available at the IEDB [43].
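As a rough illustration of the kind of in-house conservation calculation mentioned above, the following Python sketch computes the fraction of sequences that contain an epitope exactly. The function name, the toy sequences, and the exact-substring definition of conservation are assumptions made for illustration only; the reviewed studies and the IEDB conservancy tool may define conservation differently (e.g., by tolerating mismatches).

```python
from typing import Iterable


def epitope_conservation(epitope: str, sequences: Iterable[str]) -> float:
    """Fraction of sequences containing the epitope as an exact substring."""
    seqs = list(sequences)
    if not seqs:
        raise ValueError("at least one sequence is required")
    hits = sum(1 for s in seqs if epitope in s)
    return hits / len(seqs)


# Toy protein fragments (hypothetical, not real SARS-CoV-2 sequences).
toy_sequences = [
    "MKTAYLQPRTFLLKYNE",
    "MKTAYLQPRTFLLKYNE",
    "MKTAYLQPRTFLIKYNE",  # single substitution inside the epitope
    "MKTAYLQPRTFLLKYNE",
]

conservation = epitope_conservation("YLQPRTFLL", toy_sequences)
print(f"{conservation:.2f}")  # 3 of the 4 toy sequences contain the epitope
```

In practice, a threshold on this fraction would be chosen by each study; no particular cut-off is implied here.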

Screening predicted epitopes for their allergenicity, toxicity and autoimmunity

Vaccines are generally administered to otherwise healthy individuals as a preventive measure against disease. A crucial factor when selecting epitopes for a vaccine, apart from the ability of the epitopes to elicit a protective immune response, is whether or not they have any associated safety concern. Any adverse reaction caused by a vaccine is likely to contribute to anti-vaccine sentiment and potentially lead to loss of public trust in immunization programs [88,89]. Even though any potential vaccine would ultimately need to undergo rigorous in vivo and in vitro safety trials, computational tools can help, as a first step, to screen for potential safety concerns. Here, we give a brief review of computational tools that have been used for assessing allergenicity, toxicity, and autoimmunity of SARS-CoV-2 epitopes. Almost all studies proposing an in silico vaccine perform these tests on the vaccine construct, while some also perform these at the initial epitope selection step (see Table 1). Allergenicity of a substance is its potential to cause hypersensitivity in an individual by evoking an immune response [90]. The reason why certain proteins or peptides cause an allergic reaction in humans is not precisely known. Several SARS-CoV-2 studies have used more than one computational tool to test for allergenicity of predicted epitopes (see Table 1). These include AlgPred [91], AllerTOP [92], AllerTOP-2.0 [93], AllergenFP [94], and AllerCatPro [95]. AlgPred is a suite of multiple allergenicity prediction approaches based on motif alignment, support vector machines, and hybrid approaches; AllerTOP, AllerTOP-2.0, and AllergenFP are three related methods based on the physicochemical similarity of the considered protein with known allergens; while AllerCatPro is a recently developed method that uses 3D structure information along with sequence similarity with known allergens to predict allergenicity. 
In a comparative test, AllerCatPro was shown to have superior performance to the other methods in identifying allergens [95]. However, of the 65 in silico studies considered in this review (Table 1), only one study [96] used AllerCatPro for allergenic screening of predicted SARS-CoV-2 epitopes. Toxicity is the capacity of a substance to damage a living organism by interacting with biomolecules and disrupting normal cellular functions [97]. The effects of this disruption may range from slight symptoms like nausea in mild cases, through to death in severe cases. SARS-CoV-2 studies that tested for toxicity of predicted epitopes have used ToxinPred [98], a support vector machine based method that uses a position-specific scoring function to predict peptide toxicity from sequence information. ToxinPred has been trained on a set of known toxic (of bacterial and animal origin) and non-toxic peptides obtained from the Universal Protein Resource [99]. While a number of SARS-CoV-2 in silico studies have used ToxinPred (Table 1), the accuracy of these toxicity predictions for viral epitopes has yet to be confirmed. Autoimmunity refers to the process of the body launching an immune response against its own healthy self [100]. In the context of T cells, this would constitute a T cell mounting a response against a self-peptide. The exact causal mechanism that triggers such a response is not known, though there is evidence that it may be triggered in some individuals in the aftermath of a viral or bacterial infection [101,102]. While evidence for a vaccine-induced autoimmune response has not been found in controlled studies, occasional case reports of such an occurrence are present in the literature [103,104]. This gives rise to some concern that if a vaccine contains viral epitopes similar to self-peptides, it may evoke an autoimmune response [105,106]. 
To account for this, some of the SARS-CoV-2 studies have checked for sequence similarity between the human proteome and viral epitopes to identify epitopes that may potentially induce an autoimmune reaction [107]. The computational hurdle here is the large size of the human proteome which makes testing multiple epitopes for sequence similarity challenging. However, of all possible SARS-CoV-2 peptides of lengths 8-10 (HLA-I restricted) and 13-20 (HLA-II restricted), only ~0.03% were found to match exactly with the human proteome [108], suggesting that autoimmunity may not be a common issue for SARS-CoV-2. Although the above-mentioned screening tests are well motivated, the ability of the specific tools used to effectively screen SARS-CoV-2 epitopes for immunogenicity, induction of IFN-γ, allergenicity, and toxicity still remains unclear. In an attempt to shed some light on the predictive nature of these tools, in Section 4 we analyze and compare their predictions when applied to epitopes of SARS-CoV-2 that have been confirmed experimentally in patients.
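One common way to make exact-match screening against a large proteome tractable is to index all k-mers of the relevant lengths once and then test each candidate peptide by set lookup. The sketch below illustrates this idea in Python; it is not the procedure of any reviewed study, and the function names and toy sequences are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def build_kmer_index(proteins: Iterable[str], lengths: Iterable[int]) -> Dict[int, Set[str]]:
    """Index every k-mer of the requested lengths found in the proteins."""
    index: Dict[int, Set[str]] = {k: set() for k in lengths}
    for protein in proteins:
        for k, kmers in index.items():
            # range() is empty when the protein is shorter than k
            for i in range(len(protein) - k + 1):
                kmers.add(protein[i:i + k])
    return index


def self_matching_peptides(peptides: Iterable[str], index: Dict[int, Set[str]]) -> List[str]:
    """Return peptides that occur exactly in the indexed proteins."""
    return [p for p in peptides if p in index.get(len(p), set())]


# Toy stand-ins for host proteins and candidate viral peptides (hypothetical).
host_proteins = ["MSTNPKPQRKTKRNTNRRPQDVK", "ACDEFGHIKLMNPQRSTVWY"]
viral_peptides = ["KPQRKTKRN", "YLQPRTFLL"]

index = build_kmer_index(host_proteins, range(8, 11))  # HLA-I lengths 8-10
print(self_matching_peptides(viral_peptides, index))
```

The index is built once, so each additional peptide costs only a hash lookup. Exact matching is the simplest criterion; screening for near-identical peptides would need a different approach.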

Design of a vaccine construct

While some of the SARS-CoV-2 in silico studies only identified a list of epitopes, several went a step further to propose a multi-epitope vaccine construct (see Table 1). Briefly, this entailed selecting appropriate linkers, which play a key role in the structural stability of the vaccine, and adjuvants, which help to boost the immune response. These in silico designs spanned a host of modern vaccine technologies, e.g., subunit vaccines, peptide vaccines, RNA and DNA vaccines [109,110]. Almost all SARS-CoV-2 studies that presented an in silico design of a multi-epitope vaccine construct also performed in silico tests for immunogenicity, conservation, allergenicity, and toxicity using the tools reviewed above, regardless of whether or not these tests were performed at the epitope level. As mentioned above, the majority of these in silico tools were originally developed to analyze proteins, and thus they may be more appropriate for analyzing multi-epitope vaccine constructs than individual epitopes. In addition, physicochemical composition, secondary and tertiary structure predictions, as well as molecular docking studies with human immune receptors were also investigated for the proposed constructs. None of these studies, however, tested their predictions experimentally. As the focus of the current review is on epitope identification, we refer the interested reader to [111] for more details on vaccine constructs.

Comparison of studies predicting SARS-CoV-2 epitopes

To compare the predictions of the in silico studies (listed in Table 1), the predicted epitopes and their specific HLA allele associations [170] were compiled. In cases where the specific HLA allele information was unavailable, we recorded it as “NA”. For a meaningful comparison, we focused only on the predicted T cell epitopes comprising 8 to 20 amino acids, representing the typical combined range of CD4+ and CD8+ T cell epitopes [171,172]. This excluded Enayatkhani et al. [118], Yarmarkovich et al. [167], Yazdani et al. [168], and Lucchese et al. [169] from the analysis, giving a remaining set of 61 studies (Fig. 2).

Fig. 2

Summary and comparison of 61 in silico studies that have predicted SARS-CoV-2 T cell epitopes. (Top left panel) Heatmap shows the fraction of common epitopes predicted across each pair of studies. The fraction is computed relative to the number of epitopes predicted by the study indicated in each row (the total number of epitopes predicted in each study is shown within parentheses on the right). Four in silico studies that used SARS-CoV immunological data are indicated in bold font. Of the epitopes predicted by these studies, only the ones predicted based on homology with SARS-CoV epitopes were included. Study labels indicated in the figure correspond to those in Table 1. (Top right panel) Bar plots show the fraction of predicted epitopes for each HLA class in each study, with the total number shown within parentheses. (Bottom left panel) Heatmap shows the number of predicted epitopes derived from each SARS-CoV-2 protein for each in silico study. Each column in this heatmap corresponds to the study mentioned at the top of each column in the top left panel heatmap. Missing tiles indicate no predicted epitopes. (Bottom right panel) Bar plots show the fraction of predicted epitopes, across studies, derived from each SARS-CoV-2 protein, with the total number shown within parentheses. Predicted epitopes were assigned HLA class based on the HLA allele (bearing 4-digit resolution or higher) reported against them; or as “NA” otherwise.

The number of T cell epitopes predicted by these SARS-CoV-2 studies varied widely (minimum = 1 epitope; maximum = 3407 epitopes), even for studies that used the same peptide-HLA prediction method (Fig. 2). This can be attributed to differences in subsequent prediction refinement steps and the study objectives. Studies reporting a very limited number of T cell epitopes (<10), e.g., Joshi et al. [156], Rahman et al. [165], Khan et al. [148], Jakhar et al. [135], Samad et al. [130], were primarily focused on designing a short vaccine construct for eliciting a targeted immune response. Most of these studies refined the set of initial epitopes obtained by peptide-HLA binding prediction methods based on the tests described in Section 2.2.2. In contrast, the studies that predicted a large set of epitopes (>400) had objectives such as identifying all possible epitopes having high coverage in a specific ethnic population (Feng et al. [152]), identifying epitopes recognized by a large number of HLA alleles including rare ones (Campbell et al. [137]), or finding all possible epitopes binding to a specific allele even with low predicted affinity (Nerli et al. [147]). These did not perform any refinement of the set of epitopes obtained using the peptide-HLA binding methods. A number of the SARS-CoV-2 studies had a sizeable overlap among their predicted set of epitopes (Fig. 2). This is not surprising since many had used similar computational pipelines (Table 1). For example, the 4 studies based on exploiting SARS-CoV immunological data all relied on sequence similarity of SARS-CoV-2 epitopes with known SARS-CoV epitopes, resulting in considerable overlap among their predicted set of epitopes (Fig. 2). Similarly, there was overlap among predictions of several studies based on peptide-HLA binding prediction (Fig. 2), which mostly used a similar set of methods for epitope prediction (Fig. 3). The ones that stand out among the latter group of studies for having a large overlap with multiple others include Feng et al. [152], Campbell et al. [137], and Nerli et al. [147] (Fig. 2), which may be attributed to the large number of epitopes that they predicted.
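The row-normalised overlap described in the Fig. 2 caption can be sketched as follows. This is a minimal Python illustration, assuming each study is represented by a set of predicted peptide strings; the study labels and epitope sets are toy values, not data from Table 1.

```python
from typing import Dict, Set


def overlap_fractions(predictions: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Row-normalised overlap: |row & col| / |row| for every pair of studies."""
    matrix: Dict[str, Dict[str, float]] = {}
    for row, row_set in predictions.items():
        matrix[row] = {
            col: (len(row_set & col_set) / len(row_set) if row_set else 0.0)
            for col, col_set in predictions.items()
        }
    return matrix


# Toy predictions (hypothetical study labels and epitope sets).
toy = {
    "StudyA": {"YLQPRTFLL", "RLQSLQTYV", "FIAGLIAIV", "KIADYNYKL"},
    "StudyB": {"YLQPRTFLL", "RLQSLQTYV"},
}

fractions = overlap_fractions(toy)
print(fractions["StudyA"]["StudyB"])  # share of StudyA's epitopes also in StudyB
print(fractions["StudyB"]["StudyA"])  # share of StudyB's epitopes also in StudyA
```

Because the fraction is normalised by the row study, the matrix is asymmetric: a study with a small prediction set can be fully covered by one that predicted many epitopes, which is consistent with the observation above about studies with large prediction sets.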

Fig. 3

Common in silico prediction methods that have been used by the reviewed SARS-CoV-2 studies. Only methods that were explicitly mentioned by at least 5 in silico SARS-CoV-2 studies (Table 1) are shown here. The methods are grouped according to the category shown in the legend.

The in silico predicted epitopes lie within each of the 12 SARS-CoV-2 proteins. The structural S protein, which is also one of the most immunogenic proteins of SARS-CoV-2, was the most commonly analyzed protein (Fig. 2). Surprisingly, the other structural proteins (M, N and E), also reported to be immunogenic in SARS-CoV-2 [6] as well as SARS-CoV [173,174], were analyzed by only a few studies. The N protein, in particular, is highly expressed [115], and the epitopes derived from this protein may be especially worthy of further experimental investigation. Compared to other proteins, the number of epitopes predicted for the longer proteins (ORF1a, ORF1b, and S) was much larger. As expected, the number of unique T cell epitopes predicted per protein across all studies was found to be strongly correlated with the length of protein (r = 0.99). Similar numbers of unique HLA-I and HLA-II restricted epitopes were predicted by the SARS-CoV-2 studies listed in Table 1 (2,239 HLA-I and 2,580 HLA-II), while a good number of epitopes (464) had missing (“NA”) HLA class restriction. For the studies predicting epitopes exclusively based on peptide-HLA binding methods, it is surprising to observe epitopes having unknown HLA class restriction. This is because almost all prediction methods (discussed in Section 2.2.1) require specifying the HLA allele for predicting associated epitopes. However, a close inspection of the related studies revealed that this was due to either non-reporting of HLA allele restriction of the predicted epitopes (e.g., Sarkar et al. [161]), or cases where only the number of unique HLA alleles associated with the predicted epitopes was reported without specifying the individual correspondences (e.g., Sanami et al. [163]). As for the studies leveraging the immunological data of SARS-CoV from public databases, the SARS-CoV epitopes having no HLA class information were assigned the “NA” HLA class.

Correspondence between predicted and experimentally-determined T cell epitopes

With a large number of studies predicting SARS-CoV-2 T cell epitopes using different computational pipelines (involving specific epitope prediction and refinement methods), it is difficult to assess their accuracy solely based on their performance in predicting epitopes of other organisms [175]. For this purpose, “ground truth” information of experimentally-determined SARS-CoV-2 T cell epitopes is required. This data has started to emerge from immunological assays that analyze immune responses in COVID-19 patients. As of 8 September 2020, we found eight experimental studies [8,9,115,[176], [177], [178], [179], [180]], reviewed in [181], as well as an additional study [182], that reported positive T cell immune responses from blood samples of convalescent COVID-19 patients against epitopes derived from SARS-CoV-2 proteins. Compiling data from these nine studies yielded a total of 324 (unique) epitopes. These 324 epitopes present a sampling (albeit not comprehensive) of the landscape of epitopes targeted by COVID-19 patients, and hence they provide an initial basis on which to conduct a comparative analysis of different in silico epitope prediction methods. The nine experimental studies measured T cell responses after stimulation of the peripheral blood mononuclear cells (PBMCs) from convalescent COVID-19 patients. Of these, some studies obtained a set of peptides to synthesize using one or more of the in silico epitope prediction methods, which were then used to stimulate the PBMCs. Three such studies [8,9,180] used NetMHCpan-4.0, two studies [177,182] used NetMHC4.0, while one study [115] used HLAThena and NeonMHC2 to select the set of peptides to synthesize. Two of these studies [8,182] also investigated a few epitopes that were predicted in [20,22] using SARS-CoV immunological data. This set of immunological studies demonstrates the application of multiple in silico T cell epitope prediction methods in guiding experimental investigations. 
Specifically, immune responses are measured against a reduced set of defined peptides predicted in silico which helps in identifying precise epitopes. In alternative experimental studies, stimulation was done using pools of overlapping k-mer peptides. One such study [176] synthesized pools of 15-mer overlapping peptides to measure T cell responses, while another study [179] synthesized pools of 15-18-mer overlapping peptides in addition to epitopes predicted using SARS-CoV immunological data [20]. While using pools of overlapping peptides does not reveal the precise “epitope” stimulating the T cell response, it still provides the information of immunogenic peptides encompassing the epitopes. Lastly, one study [178] employed a new experimental framework called T-Scan [183] to identify SARS-CoV-2 epitopes. We compared the 324 experimentally-determined epitopes (including both precise epitopes and immunogenic peptides) against all unique 8-20 residue-long in silico predicted T cell epitopes (5273) (Fig. 2). We found that 309 of the experimentally-determined epitopes encompassed at least a single predicted epitope, while 163 of these epitopes matched identically to predicted ones. Looking closely at those in silico studies that predicted at least one of the 163 identically-matched epitopes, we observed that studies which used SARS-CoV immunological data had collectively a higher hit rate (proportion of identically-matched epitopes in the set of predicted epitopes) (25%) as compared to the studies that used peptide-HLA binding prediction-based methods (3%). This difference in hit rate points to the usefulness of using SARS-CoV data to predict immune targets for SARS-CoV-2. Most in silico prediction studies as well as experimental studies were biased towards certain HLA alleles. 
Together, the nine experimental studies reported T cell epitopes associated with 18 different HLA alleles, with the largest proportion of the reported epitopes (35/116) associated with the HLA-A*02:01 allele. This is not surprising since HLA-A*02:01 is the most prevalent HLA-I allele globally [184]. Similarly, roughly a third of the epitopes predicted by in silico methods were also associated with HLA-A*02:01. Thus, the experimentally-determined and predicted epitopes associated with HLA-A*02:01 represent a reasonable dataset for assessing the predictions of SARS-CoV-2 in silico studies. Altogether, 33 of the 35 experimentally-determined HLA-A*02:01-associated epitopes matched identically with predicted epitopes, and these were derived from five proteins (S, ORF1a, N, ORF3a and M) (Fig. 4). Most of these epitopes (27/33) were reported to elicit a T cell response in multiple convalescent COVID-19 patients, with 10 of them eliciting responses in more than 5 patients. This suggests the potential immunodominance of these epitopes [185,186] across the segment of population bearing the HLA-A*02:01 allele (Fig. 4). Of these 33 epitopes, in silico studies based on peptide-HLA binding prediction and those leveraging SARS-CoV immunological data predicted 32 and 17, respectively. After grouping the in silico studies based on the peptide-HLA prediction tool used, those involving NetMHCpan-4.0 (despite the differences in refinement steps) appeared to predict most (30/33) of the epitopes (Table 2). However, the method leveraging SARS-CoV immunological data had the highest hit rate in predicting experimentally-determined HLA-A*02:01-associated epitopes (14.8%) (Table 2).
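The two kinds of correspondence used in the comparison above (an experimental epitope matching a predicted one identically, or encompassing at least one predicted epitope) can be expressed as simple string tests. The Python sketch below is an assumption about how such matching could be implemented, not the authors' code; the function name is hypothetical, and the peptides NLTTRTQLPPAYTNS and TRTQLPPAY are made up for illustration.

```python
from typing import Iterable, Set, Tuple


def classify_matches(experimental: Iterable[str], predicted: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (identical, encompassing) subsets of the experimental epitopes.

    identical:    experimental epitopes equal to some predicted epitope
    encompassing: experimental epitopes containing some predicted epitope
                  as a substring (identical matches are included)
    """
    predicted_set = set(predicted)
    experimental_list = list(experimental)
    identical = {e for e in experimental_list if e in predicted_set}
    encompassing = {e for e in experimental_list if any(p in e for p in predicted_set)}
    return identical, encompassing


# Toy data: one precise epitope, one longer immunogenic peptide, one unmatched.
experimental = ["YLQPRTFLL", "NLTTRTQLPPAYTNS", "FLAFVVFL"]
predicted = ["YLQPRTFLL", "TRTQLPPAY"]

identical, encompassing = classify_matches(experimental, predicted)
print(sorted(identical))
print(sorted(encompassing))
```

The substring test is quadratic in the number of peptides; for the set sizes reported here (hundreds of experimental and thousands of predicted epitopes) that is unlikely to be a practical concern.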

Fig. 4

Table 2

Approaches adopted by in silico studies that predicted at least half of the experimentally-determined HLA-A*02:01-associated epitopes.

No.	In silico studies	Approach	Total number of predicted epitopes	Number of predicted epitopes matching experimentally-determined epitopes	Hit rateᵃ
1	Nerli2020 [147], Wang2020 [114], Bhatnager2020 [131]	Based on peptide-HLA binding prediction (involving NetMHCpan-4.0)	722	30	4.2%
2	Ahmed2020 [20], Grifoni2020 [22], Ranga2020 [23], Lee2020 [21]	Using SARS-CoV immunological data	115	17	14.8%

ᵃ Hit rate represents the positive predictive value (i.e., ratio of the number of predicted epitopes matching experimentally-determined epitopes to the total number of in silico predicted epitopes).
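The hit rates in Table 2 follow directly from the definition in the table footnote. As a quick check, the Python snippet below recomputes them from the counts given in the table; the function name and the row labels are ours, chosen for illustration.

```python
def hit_rate(n_matching: int, n_predicted: int) -> float:
    """Matching predicted epitopes divided by all predicted epitopes."""
    if n_predicted <= 0:
        raise ValueError("n_predicted must be positive")
    return n_matching / n_predicted


# (matching, total predicted) counts taken from Table 2.
table2 = {
    "peptide-HLA binding prediction (NetMHCpan-4.0)": (30, 722),
    "SARS-CoV immunological data": (17, 115),
}

for approach, (matching, total) in table2.items():
    print(f"{approach}: {100 * hit_rate(matching, total):.1f}%")
# peptide-HLA binding prediction (NetMHCpan-4.0): 4.2%
# SARS-CoV immunological data: 14.8%
```

Note that this quantity penalises large prediction sets: the binding-prediction group recovers more of the experimental epitopes (30 versus 17) yet has the lower hit rate because it predicted many more epitopes overall.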

Summary of the experimentally-determined HLA-A*02:01-associated SARS-CoV-2 epitopes that were also predicted by in silico studies. (Left panel) List of 33 experimentally-determined HLA-A*02:01-associated epitopes that matched identically with epitopes predicted by in silico studies. (Middle panel) Number of convalescent COVID-19 patients bearing the HLA-A*02:01 allele whose blood sample responded (filled bar) and did not respond (empty bar) upon stimulation with the epitope. (Right panel) Number of in silico studies that predicted the epitope in the context of HLA-A*02:01. Orange represents the number of studies that used SARS-CoV immunological data, while purple represents the number of studies based on peptide-HLA binding prediction. The labels of the in silico studies (Table 1) predicting each epitope are listed on the right. Epitopes are colored according to the SARS-CoV-2 protein from which they are derived (counts shown in legend) and ordered in descending order of the number of patients whose samples responded. The two experimentally-determined HLA-A*02:01 epitopes which did not match identically with any epitope predicted by in silico studies were ₉₀₆YLFDESGEFKL₉₁₆ in ORF1a and ₂₀FLAFVVFL₂₇ in E. These epitopes were reported to induce a T cell response in 9/36 and 2/3 COVID-19 convalescent patients, respectively.

A large fraction of these experimentally-determined epitopes (27/33) were predicted to promiscuously bind to multiple HLA alleles by in silico methods (Table 3). For example, the S-derived epitope ₂₆₉YLQPRTFLL₂₇₇ was collectively predicted by 12 in silico methods to bind to 43 HLA alleles in addition to HLA-A*02:01. This HLA promiscuity of the experimentally-determined epitopes predicted by in silico methods (Table 3) can help guide future experimental studies to identify the potential immunodominance of these epitopes across a large segment of the global population carrying different HLA alleles.

Table 3

Distinct HLA alleles predicted, across in silico studies, to be associated with the 33 experimentally-determined HLA-A*02:01-restricted SARS-CoV-2 epitopes.

No.	Epitopeᵃ	Protein	In silico prediction methods (count)	In silico predicted HLAs
1	₁₃₉LLYDANYFL₁₄₇	ORF3a	1	A*02:01
2	₃₈₈₆KLWAQCVQL₃₈₉₄	ORF1a	3	A*02:01, A*02:02, A*02:03, A*02:06, A*02:121, A*02:131, A*02:141, A*02:158, A*02:173, A*02:181, A*02:196, A*02:205, A*02:214, A*02:228, A*02:238, A*02:25, A*02:257, A*02:266, A*02:70, A*02:71, A*02:73, A*02:85, A*02:95, A*03:01, A*31:01
3	₂₆₉YLQPRTFLL₂₇₇	S	12	A*01:01, A*02:01, A*02:02, A*02:03, A*02:06, A*02:07, A*02:12, A*02:13, A*02:131, A*02:142, A*02:150, A*02:170, A*02:180, A*02:187, A*02:196, A*02:205, A*02:214, A*02:233, A*02:238, A*02:247, A*02:257, A*02:44, A*02:50, A*02:54, A*02:69, A*02:71, A*02:73, A*02:85, A*23:01, A*24:02, A*32:01, B*07:02, B*08:01, B*08:22, B*08:38, B*08:41, B*08:56, C*03:03, C*03:04, C*04:01, C*06:02, C*07:02, C*12:03, C*14:02
4	₄₀₉₄ALWEIQQVV₄₁₀₂	ORF1a	3	A*02:01, A*02:02, A*02:03, A*02:06, A*02:11, A*02:121, A*02:139, A*02:141, A*02:150, A*02:16, A*02:173, A*02:181, A*02:19, A*02:196, A*02:205, A*02:214, A*02:228, A*02:238, A*02:25, A*02:257, A*02:266, A*02:50, A*02:70, A*02:71, A*02:73, A*02:85, A*02:95, A*68:02, A*69:01, B*15:01
5	₁₀₀₀RLQSLQTYV₁₀₀₈	S	6	A*02:01, A*02:02, A*02:03, A*02:06, A*02:07, A*68:02
6	₂₂₂LLLDRLNQL₂₃₀	N	6	A*02:01, A*02:02, A*02:03, A*02:04, A*02:07, A*02:11, A*02:13, A*02:132, A*02:141, A*02:150, A*02:16, A*02:173, A*02:181, A*02:19, A*02:196, A*02:205, A*02:214, A*02:228, A*02:238, A*02:25, A*02:262, A*02:54, A*02:70, A*02:71, A*02:73, A*02:85, A*02:95, A*03:01, A*23:01, A*32:01, B*08:01, B*08:22, B*08:38, B*08:41, B*08:56, B*14:02, B*46:01, B*51:01, C*01:02, C*02:02, C*03:02, C*03:03, C*03:04, C*03:71, C*04:01, C*05:01, C*06:02, C*07:02, C*08:02, C*12:03, C*14:02, C*14:03, C*17:01
7	₂₃₃₂ILFTRFFYV₂₃₄₀	ORF1a	4	A*02:01, A*02:03, A*02:06, A*02:09, A*02:107, A*02:118, A*02:121, A*02:131, A*02:141, A*02:158, A*02:160, A*02:173, A*02:181, A*02:187, A*02:196, A*02:205, A*02:214, A*02:228, A*02:238, A*02:25, A*02:257, A*02:266, A*02:28, A*02:29, A*02:30, A*02:31, A*02:33, A*02:34, A*02:40, A*02:42, A*02:51, A*02:58, A*02:59, A*02:60, A*02:61, A*02:66, A*02:67, A*02:68, A*02:70, A*02:71, A*02:73, A*02:74, A*02:85, A*02:95, B*08:01
8	₁₀₇YLYALVYFL₁₁₅	ORF3a	4	A*02:01, A*02:02, A*02:09, A*02:11, A*02:12, A*02:121, A*02:131, A*02:141, A*02:158, A*02:16, A*02:173, A*02:181, A*02:187, A*02:196, A*02:205, A*02:214, A*02:228, A*02:238, A*02:25, A*02:263, A*02:266, A*02:54, A*02:58, A*02:69, A*02:71, A*02:73, A*02:85, A*02:95, A*33:08, A*74:04, B*40:13
9	₇₂ALSKGVHFV₈₀	ORF3a	3	A*02:01, A*02:50
10	₁₂₂₀FIAGLIAIV₁₂₂₈	S	8	A*02:01, A*02:02, A*02:03, A*02:06, A*02:07, A*02:131, A*02:150, A*02:170, A*02:179, A*02:187, A*02:196, A*02:205, A*02:214, A*02:228, A*02:238, A*02:248, A*02:257, A*02:50, A*02:69, A*02:71, A*02:85, A*02:95, A*68:02, A2, B*46:01, C*03:04, C*12:03, C*15:02
11	₂₂₁LLLLDRLNQL₂₃₀	N	2	A*02:01
12	₃₄₀₃FLNGSCGSV₃₄₁₁	ORF1a	4	A*02:01, A*02:02, A*02:03, A*02:06, A*02:16, A*02:171, A*02:50, A*68:02
13	₄₁₇KIADYNYKL₄₂₅	S	8	A*02:01, A*02:05, A*02:06, A*02:07, A*02:102, A*02:104, A*02:128, A*02:131, A*02:142, A*02:155, A*02:161, A*02:173, A*02:186, A*02:187, A*02:196, A*02:209, A*02:22, A*02:229, A*02:243, A*02:25, A*02:262, A*02:50, A*02:69, A*02:90, A*02:95, A*11:01, A*24:02, A*31:01, A*32:01, B*07:02, B*08:01, B*27:05, B*35:01, B*38:01, B*39:02, C*04:01, C*07:02, C*15:02, Cw*04:01
14	₈₂₁LLFNKVTLA₈₂₉	S	4	A*02:01, A*02:03, A*02:06, A*02:07, B*08:01
15	₄₂₄KLPDDFTGCV₄₃₃	S	3	A*02:01
16	₈₂₅FGDDTVIEV₈₃₃	ORF1a	2	A*02:01, C*05:21, C*05:30
17	₃₄₆₇VLAWLYAAV₃₄₇₅	ORF1a	3	A*02:01, A*02:02, A*02:03, A*02:06, A*02:11, A*02:148, A*02:22, A*02:230, A*02:253, A*02:258, A*68:02
18	₃₆₃₉FLLPSLATV₃₆₄₇	ORF1a	5	A*02:01, A*02:02, A*02:03, A*02:06, A*02:11, A*02:12, A*02:13, A*02:132, A*02:141, A*02:158, A*02:16, A*02:173, A*02:181, A*02:19, A*02:196, A*02:205, A*02:214, A*02:228, A*02:238, A*02:25, A*02:263, A*02:27, A*02:28, A*02:50, A*02:51, A*02:54, A*02:61, A*02:70, A*02:71, A*02:73, A*02:85, A*02:99, A*68:02
19	₁₀₆₂FLHVTYVPA₁₀₇₀	S	5	A*02:01, A*02:03, A*02:06, A*02:07, B*54:01
20	₉₈₃RLDKVEAEV₉₉₁	S	6	A*02:01, A*02:02, A*02:03, A*02:06, A*02:07, A*02:16, A*68:02, C*04:01
21	₉₉₅RLITGRLQSL₁₀₀₄	S	2	A*02:01
22	₂₆FLFLTWICL₃₄	M	4	A*02:01, A*02:07
23	₃₃₈KLDDKDPNF₃₄₆	N	2	A*02:01, C*05:33
24	₃₈₆KLNDLCFTNV₃₉₅	S	3	A*02:01, A*02:03
25	₁₁₂YLGTGPEAGL₁₂₁	N	1	A*02:01
26	₃₁₆GMSRIGMEV₃₂₄	N	5	A*02:01, A*02:03, A*02:50
27	₂₁₉LALLLLDRL₂₂₇	N	1	A*02:01
28	₂₀₂KIYSKHTPI₂₁₀	S	5	A*02:01, A*02:07, A*30:01, A*32:01, C*03:04, C*15:02
29	₈₅₇GLTVLPPLL₈₆₅	S	4	A*02:01, A*02:07
30	₉₅₈ALNTLVKQL₉₆₆	S	5	A*02:01, A*02:03
31	₉₉₆LITGRLQSL₁₀₀₄	S	4	A*02:01, B*07:02, B*08:01, C*03:03, C*03:04, C*12:03
32	₁₁₈₅RLNEVAKNL₁₁₉₃	S	6	A*02:01, A*02:03, A*02:11, A*02:128, A*02:171, A*02:196, A*02:230, A*02:238, A*02:253, A*02:258, A*02:99, A*32:01, B*27:20
33	₉₇₆VLNDILSRL₉₈₄	S	7	A*01:01, A*02:01, A*02:03, A*02:06, A*02:07, A*02:11, A*02:13, A*02:132, A*02:148, A*02:151, A*02:171, A*02:186, A*02:19, A*02:196, A*02:209, A*02:22, A*02:230, A*02:238, A*02:253, A*02:258, A*02:52, A*02:54, A*02:70, A*02:71, A*02:73, A*02:85, A*02:99, A*03:01, A*11:01, A*23:01, A*24:02, A*26:01, A*30:01, A*30:02, A*31:01, A*32:01, A*33:01, A*68:01, A*68:02, B*07:02, B*08:01, B*15:01, B*35:01, B*40:01, B*44:02, B*44:03, B*46:01, B*51:01, B*53:01, B*57:01, B*58:01, C*04:01, C*05:04, C*05:23, C*05:33

ᵃ Epitopes are listed in the same order as in Fig. 4.

We used the compiled experimental SARS-CoV-2 data to assess the predictions of various refinement tools employed in SARS-CoV-2 in silico studies (Table 1). We selected the refinement tools for which a web-server was available and used the recommended parameters on the web-server of each tool to analyze the experimentally-determined SARS-CoV-2 epitopes. This compiled data of experimentally-determined epitopes, obtained from positive T cell responses in convalescent COVID-19 patients, serves as a reasonable ground truth for the tools predicting immunogenicity. For Vaxijen-2.0 [79], the most commonly used tool by SARS-CoV-2 in silico studies for screening epitopes for immunogenicity (Fig. 3), we obtained predictions for the experimentally-determined epitopes using the available web-server by selecting the organism as “viruses”, as recommended by the authors. Our analysis revealed that Vaxijen-2.0 classified only ~56% (182/324) of the experimentally-determined epitopes as immunogenic (Fig. 5). Unlike Vaxijen-2.0, Calis et al. [80] was developed for predicting immunogenicity of only HLA class I epitopes and does not perform binary classification. Instead, it provides a score to each epitope, with a high score representing high confidence in the epitope being immunogenic. We considered all epitopes predicted to have positive scores as immunogenic and all others as non-immunogenic, in accordance with the majority of SARS-CoV-2 in silico studies that used Calis et al. to predict epitope immunogenicity (Table 1). Assessing the immunogenicity of 98 HLA class I experimentally-determined epitopes, Calis et al. predicted only ~63% (62/98) of them to be immunogenic (Fig. 5). 
To investigate the performance of these methods in more detail, we tested their accuracy in predicting the top 10 HLA-A*02:01-associated immunodominant epitopes (Fig. 4). In this case, Vaxijen-2.0 and Calis et al. predicted 30% (3/10) and 60% (6/10) of these to be immunogenic. Hence, the most commonly-used methods for immunogenicity prediction incorrectly classified over a third of the 328 experimentally-determined epitopes as non-immunogenic. The accuracy of these methods does not appear to improve even for predicting the highly immunogenic epitopes, highlighting their suboptimal performance.

Fig. 5

Results obtained for the 324 experimentally-determined SARS-CoV-2 T cell epitopes [8,9,115,176-180,182] when they were provided as input to the computational tools most commonly used in the reviewed SARS-CoV-2 in silico studies for refinement of epitopes obtained by peptide-HLA binding prediction methods. Positive outcomes indicate the number of epitopes the computational tool predicts to have the characteristic (immunogenicity, IFN-γ production, allergenicity, toxicity) being tested, and vice versa for negative outcomes. “NA” indicates the number of epitopes that could not be analyzed by the specific tool. In the case of Calis et al. [80], this was because the method is applicable to HLA-I epitopes only, while in the case of IFNepitope, it was because only a subset of the experimentally-determined epitopes had IFN-γ production information available.

For a selected subset (20) of HLA-II restricted experimentally-determined epitopes, the recognizing CD4+ T cells were confirmed to be producing IFN-γ using flow cytometry [9]. We used this subset of epitopes to assess the performance of the IFNepitope tool [85] commonly used in SARS-CoV-2 in silico studies (Fig. 3). The predictions of IFNepitope for the selected experimentally-determined epitopes were obtained from the associated webserver using the recommended approach (motif and SVM hybrid) and model (IFN-γ vs non-IFN-γ). This analysis showed that IFNepitope correctly predicted only 40% (8/20) of the experimentally-determined IFN-γ producing SARS-CoV-2 epitopes (Fig. 5). In contrast to immunogenicity and IFN-γ production, no information was available regarding allergenicity and toxicity of the experimentally-determined epitopes from the immunological studies. Thus, no ground truth information is available to assess the performance of tools predicting these epitope characteristics.
Nevertheless, we can still analyze the experimentally-determined epitopes using these tools and, at least for the case of allergenicity prediction tools, compare their relative predictions. We used the default parameter settings in the webservers for all allergenicity and toxicity prediction tools. Our analysis showed that AllerTOP-2.0 [93] and AllergenFP [94], the two most commonly used tools for determining allergenicity (Fig. 3), each predicted a high fraction (~43%; 142/328 and 140/328, respectively) of the experimentally-determined epitopes to be allergenic, with fewer than half of these (66) being predicted by both methods. Hence, the variation in the predictions of these methods was high. This disparity was even more evident for the AllerCatPro method [95], which predicted all experimentally-determined epitopes to be non-allergenic (Fig. 5). Hence, due to the wide variation in predictions and a lack of experimental data to validate them, the practical applicability of the allergenicity tools for SARS-CoV-2 remains unclear and further investigation is required. Lastly, in terms of toxicity prediction, ToxinPred [98] was the only tool used to predict the toxicity of epitopes by the SARS-CoV-2 in silico studies (Table 1). Our analysis revealed that it predicted 98% (322/328) of the experimentally-determined epitopes to be non-toxic. However, as with the allergenicity prediction tools, evaluating the accuracy of these toxicity predictions is not possible at present due to a lack of relevant experimental information.
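The disagreement between AllerTOP-2.0 and AllergenFP can be put on a numerical footing using only the counts given above (142 and 140 positive calls, 66 shared, 328 epitopes in total). The sketch below rebuilds the 2×2 contingency table from those counts and computes the Jaccard index of the two positive sets together with Cohen's kappa; these two agreement measures are our choice for illustration and are not statistics reported in the review. The function name `agreement_stats` is hypothetical.

```python
def agreement_stats(n_a: int, n_b: int, n_both: int, n_total: int) -> dict[str, float]:
    """Agreement between two binary classifiers, from marginal and shared counts."""
    only_a = n_a - n_both                          # positive for tool A only
    only_b = n_b - n_both                          # positive for tool B only
    neither = n_total - (n_both + only_a + only_b)  # negative for both tools

    # Overlap of the two positive sets relative to their union.
    jaccard = n_both / (n_both + only_a + only_b)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (n_both + neither) / n_total
    expected = (n_a * n_b + (n_total - n_a) * (n_total - n_b)) / n_total**2
    kappa = (observed - expected) / (1 - expected)

    return {"jaccard": jaccard, "observed_agreement": observed, "kappa": kappa}


# Counts taken from the text: AllerTOP-2.0 (142), AllergenFP (140), shared (66).
stats = agreement_stats(n_a=142, n_b=140, n_both=66, n_total=328)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With these counts the Jaccard index is about 0.31 and kappa is about 0.07, i.e. the two tools agree only slightly more often than two independent classifiers with the same positive rates would.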

Summary and perspectives

In silico epitope identification is an important component of the vaccine development pipeline, as it provides recommendations for immune targets that may be exploited by vaccine designs. It is also helpful for guiding immunological assays designed to understand T cell responses elicited by vaccines or those mounted naturally against SARS-CoV-2 infection. This review has highlighted the large amount of work that has already been done in predicting and analyzing epitopes of SARS-CoV-2, with a focus on T cells. The 65 studies that we have reviewed employed different computational approaches, along with an impressive array of computational tools. The aim of this review was not only to summarize the methods that have been employed so far, but also to provide a comparative analysis of their epitope predictions and to offer insights into the performance of the different approaches. The ability to test prediction accuracy hinges on the availability of experimental ground truth data, which is growing rapidly but remains limited. Data limitations precluded, for example, the performance evaluation of tools that predict epitope safety features. However, data from nine independent immunological studies of convalescent COVID-19 patients provided a set of T cell epitopes that offered a means to test the basic ability of the in silico T cell prediction methods to identify SARS-CoV-2 epitopes reported to be immunogenic. The fact that the large majority (>95%) of the experimentally-determined epitopes for HLA-A*02:01 were identical to an epitope predicted by at least one in silico method offers strong evidence for the practical significance of these methods in identifying immunogenic T cell epitopes in the context of SARS-CoV-2.
While the comparison carried out here cannot ascertain which prediction method performed best (given that many studies differed in their prediction refinement step and objectives), we could still compare the hit rates of in silico studies grouped according to their common underlying approach for predicting SARS-CoV-2 epitopes (Table 2). This analysis showed that for HLA-A*02:01 (the HLA allele with the largest number of epitopes available), the two approaches with the highest hit rates in predicting the set of experimentally-determined SARS-CoV-2 epitopes were peptide-HLA binding prediction using NetMHCpan-4.0 and the approach that leveraged SARS-CoV immunological data. Hence, both of these distinct approaches appear to be well supported for further use in guiding additional epitope identification for SARS-CoV-2, and for application to identifying epitopes for other viruses.
Our analysis has also provided insights into the computational tools that have been used by a number of SARS-CoV-2 in silico studies to further refine the set of predicted epitopes based on specific features. Most notably, the observation that the most commonly used tools for predicting epitope immunogenicity classified over one-third of the experimentally-determined immunogenic epitopes as non-immunogenic points to their suboptimality in relation to SARS-CoV-2. Similarly, the performance of the tool that has been used for screening HLA class II epitopes for inducing IFN-γ production was also found to be suboptimal. It should be recognized, however, that these in silico screening tools are general-purpose tools that were developed more than seven years ago. For SARS-CoV-2, there appears to be significant room for improvement, and more specialized tools (e.g., Gao et al. [82]) may be more effective.
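The hit-rate comparison described above can be sketched as a set computation. The example below assumes that a "hit" means an experimentally-determined epitope is identical to a predicted peptide, and that predictions from all studies sharing an approach are pooled before counting; the pooling rule is our assumption, and the exact procedure behind Table 2 may differ. All peptide strings, study names and approach names are made-up placeholders, not real epitopes or results.

```python
def hit_rate(experimental: set[str], predicted_by_study: dict[str, set[str]]) -> float:
    """Fraction of experimental epitopes exactly matched by any study in the group."""
    pooled = set().union(*predicted_by_study.values())
    return len(experimental & pooled) / len(experimental)


# Placeholder peptides standing in for experimentally-determined epitopes.
experimental = {"AAAAAAAAA", "CCCCCCCCC", "DDDDDDDDD", "EEEEEEEEE"}

# Placeholder predictions, grouped by (hypothetical) underlying approach.
approaches = {
    "approach_1": {
        "study_a": {"AAAAAAAAA", "FFFFFFFFF"},
        "study_b": {"CCCCCCCCC", "DDDDDDDDD"},
    },
    "approach_2": {
        "study_c": {"EEEEEEEEE", "GGGGGGGGG"},
    },
}

for name, studies in approaches.items():
    print(f"{name}: hit rate = {hit_rate(experimental, studies):.0%}")
```

Exact string matching is deliberately strict: a predicted 10-mer that contains an experimental 9-mer would not count as a hit under this definition.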
While several of the surveyed studies used in silico tools to assess the safety (allergenicity, toxicity) of SARS-CoV-2 epitopes, the accuracy of these predictions could not be validated due to the lack of experimental data. Discordance in the predictions of some of these safety assessment tools (Section 4) also highlights the need for further research and systematic experimental validation. This is an important research direction since the utility of such in silico tools in pre-clinical trials is gaining recognition by both regulatory bodies and funding agencies [187,188], as it is in line with the 3Rs principles (replacement, reduction, refinement) for humane animal research.
Several experimentally-determined epitopes associated with HLA-A*02:01 appear to be immunodominant across multiple convalescent COVID-19 patients. Interestingly, the majority of these epitopes were predicted by multiple methods to have promiscuous HLA association. This suggests that vaccines designed to target such epitopes have the potential to provide high population coverage. However, the promiscuity of these epitopes remains to be verified experimentally, and this would appear to be an important direction for future studies.
In this review we have focused on T cells, which form one arm of the adaptive immune system. The other arm, comprising antibodies produced by B cells, is also important for preventing viral infection. In fact, recent experimental studies have suggested that protection against SARS-CoV-2 may be mediated collectively by both T cells and antibodies [11]. There have been extensive efforts in characterizing the neutralizing antibodies against SARS-CoV-2 [3-5], as well as in identifying the B cell epitopes that may be targeted by neutralizing antibodies. Of the in silico SARS-CoV-2 studies that have been reviewed (Table 1), many also predicted B cell epitopes for potentially eliciting a neutralizing antibody response.
Some of these epitope predictions, particularly those made by methods leveraging SARS-CoV data [20,22], have also been observed experimentally [35,189-193]. The development of in silico methods to identify B cell epitopes for SARS-CoV-2 is currently an active area of research. As in the case of T cells, ML methods may also be considered for predicting B cell epitopes (e.g., [194]); however, developing predictive models for B cells is more complicated, since the predicted epitopes must fold into conformations that are similar to the native protein in order to elicit an antibody response.
For SARS-CoV-2, as well as other coronaviruses, one feature that simplifies the identification of potentially robust epitopes is the fact that the genetic variation is quite low. For example, almost all (~99%) of the epitopes that were predicted by a study in early 2020 [24] are still highly conserved (>99%) within SARS-CoV-2 sequences [35], despite a three-orders-of-magnitude increase in the amount of sequence data available. Hence, based on our current understanding, T cell escape by genetic variation may not be a significant factor for SARS-CoV-2. This is in contrast to other viruses that are highly mutable, such as HIV and hepatitis C virus, for which more elaborate computational methods have been developed to facilitate robust T cell epitope identification and to aid vaccine design [195-207].
Generally speaking, the knowledge being gained through the broad application of in silico T cell epitope prediction methods and tools to SARS-CoV-2 can help guide further studies aimed at epitope determination and vaccine design for various other viruses.
For example, the observed cross-reactivity of T cell epitopes between SARS-CoV and SARS-CoV-2 motivates studies that seek to identify epitopes that are genetically similar across a spectrum of coronaviruses (e.g., SARS-CoV, MERS-CoV, SARS-CoV-2, common cold human coronaviruses, as well as animal coronaviruses). The identification of such epitopes could help guide “pan-coronavirus” vaccine designs that are aimed at safeguarding against both current human coronaviruses and novel coronaviruses that may leap from other species to infect humans in the future [208,209]. An increased understanding of the landscape of SARS-CoV-2 immunogenic T cell epitopes targeted by COVID-19 patients would open up the space of possibilities to explore, and this could play an important role in the search for a pan-coronavirus vaccine.
