Resolving SARS-CoV-2 CD4⁺ T cell specificity via reverse epitope discovery.

Mikhail V Pogorelyy¹, Elisa Rosati², Anastasia A Minervina¹, Robert C Mettelman¹, Alexander Scheffold³, Andre Franke⁴, Petra Bacher⁵, Paul G Thomas⁶.

Abstract

The current strategy to detect immunodominant T cell responses focuses on the antigen, employing large peptide pools to screen for functional cell activation. However, these approaches are labor- and sample-intensive and scale poorly with increasing size of the pathogen peptidome. T cell receptors (TCRs) recognizing the same epitope frequently have highly similar sequences, and thus, the presence of large sequence similarity clusters in the TCR repertoire likely identifies the most public and immunodominant responses. Here, we perform a meta-analysis of large, publicly available single-cell and bulk TCR datasets from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-infected individuals to identify public CD4+ responses. We report more than 1,200 αβTCRs forming six prominent similarity clusters and validate histocompatibility leukocyte antigen (HLA) restriction and epitope specificity predictions for five clusters using transgenic T cell lines. Collectively, these data provide information on immunodominant CD4+ T cell responses to SARS-CoV-2 and demonstrate the utility of the reverse epitope discovery approach.

Keywords: CD4 T cells; COVID-19; T cell receptor; TCR repertoire; epitope discovery; public T cell response

Year: 2022 PMID: 35841887 PMCID: PMC9247234 DOI: 10.1016/j.xcrm.2022.100697

Source DB: PubMed Journal: Cell Rep Med ISSN: 2666-3791

Introduction

The global scientific effort to overcome the COVID-19 pandemic has led to the generation of an extraordinarily large amount of publicly available data describing the human immune response to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Many of these studies point to the importance of robust T cell responses in resolving COVID-19, as well as providing long-term protection against newly emerging SARS-CoV-2 variants [6, 7, 8, 9, 10, 11, 12, 13]. One increasingly popular method to study T cell response complexity is to characterize the T cell receptor (TCR) repertoires of activated T cells following infection or vaccination. The TCR is a heterodimer of alpha and beta chains, both of which are formed in a semi-random DNA recombination process resulting in a unique repertoire in each individual that can be resolved through sequencing. However, despite their complex and unique nature, TCR repertoires often have similar and definable features across individuals encountering the same antigens. In particular, responses to immunodominant epitopes trigger large clonal expansions, and TCRs recognizing such epitopes frequently have highly similar sequences [16, 17, 18]. Thus, analysis of TCR repertoires could shed light on the differences and commonalities of the T cell immune response across individuals, help discern the identities of the most dominant immunogenic antigens, and provide targets for development of diagnostic and therapeutic strategies, such as adoptive T cell transfer. Indeed, identification of T cell clonotypes reactive to SARS-CoV-2 antigens has led to the development of a novel diagnostic test for SARS-CoV-2 authorized for emergency use by the US Food and Drug Administration (FDA).
In order to identify TCR repertoire signatures related to COVID-19, multiple groups have utilized either bulk TCR repertoire sequencing methods [3, 20, 21, 22, 23, 24], which quantitatively measure the frequencies of large numbers of unpaired TCR alpha or beta chain clonotypes, or single-cell TCR-sequencing techniques [1, 2, 25, 26, 27, 28, 29, 30, 31], which produce fewer but paired alpha/beta TCR sequences. One of the major challenges of TCR-sequencing approaches is that only a small fraction of total peripheral T cells recognize viral epitopes, even at the peak of anti-SARS-CoV-2 immune responses. Hence, many studies rely on methods designed to enrich antigen-specific T cell responses prior to TCR sequencing, such as the antigen-reactive T cell enrichment strategy (ARTE), activation-induced marker (AIM), and multiplex identification of T cell receptor antigen specificity (MIRA) assays [32, 33, 34, 35], which combine major histocompatibility complex (MHC)-multimer staining or peptide stimulation with subsequent selection of activated T cells. These methods increase the number of SARS-CoV-2-specific TCRs detected in each sample and help to identify immunodominant epitopes (reviewed in Grifoni et al.). Stimulating T cells with peptide libraries is the most frequently used approach in SARS-CoV-2 epitope discovery [37, 38, 39, 40, 41, 42]. However, peptide stimulation across the whole viral peptidome requires very large sample sizes, which becomes increasingly challenging and cost prohibitive for larger pathogens, such as SARS-CoV-2. Moreover, instead of mounting a uniform response against all available peptides, the immune system tends to focus on several so-called “immunodominant” epitopes that we are not yet able to predict based on epitope features alone.
Here, we propose a reverse epitope discovery technique, which moves the burden of T cell epitope detection away from large peptide screens and rather utilizes rich TCR repertoire datasets as the means to identify unbiased immunodominant responses. We performed a comprehensive TCR meta-analysis of publicly available single-cell and bulk CD4+ TCR repertoire datasets and identified more than 1,200 highly public SARS-CoV-2-reactive TCRs with complete TCR alpha and beta chain information and inferred their corresponding HLA restriction. Moreover, by clustering TCRs based on sequence similarity, we were able to (1) identify several prominent alpha/beta TCR motifs, (2) predict their antigen specificity, and (3) validate the prediction experimentally using transgenic T cell lines, demonstrating the utility of the reverse epitope discovery approach.

Results

In this report, we jointly analyzed multiple datasets across three study groupings. The first grouping focused on two published single-cell datasets of SARS-CoV-2-reactive CD4+ T cells identified based on CD154 up-regulation after peptide pool stimulation, with associated gene expression and TCR information from a total of 59 individuals (49 COVID-19 positive; 10 healthy unexposed unvaccinated controls). The second grouping utilized the largest published bulk TCRbeta datasets currently available, which together comprise 786 healthy pre-pandemic samples and 1,414 COVID-19 patients and include a TCR sub-dataset (MIRA class II dataset) with known specificity for certain SARS-CoV-2 peptide pools. Lastly, our third grouping, which was used to validate our findings, included a published bulk TCR dataset obtained from SARS-CoV-2-unexposed participants sampled before and after immunization with the AstraZeneca ChAdOx1 SARS-CoV-2 vaccine.
In order to find public CD4+ T cell responses to SARS-CoV-2 infection, we first merged two publicly available single-cell datasets of CD4+ SARS-CoV-2-reactive T cells. Both datasets were obtained using the same antigen-reactive T cell enrichment strategy (ARTE assay). Of note, in the study from Bacher et al., the peptide pools span the spike, membrane, and nucleocapsid proteins, while peptide pools used in Meckiff et al. only include peptides from the spike (without N-terminal domain) and membrane proteins. The combined dataset contained 125,258 cells that passed quality control steps, which resulted in 13 functional clusters after unsupervised analysis (Figure 1A).
Cluster phenotypes were defined using cell-population markers used in the original publications. In particular, we found clusters corresponding to T follicular helper (Tfh) cells (clusters 1 and 2), type 1 T helper (Th1) cells (cluster 3), transitional Tfh and T central memory (Tcm) cells (cluster 4), fully differentiated Tcm cells (cluster 5), Th17 phenotypes (clusters 6 and 7), effector memory T (Tem) cells (cluster 8), type I interferon (IFN)-signature T cells (clusters 9 and 10), cytotoxic T cells (clusters 11 and 12), and cycling T cells (cluster 13) (Figures 1A and 1B; Table S1). Using this extended dataset, we also confirmed the finding reported by Meckiff et al. of significant Tfh enrichment (clusters 1 and 2) in COVID-19 patients compared with unexposed healthy controls (Figures 1C and 1D). Interestingly, the abundance of the two main Tfh subsets was significantly different in hospitalized versus non-hospitalized COVID-19 patients (Figure 1D). In particular, cluster 2 was significantly enriched in severe disease (p = 0.004; after Benjamini-Hochberg multiple testing correction) and expressed higher levels of cytotoxic markers, such as CCL3, CCL4, CCL5, XCL1, XCL2, GZMB, and GNLY. In contrast, cluster 1, expressing high levels of IL-2 and CD69 and genes of the TNF family, was more prevalent in patients with mild disease, as previously shown in Meckiff et al., although this did not reach statistical significance in the merged dataset (p = 0.4 after Benjamini-Hochberg multiple testing correction) (Figure 1D; Table S1).

Figure 1

Merged analysis of single-cell CD4+ SARS-CoV-2-reactive T cell public datasets

(A) Uniform Manifold Approximation and Projection (UMAP) of single cells from merged datasets containing SARS-CoV-2-antigen-enriched CD4 T cells. Colors indicate clusters of cells with distinct gene expression profiles.

(B) Differentially expressed genes in each gene expression (GEX) cluster.

(C) Distribution of cells between GEX clusters is plotted for each donor; clusters of healthy donors do not contain Tfh cells (populations 1 and 2).

(D) Boxplots depicting the fraction of cells among functional clusters for each participant (Mann-Whitney U test; Bonferroni multiple comparison correction, *p < 0.05, **p < 0.005, ***p < 0.0005).

In order to select TCRs corresponding to the most public CD4+ T cell responses, we next searched for TCRbeta sequences from the combined TCR alpha/beta single-cell dataset in the TCRbeta repertoires from a large cohort of COVID-19 patients, as well as from pre-pandemic COVID-19-naive controls. We identified 1,248 unique alpha/beta TCRs shared among individuals that are strongly enriched in COVID-19 patients as compared with controls (COVID-enriched TCRs) (p < 0.05; after Benjamini-Hochberg correction; Fisher’s exact test), which are reported in Table S2. We also identified 594 alpha/beta TCRs, which were significantly decreased in COVID-19 patients as compared with the healthy pre-pandemic controls (COVID-depleted TCRs), reported in Table S3 (Figure 2A). Notably, when mapping COVID-enriched TCRs to the single-cell RNA sequencing (RNA-seq) data, these TCRs significantly accumulated in Tfh-containing clusters (clusters 1, 2, and 4), while COVID-depleted TCRs, on the other hand, accumulated in Tem subpopulations (clusters 6–8) (Figure 2B). The overall effect size (log2-fold enrichment) for COVID-depleted TCRs was much smaller than for COVID-enriched TCRs (Figure 2A).
Moreover, COVID-depleted TCRs were present in a large fraction of donors from both the control and the COVID-19 cohorts (Figure 2C); in particular, 465 out of 594 were simultaneously found in >100 controls and in >100 COVID-19 patients. We hypothesize that these COVID-depleted clonotypes are a consequence of COVID-19-associated lymphopenia. In fact, the number of unique T cell clones in a subset of the analyzed COVID-19 patients was lower than in healthy controls. This could lead to a small yet significant underrepresentation of highly public clonotypes in the COVID-19 cohort. We therefore focused on the COVID-enriched TCR clonotypes for further analysis, as the occurrence pattern and phenotype of this group are consistent with expansion of T cell clones specific for SARS-CoV-2 antigens.
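As a minimal sketch of the enrichment test described above, the following Python code computes a one-sided Fisher's exact test for each TCRbeta clonotype (donors carrying it in the COVID-19 cohort versus the control cohort) and then applies the Benjamini-Hochberg correction across clonotypes. The function names, the choice of a one-sided alternative, and the carrier counts are illustrative assumptions and are not taken from the study; only the cohort sizes (1,414 and 786) come from the text.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the 2x2 table [[a, b], [c, d]].

    Rows: COVID-19 cohort, control cohort. Columns: donors with / without the TCR.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Hypergeometric tail: sum over tables at least as extreme as the observed one.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Cohort sizes from the text; carrier counts below are hypothetical.
N_COVID, N_CONTROL = 1414, 786
carriers = {"tcr_A": (60, 5), "tcr_B": (12, 7), "tcr_C": (30, 1)}

names = list(carriers)
pvals = [
    fisher_exact_greater(cov, N_COVID - cov, ctl, N_CONTROL - ctl)
    for cov, ctl in (carriers[name] for name in names)
]
for name, q in zip(names, benjamini_hochberg(pvals)):
    print(name, "enriched" if q < 0.05 else "not significant", f"q={q:.3g}")
```

Detecting COVID-depleted clonotypes would require the opposite tail (or a two-sided test); the source does not state which alternative was used.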

Figure 2

Reverse epitope discovery of SARS-CoV-2-reactive public CD4+ T cell clonotypes

(A) Volcano plot shows enrichment of TCRbeta chains from merged single-cell TCR sequencing (scTCR-seq) datasets in a large (n = 1,414) collection of bulk TCRbeta repertoires from COVID-19 patients (purple) in comparison to the healthy donor cohort from Emerson et al. (n = 786) (x axis) versus p value (y axis). ns, not significant.

(B) Barplot showing the distribution of COVID-enriched (purple) and COVID-depleted (green) TCR clonotypes in GEX clusters. Fisher’s exact test was used for the comparison, with Bonferroni multiple comparison correction. *p < 0.05, **p < 0.005, ***p < 0.0005, ns, not significant.

(C) The boxplots show the fraction of donors from healthy and COVID-19 cohorts sharing significantly COVID-depleted (green) and COVID-enriched (purple) clonotypes.

(D) A similarity network of COVID-associated public TCR clonotypes. Each vertex represents a TCR alpha/beta clonotype, and edges connect vertices with <120 TCRdist units. Colors show predicted specificity to SARS-CoV-2 peptide pools from the MIRA class II dataset. Bottom: TCRdist logos for the most prominent clonotype clusters with predicted peptide specificity and HLA restriction are shown.

(E) Manhattan plot for association of representative clonotypes from cluster 2 with various HLA types.

(F) A tree map showing the fraction of T cells of the merged single-cell dataset carrying clonotypes from the prominent TCR similarity clusters from (D).

(G) Occurrence of TCRbeta from six large clusters from (D) prior to and following SARS-CoV-2 vaccination with the ChAdOx1 (AstraZeneca) vaccine. Significantly more TCRs from spike-specific clusters (clusters 2 and 5) are found after vaccination (one-sided Wilcoxon rank-sum test with Benjamini-Hochberg multiple testing correction).

In order to assess the sequence similarity among TCRs enriched in COVID-19, we used TCRdist, which calculates a TCR sequence distance (similarity) measure optimized for clustering highly similar TCRs, which often share the same specificity. This analysis showed the presence of several prominent TCR clusters (Figure 2D). Interestingly, cluster 3 is largely defined by a conserved beta chain motif, allowing for diverse alpha chains, while in cluster 2, there is an almost invariant alpha chain paired with a set of very diverse TCRbeta chains. In a few other large clusters, both TCR chains show strongly conserved amino acid motifs (Figure 2D). These differences could potentially be explained by the variable number of contacts of TCR alpha/beta chains with the antigenic peptide and MHC. Thus, alpha-driven, beta-driven, and alpha/beta-driven motifs are interesting targets for solving TCR-peptide MHC (pMHC) ternary structures.
As a TCR only binds its cognate epitope presented in the context of a specific HLA molecule, an individual’s HLA background can strongly influence the composition of the TCR repertoire. Thus, people with overlapping HLA alleles will have more similar TCR repertoires than people with different HLA backgrounds. This feature of the TCR repertoire was successfully exploited in an elegant study by DeWitt et al., where the authors were able to identify a set of TCR sequences associated with certain HLA alleles. Using this set of HLA-associated TCRs, we inferred the HLA types of the COVID-19 patients within the bulk TCR dataset, in which conventional HLA typing information was not available. Therefore, to derive associations between TCR clonotypes and HLA, we first determined whether participants with specific COVID-19-enriched TCR clonotypes carried a common HLA allele when compared with TCR-negative patients.
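The similarity-network construction used in this section (each clonotype is a vertex, and an edge joins any pair closer than a distance threshold, with clusters read off as connected components) can be sketched as follows. This is only an illustration under stated assumptions: the `hamming` function is a toy stand-in and is not the TCRdist measure, the sequences are invented single-chain strings rather than paired alpha/beta clonotypes, and the threshold of 2 is arbitrary (the study connects clonotypes at <120 TCRdist units).

```python
from itertools import combinations


def hamming(a, b):
    """Toy stand-in distance (NOT TCRdist): mismatch count for equal-length strings."""
    if len(a) != len(b):
        return max(len(a), len(b))  # treat different lengths as maximally distant
    return sum(x != y for x, y in zip(a, b))


def similarity_clusters(clonotypes, distance, threshold):
    """Link clonotypes with distance < threshold; return connected components, largest first."""
    parent = list(range(len(clonotypes)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(clonotypes)), 2):
        if distance(clonotypes[i], clonotypes[j]) < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, clonotype in enumerate(clonotypes):
        groups.setdefault(find(i), []).append(clonotype)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical CDR3-like strings, invented for illustration only.
toy_clonotypes = [
    "CASSLGQAYEQYF",
    "CASSLGQGYEQYF",
    "CASSLGAAYEQYF",
    "CASSPDRGNTEAFF",
    "CAWSVGGNQPQHF",
]
clusters = similarity_clusters(toy_clonotypes, hamming, threshold=2)
for members in clusters:
    print(len(members), members)
```

Because clusters are connected components, two members of the same cluster need not be within the threshold of each other; they only need a chain of neighbors linking them.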
Using this approach, we were then able to predict a potential HLA association of our COVID-enriched TCR set using Fisher’s exact test to evaluate enrichment within different HLA contexts. See Figure 2E for the representative Manhattan plot and Table S2 for predicted HLA restrictions. To predict the potential antigen reactivity of the TCR clusters, we next cross-referenced the COVID-enriched TCRbeta sequences with the MIRA MHC class II dataset. This dataset contains TCRbeta sequences of CD4+ T cells with known specificity to 1 of 56 peptide pools containing one to six overlapping 19-mer peptides spanning the membrane (M), nucleocapsid (N), and spike (S) SARS-CoV-2 proteins. Based on TCR similarity, 428 of 1,242 (34%) COVID-enriched TCRs were successfully mapped to 22 of 56 peptide pools from the MIRA dataset, with most matches to the M149–191, N46–96, and S743–801 protein pools. Most TCRs belonging to the same TCRdist cluster were assigned to the same peptide pool. Interestingly, the biggest TCRdist cluster (cluster 1) included mostly TCRs mapping to the membrane protein pool M149–191; however, a few TCRs in this cluster were assigned to multiple pools and proteins of the MIRA database, which we hypothesize to be a minor confounding effect of the antigen-reactivity deconvolution approach of the MIRA assay. To further narrow down the specificity prediction to the epitope level, we used NetMHCIIpan4.0 to identify potential binding peptides within SARS-CoV-2 S, M, and N regions covered by the predicted MIRA peptide pools and restricted by HLA alleles determined for each cluster. Potential binders were identified within each region. Large TCR similarity clusters—those containing high numbers of unique clonotypes found in many different individuals—are likely to recognize immunodominant epitopes. 
Indeed, the three TCR similarity clusters with the greatest number of TCR clones also correspond to the largest magnitude of response when accounting for the clone size (Figure 2F). To further validate the set of COVID-enriched CD4+ clonotypes using an independent dataset, we used a large collection of TCRbeta repertoires from healthy unexposed individuals before and after immunization with ChAdOx1, a replication-deficient simian adenovirus-vectored vaccine encoding the SARS-CoV-2 S protein produced by AstraZeneca. For each individual, we calculated the fraction of unique TCRbeta clonotypes identified in our largest antigen-specific TCR clusters out of total clonotypes sampled at either the pre-vaccination (day 0) or post-vaccination (day 28) time point (Figure 2D). As expected, only TCRs reactive to S protein antigens (clusters 2 and 5) were significantly enriched after vaccination, while the frequency of TCR clusters reactive to the M (1 and 6) or N proteins (3 and 4) remained unchanged (Figure 2G). The same significant enrichment was observed when analyzing all TCR clusters predicted to be reactive for the S protein (Figure S1A). Interestingly, at least one of the prominent orphan TCR clusters (defined here as TCR clusters without a match in the MIRA dataset and thus unknown antigen reactivity) also showed significant enrichment after vaccination, suggesting TCR reactivity to the S protein (Figure S1B). This result serves as an independent validation that our reverse epitope discovery approach correctly predicts antigen reactivity, at least at the protein level, and at the same time demonstrates how the TCR clusters we resolved may be used to identify SARS-CoV-2 epitope-specific TCRs. 
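The per-individual quantity used in the vaccination comparison above, the fraction of unique TCRbeta clonotypes that belong to a given cluster at each time point, can be sketched as below. The donor names, clonotype labels, and the use of exact label matching are assumptions made for illustration; the significance test applied in the study (one-sided Wilcoxon rank-sum with Benjamini-Hochberg correction) is deliberately left out of this sketch.

```python
def cluster_fraction(repertoire, cluster_tcrs):
    """Fraction of unique clonotypes in `repertoire` that belong to `cluster_tcrs`."""
    unique = set(repertoire)
    if not unique:
        return 0.0
    return len(unique & set(cluster_tcrs)) / len(unique)


# Hypothetical clonotype labels standing in for TCRbeta sequences.
spike_cluster = {"trb_1", "trb_2", "trb_3"}

# Hypothetical repertoires sampled before (day 0) and after (day 28) vaccination.
samples = {
    "donor_1": {"day0": ["trb_7", "trb_8", "trb_9", "trb_9"],
                "day28": ["trb_1", "trb_7", "trb_8", "trb_2"]},
    "donor_2": {"day0": ["trb_3", "trb_5", "trb_6", "trb_4"],
                "day28": ["trb_3", "trb_1", "trb_5", "trb_6"]},
}

fractions = {
    donor: {day: cluster_fraction(rep, spike_cluster) for day, rep in days.items()}
    for donor, days in samples.items()
}
for donor, by_day in fractions.items():
    print(donor, by_day)
```

Counting unique clonotypes rather than reads keeps the measure independent of clone size, which matches the wording of the text ("fraction of unique TCRbeta clonotypes").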
Excitingly, as we correctly identified epitopes within the context of both vaccination and natural infection, this approach may have the potential to distinguish between these two conditions and, further, to define the relative contribution of external and internal proteins to the overall individual immune response.
Next, we sought to experimentally validate our predicted pMHC epitopes using assays to probe pMHC and TCR engagement and evaluate functional activation. To do this, we transduced TCR-null Jurkat cells with constructs encoding TCRs with representative TCR sequences from five of the six largest TCR clusters (Figure 2D). These individual Jurkat-TCR cell lines, now expressing the encoded TCRs at the cell surface, were co-cultured with peripheral blood mononuclear cells (PBMCs) from healthy donors carrying the predicted HLA restriction necessary to bind and present the peptide epitopes. The co-cultured cells were then pulsed with overlapping 17-mer peptides spanning the predicted MIRA peptide pools and evaluated for TCR activation using the endogenous NFAT-GFP reporter in the transgenic Jurkat cells (Figure 3A). All generated T cell lines reacted to one or two specific peptides (Figure 3B), confirming our specificity prediction and allowing us to precisely determine the epitope location within a MIRA pool. Importantly, for four of five TCRs, the experimentally determined reactive peptide region overlaps with the peptide region identified to be a binder by NetMHC within the context of a specific MIRA pool and HLA allele. Moreover, the predicted antigenic peptide and HLA restriction for TCRdist cluster 2 and cluster 5 also exactly match the experimental results from two independent studies. However, for cluster 6, the predicted immunogenic region did not match the experimental results, likely due to a misassigned HLA restriction.
To test this, we compared NetMHC binding estimates for the reactive peptides across all HLA class II alleles present on the donor PBMCs (Figure 3C). While, for four out of five TCRs, the highest binding affinity towards the experimentally defined cognate peptide was in the context of the predicted HLAs, for cluster 6, the HLA allele predicted to have the highest binding to the identified reactive peptides was DRB3*02:02, rather than the predicted allele, DQB1*03:01/DQA1*05:05. Importantly, DRB3 allele information is absent in many standard HLA-typing datasets, including the reference dataset we used, and thus could not be predicted by our approach.
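The validation experiments above pulse cells with overlapping 17-mer peptides spanning a predicted region. A minimal sketch of how such a tiling could be generated is shown below. The offset between consecutive peptides is not stated in the text, so the `step` of 4 is purely an assumption, as are the function name and the placeholder amino acid string (which is not a SARS-CoV-2 sequence).

```python
def overlapping_peptides(region, length=17, step=4):
    """Return (offset, peptide) tuples tiling `region` with fixed-length windows.

    `step` is the shift between consecutive windows (an assumed value here).
    """
    if len(region) <= length:
        return [(0, region)]
    last = len(region) - length
    offsets = list(range(0, last + 1, step))
    # Add a final window so the C-terminal residues are always covered.
    if offsets[-1] != last:
        offsets.append(last)
    return [(o, region[o:o + length]) for o in offsets]


# Placeholder 40-residue string, invented for illustration only.
toy_region = "ACDEFGHIKLMNPQRSTVWY" * 2
peptides = overlapping_peptides(toy_region)
for offset, peptide in peptides:
    print(offset, peptide)
```

A reactive peptide, together with its non-reactive neighbors, narrows the epitope to the residues that the reactive windows share, which is how a response to one or two adjacent peptides localizes an epitope within a pool.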

Figure 3

Results of TCR specificity validation experiment

(A) Gating strategy. Jurkat activation is tracked by GFP expression under NFAT control.

(B) Peptides triggering the response for each analyzed cell line are shown on the corresponding regions of SARS-CoV-2 M, N, and S proteins. The height of the bars indicates the percentage of antigen-specific response of the NFAT-GFP TCR transgenic Jurkat cell lines in co-culture with PBMCs from healthy donors pulsed with overlapping 17-mer peptides covering the predicted antigenic region. Dashed lines show the background activation level of the corresponding transgenic Jurkat cell line in an unstimulated sample. The previously computationally predicted epitope and HLA are indicated above each plot in blue or red lines for weak and strong predicted HLA binders, respectively.

(C) HLA restriction prediction by NetMHC: the identified immunogenic peptides are computationally tested for HLA binding against the HLA alleles present on the donor PBMCs used in the experiment; colors show peptides overlapping with weak (blue) and strong (red) HLA-binding cores.

Together, our data demonstrate the utility of using T cell repertoire and individual specificity to predict antigen binding down to single epitope resolution. Although the quality of these predictions depends on the limitations of the source datasets, it is clear that this method is a viable alternative to peptide-focused epitope discovery approaches and will only improve as more datasets are published. Future expansion of public resources, including TCR repertoire sequencing data, will further improve TCR specificity prediction accuracy and may also allow the identification of certain TCR repertoire features correlated with protection from SARS-CoV-2 infection and/or severe disease, such as a high frequency of SARS-CoV-2-specific clones or sharing of certain public clonotypes recognizing immunodominant epitopes.

Discussion

Here, we describe an approach to identify public immunodominant CD4+ T cell responses based on TCR amino acid sequence similarity. Combined with other datasets and methods, this approach may help to resolve new epitopes across different immune response contexts. As a proof of concept, we identified 1,248 paired TCR clonotypes potentially specific to highly immunogenic epitopes from SARS-CoV-2. Many of these were so-called orphan TCRs without defined epitopes. However, using an approach we term reverse epitope discovery, we successfully identified and inferred antigen specificity for 428 TCRs with the aid of the MIRA dataset. We also inferred possible HLA restrictions for most of these TCRs (88%). TCR-HLA pairings were further validated based on NetMHC binding predictions and functional experiments in this study and by others. The resulting set of highly characterized public TCRs reactive to SARS-CoV-2 covers more than 76% of individuals from Snyder et al., indicating that at least 20 unique COVID-enriched TCR sequences per individual were found using this approach. Interestingly, two out of six major responses were restricted by HLA alleles with low variability among the human population (DPB1*04:01/02 and DRB3*02:02), thus providing a means to further investigate immunodominant responses in genetically diverse individuals. Furthermore, the high publicity of the characterized clonotypes makes them promising candidates for further studies on CD4+ T cell immune responses mounted against SARS-CoV-2 as well as for immunotherapeutic applications aimed at utilizing highly specific, immunodominant T cell responses in the context of precision and personalized medicine [45, 46, 47].

Limitations of the study

The described method is strongly focused on public T cell responses, which provide the necessary power for a robust statistical analysis and, further, hold the highest potential for additional population-wide applications. However, in many cases, immunodominant responses within a particular individual may be driven by private clonotypes or clonotypes without an identifiable motif cluster, which would be missed by the current approach or would not appear among the most interesting hits. Applying different sequence similarity measures and integrating more datasets will likely expand the number of identified T cell responses with this approach. The major limitation of the TCR-repertoire-based HLA typing algorithm utilized here is the co-inheritance of certain HLA alleles. Since the algorithm is based on the co-occurrence of a TCR sequence with a particular HLA allele, linked HLA alleles from a haplotype can be difficult to distinguish. However, even narrowing down the possible HLA restriction to several HLA alleles can significantly facilitate TCR specificity determination. HLA typing based on TCR repertoire data is also challenging for rare, non-classical, or low-diversity alleles (e.g., DRB3/4/5), as no known TCRs have been associated with these loci to date. More datasets, including deeply sequenced TCR repertoires from HLA-typed individuals, can address this limitation in the future. While the detection of TCR clusters relies on bulk and/or single-cell TCR sequencing, peptide target prediction requires more sophisticated datasets containing TCR antigen-specificity information, such as VDJdb and the MIRA dataset. The incompletely overlapping peptide pools utilized for our reference single-cell datasets can limit the reverse epitope discovery approach. For example, cluster 2 from Figure 2 is formed mostly by clonotypes from Bacher et al. and is absent in Meckiff et al.
Indeed, the peptide pools used in the latter did not cover the N-terminal domain of the S protein and thus did not include the target peptide of the TCRs in cluster 2. Although this is a limitation of these datasets, it also indirectly confirms the predicted antigen specificity of cluster 2. The expansion of these resources is of the utmost importance for target identification of orphan TCRs. The strong general interest in the COVID-19 pandemic led to the production of an unprecedented amount of publicly available TCR repertoire data, which still far exceeds the data available for other antigens or diseases. Despite these limitations, the reverse epitope discovery approach described here has already proven valuable for identifying both cross-reactive and immunodominant responses to SARS-CoV-2, and it holds potential for application in other disease contexts.

STAR★Methods

Key resources table

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Paul G. Thomas (paul.thomas@stjude.org).

Materials availability

All transgenic T cell lines generated in this study are available from the lead contact with a completed Materials Transfer Agreement. No other unique reagents were generated in this study.

Experimental model and subject details

Cell lines

TCR-null Jurkat 76.7 cell lines with endogenous NFAT-GFP reporter were generously provided by Wouter Scheper and were maintained in RPMI (Gibco) containing 10% FBS and 1% penicillin/streptomycin.

Utilized public data

Single-cell data of SARS-CoV-2-reactive CD154+ T cells were obtained from Bacher et al. and Meckiff et al. Bulk TCR data of healthy individuals and COVID-19 patients were obtained from Emerson et al. and Snyder et al., respectively.

Method details

Transgenic T cell lines generation

For each of the six most prominent TCR similarity clusters from Figure 2, we selected a representative TCR sequence (the sequence with the largest number of neighbors) for cloning into a TCR-null Jurkat 76.7 cell line (generously provided by Wouter Scheper), which expresses an NFAT-GFP reporter. TCRalpha and TCRbeta sequences were altered to use murine constant regions (murine TRAC∗01 and murine TRBC2∗01) to facilitate surface expression of the TCR. Six gene fragments were synthesized by Genscript to encode the TCRalpha chain, the TCRbeta chain, and mCherry fluorescent protein, linked together by 2A sites and inserted into the pLVX-EF1α-IRES-Puro lentiviral backbone (Clontech). To generate transducing particles, HEK 293T packaging cells (ATCC CRL-3216) were transfected with an individual TCR-encoding lentiviral vector, psPAX2 packaging plasmid (Addgene plasmid #12260), and pMD2.G envelope plasmid (Addgene plasmid #12259). We collected transducing particle-containing media 24 and 48 h post-transfection and concentrated the lentivirus with Lenti-X Concentrator (Clontech) according to the manufacturer’s protocol. Jurkat 76.7 cells were transduced and then selected for 1 week using 1 μg/mL puromycin in RPMI (Gibco) containing 10% FBS and 1% penicillin/streptomycin. Five out of six Jurkat cell lines were successfully transduced, as confirmed by expression of mCherry.

Peptide stimulation assay

Jurkat cell lines expressing selected TCRs (10⁵) were co-cultured with PBMCs from pre-pandemic healthy donors (2 × 10⁵) in cRPMI, pulsed with 1 μM of peptide, and co-stimulated with 1 μg/mL each of anti-human CD28 and CD49d (BD Biosciences). Negative unstimulated (no peptide; CD28, CD49d) and positive (CD28, CD49d, 1X Cell Stimulation Cocktail, PMA/ionomycin; eBioscience) controls were included in each assay. Cells were incubated for 18 h (37°C, 5% CO₂). After incubation, cells were washed twice with FACS buffer (PBS, 2% FBS, 1 mM EDTA) and analyzed by flow cytometry on a custom-configured BD Fortessa using FACSDiva software (Becton Dickinson). Flow cytometry data were analyzed using FlowJo software (BD Biosciences). Responsiveness to peptide stimulation was determined by measuring the frequency of cells positive for endogenous NFAT-GFP expression (Table S4).

Quantification and statistical analysis

Single-cell datasets integration and filtering

Preprocessing of scRNAseq data was performed with 10x Genomics’ Cell Ranger software v3.1.0 using the human genome reference GRCh38 v3.0.0 for the mapping. The resulting raw feature-barcode matrix files were analyzed with the Seurat v.3.2.0 R package (Butler et al., 2018). All genes with detected expression in fewer than 0.1% of the cells were excluded. TCR genes were not considered for further analyses to avoid functional clustering of cells based on TCR information. For cell quality control, only cells harboring between 400 and 3000 RNA features and less than 5% mitochondrial RNA were selected for further processing. TCR information was integrated into the Seurat object metadata after filtering cells containing more than 2 TCRalpha or 2 TCRbeta chains. After merging of Seurat metadata and TCR information, cells without TCR information were excluded from further analysis. Afterwards, data were log-normalized and scaled based on all genes. After performing a PCA dimensionality reduction (40 dimensions) with the RunPCA function, expression values were corrected for batch effects caused by different sources of the data, sample preparation batches, and sequencing run batches using the R package Harmony v1.0. In the final steps, the Uniform Manifold Approximation and Projection (UMAP) dimensional reduction was performed with the RunUMAP function using 40 dimensions, a shared nearest neighbor graph was created with the FindNeighbors method, and cluster identification was performed with a resolution of 0.4 using the FindClusters function. Thirteen clusters were identified. Cluster marker genes were determined using FindMarkers with the MAST method and are available in Table S1.
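The cell-level filters described above can be summarized as a single predicate. The following is a minimal Python sketch for illustration only; the original analysis was performed in R with Seurat. The field names (`n_features`, `pct_mito`, `n_alpha`, `n_beta`) and the example cells are hypothetical, the feature bounds are assumed to be inclusive, and "more than 2 TCRalpha or 2 TCRbeta chains" is read here as more than two chains of either type.

```python
def passes_qc(cell):
    """Return True if a cell passes the QC thresholds described in the text.

    `cell` is a dict with hypothetical keys:
      n_features - number of detected RNA features
      pct_mito   - percentage of mitochondrial RNA
      n_alpha    - number of TCRalpha chains detected
      n_beta     - number of TCRbeta chains detected
    """
    has_tcr = cell["n_alpha"] + cell["n_beta"] > 0  # cells without TCR are dropped
    return (
        400 <= cell["n_features"] <= 3000   # assumed inclusive bounds
        and cell["pct_mito"] < 5.0
        and cell["n_alpha"] <= 2            # assumption: ">2 of either chain" is removed
        and cell["n_beta"] <= 2
        and has_tcr
    )


# Hypothetical example cells, one per failure mode.
cells = [
    {"id": "cell_a", "n_features": 1500, "pct_mito": 2.1, "n_alpha": 1, "n_beta": 1},
    {"id": "cell_b", "n_features": 350, "pct_mito": 1.0, "n_alpha": 1, "n_beta": 1},
    {"id": "cell_c", "n_features": 2000, "pct_mito": 7.5, "n_alpha": 1, "n_beta": 1},
    {"id": "cell_d", "n_features": 1800, "pct_mito": 3.0, "n_alpha": 3, "n_beta": 1},
    {"id": "cell_e", "n_features": 1200, "pct_mito": 2.0, "n_alpha": 0, "n_beta": 0},
]

kept = [c["id"] for c in cells if passes_qc(c)]
print(kept)
```

Only the first example cell satisfies every condition; each of the others fails exactly one filter.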

COVID-19 TCR association using bulk TCR public datasets

To identify public TCRbeta clonotypes, we used two large datasets, one of TCRbeta repertoires from COVID-19 patients (n = 1414) and one of healthy subjects sampled pre-pandemic (n = 786). For each TCRbeta from the combined single-cell TCRseq dataset, we calculated the number of unique donors from both bulk TCRbeta repertoire cohorts sharing the clonotype. TCRbeta clonotypes were considered shared if both CDR3 amino acid sequences and V segment families matched. Next, we used a two-sided Fisher’s exact test with Benjamini-Hochberg multiple-comparisons correction to identify TCRbeta clonotypes that were overrepresented (i.e., found in more donors) in either COVID-19 or healthy donors (an adjusted p value < 0.05 was used as the significance threshold).
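As an illustration of the test described above, the sketch below implements a two-sided Fisher's exact test and the Benjamini-Hochberg adjustment using only the Python standard library. It is not the authors' code (their analysis was done in R). The cohort sizes come from the text, while the clonotypes and donor counts are invented for the example.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum over all tables that are at most as probable as the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


N_COVID, N_HEALTHY = 1414, 786  # cohort sizes from the text

# Hypothetical clonotypes: (V family, CDR3) -> (COVID-19 donors, healthy donors)
donor_counts = {
    ("TRBV7", "CASSLAGGYEQYF"): (60, 5),
    ("TRBV20", "CSARDRGNTEAFF"): (18, 10),
    ("TRBV5", "CASSLGQGNTEAFF"): (3, 40),
}

pvals = [
    fisher_two_sided(k_cov, N_COVID - k_cov, k_hd, N_HEALTHY - k_hd)
    for k_cov, k_hd in donor_counts.values()
]
adjusted = dict(zip(donor_counts, benjamini_hochberg(pvals)))
significant = [clone for clone, p in adjusted.items() if p < 0.05]
print(significant)
```

Because the test is two-sided, the direction of each significant association (COVID-enriched or COVID-depleted) has to be read from the donor fractions in the two cohorts.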

Identification of motifs in TCR amino acid sequences using TCRdist

We used the TCRdist implementation in the CoNGA python package to calculate pairwise TCRdist between unique alpha/betaTCR sequences and to plot sequence logos for TCR motifs. We define TCR motifs as a cluster on the TCR similarity network, where each node is a unique alpha/betaTCR clonotype, and edges connect nodes if the distance between them is less than 120 TCRdist units. To filter TCR chimeras and other artifacts occurring during 10x Genomics sequencing, which lead to rare spurious connections between TCR motif clusters, we deleted the top 1% of nodes and vertices by network betweenness centrality values. The igraph R package was used to manipulate similarity networks, and gephi was used for network layout and visualization.
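The construction of the similarity network can be sketched as follows, assuming pairwise distances have already been computed elsewhere (for example, with a TCRdist implementation). This is a simplified stand-in in plain Python rather than the igraph workflow used in the study: clusters are taken to be connected components, which is an assumption, the betweenness-based filtering step is omitted, and the clonotype names and distances are hypothetical.

```python
from collections import deque

THRESHOLD = 120  # TCRdist units, as stated in the text


def build_graph(distances, threshold=THRESHOLD):
    """Build an adjacency map from a dict {(clonotype_a, clonotype_b): distance}."""
    graph = {}
    for (a, b), dist in distances.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if dist < threshold:  # edge only if strictly below the threshold
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return connected components (largest first) found by breadth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)


# Hypothetical precomputed distances between five clonotypes.
distances = {
    ("tcr1", "tcr2"): 36,
    ("tcr2", "tcr3"): 96,
    ("tcr1", "tcr3"): 132,
    ("tcr3", "tcr4"): 240,
    ("tcr4", "tcr5"): 60,
}

clusters = connected_components(build_graph(distances))
print(clusters)
```

Note that `tcr1` and `tcr3` end up in the same component even though their direct distance exceeds the threshold, because both are close to `tcr2`. Chaining of this kind is one reason a spurious bridging node can merge otherwise distinct motif clusters.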

HLA specificity imputation from TCR data

For each donor from refs. 3 and 4, we used HLA types inferred as previously described. In brief, for each TCRbeta significantly enriched in the COVID-19 cohort, we performed a one-sided Fisher’s exact test with Benjamini-Hochberg multiple-comparisons correction to check whether a given TCRbeta co-occurs with a certain HLA allele. To determine an HLA restriction within a TCR similarity cluster, we considered the HLA with the most significant association for each individual TCR in the cluster. If the HLA allele was the best prediction for 25% or more TCRs of the cluster, it was considered associated with that cluster.
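The 25% rule for assigning HLA alleles to a cluster can be written as a short function. The sketch below is illustrative only. It assumes the per-TCR best HLA association has already been determined, that TCRs without any significant association are recorded as `None`, and that they still count toward the cluster size. The allele labels in the example are placeholders, not results from the study.

```python
from collections import Counter


def cluster_hla_associations(best_hla_per_tcr, min_fraction=0.25):
    """Return {allele: fraction} for alleles that are the best hit for at least
    `min_fraction` of the TCRs in a cluster.

    `best_hla_per_tcr` holds one entry per TCR in the cluster: the allele with
    the most significant association, or None if there is none (assumption).
    """
    n_tcrs = len(best_hla_per_tcr)
    counts = Counter(hla for hla in best_hla_per_tcr if hla is not None)
    return {
        hla: count / n_tcrs
        for hla, count in counts.items()
        if count / n_tcrs >= min_fraction
    }


# Hypothetical cluster of eight TCRs.
best_hits = [
    "DRB1*15:01", "DRB1*15:01", "DRB1*15:01", "DRB1*15:01",
    "DQB1*06:02", "DQB1*06:02",
    "DRB1*07:01",
    None,
]

associations = cluster_hla_associations(best_hits)
print(associations)
```

A cluster can pass the threshold for more than one allele, which is the expected outcome when linked alleles from one haplotype cannot be separated by co-occurrence alone.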

Prediction of COVID-enriched TCR specificity

We mapped the TCRbeta chain sequences from the merged aggregated single-cell dataset to peptide pool-specific TCRbeta clonotypes of the MIRA class II dataset (release 002.1), allowing for one amino acid mismatch between CDR3 amino acid sequences. Next, we selected six large clusters on the TCRdist similarity network with distinct MIRA peptide pool assignments. A cluster was assigned a certain MIRA pool if at least 20% of TCR sequences within the cluster mapped to that MIRA pool; it was considered “orphan” otherwise. We calculated the consensus MIRA pool and HLA restriction within each TCR cluster. We then used NetMHCIIpan-4.0 to predict the epitope location within the MIRA pool for the specific HLA restriction. We ran NetMHCIIpan-4.0 on the complete SARS-CoV-2 Wuhan-1 S, M, and N protein sequences (Genbank acc: MT019529.1) for peptide lengths 9–15 and for the predicted HLAs (see Figure 2D); all other parameters were set to default. We then checked for the presence of strongly (Rank_EL<1%) and weakly (Rank_EL<5%) HLA-binding peptide cores within the predicted MIRA region. To confirm the HLA restriction of responding peptides in the validation experiments, we used the same method with slight modifications: HLA-binding peptide cores were first predicted for each of the antigen-presenting cell donors’ HLA alleles, after which we checked for overlap of these binding cores within peptides eliciting a response in the cloned TCR-expressing Jurkat cell lines (see Figure 3C).
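The pool assignment logic described above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation. "One amino acid mismatch" is interpreted as a single substitution between CDR3 sequences of equal length, V gene matching is ignored, and the pool names and CDR3 sequences are invented for the example.

```python
from collections import Counter


def within_one_mismatch(cdr3_a, cdr3_b):
    """True if two CDR3s have equal length and differ at no more than one position."""
    if len(cdr3_a) != len(cdr3_b):
        return False
    return sum(x != y for x, y in zip(cdr3_a, cdr3_b)) <= 1


def assign_pool(cluster_cdr3s, mira_pools, min_fraction=0.20):
    """Assign a cluster to the peptide pool matched by the most TCRs, or 'orphan'.

    `mira_pools` maps a pool name to a set of reference CDR3 sequences.
    A pool is assigned only if at least `min_fraction` of the cluster maps to it.
    """
    hits = Counter()
    for cdr3 in cluster_cdr3s:
        for pool, references in mira_pools.items():
            if any(within_one_mismatch(cdr3, ref) for ref in references):
                hits[pool] += 1
    if not hits:
        return "orphan"
    pool, count = hits.most_common(1)[0]
    return pool if count / len(cluster_cdr3s) >= min_fraction else "orphan"


# Hypothetical reference pools and clusters.
mira_pools = {
    "pool_A": {"CASSLGGNTEAFF"},
    "pool_B": {"CSARDLAGNTGELFF"},
}
cluster_1 = ["CASSLGGNTEAFF", "CASSLGGNTEVFF", "CASSPGGNTEAFF",
             "CASSQDRGYEQYF", "CASRTGTSYEQYF"]
cluster_2 = ["CASSQDRGYEQYF", "CASRTGTSYEQYF"]

print(assign_pool(cluster_1, mira_pools))
print(assign_pool(cluster_2, mira_pools))
```

In this toy example, three of the five sequences in the first cluster map to `pool_A`, so it is assigned that pool, while the second cluster has no matches and remains an orphan.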

Statistical analysis

Statistical analyses were performed in R version 4.0.2. The Wilcoxon rank-sum test (Mann-Whitney U test) was used to compare the proportion of cells in each Seurat functional cluster between healthy controls and COVID-19 patients as well as between severe and mild COVID-19 cases. Fisher’s exact test was used to compare the number of COVID-depleted and COVID-enriched clonotypes in each functional Seurat cluster. Multiple testing correction was performed using the Benjamini-Hochberg procedure. Not significant (ns); ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001.

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Antibodies

Co-stimulatory anti-human CD28 antibody (clone CD28.2)	BD Biosciences	cat#: 555725 RRID: AB_396068
Co-stimulatory anti-human CD49d antibody (clone 9F10)	BD Biosciences	cat#: 555501 RRID: AB_2130052

Chemicals, peptides, and recombinant proteins

Lenti-X Concentrator	Clontech	cat#: 631232
1x Cell Stimulation cocktail	eBioscience	cat#: 00-4970-93
SARS-CoV-2 peptides (>95% purity)	This paper	Genbank acc: MT019529.1

Deposited data

Single-cell RNA-seq of SARS-CoV-2-reactive CD4⁺ T cells	Bacher et al.¹	SRA: SRP293741
Single-cell RNA-seq of SARS-CoV-2-reactive CD4⁺ T cells	Meckiff et al.²	SRA: SRP267404
Bulk TCR repertoire from SARS-CoV-2-infected individuals	Snyder et al.³	https://clients.adaptivebiotech.com/pub/covid-2020
Bulk TCR repertoire from healthy individuals	Emerson et al.⁴	https://clients.adaptivebiotech.com/pub/emerson-2017-natgen
Bulk TCR repertoire from SARS-CoV-2 unexposed participants sampled before and after immunization with the AstraZeneca ChAdOx1 SARS-CoV-2 vaccine	Swanson et al.⁵	https://clients.adaptivebiotech.com/pub/emerson-2017-natgen
MIRA class II bulk TCR dataset with known specificity for certain SARS-CoV-2 peptide pools (release 002.1)	Nolan et al.²²	https://clients.adaptivebiotech.com/pub/covid-2020
Original code for data processing	This paper	https://github.com/pogorely/reverse_epitope_discovery

Experimental models: Cell lines

293T	ATCC	cat#:CRL-3216
Jurkat 76.7 (variant of TCR-null Jurkat 76.7 cells that expresses human CD8 and an NFAT-GFP reporter)	gift from Wouter Scheper

Recombinant DNA

pLVX-EF1α-IRES-Puro	Clontech	cat#: 631253
TCR_cluster1-mCherry	This paper
TCR_cluster2-mCherry	This paper
TCR_cluster3-mCherry	This paper
TCR_cluster5-mCherry	This paper
TCR_cluster6-mCherry	This paper
psPAX2 packaging plasmid	gift from Didier Trono	Addgene plasmid #12260 RRID: Addgene_12260
pMD2.G envelope plasmid	gift from Didier Trono	Addgene plasmid #12259 RRID: Addgene_12259

Software and algorithms

FlowJo v10.7.1	BD Biosciences	https://www.flowjo.com/solutions/flowjo/downloads
Cell Ranger v3.1.0	10x Genomics	https://www.10xgenomics.com
Seurat v.3.2.0	Butler et al.⁵⁰	https://www.satijalab.org/seurat
Harmony v1.0	Korsunsky et al.⁵¹	https://portals.broadinstitute.org/harmony/articles/quickstart.html
R v. 4.0.2		https://www.r-project.org
Biorender		https://biorender.com
MiGEC v. 1.2.7	Shugay et al.⁵²	https://github.com/mikessh/migec
MiXCR v. 3.0.3	Bolotin et al.⁵³	https://github.com/milaboratory/mixcr
CoNGA python package	Schattgen et al.⁵⁴	https://github.com/phbradley/conga
data.table R package v. 1.14.0		https://github.com/Rdatatable/data.table/wiki
stringdist R package v. 0.9.6.3		https://github.com/markvanderloo/stringdist
igraph R package v. 1.2.6	Csardi and Nepusz⁵⁵	https://igraph.org/r/
gephi v. 0.9.2	Jacomy et al.⁵⁶	https://gephi.org
ggplot2 R package v. 3.3.3		https://cran.r-project.org/web/packages/ggplot2/index.html
