Chloe H Lee, Mariolina Salio, Giorgio Napolitani, Graham Ogg, Alison Simmons, Hashem Koohy.
Abstract
Adaptive immune recognition is mediated by specific interactions between heterodimeric T cell receptors (TCRs) and their cognate peptide-MHC (pMHC) ligands, and methods that accurately predict TCR:pMHC interactions would have profound clinical, therapeutic and pharmaceutical applications. Herein, we review recent developments in predicting cross-reactivity and antigen specificity of TCR recognition. We discuss current experimental and computational approaches to investigate cross-reactivity and antigen specificity of TCRs and highlight how integrating kinetic, biophysical and structural features may offer valuable insights in modeling immunogenicity. We further underscore the close inter-relationship of these two notions and the need to investigate each in the light of the other for a better understanding of T cell responsiveness and for effective clinical applications.
Keywords: T cell cross reactivity; T cell specificity; adaptive immune system; antigen presentation; antigen specificity; epitope
Year: 2020 PMID: 33193332 PMCID: PMC7642207 DOI: 10.3389/fimmu.2020.565096
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
List of algorithms to predict immunogenicity.
| Study | Training data | Method | Features associated with immunogenicity |
|---|---|---|---|
| Tung et al. ( | Trained on 9-mer HLA-A2 restricted peptides from MHCPEP, SYFPEITHI and IEDB, consisting of 558 immunogenic and 527 non-immunogenic peptides | Decision tree learning methods to identify informative physicochemical properties from 531 physicochemical properties retrieved from version 9.0 of the amino acid index (AAindex) database. Support vector machine with weighted string kernels for immunogenicity prediction (named POPISK) | Top AAindex contributors: (i) Retention coefficient in HPLC, pH2.1, (ii) Principal property value z2, (iii) Hydrophobicity scale from native proteins, (iv) Normalized composition of membrane proteins, and (v) pK-C. Found positions 4, 6, 8, and 9 critical for 9-mer peptides |
| Calis et al. ( | Trained on 9-mers from MHC-I associated peptides. From IEDB and three immunogenicity studies in mice ( | For each non-anchor residue of the presented peptide, a log enrichment score is calculated as the log of the ratio between the fraction of a specific amino acid in immunogenic vs. non-immunogenic data; the score is then weighted by the importance of that position, measured as Kullback-Leibler divergence. The weighted log enrichment scores of all (non-anchor) residues are summed to give the immunogenicity score | Preference for residues with larger or aromatic side chains. Positions 4–6 critical for 9-mer peptides |
| Trolle and Nielsen ( | Trained on 9-mer peptides covering 9 HLA alleles. From 295 T cell epitopes from SYFPEITHI and 1,216 T cell epitopes from IEDB, allele-balanced training data created by randomly selecting 50 epitopes from each of 9 HLA alleles, except 2 alleles having 14 epitopes each; total 378 epitopes | Weighted sum of pMHC binding affinity [NetMHCcons ( | Performance gain obtained by summing pMHC binding affinity, pMHC stability predictions and T cell propensity, compared with individual predictions |
| Chowell et al. ( | Trained on 9-mer H-2Db and HLA-A2 restricted peptides (separately for two ANN-Hydro models). From IEDB, 204 immunogenic and 232 non-immunogenic (self-peptides from MHC ligand elution experiment with no known immunogenicity) for H-2Db, and 372 immunogenic and 201 non-immunogenic peptides for HLA-A2 | Hydrophobicity-based artificial neural network (ANN-Hydro) based on numeric sequence of amino acid hydrophobicity | Strong bias toward hydrophobic amino acids at TCR contact residues (P4, P6, P7, and P8 for 9-mers) within immunogenic epitopes. Negative correlation between polarity of amino acids and immunogenicity |
| Łuksza et al. ( | Trained on 2,552 MHC-I immunogenic peptides from IEDB. Neoantigens with mutations generated from non-hydrophobic, wild-type residues at positions 2 and 9 excluded (as prediction of MHC affinities for wild-type peptides with non-hydrophobic anchor residues led to non-informative amplitudes) | Recognition potential of a neoantigen = A × R, where amplitude (A) is relative probability that a neoantigen is presented on MHC-I whereas its wild-type counterpart is not, and R is probability that neoantigen will be recognized by TCR repertoire. R defined by a multistate thermodynamic model, treating sequence similarity as proxy for binding energies | High sequence similarity of a given neoantigen with epitopes in IEDB by gapless alignment with BLOSUM62 amino acid similarity matrix |
| Bjerregaard et al. ( | From 13 publications, analyzed total 1,948 peptide-HLA complexes, of which 53 reported immunogenic | HLA binding prediction by NetMHCpan-4.0. Similarity between each neo- and normal peptide using kernel similarity measure proposed by Shen et al. ( | High predicted binding score (HLA binding strength). Peptide sequence dissimilarity to self (wild-type counterpart of the neopeptide), especially for those with comparable HLA binding |
| Pogorelyy et al. ( | Trained on 9-mer peptides. From ( | Principal component analysis and dimensionality reduction on 10-dimensional vectors of Kidera factor sums for each epitope. Fit multinomial Gaussian model using expectation maximization to estimate probability of being immunogenic | Distinct physicochemical properties in Kidera space |
| Jurtz et al. ( | Trained on 8,920 TCRβ CDR3 sequences and 91 HLA-A2 cognate peptides obtained from IEDB. 379 TCR and 16 peptides from the MIRA assay in ( | Convolutional neural networks (CNN) to predict whether a given TCR is able to recognize a specific peptide, with amino acid sequences of peptide and CDR3 region of TCRβ chain as input. CNNs scan the input and detect patterns to be integrated into the network (named NetTCR) | Conserved sequence patterns of peptide-TCR pairs encoded by BLOSUM50 matrix |
| Smith et al. ( | Trained on 141 8–11-mer epitopes from MHC-I H2b and H2d haplotypes | Using amino acid features (tiny, small, aliphatic, aromatic, non-polar, polar, charged, basic and acidic), variables derived from presence/absence of each feature at each absolute and relative position, at the site of the SNV mutation, at beginning/middle/end residues, and from the difference of each feature in mutated vs. reference antigen. Most predictive features fed into a gradient boosting algorithm trained by 10,000-fold cross-validation | Peptide biochemical features: valine at position 1, valine at last position, small amino acids at the last position, basic amino acids of the reference at the mutated position, changes in the mutated position to a small amino acid, lysine at relative site 1, and presence of valine within the first 3 positions |
| Ogishi and Yotsuyanagi ( | Trained on 8–11 mer MHC-I and 11–30 mer MHC-II peptides. From IEDB, LANL HIV and HCV database and TANTIGEN database, 6,957 HLA-I and 16,642 HLA-II immunogenic peptides. 191,326 TCR CDR3β sequences obtained from MiXCR | TCR-peptide contact potential profiling (CPP) by optimal alignment between CDR3β (randomly down-sampled to 10,000 sequences) and peptides and using pairwise contact potential scales from AAindex. Peptide sequence-based estimates of physicochemical properties (= peptide descriptors) using: | Physicochemical and CPP features: features from short (3- and 4-aa) and longest (8- and 11-aa for MHC-I and MHC-II, respectively) fragments, skewness- and kurtosis-derived features and AAindexes, including inverse of modified Miyazawa-Jernigan transfer energy, inverse of quasichemical energy in an average protein environment from interfacial regions of protein-protein complexes, and distance-dependent statistical potential within 10–12 Å |
| Riley et al. ( | Trained on 9-mer HLA-A2 restricted peptides. 155 immunogenic from IEDB, 2,756 HeLa HLA-A2 binding self-peptides and 1,044 HLA-A2 non-binders | A feed-forward neural network with inputs describing structural and structure-based energetic features of 9-aa in peptide sequence and peptide-HLA complex. Structural and energy features are those comprising Talaris 2014 energy function or derived from Table S3 ( | Structural and energetic features: van der Waals interaction, hydrophobic solvation, Coulombic potentials, hydrogen bond energies, side chain rotamer energies, and solvent accessible surface areas (SASA) |
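The position-weighted log enrichment score described in the Calis et al. row can be illustrated with a minimal sketch. This is not the published implementation: the toy peptides, the pseudocount, the choice of positions 2 and 9 as anchors, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_fractions(peptides, positions, pseudocount=1.0):
    """Amino acid fractions pooled over `positions`, smoothed by a pseudocount (assumed)."""
    counts = Counter(p[i] for p in peptides for i in positions)
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}

def log_enrichment(immunogenic, non_immunogenic, positions):
    """Log ratio of each amino acid's fraction in immunogenic vs. non-immunogenic data."""
    pos = aa_fractions(immunogenic, positions)
    neg = aa_fractions(non_immunogenic, positions)
    return {a: math.log(pos[a] / neg[a]) for a in AMINO_ACIDS}

def position_weights(immunogenic, non_immunogenic, positions):
    """Per-position importance as the Kullback-Leibler divergence between the two classes."""
    weights = {}
    for i in positions:
        p = aa_fractions(immunogenic, [i])
        q = aa_fractions(non_immunogenic, [i])
        weights[i] = sum(p[a] * math.log(p[a] / q[a]) for a in AMINO_ACIDS)
    return weights

def immunogenicity_score(peptide, enrichment, weights):
    """Sum of position-weighted log enrichment scores over non-anchor positions."""
    return sum(w * enrichment[peptide[i]] for i, w in weights.items())

# Hypothetical 9-mers; anchors assumed at positions 2 and 9 (0-based indices 1 and 8).
NON_ANCHOR = [0, 2, 3, 4, 5, 6, 7]
immunogenic = ["SLWFNGMEV", "GLFWDYTKV", "ALWGFFPVL"]
non_immunogenic = ["SLSGDGTEV", "GLSKDNTKV", "ALSGSGPVL"]

enrichment = log_enrichment(immunogenic, non_immunogenic, NON_ANCHOR)
weights = position_weights(immunogenic, non_immunogenic, NON_ANCHOR)
score_pos = immunogenicity_score("SLWFNGMEV", enrichment, weights)
score_neg = immunogenicity_score("SLSGDGTEV", enrichment, weights)
print(f"immunogenic example: {score_pos:.3f}, non-immunogenic example: {score_neg:.3f}")
```

In this toy set, positions where both classes carry the same residues receive a weight of zero, so they do not contribute to the score; the aromatic-rich peptide scores higher than its serine- and glycine-rich counterpart. The peptides are scored on the data they were derived from, so the numbers only show the mechanics.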
Figure 1. Features associated with TCR:pMHC interaction. Description of sequence-based, structural, kinetic, and biophysical features previously found to be associated with pMHC recognition by TCR. The diagram shows the 1G4 TCR bound to NY-ESO-1/HLA-A*02:01 (PDB 2BNR), where TCRα, TCRβ, MHC, β2-microglobulin and peptide are colored in orange, red, blue, light blue, and yellow, respectively.
Glossary.
| Term | Definition |
|---|---|
| Accessible surface area | Also known as solvent-accessible surface area (SASA); the surface area of a biomolecule that is accessible to a solvent. Measurement is usually described in units of square Ångstroms |
| Adoptive T cell transfer | A type of immunotherapy in which T cells are given to a patient to improve immune functionality to fight diseases |
| Amino acid index database (AAindex) | A database of amino acid indices and amino acid mutation matrices. An amino acid index is a set of 20 numerical values representing various physicochemical and biochemical properties of amino acids. An amino acid mutation matrix is generally 20 × 20 numerical values representing similarity of amino acids |
| Clonal expansion | A process in which a small number of precursor cells recognize a specific antigen, proliferate into expanded clones, differentiate and acquire various effector and memory phenotypes |
| Combinatorial peptide library | A library typically comprised of millions to billions of random peptides covering possible combinations of amino acids in each position |
| Degeneracy | Ability to recognize diverse ligands |
| Electrostatic potential | The amount of work needed to move a unit of charge against an electric field |
| Featured peptide | A peptide with solvent-exposed, prominent side chains or harmonious bulged conformations, typically corresponding to a diverse repertoire of TCRs |
| Find Individual Motif Occurrence | A motif-based sequence analysis tool that scans a set of sequences for individual matches to each of the motifs provided by the users |
| Flexible docking | A macromolecular docking where the internal geometry of the interacting partners can be changed when a complex is formed |
| Heterologous immunity | An immunity that can develop to one pathogen after a host has had exposure to non-identical pathogens |
| Immunodominant peptide | A peptide having a strong affinity for binding with HLA and for stimulating a T cell response |
| Kidera factor | A set of orthogonal physicochemical properties that reflect 20 amino acids, which include helix/bend preference, side-chain size, extended structure preference, hydrophobicity, double-bend preference, partial specific volume, flat extended preference, occurrence in alpha region, pK-C and surrounding hydrophobicity |
| Molecular mimicry | A phenomenon in which sequence similarities between foreign and self-peptides are sufficient to trigger cross-activation of autoreactive T cells by pathogen-derived peptides |
| Peptide-MHC display system | A platform with engineered functional peptide-MHC complexes for high-throughput screening of immunogenic peptides against TCRs |
| Polarization | A process by which cells adopt different functionality in response to signals from their microenvironment |
| Position-specific scoring matrix | An amino acid scoring matrix in a 20 × 20 table such that the position indexed with amino acids, e.g., position (X, Y), gives the score of alignment or substitution of amino acid X with amino acid Y |
| Private TCR | A TCR unique to an individual |
| Public TCR | A TCR shared among different individuals |
| Rigid docking | A computational modeling of the quaternary structure of complexes formed by two or more interacting biological macromolecules, where the relative orientation of interacting partners was allowed to vary but the internal geometry of each of the partners was held fixed |
| Rosetta terms | A set of 19 terms comprising Rosetta Energy Function 2015 (REF15), a model parametrized from small-molecule and X-ray crystal structure data, used to approximate the energy associated with each biomolecule conformation |
| Tetramer-associated T cell receptor sequencing | A method to link TCR sequences to their cognate antigens in single cells in a high-throughput manner. Peptide-TCR binding is determined using a library of DNA-barcoded antigen tetramers |
| ZAFFI score | Abbreviation for Zlab affinity enhancement; an algorithm to predict the effect of point mutations on binding affinity of TCRs. Training of the energy function was performed using a dataset of systematic point mutations at 10 positions on the ovomucoid turkey inhibitor (OMTKY) molecule in four enzyme-inhibitor complexes. The optimal terms and weights for the function were obtained by fitting the energies of OMTKY point mutants and tested using point mutations of T cell receptors. The terms and weights making up the score are: van der Waals attractive (0.24), van der Waals repulsive (0.017), Lazaridis-Karplus solvation (0.24), intra-residue clash (0.073) and atomic contact energy (0.32) |
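The ZAFFI entry lists five energy terms and their weights, which suggests a linear combination. The sketch below assumes the score is a plain weighted sum of per-term energy changes between mutant and wild type; the dictionary keys, the function name, and the example values are hypothetical, and in practice the per-term values would come from a structure-modeling package.

```python
# Weights as listed in the glossary entry; the key names are illustrative only.
ZAFFI_WEIGHTS = {
    "vdw_attractive": 0.24,
    "vdw_repulsive": 0.017,
    "lk_solvation": 0.24,
    "intra_residue_clash": 0.073,
    "atomic_contact_energy": 0.32,
}

def weighted_energy_change(term_deltas):
    """Weighted sum of per-term energy changes (mutant minus wild type, assumed)."""
    missing = set(ZAFFI_WEIGHTS) - set(term_deltas)
    if missing:
        raise KeyError(f"missing energy terms: {sorted(missing)}")
    return sum(weight * term_deltas[name] for name, weight in ZAFFI_WEIGHTS.items())

# Hypothetical per-term changes for one point mutation (arbitrary units).
example_deltas = {
    "vdw_attractive": -1.2,
    "vdw_repulsive": 0.5,
    "lk_solvation": 0.3,
    "intra_residue_clash": 0.0,
    "atomic_contact_energy": -0.8,
}
score = weighted_energy_change(example_deltas)
print(f"weighted energy change: {score:.4f}")
```

The small weight on the repulsive van der Waals term means that minor steric clashes introduced by a modeled mutation move the total far less than changes in attractive contacts, solvation, or atomic contact energy.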
Figure 2. Current workflow for predicting antigen specificity of TCRs. The tetramer-sorted antigen-specific CDR3β or TCRβ sequences are clustered by a distance measure defined by either global sequence similarity, motif enrichment or sequence co-occurrence pattern. Then, specificity clusters are investigated for their descriptive features, such as enrichment of common V-genes, CDR3 length, clonal expansions, and motif significance, to be considered in making the prediction of antigen specificity. Based on the collection of identified features, previously uncharacterized CDR3βs or TCRβs are predicted for their antigen specificity. The example sequences have been retrieved from (16, 182).
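Algorithms to predict antigen specificity of TCR repertoire.
| Study | Input data | Sequence representation or distance measure | Clustering or classification method |
|---|---|---|---|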
| Thomas et al. ( | CDR3 sequences of CD4+ T cell repertoire before and after immunization | Replace each CDR3 by all possible n-mer peptides, then convert each n-mer peptide into numeric Atchley vectors | K-means clustering of Atchley vectors, count number of Atchley vectors assigned to each cluster, and generate into a feature vector. Classify the feature vector using hierarchical clustering (unsupervised) or support vector machine (supervised) |
| Dash et al. ( | pMHC-facing loop between CDR2 and CDR3 and trimmed CDR3 sequences from 4,635 paired TCRαβ sequences | Similarity-weighted mismatch distance between the potential pMHC-contacting loops of two TCRs, defined by BLOSUM62 (named TCRdist) | Sampling density near each TCR estimated by weighted average distance to the nearest-neighbor receptors in repertoire (a small nearest-neighbor distance, NN-distance). Each TCR repertoire clustered using “greedy” fixed-distance-threshold clustering algorithm. At each step, TCR with the largest number of neighbors within the distance threshold chosen as a cluster center and iterated for all TCRs |
| Glanville et al. ( | CDR3 from 5,711 TCRβ sequences | Global similarity by CDR3 Hamming distance between two TCRs with the same Vβ segment and same-length CDR3. Fold-change enrichment of a local convergence motif as the observed frequency of the motif over its expected frequency in repeated random sampling from the naïve distribution | Cluster TCRs that either share global similarity below the Hamming distance threshold (differ by <2 amino acids) or share a significant motif (>10-fold enriched and <0.001 probability of occurring in the naïve TCR pool) |
| Cinelli et al. ( | CDR3 from CD4+ TCRβ sequences before and after immunization | CDR3β sequences deconstructed into k-mers, then motifs ranked according to one-dimensional Bayesian classifier score comparing their frequency in repertoires of two immunization classes | Top ranking motifs selected and used to create feature vectors to train a support vector machine for classifying into distinct clusters |
| Priel et al. ( | ~360,000 TCRβ sequences from ( | Levenshtein distance between TCRβ and cluster representative | UClust algorithm ( |
| DeWitt et al. ( | TCRβ sequences from 666 healthy individuals from ( | Co-occurrence of global TCRβ (for genetic background) and HLA-restricted TCRβ (for immune history and receptor specificity) by analysis of covariation and hypergeometric distribution to assess significance | DBSCAN algorithm ( |
| Meysman et al. ( | Two independent datasets of 412 TCRβ from [( | Investigated length-based distance, GapAlign score, profile score, trimer score, dimer score, Levenshtein distance score, and VJ edit distance | DBSCAN algorithm ( |
| Pogorelyy and Shugay ( | CDR3 from TCRβ sequences from ( | Hamming distance, allowing single substitution | TCR similarity networks by Hamming distance and identify enriched TCR network hubs by testing neighborhood size (degree) enrichment against VDJ rearrangement model using ALICE algorithm ( |
| Thakkar and Bailey-Kellogg ( | CDR3 sequences, CDR3α and CDR3β analyzed separately | Local alignment using Smith-Waterman (SW) algorithm with BLOSUM45 | Hierarchical agglomerative clustering, with CDRdist (a nearest neighbor classifier to predict label of another CDR based on nearby labeled CDRs) as a comparison function. Clusters defined by CDRdist thresholds |
| Zhang et al. ( | 82,000 CDR3 sequences from 9,700 tumor RNA-Seq samples from TCGA | Pairwise alignment score with BLOSUM62, normalized by the length of longer CDR3 sequence | From pairwise score matrix, apply a predefined cut-off value (default 3.5) to filter out low scoring comparisons. A depth-first search (DFS) on the matrix identifies all connected CDR3 clusters (named iSMART) |
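Several rows above combine a pairwise distance threshold with a graph traversal to define clusters. The sketch below illustrates that general idea only: same-length CDR3 sequences within one Hamming mismatch are linked, and connected components are found by depth-first search. It does not reproduce any of the listed tools; the V-gene constraint, motif enrichment, and alignment scoring are omitted, and the sequences and function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatches between two equal-length sequences, or None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))

def cluster_cdr3(sequences, max_mismatches=1):
    """Link sequences within `max_mismatches`, then return connected components (DFS)."""
    seqs = list(dict.fromkeys(sequences))  # drop duplicates, keep input order
    neighbors = {s: [] for s in seqs}
    for i, a in enumerate(seqs):
        for b in seqs[i + 1:]:
            d = hamming(a, b)
            if d is not None and d <= max_mismatches:
                neighbors[a].append(b)
                neighbors[b].append(a)

    clusters, seen = [], set()
    for start in seqs:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search
            current = stack.pop()
            component.append(current)
            for nxt in neighbors[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters

# Hypothetical CDR3β sequences for illustration.
cdr3s = [
    "CASSLAPGATNEKLFF",
    "CASSLAPGTTNEKLFF",
    "CASSLSPGTTNEKLFF",
    "CASSIRSSYEQYF",
    "CASSIRSAYEQYF",
    "CASSPGQGNTEAFF",
]
clusters = cluster_cdr3(cdr3s)
for members in clusters:
    print(len(members), members)
```

Because components are built by chaining links, the first and third sequences end up in the same cluster even though they differ from each other at two positions. Whether such chaining is desirable depends on the distance measure and threshold, which is one reason the listed methods differ in their clustering step.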
Figure 3. Interplay between unique clusters of pMHCs and TCRs. In an ideal world with an accurate distance measure, pMHCs in the same cluster should share a common specificity toward TCRs and vice versa. Each node denotes a pMHC (circle) or TCR (polygon) entity, and each edge denotes the distance to the closest pMHC or TCR, respectively.