Ido Springer1, Hanan Besser1, Nili Tickotsky-Moskovitz2, Shirit Dvorkin1, Yoram Louzoun1.
Abstract
Current sequencing methods allow for detailed samples of T cell receptor (TCR) repertoires. To determine from a repertoire whether its host has been exposed to a target, computational tools that predict TCR-epitope binding are required. Current tools are based on conserved motifs and are applied to peptides with many known binding TCRs. We employ new Natural Language Processing (NLP) based methods to predict whether any given TCR and peptide bind. We combined large-scale TCR-peptide dictionaries with deep learning methods to produce ERGO (pEptide tcR matchinG predictiOn), a highly specific and generic TCR-peptide binding predictor. A set of standard tests is defined for evaluating peptide-TCR binding prediction, including detecting TCRs that bind a given peptide/antigen, choosing among a set of candidate peptides for a given TCR, and determining whether any TCR-peptide pair binds. ERGO reaches results similar to state-of-the-art methods in these tests, even when not trained specifically for each test. The software implementation and data sets are available at https://github.com/louzounlab/ERGO. ERGO is also available through a webserver at: http://tcr.cs.biu.ac.il/.
Keywords: TCR repertoire analysis; autoencoder (AE); deep learning; epitope specificity; evaluation methods; long short-term memory (LSTM); machine learning
Year: 2020 PMID: 32983088 PMCID: PMC7477042 DOI: 10.3389/fimmu.2020.01803
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. (A) Illustration of the tests we suggest for evaluating model performance: SPB, single peptide binding; MPS, multi-peptide selection; and TPP, TCR-peptide pairing. (B) LSTM based model architecture. (C) Autoencoder based model architecture. (D) ROC curve of the autoencoder based model's SPB performance on 3 human peptides from the Dash et al. (18) dataset. (E–G) Comparison of the amino acids of CDR3 beta sequences of TCRs that bind the Dash et al. (18) peptides vs. TCRs that do not bind these peptides, in the McPAS (23) database (logos were created with Two-Sample-Logos); the height of each symbol within a stack indicates the relative frequency of that amino acid at that position. Only amino acids whose distribution differs significantly between the two sets are shown, and only TCRs of length 13 were compared.
Comparison between the different versions of the ERGO classifier [AE (Autoencoder) vs. LSTM and McPAS (23) vs. VDJdb (27)] for the SPB task.
| McPAS peptide | AE | LSTM | VDJdb peptide | AE | LSTM |
| --- | --- | --- | --- | --- | --- |
| LPRRSGAAGA | 0.772 | 0.760 | KLGGALQAK | 0.695 | 0.731 |
| GILGFVFTL | 0.843 | 0.832 | GILGFVFTL | 0.820 | 0.817 |
| NLVPMVATV | 0.835 | 0.821 | NLVPMVATV | 0.665 | 0.686 |
| GLCTLVAML | 0.803 | 0.816 | AVFDRKSDAK | 0.676 | 0.695 |
| SSYRRPVGI | 0.969 | 0.980 | RAKFKQLL | 0.828 | 0.825 |
The five most frequent peptides in each database are shown. Other less frequent peptides SPB results are in the .
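The SPB scores in these tables are compared through ROC curves and their AUC. As a reminder of what that number measures, here is a minimal sketch that computes AUC as the fraction of (binder, non-binder) pairs in which the binder receives the higher score. The function name `roc_auc` and the labels and scores are illustrative assumptions; this is not the ERGO evaluation code and the values are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = TCR binds the peptide, 0 = it does not.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]  # made-up model outputs
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based formulation or a library routine would be the usual choice on full datasets.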
Comparison between the different versions of the ERGO classifier [AE vs. LSTM and McPAS (23) vs. VDJdb (27)] and existing classifiers [TCRGP by Jokinen et al. (19), TCRex by Gielis et al. (20)] for the SPB task.
| GLCTLVAML | 0.803 | 0.816 | 0.764 | 0.770 | 0.708 | 0.686 | 0.816 | 0.782 | 0.82 ± 0.02 | |
| NLVPMVATV | 0.835 | 0.821 | 0.665 | 0.686 | 0.624 | 0.632 | 0.587 | 0.651 | 0.72 ± 0.01 | |
| GILGFVFTL | 0.843 | 0.832 | 0.820 | 0.817 | 0.725 | 0.712 | 0.818 | 0.822 | 0.81 ± 0.01 | |
Bolded values are the best results. The peptides here are the human peptides in Dash et al. (18).
Comparison between the different versions of the ERGO classifier [AE vs. LSTM and McPAS (23) vs. VDJdb (27)] for the binding to a specific antigen.
| McPAS antigen | AE | LSTM | VDJdb antigen | AE | LSTM |
| --- | --- | --- | --- | --- | --- |
| NP177 | 0.772 | 0.767 | IE1 | 0.703 | 0.738 |
| M1 | 0.843 | 0.832 | M | 0.825 | 0.820 |
| pp65 | 0.814 | 0.803 | pp65 | 0.702 | 0.716 |
| BMLF1 | 0.808 | 0.819 | EBNA4 | 0.711 | 0.717 |
| PB1 | 0.958 | 0.970 | Gag | 0.890 | 0.897 |
There are no previous results on this task.
Figure 2. (A) MPS accuracy of the AE and LSTM models per number of peptide classes in the McPAS-TCR (23) and VDJdb (27) datasets. (B) ROC curves of the AE model on TPP-I, II, and III in the McPAS dataset. (C) AUC for TPP-I as a function of the sub-sample size. (D) AUC of TPP-I per index of the missing amino acids. (E) Distribution of the number of TCRs per peptide in the McPAS-TCR and VDJdb datasets, logarithmic scale. (F) AUC of TPP-I per bin of number of TCRs per peptide (each bin is the union of all TCRs that match peptides whose total number of TCRs falls in a specific range).
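Panel (A) reports MPS accuracy, where one peptide is chosen for each TCR from a set of candidates. A minimal sketch of such an accuracy computation follows, under the assumption that the chosen peptide is simply the candidate with the highest predicted binding score; the source does not spell out the selection rule, and the function name and scores below are hypothetical.

```python
def mps_accuracy(score_rows, true_peptides):
    """Fraction of TCRs whose top-scoring candidate peptide is the true one.

    score_rows: one dict per TCR, mapping candidate peptide -> predicted score.
    true_peptides: the peptide each TCR is known to bind, in the same order.
    """
    if len(score_rows) != len(true_peptides) or not true_peptides:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = 0
    for scores, truth in zip(score_rows, true_peptides):
        predicted = max(scores, key=scores.get)  # assumed argmax selection rule
        if predicted == truth:
            correct += 1
    return correct / len(true_peptides)


# Hypothetical scores for three TCRs over three candidate peptides.
score_rows = [
    {"GILGFVFTL": 0.91, "NLVPMVATV": 0.20, "GLCTLVAML": 0.15},
    {"GILGFVFTL": 0.30, "NLVPMVATV": 0.75, "GLCTLVAML": 0.40},
    {"GILGFVFTL": 0.55, "NLVPMVATV": 0.10, "GLCTLVAML": 0.45},
]
true_peptides = ["GILGFVFTL", "NLVPMVATV", "GLCTLVAML"]
accuracy = mps_accuracy(score_rows, true_peptides)
print(f"MPS accuracy = {accuracy:.3f}")
```

With more candidate classes the chance level drops as one over the number of classes, which is why accuracy is plotted against the number of peptide classes.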
AUC of the TPP task with known peptide and TCR but unknown pairing (TPP-I), known peptide and unseen TCR (TPP-II), and unseen peptide and TCR (TPP-III).
| TPP-I | 0.859 | 0.840 | 0.842 | 0.776 | 0.761 | 0.805 | 0.813 | |
| TPP-II | 0.798 | 0.792 | 0.764 | 0.770 | 0.745 | 0.805 | 0.813 | |
| TPP-III | 0.601 | 0.562 | 0.669 | 0.522 | 0.636 | 0.570 | 0.646 | |
The results are the test AUC using either AE or LSTM on McPAS (.
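The three TPP settings differ only in whether the TCR and the peptide of a test pair were seen during training. The sketch below assigns a test pair to a setting following the definitions in the table caption. The function name and the CDR3 sequences are illustrative assumptions, and the case of a seen TCR with an unseen peptide, which the caption does not define, is returned as `None`.

```python
def tpp_category(pair, train_tcrs, train_peptides):
    """Classify a (tcr, peptide) test pair by what was seen in training."""
    tcr, peptide = pair
    tcr_seen = tcr in train_tcrs
    peptide_seen = peptide in train_peptides
    if tcr_seen and peptide_seen:
        return "TPP-I"    # both known, only the pairing is new
    if peptide_seen and not tcr_seen:
        return "TPP-II"   # known peptide, unseen TCR
    if not peptide_seen and not tcr_seen:
        return "TPP-III"  # unseen peptide and unseen TCR
    return None           # seen TCR, unseen peptide: not defined in the caption


# Hypothetical training pairs (CDR3 beta sequences are illustrative only).
train_pairs = [
    ("CASSIRSSYEQYF", "GILGFVFTL"),
    ("CASSLAPGATNEKLFF", "NLVPMVATV"),
]
train_tcrs = {tcr for tcr, _ in train_pairs}
train_peptides = {pep for _, pep in train_pairs}

test_pairs = [
    ("CASSIRSSYEQYF", "NLVPMVATV"),  # both seen, new pairing
    ("CASSQDRGTEAFF", "GILGFVFTL"),  # new TCR, seen peptide
    ("CASSPGQGYEQYF", "GLCTLVAML"),  # both new
]
categories = [tpp_category(p, train_tcrs, train_peptides) for p in test_pairs]
print(categories)
```

Partitioning the test set this way makes the drop from TPP-I to TPP-III easier to read: each step removes one source of information that the model could have memorized.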