Sandeep Kumar Dhanda1, Edita Karosiene1, Lindy Edwards1, Alba Grifoni1, Sinu Paul1, Massimo Andreatta2, Daniela Weiskopf1, John Sidney1, Morten Nielsen2,3, Bjoern Peters1,4, Alessandro Sette1,4.
Abstract
BACKGROUND: Prediction of T cell immunogenicity is a topic of considerable interest, both in terms of basic understanding of the mechanisms of T cell responses and in terms of practical applications. HLA binding affinity is often used to predict T cell epitopes, since HLA binding affinity is a key requisite for human T cell immunogenicity. However, predicting immunogenicity at the population level is complicated by the high variability of HLA molecules, other potential factors beyond HLA, and the frequent lack of HLA typing data. To overcome those issues, we explored an alternative approach to identify the common characteristics able to distinguish immunogenic peptides from non-recognized peptides.
Keywords: HLA; TCR repertoire; bioinformatics; epitopes; immunodominance; immunogenicity; predictions
Year: 2018 PMID: 29963059 PMCID: PMC6010533 DOI: 10.3389/fimmu.2018.01369
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Full list of datasets used in this study.
| Overlapping | 18 | ( | 65 | 53 | |
| Predicted | 28 | ( | 1,043 | ||
| Overlapping | 61 | ( | 362 | ||
| Confirmed epitopes | 61 | ( | 137 | ||
| Timothy grass | Overlapping | 25 | ( | 60 | 360 |
| Predicted | 35 | ( | 360 | ||
| Overlapping | 21 | ( | 6 | ||
| Overlapping | 37 | ( | 0 | ||
| House dust mite (HDM) | Overlapping | 20 | ( | 52 | 6 |
| Cockroach | Overlapping | 19 | ( | 71 | 521 |
| Dengue antigens | Predicted | 150 | ( | 325 | 140 |
| Erythropoietin | Overlapping | 5 | ( | 9 | 11 |
| CRJ1 and CRJ2 | Overlapping | 54 | ( | 30 | 18 |
| Mouse allergens | Predicted | 22 | ( | 82 | 885 |
| Novel HDM antigens | Predicted | 20 | ( | 105 | 186 |
| Pertussis vaccine antigens | Overlapping | 53 | ( | 100 | 202 |
| Ragweed allergens | Overlapping | 25 | ( | 15 | 183 |
| Tetanus | 20 | ( | 28 | 98 | |
| ZIKA virus polyprotein | Overlapping | 18 | (Grifoni et al., unpublished) | 48 | 529 |
| Yellow fever virus polyprotein | Overlapping | 42 | (Weiskopf et al., unpublished) | 42 | 639 |
| Overall | 1,032 | 5,739 | |||
| Acetylcholine receptor subunit alpha ( | 22 | ( | 4 | 18 | |
| Circumsporozoite (CS) protein ( | 22 | ( | 4 | 4 | |
| Conserved hypothetical lipoprotein ( | 10 | ( | 3 | 10 | |
| Other protein ( | 12 | ( | 7 | 5 | |
| CS protein ( | 64 | ( | 7 | 10 | |
| CS protein ( | 35 | ( | 7 | 7 | |
| Api m 1 ( | 40 | ( | 6 | 9 | |
| Myelin basic protein ( | 12 | ( | 3 | 3 | |
| CS protein ( | 52 | ( | 7 | 5 | |
| Acetylcholine receptor sub. γ and δ ( | 22 | ( | 14 | 42 | |
| Acetylcholine receptor sub. α ( | 22 | ( | 8 | 17 | |
| Glutamate decarboxylase 2 ( | 44 | ( | 2 | 10 | |
| Structural polyprotein ( | 10 | ( | 4 | 7 | |
| Envelope glycoprotein D ( | 24 | ( | 6 | 6 | |
| Thyroglobulin and thyrotropin receptor ( | 15 | ( | 5 | 10 | |
| Fusion glycoprotein F0 ( | 13 | ( | 12 | 50 | |
| Poa p 5, | 13 | ( | 9 | 8 | |
| Myelin basic protein ( | 20 | ( | 6 | 7 | |
| Structural polyprotein ( | 14 | ( | 4 | 74 | |
| Acetylcholine receptor sub. δ and α ( | 58 | ( | 12 | 33 | |
| Hev b 1 ( | 19 | ( | 2 | 2 | |
| Api m 1 ( | 10 | ( | 7 | 6 | |
| TRAP ( | 50 | ( | 21 | 30 | |
| Nucleoprotein ( | 19 | ( | 9 | 40 | |
| Genome polyprotein ( | 22 | ( | 14 | 13 | |
| Subtilisin-like protease 6 ( | 38 | ( | 8 | 20 | |
| Blood groups Rh(D) and Rh(CE) polypeptides ( | 22 | ( | 19 | 15 | |
| Myelin proteolipid and myelin basic protein ( | 16 | ( | 7 | 14 | |
| Polyprotein Ent. virus B; Glut. Decarboxylase2 ( | 22 | ( | 7 | 26 | |
| Gal d 1 ( | 14 | ( | 2 | 1 | |
| Genome polyprotein ( | 10 | ( | 5 | 122 | |
| Hev b 6 ( | 16 | ( | 4 | 12 | |
| Bos d 9 ( | 10 | ( | 2 | 5 | |
| Cha o 1 ( | 19 | ( | 10 | 24 | |
| Genome polyprotein ( | 22 | ( | 12 | 257 | |
| Genome polyprotein ( | 41 | ( | 18 | 33 | |
| Bos d 9, | 29 | ( | 8 | 12 | |
| Cytochrome P450 2D6 ( | 80 | ( | 28 | 29 | |
| Capsid protein VP1 ( | 19 | ( | 8 | 54 | |
| Integrin beta-3 ( | 31 | ( | 7 | 51 | |
| Genome polyprotein ( | 44 | ( | 7 | 286 | |
| Equ c 1 ( | 10 | ( | 15 | 32 | |
| Merozoite surface protein 1 ( | 48 | ( | 10 | 18 | |
| Cry j 1 ( | 12 | ( | 4 | 33 | |
| Cha o 2 ( | 19 | ( | 6 | 36 | |
| Capsid protein VP1 ( | 16 | ( | 28 | 62 | |
| Non-specific lipid-transfer protein ( | 15 | ( | 3 | 5 | |
| Aquaporin-4 ( | 32 | ( | 6 | 10 | |
| UniProt:B8ZU53 ( | 152 | ( | 8 | 1 | |
| Pas n 1 allergen ( | 18 | ( | 4 | 11 | |
| Pen a 1 allergen ( | 16 | ( | 15 | 13 | |
| Genome polyprotein ( | 47 | ( | 26 | 46 | |
| Other wolf or dog protein ( | 25 | ( | 18 | 12 | |
| Can f 5 ( | 24 | ( | 25 | 31 | |
| Botulinum neurotoxin type A ( | 25 | ( | 6 | 13 | |
| Genome polyprotein ( | 20 | ( | 15 | 34 | |
| Botulinum neurotoxin type A ( | 14 | ( | 6 | 14 | |
| Overall | 530 | 1,758 | |||
Figure 1. Predictive performances for different motif lengths. Bars show cross-validation performance for the training dataset. Area under the ROC curve (AUC) values are shown for each artificial neural network training done by choosing different sequence lengths to define a preferred sequence motif within a 15-mer peptide. Error bars show SD of the five cross-validation sets.
Figure 2. Predictive performances obtained by combining HLA binding and immunogenicity scores. The figure shows how performance depends on the α coefficient used to combine the HLA binding and immunogenicity scores. The model was trained on the training dataset described in the text and validated on independent literature datasets, also described in the text.
Figure 3. Performance on independent literature datasets of the combined approach at varying values of alpha, using the model trained with the initial, in-house, and tetramer datasets. The prediction values obtained from the HLA score and the immunogenicity score at different values of alpha are shown. An alpha cutoff of 0.4 is highlighted by a dotted line.
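The captions for Figures 2 and 3 describe an α coefficient that weights the HLA binding score against the immunogenicity score, but this section does not give the formula. The sketch below assumes a simple linear combination in which α weights the immunogenicity score and (1 − α) weights the HLA score. That form, the function name `combined_score`, and the example input values are illustrative assumptions, not the authors' confirmed implementation. Only the default of 0.4 comes from the cutoff highlighted in Figure 3.

```python
def combined_score(hla_score: float, immunogenicity_score: float,
                   alpha: float = 0.4) -> float:
    """Blend two per-peptide scores with a single weight.

    ASSUMPTION: a linear combination where alpha weights the
    immunogenicity score and (1 - alpha) weights the HLA score.
    Both inputs are assumed to be on the same scale.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * immunogenicity_score + (1.0 - alpha) * hla_score


if __name__ == "__main__":
    # Hypothetical scores for one peptide, swept over alpha as in Figures 2-3.
    for a in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
        print(a, combined_score(hla_score=10.0, immunogenicity_score=20.0, alpha=a))
```

Under this assumed form, α = 0 reduces to the HLA score alone and α = 1 to the immunogenicity score alone, so sweeping α traces the trade-off the figures plot.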
Figure 4. Two-sample logo created using epitopes and non-epitopes in all the data (p-value < 0.01). The immunogenicity motifs for epitopes and non-epitopes were derived from the combination of all the datasets.
Performance of overlapping dataset derived from literature at different threshold settings using the percentile combined score.
| Threshold | Average sensitivity (%) | Average specificity (%) | Total peptides to be synthesized (average, %) |
|---|---|---|---|
| 8 | 20 | 91 | 13 |
| 18 | 31 | 85 | 21 |
| 36 | 51 | 65 | 39 |
| 43 | 59 | 59 | 46 |
| 66 | 75 | 37 | 68 |
The performance for each study is calculated individually and then averaged.
Sensitivity = [TP/(TP + FN)] × 100, specificity = [TN/(TN + FP)] × 100, % total peptides to be synthesized = [(TP + FP)/(TP + TN + FP + FN)] × 100.
TN, true negative; FP, false positive; FN, false negative; TP, true positive.
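As a minimal sketch of the footnote definitions, the snippet below computes the three percentages for one study from its confusion-matrix counts and then averages them across studies, matching the statement that performance is calculated per study and then averaged. The function names and the example counts are hypothetical. It assumes every study contains at least one epitope and one non-epitope, so no denominator is zero.

```python
from statistics import mean


def study_metrics(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float, float]:
    """Return (sensitivity %, specificity %, % peptides to synthesize) for one study."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    # Peptides predicted positive (TP + FP) are the ones that would be synthesized.
    pct_to_synthesize = 100.0 * (tp + fp) / (tp + tn + fp + fn)
    return sensitivity, specificity, pct_to_synthesize


def average_over_studies(counts: list[tuple[int, int, int, int]]) -> tuple[float, ...]:
    """Compute metrics per study first, then average each metric across studies."""
    per_study = [study_metrics(*c) for c in counts]
    return tuple(mean(column) for column in zip(*per_study))


if __name__ == "__main__":
    # Hypothetical (TP, FP, TN, FN) counts for two studies.
    print(average_over_studies([(5, 10, 40, 5), (10, 0, 10, 0)]))
```

Averaging the per-study percentages gives each study equal weight regardless of size, which differs from pooling all peptides into one confusion matrix.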
Figure 5. Screenshot of the home page of the immunogenicity prediction server.