Sinu Paul1, Edita Karosiene1, Sandeep Kumar Dhanda1, Vanessa Jurtz2, Lindy Edwards1, Morten Nielsen2,3, Alessandro Sette1,4, Bjoern Peters1,4.
Abstract
CD4+ T cells have a major role in regulating immune responses. They are activated by recognition of peptides mostly generated from exogenous antigens through the major histocompatibility complex (MHC) class II pathway. Identification of epitopes is important and computational prediction of epitopes is used widely to save time and resources. Although there are algorithms to predict binding affinity of peptides to MHC II molecules, no accurate methods exist to predict which ligands are generated as a result of natural antigen processing. We utilized a dataset of around 14,000 naturally processed ligands identified by mass spectrometry of peptides eluted from MHC class II expressing cells to investigate the existence of sequence signatures potentially related to the cleavage mechanisms that liberate the presented peptides from their source antigens. This analysis revealed preferred amino acids surrounding both N- and C-termini of ligands, indicating sequence-specific cleavage preferences. We used these cleavage motifs to develop a method for predicting naturally processed MHC II ligands, and validated that it had predictive power to identify ligands from independent studies. We further confirmed that prediction of ligands based on cleavage motifs could be combined with predictions of MHC binding, and that the combined prediction had superior performance. However, when attempting to predict CD4+ T cell epitopes, either alone or in combination with MHC binding predictions, predictions based on the cleavage motifs did not show predictive power. Given that peptides identified as epitopes based on CD4+ T cell reactivity typically do not have well-defined termini, it is possible that motifs are present but outside of the mapped epitope. Our attempts to take that into account computationally did not show any sign of an increased presence of cleavage motifs around well-characterized CD4+ T cell epitopes.
While it is possible that our attempts to translate the cleavage motifs in MHC II ligand elution data into T cell epitope predictions were suboptimal, other possible explanations are that the cleavage signal is too diluted to be detected, or that elution data are enriched for ligands generated through an antigen processing and presentation pathway that is less frequently utilized for T cell epitopes.
Keywords: CD4+ T cell epitopes; antigen processing; epitope prediction; human leukocyte antigen; ligand elution; major histocompatibility complex class II; natural cleavage motif
Year: 2018 PMID: 30127785 PMCID: PMC6087742 DOI: 10.3389/fimmu.2018.01795
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. Selection of major histocompatibility complex (MHC) II ligand data and distribution of the data. (A) Distribution of ligand entries by ligand length, as collected from the Immune Epitope Database. The most abundant length was 15, followed by 14 and 16. Lengths 14–16 comprised around 50% and lengths 13–17 represented around 71% of the ligand data. (B) Example of random peptides generated by shuffling the amino acid residues of a ligand, with the corresponding predicted MHC II binding (percentile rank). Five random peptides were generated from each ligand by shuffling its component amino acids, and the median of their predicted percentile ranks was assigned as the predicted percentile rank of the random peptide. (C) Fold difference between the proportion of predicted binders among ligands and shuffled peptides for different peptide lengths. The fold difference plateaued after reaching 2.5. The red line indicates fold difference 2.5. (D) Distribution of ligand data based on restricting alleles (only B-chain is shown). (E) Distribution of the ligand data based on final selected lengths.
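The shuffled-peptide baseline described in panels (B) and (C) of Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' code: `toy_predict_rank` is a hypothetical stand-in for a real MHC II binding predictor, and the `cutoff` of 10 used to call a peptide a predicted binder is an assumption rather than a value taken from the source.

```python
import random
from statistics import median


def shuffled_peptides(ligand, n=5, rng=None):
    """Return n peptides made by shuffling the residues of `ligand`."""
    rng = rng or random.Random()
    peptides = []
    for _ in range(n):
        residues = list(ligand)
        rng.shuffle(residues)
        peptides.append("".join(residues))
    return peptides


def random_baseline_rank(ligand, predict_rank, n=5, rng=None):
    """Median predicted percentile rank over n shuffled versions of `ligand`."""
    ranks = [predict_rank(p) for p in shuffled_peptides(ligand, n, rng)]
    return median(ranks)


def fold_difference(ligand_ranks, shuffled_ranks, cutoff=10.0):
    """Ratio of predicted-binder proportions (ligands over shuffled peptides).

    A peptide counts as a predicted binder when its percentile rank is at or
    below `cutoff` (lower rank means stronger predicted binding). The cutoff
    value is an assumption made for this sketch.
    """
    frac_lig = sum(r <= cutoff for r in ligand_ranks) / len(ligand_ranks)
    frac_shuf = sum(r <= cutoff for r in shuffled_ranks) / len(shuffled_ranks)
    return frac_lig / frac_shuf if frac_shuf else float("inf")


def toy_predict_rank(peptide):
    """Hypothetical placeholder predictor; NOT a real binding model."""
    hydrophobic = set("AVILMFWY")
    core = peptide[:9]
    return 100.0 * (1 - sum(r in hydrophobic for r in core) / len(core))
```

In practice `predict_rank` would wrap whichever binding predictor is in use, and `fold_difference` would be computed separately for each peptide length to reproduce the kind of comparison shown in panel (C).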
Figure 2. Enrichment and depletion of amino acids within and adjacent to major histocompatibility complex (MHC) II ligands and predicted MHC II binders. Heatmaps were generated from the relative frequencies of amino acids at ligand/binder positions and nearby positions with respect to the overall amino acid frequency of the source proteins. “N” and “C” represent the N- and C-termini of the ligands, respectively, and the numbers represent the amino acid positions relative to the N- and C-termini. The legend shows the color scale of the heatmap; the numbers on the legend are the relative frequencies of amino acids. (A) Heatmap generated from ligands. (B) Heatmap generated from predicted binders. The pattern of amino acid enrichment and depletion was found to be significantly different between the heatmaps generated based on cleavage motif and binding motif.
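The relative frequencies behind heatmaps like those in Figure 2 can be approximated with a short sketch: for each position around a terminus, divide the observed amino acid frequency by the background frequency in the source proteins. The function names, the 0-based `start` index, and the offset numbering (0 is the first ligand residue, negative offsets are upstream flanking residues) are assumptions for illustration. The background here is computed from the supplied source proteins as given, which may differ from how the authors computed theirs.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def background_frequencies(proteins):
    """Overall amino acid frequencies across the given protein sequences."""
    counts = Counter(aa for seq in proteins for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def n_terminal_enrichment(ligands, offsets=(-3, -2, -1, 0, 1, 2)):
    """Relative amino acid frequency at positions around the ligand N-terminus.

    `ligands` is a list of (protein_sequence, start) pairs, where `start` is
    the 0-based index of the ligand's first residue in its source protein.
    Returns {offset: {amino_acid: observed_freq / background_freq}}.
    """
    background = background_frequencies([protein for protein, _ in ligands])
    table = {}
    for off in offsets:
        counts = Counter()
        for protein, start in ligands:
            pos = start + off
            # Skip positions that fall outside the source protein.
            if 0 <= pos < len(protein) and protein[pos] in AMINO_ACIDS:
                counts[protein[pos]] += 1
        total = sum(counts.values())
        table[off] = {
            aa: (counts[aa] / total) / background[aa]
            if total and background[aa]
            else float("nan")
            for aa in AMINO_ACIDS
        }
    return table
```

Values above 1 indicate enrichment and values below 1 indicate depletion relative to background. The same calculation applied to the ligand end index would give the C-terminal half of the heatmap.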
Performance improvement with combined prediction approach in terms of area under the ROC curve (AUC).
| Alpha | Average AUC—training ligand data | Average AUC—evaluation ligand data |
|---|---|---|
| 0 | 0.591 | 0.630 |
| 0.1 | 0.635 | 0.665 |
| 0.2 | 0.675 | 0.693 |
| 0.3 | 0.710 | 0.712 |
| 0.4 | 0.738 | 0.723 |
| 0.5 | 0.759 | 0.728 |
| 0.6 | 0.774 | 0.726 |
| 0.7 | 0.779 | 0.722 |
| 0.8 | 0.778 | 0.716 |
| 0.9 | 0.774 | 0.708 |
| 1 | 0.768 | 0.700 |
Comparison of the prediction performance when only one scoring method is used (binding-based when α = 0 and cleavage motif-based when α = 1.0) with the combined scoring approach, in which binding- and cleavage motif-based scores are combined. The AUC was highest for the training data at α = 0.7 and for the evaluation data at α = 0.5.
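One way to read the table is as a sweep over a mixing weight between two scores. The sketch below assumes a linear combination, α × cleavage score + (1 − α) × binding score. This matches the stated endpoints (binding only at α = 0, cleavage motif only at α = 1), but the exact form used by the authors is not given here. It also assumes both scores are on comparable scales with higher values meaning "more likely a ligand", so a percentile rank would first need to be inverted. All function names are hypothetical.

```python
def combined_score(binding, cleavage, alpha):
    """Linear mix of two scores: alpha=0 is binding only, alpha=1 is cleavage only."""
    return alpha * cleavage + (1.0 - alpha) * binding


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sweep_alpha(binding, cleavage, labels, alphas=None):
    """Return {alpha: AUC} for each mixing weight (default 0.0 to 1.0 in steps of 0.1)."""
    alphas = alphas or [i / 10 for i in range(11)]
    return {
        a: auc([combined_score(b, c, a) for b, c in zip(binding, cleavage)], labels)
        for a in alphas
    }
```

Selecting α by the best AUC on training data and then reporting AUC on held-out evaluation data, as the two table columns do, guards against picking a weight that only fits the training set.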