Morten Nielsen (1,2), Tim Connelley (3), Nicola Ternette (4).
Abstract
Peptide binding to MHC class I molecules is the single most selective step in antigen presentation and the strongest single correlate of peptide cellular immunogenicity. Experimentally characterizing the rules of peptide presentation for a given MHC-I molecule is costly, so predictors of peptide-MHC interactions are an attractive alternative. Recently, a growing number of MHC-presented peptides identified by mass spectrometry (MS ligands) have been published. Handling and interpreting MS ligand data is generally challenging because of the polyspecific nature of the data: each sample contains ligands from all the MHC-I molecules expressed by the cells. We outline a general pipeline for dealing with this challenge that accurately annotates ligands to the MHC-I molecule they were eluted from, using GibbsClustering and binding motif information inferred from in silico models. We illustrate the approach in the context of cattle MHC-I molecules (BoLA). Next, we demonstrate how such annotated BoLA MS ligand data can readily be integrated with in vitro binding affinity data in a prediction model with very high and unprecedented performance for identification of BoLA-I restricted T-cell epitopes. The prediction model is freely available at http://www.cbs.dtu.dk/services/NetMHCpan/NetBoLApan. Although the approach is applied here to the BoLA-I system, the pipeline is readily applicable to MHC systems in other species.
Keywords: BoLA; GibbsClustering; MHC; NetMHCpan; T-cell epitopes; antigen presentation; bioinformatics; mass spectrometry; prediction
Year: 2017 PMID: 29115832 PMCID: PMC5759033 DOI: 10.1021/acs.jproteome.7b00675
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Figure 1. (A) Number of peptides obtained from each cell line and for the combined set (all) of peptides for each of the BoLA-A10, -A14, and -A18 haplotypes. (B) Overlap of peptide sequences between the A10 and A14 samples. (C) Distribution of ligand lengths within the different data sets.
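As a rough illustration of the quantities in panels B and C, the Python sketch below computes the overlap between two peptide samples and the length distribution of a combined set. The function names and peptide lists are hypothetical toy data, not data from the study, and the sketch assumes that a "combined set" means the union of unique sequences.

```python
from collections import Counter


def length_distribution(peptides):
    """Return {length: fraction} over a collection of peptide sequences."""
    counts = Counter(len(p) for p in peptides)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


def overlap(sample_a, sample_b):
    """Return the peptides shared by two samples and their Jaccard index."""
    a, b = set(sample_a), set(sample_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


# Hypothetical toy peptides (not data from the study).
a10 = ["SIINFEKL", "KLNEPVLLL", "RVFPWASSL", "AEYWRRFEL"]
a14 = ["KLNEPVLLL", "GRFDSNQTW", "AEYWRRFEL", "TQIYKAQEFY"]

combined = set(a10) | set(a14)  # unique sequences only
print(length_distribution(combined))
shared, j = overlap(a10, a14)
print(sorted(shared), round(j, 3))
```

Deduplicating before counting keeps peptides seen in both samples from being counted twice in the combined length distribution.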
Figure 2. GibbsCluster analysis of the three combined data sets. Each row shows the results for one haplotype data set. Left panels show bar plots of the Kullback–Leibler distance (KLD) as a function of the number of clusters; the relative size of each black block within a bar is proportional to the size of the corresponding cluster. Right panels show the sequence motifs derived from the best solution (i.e., the solution with the highest KLD), displayed as sequence logos generated with Seq2Logo.[25]
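To make the KLD score in the left panels more concrete, here is a simplified Python sketch of a motif's Kullback–Leibler distance from a background amino acid distribution, summed over aligned core positions, plus a cluster-size-weighted mean over the clusters of one solution. This is not GibbsCluster's implementation: the uniform background, the absence of sequence weighting and pseudocounts, and the size weighting are all assumptions made for illustration, and the toy cores are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background frequencies; the real tool may use others.
UNIFORM_BG = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def motif_kld(cores, background=UNIFORM_BG):
    """Sum over core positions of KL(observed frequencies || background), in bits.

    `cores` are equal-length, already aligned binding cores (e.g. 9-mers).
    No sequence weighting or pseudocounts are applied.
    """
    total = 0.0
    n = len(cores)
    for pos in range(len(cores[0])):
        counts = Counter(core[pos] for core in cores)
        for aa, c in counts.items():
            p = c / n
            total += p * math.log2(p / background[aa])
    return total


def solution_kld(clusters):
    """Cluster-size-weighted mean motif KLD for one clustering solution."""
    n_total = sum(len(c) for c in clusters)
    return sum(len(c) / n_total * motif_kld(c) for c in clusters)


# Hypothetical toy solutions keyed by number of clusters (not study data).
solutions = {
    1: [["KLFEEVLKL", "RRYEELLKR", "KLWDEVLKL", "RRFEDLLKR"]],
    2: [["KLFEEVLKL", "KLWDEVLKL"], ["RRYEELLKR", "RRFEDLLKR"]],
}
for k, clusters in solutions.items():
    print(k, round(solution_kld(clusters), 2))
```

One caveat on this simplified score: with a fixed background and no pseudocounts, splitting a cluster can never lower the weighted mean KLD, so the sketch only illustrates what is being measured and should not be used on its own to choose the number of clusters.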
Figure 3. Mapping of GibbsClustered peptides to BoLA-I molecule specificities. Haplotypes are separated by vertical lines, as indicated. Each column shows the binding motif logo for one cluster of the optimal GibbsCluster solution (upper row) together with the best-matching NetMHCpan-predicted binding motif among the BoLA-I molecules expressed by the relevant haplotype (central row), as determined by visual comparison. The lower row shows boxplots of the NetMHCpan-3.0 percentile rank predictions for all peptides in each GibbsCluster against all BoLA-I molecules expressed by the given haplotype.
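The caption states that clusters were matched to molecules by visual comparison of motifs. As a hypothetical complement, not the authors' procedure, the rank boxplots suggest a simple automated check: assign each cluster to the expressed molecule for which its peptides have the lowest median percentile rank. The sketch below assumes the ranks are already available; the rank values are invented for illustration.

```python
from statistics import median


def assign_clusters(ranks):
    """Pick, for each cluster, the molecule with the lowest median %rank.

    ranks[cluster][molecule] is a list of percentile rank values for the
    cluster's peptides predicted against that molecule (lower = stronger).
    Returns {cluster: (best_molecule, median_rank)}.
    """
    assignment = {}
    for cluster, per_molecule in ranks.items():
        medians = {mol: median(vals) for mol, vals in per_molecule.items()}
        best = min(medians, key=medians.get)
        assignment[cluster] = (best, medians[best])
    return assignment


# Hypothetical %rank values for the two A10 clusters (not study data).
ranks_a10 = {
    "G1": {"BoLA-3*00201": [0.1, 0.4, 0.8, 1.5],
           "BoLA-2*01201": [12.0, 30.0, 8.5, 45.0]},
    "G2": {"BoLA-3*00201": [20.0, 35.0, 9.0, 60.0],
           "BoLA-2*01201": [0.3, 0.9, 2.0, 0.5]},
}
print(assign_clusters(ranks_a10))
```

An automated rule like this is best treated as a cross-check on the logo comparison, since two expressed molecules with similar motifs could yield close medians.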
Association of GibbsCluster Clusters to BoLA-I Restrictions
| cell line | group | BoLA-I |
|---|---|---|
| A10 | G1 | BoLA-3*00201 |
| A10 | G2 | BoLA-2*01201 |
| A14 | G1 | BoLA-4*02401 |
| A14 | G2 | BoLA-1*02301 |
| A14 | G3 | BoLA-2*02501 |
| A18 | G1 | BoLA-6*01301 |
Figure 4. Length distribution of ligands restricted to each BoLA molecule.
Comparison of the Predictive Performance of NetMHCpan-4.0_BA (the binding affinity prediction score of the NetMHCpan-4.0 method trained on both eluted ligand and peptide binding affinity data) and NetMHCpan-3.0 Models on Quantitative Binding Affinity Data from the IEDB Affinity Data Set
| BoLA-I | no. peps | no. bind | NetMHCpan-4.0_BA PCC | NetMHCpan-4.0_BA AUC | NetMHCpan-3.0 PCC | NetMHCpan-3.0 AUC |
|---|---|---|---|---|---|---|
| BoLA-3*00101 (BoLA-AW10) | 166 | 8 | 0.497 | 0.816 | 0.381 | 0.792 |
| BoLA-1*02301 (BoLA-D18.4) | 258 | 182 | 0.648 | 0.832 | 0.551 | 0.747 |
| BoLA-6*01301 (BoLA-HD6) | 268 | 219 | 0.622 | 0.815 | 0.482 | 0.728 |
| BoLA-3*00201 (BoLA-JSP.1) | 158 | 32 | 0.464 | 0.703 | 0.277 | 0.622 |
| BoLA-T2c | 90 | 84 | 0.485 | 0.833 | 0.455 | 0.813 |
| BoLA-2*01201 (BoLA-T2a) | 167 | 47 | 0.691 | 0.852 | 0.635 | 0.812 |
| BoLA-6*04101 (BoLA-T2b) | 157 | 38 | 0.631 | 0.835 | 0.566 | 0.816 |
| Ave | | | 0.577 | 0.812 | 0.478 | 0.761 |
Names in parentheses in the first column are the historical names of the alleles; no. peps is the number of peptides and no. bind the number of binders for each allele. Performance was estimated as Pearson's correlation coefficient (PCC) and AUC (area under the receiver operating characteristic curve). Both measures take the value 1.0 for a perfect prediction, and 0.0 (PCC) or 0.5 (AUC) for a random prediction.
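For readers who want to reproduce this kind of evaluation, the Python sketch below computes PCC and AUC from first principles, with AUC taken as the probability that a randomly chosen binder outscores a randomly chosen non-binder (ties count one half). Two details are assumptions not stated in the note: measured IC50 values are mapped to [0, 1] with the transform 1 - log(IC50)/log(50000) before computing PCC, and binders are defined by a 500 nM cutoff. The measurements and predictions are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative)."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


def affinity_to_score(ic50_nm, max_nm=50000.0):
    """Assumed transform: 1 - log(IC50)/log(max_nm), clipped to [0, 1]."""
    return min(1.0, max(0.0, 1 - math.log(ic50_nm) / math.log(max_nm)))


# Hypothetical measurements (nM) and predictions (not study data).
measured_nm = [12.0, 80.0, 450.0, 2000.0, 15000.0, 40000.0]
predicted = [0.81, 0.55, 0.60, 0.30, 0.12, 0.20]

targets = [affinity_to_score(v) for v in measured_nm]
is_binder = [v < 500.0 for v in measured_nm]  # assumed 500 nM cutoff
print(round(pearson(targets, predicted), 3), roc_auc(is_binder, predicted))
```

The pairwise AUC loop is quadratic in the number of peptides, which is fine at the per-allele data set sizes in the table; a rank-based formula would scale better.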
Figure 5. Predicted length preference for the BoLA-3*00201 (left) and BoLA-1*02301 (right) molecules. In both panels, the solid line shows the length distribution of the MS-eluted ligands, and bars show the length distribution predicted by NetMHCpan-2.8 (2.8, light gray), NetMHCpan-3.0 (3.0, white), the eluted ligand output of NetMHCpan-4.0 (4.0_EL, black), and the binding affinity output of NetMHCpan-4.0 (4.0_BA, gray).
Predictive Performance of the NetMHCpan-4.0 Eluted Ligand Likelihood Prediction Model (4.0_EL) Compared with NetMHCpan-3.0 (3.0) on a Data Set of Known BoLA-I Restricted T-Cell Epitopes from Theileria parva (TP) and Bovine Herpes Virus (BHV)
The part of the table to the left of the vertical line gives the performance of the two methods on the original epitope data. The part to the right gives the results when each prediction method is allowed to suggest alternative epitopes overlapping the known epitopes (either contained within a known epitope or extended by a single amino acid). Cases where the two methods suggest alternative optimal epitopes are highlighted in bold.
#: Minimal epitope defined in ref (30); $: Minimal epitope (N. MacHugh, personal communication).
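The "alternative epitope" setting described in the table notes can be sketched as candidate enumeration followed by picking the top-scoring candidate. The Python below is a hypothetical illustration: it reads "single amino acid extensions" as one extra residue at either the N or the C terminus, restricts candidates to 8 to 14 residues (an assumed length window), and uses a toy scoring function in place of a real predictor. The protein sequence and epitope position are invented.

```python
def overlapping_candidates(protein, start, end, min_len=8, max_len=14):
    """Candidate peptides overlapping the known epitope protein[start:end].

    Includes every sub-peptide of the epitope (the epitope itself too) and
    the epitope extended by one residue at the N or the C terminus.
    """
    epitope = protein[start:end]
    candidates = set()
    # Sub-peptides contained within the known epitope.
    for i in range(len(epitope)):
        for j in range(i + min_len, len(epitope) + 1):
            candidates.add(epitope[i:j])
    # Single amino acid extension at either terminus, if the protein allows it.
    if start > 0:
        candidates.add(protein[start - 1:end])
    if end < len(protein):
        candidates.add(protein[start:end + 1])
    return {c for c in candidates if min_len <= len(c) <= max_len}


def best_candidate(candidates, score_fn):
    """Highest-scoring candidate; sorting first makes tie-breaking deterministic."""
    return max(sorted(candidates), key=score_fn)


# Hypothetical protein with a known 9-mer epitope at positions 2..10.
protein = "GSKLNEPVLLLTR"
cands = overlapping_candidates(protein, 2, 11)
toy_score = lambda p: -abs(len(p) - 9)  # stand-in for a real predictor score
print(sorted(cands))
print(best_candidate(cands, toy_score))
```

With a real predictor, comparing the best candidate against the originally reported epitope is what reveals cases where a method proposes an alternative optimal epitope.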