Michal Bassani-Sternberg, Chloé Chong, Philippe Guillaume, Marthe Solleder, HuiSong Pak, Philippe O Gannon, Lana E Kandalaft, George Coukos, David Gfeller.
Abstract
The precise identification of Human Leukocyte Antigen class I (HLA-I) binding motifs plays a central role in our ability to understand and predict (neo-)antigen presentation in infectious diseases and cancer. Here, by exploiting co-occurrence of HLA-I alleles across ten newly generated as well as forty public HLA peptidomics datasets comprising more than 115,000 unique peptides, we show that we can rapidly and accurately identify many HLA-I binding motifs and map them to their corresponding alleles without any a priori knowledge of HLA-I binding specificity. Our approach recapitulates and refines known motifs for 43 of the most frequent alleles, uncovers new motifs for 9 alleles that up to now had fewer than five known ligands and provides a scalable framework to incorporate additional HLA peptidomics studies in the future. The refined motifs improve neo-antigen and cancer testis antigen predictions, indicating that unbiased HLA peptidomics data are ideal for in silico predictions of neo-antigens from tumor exome sequencing data. The new motifs further reveal distant modulation of the binding specificity at P2 for some HLA-I alleles by residues in the HLA-I binding site but outside of the B-pocket and we unravel the underlying mechanisms by protein structure analysis, mutagenesis and in vitro binding assays.
Year: 2017 PMID: 28832583 PMCID: PMC5584980 DOI: 10.1371/journal.pcbi.1005725
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. General pipeline for HLA-I motif identification and annotation, and training of predictors.
High-accuracy HLA peptidomics data were first generated for 10 samples and collected from publicly available data for 40 other samples. In each sample, motifs were identified using our recent mixture model algorithm [24]. Motifs were then annotated to their respective allele based on co-occurrence of alleles across samples (e.g., first HLA-A24:02, then HLA-A01:01 and HLA-C06:02; see also Fig B in S1 Supporting Information for another example). Finally, all peptides assigned to each motif were pooled together to train our new HLA-I ligand predictor for each HLA-I allele (MixMHCpred v1.0).
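The co-occurrence step can be illustrated with a small sketch. It is not the published implementation: the function name `shared_unannotated_allele` and the sample typings are hypothetical, and the sketch covers only the allele-intersection idea (a motif seen in several samples is attributed to the one allele those samples share), not the motif detection or motif comparison themselves.

```python
def shared_unannotated_allele(allele_sets, annotated=frozenset()):
    """Return the single not-yet-annotated allele common to all samples.

    allele_sets: iterable of sets of allele names, one set per sample in
                 which the same motif was observed.
    annotated:   alleles whose motif has already been assigned.
    Returns None when zero or several alleles remain (ambiguous case).
    """
    sets = [set(s) for s in allele_sets]
    if not sets:
        return None
    common = set.intersection(*sets) - set(annotated)
    return next(iter(common)) if len(common) == 1 else None


# Hypothetical HLA-I typings, for illustration only.
sample_a = {"HLA-A24:02", "HLA-A01:01", "HLA-B08:01"}
sample_b = {"HLA-A24:02", "HLA-B07:02", "HLA-C07:02"}
sample_c = {"HLA-A24:02", "HLA-A01:01", "HLA-B35:01"}

# Samples a and b share a single allele, so a motif common to both
# can be attributed to it.
first = shared_unannotated_allele([sample_a, sample_b])

# Samples a and c share two alleles; once the first one is annotated,
# the remaining shared allele becomes unambiguous.
second = shared_unannotated_allele([sample_a, sample_c], annotated={first})
```

Applying the function repeatedly, each time adding the newly assigned allele to `annotated`, mirrors the "first one allele, then the next" order described in the caption.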
Fig 2. Comparison between motifs predicted by our algorithm and known motifs.
A: Comparison with IEDB motifs for 44 HLA-I binding motifs identified with the fully unsupervised approach. Alleles without previously documented ligands are highlighted in red. For HLA-B56:01, the three known ligands are shown. B: Comparison with motifs obtained from mono-allelic cell lines [31]. C: Motif identified with the semi-supervised approach.
Fig 3. Correlation between amino acid frequencies at positions P4 to P7 in our HLA peptidomics data and in the human proteome.
Fig 4. Comparison between our predictor (MixMHCpred v1.0) and existing tools.
A: Fraction of true positives among the top 1% of predictions (PP1%) for the naturally presented HLA-I ligands identified in mono-allelic cell lines, with a 99-fold excess of decoy peptides. Of note, PP1% is equivalent to both Precision and Recall, since the number of actual positives is the same as the number of predicted positives. B-C: Graphical representation of the results in Table 1 and Table C in S1 Supporting Information. Panel B shows the AUC values and panel C the fraction of neo-antigens predicted in the top 1% of predictions (which typically corresponds to what is experimentally tested for immunogenicity). D-E: Predictions of Cancer Testis Antigens from the CTDatabase. Panel D shows AUC values and panel E shows PP1%. Truncated y-axes are explicitly indicated.
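As an illustration of the PP1% metric in panel A, the following minimal sketch computes the fraction of true positives among the top-scoring 1% of peptides. The function name, the toy scores and the assumption that higher scores mean stronger predicted binding are illustrative choices, not taken from the paper.

```python
def pp_top_fraction(scores, labels, fraction=0.01, higher_is_better=True):
    """Fraction of true positives among the top `fraction` of predictions.

    scores: one predicted score per peptide.
    labels: 1 for a true ligand, 0 for a decoy, in the same order.
    """
    n_top = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    hits = sum(labels[i] for i in order[:n_top])
    return hits / n_top


# Toy data: 2 ligands and 198 decoys, i.e. a 99-fold excess of decoys.
labels = [1, 1] + [0] * 198
scores = [0.9, 0.1] + [0.5] + [0.0] * 197

pp1 = pp_top_fraction(scores, labels)

# With a 99-fold excess of decoys, the top 1% contains as many peptides
# as there are true ligands, so precision and recall coincide.
n_top = round(len(scores) * 0.01)
recall = (pp1 * n_top) / sum(labels)
```

In this toy case one of the two top-ranked peptides is a true ligand, so both `pp1` and `recall` equal 0.5.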
Table 1. Ranking of the neo-antigens identified in four melanoma samples [17,20].
Column 2 shows the mutated neo-antigens (the mutated residue is highlighted in bold). Column 5 shows the ranking based on our predictions (i.e., the number of peptides to be tested to find this neo-antigen). Columns 6 to 8 show the ranking based on NetMHC [8], NetMHCpan [12] and NetMHCstabpan [36], respectively. The last column shows the total number of neo-antigen candidates (i.e., all possible 9- and 10-mers encompassing all missense mutations).
| Sample | Sequence | Protein | Mutation | Rank | NetMHC | NetMHCpan | NetMHCstabpan | # Candidates |
|---|---|---|---|---|---|---|---|---|
| Mel8 | NOP16 | P169L | 7 | 7 | 14 | 1340 | ||
| Mel5 | SEPT2 | Q125R | 13 | 37 | 149 | 25807 | ||
| Mel5 | GABPA | E161K | 2088 | 1755 | 581 | 25807 | ||
| Mel15 | SYTL4 | S363F | 139 | 410 | 481 | 24766 | ||
| Mel15 | SEC23A | P52L | 1717 | 672 | 36 | 24766 | ||
| Mel15 | AKAP6 | M1482I | 198 | 100 | 44 | 24766 | ||
| Mel15 | ABCC2 | S1342F | 1115 | 2085 | 3207 | 24766 | ||
| Mel15 | NCAPG2 | P333L | 527 | 301 | 199 | 24766 | ||
| Mel15 | MAP3K9 | E689K | 3978 | 1351 | 3079 | 24766 | ||
| 12T | MED15 | P747S | 4206 | 4176 | 3181 | 15750 |
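The candidate count in the last column comes from enumerating every 9- and 10-mer that overlaps a missense mutation. A minimal sketch of that enumeration for a single mutation is shown below; the function name, the 0-based `mut_index` convention and the toy protein sequence are assumptions made for illustration.

```python
def candidate_peptides(protein_seq, mut_index, mutant_residue, lengths=(9, 10)):
    """List all peptides of the given lengths that contain the mutated residue.

    protein_seq:    wild-type protein sequence (one-letter codes).
    mut_index:      0-based position of the substituted residue.
    mutant_residue: amino acid introduced by the missense mutation.
    """
    mutated = protein_seq[:mut_index] + mutant_residue + protein_seq[mut_index + 1:]
    peptides = []
    for k in lengths:
        first = max(0, mut_index - k + 1)         # leftmost window covering the site
        last = min(mut_index, len(mutated) - k)   # rightmost window that still fits
        for start in range(first, last + 1):
            peptides.append(mutated[start:start + k])
    return peptides


# Toy 30-residue sequence; residue 15 (S) is replaced by F.
toy_protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
candidates = candidate_peptides(toy_protein, 15, "F")
```

A mutation far from the protein termini yields 9 + 10 = 19 candidates; pooling these lists over all missense mutations of a sample would give a per-sample total like the one reported in the table.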
Fig 5. Analysis of newly identified HLA-I motifs.
A: Structural view of two different HLA-I alleles with N90 as in HLA-A02:20 (PDB: 2BVQ [40], pink sidechains) or K90 as in HLA-A02:01 (PDB: 2BNR [41], green sidechains). For clarity, the α1 helix has been truncated. B: Conservation of B pocket residues across HLA-I alleles displaying a preference for histidine at P2. The last line shows the sequence of HLA-B14:02, which does not show a histidine preference at P2 (see motif in C), but has the same B pocket as HLA-B15:18. The last column shows the amino acids at position 97, which is not part of the B pocket. C: Structural view of HLA-B14:02 in complex with a peptide with arginine at P2 (PDB: 3BVN [42]). Residues not conserved between HLA-B15:18 and HLA-B14:02 are displayed in orange. None of them makes direct contact with the arginine residue at P2. D: Stability values (half-lives) obtained for peptides with H or R at P2 for both HLA-B14:02 wt and the W97R mutant. NB stands for no binding. Dashed lines indicate lower bounds for half-life values. Residue numbering follows the one used in most X-ray structures in the PDB.