Zeynep Koşaloğlu-Yalçın1, Jenny Lee1, Jason Greenbaum1, Stephen P Schoenberger2,3, Aaron Miller2,3, Young J Kim4, Alessandro Sette1,5, Morten Nielsen6,7, Bjoern Peters1,5.
Abstract
Many steps of the MHC class I antigen processing pathway can be predicted using computational methods. Here we show that epitope predictions can be further improved by considering the abundance levels of the peptides' source proteins. We utilized biophysical principles and existing MHC binding prediction tools in concert with abundance estimates of source proteins to derive a function that estimates the likelihood that a peptide is an MHC class I ligand. We found that this combination improved predictions for both naturally eluted ligands and cancer neoantigen epitopes. We compared the use of different measures of antigen abundance, including mRNA expression by RNA-Seq, gene translation by Ribo-Seq, and protein abundance by proteomics, on a dataset of SARS-CoV-2 epitopes. In all cases, epitope predictions improved over binding predictions alone, and performance was highest when using proteomic data. Our results highlight the value of incorporating antigen abundance levels to improve epitope predictions.
Keywords: Computational bioinformatics; Immunology; Mathematical biosciences
Year: 2022 PMID: 35128348 PMCID: PMC8806398 DOI: 10.1016/j.isci.2022.103850
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Figure 1. HLA binding and abundance of source proteins of HLA class I eluted ligands
(A) HLA class I eluted ligands originate from highly expressed genes and are predicted to be good HLA binders. The quartile ranges and density of TPM (top) and predicted IC50 (bottom) values are displayed for the five alleles included in the dataset. Ligands (displayed in tan) are expressed at significantly higher levels than random background peptides (displayed in green) and are predicted to bind with significantly higher affinity (p < 2.2 × 10⁻¹⁶, Wilcoxon test). Dashed lines indicate a TPM of 10 and an IC50 of 500 nM, respectively.
(B) Interplay between HLA binding of eluted ligands and expression of their source proteins. The binding affinities and TPM values were separated into ranges to create a 2-dimensional matrix with the TPM on the x axis and the IC50 on the y axis. Each peptide was assigned to a cell in this matrix according to its IC50 and TPM values. For each cell, the percentage of ligands among all peptides that fall into the corresponding IC50 and TPM ranges was determined, and the cell was colored accordingly.
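The binning described in panel (B) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the bin edges `TPM_EDGES` and `IC50_EDGES` and the function name `ligand_fraction_matrix` are hypothetical, because the caption does not state which ranges were used.

```python
from bisect import bisect_right

# Hypothetical bin edges (assumption: the caption does not give the ranges).
TPM_EDGES = [1, 10, 100]       # splits TPM into 4 ranges (x axis)
IC50_EDGES = [50, 500, 5000]   # splits IC50 in nM into 4 ranges (y axis)


def ligand_fraction_matrix(peptides, tpm_edges=TPM_EDGES, ic50_edges=IC50_EDGES):
    """Percentage of ligands among all peptides in each (IC50, TPM) cell.

    peptides: iterable of (ic50_nM, tpm, is_ligand) tuples.
    Returns matrix[row][col], with rows indexed by IC50 range and columns
    by TPM range. Cells that contain no peptides are None.
    """
    n_rows = len(ic50_edges) + 1
    n_cols = len(tpm_edges) + 1
    total = [[0] * n_cols for _ in range(n_rows)]
    ligands = [[0] * n_cols for _ in range(n_rows)]

    for ic50, tpm, is_ligand in peptides:
        row = bisect_right(ic50_edges, ic50)  # index of the IC50 range
        col = bisect_right(tpm_edges, tpm)    # index of the TPM range
        total[row][col] += 1
        ligands[row][col] += int(bool(is_ligand))

    return [
        [100.0 * ligands[r][c] / total[r][c] if total[r][c] else None
         for c in range(n_cols)]
        for r in range(n_rows)
    ]
```

Each returned value could then be mapped to a color scale to reproduce a heat map of this kind; empty cells are kept as `None` so they can be drawn differently from cells with 0% ligands.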
Figure 2. Performance of different predictors in identifying HLA class I eluted ligands in the Trolle dataset
(A and B) Receiver operating characteristic (ROC) curves (A) and ROC curves restricted to false-positive rates up to 10% (B) are displayed for different NetMHCpan predictors, TPM, and AXEL-F scores.
Prediction performance of different predictors for ligand elution datasets
| Predictor | Trolle AUC | Trolle pAUC | Abelin AUC | Abelin pAUC | Pyke cell line AUC | Pyke cell line pAUC | Pyke tissue AUC | Pyke tissue pAUC |
|---|---|---|---|---|---|---|---|---|
| TPM | 0.812 | 0.629 | 0.763 | 0.607 | 0.704 | 0.583 | 0.694 | 0.580 |
| IC50 | 0.990 | 0.955 | 0.969 | 0.940 | 0.951 | 0.932 | 0.961 | 0.870 |
| EL_Rank | 0.991 | 0.963 | 0.977 | 0.961 | 0.965 | 0.940 | 0.979 | 0.932 |
| AXEL-F (IC50) | 0.993 | 0.971 | 0.976 | 0.949 | 0.954 | 0.924 | 0.965 | 0.883 |
| AXEL-F (EL_Rank) | 0.992 | 0.964 | 0.980 | 0.960 | 0.959 | 0.916 | 0.967 | 0.865 |
| AXEL-F (EL_to_IC50) | 0.994 | 0.974 | 0.980 | 0.961 | 0.964 | 0.941 | 0.978 | 0.925 |
| MHCflurry | 0.986 | 0.957 | 0.988 | 0.969 | 0.969 | 0.936 | 0.970 | 0.909 |
| AXEL-F (MHCflurry) | 0.991 | 0.970 | 0.989 | 0.975 | 0.941 | 0.824 | 0.974 | 0.918 |
| MixMHCpred | 0.990 | 0.959 | 0.975 | 0.963 | 0.909 | 0.900 | 0.969 | 0.923 |
| HLAthena | 0.869 | 0.899 | 0.911 | 0.926 | 0.960 | 0.934 | 0.955 | 0.920 |
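For readers who want to compute metrics of this kind on their own data, the following is a minimal pure-Python sketch of the AUC and of a partial AUC restricted to false-positive rates up to 10%. The function names are hypothetical, and the McClish standardization (which rescales the partial area so that 0.5 corresponds to a random predictor and 1.0 to a perfect one) is an assumption: the text above does not say how the reported pAUC values were normalized. Scores must be oriented so that higher means "more likely a ligand", so IC50 and rank-based values need to be negated first.

```python
def roc_points(labels, scores):
    """ROC points (fpr, tpr) in order of increasing FPR.

    labels: truthy for ligands, falsy for background peptides.
    scores: higher score = more likely to be a ligand.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Consume all peptides sharing this score so ties form one segment.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def area_under(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve from FPR 0 to max_fpr."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate linearly to the cutoff inside this segment.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def standardized_pauc(points, max_fpr=0.1):
    """Partial AUC up to max_fpr, rescaled to [0.5, 1] (McClish; an assumption here)."""
    raw = area_under(points, max_fpr)
    chance = 0.5 * max_fpr ** 2   # area under the diagonal up to max_fpr
    best = max_fpr                # area of a perfect predictor up to max_fpr
    return 0.5 * (1.0 + (raw - chance) / (best - chance))
```

A usage example with predicted IC50 values, where lower values indicate stronger binding and are therefore negated:

```python
labels = [1, 0, 1, 0]
ic50 = [20, 5000, 300, 100]
points = roc_points(labels, [-v for v in ic50])
auc = area_under(points)
pauc = standardized_pauc(points, max_fpr=0.1)
```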
Prediction performance (AUC) of different predictors in predicting immunogenic neoantigens
| Predictor | NCI set | Literature set |
|---|---|---|
| tumor_rna_alt_freq | 0.593 | – |
| tumor_rna_depth | 0.586 | – |
| tumor_rna_alt_reads | 0.642 | – |
| TCGA_TPM_subtype_matched | 0.641 | 0.641 |
| TCGA_TPM_pancancer | 0.613 | 0.613 |
| TCGA_TPM_subtype_mismatched | 0.520 | 0.541 |
| IC50 | 0.723 | 0.628 |
| EL_Rank | 0.729 | 0.614 |
| AXEL-F (EL_to_IC50, tumor_rna_alt_reads) | 0.753 | – |
| AXEL-F (EL_to_IC50, TCGA_TPM_pancancer) | 0.735 | 0.632 |
| AXEL-F (EL_to_IC50, TCGA_TPM_subtype_mismatched) | 0.669 | 0.606 |
| AXEL-F (EL_to_IC50, TCGA_TPM_subtype_matched) | 0.754 | 0.646 |
| MHCflurry | 0.779 | 0.639 |
| AXEL-F (MHCflurry, TCGA_TPM_subtype_matched) | 0.799 | 0.659 |
| MixMHCpred | 0.659 | 0.645 |
| HLAthena (TCGA_TPM_subtype_matched) | 0.756 | 0.657 |
Prediction performance (AUC) of different predictors in predicting SARS-CoV-2 epitopes
| Predictor | Tarke | Tarke with random | Peng |
|---|---|---|---|
| TPM_RNASeq | 0.682 | 0.703 | 0.766 |
| TPM_RiboSeq | 0.683 | 0.649 | 0.776 |
| Proteomic | 0.710 | 0.681 | 0.773 |
| EL_Rank | 0.521 | 0.606 | 0.808 |
| AXEL-F (RNA-Seq) | 0.663 | 0.722 | 0.866 |
| AXEL-F (Ribo-Seq) | 0.695 | 0.749 | 0.867 |
| AXEL-F (Proteomic) | 0.715 | 0.766 | 0.892 |
| MHCflurry | 0.514 | 0.599 | 0.733 |
| AXEL-F (MHCflurry, RNA-Seq) | 0.561 | 0.524 | 0.791 |
| AXEL-F (MHCflurry, Ribo-Seq) | 0.614 | 0.520 | 0.794 |
| AXEL-F (MHCflurry, Proteomic) | 0.602 | 0.509 | 0.808 |
| HLAthena (RNA-Seq) | 0.575 | 0.565 | 0.727 |
| HLAthena (Ribo-Seq) | 0.633 | 0.613 | 0.727 |
| HLAthena (Proteomic) | 0.629 | 0.670 | 0.916 |
| MixMHCpred | 0.528 | 0.610 | 0.798 |
Figure 3. Performance of different predictors in identifying SARS-CoV-2 epitopes
Receiver operating characteristic (ROC) curves are displayed for NetMHCpan predictions (EL_Rank), and predictions using AXEL-F combining binding predictions with abundance measurements of viral proteins utilizing RNA-Seq, Ribo-Seq, and Proteomics.
(A) Tarke SARS-CoV-2 epitopes dataset; (B) Peng SARS-CoV-2 epitopes dataset.
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Trolle Ligand Dataset | IEDB ( | |
| HeLa RNA-Seq | GEO | GSM3899456 |
| Abelin Ligand Dataset | Abelin et al. ( | |
| Abelin RNA-Seq | GEO | GSE93315 |
| Pyke Ligand and Neoantigen Datasets | Pyke et al. ( | |
| K562 RNA-Seq | Cancer Cell Line Encyclopedia ( | CCLE_RNAseq_rsem_genes_tpm_20180929.txt.gz |
| TCGA PANCAN RNA-Seq Dataset | TCGA | |
| NCI Dataset of Neoantigens | Parkhurst et al. ( | |
| Literature Dataset of Neoantigens | IEDB ( | using the following filters: Epitope Structure: Linear Sequence, Included Related Structures: Only neoepitopes, Include Positive Assays, Include Negative Assays, No B cell assays, No MHC assays, MHC Restriction Type: Class I, Host: |
| Tarke Dataset of SARS-CoV-2 Epitopes | Tarke et al. ( | |
| Peng Dataset of SARS-CoV-2 Epitopes | Peng et al. ( | |
| SARS-CoV-2 Proteome | UniProt | UP000464024 |
| SARS-CoV-2 RNA-Seq and Ribo-Seq data | GEO | GSE149973 |
| SARS-CoV-2 Proteomic Dataset | Poran et al. ( | Table S10 |
| NetMHCpan 4.1 | Reynisson et al. ( | |
| IEDB | Vita et al. ( | |
| MHCflurry | O’Donnell et al. ( | |
| MixMHCpred | Gfeller et al. ( | |
| HLAthena | Sarkizova et al. ( | |
| R | R | |
| Python | Python | |