Jingcheng Wu1,2, Wenzhe Wang2, Jiucheng Zhang2, Binbin Zhou2, Wenyi Zhao1,2, Zhixi Su3, Xun Gu4, Jian Wu2, Zhan Zhou1, Shuqing Chen1.
Abstract
Neoantigens play important roles in cancer immunotherapy. Current methods for neoantigen prediction focus on the binding between human leukocyte antigens (HLAs) and peptides, which is insufficient for high-confidence neoantigen prediction. In this study, we apply deep learning techniques to predict neoantigens, considering both the possibility of HLA-peptide binding (binding model) and the potential immunogenicity (immunogenicity model) of the peptide-HLA complex (pHLA). The binding model achieves performance comparable to that of other well-acknowledged tools on the latest Immune Epitope Database (IEDB) benchmark datasets and an independent mass spectrometry (MS) dataset. The immunogenicity model could significantly improve the prediction precision of neoantigens. Further application of our method to mutations with pre-existing T-cell responses indicates its feasibility in clinical application. DeepHLApan is freely available at https://github.com/jiujiezz/deephlapan and http://biopharm.zju.edu.cn/deephlapan.
Keywords: cancer immunology; deep learning; human leukocyte antigen; neoantigen; recurrent neural network
Year: 2019 PMID: 31736974 PMCID: PMC6838785 DOI: 10.3389/fimmu.2019.02559
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. The architecture of DeepHLApan. Two types of data (437,077 binding data points, 32,785 immunogenicity data points) are collected for model training, and one-hot encoding is used for amino acid representation. Three layers of bidirectional GRU with attention have been employed as the model framework. The immunogenic score is used as a filter (>0.5), and peptides with binding scores ranked within the top 20 are predicted as high-confidence neoantigens.
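The encoding and the final selection rule described in this caption can be sketched as follows. This is a minimal illustration, not the DeepHLApan implementation: the 20-letter alphabet order, the function names, the dictionary keys, and the choice to filter on the immunogenic score before ranking by binding score are all assumptions made for the example.

```python
# Hypothetical sketch; names and the residue ordering are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_encode(peptide):
    """Return one 20-element 0/1 vector per residue of the peptide."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in peptide:
        row = [0] * len(AMINO_ACIDS)
        row[index[residue]] = 1
        encoded.append(row)
    return encoded


def select_high_confidence(candidates, immunogenic_cutoff=0.5, top_n=20):
    """Keep candidates whose immunogenic score exceeds the cutoff,
    then return the top_n of them by binding score (highest first).

    candidates: list of dicts with the (assumed) keys
    'peptide', 'binding_score', and 'immunogenic_score'.
    """
    passed = [c for c in candidates if c["immunogenic_score"] > immunogenic_cutoff]
    passed.sort(key=lambda c: c["binding_score"], reverse=True)
    return passed[:top_n]
```

The caption does not say whether the top-20 ranking is taken before or after the immunogenicity filter; the sketch applies the filter first, and the other order would need only a swap of the two steps.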
Figure 2. Binding model performance on never-seen HLA alleles. (A) Prediction AUC on alleles with both positive HLA-peptide pairs and negative pairs in descending order. (B) Prediction ACC on alleles with single-labeled HLA-peptide pairs in descending order.
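AUC is only defined when an allele has both positive and negative pairs, which is presumably why single-labeled alleles are reported with accuracy instead. A minimal sketch of such a per-allele evaluation is given below. The record layout, the function names, and the 0.5 decision threshold for accuracy are assumptions for illustration and are not taken from the paper.

```python
from collections import defaultdict


def auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of pairs whose thresholded score matches the label.
    The threshold value is an assumption."""
    predicted = [1 if s > threshold else 0 for s in scores]
    return sum(p == l for p, l in zip(predicted, labels)) / len(labels)


def per_allele_metrics(records):
    """records: iterable of (allele, label, score) with label in {0, 1}.
    Returns {allele: (metric_name, value)}."""
    by_allele = defaultdict(list)
    for allele, label, score in records:
        by_allele[allele].append((label, score))
    results = {}
    for allele, pairs in by_allele.items():
        labels = [l for l, _ in pairs]
        scores = [s for _, s in pairs]
        if len(set(labels)) == 2:   # both classes present: AUC is defined
            results[allele] = ("AUC", auc(labels, scores))
        else:                       # single-labeled allele: fall back to ACC
            results[allele] = ("ACC", accuracy(labels, scores))
    return results
```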
Figure 3. Comparison of actual and predicted motifs for 16 HLA alleles. The motif logos were created with WebLogo. The actual motifs are based on the binding peptides of each allele; the predicted motifs are generated by taking the top 1% of predicted peptides out of 100,000 random peptides.
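The procedure for the predicted motifs (score random peptides, keep the top 1%, summarize residue usage per position) can be sketched as below. The scoring function here is a toy placeholder, not the DeepHLApan binding model; the peptide length of 9, the uniform residue sampling, and all names are likewise assumptions.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptides(n, length=9, seed=0):
    """Draw n peptides with residues sampled uniformly (length is an assumption)."""
    rng = random.Random(seed)
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(n)]


def predicted_motif(peptides, score_fn, top_fraction=0.01):
    """Per-position residue frequencies among the top-scoring fraction of peptides."""
    ranked = sorted(peptides, key=score_fn, reverse=True)
    top = ranked[:max(1, round(len(ranked) * top_fraction))]
    motif = []
    for pos in range(len(top[0])):
        counts = Counter(p[pos] for p in top)
        motif.append({aa: counts[aa] / len(top) for aa in AMINO_ACIDS})
    return motif


def toy_score(peptide):
    """Placeholder scorer (NOT a real binding model): rewards L at
    position 2 and V at the C-terminus."""
    return (peptide[1] == "L") + (peptide[-1] == "V")


# The caption uses 100,000 random peptides; any trained scorer could replace toy_score.
motif = predicted_motif(random_peptides(100_000), toy_score)
```

The resulting frequency table is the kind of input a logo tool such as WebLogo visualizes; in practice the top peptides themselves would usually be passed to the tool directly.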
Figure 4. Comparison of the DeepHLApan binding model with other tools. (A) Performance of the binding model compared with 12 other well-acknowledged tools on the latest IEDB benchmark datasets. (B) Performance of the binding model compared with 5 other binding tools on the independent MS dataset. Detailed information on each sub-dataset is listed in Tables S7A,B.
The improvement of precision and decrease of recall with immunogenicity model on all available neoantigen predictions.
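| | Precision (%) | Recall (%) | Precision with immunogenicity model | Recall with immunogenicity model |
| --- | --- | --- | --- | --- |
| 10 | 9.6 | 96 | 13.8% (+43.8%) | 56.3% (−41.9%) |
| 2 | 36.6 | 92.2 | 48.6% (+32.8%) | 54.7% (−40.7%) |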
The number in parentheses with a positive or negative sign indicates the percentage of improvement and decrease, respectively.
The improvement of precision and decrease of recall with immunogenicity model on HLA-A*02:01-restricted neoantigen prediction.
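| | Precision (%) | Recall (%) | Precision with immunogenicity model | Recall with immunogenicity model |
| --- | --- | --- | --- | --- |
| 10 | 10.1 | 96.0 | 15.7% (+55.4%) | 88.0% (−8.3%) |
| 2 | 37.5 | 84.0 | 45.2% (+20.5%) | 76.0% (−9.5%) |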
The number in parentheses with a positive or negative sign indicates the percentage of improvement and decrease, respectively.
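The parenthesized values appear to be relative changes, which can be checked with a short calculation. The sketch below assumes that the second and third columns give precision and recall (in %) before the immunogenicity model is applied and that the last two columns give them afterwards; that reading is inferred from the caption and the arithmetic, and the function name is illustrative.

```python
def relative_change(before, after):
    """Percentage change of `after` relative to `before`."""
    return (after - before) / before * 100.0


# Rows of the HLA-A*02:01 table, read as
# (precision_before, recall_before, precision_after, recall_after), all in %.
rows = [
    (10.1, 96.0, 15.7, 88.0),
    (37.5, 84.0, 45.2, 76.0),
]

changes = [
    (round(relative_change(p0, p1), 1), round(relative_change(r0, r1), 1))
    for p0, r0, p1, r1 in rows
]
print(changes)  # [(55.4, -8.3), (20.5, -9.5)], matching the parenthesized values
```

For example, precision rising from 10.1% to 15.7% is a gain of 5.6 percentage points, or about 55.4% relative to the starting value.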
Figure 5. The 26 mutations with pre-existing T-cell responses were ranked by probability of presentation within their corresponding patients. The mutation rank of NetMHCpan 4.0 was measured by taking the minimum predicted rank across all mutation-spanning peptides. The numbers of predicted mutations ranked in the top 5, 10, and 20 by EDGE and MHCflurry were derived from Bulik-Sullivan et al. (27).
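The mutation-level ranking described for NetMHCpan 4.0 can be sketched as follows: each mutation takes the best (minimum) predicted rank among its spanning peptides, mutations are then ordered within a patient, and the validated ones falling in the top 5, 10, and 20 are counted. The data layout, function names, and mutation labels are hypothetical and serve only to illustrate the procedure.

```python
def best_peptide_rank(peptide_ranks):
    """peptide_ranks: {mutation: [predicted rank of each spanning peptide]}.
    Lower rank means stronger predicted presentation."""
    return {mutation: min(ranks) for mutation, ranks in peptide_ranks.items()}


def rank_mutations(peptide_ranks):
    """Return {mutation: position}, where position 1 is the best mutation
    within this patient."""
    best = best_peptide_rank(peptide_ranks)
    ordered = sorted(best, key=best.get)
    return {mutation: i + 1 for i, mutation in enumerate(ordered)}


def count_in_top_k(positions, validated, ks=(5, 10, 20)):
    """Count validated mutations whose within-patient position is at most k."""
    return {k: sum(1 for m in validated if positions[m] <= k) for k in ks}


# Hypothetical patient with three mutations, one of them validated.
patient = {"mutA": [3.0, 0.4, 12.0], "mutB": [0.9, 5.0], "mutC": [0.1, 8.0]}
positions = rank_mutations(patient)
summary = count_in_top_k(positions, validated=["mutA"])
```

Summing such per-patient counts over all patients would give totals comparable to the top-5, top-10, and top-20 numbers shown in the figure.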