Shi Xu, Xiaohua Wang, Caiyi Fei.
Abstract
In the past decade, substantial achievements in therapeutic cancer vaccines have shed new light on cancer immunotherapy. The major challenge in designing potent therapeutic cancer vaccines is to identify neoantigens capable of inducing sufficient immune responses, especially those involving major histocompatibility complex (MHC)-II epitopes. However, most previous studies of T-cell epitopes focused on either ligand binding or antigen presentation by MHC rather than on the immunogenicity of the epitopes. To better support therapeutic vaccine design, in this study we propose a convolutional neural network model named FIONA (Flexible Immunogenicity Optimization Neural-network Architecture), trained on IEDB datasets. FIONA predicts both the epitopes presented by a given MHC-II subtype and their immunogenicity. By combining a hierarchical encoding model for human leukocyte antigen (HLA) alleles with a fused dense embedding of the peptide, FIONA (AUC = 0.94) outperforms several other tools in head-to-head comparisons of epitope presentation prediction across MHC-II subtypes. Moreover, FIONA can predict the immunogenicity of epitopes with MHC-II subtype specificity. We have therefore developed a pipeline for predicting CD4+ T-cell immune responses against cancer and infectious diseases.
Keywords: CD4+ T cell; IEDB; MHC-II; cancer vaccine; deep learning; neoantigen
Year: 2022; PMID: 35785204; PMCID: PMC9246415; DOI: 10.3389/fonc.2022.888556
Source DB: PubMed; Journal: Front Oncol; ISSN: 2234-943X; Impact factor: 5.738
Figure 1. Schema of the MHC-II subtype hierarchical relationship encoding. Each layer, representing one field of the HLA name, is converted into a [1×128] vector. The three vectors are concatenated into a [3×128] matrix, which is normalized and convolved to give a [1×128] vector for later calculation.
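The encoding in Figure 1 can be illustrated with a minimal sketch. Several details here are assumptions rather than the authors' implementation: the three name fields are taken to be gene, allele group, and protein; the embedding tables are randomly initialised stand-ins for learned weights; normalization is a per-row layer-norm-style step; and the convolution is a fixed [3×1] kernel, that is, a weighted sum over the three rows. All function names are hypothetical.

```python
import math
import random

DIM = 128  # embedding width stated in the Figure 1 caption


def make_table(keys, rng):
    """Random embedding table (stand-in for learned weights; an assumption)."""
    return {k: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for k in keys}


def normalize_rows(matrix, eps=1e-5):
    """Normalise each row to zero mean and unit variance (assumed scheme)."""
    out = []
    for row in matrix:
        mean = sum(row) / len(row)
        var = sum((x - mean) ** 2 for x in row) / len(row)
        out.append([(x - mean) / math.sqrt(var + eps) for x in row])
    return out


def collapse(matrix, kernel):
    """Apply a [3x1] kernel: a weighted sum over the three rows."""
    return [sum(k * row[j] for k, row in zip(kernel, matrix)) for j in range(DIM)]


def encode_hla(allele, tables, kernel):
    """Encode an allele name such as 'DRB1*15:01' as a single 128-d vector."""
    gene, fields = allele.split("*")
    group, protein = fields.split(":")[:2]
    # One [1 x 128] vector per name field, stacked into a [3 x 128] matrix.
    rows = [tables[0][gene], tables[1][group], tables[2][protein]]
    return collapse(normalize_rows(rows), kernel)  # back to [1 x 128]


rng = random.Random(0)
tables = [
    make_table(["DRB1", "DRB3"], rng),  # gene level
    make_table(["01", "15"], rng),      # allele-group level
    make_table(["01", "02"], rng),      # protein level
]
kernel = [0.5, 0.3, 0.2]  # illustrative weights, not learned values
vec = encode_hla("DRB1*15:01", tables, kernel)
print(len(vec))
```

Because alleles that share a gene or allele group reuse the same rows, related subtypes receive related vectors, which is the point of encoding the hierarchy rather than treating each allele as an unrelated label.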
Figure 2. Schema of the HLA-encoding fusion layer. (A) Sequence information is integrated by direct addition. (B) Sequence information is integrated by concatenation. (C) Sequence information is integrated by concatenation after the weights are processed using the shared index embedding table.
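The difference between the addition and concatenation modes in panels (A) and (B) is mostly one of shape, which a short sketch can make concrete. It assumes a peptide embedding of 25 positions by 128 dimensions (inferred from the [26×128] matrix mentioned in the ablation table note) and assumes that addition broadcasts the HLA vector over every peptide position. Neither detail is confirmed by the captions, the function names are hypothetical, and panel (C) is not covered.

```python
DIM = 128          # embedding width used throughout the figures
PEPTIDE_LEN = 25   # assumed padded peptide length (26 rows minus one HLA row)


def fuse_add(hla_vec, peptide_rows):
    """Add the HLA vector to every peptide position; shape stays [L x 128]."""
    return [[h + p for h, p in zip(hla_vec, row)] for row in peptide_rows]


def fuse_concat(hla_vec, peptide_rows):
    """Prepend the HLA vector as an extra row; shape becomes [(L+1) x 128]."""
    return [list(hla_vec)] + [list(row) for row in peptide_rows]


# Toy inputs: constant vectors so the effect of each mode is easy to see.
hla_vec = [1.0] * DIM
peptide = [[float(i)] * DIM for i in range(PEPTIDE_LEN)]

added = fuse_add(hla_vec, peptide)
stacked = fuse_concat(hla_vec, peptide)
print(len(added), len(stacked))  # 25 26
```

Addition mixes the HLA signal into every position and keeps the matrix size fixed, whereas concatenation keeps the HLA information in its own row so that later convolution layers can treat it as a distinct part of the sequence.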
Figure 3. Schema of HLA–peptide fusion encoding. The orange HLA_head is the result generated in . HLA_head is prepended to the peptide sequence by concatenation to form a new combined sequence containing HLA_head and the peptide.
Figure 4. Architecture of FIONA. The dataset is downloaded from the IEDB database according to Section 2.1. The first dotted box contains an HLA_head and peptide, which are encoded separately and then integrated for feature transformation. The middle part contains an HLA_head, and peptide_Embedding represents sequence feature information. The right dotted box is our CNN model for training and prediction.
Results of the ablation experiment. MSE (mean squared error), AUC (area under the curve), and PR (precision rate) are the evaluation metrics.
| Method | Presentation MSE (test) | Presentation AUC (test) | Presentation PR (test) | Immunogenicity MSE (test) | Immunogenicity AUC (test) | Immunogenicity PR (test) |
|---|---|---|---|---|---|---|
| PE+HLA_Norm+con_ANA+BLOCKConv | | | | | | |
| PE+Batch_Norm+con_ANA+BLOCKConv | 0.0534 | 0.9242 | 0.9442 | 0.2049 | 0.8433 | 0.8839 |
| PE+Layer_Norm+con_ANA+BLOCKConv | 0.0610 | 0.9197 | 0.9328 | 0.2031 | 0.8340 | 0.8581 |
| PE+HLA_Norm+add_ANA+BLOCKConv | 0.0781 | 0.9038 | 0.9291 | 0.2274 | 0.8014 | 0.8230 |
| PE+HLA_onehot+con_ANA+BLOCKConv | 0.1042 | 0.8467 | 0.8835 | 0.2625 | 0.7637 | 0.7784 |
| PE+BLOCKConv | 0.2427 | 0.8046 | 0.8476 | 0.4691 | 0.5745 | 0.5872 |
| PE+HLA_Norm+con_ANA+Conv | 0.0578 | 0.8656 | 0.9103 | 0.2128 | 0.7877 | 0.8237 |
PE refers to the general peptide embedding. Batch_Norm, Layer_Norm, and HLA_Norm refer to the different HLA normalization methods described in Section 2.4. con_ANA concatenates the HLA_Embedding header with peptide_Embedding to give a [26×128] matrix for the subsequent calculation, whereas add_ANA combines peptide_Embedding and HLA_Embedding by direct addition, as described in Section 2.5.
Bold values highlight the superior results of our model.
Figure 5. Influence of balanced and unbalanced data ratios on FIONA-P. (A, B) ROC (receiver operating characteristic) curve and PR (precision and recall) curve for unbalanced data (AUC = 0.90, PR = 0.90). (C, D) ROC curve and PR curve for balanced data (AUC = 0.94, PR = 0.95).
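For readers unfamiliar with the AUC values quoted for the ROC curves, the following sketch computes the area under the ROC curve directly from its probabilistic definition: the chance that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The labels and scores are invented toy values, not data from the study, and `roc_auc` is a hypothetical helper rather than the evaluation code used by the authors.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise positive/negative comparison."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: two positives and two negatives (made-up scores).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise form is quadratic in the number of examples, so it suits small illustrations. Rank-based implementations give the same value more efficiently on large test sets.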
Figure 6. Comparison of FIONA-P and other prediction tools on the presentation data of all available MHC-II subtypes. Black entries indicate MHC-II subtypes that a tool does not support.
Results of immunogenicity prediction of MHC-II-restricted epitopes in terms of sensitivity, specificity, and positive predictive value (PPV).
| Tools | PPV | Sensitivity | Specificity |
|---|---|---|---|
| FIONA-I | | | |
| FIONA-P | 0.2188 | 0.7340 | 0.8640 |
| NetMHCIIpan 4.0 | 0.1295 | 0.9271 | 0.6767 |
| BERTMHC | 0.1683 | 0.8093 | 0.7925 |
| Maria | 0.3279 | 0.7846 | 0.9166 |
| MixMHC2pred | 0.2812 | 0.7425 | 0.9032 |
Bold values highlight the superior results of our model.
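The three metrics in the immunogenicity table follow from the confusion matrix of a binary classifier. The sketch below shows the standard definitions. The counts are made-up numbers chosen only to illustrate how a tool can combine high sensitivity with a low PPV when positives are rare; they are not taken from the study, and `confusion_metrics` is a hypothetical helper.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Illustrative counts only: 100 true positives among 1,100 examples.
m = confusion_metrics(tp=90, fp=300, tn=700, fn=10)
print({name: round(value, 4) for name, value in m.items()})
```

With these counts the classifier recovers 90% of the positives, yet fewer than a quarter of its positive calls are correct, which is why PPV is reported alongside sensitivity and specificity.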