Guangzhi Wang1,2, Huihui Wan2,3, Xingxing Jian2,4, Yuyu Li1, Jian Ouyang2, Xiaoxiu Tan3, Yong Zhao1, Yong Lin3, Lu Xie1,2.
Abstract
In silico T-cell epitope prediction plays an important role in the design of immunization experiments and in vaccine preparation. Currently, most epitope prediction research focuses on peptide processing and presentation, e.g., proteasomal cleavage, transporter associated with antigen processing (TAP), and major histocompatibility complex (MHC) binding. To date, however, the mechanism underlying the immunogenicity of epitopes remains unclear. It is generally agreed that T-cell immunogenicity may be influenced, to different degrees, by the foreignness, accessibility, molecular weight, molecular structure, molecular conformation, chemical properties, and physical properties of target peptides. In this work, we tried to combine these factors. First, we collected a substantial set of experimental HLA-I T-cell immunogenic peptide data, as well as amino acid properties potentially related to immunogenicity. Several characteristics were extracted, including the amino acid physicochemical properties of the epitope sequence, peptide entropy, the eluted ligand likelihood percentile rank (EL rank(%)) score, and a frequency score for immunogenic peptides. Subsequently, a random forest classifier for T-cell immunogenic HLA-I-presented antigen epitopes and neoantigens was constructed. The classification results for the antigen epitopes outperformed previous research (optimal AUC = 0.81; external validation data set AUC = 0.77). Because mutational epitopes generated from coding regions differ by only one or two amino acids, we assumed that these characteristics might also apply to the classification of endogenous mutational neoepitopes, also called "neoantigens." Based on mutation information and sequence-related amino acid characteristics, a prediction model for neoantigens was also established (optimal AUC = 0.78). Further, an easy-to-use web-based tool, "INeo-Epp," was developed for the prediction of human immunogenic antigen epitopes and neoantigen epitopes.
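The abstract lists peptide entropy among the extracted features but does not give its formula here. As a minimal sketch, assuming the feature is the Shannon entropy of the residue composition of the peptide (an assumption, not confirmed by this record), it could be computed as follows. The function name `peptide_entropy` and the example sequences are illustrative only.

```python
import math
from collections import Counter


def peptide_entropy(peptide: str) -> float:
    """Shannon entropy (in bits) of the amino acid composition of a peptide.

    Assumption: entropy is taken over residue frequencies within the
    peptide itself; the source does not spell out its exact definition.
    """
    if not peptide:
        raise ValueError("peptide must be a non-empty string")
    counts = Counter(peptide.upper())
    n = len(peptide)
    # Sum of p * log2(1/p) over the distinct residues in the peptide.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    # Arbitrary example strings: a homopolymer and a mixed 9-mer.
    for seq in ("AAAAAAAAA", "GILGFVFTL"):
        print(seq, round(peptide_entropy(seq), 3))
```

Under this definition, a peptide made of a single repeated residue scores 0, and a peptide of length n with all residues distinct scores log2(n), so the value reflects how diverse the sequence composition is.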
Year: 2020 PMID: 32626747 PMCID: PMC7315274 DOI: 10.1155/2020/5798356
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. The flow chart for “INeo-Epp” prediction.
Summary of IEDB epitope data.
| HLA supertype | IEDB HLA allele | Number of negatives | Number of positives | HLA allele frequency (Asian/Black/Caucasian) | Motif view |
|---|---|---|---|---|---|
| A1 | A01:01 | 811 | 103 | 0.154/0.046/0.164 | 1-2(ST)-3-4-5-6-7-8-9(Y) |
| A1 | A26:01 | 83 | 19 | 0.041/0.014/0.030 | 1(DE)-2(ITV)-3-4-5-6-7-8-9(FMY) |
| A2 | A02:01 | 1883 | 1580 | 0.049/0.123/0.275 | 1-2(LM)-3-4-5-6-7-8-9(ILV)-10(V) |
| A3 | A11:01 | 196 | 174 | 0.139/0.014/0.060 | 1-2(IMSTV)-3-4-5-6-7-8-9(K)-10(K) |
| A3 | A03:01 | 1400 | 169 | 0.063/0.083/0.139 | 1-2(ILMTV)-3-4-5-6-7-8-9(K)-10(K) |
| A24 | A24:02 | 207 | 219 | 0.136/0.024/0.084 | 1-2(WY)-3-4-5-6-7-8-9(FIW) |
| A24 | A23:01 | 1138 | 12 | 0.006/0.109/0.019 | 1-2(WY)-3-4-5-6-7-8-9-10(F) |
| B7 | B35:01 | 63 | 248 | 0.062/0.068/0.055 | 1-2(P)-3-4-5-6-7-8-9(FMY) |
| B7 | B07:02 | 523 | 244 | 0.034/0.005/0.0143 | 1-2(P)-3-4-5-6-7-8-9(FLM) |
| B7 | B51:01 | 13 | 51 | 0.074/0.021/0.047 | 1-2(P)-3-4-5-6-7-8-9(IV) |
| B8 | B08:01 | 317 | 195 | 0.036/0.037/0.114 | 1-2-3-4-5(HKR)-6-7-8-9(FILMV) |
| B27 | B27:05 | 100 | 86 | 0.008/0.008/0.037 | 1(RY)-2(R)-3(FMLWY)-4-5-6-7-8-9 |
| B44 | B37:01 | 1036 | 10 | 0.034/0.005/0.014 | — |
| B44 | B40:01 | 67 | 65 | 0.022/0.012/0.052 | — |
| B44 | B44:02 | 73 | 66 | 0.008/0.020/0.095 | 1-2(E)-3-4-5-6-7-8-9(FIWY) |
| B58 | B58:01 | 11 | 62 | 0.041/0.037/0.007 | 1-2(AST)-3-4-5-6-7-8-9(W) |
| B62 | B15:01 | 3 | 70 | 0.016/0.010/0.060 | 1-2(LMQ)-3-4-5-6-7-8-9(FY) |
| Total | | 7924 | 3373 | | |
| After removing negatives with rank(%) > 2 | | 5123 | 3373 | | |
| After removing negatives 100% identical to human sequences | | 4943 | 3373 | | |
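The last two rows of the table describe two filters applied only to negative peptides (the positive count stays at 3373): negatives with EL rank(%) > 2 are dropped, and so are negatives that match a human sequence 100%. The following is a minimal sketch of that filtering logic, not the authors' code. The `Peptide` record, its field names, and the `filter_negatives` helper are hypothetical, and the rank and human-match values are assumed to be precomputed by external tools.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Peptide:
    """Hypothetical record for one peptide-HLA observation."""
    sequence: str
    immunogenic: bool          # True = positive (epitope), False = negative
    el_rank: float             # EL rank(%), assumed precomputed by a presentation predictor
    matches_human_100: bool    # True if the peptide is identical to a human reference peptide


def filter_negatives(peptides: Iterable[Peptide], max_rank: float = 2.0) -> List[Peptide]:
    """Keep every positive; keep a negative only if it passes both filters."""
    kept = []
    for p in peptides:
        if p.immunogenic:
            kept.append(p)  # positives are never filtered
        elif p.el_rank <= max_rank and not p.matches_human_100:
            kept.append(p)  # negative that passes both filters
    return kept


if __name__ == "__main__":
    # Toy records with made-up sequences and values, for illustration only.
    toy = [
        Peptide("AAAAAAAAL", True, 5.0, False),   # positive: kept regardless of rank
        Peptide("CCCCCCCCV", False, 0.5, False),  # negative, passes both filters: kept
        Peptide("DDDDDDDDK", False, 8.0, False),  # negative, rank(%) > 2: dropped
        Peptide("EEEEEEEEY", False, 1.0, True),   # negative, identical to human: dropped
    ]
    for p in filter_negatives(toy):
        print(p.sequence, p.immunogenic)
```

One plausible reading of the design is that the remaining negatives are peptides predicted to be presented yet not immunogenic, so the classifier is pushed to learn immunogenicity instead of presentation.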
External data included in validation set.
| Publication time | PMID | Author | Nonepitopes | Epitopes |
|---|---|---|---|---|
| 2013 | 23580623 | Weiskopf et al. | 477 | 42 |
| 2018 | 29397015 | Luxenburger et al. | 100 | 26 |
| 2018 | 30260541 | Xia et al. | — | 1 |
| 2018 | 30487281 | Vahed et al. | — | 4 |
| 2018 | 30518652 | Khakpoor et al. | — | 2 |
| 2018 | 30587531 | Huth et al. | — | 4 |
| 2018 | 30815394 | Sekyere et al. | — | 6 |
| Total | | | 577 | 85 |
| After removing negatives with rank(%) > 2 and peptides whose HLA supertypes do not appear in the training set | | | 321 | 69 |
Neoepitope data included in this study.
| Publication time | PMID | Author | Tumor type | Nonimmunogenic neoepitopes | Immunogenic neoepitopes | T-cell assay |
|---|---|---|---|---|---|---|
| 2013-12 | 24323902 | D. A. Wick et al. | Ovarian cancer | — | 1 | ELISPOT |
| 2015-9 | 26359337 | E. M. Van Allen et al. | Melanoma | — | 18 | Clinical benefit |
| 2015-11 | 26752676 | T. Karasaki et al. | Lung adenocarcinoma | — | 4 | — |
| 2016-1 | 26901407 | A. Gros et al. | Melanoma | 12 | 14 | ELISPOT |
| 2016-5 | 27198675 | E. Strønen et al. | Melanoma | 1134 | 16 | CTL clone |
| 2016-12 | 28405493 | A. Nelde et al. | Lymphoma | — | 2 | ELISPOT |
| 2017-6 | 28619968 | X. Zhang et al. | Breast cancer | — | 4 | Flow cytometry |
| 2017-10 | 29104575 | M. Markus et al. | Melanoma | 10 | 16 | — |
| 2017-11 | 29187854 | A.-M. Bjerregaard et al. | Polytype | 1874 | 42 | ELISPOT and others |
| 2017-11 | 29132146 | V. P. Balachandran et al. | Pancreatic | — | 10 | Flow cytometry |
| 2018-5 | 29720506 | T. Matsuda et al. | Ovarian cancer | — | 3 | ELISPOT |
| 2018-12 | 29409514 | K. Sonntag et al. | Pancreatic ductal carcinoma | — | 3 | Flow cytometry |
| 2018-10 | 30357391 | V. Randi et al. | — | 6 | 35 | — |
| Total | | | | 3030 | 168 | |
| After removing duplicates | | | | 2837 | 164 | |
| After removing negatives with rank(%) > 2 or 100% identical to human sequences | | | | 1697 | 164 | |
Figure 2. Epitope/neoepitope peptide composition and amino acid length distribution. (a) Detailed data distribution of antigen peptides across seventeen HLA alleles: the proportion of positive and negative epitopes for each HLA allele and the corresponding HLA frequency in Asians, Blacks, and Caucasians. (b) Proportion of antigen peptides with lengths of 8-11 AA. (c) Data distribution of HLA alleles of neoantigen peptides. (d) Proportion of neoantigen peptides with lengths of 8-11 AA.
Figure 3. Amino acid frequency distribution at the TCR contact sites of antigen epitope and nonepitope peptides. Amino acids below the dotted line are preferred by epitopes.
Figure 4. Feature selection in antigen epitopes and ROC curves of antigen epitope classification. (a) Peptide features: twenty-four features were screened, and the features to the right of the dotted line were defined as effective. (b) Trained model: the blue line represents antigen epitopes without screening; the green line represents the data after deleting nonepitopes with rank(%) > 2; the red line represents the data after deleting nonepitopes that match the human reference peptide sequence 100%. (c) External validation: ROC curves for the external validation set. The purple line represents modeling using antigen epitopes without filtering, and the pink line represents modeling using antigen epitopes after removing nonepitopes with rank(%) > 2 and peptides whose HLA supertypes did not appear in the training set.
Figure 5. Feature selection in neoantigen epitopes and ROC curves of neoantigen epitope classification. (a) Twenty-seven features were screened, and the 25 features to the right of the dotted line were retained for modeling with a random forest algorithm. (b) ROC curves of neoantigen epitope classification.
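The ROC curves in Figures 4 and 5 are summarized by AUC values (0.81, 0.77, and 0.78 in the abstract). For readers less familiar with the metric, the sketch below computes AUC directly from its rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. It is an illustration of the metric only, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are made up.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for immunogenic (positive), 0 for nonimmunogenic (negative).
    scores: classifier scores, higher meaning "more likely immunogenic".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: one positive is scored below one negative.
    y = [1, 1, 0, 0]
    s = [0.9, 0.4, 0.6, 0.1]
    print(roc_auc(y, s))
```

The pairwise form is quadratic in the number of peptides, which is fine for an illustration; a sort-based implementation would be preferable for data sets of the size reported in the tables.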