Tianyi Zhao, Liang Cheng, Tianyi Zang, Yang Hu.
Abstract
Peptide-based vaccine development requires accurate prediction of the binding affinity between major histocompatibility complex class I (MHC I) proteins and their peptide ligands. A growing number of machine learning methods have been developed to predict binding affinity, and some have become popular tools. However, most of them are built on shallow neural networks. Bengio has argued that deep neural networks can learn better fits with less data than shallow neural networks, which matters in our case because some alleles have only dozens of peptides. In addition, we transform each peptide into a feature matrix and feed it into the model. When the input is a matrix, a convolutional neural network (CNN) can learn the most critical features by itself, so a CNN is better suited to predicting binding affinity than a traditional neural network model. Unlike previous studies, which are based on the blocks substitution matrix (BLOSUM), we used novel features for prediction. Because the order of the sequence, the hydropathy index, the polarity, and the length of the peptide could all affect binding affinity, and because these amino acid properties are key factors in binding to MHC, we extracted this information from each peptide. To make full use of the available data, we integrated peptides of different lengths into 15mers based on the binding mode of peptides to MHC I. To demonstrate that our method predicts peptide-MHC binding reliably, we compared it with several popular methods. The experiments show the superiority of our method.
Keywords: convolutional neural network; deep learning; epitope prediction; human leukocyte antigen; peptide-major histocompatibility complex class I binding prediction
Year: 2019 PMID: 31850062 PMCID: PMC6892951 DOI: 10.3389/fgene.2019.01191
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Binding of major histocompatibility complex I (MHC I) molecules to affinity peptides.
Figure 2. Encoding peptides of different lengths.
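The abstract states that peptides of different lengths are integrated into 15mers based on the binding mode of peptides to MHC I, but the record does not spell out the exact scheme. The sketch below is one possible reading, not the authors' implementation: it assumes that the N- and C-terminal residues stay at the ends and that the placeholder `X` fills the middle. The function name `pad_to_15mer` is hypothetical.

```python
PAD = "X"        # placeholder symbol; the tables give it hydropathy 0 and class 0
TARGET_LEN = 15  # fixed length stated in the abstract


def pad_to_15mer(peptide: str, target_len: int = TARGET_LEN) -> str:
    """Pad a peptide to target_len by inserting X in the middle.

    Assumption (not confirmed by the source): both termini keep their
    positions and the padding is placed between the two halves.
    """
    if not 0 < len(peptide) <= target_len:
        raise ValueError(f"peptide length must be in 1..{target_len}")
    n_pad = target_len - len(peptide)
    left = (len(peptide) + 1) // 2  # residues kept on the N-terminal side
    return peptide[:left] + PAD * n_pad + peptide[left:]


if __name__ == "__main__":
    for pep in ("SIINFEKL", "GILGFVFTL"):
        print(pep, "->", pad_to_15mer(pep))
```

With this layout an 8mer and a 9mer both map to strings of length 15, so peptides of every supported length can share one input shape.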
Hydropathy Index of 21 amino acids.
| Amino acid | Hydropathy index | Amino acid | Hydropathy index |
|---|---|---|---|
| R | −2.5 | K | −1.5 |
| D | −0.9 | Q | −0.85 |
| N | −0.78 | E | −0.74 |
| H | 0.40 | S | −0.18 |
| T | −0.05 | P | 0.12 |
| Y | 0.26 | C | 0.29 |
| G | 0.48 | A | 0.62 |
| M | 0.64 | W | 0.81 |
| L | 1.1 | V | 1.1 |
| F | 1.2 | I | 1.4 |
| X | 0 | | |
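As an illustration of how the table above could be applied, the following sketch maps each residue of a peptide to its hydropathy index. The values are copied from the table as listed. The function name `hydropathy_profile` is hypothetical, and the record does not say how the authors arrange these values inside their feature matrix.

```python
# Hydropathy index per residue, copied from the table above (X is the placeholder).
HYDROPATHY = {
    "R": -2.5, "K": -1.5, "D": -0.9, "Q": -0.85, "N": -0.78, "E": -0.74,
    "H": 0.40, "S": -0.18, "T": -0.05, "P": 0.12, "Y": 0.26, "C": 0.29,
    "G": 0.48, "A": 0.62, "M": 0.64, "W": 0.81, "L": 1.1, "V": 1.1,
    "F": 1.2, "I": 1.4, "X": 0.0,
}


def hydropathy_profile(peptide: str) -> list[float]:
    """Return the hydropathy index of every residue, in sequence order."""
    profile = []
    for aa in peptide.upper():
        if aa not in HYDROPATHY:
            raise ValueError(f"unknown residue symbol: {aa!r}")
        profile.append(HYDROPATHY[aa])
    return profile


if __name__ == "__main__":
    print(hydropathy_profile("SIINXXXXXXXFEKL"))
```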
Five classes of amino acids based on polarity.
| Class | Label | Amino acids |
|---|---|---|
| NONE | 0 | X |
| Polarity without charge | 1 | A, G, I, L, F, P, V |
| Non-polarity | 2 | N, C, Q, S, T, W, Y, M |
| Negative charge (acidity) | 3 | D, E |
| Positive charge (alkalinity) | 4 | R, H, K |
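The class labels above can be turned into a per-residue encoding. The sketch below uses the labels exactly as listed in the table. Expanding each label into a one-hot vector is an assumption made for illustration, since the record does not state how the labels enter the model. The names `polarity_labels` and `one_hot` are hypothetical.

```python
# Label -> residues, copied from the table above.
POLARITY_CLASSES = {
    0: "X",
    1: "AGILFPV",
    2: "NCQSTWYM",
    3: "DE",
    4: "RHK",
}

# Invert to a residue -> label lookup.
POLARITY_LABEL = {
    aa: label for label, group in POLARITY_CLASSES.items() for aa in group
}


def polarity_labels(peptide: str) -> list[int]:
    """Return the polarity class label of every residue, in sequence order."""
    return [POLARITY_LABEL[aa] for aa in peptide.upper()]


def one_hot(labels: list[int], n_classes: int = 5) -> list[list[int]]:
    """Expand integer labels into one-hot rows (an assumed encoding)."""
    return [[1 if k == label else 0 for k in range(n_classes)] for label in labels]


if __name__ == "__main__":
    labels = polarity_labels("ADKX")
    print(labels)
    print(one_hot(labels))
```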
Figure 3. Detailed flow of generating the training set and the testing set.
Figure 4. The structure of the convolutional neural network.
Detailed information of data.
| Name | Source |
|---|---|
| IEDB affinity data | |
| BD2013 | |
| MS data | |
Figure 5. The distribution of the number of peptides across the 193 alleles.
Figure 6. Length preference of the 193 alleles.
Prediction results for human leukocyte antigen class I (HLA-I) alleles.
(A) Summary of prediction results for HLA-A alleles (F1 score)

| | CNN-NF | DCNN | NetMHCPan | SMM | ANN | PickPocket |
|---|---|---|---|---|---|---|
| Mean | 0.643 | 0.638 | 0.608 | 0.601 | 0.579 | 0.561 |
| Median | 0.603 | 0.696 | 0.667 | 0.667 | 0.667 | 0.625 |
| Standard deviation | 0.166 | 0.230 | 0.267 | 0.250 | 0.286 | 0.318 |

| | CNN-NF | DCNN | NetMHCPan | SMM | ANN | PickPocket |
|---|---|---|---|---|---|---|
| Mean | 0.692 | 0.593 | 0.606 | 0.578 | 0.606 | 0.560 |
| Median | 0.621 | 0.667 | 0.625 | 0.615 | 0.643 | 0.593 |
| Standard deviation | 0.228 | 0.286 | 0.286 | 0.302 | 0.290 | 0.277 |
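For readers who want to reproduce summary rows of this kind, the sketch below computes an F1 score from confusion counts and then the mean, median, and standard deviation over a list of per-allele scores. It uses only the Python standard library. The counts and scores in the usage lines are made up, and using the sample standard deviation (`statistics.stdev`) is an assumption, since the record does not say which variant the table reports.

```python
from statistics import mean, median, stdev


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarise(scores: list[float]) -> dict[str, float]:
    """Mean, median, and sample standard deviation of per-allele scores."""
    return {"mean": mean(scores), "median": median(scores), "std": stdev(scores)}


if __name__ == "__main__":
    # Hypothetical (tp, fp, fn) counts for three alleles, for illustration only.
    counts = [(8, 2, 2), (5, 5, 1), (3, 1, 6)]
    per_allele_f1 = [f1_score(*c) for c in counts]
    print(summarise(per_allele_f1))
```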
Figure 7. The distribution ratio of F1 scores.
Figure 8. AUC of each allele.
Figure 9. The distribution of AUC in 193 experiments.
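The AUC reported per allele is the area under the ROC curve. As a reminder of what that number measures, the sketch below computes it in its rank form: the fraction of (binder, non-binder) pairs in which the binder gets the higher score, with ties counted as one half. The function name `roc_auc` and the example labels and scores are hypothetical. The record does not say which tool the authors used to compute AUC.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 1 = binder, 0 = non-binder; scores are predicted binding scores.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

The pairwise loop is quadratic in the number of peptides, which is acceptable for alleles with only dozens of peptides. A sort-based variant would scale better on larger sets.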
Figure 10. Predicted length preference of HLA-A*24:06.
Figure 12. Predicted length preference of HLA-C*05:01.