Ziqi Chen1, Martin Renqiang Min2, Xia Ning1,3,4.
Abstract
T-cell receptors can recognize foreign peptides bound to major histocompatibility complex (MHC) class-I proteins and thus trigger the adaptive immune response. Identifying peptides that can bind to MHC class-I molecules therefore plays a vital role in the design of peptide vaccines. Many computational methods, for example, the state-of-the-art allele-specific method MHCflurry, have been developed to predict the binding affinities between peptides and MHC molecules. In this manuscript, we develop two allele-specific Convolutional Neural Network-based methods named ConvM and SpConvM to tackle the binding prediction problem. Specifically, we formulate the problem as optimizing the rankings of peptide-MHC bindings via ranking-based learning objectives. Such optimization is more robust and tolerant to measurement inaccuracy in binding affinities, and therefore enables more accurate prioritization of binding peptides. In addition, we develop a new position encoding method in ConvM and SpConvM to better identify the most important amino acids for the binding events. We conduct a comprehensive set of experiments using the latest Immune Epitope Database (IEDB) datasets. Our experimental results demonstrate that our models significantly outperform state-of-the-art methods including MHCflurry, with an average percentage improvement of 6.70% on AUC and 17.10% on ROC5 across 128 alleles.
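The ranking-based learning objective mentioned in the abstract can be illustrated with a minimal pairwise hinge loss: for any two peptides with different binding levels, the higher-level peptide should be scored higher by a margin. This is only a sketch of the general idea; the exact objective used by ConvM and SpConvM may differ, and `pairwise_ranking_loss` is a hypothetical name.

```python
def pairwise_ranking_loss(scores, levels, margin=1.0):
    """Hinge-style pairwise ranking loss (illustrative sketch).

    For every ordered pair (i, j) where peptide i has a strictly higher
    binding level than peptide j, penalize the model unless i is scored
    at least `margin` higher than j.
    """
    loss, pairs = 0.0, 0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if levels[i] > levels[j]:  # i should be ranked above j
                loss += max(0.0, margin - (scores[i] - scores[j]))
                pairs += 1
    # Average over the number of comparable pairs (0.0 if none exist).
    return loss / pairs if pairs else 0.0
```

Because only the relative order of scores matters, such an objective is insensitive to the absolute scale of the measured affinities, which is one way to read the abstract's robustness claim.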
Keywords: attention; convolutional neural networks; deep learning; peptide vaccine design; prioritization
Year: 2021 PMID: 34079815 PMCID: PMC8165219 DOI: 10.3389/fmolb.2021.634836
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
Binding affinity measurement mapping.
| Binding category | Affinity range (nM) | Level |
|---|---|---|
| Negative | >5,000 | 1 |
| Positive-low | 1,000–5,000 | 2 |
| Positive-intermediate | 500–1,000 | 3 |
| Positive | 100–500 | 4 |
| Positive-high | 0–100 | 5 |
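The mapping table above can be written as a small lookup function. This is a sketch: the function name is hypothetical, and the assignment of values falling exactly on a threshold (e.g., 500 nM) to the lower level is an assumption, since the table's ranges do not specify boundary handling.

```python
def binding_level(affinity_nm: float) -> int:
    """Map a measured binding affinity (nM) to the 5-point binding level
    from the paper's mapping table. Boundary values are assigned to the
    weaker-binding level (an assumption; the table leaves this open)."""
    if affinity_nm > 5000:   # Negative
        return 1
    if affinity_nm > 1000:   # Positive-low
        return 2
    if affinity_nm > 500:    # Positive-intermediate
        return 3
    if affinity_nm > 100:    # Positive
        return 4
    return 5                 # Positive-high (0-100 nM)
```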
Data statistics.
| Variables | Count |
|---|---|
| Entries | 202,510 |
| Alleles | 128 |
| Peptides | 53,253 |
Notations.
| Symbol | Description |
|---|---|
| **Peptides and alleles** | |
|  | A peptide |
|  | An amino acid of a peptide sequence |
|  | A set of peptides |
|  | An allele |
| **Binding** | |
|  | Original/normalized binding affinity for a peptide-MHC pair |
|  | Binding level for a peptide-MHC pair |
| **Embeddings** | |
|  | Encoding vector of an amino acid type |
|  | Embedding vector of each amino acid |
|  | Position embedding of each amino acid |
|  | Feature matrix for a peptide sequence |
|  | Feature matrix for a padded peptide sequence (i.e., input of the global kernel) |
| **Parameters** | |
|  | A scoring function |
|  | Dimension of amino acid embedding |
|  | Number of filters in the convolution layer |
|  | Dimension of position embedding |
|  | Number of global kernels |
|  | Dimension of hidden units |
|  | Kernel size in the convolutional layer |
|  | Attention weight learned in the attention layer |
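The embedding notations above (amino-acid embedding, position embedding, feature matrix for a padded peptide) can be sketched as follows. This is an illustration only: the embeddings here are random stand-ins for learned parameters, the concatenation of amino-acid and position embeddings is an assumption about how the two are combined, and `build_feature_matrix` is a hypothetical name.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids

def build_feature_matrix(peptide, d_embed=4, d_pos=2, max_len=15, seed=0):
    """Sketch: represent each residue by an amino-acid embedding
    concatenated with a position embedding, yielding a
    max_len x (d_embed + d_pos) feature matrix for a zero-padded peptide."""
    rng = random.Random(seed)
    # Random stand-ins for learned embedding tables.
    aa_emb = {a: [rng.gauss(0, 1) for _ in range(d_embed)] for a in AMINO_ACIDS}
    pos_emb = [[rng.gauss(0, 1) for _ in range(d_pos)] for _ in range(max_len)]
    rows = []
    for i in range(max_len):
        if i < len(peptide):
            rows.append(aa_emb[peptide[i]] + pos_emb[i])
        else:
            rows.append([0.0] * (d_embed + d_pos))  # zero-pad to fixed length
    return rows
```

Padding every peptide to a fixed length is what makes the matrix a valid input for convolution kernels of a fixed size.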
FIGURE 1 Architectures of ConvM and SpConvM.
Overall performance comparison (H; ++).
| Model |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|
|  | 7.93 | 4.71 | 2.80 | 5.48 | 5.13 | 8.43 | 7.26 |
|  | 5.63 | 5.47 | 1.66 | 3.59 | 4.56 | 7.11 | 4.65 |
|  | 6.35 | 5.70 | 0.99 | 2.59 | 4.16 | 4.69 | 4.42 |
| MS | −6.26 | 0.02 | −7.87 | −3.98 | 0.16 | −3.34 | −3.94 |
|  | 8.97 | 8.64 | 6.57 | 7.36 | 6.04 | 12.89 | 10.85 |
|  | 10.01 | 8.87 | 4.73 | 6.00 | 6.00 | 14.01 | 11.36 |
| MS | 8.66 | 8.14 | 2.77 | 4.28 | 3.93 | 13.54 | 9.68 |
|  | 11.06 | 8.93 | 5.60 | 5.20 | 4.42 | 11.10 | 9.51 |
|  | 9.45 | 5.77 | 5.09 | 4.43 | 4.72 | 8.05 | 6.95 |
|  | 8.83 | 6.35 | 4.54 | 5.73 | 4.52 | 7.10 | 5.88 |
| MS | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
The values in the table are percentage improvements over the baseline with MS. Models are trained using the ++ encoding methods, selected with respect to the selection metric, and evaluated using the 7 evaluation metrics. The best improvement with respect to each metric is in bold.
Overall performance comparison across HLA-A and HLA-B alleles (; ++).
| Model |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|
| **HLA-A** |  |  |  |  |  |  |  |
|  | 0.56 | 3.38 | 1.32 | 4.68 | 2.04 | 2.04 | −0.43 |
|  | −3.12 | 1.06 | −2.44 | 1.12 | 0.95 | −1.27 | −3.03 |
|  | −4.23 | 3.38 | −3.41 | −2.02 | 0.76 | −3.93 | −4.62 |
| MS | −4.79 | 1.36 | −5.41 | −0.22 | −0.04 | 1.71 | −0.36 |
|  | 1.94 | 2.61 |  | 4.34 |  |  |  |
|  | 3.73 | 2.66 | 3.10 | 4.81 |  | 6.10 | 1.79 |
|  | 3.28 | 4.47 | 3.02 | 1.90 | 2.54 | 4.14 | 0.51 |
| MS | −1.40 | 3.02 | −2.74 | −0.51 | 0.84 | 8.53 | 3.76 |
|  | 2.75 | 2.22 | 2.87 |  | 1.87 | 3.61 |  |
|  | 2.43 | 1.65 | 2.39 | 4.37 | 2.29 | 1.30 | 0.50 |
|  | 2.51 | 1.64 | 2.21 | 0.57 | 1.94 | 0.11 | −0.57 |
| MS | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| **HLA-B** |  |  |  |  |  |  |  |
|  |  |  | 0.71 | 5.38 | 2.45 | 11.20 | 3.87 |
|  | −0.81 | −1.63 | −1.43 | 1.57 | 2.18 | 5.47 | 0.19 |
|  | 0.27 | −2.90 | −4.44 | 3.44 | 2.29 | −0.42 | −1.21 |
| MS | −8.87 | 2.99 | −13.18 | −4.75 | −1.69 | 0.75 | −4.06 |
|  | 4.63 | 4.45 |  | 7.43 | 3.24 |  |  |
|  | 3.53 | 2.37 | 5.85 |  | 3.06 | 15.48 | 8.95 |
|  | 7.08 | −1.05 | 4.79 | 9.04 |  | 15.77 | 8.31 |
| MS | −1.44 | −2.19 | −3.57 | 2.56 | 0.42 | 8.73 | 4.16 |
|  | 4.10 | 5.21 | 5.04 | 8.04 | 2.96 | 12.14 | 6.58 |
|  | −2.16 | −0.99 | 3.89 | 7.41 | 2.17 | 11.69 | 5.31 |
|  | 3.51 | 2.16 | 3.02 | 4.78 | 1.49 | 7.52 | 3.05 |
| MS | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
The values in the table are percentage improvements over the baseline with MS. Models are trained using the ++ encoding methods, selected with respect to the selection metric, and evaluated using the 7 evaluation metrics. The best improvement with respect to each metric is in bold.
FIGURE 2 Performance improvement compared with the baseline with MS across all alleles.
Encoding performance comparison.
| Encoding |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|
|  | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
|  | −6.92 | −4.38 | −4.97 | −2.37 | −0.73 | −5.91 | −4.63 |
|  | −3.33 | 0.69 | −2.76 | −0.56 | −0.15 | −1.22 | −1.63 |
|  | −0.96 | 0.95 | 0.21 | 0.79 | 0.3 | 1.57 | 0.89 |
|  |  |  | 0.69 | 1.49 | 0.61 | 3.39 | 2.49 |
|  | −4.72 | −1.65 | −3.37 | −1.26 | −0.32 | −3.25 | −2.67 |
|  | −0.36 | 0.11 |  |  |  |  |  |
The values in the table are percentage improvements over the baseline encoding. Models are selected with respect to the selection metric and evaluated using the 7 evaluation metrics. The best improvement with respect to each metric is in bold.
FIGURE 3 Attention weights and motifs for HLA-A*02:01 and HLA-A*24:02.
FIGURE 4 Attention weights and motifs for HLA-B*27:05 and HLA-B*58:01.
FIGURE 5 Attention weights of three peptides for HLA-A*02:01 learned by the model.
FIGURE 6 Performance comparison with and without position encoding.
Performance comparison over the mass spectrometry dataset.
| Allele |  |  |  |
|---|---|---|---|
| HLA-A*01:01 |  | 0.6578 | 0.7700 |
| HLA-A*02:01 |  | 0.6182 | 0.6516 |
| HLA-A*02:03 |  | 0.7060 | 0.6984 |
| HLA-A*02:07 | 0.2645 |  | 0.4608 |
| HLA-A*03:01 |  | 0.5238 | 0.5876 |
| HLA-A*24:02 |  | 0.6432 | 0.7257 |
| HLA-A*29:02 |  | 0.6334 | 0.7007 |
| HLA-A*31:01 | 0.3989 |  | 0.4209 |
| HLA-A*68:02 | 0.4975 |  | 0.4960 |
| HLA-B*35:01 | 0.6443 | 0.6119 |  |
| HLA-B*44:02 | 0.7213 | 0.6952 |  |
| HLA-B*44:03 |  | 0.6414 | 0.7478 |
| HLA-B*51:01 | 0.7104 | 0.6305 | 0.6248 |
| HLA-B*54:01 | 0.6371 | 0.5882 | 0.6230 |
| HLA-B*57:01 | 0.6223 | 0.5331 | 0.5952 |
The best performance for each allele is in bold. The second-best performance for each allele is underlined.
Performance comparison over the mass spectrometry dataset.
| Allele |  |  |  |
|---|---|---|---|
| HLA-A*01:01 | 0.9854 |  | 0.9864 |
| HLA-A*02:01 |  | 0.9775 | 0.9798 |
| HLA-A*02:03 |  | 0.9879 | 0.9878 |
| HLA-A*02:07 | 0.9176 |  | 0.9339 |
| HLA-A*03:01 |  | 0.9632 | 0.9648 |
| HLA-A*24:02 |  | 0.9857 | 0.9895 |
| HLA-A*29:02 | 0.9580 | 0.9549 |  |
| HLA-A*31:01 | 0.9276 |  | 0.9204 |
| HLA-A*68:02 | 0.8039 |  | 0.7930 |
| HLA-B*35:01 | 0.8765 |  | 0.8744 |
| HLA-B*44:02 |  | 0.9770 | 0.9791 |
| HLA-B*44:03 |  | 0.9696 | 0.9723 |
| HLA-B*51:01 |  | 0.9275 | 0.9195 |
| HLA-B*54:01 | 0.9255 | 0.9341 | 0.9264 |
| HLA-B*57:01 |  | 0.8667 | 0.8756 |
The best performance for each allele is in bold. The second-best performance for each allele is underlined.
FIGURE 7 Comparison between the mass spectrometry dataset and the IEDB dataset.