Limin Jiang, Jijun Tang, Fei Guo, Yan Guo.
Abstract
As an important part of immune surveillance, the major histocompatibility complex (MHC) is a set of proteins that recognize foreign molecules. Computational prediction methods for MHC binding peptides have been developed. However, existing methods share the limitation of a fixed peptide sequence length, which necessitates either training separate models for each peptide length or predicting with a length reduction technique. Using a bidirectional long short-term memory neural network, we constructed BVMHC, an MHC class I and II binding prediction tool that is independent of peptide length. The performance of BVMHC was compared to seven MHC class I prediction tools and three MHC class II prediction tools using eight performance criteria independently. BVMHC attained the best performance in three of the eight criteria for MHC class I, and the best performance in four of the eight criteria for MHC class II, including accuracy and AUC. Furthermore, models for non-human species were also trained using the same strategy and made available for applications in mice, chimpanzees, macaques, and rats. BVMHC is composed of a series of peptide-length-independent MHC class I and II binding predictors. Models from this study have been implemented in an online web portal for easy access and use.
Keywords: bidirectional long short-term memory neural network; deep learning; major histocompatibility complex
Year: 2022 PMID: 35741369 PMCID: PMC9220200 DOI: 10.3390/biology11060848
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
Figure 1. Overview of BVMHC. One-hot encoding was used to convert a peptide sequence to a matrix. BLOSUM was applied to initialize kernels in the convolutional neural network that was used to extract the peptide sequence feature at the evolutionary level. The biLSTM model was then applied to process the merged matrix at the sequential level.
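The one-hot encoding step described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the residue ordering in `AMINO_ACIDS`, the function name `one_hot_encode`, and the choice to reject non-standard residues are all assumptions made for the example.

```python
# Assumed alphabet: the 20 standard amino acids, ordered by one-letter code.
# The ordering used by BVMHC itself is not stated in the source.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide):
    """Convert a peptide string into a len(peptide) x 20 matrix of 0/1 values.

    Each row corresponds to one residue and contains a single 1 in the
    column of that residue. Peptides of any length are accepted, so an
    8-mer gives 8 rows and a 15-mer gives 15 rows.
    """
    matrix = []
    for residue in peptide.upper():
        if residue not in AA_INDEX:
            # Hypothetical policy: fail loudly on non-standard residues.
            raise ValueError(f"unknown residue: {residue!r}")
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[residue]] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    encoded = one_hot_encode("SIINFEKL")  # example 8-mer peptide
    print(len(encoded), "x", len(encoded[0]))  # 8 x 20
```

Because the number of rows follows the peptide length, the resulting matrix is the kind of variable-length input that a sequence model such as a biLSTM can consume without trimming peptides to a fixed size.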
Figure 2. BVMHC performance on human datasets and binding motifs of a few extremity models. (A,B) The performance of BVMHC on the training dataset for predicting human MHC Class I (A) and II binders (B) in five-fold cross-validation. (C) The motifs of binders and non-binders for MHC Class I allele HLA-A*02:50. (D) The motifs of binders and non-binders for MHC Class I allele HLA-B*15:02. (E) The motifs of binders and non-binders for MHC Class II allele HLA-DQB1*05:01.
Five-fold cross-validation results stratified by peptide length.
| Class | Length | Accuracy | AUC | F1 | MCC | Specificity | Sensitivity | Precision | AUPR | Positive 1 | Negative 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Class I | 8 mer | 0.891 | 0.924 | 0.783 | 0.531 | 0.887 | 0.677 | 0.525 | 0.785 | 229 | 1879 |
| | 9 mer | 0.883 | 0.915 | 0.745 | 0.650 | 0.902 | 0.735 | 0.760 | 0.800 | 23,000 | 72,963 |
| | 10 mer | 0.813 | 0.850 | 0.693 | 0.527 | 0.842 | 0.690 | 0.661 | 0.725 | 7263 | 14,024 |
| | 11 mer | 0.879 | 0.905 | 0.768 | 0.608 | 0.881 | 0.756 | 0.651 | 0.755 | 310 | 1604 |
| | Others | 0.986 | 1.000 | 0.992 | 0.564 | 0.750 | 1.000 | 0.985 | 1.000 | 54 | 803 |
| Class II | 13 mer | 0.857 | 0.879 | 0.883 | 0.700 | 0.833 | 0.872 | 0.895 | 0.923 | 232 | 205 |
| | 14 mer | 0.898 | 0.907 | 0.880 | 0.792 | 0.912 | 0.880 | 0.880 | 0.873 | 131 | 239 |
| | 15 mer | 0.868 | 0.906 | 0.781 | 0.687 | 0.912 | 0.769 | 0.794 | 0.840 | 16,743 | 25,683 |
| | 16 mer | 0.776 | 0.846 | 0.802 | 0.545 | 0.718 | 0.823 | 0.782 | 0.878 | 563 | 569 |
| | 17 mer | 0.680 | 0.673 | 0.429 | 0.312 | 0.933 | 0.300 | 0.750 | 0.643 | 106 | 257 |
| | 18 mer | 0.643 | 0.939 | 0.706 | 0.452 | 1.000 | 0.545 | 1.000 | 0.986 | 71 | 40 |
| | 19 mer | 0.875 | 0.938 | 0.857 | 0.775 | 1.000 | 0.750 | 1.000 | 0.950 | 55 | 75 |
| | 20 mer | 0.750 | 0.900 | 0.500 | 0.488 | 1.000 | 0.333 | 1.000 | 0.886 | 65 | 66 |
| | Other | 0.690 | 0.640 | 0.381 | 0.183 | 0.758 | 0.444 | 0.333 | 0.566 | 81 | 259 |
1 Number of positives; 2 Number of negatives
The comparison results of the BVMHC model against seven other prediction tools on the independent validation dataset. The best performance value in each comparison track is highlighted in bold text.
| Class | Methods | Accuracy | Sensitivity | Specificity | AUC | AUPR | F1 | MCC | Precision | Positive 1 | Negative 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Class I | BVMHC | 0.597 | 0.371 | … | … | 0.866 | 0.531 | … | … | 197 | 123 |
| | NetMHCcons | … | … | 0.943 | 0.865 | 0.890 | … | 0.365 | 0.916 | 197 | 123 |
| | SMM | 0.584 | 0.350 | … | 0.859 | … | 0.509 | 0.357 | 0.932 | 197 | 123 |
| | NetMHCpan | 0.566 | 0.330 | 0.943 | 0.867 | 0.886 | 0.483 | 0.318 | 0.903 | 197 | 123 |
| | ANN | 0.563 | 0.325 | 0.943 | 0.867 | 0.880 | 0.478 | 0.314 | 0.901 | 197 | 123 |
| | PickPocket | 0.563 | 0.345 | 0.911 | 0.813 | 0.833 | 0.493 | 0.289 | 0.861 | 197 | 123 |
| | NetMHCpan EL | 0.553 | 0.335 | 0.902 | 0.816 | 0.856 | 0.480 | 0.269 | 0.846 | 197 | 123 |
| | comblib_sidney2008 | NAN § | NAN § | NAN § | 0.744 | NAN § | NAN § | NAN § | NAN § | 68 | 46 |
| Class II | BVMHC | … | … | 0.965 | 0.718 | 0.417 | … | … | 0.600 | 18 | 113 |
| | NN-align | 0.863 | 0.278 | 0.956 | … | … | 0.357 | 0.303 | 0.500 | 18 | 113 |
| | NetMHCIIpan | 0.870 | 0.111 | … | 0.795 | 0.423 | 0.190 | 0.235 | … | 18 | 113 |
| | SMM-align | 0.840 | 0.000 | 0.973 | 0.787 | 0.319 | NA § | −0.061 | 0.000 | 18 | 113 |
1 Number of positives; 2 Number of negatives. § NA: the sum of Sensitivity and Precision is zero, thus F1 is NA. § NAN: the evaluation indices cannot be obtained because the original score threshold is not available. The values in bold are the best for each column. … marks a value that is missing from this copy of the table.
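The threshold-based criteria in the table, and the NA rule for F1 in the footnote, can be illustrated with a short sketch. The function name `binary_metrics` is hypothetical, and the confusion-matrix counts in the usage example (TP = 0, FP = 3, TN = 110, FN = 18) are not given in the source; they are assumed here because they are consistent with the 18 positives, 113 negatives, and the values reported for SMM-align. Returning 0.0 for precision when nothing is predicted positive is also a convention chosen for this example.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute threshold-based criteria from confusion-matrix counts.

    Assumes at least one positive and one negative sample.
    F1 is None (reported as NA) when sensitivity + precision is zero,
    and MCC is None when its denominator is zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Assumed convention: precision is 0.0 when no sample is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    if sensitivity + precision == 0:
        f1 = None  # undefined, matching the NA footnote
    else:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else None

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Assumed counts, chosen to be consistent with the SMM-align row.
    m = binary_metrics(tp=0, fp=3, tn=110, fn=18)
    for name, value in m.items():
        print(name, "NA" if value is None else round(value, 3))
```

With these assumed counts the sketch gives accuracy 0.840, specificity 0.973, MCC −0.061, and an undefined F1, which agrees with the SMM-align row. AUC and AUPR are not covered, since they depend on the full ranking of prediction scores rather than on a single threshold.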
Figure 3. Receiver operating characteristic (ROC) curves of the eight tools for predicting MHC class I binders on the independent validation dataset. (A–D) BVMHC and seven existing prediction tools for overall (A), 9-mer (B), 10-mer (C), and 11-mer (D) MHC class I binders.
Performance evaluation results of BVMHC model on non-human species.
| Class | Alleles | Accuracy | AUC | F1 | MCC | Specificity | Sensitivity | Precision | AUPR |
|---|---|---|---|---|---|---|---|---|---|
| Class I | H-2-Db | 0.829 | 0.855 | 0.573 | 0.466 | 0.897 | 0.564 | 0.583 | 0.602 |
| | H-2-Dd | 0.924 | 0.870 | 0.696 | 0.660 | 0.975 | 0.615 | 0.800 | 0.751 |
| | H-2-Ld | 0.814 | 0.852 | 0.698 | 0.564 | 0.875 | 0.682 | 0.714 | 0.779 |
| | Mamu-A07 | 0.905 | 0.949 | 0.854 | 0.783 | 0.929 | 0.854 | 0.854 | 0.902 |
| | Mamu-A11 | 0.822 | 0.899 | 0.726 | 0.595 | 0.880 | 0.707 | 0.747 | 0.805 |
| | Mamu-A2201 | 0.908 | 0.957 | 0.854 | 0.789 | 0.955 | 0.814 | 0.897 | 0.943 |
| | Mamu-B01 | 0.942 | 0.865 | 0.667 | 0.654 | 0.988 | 0.550 | 0.846 | 0.767 |
| | Mamu-B03 | 0.857 | 0.921 | 0.769 | 0.666 | 0.903 | 0.758 | 0.781 | 0.843 |
| | Mamu-B08 | 0.852 | 0.911 | 0.690 | 0.600 | 0.875 | 0.769 | 0.625 | 0.776 |
| | Mamu-B17 | 0.822 | 0.882 | 0.717 | 0.592 | 0.838 | 0.782 | 0.662 | 0.710 |
| | Mamu-B52 | 0.827 | 0.870 | 0.870 | 0.617 | 0.677 | 0.912 | 0.832 | 0.884 |
| | Patr-A0101 | 0.816 | 0.838 | 0.619 | 0.520 | 0.935 | 0.520 | 0.765 | 0.688 |
| | Patr-A0401 | 0.881 | 0.904 | 0.636 | 0.565 | 0.929 | 0.636 | 0.636 | 0.616 |
| | Patr-A0701 | 0.825 | 0.820 | 0.545 | 0.438 | 0.901 | 0.522 | 0.571 | 0.682 |
| | Patr-B0101 | 0.911 | 0.947 | 0.794 | 0.759 | 0.991 | 0.675 | 0.964 | 0.894 |
| | Patr-B1301 | 0.875 | 0.917 | 0.903 | 0.727 | 0.824 | 0.903 | 0.903 | 0.951 |
| | RT1A | 0.893 | 0.923 | 0.400 | 0.352 | 0.923 | 0.500 | 0.333 | 0.667 |
| Class II | H-2-IAb | 0.826 | 0.797 | 0.489 | 0.394 | 0.925 | 0.423 | 0.579 | 0.627 |
| | H-2-IAd | 0.810 | 0.810 | 0.571 | 0.452 | 0.896 | 0.533 | 0.615 | 0.632 |