Alexander G B Grønning, Tim Kacprowski, Camilla Schéele.
Abstract
Peptide-based therapeutics are here to stay and will prosper in the future. A key step in identifying novel peptide drugs is the determination of their bioactivities. Recent advances in peptidomics screening approaches hold promise as a strategy for identifying novel drug targets. However, these screenings typically generate an immense number of peptides, and tools for ranking these peptides prior to planning functional studies are warranted. Whereas a couple of tools in the literature predict multiple classes, these are constructed using multiple binary classifiers. Here, we aimed to use an innovative deep learning approach to generate an improved peptide bioactivity classifier capable of distinguishing between multiple classes. We present MultiPep: a deep learning multi-label classifier that assigns peptides to zero or more of 20 bioactivity classes. We train and test MultiPep on data from several publicly available databases. The same data are used for hierarchical clustering, whose dendrogram shapes the architecture of MultiPep. We test a new loss function that combines a customized version of the Matthews correlation coefficient (MCC) with binary cross entropy (BCE), and show that it performs better than class-weighted BCE as a loss function. Further, we show that MultiPep surpasses state-of-the-art peptide bioactivity classifiers and that it predicts known and novel bioactivities of FDA-approved therapeutic peptides. In conclusion, we present innovative machine learning techniques used to produce a peptide prediction tool to aid peptide-based therapy development and hypothesis generation.
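The abstract describes a loss that combines a customized MCC with BCE but does not give its formula. As a rough illustration only, the sketch below adds BCE to one minus a "soft" MCC computed from predicted probabilities instead of hard labels. The soft-count formulation, the equal weighting of the two terms, and the function names are assumptions, not the authors' definition.

```python
import numpy as np

def bce(y_true, y_prob, eps=1e-7):
    """Mean binary cross entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(y_prob, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def soft_mcc(y_true, y_prob, eps=1e-7):
    """MCC computed from soft confusion counts (assumed surrogate, not the paper's version)."""
    tp = np.sum(y_true * y_prob)
    tn = np.sum((1 - y_true) * (1 - y_prob))
    fp = np.sum((1 - y_true) * y_prob)
    fn = np.sum(y_true * (1 - y_prob))
    numerator = tp * tn - fp * fn
    denominator = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) + eps
    return float(numerator / denominator)

def combined_loss(y_true, y_prob):
    """BCE plus (1 - soft MCC); the 1:1 weighting is an arbitrary choice for this sketch."""
    return bce(y_true, y_prob) + (1.0 - soft_mcc(y_true, y_prob))

labels = np.array([1.0, 0.0, 1.0, 0.0])
good = combined_loss(labels, np.array([0.9, 0.1, 0.8, 0.2]))
bad = combined_loss(labels, np.array([0.2, 0.8, 0.1, 0.9]))
print(good < bad)
```

Because MCC uses all four confusion-matrix cells, a term of this kind penalizes a model that ignores a rare class, which is one plausible motivation for pairing it with BCE on imbalanced labels.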
Keywords: deep learning; machine learning; peptide bioactivity prediction; peptide therapeutics
Year: 2021 PMID: 34909478 PMCID: PMC8665375 DOI: 10.1093/biomethods/bpab021
Source DB: PubMed Journal: Biol Methods Protoc ISSN: 2396-8923
State-of-the-art binary classifiers for bioactivity prediction
| Prediction class | Algorithm type(s) | Name/description of tool |
|---|---|---|
| General bioactivity | Neural network | PeptideRanker |
| Antimicrobial peptides | Neural network | Convolutional long short-term memory neural network |
| | | Deep-AmPEP30 |
| Anticancer peptides | Support vector machine | mACPpred |
| Neuropeptide peptides | Various | PredNeuroP |
| Toxic peptides | Support vector machine | ToxinPred |
| Hemolytic peptides | Various | HLPpred-Fuse |
| Antioxidant peptides | Neural network | AnOxPePred |
Figure 1: Dendrogram template and overall architecture of the convolutional neural network. (A) Dendrogram template. From the bottom, the five class clades can be seen with the bioactivity classes of the clades written within each of them. Above the class clades are connecting levels that amalgamate all the class clades and complete the dendrogram. For visualization purposes, not all leaves of the class clades are shown. (B) The overall architecture of MultiPep. From the bottom, the input layer passes input data to the “network class-clades,” each of which consists of a CNN and an output layer. All layers with “Output” in their names are output layers. Above the network class clades are upper-level output layers that connect all the network class clades. The gray-filled circles on top of the output layers indicate the number of output nodes in the layers. The output nodes that are not explicitly named represent core bioactivity classes.
Classes and databases.
| | Classes | CAMP3 | LAMP2 | APD3 | SATPdb | DBAASP | BIOPEP-UWM | PeptideDB | NeuroPedia | CancerPDD | BioDADPep | NeuroPep | Total class size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ACE inhibitor | 1 | 39 | 1 | 687 | 29 | 973 | 4 | 1 | 1 | 65 | 3 | 973 |
| 1 | Antibacterial | 2104 | 11516 | 2858 | 4533 | 10310 | 512 | 1059 | 16 | 265 | 31 | 37 | 13538 |
| 2 | Anticancer | 245 | 1575 | 419 | 1177 | 1919 | 104 | 121 | 1 | 440 | 14 | 6 | 2426 |
| 3 | Antidiabetes | 11 | 46 | 17 | 125 | 41 | 189 | 9 | 3 | 2 | 1091 | 3 | 1112 |
| 4 | Antifreeze | 0 | 0 | 0 | 0 | 0 | 0 | 192 | 0 | 0 | 0 | 0 | 192 |
| 5 | Antifungal | 1457 | 4747 | 1955 | 2818 | 4172 | 219 | 559 | 11 | 197 | 22 | 27 | 5342 |
| 6 | Antihypertensive | 1 | 73 | 6 | 1664 | 51 | 783 | 10 | 3 | 5 | 107 | 8 | 1672 |
| 7 | Antimicrobial | 1882 | 13446 | 2125 | 8887 | 3681 | 239 | 2544 | 15 | 225 | 9 | 57 | 14362 |
| 8 | Antioxidative | 10 | 38 | 28 | 81 | 35 | 649 | 6 | 4 | 1 | 15 | 5 | 675 |
| 9 | Antiparasite | 133 | 479 | 190 | 342 | 388 | 46 | 94 | 5 | 20 | 6 | 10 | 503 |
| 10 | Antivirus | 735 | 4219 | 990 | 3918 | 1611 | 90 | 196 | 8 | 36 | 6 | 14 | 4500 |
| 11 | Cellcellsignaling | 11 | 38 | 15 | 679 | 30 | 22 | 378 | 392 | 0 | 3 | 391 | 682 |
| 12 | Cytokines_growthfactors | 46 | 66 | 64 | 24 | 30 | 11 | 3729 | 0 | 1 | 0 | 1 | 3760 |
| 13 | Dipeptidyl peptidase inhibitor | 0 | 58 | 2 | 169 | 50 | 459 | 6 | 4 | 1 | 158 | 6 | 459 |
| 14 | Drugdelivery | 11 | 108 | 11 | 1484 | 96 | 17 | 44 | 29 | 5 | 1 | 45 | 1484 |
| 15 | Hemolytic | 183 | 1299 | 342 | 1279 | 1045 | 61 | 111 | 0 | 51 | 1 | 2 | 1339 |
| 16 | Neuropeptide | 18 | 70 | 26 | 473 | 33 | 108 | 2101 | 552 | 2 | 4 | 3822 | 3926 |
| 17 | Opioid | 0 | 5 | 0 | 27 | 1 | 117 | 12 | 9 | 2 | 0 | 16 | 117 |
| 18 | Peptidehormone | 22 | 149 | 28 | 499 | 33 | 34 | 6943 | 495 | 1 | 2 | 2077 | 6943 |
| 19 | Toxic | 411 | 1746 | 523 | 3840 | 1416 | 310 | 2404 | 2 | 76 | 1 | 10 | 5793 |
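For most classes, the "Total class size" is smaller than the sum of the per-database counts. For ACE inhibitor, the database counts add up to 1804 while the total is 973. This is consistent with a peptide found in several databases being counted once. The sketch below shows a de-duplicated count of that kind, assuming exact sequence matching. The database keys, the sequences, and the helper name are hypothetical.

```python
def total_class_size(per_database):
    """Count unique peptide sequences for one class across all databases."""
    unique = set()
    for sequences in per_database.values():
        # Normalize case so the same sequence from two sources matches.
        unique.update(seq.upper() for seq in sequences)
    return len(unique)

# Hypothetical toy data: "VPP" appears in both sources but is counted once.
example = {"DB_A": ["IPP", "VPP"], "DB_B": ["vpp", "LKP"]}
print(total_class_size(example))
```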
Bioactivity classes represented by upper-level output layers
| Layer outputs | Combined classes |
|---|---|
| Output_1_1 | Hemolytic, toxic, antiparasite, anticancer, antibacterial, antifungal, insecticides, antimicrobial, and antivirus |
| Output_1_2 | Cell–cell signaling, neuropeptide, peptide hormone, antifreeze, cytokines/growth factors, antioxidative, drugdelivery, opioid, ACE inhibitor, antihypertensive, antidiabetes, and dipeptidyl peptidase inhibitor |
| Output_2_1_1 | Hemolytic and toxic |
| Output_2_1_2 | Antiparasite, anticancer, antibacterial, antifungal, insecticides, antimicrobial, and antivirus |
| Output_2_2_1 | Cell–cell signaling, neuropeptide, and peptide hormone |
| Output_2_2_2 | Antifreeze, cytokines/growth factors, antioxidative, drugdelivery, opioid, ACE inhibitor, antihypertensive, antidiabetes, and dipeptidyl peptidase inhibitor |
| Output_3_1 | Antifreeze, cytokines/growth factors, antioxidative, drugdelivery, and opioid |
| Output_3_2 | ACE inhibitor, antihypertensive, antidiabetes, and dipeptidyl peptidase inhibitor |
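One way to read the table above is that an upper-level output is positive whenever a peptide carries at least one of the classes it combines. The sketch below derives such labels for three of the listed outputs. The logical-OR rule and the function name are assumptions made for illustration; the groupings are copied from the table.

```python
# Groupings copied from the table; only three outputs are shown for brevity.
UPPER_LEVEL_GROUPS = {
    "Output_2_1_1": {"Hemolytic", "Toxic"},
    "Output_2_2_1": {"Cell-cell signaling", "Neuropeptide", "Peptide hormone"},
    "Output_3_2": {
        "ACE inhibitor",
        "Antihypertensive",
        "Antidiabetes",
        "Dipeptidyl peptidase inhibitor",
    },
}

def upper_level_labels(peptide_classes):
    """Return 1 for each upper-level output that shares a class with the peptide (assumed OR rule)."""
    active = set(peptide_classes)
    return {name: int(bool(active & members)) for name, members in UPPER_LEVEL_GROUPS.items()}

print(upper_level_labels(["Toxic", "Antihypertensive"]))
```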
Figure 2: Architecture of network class-clade convolutional neural networks. All layers with dark-gray backgrounds (except for the input layer) have weights, whereas the layers with white backgrounds are mathematical operations or dropout layers. Below the name of each layer (except for the dropout layers), the sizes of the layers’ output tensors are written. Arrows show the flow of information and how the layers are connected. What constitutes the convolutional neural network (CNN) is encapsulated by a light-gray box. “Conv4” means a 1D convolutional layer with a kernel size of four one-hot-encoded amino acids. The same logic applies to the remaining convolutional layers. “Dense500” means a dense layer with 500 nodes.
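To make "a kernel size of four one-hot-encoded amino acids" concrete, the sketch below one-hot encodes a peptide and slides a single size-4 kernel along it. The 20-letter alphabet, zero padding to a fixed length, the all-ones kernel, and the function names are assumptions for illustration and are not taken from the MultiPep implementation.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: the 20 standard residues

def one_hot(peptide, max_len):
    """Encode a peptide as a (max_len, 20) matrix; unused positions stay zero."""
    encoded = np.zeros((max_len, len(AMINO_ACIDS)))
    for position, residue in enumerate(peptide[:max_len]):
        encoded[position, AMINO_ACIDS.index(residue)] = 1.0
    return encoded

def conv1d_valid(encoded, kernel):
    """Apply one kernel of shape (k, 20) at every position where it fits fully."""
    k = kernel.shape[0]
    steps = encoded.shape[0] - k + 1
    return np.array([np.sum(encoded[i:i + k] * kernel) for i in range(steps)])

x = one_hot("ACDK", max_len=6)
kernel = np.ones((4, len(AMINO_ACIDS)))  # toy kernel that counts residues in its window
print(conv1d_valid(x, kernel))
```

With this toy kernel, each output value is the number of real residues covered by a four-position window, so the values fall as the window moves into the padding.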
Mean and standard deviation of MCC of CV models on test sets
| Bioactivity class/output name | BCE + MCC | BCE + MCC (overall lowest loss) | Weighted BCE | Weighted BCE (overall lowest loss) |
|---|---|---|---|---|
| Output 1_1 | 0.839 ± 0.01 | 0.843 ± 0.01 | 0.803 ± 0.01 | 0.799 ± 0.01 |
| Output 1_2 | 0.856 ± 0.01 | 0.859 ± 0.01 | 0.81 ± 0.02 | 0.847 ± 0.01 |
| Output 2_1_1 | 0.782 ± 0.01 | 0.788 ± 0.01 | 0.764 ± 0.02 | 0.778 ± 0.01 |
| Output 2_1_2 | 0.849 ± 0.01 | 0.853 ± 0.01 | 0.813 ± 0.01 | 0.802 ± 0.01 |
| Output 2_2_1 | 0.895 ± 0.01 | 0.894 ± 0.01 | 0.875 ± 0.01 | 0.878 ± 0.01 |
| Output 2_2_2 | 0.826 ± 0.01 | 0.83 ± 0.01 | 0.751 ± 0.02 | 0.811 ± 0.01 |
| Output 3_1 | 0.798 ± 0.01 | 0.798 ± 0.01 | 0.711 ± 0.03 | 0.787 ± 0.02 |
| Output 3_2 | 0.766 ± 0.03 | 0.765 ± 0.02 | 0.749 ± 0.02 | 0.751 ± 0.02 |
| Hemolytic | 0.531 ± 0.04 | | 0.514 ± 0.03 | 0.52 ± 0.03 |
| Toxic | 0.785 ± 0.01 | | 0.768 ± 0.02 | 0.782 ± 0.01 |
| Antimicrobial | | 0.674 ± 0.01 | 0.617 ± 0.01 | 0.587 ± 0.01 |
| Antivirus | 0.611 ± 0.02 | | 0.593 ± 0.01 | 0.569 ± 0.03 |
| Antiparasite | 0.281 ± 0.05 | 0.173 ± 0.1 | | 0.289 ± 0.07 |
| Anticancer | 0.51 ± 0.03 | 0.488 ± 0.02 | | 0.47 ± 0.03 |
| Antibacterial | 0.702 ± 0.01 | | 0.659 ± 0.01 | 0.644 ± 0.01 |
| Antifungal | | 0.471 ± 0.02 | 0.411 ± 0.03 | 0.335 ± 0.06 |
| Cell–cell signaling | | 0.552 ± 0.02 | 0.544 ± 0.03 | 0.548 ± 0.04 |
| Neuropeptide | | 0.771 ± 0.02 | 0.74 ± 0.02 | 0.757 ± 0.03 |
| Peptide hormone | | 0.844 ± 0.01 | 0.82 ± 0.01 | 0.828 ± 0.01 |
| Antifreeze | 0.966 ± 0.04 | 0.976 ± 0.03 | 0.976 ± 0.03 | |
| Cytokines/growth factors | 0.932 ± 0.01 | | 0.851 ± 0.02 | 0.923 ± 0.01 |
| Antioxidative | 0.467 ± 0.06 | 0.461 ± 0.06 | 0.378 ± 0.09 | |
| Drugdelivery | | 0.588 ± 0.04 | 0.248 ± 0.13 | 0.556 ± 0.05 |
| Opioid | 0.701 ± 0.11 | 0.676 ± 0.11 | 0.67 ± 0.11 | |
| ACE inhibitor | 0.511 ± 0.02 | | 0.502 ± 0.03 | 0.498 ± 0.03 |
| Antihypertensive | | 0.664 ± 0.02 | 0.652 ± 0.02 | 0.655 ± 0.02 |
| Antidiabetes | | 0.58 ± 0.03 | 0.587 ± 0.03 | 0.587 ± 0.03 |
| Dipeptidyl peptidase inhibitor | 0.564 ± 0.05 | | 0.557 ± 0.03 | 0.559 ± 0.08 |
| Average of bioactivity classes | | 0.629 | 0.595 | 0.615 |
| Best total | | 7 | 2 | 3 |
Notes: Bold values indicate the model with the best performance. The asterisk symbols indicate that more than three decimals are needed to reveal the highest values. In the two bottom rows, “Best total” shows the number of bioactivity classes for which a model's mean score is the best, and “Average of bioactivity classes” shows the average of the columns above, but only for the bioactivity classes.
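For reference, the sketch below computes MCC from binary labels using its standard definition and summarizes per-model scores as a mean and standard deviation, as reported in the table. Whether the reported deviation is the sample or the population standard deviation is not stated; the sample version used here is an assumption, and the function names are illustrative.

```python
import math
import statistics

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels; 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0

def summarize(scores):
    """Mean and sample standard deviation over cross-validation models."""
    return statistics.mean(scores), statistics.stdev(scores)

print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
print(summarize([0.5, 0.7]))
```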
Mean and standard deviation of F1 score of CV models on test sets
| Bioactivity class/output name | BCE + MCC | BCE + MCC (overall lowest loss) | Weighted BCE | Weighted BCE (overall lowest loss) |
|---|---|---|---|---|
| Output 1_1 | 0.936 ± 0.0 | 0.937 ± 0.0 | 0.922 ± 0.0 | 0.921 ± 0.0 |
| Output 1_2 | 0.909 ± 0.01 | 0.911 ± 0.01 | 0.874 ± 0.01 | 0.903 ± 0.01 |
| Output 2_1_1 | 0.801 ± 0.01 | 0.807 ± 0.01 | 0.782 ± 0.02 | 0.799 ± 0.01 |
| Output 2_1_2 | 0.928 ± 0.0 | 0.929 ± 0.0 | 0.911 ± 0.0 | 0.907 ± 0.0 |
| Output 2_2_1 | 0.915 ± 0.01 | 0.914 ± 0.01 | 0.898 ± 0.01 | 0.902 ± 0.01 |
| Output 2_2_2 | 0.857 ± 0.01 | 0.86 ± 0.01 | 0.787 ± 0.02 | 0.844 ± 0.01 |
| Output 3_1 | 0.82 ± 0.01 | 0.821 ± 0.01 | 0.723 ± 0.04 | 0.81 ± 0.02 |
| Output 3_2 | 0.781 ± 0.02 | 0.781 ± 0.02 | 0.766 ± 0.02 | 0.768 ± 0.02 |
| Hemolytic | 0.538 ± 0.04 | | 0.52 ± 0.03 | 0.528 ± 0.03 |
| Toxic | 0.8 ± 0.01 | | 0.78 ± 0.02 | 0.799 ± 0.01 |
| Antimicrobial | 0.776 ± 0.01 | | 0.741 ± 0.01 | 0.724 ± 0.01 |
| Antivirus | 0.62 ± 0.02 | | 0.596 ± 0.02 | 0.558 ± 0.04 |
| Antiparasite | 0.267 ± 0.06 | 0.151 ± 0.1 | | 0.28 ± 0.07 |
| Anticancer | 0.5 ± 0.04 | 0.478 ± 0.02 | | 0.447 ± 0.06 |
| Antibacterial | 0.793 ± 0.01 | | 0.764 ± 0.01 | 0.753 ± 0.01 |
| Antifungal | | 0.535 ± 0.02 | 0.478 ± 0.03 | 0.394 ± 0.07 |
| Cell–cell signaling | | 0.551 ± 0.02 | 0.531 ± 0.03 | 0.536 ± 0.04 |
| Neuropeptide | | 0.79 ± 0.02 | 0.761 ± 0.02 | 0.777 ± 0.03 |
| Peptide hormone | | 0.868 ± 0.01 | 0.847 ± 0.01 | 0.854 ± 0.01 |
| Antifreeze | 0.965 ± 0.04 | 0.976 ± 0.03 | 0.976 ± 0.03 | |
| Cytokines/growth factors | 0.937 ± 0.01 | | 0.863 ± 0.02 | 0.929 ± 0.01 |
| Antioxidative | 0.468 ± 0.06 | 0.466 ± 0.07 | 0.318 ± 0.11 | |
| Drugdelivery | | 0.583 ± 0.04 | 0.166 ± 0.12 | 0.552 ± 0.05 |
| Opioid | 0.694 ± 0.11 | 0.667 ± 0.11 | 0.654 ± 0.11 | |
| ACE inhibitor | | 0.493 ± 0.02 | 0.505 ± 0.03 | 0.501 ± 0.03 |
| Antihypertensive | | 0.66 ± 0.02 | 0.652 ± 0.02 | 0.656 ± 0.02 |
| Antidiabetes | | 0.584 ± 0.03 | 0.586 ± 0.03 | 0.589 ± 0.03 |
| Dipeptidyl peptidase inhibitor | 0.562 ± 0.05 | | 0.543 ± 0.04 | 0.55 ± 0.08 |
| Average of bioactivity classes | | 0.642 | 0.604 | 0.63 |
| Best total | | 7 | 2 | 3 |
Notes: Bold values indicate the model with the best performance. The asterisk symbol indicates that more than three decimals are needed to reveal the highest values. In the two bottom rows, “Best total” shows the number of bioactivity classes for which a model's mean score is the best, and “Average of bioactivity classes” shows the average of the columns above, but only for the bioactivity classes.
Comparisons against state-of-the-art peptide bioactivity predictors
| | MCC | F1 score | Precision | Recall | Accuracy |
|---|---|---|---|---|---|
| MultiPep | | | | 0.848 | |
| Neural network by Veltri | 0.526 | 0.836 | 0.773 | | 0.78 |
| MultiPep | | | | | |
| Deep-AmPEP30, link | 0.657 | 0.9 | 0.92 | 0.88 | 0.858 |
| RF-AmPEP30, link | 0.712 | 0.914 | 0.94 | 0.889 | 0.879 |
| MultiPep | | 0.631 | | 0.461 | |
| mACPpred, link | 0.459 | | 0.512 | | 0.688 |
| MultiPep | | | | 0.833 | |
| PredNeuroP, link | 0.698 | 0.722 | 0.579 | | 0.901 |
| MultiPep | | | | | |
| ToxinPred, link | 0.567 | 0.687 | 0.943 | 0.54 | 0.76 |
| MultiPep | | | | 0.569 | |
| HLPpred-Fuse, link | 0.435 | 0.511 | 0.355 | | 0.708 |
| MultiPep | | | | 0.381 | |
| AnOxPePred, link | 0.29 | 0.394 | 0.377 | | 0.822 |
Notes: Bold values indicate that the performance is better than the compared tool(s). All values in the table have been rounded to three decimals. Links to the webtools/github of the tools are inserted next to the names of the tools in the first column from the left.
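The remaining columns of the comparison follow the usual confusion-matrix definitions. The sketch below computes precision, recall, F1 score, and accuracy for binary labels. Returning 0.0 when a ratio is undefined is a convention chosen for this example and is not stated in the source; the function name is illustrative.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 score, and accuracy from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

print(binary_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

A tool can score high on precision and low on recall at the same time, as the ToxinPred row shows (0.943 versus 0.54), which is why the table reports both alongside MCC.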
Prediction error of MultiPep compared with PeptideRanker
| Program names | Rounded prediction error |
|---|---|
| MultiPep (Output_1, | 0.074 |
| MultiPep (Output_1) | 0.140 |
| PeptideRanker | 0.299 |
Note: Final prediction errors rounded to three decimals.
Performance of MultiPep and Peptipedia
| Bioactivity classes | MCC (M) | MCC (P) | F1 score (M) | F1 score (P) | Precision (M) | Precision (P) | Recall (M) | Recall (P) | Accuracy (M) | Accuracy (P) |
|---|---|---|---|---|---|---|---|---|---|---|
| Hemolytic | | 0.023 | | 0.064 | | 0.044 | | 0.115 | | 0.895 |
| Toxic | | −0.052 | | 0.172 | | 0.115 | | 0.341 | | 0.551 |
| Antimicrobial | | 0.065 | | 0.48 | | 0.355 | | 0.74 | | 0.463 |
| Antivirus | | 0.021 | | 0.171 | | 0.116 | | 0.324 | | 0.666 |
| Antiparasite | | 0.039 | | 0.031 | | 0.016 | 0.2 | | | 0.647 |
| Anticancer | | 0.08 | | 0.142 | | 0.086 | | 0.419 | | 0.719 |
| Antibacterial | | 0.254 | | 0.497 | | 0.477 | | 0.519 | | 0.671 |
| Antifungal | | 0.211 | | 0.32 | | 0.214 | 0.569 | | | 0.668 |
| Neuropeptide | | 0.249 | | 0.296 | | 0.377 | | 0.243 | | 0.896 |
| Drugdelivery | | 0.111 | | 0.143 | | 0.133 | | 0.154 | | 0.937 |
| Antihypertensive | | 0.0 | | 0.0 | | 0.0 | | 0.0 | | 0.978 |
| Antidiabetes | | 0.046 | | 0.05 | | 0.097 | | 0.034 | | 0.973 |
Notes: Bold values indicate that the performance is better than the compared tool. All values in the table have been rounded to three decimals. M, MultiPep; P, Peptipedia.
Prediction of FDA-approved therapeutic peptides
| Therapeutic peptides | Predictions |
|---|---|
| | Antibacterial: 0.486 |
| | Antivirus: 0.232 |
| | Toxic: 0.321 |
| | Antimicrobial: 0.276 |
| | Cell–cell signaling: 0.430 |
| | Cell–cell signaling: 0.225 |
| | Neuropeptide: 0.173 |
Notes: The names of the therapeutic peptides are written in Column 1 together with the peptides’ brand names, designer company, and a link to the peptides in THPdb. The second column contains MultiPep’s predictions. Predictions in bold are above the 0.5 threshold. The predictions are based on the average of all 10 CV models. Prediction scores are shown only if they reach the 0.15 threshold.
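The notes describe averaging the 10 CV models, showing scores from 0.15, and highlighting scores above 0.5. A minimal sketch of that post-processing follows. The array layout, the class names, the handling of values exactly on a threshold, and the function name are assumptions.

```python
import numpy as np

def ensemble_report(model_probs, class_names, show_at=0.15, call_at=0.5):
    """Average per-model probabilities and keep the classes worth reporting.

    model_probs has shape (n_models, n_classes). Each returned tuple is
    (class name, rounded mean score, whether the score is above call_at).
    """
    mean_probs = np.mean(model_probs, axis=0)
    report = []
    for name, prob in zip(class_names, mean_probs):
        if prob >= show_at:  # boundary handling is an assumption
            report.append((name, round(float(prob), 3), bool(prob > call_at)))
    return report

# Hypothetical scores from two models for three classes.
probs = np.array([[0.6, 0.2, 0.1],
                  [0.8, 0.2, 0.0]])
print(ensemble_report(probs, ["Peptide hormone", "Neuropeptide", "Toxic"]))
```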