Xin Su, Jing Xu, Yanbin Yin, Xiongwen Quan, Han Zhang.
Abstract
BACKGROUND: Antibiotic resistance has become an increasingly serious problem in the past decades. As an alternative choice, antimicrobial peptides (AMPs) have attracted lots of attention. To identify new AMPs, machine learning methods have been commonly used. More recently, some deep learning methods have also been applied to this problem.Entities:
Keywords: Antimicrobial peptide; Deep learning; Fusion model; Multi-scale convolutional network
Mesh:
Substances:
Year: 2019 PMID: 31870282 PMCID: PMC6929291 DOI: 10.1186/s12859-019-3327-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of modified models
| Model | SENS (%) | SPEC (%) | ACC (%) | MCC | auROC (%) |
|---|---|---|---|---|---|
| Replacing embedding layer | 89.61 | 93.26 | 91.43 | 0.8282 | 96.75 |
| Replacing multi-scale convolutional network | 89.75 | 91.15 | 90.44 | 0.8091 | 96.08 |
| Replacing pooling1 with LSTM | 89.75 | 93.25 | 91.5 | 0.8305 | 96.27 |
| Without pooling2 | 91.15 | 92.56 | 91.85 | 0.8371 | 96.3 |
| Additional FC layers | 90.31 | 93.68 | 91.99 | 0.8403 | 97.09 |
| proposed model | 91.01 | 93.64 | 92.41 | 0.8486 | 97.23 |
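The metric columns in these tables follow the standard binary-classification definitions and can be recomputed from confusion-matrix counts. A minimal sketch (the counts below are hypothetical, chosen only to illustrate the formulas, not taken from the paper):

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Compute SENS, SPEC, ACC and MCC from confusion-matrix counts."""
    sens = tp / (tp + fn)                     # sensitivity (recall on positives)
    spec = tn / (tn + fp)                     # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fp + tn + fn)     # overall accuracy
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / den if den else 0.0  # Matthews correlation
    return sens, spec, acc, mcc

# Hypothetical counts for a balanced 200-sequence test set:
sens, spec, acc, mcc = binary_metrics(tp=91, fp=7, tn=93, fn=9)
```

auROC is not a confusion-matrix quantity; it is the area under the ROC curve computed from the ranked prediction scores.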
Dataset summary
| Dataset | DAMP dataset | AntiBP2 dataset | AIP dataset | APD3 dataset |
|---|---|---|---|---|
| Positive samples | 1778 | 999 | 1678 | 1713 |
| Negative samples | 1778 | 999 | 2516 | 8565 |
Fig. 1 10-fold cross-validation performance of the model with a single convolutional layer. We replaced the multi-scale convolutional network with a simple convolutional layer. This figure shows how the modified model performs as the filter length of the convolutional layer changes
Fig. 2 10-fold cross-validation performance of the model with different values of the parameter N
Comparison with the state-of-the-art methods
| Method | SENS (%) | SPEC (%) | ACC (%) | MCC | auROC (%) | P value |
|---|---|---|---|---|---|---|
| AntiBP2 | 87.91 | 90.8 | 89.37 | 0.7876 | 89.36 | < 0.001 |
| CAMP-ANN | 82.98 | 85.09 | 84.04 | 0.6809 | 84.06 | < 0.001 |
| CAMP-DA | 87.08 | 80.76 | 83.92 | 0.6797 | 89.97 | < 0.001 |
| CAMP-RF | 92.7 | 82.44 | 87.57 | 0.7554 | 93.63 | < 0.001 |
| CAMP-SVM | 88.9 | 79.92 | 84.41 | 0.691 | 90.63 | < 0.001 |
| iAMP-2L | 83.99 | 85.86 | 84.9 | 0.6983 | 84.9 | < 0.001 |
| iAMPpred | 89.33 | 87.22 | 88.27 | 0.7656 | 94.44 | < 0.001 |
| gkmSVM | 88.34 | 90.59 | 89.46 | 0.7895 | 94.98 | < 0.001 |
| DNN | 89.89 | 92.13 | 91.01 | 0.8204 | 96.48 | < 0.001 |
| proposed model | 91.01 | 93.64 | 92.41 | 0.8486 | 97.23 | < 0.001 |
| fusion model with DNN | 88.48 | 93.26 | 90.87 | 0.8183 | 96.24 | < 0.001 |
| proposed fusion model | 89.89 | 94.96 | 92.55 | 0.8523 | 97.3 | < 0.001 |
Training time of modified models
| Model | Time for training on each epoch(s) |
|---|---|
| Replacing embedding layer | 13.69 |
| Replacing multi-scale convolutional network | 13.95 |
| Replacing pooling1 with LSTM | 121.4 |
| Without pooling2 | 56.06 |
| Additional FC layers | 58.45 |
| proposed model | 56.36 |
Comparison of the state-of-the-art methods on AntiBP2 dataset
| Method | ACC (%) | MCC |
|---|---|---|
| CAMP-ANN | 81.03 | 0.624 |
| CAMP-DA | 84.28 | 0.69 |
| CAMP-RF | 87.09 | 0.752 |
| CAMP-SVM | 86.69 | 0.739 |
| iAMP-2L | 86.34 | 0.735 |
| iAMPpred | 92.84 | 0.858 |
| AntiBP2 | 91.64 | 0.831 |
| DNN | 92.95 | 0.86 |
| proposed model | 93.38 | 0.862 |
Comparison of the state-of-the-art methods on AIP dataset
| Model | SENS (%) | SPEC (%) | ACC (%) | MCC | auROC (%) | P value |
|---|---|---|---|---|---|---|
| DNN | 59.05 | 73.61 | 67.78 | 0.3273 | 71.12 | < 0.001 |
| proposed model | 55.24 | 84.9 | 73.02 | 0.4245 | 76.8 | < 0.001 |
| AIPpred | 75.8 | 71.11 | 73.4 | 0.46 | 80.1 | < 0.001 |
| fusion model with DNN | 51.67 | 79.81 | 68.54 | 0.3285 | 71.23 | < 0.001 |
| proposed fusion model | 60 | 83.15 | 73.88 | 0.4459 | 78.34 | < 0.001 |
Comparison of methods on APD3 dataset
| Method | SENS (%) | SPEC (%) | PREC (%) | BalACC (%) | ACC (%) | MCC |
|---|---|---|---|---|---|---|
| CAMP-ANN | 83.30 | 83.36 | 50.04 | 83.33 | 83.35 | 0.5549 |
| CAMP-DA | 88.09 | 81.25 | 48.44 | 84.67 | 82.39 | 0.5623 |
| CAMP-RF | 94.80 | 83.44 | 53.39 | 89.12 | 85.34 | 0.6388 |
| CAMP-SVM | 90.54 | 81.63 | 49.65 | 86.09 | 83.12 | 0.5848 |
| gkmSVM | – | – | – | – | – | – |
| iAMP-2L | 88.32 | 86.12 | 56.00 | 87.22 | 86.49 | 0.6302 |
| iAMPpred | 93.46 | 79.02 | 47.12 | 86.24 | 81.43 | 0.5742 |
| DNN | 96.96 | 89.62 | 65.14 | 93.29 | 90.84 | 0.7471 |
| proposed model | 97.90 | 90.90 | 68.28 | 94.40 | 92.07 | 0.7761 |
| proposed fusion model | 98.25 | 91.00 | 68.58 | 94.62 | 92.21 | 0.7802 |
Note: the mark '–' means that the result is not available. In this experiment, the gkmSVM method could not be run successfully because its kernel requirement was not satisfied
Comparison of auROC using DeLong’s test on APD3 dataset
| Method 1 | Method 2 | auROC 1 | auROC 2 | Difference | P value |
|---|---|---|---|---|---|
| proposed model | CAMP-DA | 0.9892 | 0.9069 | 0.0823 | < 0.0001 |
| proposed model | CAMP-RF | 0.9892 | 0.9528 | 0.0365 | < 0.0001 |
| proposed model | CAMP-SVM | 0.9892 | 0.9202 | 0.0690 | < 0.0001 |
| proposed model | gkmSVM | 0.9892 | – | – | NA |
| proposed model | iAMP-2L | 0.9892 | 0.8722 | 0.1170 | < 0.0001 |
| proposed model | iAMPpred | 0.9892 | 0.9466 | 0.0427 | < 0.0001 |
| proposed model | DNN | 0.9892 | 0.9802 | 0.0091 | < 0.0001 |
| proposed fusion model | CAMP-DA | 0.9918 | 0.9069 | 0.0849 | < 0.0001 |
| proposed fusion model | CAMP-RF | 0.9918 | 0.9528 | 0.0391 | < 0.0001 |
| proposed fusion model | CAMP-SVM | 0.9918 | 0.9202 | 0.0716 | < 0.0001 |
| proposed fusion model | gkmSVM | 0.9918 | – | – | NA |
| proposed fusion model | iAMP-2L | 0.9918 | 0.8722 | 0.1196 | < 0.0001 |
| proposed fusion model | iAMPpred | 0.9918 | 0.9466 | 0.0453 | < 0.0001 |
| proposed fusion model | DNN | 0.9918 | 0.9802 | 0.0117 | < 0.0001 |
| proposed fusion model | proposed model | 0.9918 | 0.9892 | 0.0026 | < 0.0001 |
Note: the mark '–' means that the result is not available. In this experiment, the gkmSVM method could not be run successfully because its kernel requirement was not satisfied
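DeLong's test compares two correlated auROC estimates via the placement values of the positive and negative samples scored by each model. A minimal O(mn) sketch of the procedure (not the authors' implementation; `delong_test` and the synthetic scores below are illustrative):

```python
import numpy as np
from math import erf, sqrt

def delong_test(y, p1, p2):
    """Two-sided DeLong test for the difference of two correlated auROCs.
    y: binary labels; p1, p2: scores of two models on the same samples."""
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    m, n = len(pos), len(neg)
    aucs, v10s, v01s = [], [], []
    for p in (p1, p2):
        # Pairwise comparison matrix: 1 if positive outranks negative, 0.5 on ties.
        c = (p[pos][:, None] > p[neg][None, :]) + 0.5 * (p[pos][:, None] == p[neg][None, :])
        v10s.append(c.mean(axis=1))   # placement values of positives
        v01s.append(c.mean(axis=0))   # placement values of negatives
        aucs.append(c.mean())         # Mann-Whitney estimate of the auROC
    s10, s01 = np.cov(np.stack(v10s)), np.cov(np.stack(v01s))
    # Variance of the difference of the two correlated AUC estimates.
    var = (s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / m \
        + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / n
    z = (aucs[0] - aucs[1]) / sqrt(var)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal tail
    return aucs[0], aucs[1], p_value
```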
Fig. 3 The structure of the proposed model. The proposed model mainly uses an embedding layer and convolutional layers. All sequences are encoded into numerical vectors of length 200 and fed into the embedding layer, which produces embedding vectors of dimension 128. The outputs of the embedding layer are then fed into N convolutional layers, each using 64 filter kernels. The output of each convolutional layer is fed into a max pooling layer, and the pooled outputs are concatenated and fed into another max pooling layer. Finally, the output is fed into a fully connected layer and passed through a sigmoid function. The final output lies in the range [0, 1] and serves as the prediction for the input sequence
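The forward pass described in the caption can be sketched in plain NumPy. Randomly initialised weights stand in for trained parameters; the filter lengths in `KERNEL_SIZES` and the 21-token vocabulary (20 amino acids plus padding) are assumptions, not values stated in the caption:

```python
import numpy as np

rng = np.random.default_rng(0)

SEQ_LEN, EMB_DIM, N_FILTERS = 200, 128, 64
KERNEL_SIZES = [2, 4, 6]   # assumed multi-scale filter lengths (N = 3 branches)
VOCAB = 21                 # assumed: 20 amino acids + one padding token

embedding = rng.normal(size=(VOCAB, EMB_DIM))
convs = [rng.normal(scale=0.1, size=(k, EMB_DIM, N_FILTERS)) for k in KERNEL_SIZES]
w_fc, b_fc = rng.normal(scale=0.1, size=(N_FILTERS,)), 0.0

def forward(seq_ids):
    """Forward pass for one integer-encoded sequence of length 200."""
    x = embedding[seq_ids]                                  # (200, 128)
    branches = []
    for w in convs:
        k = w.shape[0]
        # valid 1-D convolution over positions, followed by ReLU
        out = np.stack([np.tensordot(x[i:i + k], w, axes=([0, 1], [0, 1]))
                        for i in range(SEQ_LEN - k + 1)])
        branches.append(np.maximum(out, 0).max(axis=0))     # pooling1: max over positions
    pooled = np.stack(branches).max(axis=0)                 # pooling2: max over branches
    z = pooled @ w_fc + b_fc                                # fully connected layer
    return 1.0 / (1.0 + np.exp(-z))                         # sigmoid -> [0, 1]

p = forward(rng.integers(0, VOCAB, size=SEQ_LEN))
```

Here pooling2 is taken element-wise across the N concatenated branch outputs, one plausible reading of "concatenated and fed into another max pooling layer".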
Fig. 4 The structure of the proposed fusion model. There are two parts in the fusion model. The proposed structure is on the left. An additional fully connected network is on the right; this part makes use of the dipeptide composition (DPC) and amino acid composition (AAC) of the peptide sequences. This network incorporates redundant information into the proposed model
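The DPC and AAC features consumed by the right-hand network are standard composition descriptors: 20 amino-acid frequencies plus 400 dipeptide frequencies, giving a 420-dimensional input vector. A minimal sketch (the example peptide is illustrative, not from the paper's datasets):

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs

def aac(seq):
    """Amino acid composition: 20 relative residue frequencies."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def dpc(seq):
    """Dipeptide composition: 400 relative frequencies of adjacent residue pairs."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(d) / len(pairs) for d in DIPEPTIDES]

# Illustrative peptide; AAC + DPC form the 420-dimensional fusion input.
features = aac("GIGKFLHSAK") + dpc("GIGKFLHSAK")
```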