Jesus A Beltran, Longendri Aguilera-Mendoza, Carlos A Brizuela.
Abstract
BACKGROUND: Antimicrobial peptides are a promising alternative for combating pathogens resistant to conventional antibiotics. Computer-assisted peptide discovery strategies are necessary to automatically assess a significant amount of data by generating models that efficiently classify whether a peptide is antimicrobial before it is evaluated in the wet lab. A model's performance depends on the selection of molecular descriptors, for which an efficient and effective approach has recently been proposed. Unfortunately, how to adapt this method to the selection of molecular descriptors for the classification of antimicrobial peptides, and the performance it can achieve, have only been explored preliminarily.
Keywords: Antimicrobial peptides; Classification; Feature weighting; Molecular descriptors; Multi-objective evolutionary algorithm; Peptide representation
Year: 2018 PMID: 30255784 PMCID: PMC6156846 DOI: 10.1186/s12864-018-5030-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 The consolidated non-dominated front (CNDF) visualization. The CNDF is generated after 30 runs of the MOEA-FW approach for each dataset. The markers represent the values for the best compromise solution given λ1. a DAMPD_AMP. b APD3_AMP. c DAMPD_ANTIBACTERIAL. d APD3_ANTIBACTERIAL. e DAMPD_BACTERIOCIN. f APD3_BACTERIOCIN
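The consolidation step named in the Fig. 1 caption can be illustrated with a minimal Python sketch. It assumes two objectives that are both minimized (for example a classification error and a descriptor count); the record does not state the objectives, so this choice, the names `dominates` and `consolidated_front`, and the toy points are all hypothetical. It pools the fronts of several runs and keeps only the points that no other pooled point dominates.

```python
from typing import Iterable, List, Tuple

# (f1, f2); ASSUMPTION: both objectives are minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def consolidated_front(runs: Iterable[Iterable[Point]]) -> List[Point]:
    """Pool the fronts of all runs and keep only the non-dominated points."""
    pooled = sorted({p for run in runs for p in run})  # de-duplicate and order by f1
    return [p for p in pooled if not any(dominates(q, p) for q in pooled)]


# Toy fronts from three hypothetical runs (not data from the article).
runs = [
    [(0.10, 30.0), (0.20, 12.0)],
    [(0.12, 20.0), (0.20, 15.0)],
    [(0.30, 12.0), (0.08, 40.0)],
]
cndf = consolidated_front(runs)
print(cndf)
```

The pairwise check is quadratic in the number of pooled points, which is usually acceptable for fronts of a few hundred solutions.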
Fig. 2 Percentage reduction in the number of molecular descriptors for the best compromise solutions on six datasets
10-Fold Cross-Validation performance on six datasets for KNN and SVM-L, λ1=0.5
| Dataset | MLA | Sens(%) | Spec(%) | Prec(%) | Bal Acc(%) | Acc(%) | MCC | AUC |
|---|---|---|---|---|---|---|---|---|
| DAMPD_AMP | KNN | 71.97 | | | 84.60 | | | 0.846 |
| DAMPD_AMP | SVM-L | | 92.30 | 69.56 | | 91.62 | 0.734 | |
| APD3_AMP | KNN | 80.85 | | | 88.06 | | 0.747 | 0.881 |
| APD3_AMP | SVM-L | | 92.53 | 70.75 | | 92.36 | | |
| DAMPD_ANTIBACTERIAL | KNN | | 96.45 | | | | | |
| DAMPD_ANTIBACTERIAL | SVM-L | 88.49 | | 84.18 | 92.51 | 95.06 | 0.832 | 0.925 |
| APD3_ANTIBACTERIAL | KNN | 79.32 | | | 87.31 | | 0.738 | 0.873 |
| APD3_ANTIBACTERIAL | SVM-L | | 92.22 | 70.33 | | 92.07 | | |
| DAMPD_BACTERIOCIN | KNN | 100 | 95.53 | 85.83 | 97.76 | 96.36 | 0.902 | 0.978 |
| DAMPD_BACTERIOCIN | SVM-L | 100 | | | | | | |
| APD3_BACTERIOCIN | KNN | 83.50 | | 77.05 | 89.27 | 93.12 | 0.758 | 0.893 |
| APD3_BACTERIOCIN | SVM-L | | 94.83 | | | 93.12 | | |
Each value is the average performance from 10-fold cross-validation by the classifier built by the machine learning algorithm (second column) on the dataset (first column). Wilcoxon signed rank test was performed on the measure resulting from the 10-fold cross-validation of KNN and SVM-L. The models with significant improvement at p-value ≤0.05 are marked with the symbol *
Bold font indicates the best value per measure for every dataset
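The column measures follow from a binary confusion matrix. The record does not state the formulas, so the Python sketch below assumes the standard textbook definitions; the function name `classification_metrics` and the example counts are hypothetical. As a plausibility check on one complete row of the DAMPD comparison table, (90.13 + 72.10) / 2 ≈ 81.11, which agrees with the reported balanced accuracy.

```python
import math
from typing import Dict


def _ratio(num: float, den: float) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Standard binary metrics (as fractions) from confusion-matrix counts."""
    sens = _ratio(tp, tp + fn)   # sensitivity (recall on positives)
    spec = _ratio(tn, tn + fp)   # specificity (recall on negatives)
    prec = _ratio(tp, tp + fp)   # precision
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sens": sens,
        "spec": spec,
        "prec": prec,
        "bal_acc": (sens + spec) / 2,
        "acc": _ratio(tp + tn, tp + fn + tn + fp),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts, not taken from the article.
m = classification_metrics(tp=80, fn=20, tn=90, fp=10)
print({name: round(value, 4) for name, value in m.items()})
```

Balanced accuracy and MCC are the more informative columns here because, according to the dataset summary, negatives outnumber positives roughly five to one.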
Fig. 3 Performance comparison between the best model achieved by MOEA-FW and the baseline. Each plot shows the performance, measured by 10-fold cross-validation, of the best model achieved by MOEA-FW and of the baseline (i.e., all candidate input features) for a particular dataset. Each polygon represents the performance of one model. When one polygon is completely covered by another, the covered model is worse in all metrics than the model whose polygon encloses it. A Wilcoxon signed-rank test was performed on the measures resulting from the 10-fold cross-validation of the best model achieved by MOEA-FW and the baseline. Models with a significant improvement at p-value ≤0.05 are marked with the symbol *
Performance comparison of KNN and SVM-L on unseen sequences from the six datasets, λ1=0.5
| Dataset | MLA | Sens(%) | Spec(%) | Prec(%) | Bal Acc(%) | Acc(%) | MCC | AUC |
|---|---|---|---|---|---|---|---|---|
| DAMPD_AMP | KNN | 72.16 | | | 83.17 | | | 0.832 |
| DAMPD_AMP | SVM-L | | 91.62 | 61.98 | | 89.47 | 0.631 | |
| APD3_AMP | KNN | 70.82 | | | 81.47 | | | 0.815 |
| APD3_AMP | SVM-L | | 82.87 | 51.98 | | 83.97 | 0.597 | |
| DAMPD_ANTIBACTERIAL | KNN | | 90.91 | 60.27 | | 89.30 | 0.634 | |
| DAMPD_ANTIBACTERIAL | SVM-L | 74.55 | | | 83.82 | | | 0.838 |
| APD3_ANTIBACTERIAL | KNN | 65.97 | | | 79.94 | 89.26 | 0.607 | 0.799 |
| APD3_ANTIBACTERIAL | SVM-L | | 91.55 | 65.92 | | | | |
| DAMPD_BACTERIOCIN | KNN | | 87.50 | 50.00 | | 86.49 | 0.561 | |
| DAMPD_BACTERIOCIN | SVM-L | 60 | | | 78.44 | | | 0.784 |
| APD3_BACTERIOCIN | KNN | 75.86 | | 70.97 | 85.05 | 91.35 | 0.682 | 0.850 |
| APD3_BACTERIOCIN | SVM-L | | 92.95 | | | | | |
*Each value is the performance on the testing dataset by the classifier built by the machine learning algorithm (second column) on the dataset after applying the best compromise solution for λ1=0.5 (first column)
Bold font indicates the best value per measure for every dataset
Performance comparison between the AMP prediction methods reported in [12] and our proposed approach for the DAMPD dataset
| Tool | Task | Sens(%) | Spec(%) | Prec(%) | Bal Acc(%) |
|---|---|---|---|---|---|
| MOEA-FW(SVM-L) | Antimicrobial | 77.32 | 91.62 | | |
| CAMPR3(RF) | Antimicrobial | | 72.65 | 40.30 | 82.49 |
| CAMPR3(SVM) | Antimicrobial | 90.13 | 72.10 | 39.25 | 81.11 |
| ADAM | Antimicrobial | 84.09 | 68.88 | 35.09 | 76.49 |
| MLAMP | Antimicrobial | 63.62 | 82.27 | 41.78 | 72.94 |
| DBAASP | Antimicrobial | 22.12 | | 38.28 | 57.49 |
| AMPA | Antimicrobial | 48.81 | 84.79 | 39.09 | 66.80 |
| MOEA-FW(KNN) | Antibacterial | 80.00 | | | |
| AntiBP | Antibacterial | | 45.05 | 24.63 | 67.41 |
| AntiBP2 | Antibacterial | 86.90 | 15.97 | 17.14 | 51.44 |
| MOEA-FW(KNN) | Bacteriocin | 80.00 | 87.50 | 50.00 | 83.75 |
| BAGEL3 | Bacteriocin | | | | |
| BACTIBASE | Bacteriocin | 83.87 | | | 91.93 |
Bold font indicates the best value per measure
Performance comparison between the AMP prediction methods reported in [12] and our proposed approach for the APD3 dataset
| Tool | Task | Sens(%) | Spec(%) | Prec(%) | Bal Acc(%) |
|---|---|---|---|---|---|
| MOEA-FW(SVM-L) | Antimicrobial | 89.24 | 82.87 | | |
| CAMPR3(RF) | Antimicrobial | | 72.65 | 40.30 | 82.49 |
| CAMPR3(SVM) | Antimicrobial | 90.60 | 72.10 | 39.25 | 81.11 |
| ADAM | Antimicrobial | 91.07 | 68.88 | 35.09 | 76.49 |
| MLAMP | Antimicrobial | 75.59 | 82.27 | 41.78 | 72.94 |
| DBAASP | Antimicrobial | 62.81 | 92.87 | 38.28 | 57.49 |
| AMPA | Antimicrobial | 39.17 | | 39.09 | 66.80 |
| MOEA-FW(SVM-L) | Antibacterial | | | | |
| AntiBP2 | Antibacterial | 66.59 | 26.00 | 15.25 | 46.30 |
| MOEA-FW(SVM-L) | Bacteriocin | | 92.95 | 71.05 | 93.03 |
| BAGEL3 | Bacteriocin | 86.36 | | | |
| BACTIBASE | Bacteriocin | 38.36 | | | 69.48 |
Bold font indicates the best value per measure
Fig. 4 The overall scheme of the feature weighting framework. The rectangles with bold text represent processes, and the rounded rectangles represent the inputs and outputs of processes
Summary of peptide datasets
| Dataset | No. of AMP sequences | No. of Non-AMP sequences | Total |
|---|---|---|---|
| DAMPD_AMP | 438 | 2174 | 2612 |
| DAMPD_ANTIBACTERIAL | 255 | 1242 | 1497 |
| DAMPD_BACTERIOCIN | 24 | 123 | 147 |
| APD3_AMP | 1360 | 6860 | 8220 |
| APD3_ANTIBACTERIAL | 1158 | 5777 | 6935 |
| APD3_BACTERIOCIN | 125 | 612 | 737 |
*The datasets were extracted from [12] and we removed the sequences with non-standard residues
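The filtering step mentioned in the footnote can be sketched in a few lines of Python. The record does not say how the authors implemented it; the sketch assumes that "standard residues" means the 20 canonical amino acids in one-letter code, and the names `STANDARD_RESIDUES` and `has_only_standard_residues` are hypothetical.

```python
from typing import List

# ASSUMPTION: "standard residues" = the 20 canonical amino acids (one-letter codes).
STANDARD_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")


def has_only_standard_residues(sequence: str) -> bool:
    """True if the sequence is non-empty and contains only canonical residues."""
    cleaned = sequence.strip().upper()
    return bool(cleaned) and set(cleaned) <= STANDARD_RESIDUES


def filter_sequences(sequences: List[str]) -> List[str]:
    """Keep only sequences made entirely of standard residues."""
    return [s for s in sequences if has_only_standard_residues(s)]


# Illustrative input: the second entry contains the ambiguity code X,
# the third contains B, Z and U, none of which are canonical.
examples = ["GIGKFLHSAKKFGKAFVGEIMNS", "ACDXK", "BZU"]
kept = filter_sequences(examples)
print(kept)
```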
Fig. 5 The weighted sum approach. (a) f1 is less important than f2. (b) f1 is as important as f2. (c) f2 is less important than f1
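A weighted sum reduces the two objectives to a single score so that one best compromise solution can be picked from a front. The Python sketch below is one possible reading, under several assumptions not confirmed by this record: both objectives are minimized, they are min-max normalized before weighting, and the second weight is 1 − λ1. The function name `best_compromise` and the toy front are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (f1, f2); ASSUMPTION: both are minimized


def best_compromise(front: List[Point], lambda1: float) -> Point:
    """Return the point minimizing lambda1 * f1 + (1 - lambda1) * f2.

    Objectives are min-max normalized first (an assumption) so that their
    different scales do not decide the outcome.
    """
    if not front:
        raise ValueError("front must not be empty")
    if not 0.0 <= lambda1 <= 1.0:
        raise ValueError("lambda1 must lie in [0, 1]")

    def normalizer(values: List[float]):
        lo, hi = min(values), max(values)
        span = hi - lo
        # A constant objective contributes nothing to the ranking.
        return lambda v: (v - lo) / span if span else 0.0

    norm1 = normalizer([p[0] for p in front])
    norm2 = normalizer([p[1] for p in front])
    return min(
        front,
        key=lambda p: lambda1 * norm1(p[0]) + (1.0 - lambda1) * norm2(p[1]),
    )


# Toy front, not data from the article.
front = [(0.08, 40.0), (0.10, 30.0), (0.12, 20.0), (0.20, 12.0)]
for lam in (0.1, 0.5, 0.9):
    print(lam, best_compromise(front, lam))
```

With a small λ1 the choice moves toward the solution with the lowest f2, with a large λ1 toward the lowest f1, and λ1 = 0.5 weights both equally, matching the three cases in the caption.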