Daniel Veltri, Uday Kamath, Amarda Shehu.
Abstract
Motivation: Bacterial resistance to antibiotics is a growing concern. Antimicrobial peptides (AMPs), natural components of innate immunity, are popular targets for developing new drugs. Machine learning methods are now commonly adopted by wet-laboratory researchers to screen for promising candidates.
Year: 2018 PMID: 29590297 PMCID: PMC6084614 DOI: 10.1093/bioinformatics/bty179
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The proposed DNN uses convolutional (Conv) and long short-term memory (LSTM) layers. Peptide sequences are encoded as uniform numerical vectors of length 200. These vectors (X) are fed to an embedding layer with 128 dimensions, followed by a convolutional layer of 64 filters. Each filter performs a 1D convolution, and its output is downsampled by a max-pooling layer of size 5. Next, an LSTM layer with 100 units lets the DNN remember or ignore old information passed along the horizontal dotted arrows extending from each X input. The final output of the DNN is passed through a sigmoid function so that predictions (Y) are scaled between 0 and 1.
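The encoding and layer stack described for Fig. 1 can be traced in plain Python as below. This is a minimal sketch under stated assumptions, not the authors' code. The caption gives only the vector length (200), embedding size (128), filter count (64), pool size (5) and LSTM width (100). The amino-acid-to-integer mapping (1 to 20, with 0 as padding), front padding, 'same' convolution padding, a pooling stride equal to the pool size, a single-unit output layer, and the helper names `encode`, `layer_shapes` and `sigmoid` are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical index assignment: the 20 standard amino acids map to 1..20,
# and 0 is reserved for padding. The caption does not give the real mapping.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
MAX_LEN = 200  # uniform vector length stated in the Fig. 1 caption


def encode(sequence: str, max_len: int = MAX_LEN) -> List[int]:
    """Turn a peptide into a fixed-length integer vector X.

    Assumption: longer sequences are truncated to max_len and shorter ones
    are zero-padded at the front.
    """
    indices = [AA_TO_INDEX[aa] for aa in sequence.upper()][:max_len]
    return [0] * (max_len - len(indices)) + indices


def layer_shapes(max_len: int = 200, embed_dim: int = 128, n_filters: int = 64,
                 pool_size: int = 5, lstm_units: int = 100) -> List[Tuple[str, Tuple[int, ...]]]:
    """Trace per-example tensor shapes through the Fig. 1 layer stack.

    Assumption: the convolution keeps the sequence length ('same' padding)
    and pooling uses a stride equal to pool_size; the caption states neither.
    """
    return [
        ("input X", (max_len,)),
        ("embedding", (max_len, embed_dim)),              # one 128-D vector per position
        ("conv1d", (max_len, n_filters)),                 # one channel per filter
        ("max_pool", (max_len // pool_size, n_filters)),  # 200 // 5 steps remain
        ("lstm", (lstm_units,)),                          # assumed: final hidden state only
        ("output Y", (1,)),                               # assumed single-unit output layer
    ]


def sigmoid(z: float) -> float:
    """Scale a raw output score into (0, 1), as the last step of Fig. 1 does."""
    return 1.0 / (1.0 + math.exp(-z))


for name, shape in layer_shapes():
    print(f"{name:>10}: {shape}")
```

Under the assumed 'same' padding, max pooling of size 5 shortens the 200-step sequence to 40 steps before it reaches the LSTM; a 'valid' convolution would shorten it slightly further, depending on the kernel size, which the caption does not state.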
Model performance on different training and evaluation data partitions
| Training set | Evaluation set | SENS(%) | SPEC(%) | ACC(%) | MCC | auROC(%) |
|---|---|---|---|---|---|---|
| Train-Only | Train | 98.60 | 98.87 | 98.69 | 0.9706 | 99.87 |
| Train-Only | Tune | 95.76 | 83.85 | 87.80 | 0.7582 | 96.67 |
| Train+Tune | Train+Tune | 97.19 | 99.53 | 98.36 | 0.9674 | 99.75 |
| Train+Tune | Test | 89.89 | 92.13 | 91.01 | 0.8204 | 96.48 |
| All Data | All Data | 98.26 | 99.66 | 98.96 | 0.9793 | 99.94 |
| All Data | 10-fold CV | 88.81 (…) | 94.21 (…) | 91.51 (…) | 0.8327 (…) | 96.58 (…) |
Note: Performance is shown for DNN models trained on the dataset in column 1 and evaluated on the dataset in column 2, using the metrics in columns 3–7. The bottom row shows 10-fold CV performance; accompanying SD values are shown in parentheses.
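SENS, SPEC, ACC and MCC follow the standard confusion-matrix definitions, with AMPs as the positive class. The sketch below computes them from counts and summarizes per-fold scores for a 10-fold CV row. The helper names `classification_metrics` and `mean_sd` are hypothetical, and using the sample SD (rather than the population SD) is an assumption, since the note does not say which was reported.

```python
import math
import statistics
from typing import Dict, List, Tuple


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """SENS, SPEC, ACC and MCC from confusion-matrix counts (AMP = positive)."""
    sens = tp / (tp + fn)  # fraction of true AMPs that are recognized
    spec = tn / (tn + fp)  # fraction of non-AMPs that are correctly rejected
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; report 0 by convention then.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SENS": sens, "SPEC": spec, "ACC": acc, "MCC": mcc}


def mean_sd(fold_scores: List[float]) -> Tuple[float, float]:
    """Summarize per-fold scores as mean and sample SD (assumed convention)."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)


m = classification_metrics(tp=90, fn=10, tn=80, fp=20)
print({name: round(value, 4) for name, value in m.items()})
```

auROC is not covered here because it needs ranked prediction scores rather than a single confusion matrix.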
Performance comparison on the AMP dataset testing partition
| Method | SENS(%) | SPEC(%) | ACC(%) | MCC | auROC(%) |
|---|---|---|---|---|---|
| AntiBP2 | 87.91 | 90.80 | 89.37 | 0.7876 | 89.36 |
| CAMP-ANN | 82.98 | 85.09 | 84.04 | 0.6809 | 84.06 |
| CAMP-DA | 87.08 | 80.76 | 83.92 | 0.6797 | 89.97 |
| CAMP-RF | 82.44 | | 87.57 | 0.7554 | 93.63 |
| CAMP-SVM | 88.90 | 79.92 | 84.41 | 0.6910 | 90.63 |
| iAMP-2L | 83.99 | 85.86 | 84.90 | 0.6983 | 84.90 |
| iAMPpred | 89.33 | 87.22 | 88.27 | 0.7656 | 94.44 |
| gkmSVM | 88.34 | 90.59 | 89.46 | 0.7895 | 94.98 |
| Our DNN | **89.89** | 92.13 | **91.01** | **0.8204** | **96.48** |
| DNN reduced amino acid | 88.66 (…) | 90.47 (…) | 89.57 (…) | 0.7938 (…) | 96.13 (…) |
| DNN random amino acid | 81.00 (…) | 81.64 (…) | 81.32 (…) | 0.6310 (…) | 89.55 (…) |
| gkmSVM reduced amino acid | 87.92 | 87.64 | 87.78 | 0.7556 | 94.16 |
| gkmSVM random amino acid | 80.02 (…) | 78.13 (…) | 79.07 (…) | 0.5819 (…) | 86.68 (…) |
Note: Recognition performance on the testing dataset is shown for state-of-the-art methods (listed in column 1) on the metrics listed in columns 2–6. Best performance on a metric is marked in bold. Our DNN model is shown in row 9. The four bottom rows show performance of the DNN model and the gkmSVM model on the DNN-reduced versus random alphabets.
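The reduced-alphabet rows re-encode each peptide so that amino acids in the same group share one symbol, and compare that against random groupings. A minimal sketch follows. It assumes that a random alphabet keeps the same number and sizes of groups as the DNN-reduced one; the source does not say how the random alphabets were drawn. The grouping `EXAMPLE_GROUPS`, the example peptide and the helper names are hypothetical placeholders, not the DNN-reduced alphabet reported in the Supplementary Information.

```python
import random
from typing import Dict, List

# Hypothetical 5-group alphabet for illustration only; it is NOT the
# DNN-reduced alphabet reported in the paper's Supplementary Information.
EXAMPLE_GROUPS = ["AGST", "CFILMVWY", "DE", "HKR", "NPQ"]


def alphabet_map(groups: List[str]) -> Dict[str, str]:
    """Map each amino acid to its group's representative (first) letter."""
    return {aa: group[0] for group in groups for aa in group}


def random_groups(groups: List[str], seed: int = 0) -> List[str]:
    """Draw a random alphabet with the same number and sizes of groups (assumed)."""
    rng = random.Random(seed)
    letters = [aa for group in groups for aa in group]
    rng.shuffle(letters)
    shuffled, start = [], 0
    for group in groups:
        shuffled.append("".join(letters[start:start + len(group)]))
        start += len(group)
    return shuffled


def reduce_sequence(sequence: str, mapping: Dict[str, str]) -> str:
    """Rewrite a peptide in the reduced alphabet."""
    return "".join(mapping[aa] for aa in sequence.upper())


peptide = "GIGKFLKKAKKF"  # arbitrary example sequence
print(reduce_sequence(peptide, alphabet_map(EXAMPLE_GROUPS)))  # ACAHCCHHAHHC
print(reduce_sequence(peptide, alphabet_map(random_groups(EXAMPLE_GROUPS, seed=1))))
```

Keeping the group sizes fixed in this sketch means the random baseline has the same alphabet size as the learned one, so any performance gap reflects which amino acids are grouped together rather than how many groups there are.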
Fig. 2. ROC curves for the compared methods, ordered from highest to lowest area under the curve (AUC). The straight-line curves for AntiBP2, CAMP-ANN and iAMP-2L approximate the ROC curve from binary prediction results, because these tools do not provide probability values. The AntiBP2 curve excludes 211 testing sequences that the AntiBP2 server rejects because of its length restrictions.
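When a tool reports only binary labels, its ROC curve has a single operating point at (1 - SPEC, SENS), joined by straight lines to (0, 0) and (1, 1). Integrating that polyline with the trapezoidal rule reduces to (SENS + SPEC) / 2. The sketch below shows the computation; the function names are hypothetical.

```python
from typing import List, Tuple


def trapezoid_auc(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve given (FPR, TPR) points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def binary_roc_auc(sens: float, spec: float) -> float:
    """AUC of the straight-line ROC implied by one binary operating point."""
    return trapezoid_auc([(0.0, 0.0), (1.0 - spec, sens), (1.0, 1.0)])


# AntiBP2 on the testing partition: SENS 87.91%, SPEC 90.80%.
print(f"{binary_roc_auc(0.8791, 0.9080):.5f}")
```

For AntiBP2, (0.8791 + 0.9080) / 2 = 0.89355, consistent with the 89.36% auROC listed for it in the comparison table.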
Fig. 3. A 2D t-SNE (Van der Maaten and Hinton, 2008) projection of the 128D amino acid embedding vectors. K-means clustering was used to select the clusters that form the DNN-reduced alphabet listed in Table S6 of the Supplementary Information.
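The clustering step behind the DNN-reduced alphabet can be sketched as K-means over per-amino-acid embedding vectors. The caption does not state the K-means settings, nor whether clustering ran on the 128D vectors or on the 2D projection. The sketch below is a pure-Python Lloyd's algorithm with a deterministic farthest-point initialization, a simple stand-in for the k-means++ seeding that libraries such as scikit-learn use by default. The 2-D toy vectors and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's K-means; returns a cluster label for each input vector."""
    # Deterministic farthest-point seeding: start from the first vector, then
    # repeatedly add the vector farthest from all centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(_sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def reduced_alphabet(embeddings: Dict[str, List[float]], k: int) -> Dict[str, int]:
    """Group amino acids whose embedding vectors fall in the same cluster."""
    residues = sorted(embeddings)
    labels = kmeans([embeddings[aa] for aa in residues], k)
    return dict(zip(residues, labels))


# Hypothetical 2-D vectors; the real embeddings are 128D and learned by the DNN.
toy = {"A": [0.0, 0.0], "G": [0.1, 0.0], "K": [5.0, 5.0], "R": [5.1, 5.0]}
print(reduced_alphabet(toy, k=2))  # {'A': 0, 'G': 0, 'K': 1, 'R': 1}
```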