Piyush Agrawal¹˒², Gajendra P. S. Raghava².
Abstract
Designing novel antimicrobial peptides is an active area of therapeutic research, especially since the emergence of strains resistant to conventional antibiotics. A number of in silico methods have previously been developed for predicting the antimicrobial property of peptides composed of natural residues. This study describes models developed for predicting the antimicrobial property of chemically modified peptides. Our models have been trained, tested, and evaluated on a dataset of 948 antimicrobial and 931 non-antimicrobial peptides containing chemically modified and natural residues. First, the tertiary structure of every peptide was predicted using the software PEPstrMOD. Structure analysis indicates that certain types of modification enhance the antimicrobial property of peptides. Second, a wide range of features was computed from the structures of these peptides using the software PaDEL. Finally, models were developed for predicting the antimicrobial potential of chemically modified peptides using these structural features. Our best model, based on a support vector machine, achieved a maximum MCC of 0.84 with an accuracy of 91.62% on the training dataset, and an MCC of 0.80 with an accuracy of 89.89% on the validation dataset. To assist the scientific community, we have developed a web server called "AntiMPmod" that predicts the antimicrobial property of chemically modified peptides. The web server is available at http://webs.iiitd.edu.in/raghava/antimpmod/.
Keywords: antimicrobial peptide prediction; chemically modified peptides; fingerprints; machine learning technique; peptide therapeutics; resistance
Year: 2018 PMID: 30416494 PMCID: PMC6212470 DOI: 10.3389/fmicb.2018.02551
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
FIGURE 1. Feature extraction using the SMILES format. Different features were calculated from SMILES strings: (A) binary profile of atoms only, (B) binary profile of symbols only, (C) binary profile of both symbols and atoms, (D) atom composition, and (E) diatom composition.
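Panels (D) and (E) can be illustrated with a minimal Python sketch. It assumes a hypothetical atom alphabet (`ATOMS`), a simple regular-expression atom matcher that handles only the SMILES organic subset, and that "diatoms" are consecutive atoms in the order they appear in the string rather than bonded pairs. None of these choices is taken from the paper; they only show the general idea of composition features.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical atom alphabet (assumption; the paper's exact set is not stated here).
ATOMS = ("C", "N", "O", "S")

# Organic-subset atoms only: two-letter halogens first, then uppercase
# (aliphatic) and lowercase (aromatic) single-letter atoms. Bond, branch,
# ring-closure and bracket characters are skipped.
ATOM_RE = re.compile(r"Cl|Br|[BCNOSPFI]|[bcnops]")


def smiles_atoms(smiles):
    """Atoms in order of appearance; aromatic atoms are upper-cased."""
    return [a.upper() if len(a) == 1 else a for a in ATOM_RE.findall(smiles)]


def atom_composition(smiles):
    """Percentage of each atom in ATOMS among all matched atoms."""
    atoms = smiles_atoms(smiles)
    counts = Counter(atoms)
    total = len(atoms)
    return {a: 100.0 * counts[a] / total if total else 0.0 for a in ATOMS}


def diatom_composition(smiles):
    """Percentage of each ordered pair of consecutive atoms (assumption:
    neighbours in the string, not necessarily bonded neighbours)."""
    atoms = smiles_atoms(smiles)
    pairs = Counter(zip(atoms, atoms[1:]))
    total = len(atoms) - 1
    return {a + b: 100.0 * pairs[(a, b)] / total if total > 0 else 0.0
            for a, b in product(ATOMS, repeat=2)}


if __name__ == "__main__":
    glycine = "NCC(=O)O"  # free glycine, used only as a toy input
    print(atom_composition(glycine))  # {'C': 40.0, 'N': 20.0, 'O': 40.0, 'S': 0.0}
    print(diatom_composition(glycine)["CO"])  # 25.0
```

Atom composition yields a fixed-length vector of len(ATOMS) values and diatom composition one of len(ATOMS)² values, regardless of peptide length, which is what makes them directly usable as classifier input.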
FIGURE 2. Comparison of atom composition in modified AMPs and non-AMPs.
FIGURE 3. Comparison of diatom composition in modified AMPs and non-AMPs.
The performance of atom composition based models developed using different machine learning techniques.
| Machine learning technique (parameters) | Main Sen (%) | Main Spc (%) | Main Acc (%) | Main MCC | Main AUROC | Validation Sen (%) | Validation Spc (%) | Validation Acc (%) | Validation MCC | Validation AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM (…) | 90.37 | 83.22 | 86.83 | 0.74 | 0.92 | 89.47 | 77.42 | 83.51 | 0.67 | 0.88 |
| Random Forest (Ntree = 20) | 89.58 | 81.74 | 85.70 | 0.72 | 0.93 | 91.05 | 81.72 | 86.44 | 0.73 | 0.90 |
| SMO (…) | 88.39 | 82.15 | 85.30 | 0.71 | 0.85 | 90.00 | 76.88 | 83.51 | 0.68 | 0.83 |
| J48 (…) | 88.13 | 80.27 | 84.23 | 0.69 | 0.88 | 85.79 | 77.96 | 81.91 | 0.64 | 0.85 |
| Naive Bayes (Default) | 89.84 | 62.55 | 76.31 | 0.55 | 0.77 | 87.89 | 60.75 | 74.47 | 0.51 | 0.79 |
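The Sen, Spc, Acc and MCC columns follow the standard confusion-matrix definitions, with AMPs as the positive class. The sketch below computes them. The counts in the usage line (170 of 190 AMPs and 144 of 186 non-AMPs predicted correctly) are hypothetical values, chosen only because they reproduce the rounded validation figures of the SVM row above; the actual counts are not reported in this section.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy (in %) and MCC, with AMPs as positives."""
    sen = 100.0 * tp / (tp + fn)                   # AMPs predicted as AMPs
    spc = 100.0 * tn / (tn + fp)                   # non-AMPs predicted as non-AMPs
    acc = 100.0 * (tp + tn) / (tp + fn + tn + fp)  # all correct predictions
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sen, spc, acc, mcc


if __name__ == "__main__":
    # Hypothetical counts: 190 AMPs and 186 non-AMPs in a validation set.
    sen, spc, acc, mcc = binary_metrics(tp=170, fn=20, tn=144, fp=42)
    print(f"{sen:.2f} {spc:.2f} {acc:.2f} {mcc:.2f}")  # 89.47 77.42 83.51 0.67
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why the Naive Bayes rows, with high sensitivity but low specificity, score much lower on MCC than on accuracy.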
The performance of diatom composition based models developed using different machine learning techniques.
| Machine learning technique (parameters) | Main Sen (%) | Main Spc (%) | Main Acc (%) | Main MCC | Main AUROC | Validation Sen (%) | Validation Spc (%) | Validation Acc (%) | Validation MCC | Validation AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM (…) | 89.71 | 86.85 | 88.29 | 0.77 | 0.93 | 90.53 | 81.72 | 86.17 | 0.73 | 0.92 |
| Random Forest (Ntree = 150) | 94.20 | 85.23 | 89.75 | 0.80 | 0.96 | 92.11 | 82.80 | 87.50 | 0.75 | 0.93 |
| SMO (…) | 88.79 | 87.92 | 88.36 | 0.77 | 0.88 | 88.95 | 83.33 | 86.17 | 0.72 | 0.86 |
| J48 (…) | 89.71 | 83.22 | 86.49 | 0.73 | 0.88 | 86.84 | 83.87 | 85.37 | 0.71 | 0.86 |
| Naive Bayes (Default) | 87.86 | 63.09 | 75.58 | 0.53 | 0.74 | 87.37 | 62.37 | 75.00 | 0.51 | 0.74 |
The performance of 2D descriptors based models developed using different machine learning techniques.
| Machine learning technique (parameters) | Main Sen (%) | Main Spc (%) | Main Acc (%) | Main MCC | Main AUROC | Validation Sen (%) | Validation Spc (%) | Validation Acc (%) | Validation MCC | Validation AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM, all features (…) | 56.88 | 65.77 | 61.29 | 0.23 | 0.70 | 30.53 | 91.94 | 60.90 | 0.28 | 0.75 |
| SVM, after feature selection (…) | 84.92 | 76.38 | 80.68 | 0.62 | 0.85 | 84.74 | 74.73 | 79.79 | 0.60 | 0.87 |
| Random Forest (Ntree = 20) | 82.01 | 77.45 | 79.75 | 0.60 | 0.88 | 83.68 | 75.81 | 79.79 | 0.60 | 0.86 |
| SMO (…) | 87.04 | 74.63 | 80.88 | 0.62 | 0.81 | 85.79 | 71.51 | 78.72 | 0.58 | 0.79 |
| J48 (…) | 81.08 | 77.99 | 79.55 | 0.59 | 0.85 | 83.68 | 74.19 | 78.99 | 0.58 | 0.83 |
| Naive Bayes (Default) | 87.30 | 62.82 | 75.15 | 0.52 | 0.83 | 87.89 | 61.29 | 74.73 | 0.51 | 0.81 |
The performance of fingerprints based models developed using different machine learning techniques.
| Machine learning technique (parameters) | Main Sen (%) | Main Spc (%) | Main Acc (%) | Main MCC | Main AUROC | Validation Sen (%) | Validation Spc (%) | Validation Acc (%) | Validation MCC | Validation AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM, all features (…) | 95.91 | 87.25 | 91.62 | 0.84 | 0.97 | 93.16 | 86.56 | 89.89 | 0.80 | 0.97 |
| SVM, after feature selection (…) | 82.85 | 80.67 | 81.77 | 0.64 | 0.87 | 82.63 | 75.81 | 79.26 | 0.59 | 0.84 |
| Random Forest (Ntree = 100) | 92.88 | 90.07 | 91.48 | 0.83 | 0.98 | 92.63 | 89.25 | 90.96 | 0.82 | 0.97 |
| SMO (…) | 91.29 | 89.66 | 90.49 | 0.81 | 0.90 | 89.47 | 90.32 | 89.89 | 0.80 | 0.90 |
| J48 (…) | 90.50 | 88.99 | 89.75 | 0.80 | 0.88 | 88.95 | 86.02 | 87.50 | 0.75 | 0.85 |
| Naive Bayes (Default) | 84.30 | 64.56 | 74.52 | 0.50 | 0.74 | 78.42 | 65.05 | 71.81 | 0.44 | 0.72 |
The performance of SVM based models developed using binary profiles of atoms taken from the N- and C-terminal ends of the SMILES string.
| Feature (parameters) | Main Sen (%) | Main Spc (%) | Main Acc (%) | Main MCC | Main AUROC | Validation Sen (%) | Validation Spc (%) | Validation Acc (%) | Validation MCC | Validation AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| N25 (…) | 77.63 | 75.68 | 76.67 | 0.53 | 0.83 | 79.59 | 85.71 | 82.42 | 0.65 | 0.91 |
| N50 (…) | 83.17 | 79.31 | 81.27 | 0.63 | 0.88 | 90.58 | 86.40 | 88.59 | 0.77 | 0.93 |
| N100 (…) | 85.71 | 84.18 | 84.90 | 0.70 | 0.93 | 85.04 | 84.93 | 84.98 | 0.70 | 0.93 |
| C25 (…) | 79.11 | 70.43 | 74.70 | 0.50 | 0.79 | 89.19 | 74.51 | 82.16 | 0.65 | 0.83 |
| C50 (…) | 83.47 | 72.08 | 77.94 | 0.56 | 0.85 | 88.31 | 74.83 | 81.73 | 0.64 | 0.91 |
| C100 (…) | 82.97 | 81.85 | 82.38 | 0.65 | 0.89 | 89.55 | 77.55 | 83.27 | 0.67 | 0.92 |
| N25C25 (…) | 85.69 | 84.82 | 85.27 | 0.71 | 0.91 | 84.71 | 82.31 | 83.55 | 0.67 | 0.92 |
| N50C50 (…) | 89.79 | 87.16 | 88.47 | 0.77 | 0.95 | 87.43 | 85.63 | 86.53 | 0.73 | 0.95 |
| N100C100 (…) | 90.15 | 89.58 | 89.84 | 0.80 | 0.96 | 90.51 | 84.62 | 87.37 | 0.75 | 0.96 |
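The Nk and Ck feature names can be read as binary (one-hot) encodings of the first k atoms from the N-terminal end and the last k atoms from the C-terminal end of the SMILES string, concatenated for the NkCk variants. The sketch below is illustrative only: it assumes the SMILES string is written from the N- to the C-terminus, uses a hypothetical four-atom alphabet, and pads peptides with fewer than k atoms with all-zero vectors. The paper's exact encoding may differ.

```python
import re

ATOMS = ("C", "N", "O", "S")  # hypothetical atom alphabet (assumption)
ATOM_RE = re.compile(r"Cl|Br|[BCNOSPFI]|[bcnops]")


def one_hot(atom):
    """Binary vector over ATOMS; None (padding) or an unlisted atom gives all zeros."""
    return [1 if atom == a else 0 for a in ATOMS]


def terminal_profile(smiles, n=0, c=0):
    """Binary profile of the first n and last c atoms (assumes N-to-C SMILES order)."""
    atoms = [a.upper() if len(a) == 1 else a for a in ATOM_RE.findall(smiles)]
    head = atoms[:n]
    head += [None] * (n - len(head))            # pad short peptides at the end
    tail = atoms[-c:] if c > 0 else []
    tail = [None] * (c - len(tail)) + tail      # pad short peptides at the start
    profile = []
    for atom in head + tail:
        profile.extend(one_hot(atom))
    return profile


if __name__ == "__main__":
    # N2C2 profile of a toy input whose atoms are N, C, C, O, O
    print(terminal_profile("NCC(=O)O", n=2, c=2))
    # [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
```

Each profile has (n + c) × len(ATOMS) binary entries, so an N100C100 profile is a fixed-length vector of 800 bits under this alphabet, independent of peptide length.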
The performance of SVM based models developed using binary profiles of atoms and symbols together, taken from the N- and C-terminal ends of the SMILES string.
| Feature (parameters) | Main Sen (%) | Main Spc (%) | Main Acc (%) | Main MCC | Main AUROC | Validation Sen (%) | Validation Spc (%) | Validation Acc (%) | Validation MCC | Validation AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| N50 (…) | 75.50 | 75.76 | 75.62 | 0.51 | 0.80 | 63.21 | 92.63 | 77.11 | 0.58 | 0.89 |
| N100 (…) | 81.26 | 80.39 | 80.84 | 0.62 | 0.88 | 77.62 | 78.79 | 78.18 | 0.56 | 0.89 |
| N200 (…) | 85.28 | 81.57 | 83.32 | 0.67 | 0.92 | 81.06 | 77.48 | 79.15 | 0.58 | 0.90 |
| C50 (…) | 72.47 | 72.13 | 72.30 | 0.45 | 0.79 | 78.10 | 71.43 | 74.88 | 0.50 | 0.84 |
| C100 (…) | 77.93 | 75.83 | 76.94 | 0.54 | 0.83 | 84.42 | 78.72 | 81.69 | 0.63 | 0.89 |
| C200 (…) | 80.80 | 79.66 | 80.20 | 0.60 | 0.89 | 83.09 | 82.05 | 82.53 | 0.65 | 0.92 |
| N50C50 (…) | 86.45 | 84.19 | 85.36 | 0.71 | 0.91 | 83.97 | 87.84 | 85.86 | 0.72 | 0.92 |
| N100C100 (…) | 90.38 | 86.25 | 88.35 | 0.77 | 0.96 | 86.90 | 84.94 | 85.93 | 0.72 | 0.94 |
| N200C200 (…) | 91.59 | 87.46 | 89.35 | 0.79 | 0.96 | 89.29 | 82.93 | 85.86 | 0.72 | 0.94 |
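For the atom-plus-symbol profiles, the SMILES string must first be split into atom tokens and non-atom symbol tokens (branches, bonds, ring closures, brackets, chirality marks). The following is a minimal sketch of such a tokenizer and profile. It assumes a hypothetical token vocabulary, SMILES written from the N- to the C-terminus, and zero padding for peptides shorter than the profile length; bracket hydrogens are treated as symbols here, which is a modelling choice, not something the source specifies.

```python
import re

# Hypothetical token vocabulary (assumption): four atoms plus common SMILES symbols.
# Bracket hydrogens ("H" inside [C@@H], [nH]) are treated as symbols here.
ATOMS = ("C", "N", "O", "S")
SYMBOLS = ("(", ")", "=", "#", "[", "]", "@", "H", "1", "2", "3")
VOCAB = ATOMS + SYMBOLS
AROMATIC = set("bcnops")

# Atoms are matched first; any other single character becomes a symbol token.
TOKEN_RE = re.compile(r"Cl|Br|[BCNOSPFI]|[bcnops]|.")


def tokenize(smiles):
    """Split SMILES into atom and symbol tokens; aromatic atoms are upper-cased."""
    return [t.upper() if t in AROMATIC else t for t in TOKEN_RE.findall(smiles)]


def token_profile(smiles, n=0, c=0):
    """Binary profile of the first n and last c tokens over VOCAB.

    Assumes N-to-C SMILES order; missing positions and out-of-vocabulary
    tokens are encoded as all-zero vectors.
    """
    tokens = tokenize(smiles)
    head = tokens[:n]
    head += [None] * (n - len(head))
    tail = tokens[-c:] if c > 0 else []
    tail = [None] * (c - len(tail)) + tail
    return [1 if tok == v else 0 for tok in head + tail for v in VOCAB]


if __name__ == "__main__":
    print(tokenize("NCC(=O)O"))  # ['N', 'C', 'C', '(', '=', 'O', ')', 'O']
    print(len(token_profile("NCC(=O)O", n=50, c=50)))  # 1500
```

Because symbols are interleaved with atoms, a window of k tokens covers fewer atoms than a window of k atoms, which is one plausible reason the atom-plus-symbol profiles above use larger windows (up to N200C200) than the atom-only profiles.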
The performance of SVM based models developed using different features on the additional dataset (Mod_AMP_similar).
| Features (parameters) | Sen (%) | Spc (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|
| Atom composition (…) | 89.47 | 43.68 | 66.58 | 0.37 | 0.77 |
| Diatom composition (…) | 88.42 | 71.58 | 80.00 | 0.61 | 0.88 |
| 2D descriptors (…) | 84.74 | 32.63 | 58.68 | 0.20 | 0.66 |
| Fingerprints (…) | 93.16 | 87.37 | 90.26 | 0.81 | 0.97 |
| Hybrid features (2D + fingerprints) (…) | 84.74 | 58.95 | 71.84 | 0.45 | 0.81 |
| N100C100 binary profile (only atoms) (…) | 90.51 | 89.44 | 89.66 | 0.80 | 0.97 |
| N100C100 binary profile (only symbols) (…) | 76.98 | 91.10 | 84.21 | 0.60 | 0.94 |
| N200C200 binary profile (atoms + symbols) (…) | 89.29 | 89.12 | 89.20 | 0.78 | 0.96 |
FIGURE 4. Schematic representation of the AntiMPmod workflow.
The performance of the best models developed with different machine learning techniques on different feature types.
| Feature (machine learning technique with parameters) | Main Sen (%) | Main Spc (%) | Main Acc (%) | Main MCC | Main AUROC | Validation Sen (%) | Validation Spc (%) | Validation Acc (%) | Validation MCC |
|---|---|---|---|---|---|---|---|---|---|
| Atom composition (SVM, …) | 90.37 | 83.22 | 86.83 | 0.74 | 0.92 | 89.47 | 77.42 | 83.51 | 0.67 |
| Diatom composition (Random Forest, Ntree = 150) | 94.20 | 85.23 | 89.75 | 0.80 | 0.96 | 92.11 | 82.80 | 87.50 | 0.75 |
| 2D descriptors (SVM, …) | 84.92 | 76.38 | 80.68 | 0.62 | 0.85 | 84.74 | 74.73 | 79.79 | 0.60 |
| Fingerprints (SVM, …) | 95.91 | 87.25 | 91.62 | 0.84 | 0.97 | 93.16 | 86.56 | 89.89 | 0.80 |
| N100C100 binary profile (only atoms) (SVM, …) | 90.15 | 89.58 | 89.84 | 0.80 | 0.96 | 90.51 | 84.62 | 87.37 | 0.75 |
| N200C200 binary profile (atoms + symbols) (SVM, …) | 91.59 | 87.46 | 89.35 | 0.79 | 0.96 | 89.29 | 82.93 | 85.86 | 0.72 |
FIGURE 5. Performance of the best models on the independent dataset, shown as ROC curves for the different input features.
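The AUROC values in the tables summarise such ROC curves as a single number. One standard way to compute it, shown here as a generic sketch rather than the authors' code, uses the rank (Mann-Whitney) interpretation: the AUROC equals the probability that a randomly chosen AMP receives a higher prediction score than a randomly chosen non-AMP, with ties counted as one half. The scores in the usage lines are toy values.

```python
def auroc(pos_scores, neg_scores):
    """AUROC via pairwise comparison, i.e. Mann-Whitney U / (n_pos * n_neg).

    O(n_pos * n_neg); adequate for illustration, a sort-based version scales better.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    amp = [0.9, 0.8, 0.4]   # toy classifier scores for AMPs
    non_amp = [0.5, 0.3]    # toy classifier scores for non-AMPs
    print(round(auroc(amp, non_amp), 3))  # 0.833
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the values of 0.96 to 0.98 reported for the fingerprint models indicate that AMPs are ranked above non-AMPs in nearly all pairs.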