Piyush Agrawal, Sherry Bhalla, Kumardeep Chaudhary, Rajesh Kumar, Meenu Sharma, Gajendra P S Raghava.
Abstract
This paper describes in silico models for predicting antifungal peptides (AFPs), developed using a wide range of peptide features. Our analyses indicate that certain residues (e.g., C, G, H, K, R, Y) are more abundant in AFPs. Positional residue preference analysis reveals that certain residues (e.g., R, V, K) predominate at the N-terminus and others (e.g., C, H) at the C-terminus. Models for predicting AFPs were developed using features such as residue composition, binary profile, and terminal residues. The support vector machine based model developed using compositional features of peptides achieved a maximum accuracy of 88.78% on the training dataset and 83.33% on the independent (validation) dataset. Our model developed using binary patterns of terminal residues achieved a maximum accuracy of 84.88% on the training dataset and 84.64% on the validation dataset. We benchmarked the models developed in this study and existing methods on a dataset containing compositionally similar antifungal and non-antifungal peptides. The binary profile based model developed in this study performed better than any other model or method tested. To support the scientific community, we developed a mobile app, a standalone version, and a user-friendly web server, 'Antifp' (http://webs.iiitd.edu.in/raghava/antifp).
Keywords: amino acid composition; antifungal peptides; antimicrobial peptides; motifs; support vector machine
Year: 2018 PMID: 29535692 PMCID: PMC5834480 DOI: 10.3389/fmicb.2018.00323
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
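Performance of models based on different machine learning techniques on the Antifp_Main dataset, developed using the amino acid composition of peptides. Sen: sensitivity; Spc: specificity; Acc: accuracy; MCC: Matthews correlation coefficient; ROC: area under the ROC curve.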
| Method | Parameters | Main: Sen | Main: Spc | Main: Acc | Main: MCC | Main: ROC | Validation: Sen | Validation: Spc | Validation: Acc | Validation: MCC | Validation: ROC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | g = 0.01, c = 5, j = 4 | 88.61 | 87.93 | 88.27 | 0.77 | 0.94 | 86.60 | 85.91 | 86.25 | 0.73 | 0.94 |
| Random Forest | Ntree = 130 | 87.84 | 86.64 | 87.24 | 0.74 | 0.93 | 86.94 | 80.76 | 83.85 | 0.68 | 0.91 |
| SMO | g = 0.001, c = 2 | 87.84 | 82.11 | 84.97 | 0.70 | 0.84 | 88.32 | 81.44 | 84.88 | 0.70 | 0.84 |
| J48 | c = 0.1, m = 7 | 80.39 | 80.65 | 80.52 | 0.61 | 0.82 | 82.82 | 81.44 | 82.13 | 0.64 | 0.84 |
| Naïve Bayes | Default | 76.46 | 75.86 | 76.16 | 0.52 | 0.80 | 74.91 | 78.01 | 76.46 | 0.53 | 0.81 |
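The amino acid composition feature behind the models in this table can be illustrated with a minimal sketch. It assumes the usual definition (percentage of each of the 20 standard residues in the peptide); the function name `amino_acid_composition`, the residue ordering, and the handling of non-standard characters are illustrative assumptions and are not taken from the Antifp code.

```python
from collections import Counter

# Assumed ordering of the 20 standard residues (alphabetical by one-letter code).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide: str) -> list[float]:
    """Return a 20-element vector with the percentage of each standard residue.

    Assumption: characters outside the 20 standard residues are ignored,
    and percentages are taken over the standard residues only.
    """
    counts = Counter(peptide.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("peptide contains no standard amino acid residues")
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical peptide used only to show the call.
    features = amino_acid_composition("AAKK")
    print(dict(zip(AMINO_ACIDS, features)))
```

Each peptide maps to a fixed-length vector regardless of its length, which is what lets a classifier such as an SVM or random forest be trained on peptides of varying size.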
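Performance of SVM-based models on the Antifp_Main dataset, developed using the dipeptide composition (DPC) of whole peptides and of terminal segments. N and C denote residues taken from the N-terminus and C-terminus, respectively (e.g., N5C5 combines the first 5 and last 5 residues).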
| Feature | g | c | j | Main: Sen | Main: Spc | Main: Acc | Main: MCC | Main: ROC | Validation: Sen | Validation: Spc | Validation: Acc | Validation: MCC | Validation: ROC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DPC | 0.005 | 2 | 3 | 88.53 | 85.02 | 86.77 | 0.74 | 0.94 | 89.69 | 87.63 | 88.66 | 0.77 | 0.95 |
| N5 | 0.001 | 1 | 2 | 77.94 | 73.88 | 75.91 | 0.52 | 0.85 | 82.61 | 75.69 | 79.08 | 0.58 | 0.87 |
| N10 | 0.0005 | 8 | 2 | 78.74 | 79.77 | 79.26 | 0.59 | 0.87 | 86.63 | 81.21 | 84.01 | 0.68 | 0.90 |
| N15 | 0.001 | 3 | 3 | 81.97 | 80.70 | 81.33 | 0.63 | 0.89 | 85.07 | 81.58 | 83.33 | 0.67 | 0.90 |
| C5 | 0.0005 | 2 | 3 | 70.35 | 75.45 | 72.89 | 0.46 | 0.81 | 71.17 | 79.09 | 75.18 | 0.50 | 0.81 |
| C10 | 0.001 | 5 | 1 | 80.89 | 74.01 | 77.43 | 0.55 | 0.87 | 79.71 | 76.16 | 77.92 | 0.56 | 0.85 |
| C15 | 0.001 | 2 | 2 | 80.69 | 76.33 | 78.51 | 0.57 | 0.86 | 79.93 | 77.90 | 78.92 | 0.58 | 0.86 |
| N5C5 | 0.0005 | 1 | 2 | 78.39 | 80.13 | 79.29 | 0.59 | 0.87 | 83.02 | 83.97 | 83.51 | 0.67 | 0.90 |
| N10C10 | 0.001 | 1 | 2 | 84.65 | 78.84 | 81.73 | 0.64 | 0.90 | 87.87 | 83.33 | 85.56 | 0.71 | 0.92 |
| N15C15 | 0.001 | 1 | 2 | 84.97 | 85.09 | 85.03 | 0.70 | 0.92 | 85.39 | 86.52 | 85.96 | 0.72 | 0.93 |
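The dipeptide composition of terminal segments used in this table might be computed roughly as sketched here. The sketch assumes that N5, C5, and N5C5 mean the first 5 residues, the last 5 residues, and their concatenation, and that composition is the percentage of each of the 400 ordered residue pairs among the overlapping pairs of the segment. The helper names `terminal_segment` and `dipeptide_composition` are hypothetical, and rejecting peptides shorter than the requested segment is an assumption.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed (assumed) order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def terminal_segment(peptide: str, n: int = 0, c: int = 0) -> str:
    """Return the first n residues joined to the last c residues.

    With n == 0 and c == 0 the whole peptide is returned.
    """
    seq = peptide.upper()
    if n == 0 and c == 0:
        return seq
    if max(n, c) > len(seq):
        raise ValueError("peptide is shorter than the requested segment")
    n_part = seq[:n]
    c_part = seq[len(seq) - c:] if c else ""
    return n_part + c_part


def dipeptide_composition(segment: str) -> list[float]:
    """Return a 400-element vector of dipeptide percentages."""
    pairs = [segment[i:i + 2] for i in range(len(segment) - 1)]
    if not pairs:
        raise ValueError("segment must contain at least two residues")
    counts = Counter(pairs)
    return [100.0 * counts[dp] / len(pairs) for dp in DIPEPTIDES]


if __name__ == "__main__":
    # Hypothetical peptide; an N5C5-style feature vector.
    segment = terminal_segment("ACDEFGHIKLMNPQ", n=5, c=5)
    print(segment, len(dipeptide_composition(segment)))
```

Note that joining the two terminal segments creates one pair spanning the junction; whether the original models counted that pair is not stated in the source.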
Performance of SVM-based models on the Antifp_Main dataset, developed using binary profiles (patterns) of terminal peptide segments.
| Feature | g | c | j | Main: Sen | Main: Spc | Main: Acc | Main: MCC | Main: ROC | Validation: Sen | Validation: Spc | Validation: Acc | Validation: MCC | Validation: ROC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N5 | 0.5 | 1 | 2 | 76.07 | 81.66 | 78.86 | 0.58 | 0.86 | 81.16 | 81.25 | 81.21 | 0.62 | 0.87 |
| N10 | 0.1 | 2 | 4 | 80.90 | 80.93 | 80.91 | 0.62 | 0.89 | 86.12 | 79.79 | 82.95 | 0.66 | 0.89 |
| N15 | 0.1 | 1 | 2 | 82.63 | 82.40 | 82.52 | 0.65 | 0.89 | 86.57 | 81.20 | 83.90 | 0.68 | 0.90 |
| C5 | 0.5 | 1 | 3 | 71.24 | 78.96 | 75.08 | 0.50 | 0.83 | 68.68 | 83.62 | 76.23 | 0.53 | 0.82 |
| C10 | 0.1 | 2 | 2 | 76.99 | 78.67 | 77.84 | 0.56 | 0.86 | 75.36 | 80.43 | 77.92 | 0.56 | 0.84 |
| C15 | 0.1 | 2 | 2 | 81.64 | 75.10 | 78.36 | 0.57 | 0.87 | 80.67 | 77.90 | 79.29 | 0.59 | 0.87 |
| N5C5 | 0.1 | 4 | 2 | 82.02 | 76.59 | 79.20 | 0.59 | 0.87 | 87.55 | 80.49 | 83.88 | 0.68 | 0.90 |
| N10C10 | 0.05 | 2 | 2 | 84.19 | 83.53 | 83.86 | 0.68 | 0.91 | 85.29 | 84.40 | 84.84 | 0.70 | 0.91 |
| N15C15 | 0.05 | 1 | 3 | 85.55 | 84.23 | 84.88 | 0.70 | 0.92 | 85.39 | 83.90 | 84.64 | 0.69 | 0.92 |
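A binary profile of a terminal segment can be sketched as a one-hot encoding, assuming each residue becomes a 20-element vector with a single 1 at the position of that residue, so that an N5 segment yields 100 values and an N5C5 segment yields 200. The function names are hypothetical; encoding non-standard residues as all zeros and rejecting peptides shorter than the requested segment are assumptions of this sketch.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_profile(segment: str) -> list[int]:
    """One-hot encode a segment: 20 values per residue, position by position."""
    vector: list[int] = []
    for residue in segment.upper():
        one_hot = [0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:  # assumption: non-standard residues stay all zeros
            one_hot[index] = 1
        vector.extend(one_hot)
    return vector


def terminal_binary_profile(peptide: str, n: int = 0, c: int = 0) -> list[int]:
    """Binary profile of the first n residues followed by the last c residues."""
    if max(n, c) > len(peptide):
        raise ValueError("peptide is shorter than the requested segment")
    n_part = peptide[:n]
    c_part = peptide[len(peptide) - c:] if c else ""
    return binary_profile(n_part + c_part)


if __name__ == "__main__":
    # Hypothetical peptide; an N5C5-style profile has 10 * 20 = 200 values.
    profile = terminal_binary_profile("ACDEFGHIKLMNPQ", n=5, c=5)
    print(len(profile), sum(profile))
```

Unlike composition features, this encoding preserves the position of each residue, which is consistent with the positional preferences at the termini described in the abstract.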
Performance of the models developed in this study and of existing methods on the Antifp_hard benchmarking dataset, which contains compositionally similar peptides.
| Method | Algorithm | TP | TN | FP | FN | Sen | Spc | Acc |
|---|---|---|---|---|---|---|---|---|
| Composition-based model | SVM | 179 | 183 | 108 | 112 | 61.51 | 62.89 | 62.20 |
| Binary profile based model | SVM | 218 | 92 | 53 | 48 | 81.95 | 63.45 | 75.43 |
| ClassAMP | SVM | 108 | 174 | 117 | 183 | 37.11 | 59.79 | 48.45 |
| ClassAMP | Random Forest | 42 | 221 | 70 | 249 | 14.43 | 75.94 | 45.18 |
| iAMP-2L | FKNN | 61 | 65 | 226 | 230 | 20.96 | 22.34 | 21.56 |
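The Sen, Spc, and Acc columns follow from the confusion-matrix counts in the same rows. A minimal sketch using the standard definitions, with the Matthews correlation coefficient included because it appears in the earlier tables, is given here; the function name `confusion_metrics` and the choice to return 0.0 for MCC when its denominator is zero are assumptions of this sketch.

```python
import math


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compute sensitivity, specificity, accuracy (in percent) and MCC."""
    sensitivity = 100.0 * tp / (tp + fn)   # true positive rate
    specificity = 100.0 * tn / (tn + fp)   # true negative rate
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: report 0.0 when MCC is undefined (zero denominator).
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"Sen": sensitivity, "Spc": specificity, "Acc": accuracy, "MCC": mcc}


if __name__ == "__main__":
    # Counts from the binary profile based model row of the table.
    for name, value in confusion_metrics(tp=218, tn=92, fp=53, fn=48).items():
        print(f"{name}: {value:.2f}")
```

With the counts from the binary profile based model row (TP = 218, TN = 92, FP = 53, FN = 48), this gives a sensitivity of 81.95, a specificity of 63.45, and an accuracy of 75.43 after rounding to two decimals, matching the tabulated values.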