| Literature DB >> 29706944 |
Vinod Kumar1,2, Piyush Agrawal1,2, Rajesh Kumar1,2, Sherry Bhalla1, Salman Sadullah Usmani1,2, Grish C Varshney3, Gajendra P S Raghava1,2.
Abstract
Designing drug delivery vehicles using cell-penetrating peptides (CPPs) is an active area of research in medicine. In the past, a number of in silico methods have been developed for predicting the cell-penetrating property of peptides containing natural residues. In this study, an attempt has been made for the first time to predict the cell-penetrating property of peptides containing both natural and modified residues. The dataset used to develop the prediction models includes the structures and sequences of 732 chemically modified cell-penetrating peptides and an equal number of non-cell-penetrating peptides. We analyzed the structures of both classes of peptides and observed that positively charged groups, atoms, and residues are preferred in cell-penetrating peptides. Models were developed to predict cell-penetrating peptides from their tertiary structure using a wide range of descriptors (2D descriptors, 3D descriptors, and fingerprints). The Random Forest model developed using PaDEL descriptors (a combination of 2D, 3D, and fingerprint descriptors) achieved a maximum accuracy of 95.10%, MCC of 0.90, and AUROC of 0.99 on the main dataset. The performance of the model was also evaluated on a validation/independent dataset, on which it achieved an AUROC of 0.98. To assist the scientific community, we have developed a web server, "CellPPDMod", for predicting the cell-penetrating property of modified peptides (http://webs.iiitd.edu.in/raghava/cellppdmod/).
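The workflow described above (molecular descriptors fed to a Random Forest classifier, evaluated by cross-validation) can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic descriptor vectors, not the authors' actual code; the feature dimensionality and the class-separating signal are assumptions made purely for demonstration, while the 732-per-class dataset size and the Ntree = 60 setting come from the record itself.

```python
# Minimal sketch of the descriptor -> Random Forest pipeline from the abstract.
# Synthetic vectors stand in for PaDEL descriptors (2D, 3D, fingerprints);
# n_estimators mirrors the Ntree parameter tuned in the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_per_class, n_features = 732, 50  # 732 CPPs and 732 non-CPPs in the dataset

# Hypothetical descriptor matrices: a mean shift creates a learnable signal.
X_cpp = rng.normal(loc=0.5, size=(n_per_class, n_features))    # "CPP" class
X_non = rng.normal(loc=-0.5, size=(n_per_class, n_features))   # "non-CPP" class
X = np.vstack([X_cpp, X_non])
y = np.array([1] * n_per_class + [0] * n_per_class)

clf = RandomForestClassifier(n_estimators=60, random_state=0)  # Ntree = 60
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")   # five-fold CV
print(round(scores.mean(), 2))
```

On real peptide data the descriptors would come from a tool such as PaDEL-Descriptor rather than a random generator, and Ntree would be tuned per descriptor set (the tables below report Ntree values from 30 to 700).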
Keywords: Random Forest; SVM; antimicrobial peptide; chemical descriptors; in silico method; machine learning; modified cell-penetrating peptides
Year: 2018 PMID: 29706944 PMCID: PMC5906597 DOI: 10.3389/fmicb.2018.00725
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Figure 1. Percentage amino acid composition of CPPs and non-CPPs.
Figure 2. Weblogo illustrating residue preference of the first 15 N-terminal residues of modified (A) CPPs and (B) non-CPPs.
Figure 3. Weblogo illustrating residue preference of the first 15 C-terminal residues of modified (A) CPPs and (B) non-CPPs.
Performance of different machine learning methods on atom composition (left metric block: main dataset; right metric block: validation dataset).

| Method | Parameter | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | – | 81.10 | 80.58 | 80.84 | 0.62 | 0.84 | 79.33 | 75.33 | 77.33 | 0.55 | 0.81 |
| Random Forest | Ntree = 30 | 83.33 | 84.71 | 84.02 | 0.68 | 0.91 | 79.33 | 77.33 | 78.33 | 0.57 | 0.88 |
| SMO | – | 77.66 | 83.51 | 80.58 | 0.61 | 0.80 | 75.33 | 82.67 | 79.00 | 0.58 | 0.79 |
| J48 | – | 75.43 | 80.58 | 78.01 | 0.56 | 0.82 | 80.00 | 76.00 | 78.00 | 0.56 | 0.79 |
| Naive Bayes | Default | 74.57 | 65.46 | 70.02 | 0.40 | 0.80 | 80.00 | 69.33 | 74.67 | 0.50 | 0.82 |
Performance of different machine learning methods on diatom composition (left metric block: main dataset; right metric block: validation dataset).

| Method | Parameter | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | – | 90.38 | 86.43 | 88.40 | 0.77 | 0.93 | 85.33 | 96.67 | 91.00 | 0.83 | 0.97 |
| Random Forest | Ntree = 30 | 88.49 | 88.49 | 88.49 | 0.77 | 0.94 | 85.33 | 82.00 | 83.67 | 0.67 | 0.93 |
| SMO | – | 86.25 | 89.00 | 87.63 | 0.75 | 0.87 | 86.67 | 84.00 | 85.33 | 0.71 | 0.85 |
| J48 | – | 82.47 | 81.10 | 81.79 | 0.64 | 0.81 | 85.33 | 81.33 | 83.33 | 0.67 | 0.82 |
| Naive Bayes | Default | 71.65 | 70.45 | 71.05 | 0.42 | 0.78 | 72.67 | 66.67 | 69.67 | 0.39 | 0.77 |
Performance of different machine learning methods on 2D descriptors (left metric block: main dataset; right metric block: validation dataset).

| Method | Parameter | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | – | 89.00 | 84.48 | 86.75 | 0.74 | 0.92 | 86.00 | 82.67 | 84.33 | 0.69 | 0.92 |
| Random Forest | Ntree = 60 | 92.78 | 91.90 | 92.34 | 0.85 | 0.97 | 94.67 | 88.67 | 91.67 | 0.83 | 0.97 |
| SMO | – | 83.16 | 86.38 | 84.77 | 0.70 | 0.84 | 81.33 | 87.33 | 84.33 | 0.69 | 0.84 |
| J48 | – | 89.52 | 88.79 | 89.16 | 0.78 | 0.89 | 90.00 | 87.33 | 88.67 | 0.77 | 0.89 |
| Naive Bayes | Default | 75.09 | 78.79 | 76.94 | 0.54 | 0.85 | 74.67 | 77.33 | 76.00 | 0.52 | 0.84 |
Performance of different machine learning methods on 3D descriptors (left metric block: main dataset; right metric block: validation dataset).

| Method | Parameter | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | – | 76.29 | 74.40 | 75.34 | 0.51 | 0.80 | 71.14 | 73.15 | 72.15 | 0.44 | 0.80 |
| Random Forest | Ntree = 700 | 80.93 | 72.16 | 76.55 | 0.53 | 0.85 | 79.87 | 67.11 | 73.49 | 0.47 | 0.83 |
| SMO | – | 69.42 | 72.85 | 71.13 | 0.42 | 0.71 | 63.09 | 76.51 | 69.80 | 0.40 | 0.69 |
| J48 | – | 74.74 | 76.12 | 75.43 | 0.51 | 0.78 | 72.48 | 74.50 | 73.49 | 0.47 | 0.78 |
| Naive Bayes | Default | 69.24 | 74.40 | 71.82 | 0.44 | 0.78 | 69.80 | 75.84 | 72.82 | 0.46 | 0.79 |
Performance of different machine learning methods on fingerprints (left metric block: main dataset; right metric block: validation dataset).

| Method | Parameter | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | – | 90.19 | 88.12 | 89.16 | 0.78 | 0.95 | 93.33 | 89.33 | 91.33 | 0.83 | 0.96 |
| Random Forest | Ntree = 600 | 94.32 | 90.19 | 92.25 | 0.85 | 0.98 | 96.67 | 88.00 | 92.33 | 0.85 | 0.98 |
| SMO | – | 85.54 | 85.03 | 85.28 | 0.71 | 0.85 | 88.67 | 85.33 | 87.00 | 0.74 | 0.87 |
| J48 | – | 90.02 | 89.33 | 89.67 | 0.79 | 0.89 | 88.67 | 88.67 | 88.67 | 0.77 | 0.90 |
| Naive Bayes | Default | 86.40 | 84.34 | 85.37 | 0.71 | 0.90 | 82.67 | 85.33 | 84.00 | 0.68 | 0.90 |
Performance of different machine learning methods on 2D descriptors, 3D descriptors, and fingerprints collectively (left metric block: main dataset; right metric block: validation dataset).

| Method | Parameter | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | – | 83.33 | 79.21 | 81.27 | 0.63 | 0.89 | 78.67 | 82.67 | 80.67 | 0.61 | 0.87 |
| Random Forest | Ntree = 60 | 95.19 | 95.02 | 95.10 | 0.90 | 0.99 | 91.33 | 93.33 | 92.33 | 0.85 | 0.98 |
| SMO | – | 76.80 | 76.98 | 76.89 | 0.54 | 0.76 | 75.33 | 83.33 | 79.33 | 0.59 | 0.79 |
| J48 | – | 89.69 | 87.63 | 88.66 | 0.77 | 0.90 | 84.67 | 92.00 | 88.33 | 0.77 | 0.92 |
| Naive Bayes | Default | 95.19 | 88.14 | 91.67 | 0.84 | 0.95 | 92.00 | 89.33 | 90.67 | 0.81 | 0.96 |
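Because the dataset is balanced (732 CPPs vs. 732 non-CPPs), the confusion matrix, and therefore the MCC, can be reconstructed from sensitivity and specificity alone. As a sanity check on the reported figures, the sketch below recovers the MCC of 0.90 for the best Random Forest model (sensitivity 95.19%, specificity 95.02% on the main dataset); the helper function is illustrative, not from the paper.

```python
import math

def mcc_from_sens_spec(sens, spec, n=732):
    """Matthews correlation coefficient for a balanced two-class dataset,
    reconstructed from sensitivity and specificity (given as fractions).
    n is the number of examples per class (732 here); it cancels out of
    the MCC, so any positive value gives the same result."""
    tp, fn = sens * n, (1 - sens) * n  # positives: true and missed CPPs
    tn, fp = spec * n, (1 - spec) * n  # negatives: true and false non-CPPs
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den

# Best Random Forest model on the combined descriptors (main dataset)
print(round(mcc_from_sens_spec(0.9519, 0.9502), 2))  # prints 0.9
```

The same check applied to any other row (e.g. SVM with sensitivity 83.33% and specificity 79.21% gives MCC ≈ 0.63) confirms the tables' internal consistency.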
Figure 4. ROC curve showing performance of models on various structural features.