Vinod Kumar, Rajesh Kumar, Piyush Agrawal, Sumeet Patiyal, Gajendra P. S. Raghava.
Abstract
In the present study, a systematic effort has been made to predict the hemolytic potency of chemically modified peptides. All models were trained, tested, and evaluated on a dataset of 583 modified hemolytic peptides and an equal number of non-hemolytic peptides. Machine learning techniques were used to build classification models from a wide range of peptide features, including 2D and 3D descriptors, fingerprints, and atom and diatom compositions. A Random Forest model developed using fingerprints as input features achieved a maximum accuracy of 78.33% with an AUC of 0.86 on the main dataset and an accuracy of 78.29% with an AUC of 0.85 on the validation dataset. The models developed in this study have been incorporated into a web server, "HemoPImod" (http://webs.iiitd.edu.in/raghava/hemopimod/), to facilitate the scientific community.
Keywords: HemoPImod; chemical descriptors; fingerprints; machine learning; modified hemolytic peptides; random forest
Year: 2020 PMID: 32153395 PMCID: PMC7045810 DOI: 10.3389/fphar.2020.00054
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
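The classification setup described in the abstract (balanced binary data, Random Forest, accuracy and AUC as headline metrics) can be sketched with scikit-learn, which the tables below reference. The synthetic feature matrix here is a stand-in for the paper's peptide descriptors, and the train/test split is an assumption for illustration; only the `n_estimators` value is taken from the tables.

```python
# Minimal sketch of the paper's setup: a Random Forest on balanced binary
# data, scored by accuracy and AUC. Features are synthetic placeholders for
# the peptide descriptors; n_estimators = 800 follows the fingerprint table.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1166, n_features=100, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=800, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"Acc = {acc:.2%}, AUC = {auc:.2f}")
```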
Figure 1: Mechanism of hemolysis by peptides.
Performance achieved by scikit-learn models on atom composition features (Sen, Spc, and Acc in %).

| Methods (Parameters) | Main: Sen | Main: Spc | Main: Acc | Main: MCC | Main: AUC | Val: Sen | Val: Spc | Val: Acc | Val: MCC | Val: AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| RF (n_estimators = 100) | 72.75 | 68.24 | 70.49 | 0.41 | 0.81 | 68.89 | 70.43 | 69.66 | 0.39 | 0.78 |
| KNN (n_neighbors = 5, algorithm = 'brute', weights = 'distance') | 70.39 | 68.24 | 69.31 | 0.39 | 0.79 | 68.38 | 71.79 | 70.09 | 0.4 | 0.80 |
| Ridge (alpha = 0.01) | 54.51 | 59.23 | 56.87 | 0.14 | 0.72 | 52.99 | 58.12 | 55.56 | 0.11 | 0.71 |
| Extratree (n_estimators = 60) | 74.03 | 66.74 | 70.39 | 0.41 | 0.81 | 74.87 | 68.03 | 71.45 | 0.43 | 0.82 |
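The Sen/Spc/Acc/MCC/AUC columns used throughout these tables follow the standard definitions from the binary confusion matrix; a minimal sketch of how they can be computed (the toy labels and scores here are illustrative, not from the paper):

```python
# Sensitivity, specificity, accuracy, MCC, and AUC from toy predictions.
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef, roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1])
y_pred = (y_score >= 0.5).astype(int)  # threshold-dependent metrics use 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sen = tp / (tp + fn) * 100            # sensitivity: recall on positives
spc = tn / (tn + fp) * 100            # specificity: recall on negatives
acc = (tp + tn) / len(y_true) * 100   # overall accuracy
mcc = matthews_corrcoef(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)  # threshold-independent
print(sen, spc, acc, round(mcc, 2), auc)
```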
Performance achieved by scikit-learn models on diatom composition features.

| Methods (Parameters) | Main: Sen | Main: Spc | Main: Acc | Main: MCC | Main: AUC | Val: Sen | Val: Spc | Val: Acc | Val: MCC | Val: AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| RF (n_estimators = 100) | 73.61 | 74.03 | 73.82 | 0.48 | 0.83 | 78.46 | 74.36 | 76.41 | 0.53 | 0.86 |
| KNN | 72.32 | 61.59 | 66.95 | 0.34 | 0.81 | 73.5 | 72.65 | 73.08 | 0.46 | 0.84 |
| Ridge (alpha = 1) | 57.51 | 57.51 | 57.51 | 0.15 | 0.72 | 55.56 | 63.25 | 59.4 | 0.19 | 0.75 |
| Extratree (n_estimators = 200) | 75.54 | 73.18 | 74.36 | 0.49 | 0.87 | 77.78 | 74.19 | 75.98 | 0.52 | 0.88 |
Performance achieved by scikit-learn models on 2D descriptors.

| Methods (Parameters) | Main: Sen | Main: Spc | Main: Acc | Main: MCC | Main: AUC | Val: Sen | Val: Spc | Val: Acc | Val: MCC | Val: AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| RF (n_estimators = 1000) | 79.37 | 72.49 | 75.88 | 0.52 | 0.83 | 76.76 | 75.69 | 76.21 | 0.52 | 0.81 |
| KNN (n_neighbors = 8, algorithm = 'kd_tree', weights = 'distance') | 67.94 | 61.35 | 64.6 | 0.29 | 0.72 | 61.26 | 63.79 | 62.56 | 0.25 | 0.67 |
| Ridge (alpha = 1) | 71.08 | 78.38 | 74.78 | 0.5 | 0.81 | 58.56 | 82.76 | 70.93 | 0.43 | 0.74 |
| Extratree (n_estimators = 40) | 76.01 | 73.8 | 74.89 | 0.5 | 0.82 | 73.87 | 77.07 | 75.51 | 0.51 | 0.80 |
Performance achieved by scikit-learn models on 3D descriptors.

| Methods (Parameters) | Main: Sen | Main: Spc | Main: Acc | Main: MCC | Main: AUC | Val: Sen | Val: Spc | Val: Acc | Val: MCC | Val: AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| RF (n_estimators = 800) | 60.22 | 66.09 | 63.16 | 0.26 | 0.69 | 58.29 | 65.64 | 61.97 | 0.24 | 0.67 |
| KNN (n_neighbors = 10, algorithm = 'ball_tree', weights = 'distance') | 60.43 | 54.94 | 57.68 | 0.15 | 0.61 | 51.28 | 58.97 | 55.13 | 0.1 | 0.59 |
| Ridge (alpha = 0.01) | 58.71 | 60.73 | 59.72 | 0.19 | 0.65 | 49.57 | 63.25 | 56.41 | 0.13 | 0.59 |
| Extratree (n_estimators = 70) | 65.59 | 65.88 | 65.74 | 0.31 | 0.70 | 61.71 | 64.79 | 63.25 | 0.27 | 0.68 |
Performance achieved by scikit-learn models on fingerprint descriptors.

| Methods (Parameters) | Main: Sen | Main: Spc | Main: Acc | Main: MCC | Main: AUC | Val: Sen | Val: Spc | Val: Acc | Val: MCC | Val: AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| RF (n_estimators = 800) | 77.51 | 77.16 | 78.33 | 0.56 | 0.86 | 80.85 | 75.73 | 78.29 | 0.57 | 0.85 |
| KNN (n_neighbors = 8, algorithm = 'ball_tree', weights = 'distance') | 75.98 | 70.69 | 73.32 | 0.47 | 0.81 | 77.78 | 70.94 | 74.36 | 0.49 | 0.79 |
| Ridge (alpha = 1) | 75.76 | 71.55 | 73.64 | 0.47 | 0.81 | 76.92 | 70.09 | 73.5 | 0.47 | 0.80 |
| Extratree (n_estimators = 300) | 78.6 | 73.71 | 76.14 | 0.52 | 0.84 | 78.46 | 75.56 | 77.01 | 0.54 | 0.82 |
Performance achieved by scikit-learn models on the combined 2D, 3D, and fingerprint descriptors.

| Methods (Parameters) | Main: Sen | Main: Spc | Main: Acc | Main: MCC | Main: AUC | Val: Sen | Val: Spc | Val: Acc | Val: MCC | Val: AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| RF (n_estimators = 200) | 77.73 | 79.09 | 78.42 | 0.57 | 0.86 | 79.83 | 77.09 | 78.46 | 0.57 | 0.84 |
| KNN (n_neighbors = 10, algorithm = 'kd_tree', weights = 'distance') | 62.88 | 62.5 | 62.69 | 0.25 | 0.67 | 49.57 | 58.97 | 54.27 | 0.09 | 0.60 |
| Ridge (alpha = 1) | 62.45 | 53.02 | 57.7 | 0.16 | 0.61 | 63.25 | 48.72 | 55.98 | 0.12 | 0.58 |
| Extratree (n_estimators = 1000) | 80.35 | 74.35 | 77.33 | 0.55 | 0.85 | 82.05 | 72.31 | 77.18 | 0.55 | 0.83 |
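The four learners compared throughout the tables (RF, KNN, Ridge, Extratree) map directly onto scikit-learn estimators; a sketch of the comparison loop on placeholder data, with hyperparameters taken from the combined-descriptor table above (the real descriptor matrices and evaluation protocol are not reproduced here):

```python
# Compare the four classifier families from the tables via 5-fold CV on
# synthetic placeholder features (assumption: the paper's exact CV protocol
# and descriptor matrices are not shown in this record).
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=50, random_state=1)

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=1),
    "KNN": KNeighborsClassifier(n_neighbors=10, algorithm="kd_tree",
                                weights="distance"),
    "Ridge": RidgeClassifier(alpha=1),
    "Extratree": ExtraTreesClassifier(n_estimators=1000, random_state=1),
}
scores = {name: cross_val_score(m, X, y, cv=5, scoring="accuracy").mean()
          for name, m in models.items()}
for name, s in scores.items():
    print(f"{name}: {s:.2%}")
```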
Figure 2: ROC curves of the models on the various structural feature sets.