Rahu Sikander, Ali Ghulam, Farman Ali.
Abstract
Accurate identification of drug targets in the human body is of great significance for designing novel drugs. Compared with traditional experimental methods, prediction of drug targets via machine learning algorithms has attracted the attention of many researchers because it is fast and accurate. In this study, we propose a machine learning-based method, namely XGB-DrugPred, for accurate prediction of druggable proteins. Features are extracted from primary protein sequences by group dipeptide composition (GDPC), reduced amino acid alphabet (RAAA), and a novel encoder, pseudo amino acid composition segmentation (S-PseAAC). To select the best feature set, eXtreme Gradient Boosting-recursive feature elimination is implemented. The best feature set is provided to eXtreme Gradient Boosting (XGB), Random Forest (RF), and Extremely Randomized Tree (ERT) classifiers for model training and prediction. The performance of these classifiers is evaluated by tenfold cross-validation. The empirical results show that the XGB-based predictor achieves the best results compared with the other classifiers and with existing methods in the literature.
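To make the group dipeptide composition descriptor concrete, the following is a minimal sketch of how such a 25-dimensional feature vector could be computed from a primary sequence. The five-class amino acid grouping (aliphatic, aromatic, positive, negative, uncharged) is an assumption based on a commonly used physicochemical partition; this record does not state which grouping the authors used, and the function name `gdpc` is hypothetical.

```python
from itertools import product

# Assumed five-class grouping of the 20 standard amino acids.
# This partition is NOT taken from the record; it is a common convention.
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}
GROUP_NAMES = list(GROUPS)
RESIDUE_TO_GROUP = {aa: name for name, residues in GROUPS.items() for aa in residues}


def gdpc(sequence):
    """Return the 25 grouped-dipeptide frequencies, keyed by 'group1.group2'."""
    counts = {f"{a}.{b}": 0 for a, b in product(GROUP_NAMES, repeat=2)}
    total = 0
    # Slide over adjacent residue pairs and count them by group membership.
    for first, second in zip(sequence, sequence[1:]):
        g1 = RESIDUE_TO_GROUP.get(first)
        g2 = RESIDUE_TO_GROUP.get(second)
        if g1 is None or g2 is None:
            continue  # skip pairs containing a non-standard residue
        counts[f"{g1}.{g2}"] += 1
        total += 1
    if total == 0:
        return {key: 0.0 for key in counts}
    # Normalize counts so the 25 features sum to 1.
    return {key: value / total for key, value in counts.items()}


features = gdpc("GAKD")  # pairs: GA, AK, KD
```

For the toy sequence `GAKD`, the three adjacent pairs fall into three different group pairs, so each of those features receives an equal share and the remaining 22 stay at zero.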
Year: 2022 PMID: 35365726 PMCID: PMC8976041 DOI: 10.1038/s41598-022-09484-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Schematic view of the proposed model.
Figure 2. Simple architecture of XGB.
Hyperparameters of the proposed model.
| Hyperparameter | Value |
|---|---|
| No. of estimators | 500 |
| Eta | 0.1 |
| Max depth | 8 |
| lambda | 1 |
| alpha | 1 |
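As a rough illustration, the table values could be written as keyword arguments for the scikit-learn style `XGBClassifier` wrapper in the `xgboost` package. The mapping is an assumption: the 0.1 entry is taken to be the learning rate (XGBoost's `eta`), and `lambda` and `alpha` are taken to be the L2 and L1 regularization terms (`reg_lambda`, `reg_alpha`). The helper name `build_classifier` is hypothetical, and the record does not say which interface the authors used.

```python
# Table values keyed by the XGBClassifier argument names (assumed mapping).
XGB_PARAMS = {
    "n_estimators": 500,    # "No. of estimators"
    "learning_rate": 0.1,   # assumed to be XGBoost's eta
    "max_depth": 8,         # "Max depth"
    "reg_lambda": 1,        # "lambda", assumed L2 regularization
    "reg_alpha": 1,         # "alpha", assumed L1 regularization
}


def build_classifier(params=None):
    """Return an XGBClassifier configured from the table (requires xgboost)."""
    # Imported lazily so the parameter dict can be inspected without xgboost.
    from xgboost import XGBClassifier

    return XGBClassifier(**(params or XGB_PARAMS))
```

Keeping the values in a plain dictionary makes it easy to log them alongside results or to pass the same settings to a feature-selection step.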
Performance of classifiers before feature selection.
| Classifier | Feature descriptor | Acc (%) | Sn (%) | Sp (%) | F-measure (%) | MCC |
|---|---|---|---|---|---|---|
| ERT | RAAA | 81.82 | 88.10 | 75.59 | 82.84 | 0.64 |
| ERT | GDPC | 84.65 | 83.04 | 85.92 | 82.67 | 0.68 |
| ERT | S-PseAAC | 89.33 | 88.89 | 89.76 | 89.24 | 0.78 |
| ERT | All features | 88.14 | 87.83 | 88.41 | 88.69 | 0.80 |
| RF | RAAA | 82.61 | 86.51 | 78.74 | 83.21 | 0.65 |
| RF | GDPC | 83.86 | 83.93 | 83.80 | 82.10 | 0.67 |
| RF | S-PseAAC | 89.72 | 87.30 | 92.13 | 89.43 | 0.79 |
| RF | All features | 90.12 | 85.22 | 94.20 | 88.69 | 0.80 |
| XGB | RAAA | 83.79 | 84.92 | 82.95 | 83.92 | 0.67 |
| XGB | GDPC | 86.22 | 80.36 | 90.85 | 83.72 | 0.72 |
| XGB | S-PseAAC | 90.51 | 91.27 | 89.76 | 90.55 | 0.81 |
| XGB | All features | 92.09 | 91.30 | 92.75 | 91.30 | 0.84 |
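The column headings in the table above (Acc, Sn, Sp, F-measure, MCC) are not defined in this record. The sketch below assumes the standard confusion-matrix definitions for binary classification, with druggable proteins as the positive class. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute Acc, Sn, Sp, F-measure, and MCC from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    acc = ratio(tp + tn, tp + fp + tn + fn)   # accuracy
    sn = ratio(tp, tp + fn)                   # sensitivity (recall)
    sp = ratio(tn, tn + fp)                   # specificity
    precision = ratio(tp, tp + fp)
    f_measure = ratio(2 * precision * sn, precision + sn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_denominator)  # Matthews correlation
    return {"Acc": acc, "Sn": sn, "Sp": sp, "F": f_measure, "MCC": mcc}


# Hypothetical counts for illustration only.
scores = binary_metrics(tp=90, fp=20, tn=80, fn=10)
```

Under these definitions, Acc, Sn, Sp, and F-measure would be multiplied by 100 to match the percentage columns, while MCC stays on its native scale from -1 to 1.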
Performance of classifiers after feature selection.
| Classifier | Feature descriptor | Acc (%) | Sn (%) | Sp (%) | F-measure (%) | MCC |
|---|---|---|---|---|---|---|
| ERT | RAAA | 82.21 | 84.91 | 79.53 | 82.63 | 0.64 |
| ERT | GDPC | 81.10 | 77.44 | 85.12 | 81.10 | 0.62 |
| ERT | S-PseAAC | 90.12 | 84.82 | 94.33 | 88.37 | 0.80 |
| ERT | All features | 92.09 | 91.96 | 92.20 | 91.15 | 0.84 |
| RF | RAAA | 83.40 | 83.33 | 83.46 | 83.33 | 0.66 |
| RF | GDPC | 82.28 | 77.45 | 87.60 | 82.07 | 0.65 |
| RF | S-PseAAC | 90.91 | 84.85 | 85.73 | 89.20 | 0.81 |
| RF | All features | 93.28 | 92.86 | 93.62 | 92.44 | 0.86 |
| XGB | RAAA | 84.82 | 84.92 | 82.68 | 83.92 | 0.67 |
| XGB | GDPC | 83.07 | 81.95 | 84.30 | 83.52 | 0.66 |
| XGB | S-PseAAC | 91.70 | 88.39 | 94.33 | 90.41 | 0.83 |
| XGB | All features | 94.86 | 93.75 | 95.74 | 94.17 | 0.89 |
Comparison with existing predictors.
| Predictor | Acc (%) | Sn (%) | Sp (%) | MCC |
|---|---|---|---|---|
| PseAAC-DPC-RS | 90.98 | 87.88 | 94.11 | 0.82 |
| Jamali et al | 92.10 | 92.80 | 91.34 | 0.84 |
| GA-Bagging-SVM | 93.78 | 92.86 | 94.45 | 0.87 |
| XGB-DrugPred | 94.86 | 93.75 | 95.74 | 0.89 |
Figure 3. ROC curves of the proposed and existing methods.