Yuxin Gong1,2,3, Bo Liao1,2,3, Peng Wang1,2,3, Quan Zou4.
Abstract
Drug targets are biological macromolecules or biomolecular structures that bind specifically to a particular drug to produce a therapeutic effect or regulate physiological functions. Because of the value of drug targets, the prediction of potential drug targets has become a research hotspot in recent years. Identifying potential drug targets is the first key step in the research and development of modern drugs. In this paper, a new predictor, DrugHybrid_BS, is developed based on hybrid features and Bagging-SVM to identify potentially druggable proteins. The method combines three feature types: monoDiKGap (k = 2), cross-covariance (CC), and grouped amino acid composition (GAAC). Redundant features are removed with MRMD, and key features are analysed with MRMD2.0. The cross-validation results show that 96.9944% of the potentially druggable proteins can be accurately identified, and the accuracy on the independent test set reaches 96.5665%. These results suggest that DrugHybrid_BS has the potential to become a useful predictive tool for druggable proteins. In addition, the hybrid key features combined with Bagging-SVM identify 80.0343% of the potentially druggable proteins, which indicates the significance of these features for research.
Keywords: CC; GAAC; bagging; monoDiKGap; support vector machine
Year: 2021 PMID: 34916947 PMCID: PMC8669608 DOI: 10.3389/fphar.2021.771808
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
FIGURE 1 Flow chart of the DrugHybrid_BS model. (A) Process the referenced dataset. (B) Extract features with three single feature representation methods. (C) Combine the three single feature representation methods and select the best hybrid feature. (D) Use MRMD to remove redundant features and MRMD2.0 to obtain key features. (E) Use feature subsets to predict potentially druggable proteins through the optimized Bagging-SVM model. (F) Evaluate the model's prediction performance based on performance indicators.
FIGURE 2 Sample distribution of the dataset.
FIGURE 3 Comparison of the ACC values of the full feature set and the 466-dimensional feature subset extracted by monoDiKGap (k = 2) under different classifiers.
Comparison of the results of different feature methods under different classifiers.
| Method | Classifier | ACC(%) | TPR | FPR | Precision | F-score | auROC |
|---|---|---|---|---|---|---|---|
| monoDiKGap (k = 2) | SVM | 96.608 | 0.965 | 0.033 | 0.960 | 0.962 | 0.966 |
| monoDiKGap (k = 2) | KNN | 58.437 | 0.083 | 0.004 | 0.946 | 0.152 | 0.628 |
| monoDiKGap (k = 2) | RF | 85.272 | 0.788 | 0.094 | 0.873 | 0.828 | 0.928 |
| CC | SVM | 57.364 | 0.243 | 0.155 | 0.563 | 0.339 | 0.544 |
| CC | KNN | 58.437 | 0.625 | 0.449 | 0.533 | 0.575 | 0.599 |
| CC | RF | 63.718 | 0.569 | 0.306 | 0.604 | 0.586 | 0.679 |
| GAAC | SVM | 77.286 | 0.768 | 0.223 | 0.739 | 0.753 | 0.772 |
| GAAC | KNN | 74.882 | 0.745 | 0.248 | 0.712 | 0.728 | 0.807 |
| GAAC | RF | 76.084 | 0.729 | 0.213 | 0.738 | 0.733 | 0.850 |
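The column headings in these tables are not defined in this record. As a minimal sketch, assuming the standard confusion-matrix definitions (the function names and the example counts are hypothetical, not taken from the paper), the metrics can be computed and a reported F-score cross-checked against its Precision and TPR like this:

```python
def classification_metrics(tp, fp, tn, fn):
    """Return ACC (%), TPR, FPR, Precision and F-score from confusion counts.

    Assumes the usual definitions; the record itself does not spell them out.
    """
    tpr = tp / (tp + fn)                    # true positive rate (recall)
    fpr = fp / (fp + tn)                    # false positive rate
    precision = tp / (tp + fp)
    acc = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    return {
        "ACC(%)": acc,
        "TPR": tpr,
        "FPR": fpr,
        "Precision": precision,
        "F-score": f_score(precision, tpr),
    }


def f_score(precision, tpr):
    """Harmonic mean of precision and TPR."""
    return 2 * precision * tpr / (precision + tpr)


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=90, fp=5, tn=95, fn=10))

# Cross-check of the monoDiKGap (k = 2) + SVM row: Precision 0.960, TPR 0.965.
print(round(f_score(0.960, 0.965), 3))  # 0.962, matching the reported F-score
```

auROC is left out of the sketch because it depends on ranked prediction scores, not on a single set of confusion counts.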
Performance comparison of different feature combinations under SVM classifiers.
| Method | ACC(%) | TPR | FPR | Precision | F-score | auROC |
|---|---|---|---|---|---|---|
| monoDiKGap + CC | 96.651 | 0.967 | 0.034 | 0.959 | 0.963 | 0.967 |
| monoDiKGap + GAAC | 96.350 | 0.958 | 0.032 | 0.961 | 0.959 | 0.963 |
| CC + GAAC | 78.360 | 0.770 | 0.206 | 0.755 | 0.801 | 0.782 |
| monoDiKGap + CC + GAAC | 96.651 | 0.961 | 0.029 | 0.965 | 0.963 | 0.966 |
FIGURE 4 ROC curves of support vector machines with different kernel functions.
Performance comparison of hybrid features under different kernel functions.
| Kernel function | ACC(%) | TPR | FPR | Precision | F-score | auROC |
|---|---|---|---|---|---|---|
| linear kernel | 96.651 | 0.961 | 0.029 | 0.965 | 0.963 | 0.966 |
| polynomial kernel | 95.535 | 0.953 | 0.043 | 0.948 | 0.951 | 0.955 |
| RBF | 85.745 | 0.730 | 0.038 | 0.940 | 0.822 | 0.846 |
Performance comparison of hybrid features with different penalty parameter C values under linear kernel.
| C Values | ACC(%) | TPR | FPR | Precision | F-score | auROC |
|---|---|---|---|---|---|---|
| 1 | 96.651 | 0.961 | 0.029 | 0.965 | 0.963 | 0.966 |
| 10 | 96.651 | 0.960 | 0.030 | 0.965 | 0.963 | 0.966 |
| 100 | 96.608 | 0.960 | 0.029 | 0.965 | 0.962 | 0.966 |
| 1,000 | 96.608 | 0.960 | 0.029 | 0.965 | 0.962 | 0.966 |
Comparison of classification performance of hybrid features before and after using MRMD feature selection.
| Number of features | ACC(%) | TPR | FPR | Precision | F-score | auROC |
|---|---|---|---|---|---|---|
| 483 | 96.651 | 0.961 | 0.029 | 0.965 | 0.963 | 0.966 |
| 472 | 96.694 | 0.959 | 0.027 | 0.967 | 0.963 | 0.966 |
FIGURE 5 Accuracy of hybrid features in predicting potentially druggable proteins under the Bagging-SVM classification algorithm, with the number of base models ranging from 1 to 20.
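The bagging idea behind this figure can be sketched as follows: each base model is trained on a bootstrap replicate of the training set, and the ensemble predicts by majority vote. This is an illustrative sketch only, not the authors' implementation. To stay dependency-free it swaps the SVM base learner for a hypothetical one-dimensional threshold rule (`fit_threshold`), and all names and the toy data are assumptions.

```python
import random
from collections import Counter


def bootstrap_sample(samples, rng):
    """Draw len(samples) items with replacement (one bootstrap replicate)."""
    return [rng.choice(samples) for _ in samples]


def majority_vote(votes):
    """Return the most common label; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]


def bagging_predict(train, x, fit, n_models=10, seed=0):
    """Train n_models base learners on bootstrap replicates and vote on x."""
    rng = random.Random(seed)
    models = [fit(bootstrap_sample(train, rng)) for _ in range(n_models)]
    return majority_vote([model(x) for model in models])


def fit_threshold(sample):
    """Hypothetical stand-in base learner (NOT an SVM).

    Splits a single feature at the midpoint between the two class means.
    """
    pos = [v for v, y in sample if y == 1]
    neg = [v for v, y in sample if y == 0]
    if not pos or not neg:
        # Degenerate replicate containing one class only: always predict it.
        label = 1 if pos else 0
        return lambda v: label
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda v: 1 if v > cut else 0


# Toy (feature, label) pairs, for illustration only.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.8, 1), (0.9, 1), (1.0, 1)]
print(bagging_predict(train, 0.95, fit_threshold, n_models=11))
```

Varying `n_models` from 1 to 20 and recording the accuracy at each setting would correspond to the kind of sweep the caption describes.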
Comparison of prediction performance with other algorithms.
| Method | ACC(%) | TPR | FPR | Precision | F-score | auROC |
|---|---|---|---|---|---|---|
| DrugHybrid_BS (this paper) | 96.994 | 0.970 | 0.030 | 0.963 | 0.967 | 0.992 |
| DrugHybrid_KNN | 58.652 | 0.587 | 0.502 | 0.729 | 0.473 | 0.625 |
| DrugHybrid_SVM | 96.694 | 0.959 | 0.027 | 0.967 | 0.963 | 0.966 |
| DrugHybrid_RF | 87.763 | 0.834 | 0.087 | 0.888 | 0.860 | 0.949 |
| DrugHybrid_BS (original dataset) | 100 | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 |
| Jamali et al. ( | 89.78 | 0.901 | 0.106 | 0.901 | 0.901 | 0.959 |
| Lin et al. ( | 93.78 | 0.928 | 0.056 | 0.942 | 0.936 | 0.978 |
FIGURE 6 Details of the training set and independent test set.
FIGURE 7 Relationship between the number of features extracted by the three methods and the accuracy of predicting potentially druggable proteins: (A) CC, (B) GAAC, and (C) monoDiKGap.
Key feature details of each feature representation method.
| Method | Key features | | | | | | |
|---|---|---|---|---|---|---|---|
| GAAC | Aromatic group | Uncharged group | | | | | |
| CC | (mass, hydrophobicity, 1) | (mass, hydrophilicity, 1) | (hydrophilicity, mass, 1) | | | | |
| | (mass, hydrophobicity, 2) | (hydrophobicity, mass, 2) | (hydrophobicity, mass, 1) | | | | |
| | (hydrophilicity, mass, 2) | (hydrophilicity, hydrophobicity, 1) | | | | | |
| monoDiKGap | C_ _NQ | C_ _RT | E_ DT | W_ _PR | E_ _VW | T_ _IL | T_ _PN |
| | I_RH | Q_ _SA | K_ _IY | L_ _HY | N_TD | T_YK | E_ _DI |
| | Y_ _LI | R_ _MH | T_ _YY | N_DD | P_RQ | R_ _CT | S_ _GL |
| | E_VC | P_NY | D_KK | N_PK | F_ _LK | — | — |
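To make the key-feature notation concrete, here is a minimal sketch of how two of these feature types might be computed from a protein sequence. It rests on two assumptions that this record does not confirm: the GAAC grouping follows the common five-group convention, and a monoDiKGap label such as `C_ _NQ` means residue C, then two arbitrary residues, then the dipeptide NQ (so `I_RH` has a gap of one). The function names and the toy sequence are hypothetical.

```python
# Assumed five-group GAAC convention; not stated in this record.
GAAC_GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}


def gaac(sequence):
    """Fraction of residues in the sequence that fall into each group."""
    n = len(sequence)
    return {
        name: sum(sequence.count(aa) for aa in members) / n
        for name, members in GAAC_GROUPS.items()
    }


def mono_di_gap_count(sequence, mono, di, gap):
    """Count occurrences of `mono`, then `gap` arbitrary residues, then `di`.

    Returns a raw count; any normalisation used in the paper is not
    described in this record.
    """
    span = 1 + gap + 2  # one residue + gap + dipeptide
    return sum(
        1
        for i in range(len(sequence) - span + 1)
        if sequence[i] == mono and sequence[i + 1 + gap:i + span] == di
    )


seq = "CAANQWFY"  # toy sequence, for illustration only
print(gaac(seq)["aromatic"])                 # W, F, Y -> 3/8 = 0.375
print(mono_di_gap_count(seq, "C", "NQ", 2))  # pattern C_ _NQ -> 1
```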