Qi Wang1, YangHe Feng1, JinCai Huang1, TengJiao Wang2, GuangQuan Cheng1.
Abstract
The identification of drug target proteins (IDTP) plays a critical role in biometrics. The aim of this study was to retrieve potential drug target proteins (DTPs) from a collected protein dataset, a demanding task of great significance. Previously reported methodologies for this task generally employ protein-protein interaction networks but neglect informative biochemical attributes. We formulated a novel framework that uses biochemical attributes to address this problem. In the framework, a biased support vector machine (BSVM) is combined with a deep embedded representation extracted by a deep learning model, stacked auto-encoders (SAEs). When the non-drug target proteins (NDTPs) are contaminated by DTPs, the framework benefits from the efficient representation of the SAE and from the BSVM's mitigation of the class imbalance effect. The experimental results demonstrated the effectiveness of our framework, and its generalization capability was confirmed via comparisons with other models. This study is the first to exploit a deep learning model for IDTP. In summary, nearly 23% of the NDTPs were predicted as likely DTPs, which await further verification through biomedical experiments.
Year: 2017; PMID: 28453576; PMCID: PMC5409512; DOI: 10.1371/journal.pone.0176486
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Fig 1. Results of the K-S test.
The p-values of the different properties of DTPs and contaminated NDTPs according to the Kolmogorov-Smirnov test.
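To make the caption concrete, the following is a minimal sketch of a two-sample Kolmogorov-Smirnov test in plain Python. It is an illustration, not the authors' code: the function name `ks_two_sample` and the sample values are made up, and the p-value uses the standard asymptotic approximation to the Kolmogorov distribution. In practice a library routine such as `scipy.stats.ks_2samp` would normally be used instead.

```python
import math

def ks_two_sample(sample_a, sample_b):
    """Return (D, p) for the two-sample Kolmogorov-Smirnov test."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the two ECDFs.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic p-value with the usual small-sample correction of the statistic.
    n_eff = n * m / (n + m)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.3:
        return d, 1.0  # the alternating series is unreliable here; the tail is ~1
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Made-up values of one property for the two groups (hypothetical data).
dtp_values = [0.1 * k for k in range(50)]
ndtp_values = [10.0 + 0.1 * k for k in range(50)]
d_stat, p_value = ks_two_sample(dtp_values, ndtp_values)
print(d_stat, p_value)
```

A small p-value for a property indicates that its distribution differs between the two groups. Running the test once per property yields one p-value per property, which is the kind of summary the figure caption describes.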
Parameter settings in the SAE.
| Parameters used for SAE training | |
|---|---|
| nb_epoch | 100 |
| batch_size | 100 |
| optimizer | adadelta |
| loss | mean_square |
| training_ratio | 70% |
| validation_ratio | 30% |
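As a rough illustration of how these settings fit together, the sketch below trains a single auto-encoder layer in NumPy with a 70/30 split, mini-batches of 100, 100 epochs, a mean squared reconstruction loss, and Adadelta updates. It is not the authors' implementation. The hidden size, the tanh activation, the Adadelta constants (`rho=0.95`, `eps=1e-6`), and the toy data are all assumptions, and a stacked auto-encoder would repeat this step layer by layer.

```python
import numpy as np

def train_autoencoder(X, hidden=4, nb_epoch=100, batch_size=100,
                      training_ratio=0.7, rho=0.95, eps=1e-6, seed=0):
    """Train one auto-encoder layer; return its parameters and validation MSE per epoch."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(round(training_ratio * len(X)))
    X_tr, X_va = X[idx[:n_train]], X[idx[n_train:]]
    d = X.shape[1]
    params = {"W1": rng.normal(0, 0.1, (d, hidden)), "b1": np.zeros(hidden),
              "W2": rng.normal(0, 0.1, (hidden, d)), "b2": np.zeros(d)}
    # Adadelta keeps running averages of squared gradients and squared updates.
    avg_g2 = {k: np.zeros_like(v) for k, v in params.items()}
    avg_dx2 = {k: np.zeros_like(v) for k, v in params.items()}

    def forward(A):
        H = np.tanh(A @ params["W1"] + params["b1"])
        return H, H @ params["W2"] + params["b2"]

    def mse(A):
        return float(np.mean((forward(A)[1] - A) ** 2))

    history = [mse(X_va)]
    for _ in range(nb_epoch):
        order = rng.permutation(n_train)
        for start in range(0, n_train, batch_size):
            B = X_tr[order[start:start + batch_size]]
            H, R = forward(B)
            dR = 2.0 * (R - B) / B.size            # gradient of the MSE w.r.t. R
            dZ = (dR @ params["W2"].T) * (1.0 - H ** 2)
            grads = {"W2": H.T @ dR, "b2": dR.sum(axis=0),
                     "W1": B.T @ dZ, "b1": dZ.sum(axis=0)}
            for k in params:
                avg_g2[k] = rho * avg_g2[k] + (1 - rho) * grads[k] ** 2
                step = -np.sqrt(avg_dx2[k] + eps) / np.sqrt(avg_g2[k] + eps) * grads[k]
                avg_dx2[k] = rho * avg_dx2[k] + (1 - rho) * step ** 2
                params[k] += step
        history.append(mse(X_va))
    return params, history

# Toy features in (0, 1) generated from a 2-D latent space (hypothetical data).
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
X = 1.0 / (1.0 + np.exp(-latent @ rng.normal(size=(2, 6))))
params, history = train_autoencoder(X)
codes = np.tanh(X @ params["W1"] + params["b1"])  # embedded representation
print(history[0], history[-1])
```

The `codes` array plays the role of the embedded representation that a downstream classifier would consume. The 30% validation portion is used here only to monitor the reconstruction error.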
The optimal parameters for BSVM (SAE), BSVM (Wrapper), and BSVM (Origin).
| Parameters | BSVM(SAE) | BSVM(Wrapper) | BSVM(Origin) |
|---|---|---|---|
| | 8.5 | 5 | 9.503 |
| | 47.17 | 4.5 | 8.552 |
| | 5.24 | 0.5 | 0.95 |
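A biased SVM is commonly formulated with separate misclassification penalties for the positive class and for the negative (here, possibly contaminated) class. The sketch below assumes that formulation and is not the authors' solver: it is a linear model trained by plain subgradient descent rather than a kernel SVM, and the function names, penalty values, learning rate, and toy data are hypothetical. They are not taken from the table above. With scikit-learn, a comparable effect could be obtained through the `class_weight` argument of `sklearn.svm.SVC`.

```python
def train_biased_linear_svm(X, y, c_pos=5.0, c_neg=0.5, lr=0.01, epochs=500):
    """Minimize 0.5*||w||^2 + c_pos*hinge(positives) + c_neg*hinge(negatives).

    Labels are +1 (known DTP) and -1 (NDTP, possibly contaminated).
    """
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the 0.5*||w||^2 term
        for xi, yi in zip(X, y):
            score = sum(wk * xk for wk, xk in zip(w, xi)) + b
            if yi * score < 1.0:  # margin violated: add the hinge subgradient
                c = c_pos if yi == 1 else c_neg
                for k in range(d):
                    grad_w[k] -= c * yi * xi[k]
                grad_b -= c * yi
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1

# One-dimensional toy data: the point at 2.5 is labelled -1 but sits among the
# positives, imitating a DTP hidden in the NDTP set (hypothetical data).
X = [[2.0], [3.0], [-2.0], [-3.0], [2.5]]
y = [1, 1, -1, -1, -1]
w, b = train_biased_linear_svm(X, y)
print(predict(w, b, [2.5]), predict(w, b, [-2.5]))
```

Because errors on known positives cost ten times more than errors on negatives in this toy setting, the model accepts a margin violation on the contaminating point rather than giving up the nearby positives.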
Statistical results of the average of 10 iterations of the three models.
Figures in parentheses are the corresponding variance of the 10 independent results.
| Dataset | F1 score-DTPs | Recall Ratio-DTPs | Precision-NDTPs | |
|---|---|---|---|---|
| 0.349(0.179) | ||||
| 0.587(0.027) | ||||
| 0.179(0.087) | 0.482(0.131) | 0.926(0.006) | ||
| 0.169(0.016) | 0.451(0.141) | |||
| 0(0) | 0(0) | 0.91(0) |
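The three scores in the table header can be computed from a confusion matrix, as in the sketch below. It is an illustration only: the function names and label encoding (1 = DTP, 0 = NDTP) are assumptions, the class proportions in the example are made up, and the source does not say whether the reported variance is the population or the sample variance (the population form is used here).

```python
from statistics import mean, pvariance

def dtp_ndtp_metrics(y_true, y_pred):
    """Labels: 1 = DTP, 0 = NDTP. Returns the three scores named in the table header."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall_dtp = tp / (tp + fn) if tp + fn else 0.0
    precision_dtp = tp / (tp + fp) if tp + fp else 0.0
    denom = precision_dtp + recall_dtp
    f1_dtp = 2 * precision_dtp * recall_dtp / denom if denom else 0.0
    precision_ndtp = tn / (tn + fn) if tn + fn else 0.0
    return {"f1_dtp": f1_dtp, "recall_dtp": recall_dtp,
            "precision_ndtp": precision_ndtp}

def summarize(runs, key):
    """Mean and (population) variance of one score over repeated runs."""
    values = [run[key] for run in runs]
    return mean(values), pvariance(values)

# Hypothetical test set: 9 DTPs and 91 NDTPs, and a model that labels
# every protein as NDTP.
y_true = [1] * 9 + [0] * 91
y_pred = [0] * 100
scores = dtp_ndtp_metrics(y_true, y_pred)
avg, var = summarize([scores] * 10, "precision_ndtp")
print(scores, avg, var)
```

A model that never predicts the DTP class scores zero on both DTP metrics while still reaching a high NDTP precision, simply because NDTPs dominate the data. This is one way a row such as 0(0), 0(0), 0.91(0) can arise, and it shows why the DTP-side scores are the more informative columns under class imbalance.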