Lingling Zhao, Junjie Wang, Long Pang, Yang Liu, Jun Zhang.
Abstract
The computational prediction of interactions between drugs and targets is a long-standing challenge in drug discovery. State-of-the-art methods for drug-target interaction prediction are primarily based on supervised machine learning with known label information. However, in biomedicine, obtaining labeled training data is an expensive and laborious process. This paper proposes a semi-supervised method based on generative adversarial networks (GANs) to predict binding affinity. Our method comprises two parts: two GANs for feature extraction and a regression network for prediction. The semi-supervised mechanism allows our model to learn protein and drug features from both labeled and unlabeled data. We evaluate the performance of our method using multiple public datasets. Experimental results demonstrate that our method achieves competitive performance while utilizing freely available unlabeled data. Our results suggest that utilizing such unlabeled data can considerably improve performance in various biomedical relation extraction tasks, for example, drug-target interaction and protein-protein interaction, particularly when only limited labeled data are available. To the best of our knowledge, this is the first semi-supervised GAN-based method to predict binding affinity.
Keywords: convolutional neural networks; deep learning; drug-target affinity prediction; generative adversarial networks; semi-supervised
Year: 2020 PMID: 31993067 PMCID: PMC6962343 DOI: 10.3389/fgene.2019.01243
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Data set.
| | Proteins | Compounds | Interactions |
|---|---|---|---|
| Davis | 442 | 68 | 30056 |
| KIBA | 229 | 2111 | 118254 |
Figure 1. Summary of the KIBA (left panel) and Davis (right panel) data sets.
Figure 2. Pipeline overview. We train the GANs on the unlabeled data set. Compound SMILES and protein sequences are encoded, and two independent GANs are applied to generate fake samples. The trained discriminators of the GANs can then be used to project the labeled data sets into a latent feature space. Based on these features, we train a convolutional regression network to predict the drug-target (DT) binding affinity.
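The encoding step in the pipeline is not specified on this page. As a minimal sketch, assuming a character-level integer (label) encoding with zero padding to a fixed length, it might look like the following. The function names `build_vocab` and `encode`, the padding index 0, and the example length of 6 are illustrative assumptions, not details taken from the paper.

```python
def build_vocab(sequences):
    """Map every character seen in the sequences to a positive integer.

    Index 0 is reserved for padding (an assumption of this sketch).
    """
    chars = sorted({ch for seq in sequences for ch in seq})
    return {ch: idx for idx, ch in enumerate(chars, start=1)}


def encode(seq, vocab, max_len):
    """Encode a string as a fixed-length list of integer ids.

    Sequences longer than max_len are truncated; shorter ones are
    right-padded with 0. Unknown characters also map to 0 here, which
    conflates them with padding; a dedicated "unknown" id would avoid that.
    """
    ids = [vocab.get(ch, 0) for ch in seq[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Hypothetical SMILES strings, used only to illustrate the encoding.
smiles = ["CCO", "C1=CC=CC=C1"]
vocab = build_vocab(smiles)
encoded = [encode(s, vocab, max_len=6) for s in smiles]
print(vocab)
print(encoded)
```

The same two functions can be applied to protein sequences, using the amino-acid letters as the vocabulary. Note that a per-character scheme splits multi-character SMILES atoms such as `Cl` into two tokens.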
Figure 3. Architecture of the generator and discriminator networks in the proposed method.
CI and MSE scores for the Davis dataset on the independent test for our method and other methods.
| Method | Protein rep. | Compound rep. | CI | MSE |
|---|---|---|---|---|
| DeepDTA | Smith-Waterman | Pubchem-Sim | 0.790 | 0.608 |
| DeepDTA | Smith-Waterman | CNN | 0.886 | 0.420 |
| DeepDTA | CNN | Pubchem-Sim | 0.835 | 0.419 |
| DeepDTA | CNN | CNN | 0.878 | 0.261 |
| KronRLS | Smith-Waterman | Pubchem-Sim | 0.871 | 0.379 |
| SimBoost | Smith-Waterman | Pubchem-Sim | 0.872 | 0.282 |
| GANsDTA | GAN | GAN | | 0.276 |
Bold values indicate the best results.
CI and MSE scores for the KIBA dataset on the independent test.
| Method | Protein rep. | Compound rep. | CI | MSE |
|---|---|---|---|---|
| DeepDTA | Smith-Waterman | Pubchem-Sim | 0.710 | 0.502 |
| DeepDTA | Smith-Waterman | CNN | 0.854 | 0.204 |
| DeepDTA | CNN | Pubchem-Sim | 0.718 | 0.571 |
| DeepDTA | CNN | CNN | 0.863 | 0.194 |
| KronRLS | Smith-Waterman | Pubchem-Sim | 0.782 | 0.411 |
| SimBoost | Smith-Waterman | Pubchem-Sim | 0.836 | 0.222 |
| GANsDTA | GAN | GAN | | 0.224 |
Bold values indicate the best results.
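The two metrics reported in the CI and MSE tables can be sketched in plain Python. This is a minimal illustration, assuming the common pairwise definition of the concordance index (pairs with equal measured affinity are skipped, and tied predictions count as half-concordant); the page does not state which exact variant was used. The affinity values below are made up for the example.

```python
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predictions are ordered correctly."""
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # equal measured affinities: pair is not comparable
        comparable += 1
        if p_i == p_j:
            concordant += 0.5  # tied predictions count as half (assumption)
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical measured and predicted affinities.
measured = [5.0, 6.2, 7.1, 8.4]
predicted = [5.3, 6.0, 7.5, 7.2]
ci = concordance_index(measured, predicted)
mse = mean_squared_error(measured, predicted)
print(f"CI = {ci:.3f}, MSE = {mse:.4f}")
```

A higher CI means the model ranks pairs of drug-target affinities more consistently with the measurements, while a lower MSE means the predicted values are closer to the measured ones. The pairwise loop is quadratic in the number of samples, so a sorted or vectorized variant would be preferable on data sets the size of KIBA.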
The index and AUPR score for the Davis dataset.
| Method | Protein rep. | Compound rep. | Index | AUPR |
|---|---|---|---|---|
| DeepDTA | CNN | CNN | 0.630 | 0.714 |
| KronRLS | Smith-Waterman | Pubchem-Sim | 0.407 | 0.661 |
| SimBoost | Smith-Waterman | Pubchem-Sim | 0.644 | 0.709 |
| GANsDTA | GAN | GAN | 0.653 | 0.691 |
The index and AUPR score for the KIBA dataset.
| Method | Protein rep. | Compound rep. | Index | AUPR |
|---|---|---|---|---|
| DeepDTA | CNN | CNN | 0.673 | 0.788 |
| KronRLS | Smith-Waterman | Pubchem-Sim | 0.342 | 0.635 |
| SimBoost | Smith-Waterman | Pubchem-Sim | 0.629 | 0.760 |
| GANsDTA | GAN | GAN | 0.675 | 0.753 |
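AUPR is a classification metric, so continuous affinities have to be binarized before it can be computed. The sketch below assumes a simple threshold on the measured affinity and uses average precision as a step-wise estimate of the area under the precision-recall curve. The threshold of 7.0, the function names, and the sample values are all illustrative assumptions; the procedure actually used for the tables is not described on this page.

```python
def binarize(values, threshold):
    """Label a pair as a binder (1) when its affinity reaches the threshold."""
    return [1 if v >= threshold else 0 for v in values]


def average_precision(labels, scores):
    """Average precision: mean of precision values at each true positive.

    Items are ranked by descending score. Tied scores keep their input
    order, which is a simplification of this sketch.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


# Hypothetical measured and predicted affinities.
measured = [5.0, 7.4, 6.1, 8.0, 5.5]
predicted = [5.2, 6.8, 7.1, 7.9, 5.0]
labels = binarize(measured, threshold=7.0)  # threshold is an assumption
aupr = average_precision(labels, predicted)
print(f"labels = {labels}, AUPR (average precision) = {aupr:.3f}")
```

Because the result depends on the chosen threshold, AUPR values are only comparable across methods when the same binarization is applied to each data set.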
Figure 4. Predictions from the DeepDTA model with two CNN blocks against measured (real) binding affinity values for the Davis (pKd) and KIBA (KIBA score) datasets.