Hakime Öztürk, Arzucan Özgür, Elif Ozkirimli.
Abstract
Motivation: The identification of novel drug-target (DT) interactions is a substantial part of the drug discovery process. Most computational methods proposed to predict DT interactions have focused on binary classification, where the goal is to determine whether a DT pair interacts or not. However, protein-ligand interactions span a continuum of binding strength values, also called binding affinity, and predicting this value remains a challenge. The growth of affinity data available in DT knowledge bases allows the use of advanced learning techniques, such as deep learning architectures, in the prediction of binding affinities. In this study, we propose a deep-learning-based model that uses only the sequence information of both targets and drugs to predict DT interaction binding affinities. The few studies that focus on DT binding affinity prediction use either 3D structures of protein-ligand complexes or 2D features of compounds. One novel approach used in this work is the modeling of protein sequences and compound 1D representations with convolutional neural networks (CNNs).
Year: 2018 PMID: 30423097 PMCID: PMC6129291 DOI: 10.1093/bioinformatics/bty593
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Summary of the datasets
| Dataset | Proteins | Compounds | Interactions |
|---|---|---|---|
| Davis (Kd) | 442 | 68 | 30 056 |
| KIBA | 229 | 2111 | 118 254 |
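As a quick check of the table, the share of all possible protein-compound pairs that carry a measured affinity can be computed from the three counts. The snippet below is only an arithmetic sketch; the `density` helper and the dictionary layout are illustrative names, not part of the original study.

```python
# Counts copied from the dataset summary table.
datasets = {
    "Davis": {"proteins": 442, "compounds": 68, "interactions": 30056},
    "KIBA": {"proteins": 229, "compounds": 2111, "interactions": 118254},
}


def density(proteins: int, compounds: int, interactions: int) -> float:
    """Fraction of all protein-compound pairs that have a recorded interaction."""
    return interactions / (proteins * compounds)


for name, counts in datasets.items():
    print(f"{name}: {density(**counts):.3f}")
```

With these counts, Davis is fully populated (442 × 68 = 30 056), while roughly a quarter of the possible KIBA pairs have a recorded value.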
Fig. 1. Summary of the Davis (left panel) and KIBA (right panel) datasets. (A) Distribution of binding affinity values. (B) Distribution of the lengths of the SMILES strings. (C) Distribution of the lengths of the protein sequences
Fig. 2. DeepDTA model with two CNN blocks to learn from compound SMILES and protein sequences
Fig. 3. Experiment setup
The average CI and MSE scores on the test set for models trained on five different training sets for the Davis dataset
| Method | Proteins | Compounds | CI (std) | MSE |
|---|---|---|---|---|
| KronRLS | S–W | Pubchem Sim | 0.871 (0.0008) | 0.379 |
| SimBoost | S–W | Pubchem Sim | 0.872 (0.002) | 0.282 |
| DeepDTA | S–W | Pubchem Sim | 0.790 (0.009) | 0.608 |
| DeepDTA | CNN | Pubchem Sim | 0.835 (0.005) | 0.419 |
| DeepDTA | S–W | CNN | 0.886 (0.008) | 0.420 |
| DeepDTA | CNN | CNN | 0.878 (0.004) | 0.261 |
Note: The standard deviations are given in parentheses.
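The two metrics in this table can be illustrated with a small sketch. It assumes the common definition of the concordance index (CI): over all pairs with different measured affinities, the fraction whose predictions are ordered the same way, with tied predictions counted as one half. The function names and the toy values are hypothetical and are not taken from the Davis or KIBA data.

```python
from itertools import combinations
from typing import Sequence


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # pairs with equal measured affinity are not comparable
        comparable += 1
        # Orient the pair so that "hi" belongs to the larger measured affinity.
        p_hi, p_lo = (p_i, p_j) if t_i > t_j else (p_j, p_i)
        if p_hi > p_lo:
            concordant += 1.0
        elif p_hi == p_lo:
            concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("need at least two distinct measured values")
    return concordant / comparable


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values for illustration only.
measured = [5.0, 6.2, 7.1, 8.4]
predicted = [5.3, 6.0, 7.5, 7.2]
print(concordance_index(measured, predicted))
print(mean_squared_error(measured, predicted))
```

In the toy example only the last two compounds are ranked in the wrong order, so five of the six comparable pairs are concordant. The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but slow for test sets of realistic size.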
Parameter settings for CNN based DeepDTA model
| Parameters | Range |
|---|---|
| Number of filters | 32*1; 32*2; 32*3 |
| Filter length (compounds) | [4, 6, 8] |
| Filter length (proteins) | [4, 8, 12] |
| epoch | 100 |
| hidden neurons | 1024; 1024; 512 |
| batch size | 256 |
| dropout | 0.1 |
| optimizer | Adam |
| learning rate (lr) | 0.001 |
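The parameter table can be read as a small search space. The sketch below enumerates it under two assumptions that the table itself does not state: that "32*1; 32*2; 32*3" means 32, 64 and 96 filters, and that every combination of the three ranged parameters is a candidate. All dictionary keys are illustrative names.

```python
from itertools import product

# Ranged parameters; reading "32*1; 32*2; 32*3" as 32, 64, 96 is an assumption.
num_filters = [32 * 1, 32 * 2, 32 * 3]
filter_len_compounds = [4, 6, 8]
filter_len_proteins = [4, 8, 12]

# Parameters listed with a single setting in the table.
fixed = {
    "epochs": 100,
    "hidden_neurons": (1024, 1024, 512),
    "batch_size": 256,
    "dropout": 0.1,
    "optimizer": "Adam",
    "learning_rate": 0.001,
}

# One configuration per combination of the ranged parameters (assumed full grid).
grid = [
    {
        "num_filters": n,
        "filter_len_compounds": fc,
        "filter_len_proteins": fp,
        **fixed,
    }
    for n, fc, fp in product(num_filters, filter_len_compounds, filter_len_proteins)
]

print(len(grid))
print(grid[0])
```

Under these assumptions the grid holds 3 × 3 × 3 = 27 candidate configurations, each of which would be trained with the fixed settings.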
The average CI and MSE scores on the test set for models trained on five different training sets for the KIBA dataset
| Method | Proteins | Compounds | CI (std) | MSE |
|---|---|---|---|---|
| KronRLS | S–W | Pubchem Sim | 0.782 (0.0009) | 0.411 |
| SimBoost | S–W | Pubchem Sim | 0.836 (0.001) | 0.222 |
| DeepDTA | S–W | Pubchem Sim | 0.710 (0.002) | 0.502 |
| DeepDTA | CNN | Pubchem Sim | 0.718 (0.004) | 0.571 |
| DeepDTA | S–W | CNN | 0.854 (0.001) | 0.204 |
| DeepDTA | CNN | CNN | 0.863 (0.002) | 0.194 |
Note: The standard deviations are given in parentheses.
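Each table entry summarizes five models, one per training set, evaluated on the same test set. A minimal sketch of that aggregation follows. The `summarize` helper is hypothetical, the scores are made up, and the use of the sample standard deviation is an assumption, since the record does not say which estimator was reported.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) of per-model test scores."""
    return mean(scores), stdev(scores)


# Made-up CI values for five models (illustration only, not reported results).
ci_per_model = [0.80, 0.82, 0.81, 0.83, 0.79]
avg, std = summarize(ci_per_model)
print(f"{avg:.3f} ({std:.3f})")
```

Swapping `stdev` for `statistics.pstdev` gives the population standard deviation instead, which is slightly smaller for five values.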
The average $r_m^2$ and AUPR scores on the test set for models trained on five different training sets for the Davis dataset
| Method | Proteins | Compounds | $r_m^2$ (std) | AUPR (std) |
|---|---|---|---|---|
| KronRLS | S–W | Pubchem Sim | 0.407 (0.005) | 0.661 (0.010) |
| SimBoost | S–W | Pubchem Sim | 0.644 (0.006) | 0.709 (0.008) |
| DeepDTA | CNN | CNN | 0.630 (0.017) | 0.714 (0.010) |
Note: The standard deviations are given in parentheses.
The average $r_m^2$ and AUPR scores on the test set for models trained on five different training sets for the KIBA dataset
| Method | Proteins | Compounds | $r_m^2$ (std) | AUPR (std) |
|---|---|---|---|---|
| KronRLS | S–W | Pubchem Sim | 0.342 (0.001) | 0.635 (0.004) |
| SimBoost | S–W | Pubchem Sim | 0.629 (0.007) | 0.760 (0.003) |
| DeepDTA | CNN | CNN | 0.673 (0.009) | 0.788 (0.004) |
Note: The standard deviations are given in parentheses.
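AUPR is defined for binary labels, so continuous affinities have to be thresholded before it can be computed. The sketch below estimates the area under the precision-recall curve with average precision, a step-wise approximation. The `threshold` argument and the toy values are hypothetical, and the code assumes that larger values mean stronger binding.

```python
from typing import Sequence


def average_precision(
    y_true: Sequence[float], y_score: Sequence[float], threshold: float
) -> float:
    """Step-wise estimate of the area under the precision-recall curve."""
    labels = [t >= threshold for t in y_true]  # binarize measured affinities
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("no positive pairs at this threshold")
    # Rank pairs from highest to lowest predicted affinity.
    ranked = sorted(zip(y_score, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Toy values for illustration only.
measured = [5.0, 7.5, 6.0, 8.0]
predicted = [5.5, 7.0, 7.2, 7.9]
print(average_precision(measured, predicted, threshold=7.0))
```

Tied predicted scores are ordered arbitrarily here; a production implementation would usually group ties before computing precision.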
Fig. 4. Predictions from DeepDTA model with two CNN blocks against measured (real) binding affinity values for Davis (pKd) and KIBA (KIBA score) datasets