Kexin Huang, Cao Xiao, Lucas M Glass, Jimeng Sun.
Abstract
MOTIVATION: Drug-target interaction (DTI) prediction is a foundational task for in-silico drug discovery, a process that is costly and time-consuming because it requires experimental search over a large space of drug compounds. Recent years have seen promising progress in deep learning for DTI prediction. However, the following challenges remain open: (i) existing molecular representation learning approaches ignore the sub-structural nature of DTI and thus produce results that are less accurate and difficult to explain, and (ii) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data.
Year: 2021 PMID: 33070179 PMCID: PMC8098026 DOI: 10.1093/bioinformatics/btaa880
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. MolTrans workflow: (a) MolTrans utilizes vast unlabeled data. (b) Given the input pair of drug S and protein A, MolTrans extracts a sequence of sub-structures from each via Algorithm 1. (c) Each sub-structure is embedded into a latent feature vector through a learnable embedding table via Equation (1). The drug and protein sequences of sub-structure embeddings are then fed into the drug and target transformer encoders, respectively, to obtain augmented contextual representations via Equation (2). (d) An interaction map I measuring interaction intensity among sub-structures is generated via Equation (3). The interaction map is further refined by a CNN layer that models higher-order interactions, producing a tensor O via Equation (4). (e) A decoder module then feeds the tensor O to a classifier that outputs the DTI probability P via Equation (5). All modules are trained end-to-end with the binary classification loss via Equation (6)
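Steps (c) to (e) of the caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the transformer encoders are omitted, the dot product used for the interaction map is an assumed choice, a single hand-written convolution stands in for the CNN layer, and all sizes and names (`DRUG_VOCAB`, `interaction_map`, `predict`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: vocabulary sizes and embedding width are illustrative
# only and are not taken from the paper.
DRUG_VOCAB, PROT_VOCAB, DIM = 50, 60, 8
drug_table = rng.normal(size=(DRUG_VOCAB, DIM))  # embedding table for drug sub-structures
prot_table = rng.normal(size=(PROT_VOCAB, DIM))  # embedding table for protein sub-structures


def interaction_map(drug_ids, prot_ids):
    """Pairwise sub-structure scores; the dot product is an assumption."""
    d = drug_table[drug_ids]  # (n_drug, DIM)
    p = prot_table[prot_ids]  # (n_prot, DIM)
    return d @ p.T            # (n_drug, n_prot)


def conv2d_valid(x, kernel):
    """Plain 'valid' 2-D cross-correlation, standing in for the CNN layer."""
    kh, kw = kernel.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out


def predict(drug_ids, prot_ids, kernel, weights, bias):
    """Interaction map -> convolution -> flatten -> linear -> sigmoid."""
    o = conv2d_valid(interaction_map(drug_ids, prot_ids), kernel)
    logit = o.ravel() @ weights + bias
    return 1.0 / (1.0 + np.exp(-logit))


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one (probability, label) pair."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


# Toy input: 5 drug sub-structures and 7 protein sub-structures.
drug_ids = rng.integers(0, DRUG_VOCAB, size=5)
prot_ids = rng.integers(0, PROT_VOCAB, size=7)
kernel = rng.normal(size=(3, 3)) * 0.1
weights = rng.normal(size=(3 * 5,)) * 0.1  # (5-3+1) * (7-3+1) = 15 features
bias = 0.0

prob = predict(drug_ids, prot_ids, kernel, weights, bias)
loss = bce(prob, 1.0)
```

In a trained model the embedding tables, kernel and decoder weights would be learned jointly by minimizing the loss; here they are random and serve only to show the data flow and tensor shapes.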
Dataset statistics
| Dataset | # Drugs | # Proteins | # Pos Interactions | # Neg Interactions |
|---|---|---|---|---|
| BIOSNAP | 4510 | 2181 | 9619/1374/2748 | 9619/1374/2748 |
| DAVIS | 68 | 379 | 1043/160/303 | 1043/2846/5708 |
| BindingDB | 10665 | 1413 | 6334/927/1905 | 6334/5717/11384 |
Note: Each interaction column lists the training/validation/testing interaction counts for one fold of the data.
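The split counts in the table imply different class balances per split. The short calculation below makes this explicit using only the numbers from the table; the dictionary layout and the names `counts` and `pos_fraction` are illustrative.

```python
# (positive, negative) interaction counts per split, copied from the table.
counts = {
    "BIOSNAP":   {"train": (9619, 9619), "val": (1374, 1374), "test": (2748, 2748)},
    "DAVIS":     {"train": (1043, 1043), "val": (160, 2846),  "test": (303, 5708)},
    "BindingDB": {"train": (6334, 6334), "val": (927, 5717),  "test": (1905, 11384)},
}

# Fraction of positive interactions in each split of each dataset.
pos_fraction = {
    name: {split: pos / (pos + neg) for split, (pos, neg) in splits.items()}
    for name, splits in counts.items()
}

for name, splits in pos_fraction.items():
    summary = ", ".join(f"{split}={frac:.3f}" for split, frac in splits.items())
    print(f"{name}: {summary}")
```

All three training splits are balanced, while the DAVIS and BindingDB validation and test splits contain far more negatives than positives, which is one reason PR-AUC is reported alongside ROC-AUC.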
Performance comparison (five random runs)
| Method | ROC-AUC | PR-AUC | Sensitivity | Specificity | Threshold |
|---|---|---|---|---|---|
| Dataset 1: BIOSNAP | | | | | |
| LR | | | | | 0.434 |
| DNN | | | | | 0.499 |
| GNN-CPI | | | | | 0.349 |
| DeepDTI | | | | | 0.347 |
| DeepDTA | | | | | 0.466 |
| DeepConv-DTI | | | | | 0.441 |
| MolTrans | | | | | 0.431 |
| Dataset 2: DAVIS | | | | | |
| LR | | | | | 0.399 |
| DNN | | | | | 0.489 |
| GNN-CPI | | | | | 0.487 |
| DeepDTI | | | | | 0.387 |
| DeepDTA | | | | | 0.482 |
| DeepConv-DTI | | | | | 0.438 |
| MolTrans | | | | | 0.447 |
| Dataset 3: BindingDB | | | | | |
| LR | | | | | 0.394 |
| DNN | | | | | 0.371 |
| GNN-CPI | | | | | 0.406 |
| DeepDTI | | | | | 0.060 |
| DeepDTA | | | | | 0.305 |
| DeepConv-DTI | | | | | 0.318 |
| MolTrans | | | | | 0.355 |
Note: MolTrans achieves the best predictive performance across all datasets. Bold values mark the best-performing method for each metric.
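Sensitivity and specificity depend on the decision threshold listed in the last column. As a minimal sketch, the function below binarizes predicted probabilities at a threshold and computes both metrics; the labels and scores are made-up toy values, the function name is hypothetical, and 0.431 is used only because it is the MolTrans threshold reported for BIOSNAP.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Binarize scores at `threshold` and return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels and predicted probabilities (illustrative, not from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1, 0.35, 0.45]

sens, spec = sensitivity_specificity(y_true, scores, threshold=0.431)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the threshold trades sensitivity for specificity, so the two columns are only comparable across methods together with the threshold each method used.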
MolTrans achieves competitive results in both the unseen-drug and unseen-protein settings on the BIOSNAP dataset (average ROC-AUC over five random runs)
| Settings | DeepDTI | DeepDTA | DeepConv-DTI | MolTrans |
|---|---|---|---|---|
| Unseen drugs | 0.843 ± 0.003 | 0.849 ± 0.007 | 0.847 ± 0.009 | |
| Unseen proteins | 0.759 ± 0.029 | 0.767 ± 0.022 | 0.766 ± 0.022 | |
Note: The three best-performing baselines are used for comparison.
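An unseen-drug or unseen-protein setting requires that test entities never appear in training. The sketch below shows one common way to build such a split, by holding out whole entities rather than individual pairs. It is an assumption about the protocol, not the authors' code; the function name, the 20% default and the toy pairs are hypothetical.

```python
import random


def unseen_entity_split(pairs, key_index, test_fraction=0.2, seed=0):
    """Hold out a fraction of entities so they appear only in the test set.

    `pairs` is a list of (drug, protein, label) tuples; `key_index` is 0 to
    hold out drugs or 1 to hold out proteins.
    """
    entities = sorted({pair[key_index] for pair in pairs})
    rng = random.Random(seed)
    rng.shuffle(entities)
    n_test = max(1, int(len(entities) * test_fraction))
    held_out = set(entities[:n_test])
    train = [pair for pair in pairs if pair[key_index] not in held_out]
    test = [pair for pair in pairs if pair[key_index] in held_out]
    return train, test


# Toy interaction list (illustrative identifiers only).
pairs = [
    ("d1", "p1", 1), ("d1", "p2", 0), ("d2", "p1", 1),
    ("d3", "p3", 0), ("d4", "p2", 1), ("d5", "p3", 0),
]
train, test = unseen_entity_split(pairs, key_index=0, test_fraction=0.4)
```

Passing `key_index=1` holds out proteins instead of drugs, which gives the second row of the table.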
MolTrans provides the best results when a high fraction of data is missing (average ROC-AUC over five random runs)
| Settings (%) | DeepDTI | DeepDTA | DeepConv-DTI | MolTrans |
|---|---|---|---|---|
| 70 | 0.853 ± 0.004 | 0.838 ± 0.004 | 0.845 ± 0.003 | |
| 80 | 0.828 ± 0.007 | 0.821 ± 0.008 | 0.825 ± 0.003 | |
| 90 | 0.767 ± 0.010 | 0.787 ± 0.011 | 0.792 ± 0.004 | |
| 95 | 0.659 ± 0.011 | 0.762 ± 0.004 | 0.726 ± 0.008 | |
Note: The three best-performing baselines are used for comparison.
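A missing-data setting of this kind can be simulated by discarding part of the training set before fitting. The sketch below assumes that the percentage in the first column is the share of training pairs removed; that reading, the function name and the toy data are assumptions rather than details confirmed by the source.

```python
import random


def drop_training_fraction(train_pairs, missing_pct, seed=0):
    """Keep a random (100 - missing_pct)% subset of the training pairs.

    Assumption: `missing_pct` is the percentage of training pairs removed.
    """
    rng = random.Random(seed)
    keep = round(len(train_pairs) * (100 - missing_pct) / 100)
    return rng.sample(train_pairs, keep)


# Toy training set of 1000 (drug, protein, label) tuples.
train_pairs = [(f"d{i}", f"p{i % 50}", i % 2) for i in range(1000)]

for pct in (70, 80, 90, 95):
    subset = drop_training_fraction(train_pairs, pct)
    print(f"{pct}% missing -> {len(subset)} training pairs kept")
```

The validation and test sets are left untouched in this sketch, so scores across the four settings stay comparable.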
Fig. 2. MolTrans is robust across different protein families
Fig. 3. Interaction map showing the contributions of sub-structures to DTI, illustrated for the drug 2-nonyl n-oxide interacting with the protein cytochrome b-c1 complex unit 10
Ablation study (five random runs)
| Setup | ROC-AUC | PR-AUC |
|---|---|---|
| MolTrans | | |
| −CNN | 0.876 ± 0.003 | 0.883 ± 0.006 |
| −AugEmbed | 0.876 ± 0.004 | 0.870 ± 0.004 |
| −Interaction | 0.847 ± 0.003 | |
| Small | 0.888 ± 0.001 | 0.888 ± 0.007 |
| −FCS | | 0.887 ± 0.004 |