Chunyu Wang, Yuanlong Chen, Lingling Zhao, Junjie Wang, Naifeng Wen.
Abstract
The prediction of the strengths of drug-target interactions, also called drug-target binding affinities (DTA), plays a fundamental role in facilitating drug discovery, where the goal is to find prospective drug candidates. With the increase in the number of drug-protein interactions, machine learning techniques, especially deep learning methods, have become applicable for drug-target interaction discovery because they significantly reduce the required experimental workload. In this paper, we present a spontaneous formulation of the DTA prediction problem as an instance of multi-instance learning. We address the problem in three stages, first organizing given drug and target sequences into instances via a private-public mechanism, then identifying the predicted scores of all instances in the same bag, and finally combining all the predicted scores as the output prediction. A comprehensive evaluation demonstrates that the proposed method outperforms other state-of-the-art methods on three benchmark datasets.
Keywords: drug–target binding affinity; multi-instance learning; transformer
Year: 2022 PMID: 36232434 PMCID: PMC9569912 DOI: 10.3390/ijms231911136
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
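The three stages named in the abstract (build instances, score each instance in a bag, combine the scores) can be pictured with a minimal sketch. This record does not describe the private-public mechanism, the scoring model, or the combination rule, so everything below is a hypothetical stand-in: fixed-size chunking replaces the instance construction, `toy_instance_score` replaces the learned model, and mean pooling replaces whatever aggregation the authors use.

```python
from statistics import fmean


def windows(seq, size):
    """Split a sequence into consecutive, non-overlapping chunks.

    Hypothetical instance maker; NOT the paper's private-public mechanism.
    """
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def make_bag(drug_smiles, target_seq, drug_win=8, target_win=16):
    """Stage 1: one bag per drug-target pair, holding every
    (drug chunk, target chunk) combination as an instance."""
    return [
        (d, t)
        for d in windows(drug_smiles, drug_win)
        for t in windows(target_seq, target_win)
    ]


def toy_instance_score(instance):
    """Stage 2 placeholder: an arbitrary number standing in for a learned model."""
    d, t = instance
    return len(set(d)) / len(d) + len(set(t)) / len(t)


def predict_bag(bag, score_fn=toy_instance_score, pool=fmean):
    """Stages 2 and 3: score every instance, then pool the scores
    into a single bag-level prediction (mean pooling is an assumption)."""
    return pool([score_fn(inst) for inst in bag])


# Toy inputs, chosen only to exercise the code.
bag = make_bag("CCOCCN", "MKTAYIAK", drug_win=3, target_win=4)
print(len(bag), predict_bag(bag))
```

Swapping `pool` for `max` or a learned attention layer changes only the last stage, which is the main structural point of a multi-instance formulation.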
The detailed statistics of the datasets.
| Dataset | # of Targets | # of Drugs | # of Interactions | Sparsity |
|---|---|---|---|---|
| Davis | 361 | 68 | 24,548 | 1 |
| KIBA | 229 | 2052 | 117,184 | 0.249 |
| BindingDB | 1615 | 129,109 | 144,525 | 0.0007 |
# means total number.
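The Sparsity column is consistent with the fraction of all possible drug-target pairs that have a recorded interaction, i.e. interactions divided by (targets × drugs). That reading is an inference from the numbers rather than a definition given in this record; the short check below reproduces the three values under that assumption.

```python
# (targets, drugs, interactions) copied from the table above.
datasets = {
    "Davis": (361, 68, 24_548),
    "KIBA": (229, 2052, 117_184),
    "BindingDB": (1615, 129_109, 144_525),
}


def fill_ratio(n_targets, n_drugs, n_interactions):
    """Fraction of the full drug-target grid that has a measured interaction."""
    return n_interactions / (n_targets * n_drugs)


for name, (n_targets, n_drugs, n_interactions) in datasets.items():
    print(f"{name}: {fill_ratio(n_targets, n_drugs, n_interactions):.4f}")
```

Under this reading, Davis is a complete grid (every drug measured against every target), while BindingDB covers well under 0.1% of its possible pairs.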
Figure 1. The overlap between drugs and targets in the KIBA dataset. (a) drug overlap under the random setting; (b) target overlap under the random setting; (c) drug overlap under the blind setting; (d) target overlap under the blind setting.
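The difference between the two settings can be sketched as follows. This record does not define the blind setting; the code assumes a common reading in which test pairs involve only drugs and targets that never appear in training, and pairs mixing a seen and an unseen entity are dropped. The function names and the 20% test fraction are illustrative choices, not values from the paper.

```python
import random


def random_split(pairs, test_frac=0.2, seed=0):
    """Shuffle the pairs and cut; drugs and targets may appear on both sides."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def blind_split(pairs, test_frac=0.2, seed=0):
    """Hold out a set of drugs AND a set of targets (assumed definition).

    Train keeps pairs with no held-out entity; test keeps pairs where both
    entities are held out; mixed pairs are discarded.
    """
    rng = random.Random(seed)
    drugs = sorted({d for d, _ in pairs})
    targets = sorted({t for _, t in pairs})
    test_drugs = set(rng.sample(drugs, max(1, int(len(drugs) * test_frac))))
    test_targets = set(rng.sample(targets, max(1, int(len(targets) * test_frac))))
    train = [(d, t) for d, t in pairs
             if d not in test_drugs and t not in test_targets]
    test = [(d, t) for d, t in pairs
            if d in test_drugs and t in test_targets]
    return train, test


# Toy grid of 5 drugs x 5 targets.
pairs = [(f"d{i}", f"t{j}") for i in range(5) for j in range(5)]
train, test = blind_split(pairs)
print(len(train), len(test))
```

With a random split, nearly every test drug and target also occurs in training, which is the kind of overlap panels (a) and (b) depict; the blind split removes it by construction.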
Comparison among the results obtained by the DMIL-PPDTA approach and the baseline methods across the datasets under the random splitting setting. The entries in boldface denote the best result for each metric, and the data in brackets represent standard deviations.
| Dataset | Method | CI | MSE | R | |
|---|---|---|---|---|---|
| Davis | DMIL-PPDTA | | | | |
| | DeepDTA | 0.875(0.006) | 0.239(0.019) | 0.802(0.008) | 0.571(0.026) |
| | GraphDTA | 0.866(0.005) | 0.240(0.009) | 0.793(0.003) | 0.621(0.009) |
| | ML-DTI | 0.863(0.005) | 0.234(0.012) | 0.802(0.009) | 0.601(0.032) |
| KIBA | DMIL-PPDTA | | | | |
| | DeepDTA | 0.868(0.001) | 0.188(0.002) | 0.857(0.002) | 0.697(0.014) |
| | GraphDTA | 0.838(0.003) | 0.208(0.005) | 0.838(0.005) | 0.696(0.012) |
| | ML-DTI | 0.861(0.002) | 0.189(0.003) | 0.854(0.004) | 0.702(0.015) |
| BindingDB | DMIL-PPDTA | | | | |
| | DeepDTA | 0.778(0.005) | 1.038(0.041) | 0.762(0.009) | 0.548(0.009) |
| | GraphDTA | – | – | – | – |
| | ML-DTI | 0.780(0.007) | 1.018(0.038) | 0.765(0.011) | 0.566(0.018) |
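For readers unfamiliar with the column headers, the sketch below computes three of the reported metrics on toy data. It assumes that CI is the standard concordance index (fraction of correctly ordered pairs, ties in prediction counted as one half) and that R is the Pearson correlation coefficient; neither definition is spelled out in this record, and the fourth column, whose header is missing, is not covered. The affinity values are made up for illustration.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of the R column)."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def concordance_index(y_true, y_pred):
    """Fraction of pairs with different true values whose predictions are
    ordered the same way; tied predictions count as 0.5."""
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Made-up affinities, for illustration only.
y_true = [5.0, 6.2, 7.1, 8.4]
y_pred = [5.1, 6.9, 6.5, 8.0]
print(concordance_index(y_true, y_pred), mse(y_true, y_pred), pearson_r(y_true, y_pred))
```

Higher is better for CI and R, lower is better for MSE, which is how the boldface "best" entries in the caption should be read.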
Comparison among the results produced by our DMIL-PPDTA approach and the baseline approaches across the datasets under the blind setting. The entries in boldface denote the best result for each metric, and the data in brackets represent standard deviations.
| Dataset | Method | CI | MSE | R | |
|---|---|---|---|---|---|
| Davis | DMIL-PPDTA | 0.555(0.055) | | 0.124(0.086) | 0.022(0.016) |
| | DeepDTA | | 0.771(0.236) | | |
| | GraphDTA | 0.618(0.030) | 0.787(0.077) | 0.235(0.088) | 0.061(0.047) |
| | ML-DTI | 0.626(0.038) | 0.725(0.146) | 0.246(0.063) | 0.062(0.028) |
| KIBA | DMIL-PPDTA | | | | |
| | DeepDTA | 0.642(0.007) | 0.591(0.046) | 0.453(0.030) | 0.182(0.024) |
| | GraphDTA | 0.597(0.014) | 0.633(0.031) | 0.369(0.023) | 0.125(0.016) |
| | ML-DTI | 0.633(0.015) | 0.614(0.014) | 0.412(0.031) | 0.147(0.022) |
| BindingDB | DMIL-PPDTA | | | | |
| | DeepDTA | 0.618(0.007) | 2.397(0.106) | 0.383(0.021) | 0.126(0.012) |
| | GraphDTA | – | – | – | – |
| | ML-DTI | 0.620(0.011) | 2.340(0.125) | 0.391(0.025) | 0.131(0.014) |
Ablation experiment results obtained by the DMIL-PPDTA approach on the BindingDB dataset under the random splitting setting. The entries in boldface denote the best result for each metric, and the data in brackets represent standard deviations.
| Private | Public-Late | Public-Early | CI | MSE | R | |
|---|---|---|---|---|---|---|
| ✓ | 0.732(0.012) | 1.357(0.081) | 0.667(0.025) | 0.418(0.024) | ||
| ✓ | ✓ | 0.815(0.005) | 0.779(0.038) | 0.825(0.008) | 0.664(0.029) | |
| ✓ | 0.800(0.018) | 0.881(0.113) | 0.799(0.029) | 0.623(0.047) | ||
| ✓ | 0.799(0.013) | 0.889(0.092) | 0.798(0.021) | 0.609(0.052) | ||
| ✓ | 0.811(0.014) | 0.807(0.103) | 0.818(0.024) | 0.655(0.056) | ||
| ✓ | ✓ | 0.815(0.010) | 0.788(0.079) | 0.823(0.016) | 0.667(0.040) | |
| ✓ | ✓ | ✓ | | | | |
Figure 2. A graphical illustration of DMIL-PPDTA.
Figure 3. Illustration of the residual dilated GatedCNN module.
Figure 4. Multi-head cross-attention module.