Yuni Zeng¹, Xiangru Chen², Dezhong Peng²,³,⁴, Lijun Zhang⁵,⁶, Haixiao Huang⁷
Abstract
BACKGROUND: Drug-target interaction (DTI) prediction plays a crucial role in drug discovery. Although advanced deep learning methods have shown promising results in predicting DTIs, they still need improvement in two aspects: (1) the encoding method, since the existing approach, character encoding, overlooks the chemical textual information of atoms written with multiple characters and of chemical functional groups; and (2) the architecture of the deep model, which should capture multiple chemical patterns in drug and target representations.
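To make the first point concrete, here is a minimal sketch of character-level label encoding applied to the SMILES string that is segmented later in this entry. The vocabulary is built from the example string itself and the helper names (`char_tokenize`, `build_vocab`, `label_encode`) are hypothetical; this is not DeepDTA's actual character set or code.

```python
# Minimal sketch of character-level SMILES encoding (hypothetical helpers,
# vocabulary built from the input; not DeepDTA's real character set).

def char_tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into single characters."""
    return list(smiles)

def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Assign ids from 1 upward; 0 is left free for padding."""
    return {tok: i + 1 for i, tok in enumerate(sorted(set(tokens)))}

def label_encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Map each token to its integer id."""
    return [vocab[tok] for tok in tokens]

smiles = "COC1=C(C=C2C(=C1)N=CN=C2NC3=C(C(=CC=C3)Cl)F)CN4CCCC[C@@H]4C(=O)N"
tokens = char_tokenize(smiles)
ids = label_encode(tokens, build_vocab(tokens))

# The chlorine atom "Cl" is split into "C" (the same id as carbon) and "l",
# and the bracket atom "[C@@H]" is spread over six positions.
print(char_tokenize("Cl"))
print(len(char_tokenize("[C@@H]")))
```

Because every character gets its own id, the model has to relearn from context that "C" followed by "l" means chlorine rather than carbon, which is the information loss the abstract describes.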
Keywords: Deep learning; Drug–target interaction; Representations learning; Self-attention networks
Year: 2022 PMID: 35922768 PMCID: PMC9347097 DOI: 10.1186/s12859-022-04857-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 Our proposed multi-granularity multi-scaled SAN model for DTI prediction
Examples of segmented outputs of 'COC1=C(C=C2C(=C1)N=CN=C2NC3=C(C(=CC=C3)Cl)F)CN4CCCC[C@@H]4C(=O)N' for different BPE thresholds T
| T | Segmented SMILES Sequence (Vocabulary) |
|---|---|
| 1 | COC1=C(C=C, 2C(=C1), N=C, N=C2, N, C3=C, (C(=CC=C, 3)C, l), F)C, N, 4, CCCC, [C@@H]4, C(=O)N |
| 5 | CO, C1=C, (C=C, 2, C(=C1), N=C, N=C, 2, N, C3=C, (, C(=CC=C, 3)C, l), F, )C, N, 4, CCCC, [C@@H], 4, C(=O), N |
| 25 | C, O, C1=C, (C=C, 2, C(=C, 1), N=C, N=C, 2, N, C3, =C, (, C(, =CC=C, 3, )C, l, ), F, )C, N, 4, CCCC, [C@@H], 4, C(=O), N |
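The table shows that a larger T yields a finer segmentation. The following is a minimal byte-pair-encoding (BPE) sketch under one assumption consistent with that trend: T is the minimum count an adjacent token pair must reach before it is merged. The paper's exact definition of T, its training corpus, and its implementation are not given in this excerpt, and the three-molecule corpus below is a toy stand-in, so the outputs will not reproduce the table.

```python
from collections import Counter

def merge_pair(seq: list[str], a: str, b: str) -> list[str]:
    """Replace non-overlapping occurrences of (a, b), left to right, with a+b."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def learn_bpe(corpus: list[str], threshold: int) -> list[tuple[str, str]]:
    """Learn merge rules starting from single characters.

    Assumption: `threshold` plays the role of T, the minimum pair count
    required for a merge; a larger T stops merging earlier.
    """
    seqs = [list(s) for s in corpus]
    merges = []
    while True:
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))  # count adjacent token pairs
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < threshold:
            break
        merges.append((a, b))
        seqs = [merge_pair(seq, a, b) for seq in seqs]
    return merges

def segment(smiles: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply learned merges in the order they were learned."""
    seq = list(smiles)
    for a, b in merges:
        seq = merge_pair(seq, a, b)
    return seq

corpus = [
    "COC1=C(C=C2C(=C1)N=CN=C2NC3=C(C(=CC=C3)Cl)F)CN4CCCC[C@@H]4C(=O)N",
    "CC(=O)OC1=CC=CC=C1C(=O)O",      # aspirin
    "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",  # caffeine
]
for t in (2, 5, 25):
    print(t, segment(corpus[0], learn_bpe(corpus, t)))
```

Because the most frequent pair count never increases as merging proceeds, the merges learned with a larger T are a prefix of those learned with a smaller T, which is why raising T can only keep or increase the number of tokens.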
Fig. 2 Our proposed multi-scaled SAN block
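The internals of the multi-scaled block are not described in this excerpt. For orientation only, here is a sketch of standard single-head scaled dot-product self-attention, the generic building block of self-attention networks (SANs). It is not the authors' multi-scaled block; the dimensions and random weights are hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model) token embeddings; w_q, w_k, w_v: (d_model, d_k).
    Returns the attended outputs (seq_len, d_k) and weights (seq_len, seq_len).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise token similarities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights

# Hypothetical sizes: 5 embedded tokens (e.g. BPE segments), d_model=16, d_k=8.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
w_q, w_k, w_v = (rng.normal(scale=16 ** -0.5, size=(16, 8)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Each output row is a weighted mix of all token values, so every token can attend to chemical patterns anywhere in the sequence, in contrast to the fixed receptive field of a CNN filter.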
Summary of the benchmark datasets
| Dataset | Proteins | Compounds | Interactions | Training Data | Test Data |
|---|---|---|---|---|---|
| Davis | 442 | 68 | 30056 | 25046 | 5010 |
| KIBA | 229 | 2111 | 118254 | 98545 | 19709 |
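The counts in this table can be cross-checked with a little arithmetic: in both datasets the training and test sets add up to the number of interactions, and the test set is one sixth of the interactions. Davis is a complete matrix (442 × 68 = 30,056 pairs), whereas only about a quarter of KIBA's drug-target pairs are measured. A small sketch of that check (the `summarize` helper is hypothetical):

```python
# Sanity checks on the benchmark statistics (numbers copied from the table).
STATS = {
    # name: (proteins, compounds, interactions, training, test)
    "Davis": (442, 68, 30056, 25046, 5010),
    "KIBA": (229, 2111, 118254, 98545, 19709),
}

def summarize(proteins, compounds, interactions, train, test):
    """Return (matrix density, test fraction) after checking the split."""
    if train + test != interactions:
        raise ValueError("training + test does not equal interactions")
    density = interactions / (proteins * compounds)  # share of all pairs measured
    return density, test / interactions

for name, row in STATS.items():
    density, test_frac = summarize(*row)
    print(f"{name}: density={density:.3f}, test fraction={test_frac:.3f}")
# Davis: density=1.000, test fraction=0.167
# KIBA: density=0.245, test fraction=0.167
```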
Summary of parameter settings
| Parameter | KIBA | Davis |
|---|---|---|
|  | 80 | 36 |
|  | 800 | 900 |
|  | 0,1,2,3 | 0,1,2,3 |
|  | 128 | 64 |
|  | 2 | 2 |
|  | 2 | 1 |
| Hidden size in FFN | 1024,1024,512,1 | |
| Epoch | 300 | 300 |
| Dropout | 0.1 | 0.1 |
| Optimizer | Adam | Adam |
| Learning rate | 0.0001 | 0.0001 |
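The table lists feed-forward (FFN) hidden sizes of 1024, 1024, 512 and 1 with dropout 0.1, and the comparison table later in this entry lists concatenation as the interaction step. A minimal NumPy forward pass of such a regression head, assuming ReLU activations, He-style initialization, inverted dropout and 64-dimensional pooled drug and protein vectors (all of these are assumptions, not settings confirmed by the source); training with Adam at learning rate 0.0001 for 300 epochs is not modelled here.

```python
import numpy as np

def init_params(in_dim, sizes=(1024, 1024, 512, 1), seed=0):
    """Random weights for a stack of dense layers (He-style init, assumed)."""
    rng = np.random.default_rng(seed)
    dims = [in_dim, *sizes]
    weights = [rng.normal(scale=np.sqrt(2.0 / d_in), size=(d_in, d_out))
               for d_in, d_out in zip(dims[:-1], dims[1:])]
    biases = [np.zeros(d_out) for d_out in sizes]
    return weights, biases

def ffn_head(x, weights, biases, dropout=0.1, training=False, rng=None):
    """Dense layers with ReLU (assumed) and inverted dropout on hidden layers."""
    if training and rng is None:
        rng = np.random.default_rng()
    h = x
    for i, (w, b) in enumerate(zip(weights, biases)):
        h = h @ w + b
        if i < len(weights) - 1:  # no activation or dropout on the output layer
            h = np.maximum(h, 0.0)
            if training:
                keep = rng.random(h.shape) >= dropout
                h = h * keep / (1.0 - dropout)
    return h

# Hypothetical pooled representations for a batch of 4 drug-target pairs.
rng = np.random.default_rng(1)
drug = rng.normal(size=(4, 64))
protein = rng.normal(size=(4, 64))
x = np.concatenate([drug, protein], axis=1)  # concatenation as the interaction
weights, biases = init_params(x.shape[1])
pred = ffn_head(x, weights, biases)          # one predicted affinity per pair
print(pred.shape)
```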
Fig. 3 Results of the DeepDTA [6] model on the KIBA dataset with different multi-granularity representations as inputs. These multi-granularity representations are encoded by the BPE algorithm with different thresholds T, set separately for drug segmentation and for protein segmentation.
Fig. 4 Results of the DeepDTA [6] model on the Davis dataset with different multi-granularity representations as inputs. These multi-granularity representations are encoded by the BPE algorithm with different thresholds T, set separately for drug segmentation and for protein segmentation.
Results of DeepDTA (CNN model) on the KIBA and Davis datasets with character-based and multi-granularity encoding. The character-based encoding is the original labelling method of DeepDTA [6].
| Dataset | Encoding Method | CI | MSE | $r_m^2$ |
|---|---|---|---|---|
| KIBA | Character Encoding | 0.863 (0.002) | 0.194 | 0.673 (0.009) |
| | Multi-Granularity | | | |
| Davis | Character Encoding | 0.878 (0.004) | 0.261 | 0.630 (0.017) |
| | Multi-Granularity | | | |
Bold values indicate the best results on the datasets
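CI (concordance index) and MSE are the metrics reported in these tables. A minimal O(n²) sketch of their standard definitions: CI is the fraction of pairs with different true affinities whose predictions are ordered the same way, with tied predictions counting one half. This is an illustration covering only these two metrics, not the authors' evaluation code.

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs ranked correctly; tied predictions count 0.5."""
    num, den = 0.0, 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal true affinity are not comparable
            den += 1
            # orient the pair so that y_true[hi] > y_true[lo]
            hi, lo = (i, j) if y_true[i] > y_true[j] else (j, i)
            if y_pred[hi] > y_pred[lo]:
                num += 1.0
            elif y_pred[hi] == y_pred[lo]:
                num += 0.5
    if den == 0:
        raise ValueError("no comparable pairs")
    return num / den

def mse(y_true, y_pred):
    """Mean squared error between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.1, 6.5, 6.4, 8.2]  # the 6.0/7.0 pair is ranked the wrong way
print(concordance_index(y_true, y_pred))  # 5 of 6 pairs concordant
print(mse(y_true, y_pred))
```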
Results of deep models on the KIBA and Davis datasets with multi-granularity representations as inputs
| Dataset | Deep models | CI | MSE | $r_m^2$ |
|---|---|---|---|---|
| KIBA | CNNs | 0.863 (0.002) | 0.194 | 0.673 (0.009) |
| | SANs | 0.875 (0.003) | 0.179 | 0.691 (0.019) |
| | Multi-scaled SANs | | | |
| Davis | CNNs | 0.878 (0.004) | 0.261 | 0.630 (0.017) |
| | SANs | 0.888 (0.004) | | |
| | Multi-scaled SANs | | 0.233 | 0.681 (0.014) |
Bold values indicate the best results on the datasets
Results on KIBA and Davis of our proposed multi-granularity multi-scaled SANs model, traditional methods, and existing deep sequence representation methods
| Dataset | Method | Drug | Protein | Interaction | CI | MSE | $r_m^2$ |
|---|---|---|---|---|---|---|---|
| KIBA | KronRLS [ | Pubchem Sim | S-W | – | 0.782 (0.001) | 0.411 | 0.342 (0.001) |
| | SimBoost [ | Pubchem Sim | S-W | – | 0.836 (0.001) | 0.222 | 0.629 (0.007) |
| | DeepDTA [6] | CNNs | CNNs | Concatenation | 0.863 (0.002) | 0.194 | 0.673 (0.009) |
| | MT-DTI [ | SANs | SANs | Concatenation | 0.882 (0.002) | | 0.738 (0.006) |
| | GANsDTA [ | GANs | GANs | Concatenation | 0.866 (−) | 0.224 | 0.675 (−) |
| | CrossAttentionDTI [ | Cross SANs | Cross SANs | Concatenation | 0.874 (0.001) | 0.175 | – |
| | Ours | MSSAN | MSSAN | Concatenation | | 0.155 | |
| Davis | KronRLS [ | Pubchem Sim | S-W | – | 0.871 (0.001) | 0.379 | 0.407 (0.005) |
| | SimBoost [ | Pubchem Sim | S-W | – | 0.872 (0.001) | 0.282 | 0.644 (0.006) |
| | DeepDTA [6] | CNNs | CNNs | Concatenation | 0.878 (0.004) | 0.261 | 0.630 (0.017) |
| | MT-DTI [ | SANs | SANs | Concatenation | 0.887 (0.003) | 0.245 | 0.665 (0.014) |
| | GANsDTA [ | GANs | GANs | Concatenation | 0.881 (−) | 0.276 | 0.653 (−) |
| | CrossAttentionDTI [ | Cross SANs | Cross SANs | Concatenation | 0.876 (0.006) | 0.244 | – |
| | Ours | MSSAN | MSSAN | Concatenation | | | |
Bold values indicate the best results on the datasets