Jiacheng Sun1,2,3, You Lu1,2,3, Linqian Cui1,2,3, Qiming Fu1,2,3, Hongjie Wu1, Jianping Chen2,4.
Abstract
Calculating and predicting drug-target interactions (DTIs) is a crucial step in novel drug discovery. Many recent models have improved DTI prediction performance by fusing heterogeneous information, such as drug chemical structures and target protein sequences. However, allocating the weights of the heterogeneous information reasonably during fusion remains a major challenge. In this paper, we propose a model based on the Q-learning algorithm and Neighborhood Regularized Logistic Matrix Factorization (QLNRLMF) to predict DTIs. First, we obtain three drug-drug similarity matrices and three target-target similarity matrices by applying different similarity calculation methods to heterogeneous data, including drug chemical structures, target protein sequences, and drug-target interactions. Then, we initialize one set of weights for the drug-drug similarity matrices and another for the target-target similarity matrices, and optimize them with the Q-learning algorithm. Once the optimal weights are obtained, a new drug-drug similarity matrix and a new target-target similarity matrix are formed by linear combination. Finally, the drug-target interaction matrix, the new drug-drug similarity matrix, and the new target-target similarity matrix are used as inputs to the Neighborhood Regularized Logistic Matrix Factorization (NRLMF) model to predict DTIs. Compared with six existing methods (NetLapRLS, BLM-NII, WNN-GIP, KBMF2K, CMF, and NRLMF), the proposed method achieves better results on four benchmark datasets: enzymes (E), nuclear receptors (NR), ion channels (IC), and G protein-coupled receptors (GPCR).
Keywords: drug similarity; drug-target interactions; heterogeneous information fusion; q-learning; target similarity; weight distribution
Year: 2022 PMID: 35356288 PMCID: PMC8959213 DOI: 10.3389/fcell.2022.794413
Source DB: PubMed Journal: Front Cell Dev Biol ISSN: 2296-634X
Information about the four datasets.
| Dataset | E | IC | GPCR | NR |
|---|---|---|---|---|
| Drugs(n) | 445 | 210 | 223 | 54 |
| Targets(m) | 664 | 204 | 95 | 26 |
| Interactions | 2926 | 1476 | 635 | 90 |
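To make the scale of the four datasets concrete, the following minimal sketch computes how many drug-target pairs each dataset contains and what fraction of them are known interactions. The numbers are copied from the table above; the function name `interaction_density` is an illustrative choice, not something taken from the paper.

```python
# (drugs n, targets m, known interactions) copied from the dataset table.
DATASETS = {
    "E": (445, 664, 2926),
    "IC": (210, 204, 1476),
    "GPCR": (223, 95, 635),
    "NR": (54, 26, 90),
}


def interaction_density(n_drugs, n_targets, n_interactions):
    """Fraction of all drug-target pairs that have a known interaction."""
    return n_interactions / (n_drugs * n_targets)


for name, (n, m, k) in DATASETS.items():
    print(f"{name}: {n * m} pairs, density {interaction_density(n, m, k):.4f}")
```

Under this calculation, known interactions make up roughly 1% of all pairs for E, 3.4% for IC, 3.0% for GPCR, and 6.4% for NR, so every interaction matrix is sparse.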
Summary of the similarity matrices of the two feature spaces.
| Space | Similarity matrix | Description |
|---|---|---|
| Drug | | Chemical structure |
| Drug | | Cosine similarity of drugs |
| Drug | | Jaccard similarity of drugs |
| Target | | Target sequence information |
| Target | | Cosine similarity of targets |
| Target | | Jaccard similarity of targets |
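The cosine and Jaccard similarities, and the weighted linear combination described in the abstract, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the cosine and Jaccard similarities are computed from binary interaction profiles (rows of the drug-target interaction matrix), it treats the chemical structure similarity matrix as given, and the toy matrices, the weights, and the function names are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two interaction profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard_similarity(u, v):
    """Shared interactions divided by the union of interactions."""
    inter = sum(1 for a, b in zip(u, v) if a and b)
    union = sum(1 for a, b in zip(u, v) if a or b)
    return inter / union if union else 0.0


def similarity_matrix(profiles, sim):
    """Pairwise similarity between all rows of `profiles`."""
    return [[sim(p, q) for q in profiles] for p in profiles]


def fuse(matrices, weights):
    """Linear combination of equally sized matrices; weights are normalised to sum to 1."""
    total = sum(weights)
    n = len(matrices[0])
    return [
        [sum(w / total * m[i][j] for w, m in zip(weights, matrices)) for j in range(n)]
        for i in range(n)
    ]


# Toy interaction matrix: 3 drugs (rows) x 4 targets (columns). Hypothetical data.
Y = [[1, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 1, 1]]

# Stand-in for a chemical structure similarity matrix. Hypothetical values.
chem = [[1.0, 0.6, 0.1],
        [0.6, 1.0, 0.2],
        [0.1, 0.2, 1.0]]

cos = similarity_matrix(Y, cosine_similarity)
jac = similarity_matrix(Y, jaccard_similarity)

# Hypothetical weights; in the paper these are what Q-learning optimises.
fused_drug_similarity = fuse([chem, cos, jac], [0.5, 0.3, 0.2])
```

The same functions apply to targets by passing the columns of the interaction matrix as profiles, together with a sequence-based similarity matrix in place of `chem`.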
FIGURE 1 Reinforcement learning.
Q-Table.
| Q-Table | | |
|---|---|---|
| | q(…) | q(…) |
| | q(…) | q(…) |
| | q(…) | q(…) |
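A Q-table stores one value per state-action pair and is updated with the standard tabular Q-learning rule, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)]. The sketch below applies that rule to a toy weight search. Everything specific in it is an assumption made for illustration and is not taken from the paper: states are weight triples on a 0.1 grid, an action moves 0.1 of weight from one matrix to another, the reward is the change in a stand-in score (`toy_score` replaces a real validation metric such as AUC), and the values of α, γ, and ε are arbitrary.

```python
import random
from collections import defaultdict

# Hypothetical action set: move one weight unit (0.1) from matrix src to matrix dst.
ACTIONS = [(src, dst) for src in range(3) for dst in range(3) if src != dst]


def apply_action(state, action):
    """States are tuples of integer weight units that sum to 10."""
    src, dst = action
    if state[src] == 0:
        return state  # nothing left to move; the state is unchanged
    units = list(state)
    units[src] -= 1
    units[dst] += 1
    return tuple(units)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the Q-table."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def toy_score(state):
    """Stand-in for a validation metric; it peaks at weights (0.5, 0.3, 0.2)."""
    target = (5, 3, 2)
    return -sum((s - t) ** 2 for s, t in zip(state, target))


def search(episodes=200, steps=20, epsilon=0.2, seed=0):
    """Run the toy search and return the best weight triple visited."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen state-action pairs start at 0
    start = (4, 3, 3)
    best = start
    for _ in range(episodes):
        state = start
        for _ in range(steps):
            action = choose_action(q, state, epsilon, rng)
            nxt = apply_action(state, action)
            reward = toy_score(nxt) - toy_score(state)
            q_update(q, state, action, reward, nxt)
            state = nxt
            if toy_score(state) > toy_score(best):
                best = state
    return tuple(u / 10 for u in best)
```

In a real setting, `toy_score` would be replaced by training and evaluating the downstream predictor with the fused similarity matrices, which is far more expensive per step than this toy function.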
FIGURE 2 Algorithm flow chart.
FIGURE 3 Convergence graphs for the four datasets. (A) The convergence graph of the NR dataset. (B) The convergence graph of the GPCR dataset. (C) The convergence graph of the IC dataset. (D) The convergence graph of the E dataset.
Time comparison between the Q-learning algorithm and the brute-force algorithm.
| Dataset | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Avg | Avg/ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NR | 1023 | 1001 | 1032 | 1027 | 1029 | 987 | 1008 | 1016 | 1035 | 1005 | 1016 | 0.784 |
| GPCR | 1033 | 1069 | 1001 | 1050 | 1015 | 1055 | 1050 | 1015 | 1050 | 1024 | 1036 | 0.799 |
| IC | 997 | 1021 | 1030 | 1034 | 1045 | 1034 | 1027 | 1012 | 1006 | 1017 | 1022 | 0.789 |
| E | 1045 | 994 | 1036 | 1039 | 1031 | 1001 | 1032 | 1064 | 982 | 1026 | 1025 | 0.791 |
FIGURE 4 The AUC of QLNRLMF and other methods on the benchmark datasets. (A) The AUC of QLNRLMF and other methods on the NR dataset. (B) The AUC of QLNRLMF and other methods on the GPCR dataset. (C) The AUC of QLNRLMF and other methods on the IC dataset. (D) The AUC of QLNRLMF and other methods on the E dataset.
FIGURE 5 The AUPR of QLNRLMF and other methods on the benchmark datasets. (A) The AUPR of QLNRLMF and other methods on the NR dataset. (B) The AUPR of QLNRLMF and other methods on the GPCR dataset. (C) The AUPR of QLNRLMF and other methods on the IC dataset. (D) The AUPR of QLNRLMF and other methods on the E dataset.
Comparison with the other seven methods.
| Metric | Dataset | NetLapRLS | BLM-NII | WNN-GIP | KBMF2K | CMF | Mean-weighted | NRLMF | QLNRLMF |
|---|---|---|---|---|---|---|---|---|---|
| AUC | NR | 0.849 | 0.905 | 0.903 | 0.876 | 0.864 | 0.967 | 0.948 | |
| AUC | GPCR | 0.914 | 0.943 | 0.935 | 0.919 | 0.929 | 0.965 | 0.960 | |
| AUC | IC | 0.959 | 0.981 | 0.958 | 0.958 | 0.868 | 0.978 | 0.983 | |
| AUC | E | 0.972 | 0.969 | 0.963 | 0.898 | 0.917 | 0.966 | | 0.971 |
| AUC | Avg | 0.923 | 0.949 | 0.939 | 0.912 | 0.894 | 0.969 | 0.966 | |
| AUPR | NR | 0.464 | 0.659 | 0.594 | 0.534 | 0.583 | 0.884 | 0.722 | |
| AUPR | GPCR | 0.615 | 0.514 | 0.471 | 0.570 | 0.629 | 0.801 | 0.702 | |
| AUPR | IC | 0.823 | 0.821 | 0.670 | 0.765 | 0.585 | 0.881 | 0.863 | |
| AUPR | E | 0.794 | 0.703 | 0.698 | 0.650 | 0.637 | 0.871 | | 0.871 |
| AUPR | Avg | 0.674 | 0.674 | 0.608 | 0.629 | 0.608 | 0.859 | 0.790 | |
Values in bold indicate the best result in each row.
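For readers unfamiliar with the two metrics in the table, the sketch below shows one common way to compute them from predicted scores and binary labels. It is an illustration only: the source does not state how AUC and AUPR were computed, average precision is used here as a standard approximation of the area under the precision-recall curve, and the labels and scores are made-up toy values.

```python
def auc(labels, scores):
    """ROC AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy example: 1 marks a known interaction, 0 an unknown pair. Hypothetical values.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(f"AUC = {auc(labels, scores):.3f}, AP = {average_precision(labels, scores):.3f}")
```

Because known interactions are rare in all four datasets, AUPR is more sensitive than AUC to false positives among the top-ranked pairs, which is consistent with the larger spread of AUPR values across methods in the table.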