Xiaoyi Guo, Wei Zhou, Yan Yu, Yijie Ding, Jijun Tang, Fei Guo.
Abstract
Most drugs have side effects, which can endanger patients' health. Biological and pharmacological experiments can identify potential side effects, but they are expensive and time-consuming, so computational methods have been developed to predict side effects quickly and accurately. To predict potential associations between drugs and side effects, we propose a novel method, the Triple Matrix Factorization (TMF)-based model. TMF is built from a biprojection matrix and the latent features of kernels, and is based on Low Rank Approximation (LRA). LRA constructs a lower-rank matrix that approximates the original matrix, retaining its characteristics while reducing the storage space and computational complexity of the data. To fuse multiple sources of information, multiple kernel matrices are constructed and integrated via Kernel Target Alignment-based Multiple Kernel Learning (KTA-MKL) in the drug space and the side effect space, respectively. Compared with other methods, our model achieves the best Area Under the Precision-Recall curve (AUPR) on three benchmark datasets: 0.677, 0.685, and 0.680, respectively.
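The abstract's description of LRA can be illustrated with a generic example. The following is a minimal sketch, assuming a truncated SVD as the low-rank approximation; it is not the TMF factorization itself, whose update rules are not given here. The toy matrix `Y`, the rank `r = 2`, and the helper name `low_rank_approx` are hypothetical.

```python
import numpy as np

def low_rank_approx(M, r):
    """Rank-r approximation of M via truncated SVD.

    By the Eckart-Young theorem this is the closest rank-r matrix
    to M in the Frobenius norm.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Keep only the r largest singular values and their vectors.
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

# Hypothetical toy drug-by-side-effect association matrix
# (6 drugs, 8 side effects, 1 = known association).
rng = np.random.default_rng(0)
Y = (rng.random((6, 8)) < 0.3).astype(float)

r = 2
Y_hat = low_rank_approx(Y, r)

# Storing the two factors needs r * (n + m) numbers instead of n * m.
n, m = Y.shape
full_storage = n * m
factor_storage = r * (n + m)
rel_error = np.linalg.norm(Y - Y_hat) / max(np.linalg.norm(Y), 1e-12)
print(full_storage, factor_storage, round(float(rel_error), 3))
```

The entries of `Y_hat` are real-valued, so they can be read as scores for unobserved drug and side effect pairs, which is the general idea behind matrix-factorization predictors of this kind.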
Year: 2020 PMID: 32596314 PMCID: PMC7275954 DOI: 10.1155/2020/4675395
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. The schematic diagram of associations between drugs and side effects.
Figure 2. An example of the fingerprint vector.
Summary of kernels for two feature spaces.
| Kernel | Chemical fingerprint (drug space) | Side effect profiles (drug space) | Drug profiles (side effect space) |
|---|---|---|---|
| GIP | | | |
| COS | | | |
| Corr | | | |
| MI | | | |
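Two of the kernel types in this table can be sketched on binary profiles. This is a minimal illustration, assuming the usual definitions of the cosine kernel and the Gaussian Interaction Profile (GIP) kernel with bandwidth normalized by the mean squared profile norm; the paper's exact parameterization is not shown here, and the toy matrix `Y`, the function names, and `gamma_prime = 1.0` are hypothetical.

```python
import numpy as np

def cosine_kernel(X):
    """Cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    Xn = X / norms
    return Xn @ Xn.T

def gip_kernel(X, gamma_prime=1.0):
    """GIP kernel: exp(-gamma * ||x_i - x_j||^2) over the rows of X.

    gamma is gamma_prime divided by the mean squared row norm
    (assumes at least one non-zero row).
    """
    sq_norms = np.sum(X ** 2, axis=1)
    gamma = gamma_prime / np.mean(sq_norms)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative round-off
    return np.exp(-gamma * sq_dists)

# Hypothetical association matrix: 3 drugs (rows) x 4 side effects (columns).
Y = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)

# Drug space: each drug is described by its side effect profile (a row of Y).
K_drug_gip = gip_kernel(Y)
K_drug_cos = cosine_kernel(Y)

# Side effect space: each side effect is described by its drug profile
# (a column of Y), so the same functions are applied to the transpose.
K_se_gip = gip_kernel(Y.T)
K_se_cos = cosine_kernel(Y.T)
```

The same functions could be applied to chemical fingerprint vectors to obtain the fingerprint-based drug kernels.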
Figure 3. Overview of our method.
Algorithm 1. Algorithm of our method.
Three benchmark datasets.
| Datasets | Drugs | Side effects | Associations |
|---|---|---|---|
| Pauwels's dataset | 888 | 1385 | 61,102 |
| Mizutani's dataset | 658 | 1339 | 49,051 |
| Liu's dataset | 832 | 1385 | 59,205 |
Figure 4. The AUC and AUPR values under different values of the two r parameters.
The performance of different kernels via 5-fold Cross-Validation.
| Models | Pauwels's dataset AUPR | Pauwels's dataset AUC | Mizutani's dataset AUPR | Mizutani's dataset AUC | Liu's dataset AUPR | Liu's dataset AUC |
|---|---|---|---|---|---|---|
| | 0.4420 | 0.8950 | 0.4735 | 0.9148 | 0.4718 | 0.9145 |
| | 0.4892 | 0.8994 | 0.5343 | 0.9070 | 0.5224 | 0.9067 |
| | 0.4994 | 0.8981 | 0.5217 | 0.9005 | 0.5143 | 0.9026 |
| | 0.4978 | 0.9079 | 0.5591 | 0.9214 | 0.5529 | 0.9238 |
| | 0.6254 | 0.9300 | 0.6623 | 0.9376 | 0.6574 | 0.9398 |
| | 0.5861 | 0.9035 | 0.6324 | 0.9090 | 0.6252 | 0.9087 |
| | 0.5833 | 0.8999 | 0.6123 | 0.9014 | 0.6047 | 0.9013 |
| | 0.6557 | 0.9428 | 0.6615 | 0.9369 | 0.6587 | 0.9408 |
| Mean weighted (c) | 0.6598 | 0.9353 | 0.6724 | 0.9280 | 0.6651 | 0.9285 |
| KTA-MKL (c) | 0.6765 | 0.9434 | 0.6847 | 0.9409 | 0.6801 | 0.9426 |

(a) The TMF uses the drug fingerprint and the drug profile for side effects. (b) The TMF uses the side effect profile for drugs and the drug profile for side effects. (c) The TMF uses the drug fingerprint, the side effect profile for drugs, and the drug profile for side effects.
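The two metrics reported in this table can be computed from predicted scores on held-out pairs. The sketch below assumes average precision as the AUPR estimate and the pairwise (Mann-Whitney) form of AUC; the paper's exact evaluation procedure is not stated here, and the labels, scores, and function names are hypothetical.

```python
import numpy as np

def auc_score(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Uses O(P * N) memory, fine for small examples.
    """
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    correct = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return correct / (len(pos) * len(neg))

def average_precision(y_true, y_score):
    """Average precision: mean of the precision at each positive's rank.

    Assumes distinct scores; ties are broken by input order.
    """
    order = np.argsort(-y_score, kind="stable")
    y_sorted = y_true[order]
    hits = np.cumsum(y_sorted)
    precision = hits / np.arange(1, len(y_sorted) + 1)
    return precision[y_sorted == 1].mean()

# Hypothetical held-out labels (1 = known association) and predicted scores.
y_true = np.array([1, 0, 1, 0, 0, 1])
y_score = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2])

auc = auc_score(y_true, y_score)
aupr = average_precision(y_true, y_score)
```

Because known associations are a small fraction of all drug and side effect pairs, AUPR is usually the more sensitive of the two metrics, which fits the larger spread of AUPR values in the table.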
Figure 5. The ROC and PR curves of different models (single kernel and multiple kernels).
The kernel weights on three datasets.
| Kernel | Pauwels's dataset | Mizutani's dataset | Liu's dataset |
|---|---|---|---|
| | 0.1159 | 0.1168 | 0.1167 |
| | 0.1224 | 0.1226 | 0.1226 |
| | 0.1200 | 0.1203 | 0.1203 |
| | 0.1113 | 0.1122 | 0.1116 |
| | 0.0596 | 0.0621 | 0.0613 |
| | 0.1538 | 0.1533 | 0.1528 |
| | 0.1507 | 0.1498 | 0.1497 |
| | 0.1664 | 0.1628 | 0.1650 |
| | 0.0151 | 0.0173 | 0.0152 |
| | 0.3286 | 0.3374 | 0.3380 |
| | 0.2909 | 0.2865 | 0.2855 |
| | 0.3654 | 0.3588 | 0.3613 |
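In the Pauwels column of this table, the first eight weights sum to about 1 and the last four sum to 1, which is consistent with weights normalized separately in the drug space and the side effect space. The sketch below shows one simple way such weights could be obtained: each kernel is weighted in proportion to its kernel-target alignment. This is an assumed heuristic for illustration, not necessarily the optimization used by KTA-MKL in the paper; the ideal target kernel `Y @ Y.T`, the toy data, and all function names are hypothetical.

```python
import numpy as np

def alignment(K, T):
    """Kernel-target alignment: Frobenius cosine between K and the target T."""
    return np.sum(K * T) / (np.linalg.norm(K) * np.linalg.norm(T))

def alignment_weights(kernels, T):
    """Weights proportional to each kernel's alignment, normalized to sum to 1."""
    a = np.array([max(alignment(K, T), 0.0) for K in kernels])
    return a / a.sum()

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices."""
    return sum(w * K for w, K in zip(weights, kernels))

# Hypothetical association matrix: 4 drugs x 5 side effects.
Y = np.array([[1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 1, 1, 0],
              [0, 0, 0, 1, 1]], dtype=float)

# Assumed ideal target kernel in the drug space.
T_drug = Y @ Y.T

# Three toy drug-space kernels: one informative, two uninformative.
kernels = [Y @ Y.T, np.eye(4), np.ones((4, 4))]

weights = alignment_weights(kernels, T_drug)
K_drug = combine_kernels(kernels, weights)
```

For the side effect space, the same functions would be applied with `Y.T @ Y` as the assumed target and the side effect kernels as input.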
Comparison to existing methods via 5-fold Cross-Validation.
| Datasets | Methods | AUPR | AUC |
|---|---|---|---|
| Pauwels | Pauwels's method (a) | 0.389 ± N/A | 0.897 ± N/A |
| | Liu's method (a) | 0.345 ± N/A | 0.920 ± N/A |
| | Cheng's method (a) | 0.588 ± N/A | 0.922 ± N/A |
| | RBMBM (a) | 0.612 ± N/A | 0.941 ± N/A |
| | INBM (a) | 0.641 ± N/A | 0.934 ± N/A |
| | Ensemble model (a) | 0.660 ± N/A | 0.949 ± N/A |
| | CMF (b) | 0.646 ± 0.007 | 0.939 ± 0.005 |
| | GRMF (b) | 0.643 ± 0.006 | 0.937 ± 0.005 |
| | NRLMF (b) | 0.654 ± 0.005 | 0.954 ± 0.005 |
| | LGC (b) | 0.668 ± 0.008 | 0.952 ± 0.007 |
| | Our method | 0.677 ± 0.004 | 0.943 ± 0.003 |
| Mizutani | Mizutani's method (a) | 0.412 ± N/A | 0.890 ± N/A |
| | Liu's method (a) | 0.366 ± N/A | 0.918 ± N/A |
| | Cheng's method (a) | 0.599 ± N/A | 0.923 ± N/A |
| | RBMBM (a) | 0.619 ± N/A | 0.939 ± N/A |
| | INBM (a) | 0.646 ± N/A | 0.932 ± N/A |
| | Ensemble model (a) | 0.666 ± N/A | 0.946 ± N/A |
| | CMF (b) | 0.645 ± 0.005 | 0.938 ± 0.006 |
| | GRMF (b) | 0.646 ± 0.007 | 0.937 ± 0.007 |
| | NRLMF (b) | 0.660 ± 0.006 | 0.950 ± 0.005 |
| | LGC (b) | 0.673 ± 0.007 | 0.948 ± 0.007 |
| | Our method | 0.685 ± 0.006 | 0.941 ± 0.008 |
| Liu | Liu's method (a) | 0.278 ± N/A | 0.907 ± N/A |
| | Cheng's method (a) | 0.592 ± N/A | 0.922 ± N/A |
| | RBMBM (a) | 0.616 ± N/A | 0.941 ± N/A |
| | INBM (a) | 0.641 ± N/A | 0.934 ± N/A |
| | Ensemble model (a) | 0.661 ± N/A | 0.948 ± N/A |
| | CMF (b) | 0.649 ± 0.006 | 0.938 ± 0.005 |
| | GRMF (b) | 0.650 ± 0.007 | 0.938 ± 0.008 |
| | NRLMF (b) | 0.656 ± 0.005 | 0.953 ± 0.006 |
| | LGC (b) | 0.670 ± 0.008 | 0.951 ± 0.007 |
| | Our method | 0.680 ± 0.005 | 0.943 ± 0.006 |

(a) Results are derived from [26]. (b) Results are derived from [18].
Comparison with MF-based models via 5-fold local Cross-Validation.
| Datasets | Methods | AUPR | AUC |
|---|---|---|---|
| Pauwels | CMF* | 0.382 ± 0.006 | 0.894 ± 0.004 |
| | GRMF* | 0.358 ± 0.008 | 0.883 ± 0.005 |
| | NRLMF* | 0.374 ± 0.007 | 0.886 ± 0.004 |
| | Our method | 0.392 ± 0.008 | 0.889 ± 0.004 |
| Mizutani | CMF* | 0.395 ± 0.005 | 0.889 ± 0.004 |
| | GRMF* | 0.392 ± 0.008 | 0.890 ± 0.006 |
| | NRLMF* | 0.390 ± 0.006 | 0.882 ± 0.005 |
| | Our method | 0.399 ± 0.013 | 0.886 ± 0.003 |
| Liu | CMF* | 0.393 ± 0.007 | 0.894 ± 0.005 |
| | GRMF* | 0.379 ± 0.008 | 0.895 ± 0.006 |
| | NRLMF* | 0.398 ± 0.006 | 0.897 ± 0.004 |
| | Our method | 0.401 ± 0.015 | 0.891 ± 0.004 |

*Results are derived from [18].
Figure 6. The ROC and PR curves of different methods via 5-fold local CV.
Top 10 predicted side effects for the drug caffeine.
| Side effect | Score | Rank | Confirmed |
|---|---|---|---|
| Diarrhea | 0.3992 | 1 | Yes |
| Diabetic neuropathy | 0.3893 | 2 | Yes |
| Varicocele | 0.3844 | 3 | Yes |
| Gynecomastia | 0.3815 | 4 | Yes |
| Conjunctivitis | 0.3794 | 5 | Yes |
| Telangiectasia | 0.3737 | 6 | No |
| Lump | 0.3663 | 7 | Yes |
| Dyskinesia | 0.3638 | 8 | No |
| Palpitations | 0.3632 | 9 | No |
| Fecal incontinence | 0.3563 | 10 | Yes |
Top 10 predicted side effects for the drug captopril.
| Side effect | Score | Rank | Confirmed |
|---|---|---|---|
| Diarrhea | 0.4150 | 1 | No |
| Diabetic neuropathy | 0.4043 | 2 | Yes |
| Varicocele | 0.4004 | 3 | Yes |
| Conjunctivitis | 0.3973 | 4 | Yes |
| Gynecomastia | 0.3938 | 5 | Yes |
| Myoglobinuria | 0.3885 | 6 | No |
| Esophageal varices | 0.3854 | 7 | Yes |
| Lump | 0.3806 | 8 | Yes |
| Palpitations | 0.3770 | 9 | No |
| Eclampsia | 0.3674 | 10 | Yes |
The running time (seconds) via 5-fold Cross-Validation.
| Model | Pauwels | Mizutani | Liu |
|---|---|---|---|
| Our method | 977 | 873 | 929 |
| LGC | 1290 | 1170 | 1211 |
| CMF | 910 | 757 | 846 |
| GRMF | 1360 | 1175 | 1282 |
| NRLMF | 1966 | 1250 | 1911 |
| Ensemble model | 4330 | 2715 | 3611 |