Shagahyegh Sadeghi, Jianguo Lu, Alioune Ngom.
Abstract
MOTIVATION: Drug repurposing is a potential alternative to the traditional drug discovery process. It can be formulated as a recommender system that recommends novel indications for available drugs based on known drug-disease associations. This paper presents NMF-DR, a recommender system-based method built on non-negative matrix factorization, which predicts candidate disease indications for drugs by integrating drug- and disease-related data sources. The framework first integrates two types of disease similarities, the known drug-disease associations, and several drug similarities from different views into a heterogeneous drug-disease interaction network. An improved non-negative matrix factorization-based method then completes the drug-disease adjacency matrix with predicted scores for unknown drug-disease pairs.
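The matrix-completion idea in the abstract can be illustrated with a minimal sketch. It uses plain non-negative matrix factorization with the classic Lee-Seung multiplicative updates, not the improved method the paper proposes; the toy matrix, the function name `nmf_multiplicative`, the rank, and the iteration count are all illustrative assumptions.

```python
import numpy as np

def nmf_multiplicative(A, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix A (drugs x diseases) as W @ H."""
    rng = np.random.default_rng(seed)
    n_drugs, n_diseases = A.shape
    W = rng.random((n_drugs, rank))
    H = rng.random((rank, n_diseases))
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the Frobenius objective;
        # eps only guards against division by zero
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy matrix: 1 = known drug-disease association, 0 = unknown
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

W, H = nmf_multiplicative(A, rank=2)
scores = W @ H                   # completed matrix with predicted scores
unknown = np.argwhere(A == 0)    # candidate drug-disease pairs to rank
ranked = sorted(unknown.tolist(), key=lambda ij: -scores[ij[0], ij[1]])
```

Here `ranked` lists the unknown pairs from highest to lowest predicted score, which is the form a recommender would present as candidate indications.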
Year: 2021 PMID: 34875000 PMCID: PMC8825773 DOI: 10.1093/bioinformatics/btab826
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Block diagram of NMF-DR: a computational drug repurposing system
Notations and variables used in this article
| Symbol | Description |
|---|---|
| | Drug–disease association matrix |
| | Drug–drug and disease–disease similarity matrix |
| | Normalized matrix |
| | Drug–disease adjacency matrix |
| Sr | Suitable rank |
| | Low-rank matrices |
| | Completed adjacency matrix |
| SN2F | Similarity network normalization and fusion |
| WGC | Weighted graph construction |
| A-MDL | Accelerated minimum description length |
| A-HALS | Accelerated-hierarchical alternating least square |
| MU | Multiplicative updates |
| PG | Projected gradient |
The gold standard datasets used in this study
| Datasets | Drugs (Registered By) | Diseases (Listed By) | Known Relations | Sparsity |
|---|---|---|---|---|
| PREDICT dataset | 593 (DrugBank database) | 313 (OMIM database) | 1933 | |
| TL-HGBI dataset | 1409 (DrugBank database) | 5080 (OMIM database) | 1461 | |
| DrugNet dataset | 1490 (DrugBank database) | 4516 (Disease Ontology) | 1008 | |
| CDataset | 663 (DrugBank database) | 409 (OMIM database) | 2532 | |
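As a rough guide to how sparse these matrices are, the counts in the table can be turned into a density figure. The definition used here (known associations divided by all drug-disease pairs, with sparsity as its complement) is an assumption; the record does not preserve the paper's own sparsity values or formula.

```python
# (drugs, diseases, known associations) as listed in the table
datasets = {
    "PREDICT": (593, 313, 1933),
    "TL-HGBI": (1409, 5080, 1461),
    "DrugNet": (1490, 4516, 1008),
    "CDataset": (663, 409, 2532),
}

def density(n_drugs, n_diseases, n_known):
    """Fraction of all drug-disease pairs that have a known association."""
    return n_known / (n_drugs * n_diseases)

for name, (n_drugs, n_diseases, n_known) in datasets.items():
    d = density(n_drugs, n_diseases, n_known)
    # Assumed definition: sparsity is the share of pairs with no known link
    print(f"{name}: density={d:.6f}, sparsity={1 - d:.6f}")
```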
The performance of different methods on different datasets
| Method | PREDICT dataset (10-fold cross validation) | DrugNet dataset (10-fold cross validation) | CDataset (10-fold cross validation) | TL-HGBI dataset (5-fold cross validation) |
|---|---|---|---|---|
| HGBI | 0.82 | — | 0.85 | — |
| TL-HGBI | — | — | — | 0.95 |
| DrugNet | 0.77 | 0.94 | 0.8 | — |
| MBiRW | 0.91 | 0.95 | 0.93 | — |
| RWHNDR | 0.92 | — | 0.94 | — |
| NTSIM | — | — | — | 0.96 |
| DRRS | 0.93 | 0.93 | 0.94 | — |
| ANMF | 0.93 | — | 0.95 | — |
| KBMF | 0.91 | — | 0.92 | — |
| MSBMF | 0.94 | — | 0.95 | — |
| SCMFDD | — | — | — | 0.97 |
| PREDICT | 0.89 | — | — | — |
| PreDR | 0.86 | — | — | — |
| SMKF | 0.91 | — | — | — |
The best methods and results are indicated in bold.
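The later tables in this record report AUC, so the scores above are presumably ROC AUC values as well. A minimal sketch of that metric, computed on one held-out fold via the Mann-Whitney U statistic, is given below; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical held-out fold: 1 = hidden known association, 0 = unknown pair
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the fold size; it is kept for clarity, and a rank-based implementation would be preferable on full datasets.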
Pre-processing phase: a comparative view of the composition of different similarity measures with and without SN2F in the proposed NMF-DR framework on PREDICT dataset (AUC)
| Drug similarities | Disease similarities: phenotype similarity | Disease similarities: semantic phenotypic similarity |
|---|---|---|
| Drug chemical similarity | 0.909018 | 0.914015 |
| Drug side effect similarity | | |
| Closeness in a PPI network | 0.905549 | 0.911368 |
| GO similarity | 0.908694 | 0.91467 |
| Sequence similarity | 0.879688 | 0.890363 |

| Drug similarities | Disease similarities: phenotype similarity | Disease similarities: semantic phenotypic similarity |
|---|---|---|
| Drug chemical similarity | 0.914314 | 0.916921 |
| Drug side effect similarity | 0.91585 | 0.917676 |
| Closeness in a PPI network | 0.915723 | 0.912903 |
| GO similarity | 0.914404 | 0.91481 |
| Sequence similarity | | |
The best methods and results are indicated in bold.
Prediction phase: a comparative overview of the results of each step of NMF-DR method on PREDICT dataset
| Method | AUC |
|---|---|
| Default rank | 0.9153593 |
| Pre-processing phase + default rank | 0.91837 |
| A-MDL selected rank | 0.969322 |
| Pre-processing phase + A-MDL selected rank | |

| Method | AUC |
|---|---|
| Random initialization | 0.9153593 |
| Multi-SVD initialization | 0.919802 |
| Default rank + multi-SVD | |
| Pre-processing + default rank + random initialization | 0.914306 |
| Pre-processing + default rank + multi-SVD initialization | 0.917966 |
| Pre-processing + selected r using A-MDL + random initialization | 0.9701681 |
| Pre-processing + selected r using A-MDL + multi-SVD initialization | 0.9715765 |

| Method | AUC |
|---|---|
| ALS | 0.9153593 |
| HALS | 0.9715765 |
| A-HALS | |
The best methods and results are indicated in bold.
Fig. 2. (a) A demonstration of the impact of SVD-based initialization methods compared with other methods on PREDICT dataset. (b) Comparisons of matrix decomposition algorithms and their accelerated versions on PREDICT dataset
The comparison of the proposed initialization method (Multi-SVD) with some of the existing initialization methods based on AUC (PREDICT dataset)
| Initialization methods | AUC |
|---|---|
| NNDSVD | 0.9689702 |
| SVD-NMF | 0.9685116 |
| NNSVD-LRC | 0.9672019 |
| Random | 0.9671383 |
| Multi-SVD | |
The best methods and results are indicated in bold.
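To make the idea of SVD-based initialization concrete, here is a sketch of NNDSVD, one of the baselines listed in the table, following the usual description by Boutsidis and Gallopoulos. It is not the paper's Multi-SVD method, whose details are not given in this record; the function name `nndsvd` and the test matrix are illustrative assumptions, and `rank` must not exceed the smaller dimension of `A`.

```python
import numpy as np

def nndsvd(A, rank):
    """NNDSVD-style non-negative initialization of W and H for NMF."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    W = np.zeros((A.shape[0], rank))
    H = np.zeros((rank, A.shape[1]))
    # The leading singular vectors of a non-negative matrix can be
    # chosen non-negative, so taking absolute values is safe here
    W[:, 0] = np.sqrt(S[0]) * np.abs(U[:, 0])
    H[0, :] = np.sqrt(S[0]) * np.abs(Vt[0, :])
    for j in range(1, rank):
        x, y = U[:, j], Vt[j, :]
        xp, xn = np.maximum(x, 0), np.maximum(-x, 0)
        yp, yn = np.maximum(y, 0), np.maximum(-y, 0)
        mp = np.linalg.norm(xp) * np.linalg.norm(yp)
        mn = np.linalg.norm(xn) * np.linalg.norm(yn)
        # Keep whichever sign pattern carries more of the rank-one term
        if mp >= mn:
            u, v, sigma = xp, yp, mp
        else:
            u, v, sigma = xn, yn, mn
        if sigma > 0:
            u = u / np.linalg.norm(u)
            v = v / np.linalg.norm(v)
            W[:, j] = np.sqrt(S[j] * sigma) * u
            H[j, :] = np.sqrt(S[j] * sigma) * v
    return W, H

# Hypothetical non-negative test matrix
A = np.random.default_rng(0).random((6, 5))
W0, H0 = nndsvd(A, rank=3)
```

The resulting `W0` and `H0` would replace the random starting factors of an NMF solver, which is the comparison the table summarizes.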
Comparison of MU, HALS and PG algorithms and their accelerated model for final error on PREDICT dataset
| Method | Final error |
|---|---|
| MU | 15.867820 |
| Accelerated MU | 15.520418 |
| HALS | 15.242294 |
| Accelerated HALS | |
| PG | 16.475821 |
| Accelerated PG | 16.069888 |
The best methods and results are indicated in bold.
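For readers unfamiliar with HALS, the sketch below shows the basic (non-accelerated) hierarchical alternating least squares update, which refines one column of W or one row of H at a time. It assumes the error is the Frobenius norm of the residual, which this record does not confirm, and it omits the acceleration scheme compared in the table; the function name `hals_nmf` and the test matrix are illustrative assumptions.

```python
import numpy as np

def hals_nmf(A, rank, n_iter=100, eps=1e-10, seed=0):
    """Basic HALS for min ||A - W @ H||_F with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], rank))
    H = rng.random((rank, A.shape[1]))
    errors = []
    for _ in range(n_iter):
        # Update each column of W in turn, with H held fixed
        AHt, HHt = A @ H.T, H @ H.T
        for k in range(rank):
            step = (AHt[:, k] - W @ HHt[:, k]) / HHt[k, k]
            W[:, k] = np.maximum(eps, W[:, k] + step)
        # Update each row of H in turn, with W held fixed
        WtA, WtW = W.T @ A, W.T @ W
        for k in range(rank):
            step = (WtA[k, :] - WtW[k, :] @ H) / WtW[k, k]
            H[k, :] = np.maximum(eps, H[k, :] + step)
        errors.append(np.linalg.norm(A - W @ H))
    return W, H, errors

# Hypothetical non-negative test matrix
A = np.random.default_rng(1).random((8, 6))
W, H, errors = hals_nmf(A, rank=3)
```

The small floor `eps` keeps every factor entry strictly positive so that the diagonal terms used as divisors never vanish.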