Daniel Domingo-Fernández¹, Yojana Gadiya¹, Abhishek Patel¹, Sarah Mubeen², Daniel Rivas-Barragan³, Chris W Diana¹, Biswapriya B Misra¹, David Healey¹, Joe Rokicki¹, Viswa Colluru¹.
Abstract
Network-based approaches are becoming increasingly popular for drug discovery because they provide a systems-level overview of the mechanisms underlying disease pathophysiology. They have shown significant early promise compared with other representations of biological data in applications such as target discovery, side effect prediction and drug repurposing. In parallel, an explosion of -omics data for the deep characterization of biological systems routinely uncovers molecular signatures of disease that can serve similar applications. Here, we present RPath, a novel algorithm that prioritizes drugs for a given disease by reasoning over causal paths in a knowledge graph (KG), guided by both drug-perturbed and disease-specific transcriptomic signatures. First, our approach identifies the causal paths that connect a drug to a particular disease. Next, it reasons over these paths to identify those that correlate with the transcriptional signatures observed in a drug-perturbation experiment and anti-correlate with the signatures observed in the disease of interest. The paths that match this signature profile are then proposed to represent the drug's mechanism of action. We demonstrate that RPath consistently prioritizes clinically investigated drug-disease pairs across multiple datasets and KGs, outperforming similar methodologies. Furthermore, we present two case studies showing how the predictions made by RPath can be deconvoluted and how RPath can be used to predict novel targets.
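The two steps described in the abstract (finding causal drug-to-disease paths, then keeping only those consistent with both signatures) can be illustrated with a minimal Python sketch. It assumes a toy signed KG stored as an adjacency dict (+1 for an increasing/activating edge, -1 for a decreasing/inhibiting edge) and transcriptomic signatures reduced to up (+1) or down (-1) calls per gene. The helper names `causal_paths` and `is_concordant`, the toy nodes, and the rule that a path needs at least one measured node are all hypothetical; this is a simplified reading of the abstract, not the authors' implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical signed KG: node -> [(neighbor, sign)], sign +1 = increases, -1 = decreases.
SignedGraph = Dict[str, List[Tuple[str, int]]]
Path = List[Tuple[str, int, str]]  # ordered (source, sign, target) edges


def causal_paths(graph: SignedGraph, drug: str, disease: str, max_len: int) -> List[Path]:
    """Step 1: enumerate simple directed paths from drug to disease with at most max_len edges."""
    paths: List[Path] = []

    def dfs(node: str, visited: Set[str], path: Path) -> None:
        if node == disease:
            paths.append(list(path))
            return
        if len(path) == max_len:
            return
        for neighbor, sign in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                path.append((node, sign, neighbor))
                dfs(neighbor, visited, path)
                path.pop()
                visited.remove(neighbor)

    dfs(drug, {drug}, [])
    return paths


def is_concordant(path: Path, drug_sig: Dict[str, int], disease_sig: Dict[str, int]) -> bool:
    """Step 2: each measured intermediate node must move in the direction predicted
    from the drug (product of edge signs) in the drug signature, and in the opposite
    direction in the disease signature."""
    effect = 1  # the drug is present, i.e. "up"
    measured = 0
    for _source, sign, target in path[:-1]:  # the last edge ends at the disease node
        effect *= sign
        if target in drug_sig:
            if drug_sig[target] != effect:
                return False
            measured += 1
        if target in disease_sig:
            if disease_sig[target] != -effect:
                return False
            measured += 1
    return measured > 0  # assumption: at least one node must be backed by data


# Toy example (hypothetical data): drugX inhibits A, which activates B, which drives diseaseY.
graph: SignedGraph = {
    "drugX": [("A", -1), ("C", +1)],
    "A": [("B", +1)],
    "B": [("diseaseY", +1)],
    "C": [("diseaseY", +1)],
}
drug_sig = {"A": -1, "B": -1, "C": +1}     # observed after drug perturbation
disease_sig = {"A": +1, "B": +1, "C": +1}  # observed in disease vs. control

paths = causal_paths(graph, "drugX", "diseaseY", max_len=3)
concordant = [p for p in paths if is_concordant(p, drug_sig, disease_sig)]
print(len(paths), len(concordant))  # 2 1
```

Here the path through A and B survives because the drug is predicted to lower both genes, the drug signature shows them going down, and the disease signature shows them going up. The path through C is rejected because C is up in both signatures, so the drug would push it further in the disease direction.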
Year: 2022 PMID: 35213534 PMCID: PMC8906585 DOI: 10.1371/journal.pcbi.1009909
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Evaluation of RPath on multiple datasets across the two KGs using precision.
Each row corresponds to the results of running RPath on a specific drug-disease dataset combination. The "Expected precision by chance" columns show the precision that would be expected from a random selection of pairs.
| Dataset combination | OpenBioLink KG: Precision (TP/(TP+FP)) | OpenBioLink KG: Expected precision by chance | Custom KG: Precision (TP/(TP+FP)) | Custom KG: Expected precision by chance |
|---|---|---|---|---|
| | 80% (4/5) | 17.42% | 66.67% (2/3) | 13.74% |
| | 54.55% (6/11) | 15.01% | 50% (2/4) | 9.62% |
| | 50% (1/2) | 32.66% | 0% (0/1) | 24.40% |
| | 50% (1/2) | 41.15% | 50% (1/2) | 34.08% |
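The precision cells above follow directly from their TP/FP counts, and a short Python sketch makes the comparison with the chance column explicit. The chance baseline here is an assumption: it models drawing candidate pairs uniformly at random, in which case the expected precision is simply the share of clinically investigated pairs among all candidates. The table does not state how the authors computed their baseline, and the function names and the candidate counts passed to `chance_precision` are hypothetical.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP), the fraction of prioritized pairs that are true positives."""
    if tp + fp == 0:
        raise ValueError("precision is undefined when no pairs are prioritized")
    return tp / (tp + fp)


def chance_precision(n_positive: int, n_candidates: int) -> float:
    """Expected precision of a uniformly random selection of candidate pairs.
    By linearity of expectation this is the share of positives among candidates."""
    return n_positive / n_candidates


def fold_over_chance(observed: float, expected: float) -> float:
    """How many times higher an observed precision is than the chance baseline."""
    return observed / expected


# First row, OpenBioLink KG: 4 of 5 prioritized pairs are clinically investigated.
p = precision(4, 1)
print(f"{p:.2%}")                            # 80.00%
print(f"{fold_over_chance(p, 0.1742):.2f}")  # 4.59
print(chance_precision(10, 100))             # hypothetical counts: 0.1
```

Reporting the fold over chance alongside raw precision helps when rows have very different baselines, as in the last two rows, where 50% precision sits much closer to the chance level.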
Top 5 prioritized protein target-disease pairs.
These results were obtained by running RPath over both KGs with the GEO and Open Targets datasets, using the same path length as in the drug discovery task (see …). Pairs were prioritized by their number of concordant paths. Most pairs were prioritized using the disease transcriptomic signatures from the GEO dataset, because it covers more measured genes than Open Targets (…).
| Protein target | Disease | Concordant paths | Nodes in the concordant paths | KG | Transcriptomic dataset |
|---|---|---|---|---|---|
| NOG | AML | 18,456 | 1,008 | Custom KG | GEO |
| PRKCA | AML | 12,861 | 669 | Custom KG | GEO |
| CXCL8 / IL-8 | AML | 7,234 | 465 | Custom KG | GEO |
| NOG | Plasma cell myeloma | 5,743 | 616 | Custom KG | GEO |
| CDC42 | Medulloblastoma | 5,651 | 91 | OpenBioLink KG | GEO |
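The ranking described in the caption (pairs ordered by their number of concordant paths) amounts to a simple aggregation over enumerated paths. The Python sketch below assumes each path has already been labeled concordant or not; the record layout, the function name `rank_target_disease_pairs`, and the placeholder genes G1 to G5 are hypothetical, the counts are toy values rather than the paper's data, and including path endpoints in the "nodes in the concordant paths" count is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record for one enumerated path:
# (protein target, disease, nodes on the path, whether the path is concordant)
PathRecord = Tuple[str, str, Tuple[str, ...], bool]


def rank_target_disease_pairs(
    records: Iterable[PathRecord], top_k: int = 5
) -> List[Tuple[str, str, int, int]]:
    """Return (target, disease, concordant paths, distinct nodes) rows,
    sorted by the number of concordant paths, highest first."""
    path_counts: Dict[Tuple[str, str], int] = defaultdict(int)
    node_sets: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for target, disease, nodes, concordant in records:
        if not concordant:
            continue  # only concordant paths contribute to the ranking
        pair = (target, disease)
        path_counts[pair] += 1
        node_sets[pair].update(nodes)  # endpoints included (assumption)
    ranked = sorted(path_counts, key=path_counts.get, reverse=True)
    return [(t, d, path_counts[(t, d)], len(node_sets[(t, d)])) for t, d in ranked[:top_k]]


# Toy records; G1..G5 are placeholder intermediate genes.
records: List[PathRecord] = [
    ("NOG", "AML", ("NOG", "G1", "AML"), True),
    ("NOG", "AML", ("NOG", "G2", "AML"), True),
    ("NOG", "AML", ("NOG", "G1", "G2", "AML"), True),
    ("PRKCA", "AML", ("PRKCA", "G3", "AML"), True),
    ("PRKCA", "AML", ("PRKCA", "G4", "AML"), True),
    ("PRKCA", "AML", ("PRKCA", "G5", "AML"), False),
    ("CDC42", "Medulloblastoma", ("CDC42", "G5", "Medulloblastoma"), True),
]
for row in rank_target_disease_pairs(records):
    print(row)
```

Sorting on a single count keeps the ranking transparent. The distinct-node column is carried along only as context, which mirrors how the table reports it next to the path count.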