Tareq B Malas1, Wytze J Vlietstra2, Roman Kudrin3, Sergey Starikov3, Mohammed Charrout1, Marco Roos1, Dorien J M Peters1, Jan A Kors2, Rein Vos2,4, Peter A C 't Hoen1,5, Erik M van Mulligen2, Kristina M Hettne6.
Abstract
Compounds that are candidates for drug repurposing can be ranked by leveraging knowledge available in the biomedical literature and databases. This knowledge, spread across a variety of sources, can be integrated within a knowledge graph, which thereby comprehensively describes known relationships between biomedical concepts such as drugs, diseases, and genes. Our work uses the semantic information between drug and disease concepts as features, which are extracted from an existing knowledge graph that integrates 200 different biological knowledge sources. RepoDB, a standard drug repurposing database that describes drug-disease combinations that were approved or that failed in clinical trials, is used to train a random forest classifier. In 10-times repeated 10-fold cross-validation, the classifier achieves a mean area under the receiver operating characteristic curve (AUC) of 92.2%. We apply the classifier to prioritize 21 preclinical drug repurposing candidates that have been suggested for Autosomal Dominant Polycystic Kidney Disease (ADPKD). Mozavaptan, a vasopressin V2 receptor antagonist, is predicted to be the drug most likely to be approved after a clinical trial, and belongs to the same drug class as tolvaptan, the only treatment for ADPKD that is currently approved. We conclude that semantic properties of concepts in a knowledge graph can be exploited to prioritize drug repurposing candidates for testing in clinical trials.
Year: 2019 PMID: 31000794 PMCID: PMC6472420 DOI: 10.1038/s41598-019-42806-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Example of correspondence between concepts, semantic types, and semantic groups. In total, the knowledge graph contains over 7 million concepts, each of which has been assigned one or more of 137 semantic types, and one of 15 semantic groups.
Figure 2. Overview of the feature creation process. RepoDB, which was used to train the classifier, contains drug-disease combinations whose status has been set to “Approved” or “Terminated” based on the results of clinical trials. Both the direct and the two-step indirect paths between the drugs and diseases are extracted from the knowledge graph. A binary feature records whether a direct path exists between the drug and the disease. For the indirect paths, the frequencies of the semantic types and semantic groups of the intermediate concepts (IC) are used to create the features. In this figure, IC 1 has the semantic type Pharmacologic Preparation (i.e. a drug), IC 2 and 4 have Sign or Symptom, and IC 3 has Enzyme. Their semantic groups are Chemicals & Drugs for IC 1 and 3, and Phenomena for IC 2 and 4. Based on the extracted features, the classifier is trained and cross-validated to classify the status of each drug-disease combination as “Approved” (A) or “Terminated” (T).
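The feature creation described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy graph, the concept names (`drug`, `ic1`, ...), the `type:`/`group:` feature prefixes, and the treatment of edges as undirected are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy graph (assumed undirected): concept -> neighbouring concepts.
GRAPH = {
    "drug": {"ic1", "ic2", "ic3", "ic4"},
    "disease": {"ic1", "ic2", "ic3", "ic4"},
}

# A concept may carry one or more semantic types, but exactly one semantic group.
SEMANTIC_TYPES = {
    "ic1": ["Pharmacologic Preparation"],
    "ic2": ["Sign or Symptom"],
    "ic3": ["Enzyme"],
    "ic4": ["Sign or Symptom"],
}
SEMANTIC_GROUP = {
    "ic1": "Chemicals & Drugs",
    "ic2": "Phenomena",
    "ic3": "Chemicals & Drugs",
    "ic4": "Phenomena",
}


def extract_features(graph, types, groups, drug, disease):
    """Build one feature dict for a single drug-disease combination."""
    drug_neighbours = graph.get(drug, set())
    disease_neighbours = graph.get(disease, set())

    features = Counter()
    # Binary feature: is there a direct drug-disease path?
    features["direct_path"] = int(disease in drug_neighbours)

    # Two-step indirect paths: drug -> intermediate concept -> disease.
    intermediates = (drug_neighbours & disease_neighbours) - {drug, disease}
    for concept in intermediates:
        for semantic_type in types.get(concept, []):
            features["type:" + semantic_type] += 1
        features["group:" + groups[concept]] += 1
    return dict(features)


features = extract_features(GRAPH, SEMANTIC_TYPES, SEMANTIC_GROUP, "drug", "disease")
```

For the toy graph, which mirrors the four intermediate concepts in the caption, the resulting dictionary holds a zero for the direct-path feature and frequency counts for each semantic type and semantic group.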
Number of unique drugs and diseases in the “Approved” and “Terminated” datasets. Each drug or disease could be part of multiple drug-disease combinations.
| | Approved | Terminated | All |
|---|---|---|---|
| No. of Drugs | 1407 | 373 | 1452 |
| No. of Diseases | 1111 | 719 | 1681 |
| No. of non-cancers | 787 | 226 | 933 |
| No. of cancers | 324 | 493 | 751 |
| No. of drug-disease combinations | 6044 | 2021 | 8065 |
| No. of direct paths | 3010 | 906 | 3916 |
Performance metrics achieved by each machine learning algorithm.
| Algorithm | Area under the ROC curve | Area under the precision-recall curve | F1 | Accuracy | Kappa |
|---|---|---|---|---|---|
| Logistic Regression | 81.6 (1.9) | 90.8 (0.1) | 87.6 (0.1) | 79.8 (0.1) | 35.4 (0.3) |
| Neural Network | 89.1 (1.7) | 95.0 (0.3) | 90.9 (0.2) | 86.0 (0.3) | 60.7 (1.0) |
| SVM | 88.4 (1.8) | 94.8 (0.2) | 90.6 (0.1) | 84.7 (0.2) | 51.1 (0.9) |
| CART | 81.5 (2.0) | 89.9 (0.0) | 90.3 (0.1) | 84.9 (0.1) | 56.7 (0.3) |
| k-NN | 88.2 (1.3) | 94.2 (0.1) | 90.7 (0.1) | 85.5 (0.2) | 57.9 (0.4) |
| Naïve Bayes | 68.2 (1.8) | 82.8 (0.1) | 82.8 (0.1) | 72.8 (0.1) | 18.0 (0.2) |
| Random Forest | 92.2 (1.3) | 96.4 (0.1) | 92.9 (0.0) | 89.1 (0.1) | 68.7 (0.2) |
Values indicate mean and standard deviation (in %) of 10 repeats of a 10-fold cross-validation experiment.
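To make the F1, accuracy, and kappa columns easier to relate to one another, the sketch below computes all three from a binary confusion matrix. It assumes that "Approved" is treated as the positive class, which the source does not state, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, F1 and Cohen's kappa from a 2x2 confusion matrix.

    Assumption: "Approved" is the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Agreement expected by chance, given the predicted and true class rates.
    expected_pos = ((tp + fp) / total) * ((tp + fn) / total)
    expected_neg = ((fn + tn) / total) * ((fp + tn) / total)
    expected = expected_pos + expected_neg
    kappa = (accuracy - expected) / (1 - expected)

    return {"accuracy": accuracy, "f1": f1, "kappa": kappa}


# Baseline that labels every combination "Approved", using the dataset
# counts reported above (6044 approved, 2021 terminated).
baseline = classification_metrics(tp=6044, fp=2021, fn=0, tn=0)
```

With these counts the always-"Approved" baseline reaches an accuracy of about 74.9% and an F1 of about 85.7%, while its kappa is 0. Kappa corrects for the class imbalance in the dataset in a way that accuracy and F1 do not.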
Figure 3. ROC curve of the 10-times repeated 10-fold cross-validation.
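As a rough illustration of the evaluation scheme, the sketch below generates the splits for a repeated k-fold cross-validation and computes the area under the ROC curve as the fraction of positive-negative pairs that are ranked correctly. Both function names are hypothetical, the splitting is unstratified, and the source does not say which library or splitting strategy the authors used.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (repeat, train_indices, test_indices); reshuffles before each repeat."""
    rng = random.Random(seed)
    for repeat in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(k):
            test = indices[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield repeat, train, test


# Four samples, two per class: three of the four positive-negative pairs
# are ordered correctly.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])  # 0.75
```

With the default arguments `repeated_kfold` yields 100 train/test splits. One AUC value can be computed per split (or per repeat) and then summarized as a mean and standard deviation.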
Figure 4. Individual feature importance scores, as calculated with the standard feature importance calculation function of the random forest algorithm. The scale of the scores has been normalized.
Overview of drugs pre-clinically suggested for ADPKD.
| Identifier | Name | Mechanism of action | Selected recent reference (PMID) |
|---|---|---|---|
| UMLS C0077274 | Triptolide | intracellular calcium homeostasis | 24560027 |
| UMLS C2975283 | Mozavaptan | Vasopressin V2 receptor antagonist | 27578560 |
| UMLS C2607958 | Satavaptan | Vasopressin V2 receptor antagonist | 18945944 |
| UMLS C0028833 | Octreotide | Somatostatin receptor agonist | 26844873 |
| UMLS C1872203 | Pasireotide | Somatostatin receptor 2 agonist | 24994926 |
| UMLS C1174836 | SKI-606 | c-Src inhibitor | 18385429 |
| UMLS C1516119 | Sorafenib | Raf kinase inhibitor | 20810616 |
| UMLS C0755562 | U0126 | MEK Inhibitor | 18263604 |
| UMLS C1831731 | Bosutinib | Src/Bcr-Abl tyrosine kinase inhibitor | 28838955 |
| UMLS C0541315 | Everolimus | FK506-binding protein 1A inhibitor | 25424440 |
| UMLS C0072980 | Sirolimus | FK506-binding protein 1A inhibitor | 29880342 |
| UMLS C0025598 | Metformin | Mitochondrial complex I (NADH dehydrogenase) inhibitor | 21262823 |
| UMLS C0071097 | Pioglitazone | Peroxisome proliferator-activated receptor gamma agonist | 28191533 |
| UMLS C0289313 | Rosiglitazone | Peroxisome proliferator-activated receptor gamma agonist | 28191533 |
| UMLS C0536217 | Roscovitine | CDK inhibitor | 23032260 |
| UMLS C0025270 | Menadione | Cdc25A | 22155366 |
| UMLS C0717758 | Etanercept | TNF-alpha inhibitor | 18552856 |
| UMLS C0034283 | Pyrimethamine | Stat3 inhibitor | 21821671 |
| ChemSpider 221421 | S3I-201 | Stat3 inhibitor | 21821671 |
| UMLS C1718383 | Teriflunomide | Stat3 inhibitor | 22155366 |
| UMLS C1957685 | Genz-123346 | glucosylceramide synthase inhibitor | 20562878 |
| UMLS C0968934 | HET-0016 | 20-HETE synthesis inhibitor | 19129252 |
| UMLS C0916207 | TRAM-34 | KCa3.1 inhibitor | 18547995 |
| UMLS C0010467 | Curcumin | Multiple | 21345977 |
| UMLS C2935082 | EX-527 | SIRT1-specific inhibitor | 23778143 |
The prediction scores of our random forest classifier for the ADPKD drug repurposing candidates.
| Name | Prediction score (%) | No. of intermediate concepts to ADPKD | No. of concepts the drug is connected to in the whole EKP |
|---|---|---|---|
| Mozavaptan | 100.0 | 1 | 12 |
| Satavaptan | 93.0 | 2 | 17 |
| HET-0016 | 92.6 | 4 | 67 |
| Pasireotide | 90.6 | 2 | 28 |
| Bosutinib | 88.6 | 9 | 1088 |
| EX-527 | 73.8 | 6 | 75 |
| Pioglitazone | 68.2 | 53 | 1994 |
| Octreotide | 67.2 | 257 | 2133 |
| Roscovitine | 65.8 | 58 | 1027 |
| Pyrimethamine | 65.0 | 88 | 1062 |
| TRAM-34 | 65.0 | 25 | 109 |
| Etanercept | 64.2 | 163 | 521 |
| Triptolide | 60.2 | 135 | 4128 |
| Rosiglitazone | 59.4 | 316 | 666 |
| U0126 | 55.2 | 188 | 5891 |
| Menadione | 52.2 | 138 | 2822 |
| Curcumin | 46.0 | 359 | 2653 |
| Metformin | 42.4 | 343 | 3616 |
| Everolimus | 35.0 | 174 | 986 |
| Sirolimus | 27.0 | 376 | 5621 |
| Sorafenib | 19.0 | 205 | 1446 |
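The source does not state how the prediction score is derived from the random forest. One common convention, assumed here purely for illustration, is the percentage of trees that vote "Approved". The sketch below ranks candidates under that assumption; the stub trees, the `feature_x` feature, and the candidate names are all hypothetical.

```python
def prediction_score(trees, features):
    """Percentage of trees voting "Approved" (an assumed scoring convention)."""
    votes = sum(1 for tree in trees if tree(features) == "Approved")
    return 100.0 * votes / len(trees)


def rank_candidates(trees, candidates):
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = [(name, prediction_score(trees, feats)) for name, feats in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical stand-ins for trained trees: each one thresholds a single
# made-up feature. Real trees would be learned from the RepoDB features.
TREES = [
    (lambda feats, t=t: "Approved" if feats["feature_x"] < t else "Terminated")
    for t in (1, 2, 3, 4)
]

CANDIDATES = {
    "candidate_a": {"feature_x": 0},
    "candidate_b": {"feature_x": 2},
    "candidate_c": {"feature_x": 5},
}

ranking = rank_candidates(TREES, CANDIDATES)
```

Under this convention, a forest of 500 trees would produce scores in steps of 0.2 percentage points. That is consistent with the values in the table, but it does not confirm the forest size or the scoring rule the authors used.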
Figure 5. Simplified graph of the network of intermediate concepts between ADPKD and the top 3 drugs: mozavaptan, satavaptan, and HET-0016 (n-hydroxy-n’-(4-butyl-2-methyl phenyl)formamidine). The thickness of the arrows indicates the amount of underlying evidence (database entries or publications from the literature). For the sake of clarity, some paths were removed when creating the figure. The complete set of paths can be found in the GitHub repository mentioned in the Data availability section.