Rui Zhang, Dimitar Hristovski, Dalton Schutte, Andrej Kastrin, Marcelo Fiszman, Halil Kilicoglu.
Abstract
OBJECTIVE: To discover candidate drugs to repurpose for COVID-19 using literature-derived knowledge and knowledge graph completion methods.
Keywords: COVID-19; Drug repurposing; Knowledge graph completion; Literature-based discovery; Text mining
Year: 2021 PMID: 33571675 PMCID: PMC7869625 DOI: 10.1016/j.jbi.2021.103696
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 8.000
Fig. 1. Diagram illustrating the workflow of our approach.
Fig. 2. TransE models relations as translations in a low-dimensional embedding space of the entities. If the triple (h, r, t) is true, the embedding of the tail entity t (e.g., COVID-19) should be close to the embedding of the head entity h (e.g., Metoclopramide) plus a vector that depends on the relation r (e.g., TREATS).
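The translation idea described in the Fig. 2 caption can be illustrated with a minimal sketch. It assumes a Euclidean distance between h + r and t as the plausibility measure (TransE is commonly used with an L1 or L2 norm; the source does not say which). The function name `transe_distance` and all embedding values are invented for illustration and are not taken from the trained model.

```python
import math

def transe_distance(h, r, t):
    """Euclidean distance between (h + r) and t; smaller means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Toy 3-dimensional embeddings. The numbers are made up for illustration only.
metoclopramide = [0.2, 0.5, -0.1]   # head entity h
treats = [0.3, -0.2, 0.4]           # relation r
covid19 = [0.5, 0.3, 0.3]           # tail entity t, placed near h + r
unrelated = [-0.9, 0.8, -0.7]       # a tail far from h + r

d_true = transe_distance(metoclopramide, treats, covid19)
d_other = transe_distance(metoclopramide, treats, unrelated)

# A plausible triple should score a smaller distance than an implausible one.
print(d_true < d_other)
```

Ranking every candidate head entity by this distance for a fixed relation and tail is one way such a model could produce an ordered list of candidate drugs.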
Fig. 3. Diagram for the high-level architecture of STELP.
Results of SemMedDB semantic relation classification using biomedical BERT variants.
| Vanilla BERT | BioBERT | BioClinical | PubMed | BlueBERT | ||
|---|---|---|---|---|---|---|
| Uncased | Cased | Cased | Cased | Uncased | Uncased | |
| Rec | 0.815 | 0.767 | 0.861 | 0.822 | 0.822 | |
| Pre | 0.695 | 0.723 | 0.685 | 0.693 | 0.700 | |
| F1 | 0.743 | 0.744 | 0.748 | 0.781 | 0.756 | |
| Rec | 0.815 | 0.782 | 0.842 | 0.832 | 0.845 | |
| Pre | 0.795 | 0.815 | 0.804 | 0.816 | 0.782 | |
| F1 | 0.805 | 0.798 | 0.840 | 0.818 | 0.812 | |
Note: Rec = recall, Pre = precision. Results highlighted in bold are the best for each method.
Trained on PubMed 1 M
Trained on Abstracts + Full text
Trained on PubMed + MIMIC
Distribution of semantic predications after filtering.
| Predicate | Count (%) | Predicate | Count (%) |
|---|---|---|---|
| 518,267 (27.2%) | 38,602 (2.0%) | ||
| 420,633 (22.1%) | 37,887 (2.0%) | ||
| 224,809 (11.8%) | 25,103 (1.3%) | ||
| 205,441 (10.8%) | 24,734 (1.3%) | ||
| 192,092 (10.1%) | 18,613 (1.0%) | ||
| 106,418 (5.6%) | 1,479 (0.1%) | ||
| 52,518 (2.8%) | 1,156 (0.1%) | ||
| 39,960 (2.1%) |
| Model | MR | MRR | Hits@1 | Hits@3 | Hits@10 |
|---|---|---|---|---|---|
| TransE | |||||
| DistMult | 11.639 | 0.325 | 0.216 | 0.340 | 0.515 |
| ComplEx | 11.045 | 0.332 | 0.216 | 0.352 | 0.553 |
| RotatE | 10.864 | 0.377 | 0.246 | 0.428 | 0.633 |
| STELP | 22.960 | 0.073 | 0.000 | 0.027 | 0.234 |
Note: MR = mean rank, MRR = mean reciprocal rank. Results highlighted in bold are the best for each method.
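For readers unfamiliar with the metrics in the table above, the following sketch shows how MR, MRR, and Hits@k are conventionally computed from the rank each model assigns to the correct entity in every test triple. The function name `ranking_metrics` and the example ranks are hypothetical; they are not the evaluation code or data from the study.

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute link-prediction metrics from 1-based ranks of the correct entity."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                      # mean rank (lower is better)
        "MRR": sum(1.0 / r for r in ranks) / n,    # mean reciprocal rank (higher is better)
    }
    for k in ks:
        # Fraction of test triples whose correct entity is ranked in the top k.
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics

# Invented ranks for five test triples.
example_ranks = [1, 2, 4, 10, 50]
example_metrics = ranking_metrics(example_ranks)
for name, value in example_metrics.items():
    print(f"{name}: {value:.3f}")
```

MR is sensitive to a few very poorly ranked triples, whereas MRR and Hits@k mostly reflect how often the correct entity lands near the top.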
Fig. 4. Visualization of biomedical concepts learned by the t-SNE (t-distributed stochastic neighbor embedding) algorithm and embedded in a two-dimensional space. We highlighted five drugs identified as potential new drugs to treat COVID-19. Color refers to the semantic type of a particular concept; note that only the eight most frequent semantic types are presented. aapp: Amino Acid, Peptide, or Protein; dsyn: Disease or Syndrome; fndg: Finding; gngm: Gene or Genome; neop: Neoplastic Process; orch: Organic Chemical; phsu: Pharmacologic Substance; topp: Therapeutic or Preventive Procedure. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Thirty-three candidate drugs highly ranked by TransE and deemed plausible in manual analysis.
| Metoclopramide | Trilostane |
| Oxymatrine | Cyproterone Acetate |
| Mitogen-Activated- | Nucleoside Reverse- |
| Oxophenylarsine | Methyltrienolone |
| 5-Alpha reductase inhibitor | Bosentan |
| Folic acid | Estramustine |
| Anthelmintics | Allicin |
| Sildenafil | Proteasome inhibitors |
| Furosemide | Antiplatelet Agents |
| Beclomethasone | Fibrinolytic Agents |
| Cangrelor | Contraceptive Agents |
| Gymnemic acid | Neuraminidase inhibitor |
| Estradiol | Vitamin D Analogue |
| mTOR Inhibitor | Tyrosine kinase inhibitor |
| Clobetasol propionate | Mometasone furoate |
| Carbenoxolone | Vasopressin Antagonist |
| Anti-Retroviral Agents |
Comparison of drug overlap between methods and studies.
| Methods | Common Drugs |
|---|---|
| Zeng et al. | Estradiol |
| Zeng et al. | Dexamethasone |
| Zeng et al. | Hydrocortisone |
| Zeng et al. | Zidovudine |
| TransE | 5-alpha Reductase Inhibitors |
| RotatE | Pibrentasvir |
| RotatE | Paclitaxel |
Note: Drugs are drawn from the top 50 ranked drugs from RotatE and STELP, the 33 drugs from TransE identified by MF as plausible, and the drugs specified in Zeng et al. [16], Zhou et al. [14], and Singh et al. [94]. We also include the top 50 drugs identified using the discovery pattern in open discovery mode.
Statistics for absolute differences of TransE and STELP rankings.
| Rankings | Median | Mean | Standard Deviation |
|---|---|---|---|
| Top 1000 TransE Rankings | 10789.0 | 10567.140 | 6128.881 |
| Top 1000 STELP Rankings | 10224.0 | 10420.0 | 6002.522 |
| All Rankings | 6342.0 | 7207.910 | 5070.927 |
Note: The values for the first two rows are calculated by taking the top 1000 ranked triples for the specified model, calculating the absolute difference between the rankings from the two models for each of those triples, and calculating the statistics. For example, the triples that TransE ranked as the top 1000 triples were gathered, the absolute differences of rankings between TransE and STELP for those 1000 triples were calculated, and the statistics were calculated from those differences.
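The procedure in the note above can be sketched as follows. This is a minimal illustration under stated assumptions: rankings are held in dictionaries keyed by triple, the names `top_n_difference_stats`, `transe_ranks`, and `stelp_ranks` are hypothetical, the ranks are invented, and the sample standard deviation is used because the source does not say whether the sample or population form was reported.

```python
import statistics

def top_n_difference_stats(ranks_a, ranks_b, n):
    """Stats of |rank_a - rank_b| over the n triples ranked best by model A.

    ranks_a and ranks_b map each triple to its rank (1 = best) under each model.
    """
    top_triples = sorted(ranks_a, key=ranks_a.get)[:n]
    diffs = [abs(ranks_a[t] - ranks_b[t]) for t in top_triples]
    return {
        "median": statistics.median(diffs),
        "mean": statistics.mean(diffs),
        "stdev": statistics.stdev(diffs),  # sample standard deviation (assumption)
    }

# Invented rankings of four placeholder triples under two models.
transe_ranks = {"t1": 1, "t2": 2, "t3": 3, "t4": 4}
stelp_ranks = {"t1": 4, "t2": 2, "t3": 1, "t4": 3}

top2_stats = top_n_difference_stats(transe_ranks, stelp_ranks, n=2)
all_stats = top_n_difference_stats(transe_ranks, stelp_ranks, n=len(transe_ranks))
print(top2_stats)
print(all_stats)
```

Swapping the two dictionaries gives the statistics for the other model's top-ranked triples, which corresponds to the second row of the table.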
Summary of absolute differences for TransE and STELP rankings. Semantic types are aapp: Amino Acid, Peptide, or Protein; gngm: Gene or Genome; orch: Organic Chemical; sosy: Sign or Symptom; topp: Therapeutic or Preventive Procedure.
| Max Absolute Difference | Count (%) | Top 3 Most Common Semantic Types |
|---|---|---|
| 0 | 1 (0.005%) | aapp |
| 1 | 1 (0.005%) | aapp |
| 3 | 5 (0.023%) | orch, topp, aapp |
| 10 | 15 (0.070%) | gngm, aapp, orch |
| 100 | 189 (0.877%) | gngm, aapp, orch |
| 500 | 973 (4.516%) | gngm, aapp, orch |
| 1000 | 1937 (8.990%) | gngm, aapp, orch |
Note: The Count column gives the number of triples for which the two models' rankings differed by at most the corresponding value in the Max Absolute Difference column. For example, there were 5 triples for which the two models' rankings differed by at most 3.
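The cumulative counts described in the note above could be derived along these lines. The sketch assumes a flat list of per-triple absolute rank differences; the function name `count_within` and the example differences are invented for illustration.

```python
def count_within(diffs, thresholds):
    """For each threshold, count the differences that are at most that value."""
    total = len(diffs)
    rows = []
    for limit in thresholds:
        count = sum(1 for d in diffs if d <= limit)
        rows.append((limit, count, 100.0 * count / total))
    return rows

# Invented absolute rank differences for six placeholder triples.
example_diffs = [0, 2, 3, 7, 12, 150]
summary_rows = count_within(example_diffs, thresholds=(0, 1, 3, 10, 100))
for limit, count, pct in summary_rows:
    print(f"max diff {limit}: {count} ({pct:.1f}%)")
```

Because each row counts everything at or below its threshold, the counts can only stay equal or grow as the threshold increases.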
Fig. 5. Drug repurposing for COVID-19 with the open discovery pattern DrugA-inhibits/interacts_with-ConceptB and ConceptB-affects/associated_with/causes/predisposes-COVID-19. The directionality is from the periphery (the predicted drugs) through the intermediate concepts to COVID-19 in the center.
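The two-step pattern in the Fig. 5 caption can be sketched as a simple query over subject-predicate-object triples. This is an illustration under stated assumptions, not the authors' implementation: the function name `open_discovery` and the toy triples are hypothetical, predicates are written in upper case, semantic-type filtering is omitted, and ordering candidates by the number of linking concepts is an assumption made only to give a deterministic output.

```python
from collections import defaultdict

# Predicates allowed on each leg of the pattern, as listed in the caption.
DRUG_TO_CONCEPT = {"INHIBITS", "INTERACTS_WITH"}
CONCEPT_TO_DISEASE = {"AFFECTS", "ASSOCIATED_WITH", "CAUSES", "PREDISPOSES"}

def open_discovery(triples, target):
    """Find subjects linked to the target through one intermediate concept."""
    # Second leg: concepts that affect, cause, etc. the target disease.
    linked = {s for s, p, o in triples if p in CONCEPT_TO_DISEASE and o == target}
    # First leg: subjects that inhibit or interact with one of those concepts.
    candidates = defaultdict(set)
    for s, p, o in triples:
        if p in DRUG_TO_CONCEPT and o in linked:
            candidates[s].add(o)
    # Order by number of linking concepts (assumption), then by name.
    return sorted(candidates.items(), key=lambda kv: (-len(kv[1]), kv[0]))

# Placeholder triples; none of these are real predications.
toy_triples = [
    ("DrugA", "INHIBITS", "ConceptB"),
    ("DrugA", "INTERACTS_WITH", "ConceptC"),
    ("DrugD", "INHIBITS", "ConceptC"),
    ("DrugE", "TREATS", "ConceptB"),
    ("DrugG", "INHIBITS", "ConceptF"),
    ("ConceptB", "CAUSES", "COVID-19"),
    ("ConceptC", "ASSOCIATED_WITH", "COVID-19"),
    ("ConceptF", "CAUSES", "OtherDisease"),
]

discovered = open_discovery(toy_triples, "COVID-19")
for drug, concepts in discovered:
    print(drug, sorted(concepts))
```

In this toy graph, DrugE is excluded because its predicate is not on the first leg, and DrugG is excluded because its intermediate concept does not reach the target.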