Knowledge Graph-Based Approaches to Drug Repurposing for COVID-19.

Jacob Al-Saleem¹, Roger Granet¹, Srinivasan Ramakrishnan¹, Natalie A Ciancetta¹, Catherine Saveson¹, Chris Gessner¹, Qiongqiong Zhou¹.

Abstract

The COVID-19 pandemic has motivated researchers worldwide to search for effective drugs and therapeutics for treating this disease. To save time, much effort has focused on repurposing drugs already known to treat diseases other than COVID-19. To support these drug repurposing efforts, we built the CAS Biomedical Knowledge Graph and identified 1350 small molecules as potentially repurposable drugs that target host proteins and disease processes involved in COVID-19. A computer algorithm-driven drug-ranking method was developed to prioritize those identified small molecules. The top 50 molecules were analyzed according to their molecular functions and included 11 drugs in clinical trials for treating COVID-19 and new candidates that may be of interest for clinical investigation. The CAS Biomedical Knowledge Graph provides researchers an opportunity to accelerate innovation and streamline the investigative process not just for COVID-19 but also for many other diseases.

Year: 2021 PMID: 34297570 PMCID: PMC8340579 DOI: 10.1021/acs.jcim.1c00642

Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956

Introduction

To date, very few treatments have received FDA approval as therapeutics for COVID-19, while the need for such drugs remains high. To reduce development time and costs, much research has focused on repurposing small molecules that have either already been approved as drugs or have been clinically studied.[1] Because COVID-19 is characterized by its impact on multiple, interlinked physiological systems, including pulmonary hyperinflammation, severe lung injury, blood coagulopathy, renal and neurological problems, and the cellular pathways that underlie these systems,[2,3] it is proposed that a knowledge graph approach would be of value in identifying the connections between these systems as well as potential therapeutics.[4−6]

Knowledge graphs are a type of database that allows users to organize and connect pieces of data based on the relationships that exist between them. Each unit of data can be thought of as a dot (or node) connected to other units by lines (or edges) that represent the relationships between the nodes. This type of database places as much importance on the relationships that connect data as on the data itself. Knowledge graphs can also combine data from multiple sources. These features allow insights that would not be possible using the individual data sources and traditional databases. One small, highly simplified example is:

Alpelisib (unit of information, or node, and a small molecule) inhibits (relationship, or edge) tumor protein p53 (unit of information, or node)
Tumor protein p53 upregulates transcription factor Fos
Transcription factor Fos upregulates transcription factor STAT3
Transcription factor STAT3 is associated with vascular inflammation

This simple knowledge graph is depicted visually in Figure 1. If a user were to query the knowledge graph to predict what drugs might inhibit vascular inflammation, the graph could provide the answer of alpelisib.
This may not be obvious from traditional databases, which might show only direct inhibitors of transcription factor STAT3 or of vascular inflammation itself, but because a knowledge graph links multiple nodes via relationships, the second-level inhibitor alpelisib can also be found. This example illustrates how knowledge graphs can be used to manage, explore, and navigate through the interactions and connections between disparate pieces of information to gain insights and make predictions. As a result of their value, knowledge graphs have grown in importance in the last 10 years in both industry and academia.[7]
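The multi-hop lookup described in this example can be sketched in a few lines of Python. This is only an illustration on the four relationships listed above, using a plain dictionary and a breadth-first search; it is not how the CAS graph (built in Neo4j) is queried, and the function names `build_adjacency`, `find_path`, and `molecules_linked_to` are hypothetical.

```python
from collections import deque

# The four relationships from the simplified example, as
# (source node, relationship, target node) triples.
TRIPLES = [
    ("alpelisib", "inhibits", "tumor protein p53"),
    ("tumor protein p53", "upregulates", "transcription factor Fos"),
    ("transcription factor Fos", "upregulates", "transcription factor STAT3"),
    ("transcription factor STAT3", "is associated with", "vascular inflammation"),
]
SMALL_MOLECULES = {"alpelisib"}


def build_adjacency(triples):
    """Index outgoing edges by source node."""
    adjacency = {}
    for source, relation, target in triples:
        adjacency.setdefault(source, []).append((relation, target))
    return adjacency


def find_path(adjacency, start, goal):
    """Breadth-first search; returns the list of triples from start to goal, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in adjacency.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None


def molecules_linked_to(triples, molecules, goal):
    """Return every small molecule that reaches the goal node, with its path."""
    adjacency = build_adjacency(triples)
    result = {}
    for molecule in sorted(molecules):
        path = find_path(adjacency, molecule, goal)
        if path is not None:
            result[molecule] = path
    return result


linked = molecules_linked_to(TRIPLES, SMALL_MOLECULES, "vascular inflammation")
for molecule, path in linked.items():
    print(molecule, "->", " -> ".join(target for _, _, target in path))
```

A direct-neighbor lookup on "vascular inflammation" would return only STAT3; following edges transitively is what surfaces alpelisib several relationships away.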

Figure 1

Visual depiction of simplified knowledge graph relating alpelisib to vascular inflammation.

It is important to note that knowledge graphs are both scalable and modular, so they can be used in many different areas of research or other activities. For example, a pharmaceutical researcher could use a knowledge graph to identify potential drug candidates or drug targets for diseases. In other fields, material scientists could use a knowledge graph to identify the best compounds for inclusion in designing a new type of material. A graph could also power the hunt for new light-absorbing compounds to use in creating more efficient solar cells. Combined with nutritional data, a knowledge graph could assist food scientists in identifying ingredients that could promote health or improve a recipe. These are just a few of the many possible uses that knowledge graphs could provide.

In this CAS Biomedical Knowledge Graph, we incorporated human diseases, proteins, small-molecule inhibitors, viruses, and COVID-19-specific data for identifying small molecules that show potential for repurposing as COVID-19 therapeutics. This CAS Biomedical Knowledge Graph features the human-curated substance data in the CAS Content Collection linked to biomedical data from both CAS and external databases. The information units, or nodes, in this graph are human proteins (denoted by their gene names), biological processes, diseases, and small molecules, including drugs and drug candidates. The links, or edges, between them are relationships such as drug X targets protein Y and protein Y is involved in biological process Z. A novel algorithmic method was also developed for ranking the most promising drug candidates. It prioritized those identified molecules that target unique proteins involved in COVID-19 disease pathways, while minimizing side effects. The most highly ranked substances are discussed in terms of their possible relevance to COVID-19.

Results

The CAS Biomedical Knowledge Graph was constructed using data from the CAS Content Collection and public repositories. In total, the graph contains over 6 million nodes and 18 million relationships. A simple visual schema is shown in Figure 2. Genes and gene products (i.e., proteins or microRNAs) are all referenced by their respective gene node in the graph. A detailed description of the knowledge graph and its construction can be found in the Materials and Methods section.

Figure 2

Simple schematic diagram of the CAS Biomedical Knowledge Graph.

A two-component approach was designed to identify COVID-19 drug repurposing candidates; a flowchart of this approach is shown in Figure 3A. The first step of both components was the collection of biological processes deemed important in the SARS-CoV-2 infection process and COVID-19. For the first component, CAS scientists identified a selection of CAS controlled vocabulary headings and associated synonyms to search the CAS Content Collection for SARS-CoV-2 infection-related documents and collected those containing potential drug targets. Intellectual analysis of the resulting documents along with author terminology was then used to gather a list of 20 biological processes deemed important in the SARS-CoV-2 infection process and COVID-19, as summarized in Zhou et al.[8] Some of the biological processes identified include viral entry, endocytosis, autophagy, cytokine storm, and blood coagulation (full list in Table S1). For the second component, genes that were significantly upregulated (>2-fold) by SARS-CoV-2 infection as described in ref (9) were extracted, and the biological processes associated with four or more of these genes were identified, of which there were 16 in total. These included, for example, inflammatory response, angiogenesis, and negative regulation of RNA transcription (full list in Table S2).

The 36 processes collected in total from both components were then matched against Gene Ontology (GO), and the corresponding GO terms were used from this point on. Any SARS-CoV-2-specific processes were mapped to the corresponding general viral processes in GO to gather a larger set of potential targets. The graph was then queried for small molecules that modulate the genes associated with these biological processes. The resulting small molecules from both components were then combined, resulting in a set of 1350 small molecules. The number of compounds connected to each biological process/disease node is shown in Figure 3B. The graph queries used for both components are provided in the Supporting Information.
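The selection logic of the two components can be sketched with ordinary Python sets. The two thresholds (more than 2-fold upregulation, four or more genes per process) come from the text; everything else, including the gene names `G1`..`G7`, the molecule names, and the helper functions, is hypothetical toy data and not the authors' Cypher queries, which are in the Supporting Information.

```python
from collections import defaultdict

FOLD_CHANGE_CUTOFF = 2.0    # ">2-fold" upregulation, as stated in the text
MIN_GENES_PER_PROCESS = 4   # "four or more of these genes", as stated in the text


def expression_driven_processes(fold_changes, gene_to_processes):
    """Component two: processes linked to at least four strongly upregulated genes."""
    upregulated = {g for g, fc in fold_changes.items() if fc > FOLD_CHANGE_CUTOFF}
    genes_by_process = defaultdict(set)
    for gene in upregulated:
        for process in gene_to_processes.get(gene, ()):
            genes_by_process[process].add(gene)
    return {
        process
        for process, genes in genes_by_process.items()
        if len(genes) >= MIN_GENES_PER_PROCESS
    }


def candidate_molecules(processes, process_to_genes, gene_to_molecules):
    """Collect small molecules that modulate any gene in the given processes."""
    molecules = set()
    for process in processes:
        for gene in process_to_genes.get(process, ()):
            molecules |= gene_to_molecules.get(gene, set())
    return molecules


# Hypothetical toy data (not from the paper).
fold_changes = {"G1": 3.1, "G2": 2.5, "G3": 4.0, "G4": 2.2, "G5": 1.4, "G6": 2.0}
gene_to_processes = {
    "G1": ["inflammatory response"],
    "G2": ["inflammatory response"],
    "G3": ["inflammatory response", "angiogenesis"],
    "G4": ["inflammatory response"],
    "G5": ["angiogenesis"],
    "G6": ["angiogenesis"],
}
process_to_genes = {
    "viral entry": ["G7"],
    "inflammatory response": ["G1", "G2", "G3", "G4"],
}
gene_to_molecules = {"G7": {"mol_A"}, "G1": {"mol_B"}, "G3": {"mol_A", "mol_C"}}

curated_processes = {"viral entry"}  # component one: chosen by curators
data_driven_processes = expression_driven_processes(fold_changes, gene_to_processes)

component_one = candidate_molecules(curated_processes, process_to_genes, gene_to_molecules)
component_two = candidate_molecules(data_driven_processes, process_to_genes, gene_to_molecules)
combined = component_one | component_two  # the union is the final candidate set
print(sorted(combined))
```

In this toy data, `mol_A` is found by both components, which mirrors the overlap between nodes noted for the real result set.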

Figure 3

Identification of small molecules targeting biological processes involved in COVID-19. (A) Flowchart of two-component approach to identify potential COVID-19 therapeutics. (B) Diagram displaying the number of small molecules that target the biological process/disease nodes selected from the two-component approach. The larger the circle, the larger the number of small molecules that connect to that node. The total number of small molecules that connect to the node is also shown below the node description. Note that there is an extensive overlap of small molecules between the nodes.

To rank the identified small molecules, the following equation was developed and used to score each of the 1350 molecules individually: [equation not reproduced]. Gene rarity (GR) is a measure of the number of small molecules that directly connect to a given gene, defined as [equation not reproduced]. Biological process rarity (BPR) is a measure of the number of small molecules that connect to a given biological process separated by one gene, defined as [equation not reproduced]. The queries used to calculate GR and BPR are provided in the Supporting Information. Side effect proxy (SEP) is defined as the number of biological processes the small molecule is connected to in the graph. Cytokine storm (CS) is assigned the value of 1 when the small molecule connects to the cytokine storm node or 0 if it does not. Likewise, activated gene (AG) is given the value of 1 if the small molecule activates a gene or 0 if it does not. This scoring equation measures all of the interactions identified in our two-component approach (GR_CAS and BPR_CAS represent the results of component one identified by CAS scientists, and GR_EXP and BPR_EXP represent the upregulated-expression results of component two). Importance is given to genes and biological processes that are not connected to large numbers of small molecules in the graph (GR + BPR). These genes and biological processes were postulated as being of higher interest because they are targeted by fewer small molecules.
To normalize for the promiscuity of small molecules, a penalty is applied to each small molecule that scales with the number of biological processes it connects to in the graph (SEP). Due to the inherent importance of the cytokine storm module, a score boost was given to small molecules that connect to that node (CS). A score boost was also applied to small molecules that have an activating relationship with genes as this is a rare relationship (AG). The values of this equation can be fine-tuned based on the experimental objectives. In our final equation, the importance of the upregulated-expression results was lowered by dividing the score by 5; this number was derived empirically to increase the presence of clinical trial drugs in the top results. This adjustment placed more emphasis on the CAS scientist-defined biological process scores while still allowing the upregulated-expression scores to influence the final ranking. The 50 top-ranked potential drug repurposing candidates along with their drug class and clinical trial status are shown in Table 1. The individual score components for the top 50 small molecules are provided in Table S3, and the complete result set is provided in Table S4. The top 10 small molecules are visualized in the network diagram in Figure 4.
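To make the roles of the score components concrete, the sketch below combines them in one plausible way. It is not the published equation. Two things are assumptions made purely for illustration: that rarity is the inverse of the number of connected small molecules, and that the rarity sum is divided by SEP before the CS and AG boosts are added. Only the 0/1 definitions of CS and AG, the definition of SEP, and the division of the upregulated-expression contribution by 5 are taken from the text. The names `rarity`, `MoleculeEvidence`, and `score` are hypothetical.

```python
from dataclasses import dataclass

EXP_DOWNWEIGHT = 5.0  # from the text: upregulated-expression results divided by 5


def rarity(n_connected_molecules):
    # ASSUMPTION: rarity modeled as an inverse count, so genes or processes
    # targeted by fewer small molecules contribute more. Not the published formula.
    return 1.0 / n_connected_molecules


@dataclass
class MoleculeEvidence:
    cas_gene_counts: list      # molecules connected to each targeted gene (component one)
    cas_process_counts: list   # molecules connected to each targeted process (component one)
    exp_gene_counts: list      # same, for the upregulated-expression component
    exp_process_counts: list
    n_processes: int           # SEP: biological processes the molecule connects to
    hits_cytokine_storm: bool  # CS: 1 if connected to the cytokine storm node
    activates_gene: bool       # AG: 1 if the molecule activates a gene


def score(ev):
    """Illustrative score: rarity terms, SEP penalty, CS and AG boosts."""
    cas_term = sum(rarity(n) for n in ev.cas_gene_counts + ev.cas_process_counts)
    exp_term = sum(rarity(n) for n in ev.exp_gene_counts + ev.exp_process_counts)
    rarity_term = cas_term + exp_term / EXP_DOWNWEIGHT
    # ASSUMPTION: the promiscuity penalty is applied as division by SEP.
    penalized = rarity_term / max(ev.n_processes, 1)
    return penalized + int(ev.hits_cytokine_storm) + int(ev.activates_gene)


selective = MoleculeEvidence([10], [100], [], [], 2, True, False)
promiscuous = MoleculeEvidence([10], [100], [], [], 20, True, False)
print(score(selective) > score(promiscuous))
```

Under these assumptions, two molecules with identical targets differ only through SEP, so the more promiscuous one ranks lower, which is the qualitative behavior the text describes.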

Table 1

Top 50 Drug Repurposing Candidates with CAS Registry Number, Drug Name, Drug Class, and Clinical Trial Statusᵃ

Rank	CAS Registry Number	Drug Name	Drug Class	Clinical Trial
1	149647-78-9	vorinostat	HDAC inhibitors
2	179324-69-7	bortezomib	protease inhibitors
3	23214-92-8	doxorubicin	DNA metabolism-related
4	284461-73-0	sorafenib	kinase inhibitors
5	183321-74-6	erlotinib	kinase inhibitors
6	231277-92-2	lapatinib	kinase inhibitors
7	114977-28-5	docetaxel	microtubule-regulating agents
8	667463-62-9	MLS 2052	kinase inhibitors
9	404950-80-7	panobinostat	HDAC inhibitors
10	152459-95-5	imatinib	kinase inhibitors	yes
11	56-65-5	adenosine 5′ triphosphate	other
12	872511-34-7	BGJ 398	kinase inhibitors
13	2447-54-3	sanguinarine	other
14	1339928-25-4	fimepinostat	other
15	183506-66-3	apicidin	HDAC inhibitors
16	58880-19-6	trichostatin A	HDAC inhibitors
17	943319-70-8	ponatinib	kinase inhibitors
18	112953-11-4	7-hydroxystaurosporine	kinase inhibitors
19	1256448-47-1	nanatinostat	HDAC inhibitors
20	287383-59-9	scriptaid	HDAC inhibitors
21	1210608-43-7	PIM 447	kinase inhibitors
22	477600-75-2	tofacitinib	kinase inhibitors	yes
23	868540-17-4	carfilzomib	protease inhibitors
24	989-51-5	epigallocatechin gallate	DNA metabolism-related inhibitors	yes
25	23541-50-6	daunorubicin hydrochloride	DNA metabolism-related inhibitors
26	870262-90-1	letaxaban	coagulation factor Xa inhibitors
27	1195765-45-7	dabrafenib	kinase inhibitors
28	25316-40-9	doxorubicin hydrochloride	DNA metabolism-related inhibitors
29	491-80-5	biochanin	other
30	405169-16-6	dovitinib	kinase inhibitors
31	50-65-7	niclosamide	other	yes
32	957054-30-7	pictilisib	kinase inhibitors
33	1108743-60-7	entrectinib	kinase inhibitors
34	97-77-8	tetraethylthiuram disulfide	other	yes
35	75706-12-6	leflunomide	other	yes
36	726169-73-9	mocetinostat	HDAC inhibitors
37	637-03-6	phenylarsine oxide	other
38	1951-25-3	amiodarone	other	yes
39	630-60-4	ouabain	other
40	58-00-4	(−)-apomorphine	other
41	64-86-8	colchicine	microtubule-regulating agents	yes
42	90-34-6	primaquine	other	yes
43	936563-96-1	ibrutinib	kinase inhibitors	yes
44	31431-39-7	mebendazole	microtubule-regulating agents
45	361442-04-8	saxagliptin	protease inhibitors
46	1032900-25-6	ceritinib	kinase inhibitors
47	446-72-0	genistein	kinase inhibitors	yes
48	20830-81-3	daunorubicin	DNA metabolism-related
49	480449-70-5	edoxaban	coagulation factor Xa inhibitors
50	153436-53-4	tyrphostin AG 1478	kinase inhibitors

ᵃ Drugs that were difficult to classify are listed as “other”. The numbers of drugs in each class in the top 50 are: 18 kinase inhibitors, 7 HDAC inhibitors, 5 DNA metabolism-related, 3 microtubule-regulating agents, 3 protease inhibitors, 2 coagulation factor Xa inhibitors, and 12 in other classes.

Figure 4

Network diagram showing the connections of the top 10 scoring drugs from the results. Gene names in red represent genes that have a greater than 2-fold change in expression in response to SARS-CoV-2 infection. The size of the node corresponds to the number of connections to other nodes.

Among the top 50 drug repurposing candidates, 11 have been or are in clinical trials for treating COVID-19, thus supporting the validity of our results.[10] An error analysis was performed by varying constants of the equation to determine their effect on the number of clinically investigated small molecules in the top 50. We found that the values chosen for our constants supplied the highest number of clinically investigated small molecules within the top 50. Interestingly, AG had no effect on the number of clinical trial drugs present in the top 50 results. However, the impact of AG is still apparent as 46 out of the top 50 feature this rare connection. The error analysis results are described in the Supporting Information.

The largest class of drugs found in our results was kinase inhibitors, which accounted for 36% of the top 50 drug repurposing candidates in Table 1. The high prevalence of this drug class can be explained by the fact that kinases are involved in almost all biological processes, and their activities are dysregulated in many diseases.
As such, kinase inhibitors are one of the most studied drug classes in pharmacology.[11] Indeed, it has been estimated that 20–33% of all drug discovery research involves protein kinases alone.[12] Furthermore, kinases have long been shown to be involved in the viral infection process, including in coronavirus infections.[13] For instance, receptor tyrosine kinases are involved in the cell entry of many different viruses.[14] Bekerman et al. have shown that kinase inhibitors impair intracellular viral trafficking and exert broad-spectrum antiviral effects.[15] Inhibitors of kinases PKC, IRAK4, p38,[16] and GSK-3[17] suppress SARS-CoV-2 replication. Given this, the large number of kinase inhibitors in our top 50 results is within expectations, and the enrichment of this class is likely due to their high prevalence in drug discovery research. In this study, the kinase inhibitors we identified include those affecting receptor tyrosine kinases (RTKs) such as the EGF, FGF, PDGF, and ALK receptors as well as nonreceptor tyrosine kinases such as Bruton tyrosine kinase (BTK). Also included were serine/threonine kinases such as B-RAF, PKC, PIM, and GSK-3beta and lipid kinases such as phosphatidylinositol 3-kinase (PI3K). Four of these, the tyrosine kinase inhibitors imatinib, tofacitinib, ibrutinib, and genistein, have been or are in clinical trials for COVID-19. Additionally, Treon et al. found that ibrutinib (BTK inhibitor) may offer protection against the severe form of COVID-19 and may mitigate lung injury due to SARS-CoV-2.[18] Another of the larger drug classes from our top 50 results was histone deacetylase inhibitors (HDIs). 
This makes sense in relation to COVID-19 in that (1) HDACs regulate gene expression by reducing histone acetylation, and HDIs have been shown to reduce the expression of both angiotensin-converting enzyme 2 (ACE2), the main cell surface receptor for SARS-CoV-2, and the ABO glycosyltransferase, an enzyme regulating blood type, a known COVID-19 risk factor;[19] (2) HDACs regulate several of the chemokines and cytokines involved in the immune response in COVID-19;[20] and (3) the SARS-CoV-2 proteinase MPro directly binds to HDAC2.[21] Additionally, Liu et al.[22] showed that HDAC inhibitors such as romidepsin can block SARS-CoV-2 entry in a pseudotyped SARS-CoV-2 virus model. Further investigation is warranted, however, because HDACs have also been shown to be required for the transcription of interferon-stimulated genes and antiviral responses.[23] Microtubules are filaments composed of tubulin subunits. They are constantly going through the process of assembly and disassembly at their ends, giving them a dynamic and unstable quality.[24] Many studies have shown that SARS-CoV-2 proteins interact with microtubules or microtubule-associated proteins. For example, NSP13 interacts with many proteins in the centrosome, where microtubule minus ends are organized. The microtubule-regulating agents in Table 1, such as docetaxel, colchicine, and mebendazole, may therefore be of use in disrupting SARS-CoV-2 infection. In fact, colchicine (ranked 41 in Table 1), a microtubule polymerization blocker, and VERU-111, an α- and β-tubulin inhibitor/cytoskeleton disruptor, are currently in clinical trials for the treatment of COVID-19 patients. Another drug class shown in our results is protease inhibitors, most of which are proteasome inhibitors.
It has been previously shown that the ubiquitin-proteasome system (UPS) is involved in viral replication and the cytokine storm[25] including in coronavirus-associated diseases,[26] so it seems rational that proteasome inhibitors would be of value in treating COVID-19. Several such inhibitors are already being investigated as COVID-19 therapeutics, and several were found in our results (bortezomib, carfilzomib, and saxagliptin).[27] The category labeled “other” in Table 1 includes drugs that are difficult to classify. While we will not discuss most of these in detail, two, (−)-apomorphine and ouabain, were of interest. The dopamine agonist and aporphine-type alkaloid (−)-apomorphine is linked in the knowledge graph to the well-studied COVID-19 drug target called the sigma-1 receptor (Sigma1R, gene SIGMAR1). Sigma1R is a ligand-regulated membrane chaperone usually localized to endoplasmic reticulum (ER)-mitochondrial membrane junctions. It regulates many processes including protein folding, ER and oxidative stress, autophagy, and ion transport. Viruses often use cell stress pathways to aid replication, and accordingly, Sigma1R ligands have been studied as general antiviral agents for many years and, more recently, as anti-SARS-CoV-2 agents.[28] Further, the SARS-CoV-2 NSP6 protein directly binds to the sigma-1 receptor[21] and a sigma-1 receptor ligand, fluvoxamine, has been shown to reduce the chances of deterioration in patients with symptomatic COVID-19.[29] This suggests that (−)-apomorphine may be a worthwhile drug repurposing candidate for COVID-19. Another drug of interest in the other category, ouabain, is a Na+/K+-ATPase inhibitor. Na+/K+-ATPase is a membrane transporter that exports cellular sodium in exchange for importing potassium. It regulates cell–ion concentrations, cell volume, membrane potential, and reactive oxygen species.
While the mechanisms are not fully understood, many viruses, including coronaviruses, are known to be inhibited by Na+/K+-ATPase-inhibiting cardiac glycosides. Indeed, coronavirus cell entry is inhibited when the Na,K-ATPase alpha1 subunit is silenced or inhibited.[30] Further, a peptide (NaKtide) derived from the alpha1 subunit of Na+/K+-ATPase reduces the inflammatory cytokines present in chronic obesity and therefore may be of value in treating the cytokine storm often seen in severe COVID-19.[31] Indeed, others have recently provided supporting in vitro[32] and in vivo[33] evidence for the effectiveness of Na+/K+-ATPase inhibitors in inhibiting SARS-CoV-2.

Discussion

In this paper, we describe the construction of the CAS Biomedical Knowledge Graph and its application, along with a novel results-ranking method, in predicting potential drug repurposing candidates for COVID-19. The graph contains over 6 million nodes and 18 million relationships and is built on data from the CAS Content Collection and external databases. Overall, we identified and ranked 1350 small-molecule repurposing candidates and analyzed the top 50 in greater detail. The validity of the knowledge graph and ranking method is supported by the fact that 11 of the top 50 results have been or are currently in clinical trials for treating COVID-19 and that many of the drug classes for these small molecules are well known to play important roles in viral infections. While we focused on COVID-19 in this study, the CAS Biomedical Knowledge Graph described here can also be used to analyze other diseases such as Alzheimer’s, Parkinson’s, cancer, and even rare, or orphan, diseases. Beyond the life sciences, knowledge graphs building on our vast collection of scientific information can be applied in many areas of science, including other areas of chemistry, materials science, food science, energy technology, and environmental research. The advantages of knowledge graphs have become more widely known in the last 10 years. Most importantly, and beyond their use just as an information management system, knowledge graphs allow users to grasp information in a visual and intuitive way. Users can easily zoom in on specific modules, or subsets, of a large data set and then zoom back out to see how that subset fits in with the whole of the data. They can visually navigate through the pathways connecting data to see how modules, including nonadjacent ones, affect each other. This allows users a different perspective on a research problem.
Another important advantage of knowledge graphs is that because they are both modular and flexible, data sources can be substituted in and out, such as was done here by adding COVID-19 clinical trial data to the CAS graph. An example in pharmaceutical research of the nonadjacency benefit mentioned above is that by linking disease-associated pathways, the proteins in those pathways, and the small molecules that regulate them, a knowledge graph can enable a researcher who has identified a novel interaction between two proteins to quickly identify which pathways, biological processes, or diseases this interaction could alter. Use of the graph in this manner has the potential to greatly increase the speed of basic biomedical research. This same kind of approach allows life sciences researchers to identify a wider variety of drug targets “upstream” or “downstream” of those already known. Many of these targets may have been previously overlooked or were not considered to participate in other disease processes. The present results illustrate this nicely. If the goal is to reduce the blood coagulation associated with COVID-19, traditional methods may suggest only blood coagulation factors and the blood vessel wall proteins they directly interact with as targets. But our results also suggest histone deacetylases (HDACs) as possible targets because two HDACs have been linked to blood coagulation within the graph. By the same token, using a knowledge graph allows the prediction of potential upstream and downstream drug candidates. The widely known HDI vorinostat affects three proteins/genes involved in blood coagulation and can therefore be considered a potential repurposing candidate for treating COVID-19-related coagulopathy. 
That an HDI may reduce blood clotting is supported by the evidence that another HDI, valproic acid, upregulates expression of tissue-type plasminogen activator and reduces thrombus size after vascular injury.[34] In drug discovery research, therefore, the wider, more comprehensive view provided by knowledge graphs can lead to cost- and time-savings in initial drug screening. Of course, all identified drug candidates would still have to be validated by experimental and clinical testing. In addition to their strengths, knowledge graphs have some of the same limitations common to all data management systems. For instance, by linking modules in a complex network, they may give the false visual impression that they contain a complete picture of what is known about a subject. However, knowledge is always incomplete, so like all databases, they must be maintained and periodically updated. Furthermore, the power of a graph depends on the quality and comprehensiveness of the data sources used to build it. The equation developed for ranking small molecules in this study focuses on identifying relevant and significant small molecule-to-protein relationships. We hypothesized that small molecules with extensive connections to target proteins would be more likely to have significant side effects. An inherent drawback to this approach is that small molecules that could be effective in COVID-19 therapies may be down-ranked if they are highly connected. Our two-component query combined human-designed and data-driven approaches, which we hypothesize may allow our results to capture potentially unknown molecular mechanisms of COVID-19. The flexibility of our equation in combining this two-component approach allows us to independently adjust the importance of each component. For our final ranking, the CAS-designed component was given higher importance as we felt the chosen biological processes covered a more specific and important spectrum of COVID-19 pathology. 
This scoring could be altered as other applications require. We also explored the use of machine learning with our results. We employed a decision tree to determine which of the variables identified in our equation was most important for the prediction of a small molecule’s use in a COVID-19 clinical trial. We found that the side effect proxy was of highest importance followed by gene rarity and biological process rarity. The machine learning results also demonstrated the effectiveness of our equation in ranking the small molecules. The use of machine learning to improve the returned results could be of great value in a different application. While the CAS Biomedical Knowledge Graph contains a massive wealth of entities and relationships, there is also potential for improvement. One improvement that could be made is the addition of a new layer of information within the protein–protein interaction relationships. If these relationships included the effect one protein had on another, such as phosphorylation, activation, and/or physical inhibition, the graph would allow for more powerful queries. Searching for an inhibitor of an upstream activator of a protein involved in a disease is one such line of questioning this data would allow. Another addition planned for the graph is the expression level of genes in different tissues and cellular compartments. The accuracy of our queries for a given disease would increase if we could limit our search to genes expressed only in the tissue of interest. An example of this for COVID-19 would be restraining our results based on gene expression in the lungs. These are just two of many possible improvements, which will require additional data analytics work to integrate into the CAS Biomedical Knowledge Graph. We plan to leverage the data and expertise of CAS to add these capabilities to the CAS Biomedical Knowledge Graph in the future. 
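The two planned extensions described above, effect-typed protein–protein relationships and tissue-level expression, would make a query such as "find an inhibitor of an upstream activator of a disease protein, restricted to lung-expressed genes" possible. The sketch below shows the shape of such a query on hypothetical data. The protein, molecule, and disease names are invented, and the data model is an assumption for illustration, not the current or planned CAS schema.

```python
# Hypothetical effect-typed edges: (source, effect, target).
PROTEIN_EFFECTS = [
    ("KINASE_A", "activates", "PROTEIN_X"),
    ("PROTEIN_B", "inhibits", "PROTEIN_X"),
]
MOLECULE_EFFECTS = [
    ("mol_1", "inhibits", "KINASE_A"),
    ("mol_2", "inhibits", "PROTEIN_B"),
    ("mol_3", "activates", "KINASE_A"),
]
DISEASE_PROTEINS = {"disease_D": {"PROTEIN_X"}}
LUNG_EXPRESSED = {"KINASE_A", "PROTEIN_X"}  # hypothetical tissue annotation


def inhibitors_of_upstream_activators(disease, tissue_genes=None):
    """Return (molecule, upstream activator, disease protein) triples.

    If tissue_genes is given, only upstream activators expressed in that
    tissue are considered.
    """
    hits = set()
    for disease_protein in DISEASE_PROTEINS.get(disease, ()):
        for upstream, effect, target in PROTEIN_EFFECTS:
            if target != disease_protein or effect != "activates":
                continue
            if tissue_genes is not None and upstream not in tissue_genes:
                continue
            for molecule, mol_effect, mol_target in MOLECULE_EFFECTS:
                if mol_target == upstream and mol_effect == "inhibits":
                    hits.add((molecule, upstream, disease_protein))
    return hits


print(inhibitors_of_upstream_activators("disease_D", tissue_genes=LUNG_EXPRESSED))
```

Without the effect labels, `mol_2` and `mol_3` would be indistinguishable from `mol_1`, since all three sit two relationships away from the disease protein; the typed edges are what let the query keep only the molecule expected to reduce its activity.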
In conclusion, we have leveraged a century’s worth of CAS scientific information curation expertise to create the CAS Biomedical Knowledge Graph, which combines an extensive list of small molecules, present in the CAS Content Collection, with external databases of human genes, molecular processes, pathways, and diseases. We used the CAS Biomedical Knowledge Graph to identify 1350 small molecules with potential to be repurposed as COVID-19 therapeutics. Because knowledge graphs are both scalable and modular, this application to COVID-19 is only one example of the vast array of possible uses for CAS-powered knowledge graphs.

Materials and Methods

CAS Biomedical Knowledge Graph

The CAS Biomedical Knowledge Graph was constructed in Neo4j (DBMS Version 4.1.0). In total, the graph contains over 6 million nodes and 18 million relationships. During data ingestion, all references to proteins and genes were normalized to their NCBI/HUGO gene abbreviation and all small-molecule references were normalized to their CAS Registry Number.
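Normalizing small-molecule references to CAS Registry Numbers presupposes well-formed identifiers. As a small illustration (not a description of the authors' ingestion pipeline), the sketch below validates a CAS Registry Number using its standard check-digit rule: the last digit equals the sum of the preceding digits, each multiplied by its position counted from the right, modulo 10. The function name `is_valid_cas_rn` is hypothetical.

```python
import re

# CAS Registry Number layout: 2 to 7 digits, 2 digits, 1 check digit.
CAS_RN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_rn(cas_rn):
    """Check the format and check digit of a CAS Registry Number string."""
    match = CAS_RN_PATTERN.match(cas_rn.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    weighted_sum = sum(
        position * int(digit)
        for position, digit in enumerate(reversed(digits), start=1)
    )
    return weighted_sum % 10 == check_digit


# Registry numbers taken from Table 1 (niclosamide, colchicine, imatinib).
for rn in ("50-65-7", "64-86-8", "152459-95-5"):
    print(rn, is_valid_cas_rn(rn))
```

For niclosamide (50-65-7), the weighted sum is 5×1 + 6×2 + 0×3 + 5×4 = 37, and 37 mod 10 = 7, which matches the final digit.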

Small Molecules and Bioactivity Data

Over 6 million small molecules were added to the graph from the CAS Content Collection. All small molecules were cross-referenced with data from PubChem and ChEMBL. Data scientists at CAS then connected small-molecule nodes to gene nodes where experimental assays have been performed linking the two, with details from these assays stored in the small molecule-to-gene relationships. These details include information about the activity being measured, raw values from the assay, and the source of the experimental data. Over 10 million relationships between small molecules and genes are present in the graph. Side effects associated with the small molecules were obtained from SIDER (version 4.1) and ingested into the graph.

Human Genes, Viral Genes, and Viruses

Gene nodes serve as a representation of a gene’s DNA, RNA, and protein forms. Over 26 000 human and viral genes were obtained from the UniProt database and stored in the graph using their NCBI/HUGO gene abbreviations. Protein–protein interactions between human proteins were obtained from STRING-DB (version 11.0) resulting in over 5 million protein–protein interactions (PPIs) with STRING-DB confidence values stored in each relationship. Over 1000 virus nodes were added from UniProt, which were linked to the viral genes they express.

Diseases, Pathways, Molecular Functions, and Biological Processes

Over 24 000 human disease designations, obtained from NCBI’s MedGen, were added to the graph. A hierarchy of disease inheritance was established using MedGen’s parent–child relationships of diseases (over 14 000 links). Disease–gene associations were also obtained from MedGen, resulting in over 5 million connections. Pathways and pathway–gene associations were obtained from NCBI, resulting in over 8000 pathways and over 121 000 links between genes and pathways. Molecular functions (over 4000) and biological processes (over 12 000) were obtained from the Gene Ontology knowledgebase along with their gene associations (over 59 000 and over 138 000, respectively).
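To make the role of the disease hierarchy concrete, the sketch below shows one way that parent–child disease links could be combined with disease–gene associations: a query for a parent disease also collects the genes associated with any of its descendants. This is a plain-Python illustration under assumed data structures, not the Cypher used in this work, and all disease and gene identifiers in it are placeholders.

```python
from collections import deque
from typing import Dict, Iterable, Set


def descendants(root: str, children: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the root disease plus every disease reachable via child links."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:  # guards against repeated or cyclic links
                seen.add(child)
                queue.append(child)
    return seen


def genes_for_disease(
    root: str,
    children: Dict[str, Iterable[str]],
    disease_genes: Dict[str, Set[str]],
) -> Set[str]:
    """Collect the genes associated with a disease or any of its descendants."""
    genes: Set[str] = set()
    for disease in descendants(root, children):
        genes |= disease_genes.get(disease, set())
    return genes


if __name__ == "__main__":
    # Placeholder hierarchy and associations, for illustration only.
    children = {"DISEASE_A": ["DISEASE_B", "DISEASE_C"], "DISEASE_B": ["DISEASE_D"]}
    disease_genes = {
        "DISEASE_A": {"GENE_1"},
        "DISEASE_C": {"GENE_2"},
        "DISEASE_D": {"GENE_2", "GENE_3"},
    }
    print(sorted(genes_for_disease("DISEASE_A", children, disease_genes)))
```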

SARS-CoV-2-Specific Data

We identified several data sources that were used to establish connections between SARS-CoV-2 and the CAS Biomedical Knowledge Graph. One such data source measured human gene expression-level changes in response to SARS-CoV-2 infection.[9] These data were added as relationships between the SARS-CoV-2 virus and the human genes (over 18 000 links), with each relationship storing the expression fold change and p value. We also added SARS-CoV-2- and COVID-19-related clinical trial information obtained from clinicaltrials.gov. Relationships in the graph were generated between each clinical trial, the diseases/viruses being investigated, and the small molecules used in the trial. In addition, biological processes related to COVID-19 that were identified in our previous work[8] were included.
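Because each virus-to-gene relationship carries a fold change and a p value, such relationships can be filtered at query time. The following sketch illustrates the idea in plain Python. The record layout, the use of a log2 scale, and the cutoffs (an absolute log2 fold change of at least 1 and a p value of at most 0.05) are assumptions chosen for illustration and are not taken from this work; the example records are invented.

```python
from typing import Iterable, List, NamedTuple


class ExpressionEdge(NamedTuple):
    """One hypothetical SARS-CoV-2 -> human gene relationship."""
    gene: str
    log2_fold_change: float  # assumed to be stored on a log2 scale
    p_value: float


def differentially_expressed(
    edges: Iterable[ExpressionEdge],
    min_abs_log2_fc: float = 1.0,  # assumed cutoff
    max_p_value: float = 0.05,     # assumed cutoff
) -> List[str]:
    """Return genes whose expression change passes both cutoffs."""
    return [
        edge.gene
        for edge in edges
        if abs(edge.log2_fold_change) >= min_abs_log2_fc
        and edge.p_value <= max_p_value
    ]


if __name__ == "__main__":
    # Invented records, for illustration only.
    edges = [
        ExpressionEdge("GENE_1", 2.3, 0.001),   # up-regulated, significant
        ExpressionEdge("GENE_2", -1.4, 0.020),  # down-regulated, significant
        ExpressionEdge("GENE_3", 0.2, 0.001),   # change too small
        ExpressionEdge("GENE_4", 3.0, 0.400),   # not significant
    ]
    print(differentially_expressed(edges))
```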

Graph Queries and Image Preparation

Graph queries were performed in Neo4j using the Cypher query language. Small-molecule results were filtered to ensure that they met the following three criteria: (1) the small molecule has been identified by CAS scientists as having pharmacological activity; (2) the assay that generated the small molecule-to-gene relationship measured IC50, EC50, Kd, or potency values; and (3) the raw value reported by the assay was 10 μM or lower. Query code and returned results can be found in the Supporting Information. All network graph images were generated using Cytoscape (version 3.8.2).
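The three filtering criteria described above can be expressed compactly as a predicate over small molecule-to-gene relationships. The sketch below is a plain-Python illustration, not the Cypher provided in the Supporting Information; the `AssayEdge` record, its field names, and the unit conversion table are hypothetical, and the example records are invented.

```python
from dataclasses import dataclass

# Measurement types accepted by criterion (2).
ALLOWED_MEASURES = {"IC50", "EC50", "Kd", "Potency"}

# Conversion factors to micromolar (assumed unit labels).
UNIT_TO_MICROMOLAR = {"nM": 1e-3, "uM": 1.0, "mM": 1e3}


@dataclass(frozen=True)
class AssayEdge:
    """Hypothetical small molecule-to-gene relationship."""
    cas_rn: str
    gene: str
    measure: str
    value: float
    unit: str
    pharmacologically_active: bool  # criterion (1), assumed to be a flag


def passes_filters(edge: AssayEdge, threshold_um: float = 10.0) -> bool:
    """Apply criteria (1)-(3): active, accepted measure, value <= 10 uM."""
    if not edge.pharmacologically_active:
        return False
    if edge.measure not in ALLOWED_MEASURES:
        return False
    factor = UNIT_TO_MICROMOLAR.get(edge.unit)
    if factor is None:  # unknown unit: exclude rather than guess
        return False
    return edge.value * factor <= threshold_um


if __name__ == "__main__":
    # Invented records, for illustration only.
    edges = [
        AssayEdge("MOLECULE_1", "GENE_1", "IC50", 500.0, "nM", True),
        AssayEdge("MOLECULE_2", "GENE_1", "IC50", 0.02, "mM", True),
        AssayEdge("MOLECULE_3", "GENE_2", "Ki", 1.0, "uM", True),
        AssayEdge("MOLECULE_4", "GENE_2", "Kd", 1.0, "uM", False),
    ]
    print([edge.cas_rn for edge in edges if passes_filters(edge)])
```

Converting every value to a single unit before applying the 10 μM cutoff avoids silently mixing nanomolar and micromolar results; relationships with unrecognized units are excluded rather than guessed.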

27 in total

1. Valproic acid selectively increases vascular endothelial tissue-type plasminogen activator production and reduces thrombus formation in the mouse.

Authors: P Larsson; I Alwis; B Niego; M Sashindranath; P Fogelstrand; M C L Wu; L Glise; M Magnusson; M Daglas; N Bergh; S P Jackson; R L Medcalf; S Jern
Journal: J Thromb Haemost Date: 2016-11-23 Impact factor: 5.824

2. Fluvoxamine vs Placebo and Clinical Deterioration in Outpatients With Symptomatic COVID-19: A Randomized Clinical Trial.

Authors: Eric J Lenze; Caline Mattar; Charles F Zorumski; Angela Stevens; Julie Schweiger; Ginger E Nicol; J Philip Miller; Lei Yang; Michael Yingling; Michael S Avidan; Angela M Reiersen
Journal: JAMA Date: 2020-12-08 Impact factor: 56.272

Review 3. Proteasome Inhibitors as a Possible Therapy for SARS-CoV-2.

Authors: Lucia Longhitano; Daniele Tibullo; Cesarina Giallongo; Giacomo Lazzarino; Nicola Tartaglia; Sara Galimberti; Giovanni Li Volti; Giuseppe Alberto Palumbo; Arcangelo Liso
Journal: Int J Mol Sci Date: 2020-05-20 Impact factor: 5.923

Review 4. Regulation of Chemokines and Cytokines by Histone Deacetylases and an Update on Histone Decetylase Inhibitors in Human Diseases.

Authors: Himavanth Reddy Gatla; Nethaji Muniraj; Prashanth Thevkar; Siddhartha Yavvari; Sahithi Sukhavasi; Monish Ram Makena
Journal: Int J Mol Sci Date: 2019-03-05 Impact factor: 5.923

Review 5. Cell Clearing Systems as Targets of Polyphenols in Viral Infections: Potential Implications for COVID-19 Pathogenesis.

Authors: Fiona Limanaqi; Carla Letizia Busceti; Francesca Biagioni; Gloria Lazzeri; Maurizio Forte; Sonia Schiavon; Sebastiano Sciarretta; Giacomo Frati; Francesco Fornai
Journal: Antioxidants (Basel) Date: 2020-11-10

Review 6. Repurposing Sigma-1 Receptor Ligands for COVID-19 Therapy?

Authors: José Miguel Vela
Journal: Front Pharmacol Date: 2020-11-09 Impact factor: 5.810

7. Drug repurposing for COVID-19 via knowledge graph completion.

Authors: Rui Zhang; Dimitar Hristovski; Dalton Schutte; Andrej Kastrin; Marcelo Fiszman; Halil Kilicoglu
Journal: J Biomed Inform Date: 2021-02-08 Impact factor: 8.000

8. Antiviral activity of oleandrin and a defined extract of Nerium oleander against SARS-CoV-2.

Authors: Kenneth S Plante; Varun Dwivedi; Jessica A Plante; Diana Fernandez; Divya Mirchandani; Nathen Bopp; Patricia V Aguilar; Jun-Gyu Park; Paula Pino Tamayo; Jennifer Delgado; Vinay Shivanna; Jordi B Torrelles; Luis Martinez-Sobrido; Rick Matos; Scott C Weaver; K Jagannadha Sastry; Robert A Newman
Journal: Biomed Pharmacother Date: 2021-03-03 Impact factor: 6.529

9. Antiviral activity of digoxin and ouabain against SARS-CoV-2 infection and its implication for COVID-19.

Authors: Junhyung Cho; Young Jae Lee; Je Hyoung Kim; Sang Il Kim; Sung Soon Kim; Byeong-Sun Choi; Jang-Hoon Choi
Journal: Sci Rep Date: 2020-10-01 Impact factor: 4.379

Review 10. Drug repurposing for COVID-19: Approaches, challenges and promising candidates.

Authors: Yan Ling Ng; Cyrill Kafi Salim; Justin Jang Hann Chu
Journal: Pharmacol Ther Date: 2021-06-23 Impact factor: 12.310
