
Network bioinformatics analysis provides insight into drug repurposing for COVID-19.

Xu Li¹, Jinchao Yu², Zhiming Zhang¹, Jing Ren¹, Alex E Peluffo², Wen Zhang¹, Yujie Zhao¹, Jiawei Wu¹, Kaijing Yan¹, Daniel Cohen², Wenjia Wang¹.

Abstract

The COVID-19 disease caused by the SARS-CoV-2 virus is a health crisis worldwide. While developing novel drugs and vaccines takes a long time, repurposing existing drugs against COVID-19 can yield treatments with known preclinical, pharmacokinetic, pharmacodynamic, and toxicity profiles, which can rapidly enter clinical trials. In this study, we present a novel network-based drug repurposing platform to identify candidates for the treatment of COVID-19. At the time of the initial outbreak, knowledge about SARS-CoV-2 was lacking, but based on its similarity with other viruses, we sought to identify repurposing candidates to be tested rapidly at the clinical or preclinical levels. We first analyzed the genome sequence of SARS-CoV-2 and confirmed SARS as the closest virus by genome similarity, followed by MERS and other human coronaviruses. Using text mining and database searches, we obtained 34 COVID-19-related genes to seed the construction of a molecular network where our module detection and drug prioritization algorithms identified 24 disease-related human pathways, five modules, and 78 drugs to repurpose. Based on clinical knowledge, we re-prioritized 30 potentially repurposable drugs against COVID-19 (including pseudoephedrine, andrographolide, chloroquine, abacavir, and thalidomide). Our work shows how in silico repurposing analyses can yield testable candidates to accelerate the response to novel disease outbreaks.

Entities: Chemical

Keywords: COVID-19; Drug repurposing; Network bioinformatics; Network pharmacology; SARS-CoV-2

Year: 2021 PMID: 33817623 PMCID: PMC8008783 DOI: 10.1016/j.medidd.2021.100090

Source DB: PubMed Journal: Med Drug Discov ISSN: 2590-0986

Introduction

The COVID-19 disease outbreak caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), formerly named “2019 novel coronavirus” (2019-nCoV), had already infected more than 108 million people and caused 2 million deaths worldwide by February 2021 [1]. While several vaccines have become available, the fight against the COVID-19 pandemic is still highly challenging because of the virus's emerging mutant strains, the difficulties of manufacturing and distributing vaccines, and more [2]. Small molecule drug research, and drug repurposing in particular [3], therefore remains an important way to find therapies rapidly. At the time of the outbreak, which was also the time of this study, drug repurposing was one of the best strategies to rapidly explore efficient therapies against COVID-19. Drug repurposing can yield new therapies at a faster rate than novel drug discovery when the safety profiles of the drugs being repurposed have been evaluated in the context of drug development for another disease, and at an even faster rate when the drugs have been approved for other diseases and postmarketing safety surveillance data are available [4], [5]. By relying on already known preclinical, pharmacokinetic, pharmacodynamic, and toxicity profiles of the drugs being repurposed, one can dramatically speed up the response against a disease with unmet clinical needs, especially an epidemic disease, where drugs proven safe can be tested immediately. At the beginning of the pandemic, in February 2020, more than 10 repurposed drugs were under evaluation in clinical trials for COVID-19. Among them were Remdesivir (Gilead Sciences, in Phase 3, clinical trial No. NCT04257656), originally developed to treat Ebola, which showed inhibition of replicases in a broad range of viruses including coronaviruses, and chloroquine (in Phase 4, ChiCTR2000029975), originally approved as an antimalarial and autoimmune disease drug, which, unlike Remdesivir, does not target viral proteins but works as a human endosomal acidification fusion inhibitor, which may help to stop the virus's infection lifecycle [6].
In silico methods offer a way to methodically and rapidly yield additional repurposing candidates [7]. For instance, when drug targets associated with the disease of interest are known, and when their protein structures or those of close homologs are available, it is possible to use structural bioinformatics to virtually screen (e.g., using molecular docking) a library of existing drugs against these known targets [8]. A study published on February 27, 2020, relied on this approach, using the predicted structures of all SARS-CoV-2 proteins based on their homology with other known coronavirus protein structures, and identified several compounds with potential antiviral activity [9]. Another approach to repurposing is the construction of so-called “disease-related molecular networks,” i.e., interactions between gene products (sometimes together with cellular metabolites) involved in the etiology and symptoms of that disease [10]. There exist several ways to identify disease-related genes, whether using genomic data (e.g., Genome-Wide Association Studies), gene expression data (e.g., RNAseq differential expression analysis) or data directly collected from the scientific literature (e.g., text mining or expert curation, either analyzed in-house or via recognized structured databases). 
Compared to virtual screening, where the candidate targets are known from the start, network biology methods can identify additional, unanticipated targets, which are part of the same molecular pathways as previously known targets for the disease of interest [7], [11]. In this study, we performed network bioinformatics analyses to repurpose existing drugs, which are at the completed Phase 2 stage or later, against the now-pandemic COVID-19. At the time of the outbreak, our goal was to yield a list of experimentally testable repurposing drug candidates, despite the fact that little was known about SARS-CoV-2, by supplementing that little knowledge with extensive data on closely related viruses and machine-learning analysis of those data. Therefore, because limited knowledge about COVID-19 was available in late January 2020, we focused our work on similar pathogens as indirect cues to identify COVID-19-related genes and build a molecular network that could support the identification of repurposable drug targets. We first relied on genome sequence alignment of SARS-CoV-2 to identify SARS-CoV (Severe Acute Respiratory Syndrome Coronavirus) as the most similar virus, followed by MERS-CoV (Middle East Respiratory Syndrome Coronavirus) and other related human coronaviruses. We then applied our AutoSeed program, which performed text mining against all NCBI PubMed abstracts (referenced before January 2020) and systematic database searches, which led to 34 COVID-19-related genes, including ACE2. To study these disease genes and their role at the systems level, we used an iterative network-building algorithm, “AutoNet”, that expands, prunes and merges subnetworks, leading to a human COVID-19 disease network composed of 1344 genes. In total, 24 enriched pathways were identified in five topological network modules (i.e., community structure, a region where nodes are more densely connected, more likely to be related to the same function or disease [12]). 
We scanned this network for known drug-target interactions and applied proximity-based topology analysis [13] to obtain a list of 78 drugs repurposable against COVID-19. Finally, we manually filtered this list based on the criteria of the drugs’ mechanisms of action, their adverse effects, and clinical approvals to yield a total of 30 drugs. In this study, we also discuss the repurposing and mechanisms of thalidomide in particular, since, after sharing our findings with multiple institutions and hospitals in China, one care unit reported the remission of a patient treated with this drug together with low-dose glucocorticoids. In addition, two clinical trials of thalidomide were registered.

Results

Genome sequence analysis suggests SARS as the most similar disease

After performing a BLASTn search using the SARS-CoV-2 (a.k.a. 2019-nCoV at the time of the analysis) genome sequence against the NCBI GenBank database (see Methods), representative sequences from top results, all being coronaviruses either in humans or other animals, were selected to build a phylogenetic tree using the neighbor-joining method (Fig. 1). We found SARS-CoV to be the evolutionarily closest sequence to SARS-CoV-2, with an 80% sequence identity. Among all other human coronaviruses, MERS-CoV is evolutionarily closest to SARS-CoV-2, with a 50% sequence identity. Importantly, we performed this analysis in January 2020, when the virus was less known and studied. Since then, multiple additional sequencing studies have been performed for SARS-CoV-2, including a landmark preprint, which suggested renaming 2019-nCoV to SARS-CoV-2 based on results similar to ours [14].
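To make the notion of "sequence identity" concrete, the following is a minimal sketch of how percent identity can be computed from two already-aligned sequences. It is not the BLASTn computation used in this study (BLAST reports identity over its own local alignments); the function name `percent_identity` and the choice to skip gapped columns are assumptions made for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise or multiple alignment,
    so they have equal length and column i of one matches column i of the other.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are ignored in this simplified definition
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy example: three of four ungapped columns agree.
print(percent_identity("AAAA", "AAAT"))
```

Other definitions (for example, counting gaps as mismatches or normalizing by the shorter sequence) give different values, so reported identities should always be read together with the tool that produced them.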

Fig. 1

Sequence analysis suggests SARS-CoV as the most similar virus to SARS-CoV-2. Based on the results of BLASTn for SARS-CoV-2 against NCBI GenBank, nineteen genome sequences were selected as representative and were aligned using EMBL-EBI’s MSA tool, and a neighbour-joining phylogenetic tree was built by the MEGA-X tool. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown above the branches. The scale represents 0.10 residue substitutions per site.

Text mining and database searches yield a list of 34 seed genes

In this step, we aimed to identify a list of human genes that are involved in the COVID-19 disease (Fig. 2A) and built a web tool based on a literature search engine, which is freely accessible at http://literature.tasly.com/covid19. Considering SARS-CoV as the closest virus to SARS-CoV-2, we used SARS as the first keyword for text mining against the NCBI PubMed database. We searched for all human genes co-occurring with the keyword “SARS” (abbreviations, full names, or synonyms) within any sentence (a.k.a. “sentence co-occurrence” in NLP methods). We then ranked all genes based on their SARS co-occurrence counts.
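The sentence co-occurrence ranking described above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: sentences are split with a naive regular expression rather than an NLP tokenizer, gene mentions are matched as exact, case-sensitive symbols from a supplied set, and a gene is counted once per abstract in which it shares a sentence with the keyword. The function name `cooccurrence_counts` and the toy abstracts are hypothetical.

```python
import re
from collections import Counter


def cooccurrence_counts(abstracts, keyword_pattern, gene_symbols):
    """Count, per gene, the abstracts in which it shares a sentence with the keyword."""
    keyword = re.compile(keyword_pattern, re.IGNORECASE)
    counts = Counter()
    for text in abstracts:
        genes_in_abstract = set()
        # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if not keyword.search(sentence):
                continue
            tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
            genes_in_abstract |= tokens & gene_symbols
        counts.update(genes_in_abstract)  # each abstract contributes at most 1 per gene
    return counts


toy_abstracts = [
    "ACE2 is the receptor for SARS coronavirus. TNF was measured.",
    "SARS infection raises TNF and CRP levels.",
    "ACE2 binds the SARS spike protein. ACE2 matters in SARS.",
]
ranking = cooccurrence_counts(toy_abstracts, r"\bSARS\b", {"ACE2", "TNF", "CRP"})
for gene, n in ranking.most_common():
    print(gene, n)
```

In the first toy abstract, TNF appears in a sentence without the keyword and is therefore not counted for that abstract, which is the distinction between sentence-level and abstract-level co-occurrence.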

Fig. 2

The workflow of our network bioinformatics pipeline for SARS-CoV-2 drug repurposing. The equation shown in (D) represents how the distance was calculated to prioritize drugs based on proximity; see details in Methods.

To enrich our text mining results, we added four other terms: “MERS”, “coronavirus”, “viral pneumonia”, and “HIV” (Human Immunodeficiency Virus). We chose MERS because of its close similarity to SARS-CoV-2 (Fig. 1) and the fact that it has been studied for a long time. “Coronavirus” and “viral pneumonia” were selected because they are highly related to the nature and symptoms of SARS and COVID-19, to the point that, in China and other regions of Asia, the synonyms of SARS and COVID-19 often contain the words “viral pneumonia”. Although HIV does not belong to coronaviruses, “HIV” was used as a keyword because it was previously reported that HIV and SARS share similar viral protein structures [15] and that HIV drugs can be effective against SARS [16]. In addition, there exists an extensive research and publication record on HIV, which can enrich our text mining analysis. For these four additional terms, the same co-occurrence analysis was performed, except that only the top 10% of each resulting list was retained. Therefore, the final text-mining-based list was made from the full SARS-related gene output list combined with these four top-10%-retained gene lists (see Data availability for sources of extracted texts and papers). In addition to our in-house text mining analysis, to enrich our search for SARS-related genes, we also searched for the five aforementioned keywords in reference databases including DisGeNET, DrugBank, KEGG, MalaCards, eDGAR, and GWAS-Catalog, because these databases integrate text-mining results with expert-curated information from different aspects, including pathways, genetic factors, and animal models (see Methods). A final list of seed genes was built by overlapping text-mining and database results (see Methods). 
This list contains 34 genes (shown as a network in Fig. 3; also see Data availability). Among them, 23 genes are directly linked to SARS. Two genes, CRP and TNF, connect to all keywords. Seven genes, STAT1, CCL5, ACE2, IRF3, CXCL10, CTSL and TMPRSS2, are linked to four keywords (including SARS).
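The list-combination logic described above (the full SARS list plus the top 10% of each of the four other keyword lists, followed by an overlap with the database-derived list) can be sketched as below. Two points are assumptions rather than details given in the text: the top-10% cut-off is rounded up, and "overlapping" is interpreted as a set intersection. All function names and gene lists are hypothetical.

```python
def top_percent(ranked_genes, percent=10):
    """Keep the top `percent` of a ranked list, rounding the cut-off up (assumption)."""
    keep = (len(ranked_genes) * percent + 99) // 100  # integer ceiling, no float error
    return ranked_genes[:keep]


def combine_keyword_lists(primary_ranked, other_ranked, percent=10):
    """Full primary (SARS) list plus the top `percent` of every other keyword list."""
    combined = set(primary_ranked)
    for ranked in other_ranked.values():
        combined.update(top_percent(ranked, percent))
    return combined


def seed_genes(text_mining_genes, database_genes):
    """Interpret 'overlapping' as the intersection of the two sources (assumption)."""
    return text_mining_genes & database_genes


# Toy ranked lists, most frequently co-occurring gene first.
sars = ["ACE2", "TNF"]
others = {
    "MERS": ["DPP4"] + [f"M{i}" for i in range(9)],          # 10 genes, keep 1
    "HIV": ["CCR5", "CD4"] + [f"H{i}" for i in range(18)],   # 20 genes, keep 2
}
text_mining = combine_keyword_lists(sars, others)
print(sorted(seed_genes(text_mining, {"ACE2", "CCR5", "XYZ"})))
```

Treating the overlap as an intersection keeps only genes supported by both sources; a looser rule, such as a union with a minimum evidence count, would produce a longer seed list.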

Fig. 3

Thirty-four genes related to SARS-CoV-2 identified by text mining and database searches. Each link represents at least one sentence co-occurrence in PubMed abstracts or at least one relationship recorded in one of our searched databases.

Network bioinformatics approach helps to predict 30 repurposable drugs

In order to contextualize and better understand, at a systems level, the molecular and physiological role of the COVID-19-related genes we found, we applied an in-house developed algorithm to build a molecular (i.e., protein) network taking these 34 genes as seeds. This algorithm repeats subnetwork expanding, merging, and pruning in an iterative manner, controlled by pathway enrichment analysis (see Fig. 2B and Methods). In this way, we obtained a final protein network of 1344 genes and 24 enriched pathways (see Data availability). The Newman greedy heuristic module detection algorithm was applied to the network, leading to five modules, representing the T cell receptor signaling pathway, JAK-STAT signaling pathway, C-type lectin receptor signaling pathway, Chemokine signaling pathway and Endocytosis (Fig. 2C). Finally, DrugBank’s drug-target interactions were added to the protein network, resulting in a heterogeneous molecular network, over which proximity-based network analysis [13] identified a list of 78 repurposable drugs (see Data availability). Having obtained these 78 drugs, we looked for more information, including clinical drug status, drug category, and adverse effects, in the Yaozh (https://data.yaozh.com/) and DrugBank [17] databases. The former database provides more China-related information, including clinical trials in China, traditional Chinese medicine usage and theory, approvals by NMPA (National Medical Products Administration, formerly known as CFDA – China Food and Drug Administration) in China, and studies only published in Chinese, while the latter reports the approval process by the U.S. FDA (Food and Drug Administration), known targets, therapeutic effects as well as basic chemical information [17]. 
Through a literature review, we identified a list of important symptoms and mechanisms linked to SARS-CoV-2, including fever, fatigue, cough [18], breathing difficulty, septic shock, viral proliferation, immunodeficiency and pulmonary fibrosis [19] (Fig. 4). We manually removed a drug from our list if it did not have any reported effect on any of these key symptoms and mechanisms. We also removed a drug from our list if it had strong reported side effects. We also filtered out drugs for which there is little scientific knowledge. After removing these drugs deemed unfit for rapid repurposing, we obtained a list of 30 drugs (Table 1).

Fig. 4

Symptoms and mechanisms related to SARS-CoV-2 and the corresponding categories of our 30 suggested drugs.

Table 1

Thirty predicted drug candidates to repurpose against COVID-19. Type and Group were obtained by querying DrugBank. Initial ranks came from our proximity-based drug prioritization algorithm. Categories were obtained from Yaozh (a drug database in China), DrugBank and manual curation when the data were not available in either of these databases.

DrugBank ID	Drug name	Type	Group	Initial rank	Category
DB00852	Pseudoephedrine	small molecule	approved	1	antipyretic or analgesic; antiasthmatic; anti-inflammatory
DB05767	Andrographolide	small molecule	investigational	2	antipyretic or analgesic; antiviral; anti-bacterial; anti-inflammatory
DB05513	Atiprimod	small molecule	investigational	3	immunomodulator
DB05017	YSIL6	small molecule	investigational	8	immunomodulator
DB06083	Tapinarof	small molecule	investigational	11	anti-inflammatory
DB00005	Etanercept	biotech drug	approved; investigational	12	antipyretic or analgesic; anti-inflammatory
DB00051	Adalimumab	biotech drug	approved	13	antipyretic or analgesic; anti-inflammatory
DB00065	Infliximab	biotech drug	approved	14	anti-inflammatory
DB00608	Chloroquine	small molecule	approved; investigational; vet_approved	15	anti-bacterial; anti-inflammatory
DB00668	Epinephrine	small molecule	approved; vet_approved	16	antiasthmatic
DB01041	Thalidomide	small molecule	approved; investigational; withdrawn	17	anti-fibrosis; immunomodulator
DB01407	Clenbuterol	small molecule	approved; investigational; vet_approved	18	antiasthmatic
DB01411	Pranlukast	small molecule	investigational	19	antiasthmatic
DB04956	Afelimomab	biotech drug	investigational	21	immunomodulator
DB06674	Golimumab	biotech drug	approved	32	antipyretic or analgesic; anti-inflammatory
DB09036	Siltuximab	biotech drug	approved; investigational	35	antiviral
DB01250	Olsalazine	small molecule	approved	36	anti-inflammatory
DB12698	Ibalizumab	biotech drug	approved; investigational	39	anti-HIV
DB01327	Cefazolin	small molecule	approved	43	anti-bacterial
DB01048	Abacavir	small molecule	approved; investigational	50	anti-HIV
DB02375	Myricetin	small molecule	experimental	51	anti-inflammatory
DB04464	N-Formylmethionine	small molecule	experimental	52	immunomodulator
DB06475	Ruplizumab	biotech drug	investigational	54	immunomodulator
DB00452	Framycetin	small molecule	approved	60	anti-bacterial
DB01009	Ketoprofen	small molecule	approved; vet_approved	65	anti-inflammatory
DB04835	Maraviroc	small molecule	approved; investigational	66	anti-HIV
DB06652	Vicriviroc	small molecule	investigational	69	anti-HIV
DB00172	Proline	small molecule	approved; nutraceutical	75	nutrition
DB04216	Quercetin	small molecule	experimental; investigational	76	antiasthmatic
DB11638	Artenimol	small molecule	experimental; investigational	78	anti-bacterial


Results sharing and case analysis

In order to help fight COVID-19 as fast as possible, we first publicly shared our list of 78 drugs (see Data availability) and our list of 24 enriched pathways (see Data availability), and we briefly explained our approach to healthcare professionals and hospitals, via GeneNet company’s WeChat Chinese blog, on February 12, 2020. At the time, we put forward pseudoephedrine, andrographolide, chloroquine, abacavir, baricitinib, and quercetin as repurposing candidates from our list, because other studies also suggested or predicted these drugs, mainly according to our Yaozh database search and literature review. Chloroquine has been considered one of the most promising repurposed drugs and is currently being tested against COVID-19 in more than ten clinical trials [6]. Abacavir was also predicted to treat COVID-19 by two separate studies [20]. Baricitinib was also suggested by the BenevolentAI company using their knowledge graph technology [21], and several clinical trials have been initiated (such as NCT04321993 in Phase 2). Finally, quercetin was predicted by a virtual screening study of Chinese herbal medicines [22] and was later tested in a clinical trial (NCT04377789) against COVID-19. The ongoing clinical trials and experimental validation for our six highlighted drugs, most of which began after our first exchange of results, suggest that our approach succeeded in producing repurposing candidates that are worthy of further evaluation. In a second exchange with partner experts from Chinese institutions and care units, via a webinar organized on February 22, we also put forward thalidomide as an interesting repurposing candidate, as it was well ranked by our algorithm and was the sole drug with an anti-fibrosis effect in our list, while it had been neither predicted nor tested by another research group. 
Later, successful use of thalidomide combined with a low-dose glucocorticoid (methylprednisolone) was reported in a preprint for a 45-year-old Chinese woman who had unsuccessfully been treated with ofloxacin (a fluoroquinolone antibiotic known to inhibit the DNA topoisomerase 4 subunit A and DNA gyrase subunit A of Haemophilus influenzae), oseltamivir (a.k.a. Tamiflu, known to inhibit the neuraminidase of Influenza A virus) and lopinavir + ritonavir (a combination of antiviral drugs used to treat HIV, known to target the HIV protein encoded by pol) (drug target information above from DrugBank [17]). Note that these drugs are not among our proposed drugs, as our method repurposes drugs with human targets. Before being treated with thalidomide + methylprednisolone, the patient showed an increase in C-reactive protein (CRP) and cytokine levels, including interleukin 6 (IL-6), interleukin 10 (IL-10) and interferon-gamma (IFN-gamma), together with reduced CD4+ and CD8+ T cell counts. The authors reported that these abnormally high interleukin levels and abnormally low T cell levels returned to normal after three days of their combinatorial treatment. It was previously shown that thalidomide enhances TCR (T cell receptor)-mediated T cell activation by bypassing the T cells' need for co-stimulation by accessory molecules, such as the B7 protein together with the CD28 protein, and therefore can overcome T cell deficiency [23]. In addition, previous work suggests that lenalidomide, a derivative of thalidomide, can restore T cell motility, leading to their activation [24]. Finally, it was also reported that thalidomide prevents NF-kB from binding to the promoters of its target genes, including TNF-alpha and IFN-gamma, thereby reducing excessive inflammatory response [25], [26]. Altogether, based on these previous studies, the reported successful use of thalidomide by Chen et al. [27], and our analysis, we hypothesize that thalidomide may be effective against COVID-19 by favorably modifying the immune response of the infected patients against the virus (Fig. 5). At the time we shared the preprint (March 2020), thalidomide had been registered in two Phase 2 clinical trials: NCT04273529 and NCT04273581.

Fig. 5

Thalidomide’s potential Mechanism-of-Action on COVID-19. APC: antigen-presenting cell; MHC: major histocompatibility complex; TCR: T cell receptor.

Discussion

We applied a network bioinformatics approach to prioritize potential drugs and their targets at the systems level based on pre-COVID-19 knowledge of related viruses. To our knowledge, two other studies have so far investigated the COVID-19 disease using network-based repurposing. The first one took advantage of a knowledge graph (another type of network comprising different entity types, such as gene, protein, organism and disease, and relationship types, such as interacting with, phosphorylating, belonging to, etc.) technology to suggest baricitinib as a potential treatment [21]. A second study used, in part, network techniques similar to those reported in this study; the main difference is that we relied on text mining and database searches for seed gene identification while they essentially relied on the use of transcriptomic data for enrichment analysis [28]. We would like to highlight that our network bioinformatics analysis relied not directly on the keyword COVID-19, but indirectly on related terms such as SARS, chosen based on genome analysis and the limited existing knowledge about the disease. This is because our study was conducted mainly in January and February 2020, when scientific knowledge of COVID-19 was seriously lacking. Almost one year after we first shared the preprint of this study, 30 out of the 34 genes we identified can now be found using sentence co-occurrence with “Covid-19” in the CORD-19 text dataset (version November 2020) [29] (see further evidence in Data Availability). The purpose of this in silico work is not to yield repurposing candidates which should be immediately given to patients; rather, it is to shorten the immense list of candidates so as to focus rigorous clinical (and sometimes experimental) evaluation on a smaller number of candidates. In the context of an outbreak, the response must be swift but must also satisfy the usual safety and quality standards of the medical community. 
Deriving a list of candidates based on the available robust in silico data and analysis allows experts to concentrate scarce resources on evaluating a smaller number of options with the same level of standards. We therefore designed our analyses hoping that our results could be helpful in rapidly designing and implementing clinical trials or preclinical experiments to treat COVID-19 considering the available preclinical, pharmacokinetic, pharmacodynamic, toxicity, and clinical knowledge. Extreme caution is needed for drugs with important side effects, even when they are already approved drugs, because the interactions of the side effects with the new disease are unknown. In such a situation, combinations are an interesting path, as drugs used synergistically can be more efficient at lower doses while suppressing their side effects [7]. As of now, 7 of the 30 proposed drugs have been tested in clinical trials according to the clinicaltrials.gov database (searched in February 2021, see more details in Data Availability). To our knowledge, chloroquine (or hydroxychloroquine) is the only drug with published results from clinical trials, which mostly suggest a lack of efficacy in COVID-19 patients and safety concerns at high doses [30]. We believe that drug repurposing, preferably coupled with synergistic combinations at low doses, could help find other therapies in addition to recent successful vaccine development against COVID-19.

Materials and methods

Genome sequence analysis

From NCBI GenBank, the complete genome of Wuhan-Hu-1 (NC_045512.2) was downloaded as the 2019-nCoV sequence. This genome sequence was used to search for closely related viruses against the whole database using BLASTn (default parameters, except that the maximum number of reported results was raised above the default of 100). Among the BLASTn results, we extracted the following complete genome sequences as representative to build a phylogenetic tree: SARS coronavirus (SARS-CoV), MERS coronavirus (MERS-CoV), Human coronaviruses OC43, NL63, HKU1 and 229E; Bat coronaviruses BM48-31/BGR/2008, CDPHE15/USA/2006, HKU8, HKU5-1, 1A, HKU4-1 and HKU2; Rousettus bat coronaviruses HKU9 and HKU10; NL63-related bat coronavirus strain BtKYNL63-9a; Scotophilus bat coronavirus 512; Porcine coronavirus HKU15 (see the full table with their genome identifiers in Data Availability). Multiple sequence alignment was calculated by EMBL-EBI’s MSA (multiple sequence alignment) tool (https://www.ebi.ac.uk/Tools/msa/) using default parameters. A tree was built using the neighbor-joining method with the MEGA-X software [31], using the maximum composite likelihood model and 1000 bootstraps. The resulting tree was represented using the phylogram format (i.e., a tree whose branch lengths are proportional to the amount of inferred evolutionary change) [32].

Related genes identification

PubMed (version 2019-12) was downloaded from its FTP site. Note that no article mentioning SARS-CoV-2 (or its previous name 2019-nCoV) had been published before that date, meaning that our text mining analysis did not directly consider the COVID-19 disease. Instead, it aims at predicting the network based on closely related viruses and their physiology. More than 29 million abstracts were processed for sentence and word tokenization by the natural language processing tool spaCy (v2). Input keywords of interest (SARS, MERS, coronavirus, viral pneumonia, and HIV) were extracted by exact matches to detect abbreviations or regular expressions to detect full names or synonyms. Entity recognition for genes was performed by mapping gene names and unambiguous synonyms from the HGNC database. Co-occurrence numbers were counted as the number of papers in which a gene and an input entity appeared in the same sentence. A list of related genes ranked by sentence co-occurrence numbers was obtained for each of the five input entities. The final text-mining resulting list (the network shown in Fig. 2) was built from the whole list for SARS and the top 10% of each of the other four lists. Database search for related genes was performed by a program developed in-house, AutoSeed, which can search for disease-related genes in the following databases: DisGeNET [33], DrugBank [17], KEGG [34], MalaCards [35], eDGAR [36], NHGRI-EBI GWAS-Catalog [37]. Note that this program was developed for all types of diseases, and not specifically for viral diseases. Its function is to interrogate all of these databases automatically and to return a list of related genes sorted by the number of times they occur in those databases. Although the GWAS-Catalog is one of the resources of AutoSeed, for SARS and MERS, because there are no published GWAS, the findings in that category are, as expected, null. 
The final database-based list was composed of the whole list for SARS and the top 10% of each of the other four lists.

Network building

Network building was performed automatically by another of our in-house programs, “AutoNet”, implemented on our drug discovery cloud platform (CloudPhar: http://cloud.tasly.com/#/portalHome). This algorithm is explained by a schematic diagram in Fig. 2B. Data for this step include a local meta-pathway database for pathway enrichment analysis and a meta-PPI (protein–protein interaction) database to grow the network. The meta-pathway database is made of human pathways in the KEGG [34] and Reactome (v70) [38] databases, after removing small pathways (fewer than five genes) and pathways which enrich too easily, such as hsa05200: Pathways in cancer. The meta-PPI database is composed of the protein–protein interaction databases HPRD [39], BioGrid [40] (excluding genetic interactions), and STRING [41] (excluding PPIs with confidence score <0.7). The building process repeats network expanding, merging, and pruning in an iterative manner. At the initial state, all seed genes are considered as positive nodes where each seed gene is a subnetwork (i.e., connected component) composed of one node. A dynamic pathway collection for network building is initiated by a pathway enrichment analysis (hypergeometric test, False Discovery Rate correction, threshold: adjusted p-value <0.001) for all positive nodes against our meta-pathway database. Here, a subnetwork is used to denote any growing network during the network building process and to be distinguished from our final network; a pathway means any pathway from the meta-pathway databases (KEGG and Reactome). In each expanding step, protein interactors of any positive nodes are added as temporary nodes according to our meta-PPI database. In each merging step, only the pair of subnetworks that share the most positive nodes and temporary nodes are merged, while the other subnetworks wait to be merged in the next iterations. 
In the subnetwork pruning step, those temporary nodes which are not in any pathway of the collection in the current state are removed. Remaining nodes become positive nodes, and the dynamic pathway collection is updated by using pathway enrichment analysis for all positive nodes. Subnetworks are grown until they cannot be further merged. Finally, if more than one subnetwork remains, only the largest connected component and any other subnetwork whose size is greater than 5% of the largest connected component’s size are kept. In this study, only the largest component was kept because the others were too small.
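The pathway enrichment test that controls network growth (hypergeometric test, False Discovery Rate correction, adjusted p-value <0.001) can be sketched with the standard library alone. This is an illustrative sketch and not the AutoNet implementation: the Benjamini-Hochberg procedure is assumed as the FDR correction (the text does not name one), the background size is a user-supplied parameter, and all function names and toy pathways are hypothetical.

```python
from math import comb


def hypergeom_pvalue(overlap, query_size, pathway_size, background_size):
    """Upper-tail probability P(X >= overlap) when drawing `query_size` genes
    from a background in which `pathway_size` genes belong to the pathway."""
    upper = min(query_size, pathway_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(background_size, query_size)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_pathways(positive_nodes, pathways, background_size, alpha=0.001):
    """Return {pathway name: adjusted p-value} for pathways passing the threshold."""
    names = list(pathways)
    pvals = [
        hypergeom_pvalue(len(positive_nodes & pathways[n]), len(positive_nodes),
                         len(pathways[n]), background_size)
        for n in names
    ]
    adjusted = benjamini_hochberg(pvals)
    return {n: q for n, q in zip(names, adjusted) if q < alpha}


# Toy data: 8 positive nodes fall entirely inside pathway "A" and miss pathway "B".
toy_pathways = {"A": {f"g{i}" for i in range(10)}, "B": {f"h{i}" for i in range(10)}}
toy_positive = {f"g{i}" for i in range(8)}
print(sorted(enriched_pathways(toy_positive, toy_pathways, background_size=1000)))
```

In the pruning step described above, temporary nodes would then be kept only if they belong to at least one pathway returned by such a test; exact integer binomials are used here for clarity, whereas a production implementation would more likely rely on a statistics library.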

Network-based drug repurposing

After the network was built, core modules were detected (Fig. 2C) using the Newman greedy heuristic algorithm [42], as implemented in the igraph package (v1.2.4.2) for the R language (version 3.5.3). Potential drugs were then mapped to the COVID-19 network through drug-target interactions (sourced from DrugBank). As shown in Fig. 2D, different drugs can be linked to one or more modules (shown as colored areas) in the network. To find the maximum effective coverage of the core functional modules for each drug, we used a proximity method in which each drug's proximity distance is calculated as the mean of the shortest distances between the drug and each of the core modules in the network (equation shown in Fig. 2D) [13].
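
The proximity calculation can be illustrated with a short Python sketch. Because the exact equation appears only in Fig. 2D, the sketch makes an explicit assumption: the distance between a drug and a module is taken as the smallest number of hops between any of the drug's targets and any node of that module, and the drug's proximity is the mean of these per-module distances. The names `build_adjacency`, `bfs_distances`, and `drug_proximity` are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (node, node) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, sources):
    """Multi-source BFS: hop count from the nearest source to each reachable node."""
    dist = {s: 0 for s in sources if s in adj}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def drug_proximity(adj, targets, modules):
    """Mean over modules of the shortest target-to-module distance.

    Assumption: a module's distance is the minimum over its member nodes.
    Returns infinity if any module is unreachable from the drug's targets.
    """
    if not modules:
        return float("inf")
    dist = bfs_distances(adj, targets)
    per_module = []
    for members in modules:
        reachable = [dist[m] for m in members if m in dist]
        if not reachable:
            return float("inf")
        per_module.append(min(reachable))
    return sum(per_module) / len(per_module)
```

Under this definition, a lower value means the drug's targets sit closer to all core modules at once, so drugs can be ranked by ascending proximity; a drug whose targets fall inside every module would score 0.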

Conclusion

In this study, we applied a network bioinformatics approach to repurpose drugs for COVID-19. Our seed genes (i.e., disease-related genes) were obtained with our AutoSeed program, a systematic text-mining and database search, while our protein network was built by AutoNet, based mainly on knowledge of pathways, protein–protein interactions, and graph theory. Combining these results with module detection and proximity analysis algorithms allowed us to identify 78 existing drugs that could be repurposed for COVID-19. Finally, drug database searches and manual curation shortened this initial list to a final list of 30 rapidly repurposable drugs to be tested clinically and experimentally, possibly as combination therapies to treat COVID-19 patients.

Conflict of interest

X.L., Z.Z., J.R., W.Z., Y.Z., J.W., K.Y. and W.W. are employees of GeneNet Pharmaceuticals. J.Y., A.E.P. and D.C. are employees of Pharnext.

Funding

This research received no external funding.

CRediT authorship contribution statement

Xu Li: Conceptualization, Formal analysis, Investigation, Methodology, Validation, Writing - review & editing. Jinchao Yu: Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Zhiming Zhang: Formal analysis, Investigation, Writing - review & editing. Jing Ren: Formal analysis, Investigation, Writing - review & editing. Alex E. Peluffo: Writing - original draft, Writing - review & editing. Wen Zhang: Investigation, Writing - review & editing. Yujie Zhao: Investigation, Writing - review & editing. Jiawei Wu: Investigation, Writing - review & editing. Kaijing Yan: Writing - review & editing. Daniel Cohen: Writing - review & editing. Wenjia Wang: Supervision, Writing - review & editing.

38 in total

Review 1. The evolution of thalidomide and its IMiD derivatives as anticancer agents.

Authors: J Blake Bartlett; Keith Dredge; Angus G Dalgleish
Journal: Nat Rev Cancer Date: 2004-04 Impact factor: 60.716

Review 2. Immunomodulation and immune reconstitution in chronic lymphocytic leukemia.

Authors: John C Riches; John G Gribben
Journal: Semin Hematol Date: 2014-05-15 Impact factor: 3.851

Review 3. Next-generation drug repurposing using human genetics and network biology.

Authors: Serguei Nabirotchkin; Alex E Peluffo; Philippe Rinaudo; Jinchao Yu; Rodolphe Hajj; Daniel Cohen
Journal: Curr Opin Pharmacol Date: 2020-01-22 Impact factor: 5.547

Review 4. MalaCards: an amalgamated human disease compendium with diverse clinical and genetic annotation and structured search.

Authors: Noa Rappaport; Michal Twik; Inbar Plaschkes; Ron Nudel; Tsippi Iny Stein; Jacob Levitt; Moran Gershoni; C Paul Morrey; Marilyn Safran; Doron Lancet
Journal: Nucleic Acids Res Date: 2016-11-28 Impact factor: 16.971

5. The BioGRID interaction database: 2019 update.

Authors: Rose Oughtred; Chris Stark; Bobby-Joe Breitkreutz; Jennifer Rust; Lorrie Boucher; Christie Chang; Nadine Kolas; Lara O'Donnell; Genie Leung; Rochelle McAdam; Frederick Zhang; Sonam Dolma; Andrew Willems; Jasmin Coulombe-Huntington; Andrew Chatr-Aryamontri; Kara Dolinski; Mike Tyers
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

6. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.

Authors: Annalisa Buniello; Jacqueline A L MacArthur; Maria Cerezo; Laura W Harris; James Hayhurst; Cinzia Malangone; Aoife McMahon; Joannella Morales; Edward Mountjoy; Elliot Sollis; Daniel Suveges; Olga Vrousgou; Patricia L Whetzel; Ridwan Amode; Jose A Guillen; Harpreet S Riat; Stephen J Trevanion; Peggy Hall; Heather Junkins; Paul Flicek; Tony Burdett; Lucia A Hindorff; Fiona Cunningham; Helen Parkinson
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

7. Baricitinib as potential treatment for 2019-nCoV acute respiratory disease.

Authors: Peter Richardson; Ivan Griffin; Catherine Tucker; Dan Smith; Olly Oechsle; Anne Phelan; Michael Rawling; Edward Savory; Justin Stebbing
Journal: Lancet Date: 2020-02-04 Impact factor: 79.321

8. Efficacy of chloroquine or hydroxychloroquine in COVID-19 patients: a systematic review and meta-analysis.

Authors: Zakariya Kashour; Muhammad Riaz; Musa A Garbati; Oweida AlDosary; Haytham Tlayjeh; Dana Gerberi; M Hassan Murad; M Rizwan Sohail; Tarek Kashour; Imad M Tleyjeh
Journal: J Antimicrob Chemother Date: 2021-01-01 Impact factor: 5.790

9. Human Protein Reference Database--2009 update.

Authors: T S Keshava Prasad; Renu Goel; Kumaran Kandasamy; Shivakumar Keerthikumar; Sameer Kumar; Suresh Mathivanan; Deepthi Telikicherla; Rajesh Raju; Beema Shafreen; Abhilash Venugopal; Lavanya Balakrishnan; Arivusudar Marimuthu; Sutopa Banerjee; Devi S Somanathan; Aimy Sebastian; Sandhya Rani; Somak Ray; C J Harrys Kishore; Sashi Kanth; Mukhtar Ahmed; Manoj K Kashyap; Riaz Mohmood; Y L Ramachandra; V Krishna; B Abdul Rahiman; Sujatha Mohan; Prathibha Ranganathan; Subhashri Ramabadran; Raghothama Chaerkady; Akhilesh Pandey
Journal: Nucleic Acids Res Date: 2008-11-06 Impact factor: 16.971

10. In silico screening of Chinese herbal medicines with the potential to directly inhibit 2019 novel coronavirus.

Authors: Deng-Hai Zhang; Kun-Lun Wu; Xue Zhang; Sheng-Qiong Deng; Bin Peng
Journal: J Integr Med Date: 2020-02-20
