
Review of Drug Repositioning Approaches and Resources.

Hanqing Xue¹, Jie Li¹, Haozhe Xie¹, Yadong Wang¹.

Abstract

Drug discovery is a time-consuming, high-investment, and high-risk process in traditional drug development. Drug repositioning has become a popular strategy in recent years. Unlike traditional drug development strategies, it is efficient, economical and low-risk. There are usually three kinds of approaches: computational approaches, biological experimental approaches, and mixed approaches, all of which are widely used in drug repositioning. In this paper, we reviewed computational approaches and highlighted their characteristics to provide references for researchers to develop more powerful approaches. At the same time, the important findings obtained using these approaches are listed. Furthermore, we summarized 76 important resources about drug repositioning. Finally, challenges and opportunities in drug repositioning are discussed from multiple perspectives, including technology, commercial models, patents and investment.

Year: 2018 PMID: 30123072 PMCID: PMC6097480 DOI: 10.7150/ijbs.24612

Source DB: PubMed Journal: Int J Biol Sci ISSN: 1449-2288 Impact factor: 6.580

Introduction

Drug discovery is a time-consuming, laborious, costly and high-risk process. According to a report by the Eastern Research Group (ERG) 1, it usually takes 10-15 years to develop a new drug. However, the success rate of developing a new molecular entity is only 2.01% 2, on average. As demonstrated in a report by the Food and Drug Administration (FDA), the number of drugs approved by the FDA has been declining since 1995 3. Moreover, investment in drug development has been gradually increasing, as reported by Pharmaceutical Research and Manufacturers of America (PhRMA) 4 (Figure 1). This indicates that the cost of new drug development will continue to grow. Hence, it is urgent to find a new strategy to discover drugs.

Figure 1

The investment in drug development by PhRMA member companies and the number of approved drugs by the FDA from 1995 to 2015.

Drug repositioning, also known as old drugs for new uses, is an effective strategy to find new indications for existing drugs and is highly efficient, low-cost and low-risk. Traditional drug development strategies usually include five stages: discovery and preclinical, safety review, clinical research, FDA review, and FDA post-market safety monitoring 4, 5. However, there are only four steps in drug repositioning: compound identification, compound acquisition, development, and FDA post-market safety monitoring (Figure 2). Due to the fast growth of bioinformatics knowledge and biological big data, drug repositioning decreases the time cost of the drug development process significantly. Researchers only need 1-2 years to identify new drug targets and 8 years to develop a repositioned drug, on average 1. Furthermore, the research and development investment required for drug repositioning is lower than that for traditional strategies. Drug repositioning breaks the bottlenecks of cost for many countries. It only costs $1.6 billion to develop a new drug using a drug repositioning strategy, while the cost of the traditional strategy is $12 billion 6. Thus, drug repositioning offers an opportunity for many countries to develop drugs with lower investments.

Figure 2

The contrast of traditional drug development and drug repositioning. A) Flowchart of the traditional drug development process. B) Flowchart of drug repositioning.

In addition to reducing the time cost and investment, drug repositioning is also a low-risk strategy. A risk-reward diagram is often used to describe the relationship between a risk and the reward on investment 7. We drew a risk-reward diagram to compare repositioning and traditional drug development strategies (Figure 3). As shown in Figure 3, drug repositioning holds a higher reward with a lower risk. Because repositioned drugs have passed all clinical tests in Phase I, Phase II, and Phase III, their safety has been confirmed. In addition, some repositioned drugs may be marketed as molecular entities and have more opportunities to be pushed into the market once a new indication is discovered.

Figure 3

Risk and reward in two different drug development strategies

Approaches to drug repositioning

The main issue in drug repositioning is the detection of novel drug-disease relationships. To address this issue, a variety of approaches have been developed including computational approaches, biological experimental approaches and mixed approaches. With the fast development of biological microarray techniques, various drug and disease knowledge databases such as DrugBank 8, ChemBank 9, OMIM 10, KEGG 11, and PubMed 12 have appeared, and massive genomic databases such as MIPS 13, PDB 14, GEO 15, and GenBank 16 have been built (see Resources section for details). This knowledge and data further promoted the rapid development of a variety of novel computational approaches. Compared to biological experimental approaches, computational approaches have much lower costs and far fewer barriers 17. In this review, we mainly introduce computational approaches. Most existing computational approaches are based on the gene expression response of cell lines after treatment or on merging several types of information about disease-drug relationships 18, and they can be divided into different types from different viewpoints 19-21. For instance, some researchers grouped drug repositioning methods according to the biological networks used 19, and others divided drug repositioning methods into two types: data-driven and hypothesis-driven 21. However, the above studies did not focus on methodology. In this paper, we emphasized the core methodologies of drug repositioning approaches, so we divided them into three categories: network-based approaches 22-32, text-mining approaches 33-45 and semantic approaches 46-49.

Network-based approaches

Network-based approaches are widely used in drug repositioning due to the associated ability to integrate multiple data sources. These approaches have been proposed in the past few decades and became a hot topic approximately ten years ago. In this section, two types of network-based approaches are reviewed: network-based cluster approaches 22-24, 26 and network-based propagation approaches 27, 29, 31, 32, 50.

Network-based cluster approaches

Inspired by the fact that biological entities (disease, drug, protein, etc.) in the same module of biological networks share similar characteristics, network-based cluster approaches have been proposed to discover novel drug-disease relationships or drug-target relationships. These approaches aim to find several modules (also known as subnetworks, groups or cliques) using cluster algorithms according to the topology structures of networks. These modules include various relationships such as drug-disease, drug-drug or drug-target relationships. The most common network-based cluster approaches, including DBSCAN 51, CLIQUE 52, STING 53, and OPTICS 54, cannot detect overlapping clusters. To address this problem, Lu et al. 55 studied the drug repositioning of SCLC (small-cell lung cancer) using a k-means-based network cluster algorithm. Chemical-chemical interactions and chemical-protein interactions were utilized to select candidate drug compounds that had close associations with approved lung cancer drugs and lung cancer-related genes. The experimental results revealed that the proposed algorithm predicted some drugs for treating SCLC, indications which were verified by retrieving references. Tamás et al. 22 proposed a greedy network cluster approach named ClusterONE with two kernel components: cohesiveness score and greedy growth process. There are three steps in the approach: (i) generating growth groups from the greedy growth process using high-cohesiveness seed nodes, (ii) merging highly overlapping group pairs, (iii) discarding some complex groups according to the threshold. The advantage of the approach is its generality, and it can accurately predict not only disease-drug relationships in disease-drug networks but also protein interactions in protein-protein networks. Yu et al. 23 proposed an approach to construct disease-protein-drug networks based on a symmetrical conditional probability and detection of modules on the network via the ClusterONE algorithm. 
As a result, potential disease-drug interactions were found—for example, the authors discovered that iloperidone could be used to treat hypertension. Wu et al. 24 developed a novel approach through combining ClusterONE and Louvain 25 to detect modules in a heterogeneous network built from KEGG 11 disease-drug and drug-target data. They found 98 clusters and 1160 pairs of disease-drug interactions—for instance, vismodegib was predicted to treat Gorlin syndrome, while its original indication was basal cell carcinoma. Luo et al. 26 presented an approach named MBiRW with three steps: (i) calculating a comprehensive similarity between drugs and diseases; (ii) obtaining drug-drug subnetworks, disease-disease subnetworks and drug-disease subnetworks; and (iii) finding drug-disease relationships using a bi-random walk algorithm. Some novel disease-drug relationships such as Alzheimer's-levodopa were found (see Table 1).
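
To make the two ClusterONE components named above more concrete, the following is a minimal sketch of a cohesiveness score and a greedy growth step. It assumes cohesiveness is the internal edge weight divided by the sum of internal weight, boundary weight and an optional size penalty, which is our reading of the published definition; the toy network, the node names and the `penalty` parameter are illustrative assumptions, and the merging and filtering steps (ii) and (iii) are not covered.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted adjacency map from (u, v, weight) triples."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return graph


def cohesiveness(graph, group, penalty=0.0):
    """Internal weight / (internal + boundary weight + penalty * group size)."""
    w_in = 0.0
    w_bound = 0.0
    for u in group:
        for v, w in graph[u].items():
            if v in group:
                w_in += w / 2.0  # each internal edge is seen from both ends
            else:
                w_bound += w
    denom = w_in + w_bound + penalty * len(group)
    return w_in / denom if denom else 0.0


def grow_from_seed(graph, seed, penalty=0.0):
    """Greedily add or remove one node at a time while cohesiveness improves."""
    group = {seed}
    while True:
        best_score = cohesiveness(graph, group, penalty)
        best_group = None
        # Try adding each node adjacent to the current group.
        frontier = {v for u in group for v in graph[u]} - group
        for v in sorted(frontier):
            candidate = group | {v}
            score = cohesiveness(graph, candidate, penalty)
            if score > best_score:
                best_score, best_group = score, candidate
        # Try removing each member (never empty the group).
        if len(group) > 1:
            for u in sorted(group):
                candidate = group - {u}
                score = cohesiveness(graph, candidate, penalty)
                if score > best_score:
                    best_score, best_group = score, candidate
        if best_group is None:
            return group  # no single change improves the score
        group = best_group


# Hypothetical toy network: two triangles joined by one bridging edge.
toy_edges = [
    ("drugA", "drugB", 1.0), ("drugA", "dis1", 1.0), ("drugB", "dis1", 1.0),
    ("drugC", "dis2", 1.0), ("drugC", "geneX", 1.0), ("dis2", "geneX", 1.0),
    ("dis1", "drugC", 1.0),
]
toy_graph = build_graph(toy_edges)
module = grow_from_seed(toy_graph, "drugA")
print(sorted(module), cohesiveness(toy_graph, module))
```

Because the score must strictly increase at every step, the growth loop always terminates; running it from several seeds and merging overlapping results would be the natural next step.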

Table 1

Network-based drug repositioning approaches

Name	Method	Network	Description	Key Findings	Advantage	Disadvantage	Ref.
RNSC	Cluster	PPI	A global network algorithm to identify protein clusters on PPI networks	Some protein complexes	This method considers both local and global information from networks. Overlapping clusters can be detected as well.	Some information may be dropped because the cluster size is small.	60
RRW	Cluster	PPI	An effective network cluster approach to identify protein clusters on a PPI network	Some protein complexes	This is a general method with a high prediction accuracy.	It is a time-costly and memory-costly method that cannot detect overlapping clusters.	61
ClusterONE	Cluster	PPI	A global network algorithm to identify node clusters on networks.	Some protein complexes	This approach outperformed the other approaches including MCL, RRW, etc., both on weighted and unweighted PPI networks.	There is no gold standard to evaluate clusters.	22
-	Cluster	Drug-protein-disease	A variant of the ClusterONE algorithm to cluster nodes on heterogeneous networks	(Iloperidone, schizophrenia) → Hypertension	This is an efficient cluster approach that integrates multiple databases.	It is difficult to distinguish between positive associations and negative associations on the network.	23
-	Cluster	Drug-target-disease	An algorithm to detect clusters on the network	(Vismodegib, Basal cell carcinoma) → Gorlin syndrome	This is a general and highly robust approach.	This approach loses weakly associated genes of diseases and drugs.	24
MBiRW	Cluster	Drug-disease	A bi-random walk-based algorithm to predict disease-drug relationships.	(Levodopa, Parkinsonian disorder) → Alzheimer's; (Cabergoline, Hyperprolactinemia) → Migraine	Predictions of this approach are reliable.	The approach needs to adopt more biological information to improve the confidence of the similarity metric.	26
-	Cluster	Drug-protein-chemical	A k-means-based network cluster algorithm on heterogeneous networks.	(Canertinib, Acute lymphoblastic leukemia) → SCLC	This approach is easy to implement. Predictions of this approach are reliable.	This approach needs to integrate multiple databases.	55
-	Propagation	Drug-target	An algorithm that combines four network-based approaches to predict drug-target relationships.	Melanoma's target c-Myc was predicted	This approach is easy to implement. Predictions of this approach are reliable.	This approach needs to integrate multiple databases.	27
-	Propagation	Disease-protein-gene	A random walk-based network algorithm with a diffusion kernel to predict disease-gene relationships.	Some disease-gene relationships	This is an efficient global method that can be applied to other networks such as disease-drug networks.	This approach can only be used for genes whose protein-protein relations are known. It does not perform well on small disease-gene family data.	29
PRINCE	Propagation	Disease-gene	A global propagation algorithm to predict disease-gene relationships.	Some disease-gene relationships	This is a global network approach combined with a novel normalization of protein-protein interaction weights and disease-disease similarities.	This approach relies on phenotype data, so some diseases that lack phenotype information are excluded. The performance of this approach relies on data quality.	31
DrugNet	Propagation	Disease-drug-protein	A comprehensive propagation method that applies different propagation strategies in different subnetworks.	(Methotrexate, antimetabolite and antifolate) → cancer; (Gabapentin, epilepsy) → neuropathic pain	This method is robust and efficient.	The performance of this approach relies on the quality of disease data.	32

Note: In the Key Findings field, some records take the form (drug, original indication) → new indication. For example: (Canertinib, Acute lymphoblastic leukemia) → SCLC

Network-based propagation approaches

Network-based propagation approaches are another important type of network-based approach. In these approaches, prior information propagates from the source node to all network nodes or to some subnetwork nodes. Depending on how information propagates, these approaches can be divided into two types: local approaches and global approaches. Several studies have shown that these methods perform well in finding disease-target, disease-gene and disease-drug relationships 27. Local propagation approaches only take limited information from the network into account and may fail to make correct predictions 28 in some cases. By contrast, global approaches, which use information from the entire network, perform better than local approaches. Most current researchers concentrate on global approaches to achieve outstanding performance. For example, Köhler et al. 29 developed a network propagation approach based on the global information of a network to find novel disease-gene interactions. The approach included three phases: (i) extracting drug-disease relationships and constructing a disease-gene network; (ii) obtaining the global information of the network using a random walk propagation algorithm 56 in the network; and (iii) defining global metrics to predict novel disease-gene relationships. The proposed approach performed better than other approaches, including the diffusion kernel approach, PROSPECTR 30. In addition, cross-validation showed that the accuracy of disease-gene predictions is 98%. Vanunu et al. 31 also proposed a global approach for finding disease-gene and disease-protein relationships via a network propagation approach called PRINCE. The method is based on formulating constraints on a score function related to the smoothness of the disease-gene network. In the proposed method, gene nodes adopt prior information as input and then pump this information to their neighbor nodes until convergence. 
The score function gives a confidence level for each predicted disease-gene pair. PRINCE was evaluated on 1369 disease data points from OMIM and could predict unknown causal genes of some diseases such as type 2 diabetes, Alzheimer's disease and prostate cancer. Martinez et al. 32 presented a disease-gene-drug network propagation approach wherein two different propagation approaches were defined: propagation in homogeneous subnetworks (such as gene subnetworks) and propagation between subnetworks. They used a prioritization function to measure the correlation between drugs and diseases. As a consequence, a list of drugs was produced for a queried disease. Novel indications of some drugs such as methotrexate, gabapentin, cisplatin, donepezil, and risperidone were obtained using this approach. In addition, Emig et al. 27 proposed a comprehensive approach combining 4 local and global network approaches through a logistic regression model. The approach was evaluated on 30 different diseases with known drug targets and yielded an AUC (area under the curve) above 80%. Furthermore, melanoma's drug target c-Myc was successfully predicted, and this finding was also confirmed by two other experimental studies 57, 58.

Network-based approaches are vital for drug repositioning. Researchers often need to make a decision in selecting appropriate approaches, and we summarized these approaches in Table 1 and listed their benefits, bottlenecks, key findings, databases and other information. The networks employed in these approaches can be divided into two classes: homogeneous and heterogeneous. As disease pathways can be constructed from protein-protein interaction (PPI) network analysis 59, a PPI network, as a homogeneous network, is often employed in some approaches 22, 60, 61 used to identify drug targets involved with multiple pathways. 
As research on these networks has deepened, the accuracy of PPI networks has been enhanced because numerous PPI databases have been established and updated by experiments. However, PPI networks are limited since they only include protein information without considering additional information. With the advent of the era of big data, the accumulation of various medical data (such as drug, disease, and target data, among others) has made it possible to construct complex heterogeneous networks. Heterogeneous networks, which integrate multiple sources of information including genomes, proteomes and metabolic pathways 62, typically contain two-layer (e.g., disease-drug) or three-layer (e.g., disease-drug-gene) relationships and have attracted researchers' attention. Different biological entities included in the heterogeneous networks not only provide an opportunity to improve the performance of existing methods but also offer a tool to design more efficient and stable approaches 24, 26, 27, 29, 31, 55, 58 (see Table 1). From the method perspective, network-based cluster algorithms are frequently used to find interesting modules, and network-based propagation algorithms are often used to infer new relationships between biological entities. Network-based cluster approaches are general because most network-based cluster algorithms can be employed for detecting biological modules. For example, some cluster algorithms in the social network analysis field can be employed for detecting modules in biological networks 25, 63. However, a challenge for network-based cluster approaches is that no gold standard exists to test associations among biological modules. Network-based propagation approaches are easy to implement and can make accurate predictions. Researchers can obtain an AUC value and estimate the prediction results. 
In addition, network-based propagation approaches use not only information from the selected components but also information from expanding components.
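
The global propagation idea described in this section, in which prior information is pumped from seed nodes to neighbors until convergence, can be sketched as a random walk with restart. This is a generic illustration rather than the exact formulation of PRINCE or of the random walk method of ref. 29: the degree-based normalization, the `restart` value and the four-gene toy network are all assumptions made for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate prior information from seed nodes over an undirected network.

    adjacency: dict mapping each node to the set of its neighbours.
    seeds: nodes carrying prior knowledge (e.g. known disease genes).
    Returns a dict of steady-state visiting probabilities.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: part of the mass always returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                nxt[u] += (1.0 - restart) * p[u]  # isolated node keeps its mass
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share  # spread the rest evenly to neighbours
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical gene network: a simple chain g1 - g2 - g3 - g4.
toy_network = {
    "g1": {"g2"},
    "g2": {"g1", "g3"},
    "g3": {"g2", "g4"},
    "g4": {"g3"},
}
scores = random_walk_with_restart(toy_network, seeds={"g1"})
# Rank non-seed genes as candidate disease genes, highest score first.
ranking = sorted((n for n in scores if n != "g1"), key=lambda n: -scores[n])
print(ranking)
```

In this toy chain the scores fall off with distance from the seed, which is the behavior a global method exploits: every node receives a score informed by the whole network rather than only by its direct neighbors.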

Text mining-based approaches

Along with the exploration of drug repositioning, a great deal of medical and biological literature containing novel biological entity relationships has been published. Extracting novel and valuable biological entity relationships from this literature is a major challenge. Text mining (TM) techniques have been widely used to address this problem and have been increasingly developed to mine new knowledge from scientific literature and identify connections between biological concepts or biological entities. Marti Hearst gave a general definition for text mining as 'text mining is the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources' 64. The main pipeline of biological text mining includes four phases: information retrieval (IR), biological named entity recognition (BNER), biological information extraction (BIE) and biological knowledge discovery (BKD) (Figure 4). In the IR step, relevant documents are extracted from the literature. These relevant documents need to be filtered because there are some useless concepts in documents. In the BNER step, valuable biological concepts are identified with controlled vocabularies. In the BIE and BKD steps, useful information is extracted to discover knowledge about biological concepts and build a knowledge graph. At the same time, potential associations between knowledge, such as drug-disease and drug-target relationships, can also be detected.

Figure 4

The workflow of text mining.

The origin of text mining methods in the medical field is the Swanson 'ABC' model, which states that if concept A is connected with concept B, and concept B is involved with concept C, then concept A may have a novel connection with concept C 65. Based on the 'ABC' model, various text mining methods have been proposed to find potential disease-drug relationships in the literature. A number of studies have been devoted to applying text mining techniques in drug repositioning. Li et al. 33 developed an approach to building disease-specific drug-protein connectivity maps combining network mining and text mining. In the proposed method, they first extracted disease-protein relationships from molecular interaction networks using network mining. Then, they searched for drug terms indirectly associated with certain diseases such as Alzheimer's disease (AD) in PubMed abstracts through text mining techniques. Finally, drugs and proteins could be linked through drug-disease or disease-protein relationships. As a result, the authors found that diltiazem and quinidine, which are hypertension and arrhythmia drugs, respectively, could also be used to treat Alzheimer's disease, which has been confirmed by clinical evidence. Ruggero et al. 34 proposed an approach to building sentence graph networks using text mining techniques. The proposed network can be used to discover relationships between any drug and any disease. These relationships are specific paths among the biomedical entities in the graph network. A novel disease target of sarcoidosis was identified using this approach. Rastegar et al. 66 extracted drug-gene and gene-disease relationships from medical abstracts to obtain drug-disease relationships using a rank score. To evaluate the performance of the proposed approach, the obtained drug-disease relationships were validated in the Comparative Toxicogenomics Database (CTD). 
Experimental results indicated that the discovered relationships confirmed in CTD were highly confident. Jang et al. 35 developed an approach to building dependency graph networks through extracting sentences with genes, drugs and phenotypes from biomedical literature. They calculated the possibility that a drug treats a phenotype based on drug-phenotype associations. The authors compared the predicted drug-phenotype associations with known drug-phenotype associations in databases and demonstrated the good performance of their method. Kuusisto et al. 36 proposed another text mining method named KinderMiner to identify potential indications of some old drugs. The method is based on co-occurrence statistics between drugs and diseases in the literature. As a result, new indications of some drugs such as Zestoretic, Zebeta, and Tiazac were found. Zhang et al. 37 reported an algorithm to prioritize anti-AD (Alzheimer's disease) targets. The authors extracted 224 genetic variations, 14 epigenetic modifications, 98 proteins and 86 metabolites associated with AD using text mining and integrated these interactions to construct a weighted sum model to prioritize potential anti-AD drug targets. With the development of natural language processing (NLP) techniques, increasing numbers of text mining tools have been developed and used to discover repositionable drugs (see 38, 39). Here, we summarized the inputs, outputs and characteristics of these tools in Table 2. The inputs of these methods are usually biological terms extracted from existing literature, and the corresponding outputs are lists of relationships of biological terms. The confidence levels of the relationships are generally evaluated using computational approaches or biological experiments. These tools can be divided into two categories: static tools and dynamic tools.
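
The 'ABC' model introduced at the start of this section can be sketched in a few lines: given term pairs that co-occur in the literature, propose A-C links that are supported by at least one intermediate B but are not yet directly reported. The function name `abc_candidates` and the placeholder terms are hypothetical, and real systems would add entity recognition and statistical filtering on top of this bare co-occurrence logic.

```python
from collections import defaultdict


def abc_candidates(cooccurrences, source):
    """Return {C: [linking B terms]} for terms C not directly linked to source A."""
    links = defaultdict(set)
    for a, b in cooccurrences:
        links[a].add(b)
        links[b].add(a)
    direct = set(links[source])  # B terms already reported together with A
    hypotheses = defaultdict(set)
    for b in direct:
        for c in links[b]:
            # Keep C only if it is new: not A itself and not already linked to A.
            if c != source and c not in direct:
                hypotheses[c].add(b)
    return {c: sorted(bs) for c, bs in hypotheses.items()}


# Placeholder co-occurrence pairs, as if extracted from abstracts.
toy_pairs = [
    ("drug_X", "protein_P"),
    ("drug_X", "pathway_Q"),
    ("drug_X", "disease_known"),
    ("protein_P", "disease_D"),
    ("pathway_Q", "disease_D"),
]
candidates = abc_candidates(toy_pairs, "drug_X")
print(candidates)
```

Here `disease_D` is proposed for `drug_X` because two independent intermediates connect them; counting the intermediates is one simple way to rank such hypotheses.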

Table 2

Text mining tools for drug repositioning.

Name	Class	Input	Output	Description	Web Site	Ref
Biovista	Static	Biological knowledge	Gene-protein relationships	A mining framework to extract gene-protein relationships.	http://www.biovista.com/	68
BioWisdom	Static	Ontology	Drug-disease, drug-target relationships	A platform to discover novel biological entity relationships.	http://www.biowisdom.com	39
FACTA+	Static	Text	Abstracts and linked concepts	A system to find associated concepts based on a user query	http://www.nactem.ac.uk/facta/	102
EDGAR	Static	UMLS terms	Drug-gene relationships	A system to extract relationships between drugs and genes involved in cancer using syntactic analysis	https://www.sec.gov/	103
PolySearch	Dynamic	Bio-entities	Drug-disease, Drug-gene relationships	A web service to extract links between biological terms	http://wishart.biology.ualberta.ca	41
TextFlow	Dynamic	Document	Knowledge	A web-based text mining and natural language processing platform	http://textflows.org/	42
EXTRACT2	Static	Bio-entities	Entity relationships	A text mining-based tool to map biological entities to ontology/taxonomy entries	http://extract.jensenlab.org/	104
Anni 2.0	Static	Bio-entities	Linked concepts	An ontology interface of a text mining tool to extract concept relationships	http://biosemantics.org/anni	69
DrugQuest	Static	Drugs	Drug-drug relations	A knowledge discovery tool to detect drug-drug relationships	http://bioinformatics.med.uoc.gr	40
MaNER	Dynamic	Medical Document	Relevant entities	A rule-based system to mine relevant entities in medical documents	-	43
BEST	Dynamic	Biomedical Literature	Relevant bio-entities	A knowledge discovery system to extract relevant bio-entities.	http://best.korea.ac.kr	44
Alibaba	Dynamic	Bio-entities	Linked concepts	A tool to fit a PubMed query as a graphical network	-	45

Static tools are built on large, stable databases or document collections. Due to the large data size, the time performance of these tools was poor. To address this problem, indexes for documents or records were created to accelerate the query process in static tools. For example, DrugQuest 40 is a type of query tool for detecting drug-drug relationships. The workflow of this tool includes five stages. (i) Query, in which users provide a query term to retrieve related documents. (ii) Named entity recognition, which identifies proteins, chemicals and pathway terms in related documents using a biomedical concept recognition service named BeCAS 67 and identifies significant terms closely associated with the query by calculating the TF-IDF score (Term Frequency - Inverse Document Frequency) to measure the importance of terms. (iii) Building document network, which uses the similarity of documents. (iv) Clustering, in which various clustering algorithms (MCL, K-means, hierarchical clusters) are employed to cluster documents on the network. (v) Visualization, in which the 'tag cloud' technique is used for representing cluster results. The DrugQuest tool is promising for knowledge discovery and drug-drug relation prediction. However, the proposed tool only supports the DrugBank database, which leads to limitations of the query results. Other query tools 39, 40, 68, 69 were also designed in a similar way (see Table 2). Static tools often return outdated results. To address this issue, many dynamic tools 41-45 that update their document databases daily were developed. However, these tools also need more time to handle user queries. To reduce the time cost of queries, cache and index techniques were employed in dynamic tools. For example, PolySearch2 41 used a cache technique to reduce the response time, and BEST 44 used an indexing technique to reduce the computation time. 
BEST is a biomedical search tool that returns a list of 10 different types of biomedical entities including genes, diseases, drugs, targets, transcription factors, miRNAs, and mutations for a query. The proposed tool consists of two parts: an indexing subsystem and a search subsystem. In the indexing subsystem, the authors used a dictionary-based approach to extract entities from the text and create paired document-entity indexes. To avoid the outdated results problem, the tool automatically downloads abstracts newly indexed from the PubMed system and updates the document-entity indexes every day. In the search subsystem, the proposed tool utilizes the inverted index to obtain matched query terms. All entities obtained from the query are ranked according to their integrated entity scores involving entities and query terms included in all documents. BEST is a real-time and constantly updated tool, for which the time performance and output quality are both considered. Text mining tools reduce the time complexity of drug repositioning and assist researchers in verifying their experimental results by returning massive amounts of biological entity relationships. However, there are still some issues that need to be addressed. For example, limited coverage is one limitation of text mining tools, which means that some important biomedical entities or relationships such as mutations, targets, and drugs are not considered. Therefore, there is an urgent need to improve the performance of existing text mining tools.
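
As an illustration of the indexing and search subsystems described above, the sketch below builds a dictionary-based inverted index (entity to document ids) and ranks entities by how many documents they share with a query. It is not BEST's actual scoring scheme: the shared-document count stands in for the integrated entity score, matching is naive substring matching, and the documents and entity names are invented placeholders.

```python
from collections import defaultdict


def build_inverted_index(documents, dictionary):
    """Map each dictionary entity to the set of document ids that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        lowered = text.lower()
        for entity in dictionary:
            # Naive substring match; a real system would tokenize and normalize.
            if entity.lower() in lowered:
                index[entity].add(doc_id)
    return index


def search(index, query, top_k=10):
    """Rank entities by the number of documents they share with the query."""
    hits = index.get(query, set())
    scored = []
    for entity, doc_ids in index.items():
        if entity == query:
            continue
        shared = len(hits & doc_ids)
        if shared:
            scored.append((entity, shared))
    # Highest count first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Placeholder abstracts and entity dictionary.
toy_documents = {
    "d1": "Drug_A inhibits gene_1 in disease_X.",
    "d2": "gene_1 is overexpressed in disease_X.",
    "d3": "Drug_B targets gene_2.",
}
toy_dictionary = ["drug_a", "drug_b", "gene_1", "gene_2", "disease_x"]
toy_index = build_inverted_index(toy_documents, toy_dictionary)
results = search(toy_index, "disease_x")
print(results)
```

Keeping the index separate from the search step is what allows daily updates: newly downloaded abstracts only need to be added to the index, and queries remain set intersections over precomputed postings.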

Semantics-based approaches

Semantics-based approaches are widely used in information retrieval, image retrieval and other fields. Recently, these methods have been applied to drug repositioning. The workflow of these methods mainly includes three steps (Figure 5). First, biological entity relationships are extracted from prior information in massive medical databases. Then, semantic networks based on existing ontology networks are constructed by adding the prior information obtained in the previous step. Finally, mining algorithms are designed to predict novel relationships in the semantic network.

Figure 5

The workflow of a semantic network inference.

Based on the hypothesis that similar drugs are correlated with similar targets and similar targets are connected to similar drugs, Guillermo et al. 46 proposed an unsupervised algorithm to predict drug-target relationships. The authors constructed a semantic network including drug-drug, target-target, and drug-target relationships. The proposed approach, which combines semantic link prediction methods and edge partition methods, was evaluated on a network. Because substantial semantic knowledge was used, the proposed method made accurate predictions about drug-target relationships. Mullell et al. 47 presented a semantic data-driven algorithm for drug repositioning. The authors used a Bayesian statistics approach to rank drug-disease relationships according to prior knowledge. Then, they integrated ranked relationships with other biological entity associations to construct a semantic drug discovery network. To infer drug-disease relationships, the authors applied an algorithm for detecting semantic subgraphs. As a result, nitrendipine, a potent blocker of the calcium channel (CACNA1S) used to treat hypokalemic periodic paralysis, was found. Chen et al. 48 built a semantic linked network consisting of over 290,000 nodes and 720,000 edges with multisource data including drugs, targets, proteins, and disease pathways. Then, the authors applied a statistical model to predict drug-target relationships. Consequently, the proposed model identified some drug-target pairs and drugs for repurposing. For example, barbiturate, a drug used for treating migraines, was predicted for use in curing insomnia with literature support. Zhu et al. 49 proposed an automatic reasoning approach for heterogeneous semantic networks. Biological entities (such as drugs) are converted to labels in a semantic network. Then, disease-drug relationships are obtained from automatic reasoning techniques. 
As a demonstration, the authors reported that tamoxifen, a drug used for treating breast cancer, can treat ovarian cancer, which was confirmed by the literature 70. Semantics-based approaches take full advantage of semantics information included in massive amounts of literature. Therefore, the precision of predicting biological entity relationships was improved. However, there is a still challenge in constructing a semantic network by integrating multisource data. It is urgent to construct semantic networks that contain fruitful medical data.
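
The hypothesis that similar drugs tend to share targets can be made concrete with a small nearest-neighbour scoring sketch in Python. This is not the algorithm of Guillermo et al. (which combines link prediction with edge partitioning); it is a simplified stand-in, and the drug names, similarity values and known pairs are hypothetical.

```python
def score_drug_target(drug, target, drug_sim, known_pairs):
    """Score a candidate pair by the highest similarity between `drug`
    and any other drug already known to interact with `target`."""
    scores = [
        drug_sim.get(frozenset((drug, other)), 0.0)
        for other, t in known_pairs
        if t == target and other != drug
    ]
    return max(scores, default=0.0)


def rank_new_targets(drug, targets, drug_sim, known_pairs):
    """Rank targets not yet linked to `drug`, best candidate first."""
    candidates = [t for t in targets if (drug, t) not in known_pairs]
    scored = [
        (t, score_drug_target(drug, t, drug_sim, known_pairs))
        for t in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical symmetric drug-drug similarities (unordered pairs as keys).
drug_sim = {
    frozenset(("d1", "d2")): 0.9,
    frozenset(("d1", "d3")): 0.2,
    frozenset(("d2", "d3")): 0.4,
}

# Hypothetical known drug-target edges.
known_pairs = {("d2", "t1"), ("d3", "t2")}

ranking = rank_new_targets("d1", ["t1", "t2"], drug_sim, known_pairs)
print(ranking)
```

Here `t1` ranks above `t2` for `d1` because `d1` is most similar to `d2`, which is known to hit `t1`. A symmetric version of the same idea would also use target-target similarity, as the hypothesis suggests.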

Resources

Along with the development of biological technology and the accumulation of various omics data (genomics, proteomics, metabolomics, etc.), more databases from chemical, medical, pharmacological and biological fields have been established. We summarized 76 widely used databases or resources that can be used for designing drug repositioning approaches.

Pharmacological databases 8, 71-76 are crucial resources for drug repositioning. These databases collect not only drug property data but also data on interactions between drugs and other biological entities. Pharmacological data lay the foundation of various computational approaches. For instance, DrugBank 8 is a unique bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information. Many computational approaches, especially network-based approaches 23-26, were designed based on this database and achieved excellent results (see Table 1).

Proteomics databases 13, 14, 76-78 are another type of data resource for drug repositioning. Most importantly, protein-protein interaction (PPI) networks from proteomics databases are the basis of network-based drug repositioning approaches. In addition, proteomics databases are important resources for building heterogeneous networks such as drug-protein-disease networks. One well-known proteomics database is MIPS 13, which includes manually curated, high-quality PPI data from the scientific literature. MIPS can provide PPI information for some network approaches 61, 77. Moreover, it is a good resource for evaluating the experimental results of some computational approaches.

Chemical features of drugs provide important information for designing chemical-based approaches. Publicly available databases 9, 79-84 of chemical structures contain massive amounts of useful information such as 2D topological fingerprints and 3D conformations. Chemical information on drugs is usually employed for predicting novel drug structures to find new indications for drugs having similar structures. PubChem 85 is a well-known database of chemical molecule structures. The database contains a massive amount of 2D data that can be used to measure the similarity of drugs and to construct chemical networks.

As studies of drug repositioning have progressed, enormous amounts of medical and biological literature containing novel biological entity relations have been published. Collecting massive amounts of literature is a necessary task for researchers; hence, many medical literature databases 10, 12, 86 have been built. Based on these valuable medical textual data, text mining approaches were proposed for drug repositioning (see Section Text mining-based approaches). PubMed 12 is the most widely used literature database. It comprises more than 27 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher websites. Many text mining tools and search engines have been built using these databases (Table 2).

Although more and more databases are being established, choosing proper approaches to mine novel knowledge is still a large challenge. It is necessary to discover the potential value of the growing number of databases and of the medical literature for designing efficient drug repositioning approaches. We collected these resources in supplementary materials Table S2 to help researchers choose proper approaches.
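
As an illustration of how 2D fingerprint data can be used to measure drug similarity and construct a chemical network, the following Python sketch computes the Tanimoto coefficient, a common choice for comparing binary fingerprints, and links drugs whose similarity exceeds a cutoff. The fingerprints (sets of "on" bit positions), the drug names and the 0.5 threshold are hypothetical; real fingerprints would come from a chemical database or a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits:
    |A & B| / |A | B|, defined here as 0.0 when both sets are empty."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similarity_edges(fps, threshold=0.5):
    """Return (drug_a, drug_b, similarity) for every pair at or above threshold."""
    names = sorted(fps)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            s = tanimoto(fps[a], fps[b])
            if s >= threshold:
                edges.append((a, b, s))
    return edges


# Hypothetical fingerprints: each set lists the positions of bits set to 1.
fingerprints = {
    "drug1": {1, 4, 7, 9},
    "drug2": {1, 4, 7, 12},
    "drug3": {2, 3, 5},
}

edges = similarity_edges(fingerprints)
print(edges)
```

With these toy fingerprints, `drug1` and `drug2` share three of five distinct bits (similarity 0.6) and are connected, while `drug3` shares no bits with either and stays isolated.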

Challenges and opportunities

Traditional drug development strategies are costly and failure-prone ventures. Therefore, drug repositioning has recently drawn attention as a way to bring drugs into clinical use faster. However, drug repositioning is a complex process involving multiple factors such as technology, commercial models, patents, investment and market demands. Although many medical databases have been established, selecting the appropriate approach to make full use of massive amounts of medical data is still a challenge. New approaches for drug repositioning are urgently needed. Intellectual property (IP) is another prominent issue to be solved. For repositioned drugs, IP protection is limited 87. For example, some novel drug-target-disease associations found by repositioning researchers were confirmed by publications or online databases; however, it is legally difficult to obtain IP protection for such associations. The IP issue prevents some repositioned drugs from entering the market. Moreover, some repositioning projects are forced to be abandoned, which is a waste of time and money 88. It is also necessary to develop a new commercial model because the traditional commercial model is a serial model and causes overlapping investment issues.

Opportunities come with challenges. The first example of drug repositioning was an accidental discovery in the 1920s. After about a century of development, more approaches have been proposed for accelerating the process of drug repositioning, and the field has achieved great success. In supplementary materials Table S1, we list 75 drug repositioning examples collected from the literature. Many machine learning algorithms have been introduced to improve the performance of drug repositioning. In addition to computational approaches, experimental approaches that give direct evidence of links between drugs and diseases were developed, such as target screening approaches 87-91, cell assay approaches 92-95, animal model approaches 96-99 and clinical approaches 49. These approaches are reliable and credible. In recent years, increasing numbers of researchers have combined computational and experimental approaches to find new indications for drugs. In these so-called mixed approaches 59, 100, 101, the results of computational methods are validated by biological experiments and clinical tests. Mixed approaches offer opportunities for developing repositioned drugs effectively and rapidly. Generating secondary patents provides an opportunity for researchers to find new indications for existing drugs. With the IP problem solved, many repositioning projects have been conducted smoothly at low cost, which has attracted attention from many countries. With regard to the commercial model, parallel strategies bring significant improvement in the efficiency of drug repositioning. For example, multiple tests or studies are conducted on a candidate drug at the same time, which reduces the time cost of drug repositioning. From the market perspective, a large number of diseases still require new drugs, which brings potential economic benefits. Taking rare diseases as an example, there are over 6000 rare diseases that need to be studied. However, only 5% of them are being researched 79. Rare diseases are a large potential market to explore.

83 in total

1. Predicting drug-target interactions from chemical and genomic kernels using Bayesian matrix factorization.

Authors: Mehmet Gönen
Journal: Bioinformatics Date: 2012-06-23 Impact factor: 6.937

2. Protein localization vector propagation: a method for improving the accuracy of drug repositioning.

Authors: Yunku Yeu; Youngmi Yoon; Sanghyun Park
Journal: Mol Biosyst Date: 2015-07

3. Drug Repurposing from an Academic Perspective.

Authors: Tudor I Oprea; Julie E Bauman; Cristian G Bologa; Tione Buranda; Alexandre Chigaev; Bruce S Edwards; Jonathan W Jarvik; Hattie D Gresham; Mark K Haynes; Brian Hjelle; Robert Hromas; Laurie Hudson; Debra A Mackenzie; Carolyn Y Muller; John C Reed; Peter C Simons; Yelena Smagley; Juan Strouse; Zurab Surviladze; Todd Thompson; Oleg Ursu; Anna Waller; Angela Wandinger-Ness; Stuart S Winter; Yang Wu; Susan M Young; Richard S Larson; Cheryl Willman; Larry A Sklar
Journal: Drug Discov Today Ther Strateg Date: 2011

4. Effect of combined treatment with progesterone and tamoxifen on the growth and apoptosis of human ovarian cancer cells.

Authors: Ji-Young Lee; Jong-Yeon Shin; Hyun-Seok Kim; Jee-In Heo; Yoon-Jung Kho; Hong-Jun Kang; Seong-Hoon Park; Jae-Yong Lee
Journal: Oncol Rep Date: 2011-09-14 Impact factor: 3.906

5. Proteopedia - a scientific 'wiki' bridging the rift between three-dimensional structure and function of biomacromolecules.

Authors: Eran Hodis; Jaime Prilusky; Eric Martz; Israel Silman; John Moult; Joel L Sussman
Journal: Genome Biol Date: 2008-08-03 Impact factor: 13.583

6. DrugBank: a comprehensive resource for in silico drug discovery and exploration.

Authors: David S Wishart; Craig Knox; An Chi Guo; Savita Shrivastava; Murtaza Hassanali; Paul Stothard; Zhan Chang; Jennifer Woolsey
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

7. Inferring drug-disease associations based on known protein complexes.

Authors: Liang Yu; Jianbin Huang; Zhixin Ma; Jing Zhang; Yapeng Zou; Lin Gao
Journal: BMC Med Genomics Date: 2015-05-29 Impact factor: 3.063

8. Associating genes and protein complexes with disease via network propagation.

Authors: Oron Vanunu; Oded Magger; Eytan Ruppin; Tomer Shlomi; Roded Sharan
Journal: PLoS Comput Biol Date: 2010-01-15 Impact factor: 4.475

9. Assessing drug target association using semantic linked data.

Authors: Bin Chen; Ying Ding; David J Wild
Journal: PLoS Comput Biol Date: 2012-07-05 Impact factor: 4.475

10. Detecting overlapping protein complexes by rough-fuzzy clustering in protein-protein interaction networks.

Authors: Hao Wu; Lin Gao; Jihua Dong; Xiaofei Yang
Journal: PLoS One Date: 2014-03-18 Impact factor: 3.240
