
An omics perspective on drug target discovery platforms.

Abstract

The drug discovery process starts with identification of a disease-modifying target. This critical step traditionally begins with manual investigation of scientific literature and biomedical databases to gather evidence linking a molecular target to disease, and to evaluate the efficacy, safety and commercial potential of the target. The high throughput and affordability of current omics technologies, which allow quantitative measurements of many putative targets (e.g. DNA, RNA, protein, metabolite), have exponentially increased the volume of scientific data available for this arduous task. Therefore, computational platforms that identify and rank disease-relevant targets from existing biomedical data sources, including omics databases, are needed. To date, more than 30 drug target discovery (DTD) platforms exist. They provide information-rich databases and graphical user interfaces to help scientists identify putative targets and pre-evaluate their therapeutic efficacy and potential side effects. Here we survey and compare a set of popular DTD platforms that utilize multiple data sources and omics-driven knowledge bases (either directly or indirectly) for identifying drug targets. We also provide a description of omics technologies and related data repositories which are important for DTD tasks.


Keywords: drug efficacy and safety evaluations; drug target discovery; omics-informed drug discovery

Year: 2020 PMID: 31774113 PMCID: PMC7711264 DOI: 10.1093/bib/bbz122

Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622

Background

Target-based drug discovery is the most common strategy for the development of new drugs. However, when it comes to clinical trials, most of these new drugs fail due to inadequate efficacy or safety concerns [1], suggesting that a ‘wrong’ target has been selected during the target discovery process. Traditionally, molecular targets for drug discovery are selected on the basis of accumulated experimental evidence supporting the hypothesis that modulating the function of the molecule will have an effect on disease [2]. This process relies strongly on databases and bioinformatics tools that enable the collection and integration of multiple sources of evidence linking molecular drug targets to diseases [3]. As an interdisciplinary branch of the life sciences, bioinformatics aims to provide the methodology and computational methods needed to organize, explore and analyze large volumes of biological data, including genomic, proteomic and other ‘omics’ data types. These computational tools have become essential to research progress in drug target discovery (DTD), thanks to their ability to aid in the elucidation and understanding of the mechanisms of (complex) diseases. Current bioinformatics strategies for DTD use a wide range of data sources obtained from experimental, mechanistic, pharmacological and, more recently, omics-based molecular profiles. Omics technologies have brought unprecedented abilities to screen biological samples at the gene, transcript, protein, metabolite and interaction network levels in search of novel targets [4]. In particular, genome-wide association studies (GWASs), whole genome sequencing and transcriptome analysis constitute essential tools to discover or validate new drug targets, since they can provide a systematic approach to evaluate their therapeutic efficacy and related side effects.
Besides providing information on the efficacy of drug targets, omics studies can be exploited to better understand their mechanisms of action and, most importantly, to detect drug-induced side effects in advance. Omics technologies generate large amounts of data from single experiments, and these data are often of little use in their original formats. Therefore, advanced data mining algorithms are required to identify, evaluate and rank putative drug targets from omics data. Moreover, it is very difficult to systematically fuse these large volumes of data with existing scientific literature and biomedical databases. To speed up the collection, processing and analysis of data sources for drug discovery, many software platforms and database systems have been developed. These tools help scientists working on early drug discovery to automatically identify and extract relevant drug target–disease associations without requiring the application of sophisticated algorithms. In this review we describe and evaluate the most popular DTD platforms that directly or indirectly (through the use of external data sources) employ omics data for the identification of disease-relevant drug targets. We highlight their main functionalities and abilities in providing concise information on the therapeutic efficacy, druggability and safety of selected, putative targets. First, drug targets and their desired properties are discussed. Then, the application of omics information for target discovery is briefly introduced along with a summary description of databases and web platforms that can be utilized to prioritize drug targets. Finally, the selected DTD platforms are compared on the basis of drug target–disease associations and the use of omics data.

Drug targets

A drug target can be defined as a molecule in the body, usually a protein, that is intrinsically associated with a particular disease process and that could be addressed by a drug to produce a desired therapeutic effect [5, 6]. Drug targets should exhibit the following basic features: involvement in a crucial biological pathway; functional and structural characterization; and druggability (the capability of binding to small molecules, implying the presence of a binding site). Traditionally, structure-based analysis has been used to search for good drug targets, which led to the concept of ‘druggability’. Indeed, drug targets are often described as proteins that possess folds that favor interactions with drug-like chemical compounds [7-9]. Many proteins are druggable according to their structure, but binding to them will not necessarily lead to a therapeutic benefit. Over the past two decades, there have been several efforts to curate drug targets and to categorize them. The protein classes most frequently pursued in DTD include proteases, kinases, G protein-coupled receptors (GPCRs) and nuclear hormone receptors [10]. Druggability is not the only desired property for the definition of ‘good’ drug targets. Indeed, scientists are often first interested in selecting candidates based on their participation in a biological process critical to disease [10, 11]. Imming et al. [12] categorized drug targets based on ‘mechanism of action’, covering enzymes, substrates, metabolites, proteins, receptors, ion channels, transport proteins, DNA, RNA, ribosomes and targets of monoclonal antibodies. Although proteins have historically been the targets of the majority of clinically useful drugs, there are many emerging classes of drug targets, such as nucleic acids, regulatory DNA elements and non-coding RNAs (ncRNAs). Their importance is rapidly growing in the fields of drug development and precision medicine.
Indeed, drugs that target nucleic acids, especially in the areas of antibacterial and anticancer therapy, are already available [13]. RNA is now being recognized as an essential component in various regulatory processes, just like proteins. RNA plays important roles in transcription regulation, translation regulation, catalysis, protein function, protein transport, peptide bond formation and RNA splicing [14]. Compared to DNA, RNA could deliver better therapeutics since RNA displays a greater structural diversity and lacks repair mechanisms. Like proteins, RNA has three-dimensional folding that gives rise to complex structures allowing the highly specific binding of effector molecules. RNA targets have been successfully employed in the antibacterial and antiviral areas [15, 16]. Moreover, with new emerging classes of RNAs and their characterization in regulatory mechanisms of mammals, their application has rapidly expanded. Among these new classes of drug targets, ncRNAs are gaining increasing attention. ncRNA refers to a large group of endogenous RNA molecules that have no protein-coding capacity but have specialized biological functions. While ncRNAs lack the potential to encode proteins, they can affect the expression of other genes through a variety of mechanisms. In some cases, their mechanisms of action are well known, and strategies for controlling their activity are well established. The ability of ncRNAs to control gene expression makes them targets for drug development. However, the uncertainty about how ncRNAs function (and even whether they have a function) makes the drug development process even more challenging [17]. Targeting RNAs offers opportunities to therapeutically modulate numerous cellular processes, including those linked to ‘undruggable’ protein targets [18].
Examples include proteins with multiple key functions, which are difficult to block by using a single molecule, or proteins that are so closely related to others that it is difficult to achieve adequate selectivity [19, 20].

Key properties of a drug target

A drug target is a key molecule involved in biological pathways that are specific to a disease or a disease state. The difference between a drug target and other biomolecules involved in the same pathway lies only in their location and role. A putative drug target should be disease-dependent, which implies that its relevance for other diseases should be minimal. However, human diseases are often complex and involve many interrelated pathways, which can lead to the identification of different molecular targets. Indeed, studying complex diseases from a Mendelian perspective, following the ‘one gene–one disease’ theory, and treating them with a ‘magic bullet’ therapy has been demonstrated to be ineffective [21]. It is frequently observed that the complex interplay between multiple molecular features, rather than the behavior of a single molecule, leads to a disease state. Therefore, it is often important to characterize the association between multiple drug targets and disease [22]. Drug targets can be categorized into two classes: known drug targets and novel drug targets. The former are those for which there is robust scientific evidence, supported by publications and experimental data, showing how the target functions in normal physiology and how it is involved in human pathology. Furthermore, drugs acting on these targets already exist. Novel drug targets, in contrast, are biomolecules whose functions are not fully understood and for which no established drugs exist. These targets merit more attention since they might lead to completely new therapies. Very often the goal of a preclinical drug discovery program is to deliver a ranked list of drug target biomolecules (such as DNA, RNA, proteins and peptides), including both known and novel targets. Candidate drug targets should then be characterized by a well-balanced profile between efficacy and safety (‘drug adverse reactions’). The efficacy evaluation should assess how good a biomolecule is as a drug target.
This evaluation should take into account the target ‘druggability’ [23], and it should be supported by sufficient evidence showing how effective modulating the target is for a given disease (‘target disease efficacy’). A well-balanced efficacy–safety profile takes into account the intended use of the drug (e.g. severity of the indication and treatment phase) and the efficacy and safety of existing treatments. A potentially good drug target will have mild overdose effects, giving drug discovery a comparatively broad efficacy region. In practice, the acceptable efficacy–safety balance for a first-in-market drug for a rapidly progressing, high-mortality disease will be different from that for a drug targeting a benign disease with existing treatments. Omics studies can be used to evaluate the following: modulation (the target is disease modifying and/or has a proven function in the pathophysiology of a disease); tissue specificity (the expression of the target is not uniformly distributed throughout the body); and druggability (the target can be modulated by a drug) [24]. However, safety-related drug attrition represents a major hurdle in the development of therapeutic targets. Indeed, safety issues are often the main cause of drug development failures. Key safety liabilities induced by target modulation could be evaluated by using omics studies. In particular, safety assessment could apply genomics, transcriptomics and proteomics to pre-evaluate on-target and off-target effects, toxicity pathways [25] and harmful or unpleasant clinical events triggered by drug interaction with the target. Moreover, extensive studies of drug target safety by means of omics could lead to the accumulation of enough data to develop robust in silico methods to predict the safety of putative drug targets or, even better, to help researchers identify safe-by-design drug targets.
Omics-driven efficacy and safety evaluations of drug targets can be combined with information extracted from scientific literature in order to increase the success of identifying the right candidates. To this end, different web-based text mining tools, such as TIN-X [26], PubTator [27] and Chemotext [28], can be applied. These tools can improve the accuracy of drug target–disease associations extracted from scientific publications [29]. Moreover, network analysis aiming to integrate gene, drug and phenotype information can also be utilized to better estimate drug target efficacy [30]. Finally, in addition to efficacy and safety evaluations, drug targets can be characterized by describing their novelty [31]. Table 1 describes key drug target properties.
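The tissue specificity criterion mentioned above can be made concrete with a small calculation. The sketch below uses the tau index, a commonly used tissue-specificity measure that is not named in this review; the tissue names and expression values are hypothetical and serve only as an illustration.

```python
def tau_index(expression):
    """Tissue-specificity index tau for one gene.

    `expression` maps tissue name -> non-negative expression level.
    tau is 0 for uniform expression and 1 for expression in a single tissue.
    """
    values = list(expression.values())
    if len(values) < 2:
        raise ValueError("need at least two tissues")
    x_max = max(values)
    if x_max <= 0:
        raise ValueError("no expression detected in any tissue")
    # Sum of normalized "distance from the maximum" over all tissues.
    return sum(1 - v / x_max for v in values) / (len(values) - 1)


# Hypothetical expression profiles (arbitrary units).
uniform = {"liver": 10.0, "heart": 10.0, "brain": 10.0}
specific = {"liver": 50.0, "heart": 0.0, "brain": 0.0}

print(tau_index(uniform))   # 0.0
print(tau_index(specific))  # 1.0
```

In practice the input would come from a baseline expression resource, and the threshold separating a "tissue-specific" target from a broadly expressed one remains a judgment call.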

Table 1

Key drug target properties

Property	Description	Key aspects
Efficacy	In order for a drug to have an effect, it needs to bind to its target and then affect the function of this target. A target can be a gene, a protein or another biomolecule, and it is responsible for the therapeutic efficacy of the drug [32]. Therefore, the efficacy evaluation of a target should assess its potential to deliver effective therapeutic treatments.	Target druggability; target disease validation; tissue-specific efficacy evaluations
Safety	Safety evaluation aims to identify potential adverse consequences of target modulation, unavoidable on-target toxicities and potential clinical adverse events to support the steps of drug target identification and prioritization [33].	Drug toxicity in patients; OFF/ON drug targets; unsafe biomolecules (essential genes, carcinogenic, etc.)
Novelty	It estimates the scarcity of publications and patents about a protein target [26].	Text mining of scientific and patent literature

In summary, a good drug target needs to be efficacious, druggable and safe, and to meet clinical and commercial needs. However, it should be noted that making a clean, unambiguous assignment is challenging in many cases, especially regarding how to define the concept of drug target efficacy.
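As a purely illustrative sketch of how the three properties in Table 1 could feed a ranked list of candidates, the example below combines efficacy, safety and novelty scores with a weighted sum. The target names, the score values and the weights are all hypothetical assumptions; none of the platforms surveyed here is claimed to rank targets this way.

```python
from dataclasses import dataclass


@dataclass
class TargetEvidence:
    name: str
    efficacy: float  # 0..1, higher means stronger disease evidence (assumed scale)
    safety: float    # 0..1, higher means fewer expected liabilities (assumed scale)
    novelty: float   # 0..1, higher means fewer prior publications/patents (assumed scale)


def rank_targets(targets, weights=(0.5, 0.3, 0.2)):
    """Return targets sorted by a weighted sum of the three properties.

    The default weights are arbitrary and only illustrate the trade-off.
    """
    w_eff, w_saf, w_nov = weights

    def score(t):
        return w_eff * t.efficacy + w_saf * t.safety + w_nov * t.novelty

    return sorted(targets, key=score, reverse=True)


# Hypothetical candidates.
candidates = [
    TargetEvidence("TARGET_A", efficacy=0.9, safety=0.4, novelty=0.1),
    TargetEvidence("TARGET_B", efficacy=0.6, safety=0.9, novelty=0.7),
    TargetEvidence("TARGET_C", efficacy=0.3, safety=0.5, novelty=0.9),
]
ranked_names = [t.name for t in rank_targets(candidates)]
print(ranked_names)
```

With these numbers a moderately efficacious but safe and novel candidate outranks a highly efficacious one with a poor safety score, which shows how sensitive any such ranking is to the chosen weights.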

Multiple target strategies

Traditionally, drugs have been designed by following the ‘one drug, one target’ paradigm, which aims to find a single molecular target, usually a protein (the so-called ‘on-target’), with high selectivity to avoid any unwanted effects arising from mis-targeting other biological targets (‘off-targets’). While target-first strategies might prove useful to approach single gene disorders, disease is often a multifactorial condition involving a combination of constitutive and/or environmental factors. In this scenario, single-target drugs might be inadequate to achieve a therapeutic effect [34, 35]. It is now widely accepted that complex diseases are more likely to be healed or alleviated through simultaneous modulation of multiple targets. Indeed, research on multi-target drugs has rapidly increased since 2000, and it is nowadays one of the most active topics in drug discovery. However, there are several challenges to be addressed when designing multi-target drugs, both in terms of target selection and small molecule discovery [35, 36]. For instance, algorithms that determine multi-drug dosages are important to ensure effective treatments [36]. However, the extremely high number of possible multi-drug combinations, together with heterogeneity and resistance-related issues, makes dosage optimization an extremely challenging task [37]. Even though several online resources exist for multiple target selection, such as the Therapeutic Target Database [38] and DrugBank [39], there are no well-established, data-driven computational methods to identify the right combination of molecular targets for a given disease in both multi-target drugs and therapeutic combinations. Addressing multiple-target DTD tasks requires a deeper understanding of disease mechanisms, target–disease associations, pathway–target–drug–disease relationships and adverse events [40]. Moreover, when selecting multiple targets, additive or synergistic effects should be carefully considered [41].
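One common reference model for judging whether a combination is additive or synergistic is Bliss independence, which is not discussed in this review and is used here only as an illustrative assumption. Under this model, two agents with fractional effects E_a and E_b are expected to produce a combined effect of E_a + E_b - E_a*E_b; an observed effect above that expectation suggests synergy. The effect values below are hypothetical.

```python
def bliss_expected(effect_a, effect_b):
    """Expected combined fractional effect under Bliss independence."""
    for effect in (effect_a, effect_b):
        if not 0.0 <= effect <= 1.0:
            raise ValueError("effects must be fractions between 0 and 1")
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(observed, effect_a, effect_b):
    """Observed minus expected effect; positive values suggest synergy."""
    return observed - bliss_expected(effect_a, effect_b)


# Hypothetical single-agent and combination effects (fraction of inhibition).
expected = bliss_expected(0.3, 0.4)
excess = bliss_excess(0.75, 0.3, 0.4)
print(round(expected, 2), round(excess, 2))
```

Other reference models (for example Loewe additivity) can give different answers for the same data, so the choice of model is itself part of the analysis.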

Omics applications for DTD

Recent technical advances in sequencing, microarray and mass spectrometry (MS) technologies allow scientists to generate genomics, transcriptomics, proteomics and other omics data types at an unprecedented level of resolution. Many studies have used these technologies to better understand the molecular mechanisms underlying complex diseases and to provide information on drug treatments. The resulting information-rich data can be utilized to identify drug targets, to uncover the mechanism of action of drugs and to assess (or infer) their side effects. Omics-based studies can also provide essential information to deliver personalized medicine. For example, it has been shown that genetic variations can help clinicians assess the efficacy or toxicity of some targeted agents for specific subsets of molecularly profiled patients [42]. If systematically integrated, omics-driven molecular profiles of diseases and drug treatments/exposures could significantly accelerate the drug discovery and development process. Table 2 lists and briefly describes omics technologies that can be utilized in drug discovery and development, together with related omics data repositories.

Table 2

Omics data types and their use for informed pharmaceutical research and development

Omics	Function	Databases
Genomics	Understanding pathogenesis	GWAS Catalog
	Genetic association studies	GWAS Central
	Identification of disease genes	dbGaP
	Discovery of putative drug targets	PharmGKB
	Patient-centered efficacy and toxicity assessment of drugs/targets
	Patient stratification
Transcriptomics	Disease mechanisms	DrugMatrix
	Mode of action of compounds	TG-GATEs
	Moving from disease genes to drug targets	LINCS L1000
	Identification/evaluation of drug target candidates	Expression Atlas
	Early prediction of adverse drug target effects	GEO repository
		ArrayExpress
Proteomics	Post-translational process	PRIDE Archive
	Protein–protein network interaction	Peptide Atlas
	Drug target efficacy and safety evaluation at protein level	ProteomicsDB
		Human Proteome Map
	Protein toxicology	Human Proteome Atlas
Metabolomics	Novel DTD	Human Metabolome
	Drug target efficacy and safety evaluation at metabolomic level	Madison Metabolomics
	Metabolic toxicity	Golm Metabolome Database
		MassBank
		MetaboLights
		MetabolomeExpress


Genomics

Genomics has provided the earliest applications for DTD. In particular, the application of DNA microarray and next-generation sequencing (NGS) technologies has enabled high-throughput analysis of genotype–phenotype relationships in human populations, opening a new era of genetics-informed drug discovery. DNA microarrays have been extensively used to conduct GWASs, which have helped scientists identify loci that harbor genetic variants (typically single nucleotide polymorphisms; SNPs) that are associated with risk for diseases and traits [43]. Studying the targets at GWAS risk loci, such as genes or ncRNAs that mediate the associations observed in GWASs [44], can lead to a better understanding of the molecular mechanisms that influence disease risk and, most importantly, to new potential targets for drug development. However, the associated genes remain largely unknown for most GWAS loci. Whereas the first GWASs identified as few as tens of genes contributing to genetic traits, GWASs have by now helped to identify thousands of genes contributing to complex genetic traits. To date, almost 10 000 strong associations have been reported between genetic variants and one or more complex traits. Among these findings, there are several examples of disease-associated genes that have been identified as being effective drug targets [45]. GWAS data have also been employed for the development of in silico methods for disease gene and drug target identification [46, 47]. GWASs can also enable the discovery of biological pathways that confer susceptibility to diseases. However, these studies alone cannot elucidate how the variants affect downstream pathways and lead to a disease. Indeed, one of the most relevant post-GWAS challenges is to combine GWAS findings with additional omics data layers in order to shed light on the biological systems underlying complex diseases and provide more effective drug targets.
Indeed, GWAS-driven analyses have been often merged with transcriptomics data, which can help identify disease-related genes by comparing gene expression profiles between disease and control groups [48]. GWASs have also rapidly changed the landscape of pharmacogenomic research. Pharmacogenomics is the study of the impact of genetic variations of individuals on their drug response or drug metabolism. It helps understand how patients with specific genomic characteristics respond to certain treatments and drugs [49]. In more practical terms, it can affect the drug development process in two primary ways: indicating how well the drug works (efficacy) and providing drug-related toxicity information [50]. Table 3 reports GWAS data repositories and pharmacogenomics databases that are currently used for discovering putative drug targets, drug repositioning, drug efficacy and safety assessments.
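The idea of merging GWAS findings with transcriptomics can be sketched as a simple filter: keep genes that are both implicated by GWAS and differentially expressed between disease and control samples. The gene names, expression values, pseudocount and fold-change threshold below are hypothetical, and a real analysis would use a proper statistical test with multiple-testing correction rather than a fold-change cutoff alone.

```python
import math
from statistics import mean


def log2_fold_change(disease, control, pseudocount=1.0):
    """log2 ratio of mean expression (disease vs. control), with a pseudocount."""
    return math.log2((mean(disease) + pseudocount) / (mean(control) + pseudocount))


def prioritize(expression, gwas_genes, min_abs_lfc=1.0):
    """Return (gene, log2FC) pairs for GWAS genes that are differentially expressed."""
    hits = {}
    for gene, (disease, control) in expression.items():
        lfc = log2_fold_change(disease, control)
        if gene in gwas_genes and abs(lfc) >= min_abs_lfc:
            hits[gene] = lfc
    # Strongest expression change first.
    return sorted(hits.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical expression values: gene -> (disease samples, control samples).
expression = {
    "GENE1": ([31.0, 29.0, 33.0], [7.0, 6.0, 8.0]),     # up in disease, GWAS hit
    "GENE2": ([10.0, 11.0, 9.0], [10.0, 9.0, 11.0]),    # unchanged, GWAS hit
    "GENE3": ([3.0, 3.0, 3.0], [15.0, 15.0, 15.0]),     # down in disease, no GWAS support
}
gwas_genes = {"GENE1", "GENE2"}

hits = prioritize(expression, gwas_genes)
print(hits)
```

Only GENE1 survives both filters in this toy example: GENE2 has genetic support but no expression change, and GENE3 changes in expression but lacks genetic support.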

Table 3

Genomics databases for DTD

Database	Description	Application
GWAS Catalog [51]	Collects published human GWASs that are manually curated by expert scientists. GWAS Catalog provides accurate and structured metadata for publication, study design, sample and trait information and the most significant published results. https://www.ebi.ac.uk/gwas/	Mining disease genes
		Narrow-down/prioritize candidate loci
		Disease risk prediction
		Disease mechanisms
GWAS Central [52]	A database of summary-level findings from genetic association studies, both large and small. GWAS Central collects datasets from public domain projects and encourages direct data submissions from the community. http://www.gwascentral.org/index	Mining SNP-drug response associations
NCBI dbGaP [53]	The NCBI Database of Genotypes and Phenotypes archives results of studies that have investigated the interaction of genotype and phenotype and distributes these results to investigators for secondary study. It includes phenotype data, GWAS data, summary-level analysis data, Short Read Archive (SRA) data, reference alignment (BAM) data, Variant Call Format (VCF) data, etc. http://www.ncbi.nlm.nih.gov/gap	Genotype studies for the identification of disease genes
PharmGKB [50]	A publicly available online knowledgebase aggregating, curating, integrating and disseminating knowledge regarding the impact of human genetic variation on drug response. http://www.pharmgkb.org	Mining drug–gene, drug–SNP, gene–disease, disease–SNP, drug–pathway, disease–pathway and drug–drug associations


Transcriptomics

Genome-wide transcriptional profiling provides a global view of cellular state and how this state changes under different treatments (e.g. drugs) or conditions (e.g. healthy and diseased). In particular, drug-induced gene expression profiles in human cell lines and in vivo models can be used to elucidate the biological effects of putative targets and evaluate in advance therapeutic efficacy [25, 54]. Transcriptomics signals are important for clinical candidate selection as they provide an evaluation of potential adverse effects of drug targets at an early stage in drug development [25]. Indeed, in recent years, transcriptomics data have been intensively used in the toxicogenomics field. This has led to the development of large-scale public databases, such as DrugMatrix [55] and Open TG-GATE [56], which collect compound-induced gene expression data with in vivo histopathological data, and Connectivity Map [57] and the Library of Integrated Network-based Cellular Signatures L1000 dataset [58], which collect transcriptional drug perturbations for thousands of compounds tested on more than 70 cell lines. Other gene expression data following drug treatments can be retrieved from the Gene Expression Omnibus (GEO) repository [59] and the ArrayExpress Archive of Functional Genomics Data [60] which are continuously updated. The data provided by these databases can be used in combination with genomic and network features to prioritize drugs with less likelihood of causing side effects [61]. Similar computational strategies are currently proposed to support alternative methods for chemical risk assessment [25]. Moreover, gene expression and transcriptome profiling can help researchers improve the design of clinical trials in phase I and phase II studies [42]. Genome expression profiling can also be combined with the genotype of trait-associated variants using in vivo data, thus identifying target genes and the directionality of the effect of trait-variants. 
Expression quantitative trait loci (eQTL) analyses are useful in this regard as they can provide genome-wide lists of genetic variants that associate with gene expression in a particular tissue [63]. eQTL analysis can be used to identify causal variants and also to discover genetic networks that might play significant roles in drug resistance responses [64]. Currently, many eQTL databases exist, and they can be queried to determine whether a trait-associated variant (or variants in linkage disequilibrium with it) associates with the expression of a specific gene. One of the most used resources is the Genotype-Tissue Expression (GTEx) project [65]. Over the past decade, the transcriptomics field has developed rapidly, thanks to the advent of NGS technologies. In particular, whole-transcriptome analysis with total RNA sequencing (RNA-Seq) has become an indispensable tool for gene expression profiling. Total RNA-Seq profiles provide an exceptional opportunity to study ncRNAs, including microRNAs and long non-coding RNAs [66]. Similar to protein-coding genes, ncRNAs can play critical roles in tumor progression [67] and cancer therapy [68]. Currently available large-scale cancer genome and pharmacogenomics projects, such as The Cancer Genome Atlas (TCGA), Cancer Cell Line Encyclopedia (CCLE) [69] and Genomics of Drug Sensitivity in Cancer (GDSC) [70, 71], can be used to systematically determine the regulatory roles of ncRNAs in cancer drug response by combining RNA-Seq data with clinical and drug response data from thousands of tumor samples and cancer cell lines [66]. These databases provide information on different omics data types, including mutation and copy number variation. Table 4 introduces the main sources of transcriptomic data for DTD applications.
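At its simplest, an eQTL test asks whether the expression of a gene changes with the number of copies of a variant allele. The sketch below fits an ordinary least-squares line of expression on genotype dosage (0, 1 or 2 alternate alleles) for a single variant–gene pair. The data are hypothetical, and real eQTL pipelines additionally adjust for covariates and correct for multiple testing, which this example omits.

```python
import math
from statistics import mean


def eqtl_slope(dosages, expression):
    """Least-squares slope and Pearson correlation of expression vs. allele dosage."""
    if len(dosages) != len(expression) or len(dosages) < 3:
        raise ValueError("need matched vectors with at least three samples")
    g_bar, e_bar = mean(dosages), mean(expression)
    sxx = sum((g - g_bar) ** 2 for g in dosages)
    syy = sum((e - e_bar) ** 2 for e in expression)
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(dosages, expression))
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    slope = sxy / sxx  # expression change per additional allele copy
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, r


# Hypothetical data: six individuals, one variant, one gene.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

slope, r = eqtl_slope(dosages, expression)
print(round(slope, 3), round(r, 3))
```

A positive slope means that each additional copy of the allele is associated with higher expression, which is the directionality information referred to above.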

Table 4

Transcriptomic databases for DTD

Database	Description	Application
DrugMatrix [55]	DrugMatrix (DM) is provided by the U.S. National Toxicology Program and gives access to large-scale gene expression data derived from standardized toxicological experiments in which rats or primary rat hepatocytes were systematically treated with therapeutic, industrial and environmental chemicals at both non-toxic and toxic doses. https://outage.niehs.nih.gov/drugmatrix/index.html	DTD; understanding drug/target toxicity
TG-GATEs [56]	TG-GATEs provides gene expression profiles and traditional toxicological data derived from in vivo (rat) and in vitro (primary rat hepatocytes, primary human hepatocytes) exposure to 170 compounds at multiple dosages and time points. http://toxico.nibio.go.jp/english/index.html	DTD; understanding drug/target toxicity
LINCS L1000 [58]	L1000 generates gene expression signatures from the treatment of a variety of cell types with perturbagens that span a range of small-molecule compounds, gene overexpression and gene knockdown reagents. The gene expression profiles are generated with the L1000 method, which defines a reduced representation of the transcriptome. http://www.lincsproject.org	DTD; drug repositioning
Expression Atlas [72]	Expression Atlas (EA) collects baseline gene expression data in different species and contexts, such as tissue, developmental stage or cell type. It also contains differential studies, reporting changes in expression between two different conditions, such as healthy and diseased tissue. https://www.ebi.ac.uk/gxa/baseline/experiments	DTD and validation; disease genes
GEO repository [59]	GEO is a database repository of high-throughput gene expression data and hybridization arrays, chips and microarrays. https://www.ncbi.nlm.nih.gov/geo/	Retrieve drug, gene and disease perturbations
ArrayExpress [60]	ArrayExpress (AE) serves as an international repository for microarray data and high-throughput sequencing-based functional genomics experiments associated with scientific publications. http://www.ebi.ac.uk/arrayexpress	Retrieve drug, gene and disease perturbations
TCGA	TCGA is a functional genomics data repository covering >30 cancers across >10 K samples. Primary data types include mutation, copy number, mRNA and protein expression. https://tcga-data.nci.nih.gov/tcga	Discover novel molecular targets
GTEx [65]	GTEx provides transcriptomic profiles of normal tissues, including >7 K samples across >45 tissue types. http://www.gtexportal.org	Tissue-specific drug targets
CCLE [69]	CCLE provides genetic and pharmacologic characterization of >1000 cancer cell lines. http://www.broadinstitute.org/ccle	Identify novel drug targets and drug response biomarkers
GDSC [70]	GDSC is the largest public resource for information on drug sensitivity in cancer cells and molecular markers of drug response. https://www.cancerrxgene.org/	Identify novel drug targets and drug response biomarkers


Proteomics

Proteomics refers to the analysis of the entire protein content of a cell, tissue or organism under a specific condition. Several techniques have been developed to study the proteome of an organism, and among them MS has become the tool of choice. The three primary applications of MS to proteomics are cataloging protein expression, defining protein interactions and identifying sites of protein modification. These sources of information have been extensively used for DTD [73]. However, while transcriptome data cover the whole range of expressed genes, a typical untargeted MS proteomics experiment can usually detect and quantify up to 5000 proteins, which is less than half of the expressed human proteome. In the past few years, new proteomics technologies have been proposed with the ability to identify >8000 proteins in a 5 h analysis [74]. Overall, quantitative proteomic methodologies are becoming more robust and reliable with technological developments and can produce reproducible and standardized data sets [75]. The differential and quantitative profiling of dynamic protein changes in health and disease will further our understanding of the mechanistic basis of disease. Proteomics experiments can be used in different areas of clinical and health sciences, such as biomarker discovery and drug target identification. A biomarker usually refers to a disease-related molecule that can be used to diagnose disease or to monitor its risk or prognosis, and that can also indicate opportunities for therapeutic intervention. For example, proteomic strategies have been extensively used for discovering novel cancer biomarkers [76, 77]. Proteomics can also be used to address several steps of the drug development process, including the identification and validation of drug targets, informing assay development for the screening of leads and generating in vitro and in vivo biomarkers as surrogate endpoints for efficacy, toxicology and disease stratification.
Most of the MS-based proteomics research studies in drug discovery have been performed to characterize protein expression profiling, functional proteomics and phosphoproteomics. These experiments aim to measure the protein expression levels, protein–protein complexes and signal transduction relative to a control treatment. However, in drug discovery it is very important to discover protein targets from phenotypic assays and to understand on- and off-target engagement of potential therapeutic compounds. This task can be addressed by using chemoproteomics [78, 79]. Chemoproteomics refers to a new technology that facilitates large-scale study of proteins by combining chemical methods with MS proteomics. In particular, it supports the assessment of target engagement (the direct binding of small molecules with protein targets, helping one to quantify the amount of drug required to bind a target and subsequently produce a therapeutic effect) and drug selectivity determination (through the assessment of off-target interactions) [78]. Proteomics is also used in drug target identification by applying protein–protein interaction networks (PPINs). PPINs are typically modeled via graphs, whose nodes represent proteins and whose edges connect pairs of interacting proteins. These connections are specific, occur between defined binding regions in the proteins and have a particular biological meaning (i.e. serve a specific function). The totality of PPIs that happen in a cell and in a given biological context is called the interactome. As a result of the development of large-scale PPI screening techniques, especially high-throughput affinity purification combined with MS and the yeast two-hybrid assay, today we can access large amounts of PPI data and build very complex interactomes [80]. All this information can open new avenues for DTD. In a recent study, breast, pancreatic and ovarian cancer PPINs were employed to identify the respective sets of driver proteins [81]. 
In this study, each PPIN was modeled as a linear time-invariant dynamical system (LTIS). Then, an efficient (low polynomial time) algorithm was provided for computing the minimal number of input nodes needed to structurally control the LTIS representing a given cancer PPIN. The identified driver proteins, called cancer survivability-essential proteins, were shown to be key to the proliferation and survival of cancer cells in vivo. PPINs can also be exploited to characterize topological properties of effective drug targets and to use this information for target prediction [82]. The applications of MS in the identification and quantification of proteins encoded by disease genes are rapidly evolving [83]. This has led to an exponential growth of targeted quantitative proteomic analyses that aim to systematically measure the abundance of proteins in large sets of samples, generating large volumes of multidimensional data. In order to disseminate these large data sets to the scientific community, researchers have recently developed central repositories to store and share MS proteomics data, such as PRIDE [84] and ProteomicsDB [85]. The PRoteomics IDEntifications (PRIDE) database, an archive for MS proteomics data, provides protein and peptide identifications, post-translational modifications and supporting mass spectra evidence. ProteomicsDB, in turn, contains quantitative data from 78 projects, for a total of 19 k LC–MS/MS experiments. A standardized analysis pipeline enables comparisons between multiple datasets to facilitate the exploration of protein expression across hundreds of tissues, body fluids and cell lines. Table 5 reports the main proteomics databases for DTD. In addition, protein databases such as UniProt [86] and Protein Data Bank (PDB) [87] can be used to further examine individual proteins at the functional and structural level.
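
To make the notion of driver nodes in a PPIN more concrete, the following minimal sketch counts the minimum number of input (driver) nodes of a small directed network. It relies on the standard maximum-matching result from structural controllability theory (number of driver nodes = number of nodes minus the size of a maximum matching, and at least one); this is an assumption on our part and is not necessarily the algorithm used in [81]. The function names and the toy networks are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # directed edge (source protein, target protein)


def max_matching_size(nodes: List[Hashable], edges: Iterable[Edge]) -> int:
    """Size of a maximum matching, treating each directed edge as a link
    between the 'out' copy of its source and the 'in' copy of its target."""
    successors: Dict[Hashable, List[Hashable]] = {n: [] for n in nodes}
    for source, target in edges:
        successors[source].append(target)

    matched_source: Dict[Hashable, Hashable] = {}  # target -> source matched to it

    def try_augment(source: Hashable, visited: Set[Hashable]) -> bool:
        # Classic augmenting-path search (Kuhn's algorithm).
        for target in successors[source]:
            if target in visited:
                continue
            visited.add(target)
            if target not in matched_source or try_augment(matched_source[target], visited):
                matched_source[target] = source
                return True
        return False

    return sum(1 for n in nodes if try_augment(n, set()))


def minimum_driver_nodes(nodes: List[Hashable], edges: Iterable[Edge]) -> int:
    """Unmatched nodes must be driven directly; at least one input is always needed."""
    return max(len(nodes) - max_matching_size(nodes, list(edges)), 1)


# Toy networks with hypothetical protein labels.
chain = minimum_driver_nodes(["a", "b", "c"], [("a", "b"), ("b", "c")])
star = minimum_driver_nodes(["a", "b", "c"], [("a", "b"), ("a", "c")])
```

In the chain, a single input at the first protein suffices, whereas in the star one of the two downstream proteins needs its own input. Real interactomes would call for an iterative matching implementation rather than this recursive one.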

Table 5

Proteomic databases for DTD

Database	Description	Application
PRIDE Archive [84]	PRIDE is a public data repository for proteomics, including protein and peptide identifications, post-translational modifications and supporting spectral evidence. https://www.ebi.ac.uk/pride/archive/	Drug target identification
ProteomicsDB [85]	ProteomicsDB is a large collection of quantitative MS-based proteomics data across various tissue types as well as protein–protein interaction information, functional annotation, target deconvolution, cell sensitivity and reference MS data. https://www.proteomicsdb.org/	Drug target identification; Drug target efficacy/potency
Human Proteome Map [88]	Hosts high-resolution MS proteomic data representing 17 adult tissues, 6 primary hematopoietic cells and 7 fetal tissues resulting in >84% human proteome coverage. http://www.humanproteomemap.org/	Drug target identification; Biomarkers
Human Protein Atlas [89]	Collects expression and localization of the majority of human protein-coding genes based on both RNA and protein data. The HPA also employs antibody-based proteomics and transcriptomics profiling methods to locate and identify proteins in tissues and cell types. http://www.proteinatlas.org/	Druggable proteome; Drug target efficacy and specificity

Metabolomics

Metabolomics is the study of the metabolome, i.e. all the metabolites present in a cell, tissue or organism at a given time. It provides an overview of the metabolic status and global biochemical events associated with a cellular or biological system. Unlike genes and proteins, whose functions are influenced by complex regulatory mechanisms such as epigenetic regulation and protein post-translational modifications, metabolites provide direct signatures of biochemical activity and are therefore easier to correlate with phenotype [90]. Metabolomics-based clinical applications and tests are now emerging [91] and can help to understand disease mechanisms from a new perspective. Metabolomics has contributed to identifying metabolic causes and biomarkers for chronic diseases such as diabetes, Alzheimer disease, atherosclerosis and cancer. Metabolites play an important role in tumor cell proliferation. In particular, by analyzing transcriptional and metabolomic data from knockdown experiments on genes responsible for the enzyme supporting cell growth in glucose-free media, Vincent et al. [92] identified metabolic pathways that support glucose-independent tumor cell proliferation. Metabolomics can be used to implement effective precision medicine approaches such as personalized phenotyping and individualized drug-response monitoring. For instance, the analysis of pre-dose metabolite biofluid profiles allows clinicians to predict the effectiveness of a selected drug treatment for a given individual [93]. Metabolomics data can also be used to assess the toxicity of drug targets. In particular, the information from metabolomic analysis can be used to determine the off-targets of a drug candidate and thus provide a mechanistic understanding of drug toxicity [94]. Moreover, metabolic profile analysis can allow clinicians to quantify drug efficacy and safety and use this information to tailor personalized treatments. 
Metabolomics also has the potential to yield a new generation of biomarkers. For instance, a panel of metabolic biomarkers to monitor responses to therapeutic interventions was developed recently [95]. Metabolomics is reducing the cost of toxicological screening, enabling improved clinical trial design, allowing better patient selection and monitoring and shortening the time needed for drugs to move through the development pipeline. Considerable advances have been made in assessing the mechanisms of toxicity of drugs and other substances. Comprehensive metabolomics databases include the Human Metabolome Database (HMDB) and MetaboLights [96]. Table 6 lists and briefly describes metabolic data sources that can be used for drug target efficacy and safety assessments.

Table 6

Metabolomic databases for DTD

Database	Description	Applications
The Human Metabolome Database [84]	HMDB is a freely available electronic database containing detailed information about small molecule metabolites found (and experimentally verified) in the human body. It contains experimental MS/MS data for over 5700 compounds. http://hmdb.ca	DTD
The Madison Metabolomics Consortium Database [97]	MMCD collects small molecules of biological interest gathered from electronic databases and the scientific literature. It contains approximately 10 000 metabolite entries and experimental spectral data on about 500 compounds. http://mmcd.nmrfam.wisc.edu	DTD
Golm Metabolome Database [98]	GMD represents a general MS-based repository of reference metabolite profiles for essential plant tissues and typical variations of growth conditions. http://gmd.mpimp-golm.mpg.de/	DTD
MassBank [99]	The first public repository of Electron Impact-MS data covering more than 200 000 spectra for a wide range of organic compounds. https://github.com/MassBank/	DTD
MetaboLights [96]	ML is an open-access database repository for cross-platform and cross-species metabolomics research at the European Bioinformatics Institute. It provides Metabolomics Standard Initiative-compliant metadata and raw experimental data associated with metabolomics experiments. https://www.ebi.ac.uk/metabolights/	Drug safety; Drug efficacy
MetabolomeExpress [100]	MetabolomeExpress is designed to perform three main functions: (i) store GC-MS metabolomics data, allowing for analysis without the user having to download the data, (ii) provide a GC-MS analysis pipeline and (iii) store metabolite response statistics. https://www.metabolome-express.org/	DTD; Drug safety

Platforms for multi-omics data discovery

As discussed in the previous section, numerous repositories are available to share and disseminate omics datasets for DTD. However, these repositories do not provide tools to systematically link omics studies having similar experimental set-ups. The need to facilitate the retrieval and the integration of publicly available big omics data has recently led to web-based platforms for indexing, discovering and integrating datasets from different omics technologies and databases into a common framework and web interface. One example is the Biomedical and Healthcare Data Discovery Index Ecosystem (bioCADDIE), funded by the National Institutes of Health (NIH) Big Data to Knowledge (BD2K) initiative, which aims to provide a platform to retrieve relevant metadata of entire datasets [101]. Another example of an omics-based search engine is the Omics Discovery Index (OmicsDI; http://www.omicsdi.org), which helps scientists integrate proteomics, genomics, metabolomics and transcriptomics datasets [102]. OmicsDI has developed a common metadata structure framework and exchange format across 11 repositories, including proteomics databases (PRIDE, MassIVE and GPMDB), metabolomics databases (MetaboLights, GNPS, MetabolomeExpress and Metabolomics Workbench) and transcriptomics databases (ArrayExpress and Expression Atlas).

Multi-omics approaches for DTD

Each type of omics data provides important information highlighting differences between normal and disease conditions. These data can be utilized to discover diagnostic and prognostic markers and to give insight as to which biological processes are different between the disease and control samples. However, single-level omics data analysis is limited, reflecting reactive processes rather than causative ones [103]. By contrast, integrating multiple omics data types could reveal important molecular mechanisms that regulate complex diseases and help scientists understand the dynamics that lead to disease manifestations. Moreover, a more detailed understanding of disease mechanisms would be beneficial when searching for novel, more effective biomarkers. Current DTD platforms do not provide tools for multi-omics data analysis, mainly because of the absence of standardized analytical protocols. However, recent studies have proposed interesting approaches applied to cancer drug discovery [104]. It is acknowledged that multi-omics cancer data have the potential to improve targeted therapy and the effectiveness of traditional therapies, to clarify the molecular mechanisms of resistance to cancer therapy and to support the discovery of novel biomarkers and targeted drugs. For example, genomic and transcriptomic data and long-term clinical outcomes were recently analyzed to detect changes of gene expression based on somatic gene copy number aberrations. This analysis revealed important events related to the response to targeted therapy [105]. An integrative analysis of genomic and proteomic data demonstrated that aberrations of the PI3K pathway are particularly common in hormone receptor-positive breast cancer, which might be important in clinical selection of targeted therapies [106]. 
Another multi-omics data analysis, conducted in HCCs that had failed sorafenib treatment, combined quantitative proteomics and phosphoproteomics data to better understand the molecules targeted by this drug. This study revealed that sorafenib can indeed effectively inhibit its target kinase in the Raf-Erk-Rsk pathway, but the downstream targets of Rsk-2 (eIF4B, filamin-A and so on) were not affected, which suggests that alternative pathways might have been active and contributed to the treatment failure [107].

Pathway databases for DTD

Recent emphasis on multi-omics data analysis has helped to pave the way for more systems biology-driven approaches to DTD [108, 109]. These approaches strongly rely on the integration of omics-driven information with pathway annotations in order to more accurately identify effective drug targets. Currently, there are a large number of pathway resources that are used for systems biology analysis [110, 111]. These resources have a variety of goals, ranging from identifying gene functions in model organisms to providing tools for drug discovery. Pathway-based strategies are also useful for the identification of alternative druggable targets. Sometimes targets identified with GWAS and other omics technologies may not be druggable [112]. However, these undruggable genes may occur on a pathway with a partner that is from a known druggable family. Moreover, pathway-level information can help researchers identify drug target side effects [113]. Table 7 reports a brief description of pathway databases that can be used to implement systems biology approaches to drug discovery and validation.
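
The idea of looking for a druggable partner on the same pathway can be sketched as a simple set operation. The snippet below is only an illustration under stated assumptions: the pathway names, gene labels and the `druggable` set are hypothetical placeholders, and in practice they would be derived from resources such as those listed in Table 7.

```python
from typing import Dict, Set


def druggable_partners(
    target: str,
    pathways: Dict[str, Set[str]],
    druggable: Set[str],
) -> Dict[str, Set[str]]:
    """For each pathway containing `target`, return the other pathway members
    that belong to the (assumed) set of druggable genes."""
    partners_by_pathway: Dict[str, Set[str]] = {}
    for pathway_name, members in pathways.items():
        if target not in members:
            continue
        partners = (members & druggable) - {target}
        if partners:
            partners_by_pathway[pathway_name] = partners
    return partners_by_pathway


# Hypothetical toy data: GENE_A is the undruggable hit from an omics screen.
toy_pathways = {
    "pathway_1": {"GENE_A", "GENE_B", "GENE_C"},
    "pathway_2": {"GENE_A", "GENE_D"},
    "pathway_3": {"GENE_B", "GENE_E"},
}
toy_druggable = {"GENE_B", "GENE_E"}

alternatives = druggable_partners("GENE_A", toy_pathways, toy_druggable)
```

Here only `pathway_1` yields an alternative candidate (`GENE_B`); `pathway_3` is ignored because it does not contain the original target.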

Table 7

Pathway-based databases useful for DTD

Database	Description	Drug-related information
KEGG [113]	The Kyoto Encyclopedia of Genes and Genomes is a widely used database containing metabolic pathways (372 reference pathways) from a wide variety of species (>700). These pathways are hyperlinked to metabolite and protein-complex/enzyme information. https://www.genome.jp/kegg/	Drug metabolism; Drug development; Disease/drug information
BioCyc [114]	The BioCyc database is a set of 3000 Pathway/Genome Databases (PGDBs) for many sequenced genomes. PGDBs describe the entire genome of an organism, as well as its biochemical pathways and (when curated) its regulatory network. https://biocyc.org/	Pathway-based target selection and validation; Antimicrobial drug targets
Reactome [115]	Reactome builds and maintains a peer reviewed knowledge base of biological pathways (primary species of interest is Homo sapiens), including metabolic pathways as well as protein complex trafficking and signaling pathways. Reactome includes several types of reactions in its pathway diagram collection including experimentally confirmed, manually inferred and electronically inferred reactions. https://reactome.org/	Simulate impact of drugs on pathway activities; Drug target interaction in pathway diagrams
WikiPathways [116]	WikiPathways is an open, collaborative platform dedicated to the curation of biological pathways. It is based on the MediaWiki open source software used by Wikipedia, coupled to a custom graphical pathway editing tool and integrated databases covering major gene, protein complex and small-molecule systems. https://wikipathways.org/	Drug target search strategies
Pathway Commons [117]	PC provides a collection of publicly available pathways from multiple organisms that provide researchers with convenient access to a comprehensive collection of pathways from multiple sources represented in a common language. https://www.pathwaycommons.org/	Robust pathway analyses
Biocarta	Biocarta is an open source database of pathways highlighting molecular relationships from areas of active research as well as classical pathway maps. It also catalogs and summarizes important resources providing information for over 120 000 genes from multiple species. www.biocarta.com	Enhancing genomic information for DTD
PharmGKB [118]	PGKB is a publicly available online knowledgebase responsible for the aggregation, curation, integration and dissemination of knowledge regarding the impact of human genetic variation on drug response. It also contains manually curated pharmacokinetic and pharmacodynamics pathways. https://www.pharmgkb.org/	Drug target–side effects

DTD platforms

Over the past decade many databases and computational platforms for target discovery have been created to help scientists find robust evidence linking targets to diseases. These tools aim to assess the therapeutic efficacy of targets and, more recently, also their safety aspects. Initially, DTD platforms were not built specifically for omics-driven target discovery. However, they provide access to multiple biomedical data sources showing relevant target–disease associations. Large-scale data, such as omics, are systematically processed into concise information about drug–target and target–disease associations and about drug- and target-related side effects. Table 8 lists brief descriptions of the reviewed DTDs.

Table 8

Platforms and databases for DTD and evaluation

DTD	Link	Description	Main goals	License
DrugBank [39, 119]	drugbank.ca	A bioinformatics and chemoinformatics resource that combines drug and drug target information	Drug and target information	CC BY-NC 4.0
ChEMBL [120]	ebi.ac.uk/chembl	An open large-scale bioactivity database combining molecule, target and drug data	Drug and target information	CC BY-SA 3.0
DGIdb [121]	dgidb.org	A collection of drug–gene interactions and gene druggability information	Drug–gene interactions	MIT
TTD [38]	db.idrblab.org/ttd/	A database to provide information about known and explored therapeutic protein and nucleic acid targets and related targeted disease	Drug and target information	Free access
DisGeNET [68]	disgenet.org	A collection of genes and variants associated with human diseases	Gene disease associations	CC BY-NC-SA 4.0
DTC [2]	drugtargetcommons.fimm.fi	A crowd-sourcing platform to improve the consensus and use of drug target interactions	Drug target interactions	CC BY-NC-SA 3.0
Open Targets [3]	opentargets.org	Platform for target identification and prioritization target–disease associations	Target–disease associations	APACHE LICENSE, VERSION 2.0
PHAROS [122]	pharos.nih.gov	Knowledge base for the Druggable Genome	Target–disease associations	CC BY-SA 4.0
CTD [123, 124]	http://ctdbase.org/	Literature-based, manually curated associations between chemicals, gene products, phenotypes, diseases and environmental exposures	Drug-gene interactions; Drug disease associations	TM
ADReCS-Target [125]	bioinf.xmu.edu.cn	A collection of ADRs caused by drug interaction with protein, gene and genetic variation	Drug target–adverse effect associations	Non-commercial use

DrugBank

DrugBank [120] collects comprehensive molecular information about drugs, their mechanisms, interactions and targets. It is primarily focused on providing data mining tools needed to facilitate target discovery and drug development. Recent versions of DrugBank also provide information on the effect of hundreds of drugs on metabolite levels (pharmacometabolomics data), gene expression (pharmacotranscriptomics data) and protein expression (pharmacoproteomics data). New data have also been added on the status of hundreds of new drug clinical trials and existing drug repurposing trials. Data in DrugBank are provided in an XML format. This makes data downloads and development of data extraction routines simpler and faster for programmers and database developers.
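
As an illustration of how an XML export can be mined for drug–target pairs, the sketch below parses a small, hand-written XML fragment with the Python standard library. The element layout (`drug`, `name`, `targets/target`) and the namespace URI are simplifying assumptions modeled on the DrugBank download and should be checked against the actual schema; the drug and target names are placeholders, not real records.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Assumed XML namespace of the export; verify against the downloaded file.
NS = {"db": "http://www.drugbank.ca"}

# Hand-written, simplified fragment with placeholder names (not real data).
SAMPLE_XML = """<drugbank xmlns="http://www.drugbank.ca">
  <drug>
    <name>Example drug A</name>
    <targets>
      <target><name>Example target 1</name></target>
      <target><name>Example target 2</name></target>
    </targets>
  </drug>
  <drug>
    <name>Example drug B</name>
    <targets/>
  </drug>
</drugbank>"""


def drug_targets(xml_text: str) -> Dict[str, List[str]]:
    """Map each drug name to the list of its target names."""
    root = ET.fromstring(xml_text)
    mapping: Dict[str, List[str]] = {}
    for drug in root.findall("db:drug", NS):
        drug_name = drug.findtext("db:name", default="", namespaces=NS)
        targets = [
            target.findtext("db:name", default="", namespaces=NS)
            for target in drug.findall("db:targets/db:target", NS)
        ]
        mapping[drug_name] = targets
    return mapping


pairs = drug_targets(SAMPLE_XML)
```

For a full database export, a streaming parser such as `xml.etree.ElementTree.iterparse` would probably be preferable to loading the whole document into memory.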

ChEMBL

ChEMBL [121] is a well-established resource in the fields of drug discovery and medicinal chemistry research. It curates and stores standardized bioactivity, molecule, target and drug data extracted from multiple sources, including the primary medicinal chemistry literature. Moreover, the ChEMBL database includes data typically generated in the preclinical and clinical phases of drug discovery, specifically drug metabolism and disposition data. These data help researchers better understand the key aspects of successful drug discoveries. ChEMBL data can be downloaded in a number of standard formats that can be automatically imported to external applications for further analyses. Alternatively, ChEMBL provides a REST API based service, allowing the remote retrieval of ChEMBL data and its integration into other applications.
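
A REST-style service of this kind is typically queried by composing a resource URL with filter parameters. The helper below only builds such a URL and does not contact any server; the base URL, the `.json` suffix and the filter names (`target_chembl_id`, `limit`) reflect our understanding of the ChEMBL web services and should be treated as assumptions to be verified against the official documentation. The identifier used is a placeholder.

```python
from urllib.parse import urlencode

# Assumed base URL of the ChEMBL web services; check the official documentation.
CHEMBL_BASE = "https://www.ebi.ac.uk/chembl/api/data"


def chembl_url(resource: str, **filters: object) -> str:
    """Build a query URL for a resource (e.g. 'activity', 'target') with filters.

    Filters are sorted by name so that the resulting URL is deterministic.
    """
    query = urlencode(sorted(filters.items()))
    url = f"{CHEMBL_BASE}/{resource}.json"
    return f"{url}?{query}" if query else url


# Placeholder target identifier; a real query would use an actual ChEMBL ID.
example_url = chembl_url("activity", target_chembl_id="CHEMBL_EXAMPLE", limit=5)
```

The resulting URL could then be fetched with `urllib.request.urlopen` and decoded with `json.load`; that step is left out here so that the example runs without network access.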

DGIdb

The DGIdb database [122] collects drug–gene relationships and gene druggability information from 30 distinct repositories, including papers, databases and web resources. It was the first DTD platform to provide tools to capture and prioritize genes that are known to be targeted by existing drugs, especially targeted drugs rather than broad chemotherapeutics. Drug–gene interactions have been mined from existing databases and literature to populate DGIdb. Similarly, genes have been categorized as potentially druggable according to membership in selected pathways, molecular functions and gene families. In the latest version, druggable genes from GWASs are also included. All data from DGIdb are available as tab-delimited data downloads and also through a web services API.

Therapeutic Target Database

The TTD [38] provides information about the known therapeutic protein and nucleic acid targets described in the literature, the targeted disease conditions, the pathway information and the corresponding drugs/ligands directed at each of these targets. The database currently contains 2025 targets, 17 816 drugs and 3681 multi-target agents. TTD also provides information about drug resistance mutations, gene expression profiles in the disease-relevant drug-targeted tissue of patients and healthy individuals, and target combinations of multi-target drugs and drug combinations. The database is organized through five main panels that allow users to browse it by advanced search, patient data, target or drug groups, or model data. Various datasets can also be downloaded.

DisGeNET

The DisGeNET platform [127] aims to overcome the fragmentation and heterogeneity of available genomic data for mining gene–disease associations. It integrates data from manually curated databases, GWAS catalogs, animal models and the scientific literature. DisGeNET features a score based on the supporting evidence to prioritize gene–disease associations. These scores rely on data retrieved from databases that curate genetic association studies (the GWAS Catalog and the Genetic Association Database) and on genomic information extracted from animal models. DisGeNET can be used for different research purposes, including the analysis of properties of disease genes, the generation of hypotheses on drug therapeutic action and drug adverse effects, the validation of computationally predicted disease genes and the evaluation of the performance of text mining methods. DisGeNET data are available for downloading in several formats: as an SQLite database, as tab-separated files and as dump files serialized in RDF/Turtle.
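
Score-based prioritization from a tab-separated download can be sketched in a few lines. The example below filters and ranks gene–disease associations by score using the Python standard library; the column names (`geneSymbol`, `diseaseName`, `score`) and all values are hypothetical stand-ins for an actual DisGeNET file, whose header should be checked before reuse.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical tab-separated content with placeholder genes, diseases and scores.
SAMPLE_TSV = (
    "geneSymbol\tdiseaseName\tscore\n"
    "GENE_A\tDisease X\t0.82\n"
    "GENE_B\tDisease X\t0.31\n"
    "GENE_C\tDisease Y\t0.65\n"
)


def top_associations(tsv_text: str, min_score: float) -> List[Tuple[str, str, float]]:
    """Return (gene, disease, score) rows with score >= min_score, best first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [(r["geneSymbol"], r["diseaseName"], float(r["score"])) for r in reader]
    kept = [row for row in rows if row[2] >= min_score]
    return sorted(kept, key=lambda row: row[2], reverse=True)


ranked = top_associations(SAMPLE_TSV, min_score=0.5)
```

With a real download, `io.StringIO(tsv_text)` would simply be replaced by an open file handle.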

Drug Target Commons

DTC [2] is a crowdsourcing web platform that aims to standardize the collection, management, curation and annotation of the notoriously heterogeneous compound–target bioactivity data to facilitate drug discovery, target identification and drug repurposing. It integrates different, publicly available bioactivity data, which are generated using various assays, on compound–target interactions. Multiple data sources are systematically used for discovery of new indications for drugs (i.e. the selection of compounds affecting specific proteins or biological pathways). DTC also provides access to QSAR models that can be used to extend target spaces for drugs. In addition, the drug-related bioactivity data are combined with chemical proteomic data in order to characterize biological pathways that are affected by certain drugs, to identify markers for drug monitoring and to determine drug cocktails [128]. These data can be used to characterize potential safety issues associated with certain drugs (e.g. in vivo absorption, distribution, metabolism, excretion and toxicity properties) [129].

Open Targets Platform

Open Targets [3] is a platform for therapeutic target identification and validation, providing either a target-centric workflow to identify diseases that may be associated with a specific target, or a disease-centric workflow to identify targets that may be associated with a specific disease. Coverage includes genetic associations, somatic mutations, known drugs, gene expression, affected pathways, literature mining and animal models. The latest version of Open Targets provides information regarding the tractability of a target, which measures the ‘ligandability’ of putative drug targets [130], and safety risk information associated with selected targets. The Open Targets Platform allows programmatic retrieval of data via a set of REST services or, alternatively, access to dump files.

Pharos

Pharos [123] provides a web interface for data collected by the Illuminating the Druggable Genome (IDG) initiative. It incorporates text-mined bibliometric associations and statistics from the biomedical and patent literature, mRNA and protein expression data, disease and phenotype associations, bioactivity data, drug target interactions, and omics-driven data imported from the Harmonizome. It also integrates with the functionality of the Drug Central and Drug Target Ontology (DTO) resources. DTO is a database providing tools to classify and integrate drug discovery data based on formalized and standardized classifications and annotations of druggable protein targets. DTO integrates phylogenecity, function, target development level, disease association, tissue expression, chemical ligand and substrate characteristics and target-family specific characteristics. Protein classes are linked to tissue and disease via different levels of confidence. DTO also contains drug target development classifications, a large collection of cell lines from the LINCS project and relevant cell–disease and cell–tissue relations. DTO is modeled in OWL2-DL to enable further classification by inference reasoning and SPARQL queries, and it is implemented following a modularization approach. DTO will serve as the organizational framework for drug targets in the IDG PHAROS User Interface Portal.

Comparative Toxicogenomics Database

CTD [124] is a public resource that provides information about interactions between chemicals and gene products, and their relationships to diseases. Professional biocurators manually curate the scientific literature, transforming text, tables, figures and supplemental files into annotated data that are seamlessly integrated and available through CTD’s public web application (PWA). Different data sources for toxicogenomics, phenotypes, diseases, environmental exposures and pharmaceuticals are considered in order to build drug–gene and drug–disease interactions. Overall, CTD includes over 38 million toxicogenomic relationships for analysis and hypothesis development. This information is organized through community-accepted controlled vocabularies and ontologies with accession identifiers, in order to ensure that CTD’s content is cohesive, manageable and computable, as well as adhering to the FAIR principles. CTD’s vocabularies and content are described and made freely available for users to download in a variety of formats.

ADReCS-Target

ADReCS-Target [126] provides target profiles for aiding drug safety research and application by collecting data about adverse drug reactions (ADRs) caused by drug interactions with protein, gene and genetic variation. ADReCS-Target contains more than 66 000 association pairs with over 2200 standard ADR terms manually curated from text mining of the public scientific literature. All the terms are standardized by using the ADReCS ontology and represented as a connected network or in a systematic fashion. The user can download selected records via the embedded download function in six formats: JSON, XML, CSV, TXT, SQL and MS-Excel. ADReCS-Target also allows batch data retrieval.

Comparative analysis of DTD platforms

In this section we compare the selected DTDs based on what type of associations are pursued, the information provided to inform about efficacy and safety of putative targets and the omics data employed for the discovery process. Comparing the information presented in Table 9, we can observe that many existing DTDs help identify and prioritize drug targets solely on the basis of molecule–target interactions, without specifying whether the target is disease-modifying and/or has a proven function in the pathophysiology of a disease. On the other hand, web platforms such as Open Targets and DisGeNET provide information on drug target–disease associations. These tools also provide efficacy scores of target–disease associations calculated from multiple sources of evidence, including omics and scientific literature. Both scoring systems can be utilized to summarize and rank disease-relevant targets. This is especially the case of the Open Targets platform, which allows the collection and fusion of many pieces of evidence to support target–disease associations. The Open Targets platform employs data mining algorithms in order to estimate a numerical score for each type of evidence. These scores are finally merged through the harmonic sum, which gives an overall efficacy estimate indicating the strength of an association between a molecular target and a disease. More recently, the Open Targets platform has provided a new source of information, namely ‘target tractability’, which can be used to collect drug target details, such as whether there is a binding site in the protein that can be used for small molecule binding, or an accessible epitope for antibody-based therapy. This can assist in target prioritization, drug target inclusion in discovery pipelines and selection of therapeutic modalities that are most likely to succeed. 
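
The harmonic-sum aggregation mentioned above can be illustrated with a short sketch. We assume the commonly described form in which evidence scores are sorted in decreasing order and the i-th score is divided by i squared, so that the strongest evidence dominates and additional weak evidence contributes progressively less. The normalization step (dividing by the value obtained if all scores were 1) is our own assumption for illustration and may differ from the scaling used by the platform.

```python
from typing import Iterable


def harmonic_sum(scores: Iterable[float]) -> float:
    """Sum of scores sorted in decreasing order, the i-th one weighted by 1 / i**2."""
    ranked = sorted(scores, reverse=True)
    return sum(score / (rank ** 2) for rank, score in enumerate(ranked, start=1))


def normalized_harmonic_sum(scores: Iterable[float]) -> float:
    """Harmonic sum divided by its maximum for the same number of scores
    (assumed normalization: every score equal to 1)."""
    ranked = list(scores)
    if not ranked:
        return 0.0
    maximum = sum(1.0 / (rank ** 2) for rank in range(1, len(ranked) + 1))
    return harmonic_sum(ranked) / maximum


# Two hypothetical evidence scores for one target-disease pair.
overall = harmonic_sum([0.5, 1.0])  # 1.0 / 1 + 0.5 / 4
scaled = normalized_harmonic_sum([0.5, 1.0])
```

Because of the quadratic decay, many weak pieces of evidence cannot outweigh a single strong one, which is the usual motivation for this kind of aggregation.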
An important aspect of drug target prioritization is target safety assessment, which should aim to identify potential unintended adverse consequences of target modulation [33]. However, to the best of our knowledge, none of the presented DTD platforms provide safety evaluation scores to be combined with efficacy estimates of target–disease associations. Open Targets and DisGeNET do not directly provide information on drug target interactions. CTD is the only platform aiming to bridge the information gap between drug target interaction and target–disease association tasks. It integrates multiple sources of information for drug target interaction and it includes clinical development information for the compounds and target gene–disease associations, as well as cancer-type indications for mutant protein targets, which are critical for precision oncology developments. However, CTD often considers the disease as an adverse outcome given by the interaction between a chemical and a gene. Another important observation is that many DTDs do not directly indicate potential side effects that could be caused by a drug interacting with the selected molecular target (e.g. gene or protein), for example if the selected targets are classified as ‘essential’ or play a role in oncogenic pathways. The term essential gene can refer to genes encoding proteins that are necessary to maintain a central metabolism, replicate DNA, translate genes into proteins, maintain a basic cellular structure, etc. CTD and ADReCS-Target provide information on putative side effects associated with drug targets. In particular, CTD integrates chemical and biological information to elucidate toxicology relationships, while ADReCS-Target provides information on drug toxicity–target relationships. TTD and DrugBank provide information on multi-target drugs or therapeutic combinations. TTD provides information on synergistic, additive, antagonistic, potentiative and reductive drug combinations. 
Whereas, the DrugBank database includes a set of 12 128 drug–drug interactions along with a brief textual description of the interaction and information about therapeutic effects [131]. Overall, the existing DTDs focus more on efficacy evaluations of drug target–disease associations and less on safety aspects of drug targets, and none of them provide ranking systems to prioritize drug targets on efficacy and safety evaluations simultaneously. Table 10 lists the omics data types that DTDs use for mining drug target or target disease associations. In particular, we can observe that genomic (genetic) and transcriptomic data are commonly used and that they are often obtained from pre-compiled omics data analysis. Moreover, this information is not always utilized to compile efficacy estimates of drug–gene or gene disease associations. Platforms such as ChEMBL, DGIdb, DisGeNET and Open Targets provide scoring methods to rank drug target or target disease interactions. However, these methods often do not fully exploit the data collected by DTDs, for instance, DGIdb simply reports the number of distinct sources and distinct PubMed IDs (PMIDs) supporting each interaction. Moreover, benchmark studies supporting the validity of these scoring methods are missing. As DTD is one of the first phases of drug development process, it seems feasible that DTD platforms would be incorporated into the early stages of the process. The platforms can be utilized to discover completely new targets, rank existing targets lists to identify the most likely candidates based on different criteria or to provide additional evidence and starting points for deep dives into specific targets. 
To incorporate DTD platforms efficiently into an existing drug development process and pipeline, the following features are desirable: (i) the possibility to batch process a set of targets, (ii) programmatic access to the platform through an API and the possibility to download the data, (iii) the possibility to incorporate private data, (iv) standardized data formats and gene/gene product identifiers, (v) references to where and how the evidence for a specific target was gathered, (vi) rankable evidence scores and (vii) a license that enables the intended use. We believe these features make a platform easier to integrate with other DTD platforms and existing drug development pipelines.
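Several of the features listed above, namely batch processing (i), standardized identifiers (iv), traceable evidence (v) and rankable scores (vi), can be illustrated with a small Python sketch. It assumes a hypothetical in-memory list of interaction records: the `Interaction` fields, gene labels, source names and publication identifiers are placeholders invented for illustration and do not come from any platform. Each gene in a batch is summarized by its number of distinct sources and distinct publications, in the spirit of the count-based evidence that DGIdb reports; this is not DGIdb's actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    gene: str    # standardized gene identifier (placeholder values below)
    drug: str
    source: str  # database the record was gathered from
    pmid: str    # supporting publication identifier (placeholder values below)


def evidence_counts(records, genes):
    """Return {gene: (n_distinct_sources, n_distinct_pmids)} for a batch of genes."""
    sources = defaultdict(set)
    pmids = defaultdict(set)
    for r in records:
        sources[r.gene].add(r.source)
        pmids[r.gene].add(r.pmid)
    # Genes without any record get (0, 0) rather than raising an error.
    return {g: (len(sources[g]), len(pmids[g])) for g in genes}


records = [
    Interaction("GENE_A", "drug_1", "source_x", "pmid_1"),
    Interaction("GENE_A", "drug_1", "source_y", "pmid_1"),
    Interaction("GENE_A", "drug_2", "source_y", "pmid_2"),
    Interaction("GENE_B", "drug_3", "source_x", "pmid_3"),
]

# Batch query, then rank by (sources, publications), highest first.
counts = evidence_counts(records, ["GENE_A", "GENE_B", "GENE_C"])
ranked = sorted(counts, key=counts.get, reverse=True)
```

Keeping the source and publication identifier on every record means each count can be traced back to where the evidence was gathered, which is what feature (v) asks for.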

Table 9

Comparison on drug target–disease associations

DTD	Main association	Drug target	Target–disease	Efficacy	Safety	Novelty
DrugBank	Drug–target (RNA, DNA and other molecules)	Drug binding data; drug pharmacokinetics; drug bioavailability; drug ADMET characteristics; clinical trials (TTD, STITCH, BindingDB, ChEMBL)	External links to ChemSpider, HMDB, MMCD, SMPDB and OMIM	Yes (DrT)	Yes	Yes
ChEMBL	Molecule–target (genes/proteins)	Efficacy assay data; ADME assay data; drug metabolism data; toxicity assay data	External link to ClinicalTrials.gov	Yes (DrT)	Yes	Yes
DGIdb	Drug–target (genes)	Drug bioactivity data; physically binding data; modulation and indirect interaction; RNA drug binding data (DrugBank, TTD, ChEMBL, TALC)	Missing	Yes (DrT)	No	No
TTD	Drug–target; target–disease	Drug–target interaction (PubChem, DrugBank, SuperDrug and ChEBI)	Gene expression profiles	Yes (DsT)	No	No
DisGeNET	Gene–disease; variant–disease	External links to drug activity data, drug–gene interaction and drug adverse reaction (ChEMBL, CTD, SIDER)	Genomic data (GWAS); scientific literature; animal models	Yes (DsT)	No	No
DTC	Drug–target interactions (proteins)	Drug activity data; clinical development information of drugs (25 databases, including ChEMBL, PubChem, DrugBank, PharmGKB and ClinicalTrials.gov)	External links to DisGeNET, Cancer Genome Interpreter	Yes (DrT)	No	No
Open Targets	Target–disease (genes/proteins)	External link to ChEMBL	Genetic associations; somatic mutations; drugs; pathways & systems biology; RNA expression; text mining; animal models	Yes (DrT)	Yes	No
PHAROS	Drug–target	Scientific literature; mRNA and protein expression data; disease and phenotype associations; bioactivity data; drug–target interactions; adverse drug reactions	External links to DisGeNET, Expression Atlas, GTEx, GWAS Catalog, JensenLab data	Yes (DrT)	No	Yes
CTD	Drug–target; drug–disease; target–disease (genes/proteins)	Curated chemical–gene interactions (bioactivity, binding, expression, mutagenesis and metabolic processing)	Curated chemical–disease interactions; inferred gene–disease associations	Yes (DsT)	Yes	No
ADReCS-Target	Drug–target/side effects (genes/proteins)	A collection of ADRs caused by drug interaction with protein, gene and genetic variation	External links to CTD, DrugBank, dbSNP	No	Yes	No

Efficacy estimates can refer to drug–target interactions (DrT) or disease–target associations (DsT).

Table 10

Comparison based on the use of omics data layers and prioritization tool

DTD	Omics	Omics data types	External DB	Ranking
DrugBank	Genomic; Transcriptomic; Metabolomic; Proteomic	SNP-drug data (^); up/down regulation of genes due to drug metabolism; manually compiled metabolomic information (^); drug-action pathways on protein targets (^*)	dbSNP, Literature, SMPDB, HMDB, T3DB, SMPDB, UniProt, CTD	Not available
ChEMBL	Transcriptomic; Genomic	Gene expression profiles induced by chemical or drug exposure (^); drug sensitivity (^)	TG-GATE, DrugMatrix, Gene Expression Atlas, GDSC	Confidence score to rank molecule–target interactions
DGIdb	Genomic	Druggable genome/genes (^*)	MyCancerGenome	Number of distinct sources of evidence and PMIDs supporting each interaction
TTD	Transcriptomic; Genomic	Tissue-specific gene expression profiles in healthy and diseased individuals (^); drug resistance mutation (^)	Gene Expression Omnibus and ArrayExpress, Literature	Not available
DisGeNET	Transcriptomic; Genomic	Gene expression alteration (^); relationships between human variants/genes and phenotypes/diseases (^); genome association studies (^*)	Gene Expression Atlas, CTD, ClinVar, Orphanet, GWAS Catalog, The Genetic Association Database (GAD)	GDA score to rank gene–disease associations according to their level of evidence. This score compiles efficacy scores on the basis of genomic information and scientific literature.
DTC	Genomic	Gene–disease associations (^); somatic mutation information (^)	DisGeNET	Not available
Open Targets	Transcriptomic; Genomic/Genetic	Expression profile of diseases (^); genome association studies, somatic mutation (^)	Gene Expression Atlas, GWAS/PheWAS Catalog, Gene2Phenotype	Multi-evidence ranking of target–disease associations
PHAROS	Transcriptomic/Proteomic; Genomic/Genetic	Tissue-specific RNA expression (^); genome association studies (^)	GTEx, Expression Atlas, JensenLab RNA-seq, GWAS Catalog	Not available
CTD	Transcriptomic; Genomic; Metabolomic	Gene expression alteration (^); genetic alteration of a gene product (^); metabolic processing (^*)	DrugBank	Not available
ADReCS-Target	Genomic/Genetic	Gene–disease associations (^); drug-relevant genetic variations (^)	CTD, GWAS Catalog, DrugBank	Not available

The (*) indicates that omics-driven information is obtained from an external data source, database or literature.

Conclusion

Current DTD platforms provide alternative ways of utilizing omics data sources for improved drug target prioritization and selection. However, the data mining algorithms used to quantify the efficacy estimates of drug target–disease associations could be improved. In particular, genomic, transcriptomic and proteomic data could be used more efficiently to provide new ways to link targets to diseases and to validate these targets. Perhaps it would be important in the near future to develop computational tools that assist with the integration of these complex multi-omics data sets in order to identify drug targets more robustly. Moreover, there has been little effort to apply omics data to the early identification of safety-related issues of putative drug targets. Computational biologists and biostatisticians in academia and industry, working with clinical data, continuously strive toward the development of cost-effective and sensitive diagnostic biomarkers. Overall, we identified three major technical gaps that could be bridged by the next generation of drug discovery platforms, namely the lack of (i) in silico tools for target safety assessment, (ii) comparative analysis of different efficacy and safety estimates for drug target prioritization and (iii) systematic identification of multiple drug targets and selection of optimal therapeutic strategies.
Target-based drug discovery is still largely manual work that bottlenecks the whole drug discovery process. Many computational platforms exist to rapidly identify and prioritize genes or proteins that represent promising drug targets from hundreds of data sources, ranging from scientific publications to omics databases. A few platforms provide omics-driven efficacy estimates of target–disease associations. No single tool, platform or database supports drug target prioritization based on both efficacy and safety assessment scores.
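To make gap (ii) concrete, one possible way to compare efficacy and safety estimates without collapsing them into a single number is to keep only the targets that no other target beats on both criteria (the Pareto front). The sketch below is our illustrative choice and not a method used by any of the surveyed platforms. The target labels and scores are hypothetical, and both scores are assumed to lie on a scale where higher is better.

```python
from typing import Dict, List, Tuple


def pareto_front(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return targets not dominated on (efficacy, safety); higher is better for both."""
    front = []
    for name, (eff, saf) in scores.items():
        # A target is dominated if another one is at least as good on both
        # criteria and strictly better on at least one of them.
        dominated = any(
            (e >= eff and s >= saf) and (e > eff or s > saf)
            for other, (e, s) in scores.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (efficacy, safety) estimates for four candidate targets.
candidates = {
    "TARGET_1": (0.9, 0.3),
    "TARGET_2": (0.6, 0.8),
    "TARGET_3": (0.5, 0.7),  # worse than TARGET_2 on both criteria
    "TARGET_4": (0.2, 0.9),
}
front = pareto_front(candidates)
```

The remaining candidates represent different efficacy and safety trade-offs, which leaves the final choice to the scientist rather than to a fixed weighting of the two scores.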
