Jussi Paananen1,2, Vittorio Fortino1.
Abstract
The drug discovery process starts with identification of a disease-modifying target. This critical step traditionally begins with manual investigation of scientific literature and biomedical databases to gather evidence linking a molecular target to disease, and to evaluate the efficacy, safety and commercial potential of the target. The high throughput and affordability of current omics technologies, allowing quantitative measurements of many putative targets (e.g. DNA, RNA, protein, metabolite), has exponentially increased the volume of scientific data available for this arduous task. Therefore, computational platforms identifying and ranking disease-relevant targets from existing biomedical data sources, including omics databases, are needed. To date, more than 30 drug target discovery (DTD) platforms exist. They provide information-rich databases and graphical user interfaces to help scientists identify putative targets and pre-evaluate their therapeutic efficacy and potential side effects. Here we survey and compare a set of popular DTD platforms that utilize multiple data sources and omics-driven knowledge bases (either directly or indirectly) for identifying drug targets. We also provide a description of omics technologies and related data repositories which are important for DTD tasks.
Keywords: drug efficacy and safety evaluations; drug target discovery; omics-informed drug discovery
Year: 2020 PMID: 31774113 PMCID: PMC7711264 DOI: 10.1093/bib/bbz122
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Key drug target properties
| Property | Description | Key aspects |
|---|---|---|
| Efficacy | For a drug to have an effect, it must bind to its target and then alter the function of that target. A target can be a gene, a protein or another biomolecule, and it is responsible for the therapeutic efficacy of the drug [ | Target druggability; target disease validation; tissue-specific efficacy evaluations |
| Safety | Safety evaluation aims to identify potential adverse consequences of target modulation, unavoidable on-target toxicities and potential clinical adverse events, in support of drug target identification and prioritization [ | Drug toxicity in patients; OFF/ON drug targets; unsafe biomolecules (essential genes, carcinogenic, etc.) |
| Novelty | Novelty estimates the scarcity of publications and patents about a protein target [ | Text mining of scientific and patent literature |
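The Novelty row describes a score based on how rarely a target appears in publications and patents. The following is a minimal sketch of one way such a score could be computed; the formula, the function name `novelty_score` and the counts are illustrative assumptions, not the method of any surveyed platform.

```python
import math


def novelty_score(n_publications: int, n_patents: int) -> float:
    """Hypothetical novelty score in (0, 1].

    Returns 1.0 when a target has no publications or patents and
    decreases as the combined count grows. The log damping is an
    assumption chosen so that heavily studied targets do not
    collapse to zero too quickly.
    """
    if n_publications < 0 or n_patents < 0:
        raise ValueError("counts must be non-negative")
    return 1.0 / (1.0 + math.log1p(n_publications + n_patents))


# Invented (publications, patents) counts for three placeholder targets.
targets = {
    "TARGET_A": (0, 0),
    "TARGET_B": (12, 3),
    "TARGET_C": (4500, 220),
}

# Rank targets from most to least novel.
ranked = sorted(targets, key=lambda t: novelty_score(*targets[t]), reverse=True)
```

In practice the counts would come from text mining of scientific and patent literature, as the table notes; here they are hard-coded placeholders.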
Omics data types and their use for informed pharmaceutical research and development
| Omics | Function | Databases |
|---|---|---|
| Genomics | Understanding pathogenesis; genetic association studies; identification of disease genes; discovery of putative drug targets; patient-centered efficacy and toxicity assessment of drugs/targets; patient stratification | GWAS Catalog; GWAS Central; dbGaP; PharmGKB |
| Transcriptomics | Disease mechanisms; mode of action of compounds; moving from disease genes to drug targets; identification/evaluation of drug target candidates; early prediction of adverse drug target effects | DrugMatrix; TG-GATEs; LINCS 1000; Expression Atlas; GEO repository; ArrayExpress |
| Proteomics | Post-translational process; protein–protein network interaction; drug target efficacy and safety evaluation at protein level; protein toxicology | PRIDE Archive; Peptide Atlas; ProteomicsDB; Human Proteome Map; Human Protein Atlas |
| Metabolomics | Novel DTD; drug target efficacy and safety evaluation at metabolomic level; metabolic toxicity | Human Metabolome; Madison Metabolomics; Golm Metabolome Database; MassBank; MetaboLights; MetabolomeExpress |
Genomics databases for DTD
| Database | Description | Application |
|---|---|---|
| GWAS Catalog [ | Collects published human GWASs that are manually curated by expert scientists. GWAS Catalog provides accurate and structured metadata for publication, study design, sample and trait information and the most significant published results. | Mining disease genes; narrowing down/prioritizing candidate loci; disease risk prediction; disease mechanisms |
| GWAS Central [ | A database of summary level findings from genetic association studies, both large and small. GWAS Central collects datasets from public domain projects and encourages direct data submissions from the community. | Mining SNP-drug response associations |
| NCBI dbGaP [ | The NCBI Database of Genotypes and Phenotypes archives results of studies that have investigated the interaction of genotype and phenotype and distributes these results to investigators for secondary study. It includes phenotype data, GWAS data, summary level analysis data, Short Read Archive (SRA) data, reference alignment (BAM) data, Variant Call Format (VCF) data, etc. | Genotype studies for the identification of disease genes |
| PharmGKB [ | A publicly available online knowledgebase aggregating, curating, integrating and disseminating knowledge regarding the impact of human genetic variation on drug response. | Mining drug–gene, drug–SNP, gene–disease, disease–SNP, drug–pathway, disease– |
Transcriptomic databases for DTD
| Database | Description | Application |
|---|---|---|
| DrugMatrix [ | DM is provided by the U.S. National Toxicology Program and it gives access to large-scale gene expression data derived from standardized toxicological experiments in which rats or primary rat hepatocytes were systematically treated with therapeutic, industrial and environmental chemicals at both non-toxic and toxic doses. | DTD |
| TG-GATEs [ | TG-GATEs provides gene expression profiles and traditional toxicological data derived from | DTD |
| LINCS 1000 [ | L1000 generates gene expression signatures from treatment of a variety of cell types with perturbagens that span a range of small-molecule compounds, gene overexpression and gene knockdown reagents. The gene expression profiles are generated from a method, namely L1000, which defines a reduced representation of the transcriptome. | DTD |
| Expression Atlas [ | EA collects baseline gene expression data in different species and contexts, such as tissue, developmental stage or cell type. It also contains differential studies, reporting changes in expression between two different conditions, such as healthy and diseased tissue. | DTD and validation |
| GEO repository [ | GEO is a public repository of high-throughput gene expression data from hybridization arrays, chips and microarrays. | Retrieve drug, gene and disease perturbations |
| ArrayExpress [ | AE serves as an international repository for microarray data and high-throughput sequencing-based functional genomics experiments associated with scientific publications. | Retrieve drug, gene and disease perturbations |
| TCGA | TCGA is a functional genomics data repository covering >30 cancers across >10 K samples. Primary data types include mutation, copy number, mRNA and protein expression. | Discover novel molecular targets |
| GTEx [ | GTEx provides transcriptomic profiles of normal tissues, including >7 K samples across >45 tissue types. | Tissue-specific drug targets |
| CCLE [ | CCLE provides genetic and pharmacologic characterization of >1000 cancer cell lines. | Identify novel drug targets and drug response biomarkers |
| GDSC [ | GDSC is the largest public resource for information on drug sensitivity in cancer cells and molecular markers of drug response. | Identify novel drug targets and drug response biomarkers |
Proteomic databases for DTD
| Database | Description | Application |
|---|---|---|
| PRIDE Archive [ | PRIDE is a public data repository for proteomics, including protein and peptide identifications, post-translational modifications and supporting spectral evidence. | Drug target identification |
| ProteomicsDB [ | PDB is a large collection of quantitative MS-based proteomics data across various tissue types as well as protein–protein interaction information, functional annotation, target deconvolution, cell sensitivity and reference MS data. | Drug target identification |
| Human Proteome Map [ | Hosts high-resolution MS proteomic data representing 17 adult tissues, 6 primary hematopoietic cells and 7 fetal tissues resulting in >84% human proteome coverage. | Drug target identification |
| Human Protein Atlas [ | Collects expression and localization data for the majority of human protein-coding genes based on both RNA and protein data. The HPA also employs antibody-based proteomics and transcriptomics profiling methods to locate and identify proteins in tissues and cell types. | Druggable proteome |
Metabolomic databases for DTD
| Database | Description | Applications |
|---|---|---|
| The Human Metabolome Database [ | HMDB is a freely available electronic database containing detailed information about small molecule metabolites found (and experimentally verified) in the human body. It contains experimental MS/MS data for over 5700 compounds. | DTD |
| The Madison Metabolomics Consortium Database [ | MMCD collects small molecules of biological interest gathered from electronic databases and the scientific literature. It contains approximately 10 000 metabolite entries and experimental spectral data on about 500 compounds. | DTD |
| Golm Metabolome Database [ | GMD represents a general MS-based repository of reference metabolite profiles for essential plant tissues and typical variations of growth conditions. | DTD |
| MassBank [ | The first public repository of Electron Impact-MS data covering more than 200 000 spectra for a wide range of organic compounds. | DTD |
| MetaboLights [ | ML is an open-access database repository for cross-platform and cross-species metabolomics research at the European Bioinformatics Institute. It provides Metabolomics Standard Initiative-compliant metadata and raw experimental data associated with metabolomics experiments. | Drug safety |
| MetabolomeExpress [ | MetabolomeExpress is designed to perform three main functions: (i) store GC-MS metabolomics data, allowing for analysis without the user having to download the data, (ii) provide a GC-MS analysis pipeline and (iii) store metabolite response statistics. | DTD |
Pathway-based databases useful for DTD
| Database | Description | Drug-related information |
|---|---|---|
| KEGG [ | The Kyoto Encyclopedia of Genes and Genomes is a widely used database containing metabolic pathways (372 reference pathways) from a wide variety of species (>700). These pathways are hyperlinked to metabolite and protein-complex/enzyme information. | Drug metabolism |
| BioCyc [ | The BioCyc database is a set of 3000 Pathway/Genome Databases (PGDBs) for many sequenced genomes. PGDBs describe the entire genome of an organism, as well as its biochemical pathways and | Pathway-based target selection and validation |
| Reactome [ | Reactome builds and maintains a peer reviewed knowledge base of biological pathways (primary species of interest is | Simulate impact of drugs on pathway activities |
| WikiPathways [ | WikiPathways is an open, collaborative platform dedicated to the curation of biological pathways. It is based on the MediaWiki open source software used by Wikipedia, coupled to a custom graphical pathway editing tool and integrated databases covering major gene, protein complex and small-molecule systems. | Drug target search strategies |
| Pathway Commons [ | PC provides a collection of publicly available pathways from multiple organisms that provide researchers with convenient access to a comprehensive collection of pathways from multiple sources represented | Robust pathway analyses |
| Biocarta | Biocarta is an open source database of pathways highlighting molecular relationships from areas of active research as well as classical pathway maps. It also catalogs and summarizes important resources providing information for over 120 000 genes from multiple species. www.biocarta.com | Enhancing genomic information for DTD |
| PharmGKB [ | PGKB is a publicly available online knowledgebase responsible for the aggregation, curation, integration and dissemination of knowledge regarding the impact of human genetic variation on drug response. It also contains manually curated pharmacokinetic and pharmacodynamics pathways. | Drug target–side effects |
Platforms and databases for DTD and evaluation
| DTD | Link | Description | Main goals | License |
|---|---|---|---|---|
| DrugBank [ | drugbank.ca | A bioinformatics and chemoinformatics resource that combines drug and drug | Drug and target information | CC BY-NC 4.0 |
| ChEMBL [ | ebi.ac.uk/chembl | An open large-scale bioactivity database combining molecule, target and drug data | Drug and target information | CC BY-SA 3.0 |
| DGIdb [ | dgidb.org | A collection of drug–gene interactions | Drug–gene interactions | MIT |
| TTD [ | db.idrblab.org/ttd/ | A database to provide information about | Drug and target information | Free access |
| DisGeNET [ | disgenet.org | A collection of genes and variants | Gene disease associations | CC BY-NC-SA 4.0 |
| DTC [ | drugtargetcommons. | A crowd-sourcing platform to improve the consensus and use of drug target interactions | Drug target interactions | CC BY-NC-SA 3.0 |
| Open Targets [ | opentargets.org | Platform for target identification and prioritization based on target–disease associations | Target–disease associations | Apache License 2.0 |
| PHAROS [ | pharos.nih.gov | Knowledge base for the Druggable Genome | Target–disease associations | CC BY-SA 4.0 |
| CTD [ | http://ctdbase.org/ | A literature-based, manually curated associations between chemicals, gene | Drug-gene interactions | TM |
| ADReCS-Target [ | bioinf.xmu.edu.cn | A collection of ADRs caused by drug | Drug target–adverse effect associations | Non-commercial use |
Comparison on drug target–disease associations
| DTD | Main association | Drug target | Target–disease | Efficacy | Safety | Novelty |
|---|---|---|---|---|---|---|
| DrugBank | Drug target (RNA, DNA and other molecules) | Drug binding data | External links to ChemSpider, HMDB, MMCD, SMPDB and OMIM | Yes (DrT) | Yes | Yes |
| ChEMBL | Molecule-target (genes/proteins) | Efficacy assays data | External link to ClinicalTrials.gov | Yes (DrT) | Yes | Yes |
| DGIdb | Drug target (genes) | Drug bioactivity data | Missing | Yes (DrT) | No | No |
| TTD | Drug target | Drug target interaction (PubChem, DrugBank, SuperDrug and ChEBI) | Gene expression profiles | Yes (DsT) | No | No |
| DisGeNET | Gene diseases | External links to | Genomic data (GWAS) | Yes (DsT) | No | No |
| DTC | Drug target interactions (proteins) | Drug activity data; clinical development information of drugs (25 databases, including ChEMBL, PubChem, DrugBank, PharmGKB and ClinicalTrials.gov) | External links to DisGeNET, Cancer Genome Interpreter | Yes (DrT) | No | No |
| Open Targets | Target disease (genes/proteins) | External link to ChEMBL | Genetic associations; somatic mutations; drugs; pathways & systems biology; RNA expression; text mining; animal models | Yes (DrT) | Yes | No |
| PHAROS | Drug target | Scientific literature; mRNA and protein expression data | External links to DisGeNET, Expression Atlas, GTEx, GWAS Catalog, JensenLab data | Yes (DrT) | No | Yes |
| CTD | Drug target | Curated chemical–gene interactions (bioactivity, binding, expression, mutagenesis and metabolic processing) | Curated chemical– | Yes (DsT) | Yes | No |
| ADReCS-Target | Drug target/side effects (genes/proteins) | A collection of ADRs caused by drug interaction with protein, gene and genetic variation | External links to CTD, DrugBank, dbSNP | No | Yes | No |
Efficacy estimates can refer to drug target interaction (DrT) or disease target associations (DsT).
Comparison based on the use of omics data layers and prioritization tools
| DTD | Omics | Omics data types | External DB | Ranking |
|---|---|---|---|---|
| DrugBank | Genomic | SNP-drug data (*) | dbSNP, literature, SMPDB, HMDB, T3DB, UniProt, CTD | Not available |
| ChEMBL | Transcriptomic | Gene expression profiles induced by chemical or drug exposure (*) | TG-GATEs, DrugMatrix, Gene Expression Atlas, GDSC | Confidence score to rank molecule-target interactions |
| DGIdb | Genomic | Druggable genome/genes (*) | MyCancerGenome | Number of distinct sources of evidence and PMIDs supporting each interaction. |
| TTD | Transcriptomic | Tissue-specific gene expression profiles in healthy and diseased individuals (*) | Gene Expression Omnibus, ArrayExpress, literature | Not available |
| DisGeNET | Transcriptomic | Gene expression alteration (*) | Gene Expression Atlas | GDA score that ranks gene disease associations according to their level of evidence. This score compiles efficacy scores on the basis of genomic information and scientific literature. |
| DTC | Genomic | Gene disease associations (*) | DisGeNET | Not available |
| Open Targets | Transcriptomic | Expression profile of diseases (*) | Gene Expression Atlas | Multi-evidence ranking of target disease associations |
| PHAROS | Transcriptomic/Proteomic | Tissue-specific RNA expression (*) | GTEx, Expression Atlas, JensenLab RNA-seq | Not available |
| CTD | Transcriptomic | Gene expression alteration (*) | DrugBank | Not available |
| ADReCS-Target | Genomic/Genetic | Gene disease associations (*) | CTD, GWAS Catalog, DrugBank | Not available |
The (*) indicates that omics-driven information is obtained from an external data source, database or literature.
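The Ranking column lists, for DGIdb, the number of distinct evidence sources and PMIDs supporting each interaction. A minimal sketch of that kind of count-based ranking follows. The record layout, the function name `rank_interactions` and all data are invented for illustration; this is not DGIdb's actual implementation or API.

```python
from collections import defaultdict

# Invented evidence records: (drug, gene, source database, PMID).
records = [
    ("drug_x", "GENE1", "source_a", 111),
    ("drug_x", "GENE1", "source_b", 111),
    ("drug_x", "GENE1", "source_b", 222),
    ("drug_y", "GENE2", "source_a", 333),
    ("drug_z", "GENE1", "source_c", 444),
    ("drug_z", "GENE1", "source_c", 555),
]


def rank_interactions(records):
    """Rank drug-gene interactions by distinct sources, then distinct PMIDs."""
    sources = defaultdict(set)
    pmids = defaultdict(set)
    for drug, gene, source, pmid in records:
        key = (drug, gene)
        sources[key].add(source)  # sets deduplicate repeated sources
        pmids[key].add(pmid)      # and repeated PMIDs
    scored = [(key, len(sources[key]), len(pmids[key])) for key in sources]
    # Descending by source count, then PMID count; the key breaks ties
    # so the ordering is deterministic.
    return sorted(scored, key=lambda r: (-r[1], -r[2], r[0]))


ranking = rank_interactions(records)
for (drug, gene), n_sources, n_pmids in ranking:
    print(f"{drug}-{gene}: {n_sources} sources, {n_pmids} PMIDs")
```

Sorting on source count before PMID count is a design assumption: it favors interactions corroborated by independent databases over those reported many times within a single database.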