Jin Yao, Mark R Hurle, Matthew R Nelson, Pankaj Agarwal.
Abstract
BACKGROUND: Determining which target to pursue is a challenging and error-prone first step in developing a therapeutic treatment for a disease, where missteps are potentially very costly given the long time frames and high expense of drug development. With current informatics technology and machine learning algorithms, it is now possible to computationally discover therapeutic hypotheses by predicting clinically promising drug targets based on the evidence associating drug targets with disease indications. We have collected this evidence from Open Targets and additional databases that together cover 17 sources of evidence for target-indication association, and represented the data as a tensor of 21,437 × 2211 × 17.
Keywords: Clinical trial outcomes; Drug target discovery; Tensor factorization
Year: 2019 PMID: 30736745 PMCID: PMC6368709 DOI: 10.1186/s12859-019-2664-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Data representation and model benchmark schematic. a Tensor representation of the dataset. The last “slice” matrix represents the clinical outcomes of target-indication pairs. b Illustration of three schemes of benchmarking models on predicting clinical outcomes. Each matrix represents the clinical outcomes of targets (rows) and indications (columns). Grey and green cells are target-indication pairs used for training and testing, respectively. Blank cells represent unknown clinical outcomes of target-indication pairs.
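The tensor layout in panel a can be sketched as a sparse dictionary keyed by (target, indication, source) indices. This is only an illustration under stated assumptions: the toy sizes, the helper names `set_entry` and `outcome_slice`, the example values, and the 1.0/0.0 coding of success and failure are all hypothetical and not taken from the paper.

```python
# Toy sparse tensor: (target index, indication index, source index) -> value.
# The paper's tensor is 21,437 x 2211 x 17; the sizes below are illustrative only.
N_TARGETS, N_INDICATIONS, N_SOURCES = 4, 3, 17
OUTCOME_SLICE = N_SOURCES - 1  # the last slice holds clinical outcomes

tensor = {}


def set_entry(target, indication, source, value):
    """Store one observed cell; keys that are absent mean 'unknown'."""
    in_bounds = (0 <= target < N_TARGETS
                 and 0 <= indication < N_INDICATIONS
                 and 0 <= source < N_SOURCES)
    if not in_bounds:
        raise IndexError("index outside tensor bounds")
    tensor[(target, indication, source)] = float(value)


def outcome_slice(t):
    """Return the clinical-outcome matrix as {(target, indication): value}."""
    return {(i, j): v for (i, j, k), v in t.items() if k == OUTCOME_SLICE}


set_entry(0, 1, 2, 0.8)              # hypothetical evidence score from source 2
set_entry(0, 1, OUTCOME_SLICE, 1.0)  # hypothetical clinical success (assumed coding)
set_entry(3, 2, OUTCOME_SLICE, 0.0)  # hypothetical clinical failure (assumed coding)

outcomes = outcome_slice(tensor)
```

A sparse mapping is used here because most target-indication cells are unobserved, which matches the blank cells described in panel b.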
17 sources of target-indication evidence
| Evidence Type | Sources |
|---|---|
| Animal models | Phenodigm |
| Genetics | European Variant Archive, Uniprot, Uniprot literature, GWAS catalog, STOPGAP [ |
| Somatic mutation | Cancer Gene Census, European Variant Archive somatic |
| Literature | Europe PMC, TERMITE |
| mRNA disease expression | Expression Atlas, Internal expression data |
| mRNA tissue overexpression | GeneLogic, GTEx [ |
| Pathways | Reactome, Metabase |
Evidence data were obtained from Open Targets [1], except for TERMITE (www.scibite.com/products/termite), GeneLogic (GeneLogic Division, Ocimum Biosolutions, Inc.), the internal expression data, and the sources that are explicitly referenced.
Six sources of target-only categorical attributes
| Attribute Type | Sources (# of categories) |
|---|---|
| Mutation Tolerance | ExAc_LoF (3), ExAc_Missense (3), RVIS (3), Mouse Protein identity (2) |
| Other Target Characteristics | Target Location (7), Target Topology (5) |
Genes were broken into non-overlapping categories based on available data. Genes were classified as tolerant, intolerant, or unclassified based on data from the Exome Aggregation Consortium [51] and on the percentile rank of the Residual Variation Intolerance Score [52]. The mouse protein identity categories were based on whether a gene showed ≥ 75% protein homology between human and mouse, using data downloaded from BioMart [53]. Target Location and Topology were derived from a review of information from Gene Ontology, InterPro, PFAM, and UniProt.
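As a rough sketch of how such non-overlapping categories might be encoded for a model, the snippet below bins a percentile rank into three tolerance classes and one-hot encodes the result. The 25 and 75 percentile cutoffs, the function names, and the category order are assumptions for illustration; the source does not state the cutoffs it used. Only the 75% human-mouse threshold comes from the text above.

```python
TOLERANCE_CATEGORIES = ("tolerant", "intolerant", "unclassified")


def tolerance_category(percentile, low_cut=25.0, high_cut=75.0):
    """Bin an RVIS-style percentile rank; the cutoffs are assumed, not from the paper.

    Lower percentiles are treated as more intolerant to variation.
    A missing percentile (None) falls into 'unclassified'.
    """
    if percentile is None:
        return "unclassified"
    if percentile <= low_cut:
        return "intolerant"
    if percentile >= high_cut:
        return "tolerant"
    return "unclassified"


def mouse_identity_category(percent_identity):
    """Two categories split at the 75% human-mouse protein identity threshold."""
    return "identity_ge_75" if percent_identity >= 75.0 else "identity_lt_75"


def one_hot(category, categories):
    """Encode one category as a 0/1 list aligned with `categories`."""
    if category not in categories:
        raise ValueError(f"unknown category: {category}")
    return [1 if c == category else 0 for c in categories]


encoded = one_hot(tolerance_category(10.0), TOLERANCE_CATEGORIES)
```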
Fig. 2 Benchmark performance of models. Comparison of prediction performance in the three benchmark schemes in terms of Area Under the Receiver Operating Characteristic curve (AUROC, top) and Area Under the Precision-Recall Curve (AUPRC, bottom). Error bars are calculated from cross-validation. (LR: Logistic Regression; GBM: Gradient Boosting Machine; RF: Random Forest; MF: Matrix Factorization; BTF: Bayesian Tensor Factorization)
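For readers unfamiliar with the two metrics, here is a minimal standard-library sketch. AUROC is computed as the fraction of positive-negative pairs ranked correctly (ties count one half), and AUPRC is approximated by average precision, which is one common estimator and not necessarily the one used in the paper. The labels and scores are made-up values.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision values at each positive's rank.

    Tied scores keep their input order, so ties are broken arbitrarily.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Hypothetical outcomes (1 = success, 0 = failure) and model scores.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
roc_value = auroc(labels, scores)
pr_value = average_precision(labels, scores)
```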
Fig. 3 Benchmark performance of leave-one-out experiments. Model performance on predicting clinical outcomes of target classes (a) and disease clusters (b) in the leave-one-out experiments in terms of Area Under the Receiver Operating Characteristic curve (AUROC, x-axis) and Area Under the Precision-Recall Curve (AUPRC, y-axis). The 95% confidence interval is calculated using 1000 bootstraps. Dotted lines mark the AUROC (vertical) and AUPRC (horizontal) of a random guess, which are 0.5 and the fraction of positives in the testing set, respectively. The percentage of target-indication pairs in each held-out set is listed after the pipe symbol (|) in the titles. (LR: Logistic Regression; GBM: Gradient Boosting Machine; RF: Random Forest; MF: Matrix Factorization; BTF: Bayesian Tensor Factorization)
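The confidence intervals and random-guess baselines in this caption can be illustrated with a short sketch. It assumes a percentile bootstrap over test-set pairs, which the caption does not specify; the function names, the seed, and the example data are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed variant) for `metric`."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, so the metric is undefined
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[round((alpha / 2) * (n_boot - 1))]
    hi = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


def random_guess_baselines(labels):
    """AUROC of a random guess is 0.5; AUPRC is the fraction of positives."""
    return 0.5, sum(labels) / len(labels)


# Hypothetical held-out outcomes and model scores.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.7, 0.6, 0.2, 0.8, 0.3, 0.1]
ci_low, ci_high = bootstrap_ci(labels, scores, auroc)
auroc_baseline, auprc_baseline = random_guess_baselines(labels)
```

Resamples that contain only one class are redrawn here because AUROC is undefined for them; other conventions exist.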
Fig. 4 Validation of BTF model prediction. a t-SNE visualization of indications based on the latent factors learned in the BTF model. Each dot represents one indication, and the size of the dot is proportional to the number of targets that have failed clinically. The inset pie charts show the disease composition of representative clusters of indications in the 2D visualization. The 2D embedding was obtained using perplexity = 30 in t-SNE, and the visualization is consistent across perplexity values in the range from 10 to 50. b BTF prediction scores of target-indication pairs in Phase I-III clinical trials. The numbers are P-values (Wilcoxon rank sum tests) from comparing prediction scores of target-indication pairs between any two phases.
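The pairwise phase comparison in panel b can be sketched with a two-sided Wilcoxon rank sum test. The version below uses the normal approximation with a tie correction and no continuity correction, which is an assumption: the caption does not say which variant was used, and statistical packages may report slightly different P-values, especially for small samples. The phase scores are invented for illustration.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test via the normal approximation.

    Returns (U statistic for x, z score, two-sided P-value).
    Tied values receive the average of the ranks they span.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x, tie_term = 0.0, 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0           # average of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all values are identical")
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u, z, p_value


# Hypothetical prediction scores for pairs in two trial phases.
phase_1_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
phase_3_scores = [0.6, 0.7, 0.8, 0.9, 1.0]
u_stat, z_score, p_value = rank_sum_test(phase_1_scores, phase_3_scores)
```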
High Scoring Pairs of Interest from TF Model
| Target | High Scoring Indication in Clinical Pipeline (Phase*) | PubmedID | Related Approved Indication (Total Approved Indications) |
|---|---|---|---|
| ABCC8 | Glucose Intolerance (III) | 23903354 | Diabetes Mellitus (1) |
| ADRB1 | Cachexia (II) | 20426789 | Ischemia (13) |
| ADRB2 | Hypoglycemia (I) | 22013013 | Glaucoma (15) |
| ADRB2 | Myocardial Infarction (III) | 26692153 | Heart Failure (15) |
| AGTR1 | Hypercholesterolemia (III) | 12117739 | Hyperlipidemias (7) |
| CYP3A4 | Hepatitis C (II) | 20938912 | HIV Infections (1) |
| IL2 | Behcet Syndrome (II) | 26654556 | Graft Rejection (1) |
| IL6 | Waldenstrom Macroglobulinemia (I) | 26238488 | Giant Lymph Node Hyperplasia (1) |
| IL6 | Arthritis, Psoriatic (II) | 27789987 | Giant Lymph Node Hyperplasia (1) |
| OPRM1 | Schizophrenia (III) | 27397309 | Migraine Disorders (22) |
| RYR1 | Muscular Dystrophy, Duchenne (I) | 26793121 | Malignant Hyperthermia (1) |
| SERPINC1 | Hemophilia (II) | 27099538 | Blood Coagulation Disorders (16) |
| TNFSF11 | Hypercalcemia (II) | 27904108 | Osteoporosis (1) |
| VDR | Alopecia (I) | 27932380 | Keratosis (9) |
| VDR | Cachexia (I) | 22497530 | Chronic Kidney Failure (9) |
New indications of approved targets in clinical trials (Phase* as of May 27, 2016) that have the highest probability of eventual clinical success as measured by the tensor factorization model. The full list is available in the supplement. For illustrative purposes, we list one related indication for which assets against each target have been approved.