
A review of validation strategies for computational drug repositioning.

Abstract

Repositioning of previously approved drugs is a promising methodology because it reduces the cost and duration of the drug development pipeline and reduces the likelihood of unforeseen adverse events. Computational repositioning is especially appealing because of the ability to rapidly screen candidates in silico and to reduce the number of possible repositioning candidates. What is unclear, however, is how useful such methods are in producing clinically efficacious repositioning hypotheses. Furthermore, there is no agreement in the field over the proper way to perform validation of in silico predictions, and in fact no systematic review of repositioning validation methodologies. To address this unmet need, we review the computational repositioning literature and capture studies in which authors claimed to have validated their work. Our analysis reveals widespread variation in the types of strategies, predictions made and databases used as 'gold standards'. We highlight a key weakness of the most commonly used strategy and propose a path forward for the consistent analytic validation of repositioning techniques.


Keywords: analytic validation; drug repositioning; research reproducibility


Year: 2018 PMID: 27881429 PMCID: PMC5862266 DOI: 10.1093/bib/bbw110

Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622

Introduction

In recent years, the drug-repositioning field has gained substantial traction among academics and pharmaceutical companies alike, both because it avoids preclinical development and optimization and because of the substantially reduced risk of unforeseen adverse events [1]. A search for ‘drug repositioning’ in PubMed reveals that the number of publications has grown rapidly, from only 11 articles in 2007 to 274 in 2015. Many of the publications in the drug repositioning space describe computational methods; despite advances in high-content screening and robotics, high-throughput in vitro screens are still costly, leading many groups to turn to computational repositioning strategies [2]. Computational repositioning methods have proliferated substantially, using a variety of molecular [3-6], literature-derived [7-10] and clinical [11, 12] data as their core drivers of repositioning hypothesis generation.

All computational repositioning methods promise to prioritize repositioning candidates, and studies describing these methods typically claim superiority over competing methodologies. To do so, such studies perform analytic validation, whereby they compare the computational results of their methods (and competing methods) to existing biomedical knowledge. A successful method is one that consistently identifies known associations between drugs and diseases (and, for some studies, fails to identify ‘wrong’ associations). When examining the repositioning literature, however, it is apparent that there are no consistent best practices for comparing studies and for validating methods.

In this article, we examine the current trends in validation among studies in the computational repositioning field. We identify three major types of validation, involving the use of case studies, overlap of predictions with known drug indications and sensitivity- or specificity-based methods. All of these methods have drawbacks, trading off between a lack of analytic rigor and the unsatisfying assumption that all repositioning candidates are, a priori, false positives. We propose that the best-case scenario with currently available data is to use ‘overlap-type’ validation and describe a promising next step for the field.

Analytic validation in the computational repositioning literature

To gain a better understanding of validation in the computational repositioning field, we ran a Boolean search of PubMed for computational drug repositioning articles that claimed to have validated their methodology or pipeline [‘(drug repositioning OR drug repurposing) AND (gold standard OR AUC OR receiver operating characteristic OR validation OR validated OR validate)’, performed on 14 June 2016, Figure 1A].
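As a rough illustration of how this search could be issued programmatically, the sketch below builds a request URL for the NCBI E-utilities ESearch service. The choice of E-utilities, the `build_search_url` helper, the entry-date filter and the `retmax` value are all assumptions made for illustration; the text only states that PubMed was searched, and a rerun today may not reproduce the original result set.

```python
from urllib.parse import urlencode

# Boolean query quoted from the text (search performed on 14 June 2016).
QUERY = (
    "(drug repositioning OR drug repurposing) AND "
    "(gold standard OR AUC OR receiver operating characteristic "
    "OR validation OR validated OR validate)"
)

# NCBI E-utilities ESearch endpoint. Using it here is an assumption;
# the article does not say how the PubMed search was executed.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(query: str, max_date: str = "2016/06/14") -> str:
    """Return an ESearch URL for `query`, limited to records entered by `max_date`."""
    params = {
        "db": "pubmed",
        "term": query,
        "retmode": "json",
        "retmax": 500,            # hypothetical cap on returned PMIDs
        "datetype": "edat",       # filter on Entrez entry date (our choice)
        "mindate": "1900/01/01",  # ESearch expects mindate and maxdate together
        "maxdate": max_date,
    }
    return f"{ESEARCH}?{urlencode(params)}"


url = build_search_url(QUERY)
print(url)
```

The URL is only constructed, not fetched, so the sketch runs without network access; fetching it would return a JSON list of PMIDs that still has to be reviewed manually.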

Figure 1. Computational repositioning validation studies. (A) The search term used with PubMed to retrieve articles. (B) Sources of drug-indication annotation data used in studies retrieved in the literature search. (C) Types of validation in studies retrieved in the literature search. See main text for details.

Using this search, we began our analysis with a pool of 213 articles. To further refine our search, we manually reviewed each of the articles and excluded non-computational papers (e.g. high-throughput drug screens in cell lines or clinical trials), those not in the small molecule/drug field (e.g. articles referring to surgical or dental repositioning) and non-research articles (reviews and book chapters). From the remaining 141 articles, we focused on those that predicted novel indications for drugs. At this point, we excluded 35 articles that focused on target prediction only; in silico target prediction studies aim to predict novel molecular targets for existing and novel drug candidates. We argue that target prediction is still one step removed from drug repositioning; true or predicted molecular targets can be used as part of repositioning methodologies, but do not themselves provide the full repositioning hypothesis from drug to indication. Furthermore, we note that benchmarking such studies is already possible with the wealth of high-throughput drug–target binding screens [13].

From the studies that predicted new indications for existing drugs, we excluded 67 articles that predicted indications for a single drug or disease, and kept those that made predictions for more than one drug and disease. We excluded these single-drug and single-disease studies because they were not designed to be applied broadly, and often contained domain-specific knowledge about a particular drug or disease (e.g. genome-wide association study results for a single disease or structure–activity relationship studies for a single drug).
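The screening steps described above can be tallied as a quick consistency check. The counts are quoted from the text; the variable names are our own shorthand, not terms used by the authors.

```python
# Counts quoted from the text; names are illustrative shorthand.
initial_pool = 213              # articles returned by the PubMed search
after_manual_review = 141       # computational, small-molecule, research articles
target_prediction_only = 35     # excluded: target prediction without indications
single_drug_or_disease = 67     # excluded: predictions for one drug or one disease

excluded_in_manual_review = initial_pool - after_manual_review
final_set = after_manual_review - target_prediction_only - single_drug_or_disease

print(excluded_in_manual_review)  # 72
print(final_set)                  # 39
```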
This resulted in 39 computational repositioning methods articles with predictions spanning multiple drugs and indications and a clear claim of analytic validation (the full list of captured articles is provided in Supplementary Table S1).

We began our analysis by examining the types of databases used for analytic validation: we discovered that, although many of the investigators in the 39 studies we examined claim to use a ‘gold standard’, there is substantial heterogeneity in the source of these standards as well as the types of data they contain (Figure 1B). For example, DrugBank [14] contains information about only the FDA-approved indications for drugs, while the Comparative Toxicogenomics Database (CTD) [15] contains literature-annotated links between drugs and both approved and investigational indications. While DrugBank contains a set of true drug-indication annotations, it misses off-label uses and late-stage clinical trials; the CTD, on the other hand, relies on literature annotations and contains drug–indication pairs that have subsequently failed to receive FDA approval. This inconsistency in specificity among databases used for validation is detrimental to reproducibility and may lead to claims of extremely high accuracy.

We next examined the types of analytic validation methodologies used by investigators in computational repositioning. We grouped the 39 captured studies into three classes: (1) validation with a single example or case study of a single disease area (CSV), (2) sensitivity-based validation only (SV) and (3) both sensitivity- and specificity-based validation (SSV) (Figure 1C). First, of the three validation types, CSV is the least rigorous; each of the four CSV studies reported one to three clinically justifiable predictions (Supplementary Table S1).
For example, Sirota and colleagues [16] identified cimetidine as a potential therapy for lung adenocarcinoma, which was picked from 2664 significant predictions (of >16 000 drug–indication pairs tested) on the basis of tolerability. The investigators provided evidence of its success using both the literature and an in vitro study. The inclusion of in vitro evidence lends additional biological credence to their single case study, but analytic evidence of their method’s overall success is lacking. We note here that we are not arguing that in vitro evidence is inferior to analytic validation; biological validation is a requirement for any individual candidate to be advanced in a drug development pipeline. However, successful biological validation of a single repositioning candidate cannot be extrapolated to all predictions made by a method.

Following CSV, SV provides more analytic rigor by measuring the overlap between currently approved or investigational indications for drugs and the indications predicted by a given repositioning method. In contrast to CSV, SV methodologies assess the general ability of repositioning methods to make reasonable claims, rather than selecting one or several high-ranking predictions to test in depth. For example, Jung and Lee [17] examined the overlap between predictions made by their method and both approved drug indications (from a combination of DrugBank [14], PharmGKB [18] and TTD [19]) and investigational indications from ongoing clinical trials (from ClinicalTrials.gov). SV is appealing because investigators only need a set of true positives against which to compare their predictions (e.g. all approved or investigational drug indications). A key drawback of SV is the inability to use traditional two-class machine learning (ML) approaches, which require labeled negative examples. An alternative is to train one-class classification algorithms on positive examples only; however, to our knowledge, no methods in the drug repositioning space have used one-class ML approaches. We emphasize, as in any ML exercise, that investigators should perform cross-validation, in which algorithms are tuned on one portion of the data and tested on another; testing on an as yet unseen portion of the data is more representative of future performance than training and testing on the full data set [20].

Both CSV and SV methods are less popular than SSV. SSV is, in theory, the most rigorous type of validation. For our purposes, SSV-based methods include those that directly report sensitivity and specificity (or positive or negative predictive value), as well as those that report the area under the receiver operating characteristic (AUROC, a commonly used measure of a method's predictive performance, reviewed in [21]). For example, Gottlieb and colleagues [4] used a list of approved drug indications (from DailyMed), and determined how many of their predicted drug indications overlapped with that set (true positives) or did not overlap (false positives); their results are summarized by calculating the AUROC of their predictions. In contrast to sensitivity-only validation methods, methods that rely on both sensitivity and specificity require information about which predicted drug indications are false (false positives). In all of the SSV studies we reviewed, the investigators chose to mark all unannotated drug–indication pairs as false positives. This is troubling for two major reasons. First, the choice of annotation database can substantially impact the sensitivity and specificity estimates.
If investigators consistently used a single database of standardized indication information, this issue could be avoided; in practice, however, annotations are derived from a variety of drug information databases and annotation types, ranging from FDA approval to ongoing clinical trials (Figure 1B). Second, marking unannotated pairs as false implies that all novel repositioning hypotheses are false positives. This is counterintuitive, as computational repositioning methods are meant to predict novel indications, for which there is no currently annotated association. In addition, this strategy creates a substantial imbalance in the number of true and false positives; such an imbalance has been shown to reduce the accuracy of AUROC and other SSV estimates [22].

It is our opinion that, with currently available data, the best strategy for analytic validation in repositioning studies is SV. Under ideal conditions, in which a database of true positives and true negatives exists, SSV is the optimal choice; currently, however, the field lacks such a database. In contrast, SV does not require true negatives and therefore may be the most practical solution until such a database emerges. We note that there are still two central caveats with using SV for analytic validation: (1) investigators should carefully choose the database against which they compare their results, potentially corroborating drug–indication pairs between multiple sources and (2) investigators using ML-based methods should test the performance of their methods with cross-validation to prevent over-fitting and limit the reporting of unrealistic predictive power.

The question then becomes the following: where can we go from here for analytic validation? Is there an internally consistent database that could be used with SSV? The way forward for SSV is to develop a set of true negatives; such a set would include drug–indication pairs that were tried in a clinical setting and were shown not to be efficacious or safe. An easily accessible database of this information does not, to our knowledge, currently exist, and creating one would require substantial biomedical, regulatory and legal understanding and resources. Despite these challenges, creating a true ‘gold standard’ that contains both repositioning successes and failures is one way to improve consistency in the field and to allow equitable comparisons between methods. We believe that such a ‘gold standard’ database can improve the accuracy of drug repositioning methods and increase the probability of success in clinical trials.
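To make the contrast between SV and SSV concrete, the following minimal sketch scores one set of predictions both ways. The drug and disease names are hypothetical toy data, not drawn from any database, and the two helper functions are our own illustration rather than code from any of the reviewed studies. It shows how treating every unannotated pair as a negative penalizes a novel prediction, and how the estimates shift once that pair is annotated.

```python
from fractions import Fraction
from itertools import product

Pair = tuple[str, str]  # (drug, indication); needs Python 3.9+


def sensitivity(predicted: set[Pair], known: set[Pair]) -> Fraction:
    """SV: fraction of annotated pairs that the method recovers."""
    return Fraction(len(predicted & known), len(known))


def specificity_unannotated_as_negative(
    predicted: set[Pair], known: set[Pair], universe: set[Pair]
) -> Fraction:
    """SSV under the assumption that every unannotated pair is a negative."""
    negatives = universe - known
    false_positives = predicted & negatives
    return Fraction(len(negatives) - len(false_positives), len(negatives))


# Hypothetical toy data: 3 drugs x 3 diseases = 9 candidate pairs.
drugs = ["drug_a", "drug_b", "drug_c"]
diseases = ["disease_x", "disease_y", "disease_z"]
universe = set(product(drugs, diseases))

known = {("drug_a", "disease_x"), ("drug_b", "disease_y"), ("drug_c", "disease_z")}
predicted = {("drug_a", "disease_x"), ("drug_b", "disease_y"), ("drug_a", "disease_y")}

print(sensitivity(predicted, known))                                      # 2/3
print(specificity_unannotated_as_negative(predicted, known, universe))    # 5/6

# If the novel pair is later annotated, the same predictions score differently.
updated = known | {("drug_a", "disease_y")}
print(sensitivity(predicted, updated))                                    # 3/4
print(specificity_unannotated_as_negative(predicted, updated, universe))  # 1
```

In this toy setting the novel pair counts as a false positive only because it is missing from the annotation set, which is the assumption questioned above; a curated set of true negatives would replace `universe - known` in the specificity calculation.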

Conclusions

We present here a brief review of the computational drug repositioning field, with a focus on strategies for analytically validating such methods. We describe the three types of validation currently in use, and highlight the issues with both consistency and key assumptions made by each. In closing, we propose a strategy for improving the quality of validation in computational repositioning.

There are currently three predominant validation methods used for computational repositioning studies: (1) case studies, (2) overlap of predictions with known drug indications and (3) sensitivity- or specificity-based methods. There is wide variation in the types and sources of annotation data used for performing validation, leading to a lack of consistency in the field. Despite being rigorous, sensitivity- or specificity-based methods require the use of true negatives, and current studies assume that all unannotated drug–indication pairs are false positives. While a sensitivity- and specificity-based method is optimal, we posit that the current best strategy is overlap (sensitivity only) because, despite a lower level of rigor, it does not require contradictory assumptions. We propose a new direction in repositioning validation through the creation of a repositioning database to promote reproducible calculations of sensitivity and specificity.

Supplementary data

Supplementary data are available online at http://bib.oxfordjournals.org/.

Funding

This work was supported by the National Institutes of Health (NIH) Training grant from the National Human Genome Research Institute (NHGRI; grant number T32HG002295-12 to A.S.B.); the National Institute of Environmental Health Sciences (NIEHS; grant numbers R00 ES023504 and R21 ES025052 to C.J.P.); a gift from Agilent Technologies and a PhRMA fellowship (to C.J.P.).

20 in total

1. Inferring disease association using clinical factors in a combinatorial manner and their use in drug repositioning.

Authors: Jinmyung Jung; Doheon Lee
Journal: Bioinformatics Date: 2013-06-06 Impact factor: 6.937

2. Quantitative biomedical annotation using medical subject heading over-representation profiles (MeSHOPs).

Authors: Warren A Cheung; B F Francis Ouellette; Wyeth W Wasserman
Journal: BMC Bioinformatics Date: 2012-09-27 Impact factor: 3.169

3. DrugBank: a comprehensive resource for in silico drug discovery and exploration.

Authors: David S Wishart; Craig Knox; An Chi Guo; Savita Shrivastava; Murtaza Hassanali; Paul Stothard; Zhan Chang; Jennifer Woolsey
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

4. DMAP: a connectivity map database to enable identification of novel drug repositioning candidates.

Authors: Hui Huang; Thanh Nguyen; Sara Ibrahim; Sandeep Shantharam; Zongliang Yue; Jake Y Chen
Journal: BMC Bioinformatics Date: 2015-09-25 Impact factor: 3.169

5. Medication-wide association studies.

Authors: P B Ryan; D Madigan; P E Stang; M J Schuemie; G Hripcsak
Journal: CPT Pharmacometrics Syst Pharmacol Date: 2013-09-18

6. The Comparative Toxicogenomics Database's 10th year anniversary: update 2015.

Authors: Allan Peter Davis; Cynthia J Grondin; Kelley Lennon-Hopkins; Cynthia Saraceni-Richards; Daniela Sciaky; Benjamin L King; Thomas C Wiegers; Carolyn J Mattingly
Journal: Nucleic Acids Res Date: 2014-10-17 Impact factor: 16.971

7. Inferring novel disease indications for known drugs by semantically linking drug action and disease mechanism relationships.

Authors: Xiaoyan A Qu; Ranga C Gudivada; Anil G Jegga; Eric K Neumann; Bruce J Aronow
Journal: BMC Bioinformatics Date: 2009-05-06 Impact factor: 3.169

8. Toward more realistic drug-target interaction predictions.

Authors: Tapio Pahikkala; Antti Airola; Sami Pietilä; Sushil Shakyawar; Agnieszka Szwajda; Jing Tang; Tero Aittokallio
Journal: Brief Bioinform Date: 2014-04-09 Impact factor: 11.622

9. Concept Modeling-based Drug Repositioning.

Authors: Jagadeesh Patchala; Anil G Jegga
Journal: AMIA Jt Summits Transl Sci Proc Date: 2015-03-23

10. A Drug-Centric View of Drug Development: How Drugs Spread from Disease to Disease.

Authors: Raul Rodriguez-Esteban
Journal: PLoS Comput Biol Date: 2016-04-28 Impact factor: 4.475
