ThETA: transcriptome-driven efficacy estimates for gene-based TArget discovery.

Mario Failli, Jussi Paananen, Vittorio Fortino.

Abstract

SUMMARY: Estimating efficacy of gene-target-disease associations is a fundamental step in drug discovery. An important data source for this laborious task is RNA expression, which can provide gene-disease associations on the basis of expression fold change and statistical significance. However, simply using the log-fold change can lead to numerous false-positive associations. On the other hand, more sophisticated methods that utilize gene co-expression networks do not consider tissue specificity. Here, we introduce Transcriptome-driven Efficacy estimates for gene-based TArget discovery (ThETA), an R package that enables non-expert users to use novel efficacy scoring methods for drug-target discovery. In particular, ThETA allows users to search for gene perturbations (therapeutics) that reverse disease-gene expression and for genes that are closely related to disease-genes in tissue-specific networks. ThETA also provides functions to integrate efficacy evaluations obtained with different approaches and to build an overall efficacy score, which can be used to identify and prioritize gene(target)-disease associations. Finally, ThETA implements visualizations to show tissue-specific interconnections between target and disease-genes, and to indicate biological annotations associated with the top selected genes.
AVAILABILITY AND IMPLEMENTATION: ThETA is freely available for academic use at https://github.com/vittoriofortino84/ThETA.
CONTACT: vittorio.fortino@uef.fi.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press.

Year:  2020        PMID: 32437556      PMCID: PMC7390989          DOI: 10.1093/bioinformatics/btaa518

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

In order to minimize the risk of drug development failures, academic and industrial research has focused on target-based drug discovery approaches. This has led to several computational methods that score gene(target)–disease associations based on efficacy estimates calculated from different data sources, ranging from scientific publications to omics databases (Koscielny ; Nguyen ; Piñero ). We have recently proposed two transcriptome-driven approaches to identify and score gene–disease associations (Failli ), namely tissue-specific efficacy (TSE) and modulation (MOD) scores. The first method identifies genes that are closely related to disease-genes (genes with genetic variants that associate with disease risk) in tissue-specific gene co-expression networks. The second method estimates the likelihood that a gene perturbation (e.g. knockout or knockdown) results in a specific reversal of disease gene-expression profiles. As we have previously reported, these methods can considerably increase the true positive rate of known target–disease associations (Failli ). Here, we introduce ThETA, an R package that makes these efficacy scoring methods easy to apply. In particular, ThETA provides functions (i) to tailor the workflow of the proposed scoring methods and (ii) to integrate these novel scores with efficacy estimates available on the Open Targets Platform and generate an overall efficacy score that can be used to prioritize target–disease associations. Moreover, ThETA provides visualization tools to depict tissue-specific network paths linking top targets (or genes) and disease-genes, and to visualize biological annotations associated with sets of selected gene targets. An example workflow that R users can implement with the ThETA package is depicted in Figure 1.
Fig. 1.

An overview of the functions provided by ThETA. (1) ThETA generates target(gene)–disease association scores by using two novel mRNA-based scoring methods. (2) ThETA adds and combines efficacy scores retrieved from alternative drug–target discovery platforms (e.g. the Open Targets Platform). The table aligned with steps 2 and 3 shows the top-ranked targets for Type 2 Diabetes when the harmonic sum is used as the prioritization score. (3) ThETA compiles efficacy estimates for all annotated disease–gene pairs, and (4) it provides an R-shiny application to display selected drug targets in tissue-specific networks. The tissue-specific gene networks include three different types of node: known disease-genes (red stars), novel targets (light blue triangles) and bridge genes (blue circles), which connect putative targets to known disease-genes. (Color version of this figure is available at Bioinformatics online.)


2 Methods and features

This section describes the main features of the R package ThETA.

2.1 Compiling transcriptome-driven efficacy scores

The R package ThETA implements two transcriptome-based efficacy scoring methods, the TSE and MOD scores. By traversing existing tissue/disease-specific networks, the tissue-specific efficacy (TSE) method detects gene targets that are closely related to disease-genes in disease-relevant tissues. The MOD score, in contrast, estimates the likelihood that a gene perturbation (e.g. knockout or knockdown) results in a specific reversal of disease gene-expression profiles. More details on the TSE and MOD scores can be found in our previous study (Failli ). In order to compile the TSE score, the user has to provide the following data inputs: tissue-specific gene-expression data, gene–disease pairs from genome-wide association studies, and a human protein–protein interaction (PPI) network. ThETA provides three pre-computed datasets for this purpose, including data retrieved from GTEx (Ardlie, 2015), DisGeNET (Piñero ) and StringDB (Franceschini ). In addition, ThETA includes two datasets representing pre-computed node centrality scores from the human PPI network and disease–tissue association scores. These five datasets allow users to rapidly compile TSE scores. However, users can still specify different input data and cut-off values, e.g. for the selection of the most significant disease–tissue associations. The MOD score requires lists of up- and down-regulated genes induced by disease and by gene perturbations (e.g. gene knockout, knockdown, etc.). For this task, ThETA includes gene lists retrieved from Enrichr (Kuleshov ). Details of the input data format are included in the Supplementary Material document called ‘Walkthrough ThETA’.
Moreover, known target–disease associations from the DrugBank database (Wishart , 2018), the Therapeutic Target Database (Chen ; Wang ) and the Comparative Toxicogenomics Database (Davis , 2019) are provided in order to allow users to assess the accuracy of the compiled efficacy estimates. These databases include pairwise associations on drugs, molecular targets and diseases.
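
To make the idea of "reversal" behind the MOD score concrete, the following toy sketch counts how many disease-regulated genes are pushed in the opposite direction by a perturbation. It is only an illustration of the input format (up- and down-regulated gene lists) and of the reversal concept. It is not the MOD score published in Failli, and the function name `reversal_fraction` and the gene labels are hypothetical.

```python
def reversal_fraction(disease_up, disease_down, pert_up, pert_down):
    """Toy reversal measure in [-1, 1]; NOT the published MOD score.

    +1 means every disease-regulated gene is moved in the opposite
    direction by the perturbation, -1 means every one is moved in the
    same direction.
    """
    disease_up, disease_down = set(disease_up), set(disease_down)
    pert_up, pert_down = set(pert_up), set(pert_down)

    # Genes whose disease-induced change is countered by the perturbation.
    reversed_genes = (disease_up & pert_down) | (disease_down & pert_up)
    # Genes whose disease-induced change is reinforced by the perturbation.
    mimicked_genes = (disease_up & pert_up) | (disease_down & pert_down)

    n_disease = len(disease_up | disease_down)
    if n_disease == 0:
        return 0.0
    return (len(reversed_genes) - len(mimicked_genes)) / n_disease


# Placeholder gene labels, for illustration only.
score = reversal_fraction(
    disease_up=["G1", "G2", "G3"],
    disease_down=["G4", "G5"],
    pert_up=["G3", "G4"],
    pert_down=["G1", "G2"],
)
```

In this example three disease genes are reversed (G1, G2, G4) and one is reinforced (G3), out of five disease-regulated genes.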

2.2 Uploading and combining external efficacy estimates

Many drug–target discovery platforms, such as Open Targets (Koscielny ) and DisGeNET, provide efficacy scores for target–disease associations. These scores, which are freely available for download from their respective web sites, can be integrated with the efficacy estimates provided by ThETA in order to define more robust efficacy estimates for the prioritization of disease–target associations. Currently, ThETA implements two integration methods: the harmonic sum proposed by the authors of Open Targets (https://docs.targetvalidation.org/getting-started/scoring) and the max function. The max function simply takes the maximum value across the different efficacy estimates, whereas the harmonic sum aggregates the individual efficacy scores after sorting them in descending order, i.e. from higher to lower values.
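
The two integration rules can be sketched in a few lines. The sketch below assumes the harmonic sum in its basic form as described in the Open Targets scoring documentation, where the i-th largest score is divided by i squared before summing. Any capping or normalization applied by Open Targets or by ThETA is not reproduced here, and the function names and score labels are illustrative rather than part of the ThETA API.

```python
def harmonic_sum(scores):
    """Sum of scores sorted in descending order, each divided by rank**2."""
    ordered = sorted(scores, reverse=True)
    return sum(s / (rank ** 2) for rank, s in enumerate(ordered, start=1))


def max_score(scores):
    """Maximum across efficacy estimates (0.0 if no estimate is available)."""
    return max(scores, default=0.0)


# Hypothetical efficacy estimates for a single target-disease pair.
estimates = {"tse": 0.5, "mod": 0.2, "external": 0.9}

hs = harmonic_sum(estimates.values())  # 0.9/1 + 0.5/4 + 0.2/9
mx = max_score(estimates.values())     # 0.9
```

Because lower-ranked scores are down-weighted quadratically, the harmonic sum rewards a pair supported by several sources without letting many weak scores outweigh one strong score, while the max function ignores everything except the strongest source.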

2.3 Compiling tissue-specific networks and biological annotations for selected gene targets

An important novelty presented by ThETA is the use of tissue-specific information for the evaluation of genes as drug targets. Indeed, it is acknowledged that drugs modulating tissue-specific targets are more likely to succeed in phase 3 clinical trials, and that targeting tissue specificity offers opportunities to identify drug targets with improved efficacy and safety (Ryaboshapkina and Hammar, 2019). Therefore, given the importance of tissue specificity of drug targets, ThETA includes an interactive visualization tool, based on R-shiny (Chang ), to display tissue-specific gene networks highlighting genes and pathways that connect putative targets with the genetic loci that underlie disease susceptibility, or simply disease-genes (see Fig. 1). In these graph structures, the selected genes are distinguished from the disease-genes and from the so-called bridge genes, which connect genetic variations associated with diseases to the selected targets. Another important feature of the presented R package is the possibility to compile extensive biological annotations. By using the enrichplot (Yu, 2018) and clusterProfiler (Yu, 2018) R packages, ThETA can compile different biological annotations, including KEGG (Kanehisa ; Kanehisa and Goto, 2000), GO (Ashburner ; The Gene Ontology Consortium, 2019) and REACTOME (Fabregat ; Jassal ), linked to selected targets. In more detail, given a target, it selects all genes on the shortest paths connecting that target to known disease-genes within relevant tissue-specific networks, and compiles the corresponding biological annotations. Moreover, ThETA can be used to further explore the genes that are closely related to selected targets by using Random Walk with Restart (Fang and Gough, 2014).
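
The selection of bridge genes described above can be illustrated with a breadth-first search on an unweighted network. This is a minimal sketch under simplifying assumptions: the network is a plain adjacency dictionary, only one shortest path per disease-gene is followed, and edge weights and tissue specificity are ignored. The function names and the toy network are hypothetical and do not reflect how ThETA stores or traverses its networks.

```python
from collections import deque


def shortest_path(graph, source, goal):
    """Return one shortest path from source to goal as a list, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def bridge_genes(graph, target, disease_genes):
    """Collect intermediate genes on shortest paths from target to disease-genes."""
    bridges = set()
    for disease_gene in disease_genes:
        path = shortest_path(graph, target, disease_gene)
        if path:
            bridges.update(path[1:-1])  # drop the target and the disease-gene
    return bridges - set(disease_genes)


# Toy undirected network: T is a putative target, D1 and D2 are disease-genes.
toy_network = {
    "T": ["A", "B"],
    "A": ["T", "D1"],
    "B": ["T", "C"],
    "C": ["B", "D2"],
    "D1": ["A"],
    "D2": ["C"],
}
bridges = bridge_genes(toy_network, "T", ["D1", "D2"])
```

In the toy network the target reaches D1 through A and D2 through B and C, so A, B and C play the role of the bridge genes. The union of target, bridge genes and disease-genes is the gene set that would then be passed to an enrichment analysis.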

3 Conclusion

ThETA offers a user-friendly toolbox in R for the computation of mRNA-driven efficacy estimates of disease–target associations. It allows the user to customize the selection of disease-relevant tissues and the estimation of TSE and MOD scores. Comprehensive datasets are included to facilitate easy adaptation of the methods. Moreover, different visualization and biological annotation tools are provided to support the biological interpretation of putative drug targets. Finally, the R package ThETA provides tutorial vignettes including extensive examples on how to use its functions.

Financial Support: none declared.

Conflict of Interest: Dr M.F. and Dr J.P. have been working at the University of Eastern Finland for a Business Finland funded project that explores commercialization of drug–target prioritization technologies. Dr J.P. is an employee of Blueprint Genetics Ltd.
References

1.  KEGG: kyoto encyclopedia of genes and genomes.

Authors:  M Kanehisa; S Goto
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  TTD: Therapeutic Target Database.

Authors:  X Chen; Z L Ji; Y Z Chen
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

3.  Open Targets: a platform for therapeutic target identification and validation.

Authors:  Gautier Koscielny; Peter An; Denise Carvalho-Silva; Jennifer A Cham; Luca Fumis; Rippa Gasparyan; Samiul Hasan; Nikiforos Karamanis; Michael Maguire; Eliseo Papa; Andrea Pierleoni; Miguel Pignatelli; Theo Platt; Francis Rowland; Priyanka Wankar; A Patrícia Bento; Tony Burdett; Antonio Fabregat; Simon Forbes; Anna Gaulton; Cristina Yenyxe Gonzalez; Henning Hermjakob; Anne Hersey; Steven Jupe; Şenay Kafkas; Maria Keays; Catherine Leroy; Francisco-Javier Lopez; Maria Paula Magarinos; James Malone; Johanna McEntyre; Alfonso Munoz-Pomer Fuentes; Claire O'Donovan; Irene Papatheodorou; Helen Parkinson; Barbara Palka; Justin Paschall; Robert Petryszak; Naruemon Pratanwanich; Sirarat Sarntivijal; Gary Saunders; Konstantinos Sidiropoulos; Thomas Smith; Zbyslaw Sondka; Oliver Stegle; Y Amy Tang; Edward Turner; Brendan Vaughan; Olga Vrousgou; Xavier Watkins; Maria-Jesus Martin; Philippe Sanseau; Jessica Vamathevan; Ewan Birney; Jeffrey Barrett; Ian Dunham
Journal:  Nucleic Acids Res       Date:  2016-11-29       Impact factor: 16.971

4.  Enrichr: a comprehensive gene set enrichment analysis web server 2016 update.

Authors:  Maxim V Kuleshov; Matthew R Jones; Andrew D Rouillard; Nicolas F Fernandez; Qiaonan Duan; Zichen Wang; Simon Koplev; Sherry L Jenkins; Kathleen M Jagodnik; Alexander Lachmann; Michael G McDermott; Caroline D Monteiro; Gregory W Gundersen; Avi Ma'ayan
Journal:  Nucleic Acids Res       Date:  2016-05-03       Impact factor: 16.971

5.  Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans.

Authors: 
Journal:  Science       Date:  2015-05-07       Impact factor: 47.728

6.  DrugBank: a comprehensive resource for in silico drug discovery and exploration.

Authors:  David S Wishart; Craig Knox; An Chi Guo; Savita Shrivastava; Murtaza Hassanali; Paul Stothard; Zhan Chang; Jennifer Woolsey
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

7.  DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants.

Authors:  Janet Piñero; Àlex Bravo; Núria Queralt-Rosinach; Alba Gutiérrez-Sacristán; Jordi Deu-Pons; Emilio Centeno; Javier García-García; Ferran Sanz; Laura I Furlong
Journal:  Nucleic Acids Res       Date:  2016-10-19       Impact factor: 16.971

8.  Tissue-specific genes as an underutilized resource in drug discovery.

Authors:  Maria Ryaboshapkina; Mårten Hammar
Journal:  Sci Rep       Date:  2019-05-10       Impact factor: 4.379

9.  Prioritizing target-disease associations with novel safety and efficacy scoring methods.

Authors:  Mario Failli; Jussi Paananen; Vittorio Fortino
Journal:  Sci Rep       Date:  2019-07-08       Impact factor: 4.379

10.  The Comparative Toxicogenomics Database: update 2019.

Authors:  Allan Peter Davis; Cynthia J Grondin; Robin J Johnson; Daniela Sciaky; Roy McMorran; Jolene Wiegers; Thomas C Wiegers; Carolyn J Mattingly
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971
