DvD: An R/Cytoscape pipeline for drug repurposing using public repositories of gene expression data.

Clare Pacini, Francesco Iorio, Emanuel Gonçalves, Murat Iskar, Thomas Klabunde, Peer Bork, Julio Saez-Rodriguez.

Abstract

SUMMARY: Drug versus Disease (DvD) provides a pipeline, available through R or Cytoscape, for the comparison of drug and disease gene expression profiles from public microarray repositories. Negatively correlated profiles can be used to generate drug-repurposing hypotheses, whereas positively correlated profiles may be used to infer side effects of drugs. DvD allows users to compare drug and disease signatures with dynamic access to the Array Express and Gene Expression Omnibus databases and to data from the Connectivity Map.
AVAILABILITY AND IMPLEMENTATION: R package (submitted to Bioconductor) under GPL 3 and Cytoscape plug-in freely available for download at www.ebi.ac.uk/saezrodriguez/DVD/.

Year: 2012 PMID: 23129297 PMCID: PMC3530913 DOI: 10.1093/bioinformatics/bts656

Journal: Bioinformatics ISSN: 1367-4803

1 INTRODUCTION

Multiple methods based on matching gene expression signatures have been proposed to identify anti-correlated drug and disease profiles (Hu and Agarwal, 2009; Sirota et al.). The central paradigm is that a compound that induces changes in gene expression opposite to those observed in a disease could be used to treat that disease. These methods have produced hypotheses of new uses (repurposing) for existing compounds that have been validated experimentally. This highlights the potential of mining existing, safe compounds for repurposing, which avoids the expensive and extensive initial design and clinical phases of drug discovery. Analysing new and existing data from public repositories such as Array Express (www.ebi.ac.uk/arrayexpress/), Gene Expression Omnibus (GEO) (www.ncbi.nlm.nih.gov/geo/) and the Connectivity Map (CMap) (www.broadinstitute.org/cmap/) with these methods is expected to become increasingly popular in computational drug discovery (Iorio et al.). Motivated by this, we have developed Drug versus Disease (DvD), an R package that ‘matches’ drug and disease profiles using gene set enrichment analysis (Subramanian et al., 2005) and visualizes the results as networks containing clusters of similar drugs and diseases. DvD differs from existing web servers and databases (such as ProfileChaser, MARQ and SPIED; see Supplementary Material) in that it can dynamically access both Array Express and GEO to generate input profiles. With DvD, users can automatically compare input profiles with reference drug data from CMap and with disease profiles curated from GEO. These reference data have associated networks in which, unlike in similar tools, drugs or diseases exerting similar effects on transcription are grouped into clusters. DvD is flexible, offering both customizable and default data options for the input profiles and the reference data.
The Cytoscape (www.cytoscape.org/) plug-in provides a user interface to the full DvD pipeline, as well as a visualization platform for the results. In the final networks, drug and disease nodes are linked to the DrugBank (www.drugbank.ca/) and Medical Subject Headings (MeSH) (www.ncbi.nlm.nih.gov/mesh/) web pages, respectively.

2 ANALYSIS PIPELINE

The DvD pipeline provides a number of processing options to generate genome-wide expression profiles from microarray experiments (see Fig. 1A). Data can be imported from local directories or directly from Array Express or GEO (Davis and Meltzer, 2007; Kauffmann et al., 2009). Data are normalized using either RMA or MAS5 (Irizarry et al.). For the Affymetrix platforms HG-U133A, HG-U133A-2 and HG-U133-Plus2, DvD automatically annotates and filters probes and combines them into HUGO genes using BiomaRt (Durinck et al., 2009). Annotation files can be passed to DvD to process data from other platforms. Probes mapping to multiple gene identifiers are removed. When multiple probes map to the same gene, they can be collapsed into a single gene-level value using the average or maximal intensities, median polish, or the probe with the highest variance across all arrays.
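
To make the probe-collapsing options concrete, here is a minimal Python sketch of three of them, using plain dictionaries instead of Bioconductor objects. The function `collapse_probes` and its `method` values are hypothetical names, not part of the DvD API; "max" is interpreted here as the per-array maximum across a gene's probes, which is an assumption, and median polish is not covered.

```python
from statistics import mean, pvariance


# Hypothetical helper, not part of DvD: illustrates probe-to-gene collapsing.
def collapse_probes(expr, probe_to_genes, method="mean"):
    """Collapse probe-level intensities into one profile per gene.

    expr: dict probe_id -> list of intensities (one value per array)
    probe_to_genes: dict probe_id -> set of gene symbols
    method: "mean" (per-array average), "max" (per-array maximum, an
            assumed reading of "maximal intensities") or "maxvar"
            (the probe with the highest variance across arrays)
    """
    by_gene = {}
    for probe, genes in probe_to_genes.items():
        # Probes annotated to several genes are ambiguous, so drop them.
        if len(genes) != 1 or probe not in expr:
            continue
        gene = next(iter(genes))
        by_gene.setdefault(gene, []).append(expr[probe])

    collapsed = {}
    for gene, rows in by_gene.items():
        if method == "mean":
            collapsed[gene] = [mean(col) for col in zip(*rows)]
        elif method == "max":
            collapsed[gene] = [max(col) for col in zip(*rows)]
        elif method == "maxvar":
            collapsed[gene] = list(max(rows, key=pvariance))
        else:
            raise ValueError(f"unknown method: {method}")
    return collapsed


expr = {"p1": [4.0, 4.5], "p2": [1.0, 7.0], "p3": [2.0, 2.0], "p4": [9.0, 9.0]}
probe_to_genes = {"p1": {"TP53"}, "p2": {"TP53"}, "p3": {"EGFR"}, "p4": {"GENE_A", "GENE_B"}}
print(collapse_probes(expr, probe_to_genes, "maxvar"))  # {'TP53': [1.0, 7.0], 'EGFR': [2.0, 2.0]}
```

Probe `p4` maps to two genes and is therefore discarded before any collapsing takes place, mirroring the filtering step described above.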

Fig. 1.

DvD pipeline. (A) GenerateProfiles and ClassifyProfile are wrapper functions whose stages are shown in the vertical flow charts. GenerateProfiles imports the data and normalizes CEL files where necessary. The "Probes to Genes" stage maps Affymetrix probes to HUGO gene symbols using BiomaRt. Finally, differential expression statistics are calculated using limma. SelectRankedLists can be used to select a subset of the contrasts output by GenerateProfiles. ClassifyProfile can take input from GenerateProfiles, from SelectRankedLists or from the user's own preprocessed data. This function calculates enrichment scores, identifies the significant ones and produces the corresponding network files. (B) Example visualization produced by the Cytoscape plug-in for the prostate cancer profile (GSE17906). Red edges denote inverse correlations and green edges positive correlations.

DvD expects as input either a drug or a disease profile. Using this, together with the experimental design factors recorded in the Array Express and GEO databases, DvD identifies a main factor for the experiment. In this way, unlike existing methods, DvD can calculate differential expression (Smyth, 2004) between levels of the main factor, stratified by a second factor (see Supplementary Material). Given a ranked list of differentially expressed genes, either generated by DvD or supplied as preprocessed data, DvD defines gene sets. Their size can be a fixed number chosen a priori or determined by the number of significantly differentially expressed genes. Enrichment scores are then calculated by querying the reference dataset with these gene sets, using either the Kolmogorov–Smirnov-based statistic (Iorio et al.; Subramanian et al., 2005) or the weighted signed statistic (Zhang and Gant, 2008). The significance of each enrichment score is determined by comparison with an empirical null distribution, and scores can be corrected for multiple hypothesis testing using either the Benjamini–Hochberg procedure or the q-value method. Finally, profiles producing significant scores are assigned to clusters using single or average linkage. In the latter case, the average score for a cluster is defined as either the mean or the median distance to each profile in the cluster.
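
The Kolmogorov–Smirnov-based scoring step can be sketched as follows. This assumes the unweighted formulation used by the original Connectivity Map, in which the query's top and bottom genes form "up" and "down" sets that are scored separately against a reference ranked list and then combined; DvD's exact statistic and normalization may differ, and `ks_score` and `connection_score` are hypothetical names.

```python
# Hypothetical helpers, not part of DvD: a CMap-style KS enrichment score.
def ks_score(gene_set, ranked_genes):
    """Unweighted Kolmogorov-Smirnov enrichment score.

    ranked_genes: reference profile ordered from most up- to most
    down-regulated. Returns a value in [-1, 1]: positive when the set
    sits near the top of the list, negative when it sits near the bottom.
    """
    n = len(ranked_genes)
    position = {g: i + 1 for i, g in enumerate(ranked_genes)}  # 1-based ranks
    v = sorted(position[g] for g in gene_set if g in position)
    t = len(v)
    if t == 0:
        return 0.0
    # Largest deviations above and below the uniform running sum.
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connection_score(query_ranked, reference_ranked, set_size):
    """Score a query profile against one reference profile.

    The query's top and bottom `set_size` genes form the up and down sets.
    A negative result means the reference reverses the query signature.
    """
    es_up = ks_score(query_ranked[:set_size], reference_ranked)
    es_down = ks_score(query_ranked[-set_size:], reference_ranked)
    # Up and down sets moving the same way carry no consistent signal.
    if (es_up > 0) == (es_down > 0):
        return 0.0
    return es_up - es_down


reference = [f"g{i}" for i in range(10)]
print(connection_score(reference, reference, 2))        # approximately 1.7
print(connection_score(reference[::-1], reference, 2))  # approximately -1.7
```

In a repurposing query, a disease profile would be the query and each drug profile a reference; strongly negative scores then flag candidate treatments. `set_size` plays the role of the gene set size, fixed a priori or set from the number of significant genes.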

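For the weighted signed statistic and the significance steps, the sketch below assumes the formulation described by Zhang and Gant (2008): reference genes receive signed ranks ordered by absolute fold change, and a signed query signature is scored against them and normalized by the maximum attainable value. The empirical null is built from random signed signatures of the same size, and `bh_adjust` applies Benjamini–Hochberg correction, which gives Storey-style q-values when scaled by an estimated `pi0` (here with a single, assumed lambda of 0.5). All function names are hypothetical, and this is not the DvD implementation.

```python
import random


# Hypothetical helpers, not part of DvD.
def signed_ranks(reference):
    """Signed ranks for a reference profile (gene -> log fold change).

    Genes are ranked by absolute change, so the largest change gets rank M
    (M = number of genes); the sign follows the direction of change.
    """
    ordered = sorted(reference, key=lambda g: abs(reference[g]))
    return {g: (i + 1) * (1 if reference[g] >= 0 else -1) for i, g in enumerate(ordered)}


def weighted_signed_score(query, ranks):
    """Connection score in [-1, 1] for a query signature (gene -> +1 / -1)."""
    m, big_m = len(query), len(ranks)
    strength = sum(ranks.get(g, 0) * s for g, s in query.items())
    max_strength = sum(big_m - i for i in range(m))  # M + (M-1) + ... + (M-m+1)
    return strength / max_strength


def empirical_p(query, ranks, n_perm=1000, seed=0):
    """Two-sided p-value against random signed signatures of the same size."""
    rng = random.Random(seed)
    genes = list(ranks)
    observed = abs(weighted_signed_score(query, ranks))
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(query))
        random_query = {g: rng.choice((1, -1)) for g in sample}
        if abs(weighted_signed_score(random_query, ranks)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def storey_pi0(pvals, lam=0.5):
    """Estimated fraction of true null hypotheses (single lambda)."""
    return min(1.0, sum(p > lam for p in pvals) / (len(pvals) * (1 - lam)))


def bh_adjust(pvals, pi0=1.0):
    """Benjamini-Hochberg adjusted p-values; pi0 < 1 gives Storey-style q-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pi0 * pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


ranks = signed_ranks({"a": 3.0, "b": -2.0, "c": 1.0, "d": -0.5})
print(weighted_signed_score({"a": 1, "b": -1}, ranks))  # 1.0
print(weighted_signed_score({"a": -1, "b": 1}, ranks))  # -1.0
```

The `+1` in the numerator and denominator of `empirical_p` keeps the p-value above zero when no random signature reaches the observed score, which matters once the values are passed to a multiple-testing correction.
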
2.1 DvD data packages

Two associated data packages, cMap2data and DrugVsDiseasedata, provide default reference ranked expression profiles and clusters. cMap2data is based on the CMap version 2 dataset, which contains 6100 hybridizations of 1309 different compounds. The merged profiles obtained by Iorio et al. were used to generate a single gene-level ranked profile for each of the 1309 compounds. Disease profiles, with associated clusters, are defined for 45 diseases based on data from GEO. Analysis of the disease and compound clusters produced biologically plausible groupings. For example, one drug cluster was significantly enriched for histone deacetylase inhibitors, and one disease cluster contained multiple profiles of different cancers (see Supplementary Material).
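
A single gene-level ranked profile per compound can be obtained by merging the replicate ranked lists for that compound. The sketch below uses a plain Borda count (average rank) as a stand-in; it is an illustrative simplification, not necessarily the merging procedure of Iorio et al., and `borda_merge` is a hypothetical name.

```python
from statistics import mean


# Hypothetical helper, not part of DvD: simple Borda (average-rank) merging.
def borda_merge(ranked_lists):
    """Merge several ranked gene lists (most up-regulated first) into one.

    Each gene's score is its average position across the lists; only genes
    present in every list are kept. Ties are broken alphabetically so the
    result is deterministic.
    """
    common = set(ranked_lists[0]).intersection(*ranked_lists[1:])
    positions = [{g: i for i, g in enumerate(lst)} for lst in ranked_lists]
    avg = {g: mean(pos[g] for pos in positions) for g in common}
    return sorted(avg, key=lambda g: (avg[g], g))


replicates = [["A", "B", "C", "D"], ["B", "A", "C", "D"], ["A", "C", "B", "D"]]
print(borda_merge(replicates))  # ['A', 'B', 'C', 'D']
```

Averaging positions smooths out replicate-to-replicate noise: gene B is second in only one replicate, but its average position still places it between A and C.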

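The reference clusters come into play when profiles with significant scores are assigned to clusters using single or average linkage (Section 2). A minimal sketch of that assignment follows. The text does not say how an enrichment score is converted into a distance, so the functions simply take hypothetical query-to-profile distances as input; `cluster_distance` and `assign_cluster` are illustrative names.

```python
from statistics import mean, median


# Hypothetical helpers, not part of DvD: query-to-cluster linkage.
def cluster_distance(distances, members, linkage="average", summary="mean"):
    """Distance between a query and one cluster of reference profiles.

    distances: dict profile -> distance from the query (assumed input)
    linkage "single": distance to the closest member;
    linkage "average": mean or median distance over all members.
    """
    d = [distances[p] for p in members if p in distances]
    if not d:
        return float("inf")
    if linkage == "single":
        return min(d)
    if linkage == "average":
        return mean(d) if summary == "mean" else median(d)
    raise ValueError(f"unknown linkage: {linkage}")


def assign_cluster(distances, clusters, linkage="average", summary="mean"):
    """Return the name of the cluster closest to the query."""
    return min(clusters, key=lambda c: cluster_distance(distances, clusters[c], linkage, summary))


distances = {"d1": 0.1, "d2": 0.9, "d3": 0.4, "d4": 0.5}
clusters = {"A": ["d1", "d2"], "B": ["d3", "d4"]}
print(assign_cluster(distances, clusters, "single"))   # A
print(assign_cluster(distances, clusters, "average"))  # B
```

The two linkages can disagree: single linkage follows the one very close profile in cluster A, while average linkage favours cluster B, whose members are consistently moderately close.
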
2.2 Cytoscape plug-in

The Cytoscape plug-in uses the Rserve framework to communicate with R. It provides a graphical interface to the main R wrapper functions of the DvD pipeline, as well as information on the drug and disease profiles contained in the DvD data packages (Fig. 1B). This information is obtained from DrugBank for drugs and from MeSH for diseases. Furthermore, the plug-in links DvD to other Cytoscape plug-ins for further analysis, such as mapping the differential expression profiles associated with drug candidates identified by DvD onto signalling networks (see Supplementary Material).
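
Outside the plug-in, significant connections can also be brought into Cytoscape as a plain edge table, which Cytoscape can import as a network from a delimited file. The sketch below is hypothetical: the column names, the `edge_table` function and the default 0.05 q-value cut-off (taken from the Results) are assumptions, and it does not reproduce the network files that DvD itself writes.

```python
import csv
import io


# Hypothetical helper, not part of DvD: export results as an edge table.
def edge_table(query, results, q_cutoff=0.05):
    """Tab-delimited edge table suitable for importing into Cytoscape.

    results: iterable of (reference_profile, score, q_value).
    Negative scores (inverse correlation) are coloured red and positive
    scores green, mirroring the colour scheme of Fig. 1B.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "interaction", "target", "score", "q_value", "color"])
    for ref, score, q in results:
        if q >= q_cutoff:
            continue  # keep only significant connections
        inverse = score < 0
        writer.writerow([query, "inverse" if inverse else "positive", ref,
                         score, q, "red" if inverse else "green"])
    return buf.getvalue()


print(edge_table("disease_X", [("drug_A", -0.8, 0.01), ("drug_B", 0.6, 0.02), ("drug_C", -0.9, 0.2)]))
```

Keeping the colour as an explicit column lets it be mapped to edge colour with a visual style after import, rather than relying on the interaction label alone.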

3 RESULTS

To illustrate the use of DvD in drug repurposing, we analysed several disease datasets available from GEO. We compared them with the 1309 compounds in CMap using DvD and considered connections with a q-value < 0.05 to be significant. A prostate cancer profile (GSE17906) had seven significant matches, five of which were negative (Fig. 1B). The strongest negative correlation was with estradiol, a known treatment for prostate cancer. For a breast cancer profile (GSE5847), tamoxifen, a similarly well-known treatment for breast cancer, had the third strongest inverse correlation. The highest-scoring compound was ranitidine, a histamine H2 receptor antagonist that has previously been investigated as a potential treatment for breast cancer (Bolton et al., 2000). Finally, we analysed a type II diabetes profile (GSE15653) and found phenformin and torasemide at the seventh and ninth highest therapeutic scores, respectively. Interestingly, finasteride scored higher than both of these known treatments for type II diabetes, with the third strongest negative correlation. Finasteride inhibits the type 2 5α-reductase enzyme, which converts testosterone to dihydrotestosterone, and testosterone is known to be important in glucose homeostasis and lipid metabolism (Saad, 2009). These results were not obtained when the three profiles were analysed using CMap (Supplementary Material), showing the value of combining replicate experiments to generate profiles for comparison.

REFERENCES

1. Linear models and empirical bayes methods for assessing differential expression in microarray experiments.

Authors: Gordon K Smyth
Journal: Stat Appl Genet Mol Biol Date: 2004-02-12

2. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor.

Authors: Sean Davis; Paul S Meltzer
Journal: Bioinformatics Date: 2007-05-12

3. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors: Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal: Proc Natl Acad Sci U S A Date: 2005-09-30

4. H2-antagonists in the treatment of colon and breast cancer.

Authors: E Bolton; J King; D L Morris
Journal: Semin Cancer Biol Date: 2000-02

5. The role of testosterone in type 2 diabetes and metabolic syndrome in men.

Authors: Farid Saad
Journal: Arq Bras Endocrinol Metabol Date: 2009-11

6. Importing ArrayExpress datasets into R/Bioconductor.

Authors: Audrey Kauffmann; Tim F Rayner; Helen Parkinson; Misha Kapushesky; Margus Lukk; Alvis Brazma; Wolfgang Huber
Journal: Bioinformatics Date: 2009-06-08

7. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt.

Authors: Steffen Durinck; Paul T Spellman; Ewan Birney; Wolfgang Huber
Journal: Nat Protoc Date: 2009-07-23

8. Human disease-drug network based on genomic expression profiles.

Authors: Guanghui Hu; Pankaj Agarwal
Journal: PLoS One Date: 2009-08-06

9. Transcriptional data: a new gateway to drug repositioning?

Authors: Francesco Iorio; Timothy Rittman; Hong Ge; Michael Menden; Julio Saez-Rodriguez
Journal: Drug Discov Today Date: 2012-08-07

10. A simple and robust method for connecting small-molecule drugs using gene-expression signatures.

Authors: Shu-Dong Zhang; Timothy W Gant
Journal: BMC Bioinformatics Date: 2008-06-02
