
GDSCTools for mining pharmacogenomic interactions in cancer.

Thomas Cokelaer¹, Elisabeth Chen², Francesco Iorio³, Michael P Menden⁴, Howard Lightfoot², Julio Saez-Rodriguez³,⁵, Mathew J Garnett².

Abstract

Motivation: Large pharmacogenomic screenings integrate heterogeneous cancer genomic datasets as well as anti-cancer drug responses on a thousand human cancer cell lines. Mining these data to identify new therapies for cancer sub-populations would benefit from common data structures, modular computational biology tools and user-friendly interfaces.
Results: We have developed GDSCTools, software aimed at the identification of clinically relevant genomic markers of drug response. The Genomics of Drug Sensitivity in Cancer (GDSC) database (www.cancerRxgene.org) integrates heterogeneous cancer genomic datasets as well as anti-cancer drug responses on a thousand cancer cell lines. By including statistical tools (analysis of variance) and predictive methods (Elastic Net), as well as common data structures, GDSCTools allows users to reproduce published results from GDSC and to implement new analytical methods. In addition, non-GDSC data resources can also be analysed, since drug responses and genomic features can be encoded as CSV files.
Contact: thomas.cokelaer@pasteur.fr or saezrodriguez.rwth-aachen.de or mg12@sanger.ac.uk.
Supplementary information: Supplementary data are available at Bioinformatics online.

Substances: Antineoplastic Agents

Year: 2018 PMID: 29186349 PMCID: PMC6031019 DOI: 10.1093/bioinformatics/btx744

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Cancers occur due to genetic alterations in cells that accumulate over the lifespan of an individual. Cancers are genetically heterogeneous and, as a consequence, patients with similar diagnoses may vary in their response to the same therapy. The path towards precision cancer medicine requires the identification of specific biomarkers, such as genetic alterations, that allow effective patient selection strategies for therapy. Large-scale pharmacological screens such as the Genomics of Drug Sensitivity in Cancer (GDSC) (Garnett et al.) and Cancer Cell Line Encyclopaedia projects (Barretina et al.) have been used to identify potential new treatments and to explore biomarkers of drug sensitivity in cancer cells. In particular, the GDSC project releases database resources periodically (www.cancerRxgene.org) (Yang et al.). A recent installment of this resource (version 17) includes cancer-driven alterations identified in 11 289 tumors from 29 tissues across 1001 molecularly annotated human cancer cell lines, and cell line sensitivity data for 265 anti-cancer compounds. A systematic identification of clinically relevant markers of drug response uncovered numerous alterations that sensitize to anti-cancer drugs (Iorio et al.). Here, we present GDSCTools, a Python library that allows users to perform pharmacogenomic analyses such as those presented in Iorio et al. Our software complements an existing tool (Smirnov et al.) by giving access to the full GDSC dataset and by providing a platform for statistical analyses and data mining with visualization tools.

2 Data formats and data wrangling tools

The GDSC database provides large-scale genomics and drug sensitivity datasets. The drug sensitivity dataset contains dose-response curves (e.g. cell viability for 5–9 drug concentrations), which can be used to derive drug sensitivity indicators (Garnett et al.; Vis et al.) such as the half-maximal inhibitory concentration (IC50) or the area under the curve (AUC) (Fig. 1A). In GDSCTools, logged indicators are encoded as an Nc × Nd matrix, where Nc is the number of cell lines, labeled with their COSMIC identifier (http://cancer.sanger.ac.uk/cosmic), and Nd is the number of drugs. For a given drug, we denote with Yd the vector of logged IC50s across the Nc cell lines. The genomic feature dataset is likewise encoded as an Nc × Nf matrix, where Nf is the number of genomic features. In addition to the subset of data files shipped with GDSCTools (version 17 only), users can retrieve additional datasets online (e.g. methylation data, copy number variants, etc.). Database-like queries can be used to extract and use specific features (e.g. only gene amplifications or deletions). These database-like functionalities are part of the OmniBEM builder (Supplementary Material).
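The two matrices described above can be illustrated with a small pandas sketch. It is not the GDSCTools reader itself: the CSV contents, the column names (`COSMIC_ID`, `Drug_1_IC50`, `TISSUE_FACTOR`, `GENE_A_mut`, ...) and the suffix-based feature filter are assumptions made for illustration only.

```python
import io
import pandas as pd

# Hypothetical toy CSV files: all identifiers and values are made up.
ic50_csv = io.StringIO(
    "COSMIC_ID,Drug_1_IC50,Drug_2_IC50\n"
    "1001,0.5,-1.2\n"
    "1002,2.1,\n"          # missing value: cell line not screened with drug 2
    "1003,-0.3,0.8\n"
)
features_csv = io.StringIO(
    "COSMIC_ID,TISSUE_FACTOR,MSI_FACTOR,GENE_A_mut,GENE_B_mut,gain_GENE_C\n"
    "1001,lung,0,1,0,0\n"
    "1002,lung,1,0,0,1\n"
    "1003,breast,0,1,1,0\n"
)

# Nc x Nd matrix of logged IC50s, indexed by COSMIC identifier.
ic50 = pd.read_csv(ic50_csv, index_col="COSMIC_ID")
# Nc x (covariates + Nf) matrix of genomic features.
features = pd.read_csv(features_csv, index_col="COSMIC_ID")

# Keep only the cell lines present in both tables.
common = ic50.index.intersection(features.index)
ic50, features = ic50.loc[common], features.loc[common]

# Yd: logged IC50s of one drug, dropping cell lines that were not screened.
y_d = ic50["Drug_2_IC50"].dropna()

# A database-like query: keep only the mutation features.
mutations = features.filter(like="_mut")
print(y_d.shape, list(mutations.columns))
```

Indexing both tables by the cell line identifier keeps drug responses and genomic features aligned even when some cell lines are missing from one of the files.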

Fig. 1

(A) Drug response (cell viability versus drug concentrations) and derived drug response metrics (AUC and IC50s). (B) Distribution of IC50s in response to a given drug across a dichotomy of cell lines induced by the status of a genomic feature. (C) P-values from an ANOVA analysis versus signed effect sizes (all drug-genomic feature interactions). (D) Weight distributions resulting from training a sparse linear regression model of a given drug response using all the genomic features.
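The two metrics of panel A can be sketched numerically as follows. This is a simplification and not the GDSC curve-fitting procedure: the concentrations and viabilities are invented, the AUC is a plain trapezoidal area on the log-concentration axis, and the IC50 is read off by linear interpolation rather than from a fitted sigmoid model.

```python
import numpy as np

# Hypothetical dose-response measurements (7 concentrations, made-up values).
conc = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0])          # micromolar
viability = np.array([0.98, 0.95, 0.85, 0.60, 0.35, 0.15, 0.05])  # fraction of control

log_conc = np.log10(conc)

# AUC: trapezoidal area under the viability curve, divided by the width of the
# screened log-concentration range so that 1 means "no effect at any dose".
widths = np.diff(log_conc)
heights = (viability[1:] + viability[:-1]) / 2
auc = np.sum(widths * heights) / (log_conc[-1] - log_conc[0])

# IC50: concentration at which viability crosses 0.5. np.interp needs increasing
# x values, so the (decreasing) viability curve is reversed first.
log_ic50 = np.interp(0.5, viability[::-1], log_conc[::-1])
ic50 = 10 ** log_ic50

print(f"AUC = {auc:.3f}, IC50 = {ic50:.3f} uM")
```

A low AUC and a low IC50 both indicate a sensitive cell line; the AUC has the practical advantage of being defined even when viability never drops below 50% within the screened range.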

3 Data analysis tools

Using GDSCTools, genomic features can be investigated as possible predictors of differential drug sensitivity across screened cell lines. The statistical interaction Yd ∼ X between drug response and genomic features can be tested within a sample population of cell lines from the same cancer type with a t-test. However, to account for possible confounding factors (including the tissue of origin, when performing pan-cancer analyses), a more versatile analysis of variance (ANOVA) is implemented. In this model, the variability observed in Yd is first explained using the tissue covariate, subsequently using additional factors (e.g. microsatellite instability, denoted by MSI), and finally by each of the genomic features in X (one model per feature). This can be mathematically expressed as Yd ∼ C(tissue) + C(MSI) + … + feature, where the operator C() indicates a categorical variable. An ANOVA test is performed for each combination of drug and genomic feature (Fig. 1B). The outcomes of this large number of tests (Nd × Nf) are corrected for multiple hypothesis testing using Bonferroni or Benjamini-Hochberg corrections. To account for P-value inflation due to differences in sample sizes, the effect sizes of the tested statistical interactions (computed with the Cohen and Glass models) are also included (Fig. 1C).

Unlike the ANOVA analysis, which is performed on a one drug/one feature basis, linear regression models assume that drug response can be expressed as a linear combination of the status of a set of genomic features. GDSCTools includes three linear regression methods: (i) Ridge, based on an L2 penalty term, which limits the size of the coefficient vector; (ii) Lasso, based on an L1 penalty term, which imposes sparsity among the coefficients (i.e. makes the fitted model more interpretable) and (iii) Elastic Net, a compromise between the Ridge and Lasso techniques with a mixed penalty between the L1 and L2 norms (see Supplementary Material for details).
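The ANOVA procedure of this section (covariates first, one model per feature, then multiple-testing correction and effect sizes) can be sketched with NumPy and SciPy. This is a minimal illustration on simulated data, not the GDSCTools implementation: the helper names, the simulated effect and the choice of the feature-negative group as reference for the Glass delta are all assumptions.

```python
import numpy as np
from scipy import stats


def dummies(labels):
    """Dummy-code a categorical vector, dropping the first level as reference."""
    labels = np.asarray(labels)
    levels = np.unique(labels)[1:]
    return (labels[:, None] == levels[None, :]).astype(float)


def rss(design, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return float(residuals @ residuals)


def anova_feature_pvalue(y, tissue, msi, feature):
    """P-value of `feature` in Yd ~ C(tissue) + C(MSI) + feature (entered last)."""
    n = len(y)
    reduced = np.column_stack([np.ones(n), dummies(tissue), dummies(msi)])
    full = np.column_stack([reduced, np.asarray(feature, dtype=float)])
    rss_reduced, rss_full = rss(reduced, y), rss(full, y)
    df_resid = n - np.linalg.matrix_rank(full)
    f_stat = (rss_reduced - rss_full) / (rss_full / df_resid)
    return float(stats.f.sf(f_stat, 1, df_resid))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out


def cohen_d(pos, neg):
    """Mean difference scaled by the pooled standard deviation."""
    n1, n0 = len(pos), len(neg)
    pooled = ((n1 - 1) * pos.var(ddof=1) + (n0 - 1) * neg.var(ddof=1)) / (n1 + n0 - 2)
    return (pos.mean() - neg.mean()) / np.sqrt(pooled)


def glass_delta(pos, neg):
    """Mean difference scaled by the standard deviation of one group (here: neg)."""
    return (pos.mean() - neg.mean()) / neg.std(ddof=1)


# Simulated data for one drug: feature 0 sensitizes (lowers the logged IC50).
rng = np.random.default_rng(0)
n_cells, n_features = 200, 5
tissue = rng.choice(["lung", "breast", "skin"], size=n_cells)
msi = rng.integers(0, 2, size=n_cells)
X = rng.integers(0, 2, size=(n_cells, n_features))
y_d = rng.normal(size=n_cells) + 0.5 * (tissue == "lung") - 2.0 * X[:, 0]

pvalues = np.array([anova_feature_pvalue(y_d, tissue, msi, X[:, j])
                    for j in range(n_features)])
qvalues = benjamini_hochberg(pvalues)
d0 = cohen_d(y_d[X[:, 0] == 1], y_d[X[:, 0] == 0])
g0 = glass_delta(y_d[X[:, 0] == 1], y_d[X[:, 0] == 0])
print(np.round(qvalues, 4), round(d0, 2), round(g0, 2))
```

Because the feature enters the model last, its F-test measures only the variability left unexplained by tissue and MSI. In a full analysis the correction would be applied across all Nd × Nf tests rather than across the features of a single drug.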
The three regression methods require the optimization of an α parameter (importance of the L1 and L2 penalties) and, in the Elastic Net case only, of a ρ parameter (mix ratio between the L1 and L2 penalties). This optimization is performed via cross-validation to avoid over-fitting. The best model is determined using as objective function the Pearson correlation between predicted and actual drug responses on the training set. The final regressor weights are reported as shown in Figure 1D. Significance of the final selected models is computed against.
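The tuning of α and ρ can be sketched with a small coordinate-descent Elastic Net written in NumPy. This is an illustrative sketch on simulated data, not the GDSCTools code: the penalty parameterization (α as overall strength, ρ as L1 share), the grid values and the scoring of out-of-fold predictions with the Pearson correlation are assumptions. Setting ρ = 1 gives the Lasso penalty, and ρ = 0 a pure L2 (Ridge-type) penalty.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z towards zero by t (the proximal operator of the L1 penalty)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def elastic_net(X, y, alpha, rho, n_iter=100):
    """Minimise ||y - Xw - b||^2 / (2n) + alpha*rho*|w|_1 + alpha*(1-rho)/2*|w|_2^2."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_sq = (Xc ** 2).sum(axis=0) / n
    w = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature: weight stays at zero
            resid += Xc[:, j] * w[j]           # remove feature j from the fit
            z = Xc[:, j] @ resid / n
            w[j] = soft_threshold(z, alpha * rho) / (col_sq[j] + alpha * (1 - rho))
            resid -= Xc[:, j] * w[j]           # add the updated contribution back
    return w, y_mean - x_mean @ w


def cv_pearson(X, y, alpha, rho, k=5, seed=0):
    """Pearson correlation between out-of-fold predictions and observed responses."""
    idx = np.random.default_rng(seed).permutation(len(y))
    pred = np.empty(len(y))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        w, b = elastic_net(X[train], y[train], alpha, rho)
        pred[fold] = X[fold] @ w + b
    return np.corrcoef(pred, y)[0, 1]


# Simulated data: 30 binary features, of which only features 0 and 3 matter.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(120, 30)).astype(float)
y = -1.5 * X[:, 0] + 1.0 * X[:, 3] + rng.normal(scale=0.5, size=120)

scores = {}
for alpha in (0.01, 0.1, 1.0):
    for rho in (0.2, 0.5, 0.9):
        s = cv_pearson(X, y, alpha, rho)
        scores[(alpha, rho)] = s if np.isfinite(s) else -1.0

best_alpha, best_rho = max(scores, key=scores.get)
weights, intercept = elastic_net(X, y, best_alpha, best_rho)
print(best_alpha, best_rho, np.round(weights[[0, 3]], 2))
```

The weights of the refitted model play the role of the regressor weights shown in Figure 1D: features with non-zero weights are the candidate markers, and the sign indicates association with sensitivity or resistance.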

4 Implementation and future directions

GDSCTools is available at http://github.com/CancerRxGene/gdsctools. It is fully documented at http://gdsctools.readthedocs.io. Pre-compiled versions of the library are available at https://bioconda.github.io/. GDSCTools can be used via standalone applications to analyse a user-defined set of drugs (and genomic features) and to assemble the results in an HTML report. We also provide solutions based on the Snakemake framework (Köster and Rahmann, 2012) to parallelize the analysis on distributed cluster architectures such as LSF or SLURM (Supplementary Material). Besides the analysis of pharmacogenomic datasets, GDSCTools can provide a framework for discovering new biomarkers by integrating and mining novel and heterogeneous datasets (including pharmacological, RNA interference or increasingly available genetic screens, e.g. CRISPR), by using alternative drug response metrics (e.g. AUC) or by implementing new analytical tools. The augmentation of genomic features with information obtained from online web services (Cokelaer et al.), such as pathway enrichment [e.g. via OmniPath (Türei et al.)], will further extend the functionality and usefulness of GDSCTools.

Conflict of Interest: none declared.

References

1. OmniPath: guidelines and gateway for literature-curated signaling pathway resources.

Authors: Dénes Türei; Tamás Korcsmáros; Julio Saez-Rodriguez
Journal: Nat Methods Date: 2016-11-29 Impact factor: 28.547

2. Snakemake--a scalable bioinformatics workflow engine.

Authors: Johannes Köster; Sven Rahmann
Journal: Bioinformatics Date: 2012-08-20 Impact factor: 6.937

3. PharmacoGx: an R package for analysis of large pharmacogenomic datasets.

Authors: Petr Smirnov; Zhaleh Safikhani; Nehme El-Hachem; Dong Wang; Adrian She; Catharina Olsen; Mark Freeman; Heather Selby; Deena M A Gendoo; Patrick Grossmann; Andrew H Beck; Hugo J W L Aerts; Mathieu Lupien; Anna Goldenberg; Benjamin Haibe-Kains
Journal: Bioinformatics Date: 2015-12-09 Impact factor: 6.937

4. Multilevel models improve precision and speed of IC50 estimates.

Authors: Daniel J Vis; Lorenzo Bombardelli; Howard Lightfoot; Francesco Iorio; Mathew J Garnett; Lodewyk Fa Wessels
Journal: Pharmacogenomics Date: 2016-05-16 Impact factor: 2.533

5. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.

Authors: Jordi Barretina; Giordano Caponigro; Nicolas Stransky; Kavitha Venkatesan; Adam A Margolin; Sungjoon Kim; Christopher J Wilson; Joseph Lehár; Gregory V Kryukov; Dmitriy Sonkin; Anupama Reddy; Manway Liu; Lauren Murray; Michael F Berger; John E Monahan; Paula Morais; Jodi Meltzer; Adam Korejwa; Judit Jané-Valbuena; Felipa A Mapa; Joseph Thibault; Eva Bric-Furlong; Pichai Raman; Aaron Shipway; Ingo H Engels; Jill Cheng; Guoying K Yu; Jianjun Yu; Peter Aspesi; Melanie de Silva; Kalpana Jagtap; Michael D Jones; Li Wang; Charles Hatton; Emanuele Palescandolo; Supriya Gupta; Scott Mahan; Carrie Sougnez; Robert C Onofrio; Ted Liefeld; Laura MacConaill; Wendy Winckler; Michael Reich; Nanxin Li; Jill P Mesirov; Stacey B Gabriel; Gad Getz; Kristin Ardlie; Vivien Chan; Vic E Myer; Barbara L Weber; Jeff Porter; Markus Warmuth; Peter Finan; Jennifer L Harris; Matthew Meyerson; Todd R Golub; Michael P Morrissey; William R Sellers; Robert Schlegel; Levi A Garraway
Journal: Nature Date: 2012-03-28 Impact factor: 49.962

6. Systematic identification of genomic markers of drug sensitivity in cancer cells.

Authors: Mathew J Garnett; Elena J Edelman; Sonja J Heidorn; Chris D Greenman; Anahita Dastur; King Wai Lau; Patricia Greninger; I Richard Thompson; Xi Luo; Jorge Soares; Qingsong Liu; Francesco Iorio; Didier Surdez; Li Chen; Randy J Milano; Graham R Bignell; Ah T Tam; Helen Davies; Jesse A Stevenson; Syd Barthorpe; Stephen R Lutz; Fiona Kogera; Karl Lawrence; Anne McLaren-Douglas; Xeni Mitropoulos; Tatiana Mironenko; Helen Thi; Laura Richardson; Wenjun Zhou; Frances Jewitt; Tinghu Zhang; Patrick O'Brien; Jessica L Boisvert; Stacey Price; Wooyoung Hur; Wanjuan Yang; Xianming Deng; Adam Butler; Hwan Geun Choi; Jae Won Chang; Jose Baselga; Ivan Stamenkovic; Jeffrey A Engelman; Sreenath V Sharma; Olivier Delattre; Julio Saez-Rodriguez; Nathanael S Gray; Jeffrey Settleman; P Andrew Futreal; Daniel A Haber; Michael R Stratton; Sridhar Ramaswamy; Ultan McDermott; Cyril H Benes
Journal: Nature Date: 2012-03-28 Impact factor: 49.962

7. BioServices: a common Python package to access biological Web Services programmatically.

Authors: Thomas Cokelaer; Dennis Pultz; Lea M Harder; Jordi Serra-Musach; Julio Saez-Rodriguez
Journal: Bioinformatics Date: 2013-09-23 Impact factor: 6.937

8. A Landscape of Pharmacogenomic Interactions in Cancer.

Authors: Francesco Iorio; Theo A Knijnenburg; Daniel J Vis; Graham R Bignell; Michael P Menden; Michael Schubert; Nanne Aben; Emanuel Gonçalves; Syd Barthorpe; Howard Lightfoot; Thomas Cokelaer; Patricia Greninger; Ewald van Dyk; Han Chang; Heshani de Silva; Holger Heyn; Xianming Deng; Regina K Egan; Qingsong Liu; Tatiana Mironenko; Xeni Mitropoulos; Laura Richardson; Jinhua Wang; Tinghu Zhang; Sebastian Moran; Sergi Sayols; Maryam Soleimani; David Tamborero; Nuria Lopez-Bigas; Petra Ross-Macdonald; Manel Esteller; Nathanael S Gray; Daniel A Haber; Michael R Stratton; Cyril H Benes; Lodewyk F A Wessels; Julio Saez-Rodriguez; Ultan McDermott; Mathew J Garnett
Journal: Cell Date: 2016-07-07 Impact factor: 41.582

9. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells.

Authors: Wanjuan Yang; Jorge Soares; Patricia Greninger; Elena J Edelman; Howard Lightfoot; Simon Forbes; Nidhi Bindal; Dave Beare; James A Smith; I Richard Thompson; Sridhar Ramaswamy; P Andrew Futreal; Daniel A Haber; Michael R Stratton; Cyril Benes; Ultan McDermott; Mathew J Garnett
Journal: Nucleic Acids Res Date: 2012-11-23 Impact factor: 16.971
