
CausalR: extracting mechanistic sense from genome scale data.

Glyn Bradley, Steven J Barrett.

Abstract

SUMMARY: Utilization of causal interaction data enables mechanistic rather than descriptive interpretation of genome-scale data. Here we present CausalR, the first open source causal network analysis platform. Implemented functions enable regulator prediction and network reconstruction, with network and annotation files created for visualization in Cytoscape. False positives are limited using the introduced Sequential Causal Analysis of Networks approach.
AVAILABILITY AND IMPLEMENTATION: CausalR is implemented in R, parallelized, and is available from Bioconductor. CONTACT: glyn.x.bradley@gsk.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2017. Published by Oxford University Press.

Year:  2017        PMID: 28666369      PMCID: PMC5870775          DOI: 10.1093/bioinformatics/btx425

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

With the continuing increase in the generation of genome-wide expression datasets, the limitations of commonly used interpretation methods, such as Gene Ontology classification, Gene Set Enrichment Analysis (Subramanian et al.) and pathway mapping, have become increasingly apparent. These methods are useful for classifying endpoints extracted from a new experiment but are of limited use for uncovering the mechanisms that led to the observed changes. Causal reasoning (causal network analysis) can be used to predict the root cause of observed effects. It requires a causal graph, in the form of a signed, directed interaction network, describing the system under study. Observed endpoints serve as the starting point for the analysis. A reasoning algorithm is then used to track back through the causal graph to find points of convergence that maximally and accurately explain the differential regulatory pattern seen across the endpoints. In a biological analysis these points are likely to be key upstream regulators of the observed endpoints, and so, for example in a drug discovery environment, may represent the best targets for reversing the observed endpoints.

Here we present CausalR, a biologically focused causal reasoning implementation coded in the popular statistical language R and made available from the Bioconductor project (Gentleman et al.). CausalR builds upon existing methodology (Chindelevitch et al., 2012a, b) and provides an enhanced, open source alternative to commercial software (Kramer et al.).

2 Input data

2.1 Causal networks

CausalR runs on networks in the simple interaction format (.sif), where interactions take the form:

ProteinA   Activates   ProteinB
ProteinC   Inhibits    ProteinD

Although the majority of causal information currently resides in the commercial domain, publicly available repositories do exist (e.g. Perfetto et al.), and large public databases such as IntAct (Orchard et al.) have recently started to curate into causal form.
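To make the input format concrete, the following is a minimal Python sketch of how such a file could be read into a signed, directed graph. It is not the CausalR implementation (which is written in R); the function name parse_sif and the SIGN mapping are hypothetical, and only the two interaction labels shown above are assumed to occur.

```python
from collections import defaultdict

# Assumed mapping from the interaction labels shown above to edge signs.
SIGN = {"Activates": 1, "Inhibits": -1}


def parse_sif(lines):
    """Parse 'source interaction target' lines into a signed, directed graph.

    Returns a dict mapping each source node to a list of (target, sign) pairs.
    """
    graph = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got: {line!r}")
        source, interaction, target = fields
        graph[source].append((target, SIGN[interaction]))
    return dict(graph)


example = [
    "ProteinA\tActivates\tProteinB",
    "ProteinC\tInhibits\tProteinD",
]
network = parse_sif(example)
print(network)
```

Storing the sign on each edge is what later allows a direction of change to be propagated from a candidate regulator to its downstream nodes.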

2.2 Experimental data

CausalR experimental input data take the form of differential expression (or activity) levels between case and control. This could be the output of any ‘omics experiment, but in practice is usually a differentially expressed gene signature resulting from transcriptomics analysis. CausalR analysis utilizes the direction, but not the magnitude, of change and so, for input, gene signatures are summarized to the form:

GeneX   1
GeneY   0
GeneZ  −1

where ‘1’ denotes up-regulation, ‘0’ denotes unchanged and ‘−1’ denotes down-regulation. High-quality gene expression data, generated from human lung fibroblast cells treated with various cytokines and suitable for the generation of test input gene signatures, are available on the Gene Expression Omnibus (Edgar et al.) at accession number GSE60880.
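As an illustration of this summarization step, the Python sketch below reduces differential expression results to the three-level form. The function name to_signature, the cutoffs (absolute log2 fold change of at least 1, adjusted P-value of at most 0.05) and the example values are assumptions chosen for the example, not values prescribed by CausalR.

```python
def to_signature(log2_fold_changes, adjusted_p_values,
                 lfc_cutoff=1.0, p_cutoff=0.05):
    """Summarize differential expression to 1 (up), 0 (unchanged), -1 (down).

    The cutoffs are illustrative assumptions; only the direction of change
    is kept, the magnitude is discarded.
    """
    signature = {}
    for gene, lfc in log2_fold_changes.items():
        changed = (adjusted_p_values[gene] <= p_cutoff
                   and abs(lfc) >= lfc_cutoff)
        if not changed:
            signature[gene] = 0
        elif lfc > 0:
            signature[gene] = 1
        else:
            signature[gene] = -1
    return signature


# Made-up values for three hypothetical genes.
lfc = {"GeneX": 2.3, "GeneY": 0.2, "GeneZ": -1.8}
padj = {"GeneX": 0.001, "GeneY": 0.6, "GeneZ": 0.01}

signature = to_signature(lfc, padj)
for gene, direction in signature.items():
    print(f"{gene}\t{direction}")
```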

3 Examples and usage

3.1 Example workflow and data

As an example, we will demonstrate the CausalR functionality depicted in Figure 1: predicting regulators of input experimental transcriptional data and reconstructing regulatory networks that explain the regulation pattern of genes seen in those data. As test input experimental data we will use the IL1B time-course treatment from GSE60880, which yielded IL1B response signatures at 1, 2 and 8 h. The CausalR-ready signatures are provided in the Supplementary Material.

Fig. 1. An example CausalR workflow, to predict regulators of input experimental data and generate regulatory networks.

We will use an input causal network extracted from publicly available interaction data manually curated from the literature, focused on immune regulation (see Catlett et al.). Full details and the network itself are also provided in the Supplementary Material, along with example R scripts.

3.2 Input processing

The first step in CausalR analysis is the creation of a computational causal graph (CCG) from the input substrate causal network, using the CreateCCG() function (labelled 1 in Fig. 1). The experimental data are then read in and mapped against the CCG using the ReadExperimentalData() function (2 in Fig. 1). At this point CausalR gives a warning telling the user how many of the input genes are not represented in the input causal network. It is of course preferable to have as many of the input genes represented in the causal network as possible. However, owing to the sparsity of causal information currently in the public domain, it is likely unrealistic to expect close to 100%.
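The mapping step and its coverage warning can be pictured with the short Python sketch below. It is a conceptual illustration only and does not reproduce ReadExperimentalData(); the function name map_to_network, the message wording and the gene names are hypothetical.

```python
import sys


def map_to_network(signature, network_nodes):
    """Keep only the genes present in the network and report the rest.

    Returns (mapped signature, sorted list of genes absent from the network).
    """
    mapped = {g: s for g, s in signature.items() if g in network_nodes}
    missing = sorted(set(signature) - set(network_nodes))
    if missing:
        # Mimics the kind of warning described above; wording is made up.
        print(f"Warning: {len(missing)} of {len(signature)} input genes "
              "are not represented in the causal network", file=sys.stderr)
    return mapped, missing


signature = {"GeneX": 1, "GeneY": 0, "GeneZ": -1, "GeneQ": 1}
network_nodes = {"GeneX", "GeneY", "GeneZ", "GeneW"}

mapped, missing = map_to_network(signature, network_nodes)
coverage = len(mapped) / len(signature)
print(f"coverage: {coverage:.0%}")
```

Reporting coverage alongside the warning gives a quick check on whether the chosen network is a reasonable substrate for the signature at hand.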

3.3 Regulator prediction

Upstream regulators of the input experimental data can then be predicted using the RankTheHypotheses() and/or runSCANR() functions. RankTheHypotheses() (3a in Fig. 1) predicts regulators at an individual path length. It is minimally parameterized with the CCG and the experimental data (both now stored in the R workspace) and the user-supplied delta parameter, which specifies the maximum number of edges (path length) that may be traversed within the CCG from a (signed) regulator hypothesis to an outcome signal node for a predictive path to be scored. Each node in the causal network is ranked for its fit with the input experimental data. For the example IL1B data the true upstream regulator is known, and IL1B does indeed rank highly in the tabulated results output by RankTheHypotheses() (see Supplementary Table S1) for all three of the time-course signatures. This gives confidence that there is enough relevant information in the CCG to run network reconstruction.

In experimental situations where the true regulators are not known and CausalR is being employed to predict them, it is useful to limit false positives. The runSCANR() function (3b in Fig. 1) executes the Sequential Causal Analysis of Networks (SCAN) methodology, novel to CausalR. This approach runs RankTheHypotheses() repeatedly, iterating over a user-supplied sequence of increasing delta (path length) values. It then scans across those sets of results to uncover common regulator hypotheses appearing within the top-ranked genes, as defined by the user-supplied topNumGenes parameter. The rationale is that hypotheses predicted to control genes across multiple path lengths are more likely to be true regulators than those predicted to control genes at only a single path length. SCAN thus provides the ability to uncover potentially more robust, biologically plausible regulator hypotheses.
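The ranking and SCAN ideas can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the CausalR algorithm: the score used here is just correct minus incorrect predictions, only shortest paths are followed, the hypothesis node itself is not scored, and SCAN is taken to be a plain intersection of the top-ranked hypotheses at each delta. All function names and the toy network are hypothetical.

```python
def predict(graph, start, sign, delta):
    """Predict node signs within `delta` edges of a signed hypothesis.

    Only shortest paths are followed; a node reached by shortest paths of
    conflicting sign is marked 0 (ambiguous).
    """
    predicted = {start: sign}
    frontier = {start: sign}
    for _ in range(delta):
        next_frontier = {}
        for node, node_sign in frontier.items():
            for target, edge_sign in graph.get(node, []):
                if target in predicted:
                    continue  # already reached by a shorter path
                new_sign = node_sign * edge_sign
                if target in next_frontier and next_frontier[target] != new_sign:
                    next_frontier[target] = 0
                else:
                    next_frontier[target] = new_sign
        predicted.update(next_frontier)
        frontier = next_frontier
    return predicted


def score(graph, observed, start, sign, delta):
    """Correct minus incorrect predictions over downstream nodes."""
    predicted = predict(graph, start, sign, delta)
    return sum(pred * observed.get(gene, 0)
               for gene, pred in predicted.items() if gene != start)


def rank_hypotheses(graph, observed, delta):
    """Rank every signed node hypothesis, best score first."""
    nodes = set(graph) | {t for edges in graph.values() for t, _ in edges}
    scored = [((n, s), score(graph, observed, n, s, delta))
              for n in nodes for s in (1, -1)]
    return sorted(scored, key=lambda item: (-item[1], item[0][0], -item[0][1]))


def scan(graph, observed, deltas, top_n):
    """Hypotheses appearing in the top `top_n` at every delta."""
    common = None
    for delta in deltas:
        top = {h for h, _ in rank_hypotheses(graph, observed, delta)[:top_n]}
        common = top if common is None else common & top
    return sorted(common or [])


# Toy network and signature (made up for illustration).
graph = {
    "Reg": [("A", 1), ("B", -1)],
    "A": [("C", 1)],
    "B": [("D", 1)],
    "X": [("C", 1)],
}
observed = {"A": 1, "B": -1, "C": 1, "D": -1}

print(rank_hypotheses(graph, observed, delta=2)[0])
print(scan(graph, observed, deltas=[1, 2], top_n=1))
```

In this toy case the hypothesis that Reg is up-regulated explains all four observed changes at delta 2 and remains top-ranked at delta 1, so it survives the intersection.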

3.4 Regulatory network reconstruction

A Cytoscape-ready .sif file comprising the regulatory network for an individual CausalR-predicted regulator, together with an associated annotation file, is constructed via the WriteExplainedNodesToSIF() function (4a in Fig. 1). This function takes as inputs the CCG, the input experimental data, the regulator hypothesis and the number of path lengths to be searched. If the user has applied the SCAN methodology, networks for all resulting SCAN nodes can be generated automatically using the WriteAllScanNodesToSIF() function (4b in Fig. 1). The networks produced by WriteExplainedNodesToSIF() for the time-course IL1B gene signatures are given in Figure 2. Such visualizations are an intuitive way of describing the increased complexity of the signalling response as the time course of IL1B treatment progresses, with stimulatory effects over 1 and 2 h and feedback inhibition loops becoming apparent only after 8 h. The top hubs in the 8 h response network, IFNG, FOXO1 and IRF5, are highlighted. All three are known to play a key role in IL1B signalling (Masters et al.; Su et al.; Duffau et al.), giving confidence in the CausalR reconstructed networks.
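The reconstruction step can be pictured as collecting the edges that connect a regulator hypothesis to the observed changes it correctly predicts. The Python sketch below is a simplified illustration under stated assumptions, not the WriteExplainedNodesToSIF() implementation: it keeps one shortest path per node, ignores conflicting paths, and returns lines and annotations in memory instead of writing files. The function name explained_network and the toy data are hypothetical.

```python
def explained_network(graph, observed, start, sign, delta):
    """Return (SIF lines, node annotations) for one signed hypothesis."""
    predicted = {start: sign}
    parent = {}  # node -> (upstream node, edge sign) on one shortest path
    frontier = [start]
    for _ in range(delta):
        next_frontier = []
        for node in frontier:
            for target, edge_sign in graph.get(node, []):
                if target not in predicted:
                    predicted[target] = predicted[node] * edge_sign
                    parent[target] = (node, edge_sign)
                    next_frontier.append(target)
        frontier = next_frontier

    # A node is "explained" when its predicted and observed directions agree.
    explained = [n for n in predicted
                 if n != start and observed.get(n, 0) != 0
                 and predicted[n] == observed[n]]

    # Walk back from each explained node to the hypothesis, collecting edges.
    edges = set()
    for node in explained:
        while node in parent:
            upstream, edge_sign = parent[node]
            label = "Activates" if edge_sign > 0 else "Inhibits"
            edges.add((upstream, label, node))
            node = upstream

    sif_lines = sorted("\t".join(edge) for edge in edges)
    nodes = {start} | {n for u, _, t in edges for n in (u, t)}
    # Annotation: observed direction per node (0 = not changed in the input).
    annotation = {n: observed.get(n, 0) for n in sorted(nodes)}
    return sif_lines, annotation


# Toy network and signature (made up for illustration).
graph = {"Reg": [("A", 1), ("B", -1)], "A": [("C", 1)], "B": [("D", 1)]}
observed = {"C": 1, "D": -1}

sif_lines, annotation = explained_network(graph, observed, "Reg", 1, delta=2)
print("\n".join(sif_lines))
print(annotation)
```

Here A and B carry an annotation of 0: they are unchanged in the input but lie on the paths to the explained nodes, corresponding to the grey nodes described in the Figure 2 caption.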
Fig. 2. CausalR reconstructed IL1B response networks. CausalR analysis was carried out on gene signatures from a time course of IL1B treatment of human lung fibroblasts. Visualization of the resulting networks in Cytoscape shows how the signalling response develops. Coloured nodes represent genes upregulated (red) or downregulated (green) in the input experimental data, and so being explained by these signalling networks. Grey nodes represent genes not changed in the experimental input but predicted to be part of the signalling cascade.

4 Conclusion

CausalR provides causal reasoning (causal network analysis) methods for the Bioconductor project. We hope it will both help researchers extract mechanistic sense from genome-scale data, and serve to stimulate interest in the development of these and associated techniques in the academic community. Increased curation of causal interaction data in the public domain is an important component for further developments in this area.
References

1.  Gene Expression Omnibus: NCBI gene expression and hybridization array data repository.

Authors:  Ron Edgar; Michael Domrachev; Alex E Lash
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

2.  Causal reasoning on biological networks: interpreting transcriptional changes.

Authors:  Leonid Chindelevitch; Daniel Ziemek; Ahmed Enayetallah; Ranjit Randhawa; Ben Sidders; Christoph Brockel; Enoch S Huang
Journal:  Bioinformatics       Date:  2012-02-21       Impact factor: 6.937

3.  Promotion of Inflammatory Arthritis by Interferon Regulatory Factor 5 in a Mouse Model.

Authors:  Pierre Duffau; Hanni Menn-Josephy; Carla M Cuda; Salina Dominguez; Tamar R Aprahamian; Amanda A Watkins; Kei Yasuda; Paul Monach; Robert Lafyatis; Lisa M Rice; G Kenneth Haines; Ellen M Gravallese; Rebecca Baum; Christophe Richez; Harris Perlman; Ramon G Bonegio; Ian R Rifkin
Journal:  Arthritis Rheumatol       Date:  2015-12       Impact factor: 10.995

4.  Regulation of interleukin-1beta by interferon-gamma is species specific, limited by suppressor of cytokine signalling 1 and influences interleukin-17 production.

Authors:  Seth L Masters; Lisa A Mielke; Ann L Cornish; Caroline E Sutton; Joanne O'Donnell; Louise H Cengia; Andrew W Roberts; Ian P Wicks; Kingston H G Mills; Ben A Croker
Journal:  EMBO Rep       Date:  2010-07-02       Impact factor: 8.807

5.  Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors:  Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal:  Proc Natl Acad Sci U S A       Date:  2005-09-30       Impact factor: 11.205

6.  Bioconductor: open software development for computational biology and bioinformatics.

Authors:  Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal:  Genome Biol       Date:  2004-09-15       Impact factor: 13.583

7.  Assessing statistical significance in causal graphs.

Authors:  Leonid Chindelevitch; Po-Ru Loh; Ahmed Enayetallah; Bonnie Berger; Daniel Ziemek
Journal:  BMC Bioinformatics       Date:  2012-02-20       Impact factor: 3.169

8.  Reverse causal reasoning: applying qualitative causal knowledge to the interpretation of high-throughput data.

Authors:  Natalie L Catlett; Anthony J Bargnesi; Stephen Ungerer; Toby Seagaran; William Ladd; Keith O Elliston; Dexter Pratt
Journal:  BMC Bioinformatics       Date:  2013-11-23       Impact factor: 3.169

9.  SIGNOR: a database of causal relationships between biological entities.

Authors:  Livia Perfetto; Leonardo Briganti; Alberto Calderone; Andrea Cerquone Perpetuini; Marta Iannuccelli; Francesca Langone; Luana Licata; Milica Marinkovic; Anna Mattioni; Theodora Pavlidou; Daniele Peluso; Lucia Lisa Petrilli; Stefano Pirrò; Daniela Posca; Elena Santonico; Alessandra Silvestri; Filomena Spada; Luisa Castagnoli; Gianni Cesareni
Journal:  Nucleic Acids Res       Date:  2015-10-13       Impact factor: 16.971

10.  FoxO1 links insulin resistance to proinflammatory cytokine IL-1beta production in macrophages.

Authors:  Dongming Su; Gina M Coudriet; Dae Hyun Kim; Yi Lu; German Perdomo; Shen Qu; Sandra Slusher; Hubert M Tse; Jon Piganelli; Nick Giannoukakis; Jian Zhang; H Henry Dong
Journal:  Diabetes       Date:  2009-08-03       Impact factor: 9.461
