ENViz: a Cytoscape App for integrated statistical analysis and visualization of sample-matched data with multiple data types.

Israel Steinfeld¹, Roy Navon², Michael L Creech², Zohar Yakhini¹, Anya Tsalenko².

Abstract

ENViz (Enrichment Analysis and Visualization) is a Cytoscape app that performs joint enrichment analysis of two types of sample-matched datasets in the context of systematic annotations. Such datasets may be gene expression or any other high-throughput data collected in the same set of samples. The enrichment analysis is done in the context of pathway information, gene ontology or any custom annotation of the data. The results of the analysis consist of significant associations between profiled elements of one of the datasets and the annotation terms (e.g. miR-19 was associated with the cell-cycle process in breast cancer samples). The results of the enrichment analysis are visualized as an interactive Cytoscape network.

Year: 2015 PMID: 25577435 PMCID: PMC4426829 DOI: 10.1093/bioinformatics/btu853

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

The recent emergence of novel high-throughput technologies enables the quantification of different types of biological features on a genome-wide scale (e.g. mRNA expression levels, miRNA expression levels and DNA copy numbers). In parallel with these technologies, various methodologies have been developed to handle integrated analysis of functional genomics data, mainly by studying the transcriptional programs and global organization of biological processes. Still, only a few tools support routine joint analysis of sample cohorts with multiple genomic measurement results (Gomez-Cabrero et al.). Even fewer tools provide the visualization strength of Cytoscape in this context (Cline et al.; Bindea et al.; Xia et al.). The ENViz approach to integrated data analysis uses the power of enrichment statistics combined with genomic annotation databases to statistically assign relevant functional annotations to the profiled elements being explored. It thus provides a better understanding of the relationship between different molecular entities in cells or organisms. Visualizing ENViz results as Cytoscape networks provides a compact, structured representation of enrichment results. Even though the development of ENViz was motivated by available modern biological measurements, joint analysis of two sample-matched datasets and systematic annotations may be applied to other similarly structured data.

2 Tool description

ENViz follows an enrichment analysis approach, driven by three input matrices: (i) the primary data matrix (e.g. gene expression measurements across a set of samples), (ii) the annotation matrix, providing a binary annotation for each element of the primary data matrix [e.g. pathway or gene ontology (GO) annotation] and (iii) the pivot data matrix, providing measurements on the same set of samples as the primary data matrix (e.g. miRNA expression measurements) (Fig. 1a).
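The relationship between the three inputs can be illustrated with a minimal sketch. The names `samples`, `primary`, `annotation`, `pivot` and `check_inputs`, and all values, are hypothetical; this is not the ENViz file format, only a toy in-memory layout that makes the shared sample axis and the shared primary-element axis explicit.

```python
# Toy layout of the three ENViz-style inputs (illustrative names and values only).
samples = ["s1", "s2", "s3", "s4"]

# (i) primary data: one row per profiled element (e.g. gene), one value per sample
primary = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}

# (ii) annotation: binary membership of primary elements in each term,
#      stored here as term -> set of annotated primary elements
annotation = {
    "cell cycle": {"geneA", "geneC"},
    "apoptosis": {"geneB"},
}

# (iii) pivot data: one row per pivot element (e.g. miRNA), same samples as (i)
pivot = {
    "miR-x": [0.5, 1.5, 2.5, 3.5],
}


def check_inputs(samples, primary, annotation, pivot):
    """Verify that the matrices are sample-matched and that annotations
    refer only to primary elements. Returns the four dimensions."""
    n = len(samples)
    for name, row in list(primary.items()) + list(pivot.items()):
        if len(row) != n:
            raise ValueError(f"{name}: expected {n} sample values, got {len(row)}")
    for term, members in annotation.items():
        unknown = members - set(primary)
        if unknown:
            raise ValueError(f"{term}: unknown primary elements {sorted(unknown)}")
    return n, len(primary), len(annotation), len(pivot)


dims = check_inputs(samples, primary, annotation, pivot)
print(dims)  # (samples, primary elements, annotation terms, pivot elements)
```

The primary and pivot matrices share the sample axis, whereas the annotation matrix shares the element axis with the primary data only.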

Fig. 1.

Cytoscape session with ENViz application running. (a) Interactive legend, Analysis, and Visualization control panels. The Interactive legend shows a schematic of the analysis and the overview of the data loaded into Cytoscape for ENViz analysis. The Analysis panel controls data input and analysis parameter settings. The Visualization panel controls the significance threshold for the enrichment network generation and the color coding of the annotation categories based on enrichment scores. (b) Bi-partite network for enrichment analysis of breast cancer data. Nodes in this network correspond to WikiPathways (colored yellow->red) and miRNAs (gray), and edges represent significant enrichments of genes in the pathway correlated (red) or anti-correlated (blue) to the miRNA. (c) Cell Cycle WikiPathway with genes color coded according to their correlation to miR-301b. (d) GO terms enriched in genes correlated to miR-150.

For each pivot data element, ENViz performs these steps:

1. Compute the correlation to each element of the primary data.
2. Rank the primary data elements based on the above correlations.
3. Compute the statistical enrichment of annotated elements (e.g. pathways) at the top of the above ranked list, based on the minimum-hypergeometric (mHG) statistics.

Details of the mHG statistics are explained in (Eden et al., 2009). Briefly, given a ranked binary annotation vector, we compute the enrichment of this annotation in the top k ranked elements based on the hypergeometric statistics, where k is selected to optimize this enrichment. Finally, the mHG score [−log(mHG P-value)] for the pivot-annotation association is reported. The calculated significance level is valid for every individual pivot-annotation pair, but is not corrected for the number of pairs tested.
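The three steps can be sketched for a single pivot-annotation pair as follows. This is a minimal illustration under stated assumptions, not the ENViz implementation: Pearson correlation is assumed as the correlation measure, only the positively correlated end of the list is scored, and the function names (`pearson`, `hg_tail`, `mhg_statistic`, `rank_and_score`) and toy data are hypothetical. The value returned is the minimum hypergeometric tail probability over all cutoffs k; because k is optimized, this minimum is not itself a P-value, and the correction that turns it into one is not shown here.

```python
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (rows must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def hg_tail(N, B, n, b):
    """P(X >= b) when n elements are drawn from N, of which B are annotated."""
    hits = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, min(n, B) + 1))
    return hits / comb(N, n)


def mhg_statistic(labels):
    """Minimum hypergeometric tail over all cutoffs k of a ranked 0/1 vector.
    Returns (minimum tail probability, cutoff k at which it is attained)."""
    N, B = len(labels), sum(labels)
    best_p, best_k, b = 1.0, 0, 0
    for k, label in enumerate(labels, start=1):
        b += label  # annotated elements among the top k
        p = hg_tail(N, B, k, b)
        if p < best_p:
            best_p, best_k = p, k
    return best_p, best_k


def rank_and_score(pivot_row, primary, members):
    """Steps 1-3 for one pivot element and one annotation term."""
    corr = {g: pearson(pivot_row, row) for g, row in primary.items()}   # step 1
    ranked = sorted(corr, key=corr.get, reverse=True)                   # step 2
    labels = [1 if g in members else 0 for g in ranked]
    return ranked, mhg_statistic(labels)                                # step 3


# Toy data (hypothetical): four genes, four samples, one annotation term.
pivot_row = [1.0, 2.0, 3.0, 4.0]
primary = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
    "geneD": [2.0, 1.0, 4.0, 3.0],
}
members = {"geneA", "geneC"}

ranked, (p_min, k_best) = rank_and_score(pivot_row, primary, members)
print(ranked, p_min, k_best)
# ranked order: geneA, geneC, geneD, geneB; minimum tail 1/6 at k = 2
```

Scoring anti-correlated elements would amount to reversing the ranking before computing the statistic.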
Significant results are represented in Cytoscape as an enrichment network: a bipartite graph with nodes corresponding to pivot and annotation elements, and edges corresponding to significant pivot-annotation associations, where the significance threshold is user defined (Fig. 1b). In addition, ENViz supports extended visualization for associations to:

1. The WikiPathways (Kelder et al.) database (Fig. 1c), where top correlated genes (from the primary data) are visually overlaid on the relevant pathway.
2. The GO (Ashburner et al.) database (Fig. 1d), where all GO terms associated with a particular pivot element can be visually overlaid on the GO graph, which may point to functionally relevant mechanisms.

To address multiple testing issues, as well as some potential dependencies between primary data elements, ENViz implements filtering by permutation correction. For each permutation, samples in the pivot data matrix are randomly shuffled, and an enrichment score Srand is calculated for each pivot with at least one significant association, as defined by the user. If, for a given pivot-annotation pair with enrichment score S, we observe Srand ≥ S more than a user-defined number of times across all permutations, then this pivot-annotation pair is considered not significant. For pivot-annotation pairs that survive this permutation test filtering, the original mHG score is reported as the enrichment score. More details can be found in the user tutorial (http://www.agilent.com/labs/research/compbio/enviz/ENVizUserTutorial.pdf).
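The permutation filter can be sketched for one pivot-annotation pair as follows. This is an illustration of the counting rule described above, not the ENViz code: the function name `survives_permutation_filter` and the parameters `n_perm`, `max_exceed` and `seed` are hypothetical, and the enrichment score is passed in as an arbitrary function `score_fn` (here a simple stand-in, not the mHG score).

```python
import random


def survives_permutation_filter(pivot_row, score_fn, n_perm=100, max_exceed=5, seed=0):
    """Return True if the pair passes the filter, i.e. if the shuffled score
    Srand >= S is observed at most `max_exceed` times in `n_perm` permutations.

    pivot_row  -- pivot measurements, one value per sample
    score_fn   -- maps a pivot profile to an enrichment score (higher = stronger)
    """
    rng = random.Random(seed)       # fixed seed so the sketch is reproducible
    s_observed = score_fn(pivot_row)
    exceed = 0
    for _ in range(n_perm):
        shuffled = list(pivot_row)
        rng.shuffle(shuffled)       # shuffle sample labels of the pivot data
        if score_fn(shuffled) >= s_observed:
            exceed += 1
            if exceed > max_exceed:
                return False        # too many random scores match the observed one
    return True


# Stand-in scores (assumptions for illustration only).
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]


def strong_score(profile):
    # Largest when the profile is ordered exactly like `target`.
    return sum(a * b for a, b in zip(profile, target))


def flat_score(profile):
    # Carries no signal: every permutation ties with the observed score.
    return 0.0


pivot_row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kept = survives_permutation_filter(pivot_row, strong_score)
dropped = survives_permutation_filter(pivot_row, flat_score)
print(kept, dropped)
```

Because shuffling the pivot samples leaves the primary data untouched, dependencies between primary data elements are preserved in each permutation, which is the motivation given above for this form of filtering.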

3 Example

An example dataset, based on data published in (Enerly et al.) and formatted for ENViz, can be downloaded from http://www.agilent.com/labs/research/compbio/enviz/data.html. This dataset consists of 100 breast tumor samples with various characteristics. The primary data are gene expression profiles from Agilent microarray experiments, the pivot data are Agilent microarray-based miRNA profiles, and the annotation matrix is taken from the WikiPathways and GO databases. As shown in Figure 1, using ENViz we identify a significant association between miR-301b and the cell-cycle pathway. On a standard laptop (i7 chip), the analysis of the example data with default parameters and the WikiPathways annotations takes ∼1 min; analysis with GO annotation takes ∼25 min.

References

1. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors: M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal: Nat Genet Date: 2000-05 Impact factor: 38.330

2. Integration of biological networks and gene expression data using Cytoscape.

Authors: Melissa S Cline; Michael Smoot; Ethan Cerami; Allan Kuchinsky; Nerius Landys; Chris Workman; Rowan Christmas; Iliana Avila-Campilo; Michael Creech; Benjamin Gross; Kristina Hanspers; Ruth Isserlin; Ryan Kelley; Sarah Killcoyne; Samad Lotia; Steven Maere; John Morris; Keiichiro Ono; Vuk Pavlovic; Alexander R Pico; Aditya Vailaya; Peng-Liang Wang; Annette Adler; Bruce R Conklin; Leroy Hood; Martin Kuiper; Chris Sander; Ilya Schmulevich; Benno Schwikowski; Guy J Warner; Trey Ideker; Gary D Bader
Journal: Nat Protoc Date: 2007 Impact factor: 13.491

3. OmicsAnalyzer: a Cytoscape plug-in suite for modeling omics data.

Authors: Tian Xia; John V Hemert; Julie A Dickerson
Journal: Bioinformatics Date: 2010-10-14 Impact factor: 6.937

4. miRNA-mRNA integrated analysis reveals roles for miRNAs in primary breast tumors.

Authors: Espen Enerly; Israel Steinfeld; Kristine Kleivi; Suvi-Katri Leivonen; Miriam R Aure; Hege G Russnes; Jo Anders Rønneberg; Hilde Johnsen; Roy Navon; Einar Rødland; Rami Mäkelä; Bjørn Naume; Merja Perälä; Olli Kallioniemi; Vessela N Kristensen; Zohar Yakhini; Anne-Lise Børresen-Dale
Journal: PLoS One Date: 2011-02-22 Impact factor: 3.240

5. WikiPathways: building research communities on biological pathways.

Authors: Thomas Kelder; Martijn P van Iersel; Kristina Hanspers; Martina Kutmon; Bruce R Conklin; Chris T Evelo; Alexander R Pico
Journal: Nucleic Acids Res Date: 2011-11-16 Impact factor: 16.971

6. Discovering motifs in ranked lists of DNA sequences.

Authors: Eran Eden; Doron Lipson; Sivan Yogev; Zohar Yakhini
Journal: PLoS Comput Biol Date: 2007-03-23 Impact factor: 4.475

7. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists.

Authors: Eran Eden; Roy Navon; Israel Steinfeld; Doron Lipson; Zohar Yakhini
Journal: BMC Bioinformatics Date: 2009-02-03 Impact factor: 3.169

8. Data integration in the era of omics: current and future challenges.

Authors: David Gomez-Cabrero; Imad Abugessaisa; Dieter Maier; Andrew Teschendorff; Matthias Merkenschlager; Andreas Gisel; Esteban Ballestar; Erik Bongcam-Rudloff; Ana Conesa; Jesper Tegnér
Journal: BMC Syst Biol Date: 2014-03-13

9. CluePedia Cytoscape plugin: pathway insights using integrated experimental and in silico data.

Authors: Gabriela Bindea; Jérôme Galon; Bernhard Mlecnik
Journal: Bioinformatics Date: 2013-01-16 Impact factor: 6.937
