Epiviz: interactive visual analytics for functional genomics data.

Florin Chelaru1, Llewellyn Smith2, Naomi Goldstein3, Héctor Corrada Bravo1.   

Abstract

Visualization is an integral aspect of genomics data analysis. Algorithmic-statistical analysis and interactive visualization are most effective when used iteratively. Epiviz (http://epiviz.cbcb.umd.edu/), a web-based genome browser, and the Epivizr Bioconductor package allow interactive, extensible and reproducible visualization within a state-of-the-art data-analysis platform.

Year:  2014        PMID: 25086505      PMCID: PMC4149593          DOI: 10.1038/nmeth.3038

Source DB:  PubMed          Journal:  Nat Methods        ISSN: 1548-7091            Impact factor:   28.547


Many analyses in functional genomics, including transcriptome analysis, analysis of histone modifications or transcription factor binding using ChIP-seq, and comprehensive microarray or sequencing assays to profile DNA methylation, employ powerful computational and statistical tools to preprocess and model data to provide statistical inferences. Visualization of data at each step of these pipelines is essential for exploratory analysis, for characterizing the behavior of the analysis pipeline, and for making sense of the biological context of results by comparing them to other datasets and genomic features. Interactive visualization would make these steps far more efficient, allowing the data scientist to save time, while increasing the impact of these analyses through interactive dissemination.
Advances in web application and visualization frameworks, for example d3.js[1], facilitate development of interactive data visualization tools that are easily deployed through the web. These tools have gradually moved from interactive visualization of fixed data elements to the integration of algorithmic and analytic capabilities[2], thereby accelerating how insights are derived from data. Genome and epigenome browsers are ubiquitous tools, a number of them having adopted modern web-application technologies to provide more efficient visualization and better user interfaces[3,4]. However, none of them yet supports integration with computational analysis platforms like Bioconductor[5]. This limits visualization to presentation and dissemination rather than a hybrid tool integrating interactive visualization with algorithmic analysis.
We introduce Epiviz (http://epiviz.cbcb.umd.edu), a web-based genome browsing application (Fig. 1 and Supplementary Fig. 1; a short overview video is available at http://youtu.be/099c4wUxozA). It tightly integrates modern visualization technologies with the R-Bioconductor data analysis platform.
It implements multiple visualization methods for location-based (e.g., genomic regions of interest with block and line tracks) and feature-based (e.g., exon or transcript-level expression, with scatterplots and heatmaps) data using fundamental, well-established, interactive data visualization techniques[6], not available in web-based genome browsers. For example, since display objects are mapped directly to data elements, Epiviz implements a brushing feature that instantly gives users visual insight into the spatial correlation of multiple datasets. All data-displaying containers are resizable, colors can be mapped dynamically to display objects, and charts can be exported as static image files (pdf, svg, png or postscript).
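The brushing idea can be illustrated with a minimal Python sketch. It assumes that every display object carries the genomic coordinates of the data element it represents, so that hovering over one object highlights all objects in other charts that overlap it. The `DisplayObject` class, the `brush` function and the coordinates are hypothetical and are not taken from the Epiviz code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayObject:
    """A drawn element tied to the genomic interval of its data element."""
    chart: str
    label: str
    chrom: str
    start: int  # inclusive
    end: int    # exclusive


def overlaps(a: DisplayObject, b: DisplayObject) -> bool:
    """True when both objects lie on the same chromosome and share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def brush(hovered: DisplayObject, objects: list) -> list:
    """Return every other display object that overlaps the hovered one."""
    return [o for o in objects if o is not hovered and overlaps(hovered, o)]


# Illustrative objects spread over three charts (made-up coordinates).
objects = [
    DisplayObject("blocks track", "hypo-block 1", "chr11", 100_000, 250_000),
    DisplayObject("genes track", "gene A", "chr11", 120_000, 130_000),
    DisplayObject("MA plot", "gene A point", "chr11", 120_000, 130_000),
    DisplayObject("genes track", "gene B", "chr11", 300_000, 310_000),
]

highlighted = brush(objects[0], objects)
print([o.label for o in highlighted])
```

Because the link between charts is the genomic interval rather than the chart type, the same lookup works for tracks, scatterplots and heatmaps alike.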
Figure 1

Colon cancer methylome visualization using Epiviz. Long regions of methylation changes in colon cancer (Hypo- and Hyper-methylation blocks) are shown along with the smoothed base-pair resolution data (Methylation Colon Cancer and Normal) used to define them. Colon gene expression data on an MA plot (top right) shows genes within the viewing region that are differentially expressed. Data from the gene expression barcode shows transcriptome state across multiple tissues (top left). Highlighted region shows the brushing feature linking all charts by spatial location. This workspace can be accessed at http://epiviz.cbcb.umd.edu/?ws=cDx4eNK96Ws.

The design of Epiviz is centered on providing tight-knit integration with computational environments like Bioconductor. The Epivizr Bioconductor package uses WebSocket connections to support two-way communication between Epiviz and interactive R sessions, so data in R objects is served in response to requests made by Epiviz. This protocol was implemented over a general Data Provider interface that de-centralizes data storage, allowing users to easily integrate external data sources besides R-Bioconductor (including data served by PHP-MySQL and WebSocket connections to other interactive environments like Python, Supplementary Fig. 2). Epiviz implements a predictive caching strategy to accelerate system response to user-initiated data requests towards any of the integrated data sources.
Epiviz integrates human transcriptome data from the Gene Expression Barcode project[7] (Fig. 1). By visualizing these data on Epiviz, users obtain immediate visual cues on transcriptome state with respect to other genomic features by using the brushing feature. All data sources catalogued by the AnnotationHub Bioconductor package are available for integration as measurements via Epivizr: UCSC genome browser[8], Ensembl[9] and BioMart[10], for instance. Epiviz also allows users to define new data measurements based on integrated measurements using a simple expression language (see Supplementary Note). Epiviz provides persistent URLs for dissemination that replicate both the underlying data, including computed measurements, and the visualization components of shared workspaces. To facilitate exploratory data analysis, we implemented updating, filtering and subsetting operations on R objects that immediately update their visualization on Epiviz. Epivizr also supports interactive exploratory browsing: users can navigate in order through a ranked list of genomic regions of interest, for example, regions of differentially expressed genes from an RNA-seq experiment obtained from packages like DESeq.
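The text does not spell out how the predictive cache works, so the following Python sketch shows only one plausible strategy under stated assumptions: when a requested range is not cached, the cache asks the data provider for a window padded by one range width on each side, so that a subsequent pan can be answered locally. `PrefetchingCache`, `backend` and `DATA` are hypothetical names for illustration and do not describe the Epiviz implementation.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, float]  # (genomic position, measurement value)


class PrefetchingCache:
    """Serve range requests, fetching a padded window on each cache miss."""

    def __init__(self, fetch: Callable[[int, int], List[Row]]):
        self._fetch = fetch          # stand-in for a data provider call
        self._lo = self._hi = 0      # currently cached window [lo, hi)
        self._rows: List[Row] = []
        self.backend_calls = 0

    def get(self, start: int, end: int) -> List[Row]:
        inside = self._lo <= start and end <= self._hi
        if not inside:
            # Assumption: the user is likely to pan left or right next,
            # so fetch one extra range width on both sides.
            width = end - start
            self._lo, self._hi = max(0, start - width), end + width
            self._rows = self._fetch(self._lo, self._hi)
            self.backend_calls += 1
        return [(p, v) for p, v in self._rows if start <= p < end]


# Toy data source: one value every 100 bp.
DATA = [(pos, pos / 1000.0) for pos in range(0, 10_000, 100)]


def backend(start: int, end: int) -> List[Row]:
    return [(p, v) for p, v in DATA if start <= p < end]


cache = PrefetchingCache(backend)
first = cache.get(2000, 3000)    # miss: fetches [1000, 4000)
second = cache.get(2500, 3500)   # hit: served from the padded window
print(cache.backend_calls, len(first), len(second))
```

In a real deployment the `fetch` callable would wrap a WebSocket or HTTP request to a data provider; the caching logic itself is independent of the transport.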
Epiviz features performance optimizations that map multiple data objects to aggregate visual objects (Supplementary Figs. 3 and 4). Using these optimizations, Epiviz displays full exon-level RNA-seq data from chromosome 11 as a scatter plot (~12,000 data points) in 150 milliseconds. Epiviz provides a powerful and flexible extension system through Chart (for extensions with user-provided d3.js visualizations) and Data Provider (for integration of data sources) interfaces. Epivizr also provides direct support for data types defined in the Bioconductor infrastructure[11], used in many of its software packages[12,13], supporting interactive visualization directly for packages that extend its data types. Users can integrate data in SAM or BAM files in their visualizations through this infrastructure.
We illustrate the power of truly interactive visual computing using an integrative analysis of DNA methylation and exon-level expression data in colon cancer. Loss of methylation in large, gene-poor domains associated with heterochromatin and nuclear lamina binding is an early and consistent event in colon tumorigenesis[14]. We replicate the analysis in Hansen et al.[14] for a chromosome 11 region (Fig. 1, the persistent Epiviz workspace can be accessed at http://epiviz.cbcb.umd.edu/?ws=cDx4eNK96Ws), allowing us to interactively inspect the overlap of regions of methylation loss in colon cancer and partially methylated domains (PMDs) reported in fibroblast[15]. We observed that multiple cancer types show similar expression patterns within these hypomethylation blocks (Supplementary Fig. 5), where genes are silent in normal tissues and activated in tumors. We also inferred long blocks of methylation difference in colon cancer using the minfi package[16] in Bioconductor from Illumina HumanMethylation450k beadarray data from the Cancer Genome Atlas project[17].
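One way to map many data objects to a single visual object is to bin scatter-plot points into grid cells and draw one mark per occupied cell. The sketch below is an assumption about how such an aggregation could look, not the optimization used in Epiviz; `aggregate_points` and `cell_size` are hypothetical names.

```python
from collections import defaultdict


def aggregate_points(points, cell_size):
    """Collapse (x, y) points that fall in the same grid cell into one mark.

    Each mark records the centroid of its members and how many data
    points it stands for, so no data point is silently dropped.
    """
    cells = defaultdict(list)
    for x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells[key].append((x, y))

    marks = []
    for key in sorted(cells):
        members = cells[key]
        n = len(members)
        marks.append({
            "x": sum(x for x, _ in members) / n,
            "y": sum(y for _, y in members) / n,
            "count": n,
        })
    return marks


points = [(1.0, 1.0), (1.5, 1.2), (1.9, 1.8), (10.2, 3.0), (10.4, 3.1)]
marks = aggregate_points(points, cell_size=2.0)
print(len(points), "points drawn as", len(marks), "marks")
```

Choosing `cell_size` close to the on-screen size of a plotted point keeps the rendered picture visually unchanged while bounding the number of drawn elements by the number of grid cells.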
We used Epivizr to visually analyze the overlap of detected blocks in the TCGA samples using the 450k beadarray and the colon cancer blocks reported by Hansen et al.[14]. We found that the 450k blocks displayed high overlap with sequencing blocks (Fig. 2). The method used in minfi for the 450k array ignores methylation measurements in CpG islands by design, so that long blocks of methylation change would span across CpG islands. The algorithm in Hansen et al. did not use this design, so blocks are frequently punctuated by CpG islands. Using Epivizr confirmed that the minfi procedure works as expected (Supplementary Fig. 6).
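The overlap above was assessed visually. As a complementary illustration, the Python sketch below computes one possible numeric summary: the fraction of bases in one block set that is covered by another. The metric, the function names and the coordinates are assumptions for illustration and are not results or methods from this study.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_fraction(query, reference):
    """Fraction of query bases that are covered by the reference set."""
    q, r = merge(query), merge(reference)
    covered = 0
    i = j = 0
    while i < len(q) and j < len(r):
        lo = max(q[i][0], r[j][0])
        hi = min(q[i][1], r[j][1])
        if lo < hi:
            covered += hi - lo
        # Advance whichever interval ends first.
        if q[i][1] < r[j][1]:
            i += 1
        else:
            j += 1
    total = sum(end - start for start, end in q)
    return covered / total if total else 0.0


# Made-up coordinates standing in for array-derived and sequencing-derived blocks.
array_blocks = [(100, 200), (300, 400)]
seq_blocks = [(150, 320), (390, 500)]
fraction = overlap_fraction(array_blocks, seq_blocks)
print(fraction)
```

Because sequencing-derived blocks are punctuated by CpG islands while array-derived blocks span them, a base-level fraction like this would be expected to stay below 1 even for concordant calls.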
Figure 2

Integrative analysis of Illumina HumanMethylation450k data and exon-level RNA-seq data using Epivizr. Regions of hypomethylation blocks obtained from TCGA data using the 450k array (bottom track) are shown along with regions obtained from sequencing data (Hansen et al.) on independent samples. An MA plot (top) shows exon-level RNA-seq data from the TCGA project over the same region (the MA transformation was obtained using the computed measurements tool in the Epiviz UI).

We next obtained exon-level RNA-seq data from the Cancer Genome Atlas (TCGA) project[17]. RNA-seq data can be referenced by genomic location (exon-level coverage, counting the number of fragments aligned to a specific exon), and by feature, e.g., transcript or gene expression. The multi-perspective organization of Epiviz is designed for this type of analysis. We integrated this data using Epivizr and created an MA plot based on exon-level expression using the computed measurements feature on the Epiviz web application (Fig. 2). We observed the association between higher expression in cancer, now at exon-level, and hypo-methylation blocks for specific genes: the MMP gene family (Fig. 2). Note that the MA transformation could also be applied on the R session, demonstrating the flexibility and power of a hybrid statistical analysis environment integrated with a modern, powerful visualization tool.
We further analyzed the correlation between exon-level expression and DNA methylation. To support visualization for this analysis, we created a track-based visualization for continuous measurements of exon-level expression. We used the Epiviz Chart API to include JavaScript files defining the new d3.js visualization; these files are hosted on GitHub Gist and loaded into Epiviz from there. Using this we defined metadata and rendering code for the new exon-level expression visualization track. An overview visualization of the data confirmed the observation that hypo-methylated blocks are gene-poor[14] (Supplementary Fig. 7) and that exons in both normal and cancer tissues tend to be globally silenced within blocks, consistent with their association with heterochromatin, while exons outside blocks tend to be expressed (Supplementary Fig. 8).
Epiviz is the first system to provide tight integration between a state-of-the-art analytics platform and a modern, powerful, integrative visualization system for functional genomics.
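For readers unfamiliar with the MA transformation mentioned above, a minimal Python sketch follows. M is the log2 ratio of tumor to normal expression and A is the mean of the two log2 values. The pseudocount of 1 is an assumption added here to avoid taking the log of zero; the text does not state whether one was used, and `ma_transform` is a hypothetical helper name.

```python
import math


def ma_transform(normal, tumor, pseudocount=1.0):
    """Return (M, A) for one exon.

    M = log2(tumor) - log2(normal)      (difference, y-axis of an MA plot)
    A = (log2(tumor) + log2(normal)) / 2 (average, x-axis of an MA plot)
    """
    log_n = math.log2(normal + pseudocount)
    log_t = math.log2(tumor + pseudocount)
    return log_t - log_n, (log_t + log_n) / 2.0


# Illustrative normalized counts for one exon (not data from the study).
m, a = ma_transform(normal=3.0, tumor=15.0)
print(m, a)
```

A positive M indicates higher expression in tumor than in normal tissue, which is the pattern described for exons of genes inside hypo-methylation blocks.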
Infrastructure from the core Bioconductor team and hundreds of contributed packages are used in a large number of projects analyzing data that ranges from expression microarrays to next-generation sequencing. The development of interactive visualization tools based on the Bioconductor infrastructure immediately supports a number of widely used, state-of-the-art methods for a) ChIP-seq, where iterative visualization of data and results of peak-calling algorithms is necessary; b) RNA-seq analyses, where both location-based coverage and feature-based expression levels are required; c) methylation analyses, where location-based analysis at multiple genomic scales is important. By supporting interactive visualization of fundamental data structures provided by Bioconductor, developers of new methods using this framework can immediately benefit from the powerful, extensible visualizations of Epiviz. The Galaxy platform[18] provides integration of analysis and visualization, but targets a different type of interaction than that provided by Epiviz. Bioconductor defines data structures that allow direct interactive and exploratory data manipulation that is immediately reflected in the Epiviz visualization environment. Integration of analysis and visualization in Galaxy is geared toward pipeline workflows. Epiviz is an extensible platform that may incorporate Canvas-based graphics[19] in the future, while targeting integration with interactive data environments beyond extensibility capabilities available in current browsers[4].

Online Methods

Annotation Data

We obtained annotation data from the UCSC genome browser for hg19. PMDs were obtained from Lister et al., generated from bisulfite sequencing in fibroblast cells[15]. DNAm data and hypomethylation block regions were obtained from Hansen et al.[14]. Affymetrix hgu133plus2 expression data was obtained from the Gene Expression Barcode project[7].

Illumina HumanMethylation450k beadarray data

IDAT files for 17 normal colon and 34 colon tumor samples were obtained from the TCGA project[17]. All processing was performed using the minfi Bioconductor package. Data was preprocessed and normalized using the standard Illumina method; hypomethylation block finding was performed using the method in minfi.

RNA-seq data

Raw count tables at the exon level were obtained for 3 normal colon and 37 colon tumor samples from the TCGA project[17]. Counts were normalized for library size using the DESeq method[12]. Exon annotations using UCSC IDs were included by the TCGA project.
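The DESeq library-size normalization is based on median-of-ratios size factors. The Python sketch below is a simplified re-implementation for illustration, not the code used in this study, which relied on the DESeq package. For each sample, the size factor is the median over exons of the ratio between the exon's count and the geometric mean of that exon's counts across samples. Exons with a zero count in any sample are skipped, and the toy count matrix is made up.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per exon), each a list of raw counts
            with one entry per sample.
    Rows containing a zero are skipped because their geometric mean is 0.
    Raises statistics.StatisticsError if every row contains a zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


# Toy matrix: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
factors = size_factors(counts)
normalized = [[c / f for c, f in zip(row, factors)] for row in counts]
print(factors)
```

Dividing each sample's counts by its size factor puts the samples on a common scale, so that the twofold depth difference in the toy matrix no longer appears as a twofold expression difference.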

Software

Analyses were performed using Bioconductor packages minfi (v. 1.8.9) and epivizr (v. 1.3.3). The Epiviz web application is hosted at http://epiviz.cbcb.umd.edu; the Epivizr Bioconductor package is available through the Bioconductor project. JavaScript files defining exon-level expression visualization tracks are available as GitHub Gists (http://gist.github.com/11279474 and http://gist.github.com/11279449). Open source code for all components is available on the Epiviz project GitHub page: http://github.com/epiviz. API descriptions and other documentation for Epiviz are available online at http://epiviz.cbcb.umd.edu/help.
  18 in total

1.  D³: Data-Driven Documents.

Authors:  Michael Bostock; Vadim Ogievetsky; Jeffrey Heer
Journal:  IEEE Trans Vis Comput Graph       Date:  2011-12       Impact factor: 4.579

2.  BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis.

Authors:  Steffen Durinck; Yves Moreau; Arek Kasprzyk; Sean Davis; Bart De Moor; Alvis Brazma; Wolfgang Huber
Journal:  Bioinformatics       Date:  2005-08-15       Impact factor: 6.937

3.  Scribl: an HTML5 Canvas-based graphics library for visualizing genomic data over the web.

Authors:  Chase A Miller; Jon Anthony; Michelle M Meyer; Gabor Marth
Journal:  Bioinformatics       Date:  2012-11-19       Impact factor: 6.937

4.  Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays.

Authors:  Martin J Aryee; Andrew E Jaffe; Hector Corrada-Bravo; Christine Ladd-Acosta; Andrew P Feinberg; Kasper D Hansen; Rafael A Irizarry
Journal:  Bioinformatics       Date:  2014-01-28       Impact factor: 6.937

5.  The Human Epigenome Browser at Washington University.

Authors:  Xin Zhou; Brett Maricque; Mingchao Xie; Daofeng Li; Vasavi Sundaram; Eric A Martin; Brian C Koebbe; Cydney Nielsen; Martin Hirst; Peggy Farnham; Robert M Kuhn; Jingchun Zhu; Ivan Smirnov; W James Kent; David Haussler; Pamela A F Madden; Joseph F Costello; Ting Wang
Journal:  Nat Methods       Date:  2011-11-29       Impact factor: 28.547

6.  Highly integrated single-base resolution maps of the epigenome in Arabidopsis.

Authors:  Ryan Lister; Ronan C O'Malley; Julian Tonti-Filippini; Brian D Gregory; Charles C Berry; A Harvey Millar; Joseph R Ecker
Journal:  Cell       Date:  2008-05-02       Impact factor: 41.582

7.  Bioconductor: open software development for computational biology and bioinformatics.

Authors:  Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal:  Genome Biol       Date:  2004-09-15       Impact factor: 13.583

8.  Increased methylation variation in epigenetic domains across cancer types.

Authors:  Kasper Daniel Hansen; Winston Timp; Héctor Corrada Bravo; Sarven Sabunciyan; Benjamin Langmead; Oliver G McDonald; Bo Wen; Hao Wu; Yun Liu; Dinh Diep; Eirikur Briem; Kun Zhang; Rafael A Irizarry; Andrew P Feinberg
Journal:  Nat Genet       Date:  2011-06-26       Impact factor: 38.330

9.  Differential expression analysis for sequence count data.

Authors:  Simon Anders; Wolfgang Huber
Journal:  Genome Biol       Date:  2010-10-27       Impact factor: 13.583

10.  Web-based visual analysis for high-throughput genomics.

Authors:  Jeremy Goecks; Carl Eberhard; Tomithy Too; Anton Nekrutenko; James Taylor
Journal:  BMC Genomics       Date:  2013-06-13       Impact factor: 3.969
