Jayaram Kancherla1,2, Alexander Zhang3, Brian Gottfried3, Hector Corrada Bravo1,2,3. 1. Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, Maryland, 20742, USA. 2. University of Maryland Institute of Advanced Computer Studies, University of Maryland, College Park, College Park, Maryland, 20742, USA. 3. Department of Computer Science, University of Maryland, College Park, College Park, Maryland, 20742, USA.
Abstract
Interactive and integrative data visualization tools and libraries are integral to exploration and analysis of genomic data. Web based genome browsers allow integrative data exploration of a large number of data sets for a specific region in the genome. Currently available web-based genome browsers are developed for specific use cases and datasets, therefore integration and extensibility of the visualizations and the underlying libraries from these tools is a challenging task. Genomic data visualization and software libraries that enable bioinformatic researchers and developers to implement customized genomic data viewers and data analyses for their application are much needed. Using recent advances in core web platform APIs and technologies including Web Components, we developed the Epiviz Component Library, a reusable and extensible data visualization library and application framework for genomic data. Epiviz Components can be integrated with most JavaScript libraries and frameworks designed for HTML. To demonstrate the ease of integration with other frameworks, we developed an R/Bioconductor epivizrChart package, that provides interactive, shareable and reproducible visualizations of genomic data objects in R, Shiny and also create standalone HTML documents. The component library is modular by design, reusable and natively extensible and therefore simplifies the process of managing and developing bioinformatic applications.
Interactive and integrative data visualization tools and libraries are integral to exploration and analysis of genomic data. Web based genome browsers allow integrative data exploration of a large number of data sets for a specific region in the genome. Currently available web-based genome browsers are developed for specific use cases and datasets, therefore integration and extensibility of the visualizations and the underlying libraries from these tools is a challenging task. Genomic data visualization and software libraries that enable bioinformatic researchers and developers to implement customized genomic data viewers and data analyses for their application are much needed. Using recent advances in core web platform APIs and technologies including Web Components, we developed the Epiviz Component Library, a reusable and extensible data visualization library and application framework for genomic data. Epiviz Components can be integrated with most JavaScript libraries and frameworks designed for HTML. To demonstrate the ease of integration with other frameworks, we developed an R/Bioconductor epivizrChart package, that provides interactive, shareable and reproducible visualizations of genomic data objects in R, Shiny and also create standalone HTML documents. The component library is modular by design, reusable and natively extensible and therefore simplifies the process of managing and developing bioinformatic applications.
Entities:
Keywords:
bioinformatics; epigenetics; genomics; visualization; web components
The complex and diverse genomic data sets require flexible software libraries and tools to perform integrative data exploration and analyses. Web-based genome browsers and genomic data visualization tools like the UCSC Genome Browser
[1] and the Integrated Genomics Viewer
[2] are developed for specific use cases i.e., integrative data exploration of a large number of datasets for a region in the genome. Genomic exploration of data on these platforms is usually track-based, where the data is aligned to a reference genome and visualized as a line track. Since these tools are developed for specific use cases, integration and extensibility of these visualizations and libraries is a challenging task.The Web as a platform has been used to serve static HTML documents traditionally. The implementation of
HTML5 and the newer APIs made the Web more of a platform that supports rich and dynamic web applications. But HTML is still restrictive and limited to the tags/elements defined as part of the markup language and is not extensible. Various existing frameworks like
Vue.js and
React have introduced modular components, but components built for one framework do not work with another framework. Newer web platform APIs and technologies like
Web Components introduced a standards-based component model that allows developers to create custom HTML elements that are natively extensible and reusable. Custom components work across modern web browsers and can be used along with most JavaScript libraries or frameworks designed for HTML. Web Components provide the ability to natively extend, import and encapsulate HTML elements. This makes the process of creating and managing web applications easier and a much smoother process. These components are modular, making the code cleaner and less expensive to maintain compared to JavaScript libraries and frameworks like BioJs
[3].We present the Epiviz Component Library, an open source reusable and extensible data visualization library and application framework for functional genomic data. Building upon the Web Component framework, we developed various HTML elements/tags as part of our design as shown in
Figure 1. The
visualization components (
are the core of the library and render extensible and interactive track and feature-based charts. In addition to the chart library, we developed components for creating interactive genomic applications for different use cases and datasets. These include
app components (
and
) to coordinate interactions (linking data across visualizations to implement brushing and events) and manage layouts,
datasource components including
to manage data requests from a web server or WebSocket backend and
for handling user authentication and to create shareable and reproducible visual analytical workspaces. The design of the component library is based on visualizations and features of the Epiviz
[4] web application for visual exploration and analysis of functional genomic data.
Figure 1.
Overview of Epiviz web components architecture.
The epiviz web components architecture is organized into three categories: 1)
visualization components is a library of extensible and interactive D3JS based chart components specifically designed for genomic data; 2)
app components are responsible for managing the layouts, events arising from genomic coordinate navigation, linking data across visualization components to implement brushing and coordinating data requests across multiple charts; 3)
datasource components manage requests to web server or WebSocket connections using the
epiviz-data-source component.
epiviz-workspace handles user authentication and saves the state of the app to a google firebase instance allowing users to create shareable and reproducible visual analytics workspaces.
Overview of Epiviz web components architecture.
The epiviz web components architecture is organized into three categories: 1)
visualization components is a library of extensible and interactive D3JS based chart components specifically designed for genomic data; 2)
app components are responsible for managing the layouts, events arising from genomic coordinate navigation, linking data across visualization components to implement brushing and coordinating data requests across multiple charts; 3)
datasource components manage requests to web server or WebSocket connections using the
epiviz-data-source component.
epiviz-workspace handles user authentication and saves the state of the app to a google firebase instance allowing users to create shareable and reproducible visual analytics workspaces.Bioconductor
[5] is an open source community that develops bioinformatics software tools and pipelines. Ease of developing integrative analyses and a framework for interactive visualizations is one of the core infrastructure needs of the Bioconductor community. Since the web components introduced in this paper can be easily embedded or integrated with any web-based application, the library reduces the effort to visualize and create applications for genomic datasets encapsulated in Bioconductor infrastructure data representations. We developed an R/Bioconductor package,
[6] to visualize genomic data objects within HTML documents created using
RMarkdown. We also integrated our components with
Shiny
[7], a web application framework for R to interactively visualize functional genomic data.
Methods
Implementation
.
components are a collection of reusable and extensible data visualizations specifically designed for genomic data. The library provides multiple data visualizations for both location (visualizing data along the genome genes tracks (
epiviz-genes-track) or line tracks (
epiviz-line-track)) and feature based data (visualizing quantitative measurements like gene expression with scatterplots (
epiviz-scatter-plot) and heatmaps (
epiviz-heatmap-plot)). We use
D3.js
[8] (version: 3.5.17) JavaScript library to render customizable and interactive charts.An
component requires two attributes to render a visualization on the page 1)
data attribute - a JSON (JavaScript Object Notation) representation of genomic data. 2)
dimensions (or columns) from the
data attribute to visualize.
Figure 2 demonstrates the ease of embedding or adding an
to a HTML document or web application.
Figure 2.
Example using components in HTML page.
Epiviz components can be inserted in any HTML page using tags defined by the component library (e.g.,
epiviz-json-scatter-plot in this example). Data is supplied to the chart via the
json-data attribute of the HTML tag. In this example, we show a sample JSON object representing genomic data. In this figure, we are only showing the first 5 data points although the plot renders more visual objects. When used in conjunction with
epiviz-data-source components, data can be queried from a web server or via a WebSocket connection through a corresponding assignment of the
json-data attribute. Adding the epiviz element to the HTML page renders the interactive scatter plot.
Example using components in HTML page.
Epiviz components can be inserted in any HTML page using tags defined by the component library (e.g.,
epiviz-json-scatter-plot in this example). Data is supplied to the chart via the
json-data attribute of the HTML tag. In this example, we show a sample JSON object representing genomic data. In this figure, we are only showing the first 5 data points although the plot renders more visual objects. When used in conjunction with
epiviz-data-source components, data can be queried from a web server or via a WebSocket connection through a corresponding assignment of the
json-data attribute. Adding the epiviz element to the HTML page renders the interactive scatter plot.components are reactive components that render the visualization only after the
json-data attribute is initialized on the element. Any change to the
json-data attribute triggers an event to revisualize the chart. Visualizations are extensible and easily customizable to define various settings and colors. To demonstrate the extensibility of the components, we created a component
epiviz-genes-table extending
epiviz-genes-track and displays a table of all the genes in the current genomic region (
Supplementary File 1).In addition to visualizing data, chart elements can also perform client-side operations on data sets/measurements. For example, if an
epiviz-line-track is visualizing methylation data from multiple samples (tumor and normal), samples can be aggregated using a metric (mean, min, max, etc.) to visualize the difference in methylation between normal and tumor samples. Similarly,
epiviz-heatmap-plot interactively and dynamically clusters data and renders the clustered dendrogram. Settings are available to change clustering type and the distance metrics.Chart components provides performance optimizations for visualizing large amounts of data by precomputing and grouping overlapping data points to a single visual object on the chart. This minimizes the number of overlapping data points to visualize and reduce rendering time of charts.Data modelThe
json-data attribute on an epiviz element is a JSON object that represents genomic data in a columnar format as shown in
Figure 2. The required
keys in the JSON are
chr, start,
end and
data columns to visualize. Developers can also extend the
epiviz-data-behavior element to implement custom data parsers and formats.Linked data selection/brushingChart components implement a linked data selection/highlighting (brushing) feature, to provide a quick overview and visually link the highlighted genomic region across all visualizations and datasets. The linking happens on the client side by finding positions that overlap with the highlighted region. In feature-based visualizations, for example in scatter plots and heatmaps, the visual objects on the chart are aggregated and mapped to multiple data objects across genomic regions. This mapping allows for implementing brushing and propagating events to other charts when using plots. In track-based visualization, events for brushing and selection are propagated based on the region (
chr, start and
end) in the chart.Another essential part of the epiviz design is that data and plots are separated. Users can visualize multiple charts from the same data object without having to replicate the data. This way data queries are made by the data object and not per chart, which leads to a more responsive design of the system.components are simple user interface (UI) elements. They cannot make data requests or can directly interact with other
elements on the page. Chart elements create hover events that propagate up the document object model (DOM) hierarchy. To build interactive web applications or to coordinate interactions by linking data across charts, implement brushing and manage data requests across chart elements, we encapsulate charts inside app components..
components are abstract components that 1) Manage layouts of multiple visualizations, 2) Coordinate interactions across charts by genomic position to implement brushing, and 3) Manage data requests.There are two different types of
elements -elements are not linked to a specific genomic region. If a genomic region (
chr,
start &
end attributes) is not initialized on the element, charts visualize the entire data set genome-wide. This helps identify patterns or interesting regions in the dataset and then investigate specific regions of interest.is a specific instance of
with genomic region linked to the element using the
chr,
start and
end attributes. Navigation elements provide UI functionality to search for a gene/microarray probe (since we serve data from the Gene Expression Barcode project
[9]) or update the location to a specific region of interest.
Figure 3 (bottom) shows a navigation element with various charts when expanded. The top header bar contains functionality to navigate left/right and zoom in/out around the current genomic location. Navigation elements implement the usual genome browser interactions (pan, zoom, location input and gene name search). The chromosome location text box identifies the current location of the navigation element and can be updated to change the genomic region. Hovering over the chromosome location sends a brushing event to highlight this region across other charts encapsulated within the component. Navigation elements can be collapsed (as shown in the top panel) to allow users to flexibly focus on specific genomic regions of interest while providing an overview of other regions of interest. When collapsed, navigation components show an ideogram of the corresponding chromosome with an indication of the specific genomic region encompassed within the components (yellow rectangle). No data requests are made from charts within collapsed navigation components.
Figure 3.
Overview of the Epiviz2 web application for Epigenome Roadmap data.
In this workspace, we explore data from the Epigenome Roadmap project in two genomic regions simultaneously (
Epiviz Navigation components) along with a genome-wide scatterplot of gene expression (top left). The
environment element is not constrained to a specific genomic region, and hence charts included within them visualize entire datasets. In this example, the scatter plot in the top left shows RNA-seq data for esophagus and colon tissues across the entire genome.
EpivizNavigation components, on the other hand, are constrained to specific genomic regions. Given genomic regions or genes of interest in the dataset to further investigate, multiple navigation elements, each corresponding to distinct genomic regions can be added to the workspace. In this example, the navigation element at the bottom of the page visualizes (in order from top to bottom): 1) a genes track showing gene location span and strand, 2) a stacked-blocks track of ChIP-seq peaks in esophagus and colon across two different histone markers (H3K27me3 and H3k9me3), and 3) a line track that visualizes the fold change signal data for the same ChIP-seq data. The line track shows that the region around the gene “ATP6V1C1” shows a peak for H3K27me3 in
but not in
. The stacked blocks track compares the peak regions with other histone markers (H3K9me3). We can also investigate this region further by exploring methylation and gene expression data from these tissues by adding a navigation element (top right). The component library provides and interactive and integrative environment for genomic data exploration. This example workspace can be accessed at
http://epiviz2.cbcb.umd.edu/#/epiviz-C7O4UmIb.
Overview of the Epiviz2 web application for Epigenome Roadmap data.
In this workspace, we explore data from the Epigenome Roadmap project in two genomic regions simultaneously (
Epiviz Navigation components) along with a genome-wide scatterplot of gene expression (top left). The
environment element is not constrained to a specific genomic region, and hence charts included within them visualize entire datasets. In this example, the scatter plot in the top left shows RNA-seq data for esophagus and colon tissues across the entire genome.
EpivizNavigation components, on the other hand, are constrained to specific genomic regions. Given genomic regions or genes of interest in the dataset to further investigate, multiple navigation elements, each corresponding to distinct genomic regions can be added to the workspace. In this example, the navigation element at the bottom of the page visualizes (in order from top to bottom): 1) a genes track showing gene location span and strand, 2) a stacked-blocks track of ChIP-seq peaks in esophagus and colon across two different histone markers (H3K27me3 and H3k9me3), and 3) a line track that visualizes the fold change signal data for the same ChIP-seq data. The line track shows that the region around the gene “ATP6V1C1” shows a peak for H3K27me3 in
but not in
. The stacked blocks track compares the peak regions with other histone markers (H3K9me3). We can also investigate this region further by exploring methylation and gene expression data from these tissues by adding a navigation element (top right). The component library provides and interactive and integrative environment for genomic data exploration. This example workspace can be accessed at
http://epiviz2.cbcb.umd.edu/#/epiviz-C7O4UmIb.App elements coordinate events across charts, i.e., when a chart element is highlighted, an event is propagated to all other charts in the workspace (including those visualizing genome-wide data). App elements also manage layouts for positioning and resizing chart elements. The default grid layout splits the available width into six equal columns. When charts are added to a workspace, track-based charts extend across all the six columns but plot-based chart elements only span across two columns. App components have the functionality to navigate the genome and add new visualizations. Adding a new visualization opens a measurement browser, a UI interface that allows filtering and selection of measurements across different data sets.App components can also detect if the application or page has an active web server or WebSocket connection initialized using the datasource components. If the page has no active datasource component, interactive features that generate data requests (for example – navigating to a new genomic region or adding new charts) are disabled..
component provides functionality for the epiviz app components to interact with an active web server or a WebSocket connection. Datasource components require the API endpoint (
provider-url) attribute where the web server or WebSocket is located and the
provider-type attribute that specifies if it’s a web server or a WebSocket connection. When the user interacts with epiviz components, for example, adding a new visualization or navigating to a new genomic region, these interactions generate data requests that are eventually propagated and managed by the datasource elements.WebServer data providerWe developed a
Python Flask (version 0.12.4) based data provider that queries genomic data stored in MySQL database and responds to data requests. The data provider enables summarization where we bin small regions together and average the value for the measurements. We see a significant improvement in draw times of charts by summarizing data as discussed in the Benchmarks section of this paper. We also implemented data import functions for commonly used Bioconductor datatypes like
GenomicRanges,
SummarizedExperiment, etc., in our R/Bioconductor
.WebSocket data providerThe JavaScript data types that manage genomic data in epiviz components are designed similarly to Bioconductor data types. This enables easy integration and visualization of Bioconductor data objects using the visualization components. The R/Bioconductor
package is an API to interactively visualize Bioconductor data objects. We discuss more about
package in the Use case section of the paper..
component is built upon the
Google Firebase infrastructure to manage user authentication, create shareable and reproducible visual analysis workspaces. Workspace components are easily reconfigurable and allow developers to customize this component to their firebase instance.
Operation
Web components are a set of standardized browser APIs still being implemented across various browsers. Web components implement the Shadow DOM feature, wherein the element defined by the component is rendered separately from the rest of the HTML document avoiding namespace collisions and is isolated to keep element styling and access private to the element. Web Components are natively supported in Chrome and Safari and are still in development in Mozilla Firefox and Microsoft Edge browser.Epiviz Components are developed using the
Google Polymer library. For browsers without native web component support, the Google Polymer library provides polyfill that helps developers use components seamlessly with little performance overhead. It uses a dynamic loader to lazy load polyfill libraries for missing implementations. Documentation on attributes and methods in epiviz components is available from
GitHub. The component library can visualize data by adding the chart tag to a HTML page with the data attribute as shown in
Figure 2.The epivizrChart package requires R version 3.4.0 or higher and packages from Bioconductor version 3.6 or higher. The memory requirements for using the epivizrChart package depends on the size of the dataset. However, for most use cases, a standard laptop will handle most applications visualizing data using the component library and the epivizrChart package. To visualize a Bioconductor data object, supply the supported object to the
epivizChart() function.
Use cases
Epiviz2 web application
is an interactive and integrative genome browser that sends requests to a Python Flask data provider and a MySQL database.
allows users to interactively explore and simultaneously visualize datasets across multiple genomic regions, a feature not available in most current genome browsers. The real advantage of the genome browser lies in the ability to visualize data from multiple regions of the genome or the entire dataset to identify genomic regions of interesting patterns or outliers. Users can then further explore and visualize annotations or measurements from other datasets in these regions to gain insights.
Figure 3 illustrates this workflow of exploratory data analysis. The gene expression scatter plot is encapsulated inside the environment element and visualizes the entire dataset, whereas the navigation elements are linked to a specific genomic region. We also implemented a color by region for genome-wide scatter-plots, where visual objects in the scatter plot will be colored with a different color specific to each of the genomic regions shown in navigation elements. Our instance of the
application is hosted
here.The
instance we host at the University of Maryland contains data from the NIH Roadmap Epigenomics
[10] project. The NIH Roadmap Epigenomics Mapping Consortium leverages next-generation sequencing technologies to map DNA methylation, histone modifications, chromatin accessibility and small RNA transcripts in tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease. Our instance of the roadmap database contains DNA methylation, RNA seq, and histone modification (for markers: h3k9ac, h3k9me3, h3k27ac, h3k27me3) fold change and peak data for seven different tissue types – Breast Myoepithelial cells, Brain Hippocampus Middle, Lung, Liver, Sigmoid Colon, Pancreas and Esophagus. The corresponding data files are downloaded from Bioconductor’s
repository and imported into the MySQL database using the functions available in the
package.
epivizrChart R/Bioconductor package
The Bioconductor open source software community creates bioinformatics workflows and pipelines to analyze and visualize genomic data sets. To support interactive visualization of Bioconductor data objects, we developed an R/Bioconductor package
, an API package to programmatically create and visualize genomic datasets using epiviz components without having to import data into a MySQL database.
demonstrates the ease of integration with existing frameworks and can create interactive web pages or RMarkdown documents as shown in
Figure 4. Integrating with a statistical and powerful state-of-the-art bioinformatics data analysis platform allows users to quickly explore, analyze and visualize genomic datasets with various packages available through Bioconductor.
Figure 4.
Interactive visualization of R/Bioconductor data objects using the
epivizrChart package.
This figure is part of an RMarkdown document and demonstrates the ease of integrating the visualizations from the Epiviz component library with existing frameworks. The
epivizChart function infers the chart type based on the data object parameter.
“Homo.sapiens” from the top panel is a UCSC Gene annotation object for human hg19 reference genome and is visualized by
epivizChart as a genes track. t
cga_colon_curves is a sample dataset from The Cancer Genome Atlas for colon tissue. This is a
GRanges object and is visualized as a Line Track. The
epivizrChart package can also programmatically create navigation elements. This enables interactions and brushing across the charts as shown in the bottom panel around the gene “PHLDB1”. A vignette describing more examples and use cases is available in the package either on the GitHub repository or through Bioconductor (
http://bioconductor.org/packages/release/bioc/html/epivizrChart.html).
Interactive visualization of R/Bioconductor data objects using the
epivizrChart package.
This figure is part of an RMarkdown document and demonstrates the ease of integrating the visualizations from the Epiviz component library with existing frameworks. The
epivizChart function infers the chart type based on the data object parameter.
“Homo.sapiens” from the top panel is a UCSC Gene annotation object for human hg19 reference genome and is visualized by
epivizChart as a genes track. t
cga_colon_curves is a sample dataset from The Cancer Genome Atlas for colon tissue. This is a
GRanges object and is visualized as a Line Track. The
epivizrChart package can also programmatically create navigation elements. This enables interactions and brushing across the charts as shown in the bottom panel around the gene “PHLDB1”. A vignette describing more examples and use cases is available in the package either on the GitHub repository or through Bioconductor (
http://bioconductor.org/packages/release/bioc/html/epivizrChart.html).. Using the
package in an
online mode creates an active WebSocket server and allows interactions between the components and the R-session. In
online mode, components make data requests using the WebSocket connection. In
offline mode, data is attached to the components and a standalone HTML page is generated. This allows researchers to create interactive, shareable and reproducible visualization documents.. Shiny is a web application framework to create standalone web applications on a webpage or in an RMarkdown document. Since Shiny supports HTML, epiviz components can be embedded or integrated in Shiny applications or dashboards to interactively visualize genomic data. The vignette
IntegrationWithShiny.Rmd in the
epivizrChart package demonstrates 1) a simple application that integrates Shiny to visualize R/Bioconductor data objects using epiviz components 2) interactions with non-epiviz components in Shiny as shown in
Figure 5.
Figure 5.
Interactive visualization of R/Bioconductor data objects in Shiny.
In this Shiny application, we explore gene expression from the Gene Expression Barcode Project
[11] for colon, lung and breast tissues for tumor and normal samples as a heatmap. We visualize annotation tracks for the genes and position of CpG islands in the current region. We also integrated IGV with epiviz components and the igv track (bottom right) displays the gene position and the aligned illumina reads for HG01879 sample from 1000 genomes
[12] project. The IGV track queries the file directly to get data and visualize the reads. We also have a genomic location text box (top left) that is a non-epiviz component and can be used to interact with epiviz components within the Shiny application. Changing the location, updates the genomic region in the navigation element and all charts.
Interactive visualization of R/Bioconductor data objects in Shiny.
In this Shiny application, we explore gene expression from the Gene Expression Barcode Project
[11] for colon, lung and breast tissues for tumor and normal samples as a heatmap. We visualize annotation tracks for the genes and position of CpG islands in the current region. We also integrated IGV with epiviz components and the igv track (bottom right) displays the gene position and the aligned illumina reads for HG01879 sample from 1000 genomes
[12] project. The IGV track queries the file directly to get data and visualize the reads. We also have a genomic location text box (top left) that is a non-epiviz component and can be used to interact with epiviz components within the Shiny application. Changing the location, updates the genomic region in the navigation element and all charts.
Benchmarks
We use the google chrome headless
tool to measure request times and chart draw times to compare our web component implementation of the
application (with Python-MySQL backend) to the current
[4] application (with PHP-MySQL backend). We compare the times by varying the genomic region on the scatter plot component (
) across two different backend implementations: 1) Summarized responses (current implementation), where we bin the genomic region into 2000 intervals and average the data values for the measurement within each interval and, 2) Unsummarized responses (previous epiviz implementation), where the entire dataset for the region is sent back to the UI. When visualizing large genomic regions, data points tend to overlap on scatter plots and other visualizations because of pixel and chart size limitations on the page. Summarizing reduces the draw times in rendering charts because of fewer overlapping points as shown in
Figure 6. However, the response times for data requests have not changed significantly because the computation time for summarization is usually similar to the time taken to transfer the entire dataset in the unscaled implementation. The scripts for the benchmarks are available in the
GitHub repository. The benchmark scripts can also save a screenshot of the page rendered to make sure that the page is completely loaded and rendered.
Figure 6.
Effect of data summarization on the Epiviz Python data.
Here we compare average data request and data rendering time for continuous data along the genome to study the effect of summarizing data on the data backend across 10 runs. Lines for ‘unsummarized (u)’ correspond to the previous Epiviz implementation where all data within a genomic region is returned by the php-backend to the web browser client. Lines for ‘summarized (s)’ correspond to our new implementation of the python-data backend, where data summarization within genomic regions is performed in the backend. The left panel shows the mean draw times between these scenarios where we see a significant improvement in the draw times when the data is summarized in the backend. The bar plot in the right panel shows the total http time and is separated to show mean latency times and data transfer times. The number of bytes transferred for summarized and unsummarized backends is also displayed. The error bars represent one standard deviation away from the mean draw time in the left panel and mean http time in the right panel. We observe that the total http request time (summarization plus data transfer) is comparable to transfer time for the larger unsummarized data scenarios.
Effect of data summarization on the Epiviz Python data.
Here we compare average data request and data rendering time for continuous data along the genome to study the effect of summarizing data on the data backend across 10 runs. Lines for ‘unsummarized (u)’ correspond to the previous Epiviz implementation where all data within a genomic region is returned by the php-backend to the web browser client. Lines for ‘summarized (s)’ correspond to our new implementation of the python-data backend, where data summarization within genomic regions is performed in the backend. The left panel shows the mean draw times between these scenarios where we see a significant improvement in the draw times when the data is summarized in the backend. The bar plot in the right panel shows the total http time and is separated to show mean latency times and data transfer times. The number of bytes transferred for summarized and unsummarized backends is also displayed. The error bars represent one standard deviation away from the mean draw time in the left panel and mean http time in the right panel. We observe that the total http request time (summarization plus data transfer) is comparable to transfer time for the larger unsummarized data scenarios.
Discussion
The component library is an extension to our
web application for visualizing functional genomic data sets. The component library is our solution to creating reusable and extensible visualization elements that work with any modern web browsers. The value of a data visualization library depends on its usability and easy integration with existing web frameworks. Epiviz components can be integrated with any framework that supports HTML.The Web has now become the platform for application development and the demand for modular, extensible and reusable frameworks like web components is on the rise. Since epiviz components are modular, we believe it simplifies the process of developing and managing genomic web applications. We also welcome developers to contribute to and extend our component library.
Conclusion
To our knowledge, the Epiviz component library is the first genomic data visualization library based on web components. The library provides an easy and efficient way for bioinformatics developers to add interactive data visualization features to their web applications or datasets with minimal programming experience. It is cross-platform, modular and runs on any modern web browser. We introduced our
web application to demonstrate the features and interactions that can be developed using the component library. We also showed the ease of integration with other frameworks by the R/Bioconductor
package, that provides interactive, reproducible visualizations of data objects in R and also create interactive standalone HTML documents.
Future work
One of the advantages of web components is that HTML is now more readable. With a more declarative implementation, elements can be self-descriptive. We would like to implement a visualization grammar
[13] similar to
ggvis as attributes/properties on the epiviz elements. We plan to further develop the library to extend our current set of visualizations and support various genomic data types including those implemented in
Metaviz
[14] an interactive and statistical metagenomic data browser. We plan to implement canvas-based rendering of charts to scale and significantly reduce draw times especially when rendering large datasets.
Data availability
The Datasets used for the use case describing the Epiviz Application come from the NIH Roadmap Epigenomics Project. The data files are downloaded from Bioconductor’s
repository and imported into the MySQL database using the functions available in the
package. For the epivizrChart package, the datasets used are included as part of the package. The vignettes describing the use cases are also available on
GitHub or through
Bioconductor.
Software availability
Epiviz component library is open sourced and is available on GitHub. The collection of components discussed in this article are available at:epiviz charts -
http://github.com/epiviz/epiviz-chartepiviz data -
http://github.com/epiviz/epiviz-data-sourceepiviz workspace -
http://github.com/epiviz/epiviz-workspaceepiviz app -
http://github.com/epiviz/epiviz-appThe scripts for benchmarks are available in the
repository. R/Bioconductor
package is available either through Bioconductor (
http://bioconductor.org/packages/release/bioc/html/epivizrChart.html) or GitHub (
http://github.com/epiviz/epivizrChart). Both the respositories also contain the vignettes described in
Figure 4 and
Figure 5. The Python Flask API data provider is available at
http://github.com/epiviz/epiviz-data-provider. Documentation is available at
http://epiviz.github.io.Archived source code at the time of publication –
https://doi.org/10.5281/zenodo.1299990
[15]Software license: MIT License.In this paper, Kancherla and colleagues present the Epiviz Component Library, an open source reusable and extensible data visualization library and application framework for functional genomic data. According to the authors, the Epiviz component library is the first genomic data visualization library based on web components. The library provides an easy and efficient way for bioinformatics developers to add interactive data visualization features to their web applications or datasets with minimal programming experience.Overall, I found the tools describe in the paper very useful and promising. It makes complex visualization of multiple types of omics data easy and convenient. The new infrastructure is built upon the established epiviz tool that the group has developed earlier. This group is highly experienced in genomics data visualization and are developing state-of-art infrastructure using the latest technologies. The utilities of the tools great outweigh their limitations and shortcomings. I only have a few minor points.Figure 3. In this use case, I don’t quite understand the connection between the gene expression scatter plot and the other panel. For me, the figure is more like the demonstration of all types of plots it can produce. It is not obvious what biological insight can be gleaned from the plots. It is unclear what kind of biological question the user is trying to ask when making these plots. To be fair, this is always difficult. In most cases, discovery is made unintentionally, out of luck. Hence just demonstrating all the plotting capabilities is probably okay. If so, may be a systematic catalog of all the plots that can be produced will be helpful.Figure 6 is very informative and interesting. I noticed that the numbers of bytes transferred for summarized and unsummarized backends are exactly the same for 10K, 100K and 1M, but dramatically different for 10M, 100M and chr. Why is that?It would be very helpful to the general audience if the authors can articulate in more details, and intuitively, the benefits and advantages of web components-based visualization library for genomics data, compared to the current technologies.I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.This paper describes a library for integrative genomic data exploration, using modern web technologies, particularly HTML Web Components, with solid R integration. The interactive figures in the paper are very interesting, and show how the library can be used to build basic dashboards or RMarkdown pages with interactive charts and genome views. The dashboards look a little spartan, and a bit cryptic at times, but are a great illustration of the potential for this kind of thing and could develop into something really powerful. Plus, with the Bioconductor integration, they offer a lot of ready-to-go useful data for use cases around human genomics. All round a good contribution, although I think marred (very slightly) by some opening text in the paper which presents the software as being very general-purpose and detached from specific use cases - a style of presentation which I view as a mistake that risks detracting from a reader’s clear understanding of what the software can actually do. Modularity is a virtue, but developers often overestimate readers' level of interest in it.For example, according to the authors, previous genome browsers have all been based on “specific use cases”, whereas Epiviz Web Components are more amenable to “integration and extensibility”. On closer reading, it turns out that the primary use case (the NIH Roadmap Epigenomics project) is just a bit further down in the paper, as is the platform (R/Bioconductor) currently required to load any useful data or the documentation vignettes. So I think the generality of the tool is possibly a bit hyperbolically described, in a way that risks obscuring the actual current uses (and software dependencies) of the tool.The authors claim that, in Epiviz Web Components, they have developed the first genome browser that uses Web Components, which form a collection of browser features and APIs that have emerged from JavaScript libraries like React. Due to the fast-changing nature of the web, technologies such as this emerge with rapidity and regularity, and I find it plausible that this is the first genome browser to use these tools. They should be useful and it’s interesting to see them described in practice. The Epiviz Web Components tool is a useful addition to the bioinformatics visualization ecosystem, bridging R and the modern web.With that said, I think perhaps that web bioinformatics software - and its associated publications - should be measured on several axes. Compliance with emerging web standards (such as Web Components) is certainly one such axis, but compliance with established bioinformatics standards would be another axis (how many formats/database schemas does the software support? how many other resources does it integrate with? what choices were made about which data sources to support, and which to omit? are those choices explained in the text? is there a guiding philosophy that can help inform readers who might be considering using this software?)In terms of bioinformatics compatibility, bioconductor’s AnnotationHub seems intended to be the primary way of loading data into the browser, though it isn’t quite presented this way. Bioconductor is mentioned in the abstract as an example that was developed “to demonstrate the ease of integration with other frameworks”; I think it would be more accurate to say that Bioconductor is the only framework that this tool supports, and while the tool was written in such a way as to be hypothetically platform-independent, that hypothesis has not yet been seriously tested.Epiviz Web Components does have its own JSON formats for data, which is promising in terms of backing up the claim that data import would be straightforward, but as far as I can tell there are no tools to import common file formats (FASTA, GFF, etc) and it’s not clear whether there are any plans for developing such import tools, or if the Bioconductor dependency will in practice be permanent.The paper’s interactive figures (Figures 3 & 4) are intriguing. I unfortunately had some problems installing the R from source (several dependencies of epivizrChart failed to build on my Mac Pro running macOS Sierra 10.12.6, R version 3.3.3), so I have not tried them on my local machine, only the JavaScript running in the web browser client. I found the captions to the Figures hard to follow: for example, I found myself wondering if there was a way for the user in the web browser to figure out how the components in Figure 3 (the genes track, stacked-blocks track, and line track) are linked, to use the same underlying data sets? In other words, does that linking only happen via configuration files and other back-end interfaces, or is it exposed to the end user? How were the markers H3K27me3 and H3k9me3 selected, why esophagus & colon, why drill into methylation and expression, what’s the biomedical back-story here? And just in terms of how to use the app to explore these cases as described, there could be more exposition. There are quite a lot of interactive buttons and menus and nested windows, so I was looking around for explanations of all that, passing by the documentation at https://epiviz.github.io/epiviz-chart/ (which is apparently oriented toward developers, not end-users, and is pretty minimal) and ending up at the https://epiviz.github.io/ video tutorial (“Epiviz quick tour”). I couldn’t find any thorough written description of the Epiviz web app, only that 3-minute YouTube video. I’m not very confident in my understanding of the totality of its capabilities after watching the video and looking at Figure 3. The potential to build pages out of RMarkdown, as shown in Figure 4, is pretty exciting.In summary: I would suggest improving the article by adding more extensive descriptions of the biological use cases, and how they might be investigated using the tool. Make another video if you insist, but what I’d really be looking for is clearly-written tutorial text with figures. This will complement and enhance the current concise description of the software, and offer an alternative approach for a broader class of readers.I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.Kancherla et al describe a set of re-usable web-components for the Epiviz tool, which are designed for visualizing genomic/epigenetic data in an integrative and interactive way. The components include the visualization components, app components and datasource components, are designed in a modular way and can be integrated with JavaScript libraries and other HTML frameworks. Moreover, there is an R-version available, which can be used with R / Shiny and to produce stand-alone HTML documents.The general concept and design of the Epiviz web components is very good. With growing amounts of -omics data, we need easy-to-use, re-usable and extensible code for interactive data visualization, since visualizing large-scale data enhances our understanding of inherent trends and features of the datasets under analysis. We therefore have a growing need for easy-to-use and re-usable software for their visualization; re-useable and extensible tools like Epiviz are highly sought-after.The manuscript is in general well written and the software has a nice look and feel. Though, there are a couple of points that should be addressed prior to indexing.While I find the description of the tools quite good and explicit, the description of the used biological examples, as well as the plots, is in my view too sparse. In fact, visualization only makes sense to the reader, if he/she understands, what is plotted and how to interpret the plots. With the manuscript at hand, as well as the user manual, this is not made easy. For instance, there is a general lack of axis labels on the scatter plots. E.g. Figure 2 and 3 show scatter plots, however from different data sources (peak positions/height or differential detection(?) on the genome, RNA-seq data (?)). What are the numbers on Figure 3 (ie the scatter plot of the RNA-seq data)? Are these rpkms? Log2 fold change? As the plots are quite useful and might be used directly for publication, this is a feature that should be implemented.Also, in the same Figure/example (methylation workspace following the given link of the legend to Figure 3), how does the RNA-seq data relate to the chip-seq/methylation data? Is there a possibility to see the identity of the genes, when hovering over the dots in the scatter plot? At least for the pre-selected example shown on the web-page, there is no brushing over from the zoomed-chromosome plot to the scatter plot and vice versa. However, this feature seems to work, when changing the chromosomal region. Is this due to the fact that the genes in the selected region are not present in the RNA-seq data? If so, the authors should think about changing the selected region for their demonstration.Also, what is the relationship of genes in the scatter plot that are highlighted together, when one is selected? Do they share the same enhancer/methylation peaks?It might be useful to provide information on the genes selected in the RNA-seq plot also via an info box or pop-up, which shows the name of the gene(s) and its(their) associated differential expression values.Is there the possibility to e.g. change the region of the chromosome-zoom/the chromosome view, when selecting dots in the scatter plot of the RNA-seq data? This seems currently not possible, however would be desirable.In the same example, the brushing of the chromosome view vs the zoomed-in chart does not work. When changing the genomic region, there is no highlighting any more in the chromosome view, so only the pre-selected region seems to work. This needs to be fixed.Finally, in the same plot, changing the chromosome for zoomed-in visualization has no effect on the whole-chromosome view, which I assume should also be updated. In my opinion, the whole-chromosome view is in its present form not really useful and could be omitted.In the same workspace (methylation workspace), there is a scatter plot at the bottom of the page; what do the numbers in the scatter plot refer to? See data labels problem already discussed above.On the binning of data values, the authors describe that the genomic region is binned into 2000 intervals. To which overall length does this refer to? Is always a fixed length chosen or can the user determine the length? Binning e.g. 100000 bp would then give quite different results than e.g. 1000000 bp.There are also some errors in the manuscript, more specifically in the description of Figure 3. It currently states:“In this example, the navigation element at the bottom of the page visualizes (in order from top to bottom): 1) a genes track showing … “Instead, it should read, I assume, as the genes track is at the bottom of the figure:“In this example, the navigation element at the bottom of the page visualizes (in order from
bottom to top): 1) a genes track showing … “I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Authors: W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler Journal: Genome Res Date: 2002-06 Impact factor: 9.043
Authors: Wolfgang Huber; Vincent J Carey; Robert Gentleman; Simon Anders; Marc Carlson; Benilton S Carvalho; Hector Corrada Bravo; Sean Davis; Laurent Gatto; Thomas Girke; Raphael Gottardo; Florian Hahne; Kasper D Hansen; Rafael A Irizarry; Michael Lawrence; Michael I Love; James MacDonald; Valerie Obenchain; Andrzej K Oleś; Hervé Pagès; Alejandro Reyes; Paul Shannon; Gordon K Smyth; Dan Tenenbaum; Levi Waldron; Martin Morgan Journal: Nat Methods Date: 2015-02 Impact factor: 28.547
Authors: Peter H Sudmant; Tobias Rausch; Eugene J Gardner; Robert E Handsaker; Alexej Abyzov; John Huddleston; Yan Zhang; Kai Ye; Goo Jun; Markus Hsi-Yang Fritz; Miriam K Konkel; Ankit Malhotra; Adrian M Stütz; Xinghua Shi; Francesco Paolo Casale; Jieming Chen; Fereydoun Hormozdiari; Gargi Dayama; Ken Chen; Maika Malig; Mark J P Chaisson; Klaudia Walter; Sascha Meiers; Seva Kashin; Erik Garrison; Adam Auton; Hugo Y K Lam; Xinmeng Jasmine Mu; Can Alkan; Danny Antaki; Taejeong Bae; Eliza Cerveira; Peter Chines; Zechen Chong; Laura Clarke; Elif Dal; Li Ding; Sarah Emery; Xian Fan; Madhusudan Gujral; Fatma Kahveci; Jeffrey M Kidd; Yu Kong; Eric-Wubbo Lameijer; Shane McCarthy; Paul Flicek; Richard A Gibbs; Gabor Marth; Christopher E Mason; Androniki Menelaou; Donna M Muzny; Bradley J Nelson; Amina Noor; Nicholas F Parrish; Matthew Pendleton; Andrew Quitadamo; Benjamin Raeder; Eric E Schadt; Mallory Romanovitch; Andreas Schlattl; Robert Sebra; Andrey A Shabalin; Andreas Untergasser; Jerilyn A Walker; Min Wang; Fuli Yu; Chengsheng Zhang; Jing Zhang; Xiangqun Zheng-Bradley; Wanding Zhou; Thomas Zichner; Jonathan Sebat; Mark A Batzer; Steven A McCarroll; Ryan E Mills; Mark B Gerstein; Ali Bashir; Oliver Stegle; Scott E Devine; Charles Lee; Evan E Eichler; Jan O Korbel Journal: Nature Date: 2015-10-01 Impact factor: 49.962
Authors: Anshul Kundaje; Wouter Meuleman; Jason Ernst; Misha Bilenky; Angela Yen; Alireza Heravi-Moussavi; Pouya Kheradpour; Zhizhuo Zhang; Jianrong Wang; Michael J Ziller; Viren Amin; John W Whitaker; Matthew D Schultz; Lucas D Ward; Abhishek Sarkar; Gerald Quon; Richard S Sandstrom; Matthew L Eaton; Yi-Chieh Wu; Andreas R Pfenning; Xinchen Wang; Melina Claussnitzer; Yaping Liu; Cristian Coarfa; R Alan Harris; Noam Shoresh; Charles B Epstein; Elizabeta Gjoneska; Danny Leung; Wei Xie; R David Hawkins; Ryan Lister; Chibo Hong; Philippe Gascard; Andrew J Mungall; Richard Moore; Eric Chuah; Angela Tam; Theresa K Canfield; R Scott Hansen; Rajinder Kaul; Peter J Sabo; Mukul S Bansal; Annaick Carles; Jesse R Dixon; Kai-How Farh; Soheil Feizi; Rosa Karlic; Ah-Ram Kim; Ashwinikumar Kulkarni; Daofeng Li; Rebecca Lowdon; GiNell Elliott; Tim R Mercer; Shane J Neph; Vitor Onuchic; Paz Polak; Nisha Rajagopal; Pradipta Ray; Richard C Sallari; Kyle T Siebenthall; Nicholas A Sinnott-Armstrong; Michael Stevens; Robert E Thurman; Jie Wu; Bo Zhang; Xin Zhou; Arthur E Beaudet; Laurie A Boyer; Philip L De Jager; Peggy J Farnham; Susan J Fisher; David Haussler; Steven J M Jones; Wei Li; Marco A Marra; Michael T McManus; Shamil Sunyaev; James A Thomson; Thea D Tlsty; Li-Huei Tsai; Wei Wang; Robert A Waterland; Michael Q Zhang; Lisa H Chadwick; Bradley E Bernstein; Joseph F Costello; Joseph R Ecker; Martin Hirst; Alexander Meissner; Aleksandar Milosavljevic; Bing Ren; John A Stamatoyannopoulos; Ting Wang; Manolis Kellis Journal: Nature Date: 2015-02-19 Impact factor: 69.504
Authors: Matthew N McCall; Harris A Jaffee; Susan J Zelisko; Neeraj Sinha; Guido Hooiveld; Rafael A Irizarry; Michael J Zilliox Journal: Nucleic Acids Res Date: 2013-11-22 Impact factor: 16.971