DEIVA: a web application for interactive visual analysis of differential gene expression profiles.

Jayson Harshbarger¹, Anton Kratz¹, Piero Carninci².

Abstract

BACKGROUND: Differential gene expression (DGE) analysis is a technique to identify statistically significant differences in RNA abundance for genes or arbitrary features between different biological states. The result of a DGE test is typically further analyzed using statistical software, spreadsheets or custom ad hoc algorithms. We identified a need for a web-based system, with a very low barrier to entry, for sharing DGE statistical test results and for locating and identifying genes within them.
RESULTS: We have developed DEIVA, a free and open source, browser-based single page application (SPA) with a strong emphasis on being user friendly that enables locating and identifying single or multiple genes in an immediate, interactive, and intuitive manner. By design, DEIVA scales with very large numbers of users and datasets.
CONCLUSIONS: Compared to existing software, DEIVA offers a unique combination of design decisions that enable inspection and analysis of DGE statistical test results with an emphasis on ease of use.

Keywords: Differential gene expression; Interactive visual analysis; RNA-seq; Visualization; Web application

Year: 2017 PMID: 28061742 PMCID: PMC5219783 DOI: 10.1186/s12864-016-3396-5

Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969

Background

RNA-seq [1] and other forms of gene expression profiling such as CAGE [2] are widely used to measure RNA abundance profiles of primary cells and cell lines [3]. By comparing transcript abundance between two states, genes with statistically significant differences in expression levels can be identified [4]. Large-scale, landscape-type analysis of such differentially expressed genes often leads to long lists of Gene Ontology [5] terms. In addition, it is often desirable to perform an interactive visual analysis of the results that focuses on comparatively few genes of interest, the choice of which depends heavily on the problem domain. While domain experts could perform such an analysis using spreadsheet software, scripting languages or statistical software such as R [6] and GGobi [7], such an approach often requires implementing custom algorithms. Other systems are embedded within large frameworks [8] that the user must learn first, do not allow the user to upload custom data, or are closed source [9].

Experienced bioinformaticians are familiar with existing gene expression profiling tools and, in a fast-paced research environment, may perform this analysis often, quickly and routinely with them. However, sharing the results of DGE analysis as flat files or static images with collaborators, including biologists and other researchers who may not be familiar with DGE analysis tools, has limited usability. Against this background, we saw a need for software that enables interactive visual analysis of DGE with a strong emphasis on ease of use and ease of deployment, and that meets user expectations of a modern web application. To address this need, we have developed DEIVA (Differential Expression Interactive Visual Analysis), an SPA to interactively identify and locate genes in a hexagonal binning (hexbin) density plot or scatter plot of DGE statistical test results, typically from a DESeq2 [10] or edgeR [11] analysis.
In addition to identifying and locating genes, DEIVA allows visitors to download associated data and generated vector images. By giving domain experts (biologists) a means to quickly perform lookups on a differential gene expression test, DEIVA can be of use to bioinformaticians who want to share their results and at the same time make them accessible. DEIVA can easily be deployed by cloning a Git repository, adding custom datasets, and then serving the SPA through any web server. Users can also try out the system, including import and visualization of their own datasets, through a live instance of DEIVA [12] that contains DGE statistical test results from Kratz et al. 2014 [13]. Standalone desktop applications for various platforms are also available with each release.

Implementation

Interface

Figure 1 shows a view of the DEIVA interface. The user may select a pre-loaded DGE statistical test result from the dataset dropdown (Fig. 1a) or drag and drop their own dataset into the visualization area. A density plot of log2 fold change vs. average expression is shown (Fig. 1b). Below the visualization, a table of all expression data is displayed (Fig. 1c). Highlighting a region in the visualization limits the features shown in the table to those within that region. Zooming allows easier interaction in crowded regions of the plot.

Fig. 1

DEIVA interface. (a) Data set selector, symbol locator, and highlight filters. (b) The density plot on a field of log2 FC vs log10 baseMean for a DGE statistical test result. Symbols selected in the symbol locator (shown in (a)) are shown as points with matching colors. This example compares samples highly enriched for RNA attached to ribosomes of Purkinje neurons (positive fold change) with samples of unspecific RNA from the same brain region (negative fold change). Locating a set of already known markers for Purkinje neurons immediately confirms that the markers are specifically enriched. Hexagonal bins are colored red based on the fraction of features within that region that pass the cut-off filters, currently set at a log10 FDR ≤ −1, at any fold change. (c) Sortable table of expression values for the region selected in the density plot (shown in (b)). Twelve highly overrepresented genes are selected (grey rectangle) in the plot and their information is reflected in this table.

A user can locate and highlight single or multiple symbols of interest by typing them into the locate symbol box, selecting them from suggested matches, or pasting lists of symbols. Such symbols of interest could include genes with an expected fold change behavior or marker genes corresponding to the compared states. In this way the user can see at a glance whether an experiment confirms expectations or needs to be examined in more detail. To see the effect of more relaxed or more stringent criteria for calling a feature differentially expressed, the user can adjust the absolute log2 fold change, False Discovery Rate (FDR) and log10 baseMean cutoff filters using sliders. Features passing these filters are indicated in red on the plot, and the number of up- and down-regulated features is displayed below the filters. At any time, the user can download the raw data or the current visualization as a publication-quality vector graphic in SVG format.
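The cutoff filters described above can be illustrated with a short sketch. This is not DEIVA's own code: the `Feature` record, the function names and the default thresholds are assumptions made for illustration, with the FDR default chosen to mirror the log10 FDR ≤ −1 setting mentioned in the caption of Fig. 1.

```python
import math
from dataclasses import dataclass


@dataclass
class Feature:
    # Hypothetical record for one row of a DGE statistical test result.
    name: str
    base_mean: float  # average expression
    log2_fc: float    # log2 fold change
    fdr: float        # false discovery rate (adjusted p-value)


def safe_log10(value):
    """log10 that maps zero to -infinity instead of raising an error."""
    return math.log10(value) if value > 0 else -math.inf


def passes_filters(f, min_abs_log2_fc=0.0, max_log10_fdr=-1.0,
                   min_log10_base_mean=-math.inf):
    """Return True if a feature passes all three cutoff filters."""
    # Reject missing (NaN) or negative values before taking logarithms.
    if not (f.fdr >= 0 and f.base_mean >= 0):
        return False
    return (abs(f.log2_fc) >= min_abs_log2_fc
            and safe_log10(f.fdr) <= max_log10_fdr
            and safe_log10(f.base_mean) >= min_log10_base_mean)


def count_regulated(features, **cutoffs):
    """Count (up, down) regulated features among those passing the filters."""
    hits = [f for f in features if passes_filters(f, **cutoffs)]
    up = sum(1 for f in hits if f.log2_fc > 0)
    down = sum(1 for f in hits if f.log2_fc < 0)
    return up, down
```

Tightening a cutoff, for example passing `min_abs_log2_fc=1.0`, can only keep or reduce the two counts, which corresponds to moving a slider toward more stringent criteria.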

Input file formats and deployment

DEIVA accepts input files in tab- or comma-separated ASCII format describing the result of a DGE statistical test. Any algorithm can be used to generate an input file as long as it can export average abundance, log2 fold change, and unique feature names. An optional column “symbol” makes it possible to specify gene symbols independently of the features in which gene expression has been measured (transcription start sites, probes). This accommodates scenarios where one gene may be associated with more than one feature during the DGE test. We anticipate that DEIVA will mostly be used with input generated by DESeq2 [10] and edgeR [11], and DEIVA accepts input files that can be written directly from these R packages. Detailed instructions on preparing files for input are part of the DEIVA documentation.

DEIVA is an open source SPA, not a centralized server application. It is therefore easy to deploy multiple instances, each with datasets ready to use directly or to share with collaborators. To deploy a custom instance of DEIVA, a developer may clone the source, add the desired DGE statistical test results, and make the SPA accessible through any web server.

DEIVA was developed using Project χ, a modular open-source toolkit for building web and cross-platform desktop data visualization applications. Project χ utilizes the AngularJS JavaScript framework, the D3.js visualization library [14], and various node.js development tools. The resulting application is compatible with all modern web browsers (we tested with Chrome 51, Firefox 47, MS Edge, and Safari 9) and has no specific browser or server dependencies.
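The input requirements above can be made concrete with a small reader for such a file. This is a minimal sketch, not DEIVA's parser: the column headers `feature`, `baseMean` and `log2FoldChange` are assumed names chosen for illustration (the DEIVA documentation defines the actual expected headers), and only the optional `symbol` column name is taken from the text.

```python
import csv
import io

# Assumed header names for the three required columns (hypothetical).
REQUIRED = ("feature", "baseMean", "log2FoldChange")


def parse_dge_table(text):
    """Parse tab- or comma-separated DGE results into a list of dicts."""
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    # Treat the file as tab-separated if the header contains a tab.
    delimiter = "\t" if "\t" in first_line else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)

    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing required column(s): " + ", ".join(missing))

    rows, seen = [], set()
    for record in reader:
        name = record["feature"]
        if name in seen:
            raise ValueError("feature names must be unique: " + name)
        seen.add(name)
        rows.append({
            "feature": name,
            # Fall back to the feature name when no symbol is given.
            "symbol": record.get("symbol") or name,
            "baseMean": float(record["baseMean"]),
            "log2FoldChange": float(record["log2FoldChange"]),
        })
    return rows
```

Keeping `symbol` separate from `feature` is what allows several features, such as two transcription start sites of one gene, to share a gene symbol while remaining distinct rows.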

Results and Discussion

We have tested DEIVA with input files ranging from ~50,000 to ~90,000 features with various browsers and operating systems, and find it responsive at these typical file sizes. By default, the visualization displays a hexbin density plot of the DGE statistical test result. The user may also switch to a scatter plot view. In general, the density plot performs better and gives a more responsive user experience, while the scatter plot displays full detail.

All processing and visualization of the data occurs within the web browser or desktop application. When a web server is used, the server is only responsible for sending the SPA code and the data for experiments that are pre-loaded in the given DEIVA instance. If a user visualizes other data through the interface, the user's data is not sent to any server but stays on the client side.

The fact that DEIVA is a client-side SPA has several implications. First, DEIVA can be expected to scale to virtually any number of users and datasets. Second, because data provided by the user is not uploaded to a host server, the system is more secure, which is important in the context of sensitive data such as expression profiling of human patient samples. Third, performance will vary depending on the user's hardware and software combination. We find DEIVA responsive while providing several hundred datasets with over 90,000 features in each dataset. For datasets with considerably more features, server-based systems can be preferable if the visualization is rendered server-side.
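The idea behind the hexbin density plot can be sketched as follows: each point is assigned to the hexagon whose centre is nearest, and each bin then records how many features it holds and what fraction of them pass the cutoff filters. This is an illustrative sketch using a generic nearest-centre method on two offset rectangular lattices. It is not DEIVA's rendering code, and the function names and the bin key format are assumptions.

```python
import math
from collections import defaultdict


def hex_bin(x, y, radius):
    """Return the (row, col) index of the pointy-top hexagon containing (x, y)."""
    dx = math.sqrt(3) * radius  # spacing between centres within a row
    dy = 3 * radius             # spacing between rows of the same lattice

    # Candidate A: nearest centre on the lattice of even rows.
    col_a, row_a = round(x / dx), round(y / dy)
    ax, ay = col_a * dx, row_a * dy

    # Candidate B: nearest centre on the lattice of odd rows,
    # which is shifted by half a cell in both directions.
    col_b, row_b = round(x / dx - 0.5), round(y / dy - 0.5)
    bx, by = (col_b + 0.5) * dx, (row_b + 0.5) * dy

    # The closer of the two candidates is the nearest hexagon centre.
    dist_a = (x - ax) ** 2 + (y - ay) ** 2
    dist_b = (x - bx) ** 2 + (y - by) ** 2
    if dist_a <= dist_b:
        return (2 * row_a, col_a)
    return (2 * row_b + 1, col_b)


def bin_fractions(points, radius):
    """Map each bin to (feature count, fraction passing the filters).

    `points` is an iterable of (x, y, passed) tuples, where `passed`
    states whether the feature passed the cutoff filters.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for x, y, passed in points:
        key = hex_bin(x, y, radius)
        totals[key] += 1
        if passed:
            hits[key] += 1
    return {key: (n, hits[key] / n) for key, n in totals.items()}
```

Because the number of bins depends on the plot area and bin radius rather than on the number of features, a density plot draws far fewer elements than a scatter plot of the same data, which is consistent with the performance difference noted above.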

Comparison of DEIVA with related software

There are other systems with varying scope and functionality available for the exploration and analysis of DGE statistical test results, most notably VisRseq [15], OASIS [9] and DEGUST [16]. We compare DEIVA directly with these systems in a feature matrix (Table 1). The following features are tabulated:

Table 1

Summary of competing tools

	Features							Dependencies
	locate	identify	MA-plot	Volcano plot	web-based	users data	FOSS license	browser	development	server
OASIS	O	O	X	O	O	X	O: LGPLv2	none	NA	NA
VisRseq	X	O	O	O	X	O	-^a	NA	Java, R	NA
DEGUST	O	O	O	X	O	O^c	O: GPL v3	none	bash, node.js^b	none^b
DEIVA	Δ	O	O	O	Δ	Δ	O: MIT	none	node.js	none^d

^a not specified

^b for the analysis back-end, DEGUST requires R, Python, node, and Glasgow Haskell Compiler

^c requires upload to server or custom deployment with analysis backend

^d DEIVA can run on any HTTP server including WebDAV, node HTTP server, python HTTP server, Apache

Δ State of the art

O Feature present

X Feature absent

locate: includes functionality to visually locate the position of the features of at least one symbol.
identify: includes functionality to identify at least one feature, or a group of features, on the plot.
MA-plot: can render the DGE statistical test result as an MA-plot (i.e. a scatter plot of mean expression vs log fold change).
Volcano plot: can render the DGE statistical test result as a volcano plot (p-value vs fold change).
web-based: yes if the system is a web-based application, no if it is a client-side application.
users data: the user can visualize their own datasets.
FOSS license: the system is available under a free and open source software license; the license is listed.
dependencies: listing of browser, development, and server dependencies.

We also examined GenePattern 2.0 [8]. However, we were unable to reproduce the volcano plots as described in the documentation [17] using the GenePattern public servers [18].

Another software in this context is iCanPlot [19], a generic library for generating interactive canvas-based scatter plots. Canvas-based scatter plots generated by iCanPlot provide excellent performance compared to SVG-based scatter plots generated using D3 [14] (as implemented in DEIVA). However, iCanPlot-generated plots lack some functionality we felt necessary for DEIVA, for example point-by-point inspection of features, high-contrast color highlighting of features, and download of vectorized images. Additionally, iCanPlot cannot generate density plots, which are the default in DEIVA. It may be beneficial to implement some level of canvas-based rendering in DEIVA; however, this should be done without sacrificing DEIVA's current functionality.
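The MA-plot and volcano plot listed in the feature matrix differ only in which columns of a DGE statistical test result are mapped to the axes. The sketch below shows one common convention for each. The function names are hypothetical, and the log10 scaling of mean expression and the −log10 scaling of the p-value are assumptions about typical usage rather than properties of any tool in Table 1.

```python
import math


def ma_point(base_mean, log2_fc):
    """MA-plot coordinates: x is log10 mean expression, y is log2 fold change."""
    x = math.log10(base_mean) if base_mean > 0 else -math.inf
    return (x, log2_fc)


def volcano_point(log2_fc, p_value):
    """Volcano plot coordinates: x is log2 fold change, y is -log10 p-value."""
    # A p-value of zero maps to +infinity rather than raising an error.
    y = -math.log10(p_value) if p_value > 0 else math.inf
    return (log2_fc, y)
```

In both plots strongly changed features lie far from zero on the fold change axis. The MA-plot additionally shows how abundant each feature is, while the volcano plot shows how significant the change is.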

Conclusions

The feature matrix illustrates that none of the comparable systems available offers DEIVA's combination of design decisions: functionality to both locate and identify features in the visualization, an emphasis on ease of use and ease of deployment, a permissive free software license, no specific client or server dependencies, and the possibility to extend it and integrate it with other systems.

Availability and requirements

Project name: DEIVA
Project home page: https://github.com/Hypercubed/DEIVA
Archived version: 1.0.0 (https://github.com/Hypercubed/DEIVA/releases/tag/v1.0.0)
Operating system(s): Platform independent
Programming language: JavaScript
License: MIT

12 in total

1. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors: M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal: Nat Genet Date: 2000-05 Impact factor: 38.330

2. GenePattern 2.0.

Authors: Michael Reich; Ted Liefeld; Joshua Gould; Jim Lerner; Pablo Tamayo; Jill P Mesirov
Journal: Nat Genet Date: 2006-05 Impact factor: 38.330

3. 5' end-centered expression profiling using cap-analysis gene expression and next-generation sequencing.

Authors: Hazuki Takahashi; Timo Lassmann; Mitsuyoshi Murata; Piero Carninci
Journal: Nat Protoc Date: 2012-02-23 Impact factor: 13.491

4. A promoter-level mammalian expression atlas.

Authors: Alistair R R Forrest; Hideya Kawaji; Michael Rehli; J Kenneth Baillie; Michiel J L de Hoon; Vanja Haberle; Timo Lassmann; Ivan V Kulakovskiy; Marina Lizio; Masayoshi Itoh; Robin Andersson; Christopher J Mungall; Terrence F Meehan; Sebastian Schmeier; Nicolas Bertin; Mette Jørgensen; Emmanuel Dimont; Erik Arner; Christian Schmidl; Ulf Schaefer; Yulia A Medvedeva; Charles Plessy; Morana Vitezic; Jessica Severin; Colin A Semple; Yuri Ishizu; Robert S Young; Margherita Francescatto; Intikhab Alam; Davide Albanese; Gabriel M Altschuler; Takahiro Arakawa; John A C Archer; Peter Arner; Magda Babina; Sarah Rennie; Piotr J Balwierz; Anthony G Beckhouse; Swati Pradhan-Bhatt; Judith A Blake; Antje Blumenthal; Beatrice Bodega; Alessandro Bonetti; James Briggs; Frank Brombacher; A Maxwell Burroughs; Andrea Califano; Carlo V Cannistraci; Daniel Carbajo; Yun Chen; Marco Chierici; Yari Ciani; Hans C Clevers; Emiliano Dalla; Carrie A Davis; Michael Detmar; Alexander D Diehl; Taeko Dohi; Finn Drabløs; Albert S B Edge; Matthias Edinger; Karl Ekwall; Mitsuhiro Endoh; Hideki Enomoto; Michela Fagiolini; Lynsey Fairbairn; Hai Fang; Mary C Farach-Carson; Geoffrey J Faulkner; Alexander V Favorov; Malcolm E Fisher; Martin C Frith; Rie Fujita; Shiro Fukuda; Cesare Furlanello; Masaaki Furino; Jun-ichi Furusawa; Teunis B Geijtenbeek; Andrew P Gibson; Thomas Gingeras; Daniel Goldowitz; Julian Gough; Sven Guhl; Reto Guler; Stefano Gustincich; Thomas J Ha; Masahide Hamaguchi; Mitsuko Hara; Matthias Harbers; Jayson Harshbarger; Akira Hasegawa; Yuki Hasegawa; Takehiro Hashimoto; Meenhard Herlyn; Kelly J Hitchens; Shannan J Ho Sui; Oliver M Hofmann; Ilka Hoof; Furni Hori; Lukasz Huminiecki; Kei Iida; Tomokatsu Ikawa; Boris R Jankovic; Hui Jia; Anagha Joshi; Giuseppe Jurman; Bogumil Kaczkowski; Chieko Kai; Kaoru Kaida; Ai Kaiho; Kazuhiro Kajiyama; Mutsumi Kanamori-Katayama; Artem S Kasianov; Takeya Kasukawa; Shintaro Katayama; Sachi Kato; Shuji Kawaguchi; Hiroshi Kawamoto; Yuki I Kawamura; 
Tsugumi Kawashima; Judith S Kempfle; Tony J Kenna; Juha Kere; Levon M Khachigian; Toshio Kitamura; S Peter Klinken; Alan J Knox; Miki Kojima; Soichi Kojima; Naoto Kondo; Haruhiko Koseki; Shigeo Koyasu; Sarah Krampitz; Atsutaka Kubosaki; Andrew T Kwon; Jeroen F J Laros; Weonju Lee; Andreas Lennartsson; Kang Li; Berit Lilje; Leonard Lipovich; Alan Mackay-Sim; Ri-ichiroh Manabe; Jessica C Mar; Benoit Marchand; Anthony Mathelier; Niklas Mejhert; Alison Meynert; Yosuke Mizuno; David A de Lima Morais; Hiromasa Morikawa; Mitsuru Morimoto; Kazuyo Moro; Efthymios Motakis; Hozumi Motohashi; Christine L Mummery; Mitsuyoshi Murata; Sayaka Nagao-Sato; Yutaka Nakachi; Fumio Nakahara; Toshiyuki Nakamura; Yukio Nakamura; Kenichi Nakazato; Erik van Nimwegen; Noriko Ninomiya; Hiromi Nishiyori; Shohei Noma; Shohei Noma; Tadasuke Noazaki; Soichi Ogishima; Naganari Ohkura; Hiroko Ohimiya; Hiroshi Ohno; Mitsuhiro Ohshima; Mariko Okada-Hatakeyama; Yasushi Okazaki; Valerio Orlando; Dmitry A Ovchinnikov; Arnab Pain; Robert Passier; Margaret Patrikakis; Helena Persson; Silvano Piazza; James G D Prendergast; Owen J L Rackham; Jordan A Ramilowski; Mamoon Rashid; Timothy Ravasi; Patrizia Rizzu; Marco Roncador; Sugata Roy; Morten B Rye; Eri Saijyo; Antti Sajantila; Akiko Saka; Shimon Sakaguchi; Mizuho Sakai; Hiroki Sato; Suzana Savvi; Alka Saxena; Claudio Schneider; Erik A Schultes; Gundula G Schulze-Tanzil; Anita Schwegmann; Thierry Sengstag; Guojun Sheng; Hisashi Shimoji; Yishai Shimoni; Jay W Shin; Christophe Simon; Daisuke Sugiyama; Takaai Sugiyama; Masanori Suzuki; Naoko Suzuki; Rolf K Swoboda; Peter A C 't Hoen; Michihira Tagami; Naoko Takahashi; Jun Takai; Hiroshi Tanaka; Hideki Tatsukawa; Zuotian Tatum; Mark Thompson; Hiroo Toyodo; Tetsuro Toyoda; Elvind Valen; Marc van de Wetering; Linda M van den Berg; Roberto Verado; Dipti Vijayan; Ilya E Vorontsov; Wyeth W Wasserman; Shoko Watanabe; Christine A Wells; Louise N Winteringham; Ernst Wolvetang; Emily J Wood; Yoko Yamaguchi; Masayuki 
Yamamoto; Misako Yoneda; Yohei Yonekura; Shigehiro Yoshida; Susan E Zabierowski; Peter G Zhang; Xiaobei Zhao; Silvia Zucchelli; Kim M Summers; Harukazu Suzuki; Carsten O Daub; Jun Kawai; Peter Heutink; Winston Hide; Tom C Freeman; Boris Lenhard; Vladimir B Bajic; Martin S Taylor; Vsevolod J Makeev; Albin Sandelin; David A Hume; Piero Carninci; Yoshihide Hayashizaki
Journal: Nature Date: 2014-03-27 Impact factor: 49.962

Review 5. From RNA-seq reads to differential expression results.

Authors: Alicia Oshlack; Mark D Robinson; Matthew D Young
Journal: Genome Biol Date: 2010-12-22 Impact factor: 13.583

6. iCanPlot: visual exploration of high-throughput omics data using interactive Canvas plotting.

Authors: Amit U Sinha; Scott A Armstrong
Journal: PLoS One Date: 2012-02-29 Impact factor: 3.240

7. VisRseq: R-based visual framework for analysis of sequencing data.

Authors: Hamid Younesy; Torsten Möller; Matthew C Lorincz; Mohammad M Karimi; Steven J M Jones
Journal: BMC Bioinformatics Date: 2015-08-13 Impact factor: 3.169

8. Digital expression profiling of the compartmentalized translatome of Purkinje neurons.

Authors: Anton Kratz; Pascal Beguin; Megumi Kaneko; Takahiko Chimura; Ana Maria Suzuki; Atsuko Matsunaga; Sachi Kato; Nicolas Bertin; Timo Lassmann; Réjan Vigot; Piero Carninci; Charles Plessy; Thomas Launey
Journal: Genome Res Date: 2014-06-05 Impact factor: 9.043

9. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.

Authors: Michael I Love; Wolfgang Huber; Simon Anders
Journal: Genome Biol Date: 2014 Impact factor: 13.583

10. DEIVA: a web application for interactive visual analysis of differential gene expression profiles.

Authors: Jayson Harshbarger; Anton Kratz; Piero Carninci
Journal: BMC Genomics Date: 2017-01-07 Impact factor: 3.969
