
Pathway Inspector: a pathway based web application for RNAseq analysis of model and non-model organisms.

Luca Bianco¹, Samantha Riccadonna¹, Enrico Lavezzo², Marco Falda², Elide Formentin³, Duccio Cavalieri¹, Stefano Toppo², Paolo Fontana¹.

Abstract

SUMMARY: Pathway Inspector is an easy-to-use web application helping researchers to find patterns of expression in complex RNAseq experiments. The tool combines two standard approaches for RNAseq analysis: the identification of differentially expressed genes and a topology-based analysis of enriched pathways. Pathway Inspector is equipped with ad hoc interactive graphical interfaces simplifying the discovery of modulated pathways and the integration of the differentially expressed genes in the corresponding pathway topology.
AVAILABILITY AND IMPLEMENTATION: Pathway Inspector is available at http://admiral.fmach.it/PI and has been developed in Python, making use of the Django web framework.
CONTACT: paolo.fontana@fmach.it

Year: 2017 PMID: 28158604 PMCID: PMC5408796 DOI: 10.1093/bioinformatics/btw636

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Functional analysis of high-throughput data is a crucial step in systems biology studies. To fully exploit the information encoded in complex biological systems, data analysis tools should be able to combine omics experiments and background knowledge to describe the interactions among proteins and small molecules and to improve the limited knowledge available on gene functions. The analysis of differentially expressed genes between different biological conditions is usually the starting point for finding patterns relevant to a given phenotype. Modeling the biological information as pathways of interacting elements can bring the analysis and interpretation of results one step further. Early enrichment approaches rely only on the number of genes or on their co-expression to identify significant pathways (Khatri et al.). More recent approaches, like SPIA (Khatri et al.) or graphite (Sales et al.), also exploit the pathway topology. Pathway Inspector (PI) provides a fully automated enrichment analysis through the Differential Expression Analysis of Pathways (DEAP) algorithm (Haynes et al.) run on pathway maps built with rules similar to graphite’s. Compared to other topology-based enrichment algorithms, DEAP also identifies the path within a pathway that is differentially modulated in the tested conditions. The results are summarized through graphical interfaces highlighting the modulated genes and pathways in the different experimental conditions and the interactions among them. The main novelty of PI is the possibility to analyze RNAseq data from any species: users can annotate the genes of interest against the KEGG orthologs (Kanehisa et al.) and then upload the annotations to perform the analysis.
To the best of our knowledge, only PaintOmics (García-Alcalde et al.) provides a similar pathway-based representation, but it is more focused on the integration of several omics sources through KEGG maps and supports neither the upload of custom annotations nor topology-based enrichment algorithms. Finally, PI always runs on an up-to-date knowledgebase: it uses REST interfaces to retrieve pathway data from KEGG and identifier mappings from bioDBnet (Mudunuri et al.) on the fly. This feature is particularly relevant given that popular tools often rely on outdated knowledgebases (http://dx.doi.org/10.1101/049288).

2 Design and implementation

PI is designed to perform several comparisons in one run, such as samples coming from different tissues in two conditions (e.g. roots, leaves and/or plant body in optimal conditions versus water stress). PI has two main modules, allowing users to easily perform differential expression analyses both at the gene level and at the pathway level with simple experimental designs. The tool handles only two-class balanced comparisons (i.e. with the same number of biological replicates per class), but in principle both modules may be extended to handle more complex designs. Read counts are processed using the edgeR Bioconductor package (Anders et al.): weakly expressed genes are removed, read counts are normalized, dispersion values are estimated before fitting a generalized linear model to each feature, and finally a likelihood ratio test is performed to identify the differentially expressed genes (DEGs), according to the threshold set by the user (more details are available in the online documentation). At the pathway level, DEAP is used to identify enriched pathways (EPs) and their modulated components (modulated sub-pathway genes, MPGs) (Haynes et al.). Pathway maps are dynamically downloaded from KEGG according to the user-selected organism, while the identifiers of the input gene list (extracted from the count data file) are converted using the web services of bioDBnet (Mudunuri et al.). Several gene ID formats are accepted, such as Ensembl, Gene Symbol or GI Number, and they are then mapped onto the KEGG pathways. If the organism is not present in KEGG, users can perform the functional annotation using dedicated KEGG tools (Kanehisa et al., b) and upload the resulting annotation file to PI. KEGG pathway maps are used for the reconstruction of gene networks. Since a node in the pathway map may correspond to more than one gene product, protein complexes are expanded into gene groups forming cliques, as in Sales et al. Finally, nodes interacting through a chemical compound are directly connected.
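The two graph-building rules described above (complexes expanded into cliques, and nodes bridged across a shared compound) can be illustrated with a minimal Python sketch. It is not PI's or graphite's actual code: the function names, the `cpd:` prefix used to recognize compounds, and the toy identifiers are all assumptions made for illustration.

```python
from itertools import combinations


def expand_complex(genes):
    """Return the undirected edges of a clique over the genes of one complex."""
    members = sorted(set(genes))
    return {frozenset(pair) for pair in combinations(members, 2)}


def bridge_compounds(relations):
    """Directly connect gene nodes that interact through a shared compound.

    `relations` is a list of (source, target) pairs. Names starting with
    "cpd:" are treated as compounds (an assumption of this sketch);
    compound-to-compound pairs are ignored.
    """
    producers, consumers, edges = {}, {}, set()
    for src, dst in relations:
        src_is_cpd = src.startswith("cpd:")
        dst_is_cpd = dst.startswith("cpd:")
        if src_is_cpd and dst_is_cpd:
            continue
        if dst_is_cpd:
            producers.setdefault(dst, set()).add(src)
        elif src_is_cpd:
            consumers.setdefault(src, set()).add(dst)
        else:
            edges.add((src, dst))
    # Replace each gene -> compound -> gene chain with a direct gene -> gene edge.
    for compound, sources in producers.items():
        for source in sources:
            for target in consumers.get(compound, ()):
                edges.add((source, target))
    return edges


clique = expand_complex(["geneA", "geneB", "geneC"])
edges = bridge_compounds(
    [("g1", "cpd:C1"), ("cpd:C1", "g2"), ("g2", "g3")]
)
```

A complex of n genes yields n(n-1)/2 undirected edges, so large complexes inflate the graph quickly; this is a property of the clique rule, not of the sketch.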
The resulting graphs are passed to DEAP together with normalized count data. DEAP was originally developed for microarray data, thus read counts are transformed to log2-counts per million using the R function voom (Law et al.). The list of EPs is built using a few parameters, such as the p-value, set by the users (please refer to the online documentation). For each comparison, the results of the differential analysis are reported in the form of a browsable network at both the gene and the pathway level, and the complete information is also available in several tables, as outlined in Figure 1. Thanks to the information encoded in KEGG, the user can explore and follow links among pathways within the network of all EPs and, starting from a selected node, can browse the subnetwork of directly connected pathways. This pathway-centric view of the results is integrated with the gene-based differential analysis, which provides the list of involved genes together with the statistics of the likelihood ratio test. Moreover, users can easily switch to the gene-based network view corresponding to the MPGs subset, keeping track of the involved pathways. The PI interface allows users to explore their results, check the overlap between the two analysis levels, and zoom in and out of the more interesting parts of the analyzed biological system following the information flow. The global picture of the system is completed by a representation of the MPGs on KEGG maps (colors indicate the log fold change as computed by edgeR), which also provides a link to the protein annotation in the public databases. PI provides a unique global view of multiple comparisons run ‘simultaneously’: a unified table of all the enriched pathways for each comparison, colored by the pathway enrichment significance level (DEAP P-value).
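As an illustration of the log2-counts-per-million transformation mentioned at the start of this paragraph, the following Python sketch applies the formula given by Law et al., log2((count + 0.5) / (library size + 1) × 10^6), to raw counts. It is only a sketch: PI calls voom in R, and voom additionally uses normalized library sizes and estimates precision weights, both of which are omitted here. The function name and the toy counts are hypothetical.

```python
import math


def log2_cpm(counts):
    """Transform raw read counts to log2-counts per million, sample by sample.

    `counts` maps a sample name to a list of raw counts (one per gene,
    same gene order in every sample). The 0.5 and 1.0 offsets keep the
    logarithm finite for zero counts.
    """
    transformed = {}
    for sample, values in counts.items():
        library_size = sum(values)
        transformed[sample] = [
            math.log2((count + 0.5) / (library_size + 1.0) * 1e6)
            for count in values
        ]
    return transformed


# Toy example: one sample with three genes (hypothetical values).
result = log2_cpm({"sample1": [0, 9, 90]})
```

Because of the offsets, a gene with zero reads still receives a finite value, which is what makes the transformed data usable by methods designed for continuous microarray intensities.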

Fig. 1

Schema of the performed analysis output (Color version of this figure is available at Bioinformatics online.)

PI is a scalable and parallelized web application, developed in Python using the Django web framework and the D3 JavaScript library; Cytoscape.js is used for interactive result visualization (http://js.cytoscape.org). The backend is based on Celery, a task queue/job system that distributes the computation across different servers (http://www.celeryproject.org/). The differential expression analysis is implemented in R, while DEAP is written in Python. KEGG KGML files are downloaded and parsed automatically through the bioservices Python library (https://pypi.python.org/pypi/bioservices/).
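To give a sense of what parsing a KGML file involves, here is a minimal Python sketch that reads gene entries and relations from a KGML document. It deliberately uses only the standard library rather than bioservices, and it parses a hand-written toy document instead of downloading one; the pathway identifier, gene identifiers, and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written toy KGML fragment (hypothetical pathway and gene IDs).
TOY_KGML = """
<pathway name="path:hsa00000" org="hsa" number="00000" title="Toy pathway">
  <entry id="1" name="hsa:1111 hsa:2222" type="gene"/>
  <entry id="2" name="hsa:3333" type="gene"/>
  <entry id="3" name="cpd:C00022" type="compound"/>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="activation" value="--&gt;"/>
  </relation>
</pathway>
"""


def parse_kgml(text):
    """Extract gene entries and relations from a KGML string.

    Returns (genes, relations): `genes` maps an entry id to the list of
    gene identifiers in that node (a node may hold several gene products);
    `relations` is a list of (entry1, entry2, type, subtype names).
    """
    root = ET.fromstring(text)
    genes = {
        entry.get("id"): entry.get("name").split()
        for entry in root.iter("entry")
        if entry.get("type") == "gene"
    }
    relations = []
    for relation in root.iter("relation"):
        subtypes = [s.get("name") for s in relation.findall("subtype")]
        relations.append(
            (relation.get("entry1"), relation.get("entry2"),
             relation.get("type"), subtypes)
        )
    return genes, relations


genes, relations = parse_kgml(TOY_KGML)
```

Entry "1" carries two gene identifiers in a single node, which is the situation that makes the expansion of multi-gene nodes necessary when building the gene network.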

3 Conclusions

Expression studies are a common approach to investigate transcriptional activity in order to identify modulated genes organized in biological pathways. For non-model organisms this approach may be hampered by the difficulty of annotating genes into known pathways to identify EPs. To the best of our knowledge, tools that accept custom annotations to perform pathway enrichment are missing. The main novelty of PI is the possibility to easily discover modulated pathways for every sequenced organism, even if it is not present in the KEGG database. Users can annotate genes using the KEGG web services, optionally integrating the annotations with their own knowledge. PI provides users with an easy-to-use one-stop shop for their data analysis that requires limited user intervention: only simple tabulated text files and analysis parameters are needed as input.

Funding

This work was supported by the Autonomous Province of Trento.

Conflict of Interest: none declared.

References

1. bioDBnet: the biological database network.

Authors: Uma Mudunuri; Anney Che; Ming Yi; Robert M Stephens
Journal: Bioinformatics Date: 2009-01-07 Impact factor: 6.937

2. Count-based differential expression analysis of RNA sequencing data using R and Bioconductor.

Authors: Simon Anders; Davis J McCarthy; Yunshun Chen; Michal Okoniewski; Gordon K Smyth; Wolfgang Huber; Mark D Robinson
Journal: Nat Protoc Date: 2013-08-22 Impact factor: 13.491

Review 3. BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences.

Authors: Minoru Kanehisa; Yoko Sato; Kanae Morishima
Journal: J Mol Biol Date: 2015-11-14 Impact factor: 5.469

4. Paintomics: a web based tool for the joint visualization of transcriptomics and metabolomics data.

Authors: Fernando García-Alcalde; Federico García-López; Joaquín Dopazo; Ana Conesa
Journal: Bioinformatics Date: 2010-11-23 Impact factor: 6.937

5. graphite - a Bioconductor package to convert pathway topology to gene network.

Authors: Gabriele Sales; Enrica Calura; Duccio Cavalieri; Chiara Romualdi
Journal: BMC Bioinformatics Date: 2012-01-31 Impact factor: 3.169

Review 6. Ten years of pathway analysis: current approaches and outstanding challenges.

Authors: Purvesh Khatri; Marina Sirota; Atul J Butte
Journal: PLoS Comput Biol Date: 2012-02-23 Impact factor: 4.475

7. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts.

Authors: Charity W Law; Yunshun Chen; Wei Shi; Gordon K Smyth
Journal: Genome Biol Date: 2014-02-03 Impact factor: 13.583

8. KEGG as a reference resource for gene and protein annotation.

Authors: Minoru Kanehisa; Yoko Sato; Masayuki Kawashima; Miho Furumichi; Mao Tanabe
Journal: Nucleic Acids Res Date: 2015-10-17 Impact factor: 16.971

9. Differential expression analysis for pathways.

Authors: Winston A Haynes; Roger Higdon; Larissa Stanberry; Dwayne Collins; Eugene Kolker
Journal: PLoS Comput Biol Date: 2013-03-14 Impact factor: 4.475
