
RNAbrowse: RNA-Seq de novo assembly results browser.

Jérôme Mariette¹, Céline Noirot¹, Ibounyamine Nabihoudine¹, Philippe Bardou², Claire Hoede¹, Anis Djari², Cédric Cabau², Christophe Klopp³.

Abstract

Transcriptome analysis based on a de novo assembly of next generation RNA sequences is now performed routinely in many laboratories. The generated results, including contig sequences, quantification figures, functional annotations and variation discovery outputs, are usually bulky and quite diverse. This article presents a user-oriented storage and visualisation environment that lets users explore the data in a top-down manner, going from general graphical views to all possible details. The software package is based on biomart and is easy to install and populate with local data. It is available under the GNU General Public License (GPL) at http://bioinfo.genotoul.fr/RNAbrowse.


Year: 2014. PMID: 24823498. PMCID: PMC4019526. DOI: 10.1371/journal.pone.0096821.

Source DB: PubMed. Journal: PLoS One. ISSN: 1932-6203. Impact factor: 3.240.

Introduction

The massive decrease in sequencing costs has attracted a large community of new users, some of whom study organisms for which no reference genome sequence is available yet. To understand mechanisms taking place at the gene level, they usually start with a de novo transcriptome assembly. Software packages such as Trinity [1] or Oases [2] are mature enough to produce reliable contigs from short reads. The analyses performed on the assembled contigs generate a large amount of heterogeneous results, including variations, functional annotations and expression measurements. The processing steps of these pipelines are usually shared within the community, but the parameters, tools and reference databases used are project-specific. The results are often provided through a web server offering a BLAST query form and download links. CBrowse [3] is a web environment presenting this kind of results through graphical views and query forms, but its functional annotation part is not implemented yet and its query possibilities are very limited. The Galaxy [4] engine provides users with an interface to create and track workflow executions, and it already embeds RNA-Seq analysis and assembly components. However, none of these tools offers user-friendly query and visualisation features designed for RNA-Seq de novo annotation. In its latest version (0.8), biomart [5] has been developed as an easily extensible query infrastructure that can be specialized in the presentation of focused data types. On top of the database, and in addition to the provided query forms, new pages can be added as plug-ins to present data in a user-friendly way.

Results

RNAbrowse enables sequencing facilities and bioinformatics teams, even small ones, to provide user-friendly access to RNA-Seq de novo results, helping biologists to analyse their data and extract meaningful information from it. RNAbrowse includes two components, presented hereafter: a web-based user interface and a command line administration tool.

Implementation and installation

RNAbrowse is based on a newly defined biomart schema specialized for RNA-Seq de novo assembly. It includes two marts, storing contig and variant data respectively, and some extra tables for metadata. The web-based user interface is implemented as a biomart plug-in using the latest jQuery and Highcharts features. To perform all third-party requests, the software relies on the biomart API and implements several biomart processors that prepare the data before presenting it. The command line administration tool, developed in Python, is based on a workflow execution environment called jflow. The installation requires a standard Unix server and a MySQL database. The installation archive, downloadable from the project website, only has to be uncompressed before setting up a first instance. To ease testing, a minimal and a complete example dataset are included in the archive.

WEB interface

The web interface was designed to display the data starting with general views before zooming into detailed ones. It is organised in four levels. The first, called the instance level, corresponds to the home page of the website and presents all the available projects. A project gathers the data of one assembly. The second level is entered after choosing a project. Its introduction page contains a picture of the assembled species, a description text and a table listing all the analysis processing steps with software names, parameters and versions, in order to ease the writing of materials and methods sections. The third level is accessed through the menu bar items at the top of the project page and presents general information about contigs and variants. It also includes the download page. The contigs (Figure 1) and variants (Figure 2) overview pages include a set of graphics showing general statistics, for example the contig length histogram, the alignment rate bar chart, the histogram of the most represented species in the functional annotation or the variant type distribution as a pie chart. These graphs enable rapid assembly validation and comparison. For contigs, the next section includes detailed information about the different sequenced libraries and provides access to tools such as Venn diagrams (Figure 3) and a differential digital display. The last section corresponds to the favourite contigs or variants table, which the user can update while exploring detailed views. The fourth level gathers all the information on a contig or a variant and provides multiple tools to analyse it further.

Figure 1

Contigs overview figures.

From the main contig page, graphics summarising information on the contig set can be displayed. The presented graphs are the contig depth versus size plot (top-left), the contig length distribution (top-right), the contig best annotations pie chart (bottom-left) and the library mapping bar chart (bottom-right). All graphics can be printed or downloaded in four different formats (PNG, JPEG, PDF, SVG).
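The contig length distribution in Figure 1 is a binned count of contig lengths. As a rough sketch of how such a histogram could be derived from the contig Fasta file (not RNAbrowse's actual code; the 100 bp bin width and the function names are illustrative assumptions):

```python
from collections import Counter


def read_fasta_lengths(lines):
    """Return {contig_name: sequence_length} from FASTA-formatted lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The contig name is the first whitespace-delimited token of the header.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            # Sequence lines may be wrapped, so lengths are accumulated.
            lengths[name] += len(line)
    return lengths


def length_histogram(lengths, bin_size=100):
    """Count contigs per bin; key k*bin_size covers [k*bin_size, (k+1)*bin_size)."""
    counts = Counter(length // bin_size * bin_size for length in lengths.values())
    return dict(sorted(counts.items()))


# Usage with three small hypothetical contigs (c2 is wrapped over two lines).
fasta = [">c1", "A" * 150, ">c2 some description", "C" * 80, "G" * 40, ">c3", "T" * 260]
lengths = read_fasta_lengths(fasta)
print(lengths)                    # {'c1': 150, 'c2': 120, 'c3': 260}
print(length_histogram(lengths))  # {100: 2, 200: 1}
```

Keying each bin by its lower bound keeps the output directly usable as the x-axis of a bar chart.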

Figure 2

Variants overview figures.

From the main variant page, graphics summarising information on the variant set can be displayed. The presented graphs are the InDel size distribution (top) and the InDel annotations pie chart (bottom).
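Figure 2 summarises variants by type and InDel length. A minimal sketch of how such counts might be computed from VCF-style REF/ALT pairs follows; it assumes normalised, single-ALT records and uses a signed size convention (positive for insertions, negative for deletions), neither of which is stated in the article.

```python
from collections import Counter


def classify_variant(ref, alt):
    """Classify a single-ALT variant from VCF-style REF and ALT alleles."""
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "insertion" if len(alt) > len(ref) else "deletion"


def variant_type_counts(variants):
    """Counts per variant type, i.e. the data behind a type pie chart."""
    return dict(Counter(classify_variant(ref, alt) for ref, alt in variants))


def indel_size_distribution(variants):
    """Signed InDel sizes: positive for insertions, negative for deletions."""
    sizes = Counter(
        len(alt) - len(ref)
        for ref, alt in variants
        if classify_variant(ref, alt) in ("insertion", "deletion")
    )
    return dict(sorted(sizes.items()))


# Hypothetical (REF, ALT) pairs.
variants = [("A", "G"), ("A", "AT"), ("ATG", "A"), ("C", "CGG"), ("G", "T"), ("AC", "A")]
print(variant_type_counts(variants))      # {'SNP': 2, 'insertion': 2, 'deletion': 2}
print(indel_size_distribution(variants))  # {-2: 1, -1: 1, 1: 1, 2: 1}
```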

Figure 3

The Venn diagram shows the number of contigs shared between libraries and the specific ones.

It has been built using the library alignment data. To build a new diagram, the user selects the libraries to include in each pool (from two to five). Clicking on a figure in the graph displays the list of corresponding contigs in the list box on the right-hand side. Clicking on a contig name redirects to the corresponding contig page.

To illustrate these features, consider a simple use case in which a user would like to find all contigs corresponding to a gene whose sequence is available for another species. This can be done by aligning that sequence against the contigs using the blast query form (Figure 4), or by a name or description search using the biomart form. The user can then add the contigs found to the favourite table. For each contig, the sequence can be extracted to perform a multiple alignment in order to check whether different splice forms have been assembled. All possible open reading frames can be searched for (Figure 5). Annotations can be graphically displayed in jbrowse [6]. It is also possible to graphically check whether the expression levels along the contig are conserved between libraries (Figure 6).
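Each number in the Venn diagram of Figure 3 counts the contigs found in exactly one combination of pools. The sketch below assumes that a contig belongs to a library when at least one read aligns to it, that a pool is the union of its libraries, and that the two-to-five limit applies to pools; these rules, like the function names, are illustrative assumptions rather than RNAbrowse's documented behaviour.

```python
from itertools import combinations


def pool_contigs(contigs_by_library, pools):
    """Contigs of each pool: the union of contigs detected in its libraries."""
    return {
        pool: set().union(*(contigs_by_library[lib] for lib in libs))
        for pool, libs in pools.items()
    }


def venn_regions(sets):
    """Map each combination of set names to the items found in exactly those sets."""
    names = sorted(sets)
    if not 2 <= len(names) <= 5:
        raise ValueError("a diagram needs between two and five pools")
    regions = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            inside = set.intersection(*(sets[name] for name in combo))
            outside = set().union(*(sets[name] for name in names if name not in combo))
            regions[combo] = inside - outside
    return regions


# Hypothetical detection data: contigs with at least one aligned read per library.
contigs_by_library = {
    "lib1": {"c1", "c2"},
    "lib2": {"c3"},
    "lib3": {"c2", "c3", "c4"},
    "lib4": {"c5"},
}
pools = {"A": ["lib1", "lib2"], "B": ["lib3"], "C": ["lib4"]}
regions = venn_regions(pool_contigs(contigs_by_library, pools))
for combo, contigs in regions.items():
    # The count shown in each region, and the contig list a click would reveal.
    print("&".join(combo), len(contigs), sorted(contigs))
```

Because every contig lands in exactly one region, the region counts always sum to the number of distinct contigs across the selected pools.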

Figure 4

The blast query form searches the contigs as a database, using one or more sequences in Fasta format as input.

Other parameters that can be set in this query are the blast program (blastn, blastx), the expected value and the maximum number of results to be shown. The blast results are shown in a table from which contigs can be added to the favourites. Ticking the alignment checkbox enables browsing of the alignment results.

Figure 5

The sequence view provides information such as nucleotide content and the longest open reading frame (ORF).

The possible starts are presented in green, stops in red and ORFs in blue. The query form permits different actions on the sequence such as extraction, reverse complementation, translation in different frames, ORF presentation and text search.
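The longest-ORF search shown in Figure 5 can be sketched as a scan of the six reading frames. This minimal version assumes the standard genetic code (start ATG; stops TAA, TAG, TGA), ignores ORFs that run off the contig end without a stop codon, and reports coordinates on the strand where the ORF was found; RNAbrowse's own rules may differ.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (ambiguity codes other than N are not handled)."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame):
    """Yield (start, end) of ATG-initiated ORFs in one frame; end includes the stop codon."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # the first ATG after the previous stop gives the longest ORF
        elif start is not None and codon in STOP_CODONS:
            yield start, i + 3
            start = None


def longest_orf(seq):
    """Return (strand, start, end, orf) for the longest ORF over six frames, or None."""
    seq = seq.upper()
    best = None
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(strand_seq, frame):
                if best is None or end - start > best[2] - best[1]:
                    best = (strand, start, end, strand_seq[start:end])
    return best


print(longest_orf("CCATGAAATTTTAGCC"))  # ('+', 2, 14, 'ATGAAATTTTAG')
```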

Figure 6

The contig depth view displays the read coverage of the different libraries along a given contig.

Each library has its own colour, used both in the table and on the graphic. The graphical layout can be modified by averaging the depths of different libraries, which is typically useful when working with replicates.

All generated graphics can be printed directly from the web page or downloaded in JPG, PNG, PDF or SVG format, or as a CSV file. Tables can be sorted on any column, searched as text and exported to the clipboard or as CSV files. The files on the download page are organised by topic. It is possible to download a single file, all the files of a topic, or a text file containing all the URLs of a topic in order to download the files from the command line. The download section gathers raw files as well as processed ones, enabling users to perform further analyses. A demo site containing a small set of results is accessible at: http://ngspipelines.toulouse.inra.fr:9012/.
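Averaging replicate libraries in the contig depth view amounts to a position-wise mean of their coverage profiles. The following sketch assumes per-base depth lists of equal length for one contig and a hypothetical grouping of libraries into replicate groups; the function and group names are illustrative.

```python
def average_depths(depths, groups):
    """Average per-position depth profiles of replicate libraries on one contig.

    depths: {library: [depth at position 0, 1, ...]}
    groups: {group_name: [library, ...]}  (hypothetical replicate grouping)
    """
    averaged = {}
    for group, libraries in groups.items():
        profiles = [depths[lib] for lib in libraries]
        if len({len(profile) for profile in profiles}) != 1:
            raise ValueError(f"group {group!r} needs libraries with equal-length profiles")
        # zip(*profiles) walks the contig position by position across replicates.
        averaged[group] = [sum(column) / len(profiles) for column in zip(*profiles)]
    return averaged


depths = {"rep1": [2, 4, 6], "rep2": [4, 4, 0], "ctrl": [1, 1, 1]}
groups = {"treated": ["rep1", "rep2"], "control": ["ctrl"]}
print(average_depths(depths, groups))
# {'treated': [3.0, 4.0, 3.0], 'control': [1.0, 1.0, 1.0]}
```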

Administration tools and extensions

The administration interface is a command line tool with which the environment can be set up. It handles database creation, data formatting and loading, and web server management. The loading process requires standard file formats, produced by the most commonly used tools. During the database upload phase, data file format compliance is checked. A minimal environment can be set up with just three files: a contig Fasta file, the corresponding annotation file and an alignment file. Read quantifications can either be provided as an input file or calculated during the upload process. Because large projects can involve many input files, a configuration file template is provided; it can be passed as a parameter to the loading script. Depending on the inputs provided, RNAbrowse processing steps can be quite time consuming. Thus, in a production environment, the tool can be set up to use different schedulers such as condor, sge, moab, workqueue or mpi-queue in order to parallelize the loading. If the loading process fails, a recovery command is available to rerun it. The website provided to the biologists can be secured using the biomart OpenID or Jetty realm features. Users with some programming knowledge can add new graphics by loading the corresponding data into the analysis table and writing a JavaScript module.
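The article states that read quantifications can be calculated during upload but does not say how. Assuming a SAM-format alignment file, a minimal per-contig read count with a basic record check could look like the sketch below; skipping unmapped (0x4), secondary (0x100) and supplementary (0x800) records is an assumption, and each mate of a pair is counted separately.

```python
from collections import Counter

SKIP_FLAGS = 0x4 | 0x100 | 0x800  # unmapped, secondary, supplementary


def count_reads_per_contig(sam_lines):
    """Count primary mapped reads per contig (SAM RNAME field)."""
    counts = Counter()
    for line_number, line in enumerate(sam_lines, start=1):
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # A SAM record has 11 mandatory tab-separated fields.
            raise ValueError(f"line {line_number}: expected 11 SAM fields, got {len(fields)}")
        flag, rname = int(fields[1]), fields[2]
        if flag & SKIP_FLAGS or rname == "*":
            continue
        counts[rname] += 1
    return dict(counts)


# Hypothetical alignment records (r3 also has a secondary hit, r4 is unmapped).
sam = [
    "@SQ\tSN:c1\tLN:100",
    "r1\t0\tc1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tc1\t5\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tc2\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t256\tc1\t9\t0\t4M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(count_reads_per_contig(sam))  # {'c1': 2, 'c2': 1}
```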

Discussion

RNAbrowse is a simple and efficient solution for providing access to RNA-Seq de novo results on the Internet. It includes many features that help biologists analyse their data and extract biologically meaningful information from it. The installation and user manuals, as well as general documentation, are available on the project website: http://bioinfo.genotoul.fr/RNAbrowse. The software has been designed to be easily extended by developers with some knowledge of biomart.

References

1. Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Authors: Manfred G Grabherr; Brian J Haas; Moran Yassour; Joshua Z Levin; Dawn A Thompson; Ido Amit; Xian Adiconis; Lin Fan; Raktima Raychowdhury; Qiandong Zeng; Zehua Chen; Evan Mauceli; Nir Hacohen; Andreas Gnirke; Nicholas Rhind; Federica di Palma; Bruce W Birren; Chad Nusbaum; Kerstin Lindblad-Toh; Nir Friedman; Aviv Regev
Journal: Nat Biotechnol. Date: 2011-05-15.

2. Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels.

Authors: Marcel H Schulz; Daniel R Zerbino; Martin Vingron; Ewan Birney
Journal: Bioinformatics. Date: 2012-02-24.

3. CBrowse: a SAM/BAM-based contig browser for transcriptome assembly visualization and analysis.

Authors: Pei Li; Guoli Ji; Min Dong; Emily Schmidt; Douglas Lenox; Liangliang Chen; Qi Liu; Lin Liu; Jie Zhang; Chun Liang
Journal: Bioinformatics. Date: 2012-07-12.

4. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences.

Authors: Jeremy Goecks; Anton Nekrutenko; James Taylor
Journal: Genome Biol. Date: 2010-08-25.

5. BioMart: a data federation framework for large collaborative projects.

Authors: Junjun Zhang; Syed Haider; Joachim Baran; Anthony Cros; Jonathan M Guberman; Jack Hsu; Yong Liang; Long Yao; Arek Kasprzyk
Journal: Database (Oxford). Date: 2011-09-19.

6. Visualizing next-generation sequencing data with JBrowse.

Authors: Oscar Westesson; Mitchell Skinner; Ian Holmes
Journal: Brief Bioinform. Date: 2012-03-12.