Kevin C Dorff, Nyasha Chambwe, Zachary Zeno, Manuele Simi, Rita Shaknovich, Fabien Campagne.
Abstract
We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced representation bisulfite sequencing and whole genome methyl-seq), and the detection of pathogens in sequenced data. In contrast to previous pipelines developed for analysis of HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, and scales gracefully to process tens or hundreds of multi-gigabyte samples, yet can be used effectively by researchers who are comfortable using a web browser. We conducted performance evaluations of the software and found it to either outperform or have similar performance to analysis programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state-of-the-art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed in source code and licensed under the open source LGPL3 license to facilitate code inspection, reuse and independent extensions (http://github.com/CampagneLaboratory/gobyweb2-plugins).
Year: 2013 PMID: 23936070 PMCID: PMC3720652 DOI: 10.1371/journal.pone.0069666
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. GobyWeb user interface menus.
Increasing numbers indicate the typical order in which a user would navigate the interface, from data upload (1) to downloading or sharing results with others (8). See the Supporting Information for a detailed description of each step.
Figure 2. Overview of flows of data in the GobyWeb system.
A typical project starts with upload of data files (in yellow, top left). Tasks that run on the compute grid are shown in red. Items of data represented in GobyWeb are shown in green and have dedicated web user interface views. Most views offer the option of downloading result files (in blue) in formats compatible with third-party software.
Figure 3. Overview of the system architecture.
An installation of GobyWeb relies on three pieces of infrastructure. (a) The web front-end is deployed as a Java web application on one or more application server(s). Several servers can be used to scale the application up under heavy usage. (b) Meta-data about samples, alignments, analyses and users are stored persistently in a Database Management System (DBMS). (c) A compute grid is used to process large datasets efficiently. All datasets (reads, alignments, processed results) are stored as large files on local disks directly attached to each compute node, and the web application servers, as well as in a shared network file system. The software automatically performs data transfers between the shared file system and local storage disks and optimizes these transfers to maximize the overall analysis throughput of the system. The system relies on production quality software components (Apache web server, Tomcat application server, Oracle/JDBC DBMS, and Sun Grid Engine, Linux and Network File System) that are already available and used in many academic institutions.
Gene Expression Analysis performance.
| GobyWeb | Myrna | ||||
| System | BWA | GSNAP | System | Bowtie | |
|
| 3 | 10 | |||
|
| 72 | 80 | |||
|
| 15 m | 1 h15 m | |||
|
| 2 h22 m | 25 h20 m | 2 h56 m | ||
|
| 21 m | 80 m | |||
|
| 36 m | 142 m | 1520 m | 155 m | 176 m |
|
| 225 m | 1603 m | 331 m | ||
|
| $0.34 | $0.34 | |||
|
| $3.88 | $27.43 | $44.00 | ||
|
| $27,000 | $0 | |||
Wall clock times for analysis of 72 Pickrell et al 36 bp RNA-Seq samples.
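The table above mixes two duration notations (for example `2 h22 m` and `142 m`), which makes the values hard to compare at a glance. The following is a minimal sketch of a helper that normalizes both notations to minutes. The function name `to_minutes` is illustrative and is not part of GobyWeb; the sample strings are copied from the table.

```python
import re


def to_minutes(text: str) -> int:
    """Convert a duration such as '2 h22 m' or '36 m' to whole minutes."""
    # Optional hours part followed by an optional minutes part.
    match = re.fullmatch(r"\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*m)?\s*", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


# Durations as they appear in the table, in both notations.
durations = ["2 h22 m", "25 h20 m", "2 h56 m", "36 m"]
minutes = {d: to_minutes(d) for d in durations}

for label, value in minutes.items():
    print(f"{label:>9} = {value} m")
```

Normalizing this way shows that several entries given in hours and minutes correspond to entries given in minutes elsewhere in the same table.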
Performance of spliced alignments with GSNAP and STAR.
| Aligner | GSNAP | STAR |
|
| 373 m | 78 m |
Alignments were performed with GobyWeb and the GSNAP or STAR alignment plugin on one 50 bp single-end RNA-Seq sample with about 43 million reads.
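As a rough worked example, the two wall-clock times above can be turned into a relative speedup and an approximate throughput. This sketch uses only the figures reported in the table and caption; the read count is stated as "about 43 million", so the throughput values are approximate, and a single sample does not establish a general ratio between the two aligners.

```python
# Values taken from the table and its caption.
gsnap_minutes = 373
star_minutes = 78
approx_reads = 43_000_000  # "about 43 million reads" (approximate)

# Ratio of wall-clock times for this one sample.
speedup = gsnap_minutes / star_minutes

# Approximate reads aligned per minute for each plugin.
gsnap_reads_per_minute = approx_reads / gsnap_minutes
star_reads_per_minute = approx_reads / star_minutes

print(f"STAR/GSNAP wall-clock ratio: {speedup:.2f}x")
print(f"GSNAP: ~{gsnap_reads_per_minute:,.0f} reads/min")
print(f"STAR:  ~{star_reads_per_minute:,.0f} reads/min")
```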
Figure 4. Visualizing spliced RNA-Seq alignments done with GobyWeb and the GSNAP or STAR aligners.
This figure was constructed with the Integrative Genomics Viewer (IGV), which directly supports alignments in Goby format. Alignments in the Goby format are substantially smaller than in BAM format, and can be directly downloaded from GobyWeb for interactive visualization with IGV. The plot provides a visual comparison of spliced alignments generated with the GobyWeb GSNAP and STAR plugins over the LAD1 gene (human).
Pathogen detection performance.
| Viral organism detected | N | Comments |
| Human herpesvirus 4 type 1 (EBV) | 72 | The HapMap lymphocyte samples were transformed with EBV to yield individual lymphoblastoid cell lines. Detecting EBV in these samples is therefore expected. |
| Human herpesvirus 4 (EBV) | 72 | |
| Macacine herpesvirus 4 | 23 | Likely mis-detected because of close homology with the EBV virus (less than 10 viral contigs per sample are detected in a subset of samples). |
| Enterobacteria phage phi X 174 | 2 | Likely spike-in with Illumina PhiX phage DNA. |
Pathogen detection took 53 m for the 72 Pickrell et al RNA-Seq samples. N: Number of samples where GobyWeb identified at least one viral contig from the specified organism. See tag: DNOAOZI.
DNA methylation analyses.
| GobyWeb | ||||||
| System | GSNAP | Bismark | Last | Analysis of individual cytosines | Analysis of annotated regions | |
|
| 78 m | |||||
|
| 14 h03 m | 4 h13 m | 2 h33 m | |||
|
| 17 m | 5 m | ||||
|
| 860 m | 975 m | 170 m | |||
We analyzed 6 RRBS samples organized in two groups to detect differentially methylated regions and bases.
Figure 5. Scalable Table Views.
GobyWeb offers web-based table views that scale to support tables of results with hundreds of millions of rows. Users can subset the table to keep specific columns, as well as rows that match complex filters on column values. This mechanism makes it possible for end-users to work with very large tables and download only interesting subsets of the data, even over slow Internet connections. In this snapshot, the table viewer displays results from a base-level methylation analysis (tag = RQLDONK). The panel “Filtered list of elements” displays the current view of the table. The panel at the bottom makes it possible for end-users to select which subset of columns they need to visualize/download. The filters help users identify columns by keyword. Text boxes under each column are used to enter filtering criteria on the specific column.
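The column-subsetting and row-filtering idea described above can be illustrated with a small streaming sketch. This is not GobyWeb's implementation, which the caption does not describe; it only shows how a tab-separated table can be filtered one row at a time so that memory use does not grow with table size. The function name `filter_table`, the column names, and the sample rows are all hypothetical.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List


def filter_table(
    lines: Iterable[str],
    keep_columns: List[str],
    row_filters: Dict[str, Callable[[str], bool]],
) -> Iterator[Dict[str, str]]:
    """Yield only the requested columns of rows that pass every filter."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:  # one row in memory at a time
        if all(pred(row[col]) for col, pred in row_filters.items()):
            yield {col: row[col] for col in keep_columns}


# Hypothetical base-level methylation results (made-up values).
sample = io.StringIO(
    "chromosome\tposition\tmethylation_rate\tp_value\n"
    "chr1\t100\t0.90\t0.001\n"
    "chr1\t250\t0.10\t0.700\n"
    "chr2\t75\t0.85\t0.020\n"
)

subset = list(
    filter_table(
        sample,
        keep_columns=["chromosome", "position"],
        row_filters={"p_value": lambda v: float(v) < 0.05},
    )
)
print(subset)
```

Because the function is a generator, a caller can write each surviving row to a download stream as soon as it is produced, which is one way to serve small subsets of a very large table.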
Figure 6. DNA methylation data analyzed with GobyWeb and visualized with IGV.
GobyWeb produces data files in formats directly supported by the Integrative Genomics Viewer (IGV). This figure presents the results of methylation analysis over regions and individual bases for the Dnmt public datasets [17]. The bottom insert shows a smaller region with more details of the methylation rate at individual bases. Three rows per strand are shown, corresponding to 3 control and 3 induced samples. Integration with IGV makes it possible to visualize DNA methylation rates alongside other types of annotations or data types supported by IGV. The genomic region shown was selected among the regions that show one of the smaller p-values when comparing the control and induced group (empirical p-value, GobyWeb).
Comparison of GobyWeb with other NGS Analysis Systems.
| System | Efficient use of storage capacity | HTS data assays supported | Flexibility | Automatically installs software dependencies | Automatically installs data dependencies | HTS data/General purpose system | Parallel analyses on compute grid |
| Anduril | ++ | Varied, depends on available workflows | +++++ (5) | Y | Some | General purpose | N |
| Galaxy | + (1) | RNA-Seq pipelines provided | +++ (6) | Y | Y | General purpose | Limited (11) |
| GenePattern | ++ (2) | Varied, depends on available workflows | +++ | Some (8) | N (9) | General purpose | N (12) |
| GobyWeb (this report) | +++ (3) | RNA-Seq, DNA methylation, DNA-Seq | ++ (7) | Y | Y | HTS data | Y (13) |
| Myrna | ++ (4) | RNA-Seq | + | Y | N | HTS data | Y (14) |
| Taverna | ++ | Varied, depends on available workflows | +++ (6) | N/A | Y (10) | General purpose | Y (15) |
Notes: (1) Supports text files.
(2) Supports any file format.
(3) Supports Goby compressed file formats.
(4) Supports BAM.
(5) Provides a scripting language specialized for pipeline development.
(6) Pipelines can be constructed by connecting components graphically.
(7) Pipelines are optimized for analysis of large HTS datasets.
(8) For example, installs Perl on Windows, but not on other platforms.
(9) Data often needs to be installed on execution server.
(10) Relies on remote web services.
(11) Very few file formats support parallelization.
(12) Runs analyses on a single server.
(13) All analyses are fully parallel.
(14) Deploys to Amazon AWS for Hadoop.
(15) Some extensions support execution on grid.