Enis Afgan, Dannon Baker, Marius van den Beek, Daniel Blankenberg, Dave Bouvier, Martin Čech, John Chilton, Dave Clements, Nate Coraor, Carl Eberhard, Björn Grüning, Aysam Guerler, Jennifer Hillman-Jackson, Greg Von Kuster, Eric Rasche, Nicola Soranzo, Nitesh Turaga, James Taylor, Anton Nekrutenko, Jeremy Goecks.
Abstract
High-throughput data production technologies, particularly 'next-generation' DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report, we highlight recently added features enabling biomedical analyses on a large scale.
Year: 2016 PMID: 27137889 PMCID: PMC4987906 DOI: 10.1093/nar/gkw343
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Galaxy analysis interface consisting of the tool menu (left pane), tool interface (center pane), and history (right pane).
Figure 2. Galaxy's graphical workflow editor, showing part of a sample workflow.
Figure 3. Multi-history viewer. Datasets can be copied among histories by dragging. Here one can also see the results of the dynamic search functionality: in the history PNAS 2014, the search bar contains a partial keyword, Mark, causing the history to refresh and show only datasets produced by the tool MarkDuplicates. Because this particular history is very large (thousands of items), this functionality greatly simplifies analyses.
Figure 4. (A) Dataset collections simplify analysis of large numbers of files. A Galaxy history with a paired-end DNA re-sequencing dataset from 28 individuals contains 56 files (each green box is a file). This history is difficult to understand because there are so many files and because forward (R1) and reverse (R2) reads are unordered. As these files are analyzed and more output files are created, it becomes very difficult to navigate the history and to understand how files are connected as inputs and outputs of particular tools or analyses. Dataset collections make analysis of this mix of files straightforward by grouping all files into a collection that can be analyzed as a single unit. This example demonstrates using collections with paired-end data, but collections can be created for any set of files. (B) Creation of a paired collection from the history shown in panel A. Because dataset names use a uniform nomenclature for forward and reverse reads, the collection creation form can automatically determine pairings. (C) Pairing these datasets generates a single item (a Collection) in Galaxy's history. (D) Clicking on this newly created Collection expands it and shows its content (only the first three datasets are shown). (E) Galaxy's BWA interface takes the entire dataset collection as a single input.
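The automatic pairing described in panel B can be illustrated with a minimal sketch. This is not Galaxy's implementation: the naming convention (`<sample>_R1.fastq` / `<sample>_R2.fastq`), the function name `pair_datasets`, and the sample names are all assumptions made for illustration. The sketch only shows how a uniform nomenclature lets forward and reverse reads be matched regardless of their order in the history.

```python
import re
from collections import defaultdict

# Hypothetical naming convention (an assumption, not taken from Galaxy):
# "<sample>_R1.fastq" is the forward read, "<sample>_R2.fastq" the reverse read.
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq$")


def pair_datasets(names):
    """Group dataset names into forward/reverse pairs keyed by sample.

    Returns (paired, unpaired):
      paired   -- {sample: {"forward": name, "reverse": name}}
      unpaired -- names that do not match the pattern or have no mate
    """
    by_sample = defaultdict(dict)
    unpaired = []
    for name in names:
        match = READ_PATTERN.match(name)
        if match is None:
            unpaired.append(name)  # name does not follow the convention
            continue
        role = "forward" if match.group("read") == "1" else "reverse"
        by_sample[match.group("sample")][role] = name

    paired = {}
    for sample, reads in sorted(by_sample.items()):
        if len(reads) == 2:
            paired[sample] = reads  # both mates were found
        else:
            unpaired.extend(reads.values())  # mate is missing
    return paired, unpaired


if __name__ == "__main__":
    # Unordered input, as in the history shown in panel A.
    history = ["M117_R2.fastq", "M512_R1.fastq", "M117_R1.fastq", "M512_R2.fastq"]
    paired, unpaired = pair_datasets(history)
    for sample, reads in paired.items():
        print(sample, reads["forward"], reads["reverse"])
```

Under this assumed convention, the 56 files from 28 individuals would collapse into 28 pairs, which is the single-unit view that panels C and D depict.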
Figure 5. Selection panel for Galaxy numerical visualizations showing the variety of plots that can be created.