Peter J A Cock, Björn A Grüning, Konrad Paszkiewicz, Leighton Pritchard.
Abstract
The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large-scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of "effector" proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen's predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools as well as widely used third-party tools, such as NCBI BLAST+, wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism-scale analyses as workflows, without demanding familiarity with command-line tools and scripting. Workflows created using Galaxy can be saved and reused, so they may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu).
Keywords: Accessibility; Annotation; Effector proteins; Galaxy; Genomics; Pipeline; Reproducibility; Sequence analysis; Workflow
Year: 2013 PMID: 24109552 PMCID: PMC3792188 DOI: 10.7717/peerj.167
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Schematic of effector infiltration into a host plant cell.
Plant pests and pathogens introduce effector proteins into the host plant cell (green), where they can target and manipulate plant biochemistry to the benefit of the pathogen (Dodds & Rathjen, 2010). Effectors may be delivered by haustorial ingression from a fungus or oomycete such as P. infestans (orange), via the bacterial Type III secretion system (blue), by nematodes via injection into the plant cell through a needle-like stylet (red), or by many other processes (not illustrated). Where effectors can be identified by their sequence properties, candidate effector proteins can be predicted computationally using Galaxy.
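One sequence property of this kind is the oomycete R-x-L-R motif, which the "Oomycete RXLR motifs" tool in the table of tools targets. The sketch below shows how a motif scan might look. It rests on stated assumptions and is not the Galaxy tool's implementation. The 30 to 60 residue search window is a hypothetical choice. The sketch also skips the signal peptide check that real RXLR prediction typically requires, for example using SignalP.

```python
import re

# Zero-width lookahead so overlapping motifs (e.g. both in "RALRALR") are reported.
RXLR = re.compile(r"(?=R.LR)")

def find_rxlr(protein: str, start: int = 30, end: int = 60) -> list[int]:
    """Return 1-based start positions of R-x-L-R motifs lying wholly within
    residues start..end (inclusive).

    The default window is a hypothetical illustration, not a published cutoff,
    and no signal peptide check is performed here.
    """
    seq = protein.upper()
    hits = []
    for m in RXLR.finditer(seq):
        pos = m.start() + 1  # convert 0-based index to 1-based residue number
        if pos >= start and pos + 3 <= end:
            hits.append(pos)
    return hits

seq = "RALR" + "A" * 30 + "RLLR" + "A" * 30 + "RELR"
print(find_rxlr(seq))  # [35]; the motifs at residues 1 and 69 fall outside the window
```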
Figure 2. Screenshot of the Galaxy PSORTb v3.0 wrapper.
The left-hand pane (A) holds a menu of tools that the Galaxy administrator can configure. The central pane (B) shows the currently selected tool or dataset, here PSORTb. The right-hand pane (C) holds the current datasets and is empty in this example. The tool interface presents the user with familiar controls as defined in the tool configuration file: drop-down lists (in this example a file selector and other parameters), radio buttons, check boxes, or text boxes. For PSORTb, the text input boxes accept only numeric values. The tool input parameters are followed by a blue "Execute" button that runs the tool when clicked. Below this, user documentation and citation information are provided.
Summary of Galaxy tools, wrappers for existing tools, and sample workflows discussed in this manuscript.
Some Galaxy tools are new pieces of software written specifically for use within Galaxy, while others are wrappers allowing an existing tool to be used within Galaxy. Most of the wrapped tools are freely available under an open source license; however, those marked ⋆ are proprietary but free for academic use only, while those marked † are free to download but have unspecified terms. Galaxy workflows are saved recipes or pipelines that automate running one or more Galaxy tools. See Materials and Methods for more details.
| Galaxy tool or workflow | Type |
|---|---|
| Standalone NCBI BLAST+ tools | Wrappers |
| BLAST datatype definitions (BLAST XML, databases) | Datatypes |
| BLAST top hit descriptions | New tool |
| Blast2GO for pipelines (b2g4pipe) | Wrapper† |
| MIRA assembler | Wrapper |
| Augustus, for eukaryotic gene finding | Wrapper |
| Glimmer3, for prokaryotic gene finding | Wrappers |
| RepeatMasker, for screening DNA sequences | Wrapper |
| InterProScan | Wrapper |
| SignalP, for signal peptide prediction | Wrapper⋆ |
| TMHMM, for transmembrane domain prediction | Wrapper⋆ |
| PSORTb, for bacterial/archaeal proteins | Wrapper |
| WoLF PSORT, for fungi/animal/plant proteins | Wrapper⋆ |
| Promoter, for eukaryotic PolII promoters | Wrapper⋆ |
| Oomycete RXLR motifs | New tool |
| PredictNLS, for nuclear localization sequence prediction | Rewrite |
| NLStradamus, a nuclear localization sequence predictor | Wrapper |
| NoD, nucleolar localization sequence detector | Wrapper† |
| EffectiveT3, predicts bacterial type III secretion signals | Wrapper |
| Venn diagrams from (gene) identifier lists | New tool |
| Filter sequences by (gene) identifier | New tool |
| Select sequences by (gene) identifier | New tool |
| Rename identifiers in sequence files | New tool |
| FASTQ deinterlacer for paired reads | New tool |
| Open reading frame (ORF) and crude coding sequence (CDS) prediction | New tool |
| Glimmer gene calling with training set | Workflow |
| Secreted proteins using SignalP and TMHMM | Workflow |
| Venn diagram comparison of oomycete RXLR predictions | Workflow |
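Several of the new tools in this table work on sequence identifiers, filtering, selecting or renaming records. The sketch below illustrates only the filtering idea for FASTA input. It assumes a record's identifier is the first whitespace-delimited word of its title line. The function names are hypothetical, and this is not the Galaxy tool's code.

```python
from typing import Iterable, Iterator

def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (title, sequence) pairs from FASTA-formatted lines."""
    title, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:  # ignore any text before the first record
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)

def filter_by_id(records, wanted: set[str]):
    """Split records into (matching, non_matching) lists by identifier,
    taken here to be the first whitespace-delimited word of the title."""
    matching, non_matching = [], []
    for title, seq in records:
        ident = title.split(None, 1)[0] if title.strip() else ""
        (matching if ident in wanted else non_matching).append((title, seq))
    return matching, non_matching

fasta = ">gene1 some description\nACGT\nACGT\n>gene2\nTTTT\n>gene3 other\nGG\n"
kept, dropped = filter_by_id(read_fasta(fasta.splitlines()), {"gene1", "gene3"})
print([t.split()[0] for t, _ in kept])  # ['gene1', 'gene3']
```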
Figure 3. Screenshot from the Galaxy workflow editor illustrating the Glimmer3 gene-finding example discussed in "Basic assembly and gene calling".
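Gene finders such as Glimmer3 score candidate genes using a statistical model trained on example genes, as in the training-set workflow listed in the table. The much cruder "ORF and CDS prediction" idea from the table can be illustrated as a scan for ATG-initiated open reading frames that end in a stop codon. This is a simplified sketch, not the Galaxy tool's implementation. It assumes the standard genetic code and scans only the forward strand. It keeps the first ATG in each stop-delimited stretch and uses a hypothetical default minimum length.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons

def find_orfs(dna: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG-initiated ORFs on the forward strand.

    start/end are 0-based with end exclusive, and the span includes the stop
    codon. ORFs without an in-frame stop before the sequence ends are skipped.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop gives the longest ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs

print(find_orfs("CCATGAAATTTTAGGG", min_codons=2))  # [(2, 2, 14)]
```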
Figure 4. Screenshot from the Galaxy workflow editor illustrating the simple RXLR and Venn diagram example discussed in "Simple comparison of RXLR predictions".
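Comparing the output of several prediction methods, as in Figure 4, reduces to set operations on identifier lists. The sketch below computes the disjoint regions of a Venn diagram, meaning the identifiers found in exactly a given combination of lists. Drawing the diagram is left out. The names A, B and C stand in for hypothetical prediction methods.

```python
from itertools import combinations

def venn_regions(sets: dict[str, set[str]]) -> dict[frozenset[str], set[str]]:
    """Map each non-empty combination of set names to the identifiers found
    in exactly those sets (the disjoint regions of a Venn diagram)."""
    names = list(sets)
    regions = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            inside = set.intersection(*(sets[n] for n in combo))
            outside = set().union(*(sets[n] for n in names if n not in combo))
            regions[frozenset(combo)] = inside - outside
    return regions

regions = venn_regions({"A": {"a", "b", "c"}, "B": {"b", "c", "d"}, "C": {"c"}})
print({"".join(sorted(k)): sorted(v) for k, v in regions.items() if v})
# {'A': ['a'], 'B': ['d'], 'AB': ['b'], 'ABC': ['c']}
```

Because the regions are disjoint, their sizes sum to the size of the union of all lists. This gives a quick sanity check on the counts shown in a diagram.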