Ivo D Dinov, Federica Torri, Fabio Macciardi, Petros Petrosyan, Zhizhong Liu, Alen Zamanyan, Paul Eggert, Jonathan Pierce, Alex Genco, James A Knowles, Andrew P Clark, John D Van Horn, Joseph Ames, Carl Kesselman, Arthur W Toga.
Abstract
BACKGROUND: Contemporary informatics and genomics research requires efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed data and analysis-protocol provenance. The Pipeline is a client-server distributed computational environment that facilitates the graphical construction, execution, monitoring, validation and dissemination of advanced data analysis protocols.
Year: 2011 PMID: 21791102 PMCID: PMC3199760 DOI: 10.1186/1471-2105-12-304
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Comparison of common graphical workflow environments
| Workflow Environment | Requires Tool Recompiling | Data Storage | Platform Independent | Client-Server Model | Grid Enabled | Application Area | URL |
|---|---|---|---|---|---|---|---|
| LONI Pipeline | N | External | Y | Y | Y | Area agnostic | |
| Taverna | Y | Internal (MIR) | Y | N | Y | Bioinformatics | |
| Kepler | Y | Internal (actors) | Y | N | Y | Area agnostic | |
| Triana | Y | Internal data structure | Y | N | Y | Heterogeneous Apps | |
| Galaxy | N | External | N | Y | N | Bioinformatics | |
| Pipeline Pilot | Y | Internal | Y | N | N | Biochemistry | |
| AVS | Y | Internal | Y | N | N | Advanced Visualization | |
| VisTrails | Y | Internal | N | N | N | Scientific Visualization | |
| Bioclipse | N | Internal | Y | N | N | Biochemistry, Bioinformatics | |
Y = yes, N = no; Taverna MIR plug-in, MIR = myGrid Information Repository; DRMAA = Distributed Resource Management Application API.
Figure 1. An example of a completed Pipeline workflow (Local Shape Analysis) representing an end-to-end computational solution to a specific brain mapping problem. This pipeline protocol starts with the raw magnetic resonance imaging data for 2 cohorts (11 Alzheimer's disease patients and 10 age-matched normal controls). For each subject, the workflow automatically extracts a region of interest (left superior frontal gyrus, LSFG, using BrainParser [1]) and generates a 2D shape manifold model of the regional boundary [2,3]. The pipeline then computes a mean LSFG shape from the normal subjects' LSFG shapes, coregisters the LSFG shapes of all subjects to the mean (atlas) LSFG shape, and maps the locations of the statistically significant differences of the 3D displacement vector fields between the 2 cohorts. The insert images illustrate the mean LSFG shape (top-right), the LSFG for one subject (bottom-left), and the between-group statistical mapping results overlaid on the mean LSFG shape (bottom-right); red indicates p-value < 0.01.
Figure 2. High-level schematic representation of the communication between multiple local Pipeline clients connected to multiple remote Pipeline servers.
An example of a hierarchical alignment and assembly protocol specification
| Input: alignment.bam file |
| Tool: samtools (rmdup) |
| Server Location: /applications/samtools-0.1.7_x86_64-linux |
| Output: alignment.rmdup.bam file |
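
A single step of such a protocol specification (input, tool, server location, output) maps naturally onto a command line. The following is a minimal Python sketch of that mapping for the duplicate-removal step listed above. The `ProtocolStep` class and its field names are hypothetical, and the sketch assumes that the `samtools` executable sits directly inside the server location directory; neither is taken from the Pipeline software itself. The `samtools rmdup <input.bam> <output.bam>` argument order follows the samtools 0.1.x usage.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class ProtocolStep:
    """One row group of the protocol table (hypothetical structure)."""
    tool: str             # executable name, e.g. "samtools"
    subcommand: str       # tool sub-command, e.g. "rmdup"
    server_location: str  # directory on the server holding the tool
    input_path: str
    output_path: str

    def command(self) -> List[str]:
        # Assumption: the executable lives directly in server_location.
        executable = f"{self.server_location}/{self.tool}"
        return [executable, self.subcommand, self.input_path, self.output_path]


# Values copied from the protocol fragment above.
step = ProtocolStep(
    tool="samtools",
    subcommand="rmdup",
    server_location="/applications/samtools-0.1.7_x86_64-linux",
    input_path="alignment.bam",
    output_path="alignment.rmdup.bam",
)

# Print the shell-quoted command line without running it.
print(shlex.join(step.command()))
```

Keeping the step as structured data rather than a raw string makes it straightforward to chain steps, since the output path of one step can be passed as the input path of the next.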
This protocol is implemented as a Pipeline graphical workflow (Figure 3) and demonstrated in the Results section.
Figure 3. A high-level group-folded representation of the alignment and assembly protocol (Table 2) as a Pipeline graphical workflow.
Figure 4. A snapshot of the input parameters (data-sinks) for the miBLAST Pipeline workflow.
Figure 5. A snapshot of the completed miBLAST Pipeline workflow. The insert image illustrates the final output (see Table 3).
A fragment of the output from the miBLAST pipeline workflow (see Figure 5)
Figure 6. A snapshot of the input parameters for the EMBOSS Matcher Pipeline workflow.
Figure 7. A snapshot of the completed EMBOSS Matcher Pipeline workflow. The insert image shows the output of the local sequence alignment of hba_human and hbb_human.
A fragment of the text output of EMBOSS Matcher (see Figure 7)
| ######################################## |
| # Program: matcher |
| # Rundate: Tue 2 Nov 2010 15:50:56 |
| # Commandline: matcher |
| # -asequence tsw:hba_human |
| # -bsequence tsw:hbb_human |
| # -outfile /panfs/tmp/pipeline-edu/pipelnvr/2010November02_15h50m48s631ms/Matcher_1.Output-1.matcher |
| # Align_format: markx0 |
| # Report_file: /panfs/tmp/pipeline-edu/pipelnvr/2010November02_15h50m48s631ms/Matcher_1.Output-1.matcher |
| ######################################### |
| #======================================= |
| # |
| # Aligned_sequences: 2 |
| # 1: HBA_HUMAN |
| # 2: HBB_HUMAN |
| # Matrix: EBLOSUM62 |
| # Gap_penalty: 14 |
| # Extend_penalty: 4 |
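
The report header above follows a simple `# Key: value` convention, so downstream workflow modules can read the alignment settings back out of the text. The sketch below is one possible way to do this in Python; the function name `parse_report_header` and the regular expression are illustrative assumptions and are not part of EMBOSS or the Pipeline. The sample text is an excerpt of the fragment above, with the long file paths left out.

```python
import re
from typing import Dict

# Matches "# Key: value" lines; keys are letters and underscores only,
# so command-line arguments ("# -asequence ...") and numbered
# sequence lines ("# 1: HBA_HUMAN") are skipped.
HEADER_FIELD = re.compile(r"^#\s*([A-Za-z_]+):\s*(.*)$")


def parse_report_header(text: str) -> Dict[str, str]:
    """Collect the '# Key: value' fields of a Matcher-style header."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        match = HEADER_FIELD.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


SAMPLE = """\
# Program: matcher
# Rundate: Tue 2 Nov 2010 15:50:56
# Commandline: matcher
#    -asequence tsw:hba_human
#    -bsequence tsw:hbb_human
# Align_format: markx0
#=======================================
#
# Aligned_sequences: 2
# 1: HBA_HUMAN
# 2: HBB_HUMAN
# Matrix: EBLOSUM62
# Gap_penalty: 14
# Extend_penalty: 4
"""

fields = parse_report_header(SAMPLE)
print(fields["Matrix"], fields["Gap_penalty"], fields["Extend_penalty"])
```

All values are returned as strings; numeric fields such as `Gap_penalty` would need an explicit `int(...)` conversion before use.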
Figure 8. A snapshot of the input parameters for the mrFAST Indexing Pipeline workflow.
Figure 9. A snapshot of the completed mrFAST Indexing Pipeline workflow.
A fragment of the text output of mrFAST Indexing workflow (see Figure 9)
Figure 10. A snapshot of the input parameters for the GWASS Impute Pipeline workflow.
Figure 11. A snapshot of the completed GWASS Impute Pipeline workflow.
A fragment of the text output of GWASS Impute workflow (see Figure 11)
Figure 12. Pipeline Server Library.
Figure 13. A snapshot of the input parameters for this heterogeneous Pipeline workflow. EMBOSS: tsw:hba_human, tsw:hbb_human; mrFAST: cofold-blue.fasta, query.fasta, dna.fasta.
Figure 14. A snapshot of the completed heterogeneous (EMBOSS/mrFAST) Pipeline workflow.
A fragment of the text output of this heterogeneous pipeline workflow (see Figure 14)
Figure 15. A snapshot of the input parameters for this heterogeneous Pipeline workflow.
Figure 16. A snapshot of the completed heterogeneous Pipeline workflow. The image shows the expanded (raw, unfolded) version of the protocol, which is analogous to the folded version of the same pipeline workflow illustrated in Figure 3. The folded version shows only the major steps in the protocol and abstracts away some of the technical details; however, both versions of this protocol perform identical analyses.
A fragment of the text output of this heterogeneous pipeline workflow (see Figure 16)