Lance Feagan, Justin Rohrer, Alexander Garrett, Heather Amthauer, Ed Komp, David Johnson, Adam Hock, Terry Clark, Gerald Lushington, Gary Minden, Victor Frost.
Abstract
This paper presents the Bioinformatics Computational Journal (BCJ), a framework for conducting and managing computational experiments in bioinformatics and computational biology. These experiments often involve series of computations, data searches, filters, and annotations, which can benefit from a structured environment. Systems to manage computational experiments exist, ranging from libraries with standard data models to elaborate schemes that chain together input and output between applications. Yet, although such frameworks are available, their use is not widespread, and ad hoc scripts are often required to bind applications together. The BCJ explores another solution to this problem through a computer-based environment suitable for on-site use, which builds on the traditional laboratory notebook paradigm. It provides an intuitive, extensible paradigm designed for expressive composition of applications. Extensive features facilitate sharing data, computational methods, and entire experiments. By focusing on the bioinformatics and computational biology domain, the scope of the computational framework was narrowed, permitting us to implement a capable set of features for this domain. This report discusses the features deemed critical in our system and in other projects, along with design issues. We illustrate the use of our implementation of the BCJ on two domain-specific examples.
Year: 2007 PMID: 18053179 PMCID: PMC2228283 DOI: 10.1186/1751-0473-2-9
Source DB: PubMed Journal: Source Code Biol Med ISSN: 1751-0473
Figure 1. Sample BCJ Session.
Execution capabilities of analyzed systems
| Tool | Cluster-Based Execution | Workflow/Pipeline | Web Services | Internal Tool Extensibility | External Tool Extensibility |
| PathPort | No | No | Yes | Yes (Web Services) | Yes |
| BioCoRE | Yes | No | No | No | No |
| PISE | No | Pipelines | No | Yes (XML) | No |
| MIGenAS | Not locally | Pipelines | No | No | No |
| Pegasys | Yes | Workflow | No | Yes (XML) | No |
| Wildfire | Yes | Workflow | Yes | No | Yes |
| Taverna | Yes (Grid Based) | Workflow | Yes | Yes (Web Services) | No |
| BCJ | Yes | Workflow | No | Yes (XML) | Yes |
Workflow design of analyzed systems
| Tool | Web Interface Capabilities | Drag-and-Drop Creation | Iteration/Conditional Branching |
| PathPort | No | No | No |
| BioCoRE | Yes | No | No |
| PISE | Yes | No | No |
| MIGenAS | Yes | No | No |
| Pegasys | No | No | No |
| Wildfire | Yes (Client Applet) | Yes | Yes |
| Taverna | No | No | Yes |
| BCJ | No | Yes | No |
Data management properties of analyzed systems
| Tool | Data Management | Workflow (Experiment) Management | Searching: Data | Searching: Workflows | Data Export/Import to/from Other Tools |
| PathPort | VBI Curated Data | No | Yes | No | Yes |
| BioCoRE | Yes (BioFS) | No | No | No | Yes |
| PISE | No | No | No | No | Yes |
| MIGenAS | No | No | No | No | Yes |
| Pegasys | No | Yes | No | No | Yes |
| Wildfire | No | Yes | No | No | Yes |
| Taverna | No | Yes | No | No | Yes |
| BCJ | Yes | Yes | Yes | Yes | Yes |
Additional features of analyzed systems
| Tool | Data Provenance | Experiment Repeatability | Group Access | Encryption | Collaboration | Annotation |
| PathPort | No | No | No | No | Yes (Exporting Data) | No |
| BioCoRE | No | No | Yes | Yes | Yes | Yes |
| PISE | No | No | No | No | Yes (Exporting Workflows) | No |
| MIGenAS | No | No | No | No | No | No |
| Pegasys | No | No | No | No | No | No |
| Wildfire | No | No | No | No | Yes (Exporting Workflows) | Yes |
| Taverna | Yes | No | No | No | Yes (Exporting Workflows) | Yes |
| BCJ | Yes | Yes | Yes | Yes | Yes (Natively) | Yes |
Resources currently defined in BCJ
| Sequence search and alignment | SimpleBlastall, blastn, blastp, ClustalW, extractSequences |
| HMMER (biosequence analysis using hidden Markov models) | hmmAlign, hmmBuild, hmmCalibrate, hmmSearch, hmmit, hmmer2sam |
| GROMACS (a molecular dynamics package) | editconf, genbox, grompp, mdrun, pdb2gmx |
| GLIMMER (a system for finding genes in microbial DNA) | glimmer |
| GrailEXP (predicts exons, genes, repeats, and CpG islands) | grailAlign, grailCPG, grailExon, grailGeneAssembly, grailRepeats |
| EMBOSS (The European Molecular Biology Open Software Suite) | backtranseq, banana, bl2seq, btwisted, cai, chips, codcmp, coderet, compseq, cpgreport, distmat, einverted, equicktandem, fuzznuc, garnier, geecee, iep, marscan, msbar, newcoils, newcpgseek, octanol, palindrome, pepcoil, pepinfo, pepstats, primersearch, profit, prophecy, prophet, recoder, redata, shuffleseq, silent, water |
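To make the idea of composing resources into a workflow concrete, the following is a minimal sketch, not the BCJ's actual implementation or API. It models a workflow as a dependency graph and derives an execution order in which every step runs after the steps whose output it consumes. The step names come from the resource table above; the wiring between them, the `workflow` dictionary, and the `execution_order` helper are assumptions made purely for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each step maps to the set of steps it depends on.
# The names are resources listed in the table above; the connections
# between them are an illustrative assumption, not taken from the paper.
workflow = {
    "ClustalW": set(),
    "hmmBuild": {"ClustalW"},
    "hmmCalibrate": {"hmmBuild"},
    "hmmSearch": {"hmmCalibrate"},
}


def execution_order(steps):
    """Return an order in which every step follows all of its dependencies."""
    return list(TopologicalSorter(steps).static_order())


order = execution_order(workflow)
print(order)
```

Because this example is a simple chain, only one order is valid. For a branching workflow several orders may be valid, and a cyclic definition raises `graphlib.CycleError`, which is one simple way to reject an ill-formed workflow before anything is run.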
Figure 2. Complete Experimental Workflow.
Figure 3. HMMER Workflow.
Figure 4. Dependency View.
Figure 5. Experiment Definition.
Figure 6. Experiment Results.
Figure 7. Adding an Annotation to the Experiment.
Figure 8. MD Simulation Workflow.
Figure 9. Tutorial Experiment Workflow.
Figure 10. Experiment Definition.