Joshua Orvis, Jonathan Crabtree, Kevin Galens, Aaron Gussman, Jason M Inman, Eduardo Lee, Sreenath Nampally, David Riley, Jaideep P Sundaram, Victor Felix, Brett Whitty, Anup Mahurkar, Jennifer Wortman, Owen White, Samuel V Angiuoli.
Abstract
MOTIVATION: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users.
Year: 2010 PMID: 20413634 PMCID: PMC2881353 DOI: 10.1093/bioinformatics/btq167
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Architecture diagram showing process and data flow from pipeline creation in Ergatis, processing of wXML by Workflow Engine, job scheduling on a computational grid by SGE, and finally optional data loading into a Chado relational database instance.
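The data flow named in the Fig. 1 caption can be read as an ordered sequence of stages with an optional final step. The following is only an illustrative sketch of that ordering: the stage labels are taken from the caption, while `STAGES`, `OPTIONAL_LOAD` and `plan_flow` are hypothetical names that do not correspond to any Ergatis, Workflow Engine, SGE or Chado interface.

```python
# Stage labels follow the Fig. 1 caption; all identifiers are hypothetical
# and are not part of Ergatis, Workflow Engine, SGE or Chado.
STAGES = [
    "pipeline creation (Ergatis)",
    "wXML processing (Workflow Engine)",
    "job scheduling on the grid (SGE)",
]
OPTIONAL_LOAD = "data loading (Chado)"


def plan_flow(load_into_chado=False):
    """Return the ordered stages a pipeline run passes through."""
    flow = list(STAGES)  # copy so the module-level list is never mutated
    if load_into_chado:
        # The caption describes database loading as optional, so it is
        # appended only on request.
        flow.append(OPTIONAL_LOAD)
    return flow


if __name__ == "__main__":
    for step, stage in enumerate(plan_flow(load_into_chado=True), start=1):
        print(f"{step}. {stage}")
```

Calling `plan_flow()` with no arguments yields the three mandatory stages; passing `load_into_chado=True` appends the Chado step at the end.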
Selected Ergatis components by classification
| Component type | Count | Examples |
|---|---|---|
| Gene prediction | 14 | fgenesh, glimmer3, genscan, RNAmmer |
| HMM alignment | 4 | hmmpfam, panther |
| Sequence masking | 2 | repeatmasker, seg |
| Functional prediction | 12 | SignalP, tmhmm, pFunc |
| Phylogeny/binning | 3 | RDP, stap |
| Multiple alignment | 3 | clustalw, MUSCLE |
| Pairwise alignment | 14 | NCBI blast suite, WU-BLAST, BER |
Ergatis release v2.r12 currently contains 162 components that can be used to form complex bioinformatics analysis pipelines.