Ravi K Madduri, Dinanath Sulakhe, Lukasz Lacinski, Bo Liu, Alex Rodriguez, Kyle Chard, Utpal J Dave, Ian T Foster.
Abstract
We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing (NGS) genomic data. This system achieves a high degree of end-to-end automation that encompasses every stage of data analysis, including initial data retrieval from remote sequencing centers or storage (via the Globus file transfer system); specification, configuration, and reuse of multi-step processing pipelines (via the Galaxy workflow system); creation of custom Amazon Machine Images and on-demand resource acquisition via a specialized elastic provisioner (on Amazon EC2); and efficient scheduling of these pipelines over many processors (via the HTCondor scheduler). The system allows biomedical researchers to perform rapid analysis of large NGS datasets in a fully automated manner, without software installation or a need for any local computing infrastructure. We report performance and cost results for some representative workloads.
Keywords: Cloud; HPC; HTC; NGS; workflows
Year: 2014 PMID: 25342933 PMCID: PMC4203657 DOI: 10.1002/cpe.3274
Source DB: PubMed Journal: Concurr Comput ISSN: 1532-0626 Impact factor: 1.536