Tara M Madhyastha, Natalie Koh, Trevor K M Day, Moises Hernández-Fernández, Austin Kelley, Daniel J Peterson, Sabreena Rajan, Karl A Woelfer, Jonathan Wolf, Thomas J Grabowski.
Abstract
The contribution of this paper is to identify and describe current best practices for using Amazon Web Services (AWS) to execute neuroimaging workflows "in the cloud." Neuroimaging offers a vast set of techniques by which to interrogate the structure and function of the living brain. However, many of the scientists for whom neuroimaging is an extremely important tool have limited training in parallel computation. At the same time, the field is experiencing a surge in computational demands, driven by a combination of data-sharing efforts, improvements in scanner technology that allow acquisition of images with higher image resolution, and by the desire to use statistical techniques that stress processing requirements. Most neuroimaging workflows can be executed as independent parallel jobs and are therefore excellent candidates for running on AWS, but the overhead of learning to do so and determining whether it is worth the cost can be prohibitive. In this paper we describe how to identify neuroimaging workloads that are appropriate for running on AWS, how to benchmark execution time, and how to estimate cost of running on AWS. By benchmarking common neuroimaging applications, we show that cloud computing can be a viable alternative to on-premises hardware. We present guidelines that neuroimaging labs can use to provide a cluster-on-demand type of service that should be familiar to users, and scripts to estimate cost and create such a cluster.
Keywords: cloud computing; neuroimaging pipelines; reproducibility; workflow
Year: 2017 PMID: 29163119 PMCID: PMC5675877 DOI: 10.3389/fninf.2017.00063
Source DB: PubMed Journal: Front Neuroinform ISSN: 1662-5196 Impact factor: 4.081
Benchmarking matrix.
| Application benchmarked | Data type | EC2 instance type | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | m4.large | m4.xlarge | m4.4xlarge | c4.large | c4.xlarge | c4.4xlarge | c4.8xlarge | g2.2xlarge |
| FreeSurfer Recon-all (Downsampled Images) | Same | X | X | X | X | | | | |
| | Different | X | X | X | X | | | | |
| FreeSurfer Recon-all (High Resolution Images) | Same | X | X | X | X | | | | |
| | Different | X | X | X | X | | | | |
| FSL PROBTRACKX | Same | X | | | | | | | |
| | Different | | | | | | | | |
| FSL BEDPOSTX | Same | X | | | | | | | |
| | Different | X | | | | | | | |
| Neuropointillist | Same | X | X | X | X | X | X | | |
| | Different | X | X | X | X | X | X | | |
Ratio of execution time on the reference workstation to execution time on AWS, computed from the average of timings on all c4 and m4 instances together and on c4 and m4 instances separately. A ratio below 1 means the job ran faster on the workstation.
| Application | Workstation/AWS | Workstation/c4 | Workstation/m4 |
|---|---|---|---|
| Neuropointillist | 0.65 | 0.71 | 0.60 |
| FreeSurfer | 0.60 | 0.61 | 0.59 |
| FreeSurfer High Resolution Pipeline, Stage 2 | 0.63 | 0.64 | 0.60 |
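
The ratios above can be turned into a rough runtime and cost estimate for a batch of independent jobs. The following is a minimal sketch, not the paper's own cost-estimation script: the function names and the hourly price passed in are hypothetical placeholders, and it assumes one job per instance, billing proportional to runtime, and that the tabulated ratio holds for the workload in question.

```python
# Workstation/AWS execution-time ratios copied from the table above.
RATIOS = {
    "Neuropointillist": {"aws": 0.65, "c4": 0.71, "m4": 0.60},
    "FreeSurfer": {"aws": 0.60, "c4": 0.61, "m4": 0.59},
    "FreeSurfer High Resolution Pipeline, Stage 2": {"aws": 0.63, "c4": 0.64, "m4": 0.60},
}


def estimate_aws_hours(application, workstation_hours, family="aws"):
    """Estimate AWS runtime from a measured workstation runtime.

    The table reports ratio = workstation time / AWS time, so
    AWS time = workstation time / ratio.
    """
    return workstation_hours / RATIOS[application][family]


def estimate_cost(application, workstation_hours, n_jobs, price_per_hour, family="aws"):
    """Estimate total cost of n_jobs independent jobs, one job per instance.

    price_per_hour is an assumed value supplied by the caller; it is not
    taken from the paper and should be replaced with current AWS pricing.
    """
    hours_per_job = estimate_aws_hours(application, workstation_hours, family)
    return n_jobs * hours_per_job * price_per_hour


if __name__ == "__main__":
    # Hypothetical inputs: a 6-hour workstation job, 100 subjects, $0.10/hour.
    hours = estimate_aws_hours("FreeSurfer", 6.0, family="c4")
    cost = estimate_cost("FreeSurfer", 6.0, n_jobs=100, price_per_hour=0.10, family="c4")
    print(f"Estimated hours per job on c4: {hours:.2f}")
    print(f"Estimated total cost: ${cost:.2f}")
```

Because the jobs are independent, wall-clock time under these assumptions is roughly the per-job estimate when every job gets its own instance, while the total cost scales with the number of jobs.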