Vincent A Fusaro, Prasad Patil, Erik Gafni, Dennis P Wall, Peter J Tonellato.
Abstract
In this overview of biomedical computing in the cloud, we discussed two primary ways to use the cloud (a single instance or a cluster), provided a detailed example using NGS mapping, and highlighted the associated costs. While many users new to the cloud may assume that entry is as straightforward as uploading an application and selecting an instance type and storage options, we illustrated that substantial up-front effort is required before an application can make full use of the cloud's vast resources. Our intention was to provide a set of best practices and to illustrate how they apply to a typical application pipeline for biomedical informatics, while remaining general enough to extrapolate to other types of computational problems. Our mapping example was intended to illustrate how to develop a scalable project, not to compare and contrast alignment algorithms for read mapping and genome assembly. Indeed, with a newer aligner such as Bowtie, it is possible to map the entire African genome using one m2.2xlarge instance in 48 hours for a total cost of approximately $48 in computation time. In our example, we were not concerned with data transfer rates, which are heavily influenced by the amount of available bandwidth, connection latency, and network availability. When transferring large amounts of data to the cloud, bandwidth limitations can be a major bottleneck, and in some cases it is more efficient to simply mail a storage device containing the data to AWS (http://aws.amazon.com/importexport/). More information about cloud computing, detailed cost analysis, and security can be found in the references.
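The two quantitative points in the abstract, the single-instance compute cost and the bandwidth bottleneck, can be checked with back-of-the-envelope arithmetic. The following is a minimal sketch, not the authors' calculation: the $1.00 hourly rate is the m2.2xlarge price listed in the pricing table, the 230 GB data size is borrowed from the table's upload example, and the 10 and 100 Mbit/s link speeds are purely illustrative assumptions. The function names are hypothetical.

```python
def instance_cost(hours: float, price_per_hour: float, count: int = 1) -> float:
    """On-demand compute cost in dollars for `count` identical instances."""
    return hours * price_per_hour * count


def transfer_hours(size_gb: float, bandwidth_mbit_s: float) -> float:
    """Hours needed to move `size_gb` (decimal GB) over a sustained link.

    Ignores latency, protocol overhead, and outages, so the result is a
    lower bound on the real transfer time.
    """
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB = 8000 megabits
    return size_megabits / bandwidth_mbit_s / 3600


if __name__ == "__main__":
    # One m2.2xlarge instance for 48 hours at $1.00 per hour.
    print(f"Bowtie run: ${instance_cost(48, 1.00):.2f}")
    # ASSUMPTION: illustrative link speeds, not values from the paper.
    for mbit in (10, 100):
        print(f"230 GB at {mbit} Mbit/s: {transfer_hours(230, mbit):.1f} h")
```

At the assumed 10 Mbit/s, the 230 GB upload would take a little over two days of sustained transfer, which is the kind of situation where mailing a storage device may be worth considering.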
Year: 2011 PMID: 21901085 PMCID: PMC3161908 DOI: 10.1371/journal.pcbi.1002147
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Table 1. A summary of AWS pricing for basic computation, storage, and data transfer.
| Resource Type | Example Use | AWS Service | Service Unit | CPUs (# × GHz) | Memory (GB) | Cost ($/Hr) |
|---|---|---|---|---|---|---|
| | Running a 51-node cluster (50 m2.2xlarge workers and one m1.small master) for 8 hours costs | EC2 | t1.micro | 2×1 | 0.6 | 0.020 |
| | | | m1.small | 1×1 | 1.7 | 0.085 |
| | | | m1.large | 2×2 | 7.5 | 0.340 |
| | | | m1.xlarge | 4×2 | 15 | 0.680 |
| | | | c1.medium | 2×2.5 | 1.7 | 0.170 |
| | | | c1.xlarge | 8×2.5 | 7 | 0.680 |
| | | | m2.xlarge | 2×3.25 | 17.1 | 0.500 |
| | | | m2.2xlarge | 4×3.25 | 34.2 | 1.000 |
| | | | m2.4xlarge | 8×3.25 | 68.4 | 2.000 |
| | | | cc1.4xlarge | 2×(4×4.19) | 23 | 1.600 |
| | | | cg1.4xlarge | 2×(4×4.19) | 22 | 2.100 |
| | Maintaining 5 buckets (4×50 GB data files and 1×30 GB results) for 4 months costs | S3 | S3 Bucket | Virtually unlimited | First 1 TB | 0.140 |
| | | | | | Next 49 TB | 0.125 |
| | | | | | Next 450 TB | 0.110 |
| | | | | | Next 500 TB | 0.095 |
| | | | | | Next 4000 TB | 0.080 |
| | | | | | 5,000+ TB | 0.055 |
| | Attaching 3 EBS volumes to an instance (2×100 GB and 1×30 GB) for 1 month costs | EBS | EBS Volume | Up to 1 TB | N/A | 0.100 |
| | Uploading 230 GB of data to S3 and downloading 12 GB of results costs | EC2, S3 | I/O | Data IN | | 0.000 (free) |
| | | | | Data OUT, first 1 GB | | 0.000 (free) |
| | | | | Data OUT, next 10 TB | | 0.120 |
| | | | | Data OUT, next 40 TB | | 0.090 |
| | | | | Data OUT, next 100 TB | | 0.070 |
| | | | | Data OUT, 150 TB+ | | 0.050 |
| | | | | Between AWS Services | | 0.000 |
| | | EBS | I/O | Per 1 million I/O requests | | 0.100 |
| | | S3 | API Request | PUT, COPY, POST, LIST Request | | 0.01 (per 1,000) |
| | | | | GET Request | | 0.01 (per 10,000) |
Prices are current as of 7/05/11.
CPUs are in terms of a 1-GHz Opteron 2007 processor, unless otherwise noted. For example, a machine with four 1-GHz processors would be listed as 4×1.
cc1.4xlarge: the CPU is a quad-core Xeon X5570, i.e., the instance has two quad-core CPUs, where each core is 4.19 GHz.
cg1.4xlarge: the CPU is a quad-core Xeon X5570, and the instance includes two NVIDIA Tesla "Fermi" M2050 GPUs.
Costs reflect standard EC2 use with Linux OS. Costs increase when using Windows and decrease when using Reserved Instances (up-front payment) or Spot Instances (user-specified price on unused EC2 capacity).
Between AWS services: applies within the same AWS region (e.g., AWS US-East).
Request costs are more difficult to estimate and are usually more pertinent when databases and similar services are involved. Programs such as iostat can be used to estimate EBS requests.
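The four example uses in the table can be worked through with a few lines of arithmetic. This is a minimal sketch rather than the authors' calculation: it assumes that S3 and EBS storage prices are charged per GB per month, that data transfer prices are per GB, and that 1 TB = 1024 GB. The function and variable names are hypothetical, and request charges are not modeled.

```python
# Prices copied from the pricing table (current as of 7/05/11).
EC2_PER_HOUR = {"m1.small": 0.085, "m2.2xlarge": 1.000}
EBS_PER_GB_MONTH = 0.100  # ASSUMPTION: EBS volume price is per GB-month

TB = 1024  # ASSUMPTION: 1 TB = 1024 GB

# Tiers as (tier size in GB, price per GB); None marks the open-ended tier.
# ASSUMPTION: S3 prices are per GB-month and transfer prices are per GB.
S3_TIERS = [(1 * TB, 0.140), (49 * TB, 0.125), (450 * TB, 0.110),
            (500 * TB, 0.095), (4000 * TB, 0.080), (None, 0.055)]
DATA_OUT_TIERS = [(1, 0.000), (10 * TB, 0.120), (40 * TB, 0.090),
                  (100 * TB, 0.070), (None, 0.050)]


def tiered_cost(amount_gb, tiers):
    """Charge each slice of `amount_gb` at the price of the tier it falls in."""
    total, remaining = 0.0, amount_gb
    for size, price in tiers:
        used = remaining if size is None else min(remaining, size)
        total += used * price
        remaining -= used
        if remaining <= 0:
            break
    return total


def cluster_cost(hours, instances):
    """Compute cost for a dict of {instance type: count} run for `hours`."""
    return sum(EC2_PER_HOUR[name] * count * hours
               for name, count in instances.items())


def ebs_cost(volume_sizes_gb, months):
    """Storage cost for a list of EBS volume sizes kept for `months`."""
    return sum(volume_sizes_gb) * EBS_PER_GB_MONTH * months


if __name__ == "__main__":
    cluster = cluster_cost(8, {"m2.2xlarge": 50, "m1.small": 1})
    s3 = tiered_cost(4 * 50 + 30, S3_TIERS) * 4   # 230 GB kept for 4 months
    ebs = ebs_cost([100, 100, 30], months=1)
    transfer = tiered_cost(12, DATA_OUT_TIERS)    # the 230 GB upload is free
    for label, cost in [("Cluster", cluster), ("S3", s3),
                        ("EBS", ebs), ("Transfer", transfer)]:
        print(f"{label}: ${cost:.2f}")
```

Under these assumptions the script prints roughly $400.68 for the cluster, $128.80 for S3 storage, $23.00 for EBS, and $1.32 for data transfer. These are outputs of the sketch, not figures quoted from the paper, and they do not reflect Reserved or Spot Instance discounts.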
Figure 1. Step-wise framework for creating a scalable NGS computing application.
Using your local computer, ssh into an instance running in AWS. The costs are representative of actual development time, data transfer into and out of the cloud, and the compute time using AWS (Table 1). The costs presented may vary, as AWS frequently updates its pricing structure. (A) For the prototyping phase, an additional 3 hours were included for installing programs and testing the instance. (B) In developing the scalable application, an additional 2 hours were included for learning how to use the cluster management software. (C) For the final scaled application, we used a 38-instance cluster.