Tung Nguyen, Weisong Shi, Douglas Ruden.
Abstract
BACKGROUND: Research in genetics has advanced rapidly in recent years with the aid of next-generation sequencing (NGS). However, massively parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. Cloud computing combined with the MapReduce framework, which uses hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution to these issues. Consequently, it has recently been adopted by many organizations, and the initial results are encouraging. However, because these are only first steps in this direction, the cloud-based software developed so far does not provide primary functions, such as bisulfite and pair-end mapping, that are available in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and are therefore inefficient. Finally, most of these tools were developed on Linux with a command-line interface, which makes them difficult to use for the majority of biologists, who are not trained in programming.
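The idea of mapping reads with MapReduce can be illustrated with a minimal sketch. This is not CloudAligner's or CloudBurst's actual algorithm. It assumes a brute-force Hamming-distance scan, runs in a single process rather than on a Hadoop cluster, and all function names (`map_reads`, `reduce_hits`, `run_job`) are hypothetical. It only shows how the reads can be split into independent map tasks whose results are then grouped by read.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Read = Tuple[str, str]  # (read_id, sequence); hypothetical representation


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_reads(reads: Iterable[Read], reference: str,
              max_mismatches: int) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (read_id, position) for every acceptable placement."""
    for read_id, seq in reads:
        for pos in range(len(reference) - len(seq) + 1):
            window = reference[pos:pos + len(seq)]
            if hamming(seq, window) <= max_mismatches:
                yield read_id, pos


def reduce_hits(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Reduce phase: group the emitted positions by read id."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for read_id, pos in pairs:
        grouped[read_id].append(pos)
    return {read_id: sorted(hits) for read_id, hits in grouped.items()}


def split(items: List[Read], n_chunks: int) -> List[List[Read]]:
    """Divide the input into roughly equal chunks, one per map task."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def run_job(reads: List[Read], reference: str,
            max_mismatches: int = 1, n_maps: int = 2) -> Dict[str, List[int]]:
    """Run every map task in turn (a cluster would run them in parallel)."""
    emitted: List[Tuple[str, int]] = []
    for chunk in split(reads, n_maps):
        emitted.extend(map_reads(chunk, reference, max_mismatches))
    return reduce_hits(emitted)


if __name__ == "__main__":
    reads = [("r1", "ACGT"), ("r2", "CGTT"), ("r3", "GGGG")]
    print(run_job(reads, "ACGTACGTTT", max_mismatches=0, n_maps=2))
```

Because each map task only needs its own chunk of reads and a copy of the reference, the result does not depend on how many map tasks are used. Only the degree of parallelism changes.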
Year: 2011 PMID: 21645377 PMCID: PMC3127959 DOI: 10.1186/1756-0500-4-171
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. The MapReduce framework.
Figure 2. CloudAligner architecture.
Comparison of CloudAligner's features with those of its counterparts
| Feature | CloudAligner | CloudBurst | RMAP |
|---|---|---|---|
| Mismatch Mapping | ✓ | ✓ | ✓ |
| Bisulfite Mapping | ✓ | ✓ | |
| Pair-end Mapping | ✓ | ✓ | |
| Fastq input | ✓ | ✓ | |
| SAM output | ✓ | | |
| Executable in Cloud | ✓ | ✓ | |
This table summarizes the features of CloudAligner and other closely related tools.
Detailed configuration of the machines in the main testbed
| Type | Number of machines | CPU | Memory | HDD | OS |
|---|---|---|---|---|---|
| Server | 1 | 4-core AMD, 2 GHz | 6 GB | 250 GB | 64-bit Ubuntu Server 9.04 |
| Server | 12 | 1-core Intel Xeon, 2.80 GHz | 4 GB | 40 GB | 64-bit CentOS |
This table details the configuration of our main Hadoop testbed.
Figure 3. The performance of CloudBurst and CloudAligner on small data.
Figure 4. The performance of CloudBurst and CloudAligner on larger data.
Figure 5. The performance of CloudAligner and CloudBurst in Amazon Elastic MapReduce.
Figure 6. The effect of the number of maps on the performance of CloudAligner.
Figure 7. Pair-end mapping in Amazon EC2.
Figure 8. Bisulfite mapping in Amazon EC2.
Detailed configuration of the outdated machines used for the heterogeneity tests
| Number of machines | CPU | Memory | HDD | OS |
|---|---|---|---|---|
| 7 | 1-core Intel Xeon, 1.80 GHz | 512 MB | 160 GB | 32-bit CentOS |
| 21 | 1-core Intel Pentium III | 512 MB | 20 GB | 32-bit Ubuntu 8.04 |
This table details our local Hadoop testbed configuration for the heterogeneity experiment.