Exploiting parallel R in the cloud with SPRINT.

M Piotrowski, G A McGilvary, T M Sloan, M Mewissen, A D Lloyd, T Forster, L Mitchell, P Ghazal, J Hill.

Abstract

BACKGROUND: Advances in DNA microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in available data, but exploiting the resulting opportunities requires adequate computing resources. High Performance Computing (HPC) in the cloud offers an affordable way of meeting this need.
OBJECTIVES: Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language, but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that gives genomics researchers easy access to HPC. This paper investigates three questions: how to set up and run SPRINT-enabled genomic analyses on Amazon's Elastic Compute Cloud (EC2); what the advantages are of submitting applications to EC2 from different parts of the world; and whether resource underutilization can improve application performance.
METHODS: The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various sizes on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences.
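Permutation testing, one of the benchmarked functions, parallelises naturally because each permutation of the sample labels is independent: the permutations can be divided among workers and the per-worker counts of extreme statistics summed. The following Python sketch illustrates that general pattern only. It is not SPRINT's code (SPRINT runs inside R), and the difference-in-means statistic, the even split of permutations, the per-worker seeding and the names `parallel_permutation_test`, `split_work` and `count_extreme` are all hypothetical choices made for illustration.

```python
# Hypothetical sketch of a permutation test whose permutations are split
# across workers; illustrative only, not SPRINT's implementation.
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from statistics import fmean


def mean_diff(values, labels):
    """Test statistic (assumed): mean of group 1 minus mean of group 0."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    return fmean(g1) - fmean(g0)


def split_work(total, workers):
    """Divide `total` permutations as evenly as possible over `workers`."""
    base, extra = divmod(total, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


def count_extreme(job):
    """Worker task: shuffle labels n_perm times and count |stat| >= |observed|."""
    values, labels, observed, n_perm, seed = job
    rng = random.Random(seed)  # independent, reproducible stream per worker
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_diff(values, shuffled)) >= abs(observed):
            hits += 1
    return hits


def parallel_permutation_test(values, labels, total_perm=10_000, workers=4,
                              use_processes=True):
    """Return (observed statistic, two-sided permutation p-value)."""
    observed = mean_diff(values, labels)
    jobs = [(values, labels, observed, n, seed)
            for seed, n in enumerate(split_work(total_perm, workers))]
    # Processes give real CPU parallelism; threads are handy for quick checks
    # but, under CPython's GIL, give little speed-up for this pure-Python loop.
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=workers) as pool:
        hits = sum(pool.map(count_extreme, jobs))  # combine per-worker counts
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (total_perm + 1)
```

For example, `parallel_permutation_test(expr, groups, total_perm=2000, workers=4)` returns the observed difference in means and a two-sided p-value. Because each worker draws from its own seeded stream, results are reproducible for a fixed worker count but change when the worker count changes. When processes are used, call the function from under an `if __name__ == "__main__":` guard so that platforms which spawn worker processes can import the module safely.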
RESULTS: It is possible to obtain good, scalable performance, but the level of improvement depends on the nature of the algorithm. Resource underutilization can further improve the time to result. The end user's location affects costs owing to factors such as local taxation.
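The RESULTS link speed-up to the nature of the algorithm and cost to the user's location. A common way to reason about the first point, though not one the abstract attributes to the authors, is Amdahl's law: if a fraction p of the serial run time parallelises perfectly over n workers, the speed-up is S(n) = 1 / ((1 - p) + p / n), which can never exceed 1 / (1 - p). The Python sketch below pairs this with a deliberately simple bill. Per-started-hour billing, a flat tax rate, one worker per core, and every function and parameter name are hypothetical assumptions; no figures from the study are reproduced.

```python
# Hypothetical back-of-the-envelope model; all parameters are illustrative.
import math


def amdahl_speedup(parallel_fraction, n_workers):
    """Amdahl's law: S(n) = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_workers)


def estimate_job(serial_hours, parallel_fraction, n_instances,
                 cores_per_instance, hourly_rate, tax_rate):
    """Return (wall-clock hours, total bill including tax) for one job.

    Assumed, not taken from the paper: one worker per core, each instance
    billed per started hour, and tax applied as a flat fraction of the bill.
    """
    workers = n_instances * cores_per_instance
    wall_hours = serial_hours / amdahl_speedup(parallel_fraction, workers)
    billed_hours = math.ceil(wall_hours)  # partial hours rounded up
    pre_tax = n_instances * billed_hours * hourly_rate
    return wall_hours, pre_tax * (1.0 + tax_rate)
```

Calling `estimate_job` twice for the same job, once with `tax_rate=0.0` and once with a non-zero rate, leaves the wall-clock time unchanged and scales the bill by `1 + tax_rate`, a toy version of the location effect reported in the RESULTS. The model assumes every extra worker helps, so it does not capture the finding that leaving some resources idle can shorten the time to result.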
CONCLUSIONS: Although not designed to satisfy HPC requirements, Amazon EC2, and cloud computing in general, provides an interesting alternative and opens new possibilities for smaller organisations with limited funds.

Year:  2012        PMID: 23223611      PMCID: PMC3547073          DOI: 10.3414/ME11-02-0039

Source DB:  PubMed          Journal:  Methods Inf Med        ISSN: 0026-1270            Impact factor:   2.176

