Tracking the NGS revolution: managing life science research on shared high-performance computing clusters.

Martin Dahlö, Douglas G Scofield, Wesley Schaal, Ola Spjuth.

Abstract

Background: Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at the UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences.
Results: The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat.
Conclusions: Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases.

Year: 2018    PMID: 29659792    PMCID: PMC5928410    DOI: 10.1093/gigascience/giy028

Source DB: PubMed    Journal: Gigascience    ISSN: 2047-217X    Impact factor: 6.524


  6 in total

1.  Tracking the NGS revolution: managing life science research on shared high-performance computing clusters.

Authors:  Martin Dahlö; Douglas G Scofield; Wesley Schaal; Ola Spjuth
Journal:  Gigascience       Date:  2018-05-01       Impact factor: 6.524

2.  MaRe: Processing Big Data with application containers on Apache Spark.

Authors:  Marco Capuccini; Martin Dahlö; Salman Toor; Ola Spjuth
Journal:  Gigascience       Date:  2020-05-01       Impact factor: 6.524

3.  FQStat: a parallel architecture for very high-speed assessment of sequencing quality metrics.

Authors:  Sree K Chanumolu; Mustafa Albahrani; Hasan H Otu
Journal:  BMC Bioinformatics       Date:  2019-08-15       Impact factor: 3.169

4.  On-demand virtual research environments using microservices.

Authors:  Marco Capuccini; Anders Larsson; Matteo Carone; Jon Ander Novella; Noureddin Sadawi; Jianliang Gao; Salman Toor; Ola Spjuth
Journal:  PeerJ Comput Sci       Date:  2019-11-11

5.  0s and 1s in marine molecular research: a regional HPC perspective.

Authors:  Haris Zafeiropoulos; Anastasia Gioti; Stelios Ninidakis; Antonis Potirakis; Savvas Paragkamian; Nelina Angelova; Aglaia Antoniou; Theodoros Danis; Eliza Kaitetzidou; Panagiotis Kasapidis; Jon Bent Kristoffersen; Vasileios Papadogiannis; Christina Pavloudi; Quoc Viet Ha; Jacques Lagnel; Nikos Pattakos; Giorgos Perantinos; Dimitris Sidirokastritis; Panagiotis Vavilis; Georgios Kotoulas; Tereza Manousaki; Elena Sarropoulou; Costas S Tsigenopoulos; Christos Arvanitidis; Antonios Magoulas; Evangelos Pafilis
Journal:  Gigascience       Date:  2021-08-18       Impact factor: 6.524

6.  SciPipe: A workflow library for agile development of complex and dynamic bioinformatics pipelines.

Authors:  Samuel Lampa; Martin Dahlö; Jonathan Alvarsson; Ola Spjuth
Journal:  Gigascience       Date:  2019-05-01       Impact factor: 6.524
