Literature DB >> 34283824

Performance and scaling behavior of bioinformatic applications in virtualization environments to create awareness for the efficient use of compute resources.

Maximilian Hanussek1,2, Felix Bartusch1,2, Jens Krüger1.   

Abstract

The large amount of biological data available in the current times, makes it necessary to use tools and applications based on sophisticated and efficient algorithms, developed in the area of bioinformatics. Further, access to high performance computing resources is necessary, to achieve results in reasonable time. To speed up applications and utilize available compute resources as efficient as possible, software developers make use of parallelization mechanisms, like multithreading. Many of the available tools in bioinformatics offer multithreading capabilities, but more compute power is not always helpful. In this study we investigated the behavior of well-known applications in bioinformatics, regarding their performance in the terms of scaling, different virtual environments and different datasets with our benchmarking tool suite BOOTABLE. The tool suite includes the tools BBMap, Bowtie2, BWA, Velvet, IDBA, SPAdes, Clustal Omega, MAFFT, SINA and GROMACS. In addition we added an application using the machine learning framework TensorFlow. Machine learning is not directly part of bioinformatics but applied to many biological problems, especially in the context of medical images (X-ray photographs). The mentioned tools have been analyzed in two different virtual environments, a virtual machine environment based on the OpenStack cloud software and in a Docker environment. The gained performance values were compared to a bare-metal setup and among each other. The study reveals, that the used virtual environments produce an overhead in the range of seven to twenty-five percent compared to the bare-metal environment. The scaling measurements showed, that some of the analyzed tools do not benefit from using larger amounts of computing resources, whereas others showed an almost linear scaling behavior. The findings of this study have been generalized as far as possible and should help users to find the best amount of resources for their analysis. Further, the results provide valuable information for resource providers to handle their resources as efficiently as possible and raise the user community's awareness of the efficient usage of computing resources.

Entities:  

Year:  2021        PMID: 34283824     DOI: 10.1371/journal.pcbi.1009244

Source DB:  PubMed          Journal:  PLoS Comput Biol        ISSN: 1553-734X            Impact factor:   4.475


  31 in total

1.  Multiple sequence alignment using partial order graphs.

Authors:  Christopher Lee; Catherine Grasso; Mark F Sharlow
Journal:  Bioinformatics       Date:  2002-03       Impact factor: 6.937

2.  SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing.

Authors:  Anton Bankevich; Sergey Nurk; Dmitry Antipov; Alexey A Gurevich; Mikhail Dvorkin; Alexander S Kulikov; Valery M Lesin; Sergey I Nikolenko; Son Pham; Andrey D Prjibelski; Alexey V Pyshkin; Alexander V Sirotkin; Nikolay Vyahhi; Glenn Tesler; Max A Alekseyev; Pavel A Pevzner
Journal:  J Comput Biol       Date:  2012-04-16       Impact factor: 1.479

3.  Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly.

Authors:  Heng Li
Journal:  Bioinformatics       Date:  2012-05-07       Impact factor: 6.937

4.  Clustal omega.

Authors:  Fabian Sievers; Desmond G Higgins
Journal:  Curr Protoc Bioinformatics       Date:  2014-12-12

5.  MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors:  Kazutaka Katoh; Daron M Standley
Journal:  Mol Biol Evol       Date:  2013-01-16       Impact factor: 16.240

6.  BayesHammer: Bayesian clustering for error correction in single-cell sequencing.

Authors:  Sergey I Nikolenko; Anton I Korobeynikov; Max A Alekseyev
Journal:  BMC Genomics       Date:  2013-01-21       Impact factor: 3.969

7.  SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.

Authors:  Elmar Pruesse; Jörg Peplies; Frank Oliver Glöckner
Journal:  Bioinformatics       Date:  2012-05-03       Impact factor: 6.937

8.  Singularity: Scientific containers for mobility of compute.

Authors:  Gregory M Kurtzer; Vanessa Sochat; Michael W Bauer
Journal:  PLoS One       Date:  2017-05-11       Impact factor: 3.240

9.  Optimizing de novo assembly of short-read RNA-seq data for phylogenomics.

Authors:  Ya Yang; Stephen A Smith
Journal:  BMC Genomics       Date:  2013-05-14       Impact factor: 3.969

10.  The SILVA ribosomal RNA gene database project: improved data processing and web-based tools.

Authors:  Christian Quast; Elmar Pruesse; Pelin Yilmaz; Jan Gerken; Timmy Schweer; Pablo Yarza; Jörg Peplies; Frank Oliver Glöckner
Journal:  Nucleic Acids Res       Date:  2012-11-28       Impact factor: 16.971

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.