ViraPipe: scalable parallel pipeline for viral metagenome analysis from next generation sequencing reads.

Altti Ilari Maarala, Zurab Bzhalava, Joakim Dillner, Keijo Heljanko, Davit Bzhalava.

Abstract

Motivation: Next Generation Sequencing (NGS) technology enables identification of microbial genomes from massive numbers of human microbiomes more rapidly and cheaply than ever before. However, traditional sequential genome analysis algorithms, tools, and platforms are inefficient for large-scale metagenomic studies on ever-growing sample data volumes. There is an urgent need for scalable analysis pipelines that harness the full power of parallel computation in computing clusters and cloud computing environments. We propose ViraPipe, a scalable metagenome analysis pipeline that can analyze thousands of human microbiomes in parallel in tolerable time. The pipeline is tuned for analyzing viral metagenomes, but the software is applicable to other metagenomic analyses as well. ViraPipe integrates the parallel BWA-MEM read aligner, the MegaHit de novo assembler, and the BLAST and HMMER3 sequence search tools. We show the scalability of ViraPipe by running experiments on mining virus-related genomes from NGS datasets in a distributed Spark computing cluster.
Results: ViraPipe analyses 768 human samples in 210 minutes on a Spark computing cluster comprising 23 nodes and 1288 cores in total. The speedup of ViraPipe executed on 23 nodes was 11x compared to the sequential analysis pipeline executed on a single node. The whole process includes parallel decompression, read interleaving, BWA-MEM read alignment, filtering and normalization of non-human reads, de novo contig assembly, and sequence searches with the BLAST and HMMER3 tools. Contact: ilari.maarala@aalto.fi. Availability and implementation: https://github.com/NGSeq/ViraPipe.
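
The scaling figures quoted in the Results can be turned into a few derived metrics. The sketch below is only back-of-the-envelope arithmetic on the numbers in the abstract, not anything taken from the ViraPipe code. It assumes that parallel efficiency is measured per node (speedup divided by node count) and that the 11x speedup refers to the same 768-sample workload; the abstract confirms neither, and the function names are illustrative.

```python
# Figures quoted in the abstract.
SAMPLES = 768          # human samples analysed
WALL_MINUTES = 210     # wall-clock time on the cluster
NODES = 23             # cluster nodes
CORES = 1288           # total cores across the cluster
SPEEDUP = 11.0         # reported speedup versus a single node


def parallel_efficiency(speedup: float, workers: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect)."""
    return speedup / workers


def throughput(samples: int, minutes: float) -> float:
    """Average number of samples completed per minute of wall-clock time."""
    return samples / minutes


# Assumption: efficiency is taken per node, since the baseline is one node.
efficiency = parallel_efficiency(SPEEDUP, NODES)
samples_per_minute = throughput(SAMPLES, WALL_MINUTES)
cores_per_node = CORES / NODES

# Assumption: the speedup was measured on the same 768-sample workload,
# so the single-node run time is roughly speedup * cluster time.
implied_single_node_minutes = SPEEDUP * WALL_MINUTES

print(f"per-node parallel efficiency: {efficiency:.1%}")
print(f"throughput: {samples_per_minute:.2f} samples/minute")
print(f"cores per node: {cores_per_node:.0f}")
print(f"implied single-node time: {implied_single_node_minutes / 60:.1f} hours")
```

Under these assumptions, an 11x speedup on 23 nodes corresponds to a little under half of ideal linear scaling, which is a useful reference point when comparing against other distributed pipelines.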

Year:  2018        PMID: 29106455     DOI: 10.1093/bioinformatics/btx702

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  6 in total

1.  V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput data.

Authors:  Susana Posada-Céspedes; David Seifert; Ivan Topolsky; Kim Philipp Jablonski; Karin J Metzner; Niko Beerenwinkel
Journal:  Bioinformatics       Date:  2021-01-20       Impact factor: 6.937

2.  Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services.

Authors:  Inès Krissaane; Carlos De Niz; Alba Gutiérrez-Sacristán; Gabor Korodi; Nneka Ede; Ranjay Kumar; Jessica Lyons; Arjun Manrai; Chirag Patel; Isaac Kohane; Paul Avillach
Journal:  J Am Med Inform Assoc       Date:  2020-07-27       Impact factor: 4.497

3.  Human exposome assessment platform.

Authors:  Roxana Merino Martinez; Heimo Müller; Stefan Negru; Alex Ormenisan; Laila Sara Arroyo Mühr; Xinyue Zhang; Frederik Trier Møller; Mark S Clements; Zisis Kozlakidis; Ville N Pimenoff; Bartlomiej Wilkowski; Martin Boeckhout; Hanna Öhman; Steven Chong; Andreas Holzinger; Matti Lehtinen; Evert-Ben van Veen; Piotr Bała; Martin Widschwendter; Jim Dowling; Juha Törnroos; Michael P Snyder; Joakim Dillner
Journal:  Environ Epidemiol       Date:  2021-12-03

4.  Current trends for customized biomedical software tools.

Authors:  Haseeb Ahmad Khan
Journal:  Bioinformation       Date:  2017-12-31

5.  DisCVR: Rapid viral diagnosis from high-throughput sequencing data.

Authors:  Maha Maabar; Andrew J Davison; Matej Vučak; Fiona Thorburn; Pablo R Murcia; Rory Gunson; Massimo Palmarini; Joseph Hughes
Journal:  Virus Evol       Date:  2019-08-26

6.  Distributed hybrid-indexing of compressed pan-genomes for scalable and fast sequence alignment.

Authors:  Altti Ilari Maarala; Ossi Arasalo; Daniel Valenzuela; Veli Mäkinen; Keijo Heljanko
Journal:  PLoS One       Date:  2021-08-03       Impact factor: 3.240
