
MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices.

Dinghua Li1, Ruibang Luo2, Chi-Man Liu3, Chi-Ming Leung1, Hing-Fung Ting1, Kunihiko Sadakane4, Hiroshi Yamashita4, Tak-Wah Lam5.   

Abstract

The study of metagenomics has benefited greatly from low-cost, high-throughput sequencing technologies, yet the tremendous amount of data generated makes analyses such as de novo assembly consume excessive computational resources. In late 2014 we released MEGAHIT v0.1 (together with a brief note, Li et al. (2015) [1]), the first NGS metagenome assembler that can assemble genome sequences from metagenomic datasets of hundreds of Giga base-pairs (bp) in a time- and memory-efficient manner on a single server. The core of MEGAHIT is an efficient parallel algorithm for constructing succinct de Bruijn graphs (SdBG), implemented on a graphics processing unit (GPU). The software has been well received by the assembly community, and there is interest in how to adapt the algorithms to integrate popular assembly practices so as to improve assembly quality, as well as in how to speed up the software using better CPU-based algorithms (instead of GPU). In this paper we first describe the core algorithms of MEGAHIT v0.1 in detail, and then present the new modules that upgrade MEGAHIT to v1.0, which gives better assembly quality, runs faster and uses less memory. For the Iowa Prairie Soil dataset (252 Gbp after quality trimming), the assembly quality of MEGAHIT v1.0 improves significantly over v0.1, namely a 36% increase in assembly size and a 23% increase in N50. More interestingly, MEGAHIT v1.0 is no slower than before, even when running the extra modules. This is primarily due to a new CPU-based algorithm for SdBG construction that is faster and requires less memory. Using CPU only, MEGAHIT v1.0 can assemble the Iowa Prairie Soil sample in about 43 h, reducing the running time of v0.1 by at least 25% and memory usage by up to 50%. With its smaller memory footprint, MEGAHIT v1.0 can process even larger datasets. The Kansas Prairie Soil sample (484 Gbp), the largest publicly available dataset, can now be assembled in 7.5 days using no more than 500 GB of memory. The assemblies of these datasets (and other large metagenomic datasets), as well as the software, are available at the website https://hku-bal.github.io/megabox.
Copyright © 2016 Elsevier Inc. All rights reserved.
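
The abstract reports assembly quality as assembly size and N50. As a reading aid, the sketch below shows the standard definition of N50: the largest contig length L such that contigs of length at least L together cover at least half of the total assembly size. The function names `n50` and `percent_increase` and the toy contig lengths are assumptions chosen for illustration; they are not taken from the paper or from the MEGAHIT code.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to at least
        # half of the assembly size defines N50.
        if running >= half_total:
            return length
    return 0


def percent_increase(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Toy contig lengths (hypothetical, not data from the paper).
toy = [2, 2, 2, 3, 3, 4, 8, 8]
print(sum(toy))   # assembly size: 32
print(n50(toy))   # 8, since 8 + 8 = 16 reaches half of 32
```

A statement such as "23% increase in N50" then corresponds to `percent_increase(n50(old_contigs), n50(new_contigs))` for the two assemblies being compared.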

Keywords:  Metagenome assembly; Parallel computing; Succinct data structure

Year:  2016        PMID: 27012178     DOI: 10.1016/j.ymeth.2016.02.020

Source DB:  PubMed          Journal:  Methods        ISSN: 1046-2023            Impact factor:   3.608


  307 in total

1.  Challenges in benchmarking metagenomic profilers.

Authors:  Zheng Sun; Shi Huang; Meng Zhang; Qiyun Zhu; Niina Haiminen; Anna Paola Carrieri; Yoshiki Vázquez-Baeza; Laxmi Parida; Ho-Cheol Kim; Rob Knight; Yang-Yu Liu
Journal:  Nat Methods       Date:  2021-05-13       Impact factor: 28.547

2.  Efficient Nitrification and Low-Level N2O Emission in a Weakly Acidic Bioreactor at Low Dissolved-Oxygen Levels Are Due to Comammox.

Authors:  Deyong Li; Fang Fang; Guoqiang Liu
Journal:  Appl Environ Microbiol       Date:  2021-05-11       Impact factor: 4.792

3.  Metagenomics-guided analysis of microbial chemolithoautotrophic phosphite oxidation yields evidence of a seventh natural CO2 fixation pathway.

Authors:  Israel A Figueroa; Tyler P Barnum; Pranav Y Somasekhar; Charlotte I Carlström; Anna L Engelbrektson; John D Coates
Journal:  Proc Natl Acad Sci U S A       Date:  2017-11-28       Impact factor: 11.205

4.  Assessment of metagenomic assemblers based on hybrid reads of real and simulated metagenomic sequences.

Authors:  Ziye Wang; Ying Wang; Jed A Fuhrman; Fengzhu Sun; Shanfeng Zhu
Journal:  Brief Bioinform       Date:  2020-05-21       Impact factor: 11.622

5.  Cenote-Taker 2 democratizes virus discovery and sequence annotation.

Authors:  Michael J Tisza; Anna K Belford; Guillermo Domínguez-Huerta; Benjamin Bolduc; Christopher B Buck
Journal:  Virus Evol       Date:  2020-12-30

6.  Genomic and metagenomic insights into the microbial community of a thermal spring.

Authors:  Renato Pedron; Alfonso Esposito; Irene Bianconi; Edoardo Pasolli; Adrian Tett; Francesco Asnicar; Mario Cristofolini; Nicola Segata; Olivier Jousson
Journal:  Microbiome       Date:  2019-01-23       Impact factor: 14.650

7.  Diversity of nitrogen cycling genes at a Midwest long-term ecological research site with different management practices.

Authors:  Zheng Li; Alison M Cupples
Journal:  Appl Microbiol Biotechnol       Date:  2021-05-04       Impact factor: 4.813

8.  Host-Specific Evolutionary and Transmission Dynamics Shape the Functional Diversification of Staphylococcus epidermidis in Human Skin.

Authors:  Wei Zhou; Michelle Spoto; Rachel Hardy; Changhui Guan; Elizabeth Fleming; Peter J Larson; Joseph S Brown; Julia Oh
Journal:  Cell       Date:  2020-01-30       Impact factor: 41.582

9.  Tutorial: assessing metagenomics software with the CAMI benchmarking toolkit. (Review)

Authors:  Fernando Meyer; Till-Robin Lesker; David Koslicki; Adrian Fritz; Alexey Gurevich; Aaron E Darling; Alexander Sczyrba; Andreas Bremges; Alice C McHardy
Journal:  Nat Protoc       Date:  2021-03-01       Impact factor: 13.491

10.  Localized effect of treated wastewater effluent on the resistome of an urban watershed.

Authors:  Christopher N Thornton; Windy D Tanner; James A VanDerslice; William J Brazelton
Journal:  Gigascience       Date:  2020-11-19       Impact factor: 6.524
