Junwei Luo1, Jianxin Wang2, Weilong Li2, Zhen Zhang2, Fang-Xiang Wu3, Min Li2, Yi Pan4. 1. School of Information Science and Engineering, Central South University, Changsha, 410083, China; College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454000, China. 2. School of Information Science and Engineering, Central South University, Changsha, 410083, China. 3. Division of Biomedical Engineering, University of Saskatchewan, Saskatchewan, S7N 5A9, Canada. 4. Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA.
Abstract
MOTIVATION: In genome assembly, as sequencing coverage and genome size grow, most current software tools require large amounts of memory to handle the resulting volume of sequence data. However, many researchers cannot meet these computing-resource requirements, which limits the practical application of such tools. RESULTS: In this article, we present an updated algorithm called EPGA2, which adopts several new modules and produces improved assembly results while using little memory. To reduce peak memory during genome assembly, EPGA2 adopts the memory-efficient DSK to count k-mers and a revised version of BCALM to construct the de Bruijn graph. In addition, EPGA2 parallelizes the contig-merging step and adds an error-correction step to its pipeline. Our experiments demonstrate that these changes make EPGA2 more effective for genome assembly. AVAILABILITY AND IMPLEMENTATION: EPGA2 is publicly available for download at https://github.com/bioinfomaticsCSU/EPGA2.
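To illustrate why partitioned k-mer counting lowers peak memory, the following is a minimal sketch of the general idea behind DSK-style counting: each canonical k-mer is assigned by hash to one of several partitions, and only one partition's hash table is held in memory at a time. This is not EPGA2's or DSK's code. DSK itself writes partitions to disk; this sketch instead rescans the in-memory reads once per partition. The function names, the CRC32 hash, and the `min_count` filter are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List
import zlib

# Complement table for the four nucleotides (assumption: reads use A/C/G/T).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(read: str, k: int) -> Iterable[str]:
    """Yield every length-k substring of a read that contains only A/C/G/T."""
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= VALID:
            yield kmer


def partitioned_kmer_count(reads: List[str], k: int, num_partitions: int,
                           min_count: int = 1) -> Dict[str, int]:
    """Count canonical k-mers one hash partition at a time (hypothetical sketch)."""
    counts: Dict[str, int] = {}
    for p in range(num_partitions):
        # Only k-mers hashing to partition p are stored in this pass, so peak
        # memory is bounded by the largest partition rather than all k-mers.
        table: Counter = Counter()
        for read in reads:
            for kmer in kmers(read, k):
                c = canonical(kmer)
                if zlib.crc32(c.encode()) % num_partitions == p:
                    table[c] += 1
        # Each canonical k-mer lands in exactly one partition, so merging is safe.
        for c, n in table.items():
            if n >= min_count:
                counts[c] = n
    return counts


if __name__ == "__main__":
    print(partitioned_kmer_count(["ACGTAC", "GTACGT"], k=3, num_partitions=4))
```

The design trades extra passes over the input (one per partition here) for a smaller peak memory footprint; the final counts are identical for any number of partitions.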