Ivan Sović 1, Krešimir Križanović 2, Karolj Skala 1, Mile Šikić 3. 1. Centre for Informatics and Computing, Ruđer Bošković Institute, 10000 Zagreb, Croatia. 2. Department of Electronic Systems and Information Processing, Faculty of Electrical Engineering and Computing, University of Zagreb, 10000 Zagreb, Croatia. 3. Department of Electronic Systems and Information Processing, Faculty of Electrical Engineering and Computing, University of Zagreb, 10000 Zagreb, Croatia; Bioinformatics Institute, Singapore 138671, Singapore.
Abstract
MOTIVATION: The recent emergence of nanopore sequencing technology set a challenge for established assembly methods. In this work, we assessed how existing hybrid and non-hybrid de novo assembly methods perform on long and error-prone nanopore reads.

RESULTS: We benchmarked five non-hybrid (in terms of both error correction and scaffolding) assembly pipelines as well as two hybrid assemblers, which use third-generation sequencing data to scaffold Illumina assemblies. Tests were performed on several publicly available MinION and Illumina datasets of Escherichia coli K-12, using several sequencing coverages of nanopore data (20×, 30×, 40× and 50×). We attempted to assess the assembly quality at each of these coverages in order to estimate the requirements for closed bacterial genome assembly. For the purpose of the benchmark, an extensible genome assembly benchmarking framework was developed. Results show that hybrid methods are highly dependent on the quality of NGS data, but much less on the quality and coverage of nanopore data, and perform relatively well on lower nanopore coverages. All non-hybrid methods correctly assemble the E. coli genome when coverage is above 40×, even the non-hybrid method tailored for Pacific Biosciences reads. While it requires higher coverage than a method designed specifically for nanopore reads, its running time is significantly lower.

AVAILABILITY AND IMPLEMENTATION: https://github.com/kkrizanovic/NanoMark

CONTACT: mile.sikic@fer.hr

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.