ntEdit+Sealer: Efficient Targeted Error Resolution and Automated Finishing of Long-Read Genome Assemblies.

Janet X Li, Lauren Coombe, Johnathan Wong, Inanç Birol, René L Warren.

Abstract

High-quality genome assemblies are crucial to many biological studies, and utilizing long sequencing reads can help achieve higher assembly contiguity. While long reads can resolve complex and repetitive regions of a genome, their relatively high associated error rates are still a major limitation. Long reads generally produce draft genome assemblies with lower base quality, which must be corrected with a genome polishing step. Hybrid genome polishing solutions can greatly improve the quality of long-read genome assemblies by utilizing more accurate short reads to validate bases and correct errors. Currently available hybrid polishing methods rely on read alignments, and are therefore memory-intensive and do not scale well to large genomes. Here we describe ntEdit+Sealer, an alignment-free, k-mer-based genome finishing protocol that employs memory-efficient Bloom filters. The protocol includes ntEdit for correcting base errors and small indels, and for marking potentially problematic regions, then Sealer for filling both assembly gaps and problematic regions flagged by ntEdit. ntEdit+Sealer produces highly accurate, error-corrected genome assemblies, and is available as a Makefile pipeline from https://github.com/bcgsc/ntedit_sealer_protocol.
© 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol: Automated long-read genome finishing with short reads
Support Protocol: Selecting optimal values for k-mer lengths (k) and Bloom filter size (b)
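
The alignment-free idea described in the abstract can be illustrated with a minimal Python sketch: k-mers from accurate short reads are stored in a Bloom filter, and each k-mer of the draft assembly is then checked for support. This is not the ntEdit or Sealer implementation. The class and function names (`BloomFilter`, `build_filter`, `unsupported_kmer_starts`), the SHA-256 double-hashing scheme, and the default filter size are all assumptions chosen for illustration, and reverse-complement (canonical) k-mer handling is omitted for brevity.

```python
import hashlib
from typing import Iterable, Iterator, List


class BloomFilter:
    """Toy Bloom filter; illustrative only, not the data structure used by ntEdit."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every substring of length k, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_filter(reads: Iterable[str], k: int,
                 num_bits: int = 1 << 20, num_hashes: int = 3) -> BloomFilter:
    """Insert all k-mers of the short reads into a Bloom filter (hypothetical defaults)."""
    bf = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for kmer in kmers(read.upper(), k):
            bf.add(kmer)
    return bf


def unsupported_kmer_starts(draft: str, bf: BloomFilter, k: int) -> List[int]:
    """Return start positions of draft k-mers that are absent from the filter."""
    return [i for i, kmer in enumerate(kmers(draft.upper(), k)) if kmer not in bf]


if __name__ == "__main__":
    reads = ["ACGTTGCAAGGCTA"]        # accurate short read (made-up sequence)
    draft = "ACGTTGCTAGGCTA"          # draft with one substitution at index 7
    bf = build_filter(reads, k=5)
    print(unsupported_kmer_starts(draft, bf, k=5))
```

A run of consecutive unsupported k-mers brackets a candidate base error, which is the kind of region a polisher would try to correct or flag. Because the filter stores bits rather than the k-mers themselves, memory use is fixed by the chosen filter size, at the cost of occasional false positives.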

Keywords:  Bloom filter; assembly finishing; hybrid assembly polishing; k-mer; long-read genome assembly

Year:  2022        PMID: 35567771      PMCID: PMC9196995          DOI: 10.1002/cpz1.442

Source DB:  PubMed          Journal:  Curr Protoc        ISSN: 2691-1299

