Scalable long read self-correction and assembly polishing with multiple sequence alignment.

Pierre Morisse, Camille Marchet, Antoine Limasset, Thierry Lecroq, Arnaud Lefebvre.

Abstract

Third-generation sequencing technologies make it possible to sequence long reads of tens of kbp, which are expected to solve various problems. However, these reads display high error rates, currently capped around 10%. Self-correction is thus regularly used in long-read analysis projects. We introduce CONSENT, a new self-correction method that relies on both multiple sequence alignment and local de Bruijn graphs. To ensure scalability, multiple sequence alignment computation benefits from a new and efficient segmentation strategy, allowing a massive speedup. CONSENT compares well to the state of the art, and performs better on real Oxford Nanopore data. Specifically, CONSENT is the only method that efficiently scales to ultra-long reads, and it can process a full human dataset, containing reads reaching up to 1.5 Mbp, in 10 days. Moreover, our experiments show that error correction with CONSENT improves the quality of Flye assemblies. Additionally, CONSENT implements a polishing feature that corrects raw assemblies. Our experiments show that CONSENT is 2 to 38 times faster than other polishing tools, while providing comparable results. Furthermore, we show that, on a human dataset, assembling the raw data and polishing the assembly consumes fewer resources than correcting and then assembling the reads, while providing better results. CONSENT is available at https://github.com/morispi/CONSENT.
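
As a rough illustration of the consensus idea mentioned in the abstract, the sketch below takes a small pile of reads that are assumed to be already aligned (equal-length rows, with "-" marking gaps) and derives a consensus by majority vote, one fixed-width segment at a time. This is not CONSENT's implementation: the function names, the fixed segment width, and the majority-vote rule are all assumptions made for this example, and no alignment or de Bruijn graph step is performed. In a real pipeline each segment would presumably be aligned independently, which is where a segmentation strategy could save time.

```python
from collections import Counter


def majority_consensus(aligned_rows, gap="-"):
    """Column-wise majority vote over equal-length, gapped rows.

    Hypothetical helper for illustration only; not part of CONSENT.
    """
    if not aligned_rows:
        return ""
    width = len(aligned_rows[0])
    if any(len(row) != width for row in aligned_rows):
        raise ValueError("all aligned rows must have the same length")

    consensus = []
    for column in zip(*aligned_rows):
        # Pick the most frequent symbol in this column.
        symbol, _count = Counter(column).most_common(1)[0]
        # A winning gap means the column is dropped from the consensus.
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


def segmented_consensus(aligned_rows, segment_length, gap="-"):
    """Build the consensus one fixed-width column segment at a time.

    The fixed width is an assumption of this sketch; it only shows that
    segments can be handled independently and then concatenated.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    if not aligned_rows:
        return ""
    width = len(aligned_rows[0])
    pieces = []
    for start in range(0, width, segment_length):
        segment = [row[start:start + segment_length] for row in aligned_rows]
        pieces.append(majority_consensus(segment, gap))
    return "".join(pieces)


# Toy pile: one read carries an insertion, another a substitution.
toy_rows = [
    "ACG-TACGT",
    "ACGTTACGT",
    "ACG-TACCT",
    "ACG-TACGT",
]
toy_consensus = segmented_consensus(toy_rows, segment_length=4)
```

Because the rows are aligned beforehand, the segmented result equals the whole-window result here; the two would differ in cost, not in output, once each segment has to be aligned on its own.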

Year:  2021        PMID: 33436980      PMCID: PMC7804095          DOI: 10.1038/s41598-020-80757-5

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


  8 in total

1.  Genome sequence assembly algorithms and misassembly identification methods. (Review)

Authors:  Yue Meng; Yu Lei; Jianlong Gao; Yuxuan Liu; Enze Ma; Yunhong Ding; Yixin Bian; Hongquan Zu; Yucui Dong; Xiao Zhu
Journal:  Mol Biol Rep       Date:  2022-09-23       Impact factor: 2.742

2.  Accurate long-read de novo assembly evaluation with Inspector.

Authors:  Yu Chen; Yixin Zhang; Amy Y Wang; Min Gao; Zechen Chong
Journal:  Genome Biol       Date:  2021-11-14       Impact factor: 13.583

3.  phasebook: haplotype-aware de novo assembly of diploid genomes from long reads.

Authors:  Xiao Luo; Xiongbin Kang; Alexander Schönhuth
Journal:  Genome Biol       Date:  2021-10-27       Impact factor: 13.583

4.  RATTLE: reference-free reconstruction and quantification of transcriptomes from Nanopore sequencing.

Authors:  Ivan de la Rubia; Akanksha Srivastava; Wenjing Xue; Joel A Indi; Silvia Carbonell-Sala; Julien Lagarde; M Mar Albà; Eduardo Eyras
Journal:  Genome Biol       Date:  2022-07-08       Impact factor: 17.906

5.  Strainline: full-length de novo viral haplotype reconstruction from noisy long reads.

Authors:  Xiao Luo; Xiongbin Kang; Alexander Schönhuth
Journal:  Genome Biol       Date:  2022-01-20       Impact factor: 13.583

6.  High molecular weight DNA extraction strategies for long-read sequencing of complex metagenomes.

Authors:  Florian Trigodet; Karen Lolans; Emily Fogarty; Alon Shaiber; Hilary G Morrison; Luis Barreiro; Bana Jabri; A Murat Eren
Journal:  Mol Ecol Resour       Date:  2022-02-07       Impact factor: 8.678

7.  Identification of a dual orange/far-red and blue light photoreceptor from an oceanic green picoplankton.

Authors:  Yuko Makita; Shigekatsu Suzuki; Keiji Fushimi; Setsuko Shimada; Aya Suehisa; Manami Hirata; Tomoko Kuriyama; Yukio Kurihara; Hidefumi Hamasaki; Emiko Okubo-Kurihara; Kazutoshi Yoshitake; Tsuyoshi Watanabe; Masaaki Sakuta; Takashi Gojobori; Tomoko Sakami; Rei Narikawa; Haruyo Yamaguchi; Masanobu Kawachi; Minami Matsui
Journal:  Nat Commun       Date:  2021-06-16       Impact factor: 14.919

8.  Development of a Biocontained Toluene-Degrading Bacterium for Environmental Protection.

Authors:  Masahito Ishikawa; Takaaki Kojima; Katsutoshi Hori
Journal:  Microbiol Spectr       Date:  2021-07-28