A comprehensive evaluation of long read error correction methods.

Haowen Zhang, Chirag Jain, Srinivas Aluru.

Abstract

BACKGROUND: Third-generation single molecule sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long read error correction tools, which tackle this problem through sampling redundancy and/or leveraging accurate short reads of the same biological samples. Existing studies that assess these tools use simulated data sets, and are not sufficiently comprehensive in the range of software covered or diversity of evaluation measures used.
RESULTS: In this paper, we present a categorization and review of long read error correction methods, and provide a comprehensive evaluation of the corresponding long read error correction tools. Leveraging recent real sequencing data, we establish benchmark data sets and set up evaluation criteria for a comparative assessment which includes quality of error correction as well as run-time and memory usage. We study how trimming and long read sequencing depth affect error correction in terms of length distribution and genome coverage post-correction, and the impact of error correction performance on an important application of long reads, genome assembly. We provide guidelines for practitioners for choosing among the available error correction tools and identify directions for future research.
CONCLUSIONS: Despite the high error rate of long reads, the state-of-the-art correction tools can achieve high correction quality. When short reads are available, the best hybrid methods outperform non-hybrid methods in terms of correction quality and computing resource usage. When choosing tools, practitioners should be careful with the few correction tools that discard reads, and should check the effect of error correction tools on downstream analysis. Our evaluation code is available as open-source at https://github.com/haowenz/LRECE.

Keywords:  Benchmark; Error correction; Evaluation; Long read

Year:  2020        PMID: 33349243      PMCID: PMC7751105          DOI: 10.1186/s12864-020-07227-0

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


