
FLAS: fast and high-throughput algorithm for PacBio long-read self-correction.

Ergude Bao, Fei Xie, Changjin Song, Dandan Song.

Abstract

MOTIVATION: Third-generation PacBio long reads have greatly facilitated sequencing projects thanks to their very large read lengths, but they contain about 15% sequencing errors and need error correction. For projects with long reads only, it is challenging to correct the reads quickly, and also challenging to correct a sufficient number of read bases, i.e. to achieve high-throughput self-correction. MECAT is currently among the fastest self-correction algorithms, but its throughput is relatively small (Xiao et al., 2017).
RESULTS: Here, we introduce FLAS, a wrapper algorithm around MECAT, to achieve high-throughput long-read self-correction while keeping MECAT's fast speed. FLAS finds additional alignments from MECAT-prealigned long reads to improve the correction throughput, and removes misalignments for accuracy. FLAS also uses the corrected long-read regions to correct the uncorrected ones, further improving the throughput. In our performance tests on Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana and human long reads, FLAS achieves 22.0-50.6% larger throughput than MECAT. Compared to the self-correction algorithms other than MECAT, FLAS is 2-13× faster and its throughput is 9.8-281.8% larger. The FLAS-corrected long reads can be assembled into contigs with 13.1-29.8% larger N50 sizes than those from MECAT-corrected reads.
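As an illustration of the N50 metric used in the comparison above, the following is a minimal Python sketch, not taken from the FLAS or MECAT code. It assumes the common definition of N50 (the largest contig length L such that contigs of length at least L cover at least half of the total assembly length). The function names `n50` and `relative_gain_percent` and the contig lengths are hypothetical and chosen only for the example; they are not results from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: walk the contigs from longest to shortest and
    return the length at which the running sum first reaches at least
    half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


def relative_gain_percent(baseline, improved):
    """Percentage by which `improved` exceeds `baseline`."""
    return (improved - baseline) / baseline * 100.0


# Made-up contig lengths (in kb) for two hypothetical assemblies.
baseline_contigs = [80, 70, 50, 40, 30, 20]
improved_contigs = [100, 90, 50, 30, 20]

baseline_n50 = n50(baseline_contigs)
improved_n50 = n50(improved_contigs)
gain = relative_gain_percent(baseline_n50, improved_n50)
print(f"N50: {baseline_n50} kb vs {improved_n50} kb ({gain:.1f}% larger)")
```

The same `relative_gain_percent` helper could be applied to throughput, here understood as the number of corrected read bases, to express a comparison such as "X% larger throughput".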
AVAILABILITY AND IMPLEMENTATION: The FLAS software can be downloaded for free from this site: https://github.com/baoe/flas.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Year:  2019        PMID: 30895306     DOI: 10.1093/bioinformatics/btz206

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


