Blue: correcting sequencing errors using consensus and context.

Paul Greenfield, Konsta Duesing, Alexie Papanicolaou, Denis C Bauer.

Abstract

MOTIVATION: Bioinformatics tools, such as assemblers and aligners, are expected to produce more accurate results when given better quality sequence data as their starting point. This expectation has led to the development of stand-alone tools whose sole purpose is to detect and remove sequencing errors. A good error-correcting tool would be a transparent component in a bioinformatics pipeline, simply taking sequence data in any of the standard formats and producing a higher quality version of the same data containing far fewer errors. It should not only correct all of the types of errors found in real sequence data (substitutions, insertions, deletions and uncalled bases), but also be fast and scalable enough to be usable on the large datasets produced by current sequencing technologies, and it should work on data derived from both haploid and diploid organisms.
RESULTS: This article presents Blue, an error-correction algorithm based on k-mer consensus and context. Blue can correct substitution, deletion and insertion errors, as well as uncalled bases. It accepts both FASTQ and FASTA formats, and corrects quality scores for corrected bases. Blue also maintains the pairing of reads, both within a file and between pairs of files, making it compatible with downstream tools that depend on read pairing. Blue is memory efficient, scalable and faster than other published tools, and usable on large sequencing datasets. On the tests undertaken, Blue also proved to be generally more accurate than other published algorithms, resulting in more accurately aligned reads and the assembly of longer contigs containing fewer errors. One significant feature of Blue is that its k-mer consensus table does not have to be derived from the set of reads being corrected. This decoupling makes it possible to correct one dataset, such as a small set of 454 mate-pair reads, with the consensus derived from another dataset, such as Illumina reads derived from the same DNA sample. Such cross-correction can greatly improve the quality of small (and expensive) sets of long reads, leading to even better assemblies and higher quality finished genomes.
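The idea of correcting one dataset against a k-mer consensus table built from another can be illustrated with a minimal Python sketch. This is not Blue's algorithm or code (Blue is written in C# and also handles insertions, deletions and read context); the function names, the `min_count` threshold and the toy sequences are all assumptions made for illustration. The sketch only repairs substitutions and uncalled bases (`N`), works left to right, and cannot fix errors in the first k-1 bases of a read.

```python
from collections import Counter


def build_kmer_table(reads, k):
    """Count every k-mer in the reads that supply the consensus."""
    table = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            table[read[i:i + k]] += 1
    return table


def correct_read(read, table, k, min_count=2):
    """Greedy left-to-right repair of substitutions and uncalled bases.

    Whenever the k-mer ending at a position is poorly supported, try the
    four possible bases at that position and keep the one whose k-mer has
    the clearly highest count in the consensus table.
    """
    bases = list(read)
    for end in range(k - 1, len(bases)):
        start = end - k + 1
        kmer = "".join(bases[start:end + 1])
        if table[kmer] >= min_count:
            continue  # well supported, nothing to repair here
        prefix = kmer[:-1]
        ranked = sorted(((table[prefix + b], b) for b in "ACGT"), reverse=True)
        best_count, best_base = ranked[0]
        runner_up_count = ranked[1][0]
        # Only correct when one replacement is both supported and unambiguous.
        if best_count >= min_count and best_count > runner_up_count:
            bases[end] = best_base
    return "".join(bases)


# Toy data (hypothetical): three error-free short reads stand in for a
# high-coverage dataset; the noisy read stands in for a long read taken
# from the same sample, with one substitution and one uncalled base.
reference = "ATGGCGTACCTTGAAGCTCAGTTC"
table = build_kmer_table([reference] * 3, k=7)
noisy = "ATGGCGTACCTTCAAGCTCANTTC"
print(correct_read(noisy, table, k=7))
```

Because the table is built independently of the read being corrected, the same `table` could be reused for any number of reads from the second dataset, which is the decoupling the abstract describes.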
AVAILABILITY AND IMPLEMENTATION: The code for Blue and its related tools is available from http://www.bioinformatics.csiro.au/Blue. These programs are written in C# and run natively under Windows and under Mono on Linux.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Year:  2014        PMID: 24919879     DOI: 10.1093/bioinformatics/btu368

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  31 in total

1.  Investigating the NAD-ME biochemical pathway within C4 grasses using transcript and amino acid variation in C4 photosynthetic genes.

Authors:  Alexander Watson-Lazowski; Alexie Papanicolaou; Robert Sharwood; Oula Ghannoum
Journal:  Photosynth Res       Date:  2018-08-04       Impact factor: 3.573

2.  Comparison of error correction algorithms for Ion Torrent PGM data: application to hepatitis B virus.

Authors:  Liting Song; Wenxun Huang; Juan Kang; Yuan Huang; Hong Ren; Keyue Ding
Journal:  Sci Rep       Date:  2017-08-14       Impact factor: 4.379

3.  Insights from the Genomes of Microbes Thriving in Uranium-Enriched Sediments.

Authors:  Brodie Sutcliffe; Anthony A Chariton; Andrew J Harford; Grant C Hose; Sarah Stephenson; Paul Greenfield; David J Midgley; Ian T Paulsen
Journal:  Microb Ecol       Date:  2017-11-11       Impact factor: 4.552

4.  Genomic insights into the carbohydrate catabolism of Cairneyella variabilis gen. nov. sp. nov., the first reports from a genome of an ericoid mycorrhizal fungus from the southern hemisphere.

Authors:  David J Midgley; Carly P Rosewarne; Paul Greenfield; Dongmei Li; Cassandra J Vockler; Catherine J Hitchcock; Nicole A Sawyer; Robyn Brett; Jacqueline Edwards; John I Pitt; Nai Tran-Dinh
Journal:  Mycorrhiza       Date:  2016-02-09       Impact factor: 3.387

Review 5.  Genome sequence assembly algorithms and misassembly identification methods.

Authors:  Yue Meng; Yu Lei; Jianlong Gao; Yuxuan Liu; Enze Ma; Yunhong Ding; Yixin Bian; Hongquan Zu; Yucui Dong; Xiao Zhu
Journal:  Mol Biol Rep       Date:  2022-09-23       Impact factor: 2.742

6.  Complex modular architecture around a simple toolkit of wing pattern genes.

Authors:  Steven M Van Belleghem; Pasi Rastas; Alexie Papanicolaou; Simon H Martin; Carlos F Arias; Megan A Supple; Joseph J Hanly; James Mallet; James J Lewis; Heather M Hines; Mayte Ruiz; Camilo Salazar; Mauricio Linares; Gilson R P Moreira; Chris D Jiggins; Brian A Counterman; W Owen McMillan; Riccardo Papa
Journal:  Nat Ecol Evol       Date:  2017-01-30       Impact factor: 15.460

7.  Complete Genome Sequence of Sporisorium scitamineum and Biotrophic Interaction Transcriptome with Sugarcane.

Authors:  Lucas M Taniguti; Patricia D C Schaker; Juliana Benevenuto; Leila P Peters; Giselle Carvalho; Alessandra Palhares; Maria C Quecine; Filipe R S Nunes; Maria C P Kmit; Alvan Wai; Georg Hausner; Karen S Aitken; Paul J Berkman; James A Fraser; Paula M Moolhuijzen; Luiz L Coutinho; Silvana Creste; Maria L C Vieira; João P Kitajima; Claudia B Monteiro-Vitorello
Journal:  PLoS One       Date:  2015-06-12       Impact factor: 3.240

8.  Genome Sequence of Fungal Species No.11243, Which Produces the Antifungal Antibiotic FR901469.

Authors:  Makoto Matsui; Tatsuya Yokoyama; Kaoru Nemoto; Toshitaka Kumagai; Goro Terai; Masanori Arita; Masayuki Machida; Takashi Shibata
Journal:  Genome Announc       Date:  2015-04-02

9.  Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing.

Authors:  Melissa C Keinath; Vladimir A Timoshevskiy; Nataliya Y Timoshevskaya; Panagiotis A Tsonis; S Randal Voss; Jeramiah J Smith
Journal:  Sci Rep       Date:  2015-11-10       Impact factor: 4.379

10.  A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF.

Authors:  Yingnan Cong; Yao-Ban Chan; Mark A Ragan
Journal:  Sci Rep       Date:  2016-07-25       Impact factor: 4.379
