Recount: expectation maximization based error correction tool for next generation sequencing data.

Edward Wijaya, Martin C Frith, Yutaka Suzuki, Paul Horton.

Abstract

Next generation sequencing technologies enable rapid, large-scale production of sequence data sets. Unfortunately, these technologies also have a non-negligible sequencing error rate, which biases their outputs by introducing false reads and reducing the quantity of the real reads. Although methods developed for SAGE data can reduce these false counts to a considerable degree, until now they have not been implemented in a scalable way. Recently, a program named FREC has been developed to address this problem for next generation sequencing data. In this paper, we introduce RECOUNT, our implementation of an Expectation Maximization algorithm for tag count correction, and compare it to FREC. Using both the reference genome and simulated data, we find that RECOUNT performs as well as or better than FREC, while using much less memory (e.g. 5GB vs. 75GB). Furthermore, we report the first analysis of tag count correction with real data in the context of gene expression analysis. Our results show that tag count correction not only increases the number of mappable tags, but can make a real difference in the biological interpretation of next generation sequencing data. RECOUNT is an open-source C++ program available at http://seq.cbrc.jp/recount.
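
To illustrate the general idea of Expectation Maximization for tag count correction, the following is a minimal Python sketch. It is not RECOUNT's code and makes several simplifying assumptions that the abstract does not state: errors are substitutions only, every base has the same error rate, a tag can only come from itself or from an observed tag at Hamming distance 1, and all tags have the same length. The names `neighbors`, `em_correct_counts`, `error_rate` and `n_iter` are hypothetical.

```python
from collections import defaultdict


def neighbors(tag, alphabet="ACGT"):
    """Yield every tag at Hamming distance exactly 1 from `tag`."""
    for i, base in enumerate(tag):
        for b in alphabet:
            if b != base:
                yield tag[:i] + b + tag[i + 1:]


def em_correct_counts(observed, error_rate=0.01, n_iter=50):
    """Estimate true tag counts from observed counts (illustrative only).

    observed: dict mapping tag -> observed count (all tags the same length).
    Assumes a uniform per-base substitution error rate (an assumption of
    this sketch, not a parameter taken from the paper).
    """
    length = len(next(iter(observed)))
    # Probability of reading a tag with no error, or with one specific error.
    p_same = (1 - error_rate) ** length
    p_one = (error_rate / 3) * (1 - error_rate) ** (length - 1)

    # Candidate true sources of each observed tag: itself plus observed
    # neighbours at Hamming distance 1.
    sources = {
        y: [(y, p_same)] + [(x, p_one) for x in neighbors(y) if x in observed]
        for y in observed
    }

    est = {tag: float(count) for tag, count in observed.items()}
    for _ in range(n_iter):
        new = defaultdict(float)
        for y, n_y in observed.items():
            # E-step: posterior weight of each candidate source of y.
            weights = [(x, est[x] * p) for x, p in sources[y]]
            total = sum(w for _, w in weights)
            if total == 0.0:
                new[y] += n_y  # nothing to redistribute; keep the count
                continue
            # M-step contribution: reassign y's reads to their sources.
            for x, w in weights:
                new[x] += n_y * w / total
        est = {tag: new[tag] for tag in observed}
    return est


if __name__ == "__main__":
    counts = {"ACGT": 1000, "ACGA": 5, "TTTT": 200}
    for tag, value in em_correct_counts(counts).items():
        print(tag, round(value, 2))
```

In this toy input, part of the count of the rare tag `ACGA` is reassigned to its abundant neighbour `ACGT`, while `TTTT`, which has no observed neighbour, keeps its count. The total number of reads is preserved because each iteration only redistributes observed counts.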

Year:  2009        PMID: 20180274

Source DB:  PubMed          Journal:  Genome Inform        ISSN: 0919-9454


