ECHO: a reference-free short-read error correction algorithm.

Wei-Chun Kao, Andrew H Chan, Yun S Song.

Abstract

Developing accurate, scalable algorithms to improve data quality is an important computational challenge associated with recent advances in high-throughput sequencing technology. In this study, a novel error-correction algorithm, called ECHO, is introduced for correcting base-call errors in short reads without the need for a reference genome. Unlike most previous methods, ECHO does not require the user to specify parameters whose optimal values are typically unknown a priori. ECHO automatically sets the parameters in the assumed model and estimates error characteristics specific to each sequencing run, while keeping its running time within the range of practical use. ECHO is based on a probabilistic model and is able to assign a quality score to each corrected base. Furthermore, it explicitly models heterozygosity in diploid genomes and provides a reference-free method for detecting bases that originated from heterozygous sites. On both real and simulated data, ECHO improves on the accuracy of previous error-correction methods by a factor of several to an order of magnitude, depending on the sequence coverage depth and the position in the read. The improvement is most pronounced toward the end of the read, where previous methods become noticeably less effective. Using a whole-genome yeast data set, it is demonstrated here that ECHO is capable of coping with nonuniform coverage. It is also shown that using ECHO to perform error correction as a preprocessing step considerably facilitates de novo assembly, particularly in the case of low-to-moderate sequence coverage depth.
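The idea of assigning a quality score to each corrected base from a probabilistic model can be illustrated with a small sketch. This is not ECHO's actual model, which the abstract does not specify. It assumes a uniform prior over the four bases, independent base calls from overlapping reads at one position, and a single fixed error rate (ECHO instead estimates error characteristics per sequencing run). The function names `posterior_base` and `correct_base`, the `error_rate` default, and the quality cap are hypothetical choices made for illustration. The quality score uses the standard Phred scaling, Q = -10 log10(p_error).

```python
import math

BASES = "ACGT"


def posterior_base(observed, error_rate=0.01):
    """Posterior over the true base given the base calls covering one position.

    Illustrative assumptions (not taken from the ECHO paper): uniform prior,
    independent errors, and a miscall equally likely to be any other base.
    """
    log_like = {}
    for b in BASES:
        ll = 0.0
        for o in observed:
            if o == b:
                ll += math.log(1.0 - error_rate)
            else:
                ll += math.log(error_rate / 3.0)
        log_like[b] = ll
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_like.values())
    weights = {b: math.exp(ll - m) for b, ll in log_like.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def correct_base(observed, error_rate=0.01, max_quality=60):
    """Return the most probable base and a Phred-scaled quality score."""
    post = posterior_base(observed, error_rate)
    best = max(post, key=post.get)
    # Sum the other posteriors directly to avoid cancellation in 1 - p.
    p_err = sum(p for b, p in post.items() if b != best)
    if p_err <= 0.0:
        quality = max_quality
    else:
        quality = min(max_quality, round(-10.0 * math.log10(p_err)))
    return best, quality


if __name__ == "__main__":
    # Four reads agree on A and one calls C: the consensus is well supported.
    print(correct_base("AAAAC"))
    # One A and one C: the call is ambiguous, so the quality is low.
    print(correct_base("AC"))
```

With deeper coverage the posterior concentrates on one base and the quality rises, which is consistent with the abstract's remark that accuracy depends on sequence coverage depth.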

Year: 2011; PMID: 21482625; PMCID: PMC3129260; DOI: 10.1101/gr.111351.110

Source DB: PubMed; Journal: Genome Res; ISSN: 1088-9051; Impact factor: 9.043


