Correcting errors in shotgun sequences.

Martti T Tammi, Erik Arner, Ellen Kindlund, Björn Andersson.

Abstract

Sequencing errors in combination with repeated regions cause major problems in shotgun sequencing, mainly due to the failure of assembly programs to distinguish single base differences between repeat copies from erroneous base calls. In this paper, a new strategy designed to correct errors in shotgun sequence data using defined nucleotide positions, DNPs, is presented. The method distinguishes single base differences from sequencing errors by analyzing multiple alignments consisting of a read and all its overlaps with other reads. The construction of multiple alignments is performed using a novel pattern matching algorithm, which takes advantage of the symmetry between indices that can be computed for similar words of the same length. This allows for rapid construction of multiple alignments, with no previous pair-wise matching of sequence reads required. Results from a C++ implementation of this method show that up to 99% of sequencing errors can be corrected, while up to 87% of the single base differences remain and up to 80% of the corrected reads contain at most one error. The results also show that the method outperforms the error correction method used in the EULER assembler. The prototype software, MisEd, is freely available from the authors for academic use.
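The core idea in the abstract, separating isolated miscalls from base differences that recur across reads, can be illustrated with a simple column vote over a multiple alignment. The sketch below is not the MisEd algorithm: it omits the DNP statistics, does not handle insertions or deletions, and does not build the alignment with the pattern matching index. The names `correct_read`, `min_support`, and `NO_COVERAGE` are hypothetical and exist only for this illustration. It assumes the alignment has already been built around the target read and that every row has the same length.

```python
from collections import Counter

NO_COVERAGE = "."  # hypothetical marker: the read does not span this column


def correct_read(alignment, target=0, min_support=2):
    """Correct substitutions in alignment[target] by a per-column vote.

    A base that disagrees with the column consensus is replaced only when
    fewer than `min_support` reads carry it. A minority base seen in several
    reads is kept, since it may be a real difference between repeat copies.
    """
    read = alignment[target]
    corrected = []
    for col, base in enumerate(read):
        # Count the bases of all reads that cover this column.
        counts = Counter(
            row[col] for row in alignment if row[col] != NO_COVERAGE
        )
        consensus, _ = counts.most_common(1)[0]
        if base != consensus and counts[base] < min_support:
            corrected.append(consensus)  # isolated deviation: treat as error
        else:
            corrected.append(base)  # consensus base or supported difference
    return "".join(corrected)


# Toy alignment: two repeat copies differ at column 3 (T versus A).
# The target read also has a lone miscall at column 6 (G instead of C).
alignment = [
    "ACGTAAGG",  # target read, copy 1, with one sequencing error
    "ACGTAACG",  # copy 1
    "ACGAAACG",  # copy 2
    "ACGAAACG",  # copy 2
    "ACGAAACG",  # copy 2
    ".CGAAACG",  # copy 2, does not cover column 0
]

print(correct_read(alignment))
```

In this toy input the lone G at column 6 is replaced by the consensus C, while the T at column 3 is kept because a second read supports it. Raising `min_support` to 3 would overwrite that T as well, which shows how a threshold set too strictly erases real differences between repeat copies.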

Year:  2003        PMID: 12888528      PMCID: PMC169956          DOI: 10.1093/nar/gkg653

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


References:  7 in total

1.  An Eulerian path approach to DNA fragment assembly.

Authors:  P A Pevzner; H Tang; M S Waterman
Journal:  Proc Natl Acad Sci U S A       Date:  2001-08-14       Impact factor: 11.205

2.  Separation of nearly identical repeats in shotgun assemblies using defined nucleotide positions, DNPs.

Authors:  Martti T Tammi; Erik Arner; Tom Britton; Björn Andersson
Journal:  Bioinformatics       Date:  2002-03       Impact factor: 6.937

3.  TRAP: Tandem Repeat Assembly Program produces improved shotgun assemblies of repetitive sequences.

Authors:  Martti T Tammi; Erik Arner; Björn Andersson
Journal:  Comput Methods Programs Biomed       Date:  2003-01       Impact factor: 5.428

4.  Repetitive conundrums of centromere structure and function. (Review)

Authors:  E E Eichler
Journal:  Hum Mol Genet       Date:  1999-02       Impact factor: 6.150

5.  Base-calling of automated sequencer traces using phred. II. Error probabilities.

Authors:  B Ewing; P Green
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

6.  ReAligner: a program for refining DNA sequence multi-alignments.

Authors:  E L Anson; E W Myers
Journal:  J Comput Biol       Date:  1997       Impact factor: 1.479

7.  ARACHNE: a whole-genome shotgun assembler.

Authors:  Serafim Batzoglou; David B Jaffe; Ken Stanley; Jonathan Butler; Sante Gnerre; Evan Mauceli; Bonnie Berger; Jill P Mesirov; Eric S Lander
Journal:  Genome Res       Date:  2002-01       Impact factor: 9.043

Cited by:  15 in total

1.  Automated correction of genome sequence errors.

Authors:  Pawel Gajer; Michael Schatz; Steven L Salzberg
Journal:  Nucleic Acids Res       Date:  2004-01-26       Impact factor: 16.971

2.  Cloning of a human parvovirus by molecular screening of respiratory tract samples.

Authors:  Tobias Allander; Martti T Tammi; Margareta Eriksson; Annelie Bjerkner; Annika Tiveljung-Lindell; Björn Andersson
Journal:  Proc Natl Acad Sci U S A       Date:  2005-08-23       Impact factor: 11.205

3.  Short read fragment assembly of bacterial genomes.

Authors:  Mark J Chaisson; Pavel A Pevzner
Journal:  Genome Res       Date:  2007-12-14       Impact factor: 9.043

4.  De novo fragment assembly with short mate-paired reads: Does the read length matter?

Authors:  Mark J Chaisson; Dumitru Brinza; Pavel A Pevzner
Journal:  Genome Res       Date:  2008-12-03       Impact factor: 9.043

5.  Quake: quality-aware detection and correction of sequencing errors.

Authors:  David R Kelley; Michael C Schatz; Steven L Salzberg
Journal:  Genome Biol       Date:  2010-11-29       Impact factor: 13.583

6.  What makes us human: revisiting an age-old question in the genomic era.

Authors:  Nitzan Mekel-Bobrov; Bruce T Lahn
Journal:  J Biomed Discov Collab       Date:  2006-11-29

7.  Computational biology methods and their application to the comparative genomics of endocellular symbiotic bacteria of insects.

Authors:  Jennifer Commins; Christina Toft; Mario A Fares
Journal:  Biol Proced Online       Date:  2009-03-11       Impact factor: 3.244

8.  CNV-seq, a new method to detect copy number variation using high-throughput sequencing.

Authors:  Chao Xie; Martti T Tammi
Journal:  BMC Bioinformatics       Date:  2009-03-06       Impact factor: 3.169

9.  Large-scale inference of the point mutational spectrum in human segmental duplications.

Authors:  Sigve Nakken; Einar A Rødland; Torbjørn Rognes; Eivind Hovig
Journal:  BMC Genomics       Date:  2009-01-22       Impact factor: 3.969

10.  Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data.

Authors:  Nicolas J Parker; Andrew G Parker
Journal:  Source Code Biol Med       Date:  2008-04-18