
Basecalling with LifeTrace.

D Walther, G Bartha, M Morris.

Abstract

A pivotal step in electrophoresis sequencing is the conversion of the raw, continuous chromatogram data into the actual sequence of discrete nucleotides, a process referred to as basecalling. We describe a novel algorithm for basecalling implemented in the program LifeTrace. Like Phred, currently the most widely used basecalling software program, LifeTrace takes processed trace data as input. It was designed to be tolerant to variable peak spacing by means of an improved peak-detection algorithm that emphasizes local chromatogram information over global properties. LifeTrace is shown to generate high-quality basecalls and reliable quality scores. It proved particularly effective when applied to MegaBACE capillary sequencing machines. In a benchmark test of 8372 dye-primer MegaBACE chromatograms, LifeTrace generated 17% fewer substitution errors, 16% fewer insertion/deletion errors, and 2.4% more aligned bases to the finished sequence than did Phred. For two sets totaling 6624 dye-terminator chromatograms, the performance improvement was 15% fewer substitution errors, 10% fewer insertion/deletion errors, and 2.1% more aligned bases. The processing time required by LifeTrace is comparable to that of Phred. The predicted quality scores were in line with observed quality scores, permitting direct use for quality clipping and in silico single nucleotide polymorphism (SNP) detection. Furthermore, we introduce a new type of quality score associated with every basecall: the gap-quality. It estimates the probability of a deletion error between the current and the following basecall. This additional quality score improves detection of single basepair deletions when used for locating potential basecalling errors during the alignment. We also describe a new protocol for benchmarking that we believe better discerns basecaller performance differences than methods previously published.
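As a rough illustration of how per-base quality scores and the gap-quality described above might be used downstream, here is a minimal Python sketch. It is not LifeTrace's or Phred's code. It assumes that both scores follow the Phred convention Q = -10 log10(P), which the abstract does not state, and the function names, the `max_error` cutoff, and the `threshold` value are all hypothetical. The clipping routine keeps the maximum-scoring segment of `max_error - P` per base, a generic trimming approach swapped in for illustration.

```python
def error_probability(q: float) -> float:
    """Convert a Phred-scale quality score to an error probability.

    Assumes the Phred convention Q = -10 * log10(P), i.e. P = 10 ** (-Q / 10).
    """
    return 10 ** (-q / 10)


def quality_clip(quals, max_error=0.05):
    """Return the half-open (start, end) range of the best-quality segment.

    Each base contributes (max_error - P_error); the segment with the
    largest total is kept (a maximum-sum subarray scan). The cutoff
    max_error is a hypothetical default, not a value from the paper.
    """
    best_sum, best = 0.0, (0, 0)
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_sum += max_error - error_probability(q)
        if run_sum <= 0:
            # The running segment no longer pays off; restart after this base.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best


def likely_deletion_sites(gap_quals, threshold=15):
    """List indices i where a deletion between basecall i and i + 1 is plausible.

    gap_quals[i] is taken to be the gap-quality attached to basecall i,
    assumed here to be Phred-scaled; the threshold is hypothetical.
    """
    return [i for i, g in enumerate(gap_quals) if g < threshold]


if __name__ == "__main__":
    quals = [5, 8, 30, 35, 40, 32, 6, 4]
    start, end = quality_clip(quals)
    print("kept bases:", start, "to", end)
    print("candidate deletion sites:", likely_deletion_sites([40, 38, 9, 41, 12]))
```

In an alignment step, positions returned by `likely_deletion_sites` could be given a reduced gap penalty, which is one way to read the abstract's remark that the gap-quality helps locate single-basepair deletions.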

Year:  2001        PMID: 11337481      PMCID: PMC311100          DOI: 10.1101/gr.177901

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043


  12 in total

1.  Reliable identification of large numbers of candidate SNPs from public EST data.

Authors:  K H Buetow; M N Edmonson; A B Cassidy
Journal:  Nat Genet       Date:  1999-03       Impact factor: 38.330

2.  Basic local alignment search tool.

Authors:  S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal:  J Mol Biol       Date:  1990-10-05       Impact factor: 5.469

3.  A software system for data analysis in automated DNA sequencing.

Authors:  M C Giddings; J Severin; M Westphall; J Wu; L M Smith
Journal:  Genome Res       Date:  1998-06       Impact factor: 9.043

4.  Base-calling of automated sequencer traces using phred. II. Error probabilities.

Authors:  B Ewing; P Green
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

5.  Estimation of errors in "raw" DNA sequences: a validation study.

Authors:  P Richterich
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

6.  A graph theoretic approach to the analysis of DNA sequencing data.

Authors:  A J Berno
Journal:  Genome Res       Date:  1996-02       Impact factor: 9.043

7.  A general method applicable to the search for similarities in the amino acid sequence of two proteins.

Authors:  S B Needleman; C D Wunsch
Journal:  J Mol Biol       Date:  1970-03       Impact factor: 5.469

8.  Pattern recognition for automated DNA sequencing: I. On-line signal conditioning and feature extraction for basecalling.

Authors:  J B Golden; D Torgersen; C Tibbetts
Journal:  Proc Int Conf Intell Syst Mol Biol       Date:  1993

9.  Assignment of position-specific error probability to primary DNA sequence data.

Authors:  C B Lawrence; V V Solovyev
Journal:  Nucleic Acids Res       Date:  1994-04-11       Impact factor: 16.971

10.  An adaptive, object oriented strategy for base calling in DNA sequence analysis.

Authors:  M C Giddings; R L Brumley; M Haker; L M Smith
Journal:  Nucleic Acids Res       Date:  1993-09-25       Impact factor: 16.971

  5 in total

1.  Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs.

Authors:  Bastien Chevreux; Thomas Pfisterer; Bernd Drescher; Albert J Driesel; Werner E G Müller; Thomas Wetter; Sándor Suhai
Journal:  Genome Res       Date:  2004-05-12       Impact factor: 9.043

2.  Quality scores and SNP detection in sequencing-by-synthesis systems.

Authors:  William Brockman; Pablo Alvarez; Sarah Young; Manuel Garber; Georgia Giannoukos; William L Lee; Carsten Russ; Eric S Lander; Chad Nusbaum; David B Jaffe
Journal:  Genome Res       Date:  2008-01-22       Impact factor: 9.043

3.  SeqTrace: a graphical tool for rapidly processing DNA sequencing chromatograms.

Authors:  Brian J Stucky
Journal:  J Biomol Tech       Date:  2012-09

4.  SNPs by AFLP (SBA): a rapid SNP isolation strategy for non-model organisms.

Authors:  Jean-Claude Nicod; Carlo R Largiadèr
Journal:  Nucleic Acids Res       Date:  2003-03-01       Impact factor: 16.971

5.  Computational biology methods and their application to the comparative genomics of endocellular symbiotic bacteria of insects.

Authors:  Jennifer Commins; Christina Toft; Mario A Fares
Journal:  Biol Proced Online       Date:  2009-03-11       Impact factor: 3.244
