BayesCall: A model-based base-calling algorithm for high-throughput short-read sequencing.

Wei-Chun Kao, Kristian Stevens, Yun S Song.

Abstract

Extracting sequence information from raw fluorescence images is the foundation of several high-throughput sequencing platforms. The main challenges associated with this technology include reducing the error rate, assigning accurate base-specific quality scores, and reducing the cost of sequencing by increasing the throughput per run. To demonstrate how computational advances can help meet these challenges, a novel model-based base-calling algorithm, BayesCall, is introduced for the Illumina sequencing platform. Because it is founded on the tools of statistical learning, BayesCall is flexible enough to incorporate various features of the sequencing process. In particular, it can easily incorporate time-dependent parameters and model residual effects. This approach significantly improves accuracy over Illumina's base-caller Bustard, particularly in the later cycles of a sequencing run. For 76-cycle data on a standard viral sample, phiX174, BayesCall reduces the average per-base error rate by approximately 51% relative to Bustard. BayesCall readily computes the probability of observing each base, and this probability can be transformed into a useful base-specific quality score with high discriminative ability. A detailed study of BayesCall's performance is presented here.
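
The abstract does not state which transformation BayesCall uses to turn base probabilities into quality scores. As a minimal sketch, assuming the conventional Phred scaling Q = -10 log10(P_error), a per-base posterior could be converted as follows. The function names, the cap of 40, and the example probabilities are illustrative assumptions, not part of BayesCall.

```python
import math
from typing import Dict, Tuple


def phred_quality(p_called: float, max_q: int = 40) -> int:
    """Convert the probability of the called base into a Phred-scaled score.

    Assumption: standard Phred scaling, Q = -10 * log10(1 - p_called),
    capped at max_q (hypothetical cap) so that p_called == 1.0 stays finite.
    """
    if not 0.0 <= p_called <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    p_error = 1.0 - p_called
    if p_error <= 0.0:
        return max_q
    q = -10.0 * math.log10(p_error)
    return min(max_q, int(round(q)))


def call_base(posterior: Dict[str, float]) -> Tuple[str, int]:
    """Pick the most probable base and attach its quality score.

    Assumption: `posterior` maps each of A, C, G, T to a probability
    and the four values sum to 1.
    """
    base = max(posterior, key=posterior.get)
    return base, phred_quality(posterior[base])


if __name__ == "__main__":
    # Illustrative posterior for one sequencing cycle (made-up values).
    example = {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01}
    print(call_base(example))
```

Under this scaling, a called base with probability 0.99 receives a score of 20, and each additional factor of ten in confidence adds 10 to the score.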

Year: 2009  PMID: 19661376  PMCID: PMC2765266  DOI: 10.1101/gr.095299.109

Source DB: PubMed  Journal: Genome Res  ISSN: 1088-9051  Impact factor: 9.043


  40 in total

1.  BM-map: Bayesian mapping of multireads for next-generation sequencing data.

Authors:  Yuan Ji; Yanxun Xu; Qiong Zhang; Kam-Wah Tsui; Yuan Yuan; Clift Norris; Shoudan Liang; Han Liang
Journal:  Biometrics       Date:  2011-04-22       Impact factor: 2.571

2.  SNP calling using genotype model selection on high-throughput sequencing data.

Authors:  Na You; Gabriel Murillo; Xiaoquan Su; Xiaowei Zeng; Jian Xu; Kang Ning; Shoudong Zhang; Jiankang Zhu; Xinping Cui
Journal:  Bioinformatics       Date:  2012-01-16       Impact factor: 6.937

3.  Intensity normalization improves color calling in SOLiD sequencing.

Authors:  Hao Wu; Rafael A Irizarry; Héctor Corrada Bravo
Journal:  Nat Methods       Date:  2010-05       Impact factor: 28.547

4.  OnlineCall: fast online parameter estimation and base calling for illumina's next-generation sequencing.

Authors:  Shreepriya Das; Haris Vikalo
Journal:  Bioinformatics       Date:  2012-05-07       Impact factor: 6.937

5.  ECHO: a reference-free short-read error correction algorithm.

Authors:  Wei-Chun Kao; Andrew H Chan; Yun S Song
Journal:  Genome Res       Date:  2011-04-11       Impact factor: 9.043

Review 6.  Next-generation sequencing in the clinic: promises and challenges.

Authors:  Jiekun Xuan; Ying Yu; Tao Qing; Lei Guo; Leming Shi
Journal:  Cancer Lett       Date:  2012-11-19       Impact factor: 8.679

7.  Single Nucleotide Polymorphism (SNP) Detection and Genotype Calling from Massively Parallel Sequencing (MPS) Data.

Authors:  Yun Li; Wei Chen; Eric Yi Liu; Yi-Hui Zhou
Journal:  Stat Biosci       Date:  2013-05

8.  Statistical Analyses of Next Generation Sequence Data: A Partial Overview.

Authors:  Susmita Datta; Somnath Datta; Seongho Kim; Sutirtha Chakraborty; Ryan S Gill
Journal:  J Proteomics Bioinform       Date:  2010-06-01

9.  Identification of rare alleles and their carriers using compressed se(que)nsing.

Authors:  Noam Shental; Amnon Amir; Or Zuk
Journal:  Nucleic Acids Res       Date:  2010-08-10       Impact factor: 16.971

10.  Development of a low bias method for characterizing viral populations using next generation sequencing technology.

Authors:  Stephanie M Willerth; Hélder A M Pedro; Lior Pachter; Laurent M Humeau; Adam P Arkin; David V Schaffer
Journal:  PLoS One       Date:  2010-10-22       Impact factor: 3.240
