Model-based quality assessment and base-calling for second-generation sequencing data.

Héctor Corrada Bravo, Rafael A Irizarry.

Abstract

Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. Comparative analysis at the sequence level of a large number of samples across multiple populations may thus be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T between 30 and 100 characters long), which are the result of complex processing of noisy, continuous fluorescence intensity measurements, a procedure known as base-calling. The complexity of this discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single-nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology whose potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform.
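The claim that small per-base error rates matter for rare variants can be made concrete with a back-of-the-envelope calculation. The sketch below is an illustration only, not the authors' model: it assumes errors are independent across reads and bases, and that a miscalled base is equally likely to be any of the three other nucleotides. The function names, read length, depth, and threshold are hypothetical choices. Because the abstract notes that real errors are systematic, these independence-based numbers likely understate the problem.

```python
from math import comb


def prob_read_has_error(read_length: int, error_rate: float) -> float:
    """Probability that a read contains at least one miscalled base,
    assuming independent per-base errors at a constant rate."""
    return 1.0 - (1.0 - error_rate) ** read_length


def prob_false_variant(depth: int, min_alt_reads: int, error_rate: float) -> float:
    """Probability that at least `min_alt_reads` of `depth` reads covering a
    site show one specific wrong base purely through sequencing error.

    Assumes independent errors, with a miscalled base equally likely to be
    any of the three other nucleotides (an illustrative simplification).
    """
    p = error_rate / 3.0  # chance a single read shows this particular wrong base
    # Binomial upper tail: sum P(exactly k erroneous reads) for k >= threshold.
    return sum(
        comb(depth, k) * p**k * (1.0 - p) ** (depth - k)
        for k in range(min_alt_reads, depth + 1)
    )


def expected_false_sites(n_sites: int, depth: int, min_alt_reads: int,
                         error_rate: float) -> float:
    """Upper bound on the expected number of sites where some alternative
    base reaches `min_alt_reads` by error alone (union bound over the
    three alternative bases at each site)."""
    return n_sites * 3 * prob_false_variant(depth, min_alt_reads, error_rate)


if __name__ == "__main__":
    # Hypothetical settings: 36-base reads, 30x depth, 3 supporting reads,
    # and a roughly human-genome-sized 3-billion-site target.
    for e in (0.01, 0.001):
        print(f"per-base error rate {e}:")
        print("  P(read has >= 1 error):", prob_read_has_error(36, e))
        print("  P(false variant evidence at a site):", prob_false_variant(30, 3, e))
        print("  expected false sites (upper bound):",
              expected_false_sites(3_000_000_000, 30, 3, e))
```

Even when the per-site probability is tiny, multiplying by billions of sites can leave a non-negligible number of spurious candidates, which is exactly where rare variants are hardest to distinguish from error.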
Model parameters have a straightforward interpretation in terms of the chemistry of base-calling, allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these estimates in a form readily usable by quality assessment tools while significantly improving base-calling performance.
© 2009, The International Biometric Society.
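To make the base-calling and quality-metric vocabulary concrete, the sketch below shows the simplest possible base-caller: for each sequencing cycle it calls the nucleotide whose fluorescence channel is brightest and reports a Phred-scaled quality, Q = -10 log10(p), where p is the estimated probability that the call is wrong. This is neither the model proposed in the article nor Illumina's own pipeline. The error estimate used here (the share of total signal falling outside the winning channel) is a hypothetical heuristic, and a realistic caller must also account for effects such as cross-talk between dye channels and phasing, where molecules in a cluster fall out of step across cycles. The function names, the A/C/G/T channel order, and the quality cap of 40 are assumptions for illustration.

```python
import math

BASES = "ACGT"  # assumed channel order for each intensity vector


def phred_quality(p_error: float, max_q: int = 40) -> int:
    """Phred scale: Q = -10 * log10(p_error), rounded and capped at `max_q`."""
    if p_error <= 0.0:
        return max_q
    return min(max_q, round(-10.0 * math.log10(p_error)))


def naive_base_call(intensities):
    """Call one cycle from four non-negative channel intensities (A, C, G, T).

    Returns (base, quality). The error probability is a hypothetical
    heuristic: the fraction of total signal outside the brightest channel.
    """
    if len(intensities) != 4:
        raise ValueError("expected four channel intensities (A, C, G, T)")
    total = sum(intensities)
    if total <= 0:
        return "N", 0  # no signal in any channel: report an uncalled base
    # Brightest channel wins; on ties, the first channel in A, C, G, T order.
    best = max(range(4), key=lambda i: intensities[i])
    p_error = 1.0 - intensities[best] / total
    return BASES[best], phred_quality(p_error)


def call_read(cycles):
    """Base-call a whole read, one four-channel intensity vector per cycle."""
    calls = [naive_base_call(cycle) for cycle in cycles]
    sequence = "".join(base for base, _ in calls)
    qualities = [q for _, q in calls]
    return sequence, qualities
```

For example, `call_read([[9, 1, 0, 0], [0, 0, 10, 0]])` returns `("AG", [10, 40])`: the first cycle's heuristic error estimate is 0.1 (quality 10), while the second cycle has all of its signal in one channel, so its quality hits the cap. A model-based approach replaces this ad hoc error estimate with probabilities derived from a fitted model of the intensities.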

Year:  2010        PMID: 19912177      PMCID: PMC2888717          DOI: 10.1111/j.1541-0420.2009.01353.x

Source DB:  PubMed          Journal:  Biometrics        ISSN: 0006-341X            Impact factor:   2.571
