Empirical assessment of sequencing errors for high throughput pyrosequencing data.

Paulo G S da Fonseca, Jorge A P Paiva, Luiz G P Almeida, Ana T R Vasconcelos, Ana T Freitas.

Abstract

BACKGROUND: Sequencing-by-synthesis technologies significantly improve over the Sanger method in terms of speed and cost per base. However, they still usually fail to compete in terms of read length and quality. Current high-throughput implementations of the pyrosequencing technique yield reads whose lengths approach those of the capillary electrophoresis method. A less obvious question is whether their quality is affected by platform-specific sequencing errors.
RESULTS: We present an empirical study aimed at assessing the quality and characterising sequencing errors for high throughput pyrosequencing data. We have developed a procedure for extracting sequencing error data from genome assemblies and studying their characteristics, in particular the length distribution of indel gaps and their relation to the sequence contexts where they occur. We used this procedure to analyse data from three prokaryotic genomes sequenced with the GS FLX technology. We also compared two models previously employed with success for peptide sequence alignment.
CONCLUSIONS: We observed a very low overall error rate in the analysed data, with indel errors being much more abundant than substitutions. We also observed a dependence between the length of the gaps and that of the homopolymer context in which they occur. As with protein alignments, a power-law model seems to approximate the indel errors more accurately, although the results are not conclusive enough to justify a departure from the commonly used affine gap penalty scheme. In either case, our procedure can be used to estimate more realistic error model parameters.
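
The comparison between an affine and a power-law gap model can be illustrated with a minimal sketch. It is not the authors' procedure: the function names (`indel_gaps`, `linear_fit`, `compare_gap_models`), the gapped-string input format and the least-squares fitting are all assumptions made for illustration, and homopolymer context is not computed. The sketch relies on one standard fact: an affine gap penalty corresponds to log-frequencies that fall linearly with gap length k, whereas a power-law model corresponds to log-frequencies that fall linearly with log k.

```python
import math
from itertools import groupby

GAP = "-"  # assumed gap symbol in the aligned strings


def indel_gaps(ref_aln, read_aln):
    """Return (kind, length) for each maximal indel run in a pairwise alignment.

    'ins' means extra bases in the read (gap in the reference),
    'del' means missing bases in the read (gap in the read).
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned strings must have the same length")
    kinds = []
    for ref_char, read_char in zip(ref_aln, read_aln):
        if ref_char == GAP and read_char != GAP:
            kinds.append("ins")
        elif read_char == GAP and ref_char != GAP:
            kinds.append("del")
        else:
            kinds.append("other")  # match, substitution, or gap-gap column
    return [(kind, len(list(run)))
            for kind, run in groupby(kinds) if kind != "other"]


def linear_fit(xs, ys):
    """Ordinary least squares y = intercept + slope * x; returns (slope, intercept, rss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss


def compare_gap_models(length_counts):
    """Fit log-frequency against k (affine) and against log k (power law)."""
    lengths = sorted(k for k, count in length_counts.items() if count > 0)
    log_freq = [math.log(length_counts[k]) for k in lengths]
    return {
        "affine": linear_fit(lengths, log_freq),
        "power_law": linear_fit([math.log(k) for k in lengths], log_freq),
    }


# Synthetic counts generated from an exact k**-2 law (not data from the study).
toy_counts = {k: 1000.0 * k ** -2 for k in range(1, 7)}
fits = compare_gap_models(toy_counts)
print(indel_gaps("ACGTTT-ACG--A", "ACG--TTACGTTA"))
print(fits["affine"][2], fits["power_law"][2])  # residual sums of squares
```

A lower residual sum of squares indicates the better-fitting model under this simple criterion. On real data the counts would come from tallying the output of `indel_gaps` over all read-to-consensus alignments, and a more careful analysis would use a proper likelihood-based fit.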

Year:  2013        PMID: 23339526      PMCID: PMC3852801          DOI: 10.1186/1756-0500-6-25

Source DB:  PubMed          Journal:  BMC Res Notes        ISSN: 1756-0500


