BACKGROUND: The rapid evolution of 454 GS-FLX sequencing technology has not been accompanied by a reassessment of the quality and accuracy of the sequences obtained. Current strategies for decision-making and error correction are based on an initial analysis by Huse et al. (2007) of experimental sequences from the older GS20 system. Here we analyze the quality of 454 sequencing data and identify factors that contribute to sequencing error, using an extensive dataset of Roche control DNA fragments.

RESULTS: We obtained a mean error rate for 454 sequences of 1.07%. More importantly, the error rate was not randomly distributed; it occasionally rose to more than 50% at certain positions, and its distribution was linked to several experimental variables. The main factors related to error are the presence of homopolymers, position in the sequence, size of the sequence and spatial localization in PT plates for insertion and deletion errors. These factors can be described by considering seven variables. No single variable can account for the error rate distribution, but most of the variation is explained by the combination of all seven variables.

CONCLUSIONS: The pattern identified here calls for the use of internal controls and error-correcting base callers to correct for errors, when available (e.g. when sequencing amplicons). For shotgun libraries, the use of both sequencing primers and deep coverage, combined with the use of random sequencing primer sites, should partly compensate for even high error rates, although it may prove more difficult than previously thought to distinguish between low-frequency alleles and errors.
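The distinction between a mean error rate and a position-dependent error rate can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes reads have already been aligned column by column to a known control sequence, and the function name `error_rates`, the `-` gap symbol, the `.` "not covered" symbol and the toy reads are all hypothetical choices made for illustration.

```python
def error_rates(reference, aligned_reads):
    """Compute per-position and overall error rates against a control sequence.

    Assumptions (illustrative only): every read is pre-aligned to the
    reference column by column, '-' marks a deleted base (counted as an
    error) and '.' marks a column the read does not cover.
    """
    errors = [0] * len(reference)
    coverage = [0] * len(reference)
    for read in aligned_reads:
        for pos, (ref_base, read_base) in enumerate(zip(reference, read)):
            if read_base == ".":  # column not covered by this read
                continue
            coverage[pos] += 1
            if read_base != ref_base:
                errors[pos] += 1
    # Per-position rate; None where no read covers the column.
    per_position = [
        errors[i] / coverage[i] if coverage[i] else None
        for i in range(len(reference))
    ]
    total_bases = sum(coverage)
    overall = sum(errors) / total_bases if total_bases else None
    return per_position, overall


# Toy data, not taken from the study.
reference = "ACGTTTA"
reads = ["ACGTTTA", "ACGTT-A", "ACCTTTA", "ACGT..."]
per_position, overall = error_rates(reference, reads)
```

In this toy example a low overall rate coexists with much higher rates at individual columns, which is the kind of non-random distribution the abstract describes. Insertions relative to the reference would require gap columns in the reference as well, which this sketch does not handle.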