MOTIVATION: Sequencing errors may bias the gene expression measurements made by Serial Analysis of Gene Expression (SAGE). They may introduce non-existent tags at low abundance and reduce the observed counts of real tags. These effects are more pronounced in the longer tags generated in LongSAGE libraries. Current sequencing technology generates quite accurate estimates of sequencing error rates. Here we make use of the sequence neighborhood of SAGE tags and error estimates from the base-calling software to correct for such errors.

RESULTS: We introduce a statistical model for the propagation of sequencing errors in SAGE and suggest an Expectation-Maximization (EM) algorithm to correct for them, given the observed sequences in a library and base-calling error estimates. We tested our method using simulated and experimental SAGE libraries. When comparing SAGE libraries, we found that sequencing errors can introduce considerable bias. High-abundance tags may be falsely called significantly differentially expressed, especially when comparing libraries with different levels of sequencing errors and/or of different sizes. Truly differentially expressed tags have decreased significance because 'true'-tag counts are generally underestimated, which may change whether tags near the threshold of differential expression are called significant. Moreover, the number of different transcripts present in a library is overestimated because false tags are introduced at low abundance. Our correction method adjusts the tag counts to be closer to the true counts and is able to partly correct for biases introduced by sequencing errors.

AVAILABILITY: An implementation using R is distributed as an R package. An online version is available at http://tagcalling.mbgproject.org
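To make the idea of EM-based tag count correction concrete, the following is a minimal Python sketch, not the authors' R implementation. It rests on several simplifying assumptions that are not stated in the abstract: a single uniform per-base error rate (the method described above uses per-base estimates from the base-calling software), substitution errors only, candidate true tags restricted to the tags observed in the library, and a sequence neighborhood defined as Hamming distance at most 1. The names `hamming`, `emission_prob`, and `em_correct` are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length tags."""
    return sum(x != y for x, y in zip(a, b))


def emission_prob(true_tag, observed_tag, error_rate):
    """P(observed | true) under an assumed uniform substitution error model.

    Each base is read correctly with probability 1 - error_rate and is
    replaced by one specific wrong base with probability error_rate / 3.
    """
    d = hamming(true_tag, observed_tag)
    length = len(true_tag)
    return (1 - error_rate) ** (length - d) * (error_rate / 3) ** d


def em_correct(counts, error_rate=0.01, max_dist=1, n_iter=50):
    """Estimate corrected tag counts with a simple EM loop.

    counts: dict mapping observed tag sequence -> observed count.
    Returns a dict mapping tag -> expected true count (same total).
    """
    tags = sorted(counts)
    total = sum(counts.values())

    # Precompute likelihoods only within each tag's sequence neighborhood.
    lik = {
        s: {
            t: emission_prob(t, s, error_rate)
            for t in tags
            if hamming(s, t) <= max_dist
        }
        for s in tags
    }

    # Start from the observed relative frequencies.
    prop = {t: counts[t] / total for t in tags}

    for _ in range(n_iter):
        expected = dict.fromkeys(tags, 0.0)
        # E-step: share each observed count among its candidate true tags.
        for s in tags:
            weights = {t: prop[t] * p for t, p in lik[s].items()}
            norm = sum(weights.values())
            if norm == 0.0:
                # Degenerate case: keep the count on the observed tag itself.
                expected[s] += counts[s]
                continue
            for t, w in weights.items():
                expected[t] += counts[s] * w / norm
        # M-step: re-estimate the true tag proportions.
        prop = {t: expected[t] / total for t in tags}

    return {t: prop[t] * total for t in tags}


if __name__ == "__main__":
    library = {"AAAAAAAAAA": 1000, "AAAAAAAAAC": 1, "GGGGGGGGGG": 50}
    for tag, value in em_correct(library).items():
        print(tag, round(value, 3))
```

In this toy library, the singleton tag that differs from a highly abundant tag by one base is mostly reassigned to its abundant neighbor, while a tag with no close neighbor keeps its count. A fuller treatment would replace the uniform `error_rate` with position-specific error probabilities derived from base-calling quality values.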