Statistical modeling of sequencing errors in SAGE libraries.



Tim Beissbarth, Lavinia Hyde, Gordon K Smyth, Chris Job, Wee-Ming Boon, Seong-Seng Tan, Hamish S Scott, Terence P Speed.

Abstract

MOTIVATION: Sequencing errors may bias the gene expression measurements made by Serial Analysis of Gene Expression (SAGE). They may introduce non-existent tags at low abundance and decrease the real abundance of other tags. These effects are increased in the longer tags generated in LongSAGE libraries. Current sequencing technology generates quite accurate estimates of sequencing error rates. Here we make use of the sequence neighborhood of SAGE tags and error estimates from the base-calling software to correct for such errors.
RESULTS: We introduce a statistical model for the propagation of sequencing errors in SAGE and suggest an Expectation-Maximization (EM) algorithm to correct for them, given the observed sequences in a library and base-calling error estimates. We tested our method using simulated and experimental SAGE libraries. When comparing SAGE libraries, we found that sequencing errors can introduce considerable bias. High-abundance tags may be falsely called significantly differentially expressed, especially when comparing libraries with different levels of sequencing errors and/or of different sizes. Truly differentially expressed tags have decreased significance because 'true'-tag counts are generally underestimated. This may change whether tags near the threshold of differential expression are called significant. Moreover, the number of different transcripts present in a library is overestimated because false tags are introduced at low abundance. Our correction method adjusts the tag counts to be closer to the true counts and is able to partly correct for biases introduced by sequencing errors.
AVAILABILITY: An implementation using R is distributed as an R package. An online version is available at http://tagcalling.mbgproject.org
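
The EM idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' R implementation. It assumes a simplified error model in which base-calling errors are independent across positions and a miscalled base is equally likely to be any of the other three bases, and it considers only the tags actually observed in the library as candidate true tags. The function names `read_likelihood` and `em_correct_counts` are hypothetical.

```python
from collections import Counter
from math import prod


def read_likelihood(observed, true_tag, error_probs):
    """P(observed | true_tag) under the assumed independent per-base error model.

    A correctly called base has probability 1 - e; a miscalled base is assumed
    to be any of the other three bases with probability e / 3 each.
    """
    return prod(
        (1.0 - e) if o == t else e / 3.0
        for o, t, e in zip(observed, true_tag, error_probs)
    )


def em_correct_counts(reads, n_iter=50):
    """Estimate true tag counts from reads with per-base error probabilities.

    reads: list of (tag_sequence, per_base_error_probs) pairs, with every
           error probability strictly between 0 and 1 (assumption).
    Returns a dict mapping each distinct observed tag to its estimated count.
    """
    tags = sorted({seq for seq, _ in reads})
    n = len(reads)
    observed = Counter(seq for seq, _ in reads)

    # Initialise the tag proportions from the raw observed counts.
    p = {t: observed[t] / n for t in tags}

    # Likelihood of each read under each candidate true tag (fixed across iterations).
    lik = [[read_likelihood(seq, t, errs) for t in tags] for seq, errs in reads]

    for _ in range(n_iter):
        # E-step: expected number of reads originating from each candidate tag.
        expected = dict.fromkeys(tags, 0.0)
        for row in lik:
            weights = [p[t] * l for t, l in zip(tags, row)]
            total = sum(weights)
            for t, w in zip(tags, weights):
                expected[t] += w / total
        # M-step: update the proportions from the expected assignments.
        p = {t: expected[t] / n for t in tags}

    return {t: n * p[t] for t in tags}
```

For example, calling `em_correct_counts` on ten high-quality reads of one tag plus a single read that differs only at a low-quality base shifts most of that singleton's count back to the abundant tag, which mirrors the abstract's point that false tags appear at low abundance while true counts are underestimated.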


Year: 2004 PMID: 15262778 DOI: 10.1093/bioinformatics/bth924

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

45 in total

1. Proteomics profiling of Madin-Darby canine kidney plasma membranes reveals Wnt-5a involvement during oncogenic H-Ras/TGF-beta-mediated epithelial-mesenchymal transition.

Authors: Yuan-Shou Chen; Rommel A Mathias; Suresh Mathivanan; Eugene A Kapp; Robert L Moritz; Hong-Jian Zhu; Richard J Simpson
Journal: Mol Cell Proteomics Date: 2010-05-28 Impact factor: 5.911

2. Proteomic analysis of the S. cerevisiae response to the anticancer ruthenium complex KP1019.

Authors: Laura K Stultz; Alexandra Hunsucker; Sydney Middleton; Evan Grovenstein; Jacob O'Leary; Eliot Blatt; Mary Miller; James Mobley; Pamela K Hanson
Journal: Metallomics Date: 2020-06-24 Impact factor: 4.526

3. Large-scale production of SAGE libraries from microdissected tissues, flow-sorted cells, and cell lines.

Authors: Jaswinder Khattra; Allen D Delaney; Yongjun Zhao; Asim Siddiqui; Jennifer Asano; Helen McDonald; Pawan Pandoh; Noreen Dhalla; Anna-Liisa Prabhu; Kevin Ma; Stephanie Lee; Adrian Ally; Angela Tam; Danne Sa; Sean Rogers; David Charest; Jeff Stott; Scott Zuyderduyn; Richard Varhol; Connie Eaves; Steven Jones; Robert Holt; Martin Hirst; Pamela A Hoodless; Marco A Marra
Journal: Genome Res Date: 2006-11-29 Impact factor: 9.043

4. Human and rodent temporal lobe epilepsy is characterized by changes in O-GlcNAc homeostasis that can be reversed to dampen epileptiform activity.

Authors: Richard G Sánchez; R Ryley Parrish; Megan Rich; William M Webb; Roxanne M Lockhart; Kazuhito Nakao; Lara Ianov; Susan C Buckingham; Devin R Broadwater; Alistair Jenkins; Nihal C de Lanerolle; Mark Cunningham; Tore Eid; Kristen Riley; Farah D Lubin
Journal: Neurobiol Dis Date: 2019-01-06 Impact factor: 5.996

5. Leucine-rich repeat kinase 2 deficiency is protective in rhabdomyolysis-induced kidney injury.

Authors: Ravindra Boddu; Travis D Hull; Subhashini Bolisetty; Xianzhen Hu; Mark S Moehle; João Paulo Lima Daher; Ahmed Ibrahim Kamal; Reny Joseph; James F George; Anupam Agarwal; Lisa M Curtis; Andrew B West
Journal: Hum Mol Genet Date: 2015-04-22 Impact factor: 6.150

6. Quantitative miRNA expression analysis: comparing microarrays with next-generation sequencing.

Authors: Hanni Willenbrock; Jesper Salomon; Rolf Søkilde; Kim Bundvig Barken; Thomas Nøhr Hansen; Finn Cilius Nielsen; Søren Møller; Thomas Litman
Journal: RNA Date: 2009-09-10 Impact factor: 4.942

7. Proteomic analysis of brush-border membrane vesicles isolated from purified proximal convoluted tubules.

Authors: Scott J Walmsley; Corey Broeckling; Ann Hess; Jessica Prenni; Norman P Curthoys
Journal: Am J Physiol Renal Physiol Date: 2010-03-10

8. Peptide-based systems analysis of inflammation induced myeloid-derived suppressor cells reveals diverse signaling pathways.

Authors: Waeowalee Choksawangkarn; Lauren M Graham; Meghan Burke; Sang Bok Lee; Suzanne Ostrand-Rosenberg; Catherine Fenselau; Nathan J Edwards
Journal: Proteomics Date: 2016-07 Impact factor: 3.984

9. RNA-Seq gene expression estimation with read mapping uncertainty.

Authors: Bo Li; Victor Ruotti; Ron M Stewart; James A Thomson; Colin N Dewey
Journal: Bioinformatics Date: 2009-12-18 Impact factor: 6.937

10. Prediction of alternative isoforms from exon expression levels in RNA-Seq experiments.

Authors: Hugues Richard; Marcel H Schulz; Marc Sultan; Asja Nürnberger; Sabine Schrinner; Daniela Balzereit; Emilie Dagand; Axel Rasche; Hans Lehrach; Martin Vingron; Stefan A Haas; Marie-Laure Yaspo
Journal: Nucleic Acids Res Date: 2010-02-11 Impact factor: 16.971