
Correction of sequence-based artifacts in serial analysis of gene expression.

Viatcheslav R Akmaev, Clarence J Wang.

Abstract

MOTIVATION: Serial Analysis of Gene Expression (SAGE) is a powerful technology for measuring global gene expression through the rapid generation of large numbers of transcript tags. Beyond their intrinsic value for differential gene expression analysis, SAGE tag collections provide abundant information on the size and shape of the sample transcriptome and can accelerate novel gene discovery. These latter applications are facilitated by the enhanced Long SAGE method. A characteristic of sequencing-based methods such as SAGE and Long SAGE is the unavoidable occurrence of artifact sequences resulting from sequencing errors. Because such tag errors occur randomly and at low frequency, they have minimal impact on differential expression analysis. However, to fully exploit the value of large SAGE tag datasets, it is desirable to account for and correct tag artifacts.
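The claim that random tag errors have little effect on expression counts can be made concrete with simple arithmetic. The sketch below is not the authors' stochastic model; it assumes, for illustration only, that each PCR cycle and the final sequencing read corrupt each base independently, that the three alternative bases are equally likely, and it uses hypothetical function names, error rates, and a tag length of 17 nt chosen purely as an example.

```python
def per_base_error(e_seq: float, e_pcr: float, cycles: int) -> float:
    """Probability that a base is wrong after `cycles` PCR rounds plus one read.

    Simplifying assumption (not the published model): every PCR cycle and
    the sequencing read corrupt a base independently of each other.
    """
    return 1.0 - (1.0 - e_seq) * (1.0 - e_pcr) ** cycles


def fraction_erroneous(e: float, length: int) -> float:
    """Fraction of tag copies that carry at least one erroneous base."""
    return 1.0 - (1.0 - e) ** length


def expected_variant_count(n_copies: int, e: float, length: int) -> float:
    """Expected count of ONE specific single-substitution variant of a tag.

    The variant needs exactly one wrong base at a given position, with a
    given substitute base (assumed 1 of 3 equally likely), and no other errors.
    """
    return n_copies * (1.0 - e) ** (length - 1) * e / 3.0


if __name__ == "__main__":
    # Illustrative, made-up rates; not estimates from the paper.
    e = per_base_error(e_seq=0.005, e_pcr=1e-4, cycles=20)
    length = 17  # hypothetical tag length
    print(f"per-base error: {e:.4f}")
    print(f"copies with >= 1 error: {fraction_erroneous(e, length):.1%}")
    print(f"copies of one variant per 1000 parent copies: "
          f"{expected_variant_count(1000, e, length):.2f}")
```

Under these assumptions a noticeable fraction of copies can be erroneous, yet the errors are spread across the 3 × L possible single-substitution variants, so each variant is expected only a few times even for a highly abundant parent tag. This is why such artifacts surface mainly as low-count or singleton tags.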
RESULTS: We present estimates of tag error occurrence and an efficient error correction algorithm. The error rate estimates are based on a stochastic model that includes contributions from the polymerase chain reaction (PCR) and from sequencing. The correction algorithm, SAGEScreen, is a multi-step procedure that addresses ditag processing, estimation of empirical error rates from highly abundant tags, grouping of tags with similar sequences, and statistical testing of observed counts. We apply SAGEScreen to Long SAGE libraries and compare error rates for several processing scenarios. Results with simulated tag collections indicate that SAGEScreen corrects 78% of recoverable tag errors and reduces the number of singleton tags.

AVAILABILITY: The SAGEScreen software is available to academic users from the first author.
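To illustrate the grouping and statistical-testing steps summarized in RESULTS, here is a minimal sketch. It is not SAGEScreen's code or exact procedure: ditag processing is omitted, the per-base error rate `e` is passed in rather than estimated from highly abundant tags, only single-base substitutions (Hamming distance 1) are grouped, and a Poisson approximation to the binomial is swapped in as the test. The function names and the threshold `alpha` are hypothetical.

```python
import math
from typing import Dict, Iterator

BASES = "ACGT"


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed term by term."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def one_mismatch_neighbors(tag: str) -> Iterator[str]:
    """Yield every sequence differing from `tag` by one substitution."""
    for i, base in enumerate(tag):
        for alt in BASES:
            if alt != base:
                yield tag[:i] + alt + tag[i + 1:]


def correct_tags(counts: Dict[str, int], e: float,
                 alpha: float = 0.05) -> Dict[str, int]:
    """Merge low-count tags into a more abundant one-mismatch neighbor when
    their count is consistent with sequencing/PCR error from that neighbor."""
    corrected: Dict[str, int] = {}  # tags kept as genuine, with merged counts
    # Visit tags from most to least abundant so candidate parents come first.
    for tag in sorted(counts, key=lambda t: (-counts[t], t)):
        c = counts[tag]
        parents = [n for n in one_mismatch_neighbors(tag)
                   if n in corrected and counts[n] > c]
        if parents:
            parent = max(parents, key=lambda n: counts[n])
            # Expected copies of this specific variant arising by error.
            lam = counts[parent] * (1.0 - e) ** (len(tag) - 1) * e / 3.0
            if poisson_sf(c, lam) > alpha:  # count explainable as error
                corrected[parent] += c
                continue
        corrected[tag] = c
    return corrected


if __name__ == "__main__":
    toy = {"AAAAAAAAAA": 1000, "AAAAAAAAAT": 1, "CCCCCCCCCC": 5}
    print(correct_tags(toy, e=0.01))
```

Processing tags in descending order of abundance means every candidate parent has already been classified before its low-count neighbors are tested, and total tag counts are preserved because merged counts are added to the parent rather than discarded.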

Year: 2004    PMID: 14871862    DOI: 10.1093/bioinformatics/bth077

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937

