
5'SAGE: 5'-end Serial Analysis of Gene Expression database.

Yasuhiro Kasai, Shin-ichi Hashimoto, Tomoyuki Yamada, Jun Sese, Sumio Sugano, Kouji Matsushima, Shinichi Morishita.

Abstract

A method of 5'-end Serial Analysis of Gene Expression (SAGE) was recently developed to comprehensively identify transcription start sites and measure the frequencies of individual mRNAs in human cell libraries. Because this method makes it possible to collect a large amount of start site information, we have established a related database server called 5'SAGE. The database displays the observed frequencies of individual 5'-end SAGE tags, together with previously unknown transcription start sites in the promoter regions, introns and intergenic regions of known genes. 5'SAGE will be useful for analyzing promoter regions and start site variation in different tissues, and is freely available at http://5sage.gi.k.u-tokyo.ac.jp/.

Year:  2005        PMID: 15608259      PMCID: PMC540039          DOI: 10.1093/nar/gki085

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The analysis of transcription start sites has attracted considerable attention in recent years. Human mRNA start sites are heterogeneous (1); 40–60% of human genes are transcribed alternatively (2), and 49% of multi-exon transcripts are accompanied by alternative splicing of the initial exon (3). Transcription start sites might differ among cell types or be affected by environmental conditions, such as methylation. Although an extensive collection of transcription start sites for a large number of human genes is available (4), the frequencies of individual start sites are unclear. A comprehensive understanding of how start sites are used in gene expression therefore requires high-throughput technology that monitors the statistics of start site occurrences. Microarrays are unsuitable for this purpose because they cannot detect novel start sites.

The serial analysis of gene expression (SAGE) method (5) has demonstrated its effectiveness at cataloging large quantities of expressed genes in cells or tissues from a variety of physiological, developmental and pathological states (6–11). The original SAGE method (5) generates short (10+4 bp) nucleotide sequences, called tags, derived from the 3′ ends of transcripts; however, typical tags are too short to be uniquely identified with their corresponding genes. This shortcoming was resolved by the LongSAGE method (12), a high-throughput means of profiling 21 bp tags, which are sufficiently long to be unambiguously identified with genes in most cases. However, existing SAGE methods are designed to monitor the 3′ ends of transcripts, and the challenge was to extend SAGE so that it could capture novel 5′ ends of transcripts and efficiently quantify individual 5′ end occurrences. Recently, Hashimoto et al. (13) developed such a system for human cell lines, while Shiraki et al. (14) reported a system for mouse cell lines.
The 5′SAGE database stores a collection of data accumulated using the system of Hashimoto et al. (13).
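
The tag-length argument can be made concrete with a rough calculation. The Python sketch below is illustrative only: it assumes a uniform random sequence model and a genome of about 3 × 10^9 bp, neither of which is taken from the SAGE papers, and the function name `expected_random_hits` is hypothetical. The original SAGE tags were matched against transcript collections rather than the genome, so the absolute numbers are only indicative.

```python
GENOME_SIZE_BP = 3_000_000_000  # assumption: approximate human genome size


def expected_random_hits(tag_length: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Approximate number of chance occurrences of one specific tag.

    Assumes every position is independently A, C, G or T with equal
    probability, and counts candidate positions on both strands.
    """
    possible_tags = 4 ** tag_length
    positions = 2 * genome_size  # both strands, ignoring edge effects
    return positions / possible_tags


for length in (14, 21):
    hits = expected_random_hits(length)
    print(f"{length} bp tag: {hits:.4g} expected chance occurrences")
```

Under these assumptions a 14 bp tag is expected to recur many times by chance, whereas a 21 bp tag is expected to occur far less than once, which is consistent with longer tags being easier to place unambiguously.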

METHODS

Hashimoto et al. (13) have described the method in detail; we present a brief summary here. The method first profiles 21 bp tags by combining, in a novel way, the oligo-capping method (15), a modification of the oligo-capping method (16) and the LongSAGE method (6). These 5′SAGE tags are then aligned with the human genome to locate their positions, and neighboring mRNA start sites are searched for. We found that 19 893 of 25 684 5′SAGE tags in a human cell line, HEK293, were matched to the human genome. Of the 15 448 tags that hit a locus within the human genome, 85.8–96.1% were assigned to within −500 to +200 nt of the mRNA start sites in the RefSeq, UniGene (17) and DBTSS (4) databases, while 1774 tags fell within the introns of known genes or in uncharacterized regions, indicating possible novel start sites.
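
The assignment of tags to known start sites can be sketched as follows. This is a minimal illustration, not the pipeline of Hashimoto et al.: the `StartSite` record, the function names and the rule of picking the nearest site are all assumptions made for the example. Only the −500 to +200 nt window and the tag counts come from the text above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

UPSTREAM_NT = 500    # window extends 500 nt upstream of the annotated start site
DOWNSTREAM_NT = 200  # and 200 nt downstream


@dataclass(frozen=True)
class StartSite:
    """Hypothetical record for an annotated mRNA start site."""
    gene: str
    chrom: str
    strand: str  # "+" or "-"
    pos: int     # genomic coordinate of the annotated start


def relative_offset(tag_pos: int, site: StartSite) -> int:
    """Signed distance from the start site to the tag, in transcript
    orientation: negative values are upstream, positive are downstream."""
    delta = tag_pos - site.pos
    return delta if site.strand == "+" else -delta


def assign_tag(chrom: str, strand: str, tag_pos: int,
               sites: Iterable[StartSite]) -> Optional[Tuple[StartSite, int]]:
    """Return the nearest start site whose window contains the tag,
    with the offset, or None if no annotated site qualifies."""
    best: Optional[Tuple[StartSite, int]] = None
    for site in sites:
        if site.chrom != chrom or site.strand != strand:
            continue
        offset = relative_offset(tag_pos, site)
        if -UPSTREAM_NT <= offset <= DOWNSTREAM_NT:
            if best is None or abs(offset) < abs(best[1]):
                best = (site, offset)
    return best


# Fraction of HEK293 tags matched to the genome, from the counts in the text.
mapped_percent = 100 * 19893 / 25684
print(f"Tags matched to the genome: {mapped_percent:.1f}%")
```

A tag for which `assign_tag` returns `None` would be a candidate novel start site under this simplified scheme, analogous to the 1774 tags found in introns or uncharacterized regions.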

USE OF 5′SAGE

In the 5′SAGE database server, users can browse the transcription start sites and frequencies of individual genes by querying on the accession numbers of sequences in RefSeq, cluster identifiers in UniGene or gene symbols, such as HDAC. To retrieve all the genes in the server, the word ‘ALL’ can be entered in the query box. The user can impose additional conditions on the number of distinct start sites and the total frequency of 5′SAGE tags observed for individual genes of interest. For instance, one can look for genes with five or more distinct start sites and 10 or more 5′SAGE tag occurrences. In response to the query, the system returns the list of qualifying genes. Clicking on a gene displays a window for browsing its transcription start sites (Figure 1).

Figure 1. The use of 5′SAGE. The ‘Start Site View’ indicates the frequencies of start sites using orange lines for the start points of the gene being considered, while the ‘Global View’ presents the overall structures of individual genes to illustrate alternative splice variants.
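
The query conditions on the number of distinct start sites and the total tag frequency can be illustrated with a short Python sketch. The record layout `(gene, start-site position, tag count)` and the function name `qualifying_genes` are assumptions for this example and do not describe the server's actual implementation; the default thresholds mirror the example of five or more start sites and 10 or more tag occurrences.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (gene symbol, start-site position, tag count)
TagRecord = Tuple[str, int, int]


def qualifying_genes(records: Iterable[TagRecord],
                     min_sites: int = 5,
                     min_total: int = 10) -> List[str]:
    """Return genes with at least `min_sites` distinct start sites and
    at least `min_total` tag occurrences summed over those sites."""
    counts: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for gene, position, count in records:
        counts[gene][position] += count

    selected = []
    for gene, sites in counts.items():
        if len(sites) >= min_sites and sum(sites.values()) >= min_total:
            selected.append(gene)
    return sorted(selected)


# Toy data, invented for illustration only.
toy_records = [
    ("GENE_A", 100, 4), ("GENE_A", 120, 3), ("GENE_A", 150, 1),
    ("GENE_A", 300, 1), ("GENE_A", 900, 2),
    ("GENE_B", 50, 30),
    ("GENE_C", 10, 1), ("GENE_C", 20, 1), ("GENE_C", 30, 1),
    ("GENE_C", 40, 1), ("GENE_C", 50, 1),
]
print(qualifying_genes(toy_records))
```

In the toy data, only the gene that satisfies both conditions at once is returned: one gene has many tags but a single start site, and another has five start sites but too few tags.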

Two complementary views are provided for analyzing transcription start points. The ‘Start Site View’ initially displays the narrow, 150 bp region surrounding the transcription start site of the representative gene in RefSeq or UniGene, while the ‘Global View’ presents the entire structures of individual transcripts, which helps in comprehending alternatively spliced transcripts at a glance. Users can change the zoom magnification of each view independently by setting the ruler unit to an alternative base pair length. The thick horizontal blocks in the pictures represent exons. The orange vertical lines depict transcription start points; the depth of each orange line below the axis shows the frequency of the transcription start site. The thick, green, horizontal lines are CpG islands, which are regions of 50 or more bp consisting of at least 50% G or C nucleotides. Nucleotides are displayed when the ruler unit is set to 10 bp.

For instance, Figure 1 shows the transcription start sites of neurofilament 3 (NEF3). Note the large number of start points detected for NEF3; most are novel, and some start at the second or third exon. Genes with many start sites are remarkably common. The ‘Start Site View’ also lists 5′SAGE tags, their distances from the representative start site, their frequencies and their nearest expressed sequence tags.

We have performed LongSAGE on the 3′ ends of mRNA in HEK293 cells to validate the accuracy of our 5′SAGE results (13). The total frequency of 3′SAGE tags associated with the representative gene is also displayed, with the 3′SAGE tag sequences and distances from the start site. 5′SAGE tags are typically more diverse than 3′SAGE tags. Because 5′SAGE and 3′SAGE tags are sampled independently at random, their frequencies are not expected to agree exactly; the Pearson correlation coefficient between the frequencies of 5′SAGE and 3′SAGE tags indicates moderate similarity (13).
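
The comparison between 5′SAGE and 3′SAGE frequencies relies on the Pearson correlation coefficient, which can be computed as in the sketch below. The function is a plain implementation of the standard formula; the per-gene counts are invented toy values, not data from the database or from reference (13).

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / math.sqrt(var_x * var_y)


# Toy per-gene tag totals (hypothetical), paired by gene.
five_prime_counts = [12, 3, 40, 7, 1]
three_prime_counts = [9, 5, 22, 10, 2]
print(f"r = {pearson(five_prime_counts, three_prime_counts):.3f}")
```

Values of r close to 1 indicate that genes with many 5′SAGE tags also tend to have many 3′SAGE tags; independent random sampling of the two libraries is one reason the coefficient would fall short of 1 even for the same cells.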

UPDATES AND FUTURE DIRECTIONS

As of October 2004, the 5′SAGE database presents transcription start sites collected from two human cell lines, HEK293 and Ramos. Start site information for other human cell lines is being collected for the analysis of start point variation in different tissues, and will be made available at the same website.

REFERENCES (17 in total; 10 listed)

1.  Analysis of human transcriptomes.

Authors:  V E Velculescu; S L Madden; L Zhang; A E Lash; J Yu; C Rago; A Lal; C J Wang; G A Beaudry; K M Ciriello; B P Cook; M R Dufault; A T Ferguson; Y Gao; T C He; H Hermeking; S K Hiraldo; P M Hwang; M A Lopez; H F Luderer; B Mathews; J M Petroziello; K Polyak; L Zawel; K W Kinzler
Journal:  Nat Genet       Date:  1999-12       Impact factor: 38.330

2.  A genomic view of alternative splicing.

Authors:  Barmak Modrek; Christopher Lee
Journal:  Nat Genet       Date:  2002-01       Impact factor: 38.330

3.  An anatomy of normal and malignant gene expression.

Authors:  Kathy Boon; Elisson C Osorio; Susan F Greenhut; Carl F Schaefer; Jennifer Shoemaker; Kornelia Polyak; Patrice J Morin; Kenneth H Buetow; Robert L Strausberg; Sandro J De Souza; Gregory J Riggins
Journal:  Proc Natl Acad Sci U S A       Date:  2002-07-15       Impact factor: 11.205

4.  DBTSS, DataBase of Transcriptional Start Sites: progress report 2004.

Authors:  Yutaka Suzuki; Riu Yamashita; Sumio Sugano; Kenta Nakai
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

5.  Gene expression profile in human leukocytes.

Authors:  Shin-ichi Hashimoto; Shigenori Nagai; Jun Sese; Takuji Suzuki; Aya Obata; Taku Sato; Nobuaki Toyoda; Hong-Yan Dong; Makoto Kurachi; Tomoyuki Nagahata; Ken-ichi Shizuno; Shinichi Morishita; Kouji Matsushima
Journal:  Blood       Date:  2003-01-09       Impact factor: 22.113

6.  Impact of alternative initiation, splicing, and termination on the diversity of the mRNA transcripts encoded by the mouse transcriptome.

Authors:  Mihaela Zavolan; Shinji Kondo; Christian Schonbach; Jun Adachi; David A Hume; Yoshihide Hayashizaki; Terry Gaasterland
Journal:  Genome Res       Date:  2003-06       Impact factor: 9.043

7.  Using the transcriptome to annotate the genome.

Authors:  Saurabh Saha; Andrew B Sparks; Carlo Rago; Viatcheslav Akmaev; Clarence J Wang; Bert Vogelstein; Kenneth W Kinzler; Victor E Velculescu
Journal:  Nat Biotechnol       Date:  2002-05       Impact factor: 54.908

8.  Serial analysis of gene expression in human monocytes and macrophages.

Authors:  S Hashimoto; T Suzuki; H Y Dong; N Yamazaki; K Matsushima
Journal:  Blood       Date:  1999-08-01       Impact factor: 22.113

9.  The Mouse SAGE Site: database of public mouse SAGE libraries.

Authors:  Petr Divina; Jirí Forejt
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

10.  Database resources of the National Center for Biotechnology.

Authors:  David L Wheeler; Deanna M Church; Scott Federhen; Alex E Lash; Thomas L Madden; Joan U Pontius; Gregory D Schuler; Lynn M Schriml; Edwin Sequeira; Tatiana A Tatusova; Lukas Wagner
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971
