
ECgene: an alternative splicing database update.

Yeunsook Lee, Younghee Lee, Bumjin Kim, Youngah Shin, Seungyoon Nam, Pora Kim, Namshin Kim, Won-Hyong Chung, Jaesang Kim, Sanghyuk Lee.

Abstract

ECgene (http://genome.ewha.ac.kr/ECgene) was developed to provide functional annotation for alternatively spliced genes. Its applications encompass genome-based transcript modeling for alternative splicing (AS), domain analysis with Gene Ontology (GO) annotation, and expression analysis based on EST and SAGE data. We have expanded ECgene's AS modeling and EST clustering to nine organisms for which sufficient EST data are available in GenBank. For the human genome, we have also introduced several new applications to analyze differential expression. ECprofiler is an ontology-based candidate gene search system that allows users to select an arbitrary combination of gene expression patterns and GO functional categories. DEGEST is a database of differentially expressed genes and isoforms based on EST information. Importantly, gene expression is analyzed at three distinct levels: gene, isoform and exon. The user interfaces for functional and expression analyses have been substantially improved. ASviewer is a dedicated Java application that visualizes the transcript structure and functional features of alternatively spliced variants. The SAGE part of the expression module provides many additional features, including SNPs, differential expression and alternative tag positions.

Year:  2006        PMID: 17132829      PMCID: PMC1716719          DOI: 10.1093/nar/gkl992

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Alternative splicing (AS) is a eukaryote-specific cellular mechanism that creates diverse mRNA structures through the differential use of splice sites (1). Substantial progress has been made in understanding the significance and mechanism of AS via both computational and experimental approaches. Several studies have revealed the role of AS in developmental regulation (2), evolutionary processes (3) and even psychological behavior (4). Burge and coworkers developed computational methods to identify regulatory elements of AS, i.e. enhancers and silencers of splicing (5,6). High-throughput experimental techniques such as splice arrays have recently become commercially available. Proper functional annotation is an essential part of understanding the role of splice variants at the genome scale (7). Many databases and applications have been developed to annotate genomes so far. The European community (especially the EBI) has made significant efforts to include splice variants as a part of the Ensembl genome annotation project. Thanaraj and coworkers have developed a series of databases (ASD, AltSplice and AltTrans) by mining GenBank sequences and the PubMed literature (8,9). AceView provides a comprehensive overview of functional and structural aspects of alternatively spliced genes for the human, worm and Arabidopsis genomes (10). Lee et al. (11) developed algorithms and databases (ASAP; alternative splicing annotation project) to analyze AS at the genome-wide level. Recently they developed an algorithm to predict full-length mRNA models, which is critical for understanding the significance of a given AS event at the transcript level rather than at the individual exon level (12). At the time of writing, they had updated the ASAP database to ASAP II, which covers 17 organisms and supports comparative analysis of splice variants. Holste et al. (13) developed the Hollywood database, in which the conservation of AS patterns between human and mouse can be examined.
Numerous other databases (14,15) are available either to model diverse gene structures or to predict splice variants (e.g. see the website for the NAR database issues). Differential expression has become an essential aspect of finding potential therapeutic targets and biomarkers. SAGE and EST data have been successfully used to find differentially expressed genes (DEG) in various organs and cancerous tissues (16,17). Lee and coworkers extended the bioinformatics search to find differentially expressed splice variants in various tissues and cancers (18,19). Recently, Gupta et al. (20) developed a database and a web server that display tissue-specific transcripts and genes using UniGene EST clusters. Such a database clearly indicates the importance of understanding differential expression of alternatively spliced variants. We developed the ECgene algorithm and the accompanying web site in 2004. The algorithm introduced a novel combination of genome-based EST clustering and graph-based transcript assembly procedures (21). The database provided functional annotations for alternatively spliced genes that included domain, Gene Ontology (GO) and expression pattern analysis based on EST and SAGE data (22). In this update, we have expanded ECgene's EST clustering and mRNA modeling to support nine organisms whose genome maps are available. The species included are human, mouse, rat, worm, fruit fly, zebrafish, dog, chicken and Rhesus monkey. The genome-based version provides improved EST clustering compared to transcript-based clustering. Furthermore, mRNA modeling of splice variants is automatically incorporated in the assembly procedure. We have also developed several new applications and utilities for functional annotation of alternatively spliced genes in the human genome. Notably, a Java-based viewer with several novel features visualizes AS so that users can compare splice variants efficiently.
The viewer combines the advantages of the genome browser and transcript viewer in a single user interface by supporting variable intron scaling. This is in contrast to the use of two separate windows in the INTRIS program (23). Furthermore, functional domains of encoded proteins and splicing-regulatory elements are indicated in this new interface to facilitate understanding the functional significance and regulatory mechanism of AS. Expression pattern analysis includes many new features as well. We also added several new programs to identify DEG and isoforms in various organs and/or cancer tissues. Together with the new features, ECgene should represent an even more useful tool in biomarker discovery.

APPLICATIONS AND WEB INTERFACE

Figure 1 shows an overview of the ECgene web site. The updated version consists of two main components: expansion of ECgene clustering to various organisms and annotation of the human genome. New tools have been added to examine differential expression patterns, which may aid in identifying tissue- and/or cancer-specific genes. Links to the applications are provided inside the picture as well as in the tab menu for user convenience. Relevant databases and applications are briefly discussed below.
Figure 1

Overview of the ECgene web site. Clicking on an application name launches the application.

ECgene clustering and gene modeling for alternative splicing

The ECgene algorithm was applied to nine organisms that include most of the important model organisms. As a result, we have the mRNA model and the subcluster for each splice variant, in addition to the genome-based EST clusters, which are equivalent to the UniGene clusters. The result is quite similar to the TIGR Gene Indices, which provide clustering and assembly for eukaryotic genomes (24). However, the genome-based method is superior to the transcript-based method in terms of clustering accuracy, with the limitation that it can be applied only to organisms with a genome map. Subclusters and mRNA models are available at the ECgene download site. We also provide the ECgene genome browser, which shows the genomic alignment of mRNA models and EST sequences as custom tracks in the UCSC genome browser (25). This gives users access to the ample annotation tracks in the UCSC genome browser database, thereby facilitating the deduction of the functional significance of each splice variant. Table 1 compares the extent of AS for the Drosophila melanogaster genome in several databases including FlyBase (26), DEDB (27) and ASAP II. Although the number of spliced genes is comparable between databases, ECgene shows a significantly larger number of alternatively spliced genes.
Table 1

Comparison of AS statistics for the Drosophila melanogaster genome

                                                                      DEDB(a)   FlyBase(b)    ASAP II       ECgene(c)
                                                                                Release 4.3   UniGene #40   Part A
No. of genes                                                          13 514    16 635        -             14 166
    No. of spliced genes (multi-exon genes)                           10 966    11 058        9683          11 657
No. of transcripts                                                    18 567    19 171        -             26 661
    No. of spliced transcripts                                        13 408    16 489        -             23 853
No. of alternatively spliced genes                                    2721      2814          1841          4275
    Percentage of alternatively spliced genes among multi-exon genes  25        25            19            37

(a) The current version of DEDB is based on FlyBase Release 4.2.1.

(b) Genes and transcripts for FlyBase were downloaded from the UCSC table browser for the dm2 genome.

(c) Full statistics including ECgene parts B and C are available on the website.
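
The percentage row of Table 1 follows directly from the two gene counts above it. As a minimal sketch (the dictionary layout and function name are illustrative assumptions, not part of ECgene), the calculation can be reproduced as follows:

```python
# Counts copied from Table 1 (Drosophila melanogaster).
# The data structure and names below are hypothetical, for illustration only.
TABLE_1 = {
    "DEDB":    {"spliced_genes": 10966, "as_genes": 2721},
    "FlyBase": {"spliced_genes": 11058, "as_genes": 2814},
    "ASAP II": {"spliced_genes": 9683,  "as_genes": 1841},
    "ECgene":  {"spliced_genes": 11657, "as_genes": 4275},
}


def as_percentage(as_genes, spliced_genes):
    """Percentage of multi-exon genes that are alternatively spliced."""
    return 100.0 * as_genes / spliced_genes


# Round to whole percentages, as reported in the table.
percentages = {
    db: round(as_percentage(**counts)) for db, counts in TABLE_1.items()
}

for db, pct in percentages.items():
    print(f"{db}: {pct}%")
```

Normalizing by multi-exon genes rather than by all genes keeps the comparison fair, because single-exon genes cannot be alternatively spliced.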

Functional annotation—ECfunction and ASviewer

ECfunction was developed to visualize the mRNA structure and functional domains of alternatively spliced genes so that users can readily recognize any changes in the functional domains due to AS. We improved the user interface by switching to Java applets that allow both zooming and intron scaling in real time. Variable intron scaling allows a seamless transition from the genome browser to the transcript or protein viewers. Thus, the detailed gene structure as well as known functional features in the genomic, mRNA and protein sequences can be readily visualized in a single user interface. Importantly, candidate splicing-regulatory signals such as exonic splicing enhancers (ESE) (5) and exonic splicing silencers (ESS) (6) can be visualized together with the transcript structure, which is valuable information for studying the mechanism of AS. ASviewer extends the features of ECfunction to support other gene models including RefSeq, Ensembl and AceView. The transcript models can be readily compared using the detailed information on exons and introns available in the balloon help. It is possible to upload custom mRNA models and annotations into the viewer. We also provide a utility to print the genomic sequence in a way similar to the UCSC genome browser (25). The character style and color can be specified for individual mRNA models, which facilitates the detailed comparison of various predicted mRNA models.
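
The idea behind variable intron scaling can be illustrated with a short sketch. This is not the ASviewer implementation. It assumes exons are drawn at full scale while intron lengths are multiplied by a user-controlled factor, so that a factor of 1.0 gives a genome-browser-like view and 0.0 gives a transcript-like view. The function and parameter names are hypothetical.

```python
def make_intron_scaler(exons, intron_scale):
    """Build a mapping from genomic position to display position.

    exons: sorted, non-overlapping (start, end) genomic intervals of one gene.
    intron_scale: 1.0 keeps genomic proportions; 0.0 collapses introns.
    (Illustrative assumption: exons are always drawn at full scale.)
    """
    first_start, last_end = exons[0][0], exons[-1][1]

    def to_display(pos):
        if not first_start <= pos <= last_end:
            raise ValueError("position lies outside the gene model")
        display = 0.0
        cursor = first_start
        for start, end in exons:
            # Position falls in the (scaled) intron preceding this exon.
            if pos < start:
                return display + (pos - cursor) * intron_scale
            display += (start - cursor) * intron_scale
            # Position falls inside this exon, drawn at full scale.
            if pos <= end:
                return display + (pos - start)
            display += end - start
            cursor = end
        return display

    return to_display


# Hypothetical two-exon gene with an 800 bp intron.
exons = [(100, 200), (1000, 1100)]
genome_view = make_intron_scaler(exons, 1.0)
transcript_view = make_intron_scaler(exons, 0.0)
half_view = make_intron_scaler(exons, 0.5)
```

Changing `intron_scale` continuously between 0 and 1 gives the smooth transition between the two views that the text describes.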

Expression annotation—ESTexpress and SAGEexpress

ECgene's expression annotation is based on EST and SAGE data. We divided the previous version of ECexpression into two separate applications (ESTexpress and SAGEexpress) that provide more specific and detailed information for each data type. ESTexpress analyzes ∼8600 human cDNA libraries and illustrates the inferred gene expression in various tissues and cancers. An option to use only non-normalized libraries is also available to obtain a quantitative prediction that ignores ESTs from normalized cDNA libraries. SAGEexpress has been substantially improved to provide diverse search options and detailed analysis of alternative tags. The search interface closely follows the widely used SAGEmap of NCBI and SAGE Genie at NCI (28,29). Our tag-to-gene assignment is based on the mRNA models of ECgene. We also provide information on alternative tags stemming from alternative polyA tails, internal restriction sites and single nucleotide polymorphisms (SNPs).
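
A tag-to-gene assignment based on mRNA models can be sketched as below. The details are assumptions rather than the SAGEexpress procedure: the sketch uses the common SAGE convention of a 10-nt tag immediately downstream of the 3'-most CATG (NlaIII) site, treats tags at upstream (internal) sites as alternative tags, and ignores polyA and SNP variants. All function names are hypothetical.

```python
from collections import defaultdict


def sage_tags(mrna, site="CATG", tag_len=10):
    """Return (canonical_tag, alternative_tags) for one mRNA sequence.

    The canonical tag follows the 3'-most anchoring-enzyme site; tags at
    upstream (internal) sites are returned as alternatives. Sites too close
    to the 3' end to yield a full-length tag are skipped (a simplification).
    """
    mrna = mrna.upper()
    tags = []
    pos = mrna.find(site)
    while pos != -1:
        tag = mrna[pos + len(site): pos + len(site) + tag_len]
        if len(tag) == tag_len:
            tags.append(tag)
        pos = mrna.find(site, pos + 1)
    if not tags:
        return None, []
    return tags[-1], tags[:-1]


def build_tag_index(transcripts):
    """Map each canonical tag to the set of transcript IDs that produce it."""
    index = defaultdict(set)
    for transcript_id, sequence in transcripts.items():
        canonical, _ = sage_tags(sequence)
        if canonical is not None:
            index[canonical].add(transcript_id)
    return index
```

A tag whose index entry contains more than one transcript cannot distinguish those splice variants.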

Differential expression—ECprofiler, DEGEST and DEGSAGE

Special efforts have been made to facilitate the examination of differential expression, which is of major importance in biomarker and drug target discovery. ECprofiler is a candidate gene search system that mines EST clusters for genes with a desired expression pattern and function. Specifically, the expression ontology used for cDNA library classification includes three categories: organ/tissue/cell type, pathology and developmental stage. Both gene expression and function are implemented as ontology-based hierarchical structures. The Java implementation allows users to select any combination of nodes in all categories, including multiple nodes and subnode expansion. We also provide a powerful search engine and diverse filtering options such as motifs, the number of ESTs and libraries, and specificities. DEGEST is a database of DEG, splice variants (isoforms) and AS events covering 52 tissues and cancer types. A chi-squared test was performed on the EST clusters and subclusters from ECgene clustering to identify DEG and differentially expressed isoforms. DEGEST is unique in providing isoform-level analysis. The background distribution for the statistical test can be either the ESTs in the gene or the whole dbEST. This allows users to obtain transcripts with specific expression at the isoform level even when the gene itself shows no specificity at all. DEGEST also provides specific AS events that show differential expression. AS events are classified into exon skipping, alternative donor/acceptor sites and intron retention. Diverse filtering options are available for user convenience. DEGSAGE tests SAGE tags for differential expression using ∼300 SAGE libraries. We support 28 organs/tissues and cancer types. Since SAGE is inherently an mRNA-based technique, a gene may have several tags, or a tag may correspond to several splice variants. We compute a representative tag to deduce expression at the gene level. The application also takes the problem of tag uniqueness into account.
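
The ontology-based selection of cDNA libraries with subnode expansion can be pictured with a toy example. The miniature ontologies, library annotations and function names below are invented for illustration and do not reflect ECprofiler's actual vocabulary or code. The sketch only shows how choosing a node implicitly selects every term beneath it.

```python
# Hypothetical miniature ontologies, stored as child -> parent.
TISSUE_PARENT = {
    "brain": "organ",
    "cerebellum": "brain",
    "hippocampus": "brain",
    "liver": "organ",
}
PATHOLOGY_PARENT = {
    "normal": "pathology",
    "tumor": "pathology",
    "glioma": "tumor",
}


def expand(node, parent_of):
    """Return the node together with all of its descendants."""
    selected = {node}
    changed = True
    while changed:
        changed = False
        for child, parent in parent_of.items():
            if parent in selected and child not in selected:
                selected.add(child)
                changed = True
    return selected


def select_libraries(libraries, tissue_nodes, pathology_nodes):
    """IDs of libraries whose tissue AND pathology fall under chosen nodes.

    libraries: dict of library ID -> (tissue term, pathology term).
    """
    tissues = set().union(*(expand(n, TISSUE_PARENT) for n in tissue_nodes))
    pathologies = set().union(
        *(expand(n, PATHOLOGY_PARENT) for n in pathology_nodes)
    )
    return {
        lib_id
        for lib_id, (tissue, pathology) in libraries.items()
        if tissue in tissues and pathology in pathologies
    }
```

For instance, selecting the node "brain" together with "tumor" would match a library annotated as ("cerebellum", "glioma") in this toy setting.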
ECprofiler and DEGSAGE run as server-client applications in real time, and the response may be slow. It is thus strongly recommended to specify the genomic region of interest within a chromosome in running ECprofiler. Although we support the genome-wide search, it should be noted that this may take over 30 min. DEGEST is a simple query system to the database that stores all results in pre-computed form for fast response.
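
The chi-squared test used by DEGEST can be illustrated on a 2×2 contingency table of EST counts. The sketch below is an assumption about the table layout (cluster versus rest of the background, tissue of interest versus all other tissues) and uses the uncorrected Pearson statistic with one degree of freedom. The source does not state which variant of the test or which correction was applied. The function names are hypothetical.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 d.f.) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A row or column is empty, so there is no evidence of association.
        return 0.0, 1.0
    statistic = n * (a * d - b * c) ** 2 / denominator
    # Survival function of the chi-squared distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def differential_expression(cluster_in_tissue, cluster_total,
                            background_in_tissue, background_total):
    """Test whether a cluster is over- or under-represented in one tissue.

    The background may be all ESTs of the gene (isoform-level test) or the
    whole EST collection (gene-level test); it must include the cluster.
    """
    a = cluster_in_tissue
    b = cluster_total - cluster_in_tissue
    c = background_in_tissue - cluster_in_tissue
    d = (background_total - cluster_total) - c
    return chi_squared_2x2(a, b, c, d)
```

Choosing the ESTs of the parent gene as the background is what makes it possible to detect an isoform with tissue-specific expression even when the gene as a whole is uniformly expressed.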

CONCLUSION AND FUTURE DIRECTION

ECgene is an ongoing project with a collection of diverse databases and applications focused on AS. ASePCR emulates the RT–PCR experiment in various tissues. ChimerDB is a database of fusion sequences that covers chromosomal translocations. Various utilities to explore differential expression are available only for the human genome at this point. We plan to extend our functional and expression analyses to other model organisms. ECgene clustering and gene modeling will be applied to other species with a completed genome map as well. Frequent updates are critical, and we plan to update ESTs on a bimonthly basis. Whole-genome recalculation requires extensive computation and will thus be performed once or twice a year, depending on the amount of additional sequence data. A stable ID system is under development as well.
  28 in total

Review 1.  Alternative pre-mRNA splicing and proteome expansion in metazoans.

Authors:  Tom Maniatis; Bosiljka Tasic
Journal:  Nature       Date:  2002-07-11       Impact factor: 49.962

2.  SAGE Genie: a suite with panoramic view of gene expression.

Authors:  Peng Liang
Journal:  Proc Natl Acad Sci U S A       Date:  2002-08-23       Impact factor: 11.205

3.  Genome-wide detection of tissue-specific alternative splicing in the human transcriptome.

Authors:  Qiang Xu; Barmak Modrek; Christopher Lee
Journal:  Nucleic Acids Res       Date:  2002-09-01       Impact factor: 16.971

4.  RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons.

Authors:  William G Fairbrother; Gene W Yeo; Rufang Yeh; Paul Goldstein; Matthew Mawson; Phillip A Sharp; Christopher B Burge
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

5.  The Alternative Splicing Gallery (ASG): bridging the gap between genome and transcriptome.

Authors:  Jeremy Leipzig; Pavel Pevzner; Steffen Heber
Journal:  Nucleic Acids Res       Date:  2004-08-03       Impact factor: 16.971

6.  Gene expression profiles in normal and cancer cells.

Authors:  L Zhang; W Zhou; V E Velculescu; S E Kern; R H Hruban; S R Hamilton; B Vogelstein; K W Kinzler
Journal:  Science       Date:  1997-05-23       Impact factor: 47.728

7.  Discovery of novel splice forms and functional analysis of cancer-specific alternative splicing in human expressed sequences.

Authors:  Qiang Xu; Christopher Lee
Journal:  Nucleic Acids Res       Date:  2003-10-01       Impact factor: 16.971

8.  ASAP: the Alternative Splicing Annotation Project.

Authors:  Christopher Lee; Levan Atanelov; Barmak Modrek; Yi Xing
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

9.  Evolution of exon-intron structure and alternative splicing in fruit flies and malarial mosquito genomes.

Authors:  Dmitry B Malko; Vsevolod J Makeev; Andrey A Mironov; Mikhail S Gelfand
Journal:  Genome Res       Date:  2006-03-06       Impact factor: 9.043

10.  AceView: a comprehensive cDNA-supported gene and transcripts annotation.

Authors:  Danielle Thierry-Mieg; Jean Thierry-Mieg
Journal:  Genome Biol       Date:  2006-08-07       Impact factor: 13.583
