
T-STAG: resource and web-interface for tissue-specific transcripts and genes.

Shobhit Gupta, Martin Vingron, Stefan A Haas.

Abstract

T-STAG (tissue-specific transcripts and genes) is a resource and web-interface designed to analyze tissue/tumor-specific expression patterns in human and mouse transcriptomes. It integrates our refined prediction of specific expression patterns in genes as well as in individual isoforms with man-mouse orthology data. In combination with the features for combining/contrasting the genes expressed in different tissues, T-STAG supports important biological applications, such as the detection of differentially expressed genes in tumors and the retrieval of orthologs with significant expression in the same tissue. Additionally, our refined categorization of expressed sequence tags (ESTs) according to the normalization of cDNA libraries allows searching for putative low-abundance transcripts. The results are tightly linked to our visualization tools, GeneNest (expression patterns of genes) and SpliceNest (gene structure and alternative splicing). The user-friendly interface of T-STAG offers a platform for comprehensive analysis of tissue and/or tumor-specific expression patterns revealed by the EST data. T-STAG is freely accessible at http://tstag.molgen.mpg.de.

Year:  2005        PMID: 15980556      PMCID: PMC1160111          DOI: 10.1093/nar/gki350

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The complex differences in protein pools related to different cell types are a result of variation in the interpretation of the same genomic sequence. This variation is caused by various regulatory control mechanisms operating at the transcriptional, post-transcriptional, translational and post-translational levels. At the transcriptional level, control is achieved via transcription factors (1,2), which recognize certain cis-regulatory elements of the target genes and modulate their expression, occasionally in a tissue-specific manner (3,4). An additional regulatory mechanism is alternative splicing, which is controlled by exonic and intronic enhancers/silencers that allow differential expression of alternative mRNAs from the same primary transcript (5,6). Sometimes this mechanism also operates in a tissue-specific manner (7). Anomalous expression of the genes involved in these regulatory mechanisms is known to result in diseases (8,9). Among the most popular methods to estimate and analyze expression patterns are those based on serial analysis of gene expression [SAGE (10)] and expressed sequence tags [ESTs (11)]. While both SAGE and ESTs are useful for analyzing expression patterns of genes [electronic northern (12)], the fraction of ESTs that covers isoform-specific parts [ASD (13), MAASE (14), ASAP (15) and SpliceNest (16)] also enables the detection of tissue/tumor-specific alternative isoforms (17,18). However, the congruence between EST coverage and expression pattern is disturbed by the varying experimental protocols of EST generation (19), which indicates the need for a more refined methodology for estimating expression levels (20). In this paper, we describe T-STAG (tissue-specific transcripts and genes), a resource and web-interface that integrates predictions of specific expression patterns of genes as well as of individual isoforms, thus allowing users to address additional or more specialized biological questions.
Our detailed categorization of ESTs (normalized, disease related), the features to compare subsets of genes and the integration of tissue-specific genes/isoforms enable a wide range of applications. Among these are the detection of expression patterns of low-abundance transcripts and the identification of differential expression of genes in tumors. Above all, it allows for contrasting the tissue-specifically expressed isoforms with the background expression of all isoforms of the respective gene. The additional integration of man-mouse orthology data enables the comparison of expression profiles in orthologous genes. In combination with the user-friendly web-interface, T-STAG offers a platform for comprehensive analyses of expression patterns of genes as well as individual isoforms.

METHODS

T-STAG is designed for detailed investigation of tissue/tumor-specific expression in genes and transcripts predicted using EST data. The following resources are integrated via the web-database.
Gene expression estimates. The EST clusters (genes) and the annotation of EST libraries are derived from the GeneNest database based on Unigene build 161 (August 2003) for human and Unigene build 118 (December 2002) for mouse (21). The tissue distribution of ESTs in a cluster relative to random background is translated into numerical estimates (P-values) of the likelihood of observing such a tissue distribution by chance (Haas, S. A. et al., manuscript in preparation). Therefore, a low P-value for a given gene–tissue pair reflects significant and/or specific expression of the gene in the respective tissue.
Transcript expression data. The GeneNest (22) consensus sequences are mapped to the genome sequence (Human: April 2003 freeze of HUGO and Mouse: February 2002 freeze from the Mouse Genome Sequencing Consortium) and alternative isoforms are predicted with confidence values, using the EST coverage and splice signal indicators as a measure of reliability [SpliceNest (16,23)]. Parts of these putative transcripts that are specifically covered either by ESTs related to a single tissue or only by ESTs derived from tumor-related libraries are then labeled as tissue- or tumor-specific splice events, respectively (20).
Man-mouse orthologs. The human and mouse protein sequences are taken from RefSeq. Pairs of sequences with best bidirectional PBLAST alignment scores are defined as orthologs. The corresponding mRNA sequences are then inferred using TBLASTN of protein sequences with the respective reference sequences, thereby providing a link to the Unigene clusters.
Finally, the gene expression estimates, the transcript expression data and the man-mouse ortholog data are integrated via a relational database system (postgres).
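The best-bidirectional-hit criterion used for the orthologs can be illustrated with a minimal sketch. It assumes that alignment scores are already available as dictionaries keyed by (query, subject); the function names, the sequence identifiers and the scores are hypothetical and are not part of T-STAG.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict {(query, subject): alignment_score}.
    On ties, the first pair encountered is kept (an arbitrary choice).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(human_vs_mouse, mouse_vs_human):
    """Return (human, mouse) pairs that are each other's best hit."""
    forward = best_hits(human_vs_mouse)
    reverse = best_hits(mouse_vs_human)
    return sorted((h, m) for h, m in forward.items() if reverse.get(m) == h)


# Toy scores with made-up identifiers, for illustration only.
human_vs_mouse = {("H1", "M1"): 200, ("H1", "M2"): 50,
                  ("H2", "M1"): 120, ("H2", "M2"): 80}
mouse_vs_human = {("M1", "H1"): 190, ("M1", "H2"): 110,
                  ("M2", "H1"): 60, ("M2", "H2"): 40}

pairs = reciprocal_best_hits(human_vs_mouse, mouse_vs_human)
print(pairs)  # [('H1', 'M1')]
```

In this toy example H2 is not paired, because its best mouse hit M1 prefers H1; requiring the hit in both directions is what makes the criterion stringent.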
In order to enhance the practical applicability of the resource, a user-friendly web interface (Figure 1) was designed, with a download option to facilitate integration of the data into external applications.
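The idea of translating the tissue distribution of ESTs in a cluster into a P-value can be sketched as follows. The statistic actually used by T-STAG is described in a manuscript in preparation and is not given here, so the one-sided hypergeometric test below is only an assumed stand-in; the function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_tail(k, cluster_size, tissue_total, pool_total):
    """P(X >= k) under random sampling without replacement.

    cluster_size ESTs are drawn from a pool of pool_total ESTs,
    of which tissue_total originate from the tissue of interest.
    """
    upper = min(cluster_size, tissue_total)
    denominator = comb(pool_total, cluster_size)
    numerator = sum(
        comb(tissue_total, i) * comb(pool_total - tissue_total, cluster_size - i)
        for i in range(k, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts: 8 of the 10 ESTs in a cluster come from a tissue
# that contributes only 50 of 1000 ESTs to the background pool.
p_value = hypergeom_tail(8, 10, 50, 1000)
print(p_value)
```

A small value indicates that the tissue is over-represented in the cluster relative to the background, which corresponds to the low P-values that the text interprets as significant and/or specific expression.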
Figure 1

The T-STAG query interface. The interface is arranged in three main sections: (i) basic information and chromosomal location: various gene identities, accessions, keywords and chromosomal location can be specified. (ii) Splicing information: this can be used to select types of splicing or define a quality cutoff for alternative splice prediction depending on the particular application. (iii) Tissue information: in this block, the user can specify the tissues of interest. The fields related to a second tissue can be used to specify additional tissues in which the candidate genes should (not) be expressed. The organism can be switched in order to look at human and mouse orthologs. To limit the selection to specifically expressed genes, the number of additional tissues with significant expression can be restricted.

RESULTS AND DISCUSSION

Tissue-specific expression of genes and splice isoforms

Tissue-specific regulation of gene/isoform expression is known to play critical functional roles, as in the case of many known genes, such as complement regulator CD46 (24) and phosphodiesterase [PDE7 (25)], apart from being associated with certain general mechanisms [tissue-specific RNA surveillance (26)]. The tissue-specific genes identified via the T-STAG database frequently include known ones, as in the case of eye-specific genes, for which 19 of the top 20 have already been described as functionally related to the eye (e.g. rhodopsin, crystallin and opticin). A similar evaluation performed for alternative isoforms also revealed a number of already known tissue-specific splice events among the top-ranking matches. For example, most of the known genes containing putative kidney-specific transcripts have been experimentally shown to have kidney-related isoforms [AFP (27), SLC22A8 (28), WNK1 (29), GLS (30)]. The tissue-specific genes/isoforms that have significant EST evidence but are so far unannotated need to be analyzed further; some of them could be (partially) annotated after being screened for functional and/or possibly tissue-specific domains.

Rare genes/alternative isoforms and disease-related genes/isoforms

The EST data provide an estimate of the low-abundance genes/isoforms by virtue of the differing protocols of EST generation. Owing to the inherent over-representation of rare transcripts in normalized libraries (19), isoforms and genes that are represented only by such libraries are likely to be lowly expressed. This property of normalized libraries is utilized in T-STAG to select those transcripts that are likely to be lowly expressed. A large fraction of the tissue-specific alternative isoforms is observed to be lowly expressed (20); despite their low abundance, such isoforms may still have crucial functions. For example, in one of the alternative isoforms of the gene WNK1, an alternative promoter controls the expression of a kidney-specific and kinase-defective isoform (29). Owing to our annotation of tumor- and disease-associated EST libraries, the T-STAG database allows the retrieval of genes/isoforms that are significantly expressed in tumor- or disease-related tissues. However, in tumor cells an overall loss of control is observed in different parts of the regulation machinery (31,32), leading to a large number of genes with abnormal expression levels. Therefore, it is more informative to focus only on those genes that show significant differential expression in tumors as compared with the normal cell types.
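The selection of putative low-abundance transcripts can be sketched as a simple library filter: a transcript is flagged when every EST supporting it comes from a normalized library. This is a minimal illustration under that assumption; the data layout, the function name and all identifiers are hypothetical.

```python
def only_normalized(transcript_libs, normalized_libs):
    """Return transcripts supported exclusively by normalized libraries.

    transcript_libs: dict {transcript_id: iterable of library ids
                           of the ESTs that cover the transcript}.
    normalized_libs: set of library ids annotated as normalized.
    Transcripts without any EST support are skipped.
    """
    return sorted(
        transcript
        for transcript, libs in transcript_libs.items()
        if libs and set(libs) <= normalized_libs
    )


# Toy annotation with made-up identifiers.
normalized = {"lib_n1", "lib_n2"}
support = {
    "tx1": ["lib_n1", "lib_n2"],   # normalized libraries only
    "tx2": ["lib_n1", "lib_std"],  # also seen in a standard library
    "tx3": [],                     # no EST support
}

rare_candidates = only_normalized(support, normalized)
print(rare_candidates)  # ['tx1']
```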

Comparing expression patterns

In order to detect genes that are differentially expressed in tumors, in accordance with some microarray-based methods (33), the predicted tumor-specific genes can be contrasted with another set of genes that are significantly expressed in the respective healthy tissue (defined in the form of P-values). This can be achieved by using the subtraction feature of the T-STAG database. Several of the top-ranking genes revealed in this fashion are already known cancer-related genes. In brain tumor, for example, 6 of the top 10 genes have already been described as tumor-associated. These include some genes that have been suggested as tumor markers [OLIG1 (34) and CRF (35)]. Alternatively, by using the addition feature of T-STAG, anatomically or functionally related tissues can be grouped together. For example, grouping heart and muscle reveals six genes with significant expression in both tissues. This set includes titin, which is already known to play a critical role in both heart (36) and skeletal muscle (37). In addition, seemingly unrelated pairs of tissues might also have a biologically meaningful set of genes in common. For example, in the case of eye and pineal gland, we identified a group of genes (CRX, OTX2 and PDE6) that are already annotated as functional in both tissues [PDE6 (38)]. Furthermore, OTX2 is a known transcription factor that regulates the expression of the gene CRX both in eye and in pineal gland (39), thereby hinting at the existence of a common functional/regulatory pathway in these tissues. Some of the remaining genes in the dataset, most of which are currently annotated as functional only in eye (such as RCV1, RTDBN and potassium voltage-gated channel), are therefore potential candidates that may be regulated by the same molecular mechanism. With respect to the analysis of individual isoforms, the addition and subtraction features of the T-STAG database can be applied to further categorize the tissue-specific isoforms.
The first of the two categories consists of tissue-specific isoforms of genes for which other transcripts show different or ubiquitous expression patterns. Tissue-specific expression observed in such transcripts is likely to be regulated at the level of splicing (40). In contrast, the second category comprises tissue-specific splice events that are observed in genes for which all related transcripts are also highly expressed in the same tissue. These transcripts may reflect tissue-specific transcription (41,42), rather than tissue-specific splicing. Notably, such observations may be biased by other post-transcriptional events, such as nonsense-mediated decay [(NMD) (43)], which might occur with different stringencies in different tissues. In our data, we observe a large number of tissue-specific transcripts for both these categories, e.g. 187 human brain-specific transcripts potentially undergo specific alternative splicing, while 91 specific transcripts are likely to be the consequence of specific regulation of entire genes.
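The subtraction and addition features described in this section can be pictured as set operations on lists of genes that pass a P-value cutoff. The sketch below is an illustration under that reading, not the T-STAG implementation: "addition" is interpreted here as requiring significance in both tissues, and the gene names, tissue labels, P-values and cutoff are all made up.

```python
def significant_genes(pvalues, tissue, cutoff=0.01):
    """Genes whose P-value for the given tissue is at or below the cutoff."""
    return {gene for (gene, t), p in pvalues.items()
            if t == tissue and p <= cutoff}


def subtract_sets(first, second):
    """Subtraction: genes significant in the first set but not the second."""
    return first - second


def add_sets(first, second):
    """Addition (as read here): genes significant in both sets."""
    return first & second


# Toy P-values, for illustration only.
toy_pvalues = {
    ("geneA", "brain_tumor"): 0.001, ("geneA", "brain"): 0.5,
    ("geneB", "brain_tumor"): 0.002, ("geneB", "brain"): 0.003,
    ("geneC", "heart"): 0.001, ("geneC", "muscle"): 0.004,
    ("geneD", "heart"): 0.001, ("geneD", "muscle"): 0.3,
}

tumor_only = subtract_sets(
    significant_genes(toy_pvalues, "brain_tumor"),
    significant_genes(toy_pvalues, "brain"),
)
heart_and_muscle = add_sets(
    significant_genes(toy_pvalues, "heart"),
    significant_genes(toy_pvalues, "muscle"),
)
print(tumor_only)        # {'geneA'}
print(heart_and_muscle)  # {'geneC'}
```

The same two operations, applied to transcripts instead of genes, would separate isoforms whose sibling transcripts are expressed elsewhere from isoforms of genes that are expressed as a whole in the same tissue.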

Evolutionarily conserved expression patterns

The integration of orthology data with expression data enables the retrieval of evolutionarily conserved expression patterns in mouse and human. This provides an additional criterion for defining orthologs in a more stringent fashion. However, the emergence of expression in additional tissues, as in the case of the gene ACRBP, which is expressed only in testis in mouse but is additionally expressed in brain in human, may reflect the evolution of novel functions.
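A comparison of expression patterns between orthologs can be sketched by intersecting the sets of tissues in which each member of a pair is significantly expressed. The example below is a minimal illustration; the identifiers, P-values and cutoff are hypothetical, and the toy pair is merely modeled on the pattern described for ACRBP, not on real data.

```python
def expressed_tissues(pvalues, gene, cutoff=0.01):
    """Tissues in which the gene has a P-value at or below the cutoff."""
    return {t for (g, t), p in pvalues.items() if g == gene and p <= cutoff}


def compare_orthologs(pairs, human_pvalues, mouse_pvalues, cutoff=0.01):
    """For each (human, mouse) pair, split tissues into shared and gained."""
    report = {}
    for human_gene, mouse_gene in pairs:
        human = expressed_tissues(human_pvalues, human_gene, cutoff)
        mouse = expressed_tissues(mouse_pvalues, mouse_gene, cutoff)
        report[(human_gene, mouse_gene)] = {
            "conserved": human & mouse,
            "human_only": human - mouse,
            "mouse_only": mouse - human,
        }
    return report


# Toy data with made-up identifiers and P-values.
human_p = {("hsX", "testis"): 0.001, ("hsX", "brain"): 0.002}
mouse_p = {("mmX", "testis"): 0.001, ("mmX", "brain"): 0.4}

report = compare_orthologs([("hsX", "mmX")], human_p, mouse_p)
print(report[("hsX", "mmX")])
```

Here testis is reported as conserved and brain as human-only, the kind of pattern that the text suggests may reflect the evolution of a novel function.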

The web-interface

The interface (Figure 1) is user-friendly and flexible, with the possibility to define cutoffs (P-values for gene expression, quality values related to alternative splicing) based on individual applications. Additional restricted datasets can be generated by providing keywords and/or chromosomal location, thereby enabling queries like ‘Give me all kinases expressed in human and mouse brain’. The HTML output provides tight links to the visualization tools, GeneNest (EST resource and visualization) as well as SpliceNest (gene structure and alternative splice visualization), which allow a detailed inspection of candidate genes and transcripts.
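A query such as ‘all kinases expressed in human and mouse brain’ could be expressed against a relational schema roughly as sketched below. The actual T-STAG schema is not described in this paper, so the table and column names, the toy rows and the use of an in-memory SQLite database (instead of postgres) are all assumptions made for illustration.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene (gene_id TEXT PRIMARY KEY, organism TEXT, description TEXT);
        CREATE TABLE expression (gene_id TEXT, tissue TEXT, pvalue REAL);
        CREATE TABLE ortholog (human_id TEXT, mouse_id TEXT);
    """)
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?)", [
        ("hsA", "human", "toy protein kinase A"), ("mmA", "mouse", "toy protein kinase A"),
        ("hsB", "human", "toy kinase B"), ("mmB", "mouse", "toy kinase B"),
        ("hsC", "human", "toy transporter C"), ("mmC", "mouse", "toy transporter C"),
    ])
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", [
        ("hsA", "brain", 0.001), ("mmA", "brain", 0.002),
        ("hsB", "brain", 0.001), ("mmB", "brain", 0.2),
        ("hsC", "brain", 0.001), ("mmC", "brain", 0.001),
    ])
    conn.executemany("INSERT INTO ortholog VALUES (?, ?)",
                     [("hsA", "mmA"), ("hsB", "mmB"), ("hsC", "mmC")])
    return conn


def orthologs_by_keyword(conn, keyword="kinase", tissue="brain", cutoff=0.01):
    """Ortholog pairs matching a keyword and significant in the tissue in both species."""
    sql = """
        SELECT hg.gene_id, mg.gene_id
        FROM ortholog AS o
        JOIN gene AS hg ON hg.gene_id = o.human_id
        JOIN gene AS mg ON mg.gene_id = o.mouse_id
        JOIN expression AS he ON he.gene_id = hg.gene_id
        JOIN expression AS me ON me.gene_id = mg.gene_id
        WHERE hg.description LIKE '%' || ? || '%'
          AND he.tissue = ? AND me.tissue = ?
          AND he.pvalue <= ? AND me.pvalue <= ?
        ORDER BY hg.gene_id
    """
    return conn.execute(sql, (keyword, tissue, tissue, cutoff, cutoff)).fetchall()


hits = orthologs_by_keyword(build_toy_db())
print(hits)  # [('hsA', 'mmA')]
```

In the toy data the second kinase pair is dropped because its mouse member does not pass the cutoff in brain, and the transporter pair is dropped by the keyword filter.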

CONCLUSIONS

T-STAG is a resource and web-interface that allows comprehensive evaluation of tissue/tumor-specific expression both at the level of genes and at the level of individual transcripts. The resource is currently available for human and mouse with integrated man-mouse orthology data. In combination with the respective gene expression estimates, it provides an opportunity to compare expression patterns between orthologous genes. The comparison capability of the resource resolves the differential expression of genes both with respect to different tissues and with respect to normal versus tumor cell types. T-STAG also provides an opportunity to distinguish tissue-specific transcripts that are potentially regulated at the transcriptional level from those that are likely to be tissue-specifically spliced. In essence, coupled with a comprehensive user-friendly web-interface, T-STAG aims to serve as a resource for detailed computational analysis of expression patterns derived from EST data.

Future developments

Future development will include the prediction of developmental stage specific genes and isoforms. We plan to extend the database to include other organisms. Additionally, we plan to compare the EST-based gene expression estimates with gene expression profiles derived from microarray data. The consensus between these two independent datasets would provide a platform for the detection of common regulatory motifs among coexpressed genes.

SUPPLEMENTARY MATERIAL

Supplementary Material is available at NAR Online.
  42 in total

1.  GeneNest: automated generation and visualization of gene indices.

Authors:  S A Haas; T Beissbarth; E Rivals; A Krause; M Vingron
Journal:  Trends Genet       Date:  2000-11       Impact factor: 11.639

2.  Genome-wide detection of tissue-specific alternative splicing in the human transcriptome.

Authors:  Qiang Xu; Barmak Modrek; Christopher Lee
Journal:  Nucleic Acids Res       Date:  2002-09-01       Impact factor: 16.971

Review 3.  Derangement of growth and differentiation control in oncogenesis.

Authors:  Paul G Corn; Wafik S El-Deiry
Journal:  Bioessays       Date:  2002-01       Impact factor: 4.345

4.  Oligodendrocyte lineage genes (OLIG) as molecular markers for human glial brain tumors.

Authors:  Q R Lu; J K Park; E Noll; J A Chan; J Alberta; D Yuk; M G Alzamora; D N Louis; C D Stiles; D H Rowitch; P M Black
Journal:  Proc Natl Acad Sci U S A       Date:  2001-08-28       Impact factor: 11.205

5.  Impaired organic anion transport in kidney and choroid plexus of organic anion transporter 3 (Oat3 (Slc22a8)) knockout mice.

Authors:  Douglas H Sweet; David S Miller; John B Pritchard; Yuko Fujiwara; David R Beier; Sanjay K Nigam
Journal:  J Biol Chem       Date:  2002-05-13       Impact factor: 5.157

Review 6.  Role of HuD and other RNA-binding proteins in neural development and plasticity.

Authors:  Nora Perrone-Bizzozero; Federico Bolognani
Journal:  J Neurosci Res       Date:  2002-04-15       Impact factor: 4.164

7.  ASAP: the Alternative Splicing Annotation Project.

Authors:  Christopher Lee; Levan Atanelov; Barmak Modrek; Yi Xing
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

8.  Database resources of the National Center for Biotechnology.

Authors:  David L Wheeler; Deanna M Church; Scott Federhen; Alex E Lash; Thomas L Madden; Joan U Pontius; Gregory D Schuler; Lynn M Schriml; Edwin Sequeira; Tatiana A Tatusova; Lukas Wagner
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

9.  Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans.

Authors:  Benjamin P Lewis; Richard E Green; Steven E Brenner
Journal:  Proc Natl Acad Sci U S A       Date:  2002-12-26       Impact factor: 11.205

Review 10.  Finding signals that regulate alternative splicing in the post-genomic era.

Authors:  Andrea N Ladd; Thomas A Cooper
Journal:  Genome Biol       Date:  2002-10-23       Impact factor: 13.583

