
TassDB: a database of alternative tandem splice sites.

Michael Hiller, Swetlana Nikolajewa, Klaus Huse, Karol Szafranski, Philip Rosenstiel, Stefan Schuster, Rolf Backofen, Matthias Platzer.

Abstract

Subtle alternative splice events at tandem splice sites are frequent in eukaryotes and substantially increase the complexity of transcriptomes and proteomes. We have developed a relational database, TassDB (TAndem Splice Site DataBase), which stores extensive data about alternative splice events at GYNGYN donors and NAGNAG acceptors. These splice events are of subtle nature since they mostly result in the insertion/deletion of a single amino acid or the substitution of one amino acid by two others. Currently, TassDB contains 114 554 tandem splice sites of eight species, 5209 of which have EST/mRNA evidence for alternative splicing. In addition, human SNPs that affect NAGNAG acceptors are annotated. The database provides a user-friendly interface to search for specific genes or for genes containing tandem splice sites with specific features as well as the possibility to download large datasets. This database should facilitate further experimental studies and large-scale bioinformatics analyses of tandem splice sites. The database is available at http://helios.informatik.uni-freiburg.de/TassDB/.

Year:  2006        PMID: 17142241      PMCID: PMC1669710          DOI: 10.1093/nar/gkl762

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Alternative splicing is an important step during pre-mRNA processing. As most human genes with multiple exons express more than one transcript, alternative splicing is considered to be a major mechanism for producing a complex proteome from a limited number of genes (1). The different transcripts of one gene can be translated into functionally different protein isoforms (2) or can be degraded by nonsense-mediated mRNA decay (3). The regulation of alternative splicing plays a role in several important processes such as the formation and function of synapses (4), axon guidance in Drosophila (5,6) and T-cell activation (7). Furthermore, defects in alternative splicing cause a number of human diseases (8,9) and are thought to contribute to cancer development (10). Thus, alternative splicing is also of therapeutic interest (11). While much research has focused on larger alternative splice events such as exon skipping, it recently became clear that numerous alternative splice events result in only subtle changes of the mRNA and of the protein (12–14). The most widespread type is alternative splicing at acceptor sites with the pattern NAGNAG (N stands for A, C, G or T; throughout the paper we write T instead of U, also when referring to an RNA sequence) (12,15,16). In such a motif, both AGs represent potential alternative acceptor sites, which result in transcripts that differ by only 3 nt (the NAG). About 6% of all human acceptors are NAGNAG acceptors. Based on expressed sequence tag (EST)/mRNA data, 16% of all NAGNAGs and, notably, 39% of the tandem acceptors with a HAGHAG pattern (also denoted ‘plausible’ NAGNAGs; H stands for A, C or T) are currently known to be alternatively spliced. Furthermore, we recently found evidence for alternative splicing at donor splice sites with the motifs GTNGTN, GCNGTN and GTNGCN (denoted as GYNGYN donors; Y stands for C or T) where both GT/GC donors are used (17).
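The motif definitions above can be made concrete with a small sketch. It classifies a 6 nt window as a NAGNAG acceptor (flagging the ‘plausible’ HAGHAG subset) or as one of the three GYNGYN donor motifs named in the text. The function names and return labels are illustrative assumptions, not part of TassDB, and the sketch deliberately ignores where the window lies relative to the annotated exon boundary.

```python
import re

# Codes used in the text: N = A/C/G/T, H = A/C/T, Y = C/T.
NAGNAG = re.compile(r"[ACGT]AG[ACGT]AG")
HAGHAG = re.compile(r"[ACT]AG[ACT]AG")


def _normalize(hexamer: str) -> str:
    """Upper-case the window and write T instead of U, as in the text."""
    return hexamer.upper().replace("U", "T")


def classify_acceptor(hexamer: str) -> str:
    """Return 'HAGHAG', 'NAGNAG' or 'none' for a 6 nt acceptor window."""
    s = _normalize(hexamer)
    if HAGHAG.fullmatch(s):
        return "HAGHAG"  # 'plausible' subset of NAGNAG
    if NAGNAG.fullmatch(s):
        return "NAGNAG"
    return "none"


def classify_donor(hexamer: str) -> str:
    """Return 'GTNGTN', 'GCNGTN', 'GTNGCN' or 'none' for a 6 nt donor window."""
    s = _normalize(hexamer)
    if len(s) != 6 or set(s) - set("ACGT"):
        return "none"
    first, second = s[0:2], s[3:5]
    allowed = {("GT", "GT"), ("GC", "GT"), ("GT", "GC")}  # motifs listed in the text
    if (first, second) in allowed:
        return f"{first}N{second}N"
    return "none"


if __name__ == "__main__":
    for window in ("CAGCAG", "GAGCAG", "AAACAG"):
        print(window, classify_acceptor(window))
    for window in ("GTAGTT", "GTAGCA", "GCAGCA"):
        print(window, classify_donor(window))
```

Note that GCNGCN is reported as 'none' here because the text lists only GTNGTN, GCNGTN and GTNGCN as GYNGYN donor motifs.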
We denote a tandem splice site as confirmed if the usage of both splice sites is represented by at least one EST/mRNA and as unconfirmed otherwise. Although the term ‘tandem splice site’ refers to any pair of neighboring splice sites, in our database we collected data about NAGNAG acceptors and GYNGYN donors. Apart from their frequency, subtle alternative splice events are of interest since several cases are known to result in functionally different protein isoforms (16,18–22) and alternative NAGNAG splicing in the untranslated region (UTR) can affect the translational efficiency (23). Moreover, the effect on the protein might be drastic since a premature stop codon can be created (12,17). Many NAGNAG acceptors are conserved between human and mouse and the ratio of the two splice forms can be highly controlled in a tissue-specific manner (12,16,24). Furthermore, SNPs that affect a NAGNAG acceptor can be relevant for human disease as demonstrated for the ABCA4 gene (25) and suggested for many other genes (26). While previous databases on alternative splicing do not store such subtle splice events (27–29), recent databases contain confirmed tandem splice sites (30–32). However, they do not contain unconfirmed tandem splice sites and do not allow searching for tandem splice sites with specific features. To facilitate further experimental studies as well as large-scale bioinformatics analyses of tandem splice sites, we have developed a relational database, TassDB (TAndem Splice Site DataBase), which provides large collections of GYNGYN donors and NAGNAG acceptors in eight species. Since these subtle splice events can easily be overlooked in experimental systems (a 3 nt difference between two bands is barely visible on an agarose gel) and additional alternative splice events are likely to be missed in current EST data, TassDB also stores unconfirmed tandem splice sites.
A user interface allows users to search for genes of interest and to retrieve all relevant information about the respective tandem splice sites. It is also possible to search for genes harboring GYNGYNs/NAGNAGs with specific features. Finally, TassDB annotates NAGNAGs that are affected by a SNP and also contains 51 tandem acceptors whose NAGNAG pattern can only be observed in another SNP allele and not in the human genome reference sequence.

TassDB

Data

As alternatively spliced NAGNAG acceptors and GYNGYN donors occur in a large number of lineages including vertebrates, flies and nematodes, TassDB stores information about the tandem splice sites of eight species: Homo sapiens, Canis familiaris, Mus musculus, Rattus norvegicus, Gallus gallus, Danio rerio, Drosophila melanogaster and Caenorhabditis elegans. Our annotation pipeline is based on transcript-to-genome mappings taken from the UCSC genome browser (33). Apart from the RefSeq annotation that was used for all species, we additionally used Ensembl transcripts for human, rat, chicken and zebrafish, the UCSC ‘knownGene’ set for human, mouse and rat (34), FlyBase transcripts for Drosophila and WormBase transcripts for C. elegans. The exon–intron structure as well as the annotation of the open reading frame was taken from the UCSC annotation. To identify alternatively spliced NAGNAGs and GYNGYNs, we used BLAST against all ESTs and mRNAs from the respective species as described in Refs (12,17). All transcripts and expressed sequences were downloaded in April 2006. The SNPs that affect NAGNAG acceptors were taken from (26).

Database design

The primary aim of TassDB is to provide information that is specific for the tandem splice site and the putative alternative splice event. Thus, we collected the following data: the splice site motif, its genomic locus, its location in the transcript (5′/3′-UTR or CDS with intron phase 0/1/2), the impact of the splice event on the protein, the sequences and lengths of the up-/downstream exon and the intron, and information about the ESTs/mRNAs that indicate usage of one of the two splice sites. As the degree of similarity of a splice site to the overall consensus is an important criterion to distinguish alternatively from non-alternatively spliced NAGNAG acceptors (15), we also computed the maximum entropy scores for both splice sites in a tandem (35). The basic design of TassDB was driven by the idea of separating splice site specific data from transcript specific data. For example, the GYNGYN/NAGNAG motif, the genomic locus and the splice site scores are independent of transcript annotation. However, features such as intron phase, protein impact and EST confirmation depend on the annotation and the exon–intron structure of the transcript. Thus, one tandem splice site can have multiple sets of transcript specific data. For example, intron 13 of the PHF1 gene, which contains a CAGCAG acceptor, is in intron phase 2 according to the annotation of NM_024165. Due to skipping of the upstream 95 nt exon, this intron is in phase 1 according to the annotation of another transcript, NM_002636. Thus, the protein impact of the CAGCAG is the insertion/deletion (indel) of a Ser in NM_024165 but the indel of an Ala in NM_002636 (Figure 1A). Another example is the CAGCAG acceptor of intron 1 of the CBX1 gene, which has two alternative first exons (represented by the transcripts ENST00000225603 and NM_006807). We found EST evidence for alternative CAGCAG splicing if the upstream first exon is used (ENST00000225603) but not if the downstream first exon is used (NM_006807) (Figure 1B). Whether this is a biological phenomenon or simply the consequence of a lack of sufficient EST data deserves further research.
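To illustrate why the protein impact of one and the same NAGNAG depends on the intron phase of the annotating transcript, the following sketch translates both splice forms and reports the residues that differ. It is a minimal illustration under stated assumptions, not the TassDB annotation pipeline: the function names are hypothetical, amino acids are given in one-letter code, and the short sequences in the usage section are made up (they are not the PHF1 sequence). `extra_nag` denotes the 3 nt that are part of the mRNA only when the upstream acceptor is used.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence, ignoring a trailing incomplete codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON[cds[i:i + 3]] for i in range(0, usable, 3))


def protein_impact(upstream_cds: str, extra_nag: str, downstream_cds: str) -> str:
    """Compare the splice forms with and without the extra 3 nt.

    The intron phase is implied by len(upstream_cds) % 3.
    """
    short = translate(upstream_cds + downstream_cds)
    long_ = translate(upstream_cds + extra_nag + downstream_cds)
    if "*" in long_ and "*" not in short:
        return "premature stop codon"
    # Trim the shared prefix and suffix to isolate the residues that differ.
    p = 0
    while p < len(short) and short[p] == long_[p]:
        p += 1
    s = 0
    while s < len(short) - p and short[-1 - s] == long_[-1 - s]:
        s += 1
    short_diff = short[p:len(short) - s]
    long_diff = long_[p:len(long_) - s]
    if not short_diff:
        return f"indel {long_diff}"
    return f"{short_diff} -> {long_diff}"


if __name__ == "__main__":
    # Made-up toy sequences around a CAGCAG acceptor.
    print(protein_impact("ATGGC", "CAG", "CTTT"))   # intron phase 2
    print(protein_impact("ATGG", "CAG", "CTTTT"))   # intron phase 1
    print(protein_impact("ATGA", "CAG", "ATTTT"))   # one residue replaced by two
    print(protein_impact("ATG", "TAG", "TTT"))      # stop codon created
```

With these toy inputs the same CAG insertion yields an indel of Ser (S) in phase 2 and an indel of Ala (A) in phase 1, which mirrors the kind of transcript-dependent difference described for PHF1.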
Figure 1

(A) Example of a NAGNAG acceptor whose protein impact differs in the annotation of two transcripts. (B) Example of a NAGNAG acceptor that is only confirmed according to the exon–intron structure of one but not a second transcript.


User interface

The most frequent use of TassDB is likely to be a search for the tandem splice sites of a given gene. To this end, TassDB provides a quick search interface where a user only specifies a gene symbol or a transcript accession and retrieves all information on both confirmed and unconfirmed GYNGYNs and NAGNAGs for this gene. A more advanced task is to select tandem splice sites with specific features for further experimental or computational analysis. To this end, TassDB provides an advanced search interface where the user can restrict the search to GYNGYNs or NAGNAGs with the following features: (i) pattern of the splice site, (ii) number of ESTs/mRNAs that match both splice forms, (iii) location in the UTR or in the coding sequence (CDS) (as well as specific intron phases), and (iv) protein impact (Figure 2). Thus, it is easy to formulate queries such as: (i) Show all confirmed NAGNAGs that result in single amino acid events. (ii) Show all tandem splice sites where both splice forms are represented by at least three ESTs/mRNAs and that are located in the 5′-UTR. (iii) Show all confirmed GYNGYNs where one donor has a GC dinucleotide. Additionally, the search can be restricted to certain genes. The result of the search consists of two parts: (i) a summary table of affected genes and their number of tandem splice sites and (ii) detailed tables with information about the tandem splice sites. The detailed result tables also provide links to the ESTs/mRNAs for both splice forms as well as links to the UCSC genome browser. If the transcript specific data differ between transcripts, TassDB will show detailed result tables with more than two columns (Figure 1). Features that differ between transcripts are shown in black while those that are equal in all transcripts are shown in grey.
Figure 2

Screenshot of the advanced search interface: one can search for genes harboring tandem splice sites with a specific pattern [such as GTNGCN for tandem donors or HAGHAG (plausible) for tandem acceptors], a certain number of ESTs/mRNAs matching both splice forms, a location in the UTR or CDS and a specific protein impact. Furthermore, one can search only for GYNGYNs or NAGNAGs.

As SNPs that affect NAGNAG acceptors are predictive for variation in alternative splicing, TassDB also annotates the SNP data found in our previous study (26), including 51 polymorphic tandem acceptors whose NAGNAG pattern is not visible in the genome reference sequence. One such example is the APPBP1 gene, where the acceptor pattern of intron 6 is AAACAG in the human reference genome sequence. The SNP rs363209 leads to a second allele with an AAGCAG pattern; thus, this G-allele creates a novel tandem acceptor. TassDB always contains the sequence with the NAGNAG acceptor (in this case the G-allele). Links to dbSNP are provided. Finally, TassDB provides an interface where the user can send an arbitrary SQL select query to the database. The relational database schema with table and column names is given on the web page. Thus, TassDB can be used to retrieve large datasets for further computational analysis of tandem splice sites.
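The kind of SQL select query mentioned above can be sketched against a toy schema. The table names, column names and rows below are assumptions made purely for illustration; the actual TassDB schema is the one documented on its web page and is not reproduced here. The sketch only mirrors the stated design idea of keeping splice site specific data apart from transcript specific data, using an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: one row per tandem splice site ...
CREATE TABLE splice_site (
    site_id INTEGER PRIMARY KEY,
    gene    TEXT,
    kind    TEXT,   -- 'NAGNAG' or 'GYNGYN'
    motif   TEXT
);
-- ... and one row per (site, transcript) annotation.
CREATE TABLE transcript_annotation (
    site_id        INTEGER REFERENCES splice_site(site_id),
    transcript     TEXT,
    location       TEXT,    -- '5UTR', 'CDS' or '3UTR'
    intron_phase   INTEGER, -- NULL outside the CDS
    protein_impact TEXT,
    est_upstream   INTEGER, -- ESTs/mRNAs supporting the upstream site
    est_downstream INTEGER  -- ESTs/mRNAs supporting the downstream site
);
""")

# Made-up placeholder rows (not real TassDB content).
conn.executemany(
    "INSERT INTO splice_site VALUES (?, ?, ?, ?)",
    [(1, "GENE_A", "NAGNAG", "CAGCAG"),
     (2, "GENE_B", "NAGNAG", "TAGCAG"),
     (3, "GENE_C", "GYNGYN", "GTAGCA")],
)
conn.executemany(
    "INSERT INTO transcript_annotation VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(1, "TX_A1", "CDS", 2, "indel S", 5, 12),
     (1, "TX_A2", "CDS", 1, "indel A", 5, 12),
     (2, "TX_B1", "5UTR", None, "none", 4, 3),
     (3, "TX_C1", "5UTR", None, "none", 7, 0)],
)

# Example query (ii) from the text: both splice forms supported by at least
# three ESTs/mRNAs and located in the 5'-UTR.
utr_rows = conn.execute("""
    SELECT s.gene, s.motif, t.transcript
    FROM splice_site AS s
    JOIN transcript_annotation AS t ON s.site_id = t.site_id
    WHERE t.location = '5UTR'
      AND t.est_upstream >= 3
      AND t.est_downstream >= 3
    ORDER BY s.gene
""").fetchall()

# Sites whose protein impact differs between annotating transcripts.
differing = conn.execute("""
    SELECT s.gene
    FROM splice_site AS s
    JOIN transcript_annotation AS t ON s.site_id = t.site_id
    GROUP BY s.site_id
    HAVING COUNT(DISTINCT t.protein_impact) > 1
""").fetchall()

print(utr_rows)
print(differing)
```

Keeping the motif in one table and the phase, impact and EST support in another means a site annotated by several transcripts is stored once, while each transcript contributes its own row.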

Statistics

The current statistics of TassDB are shown in Table 1. It is evident that tandem splice sites are widespread in all eight species and that, with the exception of NAGNAGs in C. elegans, many of them are already known to be alternatively spliced.
Table 1

Summary of the current content of TassDB

Species                   No. of           No. of          GYNGYN    GYNGYN              NAGNAG    NAGNAG
                          transcripts (a)  ESTs/mRNAs (b)  total     confirmed (c)       total     confirmed (c)
Homo sapiens              73 923           7 948 538       10 995    141 (1.28%)         11 964    1945 (16.3%)
Canis familiaris          674              361 114         312       0                   268       15 (5.6%)
Mus musculus              35 897           4 952 715       8723      91 (1.04%)          9815      1487 (15.2%)
Rattus norvegicus         38 347           900 354         7916      20 (0.25%)          8658      415 (4.8%)
Gallus gallus             27 996           618 736         8538      19 (0.22%)          9323      377 (4.0%)
Danio rerio               41 956           879 246         9334      24 (0.26%)          11 234    368 (3.3%)
Drosophila melanogaster   39 733           422 535         3752      33 (0.88%)          1842      208 (11.3%)
Caenorhabditis elegans    45 410           327 081         7054      32 (0.45%)          4826      34 (0.7%)
Sum                       303 936          16 410 319      56 624    360 (0.64%)         57 930    4849 (8.4%)

(a) Total number of transcripts used for detecting tandem splice sites.

(b) Total number of ESTs and mRNAs.

(c) Percentage: number of confirmed tandems/number of all tandems.
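As a worked check of the percentages defined in footnote c, the short sketch below recomputes the confirmed fractions from the human row and the sum row of Table 1 and compares the grand totals with the figures quoted in the abstract. The helper name `confirmed_percent` is an illustrative assumption.

```python
def confirmed_percent(confirmed: int, total: int, digits: int) -> float:
    """Percentage of tandem splice sites with EST/mRNA support for both sites."""
    return round(100 * confirmed / total, digits)


# (confirmed, total) pairs taken from Table 1.
human_gyngyn = (141, 10995)
human_nagnag = (1945, 11964)
sum_gyngyn = (360, 56624)
sum_nagnag = (4849, 57930)

print(confirmed_percent(*human_gyngyn, 2))  # human GYNGYN donors
print(confirmed_percent(*human_nagnag, 1))  # human NAGNAG acceptors
print(confirmed_percent(*sum_gyngyn, 2))    # all species, GYNGYN
print(confirmed_percent(*sum_nagnag, 1))    # all species, NAGNAG

# Grand totals across both motif classes, as quoted in the abstract.
all_sites = sum_gyngyn[1] + sum_nagnag[1]
all_confirmed = sum_gyngyn[0] + sum_nagnag[0]
print(all_sites, all_confirmed)
```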


FUTURE DIRECTIONS

As tandem splice sites seem to occur in all species allowing alternative splicing, it would be desirable to include information for other species. In the future, we plan to include Bos taurus, Xenopus tropicalis, Takifugu rubripes and Anopheles gambiae as more ESTs and annotated genes become available. Furthermore, information about conservation of these splice events in other species as well as more data about the protein impact may be included. Since the database was designed to store information about any tandem splice site, an extension to tandem splice sites that are more than 3 nt apart is conceivable.
  35 in total

1.  Stochastic yet biased expression of multiple Dscam splice variants by individual cells.

Authors:  Guilherme Neves; Jacob Zucker; Mark Daly; Andrew Chess
Journal:  Nat Genet       Date:  2004-02-01       Impact factor: 38.330

2.  EASED: Extended Alternatively Spliced EST Database.

Authors:  Heike Pospisil; Alexander Herrmann; Ralf H Bortfeldt; Jens G Reich
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

3.  Impact of alternative initiation, splicing, and termination on the diversity of the mRNA transcripts encoded by the mouse transcriptome.

Authors:  Mihaela Zavolan; Shinji Kondo; Christian Schonbach; Jun Adachi; David A Hume; Yoshihide Hayashizaki; Terry Gaasterland
Journal:  Genome Res       Date:  2003-06       Impact factor: 9.043

4.  Single-nucleotide polymorphisms in NAGNAG acceptors are highly predictive for variations of alternative splicing.

Authors:  Michael Hiller; Klaus Huse; Karol Szafranski; Niels Jahn; Jochen Hampe; Stefan Schreiber; Rolf Backofen; Matthias Platzer
Journal:  Am J Hum Genet       Date:  2005-12-22       Impact factor: 11.025

5.  Glucocorticoid receptor structure and function in glucocorticoid-resistant small cell lung carcinoma cells.

Authors:  D W Ray; J R Davis; A White; A J Clark
Journal:  Cancer Res       Date:  1996-07-15       Impact factor: 12.701

6.  Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals.

Authors:  Gene Yeo; Christopher B Burge
Journal:  J Comput Biol       Date:  2004       Impact factor: 1.479

7.  Transcriptome and genome conservation of alternative splicing events in humans and mice.

Authors:  C W Sugnet; W J Kent; M Ares; D Haussler
Journal:  Pac Symp Biocomput       Date:  2004

8.  Alternative splicing in disease and therapy.

Authors:  Mariano A Garcia-Blanco; Andrew P Baraniak; Erika L Lasda
Journal:  Nat Biotechnol       Date:  2004-05       Impact factor: 54.908

9.  Two alternatively spliced forms of the human insulin-like growth factor I receptor have distinct biological activities and internalization kinetics.

Authors:  G Condorelli; R Bueno; R J Smith
Journal:  J Biol Chem       Date:  1994-03-18       Impact factor: 5.157

10.  Alternative splicing of Drosophila Dscam generates axon guidance receptors that exhibit isoform-specific homophilic binding.

Authors:  Woj M Wojtowicz; John J Flanagan; S Sean Millard; S Lawrence Zipursky; James C Clemens
Journal:  Cell       Date:  2004-09-03       Impact factor: 41.582
