DBASS3 and DBASS5: databases of aberrant 3'- and 5'-splice sites.

Emanuele Buratti, Martin Chivers, Gyulin Hwang, Igor Vorechovsky.

Abstract

DBASS3 and DBASS5 provide comprehensive repositories of new exon boundaries that were induced by pathogenic mutations in human disease genes. Aberrant 5'- and 3'-splice sites were activated either by mutations in the consensus sequences of natural exon-intron junctions (cryptic sites) or elsewhere ('de novo' sites). DBASS3 and DBASS5 currently contain approximately 900 records of cryptic and de novo 3'- and 5'-splice sites that were produced by over a thousand different mutations in approximately 360 genes. DBASS3 and DBASS5 data can be searched by disease phenotype, gene, mutation, location of aberrant splice sites in introns and exons and their distance from authentic counterparts, by bibliographic references and by the splice-site strength estimated with several prediction algorithms. The user can also retrieve reference sequences of both aberrant and authentic splice sites with the underlying mutation. These data will facilitate identification of introns or exons frequently involved in aberrant splicing, mutation analysis of human disease genes and study of germline or somatic mutations that impair RNA processing. Finally, this resource will be useful for fine-tuning splice-site prediction algorithms, better definition of auxiliary splicing signals and design of new reporter assays. DBASS3 and DBASS5 are freely available at http://www.dbass.org.uk/.

Year:  2010        PMID: 20929868      PMCID: PMC3013770          DOI: 10.1093/nar/gkq887

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Pre-mRNA splicing removes intervening sequences or introns from eukaryotic precursor messenger RNAs (pre-mRNAs) to ensure accurate gene expression (1). Apart from joining consecutive exons together, this process is capable of selective removal or inclusion of exonic and intronic segments in mRNA, generating distinct transcripts from a single gene, often in a cell type-specific or developmental- or gender-dependent manner (2–4). Both constitutive and alternative splicing are controlled by sequence elements in the pre-mRNA that are recognized by a large ribonucleoprotein complex termed the spliceosome (1). These conserved but degenerate signals are located predominantly in introns and include 5′-splice sites (5′-ss) and 3′-splice sites (3′-ss), with upstream polypyrimidine tracts and the branch point sequence. Mutations in any of these cis-elements can dramatically alter splicing efficiency and result in genetic disease (5–8), but their consequences for RNA processing have been difficult to predict. The majority of mutations at the 5′-ss or 3′-ss consensus have been reported to cause skipping of one or more exons and activation of cryptic splice sites (9). Occasionally, splice-site mutations may also result in full intron retention or give rise to entirely novel ‘pseudoexons’ or cryptic exons, often in repetitive sequences, particularly in short interspersed nuclear elements (SINEs), including mammalian interspersed repeats and Alus (10–12). Finally, creation of de novo splice sites in large exons may remove internal exonic sequences and create pseudointrons. Traditional splicing signals contain only a half of the information necessary for accurate splice site recognition (13). The remaining information is provided by auxiliary signals in introns and exons, known as splicing enhancers and silencers (14,15), that are thought to interact with trans-acting factors and/or contribute to critical RNA structural motifs and a ‘splicing code’ (16,17). 
In addition, as the splicing and transcription machineries are tightly linked, splicing outcomes can be influenced by pre-mRNA processing kinetics and transcription (18). As a result, a ‘splicing mutation’ may affect not only RNA processing, but also transcription (19) and downstream expression pathways, including translation. For example, creating or eliminating exons containing upstream open reading frames or altering splicing efficiency and general intron-mediated translation enhancement can dramatically influence the abundance of gene products (20,21). The fraction of gene mutations or variants that influence splicing and gene expression is thus likely to be larger than previously thought. Because copy-number or structural variants exceed single-nucleotide polymorphisms by at least 2-fold (22), the overall contribution of human variability to differential pre-mRNA processing and downstream remodelling of ribonucleoprotein particles could be even higher. Better understanding of the complicated interplay of factors that control splice site choice would clearly benefit from convenient access to a comprehensive resource that pools sequences from scattered mutation reports in a wide range of biomedical journals. Although rare, these reports provide valuable information about splice site selection in vivo. Currently, the Human Gene Mutation Database (HGMD) (23) and other locus-specific mutation databases (for example HPRT at www.ibiblio.org/dnam/des_hprt.htm or CFTR at www.genet.sickkids.on.ca/cftr/app and see www.hgvs.org/rec.html for other genes) give a list of splicing mutations. However, none of these databases provides the critical and comprehensive information needed to understand why aberrant splice sites are selected. Apart from the initial overviews of aberrant splicing (24,25), a comprehensive, regularly updated and publicly available tool is missing.
Here, we describe DBASS3 and DBASS5, the databases of aberrant 3′-ss and 5′-ss in human disease genes and discuss their utility and importance for studying splice site selection.

RESULTS AND DISCUSSION

Criteria for data inclusion in DBASS3 and DBASS5

Both databases contain sequences of new exon–intron boundaries that were generated through naturally occurring and disease-causing variants or mutations, both germ-line and somatic. DBASS3 and DBASS5 contain mutation-induced and sequence-verified aberrant RNAs published in peer-reviewed communications over the past 30 years (from January 1981 to June 2010). Briefly, reports of cryptic and de novo splice sites were identified by searching PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi; queries: ‘mutation, splicing, cryptic’; ‘mutation, splicing, new acceptor’ or ‘mutation, splicing, new donor’) and home pages of peer-reviewed journals. A subset of case reports was identified by searching locus-specific mutation databases (http://archive.uwcm.ac.uk/uwcm/mg/docs/oth_mut.html). The search was restricted to human genes with sequence-verified aberrant RNA products. Reported sequences were manually checked against reference sequences available from human genome databases, including GenBank (http://www.ncbi.nlm.nih.gov/Genbank) (26) and Ensembl (http://www.ensembl.org) (27). The vast majority of aberrant transcripts were verified by amplifying reverse-transcribed total RNA extracted from blood samples taken from affected individuals and/or their family members. Neither database features exon skipping or full intron retention events in which no new exon–intron boundaries were generated, nor do they include polymorphisms that influence utilization of the NAGNAG type of 3′-ss, which was reported elsewhere (28).
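The three PubMed queries quoted above can be expressed as reproducible search URLs. The sketch below is only an illustration: it assumes the NCBI E-utilities `esearch` endpoint (the text describes searches through the PubMed web page, not this interface), reads the commas in each query as AND, and uses a hypothetical helper name, `build_search_url`. It builds the URLs without contacting the server.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint. Assumption: the original searches were run
# through the PubMed web page; this endpoint is used here only for illustration.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The three query strings quoted in the text, with commas read as AND (assumption).
QUERIES = [
    "mutation AND splicing AND cryptic",
    "mutation AND splicing AND new acceptor",
    "mutation AND splicing AND new donor",
]


def build_search_url(term, mindate="1981/01", maxdate="2010/06", retmax=100):
    """Return an esearch URL limited to the publication window given in the text."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": retmax,
    }
    return f"{ESEARCH}?{urlencode(params)}"


urls = [build_search_url(query) for query in QUERIES]
for url in urls:
    print(url)
```

Hits returned by such queries would still need the manual checks described above (human genes only, sequence-verified aberrant RNA, comparison against reference sequences).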

Database design and data set summary

DBASS3 and DBASS5 were designed as retrieval and submission tools containing mutation-induced aberrant splice sites that resulted in a recognizable phenotype. The web application was created using the Microsoft ASP and ASP.Net server technology and SQL Server database software. A breakdown of updated DBASS3 and DBASS5 records by gene, phenotype and location of aberrant splice sites is shown in Tables 1 and 2, respectively.
Table 1.

Summary of aberrant 3′-splice sites in DBASS3

Location of cryptic or de novo 3′-splice sites | Exon | Exon | Intron | Intron | Both
Mutation | In the 3′-ss consensus(a) (cryptic) | Elsewhere (‘de novo’) | In the 3′-ss consensus(a) (cryptic) | Elsewhere (‘de novo’) | All mutations
Number of genes | 72 | 30 | 39 | 70 | 170
Number of phenotypes | 67 | 32 | 37 | 71 | 165
Number of cryptic and de novo 3′-ss (%) | 107 (34.3) | 48 (15.4) | 49 (15.7) | 108 (34.6) | 312 (100)
Number of aberrant 3′-ss affecting terminal exons | 1259428
Median distance (nucleotides) between authentic and aberrant 3′-splice sites | 1349−41−121

(a) The 3′-ss consensus is YAG/G (Y is a pyrimidine, slash is the intron–exon boundary).

Table 2.

Summary of aberrant 5′-splice sites in DBASS5

Location of cryptic or de novo 5′-splice sites | Exon | Exon | Intron | Intron | Both
Mutation | In the 5′-ss consensus(a) (cryptic) | Elsewhere (‘de novo’) | In the 5′-ss consensus(a) (cryptic) | Elsewhere (‘de novo’) | All mutations
Number of genes | 115 | 57 | 113 | 67 | 255
Number of phenotypes | 122 | 61 | 123 | 68 | 281
Number of cryptic and de novo 5′-ss (%) | 203 (33.9) | 89 (15.2) | 221 (36.9) | 90 (14.0) | 603 (100)
Number of aberrant 5′-ss affecting terminal introns (%) | 8 | 2 | 6 | 2 | 18
Median distance (nucleotides) between authentic and aberrant 5′-splice sites | −43−594318−9

(a) The 5′-ss consensus is MAG/GURAGU (M is A or C, R is purine and slash is the exon–intron junction).
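The two consensus definitions in the table footnotes can be turned into a simple position-by-position check. The following is a minimal sketch, not one of the scoring methods used by the databases: it only counts how many positions of a candidate site agree with the degenerate consensus, and the function name `consensus_matches` is hypothetical.

```python
# Nucleotide codes appearing in the two footnoted consensus sequences (RNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "Y": "CU",  # pyrimidine
    "R": "AG",  # purine
    "M": "AC",  # A or C
}

# Consensus strings from the table footnotes, with the junction slash removed.
CONSENSUS_3SS = "YAGG"       # YAG/G
CONSENSUS_5SS = "MAGGURAGU"  # MAG/GURAGU


def consensus_matches(site, consensus):
    """Count the positions of `site` that agree with the degenerate `consensus`."""
    site = site.upper().replace("T", "U")  # accept DNA or RNA input
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base in IUPAC[code] for base, code in zip(site, consensus))


print(consensus_matches("CAGGTAAGT", CONSENSUS_5SS))  # all nine positions agree
print(consensus_matches("CAGGCAAGT", CONSENSUS_5SS))  # one mismatch (U replaced by C)
print(consensus_matches("cagG", CONSENSUS_3SS))       # all four positions agree
```

A match count treats every position as equally important, which is a deliberate simplification; weight-matrix and maximum entropy scores weight positions differently.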

The initial analysis of DBASS3 and DBASS5 records (29,30) confirmed that cryptic splice sites were, on average, intrinsically stronger than mutated authentic (natural) sites but generally weaker than their authentic, wild-type counterparts (31). Analysis of DBASS3 and DBASS5 sequences also showed that the maximum entropy (ME) algorithm (32) gave the best overall discrimination between aberrant and authentic sites (29,30), while algorithms based only on the weight matrix, such as the Shapiro and Senapathy score (33), performed less successfully. The density of silencers and enhancers in the segments between authentic and aberrant sites was intermediate between exons and introns, supporting a gradient concept of exon and intron definition (34). Analysis of DBASS3 and DBASS5 data also revealed the underlying mutation pattern of aberrant splice sites and their location. For example, a breakdown of DBASS3 alterations showed that AG-creating mutations were much more common than previously estimated, totalling ∼42% (29), as opposed to the initial estimate of 13% (24). De novo 5′-ss in introns were stronger than their authentic counterparts (30), but this did not apply to exonic de novo sites, suggesting that their activation in vivo is more reliant on exonic splicing enhancers or silencers than on the intrinsic strength of the 5′-ss consensus (30).
Finally, DBASS3 and DBASS5 data have been employed to develop other prediction tools, such as CRYP-SKIP (http://www.dbass.org.uk/cryp-skip/), which can distinguish between cryptic splice site activation and exon skipping upon mutation of 3′-ss or 5′-ss (35), and HOT-SKIP (http://www.dbass.org.uk/hot-skip/), which computes the ESS/ESE profile for all possible point mutations at each exon position and identifies nucleotide substitutions that are most likely to skip the exon (M. Raponi et al., submitted for publication). Such free practical utilities are useful for molecular diagnostics to facilitate identification of exonic changes that interfere with RNA processing and prediction of their phenotypic outcome.

User interface

DBASS3 and DBASS5 provide a quick search page where the user specifies a gene symbol (http://www.gene.ucl.ac.uk/nomenclature/), a phenotype or Mendelian Inheritance in Man (MIM) number (www.ncbi.nlm.nih.gov/omim/) (36), or a nucleotide sequence, and retrieves the records that match the input criteria. In an advanced search option, one can select a combination of several criteria, including phenotype, gene, mutation, location of aberrant splice sites in introns and exons, distance from authentic splice sites and relevant bibliographic references (Figure 1). The user can also browse through the list of all records. Investigators studying coupled processes of RNA splicing and end-processing can easily retrieve aberrant splice sites that were activated in terminal exons or introns. Not least, the user can search for aberrant splice sites (and their wild-type and mutated authentic counterparts) by their intrinsic strength. This option permits retrieval of splice sites according to various splice-site scores and user-defined cut-off points or desired intervals.
Figure 1.

Screenshot of the DBASS3 search page. DBASS3 and DBASS5 provide a quick search page where the user needs to specify a gene symbol, phenotype or nucleotide sequence. In the Advanced search option, one can select a combination of several criteria, including phenotype, gene, mutation, location of aberrant splice sites in introns and exons, distance from authentic (natural) splice sites, intrinsic strength of splice sites and bibliographic references.

The search results can be expanded by clicking on the ‘View details’ button (Figure 1), leading to full details in each record (Figure 2), including gene, phenotype, MIM numbers, mutation (using traditional and official nomenclature where available), distance between authentic and aberrant splice sites, change of the reading frame (0, +1 and +2 nt), literature references with PubMed hyperlinks and nucleotide sequences flanking the authentic and aberrant 5′-ss. The ex vivo origin of aberrant RNA is indicated in the Comments fields of DBASS3 and DBASS5 records, which also contain additional information.
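The reading-frame field listed among the record details follows from simple modular arithmetic on the number of nucleotides gained or lost by the exon. The sketch below is an assumption about how such a value could be derived, not a description of the database code: the function name and the sign convention (positive when the aberrant site lengthens the exon, negative when it shortens it) are hypothetical, and it covers only the case where a single exon boundary moves.

```python
def reading_frame_change(exon_length_change_nt):
    """Return the frame shift (0, 1 or 2 nt) caused by a change in exon length.

    Assumed convention: positive values mean the aberrant splice site adds
    nucleotides to the exon, negative values mean it removes them. A result
    of 0 means the downstream reading frame is preserved.
    """
    return exon_length_change_nt % 3


for change in (9, 13, -4, -41):
    print(change, "->", reading_frame_change(change))
```

Python's `%` returns a non-negative remainder for a positive modulus, so an exon shortened by 4 nt is reported as a shift of 2 rather than −1.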
Figure 2.

Example of a DBASS3 record. Aberrant splice sites are shown as a slash in a genomic sequence. Disease-causing or -predisposing mutations are denoted by a ‘greater than’ sign for nucleotide substitutions, by parentheses for deletions and by brackets for duplications or insertions. Intronic sequences are shown in blue lower case, exons are shown in green upper case. Cryptic exons are underlined.

Finally, both DBASS3 and DBASS5 allow users to submit newly published entries for inclusion in the databases via a dedicated submission tool and to register for regular database updates by email.
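The sequence notation described in the Figure 2 caption (lower case for introns, upper case for exons, a slash for splice sites, and punctuation for mutations) lends itself to simple text processing. The sketch below is a rough illustration under stated assumptions: the example string is invented and is not a real DBASS record, the exact layout of exported records is not given in the text, and colour and underlining are not available in plain text.

```python
from itertools import groupby


def classify(ch):
    """Map one character of an annotated sequence to a segment type."""
    if ch == "/":
        return "splice_site"
    if ch.islower():
        return "intron"
    if ch.isupper():
        return "exon"
    return "annotation"  # '>', parentheses, brackets and anything else


def segment_record(annotated):
    """Split an annotated sequence into consecutive (type, text) runs."""
    return [(kind, "".join(run)) for kind, run in groupby(annotated, key=classify)]


# Hypothetical example: intron, splice site, exon with a deletion in
# parentheses, second splice site, intron.
example = "ttttcag/GTCA(AG)TTG/gtaagt"
for kind, text in segment_record(example):
    print(f"{kind:12s} {text}")
```

Keeping the mutation punctuation as separate `annotation` runs leaves the interpretation of substitutions, deletions and insertions to a later step.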

Nomenclature of aberrant splice sites

Although mutations anywhere in an intron or exon can impair pre-mRNA splicing, most mutations that activate aberrant splice sites are found in or close to natural splice sites (29–31). If the mutation nomenclature incorporated the distance (in nucleotides) between the mutation and the natural exon–intron junction, this information would help to identify mutations that activate aberrant splice sites. For exonic mutations, however, neither the traditional (37) nor the official (www.hgvs.org/mutnomen/) mutation nomenclature incorporates this distance. For intronic alterations, the distance can be derived from both the traditional and the official designation, although the latter does not provide this information when referring to a genomic sequence (for example g.234G>T). As most human mutation reports currently adhere to the official nomenclature, exonic mutations that affect splicing are less likely to be recognized and are probably under-reported. To give investigators immediate access to this critical information, both DBASS3 and DBASS5 still show the traditional mutation nomenclature for all records. In exons, single-nucleotide substitutions are simply preceded by ‘E’ (for exon), which is followed by an exon number, a ‘plus’ sign and the distance (in nucleotides) between the last authentic intron–exon junction and the point mutation. In addition to sequence, DBASS3 and DBASS5 users can see both the distance between mutation and authentic splice site and the distance between authentic and aberrant splice sites. Rather than relying merely on the nomenclature, this design ultimately facilitates rapid verification of splicing mutations and auxiliary splicing signals, giving the user immediate access to the sequence of newly intronized or exonized segments and the underlying DNA change.
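The exonic position prefix described above (‘E’, exon number, ‘plus’ sign, distance) can be written and read back mechanically. This is a minimal sketch under assumptions: the function names are hypothetical, and because the text does not specify how the nucleotide change itself is appended to the prefix, only the positional part is handled.

```python
import re

# 'E', exon number, '+', distance in nucleotides from the last authentic
# intron-exon junction; anything after the distance is ignored.
_EXONIC_POSITION = re.compile(r"^E(\d+)\+(\d+)")


def format_exonic_position(exon_number, distance_nt):
    """Build the positional prefix, e.g. exon 5, 23 nt into the exon -> 'E5+23'."""
    return f"E{exon_number}+{distance_nt}"


def parse_exonic_position(label):
    """Return (exon_number, distance_nt) from a label that starts with the prefix."""
    match = _EXONIC_POSITION.match(label)
    if match is None:
        raise ValueError(f"not an exonic position label: {label!r}")
    return int(match.group(1)), int(match.group(2))


label = format_exonic_position(5, 23)
print(label)
print(parse_exonic_position(label))
```

Parsing the distance out of the label is what makes it possible to rank exonic mutations by their proximity to the authentic junction, which is the use case the paragraph argues for.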

Future directions

New releases of DBASS3 and DBASS5 data will reflect the growing number of studies reporting new aberrant splice sites. The updates will be facilitated by user-assisted submissions and proper reporting of disease-associated aberrant splice sites in biomedical journals, which should be endorsed by journal editors. With the recent realization that single-nucleotide variability is less extensive than insertion/deletion or copy-number variability (22), it will be interesting to catalogue structural alterations that influence pre-mRNA processing and alter the relative expression of pre-existing, alternatively spliced mRNAs, particularly in conserved gene families. DBASS5 and DBASS3 entries are also planned to be linked to external resources other than OMIM and Ensembl, such as the SNP databases (http://www.ncbi.nlm.nih.gov/snp/).

FUNDING

Juvenile Diabetes Research Foundation International (1-2008-47); EC grant EURASNET-LSHG-CT-2005-518238. Funding for open access charge: Juvenile Diabetes Research Foundation, European Union. Conflict of interest statement. None declared.